
What is the Map API?

The Map API is like a website explorer - it discovers all URLs on a domain and returns them with their page titles. Think of it as creating a sitemap or table of contents for any website.

  • URL Discovery: find all pages on a website automatically
  • Site Structure: understand website hierarchy and organization
Perfect for: Planning bulk scrapes, building sitemaps, content audits, SEO analysis, and website documentation.
Lightweight & Fast: Returns only URLs and titles - no content extraction. This makes it 10x faster than scraping each page individually.

Why Use Map Before Crawling?

1. Map: Discover URLs
   Use the Map API to find all pages. Cost: $0.002 (one-time). Speed: 1-5 seconds.
2. Filter URLs
   Pick which pages you actually need. Filter by keyword, path, or pattern.
3. Crawl: Get Content
   Use the Crawl API on selected pages only. Cost: $0.001 per page. Save money: only crawl what you need!
Smart workflow: Map first ($0.002) → Filter → Crawl selected pages ($0.001 each). This saves money and time!

Pricing (Super Affordable)

Flat Fee Per Site

$0.002 per map request = $2 for 1,000 sites mapped
Cost is the same whether you discover 10 URLs or 5,000 URLs. It’s a flat fee per website, not per URL discovered.

Before You Start

Authentication

All requests require your API key in the Authorization header:
Authorization: Bearer YOUR_LLMLAYER_API_KEY
Keep your API key secure! Never expose it in client-side code. Always call from your backend.
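If you call the endpoint directly instead of using the SDK, pass the same header on every request. A minimal sketch, assuming a placeholder API host (use the base URL from your dashboard):
// Raw request sketch. Assumption: BASE_URL below is a placeholder, not the documented host.
const BASE_URL = 'https://api.llmlayer.dev';

const res = await fetch(`${BASE_URL}/api/v2/map`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.LLMLAYER_API_KEY}`,  // never hardcode the key
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({ url: 'https://example.com' })
});

const data = await res.json();
console.log(`Discovered ${data.links?.length ?? 0} URLs`);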

Your First Map (2-Minute Start)

Let’s discover all pages on a website!
import { LLMLayerClient } from 'llmlayer';

// 1. Create a client
const client = new LLMLayerClient({
  apiKey: process.env.LLMLAYER_API_KEY
});

// 2. Map a website
const response = await client.map({
  url: 'https://example.com'
});

// 3. See all discovered URLs
console.log(`Found ${response.links.length} pages:\n`);

for (const link of response.links) {
  console.log(`📄 ${link.title}`);
  console.log(`   ${link.url}\n`);
}

console.log(`Cost: $${response.cost}`);  // $0.002
Done! You just discovered all pages on a website in seconds. Now you know exactly what’s there before crawling or scraping.

How It Works

The Map API discovers URLs using multiple strategies:
1. Check Sitemap: first looks for sitemap.xml (fastest method)
2. Crawl Links: if there is no sitemap, crawls the site following links
3. Extract Titles: gets page titles without downloading full content
4. Return List: returns the complete URL list with titles
Smart discovery: The API automatically chooses the best method. If a sitemap exists, it uses that (super fast). Otherwise, it crawls to find pages.

Basic Usage

Map a Website (All Pages)

Discover all pages on a domain.
const response = await client.map({
  url: 'https://docs.example.com'
});

console.log(`Discovered ${response.links.length} pages`);

// Show first 5 pages
response.links.slice(0, 5).forEach(link => {
  console.log(`${link.title} → ${link.url}`);
});

Advanced Options

Filter URLs by Keyword

Only discover URLs containing specific keywords.
// Only find blog posts
const response = await client.map({
  url: 'https://example.com',
  search: 'blog'  // Only URLs containing "blog"
});

console.log(`Found ${response.links.length} blog pages`);

for (const link of response.links) {
  console.log(`${link.title} → ${link.url}`);
}
Use search to narrow results:
  • search: "blog" - Find all blog posts
  • search: "docs" - Find all documentation
  • search: "api" - Find API-related pages
  • search: "2024" - Find pages from 2024

Include Subdomains

Discover URLs across all subdomains (blog.example.com, docs.example.com, etc.)
const response = await client.map({
  url: 'https://example.com',
  includeSubdomains: true  // Include blog.*, docs.*, api.*, etc.
});

console.log(`Found ${response.links.length} pages across all subdomains`);

// Group by subdomain
const bySubdomain = new Map<string, typeof response.links>();

for (const link of response.links) {
  const hostname = new URL(link.url).hostname;
  if (!bySubdomain.has(hostname)) {
    bySubdomain.set(hostname, []);
  }
  bySubdomain.get(hostname)!.push(link);
}

// Show breakdown
for (const [subdomain, links] of bySubdomain) {
  console.log(`\n${subdomain}: ${links.length} pages`);
}

Ignore Sitemap (Force Crawling)

Force the API to crawl instead of using sitemap.xml.
const response = await client.map({
  url: 'https://example.com',
  ignoreSitemap: true  // Don't use sitemap.xml, crawl instead
});

console.log(`Crawled and found ${response.links.length} pages`);
When to use this:
  • Sitemap is outdated or incomplete
  • You want to discover hidden pages
  • Testing actual site structure vs sitemap

Set URL Limit

Limit the number of URLs discovered (default: 5000).
const response = await client.map({
  url: 'https://large-site.com',
  limit: 100  // Stop after finding 100 URLs
});

console.log(`Limited to ${response.links.length} pages`);
Use limit to:
  • Sample large websites quickly
  • Test before full mapping
  • Control response size

Set Timeout

Control how long to wait (default: 15000ms / 15 seconds).
const response = await client.map({
  url: 'https://example.com',
  timeoutMs: 30000  // Wait up to 30 seconds
});

console.log(`Found ${response.links.length} pages`);

Real-World Examples

Example 1: Plan a Bulk Scrape

Map a site, filter URLs, then scrape only what you need.
import { LLMLayerClient } from 'llmlayer';

const client = new LLMLayerClient({
  apiKey: process.env.LLMLAYER_API_KEY
});

async function planScrape(baseUrl: string, keyword: string) {
  console.log(`🗺️  Mapping ${baseUrl}...\n`);

  // 1. Map the site
  const mapResponse = await client.map({
    url: baseUrl,
    search: keyword  // Filter by keyword
  });

  console.log(`✅ Found ${mapResponse.links.length} matching pages`);
  console.log(`💰 Map cost: $${mapResponse.cost}\n`);

  // 2. Review what you found
  console.log('Pages to scrape:');
  mapResponse.links.forEach((link, i) => {
    console.log(`${i + 1}. ${link.title}`);
    console.log(`   ${link.url}`);
  });

  // 3. Calculate scraping cost
  const scrapeCost = mapResponse.links.length * 0.001;  // $0.001 per page scraped
  const totalCost = mapResponse.cost + scrapeCost;

  console.log(`\n💰 Cost Estimate:`);
  console.log(`   Map: $${mapResponse.cost}`);
  console.log(`   Scrape ${mapResponse.links.length} pages: $${scrapeCost.toFixed(3)}`);
  console.log(`   Total: $${totalCost.toFixed(3)}`);

  return mapResponse.links;
}

// Use it
const pages = await planScrape('https://docs.example.com', 'api');

// Now scrape only the pages you want
console.log('\n📥 Scraping selected pages...');
for (const page of pages.slice(0, 5)) {  // Scrape first 5
  const content = await client.scrape({
    url: page.url,
    formats: ['markdown']
  });
  console.log(`✅ Scraped: ${page.title}`);
}
Output:
🗺️  Mapping https://docs.example.com...

✅ Found 23 matching pages
💰 Map cost: $0.002

Pages to scrape:
1. API Reference
   https://docs.example.com/api/reference
2. API Authentication
   https://docs.example.com/api/auth
...

💰 Cost Estimate:
   Map: $0.002
   Scrape 23 pages: $0.023
   Total: $0.025

📥 Scraping selected pages...
✅ Scraped: API Reference
✅ Scraped: API Authentication
...

Example 2: Build a Sitemap

Generate a sitemap from any website.
import fs from 'fs';  // reuses the `client` created in Example 1

async function generateSitemap(url: string) {
  console.log(`🗺️  Mapping ${url}...\n`);

  const response = await client.map({ url });

  console.log(`✅ Found ${response.links.length} pages\n`);

  // Create XML sitemap
  const sitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${response.links.map(link => `  <url>
    <loc>${link.url}</loc>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>`).join('\n')}
</urlset>`;

  // Save to file
  fs.writeFileSync('sitemap.xml', sitemap);

  console.log('✅ Saved sitemap.xml');
  console.log(`📊 ${response.links.length} URLs included`);
  console.log(`💰 Cost: $${response.cost}`);

  return sitemap;
}

await generateSitemap('https://example.com');

Example 3: Content Audit Tool

Analyze website structure and find pages by category.
async function auditWebsite(url: string) {
  console.log(`🔍 Auditing ${url}...\n`);

  const response = await client.map({ url });

  console.log(`Total pages: ${response.links.length}\n`);

  // Categorize by path
  const categories: Record<string, typeof response.links> = {
    blog: [],
    docs: [],
    api: [],
    products: [],
    other: []
  };

  for (const link of response.links) {
    const path = new URL(link.url).pathname.toLowerCase();

    if (path.includes('/blog')) categories.blog.push(link);
    else if (path.includes('/doc')) categories.docs.push(link);
    else if (path.includes('/api')) categories.api.push(link);
    else if (path.includes('/product')) categories.products.push(link);
    else categories.other.push(link);
  }

  // Report
  console.log('📊 Content Breakdown:');
  console.log(`   Blog posts: ${categories.blog.length}`);
  console.log(`   Documentation: ${categories.docs.length}`);
  console.log(`   API pages: ${categories.api.length}`);
  console.log(`   Products: ${categories.products.length}`);
  console.log(`   Other: ${categories.other.length}`);

  // Find pages without titles
  const noTitle = response.links.filter(l => !l.title || l.title.trim() === '');
  if (noTitle.length > 0) {
    console.log(`\n⚠️  ${noTitle.length} pages missing titles:`);
    noTitle.slice(0, 5).forEach(l => console.log(`   - ${l.url}`));
  }

  // Longest/shortest paths
  const sorted = [...response.links].sort((a, b) =>
    new URL(a.url).pathname.length - new URL(b.url).pathname.length
  );

  console.log('\n📏 URL Analysis:');
  console.log(`   Shortest: ${new URL(sorted[0].url).pathname}`);
  console.log(`   Longest: ${new URL(sorted[sorted.length - 1].url).pathname}`);

  console.log(`\n💰 Audit cost: $${response.cost}`);

  return categories;
}

const audit = await auditWebsite('https://example.com');

Request Parameters (Complete Reference)

Endpoint: POST /api/v2/map

Required Parameters

url
string
required
The website URL to map. Must be a valid HTTP(S) URL.
Examples:
  • https://example.com ✅
  • https://docs.example.com ✅
  • example.com ❌ (missing protocol)

Optional Parameters

ignoreSitemap
boolean
default:"false"
Skip sitemap.xml and force crawling to discover URLs.
Use when:
  • Sitemap is outdated
  • You want actual site structure
  • Testing completeness
ignoreSitemap: true  // Force crawling
includeSubdomains
boolean
default:"false"
Include URLs from all subdomains (blog.*, docs.*, api.*, etc.)
includeSubdomains: true  // Map blog.example.com, docs.example.com, etc.
search
string
Filter discovered URLs by keyword. Only returns URLs containing this string.
Examples:
search: "blog"     // Only blog URLs
search: "api"      // Only API URLs
search: "2024"     // Only URLs with "2024"
limit
integer
default:"5000"
Maximum number of URLs to discover. Stops after reaching this limit.
Range: 1 - 5000
limit: 100  // Stop after 100 URLs
timeout
integer
default:"15000"
Timeout in milliseconds (how long to wait).
Default: 15000 (15 seconds)
timeout: 30000  // 30 seconds
This is the operation timeout, not the HTTP request timeout. It controls how long the mapping operation runs.

Response Format

Response Structure

{
  "links": [
    {
      "url": "https://example.com/",
      "title": "Home Page"
    },
    {
      "url": "https://example.com/about",
      "title": "About Us"
    },
    {
      "url": "https://example.com/products",
      "title": "Products"
    }
  ],
  "statusCode": 200,
  "cost": 0.002
}
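If you work in TypeScript, the response shape can be modeled with a small interface based on the fields above (a sketch; the official SDK may export its own types):
// Sketch of the response shape, derived from the example above.
interface MapLink {
  url: string;    // full URL of the discovered page
  title: string;  // page title
}

interface MapResponse {
  links: MapLink[];    // all discovered URLs with titles
  statusCode: number;  // HTTP status code (200 on success)
  cost: number;        // cost in USD for this map request
}

// Example: a typed helper around the result
function summarizeMap(response: MapResponse): string {
  return `${response.links.length} URLs discovered for $${response.cost}`;
}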

Response Fields

links
array
Array of discovered URLs with titles.
Each link object contains:
  • url (string): Full URL
  • title (string): Page title
[
  {"url": "https://...", "title": "Page Title"},
  {"url": "https://...", "title": "Another Page"}
]
statusCode
integer
HTTP status code (200 for success)
200
cost
number
Cost in USD ($0.002 per map request)
0.002

Error Handling

Error Format

All errors use this structure:
{
  "detail": {
    "error_type": "map_error",
    "error_code": "map_failed",
    "message": "Failed to map the provided URL",
    "details": {
      "url": "https://example.com",
      "error": "Connection timeout"
    }
  }
}

Common Errors

Missing or invalid API key
{
  "error_code": "missing_llmlayer_api_key",
  "message": "Provide LLMLayer API key via 'Authorization: Bearer <token>'"
}
Fix: Add your API key to the Authorization header.
Invalid or malformed URL
{
  "error_code": "invalid_url",
  "message": "The provided URL is not valid"
}
Fix: Ensure URL includes protocol (https://) and is properly formatted.
Failed to map the website
{
  "error_code": "map_failed",
  "message": "Failed to map the provided URL"
}
Common causes:
  • Website is down
  • Website blocks crawlers
  • Network timeout
  • Invalid site structure
Fix: Retry or check if the website is accessible.
Operation timeout
{
  "error_code": "map_timeout",
  "message": "Mapping operation timed out"
}
Fix: Increase timeout or try mapping with a limit.

Robust Error Handling

import {
  LLMLayerClient,
  AuthenticationError,
  InvalidRequest,
  InternalServerError
} from 'llmlayer';

const client = new LLMLayerClient({
  apiKey: process.env.LLMLAYER_API_KEY
});

async function robustMap(url: string, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.map({ url });

    } catch (error) {
      // Don't retry auth errors
      if (error instanceof AuthenticationError) {
        console.error('❌ Fix your API key');
        throw error;
      }

      // Don't retry invalid URLs
      if (error instanceof InvalidRequest) {
        console.error('❌ Invalid URL:', url);
        throw error;
      }

      // Retry server errors
      if (error instanceof InternalServerError) {
        const waitTime = Math.pow(2, attempt) * 1000;
        console.log(`⏳ Map failed. Waiting ${waitTime}ms...`);
        await new Promise(resolve => setTimeout(resolve, waitTime));

        if (attempt === maxRetries - 1) {
          console.error('❌ Max retries exceeded');
          throw error;
        }
        continue;
      }

      throw error;
    }
  }
}

// Usage
try {
  const response = await robustMap('https://example.com');
  console.log(`✅ Mapped ${response.links.length} pages`);
} catch (error) {
  console.error('Map failed:', error);
}

Best Practices

💰 Cost Optimization

Use Map before Crawl
  • Map entire site once ($0.002)
  • Filter URLs to only what you need
  • Crawl selected pages ($0.001 each)
  • Save money on unnecessary scraping
Cache results
  • Site structures don’t change often
  • Cache map results for hours/days (see the sketch after this list)
  • Re-map only when site updates
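A minimal in-memory caching sketch (the 24-hour TTL and cache shape are assumptions; tune them to how often the site changes):
// Cache map results so repeated lookups don't re-bill the same site.
const mapCache = new Map<string, { links: { url: string; title: string }[]; fetchedAt: number }>();
const TTL_MS = 24 * 60 * 60 * 1000;  // assumption: re-map at most once per day

async function cachedMap(url: string) {
  const hit = mapCache.get(url);
  if (hit && Date.now() - hit.fetchedAt < TTL_MS) {
    return hit.links;  // reuse the cached result, no extra request or cost
  }
  const response = await client.map({ url });
  mapCache.set(url, { links: response.links, fetchedAt: Date.now() });
  return response.links;
}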

⚡ Performance Tips

Use appropriate limits
  • Don’t map 5000 URLs if you only need 50
  • Set limit to control discovery
  • Use search to filter early
Choose the right strategy
  • Default (sitemap): Fastest
  • ignoreSitemap: true: More complete
  • includeSubdomains: Comprehensive

✨ Better Results

Filter effectively
  • Use search parameter to narrow results
  • Filter by path, keyword, or pattern
  • Process results programmatically
Understand the data
  • Check links.length before processing
  • Validate URLs before scraping
  • Group by subdomain or path

🛡️ Reliability

Handle errors gracefully
  • Some sites block mapping
  • Network issues happen
  • Implement retry logic
Validate before use (see the sketch after this list)
  • Check statusCode === 200
  • Verify links is not empty
  • Handle partial results
Monitor usage
  • Track discovered URL counts
  • Log failed mappings
  • Alert on anomalies
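A small validation sketch along those lines (the checks are illustrative, not part of the API):
// Validate a map result before spending money on scraping.
async function safeMap(url: string) {
  const response = await client.map({ url });

  if (response.statusCode !== 200) {
    throw new Error(`Map returned status ${response.statusCode} for ${url}`);
  }
  if (response.links.length === 0) {
    console.warn(`⚠️  No URLs discovered for ${url} (blocked or empty site?)`);
  }

  // Keep only well-formed URLs before passing them on to scraping.
  const valid = response.links.filter(link => {
    try { new URL(link.url); return true; } catch { return false; }
  });

  console.log(`Validated ${valid.length}/${response.links.length} URLs`);
  return valid;
}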

Important Limitations

Cannot map:
  • Sites that require authentication
  • Sites with aggressive bot detection
  • Sites that block crawlers in robots.txt
  • Private or internal networks
Maximum limits:
  • Up to 5,000 URLs per request
  • 15-second default timeout (configurable)
  • Flat fee regardless of URLs found
What works best:
  • Public websites with sitemaps
  • Documentation sites
  • Blogs and news sites
  • E-commerce product catalogs
  • Company websites

Quick Tips

Planning to scrape? Always map first:
// 1. Map to discover (cheap)
const map = await client.map({ url });
// 2. Filter what you need
const filtered = map.links.filter(l => l.url.includes('/blog'));
// 3. Scrape only those (saves money!)
for (const page of filtered) {
  await client.scrape({ url: page.url, formats: ['markdown'] });
}
Large site? Use search to narrow results:
search: "docs"  // Only documentation pages
Need subdomains too? Enable the option:
includeSubdomains: true  // Gets blog.*, docs.*, api.*
Sitemap unreliable? Force crawling:
ignoreSitemap: true  // Discover actual site structure

Frequently Asked Questions

What's the difference between Map and Crawl?
Map API:
  • Discovers URLs only (no content)
  • Returns titles but no page content
  • Super fast (1-5 seconds)
  • Very cheap ($0.002)
  • Use for: Discovery, planning, sitemaps
Crawl API:
  • Gets full content from each page
  • Returns markdown, HTML, PDF, screenshots
  • Slower (depends on page count)
  • More expensive ($0.001 per page)
  • Use for: Content extraction, archiving
Best workflow: Map → Filter → Crawl selected pages
How does the Map API discover URLs?
The API uses multiple strategies:
  1. Sitemap first (fastest)
    • Checks for /sitemap.xml
    • Uses sitemap index if available
    • Parses all URLs from sitemap
  2. Crawling fallback (if no sitemap or ignoreSitemap: true)
    • Starts from homepage
    • Follows internal links
    • Discovers pages organically
  3. Title extraction
    • Fetches page metadata only
    • Gets title without full content
    • Much faster than scraping
Does the limit apply per subdomain or in total?
The limit parameter is the total number of URLs returned, not per subdomain.
limit: 100  // Returns maximum 100 URLs total
If you enable includeSubdomains, those 100 URLs can come from any subdomain.
Can I map just a single subdomain?
Yes! Just provide the subdomain URL:
await client.map({
  url: 'https://docs.example.com'  // Only docs subdomain
});
This will only map that specific subdomain, not the main domain or other subdomains.
What if the sitemap is outdated or incomplete?
Use ignoreSitemap: true to force actual crawling:
await client.map({
  url: 'https://example.com',
  ignoreSitemap: true  // Skip sitemap, crawl instead
});
This discovers the actual site structure by following links, which may find pages not in the sitemap.
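To see what the sitemap misses, you can map the same site twice and diff the results (a sketch; note this costs two map requests):
// Compare sitemap-based discovery with forced crawling.
const fromSitemap = await client.map({ url: 'https://example.com' });
const fromCrawl = await client.map({ url: 'https://example.com', ignoreSitemap: true });

const sitemapUrls = new Set(fromSitemap.links.map(l => l.url));
const missing = fromCrawl.links.filter(l => !sitemapUrls.has(l.url));

console.log(`In crawl but not in sitemap: ${missing.length} pages`);
missing.slice(0, 10).forEach(l => console.log(`  - ${l.url}`));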
How do I filter the discovered URLs?
Process the links array with JavaScript/Python:
const response = await client.map({ url });

// Filter by path
const blogPosts = response.links.filter(l =>
  l.url.includes('/blog/')
);

// Filter by title
const apiDocs = response.links.filter(l =>
  l.title.toLowerCase().includes('api')
);

// Filter by date in URL
const recent = response.links.filter(l =>
  l.url.includes('2024') || l.url.includes('2025')
);
Can the Map API map pages behind a login?
No, the Map API cannot access:
  • Password-protected pages
  • Pages behind authentication
  • Private intranets
  • Paywalled content
Only public, accessible websites can be mapped.
What happens when the timeout is reached?
The timeout parameter controls how long the mapping operation runs:
timeout: 15000  // 15 seconds (default)
If the operation takes longer:
  • It will stop and return what was found so far
  • Or return an error if nothing was discovered
Large sites may need longer timeouts.


Need Help?

Found a bug or have a feature request? We’d love to hear from you! Join our Discord or email us at [email protected]