
What is the Map API?

The Map API is like a website explorer - it discovers all URLs on a domain and returns them with their page titles. Think of it as creating a sitemap or table of contents for any website.

  • URL Discovery: find all pages on a website automatically
  • Site Structure: understand website hierarchy and organization
Perfect for: Planning bulk scrapes, building sitemaps, content audits, SEO analysis, and website documentation.
Lightweight & Fast: Returns only URLs and titles - no content extraction. This makes it 10x faster than scraping each page individually.

Why Use Map Before Crawling?

1. Map: Discover URLs
   Use the Map API to find all pages. Cost: $0.002 (one-time). Speed: 1-5 seconds.
2. Filter URLs
   Pick which pages you actually need. Filter by keyword, path, or pattern.
3. Crawl: Get Content
   Use the Crawl API on selected pages only. Cost: $0.001 per page. Save money: only crawl what you need!
Smart workflow: Map first ($0.002) → Filter → Crawl selected pages ($0.001 each). This saves money and time!

Pricing (Super Affordable)

Flat Fee Per Site

$0.002 per map request = $2 for 1,000 sites mapped
Cost is the same whether you discover 10 URLs or 5,000 URLs. It’s a flat fee per website, not per URL discovered.

Before You Start

Authentication

All requests require your API key in the Authorization header:
Authorization: Bearer YOUR_LLMLAYER_API_KEY
Keep your API key secure! Never expose it in client-side code. Always call from your backend.
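If you call the endpoint directly instead of using the SDK, pass the same header on every request. A minimal sketch, assuming a placeholder API host (use the base URL from your dashboard):
// Raw request sketch. Assumption: BASE_URL below is a placeholder, not the documented host.
const BASE_URL = 'https://api.llmlayer.dev';

const res = await fetch(`${BASE_URL}/api/v2/map`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.LLMLAYER_API_KEY}`,  // never hardcode the key
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({ url: 'https://example.com' })
});

const data = await res.json();
console.log(`Discovered ${data.links?.length ?? 0} URLs`);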

Your First Map (2-Minute Start)

Let’s discover all pages on a website!
import { LLMLayerClient } from 'llmlayer';

// 1. Create a client
const client = new LLMLayerClient({
  apiKey: process.env.LLMLAYER_API_KEY
});

// 2. Map a website
const response = await client.map({
  url: 'https://example.com'
});

// 3. See all discovered URLs
console.log(`Found ${response.links.length} pages:\n`);

for (const link of response.links) {
  console.log(`📄 ${link.title}`);
  console.log(`   ${link.url}\n`);
}

console.log(`Cost: $${response.cost}`);  // $0.002
Done! You just discovered all pages on a website in seconds. Now you know exactly what’s there before crawling or scraping.

How It Works

The Map API discovers URLs using multiple strategies:
1. Check Sitemap: first looks for sitemap.xml (fastest method)
2. Crawl Links: if there is no sitemap, crawls the site following links
3. Extract Titles: gets page titles without downloading full content
4. Return List: returns the complete URL list with titles
Smart discovery: The API automatically chooses the best method. If a sitemap exists, it uses that (super fast). Otherwise, it crawls to find pages.

Basic Usage

Map a Website (All Pages)

Discover all pages on a domain.
const response = await client.map({
  url: 'https://docs.example.com'
});

console.log(`Discovered ${response.links.length} pages`);

// Show first 5 pages
response.links.slice(0, 5).forEach(link => {
  console.log(`${link.title} → ${link.url}`);
});

Advanced Options

Filter URLs by Keyword

Only discover URLs containing specific keywords.
// Only find blog posts
const response = await client.map({
  url: 'https://example.com',
  search: 'blog'  // Only URLs containing "blog"
});

console.log(`Found ${response.links.length} blog pages`);

for (const link of response.links) {
  console.log(`${link.title} → ${link.url}`);
}
Use search to narrow results:
  • search: "blog" - Find all blog posts
  • search: "docs" - Find all documentation
  • search: "api" - Find API-related pages
  • search: "2024" - Find pages from 2024

Include Subdomains

Discover URLs across all subdomains (blog.example.com, docs.example.com, etc.)
const response = await client.map({
  url: 'https://example.com',
  includeSubdomains: true  // Include blog.*, docs.*, api.*, etc.
});

console.log(`Found ${response.links.length} pages across all subdomains`);

// Group by subdomain
const bySubdomain = new Map<string, typeof response.links>();

for (const link of response.links) {
  const hostname = new URL(link.url).hostname;
  if (!bySubdomain.has(hostname)) {
    bySubdomain.set(hostname, []);
  }
  bySubdomain.get(hostname)!.push(link);
}

// Show breakdown
for (const [subdomain, links] of bySubdomain) {
  console.log(`\n${subdomain}: ${links.length} pages`);
}

Ignore Sitemap (Force Crawling)

Force the API to crawl instead of using sitemap.xml.
const response = await client.map({
  url: 'https://example.com',
  ignoreSitemap: true  // Don't use sitemap.xml, crawl instead
});

console.log(`Crawled and found ${response.links.length} pages`);
When to use this:
  • Sitemap is outdated or incomplete
  • You want to discover hidden pages
  • Testing actual site structure vs sitemap

Set URL Limit

Limit the number of URLs discovered (default: 5000).
const response = await client.map({
  url: 'https://large-site.com',
  limit: 100  // Stop after finding 100 URLs
});

console.log(`Limited to ${response.links.length} pages`);
Use limit to:
  • Sample large websites quickly
  • Test before full mapping
  • Control response size

Set Timeout

Control how long to wait (default: 15000ms / 15 seconds).
const response = await client.map({
  url: 'https://example.com',
  timeoutMs: 30000  // Wait up to 30 seconds
});

console.log(`Found ${response.links.length} pages`);

Real-World Examples

Example 1: Plan a Bulk Scrape

Map a site, filter URLs, then scrape only what you need.
import { LLMLayerClient } from 'llmlayer';

const client = new LLMLayerClient({
  apiKey: process.env.LLMLAYER_API_KEY
});

async function planScrape(baseUrl: string, keyword: string) {
  console.log(`🗺️  Mapping ${baseUrl}...\n`);

  // 1. Map the site
  const mapResponse = await client.map({
    url: baseUrl,
    search: keyword  // Filter by keyword
  });

  console.log(`✅ Found ${mapResponse.links.length} matching pages`);
  console.log(`💰 Map cost: $${mapResponse.cost}\n`);

  // 2. Review what you found
  console.log('Pages to scrape:');
  mapResponse.links.forEach((link, i) => {
    console.log(`${i + 1}. ${link.title}`);
    console.log(`   ${link.url}`);
  });

  // 3. Calculate scraping cost
  const scrapeCost = mapResponse.links.length * 0.001;  // $0.001 per page scraped
  const totalCost = mapResponse.cost + scrapeCost;

  console.log(`\n💰 Cost Estimate:`);
  console.log(`   Map: $${mapResponse.cost}`);
  console.log(`   Scrape ${mapResponse.links.length} pages: $${scrapeCost.toFixed(3)}`);
  console.log(`   Total: $${totalCost.toFixed(3)}`);

  return mapResponse.links;
}

// Use it
const pages = await planScrape('https://docs.example.com', 'api');

// Now scrape only the pages you want
console.log('\n📥 Scraping selected pages...');
for (const page of pages.slice(0, 5)) {  // Scrape first 5
  const content = await client.scrape({
    url: page.url,
    formats: ['markdown']
  });
  console.log(`✅ Scraped: ${page.title}`);
}
Output:
🗺️  Mapping https://docs.example.com...

✅ Found 23 matching pages
💰 Map cost: $0.002

Pages to scrape:
1. API Reference
   https://docs.example.com/api/reference
2. API Authentication
   https://docs.example.com/api/auth
...

💰 Cost Estimate:
   Map: $0.002
   Scrape 23 pages: $0.023
   Total: $0.025

📥 Scraping selected pages...
✅ Scraped: API Reference
✅ Scraped: API Authentication
...

Example 2: Build a Sitemap

Generate a sitemap from any website.
import fs from 'fs';  // reuses the `client` created in Example 1

async function generateSitemap(url: string) {
  console.log(`🗺️  Mapping ${url}...\n`);

  const response = await client.map({ url });

  console.log(`✅ Found ${response.links.length} pages\n`);

  // Create XML sitemap
  const sitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${response.links.map(link => `  <url>
    <loc>${link.url}</loc>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>`).join('\n')}
</urlset>`;

  // Save to file
  fs.writeFileSync('sitemap.xml', sitemap);

  console.log('✅ Saved sitemap.xml');
  console.log(`📊 ${response.links.length} URLs included`);
  console.log(`💰 Cost: $${response.cost}`);

  return sitemap;
}

await generateSitemap('https://example.com');

Example 3: Content Audit Tool

Analyze website structure and find pages by category.
async function auditWebsite(url: string) {
  console.log(`🔍 Auditing ${url}...\n`);

  const response = await client.map({ url });

  console.log(`Total pages: ${response.links.length}\n`);

  // Categorize by path
  const categories: Record<string, typeof response.links> = {
    blog: [],
    docs: [],
    api: [],
    products: [],
    other: []
  };

  for (const link of response.links) {
    const path = new URL(link.url).pathname.toLowerCase();

    if (path.includes('/blog')) categories.blog.push(link);
    else if (path.includes('/doc')) categories.docs.push(link);
    else if (path.includes('/api')) categories.api.push(link);
    else if (path.includes('/product')) categories.products.push(link);
    else categories.other.push(link);
  }

  // Report
  console.log('📊 Content Breakdown:');
  console.log(`   Blog posts: ${categories.blog.length}`);
  console.log(`   Documentation: ${categories.docs.length}`);
  console.log(`   API pages: ${categories.api.length}`);
  console.log(`   Products: ${categories.products.length}`);
  console.log(`   Other: ${categories.other.length}`);

  // Find pages without titles
  const noTitle = response.links.filter(l => !l.title || l.title.trim() === '');
  if (noTitle.length > 0) {
    console.log(`\n⚠️  ${noTitle.length} pages missing titles:`);
    noTitle.slice(0, 5).forEach(l => console.log(`   - ${l.url}`));
  }

  // Longest/shortest paths
  const sorted = [...response.links].sort((a, b) =>
    new URL(a.url).pathname.length - new URL(b.url).pathname.length
  );

  console.log('\n📏 URL Analysis:');
  console.log(`   Shortest: ${new URL(sorted[0].url).pathname}`);
  console.log(`   Longest: ${new URL(sorted[sorted.length - 1].url).pathname}`);

  console.log(`\n💰 Audit cost: $${response.cost}`);

  return categories;
}

const audit = await auditWebsite('https://example.com');

Request Parameters (Complete Reference)

Endpoint: POST /api/v2/map

Required Parameters

url
string
required
The website URL to map. Must be a valid HTTP(S) URL.
Examples:
  • https://example.com ✅
  • https://docs.example.com ✅
  • example.com ❌ (missing protocol)

Optional Parameters

ignoreSitemap
boolean
default:"false"
Skip sitemap.xml and force crawling to discover URLs.
Use when:
  • Sitemap is outdated
  • You want actual site structure
  • Testing completeness
ignoreSitemap: true  // Force crawling
includeSubdomains
boolean
default:"false"
Include URLs from all subdomains (blog.*, docs.*, api.*, etc.)
includeSubdomains: true  // Map blog.example.com, docs.example.com, etc.
search
string
Filter discovered URLs by keyword. Only returns URLs containing this string.
Examples:
search: "blog"     // Only blog URLs
search: "api"      // Only API URLs
search: "2024"     // Only URLs with "2024"
limit
integer
default:"5000"
Maximum number of URLs to discover. Stops after reaching this limit.
Range: 1 - 5000
limit: 100  // Stop after 100 URLs
timeout
integer
default:"15000"
Timeout in milliseconds (how long to wait).
Default: 15000 (15 seconds)
timeout: 30000  // 30 seconds
This is the operation timeout, not the HTTP request timeout. It controls how long the mapping operation runs.

Response Format

Response Structure

{
  "links": [
    {
      "url": "https://example.com/",
      "title": "Home Page"
    },
    {
      "url": "https://example.com/about",
      "title": "About Us"
    },
    {
      "url": "https://example.com/products",
      "title": "Products"
    }
  ],
  "statusCode": 200,
  "cost": 0.002
}
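If you work in TypeScript, the response shape can be modeled with a small interface based on the fields above (a sketch; the official SDK may export its own types):
// Sketch of the response shape, derived from the example above.
interface MapLink {
  url: string;    // full URL of the discovered page
  title: string;  // page title
}

interface MapResponse {
  links: MapLink[];    // all discovered URLs with titles
  statusCode: number;  // HTTP status code (200 on success)
  cost: number;        // cost in USD for this map request
}

// Example: a typed helper around the result
function summarizeMap(response: MapResponse): string {
  return `${response.links.length} URLs discovered for $${response.cost}`;
}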

Response Fields

links
array
Array of discovered URLs with titles.
Each link object contains:
  • url (string): Full URL
  • title (string): Page title
[
  {"url": "https://...", "title": "Page Title"},
  {"url": "https://...", "title": "Another Page"}
]
statusCode
integer
HTTP status code (200 for success)
200
cost
number
Cost in USD ($0.002 per map request)
0.002

Error Handling

Error Format

All errors use this structure:
{
  "detail": {
    "error_type": "map_error",
    "error_code": "map_failed",
    "message": "Failed to map the provided URL",
    "details": {
      "url": "https://example.com",
      "error": "Connection timeout"
    }
  }
}

Common Errors

Missing or invalid API key
{
  "error_code": "missing_llmlayer_api_key",
  "message": "Provide LLMLayer API key via 'Authorization: Bearer <token>'"
}
Fix: Add your API key to the Authorization header.
Invalid or malformed URL
{
  "error_code": "invalid_url",
  "message": "The provided URL is not valid"
}
Fix: Ensure URL includes protocol (https://) and is properly formatted.
Failed to map the website
{
  "error_code": "map_failed",
  "message": "Failed to map the provided URL"
}
Common causes:
  • Website is down
  • Website blocks crawlers
  • Network timeout
  • Invalid site structure
Fix: Retry or check if the website is accessible.
Operation timeout
{
  "error_code": "map_timeout",
  "message": "Mapping operation timed out"
}
Fix: Increase timeout or try mapping with a limit.

Robust Error Handling

import {
  LLMLayerClient,
  AuthenticationError,
  InvalidRequest,
  InternalServerError
} from 'llmlayer';

const client = new LLMLayerClient({
  apiKey: process.env.LLMLAYER_API_KEY
});

async function robustMap(url: string, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.map({ url });

    } catch (error) {
      // Don't retry auth errors
      if (error instanceof AuthenticationError) {
        console.error('❌ Fix your API key');
        throw error;
      }

      // Don't retry invalid URLs
      if (error instanceof InvalidRequest) {
        console.error('❌ Invalid URL:', url);
        throw error;
      }

      // Retry server errors
      if (error instanceof InternalServerError) {
        const waitTime = Math.pow(2, attempt) * 1000;
        console.log(`⏳ Map failed. Waiting ${waitTime}ms...`);
        await new Promise(resolve => setTimeout(resolve, waitTime));

        if (attempt === maxRetries - 1) {
          console.error('❌ Max retries exceeded');
          throw error;
        }
        continue;
      }

      throw error;
    }
  }
}

// Usage
try {
  const response = await robustMap('https://example.com');
  console.log(`✅ Mapped ${response.links.length} pages`);
} catch (error) {
  console.error('Map failed:', error);
}

Best Practices

💰 Cost Optimization

Use Map before Crawl
  • Map entire site once ($0.002)
  • Filter URLs to only what you need
  • Crawl selected pages ($0.001 each)
  • Save money on unnecessary scraping
Cache results
  • Site structures don’t change often
  • Cache map results for hours/days (see the sketch after this list)
  • Re-map only when site updates
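A minimal in-memory caching sketch (the 24-hour TTL and cache shape are assumptions; tune them to how often the site changes):
// Cache map results so repeated lookups don't re-bill the same site.
const mapCache = new Map<string, { links: { url: string; title: string }[]; fetchedAt: number }>();
const TTL_MS = 24 * 60 * 60 * 1000;  // assumption: re-map at most once per day

async function cachedMap(url: string) {
  const hit = mapCache.get(url);
  if (hit && Date.now() - hit.fetchedAt < TTL_MS) {
    return hit.links;  // reuse the cached result, no extra request or cost
  }
  const response = await client.map({ url });
  mapCache.set(url, { links: response.links, fetchedAt: Date.now() });
  return response.links;
}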

⚡ Performance Tips

Use appropriate limits
  • Don’t map 5000 URLs if you only need 50
  • Set limit to control discovery
  • Use search to filter early
Choose the right strategy
  • Default (sitemap): Fastest
  • ignoreSitemap: true: More complete
  • includeSubdomains: Comprehensive

✨ Better Results

Filter effectively
  • Use search parameter to narrow results
  • Filter by path, keyword, or pattern
  • Process results programmatically
Understand the data
  • Check links.length before processing
  • Validate URLs before scraping
  • Group by subdomain or path

🛡️ Reliability

Handle errors gracefully
  • Some sites block mapping
  • Network issues happen
  • Implement retry logic
Validate before use (see the sketch after this list)
  • Check statusCode === 200
  • Verify links is not empty
  • Handle partial results
Monitor usage
  • Track discovered URL counts
  • Log failed mappings
  • Alert on anomalies
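A small validation sketch along those lines (the checks are illustrative, not part of the API):
// Validate a map result before spending money on scraping.
async function safeMap(url: string) {
  const response = await client.map({ url });

  if (response.statusCode !== 200) {
    throw new Error(`Map returned status ${response.statusCode} for ${url}`);
  }
  if (response.links.length === 0) {
    console.warn(`⚠️  No URLs discovered for ${url} (blocked or empty site?)`);
  }

  // Keep only well-formed URLs before passing them on to scraping.
  const valid = response.links.filter(link => {
    try { new URL(link.url); return true; } catch { return false; }
  });

  console.log(`Validated ${valid.length}/${response.links.length} URLs`);
  return valid;
}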

Important Limitations

Cannot map:
  • Sites that require authentication
  • Sites with aggressive bot detection
  • Sites that block crawlers in robots.txt
  • Private or internal networks
Maximum limits:
  • Up to 5,000 URLs per request
  • 15-second default timeout (configurable)
  • Flat fee regardless of URLs found
What works best:
  • Public websites with sitemaps
  • Documentation sites
  • Blogs and news sites
  • E-commerce product catalogs
  • Company websites

Quick Tips

Planning to scrape? Always map first:
// 1. Map to discover (cheap)
const map = await client.map({ url });
// 2. Filter what you need
const filtered = map.links.filter(l => l.url.includes('/blog'));
// 3. Scrape only those (saves money!)
for (const page of filtered) {
  await client.scrape({ url: page.url, formats: ['markdown'] });
}
Large site? Use search to narrow results:
search: "docs"  // Only documentation pages
Need subdomains too? Enable the option:
includeSubdomains: true  // Gets blog.*, docs.*, api.*
Sitemap unreliable? Force crawling:
ignoreSitemap: true  // Discover actual site structure

Frequently Asked Questions

What's the difference between Map and Crawl?
Map API:
  • Discovers URLs only (no content)
  • Returns titles but no page content
  • Super fast (1-5 seconds)
  • Very cheap ($0.002)
  • Use for: Discovery, planning, sitemaps
Crawl API:
  • Gets full content from each page
  • Returns markdown, HTML, PDF, screenshots
  • Slower (depends on page count)
  • More expensive ($0.001 per page)
  • Use for: Content extraction, archiving
Best workflow: Map → Filter → Crawl selected pages
How does the Map API discover URLs?
The API uses multiple strategies:
  1. Sitemap first (fastest)
    • Checks for /sitemap.xml
    • Uses sitemap index if available
    • Parses all URLs from sitemap
  2. Crawling fallback (if no sitemap or ignoreSitemap: true)
    • Starts from homepage
    • Follows internal links
    • Discovers pages organically
  3. Title extraction
    • Fetches page metadata only
    • Gets title without full content
    • Much faster than scraping
Does the limit apply per subdomain or in total?
The limit parameter is the total number of URLs returned, not per subdomain.
limit: 100  // Returns maximum 100 URLs total
If you enable includeSubdomains, those 100 URLs can come from any subdomain.
Can I map just a single subdomain?
Yes! Just provide the subdomain URL:
await client.map({
  url: 'https://docs.example.com'  // Only docs subdomain
});
This will only map that specific subdomain, not the main domain or other subdomains.
What if the sitemap is outdated or incomplete?
Use ignoreSitemap: true to force actual crawling:
await client.map({
  url: 'https://example.com',
  ignoreSitemap: true  // Skip sitemap, crawl instead
});
This discovers the actual site structure by following links, which may find pages not in the sitemap.
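To see what the sitemap misses, you can map the same site twice and diff the results (a sketch; note this costs two map requests):
// Compare sitemap-based discovery with forced crawling.
const fromSitemap = await client.map({ url: 'https://example.com' });
const fromCrawl = await client.map({ url: 'https://example.com', ignoreSitemap: true });

const sitemapUrls = new Set(fromSitemap.links.map(l => l.url));
const missing = fromCrawl.links.filter(l => !sitemapUrls.has(l.url));

console.log(`In crawl but not in sitemap: ${missing.length} pages`);
missing.slice(0, 10).forEach(l => console.log(`  - ${l.url}`));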
How do I filter the discovered URLs?
Process the links array with JavaScript/Python:
const response = await client.map({ url });

// Filter by path
const blogPosts = response.links.filter(l =>
  l.url.includes('/blog/')
);

// Filter by title
const apiDocs = response.links.filter(l =>
  l.title.toLowerCase().includes('api')
);

// Filter by date in URL
const recent = response.links.filter(l =>
  l.url.includes('2024') || l.url.includes('2025')
);
Can the Map API map pages behind a login?
No, the Map API cannot access:
  • Password-protected pages
  • Pages behind authentication
  • Private intranets
  • Paywalled content
Only public, accessible websites can be mapped.
What happens when the timeout is reached?
The timeout parameter controls how long the mapping operation runs:
timeout: 15000  // 15 seconds (default)
If the operation takes longer:
  • It will stop and return what was found so far
  • Or return an error if nothing was discovered
Large sites may need longer timeouts.


Need Help?

Found a bug or have a feature request? We’d love to hear from you! Join our Discord or email us at [email protected]