Skip to main content

Overview

Use Crawl when you need content from multiple pages under one site. The endpoint streams Server-Sent Events as pages finish, so your application can process pages without waiting for the full crawl to complete. The public crawl endpoint currently returns markdown page content only.

Endpoint

POST /api/v2/crawl_stream

Quickstart

import { LLMLayerClient } from 'llmlayer';

const client = new LLMLayerClient({
  apiKey: process.env.LLMLAYER_API_KEY,
});

for await (const event of client.crawlStream({
  url: 'https://www.ycombinator.com',
  maxPages: 10,
  maxDepth: 2,
  mainContentOnly: true,
})) {
  if (event.type === 'page') {
    console.log(event.page.final_url, event.page.markdown?.slice(0, 120));
  }

  if (event.type === 'usage') {
    console.log('Cost:', event.cost);
  }
}

Request Parameters

ParameterTypeRequiredDefaultDescription
urlstringYes-Seed URL
max_pagesintegerNo25Maximum pages to return, hard limit 100
max_depthintegerNo2Link depth from the seed URL
timeoutnumber | nullNo60Total crawl time budget in seconds
include_subdomainsbooleanNofalseInclude subdomains
include_linksbooleanNotrueKeep links in markdown content
include_imagesbooleanNotrueKeep image references in markdown content
advanced_proxybooleanNofalseUse for protected sites
main_content_onlybooleanNofalseReduce navigation and boilerplate
formats["markdown"]No["markdown"]Accepted for compatibility; only markdown is honored
HTTP requests use snake_case. The TypeScript SDK uses camelCase, for example maxPages, maxDepth, and mainContentOnly.

Stream Events

Each SSE frame contains a JSON object under data:.

Page

{
  "type": "page",
  "page": {
    "requested_url": "https://www.ycombinator.com",
    "final_url": "https://www.ycombinator.com",
    "title": "Y Combinator",
    "hash_sha256": "abc123...",
    "markdown": "# Docs\n\n...",
    "success": true,
    "error": null
  }
}

Usage

{
  "type": "usage",
  "billed_count": 10,
  "unit_cost": 0.001,
  "cost": 0.01
}

Done

{
  "type": "done",
  "response_time": "21.44"
}

Error

{
  "type": "error",
  "error": "Upstream crawl failed"
}

Pricing

Crawl reports usage as $0.001 per successfully scraped page in the usage event. Advanced proxy can improve success rates on protected sites. Use the emitted usage.cost and your dashboard ledger as the billing source of truth.

When to Use Map First

Use Map before Crawl when you want to inspect or filter URLs before fetching page content.
site = client.map("https://www.ycombinator.com", limit=100)
print([str(link.url) for link in site.links[:10]])

Errors

Status / eventMeaning
400Invalid URL, invalid max_pages, or PDF URL
401Missing or invalid LLMLayer API key
500 eventUpstream crawl failure after streaming starts
See Errors & Refunds for the shared error format.

More Examples

Streaming Crawl Recipes

Persist pages, handle usage events, and retry failures.

Map API

Discover URLs before crawling content.