POST /api/v2/crawl_stream

Crawl API - Multi-page website crawler with streaming
curl --request POST \
  --url https://api.llmlayer.dev/api/v2/crawl_stream \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "url": "https://example.com",
  "max_pages": 25,
  "max_depth": 2,
  "timeout": 60,
  "include_subdomains": false,
  "include_links": true,
  "include_images": true,
  "advanced_proxy": false,
  "main_content_only": false,
  "formats": [
    "markdown",
    "html"
  ]
}
'
"data: {\"type\":\"page\",\"page\":{\"requested_url\":\"https://example.com/page1\",\"final_url\":\"https://example.com/page1\",\"title\":\"Page Title\",\"hash_sha256\":\"abc123...\",\"markdown\":\"# Content...\",\"success\":true}}\n\n"
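Each event in the response arrives as an SSE `data:` line whose payload is a JSON object. A minimal parsing sketch (Python standard library only), applied to the sample event above:

```python
import json

def parse_sse_event(raw: str) -> dict:
    """Parse a single SSE 'data:' line into a JSON event dict."""
    line = raw.strip()
    if not line.startswith("data:"):
        raise ValueError("not an SSE data line")
    return json.loads(line[len("data:"):].strip())

# The sample "page" event from the response above
sample = ('data: {"type":"page","page":{"requested_url":"https://example.com/page1",'
          '"final_url":"https://example.com/page1","title":"Page Title",'
          '"hash_sha256":"abc123...","markdown":"# Content...","success":true}}')

event = parse_sse_event(sample)
print(event["type"], event["page"]["title"])  # page Page Title
```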

Authorizations

Authorization
string
header
required

Bearer token authentication using your LLMLayer API key. Include in Authorization header as: Bearer YOUR_LLMLAYER_API_KEY

Body

application/json
url
string<uri>
required

Seed URL to start crawling from

Example:

"https://example.com"

max_pages
integer
default:25

Maximum number of pages to crawl (hard limit: 100)

Required range: 1 <= x <= 100
max_depth
integer
default:2

Maximum depth to crawl from seed URL

Required range: x >= 1
timeout
number | null
default:60

Total timeout in seconds for the entire crawl operation

include_subdomains
boolean
default:false

If true, includes pages from subdomains

include_links
boolean

Include hyperlinks in extracted content

include_images
boolean
default:true

Include images in extracted content

advanced_proxy
boolean | null
default:false

Enable advanced proxy for protected sites.

main_content_only
boolean | null
default:false

Extract only main page content.

formats
enum<string>[]

Output format preference for crawled pages. Currently markdown is returned.

Available options:
markdown,
html,
screenshot,
pdf
Example:
["markdown", "html"]
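The body parameters above can be assembled and validated client-side before sending. A sketch assuming only the constraints documented here (`max_pages` between 1 and 100, `max_depth` at least 1, `formats` drawn from the four listed options); the helper name `build_crawl_request` is illustrative, not part of the API:

```python
def build_crawl_request(url, max_pages=25, max_depth=2, timeout=60,
                        include_subdomains=False, include_links=True,
                        include_images=True, advanced_proxy=False,
                        main_content_only=False, formats=("markdown",)):
    """Build a crawl_stream request body, enforcing the documented ranges."""
    if not 1 <= max_pages <= 100:
        raise ValueError("max_pages must satisfy 1 <= x <= 100")
    if max_depth < 1:
        raise ValueError("max_depth must satisfy x >= 1")
    allowed = {"markdown", "html", "screenshot", "pdf"}
    if not set(formats) <= allowed:
        raise ValueError(f"formats must be a subset of {sorted(allowed)}")
    return {
        "url": url,
        "max_pages": max_pages,
        "max_depth": max_depth,
        "timeout": timeout,
        "include_subdomains": include_subdomains,
        "include_links": include_links,
        "include_images": include_images,
        "advanced_proxy": advanced_proxy,
        "main_content_only": main_content_only,
        "formats": list(formats),
    }
```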

Response

Server-Sent Events stream of crawled pages

SSE stream with event types: page, usage, done, error
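Putting the pieces together, the stream can be consumed line by line and each event routed by its `type` field. A sketch using only the standard library; the exact shape of `usage` and `error` event payloads is not specified above, so the `"message"` key on error events is an assumption:

```python
import json
import urllib.request

API_URL = "https://api.llmlayer.dev/api/v2/crawl_stream"

def iter_events(api_key: str, body: dict):
    """POST the crawl request and yield parsed JSON events from the SSE stream."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            line = raw.decode().strip()
            if line.startswith("data:"):
                yield json.loads(line[len("data:"):].strip())

def handle_event(event: dict, pages: list) -> bool:
    """Route one event by type; return False once the stream is finished."""
    kind = event.get("type")
    if kind == "page":
        pages.append(event["page"])
    elif kind == "error":
        # "message" key is an assumption; inspect real error events to confirm
        raise RuntimeError(event.get("message", "crawl error"))
    elif kind == "done":
        return False
    # "usage" events can be logged or ignored here
    return True
```

Typical usage would loop `for event in iter_events(key, body)` and stop when `handle_event` returns `False`.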