
Overview

Use the Scraper API when you already have a URL and need page content. It supports:
  • markdown for LLM-ready text
  • html for raw rendered markup
  • screenshot for a base64 PNG capture
PDF URLs are not scraped by this endpoint. Use the PDF Content API for PDF text extraction.
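Because PDF URLs must go to a different endpoint, a client may want to route URLs before calling the API. The sketch below is a hypothetical helper (`chooseEndpoint` is not part of the LLMLayer SDK). It only inspects the URL path extension, so a PDF served from an extensionless URL such as `/download?id=42` would still be sent to the Scraper and rejected there.

```typescript
// Hypothetical routing helper, not an SDK function.
// Heuristic only: it checks the path extension, not the actual content type.
type EndpointPath = '/api/v2/scrape' | '/api/v2/get_pdf_content';

function chooseEndpoint(rawUrl: string): EndpointPath {
  const url = new URL(rawUrl); // throws TypeError on malformed input
  // The Scraper only accepts public http or https URLs.
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error(`Unsupported scheme: ${url.protocol}`);
  }
  // Case-insensitive so "REPORT.PDF" is routed to the PDF Content API too.
  return url.pathname.toLowerCase().endsWith('.pdf')
    ? '/api/v2/get_pdf_content'
    : '/api/v2/scrape';
}

console.log(chooseEndpoint('https://www.ycombinator.com/blog'));      // "/api/v2/scrape"
console.log(chooseEndpoint('https://example.com/files/report.PDF'));  // "/api/v2/get_pdf_content"
```

Checking the extension keeps the helper free of network calls; a stricter variant could issue a HEAD request and inspect `Content-Type`.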

Endpoint

POST /api/v2/scrape

Quickstart

import { LLMLayerClient } from 'llmlayer';

const client = new LLMLayerClient({
  apiKey: process.env.LLMLAYER_API_KEY,
});

const page = await client.scrape('https://www.ycombinator.com/blog', {
  formats: ['markdown'],
  mainContentOnly: true,
});

console.log(page.title);
console.log(page.markdown);
console.log(page.statusCode);

Formats

Format      Response field  Best for                         Cost
markdown    markdown        LLM input, summaries, retrieval  $0.001
html        html            Archival or custom parsing       $0.001
screenshot  screenshot      Visual verification              $0.001
You can request multiple formats in one call:
const page = await client.scrape({
  url: 'https://www.ycombinator.com',
  formats: ['markdown', 'html', 'screenshot'],
});
Some clients still accept pdf as a format value for backward compatibility, but this endpoint does not generate PDF output, and direct PDF URLs return a validation error (400). To extract text from a PDF, use /api/v2/get_pdf_content.
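The `screenshot` format returns a base64 PNG rather than a file. A minimal sketch of decoding it into bytes is shown below. `decodeScreenshot` is a hypothetical helper. Stripping a `data:image/png;base64,` prefix is a defensive assumption; the response table only documents plain base64.

```typescript
// Hypothetical helper: decode the base64 `screenshot` field into PNG bytes.
// The 8-byte PNG file signature is used as a sanity check.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

function decodeScreenshot(screenshot: string | null): Uint8Array {
  if (screenshot === null) {
    throw new Error('No screenshot: was "screenshot" included in formats?');
  }
  // Assumption: tolerate a data-URL prefix in case one is ever present.
  const base64 = screenshot.replace(/^data:image\/png;base64,/, '');
  // atob returns a "binary string" with one character per byte.
  const bytes = Uint8Array.from(atob(base64), (ch) => ch.charCodeAt(0));
  const isPng = PNG_SIGNATURE.every((b, i) => bytes[i] === b);
  if (!isPng) throw new Error('Decoded data does not start with a PNG signature');
  return bytes;
}
```

In Node.js, the returned bytes can be written to disk with `fs.writeFileSync('page.png', bytes)`, since `writeFileSync` accepts a `Uint8Array`.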

Request Parameters

Parameter          Type      Required  Default               Description
url                string    Yes       -                     Public http or https page URL
formats            string[]  Yes       ["markdown"] in SDKs  Any of markdown, html, screenshot
include_images     boolean   No        true                  Include image references in markdown
include_links      boolean   No        true                  Include links in markdown
advanced_proxy     boolean   No        false                 Route the request through an advanced proxy for heavily protected sites (adds $0.004)
main_content_only  boolean   No        false                 Reduce navigation and other boilerplate in the output
HTTP requests use snake_case. The TypeScript SDK uses camelCase, for example, advancedProxy and mainContentOnly.
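For callers who skip the SDK, the camelCase-to-snake_case mapping can be done by hand. The sketch below assumes `includeImages` and `includeLinks` as the camelCase names (by analogy with `advancedProxy` and `mainContentOnly`). The base URL and the `Authorization: Bearer` header in `scrapeRaw` are also assumptions, not documented here; check the authentication docs before relying on them.

```typescript
// Sketch of the SDK-to-HTTP field mapping. includeImages/includeLinks are
// assumed names; only advancedProxy and mainContentOnly are documented.
type Format = 'markdown' | 'html' | 'screenshot';

interface ScrapeOptions {
  url: string;
  formats: Format[];
  includeImages?: boolean;
  includeLinks?: boolean;
  advancedProxy?: boolean;
  mainContentOnly?: boolean;
}

// Send optional flags only when set, so server defaults apply otherwise.
function toRequestBody(opts: ScrapeOptions): Record<string, unknown> {
  const body: Record<string, unknown> = { url: opts.url, formats: opts.formats };
  if (opts.includeImages !== undefined) body.include_images = opts.includeImages;
  if (opts.includeLinks !== undefined) body.include_links = opts.includeLinks;
  if (opts.advancedProxy !== undefined) body.advanced_proxy = opts.advancedProxy;
  if (opts.mainContentOnly !== undefined) body.main_content_only = opts.mainContentOnly;
  return body;
}

// ASSUMPTIONS: caller supplies the base URL; the key is sent as a Bearer token.
async function scrapeRaw(baseUrl: string, apiKey: string, opts: ScrapeOptions): Promise<unknown> {
  const res = await fetch(`${baseUrl}/api/v2/scrape`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify(toRequestBody(opts)),
  });
  if (!res.ok) throw new Error(`Scrape failed with HTTP ${res.status}`);
  return res.json();
}
```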

Response

{
  "markdown": "# Article title\n\nArticle body...",
  "html": null,
  "screenshot": null,
  "pdf": null,
  "url": "https://www.ycombinator.com/blog",
  "title": "Article title",
  "statusCode": 200,
  "cost": 0.001,
  "metadata": {
    "description": "..."
  }
}
Field       Type           Description
markdown    string | null  Markdown content, when requested
html        string | null  HTML content, when requested
screenshot  string | null  Base64-encoded PNG, when requested
pdf         string | null  Legacy field; normally null
url         string         Final URL after redirects
title       string | null  Page title
statusCode  integer        HTTP status code returned by the target page
cost        number | null  Billed cost in USD
metadata    object | null  Extracted page metadata, such as description
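The fields above translate directly into a TypeScript type. The interface below is a hand-written sketch, not a type exported by the SDK, and `contentOrThrow` is a hypothetical helper. Treating any non-2xx `statusCode` as unusable is a policy choice, since a 404 target page may still return markdown.

```typescript
// Hand-written shape of the documented response fields (not an SDK type).
interface ScrapeResponse {
  markdown: string | null;
  html: string | null;
  screenshot: string | null;
  pdf: string | null; // legacy field, normally null
  url: string;
  title: string | null;
  statusCode: number;
  cost: number | null;
  metadata: Record<string, unknown> | null;
}

// Hypothetical helper: return the requested content, or throw if the target
// page did not return 2xx or the field was not requested.
function contentOrThrow(res: ScrapeResponse, field: 'markdown' | 'html' | 'screenshot'): string {
  if (res.statusCode < 200 || res.statusCode >= 300) {
    throw new Error(`Target ${res.url} returned HTTP ${res.statusCode}`);
  }
  const value = res[field];
  if (value === null) throw new Error(`"${field}" is null; was it included in formats?`);
  return value;
}
```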

Pricing

Base cost is $0.001 per requested supported format. Advanced proxy adds $0.004 when enabled.
markdown only:                 $0.001
markdown + screenshot:         $0.002
markdown + html + screenshot:  $0.003
markdown + advanced proxy:     $0.005
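The pricing rule above can be turned into a small estimator. This sketch counts in integer units of $0.001 so the results avoid floating-point drift, then converts once at the end. Two behaviors are assumptions rather than documented rules: duplicate formats are billed once, and the legacy pdf value is not billed because this endpoint does not produce it. `estimateScrapeCost` is a hypothetical name.

```typescript
// Hypothetical cost estimator for the documented pricing rule.
// Assumptions: duplicates billed once; "pdf" is not a billable format.
const SUPPORTED_FORMATS = new Set(['markdown', 'html', 'screenshot']);
const FORMAT_UNITS = 1; // $0.001 per requested supported format
const PROXY_UNITS = 4;  // $0.004 when advanced proxy is enabled

function estimateScrapeCost(formats: string[], advancedProxy = false): number {
  const billable = new Set(formats.filter((f) => SUPPORTED_FORMATS.has(f)));
  const units = billable.size * FORMAT_UNITS + (advancedProxy ? PROXY_UNITS : 0);
  return units / 1000; // integer units of $0.001 converted to dollars
}

console.log(estimateScrapeCost(['markdown']));                        // 0.001
console.log(estimateScrapeCost(['markdown', 'screenshot']));          // 0.002
console.log(estimateScrapeCost(['markdown', 'html', 'screenshot']));  // 0.003
console.log(estimateScrapeCost(['markdown'], true));                  // 0.005
```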

Errors

Status  Meaning
400     Invalid URL, unsupported scheme, DNS failure, or PDF URL sent to the Scraper API
401     Missing or invalid LLMLayer API key
403     Blocked private or unsafe target
500     Upstream scrape failure
See Errors & Refunds for the shared error format.
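One way to act on the status table is to retry only upstream failures. The sketch below treats 500 as transient and 400, 401, and 403 as permanent, since those point at the URL, the key, or the target. That classification, the attempt limit, and the backoff delays are all assumptions, not documented API behavior. `isRetryable` and `withRetry` are hypothetical helpers, and the caller supplies `getStatus` to read a status code from whatever error type their client throws.

```typescript
// Hypothetical retry policy: only 500 (upstream scrape failure) is retried.
function isRetryable(status: number): boolean {
  return status === 500;
}

// Retries `call` with exponential backoff (500 ms, 1000 ms, ...).
// Attempt count and delays are arbitrary assumptions.
async function withRetry<T>(
  call: () => Promise<T>,
  getStatus: (err: unknown) => number | undefined,
  maxAttempts = 3,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const status = getStatus(err);
      // Give up on the last attempt, on unknown errors, or on permanent errors.
      if (attempt >= maxAttempts || status === undefined || !isRetryable(status)) throw err;
      await new Promise((resolve) => setTimeout(resolve, 500 * 2 ** (attempt - 1)));
    }
  }
}
```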

More Examples

Search + Scrape Pipeline

Search the web, scrape pages, and answer from collected context.

Extract API

Use structured extraction when you need schema-shaped data.