Scraper API - Multi-format content extraction
Extract content from a non-PDF URL as markdown, HTML, or a screenshot (base64). Direct PDF URLs are rejected; use /api/v2/get_pdf_content.
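Since direct PDF URLs are rejected, a client may want to route them to /api/v2/get_pdf_content before calling this endpoint. The sketch below is a client-side heuristic only: how the service itself detects a PDF is not stated here, and the helper name `looks_like_pdf_url` is hypothetical.

```python
from urllib.parse import urlparse


def looks_like_pdf_url(url: str) -> bool:
    """Guess whether a URL points directly at a PDF.

    Assumption: checking the path suffix is only a heuristic. A PDF served
    from a path without a .pdf extension will not be caught here.
    """
    path = urlparse(url).path
    return path.lower().endswith(".pdf")
```

A URL that passes this check can still turn out to be a PDF on the server side, so the rejection response should be handled as well.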
Authorizations
Bearer token authentication using your LLMLayer API key. Include it in the Authorization header as: Bearer YOUR_LLMLAYER_API_KEY
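A minimal sketch of building the header described above. The function name `build_auth_headers` is hypothetical, and the `Content-Type: application/json` header is an assumption (the body format is not stated here).

```python
def build_auth_headers(api_key: str) -> dict[str, str]:
    """Return request headers carrying the API key as a Bearer token."""
    if not api_key or not api_key.strip():
        raise ValueError("API key must be a non-empty string")
    return {
        "Authorization": f"Bearer {api_key.strip()}",
        # Assumption: the request body is sent as JSON.
        "Content-Type": "application/json",
    }
```

Reading the key from an environment variable rather than hard-coding it keeps it out of source control.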
Body
URL to scrape
"https://www.ycombinator.com/blog"
Output formats to generate: markdown, html, or screenshot. Direct PDF URLs are rejected; use /api/v2/get_pdf_content. If older clients send 'pdf', it is ignored.
markdown, html, screenshot
["markdown", "screenshot"]
Include images in markdown output
Include hyperlinks in markdown output
Enable advanced proxy for heavily protected sites.
Extract only the main content, excluding navigation and boilerplate.
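The body parameters above could be assembled as follows. This is a sketch under stated assumptions: the field names (`url`, `formats`, `include_images`, `include_links`, `advanced_proxy`, `main_content_only`) are hypothetical, because the parameter names are not shown here; check them against the actual schema before use. Unset flags are left out of the payload so that the service's own defaults apply.

```python
from typing import Iterable, Optional

ALLOWED_FORMATS = ("markdown", "html", "screenshot")


def build_scrape_payload(
    url: str,
    formats: Iterable[str],
    include_images: Optional[bool] = None,
    include_links: Optional[bool] = None,
    advanced_proxy: Optional[bool] = None,
    main_content_only: Optional[bool] = None,
) -> dict:
    """Build a request body. All key names here are hypothetical."""
    # 'pdf' is a legacy value that the API ignores, so drop it up front.
    # dict.fromkeys removes duplicates while keeping the original order.
    requested = list(dict.fromkeys(f for f in formats if f != "pdf"))
    unknown = [f for f in requested if f not in ALLOWED_FORMATS]
    if unknown:
        raise ValueError(f"Unsupported formats: {unknown}")
    if not requested:
        raise ValueError("At least one of markdown, html, screenshot is required")

    payload: dict = {"url": url, "formats": requested}
    optional = {
        "include_images": include_images,
        "include_links": include_links,
        "advanced_proxy": advanced_proxy,
        "main_content_only": main_content_only,
    }
    # Only send flags the caller set explicitly.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

For example, `build_scrape_payload("https://www.ycombinator.com/blog", ["markdown", "screenshot"], main_content_only=True)` produces a body with two formats and a single optional flag.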
Response
Scraped content in requested formats
Final URL after any redirects
HTTP status code (200 for success)
Markdown content when available/requested.
Content in HTML format (when 'html' in formats)
Legacy field. Present for backward compatibility; usually null/empty.
Base64-encoded screenshot image (when 'screenshot' in formats)
Page title extracted from metadata
Cost in USD ($0.001 per format requested)
Additional metadata extracted from the page
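Two of the response fields above lend themselves to a short sketch: decoding the base64 screenshot and estimating the cost from the stated $0.001 per requested format. The response key `screenshot` is a hypothetical name, and the handling of a `data:` URI prefix is a defensive guess, since the exact encoding of the field is not specified here.

```python
import base64
from decimal import Decimal
from typing import Iterable, Optional

COST_PER_FORMAT_USD = Decimal("0.001")


def expected_cost_usd(formats: Iterable[str]) -> Decimal:
    """Estimate cost as $0.001 per distinct requested format.

    Assumption: duplicate format names are billed once.
    """
    return COST_PER_FORMAT_USD * len(set(formats))


def decode_screenshot(response: dict, field: str = "screenshot") -> Optional[bytes]:
    """Return raw image bytes, or None if no screenshot was returned.

    The default field name is hypothetical.
    """
    encoded = response.get(field)
    if not encoded:
        return None
    # Defensive guess: strip a data URI prefix if one is present.
    if encoded.startswith("data:") and "," in encoded:
        encoded = encoded.split(",", 1)[1]
    return base64.b64decode(encoded)
```

Using `Decimal` avoids floating-point rounding when summing many small per-request costs.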
