What is the Scraper API?
The Scraper API converts any web page into clean, usable formats. Point it at a URL and get back:
Clean Text
Markdown or HTML: Extract the main content without ads, popups, or navigation clutter
Visual Captures
Screenshot: Get a visual snapshot of the entire page as a PNG image
Multi-format support: Request multiple formats in one API call! Each format costs $0.001.
Pricing (Pay Per Format)
Per-Format Pricing Model
$0.001 per format = $1 for 1,000 formats
Each format you request costs $0.001. Request 3 formats? Pay $0.003 total.
Advanced Proxy Pricing
Advanced Proxy (Optional)
Additional $0.004 per request
Use advanced proxy for:
- Sites with aggressive bot detection
- Sites that block standard requests
- Enterprise websites with strict security
- E-commerce sites with protection
One-time fee per request: The advanced proxy adds $0.004 per request, regardless of how many formats you request.
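The pricing rules above reduce to a small formula. The sketch below is illustrative only: the function name `estimate_cost` is made up here, and the two prices are the ones quoted in this section.

```python
from decimal import Decimal

PRICE_PER_FORMAT = Decimal("0.001")    # per requested format (from the pricing section)
ADVANCED_PROXY_FEE = Decimal("0.004")  # charged once per request, not per format


def estimate_cost(formats, advanced_proxy=False):
    """Estimate the USD cost of a single scrape request."""
    if not formats:
        raise ValueError("at least one format is required")
    cost = PRICE_PER_FORMAT * len(formats)
    if advanced_proxy:
        cost += ADVANCED_PROXY_FEE
    return cost


print(estimate_cost(["markdown"]))                        # 0.001
print(estimate_cost(["markdown", "html", "screenshot"]))  # 0.003
print(estimate_cost(["markdown"], advanced_proxy=True))   # 0.005
```

`Decimal` is used instead of floats so that sub-cent amounts add up exactly.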
Pricing Examples
Single format: 1 format × $0.001 = $0.001
Before You Start
Authentication
All requests require your API key in the Authorization header.
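A minimal sketch of building those headers follows. Two things here are assumptions, not confirmed by this page: that the key is sent as a `Bearer` token, and the environment variable name `SCRAPER_API_KEY`.

```python
import os


def auth_headers(api_key=None):
    """Build request headers that carry the API key.

    Assumption: the key is sent as a Bearer token. The docs only state
    that it belongs in the Authorization header.
    """
    # SCRAPER_API_KEY is a hypothetical variable name used for illustration.
    key = api_key or os.environ.get("SCRAPER_API_KEY", "")
    if not key:
        raise ValueError("no API key provided")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source control.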
Your First Scrape (2-Minute Start)
Let’s scrape a website in under 2 minutes!
Done! You just scraped a website and got clean markdown. The API removed all ads, navigation, and clutter, leaving only the main content.
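The quick start comes down to a single POST request. The sketch below uses only the Python standard library. The host `https://api.example.com` is a placeholder, and the `Bearer` scheme and the `url` body field are assumptions; the `/api/v2/scrape` path, the `formats` field, and the `markdown` output field come from this page.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host, replace with the real one


def build_scrape_request(api_key, url, formats=("markdown",)):
    """Build (but do not send) a POST request to the scrape endpoint."""
    body = json.dumps({"url": url, "formats": list(formats)}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/api/v2/scrape",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed Bearer scheme
            "Content-Type": "application/json",
        },
    )


def scrape(api_key, url, formats=("markdown",), timeout=60):
    """Send the request and return the decoded JSON response."""
    request = build_scrape_request(api_key, url, formats)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (needs a real host and API key):
# result = scrape("YOUR_API_KEY", "https://example.com/article")
# print(result["markdown"])
```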
Output Formats Explained
The Scraper API supports 3 output formats. You can request one or multiple formats in a single call.
Quick Reference
| Format | Returns | Best For | Output Field | Cost |
|---|---|---|---|---|
| markdown | Clean text with formatting | AI processing, content extraction | markdown | $0.001 |
| html | Raw HTML | Preserving structure, custom parsing | html | $0.001 |
| screenshot | PNG image (base64) | Visual testing, archiving | screenshot | $0.001 |
Total cost = number of formats × $0.001
Request 2 formats? Pay $0.002. Request 3 formats? Pay $0.003.
Markdown Format (Clean Text)
Get clean, readable text without ads, popups, or navigation.
Best for: Content extraction, AI training data, reading apps, RSS feeds
Basic Example
Control Images and Links
Markdown output is always clean:
- Removes ads, popups, cookie banners
- Removes navigation menus and sidebars
- Extracts only the main content
- Preserves formatting (headers, lists, code blocks)
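As a rough sketch of controlling images and links in markdown output, the helper below builds a request body. The option names `includeImages` and `includeLinks` are spelled as they appear in the performance tips on this page; the exact wire spelling and the `url` field name are assumptions, and `markdown_payload` is a hypothetical helper.

```python
import json


def markdown_payload(url, include_images=True, include_links=True):
    """Request body for a markdown-only scrape."""
    return {
        "url": url,                       # assumed field name
        "formats": ["markdown"],
        "includeImages": include_images,  # spelling taken from this page; may differ on the wire
        "includeLinks": include_links,
    }


# Text-only output: drop images but keep hyperlinks.
payload = markdown_payload("https://example.com/article", include_images=False)
print(json.dumps(payload, indent=2))
```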
Clean Content with Main Content Only
Extract only the main article/content without navigation, headers, or footers.
Perfect for:
- Blog posts (without sidebar clutter)
- News articles (just the story)
- Documentation (pure content)
- Research papers (main text only)
- AI training data (cleaner input)
- ❌ Navigation bars
- ❌ Sidebars
- ❌ Headers and footers
- ❌ Advertisement sections
- ❌ Related posts widgets
- ✅ Main article content
- ✅ Embedded images in content
- ✅ Code blocks and tables
HTML Format (Raw HTML)
Get the complete HTML structure of the page.
Best for: Custom parsing, preserving exact structure, web scraping frameworks
Example
Screenshot Format (PNG Image)
Capture a visual snapshot of the page as a PNG image.
Best for: Visual testing, documentation, archiving how a page looks, change detection
Example
Screenshot details:
- Full-page screenshot (not just viewport)
- PNG format
- Base64 encoded in the response
- Typical size: 100KB - 2MB depending on page
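Since the screenshot arrives base64 encoded, it has to be decoded before it can be saved. A minimal sketch, assuming the response is a dict with a `screenshot` field as listed in the Quick Reference table (`save_screenshot` is a hypothetical helper):

```python
import base64
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def save_screenshot(response, path):
    """Decode the base64 `screenshot` field and write it to `path` as a PNG."""
    encoded = response["screenshot"]
    # Tolerate an optional data-URI prefix (an assumption; the docs only say "base64").
    if encoded.startswith("data:"):
        encoded = encoded.split(",", 1)[1]
    data = base64.b64decode(encoded)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG image")
    Path(path).write_bytes(data)
    return len(data)  # number of bytes written
```

Checking the PNG signature catches truncated or mislabeled payloads before they reach disk.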
Advanced Proxy for Protected Sites
Use advanced proxy infrastructure for sites with strict bot protection.
When to use advanced proxy:
- Site returns 403 Forbidden
- Getting CAPTCHA challenges
- High-security enterprise sites
- E-commerce platforms
- Sites that block datacenter IPs
- After standard scrape fails
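One way to act on the list above is to try a standard scrape first and pay the proxy fee only when the site answers 403. This is a sketch under assumptions: `ScrapeError` and the `scrape(url, formats, advanced_proxy=...)` callable are hypothetical stand-ins for whatever client code you use.

```python
class ScrapeError(Exception):
    """Hypothetical error type that carries the HTTP status code."""

    def __init__(self, status, message=""):
        super().__init__(message or f"scrape failed with status {status}")
        self.status = status


def scrape_with_fallback(scrape, url, formats):
    """Try a standard scrape; retry once with the advanced proxy on 403.

    `scrape(url, formats, advanced_proxy=...)` is supplied by the caller
    and is expected to raise ScrapeError on failure.
    """
    try:
        return scrape(url, formats, advanced_proxy=False)
    except ScrapeError as err:
        if err.status != 403:
            raise  # other failures are not a bot-protection signal
    return scrape(url, formats, advanced_proxy=True)
```

This keeps the extra $0.004 off requests that succeed without the proxy.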
Multi-Format Scraping
Request multiple formats in one API call!
Cost calculation: 3 formats × $0.001 = $0.003 total
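A sketch of a multi-format request and of reading the results back. The output field names (`markdown`, `html`, `screenshot`) follow the Quick Reference table; the `url` body field and both helper names are assumptions made for illustration.

```python
ALL_FORMATS = ("markdown", "html", "screenshot")


def multi_format_payload(url, formats=ALL_FORMATS):
    """Request body asking for several formats in one call."""
    return {"url": url, "formats": list(formats)}  # "url" is an assumed field name


def collect_outputs(response, formats=ALL_FORMATS):
    """Return {format: content} for each requested format present in the response."""
    return {name: response[name] for name in formats if response.get(name) is not None}


# Example with a stand-in response dict:
sample = {"markdown": "# Title", "html": "<h1>Title</h1>", "screenshot": None}
print(sorted(collect_outputs(sample)))  # ['html', 'markdown']
```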
Combine All Features
Get clean content from protected sites with all formats.
Real-World Examples
Example 1: Content Aggregator
Build a news aggregator that saves articles in multiple formats.
Example 2: Visual Testing Tool
Monitor website changes by comparing screenshots.
Example 3: Protected Site Scraper
Scrape content from protected e-commerce sites.
Request Parameters (Complete Reference)
Endpoint: POST /api/v2/scrape
Required Parameters
The URL to scrape. Must be a valid HTTP(S) URL.
Examples:
- ✅ https://example.com/article
- ✅ https://blog.com/post?id=123
- ❌ example.com (missing protocol)
- ❌ ftp://example.com (unsupported protocol)
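The URL rules above can be checked client-side before spending a request. A minimal sketch using the standard library (`is_valid_scrape_url` is a hypothetical helper, and the server may apply stricter checks):

```python
from urllib.parse import urlsplit


def is_valid_scrape_url(url):
    """Return True for absolute http(s) URLs that include a host."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:  # raised for some malformed inputs, e.g. bad IPv6 brackets
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


for candidate in (
    "https://example.com/article",
    "https://blog.com/post?id=123",
    "example.com",
    "ftp://example.com",
):
    print(candidate, is_valid_scrape_url(candidate))
```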
List of output formats to generate. Can request one or multiple.
Options: "markdown", "html", "screenshot"
Cost = number of formats × $0.001
Optional Parameters
Extract only the main content, removing navigation, headers, footers, and sidebars.
Perfect for: Blog posts, news articles, documentation, and AI training data where you want clean, focused content.
Enable advanced proxy infrastructure for sites with bot protection.
Use when sites return 403 errors, CAPTCHA challenges, or have aggressive bot detection.
Include images in markdown output. Only affects markdown format.
- true: Keep image links in markdown
- false: Remove all images, text only
Include hyperlinks in markdown output. Only affects markdown format.
- true: Keep hyperlinks
- false: Remove all links, plain text
Response Format
Response Structure
Response Fields
Clean markdown content (when "markdown" in formats)
Raw HTML content (when "html" in formats)
Base64-encoded PNG image (when "screenshot" in formats). Decode it to save the image.
Final URL after following redirects
Page title extracted from metadata
HTTP status code (200 for success)
Cost in USD (number of formats × $0.001, plus $0.004 if using advanced proxy)
Additional page metadata (when available)
Error Handling
Error Format
All errors use this structure:
Common Errors
401 - Authentication Error
Missing or invalid API key
Fix: Add your API key to the Authorization header.
400 - Invalid URL
Invalid or malformed URL
Fix: Ensure the URL includes a protocol (https://) and is properly formatted.
500 - Scraping Failed
Failed to scrape the website
Common causes:
- Website is down
- Page requires authentication
- JavaScript-heavy site didn’t render
- Connection timeout
Fix: Retry with advancedProxy: true.
504 - Timeout
Request took too long
Fix: The page took too long to load. Try again later; the website may be slow.
Robust Error Handling
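A sketch of retrying transient failures with exponential backoff. It treats 500 and 504 as retryable and everything else (such as 400 and 401) as final, and it caps retries at 3 as recommended in the best practices. `ScrapeError` and the `scrape(url)` callable are hypothetical stand-ins for your own client code.

```python
import random
import time

RETRYABLE_STATUSES = {500, 504}  # scraping failed or timed out: may succeed later


class ScrapeError(Exception):
    """Hypothetical error type that carries the HTTP status code."""

    def __init__(self, status, message=""):
        super().__init__(message or f"scrape failed with status {status}")
        self.status = status


def scrape_with_retries(scrape, url, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call `scrape(url)`, retrying retryable errors with exponential backoff."""
    attempt = 0
    while True:
        try:
            return scrape(url)
        except ScrapeError as err:
            # 400/401 and other non-retryable errors are re-raised immediately.
            if err.status not in RETRYABLE_STATUSES or attempt >= max_retries:
                raise
            # Delays grow 1s, 2s, 4s, ... plus a little jitter to avoid bursts.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
            attempt += 1
```

The `sleep` parameter is injectable so the backoff can be tested without waiting.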
Best Practices
💰 Cost Optimization
Request only what you need
- Need just text? Request only markdown
- Need visual verification? Add screenshot
- Each format costs $0.001
Cache results
- Web pages don’t change every second
- Cache for hours or days depending on content
- Save money on re-scraping
Use advanced proxy selectively
- Only for protected sites
- Costs $0.004 extra per request
- But significantly improves success rate
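The caching advice above can be sketched as a small in-memory cache with a time-to-live. `ScrapeCache` and the `fetch(url, formats)` callable are hypothetical; a production setup might use Redis or a database instead.

```python
import time


class ScrapeCache:
    """In-memory cache keyed by URL and format set, with a time-to-live."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (stored_at, result)

    def get_or_fetch(self, url, formats, fetch):
        """Return a cached result if still fresh, else call `fetch` and store it."""
        key = (url, tuple(sorted(formats)))
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # cache hit: no API call, no cost
        result = fetch(url, formats)
        self._entries[key] = (now, result)
        return result
```

Pick the TTL per content type: minutes for news front pages, hours or days for documentation.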
⚡ Performance Tips
Choose the right format
- Markdown: Fastest, smallest
- HTML: Fast, larger
- Screenshot: Slower, largest
Trim markdown output
- includeImages: false = smaller, faster
- includeLinks: false = cleaner text
Use main content only
- Faster extraction
- Smaller markdown output
- Better for AI processing
Run requests in parallel
- Scrape multiple URLs simultaneously
- Use Promise.all() or asyncio.gather()
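A sketch of parallel scraping with `asyncio.gather()`. The `scrape_one(url)` coroutine is a hypothetical stand-in for your async client call, and the concurrency cap is an added precaution, not a documented API limit.

```python
import asyncio


async def scrape_many(scrape_one, urls, max_concurrency=5):
    """Scrape several URLs concurrently; results come back in input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url):
        async with semaphore:  # at most `max_concurrency` requests in flight
            return await scrape_one(url)

    # return_exceptions=True keeps one failed URL from discarding the other results.
    return await asyncio.gather(*(guarded(u) for u in urls), return_exceptions=True)


# Usage with a stand-in coroutine:
async def fake_scrape(url):
    await asyncio.sleep(0)
    return {"url": url, "markdown": "..."}


results = asyncio.run(scrape_many(fake_scrape, ["https://a.example", "https://b.example"]))
print(len(results))  # 2
```

Failed URLs appear in the result list as exception objects, so check each item with `isinstance(item, Exception)` before using it.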
🛡️ Reliability
Always handle errors
- Some sites block scrapers
- Some pages require auth
- Network issues happen
Use advanced proxy when
- Getting 403 errors
- Site blocks requests
- Standard scrape fails
- Need higher success rate
Retry with care
- Exponential backoff for failures
- Don’t retry bad URLs
- Max 3 retries recommended
Validate URLs
- Check protocol (https://)
- Ensure proper formatting
- Handle user input carefully
Quick Tips
Frequently Asked Questions
How is pricing calculated?
Simple formula:
- Base cost = number of formats × $0.001
- If advanced_proxy: true, add $0.004
The advanced proxy fee ($0.004) is charged once per request, regardless of how many formats you request.
When should I use main_content_only?
Use main_content_only: true when:
- You’re training AI models (cleaner data)
- You need to remove sidebars and navigation
- You want focused documentation content
Skip it when:
- You need the full page structure
- Navigation menus are important
- You want sidebar information
- Page layout matters
What gets removed:
- Headers and footers
- Navigation bars
- Sidebars
- Advertisement sections
- Related posts widgets
What stays:
- Main article content
- Images within content
- Code blocks
- Tables
When should I use advanced_proxy?
Use advanced_proxy: true when:
- Standard scrape returns 403 Forbidden
- Site shows CAPTCHA challenges
- E-commerce sites with protection
- Enterprise websites with strict security
- Datacenter IPs are blocked
- You need higher success rates on protected sites
Cost:
- Standard: $0.001 per format
- With proxy: $0.001 per format + $0.004 proxy fee
Even though it costs more, you actually get the data instead of a failure!
What's the difference between markdown and HTML?
Markdown:
- Clean, readable text
- Removes ads, navigation, clutter
- Preserves formatting (headers, lists, etc.)
- Perfect for content extraction
- Cost: $0.001
HTML:
- Complete page structure
- Includes everything (ads, scripts, etc.)
- For custom parsing or preservation
- Larger file size
- Cost: $0.001
Can I get multiple formats at once?
Yes! Request as many formats as you want.
Cost: 3 formats × $0.001 = $0.003
Each format adds $0.001 to your total cost.
Why does my screenshot look different from the browser?
Screenshots are taken in a headless browser environment which may:
- Have different viewport size
- Not load some JavaScript elements
- Use default fonts/settings
- Not include certain animations
Can I use both main_content_only and advanced_proxy?
Yes! You can combine both features.
Cost: $0.001 (1 format) + $0.004 (proxy) = $0.005
Perfect for: Protected news sites, paywalled blogs, enterprise documentation
How large can the response be?
Response sizes vary by format:
- Markdown: Usually 10-200 KB
- HTML: Usually 50-500 KB
- Screenshot: Usually 100KB-2MB
What if a website blocks the scraper?
Some websites use bot detection that may block scraping. Signs:
- 403 Forbidden errors
- CAPTCHA pages
- Empty/incomplete content
What to try:
- Enable advancedProxy: true (+$0.004)
- Try again later
- Check if the site has an official API
- Contact the website owner for permission
Next Steps
Crawl API
Scrape multiple pages automatically
Answer API
Scrape + AI-powered answers in one call
Web Search
Find URLs to scrape with web search
Need Help?
Found a bug or have a feature request? We’d love to hear from you! Join our Discord or email us at [email protected]
