POST /api/v2/extract
Extract API - Multi-mode page extraction
curl --request POST \
  --url https://api.llmlayer.dev/api/v2/extract \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "url": "https://www.ycombinator.com/blog",
  "modes": [
    "json",
    "summary"
  ],
  "json_schema": "<string>",
  "query": "<string>",
  "instructions": "<string>",
  "response_language": "auto",
  "advanced_proxy": false,
  "main_content_only": true
}
'
{
  "url": "<string>",
  "title": "<string>",
  "metadata": {},
  "structured_data": {},
  "summary": "<string>",
  "answer": "<string>",
  "links": [
    {
      "url": "<string>",
      "text": "<string>",
      "internal": true
    }
  ],
  "brand": {},
  "cost": 0.01,
  "response_time": "3.42",
  "statusCode": 200
}

Authorizations

Authorization
string
header
required

Bearer token authentication using your LLMLayer API key. Include in Authorization header as: Bearer YOUR_LLMLAYER_API_KEY
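As a minimal Python sketch of the header format described above: the helper name `build_headers` and the environment variable name `LLMLAYER_API_KEY` are assumptions made for illustration, not part of the API.

```python
import os


def build_headers(api_key: str) -> dict[str, str]:
    """Return the two headers shown in the curl example for a JSON request."""
    if not api_key:
        raise ValueError("an LLMLayer API key is required")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


# Assumption: the key is stored in an environment variable with this name.
# The fallback is the placeholder from the docs, not a working key.
headers = build_headers(os.environ.get("LLMLAYER_API_KEY", "YOUR_LLMLAYER_API_KEY"))
```

Reading the key from the environment keeps it out of source control; any other secret store would work equally well.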

Body

application/json
url
string
required

The page URL to extract from. Must be http(s). PDF URLs are not supported — use /get_pdf_content instead.

Example:

"https://www.ycombinator.com/blog"

modes
enum<string>[]

Extraction modes to run in one call. Any combination; duplicates are ignored. Pricing is summed per mode: json/summary/qa $0.005 each, links $0.001, brand $0.002.

Available options:
json,
summary,
qa,
links,
brand
Example:
["json", "summary"]
json_schema

Required when modes includes 'json'. Accepts a formal JSON schema, an example object (e.g. {"title": "string", "price": "number"}), or a plain-text description of the fields you want.

query
string

The question to answer from the page — required when modes includes 'qa'.
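The field requirements listed so far (an http(s) `url`, `json_schema` with the json mode, `query` with the qa mode) can be checked client-side before sending. The sketch below is one possible pre-flight check; `check_request` is a hypothetical helper, and the server's own validation and error messages may differ.

```python
from urllib.parse import urlparse

VALID_MODES = {"json", "summary", "qa", "links", "brand"}


def check_request(body: dict) -> list[str]:
    """Return a list of problems found in a request body (empty if none)."""
    problems = []

    # The url must use the http or https scheme.
    scheme = urlparse(body.get("url") or "").scheme
    if scheme not in ("http", "https"):
        problems.append("url must be http(s)")

    modes = set(body.get("modes") or [])
    unknown = modes - VALID_MODES
    if unknown:
        problems.append(f"unknown modes: {sorted(unknown)}")

    # Mode-dependent required fields.
    if "json" in modes and not body.get("json_schema"):
        problems.append("json_schema is required when modes includes 'json'")
    if "qa" in modes and not body.get("query"):
        problems.append("query is required when modes includes 'qa'")
    return problems
```

For example, a body with `"modes": ["qa"]` and no `query` yields one problem string, while the request from the curl example with a real schema yields an empty list.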

instructions
string

Optional extra guidance applied to all AI modes (json, summary, qa). E.g. 'dates in DD/MM/YYYY format'.

response_language
string
default:auto

Output language for summary/qa ('auto' matches the user/page language). E.g. 'en', 'fr', 'es'.

advanced_proxy
boolean
default:false

Use advanced proxy for sites with bot protection. Adds $0.004 once per request, only when a scrape runs (brand-only requests never scrape).
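To make the pricing rules concrete, here is a rough cost estimator built from the per-mode prices and the proxy surcharge quoted in this reference. `estimate_cost` is a hypothetical helper. It assumes a scrape runs whenever at least one mode other than brand is requested; the reference only states that brand-only requests never scrape.

```python
from decimal import Decimal

# Per-mode prices in USD, as listed for the modes field.
MODE_PRICE = {
    "json": Decimal("0.005"),
    "summary": Decimal("0.005"),
    "qa": Decimal("0.005"),
    "links": Decimal("0.001"),
    "brand": Decimal("0.002"),
}
PROXY_SURCHARGE = Decimal("0.004")


def estimate_cost(modes, advanced_proxy=False) -> Decimal:
    """Estimate the request cost in USD from the selected modes."""
    unique = set(modes)  # duplicate modes are ignored
    total = sum((MODE_PRICE[m] for m in unique), Decimal("0"))
    # Assumption: any mode other than 'brand' triggers a scrape.
    scrape_runs = bool(unique - {"brand"})
    if advanced_proxy and scrape_runs:
        total += PROXY_SURCHARGE  # charged once per request
    return total
```

`Decimal` is used instead of `float` so that sums of small prices compare exactly; the `cost` field in the response remains the authoritative figure.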

main_content_only
boolean

Omit this field to let the API pick the best default per selection: links-only requests scrape the full page (nav/footer links matter), AI modes use main content only.
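Because the per-selection default only applies when the field is absent, a client should leave the key out rather than send a placeholder value. A small sketch of a body builder that does this (`build_body` is a hypothetical helper, not part of any SDK):

```python
import json


def build_body(url, modes, main_content_only=None, **extra):
    """Build a request body, omitting main_content_only unless it is set."""
    body = {"url": url, "modes": list(modes), **extra}
    if main_content_only is not None:
        # Only an explicit True/False overrides the API's own default.
        body["main_content_only"] = main_content_only
    return body


# A links-only request that leaves the choice to the API.
print(json.dumps(build_body("https://www.ycombinator.com/blog", ["links"]), indent=2))
```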

Response

Extraction results. All five result fields (structured_data, summary, answer, links, brand) are always present; modes you did not request are null. The json mode result is returned as structured_data.
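Since unrequested modes come back as null rather than being left out, client code usually filters on the value rather than on key presence. A minimal sketch, where `returned_results` is a hypothetical helper and the sample response contains made-up values:

```python
RESULT_FIELDS = ("structured_data", "summary", "answer", "links", "brand")


def returned_results(response: dict) -> dict:
    """Keep only the result fields that are not null (None after JSON decoding)."""
    return {k: response[k] for k in RESULT_FIELDS if response.get(k) is not None}


# Illustrative decoded response for modes=["summary"]; values are invented.
sample = {
    "url": "https://www.ycombinator.com/blog",
    "title": "Blog",
    "metadata": {},
    "structured_data": None,
    "summary": "A short summary.",
    "answer": None,
    "links": None,
    "brand": None,
    "cost": 0.005,
    "response_time": "3.42",
    "statusCode": 200,
}

results = returned_results(sample)
```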

url
string

Final URL after redirects.

title
string | null

Page title.

metadata
object

Page metadata (description, OpenGraph fields, language, ...) as found by the scraper.

structured_data
object

Structured data matching your schema (when 'json' in modes).

summary
string | null

Markdown summary of the page (when 'summary' in modes).

answer
string | null

Markdown answer to your question (when 'qa' in modes).

links
object[]

All links found on the page, deduplicated, max 500 (when 'links' in modes).
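Each link entry in the response example carries `url`, `text`, and an `internal` flag, so separating on-site from off-site links takes one pass. A short sketch with a hypothetical helper `split_links` and invented sample data:

```python
def split_links(links):
    """Split link entries into (internal_urls, external_urls)."""
    links = links or []  # links is null when the mode was not requested
    internal = [link["url"] for link in links if link.get("internal")]
    external = [link["url"] for link in links if not link.get("internal")]
    return internal, external


# Invented entries in the shape shown in the response example.
sample_links = [
    {"url": "https://example.com/about", "text": "About", "internal": True},
    {"url": "https://example.org/", "text": "Partner", "internal": False},
]
internal, external = split_links(sample_links)
```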

brand
object

Brand profile: domain, title, description, colors, logos, backdrops, socials, industries, key links and pages (when 'brand' in modes).

cost
number

Total cost in USD — sum of the selected modes (+$0.004 advanced proxy when a scrape runs).

Example:

0.01

response_time
string

Total processing time in seconds.

Example:

"3.42"

statusCode
integer
Example:

200