Extract API

Overview

Use Extract when you need one page transformed into application-ready data. A single request can combine several modes, and all of them share one page fetch:

| Mode | Response field | Requires LLM | Cost |
| --- | --- | --- | --- |
| `json` | `structured_data` | Yes | `$0.005` |
| `summary` | `summary` | Yes | `$0.005` |
| `qa` | `answer` | Yes | `$0.005` |
| `links` | `links` | No | `$0.001` |
| `brand` | `brand` | No | `$0.002` |

All result fields are present on every response. Fields for modes you did not request are null.
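To budget a combined request, the per-mode prices in the table can be added up. The sketch below assumes the total is a plain sum of the listed prices, which the table suggests but does not state outright; `MODE_COST` and `estimate_cost` are illustrative names, not part of the SDK.

```python
from decimal import Decimal

# Per-mode prices copied from the table above (USD per request).
MODE_COST = {
    "json": Decimal("0.005"),
    "summary": Decimal("0.005"),
    "qa": Decimal("0.005"),
    "links": Decimal("0.001"),
    "brand": Decimal("0.002"),
}


def estimate_cost(modes):
    """Estimate the charge for one request.

    Assumption: the total is the sum of the per-mode prices.
    """
    unknown = set(modes) - set(MODE_COST)
    if unknown:
        raise ValueError(f"unknown modes: {sorted(unknown)}")
    # Decimal avoids binary floating-point rounding on small currency values.
    return sum((MODE_COST[m] for m in modes), Decimal("0"))


print(estimate_cost(["summary", "links", "brand"]))  # 0.008
```

Treat the result as an estimate only; the `cost` field of the response is the authoritative value.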

Endpoint

POST /api/v2/extract

Quickstart: Structured Data

TypeScript

import { LLMLayerClient } from 'llmlayer';

const client = new LLMLayerClient({
  apiKey: process.env.LLMLAYER_API_KEY,
});

const program = await client.extract('https://www.ycombinator.com/about', {
  modes: ['json'],
  jsonSchema: {
    program: 'string',
    duration: 'string',
    funding: 'string',
    benefits: ['string'],
  },
  instructions: 'Return concise values and use null when a field is missing.',
});

console.log(program.structured_data);
console.log(program.summary); // null

Python

from llmlayer import LLMLayerClient

client = LLMLayerClient(api_key="YOUR_LLMLAYER_API_KEY")

program = client.extract(
    "https://www.ycombinator.com/about",
    modes=["json"],
    json_schema={
        "program": "string",
        "duration": "string",
        "funding": "string",
        "benefits": ["string"],
    },
    instructions="Return concise values and use null when a field is missing.",
)

print(program.structured_data)
print(program.summary)  # None

cURL

curl -X POST https://api.llmlayer.dev/api/v2/extract \
  -H "Authorization: Bearer YOUR_LLMLAYER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.ycombinator.com/about",
    "modes": ["json"],
    "json_schema": {
      "program": "string",
      "duration": "string",
      "funding": "string",
      "benefits": ["string"]
    },
    "instructions": "Return concise values and use null when a field is missing."
  }'

Request Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `url` | `string` | Yes | - | Public `http` or `https` page URL |
| `modes` | `string[]` | No | `["json"]` | Any of `json`, `summary`, `qa`, `links`, `brand` |
| `json_schema` | `object \| string` | Conditional | `null` | Required when `modes` includes `json` |
| `query` | `string` | Conditional | `null` | Required when `modes` includes `qa` |
| `instructions` | `string` | No | `null` | Extra guidance for LLM modes: `json`, `summary`, `qa` |
| `response_language` | `string` | No | `auto` | Best for summary and Q&A text |
| `advanced_proxy` | `boolean` | No | `false` | Use for heavily protected pages |
| `main_content_only` | `boolean \| null` | No | API-selected | Omit to let the API choose the best mode-specific default |

HTTP requests use snake_case. The TypeScript SDK uses camelCase, for example `jsonSchema`, `responseLanguage`, and `advancedProxy`.
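The conditional requirements in the parameter table can be checked client-side before a request is sent, which avoids a round trip that would end in a `400`. The following is a minimal sketch for raw HTTP callers; `build_extract_payload` is a hypothetical helper, not an SDK function, and the server's own validation may be stricter than what is mirrored here.

```python
VALID_MODES = {"json", "summary", "qa", "links", "brand"}


def build_extract_payload(url, modes=("json",), json_schema=None, query=None, **optional):
    """Build a snake_case request body and check the conditional parameters."""
    modes = list(modes)
    invalid = [m for m in modes if m not in VALID_MODES]
    if invalid:
        raise ValueError(f"invalid modes: {invalid}")
    if "json" in modes and json_schema is None:
        raise ValueError("json_schema is required when modes includes 'json'")
    if "qa" in modes and query is None:
        raise ValueError("query is required when modes includes 'qa'")

    payload = {"url": url, "modes": modes}
    if json_schema is not None:
        payload["json_schema"] = json_schema
    if query is not None:
        payload["query"] = query
    # Pass through optional fields such as response_language or advanced_proxy,
    # skipping None so the API applies its own defaults.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


payload = build_extract_payload(
    "https://www.ycombinator.com/about",
    modes=["qa", "links"],
    query="How long does the program last?",
    response_language="en",
)
print(payload)
```

The returned dictionary can be serialized with `json.dumps` and sent as the body of the `POST` request shown in the quickstart.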

Response

{
  "url": "https://www.ycombinator.com/about",
  "title": "What Happens at YC | Y Combinator",
  "metadata": {
    "description": "..."
  },
  "structured_data": {
    "program": "Y Combinator startup program",
    "duration": "3 months",
    "funding": "$500k per company",
    "benefits": ["office hours", "founder community", "Demo Day"]
  },
  "summary": null,
  "answer": null,
  "links": null,
  "brand": null,
  "cost": 0.005,
  "response_time": "3.42",
  "statusCode": 200
}

| Field | Type | Description |
| --- | --- | --- |
| `url` | `string` | Final URL after redirects |
| `title` | `string \| null` | Page title |
| `metadata` | `object \| null` | Page metadata found by the scraper |
| `structured_data` | `object \| null` | Result of `json` mode |
| `summary` | `string \| null` | Result of `summary` mode |
| `answer` | `string \| null` | Result of `qa` mode |
| `links` | `array \| null` | Result of `links` mode |
| `brand` | `object \| null` | Result of `brand` mode |
| `cost` | `number \| null` | Total cost for selected modes |
| `response_time` | `string` | Total processing time in seconds |
| `statusCode` | `integer` | `200` on success |

The structured result field is `structured_data` in the API, Python SDK, and TypeScript SDK.
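Because every result field is always present and unrequested modes are null, callers of the raw HTTP API usually want to keep only the populated fields. The sketch below works on the decoded JSON body; `collect_results` is an illustrative helper name, and the sample body is a shortened copy of the response above.

```python
import json

# The five mode result fields listed in the response table.
RESULT_FIELDS = ("structured_data", "summary", "answer", "links", "brand")


def collect_results(response):
    """Return only the result fields that are populated (non-null)."""
    return {f: response[f] for f in RESULT_FIELDS if response.get(f) is not None}


raw_body = """
{
  "url": "https://www.ycombinator.com/about",
  "structured_data": {"program": "Y Combinator startup program", "duration": "3 months"},
  "summary": null,
  "answer": null,
  "links": null,
  "brand": null,
  "cost": 0.005,
  "response_time": "3.42",
  "statusCode": 200
}
"""

response = json.loads(raw_body)
results = collect_results(response)

# response_time is documented as a string, so convert it before doing arithmetic.
elapsed_seconds = float(response["response_time"])

print(sorted(results))
print(elapsed_seconds)
```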

Combining Modes

Python

profile = client.extract(
    "https://www.ycombinator.com",
    modes=["summary", "links", "brand"],
)

print(profile.summary)
print(profile.links)
print(profile.brand)

TypeScript

const profile = await client.extract('https://www.ycombinator.com', {
  modes: ['summary', 'links', 'brand'],
});

console.log(profile.summary);
console.log(profile.links);
console.log(profile.brand);

JSON Schema Guidance

`json_schema` can be:

- A formal JSON schema
- An example object
- A plain object where values describe expected types
- A plain-text description

For reliable extraction:

- Keep schemas focused.
- Use arrays only when the page clearly contains lists.
- Put normalization rules in `instructions`.
- Use nullable expectations when fields may be missing.
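The four accepted shapes can be illustrated for the same two fields. The values below are illustrative only, and how strictly the API interprets keywords such as `required` or a `["string", "null"]` type is an assumption, not something this page specifies.

```python
import json

# 1. Formal JSON Schema; the "funding" type marks the field as nullable.
formal_schema = {
    "type": "object",
    "properties": {
        "program": {"type": "string"},
        "funding": {"type": ["string", "null"]},
        "benefits": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["program"],
}

# 2. Example object: sample values show the expected shape (made-up values).
example_object = {
    "program": "Example accelerator",
    "funding": None,
    "benefits": ["office hours"],
}

# 3. Plain object whose values name the expected types, as in the quickstart.
type_hints = {"program": "string", "funding": "string", "benefits": ["string"]}

# 4. Plain-text description.
text_description = (
    "The program name, the funding amount if stated, and a list of benefits."
)

schema_forms = [formal_schema, example_object, type_hints, text_description]

# Each form must be JSON-serializable to travel in the request body.
for form in schema_forms:
    print(json.dumps(form))
```

Whichever form is used, normalization rules such as date or currency formats belong in `instructions` rather than in the schema itself.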

Errors and Refunds

| Status | Common reason | Charged? |
| --- | --- | --- |
| `400` | Missing `json_schema`, missing `query`, invalid mode, PDF URL | No |
| `422` | Empty extractable content, or JSON output truncated | Depends on whether AI work ran |
| `500` | Page fetch failed | Refunded when failure happens before AI work |
| `502` | Brand fetch or model JSON failure | Depends on failure stage |

See Errors & Refunds for exact error codes and refund behavior.
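For logging or user-facing messages, the table above can be turned into a small lookup. This is a sketch that only restates the table; `describe_error` is a hypothetical helper, and the fallback for unlisted status codes is an assumption.

```python
# Status code -> (common reason, charging behavior), copied from the table above.
ERROR_TABLE = {
    400: ("Missing json_schema, missing query, invalid mode, or PDF URL", "No"),
    422: ("Empty extractable content, or JSON output truncated",
          "Depends on whether AI work ran"),
    500: ("Page fetch failed", "Refunded when failure happens before AI work"),
    502: ("Brand fetch or model JSON failure", "Depends on failure stage"),
}


def describe_error(status_code):
    """Return a readable message for a failed request, or None on success."""
    if status_code == 200:
        return None
    # Assumption: codes not listed in the table are reported as unknown.
    reason, charged = ERROR_TABLE.get(status_code, ("Unlisted status", "Unknown"))
    return f"HTTP {status_code}: {reason} (charged: {charged})"


print(describe_error(400))
print(describe_error(502))
```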

More Examples

Extract Recipes: Products, lists, Q&A, and brand enrichment.

API Reference: Full request and response schema.
