Overview

Use Extract when you need one page transformed into application-ready data. A single request can combine modes, sharing the same page fetch:
| Mode | Response field | Requires LLM | Cost |
|---|---|---|---|
| json | structured_data | Yes | $0.005 |
| summary | summary | Yes | $0.005 |
| qa | answer | Yes | $0.005 |
| links | links | No | $0.001 |
| brand | brand | No | $0.002 |

All result fields are present on every response. Modes you did not request are null.
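
To budget a combined request, the per-mode prices can be added up ahead of time. The sketch below assumes the total is the plain sum of the listed prices and that a mode named twice is billed once; neither point is confirmed here, and `estimateCost` is a hypothetical helper, not part of the SDK.

```typescript
type Mode = 'json' | 'summary' | 'qa' | 'links' | 'brand';

// Listed prices, stored in tenths of a cent so the sum stays exact
// before it is converted to dollars.
const PRICE_IN_MILLS: Record<Mode, number> = {
  json: 5,
  summary: 5,
  qa: 5,
  links: 1,
  brand: 2,
};

// Hypothetical helper: assumes cost is additive across requested modes.
function estimateCost(modes: Mode[]): number {
  // Assumption: a mode listed twice is only billed once.
  const unique = modes.filter((mode, index) => modes.indexOf(mode) === index);
  let mills = 0;
  for (const mode of unique) {
    mills += PRICE_IN_MILLS[mode];
  }
  return mills / 1000; // dollars
}

console.log(estimateCost(['summary', 'links', 'brand']));
```

Compare the estimate with the `cost` field on the response, which is the value the API actually reports.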

Endpoint

POST /api/v2/extract

Quickstart: Structured Data

import { LLMLayerClient } from 'llmlayer';

const client = new LLMLayerClient({
  apiKey: process.env.LLMLAYER_API_KEY,
});

const program = await client.extract('https://www.ycombinator.com/about', {
  modes: ['json'],
  jsonSchema: {
    program: 'string',
    duration: 'string',
    funding: 'string',
    benefits: ['string'],
  },
  instructions: 'Return concise values and use null when a field is missing.',
});

console.log(program.structured_data);
console.log(program.summary); // null

Request Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | - | Public http or https page URL |
| modes | string[] | No | ["json"] | Any of json, summary, qa, links, brand |
| json_schema | object \| string | Conditional | null | Required when modes includes json |
| query | string | Conditional | null | Required when modes includes qa |
| instructions | string | No | null | Extra guidance for LLM modes: json, summary, qa |
| response_language | string | No | auto | Best for summary and Q&A text |
| advanced_proxy | boolean | No | false | Use for heavily protected pages |
| main_content_only | boolean \| null | No | API-selected | Omit to let the API choose the best mode-specific default |

HTTP requests use snake_case. The TypeScript SDK uses camelCase, for example jsonSchema, responseLanguage, and advancedProxy.
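
The two conditional requirements and the naming difference can be checked before a request is sent. The following is a minimal sketch, not the SDK's own implementation: `buildExtractBody` and `ExtractOptions` are hypothetical names, and `mainContentOnly` is assumed to follow the same camelCase pattern as the documented examples.

```typescript
type Mode = 'json' | 'summary' | 'qa' | 'links' | 'brand';

// camelCase options, named the way the TypeScript SDK names them.
interface ExtractOptions {
  modes?: Mode[];
  jsonSchema?: object | string | null;
  query?: string | null;
  instructions?: string | null;
  responseLanguage?: string;
  advancedProxy?: boolean;
  mainContentOnly?: boolean | null; // assumed camelCase form of main_content_only
}

// Hypothetical helper: validates the conditional parameters and returns
// the snake_case body that an HTTP request would carry.
function buildExtractBody(url: string, options: ExtractOptions = {}): Record<string, unknown> {
  const modes: Mode[] = options.modes ?? ['json']; // documented default
  if (modes.indexOf('json') !== -1 && options.jsonSchema == null) {
    throw new Error('json_schema is required when modes includes json');
  }
  if (modes.indexOf('qa') !== -1 && !options.query) {
    throw new Error('query is required when modes includes qa');
  }

  const body: Record<string, unknown> = { url, modes };
  const keys = Object.keys(options) as (keyof ExtractOptions)[];
  for (const key of keys) {
    const value = options[key];
    if (key === 'modes' || value === undefined) continue;
    // jsonSchema -> json_schema, responseLanguage -> response_language, ...
    const snakeKey = key.replace(/[A-Z]/g, (c) => '_' + c.toLowerCase());
    body[snakeKey] = value;
  }
  return body;
}

console.log(
  buildExtractBody('https://www.ycombinator.com/about', {
    modes: ['qa'],
    query: 'How long is the program?',
    responseLanguage: 'en',
  }),
);
```

Failing locally on a missing `json_schema` or `query` avoids a round trip that the API would reject with a 400.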

Response

{
  "url": "https://www.ycombinator.com/about",
  "title": "What Happens at YC | Y Combinator",
  "metadata": {
    "description": "..."
  },
  "structured_data": {
    "program": "Y Combinator startup program",
    "duration": "3 months",
    "funding": "$500k per company",
    "benefits": ["office hours", "founder community", "Demo Day"]
  },
  "summary": null,
  "answer": null,
  "links": null,
  "brand": null,
  "cost": 0.005,
  "response_time": "3.42",
  "statusCode": 200
}

| Field | Type | Description |
|---|---|---|
| url | string | Final URL after redirects |
| title | string \| null | Page title |
| metadata | object \| null | Page metadata found by the scraper |
| structured_data | object \| null | Result of json mode |
| summary | string \| null | Result of summary mode |
| answer | string \| null | Result of qa mode |
| links | array \| null | Result of links mode |
| brand | object \| null | Result of brand mode |
| cost | number \| null | Total cost for selected modes |
| response_time | string | Total processing time in seconds |
| statusCode | integer | 200 on success |

The structured result field is structured_data in the API, Python SDK, and TypeScript SDK.
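
Because every result field is always present and unrequested modes come back as null, a typed view of the response makes the null checks explicit. The interface below is a sketch transcribed from the field table, not the SDK's exported type; the inner shapes of `metadata`, `links`, and `brand` are left open because they are not specified here, and `presentResults` and `responseSeconds` are hypothetical helpers.

```typescript
// Sketch of the response shape, transcribed from the field table.
interface ExtractResponse {
  url: string;
  title: string | null;
  metadata: Record<string, unknown> | null;
  structured_data: Record<string, unknown> | null;
  summary: string | null;
  answer: string | null;
  links: unknown[] | null;
  brand: Record<string, unknown> | null;
  cost: number | null;
  response_time: string; // seconds, delivered as a string
  statusCode: number;
}

const RESULT_FIELDS = ['structured_data', 'summary', 'answer', 'links', 'brand'] as const;

// Lists the result fields that actually carry data.
function presentResults(res: ExtractResponse): string[] {
  return RESULT_FIELDS.filter((field) => res[field] !== null);
}

// response_time is a string, so convert it before doing arithmetic.
function responseSeconds(res: ExtractResponse): number {
  return parseFloat(res.response_time);
}

const sample: ExtractResponse = {
  url: 'https://www.ycombinator.com/about',
  title: 'What Happens at YC | Y Combinator',
  metadata: { description: '...' },
  structured_data: { program: 'Y Combinator startup program' },
  summary: null,
  answer: null,
  links: null,
  brand: null,
  cost: 0.005,
  response_time: '3.42',
  statusCode: 200,
};

console.log(presentResults(sample), responseSeconds(sample));
```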

Combining Modes

Python:

profile = client.extract(
    "https://www.ycombinator.com",
    modes=["summary", "links", "brand"],
)

print(profile.summary)
print(profile.links)
print(profile.brand)

TypeScript:

const profile = await client.extract('https://www.ycombinator.com', {
  modes: ['summary', 'links', 'brand'],
});

console.log(profile.summary);
console.log(profile.links);
console.log(profile.brand);

JSON Schema Guidance

json_schema can be:
  • A formal JSON schema
  • An example object
  • A plain object where values describe expected types
  • A plain-text description
For reliable extraction:
  • Keep schemas focused.
  • Use arrays only when the page clearly contains lists.
  • Put normalization rules in instructions.
  • Use nullable expectations when fields may be missing.
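
As an illustration, the four accepted shapes might look like this for the Quickstart fields. This is a sketch only: the nullable `type: ['string', 'null']` form is standard JSON Schema, but how the API interprets each shape is not specified here, and the example values and instruction wording are made up for demonstration.

```typescript
// 1. A formal JSON Schema, with nullable fields for values that may be missing.
const formalSchema = {
  type: 'object',
  properties: {
    program: { type: ['string', 'null'] },
    duration: { type: ['string', 'null'] },
    funding: { type: ['string', 'null'] },
    benefits: { type: 'array', items: { type: 'string' } },
  },
  required: ['program', 'duration', 'funding', 'benefits'],
};

// 2. An example object showing the desired output shape.
const exampleObject = {
  program: 'Y Combinator startup program',
  duration: '3 months',
  funding: '$500k per company',
  benefits: ['office hours'],
};

// 3. A plain object where values describe expected types (as in the Quickstart).
const typeHints = {
  program: 'string',
  duration: 'string',
  funding: 'string',
  benefits: ['string'],
};

// 4. A plain-text description.
const textDescription =
  'The program name, its duration, the funding amount, and a list of benefits.';

// Normalization rules go in instructions rather than in the schema itself.
const options = {
  modes: ['json'],
  jsonSchema: formalSchema,
  instructions: 'Write durations as "<n> months" and use null when a field is missing.',
};

console.log(Object.keys(options.jsonSchema.properties));
```

All four variants describe the same four fields, which keeps the schema focused; `benefits` is the only array because it is the only field the page presents as a list.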

Errors and Refunds

| Status | Common reason | Charged? |
|---|---|---|
| 400 | Missing json_schema, missing query, invalid mode, PDF URL | No |
| 422 | Empty extractable content, or JSON output truncated | Depends on whether AI work ran |
| 500 | Page fetch failed | Refunded when failure happens before AI work |
| 502 | Brand fetch or model JSON failure | Depends on failure stage |

See Errors & Refunds for exact error codes and refund behavior.
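
Client code can translate the status table into a simple lookup for logging or alerting. The sketch below only restates the table; `describeFailure`, `ChargeOutcome`, and the outcome labels are hypothetical, and the exact error codes and refund rules live in the Errors & Refunds page.

```typescript
type ChargeOutcome =
  | 'not_charged'
  | 'depends_on_ai_work'
  | 'refunded_if_before_ai_work'
  | 'depends_on_failure_stage'
  | 'unknown';

interface FailureInfo {
  reason: string;
  charge: ChargeOutcome;
  fixRequestFirst: boolean; // true when the caller must change the request
}

// Hypothetical helper: restates the status table, nothing more.
function describeFailure(status: number): FailureInfo {
  switch (status) {
    case 400:
      return {
        reason: 'Missing json_schema, missing query, invalid mode, or PDF URL',
        charge: 'not_charged',
        fixRequestFirst: true,
      };
    case 422:
      return {
        reason: 'Empty extractable content, or JSON output truncated',
        charge: 'depends_on_ai_work',
        fixRequestFirst: false,
      };
    case 500:
      return {
        reason: 'Page fetch failed',
        charge: 'refunded_if_before_ai_work',
        fixRequestFirst: false,
      };
    case 502:
      return {
        reason: 'Brand fetch or model JSON failure',
        charge: 'depends_on_failure_stage',
        fixRequestFirst: false,
      };
    default:
      return { reason: 'Not listed in the status table', charge: 'unknown', fixRequestFirst: false };
  }
}

console.log(describeFailure(400));
```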

More Examples

Extract Recipes

Products, lists, Q&A, and brand enrichment.

API Reference

Full request and response schema.