Overview

The Answer API provides two endpoints to fit your UX:

  • POST /api/v1/search: a blocking call that returns the complete answer in one response.
  • POST /api/v1/search_stream: a streaming call that delivers partial tokens and metadata via Server‑Sent Events.

Authentication

All requests require a bearer token in the Authorization header:
Authorization: Bearer YOUR_LLMLAYER_API_KEY
Missing or invalid API keys return 401 with error_code: "missing_llmlayer_api_key".
Do not expose your LLMLayer API key in client‑side code. Call the API from your server (or proxy) and stream results to the browser.
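A minimal sketch of the headers as built on your server (Node 18+ fetch; 'Content-Type' is the standard JSON header, not an LLMLayer-specific requirement):

const headers = {
  Authorization: `Bearer ${process.env.LLMLAYER_API_KEY}`, // key stays server-side
  'Content-Type': 'application/json',
};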

Quick Start

import { LLMLayerClient } from 'llmlayer';

const client = new LLMLayerClient({ apiKey: process.env.LLMLAYER_API_KEY! });

// Basic answer
const response = await client.answer({
  query: 'What are the latest AI breakthroughs?',
  model: 'openai/gpt-4o-mini',
});

console.log(response.llm_response); // Markdown by default

Cost Model

Zero‑markup policy. Provider usage is passed through at cost. LLMLayer charges a small infrastructure fee per search.
Total Cost = ($0.004 × max_queries) + (Input Tokens × Model Input Price) + (Output Tokens × Model Output Price) [+ $0.001 if return_images=true]
Use response fields to monitor cost: model_cost, llmlayer_cost, and token counts.
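For example, a request with max_queries=2, return_images off, and openai/gpt-4o-mini (priced below at $0.15/M input, $0.60/M output) that consumes 1,000 input and 500 output tokens:

// Illustrative numbers only; plug in your own token counts and model prices.
const llmlayerCost = 0.004 * 2;               // $0.00800 (2 search queries)
const modelCost =
  (1_000 / 1_000_000) * 0.15 +                // $0.00015 input
  (500 / 1_000_000) * 0.60;                   // $0.00030 output
const totalCost = llmlayerCost + modelCost;   // ≈ $0.00845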

Supported Models & Pricing

Prices are USD per 1M tokens (input/output). LLMLayer passes through provider pricing with no markup. Your total cost = provider usage + LLMLayer fee.
If a model is not supported by your account/region, the API returns 400 with error_code: "invalid_model". If your application maintains a model allow‑list, keep it in sync with this table.

OpenAI

| Model | Input ($/M) | Output ($/M) | Best For |
|---|---|---|---|
| openai/gpt-5 | $1.25 | $10.00 | Complex reasoning & analysis |
| openai/gpt-5-mini | $0.25 | $2.00 | Cost-effective reasoning |
| openai/gpt-5-nano | $0.05 | $0.40 | Balanced performance |
| openai/o3 | $2.00 | $8.00 | Complex reasoning & analysis |
| openai/o3-mini | $1.10 | $4.40 | Cost-effective reasoning |
| openai/o4-mini | $1.10 | $4.40 | Balanced performance |
| openai/gpt-4.1 | $2.00 | $8.00 | Advanced tasks |
| openai/gpt-4.1-mini | $0.40 | $1.60 | Efficient advanced tasks |
| openai/gpt-4o | $2.50 | $10.00 | Multimodal & complex queries |
| openai/gpt-4o-mini | $0.15 | $0.60 | Fast, affordable searches |

Groq

| Model | Input ($/M) | Output ($/M) | Best For |
|---|---|---|---|
| groq/openai-gpt-oss-120b | $0.15 | $0.75 | High-performance search |
| groq/openai-gpt-oss-20b | $0.10 | $0.50 | Budget-friendly quality |
| groq/kimi-k2 | $1.00 | $3.00 | High-performance search |
| groq/qwen3-32b | $0.29 | $0.59 | Budget-friendly quality |
| groq/llama-3.3-70b-versatile | $0.59 | $0.79 | Versatile applications |
| groq/deepseek-r1-distill-llama-70b | $0.75 | $0.99 | Deep reasoning tasks |
| groq/llama-4-maverick-17b-128e-instruct | $0.20 | $0.60 | Fast, efficient searches |

Anthropic

| Model | Input ($/M) | Output ($/M) | Best For |
|---|---|---|---|
| anthropic/claude-sonnet-4 | $3.00 | $15.00 | Highly creative writing & intelligent responses |

DeepSeek

| Model | Input ($/M) | Output ($/M) | Best For |
|---|---|---|---|
| deepseek/deepseek-chat | $0.27 | $1.10 | General purpose chat |
| deepseek/deepseek-reasoner | $0.55 | $2.19 | Complex reasoning |

Choosing a model

  • Fast & economical: openai/gpt-4o-mini, groq/openai-gpt-oss-20b
  • Balanced quality: openai/gpt-4.1-mini, groq/llama-3.3-70b-versatile
  • Premium reasoning: openai/gpt-5, openai/o3, anthropic/claude-sonnet-4, deepseek/deepseek-reasoner
  • Multimodal: openai/gpt-4o / openai/gpt-4o-mini (text + vision)

Blocking Answer Endpoint

POST /api/v1/search

Generate a complete answer in one call. The server runs a targeted web search, builds a context window, and asks your chosen model to answer, with optional citations.

Request Body

query
string
required
Your question or instruction. Example: "What are the latest developments in quantum computing?"
model
string
required
LLM id (e.g. openai/gpt-4o-mini, openai/gpt-4.1-mini, anthropic/claude-sonnet-4, groq/llama-3.3-70b-versatile, deepseek/deepseek-reasoner).
If a model is unsupported, the API returns 400 with error_code: "invalid_model".
provider_key
string
Your upstream provider key (OpenAI/Groq/Anthropic/DeepSeek). If provided, provider usage is billed to you, and model_cost will be null in responses. Examples: sk-... (OpenAI), sk-ant-... (Anthropic)
location
string
default:"us"
Market/geo bias for search (country code).
system_prompt
string
Override the default system prompt (non‑JSON answers). Useful for specialized tone/formatting. Example: "You are a biomedical research assistant. Cite peer‑reviewed sources."
response_language
string
default:"auto"
Output language. Use auto to infer from the query.
answer_type
string
default:"markdown"
Output format: markdown | html | json.
If answer_type="json", you must provide json_schema. JSON output is not supported by the streaming endpoint.
search_type
string
default:"general"
Search vertical: general or news.
json_schema
string
Required when answer_type="json". A JSON Schema string describing the expected output. (Client SDKs also accept an object and serialize it for you.)
citations
boolean
default:"false"
Embed inline markers like [1] in the answer body.
return_sources
boolean
default:"false"
Include aggregated source metadata in the response.
return_images
boolean
default:"false"
Return relevant images from search.
Adds $0.001 to LLMLayer cost when enabled.
date_filter
string
default:"anytime"
Recency filter: anytime, hour, day, week, month, year.
max_tokens
integer
default:"1500"
Maximum LLM output tokens.
temperature
float
default:"0.7"
Sampling temperature (0.0–2.0). Lower is more factual; higher is more creative.
domain_filter
array
Include/exclude specific domains (use a leading - to exclude). Examples: ["nature.com", "-wikipedia.org"]
max_queries
integer
default:"1"
How many search sub‑queries to generate (1–5). Each adds $0.004 and may improve coverage.
search_context_size
string
default:"medium"
How much context to feed the LLM: low | medium | high.
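The SDK examples in this guide use camelCase option names; when calling the endpoint over raw HTTP, send the snake_case parameter names above in the JSON body. A minimal sketch, assuming Node 18+ fetch and a placeholder base URL:

// Direct HTTP call. LLMLAYER_BASE_URL is a placeholder for your endpoint's base URL.
const res = await fetch(`${process.env.LLMLAYER_BASE_URL}/api/v1/search`, {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.LLMLAYER_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    query: 'What are the latest developments in quantum computing?',
    model: 'openai/gpt-4o-mini',
    return_sources: true, // snake_case field names, as listed above
    max_queries: 1,
  }),
});
const answer = await res.json(); // shape described under Response Body below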

Response Body

llm_response
string | object
The generated answer. String for markdown/html; object for json output.
sources
array
Source documents (present when return_sources=true).
images
array
Images (present when return_images=true).
response_time
string | number
Processing time in seconds (e.g., "2.34").
input_tokens
integer
Total input tokens.
output_tokens
integer
Total output tokens.
model_cost
number | null
Provider cost in USD (null when using provider_key).
llmlayer_cost
number
LLMLayer cost in USD.
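Collected as a TypeScript shape (a sketch assembled from the table above, not the SDK's published type):

// Field names and types mirror the Response Body table.
interface AnswerResponse {
  llm_response: string | object;   // string for markdown/html; object for json
  sources?: unknown[];             // present when return_sources=true
  images?: unknown[];              // present when return_images=true
  response_time: string | number;  // seconds, e.g. "2.34"
  input_tokens: number;
  output_tokens: number;
  model_cost: number | null;       // null when provider_key is used
  llmlayer_cost: number;
}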

Examples

Basic (with sources)

const resp = await client.answer({
  query: 'Explain quantum computing in simple terms',
  model: 'openai/gpt-4o-mini',
  temperature: 0.7,
  maxTokens: 1000,
  returnSources: true,
});

console.log(resp.llm_response);
const total = (resp.model_cost ?? 0) + (resp.llmlayer_cost ?? 0);
console.log(`Total cost: $${total.toFixed(4)}`);

Structured JSON Output

const schema = {
  type: 'object',
  properties: {
    topic: { type: 'string' },
    key_concepts: { type: 'array', items: { type: 'string' } },
    applications: {
      type: 'array',
      items: {
        type: 'object',
        properties: { name: { type: 'string' }, description: { type: 'string' } },
        required: ['name', 'description'],
      },
    },
  },
  required: ['topic', 'key_concepts', 'applications'],
};

const resp = await client.answer({
  query: 'Applications of machine learning in healthcare',
  model: 'openai/gpt-4o',
  answerType: 'json',
  jsonSchema: schema,
  maxQueries: 2,
  searchContextSize: 'high',
});

const data = typeof resp.llm_response === 'string' ? JSON.parse(resp.llm_response) : resp.llm_response;
console.log(data.topic, data.key_concepts.length);

News with Citations & Domain Controls

const resp = await client.answer({
  query: 'Latest developments in renewable energy',
  model: 'anthropic/claude-sonnet-4',
  searchType: 'news',
  dateFilter: 'week',
  citations: true,
  returnSources: true,
  returnImages: true,
  maxQueries: 3,
  domainFilter: ['reuters.com', 'bloomberg.com', '-reddit.com'],
  systemPrompt: 'Focus on breakthroughs and policy changes',
});
console.log(resp.llm_response);

Use Your Own Provider Key

const resp = await client.answer({
  query: 'Complex medical research query',
  model: 'openai/gpt-4o',
  providerKey: process.env.OPENAI_API_KEY!, // camelCase in the SDK, like the other options here
  maxTokens: 4000,
  domainFilter: ['pubmed.gov', 'nature.com', 'sciencedirect.com'],
  searchContextSize: 'high',
});
console.log('LLMLayer cost:', resp.llmlayer_cost); // ~$0.004 × max_queries
console.log('Model cost:', resp.model_cost);       // null when using provider_key

Streaming Answer Endpoint

POST /api/v1/search_stream

Stream partial tokens and metadata via Server‑Sent Events.
Streaming does not support answer_type="json". Use the blocking endpoint for structured JSON.

Request Body Notes

All parameters from the blocking endpoint apply except:
  • answer_type must not be "json" (server will return an error frame).
  • json_schema is not applicable.

Event Types

The response is an SSE stream with data: JSON frames. Possible events:
| Type | Payload Keys | Meaning |
|---|---|---|
| sources | data: Array<Source> | Aggregated sources |
| images | data: Array<Image> | Relevant images |
| llm | content: string | Partial text chunk |
| usage | input_tokens: number, output_tokens: number, model_cost: number or null, llmlayer_cost: number | Token/cost summary |
| done | response_time: string | Completion |
| error | error: string | Error message (terminates the stream) |
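Expressed as a TypeScript type, the events the SDK yields (and the example below consumes) look roughly like this; a sketch assembled from the table above, not an official SDK definition:

// Discriminated union over the documented event types.
type StreamEvent =
  | { type: 'sources'; data: unknown[] }   // Source objects
  | { type: 'images'; data: unknown[] }    // Image objects
  | { type: 'llm'; content: string }       // partial text chunk
  | {
      type: 'usage';
      input_tokens: number;
      output_tokens: number;
      model_cost: number | null;           // null when provider_key is used
      llmlayer_cost: number;
    }
  | { type: 'done'; response_time: string }
  | { type: 'error'; error: string };      // stream terminates after this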

Streaming Examples

const stream = client.streamAnswer({
  query: 'Explain the history of the internet',
  model: 'groq/llama-3.3-70b-versatile',
  returnSources: true,
  temperature: 0.5,
});

for await (const event of stream) {
  switch (event.type) {
    case 'llm':
      process.stdout.write(String(event.content));
      break;
    case 'sources':
      console.log('\nSources:', event.data);
      break;
    case 'images':
      console.log('\nImages:', event.data?.length ?? 0);
      break;
    case 'usage': {
      // Braces give the `const` its own block scope inside the switch.
      const total = (event.model_cost ?? 0) + (event.llmlayer_cost ?? 0);
      console.log('\nTokens:', event.input_tokens);
      console.log('Total cost:', total);
      break;
    }
    case 'done':
      console.log('\nCompleted in', event.response_time, 'seconds');
      break;
    case 'error':
      console.error('\nStream error:', event.error);
      break;
  }
}

Errors

The API returns a consistent error envelope:
{
  "detail": {
    "error_type": "validation_error",
    "error_code": "missing_query",
    "message": "Query parameter cannot be empty",
    "details": {"...": "optional context"}
  }
}
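If you call the HTTP API directly, you can surface this envelope whenever the response status is not OK. A sketch built only from the fields shown above:

// Throw a descriptive error assembled from the documented envelope.
async function throwIfError(res: Response): Promise<void> {
  if (res.ok) return;
  const body: any = await res.json().catch(() => ({}));
  const err = body?.detail ?? {};
  // e.g. "validation_error (missing_query): Query parameter cannot be empty"
  throw new Error(`${err.error_type ?? res.status} (${err.error_code ?? 'unknown'}): ${err.message ?? ''}`);
}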

Common Error Codes

The codes referenced throughout this guide:
  • missing_llmlayer_api_key (401): the Authorization header is missing or the API key is invalid.
  • invalid_model (400): the requested model is not supported for your account/region.
  • missing_query (validation_error): the query parameter is empty.

Best Practices

import { LLMLayerClient, AuthenticationError, InvalidRequest, RateLimitError, ProviderError } from 'llmlayer';

const client = new LLMLayerClient({ apiKey: process.env.LLMLAYER_API_KEY! });

async function robustAnswer(query: string, retries = 3) {
  for (let i = 0; i < retries; i++) {
    try {
      return await client.answer({ query, model: 'openai/gpt-4o-mini' });
    } catch (err) {
      if (err instanceof AuthenticationError) throw err; // fix credentials; retrying won't help
      if (err instanceof RateLimitError) {
        await new Promise((r) => setTimeout(r, Math.pow(2, i) * 1000)); // exponential backoff
        continue;
      }
      if (err instanceof ProviderError) {
        // Upstream provider failure: fall back to an alternate model
        return await client.answer({ query, model: 'groq/llama-3.3-70b-versatile' });
      }
      throw err;
    }
  }
  throw new Error('Max retries exceeded');
}

Performance & Quality Tips

Reduce Cost

Model selection
  • Use openai/gpt-4o-mini for simple tasks
  • Use budget models for exploration; reserve premium for high‑stakes
Query optimization
  • Keep max_queries=1 unless you need deeper coverage
  • Tune max_tokens to expected output length
  • Disable return_images unless required

Improve Speed

Streaming
  • Use /search_stream for better perceived latency
Context
  • Start with search_context_size="low" for simple questions
  • Use domain filters to focus search

Enhance Quality

Search
  • Use max_queries=2–3 for research tasks
  • Prefer search_context_size="high" for complex topics
  • Enable citations for verifiable content
LLM
  • Provide a clear system_prompt
  • Set temperature based on task (≈0.3 facts, ≈0.8 creative)

Scale Reliably

Resilience
  • Exponential backoff on rate limits
  • Fallback models on provider errors
Keys & billing
  • Use provider_key for high‑volume workloads
  • Track costs with model_cost/llmlayer_cost

Rate Limits & Quotas

Limits vary by plan. Each request's cost has three components:
  • Base: $0.004 × max_queries
  • Images: +$0.001 if return_images=true
  • Model: provider token usage × provider pricing
Monitor usage via response fields:
  • model_cost, llmlayer_cost
  • input_tokens, output_tokens

Next Steps

Need help? Join our Discord or email support@llmlayer.ai