Overview

The Answer API provides two endpoints to fit your UX:

  • POST /api/v1/search: a blocking call that returns the complete answer in one response.
  • POST /api/v1/search_stream: a streaming call that delivers partial tokens and metadata via Server‑Sent Events.

Authentication

All requests require a bearer token in the Authorization header:
Authorization: Bearer YOUR_LLMLAYER_API_KEY
Missing or invalid API keys return 401 with error_code: "missing_llmlayer_api_key".
Do not expose your LLMLayer API key in client‑side code. Call the API from your server (or proxy) and stream results to the browser.
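A minimal sketch of the headers as built on your server (Node 18+ fetch; 'Content-Type' is the standard JSON header, not an LLMLayer-specific requirement):

const headers = {
  Authorization: `Bearer ${process.env.LLMLAYER_API_KEY}`, // key stays server-side
  'Content-Type': 'application/json',
};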

Quick Start

import { LLMLayerClient } from 'llmlayer';

const client = new LLMLayerClient({ apiKey: process.env.LLMLAYER_API_KEY! });

// Basic answer
const response = await client.answer({
  query: 'What are the latest AI breakthroughs?',
  model: 'openai/gpt-4o-mini',
});

console.log(response.llm_response); // Markdown by default

Cost Model

Zero‑markup policy. Provider usage is passed through at cost. LLMLayer charges a small infrastructure fee per search.
Total Cost = ($0.004 × max_queries) + (Input Tokens × Model Input Price) + (Output Tokens × Model Output Price) [+ $0.001 if return_images=true]
Use response fields to monitor cost: model_cost, llmlayer_cost, and token counts.
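For example, a request with max_queries=2, return_images off, and openai/gpt-4o-mini (priced below at $0.15/M input, $0.60/M output) that consumes 1,000 input and 500 output tokens:

// Illustrative numbers only; plug in your own token counts and model prices.
const llmlayerCost = 0.004 * 2;               // $0.00800 (2 search queries)
const modelCost =
  (1_000 / 1_000_000) * 0.15 +                // $0.00015 input
  (500 / 1_000_000) * 0.60;                   // $0.00030 output
const totalCost = llmlayerCost + modelCost;   // ≈ $0.00845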

Supported Models & Pricing

Prices are USD per 1M tokens (input/output). LLMLayer passes through provider pricing with no markup. Your total cost = provider usage + LLMLayer fee.
If a model is not supported by your account/region, the API returns 400 with error_code: "invalid_model". If your application maintains a model allow‑list, keep it in sync with this table.

OpenAI

| Model | Input ($/M) | Output ($/M) | Best For |
|---|---|---|---|
| openai/gpt-5 | $1.25 | $10.00 | Complex reasoning & analysis |
| openai/gpt-5-mini | $0.25 | $2.00 | Cost-effective reasoning |
| openai/gpt-5-nano | $0.05 | $0.40 | Balanced performance |
| openai/o3 | $2.00 | $8.00 | Complex reasoning & analysis |
| openai/o3-mini | $1.10 | $4.40 | Cost-effective reasoning |
| openai/o4-mini | $1.10 | $4.40 | Balanced performance |
| openai/gpt-4.1 | $2.00 | $8.00 | Advanced tasks |
| openai/gpt-4.1-mini | $0.40 | $1.60 | Efficient advanced tasks |
| openai/gpt-4o | $2.50 | $10.00 | Multimodal & complex queries |
| openai/gpt-4o-mini | $0.15 | $0.60 | Fast, affordable searches |

Groq

| Model | Input ($/M) | Output ($/M) | Best For |
|---|---|---|---|
| groq/openai-gpt-oss-120b | $0.15 | $0.75 | High-performance search |
| groq/openai-gpt-oss-20b | $0.10 | $0.50 | Budget-friendly quality |
| groq/kimi-k2 | $1.00 | $3.00 | High-performance search |
| groq/qwen3-32b | $0.29 | $0.59 | Budget-friendly quality |
| groq/llama-3.3-70b-versatile | $0.59 | $0.79 | Versatile applications |
| groq/deepseek-r1-distill-llama-70b | $0.75 | $0.99 | Deep reasoning tasks |
| groq/llama-4-maverick-17b-128e-instruct | $0.20 | $0.60 | Fast, efficient searches |

Anthropic

| Model | Input ($/M) | Output ($/M) | Best For |
|---|---|---|---|
| anthropic/claude-sonnet-4 | $3.00 | $15.00 | Highly creative writing & intelligent responses |

DeepSeek

| Model | Input ($/M) | Output ($/M) | Best For |
|---|---|---|---|
| deepseek/deepseek-chat | $0.27 | $1.10 | General purpose chat |
| deepseek/deepseek-reasoner | $0.55 | $2.19 | Complex reasoning |

Choosing a model

  • Fast & economical: openai/gpt-4o-mini, groq/openai-gpt-oss-20b
  • Balanced quality: openai/gpt-4.1-mini, groq/llama-3.3-70b-versatile
  • Premium reasoning: openai/gpt-5, openai/o3, anthropic/claude-sonnet-4, deepseek/deepseek-reasoner
  • Multimodal: openai/gpt-4o / openai/gpt-4o-mini (text + vision)

Blocking Answer Endpoint

POST /api/v1/search

Generate a complete answer in one call. The server runs a targeted web search, builds a context window, and asks your chosen model to answer, with optional citations.

Request Body

query
string
required
Your question or instruction. Example: "What are the latest developments in quantum computing?"
model
string
required
LLM id (e.g. openai/gpt-4o-mini, openai/gpt-4.1-mini, anthropic/claude-sonnet-4, groq/llama-3.3-70b-versatile, deepseek/deepseek-reasoner).
If a model is unsupported, the API returns 400 with error_code: "invalid_model".
provider_key
string
Your upstream provider key (OpenAI/Groq/Anthropic/DeepSeek). If provided, provider usage is billed to you, and model_cost will be null in responses. Examples: sk-... (OpenAI), sk-ant-... (Anthropic)
location
string
default:"us"
Market/geo bias for search (country code).
system_prompt
string
Override the default system prompt (non‑JSON answers). Useful for specialized tone/formatting. Example: "You are a biomedical research assistant. Cite peer‑reviewed sources."
response_language
string
default:"auto"
Output language. Use auto to infer from the query.
answer_type
string
default:"markdown"
Output format: markdown | html | json.
If answer_type="json", you must provide json_schema. JSON output is not supported by the streaming endpoint.
search_type
string
default:"general"
Search vertical: general or news.
json_schema
string
Required when answer_type="json". A JSON Schema string describing the expected output. (Client SDKs also accept an object and serialize it for you.)
citations
boolean
default:"false"
Embed inline markers like [1] in the answer body.
return_sources
boolean
default:"false"
Include aggregated source metadata in the response.
return_images
boolean
default:"false"
Return relevant images from search.
Adds $0.001 to LLMLayer cost when enabled.
date_filter
string
default:"anytime"
Recency filter: anytime, hour, day, week, month, year.
max_tokens
integer
default:"1500"
Maximum LLM output tokens.
temperature
float
default:"0.7"
Sampling temperature (0.0–2.0). Lower is more factual; higher is more creative.
domain_filter
array
Include/exclude specific domains (use a leading - to exclude). Examples: ["nature.com", "-wikipedia.org"]
max_queries
integer
default:"1"
How many search sub‑queries to generate (1–5). Each adds $0.004 and may improve coverage.
search_context_size
string
default:"medium"
How much context to feed the LLM: low | medium | high.
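The SDK examples in this guide use camelCase option names; when calling the endpoint over raw HTTP, send the snake_case parameter names above in the JSON body. A minimal sketch, assuming Node 18+ fetch and a placeholder base URL:

// Direct HTTP call. LLMLAYER_BASE_URL is a placeholder for your endpoint's base URL.
const res = await fetch(`${process.env.LLMLAYER_BASE_URL}/api/v1/search`, {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.LLMLAYER_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    query: 'What are the latest developments in quantum computing?',
    model: 'openai/gpt-4o-mini',
    return_sources: true, // snake_case field names, as listed above
    max_queries: 1,
  }),
});
const answer = await res.json(); // shape described under Response Body below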

Response Body

llm_response
string | object
The generated answer. String for markdown/html; object for json output.
sources
array
Source documents (present when return_sources=true).
images
array
Images (present when return_images=true).
response_time
string | number
Processing time in seconds (e.g., "2.34").
input_tokens
integer
Total input tokens.
output_tokens
integer
Total output tokens.
model_cost
number | null
Provider cost in USD (null when using provider_key).
llmlayer_cost
number
LLMLayer cost in USD.
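Collected as a TypeScript shape (a sketch assembled from the table above, not the SDK's published type):

// Field names and types mirror the Response Body table.
interface AnswerResponse {
  llm_response: string | object;   // string for markdown/html; object for json
  sources?: unknown[];             // present when return_sources=true
  images?: unknown[];              // present when return_images=true
  response_time: string | number;  // seconds, e.g. "2.34"
  input_tokens: number;
  output_tokens: number;
  model_cost: number | null;       // null when provider_key is used
  llmlayer_cost: number;
}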

Examples

Basic (with sources)

const resp = await client.answer({
  query: 'Explain quantum computing in simple terms',
  model: 'openai/gpt-4o-mini',
  temperature: 0.7,
  maxTokens: 1000,
  returnSources: true,
});

console.log(resp.llm_response);
const total = (resp.model_cost ?? 0) + (resp.llmlayer_cost ?? 0);
console.log(`Total cost: $${total.toFixed(4)}`);

Structured JSON Output

const schema = {
  type: 'object',
  properties: {
    topic: { type: 'string' },
    key_concepts: { type: 'array', items: { type: 'string' } },
    applications: {
      type: 'array',
      items: {
        type: 'object',
        properties: { name: { type: 'string' }, description: { type: 'string' } },
        required: ['name', 'description'],
      },
    },
  },
  required: ['topic', 'key_concepts', 'applications'],
};

const resp = await client.answer({
  query: 'Applications of machine learning in healthcare',
  model: 'openai/gpt-4o',
  answerType: 'json',
  jsonSchema: schema,
  maxQueries: 2,
  searchContextSize: 'high',
});

const data = typeof resp.llm_response === 'string' ? JSON.parse(resp.llm_response) : resp.llm_response;
console.log(data.topic, data.key_concepts.length);

News with Citations & Domain Controls

const resp = await client.answer({
  query: 'Latest developments in renewable energy',
  model: 'anthropic/claude-sonnet-4',
  searchType: 'news',
  dateFilter: 'week',
  citations: true,
  returnSources: true,
  returnImages: true,
  maxQueries: 3,
  domainFilter: ['reuters.com', 'bloomberg.com', '-reddit.com'],
  systemPrompt: 'Focus on breakthroughs and policy changes',
});
console.log(resp.llm_response);

Use Your Own Provider Key

const resp = await client.answer({
  query: 'Complex medical research query',
  model: 'openai/gpt-4o',
  providerKey: process.env.OPENAI_API_KEY!, // camelCase in the SDK, like the other options here
  maxTokens: 4000,
  domainFilter: ['pubmed.gov', 'nature.com', 'sciencedirect.com'],
  searchContextSize: 'high',
});
console.log('LLMLayer cost:', resp.llmlayer_cost); // ~$0.004 × max_queries
console.log('Model cost:', resp.model_cost);       // null when using provider_key

Streaming Answer Endpoint

POST /api/v1/search_stream

Stream partial tokens and metadata via Server‑Sent Events.
Streaming does not support answer_type="json". Use the blocking endpoint for structured JSON.

Request Body Notes

All parameters from the blocking endpoint apply except:
  • answer_type must not be "json" (server will return an error frame).
  • json_schema is not applicable.

Event Types

The response is an SSE stream with data: JSON frames. Possible events:
| Type | Payload Keys | Meaning |
|---|---|---|
| sources | data: Array<Source> | Aggregated sources |
| images | data: Array<Image> | Relevant images |
| llm | content: string | Partial text chunk |
| usage | input_tokens: number, output_tokens: number, model_cost: number or null, llmlayer_cost: number | Token/cost summary |
| done | response_time: string | Completion |
| error | error: string | Error message (terminates the stream) |
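Expressed as a TypeScript type, the events the SDK yields (and the example below consumes) look roughly like this; a sketch assembled from the table above, not an official SDK definition:

// Discriminated union over the documented event types.
type StreamEvent =
  | { type: 'sources'; data: unknown[] }   // Source objects
  | { type: 'images'; data: unknown[] }    // Image objects
  | { type: 'llm'; content: string }       // partial text chunk
  | {
      type: 'usage';
      input_tokens: number;
      output_tokens: number;
      model_cost: number | null;           // null when provider_key is used
      llmlayer_cost: number;
    }
  | { type: 'done'; response_time: string }
  | { type: 'error'; error: string };      // stream terminates after this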

Streaming Examples

const stream = client.streamAnswer({
  query: 'Explain the history of the internet',
  model: 'groq/llama-3.3-70b-versatile',
  returnSources: true,
  temperature: 0.5,
});

for await (const event of stream) {
  switch (event.type) {
    case 'llm':
      process.stdout.write(String(event.content));
      break;
    case 'sources':
      console.log('\nSources:', event.data);
      break;
    case 'images':
      console.log('\nImages:', event.data?.length ?? 0);
      break;
    case 'usage': {
      // Braces give the `const` its own block scope inside the switch.
      const total = (event.model_cost ?? 0) + (event.llmlayer_cost ?? 0);
      console.log('\nTokens:', event.input_tokens);
      console.log('Total cost:', total);
      break;
    }
    case 'done':
      console.log('\nCompleted in', event.response_time, 'seconds');
      break;
    case 'error':
      console.error('\nStream error:', event.error);
      break;
  }
}

Errors

The API returns a consistent error envelope:
{
  "detail": {
    "error_type": "validation_error",
    "error_code": "missing_query",
    "message": "Query parameter cannot be empty",
    "details": {"...": "optional context"}
  }
}
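If you call the HTTP API directly, you can surface this envelope whenever the response status is not OK. A sketch built only from the fields shown above:

// Throw a descriptive error assembled from the documented envelope.
async function throwIfError(res: Response): Promise<void> {
  if (res.ok) return;
  const body: any = await res.json().catch(() => ({}));
  const err = body?.detail ?? {};
  // e.g. "validation_error (missing_query): Query parameter cannot be empty"
  throw new Error(`${err.error_type ?? res.status} (${err.error_code ?? 'unknown'}): ${err.message ?? ''}`);
}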

Common Error Codes

The codes referenced throughout this guide:
  • missing_llmlayer_api_key (401): the Authorization header is missing or the API key is invalid.
  • invalid_model (400): the requested model is not supported for your account/region.
  • missing_query (validation_error): the query parameter is empty.

Best Practices

import { LLMLayerClient, AuthenticationError, InvalidRequest, RateLimitError, ProviderError } from 'llmlayer';

const client = new LLMLayerClient({ apiKey: process.env.LLMLAYER_API_KEY! });

async function robustAnswer(query: string, retries = 3) {
  for (let i = 0; i < retries; i++) {
    try {
      return await client.answer({ query, model: 'openai/gpt-4o-mini' });
    } catch (err) {
      if (err instanceof AuthenticationError) throw err; // fix credentials; retrying won't help
      if (err instanceof RateLimitError) {
        await new Promise((r) => setTimeout(r, Math.pow(2, i) * 1000)); // exponential backoff
        continue;
      }
      if (err instanceof ProviderError) {
        // Upstream provider failure: fall back to an alternate model
        return await client.answer({ query, model: 'groq/llama-3.3-70b-versatile' });
      }
      throw err;
    }
  }
  throw new Error('Max retries exceeded');
}

Performance & Quality Tips

Reduce Cost

Model selection
  • Use openai/gpt-4o-mini for simple tasks
  • Use budget models for exploration; reserve premium for high‑stakes
Query optimization
  • Keep max_queries=1 unless you need deeper coverage
  • Tune max_tokens to expected output length
  • Disable return_images unless required

Improve Speed

Streaming
  • Use /search_stream for better perceived latency
Context
  • Start with search_context_size="low" for simple questions
  • Use domain filters to focus search

Enhance Quality

Search
  • Use max_queries=2–3 for research tasks
  • Prefer search_context_size="high" for complex topics
  • Enable citations for verifiable content
LLM
  • Provide a clear system_prompt
  • Set temperature based on task (≈0.3 facts, ≈0.8 creative)

Scale Reliably

Resilience
  • Exponential backoff on rate limits
  • Fallback models on provider errors
Keys & billing
  • Use provider_key for high‑volume workloads
  • Track costs with model_cost/llmlayer_cost

Rate Limits & Quotas

Limits vary by plan. Each request's cost has three components:
  • Base: $0.004 × max_queries
  • Images: +$0.001 if return_images=true
  • Model: provider token usage × provider pricing
Monitor usage via response fields:
  • model_cost, llmlayer_cost
  • input_tokens, output_tokens

Next Steps

Need help? Join our Discord or email support@llmlayer.ai