Overview
The Answer API provides two endpoints to fit your UX:
Blocking Answer
POST /api/v1/search
Get the full answer in a single response. Perfect when you need the complete result before proceeding.
Streaming Answer
POST /api/v1/search_stream
Stream partial tokens and metadata via SSE for chat UIs and progressive rendering.
Authentication
All requests require a bearer token in the Authorization header.
Missing or invalid API keys return 401 with error_code: "missing_llmlayer_api_key".
Do not expose your LLMLayer API key in client‑side code. Call the API from your server (or proxy) and stream results to the browser.
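The header uses the standard bearer scheme:
```
Authorization: Bearer YOUR_LLMLAYER_API_KEY
```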
Quick Start
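As a quick start, here is a minimal sketch using Python's requests library. The base URL is a placeholder (this page does not state the production host); the path, header, and body fields follow the endpoint reference below.

```python
# Minimal blocking request. BASE_URL is a placeholder -- substitute your
# actual LLMLayer API host.
import os
import requests

BASE_URL = "https://api.llmlayer.example"  # placeholder host (assumption)

resp = requests.post(
    f"{BASE_URL}/api/v1/search",
    headers={"Authorization": f"Bearer {os.environ['LLMLAYER_API_KEY']}"},
    json={
        "query": "What are the latest developments in quantum computing?",
        "model": "openai/gpt-4o-mini",
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```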
Cost Model
Zero‑markup policy. Provider usage is passed through at cost; LLMLayer charges a small infrastructure fee per search. Every response reports model_cost, llmlayer_cost, and token counts so you can audit spend.
Supported Models & Pricing
Prices are USD per 1M tokens (input/output). LLMLayer passes through provider pricing with no markup. Your total cost = provider usage + LLMLayer fee.
If a model is not supported by your account/region, the API returns 400 with error_code: "invalid_model". Keep this table in sync with your allow‑list.
OpenAI
Model | Input ($/M) | Output ($/M) | Best For |
---|---|---|---|
openai/gpt-5 | $1.25 | $10.00 | Complex reasoning & analysis |
openai/gpt-5-mini | $0.25 | $2.00 | Cost-effective reasoning |
openai/gpt-5-nano | $0.05 | $0.40 | Balanced performance |
openai/o3 | $2.00 | $8.00 | Complex reasoning & analysis |
openai/o3-mini | $1.10 | $4.40 | Cost-effective reasoning |
openai/o4-mini | $1.10 | $4.40 | Balanced performance |
openai/gpt-4.1 | $2.00 | $8.00 | Advanced tasks |
openai/gpt-4.1-mini | $0.40 | $1.60 | Efficient advanced tasks |
openai/gpt-4o | $2.50 | $10.00 | Multimodal & complex queries |
openai/gpt-4o-mini | $0.15 | $0.60 | Fast, affordable searches |
Groq
Model | Input ($/M) | Output ($/M) | Best For |
---|---|---|---|
groq/openai-gpt-oss-120b | $0.15 | $0.75 | High-performance search |
groq/openai-gpt-oss-20b | $0.10 | $0.50 | Budget-friendly quality |
groq/kimi-k2 | $1.00 | $3.00 | High-performance search |
groq/qwen3-32b | $0.29 | $0.59 | Budget-friendly quality |
groq/llama-3.3-70b-versatile | $0.59 | $0.79 | Versatile applications |
groq/deepseek-r1-distill-llama-70b | $0.75 | $0.99 | Deep reasoning tasks |
groq/llama-4-maverick-17b-128e-instruct | $0.20 | $0.60 | Fast, efficient searches |
Anthropic
Model | Input ($/M) | Output ($/M) | Best For |
---|---|---|---|
anthropic/claude-sonnet-4 | $3.00 | $15.00 | Highly creative writing & intelligent responses |
DeepSeek
Model | Input ($/M) | Output ($/M) | Best For |
---|---|---|---|
deepseek/deepseek-chat | $0.27 | $1.10 | General purpose chat |
deepseek/deepseek-reasoner | $0.55 | $2.19 | Complex reasoning |
Choosing a model
- Fast & economical: openai/gpt-4o-mini, groq/openai-gpt-oss-20b
- Balanced quality: openai/gpt-4.1-mini, groq/llama-3.3-70b-versatile
- Premium reasoning: openai/gpt-5, openai/o3, anthropic/claude-sonnet-4, deepseek/deepseek-reasoner
- Multimodal: openai/gpt-4o, openai/gpt-4o-mini (text + vision)
Blocking Answer Endpoint
POST /api/v1/search
Generate a complete answer in one call. The server runs targeted web search, builds a context window, and asks your chosen model to answer with optional citations.
Request Body
- query (required) — Your question or instruction. Example: "What are the latest developments in quantum computing?"
- model (required) — LLM id (e.g. openai/gpt-4o-mini, openai/gpt-4.1-mini, anthropic/claude-sonnet-4, groq/llama-3.3-70b-versatile, deepseek/deepseek-reasoner). If a model is unsupported, the API returns 400 with error_code: "invalid_model".
- provider_key — Your upstream provider key (OpenAI/Groq/Anthropic/DeepSeek). If provided, provider usage is billed to you, and model_cost will be null in responses. Examples: sk-... (OpenAI), sk-ant-... (Anthropic)
- Location — Market/geo bias for search (country code).
- system_prompt — Override the default system prompt (non‑JSON answers). Useful for specialized tone/formatting. Example: "You are a biomedical research assistant. Cite peer‑reviewed sources."
- Language — Output language. Use auto to infer from the query.
- answer_type — Output format: markdown | html | json. If answer_type="json", you must provide json_schema. JSON output is not supported by the streaming endpoint.
- Search type — Search vertical: general or news.
- json_schema — Required when answer_type="json". A JSON Schema string describing the expected output. (Client SDKs also accept an object and serialize it for you.)
- citations — Embed inline markers like [1] in the answer body.
- return_sources — Include aggregated source metadata in the response.
- return_images — Return relevant images from search. Adds $0.001 to LLMLayer cost when enabled.
- Date filter — Recency filter: anytime, hour, day, week, month, year.
- max_tokens — Maximum LLM output tokens.
- temperature — Sampling temperature (0.0–2.0). Lower is more factual; higher is more creative.
- Domain filter — Include/exclude specific domains (use a leading - to exclude). Example: ["nature.com", "-wikipedia.org"]
- max_queries — How many search sub‑queries to generate (1–5). Each adds $0.004 and may improve coverage.
- search_context_size — How much context to feed the LLM: low | medium | high.
Response Body
- Answer — The generated answer. String for markdown/html; object for json output.
- sources — Source documents (present when return_sources=true).
- images — Images (present when return_images=true).
- response_time — Processing time in seconds (e.g., "2.34").
- input_tokens — Total input tokens.
- output_tokens — Total output tokens.
- model_cost — Provider cost in USD (null when using provider_key).
- llmlayer_cost — LLMLayer cost in USD.
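For orientation, a blocking response has roughly this shape. The answer key and the source object layout are illustrative assumptions; response_time, input_tokens, output_tokens, model_cost, and llmlayer_cost match the streaming usage/done payloads documented below.

```json
{
  "answer": "Quantum computing saw several advances this year... [1]",
  "sources": [{"title": "Example source", "url": "https://example.com"}],
  "images": [],
  "response_time": "2.34",
  "input_tokens": 1843,
  "output_tokens": 412,
  "model_cost": 0.000524,
  "llmlayer_cost": 0.004
}
```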
Examples
Basic (with sources)
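A sketch of a basic request with sources enabled (placeholder host, as in Quick Start; the sources field name is assumed to match the SSE event name):

```python
import os
import requests

BASE_URL = "https://api.llmlayer.example"  # placeholder host (assumption)

resp = requests.post(
    f"{BASE_URL}/api/v1/search",
    headers={"Authorization": f"Bearer {os.environ['LLMLAYER_API_KEY']}"},
    json={
        "query": "What are the latest developments in quantum computing?",
        "model": "openai/gpt-4o-mini",
        "return_sources": True,
        "citations": True,
    },
    timeout=60,
)
data = resp.json()
print(data["input_tokens"], data["output_tokens"], data["llmlayer_cost"])
for src in data.get("sources", []):  # "sources" assumed to match the SSE event name
    print(src)
```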
Structured JSON Output
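A sketch of a structured-output request. Note that json_schema is a JSON Schema string (SDKs also accept an object and serialize it):

```python
import json
import os
import requests

BASE_URL = "https://api.llmlayer.example"  # placeholder host (assumption)

# Serialize the schema to a string, as the API expects.
schema = json.dumps({
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "key_points": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["summary", "key_points"],
})

resp = requests.post(
    f"{BASE_URL}/api/v1/search",
    headers={"Authorization": f"Bearer {os.environ['LLMLAYER_API_KEY']}"},
    json={
        "query": "Summarize this week's developments in quantum computing",
        "model": "openai/gpt-4.1-mini",
        "answer_type": "json",   # requires json_schema
        "json_schema": schema,
    },
    timeout=60,
)
print(resp.json())
```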
News with Citations & Domain Controls
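A sketch of a news request with citations and domain controls. The keys search_type, date_filter, and domain_filter are hypothetical; this page documents the behaviors (vertical, recency, include/exclude) but not the exact field names, so check your SDK:

```python
# POST this payload to /api/v1/search as in the Basic example above.
payload = {
    "query": "Latest funding rounds for quantum computing startups",
    "model": "groq/llama-3.3-70b-versatile",
    "search_type": "news",    # HYPOTHETICAL key: search vertical (general | news)
    "date_filter": "week",    # HYPOTHETICAL key: recency filter
    "domain_filter": ["nature.com", "-wikipedia.org"],  # HYPOTHETICAL key; leading '-' excludes
    "citations": True,        # documented: inline [1] markers in the answer
    "return_sources": True,
}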
Use Your Own Provider Key
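A sketch of bringing your own provider key; provider usage then bills to your account and model_cost comes back null:

```python
import os

# POST this payload to /api/v1/search as in the Basic example above.
payload = {
    "query": "Draft a short analysis of solid-state battery progress",
    "model": "anthropic/claude-sonnet-4",
    "provider_key": os.environ["ANTHROPIC_API_KEY"],  # sk-ant-...
}
# In the response, expect model_cost to be null (None in Python);
# only llmlayer_cost is charged by LLMLayer.
```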
Streaming Answer Endpoint
POST /api/v1/search_stream
Stream partial tokens and metadata via Server‑Sent Events.
Streaming does not support answer_type="json". Use the blocking endpoint for structured JSON.
Request Body Notes
All parameters from the blocking endpoint apply, except:
- answer_type must not be "json" (the server will return an error frame).
- json_schema is not applicable.
Event Types
The response is an SSE stream of data: JSON frames. Possible events:
Type | Payload Keys | Meaning |
---|---|---|
sources | data: Array<Source> | Aggregated sources |
images | data: Array<Image> | Relevant images |
llm | content: string | Partial text chunk |
usage | input_tokens: number, output_tokens: number, model_cost: number or null, llmlayer_cost: number | Token/cost summary |
done | response_time: string | Completion |
error | error: string | Error message (terminate) |
Streaming Examples
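A sketch of consuming the stream with Python's requests. It assumes each data: frame is a JSON object carrying a type discriminator alongside the payload keys from the table above; the actual framing may differ (e.g., SSE event: lines), so check your SDK:

```python
import json
import os
import requests

BASE_URL = "https://api.llmlayer.example"  # placeholder host (assumption)

with requests.post(
    f"{BASE_URL}/api/v1/search_stream",
    headers={"Authorization": f"Bearer {os.environ['LLMLAYER_API_KEY']}"},
    json={"query": "Explain CRISPR in simple terms", "model": "groq/openai-gpt-oss-120b"},
    stream=True,
    timeout=60,
) as resp:
    for raw in resp.iter_lines():
        if not raw or not raw.startswith(b"data:"):
            continue  # skip keep-alives and non-data lines
        frame = json.loads(raw[len(b"data:"):])
        kind = frame.get("type")  # assumed discriminator field
        if kind == "llm":
            print(frame["content"], end="", flush=True)  # partial text chunk
        elif kind == "usage":
            print("\ntokens:", frame["input_tokens"], frame["output_tokens"])
        elif kind == "error":
            raise RuntimeError(frame["error"])  # terminate on error frames
```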
Errors
The API returns a consistent error envelope with a machine-readable error_code.
Common Error Codes
Authentication (401)
- missing_llmlayer_api_key — No API key provided
- {provider}_auth_error — Provider authentication failed when using provider_key
Validation (400)
- missing_query — Query parameter is empty
- missing_model — Model parameter is empty
- invalid_model — Model is not supported
- missing_json_schema — Required when answer_type="json"
- invalid_search_parameters — Invalid search configuration
Provider (429/401/500)
- {provider}_rate_limit — Provider rate limit exceeded
- {provider}_error — General provider error
- provider_error — Upstream LLM provider issue
Internal (500)
- query_generation_error — Failed to generate search queries
- search_context_error — Failed to retrieve search results
- unexpected_error — Unexpected server error
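A sketch of resilient request handling built on these codes, assuming the envelope carries error_code as shown in the examples above:

```python
import time
import requests

def post_with_backoff(url, headers, payload, attempts=4):
    """Retry on rate-limit codes with exponential backoff; surface other errors."""
    for attempt in range(attempts):
        resp = requests.post(url, headers=headers, json=payload, timeout=60)
        if resp.ok:
            return resp.json()
        code = resp.json().get("error_code", "")  # assumed envelope field
        if resp.status_code == 429 or code.endswith("_rate_limit"):
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...
            continue
        resp.raise_for_status()  # non-retryable: 400/401/500 surface immediately
    raise RuntimeError("rate limited after retries")
```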
Best Practices
Performance & Quality Tips
Reduce Cost
Model selection
- Use openai/gpt-4o-mini for simple tasks
- Use budget models for exploration; reserve premium models for high‑stakes work
- Keep max_queries=1 unless you need deeper coverage
- Tune max_tokens to the expected output length
- Disable return_images unless required
Improve Speed
Streaming
- Use /search_stream for better perceived latency
- Start with search_context_size="low" for simple questions
- Use domain filters to focus search
Enhance Quality
Search
- Use max_queries=2–3 for research tasks
- Prefer search_context_size="high" for complex topics
- Enable citations for verifiable content
- Provide a clear system_prompt
- Set temperature by task (≈0.3 for factual work, ≈0.8 for creative work)
Scale Reliably
Resilience
- Exponential backoff on rate limits
- Fallback models on provider errors
- Use provider_key for high‑volume workloads
- Track costs with model_cost / llmlayer_cost
Rate Limits & Quotas
Limits vary by plan. Each request's cost has these components:
- Base: $0.004 × max_queries
- Images: +$0.001 if return_images=true
- Model: provider token usage × provider pricing
Every response reports model_cost, llmlayer_cost, input_tokens, and output_tokens for reconciliation.
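A back-of-envelope estimator combining these components (prices are per 1M tokens, from the tables above):

```python
def estimate_cost(input_tokens, output_tokens, in_price, out_price,
                  max_queries=1, images=False):
    """Rough per-request estimate from the documented cost components."""
    model_cost = input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price
    llmlayer_cost = 0.004 * max_queries + (0.001 if images else 0.0)
    return model_cost + llmlayer_cost

# openai/gpt-4o-mini ($0.15 in / $0.60 out), 2 sub-queries, images on:
# model = 8000/1e6*0.15 + 600/1e6*0.60 = 0.00156; fee = 0.008 + 0.001 = 0.009
print(estimate_cost(8000, 600, 0.15, 0.60, max_queries=2, images=True))  # ≈ 0.01056
```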
Next Steps
- Quickstart — Get up and running in minutes
- Web Search API — Query search verticals directly, without an LLM
- Pricing — Estimate your costs
Need help? Join our Discord or email support@llmlayer.ai