
Overview

The Answer API combines live web search with LLM generation in one request. Use it when you need:
  • Current information from the web
  • Source-backed responses
  • Structured output (JSON) for downstream processing
  • Streaming UX for chat and copilots

Endpoints

| Endpoint | Method | Best for | Supports JSON output |
|---|---|---|---|
| /api/v2/answer | POST | Standard request/response flows | Yes |
| /api/v2/answer_stream | POST (SSE) | Real-time streaming UX | No |

provider_key is deprecated and currently ignored by both endpoints. It is still accepted for backward compatibility.

Authentication

All requests require:
Authorization: Bearer YOUR_LLMLAYER_API_KEY
Keep API keys server-side. Do not expose them in browser code.
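
For direct HTTP calls without the SDK, the header is attached like this. A minimal server-side sketch; the base URL `https://api.llmlayer.dev` is an assumption, substitute your actual host:

```typescript
// Sketch: calling /api/v2/answer directly over HTTP (server-side only).

function buildHeaders(apiKey: string): Record<string, string> {
  return {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  };
}

async function askAnswer(apiKey: string, query: string): Promise<unknown> {
  // Assumed base URL; replace with the host from your dashboard.
  const res = await fetch('https://api.llmlayer.dev/api/v2/answer', {
    method: 'POST',
    headers: buildHeaders(apiKey),
    body: JSON.stringify({ query, model: 'llmlayer-web' }),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json();
}
```

Read the key from server-side configuration (for example `process.env.LLMLAYER_API_KEY`); never ship it to the browser.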

Model Selection

| Model | Pricing model | Best for |
|---|---|---|
| llmlayer-web | Flat $0.007 × max_queries | Default recommendation |
| llmlayer-fast | Flat $0.009 × max_queries | Faster responses |
| openai/gpt-4o-mini | Token pricing + LLMLayer fee | Budget + quality |
| openai/gpt-5.1 | Token pricing + LLMLayer fee | Highest reasoning quality |
If unsure, start with llmlayer-web and max_queries=1.

Quickstart

Non-streaming (/answer)

import { LLMLayerClient } from 'llmlayer';

const client = new LLMLayerClient({
  apiKey: process.env.LLMLAYER_API_KEY,
});

const response = await client.answer({
  query: 'What are the latest developments in quantum computing?',
  model: 'llmlayer-web',
  returnSources: true,
});

console.log(response.answer);
console.log('Sources:', response.sources?.length || 0);
console.log('LLMLayer cost:', response.llmlayer_cost);
Example response (simplified):
{
  "answer": "...",
  "sources": [
    {
      "title": "...",
      "link": "https://...",
      "snippet": "..."
    }
  ],
  "response_time": "2.14",
  "input_tokens": 1432,
  "output_tokens": 288,
  "model_cost": null,
  "llmlayer_cost": 0.007
}

Streaming (/answer_stream)

import { LLMLayerClient } from 'llmlayer';

const client = new LLMLayerClient({
  apiKey: process.env.LLMLAYER_API_KEY,
});

const stream = client.streamAnswer({
  query: 'Explain retrieval-augmented generation in simple terms',
  model: 'openai/gpt-4o-mini',
  returnSources: true,
});

for await (const event of stream) {
  if (event.type === 'answer') {
    process.stdout.write(event.content || '');
  } else if (event.type === 'sources') {
    console.log('\n\nSources:', event.data?.length || 0);
  } else if (event.type === 'usage') {
    console.log('\nCost:', (event.model_cost || 0) + (event.llmlayer_cost || 0));
  } else if (event.type === 'done') {
    console.log('\nDone in', event.response_time, 'seconds');
  }
}

Request Parameters

This is the canonical request-body table for both endpoints.
| Parameter | Type | Required | Default | Applies to | Cost impact | Details |
|---|---|---|---|---|---|---|
| query | string | Yes | - | Both | - | User question/instruction |
| model | string | Yes | - | Both | Depends on model | Example: llmlayer-web, openai/gpt-4o-mini |
| search_type | string | No | general | Both | - | general or news |
| date_filter | string | No | anytime | Both | - | anytime, hour, day, week, month, year |
| location | string | No | us | Both | - | Country code for localized search |
| domain_filter | string[] | No | null | Both | - | Include domains, or exclude with - prefix |
| search_context_size | string | No | medium | Both | Indirect | low, medium, high |
| max_queries | integer | No | 1 | Both | Increases LLMLayer fee | Range 1-4 |
| max_tokens | integer | No | 1500 | Both | Increases model usage | Response length cap |
| temperature | number | No | 0.7 | Both | - | Range 0.0-2.0 |
| response_language | string | No | auto | Both | - | Example: en, fr, es |
| citations | boolean | No | false | Both | Indirect | Adds inline citation markers |
| return_sources | boolean | No | false | Both | - | Includes sources array |
| return_images | boolean | No | false | Both | + $0.001 | Includes images array |
| answer_type | string | No | markdown | /answer | - | markdown, html, json |
| json_schema | object or string | Conditional | null | /answer | - | Required when answer_type="json" |
| system_prompt | string | No | null | Both | - | Custom behavior instructions |
| provider_key | string | No | null | Both | - | Deprecated, accepted, ignored |
HTTP requests use snake_case field names. JavaScript SDK examples use camelCase.
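
A minimal sketch of that mapping, assuming a generic camelCase-to-snake_case conversion covers every parameter in the table:

```typescript
// Sketch: converting SDK-style camelCase options into the snake_case
// field names the HTTP API expects.

function toSnakeCase(key: string): string {
  // returnSources -> return_sources, maxQueries -> max_queries
  return key.replace(/[A-Z]/g, (c) => `_${c.toLowerCase()}`);
}

function toWireFormat(params: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined) out[toSnakeCase(key)] = value;
  }
  return out;
}
```

For example, `toWireFormat({ returnSources: true, maxQueries: 2 })` yields `{ return_sources: true, max_queries: 2 }`, which is the shape to POST to the HTTP endpoints.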

Parameter Rules

  1. max_queries must be between 1 and 4.
  2. answer_type="json" requires json_schema.
  3. /api/v2/answer_stream does not support structured JSON output.
  4. provider_key does not change routing or billing.
  5. Use -domain.com in domain_filter to exclude domains.
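
These rules can be enforced client-side so invalid requests fail fast before hitting the API. A sketch; the `AnswerRequest` shape is a trimmed illustration, not the full parameter list:

```typescript
// Sketch: pre-flight validation mirroring the parameter rules above.

interface AnswerRequest {
  query: string;
  model: string;
  max_queries?: number;
  answer_type?: 'markdown' | 'html' | 'json';
  json_schema?: object | string;
}

function validateAnswerRequest(req: AnswerRequest, streaming = false): string[] {
  const errors: string[] = [];
  if (!req.query || req.query.trim() === '') {
    errors.push('query must not be empty');
  }
  if (req.max_queries !== undefined && (req.max_queries < 1 || req.max_queries > 4)) {
    errors.push('max_queries must be between 1 and 4');
  }
  if (req.answer_type === 'json' && req.json_schema == null) {
    errors.push('answer_type="json" requires json_schema');
  }
  if (streaming && req.answer_type === 'json') {
    errors.push('/api/v2/answer_stream does not support structured JSON output');
  }
  return errors;
}
```

An empty array means the request passes all the rules above; otherwise each violated rule produces one message.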

Non-streaming Response Contract (/answer)

| Field | Type | When present | Description |
|---|---|---|---|
| answer | string \| object | Always | Generated answer (object when JSON mode) |
| sources | array | If return_sources=true | Source documents used |
| images | array | If return_images=true | Image search results |
| response_time | string | Always | Total processing time in seconds |
| input_tokens | integer | Always | Input token usage |
| output_tokens | integer | Always | Output token usage |
| model_cost | number \| null | Always (may be null) | Model usage cost; null for fixed-price LLMLayer models |
| llmlayer_cost | number | Always | LLMLayer infrastructure cost |
Source objects typically include title, link, snippet (plus provider-specific extras). Image objects typically include title, imageUrl, thumbnailUrl, source, link.
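
The contract above can be captured as TypeScript types. A sketch derived from the field table, with provider-specific extras left open-ended:

```typescript
// Sketch: response shapes for /api/v2/answer, based on the table above.

interface Source {
  title: string;
  link: string;
  snippet: string;
  [extra: string]: unknown;      // provider-specific extras
}

interface ImageResult {
  title: string;
  imageUrl: string;
  thumbnailUrl: string;
  source: string;
  link: string;
}

interface AnswerResponse {
  answer: string | object;       // object when answer_type="json"
  sources?: Source[];            // present when return_sources=true
  images?: ImageResult[];        // present when return_images=true
  response_time: string;         // seconds, returned as a string
  input_tokens: number;
  output_tokens: number;
  model_cost: number | null;     // null for fixed-price models
  llmlayer_cost: number;
}

// Convenience: combined cost of one response.
function totalCost(r: AnswerResponse): number {
  return (r.model_cost ?? 0) + r.llmlayer_cost;
}
```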

Streaming Event Contract (/answer_stream)

The stream is Server-Sent Events (text/event-stream). Each frame contains JSON under data:.
| Event type | Payload | Notes |
|---|---|---|
| sources | { "type": "sources", "data": Source[] } | Emitted when return_sources=true |
| images | { "type": "images", "data": Image[] } | Emitted when return_images=true |
| answer | { "type": "answer", "content": "..." } | Main text chunks |
| usage | { "type": "usage", "input_tokens": ..., "output_tokens": ..., "model_cost": ..., "llmlayer_cost": ... } | Billing and token usage |
| done | { "type": "done", "response_time": "..." } | Final event |
| error | { "type": "error", "error": "..." } | Runtime stream error |
Example stream sequence:
{"type":"sources","data":[...]}
{"type":"answer","content":"The "}
{"type":"answer","content":"main idea ..."}
{"type":"usage","input_tokens":1234,"output_tokens":210,"model_cost":0.0002,"llmlayer_cost":0.004}
{"type":"done","response_time":"2.41"}
On early setup failures, some clients may receive an immediate frame like { "error": "missing_query" }.
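
For clients without the SDK, the `data:` frames can be parsed by hand. A simplified sketch that assumes each `data:` line carries one complete JSON event (real SSE allows multi-line data, which this does not handle):

```typescript
// Sketch: parsing raw text/event-stream chunks from /answer_stream.

type StreamEvent =
  | { type: 'sources'; data: unknown[] }
  | { type: 'images'; data: unknown[] }
  | { type: 'answer'; content: string }
  | { type: 'usage'; input_tokens: number; output_tokens: number;
      model_cost: number | null; llmlayer_cost: number }
  | { type: 'done'; response_time: string }
  | { type: 'error'; error: string };

function parseSseChunk(chunk: string): StreamEvent[] {
  const events: StreamEvent[] = [];
  for (const line of chunk.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed.startsWith('data:')) continue;          // skip comments/blank keep-alives
    const payload = trimmed.slice('data:'.length).trim();
    if (payload) events.push(JSON.parse(payload) as StreamEvent);
  }
  return events;
}
```

In practice you would read the response body as a stream, buffer partial lines across chunks, and feed complete frames through a parser like this, stopping on the `done` event.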

Practical Examples

1) News summary with citations

{
  "query": "Latest developments in renewable energy",
  "model": "openai/gpt-5.1",
  "search_type": "news",
  "date_filter": "week",
  "citations": true,
  "return_sources": true,
  "max_queries": 2
}

2) Structured JSON extraction (/answer only)

{
  "query": "Summarize top AI model launches this month",
  "model": "openai/gpt-5.1",
  "answer_type": "json",
  "json_schema": {
    "type": "object",
    "properties": {
      "summary": { "type": "string" },
      "items": {
        "type": "array",
        "items": { "type": "string" }
      }
    },
    "required": ["summary", "items"]
  }
}
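
In JSON mode the `answer` field carries an object matching the schema rather than markdown text. A sketch of consuming it; the `Launches` interface mirrors the schema above, and the string branch is defensive since the documented type is `string | object`:

```typescript
// Sketch: typing and consuming a JSON-mode answer.

interface Launches {
  summary: string;
  items: string[];
}

function parseJsonAnswer(answer: string | object): Launches {
  // Defensive: accept either a pre-parsed object or a JSON string.
  const obj = typeof answer === 'string' ? JSON.parse(answer) : answer;
  return obj as Launches;
}
```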

3) Domain-constrained answer

{
  "query": "What are current type 2 diabetes treatment guidelines?",
  "model": "openai/gpt-5.1",
  "domain_filter": ["pubmed.gov", "nih.gov", "-reddit.com"],
  "search_context_size": "high",
  "temperature": 0.3,
  "return_sources": true
}

Error Handling

/answer error format

{
  "detail": {
    "error_type": "validation_error",
    "error_code": "missing_query",
    "message": "Query parameter cannot be empty",
    "details": null
  }
}

Common status codes

| Status | Category | Typical reason |
|---|---|---|
| 400 | Validation | Missing/invalid request parameters |
| 401 | Authentication | Missing or invalid LLMLayer API key |
| 429 | Rate limit/provider | Provider or account rate limiting |
| 500 | Internal/provider | Unexpected backend/provider failure |
| 502 | Provider | Upstream provider-specific failure |
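
For the transient 429/500/502 paths, a retry-with-backoff sketch; the delay values and retry count are illustrative assumptions, and the URL is whatever endpoint you are calling:

```typescript
// Sketch: exponential backoff on retryable status codes.

const RETRYABLE = new Set([429, 500, 502]);

function backoffDelay(attempt: number, baseMs = 500): number {
  return baseMs * 2 ** attempt;   // 500, 1000, 2000, ...
}

async function postWithRetry(
  url: string,
  body: unknown,
  apiKey: string,
  maxRetries = 3,
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    });
    if (!RETRYABLE.has(res.status) || attempt >= maxRetries) return res;
    await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt)));
  }
}
```

400 and 401 are deliberately not retried: repeating an invalid or unauthenticated request will never succeed.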

/answer_stream errors

  • Runtime failures are emitted as stream events (type: error).
  • Early validation failures can appear as an immediate single error frame.

Pricing

Standard token-priced models

Total = (0.004 × max_queries)
      + model_input_cost
      + model_output_cost
      + (0.001 if return_images=true)

LLMLayer fixed-price models

llmlayer-web  = 0.007 × max_queries
llmlayer-fast = 0.009 × max_queries
Use llmlayer_cost and model_cost from responses as the billing source of truth.
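
The formulas above can be combined into a rough pre-flight estimator. A sketch with model token prices left as inputs you must supply; actual billing should still come from `llmlayer_cost` and `model_cost` in responses:

```typescript
// Sketch: estimating request cost from the pricing formulas above.

interface CostInputs {
  model: string;
  maxQueries: number;
  returnImages?: boolean;
  inputTokens?: number;           // estimated usage
  outputTokens?: number;
  inputPricePerToken?: number;    // for token-priced models
  outputPricePerToken?: number;
}

function estimateCost(c: CostInputs): number {
  // Fixed-price LLMLayer models
  if (c.model === 'llmlayer-web') return 0.007 * c.maxQueries;
  if (c.model === 'llmlayer-fast') return 0.009 * c.maxQueries;
  // Token-priced models: LLMLayer fee + model tokens + optional images
  const fee = 0.004 * c.maxQueries;
  const modelCost =
    (c.inputTokens ?? 0) * (c.inputPricePerToken ?? 0) +
    (c.outputTokens ?? 0) * (c.outputPricePerToken ?? 0);
  const images = c.returnImages ? 0.001 : 0;
  return fee + modelCost + images;
}
```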

Implementation Checklist

  1. Start with /answer unless you need progressive rendering.
  2. Set max_queries=1 first; increase only for research-style queries.
  3. Enable return_sources=true for trust-sensitive use cases.
  4. Use answer_type="json" + json_schema for structured pipelines.
  5. Add retry/backoff logic for transient 429/500/502 paths.
  6. Keep keys server-side and log llmlayer_cost + token usage.

FAQ

When should I use /answer vs /answer_stream?
Use /answer_stream for chat UIs and live typing effects. Use /answer for batch jobs, strict request/response flows, and JSON structured output.

Does streaming support structured JSON output?
No. Streaming returns incremental text chunks and does not support JSON schema-constrained output.

How do I improve answer accuracy?
Use domain_filter, search_type, date_filter, and search_context_size together. For factual tasks, use lower temperature.

Why is provider_key still accepted?
It is accepted to avoid breaking older clients, but currently ignored.

How do I keep costs down?
Start with llmlayer-web, keep max_queries=1, only request images/sources when needed, and tune max_tokens to expected output length.

Next Steps

  • Web Search API: raw search results without LLM generation
  • Scraper API: extract full page content from URLs
  • Answer Stream Endpoint: OpenAPI reference for the SSE endpoint
  • Python SDK: Python package and usage examples
  • TypeScript SDK: JS/TS package and usage examples

Need Help?

  • Discord Community: ask implementation questions
  • Email Support