Sync or Async? LLMLayer supports both! Use search() for simple scripts and asearch() for async frameworks like FastAPI. All methods have sync and async versions.

Install the SDK

pip install llmlayer

Make your first search

from llmlayer import LLMLayerClient

client = LLMLayerClient(
    api_key="llm_xxxxxxxxxxxxx",
)

response = client.search(
    query="What happened in AI this week?",
    model="openai/o3"
)

print(response.llm_response)

Sync vs Async

LLMLayer supports both synchronous and asynchronous operations:
| Method | Type | Description |
|---|---|---|
| search() | Sync | Blocking call that returns the complete response |
| search_stream() | Sync | Returns a generator for streaming responses |
| asearch() | Async | Non-blocking call for async/await workflows |
| asearch_stream() | Async | Returns an async generator for streaming |
Use async methods when making multiple searches in parallel or integrating with async frameworks like FastAPI.
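
For example, you can run several searches in parallel with asyncio.gather() (a minimal sketch; the queries and model are illustrative):
import asyncio
from llmlayer import LLMLayerClient

client = LLMLayerClient(api_key="llm_xxxxxxxxxxxxx")

async def main():
    queries = ["Latest AI news", "Latest robotics news"]
    # asearch() calls run concurrently instead of one after another
    responses = await asyncio.gather(
        *(client.asearch(query=q, model="openai/gpt-4.1-mini") for q in queries)
    )
    for r in responses:
        print(r.llm_response)

asyncio.run(main())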

Quick Comparison (Python sync vs async)

# Synchronous version
from llmlayer import LLMLayerClient

client = LLMLayerClient(
    api_key="llm_xxxxxxxxxxxxx",
)

# Blocking call - waits for response
response = client.search(
    query="What is quantum computing?",
    model="openai/o4-mini"
)

print(response.llm_response)
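
And the asynchronous counterpart (a sketch using asearch() as documented above):
# Asynchronous version
import asyncio
from llmlayer import LLMLayerClient

client = LLMLayerClient(
    api_key="llm_xxxxxxxxxxxxx",
)

async def main():
    # Non-blocking call - can be awaited alongside other tasks
    response = await client.asearch(
        query="What is quantum computing?",
        model="openai/o4-mini"
    )
    print(response.llm_response)

asyncio.run(main())
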
Here’s a more complete example with additional parameters:
from llmlayer import LLMLayerClient

# Initialize the client
client = LLMLayerClient(
    api_key="llm_xxxxxxxxxxxxx", # Your LLMLayer API key
)

# Make a search request
response = client.search(
    query="What are the latest developments in quantum computing?",
    model="openai/o4-mini", # or any model from the provider
    return_sources=True,
    max_tokens=1500,
    temperature=0.7,
    location='us',
    response_language='en',
)

# Print the response
print(response.llm_response)
print(f"\nTokens used: {response.input_tokens + response.output_tokens}")
print(f"Response time: {response.response_time}s")

# Print sources
for source in response.sources:
    print(f"- {source['title']}: {source['link']}")

Streaming Responses

LLMLayer supports both sync and async streaming for real-time output:
from llmlayer import LLMLayerClient

client = LLMLayerClient(
    api_key="llm_xxxxxxxxxxxxx",
)

# Sync streaming with search_stream()
for event in client.search_stream(
    query="Explain the latest SpaceX launch",
    model="anthropic/claude-4-sonnet-20241022",
    return_sources=True
):
    if event["type"] == "llm":
        # Streaming text chunks
        print(event["content"], end="", flush=True)
    elif event["type"] == "sources":
        # Sources array
        print("\n\nSources:", event["data"])
    elif event["type"] == 'usage':
        print("Usage INPUT:", event['input_tokens'])
        print("Usage OUTPUT:", event['output_tokens'])
        print("MODEL COST:", event['model_cost'])
        print("LLMLAYER COST:", event['llmlayer_cost'])

Async Operations

Use asearch() in Python for non-blocking operations in async applications:
import asyncio
from llmlayer import LLMLayerClient

async def main():
    client = LLMLayerClient(
        api_key="llm_xxxxxxxxxxxxx",
    )

    # Async search with asearch()
    response = await client.asearch(
        query="What are the latest AI breakthroughs?",
        model="openai/gpt-4o-mini",
        return_sources=True
    )

    print(response.llm_response)
    print(f"Response time: {response.response_time}s")

# Run the async function
asyncio.run(main())

Domain Filtering

Use domain_filter to restrict results to specific domains, or prefix a domain with "-" to exclude it:
# Search only trusted medical sources
response = client.search(
    query="COVID-19 vaccine efficacy studies 2025",
    model="groq/kimi-k2",
    domain_filter=["pubmed.gov", "nature.com", "nejm.org", "lancet.com"],
    return_sources=True,
    citations=True,
    date_filter="week"
)

# Financial data from specific sources
response = client.search(
    query="Federal Reserve interest rate predictions",
    model="groq/kimi-k2",
    domain_filter=["federalreserve.gov", "bloomberg.com", "reuters.com","-nytimes.com"], # exclude nytimes.com from results
    date_filter="month"
)

JSON Output Format

Prefer premium models for JSON output; GPT-4.1-mini is a very good and consistent model.
import json


# Simplified schema describing the desired JSON structure
list_films_schema = {
    "films": [
        {
            "title": {"type": "string"},
            "release_year": {"type": "integer"},
            "director": {"type": "string"},
        }
    ]
}

response = client.search(
    query="what are the latest films of tom cruise",
    model="openai/gpt-4.1-mini",
    answer_type="json",
    json_schema=json.dumps(list_films_schema)
)

# Parse the structured response
data = response.llm_response
print(f"DATA: {data}")

Parameters Reference

Required Parameters

| Parameter | Type | Description | Example |
|---|---|---|---|
| query | string | Your search question | "Latest AI news" |
| model | string | Model to use for processing | "anthropic/claude-sonnet-4-20250514" |

Optional Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| return_sources | boolean | false | Include source URLs in the response |
| return_images | boolean | false | Include relevant images |
| citations | boolean | false | Add inline citations [1] to the response |
| max_tokens | integer | 1500 | Maximum response length |
| temperature | float | 0.7 | Response creativity (0-1) |
| search_context_size | string | "medium" | Search context size passed to the LLM: "low", "medium", or "high" |
| search_type | string | "general" | "general" or "news" |
| date_filter | string | "anytime" | "anytime", "hour", "day", "week", "month", "year" |
| location | string | "us" | Geographic bias for results |
| response_language | string | "auto" | Response language, or "auto" |
| domain_filter | array | null | Limit results to specific domains, e.g. ["pubmed.com", "nature.com"]; prefix a domain with "-" to exclude it, e.g. ["pubmed.com", "-nature.com"] |
| max_queries | integer | 1 | Number of search queries to generate (a prompt can need two or more queries to be effective); each query is charged $0.005 |
| system_prompt | string | null | Override LLMLayer's default system prompt |
| answer_type | string | "markdown" | "markdown", "html", or "json" |
| json_schema | string | null | Required when answer_type="json" |
| provider_key | string | null | Bring your own API key for the selected model's provider; for advanced users who want to be charged directly by their provider |
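
A sketch combining several of these optional parameters (the query and values are illustrative):
response = client.search(
    query="This week's tech industry news",
    model="openai/gpt-4.1-mini",
    search_type="news",   # news-focused search
    date_filter="week",   # only results from the past week
    return_sources=True,
    return_images=True,
    citations=True,       # inline [1]-style citations
    max_queries=2,        # two generated queries; each is charged $0.005
)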

Environment Variables

Set these to avoid hardcoding keys:
# LLMLayer API key (required)
export LLMLAYER_API_KEY="llm_xxxxxxxxxxxxx"

# Provider API key (optional, if you want to be charged directly by the provider)
export LLMLAYER_PROVIDER_KEY="sk-..."

Then use without explicit keys:
from llmlayer import LLMLayerClient

# Sync usage - reads all credentials from environment
client = LLMLayerClient()
response = client.search(
    query="Latest news",
    model="openai/gpt-4.1-mini"
)
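
If you bring your own provider key, you can also pass it per request instead of via LLMLAYER_PROVIDER_KEY (a sketch using the provider_key parameter from the table above):
response = client.search(
    query="Latest news",
    model="openai/gpt-4.1-mini",
    provider_key="sk-...",  # your own key for the model's provider; billed by them directly
)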
