Sync or Async? LLMLayer supports both! Use search() for simple scripts and asearch() for async frameworks like FastAPI. All methods have both sync and async versions.
Make your first search
```python
from llmlayer import LLMLayerClient

client = LLMLayerClient(
    api_key="llm_xxxxxxxxxxxxx",
)

response = client.search(
    query="What happened in AI this week?",
    model="openai/o3"
)

print(response.llm_response)
```
Sync vs Async
LLMLayer supports both synchronous and asynchronous operations:
| Method | Type | Description |
|---|---|---|
| `search()` | Sync | Blocking call that returns the complete response |
| `search_stream()` | Sync | Returns a generator for streaming responses |
| `asearch()` | Async | Non-blocking call for async/await workflows |
| `asearch_stream()` | Async | Returns an async generator for streaming |
Use async methods when making multiple searches in parallel or integrating with async frameworks like FastAPI.
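For example, several searches can run concurrently with asyncio.gather(); a minimal sketch (the queries and model choice are illustrative):

```python
import asyncio
from llmlayer import LLMLayerClient

async def main():
    client = LLMLayerClient(api_key="llm_xxxxxxxxxxxxx")
    queries = ["Latest AI news", "Latest quantum computing news"]
    # Launch both searches concurrently instead of one after the other
    responses = await asyncio.gather(
        *(client.asearch(query=q, model="openai/gpt-4o-mini") for q in queries)
    )
    for r in responses:
        print(r.llm_response)

asyncio.run(main())
```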
Quick Comparison (sync vs async)
```python
# Synchronous version
from llmlayer import LLMLayerClient

client = LLMLayerClient(
    api_key="llm_xxxxxxxxxxxxx",
)

# Blocking call - waits for the response
response = client.search(
    query="What is quantum computing?",
    model="openai/o4-mini"
)

print(response.llm_response)
```
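And the asynchronous counterpart: the same call made with asearch() inside a coroutine (see the full example under Async Operations below):

```python
# Asynchronous version
import asyncio
from llmlayer import LLMLayerClient

client = LLMLayerClient(
    api_key="llm_xxxxxxxxxxxxx",
)

async def main():
    # Non-blocking call - frees the event loop while waiting
    response = await client.asearch(
        query="What is quantum computing?",
        model="openai/o4-mini"
    )
    print(response.llm_response)

asyncio.run(main())
```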
Your First Search
Here's a fuller example with the most common parameters:
```python
from llmlayer import LLMLayerClient

# Initialize the client
client = LLMLayerClient(
    api_key="llm_xxxxxxxxxxxxx",  # Your LLMLayer API key
)

# Make a search request
response = client.search(
    query="What are the latest developments in quantum computing?",
    model="openai/o4-mini",  # or any model from the provider
    return_sources=True,
    max_tokens=1500,
    temperature=0.7,
    location="us",
    response_language="en",
)

# Print the response
print(response.llm_response)
print(f"\nTokens used: {response.input_tokens + response.output_tokens}")
print(f"Response time: {response.response_time}s")

# Print sources
for source in response.sources:
    print(f"- {source['title']}: {source['link']}")
```
Streaming Responses
LLMLayer supports both sync and async streaming for real-time output:
```python
from llmlayer import LLMLayerClient

client = LLMLayerClient(
    api_key="llm_xxxxxxxxxxxxx",
)

# Sync streaming with search_stream()
for event in client.search_stream(
    query="Explain the latest SpaceX launch",
    model="anthropic/claude-4-sonnet-20241022",
    return_sources=True
):
    if event["type"] == "llm":
        # Streaming text chunks
        print(event["content"], end="", flush=True)
    elif event["type"] == "sources":
        # Sources array
        print("\n\nSources:", event["data"])
    elif event["type"] == "usage":
        print("Usage INPUT:", event["input_tokens"])
        print("Usage OUTPUT:", event["output_tokens"])
        print("MODEL COST:", event["model_cost"])
        print("LLMLAYER COST:", event["llmlayer_cost"])
```
Async Operations
Use asearch() for non-blocking operations in async applications:
```python
import asyncio
from llmlayer import LLMLayerClient

async def main():
    client = LLMLayerClient(
        api_key="llm_xxxxxxxxxxxxx",
    )

    # Async search with asearch()
    response = await client.asearch(
        query="What are the latest AI breakthroughs?",
        model="openai/gpt-4o-mini",
        return_sources=True
    )

    print(response.llm_response)
    print(f"Response time: {response.response_time}s")

# Run the async function
asyncio.run(main())
```
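Integrating with FastAPI follows the same pattern; a minimal sketch (the endpoint path and model choice are illustrative, not part of the SDK):

```python
from fastapi import FastAPI
from llmlayer import LLMLayerClient

app = FastAPI()
client = LLMLayerClient(api_key="llm_xxxxxxxxxxxxx")

@app.get("/search")
async def search_endpoint(q: str):
    # Non-blocking: the event loop stays free while LLMLayer responds
    response = await client.asearch(query=q, model="openai/gpt-4o-mini")
    return {"answer": response.llm_response}
```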
Domain-Filtered Search
```python
# Search only trusted medical sources
response = client.search(
    query="COVID-19 vaccine efficacy studies 2025",
    model="groq/kimi-k2",
    domain_filter=["pubmed.gov", "nature.com", "nejm.org", "lancet.com"],
    return_sources=True,
    citations=True,
    date_filter="week"
)

# Financial data from specific sources
response = client.search(
    query="Federal Reserve interest rate predictions",
    model="groq/kimi-k2",
    domain_filter=["federalreserve.gov", "bloomberg.com", "reuters.com", "-nytimes.com"],  # "-" excludes nytimes.com from results
    date_filter="month"
)
```
Structured JSON Responses
Prefer premium models for JSON output; GPT-4.1-mini is a very good and consistent choice.
```python
import json

list_films_schema = {
    "films": [
        {
            "title": {"type": "string"},
            "release_year": {"type": "integer"},
            "director": {"type": "string"},
        }
    ]
}

response = client.search(
    query="what are the latest films of tom cruise",
    model="openai/gpt-4.1-mini",
    answer_type="json",
    json_schema=json.dumps(list_films_schema)  # pass the schema as a JSON string
)

# Parse the structured response
data = response.llm_response
print(f"DATA: {data}")
```
Parameters Reference
Required Parameters
| Parameter | Type | Description | Example |
|---|---|---|---|
| `query` | string | Your search question | `"Latest AI news"` |
| `model` | string | Model to use for processing | `"claude-sonnet-4-20250514"` |
Optional Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `return_sources` | boolean | `false` | Include source URLs in the response |
| `return_images` | boolean | `false` | Include relevant images |
| `citations` | boolean | `false` | Add inline citations [1] to the response |
| `max_tokens` | integer | `1500` | Maximum response length |
| `temperature` | float | `0.7` | Response creativity (0-1) |
| `search_context_size` | string | `"medium"` | Amount of search context given to the LLM: `"low"`, `"medium"`, or `"high"` |
| `search_type` | string | `"general"` | `"general"` or `"news"` |
| `date_filter` | string | `"anytime"` | `"anytime"`, `"hour"`, `"day"`, `"week"`, `"month"`, `"year"` |
| `location` | string | `"us"` | Geographic bias for results |
| `response_language` | string | `"auto"` | Response language, or `"auto"` to match the query |
| `domain_filter` | array | `null` | Limit results to specific domains, e.g. `["pubmed.com", "nature.com"]`; prefix a domain with `-` to exclude it, e.g. `["pubmed.com", "-nature.com"]` |
| `max_queries` | integer | `1` | Number of search queries to generate (a prompt may need 2 or more queries to be answered well); each search query is billed at $0.005 |
| `system_prompt` | string | `null` | Override LLMLayer's default system prompt |
| `answer_type` | string | `"markdown"` | `"markdown"`, `"html"`, or `"json"` |
| `json_schema` | string | `null` | Required when `answer_type="json"` |
| `provider_key` | string | `null` | Bring your own API key for the selected model's provider; an option for advanced users who want to be billed directly by their model provider |
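For example, passing your own provider key per request; a minimal sketch (the key value is a placeholder):

```python
# Bring your own provider key - model usage is billed to your provider account
response = client.search(
    query="Latest AI news",
    model="openai/gpt-4.1-mini",
    provider_key="sk-...",  # your own key for the model's provider
)
```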
Environment Variables
Set these to avoid hardcoding keys:
```bash
# LLMLayer API key (required)
export LLMLAYER_API_KEY="llm_xxxxxxxxxxxxx"

# Provider API key (optional, if you want to be billed directly by the provider)
export LLMLAYER_PROVIDER_KEY="sk-..."
```
Then use without explicit keys:
```python
from llmlayer import LLMLayerClient

# Sync usage - reads all credentials from the environment
client = LLMLayerClient()

response = client.search(
    query="Latest news",
    model="openai/gpt-4.1-mini"
)
```