Sync or Async? LLMLayer supports both! Use search() for simple scripts and asearch() for async frameworks like FastAPI. All methods have sync and async versions.

Install the SDK

pip install llmlayer

Make your first search

from llmlayer import LLMLayerClient

client = LLMLayerClient(
    api_key="llm_xxxxxxxxxxxxx",
)

response = client.search(
    query="What happened in AI this week?",
    model="openai/o3"
)

print(response.llm_response)

Sync vs Async

LLMLayer supports both synchronous and asynchronous operations:
| Method | Type | Description |
|---|---|---|
| search() | Sync | Blocking call that returns the complete response |
| search_stream() | Sync | Returns a generator for streaming responses |
| asearch() | Async | Non-blocking call for async/await workflows |
| asearch_stream() | Async | Returns an async generator for streaming |
Use async methods when making multiple searches in parallel or integrating with async frameworks like FastAPI.
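
For example, you can run several searches in parallel with asyncio.gather() (a minimal sketch; the queries and model are illustrative):
import asyncio
from llmlayer import LLMLayerClient

client = LLMLayerClient(api_key="llm_xxxxxxxxxxxxx")

async def main():
    queries = ["Latest AI news", "Latest robotics news"]
    # asearch() calls run concurrently instead of one after another
    responses = await asyncio.gather(
        *(client.asearch(query=q, model="openai/gpt-4.1-mini") for q in queries)
    )
    for r in responses:
        print(r.llm_response)

asyncio.run(main())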

Quick Comparison (Python sync vs async)

# Synchronous version
from llmlayer import LLMLayerClient

client = LLMLayerClient(
    api_key="llm_xxxxxxxxxxxxx",
)

# Blocking call - waits for response
response = client.search(
    query="What is quantum computing?",
    model="openai/o4-mini"
)

print(response.llm_response)
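
And the asynchronous counterpart (a sketch using asearch() as documented above):
# Asynchronous version
import asyncio
from llmlayer import LLMLayerClient

client = LLMLayerClient(
    api_key="llm_xxxxxxxxxxxxx",
)

async def main():
    # Non-blocking call - can be awaited alongside other tasks
    response = await client.asearch(
        query="What is quantum computing?",
        model="openai/o4-mini"
    )
    print(response.llm_response)

asyncio.run(main())
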
Here’s a more complete example with additional parameters:
from llmlayer import LLMLayerClient

# Initialize the client
client = LLMLayerClient(
    api_key="llm_xxxxxxxxxxxxx", # Your LLMLayer API key
)

# Make a search request
response = client.search(
    query="What are the latest developments in quantum computing?",
    model="openai/o4-mini", # or any model from the provider
    return_sources=True,
    max_tokens=1500,
    temperature=0.7,
    location='us',
    response_language='en',
)

# Print the response
print(response.llm_response)
print(f"\nTokens used: {response.input_tokens + response.output_tokens}")
print(f"Response time: {response.response_time}s")

# Print sources
for source in response.sources:
    print(f"- {source['title']}: {source['link']}")

Streaming Responses

LLMLayer supports both sync and async streaming for real-time output:
from llmlayer import LLMLayerClient

client = LLMLayerClient(
    api_key="llm_xxxxxxxxxxxxx",
)

# Sync streaming with search_stream()
for event in client.search_stream(
    query="Explain the latest SpaceX launch",
    model="anthropic/claude-4-sonnet-20241022",
    return_sources=True
):
    if event["type"] == "llm":
        # Streaming text chunks
        print(event["content"], end="", flush=True)
    elif event["type"] == "sources":
        # Sources array
        print("\n\nSources:", event["data"])
    elif event["type"] == 'usage':
        print("Usage INPUT:", event['input_tokens'])
        print("Usage OUTPUT:", event['output_tokens'])
        print("MODEL COST:", event['model_cost'])
        print("LLMLAYER COST:", event['llmlayer_cost'])

Async Operations

Use asearch() in Python for non-blocking operations in async applications:
import asyncio
from llmlayer import LLMLayerClient

async def main():
    client = LLMLayerClient(
        api_key="llm_xxxxxxxxxxxxx",
    )

    # Async search with asearch()
    response = await client.asearch(
        query="What are the latest AI breakthroughs?",
        model="openai/gpt-4o-mini",
        return_sources=True
    )

    print(response.llm_response)
    print(f"Response time: {response.response_time}s")

# Run the async function
asyncio.run(main())

Domain Filtering

Use domain_filter to restrict results to specific domains, or prefix a domain with "-" to exclude it:
# Search only trusted medical sources
response = client.search(
    query="COVID-19 vaccine efficacy studies 2025",
    model="groq/kimi-k2",
    domain_filter=["pubmed.gov", "nature.com", "nejm.org", "lancet.com"],
    return_sources=True,
    citations=True,
    date_filter="week"
)

# Financial data from specific sources
response = client.search(
    query="Federal Reserve interest rate predictions",
    model="groq/kimi-k2",
    domain_filter=["federalreserve.gov", "bloomberg.com", "reuters.com","-nytimes.com"], # exclude nytimes.com from results
    date_filter="month"
)

JSON Output Format

Prefer premium models for JSON output; GPT-4.1-mini is a very good and consistent model.
import json


# Simplified schema describing the desired JSON structure
list_films_schema = {
    "films": [
        {
            "title": {"type": "string"},
            "release_year": {"type": "integer"},
            "director": {"type": "string"},
        }
    ]
}

response = client.search(
    query="what are the latest films of tom cruise",
    model="openai/gpt-4.1-mini",
    answer_type="json",
    json_schema=json.dumps(list_films_schema)
)

# Parse the structured response
data = response.llm_response
print(f"DATA: {data}")

Parameters Reference

Required Parameters

| Parameter | Type | Description | Example |
|---|---|---|---|
| query | string | Your search question | "Latest AI news" |
| model | string | Model to use for processing | "anthropic/claude-sonnet-4-20250514" |

Optional Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| return_sources | boolean | false | Include source URLs in the response |
| return_images | boolean | false | Include relevant images |
| citations | boolean | false | Add inline citations [1] to the response |
| max_tokens | integer | 1500 | Maximum response length |
| temperature | float | 0.7 | Response creativity (0-1) |
| search_context_size | string | "medium" | Search context size passed to the LLM: "low", "medium", or "high" |
| search_type | string | "general" | "general" or "news" |
| date_filter | string | "anytime" | "anytime", "hour", "day", "week", "month", "year" |
| location | string | "us" | Geographic bias for results |
| response_language | string | "auto" | Response language, or "auto" |
| domain_filter | array | null | Limit results to specific domains, e.g. ["pubmed.com", "nature.com"]; prefix a domain with "-" to exclude it, e.g. ["pubmed.com", "-nature.com"] |
| max_queries | integer | 1 | Number of search queries to generate (a prompt can need two or more queries to be effective); each query is charged $0.005 |
| system_prompt | string | null | Override LLMLayer's default system prompt |
| answer_type | string | "markdown" | "markdown", "html", or "json" |
| json_schema | string | null | Required when answer_type="json" |
| provider_key | string | null | Bring your own API key for the selected model's provider; for advanced users who want to be charged directly by their provider |
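
A sketch combining several of these optional parameters (the query and values are illustrative):
response = client.search(
    query="This week's tech industry news",
    model="openai/gpt-4.1-mini",
    search_type="news",   # news-focused search
    date_filter="week",   # only results from the past week
    return_sources=True,
    return_images=True,
    citations=True,       # inline [1]-style citations
    max_queries=2,        # two generated queries; each is charged $0.005
)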

Environment Variables

Set these to avoid hardcoding keys:
# LLMLayer API key (required)
export LLMLAYER_API_KEY="llm_xxxxxxxxxxxxx"

# Provider API key (optional, if you want to be charged directly by the provider)
export LLMLAYER_PROVIDER_KEY="sk-..."

Then use without explicit keys:
from llmlayer import LLMLayerClient

# Sync usage - reads all credentials from environment
client = LLMLayerClient()
response = client.search(
    query="Latest news",
    model="openai/gpt-4.1-mini"
)
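
If you bring your own provider key, you can also pass it per request instead of via LLMLAYER_PROVIDER_KEY (a sketch using the provider_key parameter from the table above):
response = client.search(
    query="Latest news",
    model="openai/gpt-4.1-mini",
    provider_key="sk-...",  # your own key for the model's provider; billed by them directly
)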
