LLM providers

One code path, every provider below. Most modern providers expose an OpenAI-compatible API, so the system supports them through a single OpenAI-compatible adapter, with native support for OpenAI, Anthropic, and Google.

Audience: operators choosing and configuring an LLM backend. What you will accomplish: pick a provider, set the right env vars, and know when to go local vs cloud.

Native vs OpenAI-compatible

Native providers:

openai — GPT-4o, GPT-4o-mini, etc.
anthropic — Claude 3.5 Sonnet, Haiku, etc.
google — Gemini models (requires GOOGLE_API_KEY).

OpenAI-compatible (use the LLM_BASE_URL override):

ollama — local models (Llama, Mistral, etc.)
openrouter — route to 100+ models
together — Together AI
groq — Groq (also works natively)
deepseek — DeepSeek models
fireworks — Fireworks AI
mistral — Mistral AI
vllm — vLLM self-hosted
lmstudio — LM Studio local
llamacpp — llama.cpp local

All OpenAI-compatible providers use the same langchain_openai.ChatOpenAI client — just set LLM_BASE_URL to your endpoint. Local providers (Ollama, LM Studio, vLLM, llama.cpp) need no API key.
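As a rough illustration, the sketch below turns these environment variables into keyword arguments for ChatOpenAI, which accepts `base_url`, `api_key`, and `model`. Two details are assumptions rather than documented behavior of this system: the `not-needed` placeholder key for local servers (the OpenAI Python client refuses to start without some key, even when the server ignores it) and the hypothetical `LLM_MODEL` variable. `chat_openai_kwargs` is an illustrative helper, not part of the codebase.

```python
# Sketch: map provider env vars onto langchain_openai.ChatOpenAI kwargs.
# Assumptions: the placeholder key value and the hypothetical LLM_MODEL variable.
from typing import Dict, Mapping

# Local servers from the list above; they need no real API key.
LOCAL_PROVIDERS = {"ollama", "vllm", "lmstudio", "llamacpp"}
PLACEHOLDER_KEY = "not-needed"  # hypothetical value; local servers ignore it


def chat_openai_kwargs(env: Mapping[str, str]) -> Dict[str, str]:
    """Build keyword arguments for ChatOpenAI from environment settings."""
    provider = env.get("LLM_PROVIDER", "").strip().lower()
    base_url = env.get("LLM_BASE_URL", "").strip()
    if not base_url:
        raise ValueError("LLM_BASE_URL is required for OpenAI-compatible providers")
    if provider in LOCAL_PROVIDERS:
        # The OpenAI client still expects a key, so fall back to a placeholder.
        api_key = env.get("OPENAI_API_KEY") or PLACEHOLDER_KEY
    else:
        # Per the comparison table, Groq reads GROQ_API_KEY; others use OPENAI_API_KEY.
        key_var = "GROQ_API_KEY" if provider == "groq" else "OPENAI_API_KEY"
        api_key = env.get(key_var, "")
    kwargs = {"base_url": base_url, "api_key": api_key}
    model = env.get("LLM_MODEL")  # hypothetical variable, not from these docs
    if model:
        kwargs["model"] = model
    return kwargs
```

With `langchain_openai` installed, `ChatOpenAI(**chat_openai_kwargs(os.environ))` would then build a client for whichever endpoint the environment names.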

The LLM_BASE_URL override

For any OpenAI-compatible endpoint, point the adapter at it:

LLM_PROVIDER=ollama
LLM_BASE_URL=http://localhost:11434/v1     # Ollama
# LLM_BASE_URL=https://openrouter.ai/api/v1   # OpenRouter
# LLM_BASE_URL=https://api.together.xyz/v1    # Together AI
# LLM_BASE_URL=http://localhost:1234/v1       # LM Studio
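Before wiring a new endpoint into the app, it can help to confirm that LLM_BASE_URL is reachable. This sketch assumes the server answers `GET {LLM_BASE_URL}/models` with an OpenAI-style `{"data": [{"id": ...}]}` list, which is common among OpenAI-compatible servers but worth confirming in your provider's docs. `models_url`, `parse_model_ids`, and `list_models` are hypothetical helpers, not part of the system.

```python
# Sketch: smoke-test an OpenAI-compatible endpoint by listing its models.
# Assumes GET {base_url}/models returns {"object": "list", "data": [{"id": ...}]}.
import json
import urllib.request
from typing import List, Optional


def models_url(base_url: str) -> str:
    """Join LLM_BASE_URL and the /models path without doubling slashes."""
    return base_url.rstrip("/") + "/models"


def parse_model_ids(payload: dict) -> List[str]:
    """Extract model IDs from an OpenAI-style list response."""
    return [item["id"] for item in payload.get("data", [])]


def list_models(base_url: str, api_key: Optional[str] = None, timeout: float = 5.0) -> List[str]:
    """Fetch and return model IDs from the endpoint (makes a network call)."""
    request = urllib.request.Request(models_url(base_url))
    if api_key:
        # Cloud endpoints usually expect a bearer token; local servers usually do not.
        request.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_model_ids(json.load(response))
```

With Ollama running locally, `list_models("http://localhost:11434/v1")` should return the IDs of the models you have pulled; a timeout or connection error points at the URL or the server rather than at the app.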

Provider comparison

Provider	Type	Typical latency	Cost (USD per 1M tokens)	Best for	Key and env vars
OpenAI	Cloud API	~1s	Input $0.15 / Output $0.60 (gpt-4o-mini)	General production use	`OPENAI_API_KEY`
Anthropic	Cloud API	~1.5s	Input $0.80 / Output $4.00 (claude-3.5-haiku)	Long-context reasoning, safety	`ANTHROPIC_API_KEY`
Google Gemini	Cloud API	~1s	Free tier: 15 RPM; paid Input $0.10 / Output $0.40 (gemini-2.0-flash)	Cost-effective, multimodal	`GOOGLE_API_KEY`
Groq	Cloud API	~0.3s	Free tier available; Paid ~$0.05/1M	Fastest inference, real-time chat	`GROQ_API_KEY` + `LLM_BASE_URL`
DeepSeek	Cloud API	~2s	Input $0.14 / Output $0.28 (deepseek-chat)	Budget-friendly, strong coding	`OPENAI_API_KEY` + `LLM_BASE_URL`
Together	Cloud API	~1s	Varies by model (~$0.10–$0.80/1M)	Open-source model access	`OPENAI_API_KEY` + `LLM_BASE_URL`
Mistral	Cloud API	~1s	Input $0.10 / Output $0.30 (mistral-small)	European data compliance	`OPENAI_API_KEY` + `LLM_BASE_URL`
Fireworks	Cloud API	~0.5s	~$0.20/1M (open-source models)	Fast open-source inference	`OPENAI_API_KEY` + `LLM_BASE_URL`
OpenRouter	Cloud proxy	Varies	Varies by model + 5% surcharge	Single API for 100+ models	`OPENAI_API_KEY` + `LLM_BASE_URL`
Ollama	Local	~2–10s	Free (own hardware)	Full privacy, air-gapped, zero cost	`LLM_BASE_URL` only (no key)
vLLM	Local	~1–5s	Free (own hardware)	High-throughput self-hosted	`LLM_BASE_URL` only (no key)
LM Studio	Local	~2–10s	Free (own hardware)	Desktop dev/testing	`LLM_BASE_URL` only (no key)
llama.cpp	Local	~3–15s	Free (own hardware)	Minimal hardware, CPU-only	`LLM_BASE_URL` only (no key)
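To make the latency trade-off concrete, here is a small sketch that filters the table by a latency budget and an optional local-only requirement. The latency figures are copied from the table, taking the upper end of each range; OpenRouter is skipped because its latency varies by routed model. `Provider` and `shortlist` are illustrative names, not part of the system, and cost is deliberately left out because the table's prices are not directly comparable across providers.

```python
# Sketch: shortlist providers from the comparison table by latency budget.
# Latencies are the upper end of each "~" estimate in the table.
from typing import List, NamedTuple, Optional


class Provider(NamedTuple):
    name: str
    local: bool
    max_latency_s: Optional[float]  # None means "varies" (OpenRouter)


PROVIDERS = [
    Provider("OpenAI", False, 1.0),
    Provider("Anthropic", False, 1.5),
    Provider("Google Gemini", False, 1.0),
    Provider("Groq", False, 0.3),
    Provider("DeepSeek", False, 2.0),
    Provider("Together", False, 1.0),
    Provider("Mistral", False, 1.0),
    Provider("Fireworks", False, 0.5),
    Provider("OpenRouter", False, None),
    Provider("Ollama", True, 10.0),
    Provider("vLLM", True, 5.0),
    Provider("LM Studio", True, 10.0),
    Provider("llama.cpp", True, 15.0),
]


def shortlist(latency_budget_s: float, require_local: bool = False) -> List[str]:
    """Return providers whose typical latency fits the budget, in table order."""
    return [
        p.name
        for p in PROVIDERS
        if p.max_latency_s is not None
        and p.max_latency_s <= latency_budget_s
        and (p.local or not require_local)
    ]
```

For example, a real-time chat budget of 0.5 s leaves Groq and Fireworks, while a local-only deployment that tolerates 5 s leaves vLLM.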

Verify your result

Verify: You set LLM_PROVIDER and, for compatible endpoints, LLM_BASE_URL.
Verify: Cloud providers have their required key; local providers need none.
Verify: You picked latency/cost trade-offs that match your workload.
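The first two checks can be scripted. This is a minimal sketch, assuming the key requirements in the comparison table hold for your deployment; `check_llm_env` is a hypothetical helper, and it only checks that variables are present, not that a key is valid or the endpoint is up.

```python
# Sketch: check LLM_PROVIDER, LLM_BASE_URL, and key presence in an env mapping.
# Key requirements mirror the comparison table; adjust if your deployment differs.
from typing import List, Mapping, Optional

NATIVE_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
}
COMPATIBLE_CLOUD = {"openrouter", "together", "groq", "deepseek", "fireworks", "mistral"}
LOCAL = {"ollama", "vllm", "lmstudio", "llamacpp"}


def check_llm_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    provider = env.get("LLM_PROVIDER", "").strip().lower()
    if not provider:
        return ["LLM_PROVIDER is not set"]
    required_key: Optional[str]
    if provider in NATIVE_KEYS:
        required_key = NATIVE_KEYS[provider]
    elif provider in COMPATIBLE_CLOUD:
        # Per the table, Groq uses GROQ_API_KEY; the others reuse OPENAI_API_KEY.
        required_key = "GROQ_API_KEY" if provider == "groq" else "OPENAI_API_KEY"
    elif provider in LOCAL:
        required_key = None  # local servers need no key
    else:
        return [f"unknown LLM_PROVIDER: {provider}"]
    problems = []
    # Every OpenAI-compatible provider, cloud or local, needs LLM_BASE_URL.
    if provider not in NATIVE_KEYS and not env.get("LLM_BASE_URL"):
        problems.append(f"{provider} needs LLM_BASE_URL")
    if required_key and not env.get(required_key):
        problems.append(f"{provider} needs {required_key}")
    return problems
```

Calling `check_llm_env(os.environ)` at startup and refusing to continue on a non-empty result turns a confusing first-request failure into a clear configuration error.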

Next steps

Pick a vector embedding backend in Embeddings.
See resilience behavior on transient provider errors in Deployment.
Full env reference in Configuration.
