LLM providers
One code path, fourteen providers. Most modern providers expose an OpenAI-compatible API, so the system supports them through a universal OpenAI-compatible adapter.
Audience: operators choosing and configuring an LLM backend. What you will accomplish: pick a provider, set the right env vars, and know when to go local vs cloud.
Native vs OpenAI-compatible
Native providers:
- openai: GPT-4o, GPT-4o-mini, etc.
- anthropic: Claude 3.5 Sonnet, Haiku, etc.
- google: Gemini models (requires GOOGLE_API_KEY).
OpenAI-compatible (use the LLM_BASE_URL override):
- ollama: local models (Llama, Mistral, etc.)
- openrouter: route to 100+ models
- together: Together AI
- groq: Groq (also works natively)
- deepseek: DeepSeek models
- fireworks: Fireworks AI
- mistral: Mistral AI
- vllm: vLLM self-hosted
- lmstudio: LM Studio local
- llamacpp: llama.cpp local
All OpenAI-compatible providers go through the same langchain_openai.ChatOpenAI client, so the only
provider-specific setting is LLM_BASE_URL, which you set to your endpoint. Local providers
(Ollama, LM Studio, vLLM, llama.cpp) need no API key.
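
As a minimal sketch of that shared code path, the snippet below builds the arguments for `langchain_openai.ChatOpenAI` from environment-style settings. The `LLM_MODEL` variable, the helper names, and the placeholder key are assumptions made for illustration, not documented configuration of this system. The placeholder exists because the underlying OpenAI client expects some key value even when a local server ignores it.

```python
import os

# Local providers from the list above; they need no real API key.
LOCAL_PROVIDERS = {"ollama", "lmstudio", "vllm", "llamacpp"}


def chat_openai_kwargs(env=None):
    """Build keyword arguments for ChatOpenAI from environment-style settings."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "openai")

    # LLM_MODEL is a hypothetical variable name used only in this sketch.
    kwargs = {"model": env.get("LLM_MODEL", "gpt-4o-mini")}

    # Only override the endpoint when LLM_BASE_URL is actually set.
    base_url = env.get("LLM_BASE_URL")
    if base_url:
        kwargs["base_url"] = base_url

    if provider in LOCAL_PROVIDERS:
        # Placeholder value; local servers do not check it.
        kwargs["api_key"] = "not-needed"
    elif "OPENAI_API_KEY" in env:
        kwargs["api_key"] = env["OPENAI_API_KEY"]
    return kwargs


def make_client(env=None):
    """Create the chat client; imported lazily so the helper above has no dependencies."""
    from langchain_openai import ChatOpenAI

    return ChatOpenAI(**chat_openai_kwargs(env))
```

With `LLM_PROVIDER=ollama` and `LLM_BASE_URL=http://localhost:11434/v1`, `make_client()` would return a client pointed at the local Ollama server, and switching to a hosted endpoint only changes the environment, not the code.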
The LLM_BASE_URL override
For any OpenAI-compatible endpoint, point the adapter at it:
LLM_PROVIDER=ollama
LLM_BASE_URL=http://localhost:11434/v1 # Ollama
# LLM_BASE_URL=https://openrouter.ai/api/v1 # OpenRouter
# LLM_BASE_URL=https://api.together.xyz/v1 # Together AI
# LLM_BASE_URL=http://localhost:1234/v1 # LM Studio
Provider comparison
| Provider | Type | Latency | Cost (per 1M tokens) | Best For | API Key |
|---|---|---|---|---|---|
| OpenAI | Cloud API | ~1s | Input $0.15 / Output $0.60 (gpt-4o-mini) | General production use | OPENAI_API_KEY |
| Anthropic | Cloud API | ~1.5s | Input $0.25 / Output $1.25 (claude-3-haiku) | Long-context reasoning, safety | ANTHROPIC_API_KEY |
| Google Gemini | Cloud API | ~1s | Free tier: 15 RPM; Paid ~$0.075/1M (gemini-2.0-flash) | Cost-effective, multimodal | GOOGLE_API_KEY |
| Groq | Cloud API | ~0.3s | Free tier available; Paid ~$0.05/1M | Fastest inference, real-time chat | GROQ_API_KEY + LLM_BASE_URL |
| DeepSeek | Cloud API | ~2s | Input $0.14 / Output $0.28 (deepseek-chat) | Budget-friendly, strong coding | OPENAI_API_KEY + LLM_BASE_URL |
| Together | Cloud API | ~1s | Varies by model (~$0.10–$0.80/1M) | Open-source model access | OPENAI_API_KEY + LLM_BASE_URL |
| Mistral | Cloud API | ~1s | Input $0.10 / Output $0.30 (mistral-small) | European data compliance | OPENAI_API_KEY + LLM_BASE_URL |
| Fireworks | Cloud API | ~0.5s | ~$0.20/1M (open-source models) | Fast open-source inference | OPENAI_API_KEY + LLM_BASE_URL |
| OpenRouter | Cloud proxy | Varies | Varies by model + 5% surcharge | Single API for 100+ models | OPENAI_API_KEY + LLM_BASE_URL |
| Ollama | Local | ~2–10s | Free (own hardware) | Full privacy, air-gapped, zero cost | None (local) |
| vLLM | Local | ~1–5s | Free (own hardware) | High-throughput self-hosted | None (local) |
| LM Studio | Local | ~2–10s | Free (own hardware) | Desktop dev/testing | None (local) |
| llama.cpp | Local | ~3–15s | Free (own hardware) | Minimal hardware, CPU-only | None (local) |
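
To turn the per-token prices into a monthly figure, a rough estimate can be computed as below. This is an illustrative sketch: the prices are copied from the table for the rows that list separate input and output rates, and the workload of 10M input and 2M output tokens is a made-up example, not a measured value.

```python
# USD per 1M tokens as (input, output), copied from the comparison table.
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "deepseek-chat": (0.14, 0.28),
    "mistral-small": (0.10, 0.30),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated cost in USD for the given token counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical monthly workload: 10M input tokens, 2M output tokens.
    for model in PRICES:
        cost = estimate_cost(model, 10_000_000, 2_000_000)
        print(f"{model}: ${cost:.2f}")
```

For that workload, gpt-4o-mini comes to 10 × $0.15 + 2 × $0.60 = $2.70. Local providers have no per-token charge, so their cost depends on hardware and power instead.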
Verify your result
- Verify: You set LLM_PROVIDER and, for compatible endpoints, LLM_BASE_URL.
- Verify: Cloud providers have their required key; local providers need none.
- Verify: You picked latency/cost trade-offs that match your workload.
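
The first two checks can be automated with a small pre-flight script. The sketch below is hypothetical and is not the system's own validation: `check_env` is an invented helper, and the key names are taken from the API Key column of the comparison table.

```python
import os

# Key names follow the "API Key" column of the comparison table.
NATIVE_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
}
COMPATIBLE_KEYS = {
    "groq": "GROQ_API_KEY",
    "deepseek": "OPENAI_API_KEY",
    "together": "OPENAI_API_KEY",
    "mistral": "OPENAI_API_KEY",
    "fireworks": "OPENAI_API_KEY",
    "openrouter": "OPENAI_API_KEY",
}
LOCAL_PROVIDERS = {"ollama", "vllm", "lmstudio", "llamacpp"}


def check_env(env=None):
    """Return a list of configuration problems; an empty list means the checks pass."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "").lower()
    if not provider:
        return ["LLM_PROVIDER is not set"]

    problems = []
    if provider in NATIVE_KEYS:
        required_key = NATIVE_KEYS[provider]
    elif provider in COMPATIBLE_KEYS or provider in LOCAL_PROVIDERS:
        # Compatible endpoints, cloud or local, need an explicit base URL.
        if not env.get("LLM_BASE_URL"):
            problems.append("LLM_BASE_URL is not set")
        required_key = COMPATIBLE_KEYS.get(provider)  # None for local providers
    else:
        return [f"unknown provider: {provider}"]

    if required_key and not env.get(required_key):
        problems.append(f"{required_key} is not set")
    return problems


if __name__ == "__main__":
    for problem in check_env():
        print("config problem:", problem)
```

Running it before starting the service prints one line per missing setting and nothing when the configuration passes these checks.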
Common failure modes
Related next steps
- Pick a vector embedding backend in Embeddings.
- See resilience behavior on transient provider errors in Deployment.
- Full env reference in Configuration.