
LLM providers

One code path, thirteen providers. Most modern providers expose an OpenAI-compatible API, so a single OpenAI-compatible adapter covers all of them; only OpenAI, Anthropic, and Google use native integrations.

Audience: operators choosing and configuring an LLM backend. What you will accomplish: pick a provider, set the right env vars, and know when to go local vs cloud.

Native vs OpenAI-compatible

Native providers:

  • openai — GPT-4o, GPT-4o-mini, etc.
  • anthropic — Claude 3.5 Sonnet, Haiku, etc.
  • google — Gemini models (requires GOOGLE_API_KEY).

OpenAI-compatible (use the LLM_BASE_URL override):

  • ollama — local models (Llama, Mistral, etc.)
  • openrouter — route to 100+ models
  • together — Together AI
  • groq — Groq (also works natively)
  • deepseek — DeepSeek models
  • fireworks — Fireworks AI
  • mistral — Mistral AI
  • vllm — vLLM self-hosted
  • lmstudio — LM Studio local
  • llamacpp — llama.cpp local

All OpenAI-compatible providers use the same langchain_openai.ChatOpenAI client — just set LLM_BASE_URL to your endpoint. Local providers (Ollama, LM Studio, vLLM, llama.cpp) need no API key.
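
A minimal sketch of how a shared adapter along these lines might resolve its settings. It is not the system's actual implementation: the LLM_MODEL variable, the default provider and model, and the placeholder key for local servers are assumptions made for illustration. The langchain_openai import is deferred so the resolution logic can be exercised without the package installed.

```python
import os

# Providers this page lists as local; they need no real API key.
LOCAL_PROVIDERS = {"ollama", "lmstudio", "vllm", "llamacpp"}


def resolve_client_kwargs(env=None):
    """Build keyword arguments for ChatOpenAI from environment-style settings."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "openai").lower()  # default is an assumption

    # LLM_MODEL is a hypothetical variable name, not taken from this page.
    kwargs = {"model": env.get("LLM_MODEL", "gpt-4o-mini")}

    base_url = env.get("LLM_BASE_URL")
    if base_url:
        kwargs["base_url"] = base_url.rstrip("/")

    if provider in LOCAL_PROVIDERS:
        # The OpenAI client wants a non-empty key; local servers typically ignore it.
        kwargs["api_key"] = "not-needed"
    else:
        key = env.get("OPENAI_API_KEY")
        if key:
            kwargs["api_key"] = key
    return kwargs


def build_chat_model(env=None):
    """Construct the client; requires the langchain-openai package."""
    from langchain_openai import ChatOpenAI  # imported lazily on purpose

    return ChatOpenAI(**resolve_client_kwargs(env))
```

Keeping the settings resolution separate from client construction means the provider-specific logic stays a plain dictionary transform, which is easy to test. A provider that uses a differently named key (the comparison table lists GROQ_API_KEY for Groq) would need one more branch.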

The LLM_BASE_URL override

For any OpenAI-compatible endpoint, point the adapter at it:

LLM_PROVIDER=ollama
LLM_BASE_URL=http://localhost:11434/v1     # Ollama
# LLM_BASE_URL=https://openrouter.ai/api/v1   # OpenRouter
# LLM_BASE_URL=https://api.together.xyz/v1    # Together AI
# LLM_BASE_URL=http://localhost:1234/v1       # LM Studio
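
One way to confirm that a base URL really points at an OpenAI-compatible server is to request its model list. The sketch below uses only the standard library and assumes the endpoint implements the OpenAI-style GET /models route, which most compatible servers do but which this page does not guarantee. The function names are illustrative.

```python
import json
import urllib.request


def models_url(base_url):
    """Return the model-listing URL for an OpenAI-compatible base URL."""
    return base_url.rstrip("/") + "/models"


def list_models(base_url, api_key=None, timeout=5):
    """Fetch model ids from the endpoint; raises on network or HTTP errors."""
    request = urllib.request.Request(models_url(base_url))
    if api_key:
        # Cloud endpoints expect a bearer token; local servers usually do not.
        request.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # OpenAI-style responses wrap the models in a "data" list.
    return [entry["id"] for entry in payload.get("data", [])]
```

With Ollama running locally, `list_models("http://localhost:11434/v1")` should return the ids of the models you have pulled. A connection error here usually means the server is not running or the port is wrong.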

Provider comparison

Provider | Type | Latency | Cost (per 1M tokens) | Best for | API key
--- | --- | --- | --- | --- | ---
OpenAI | Cloud API | ~1s | Input $0.15 / Output $0.60 (gpt-4o-mini) | General production use | OPENAI_API_KEY
Anthropic | Cloud API | ~1.5s | Input $0.25 / Output $1.25 (claude-3.5-haiku) | Long-context reasoning, safety | ANTHROPIC_API_KEY
Google Gemini | Cloud API | ~1s | Free tier: 15 RPM; Paid ~$0.075 (gemini-2.0-flash) | Cost-effective, multimodal | GOOGLE_API_KEY
Groq | Cloud API | ~0.3s | Free tier available; Paid ~$0.05 | Fastest inference, real-time chat | GROQ_API_KEY + LLM_BASE_URL
DeepSeek | Cloud API | ~2s | Input $0.14 / Output $0.28 (deepseek-chat) | Budget-friendly, strong coding | OPENAI_API_KEY + LLM_BASE_URL
Together | Cloud API | ~1s | Varies by model (~$0.10–$0.80) | Open-source model access | OPENAI_API_KEY + LLM_BASE_URL
Mistral | Cloud API | ~1s | Input $0.10 / Output $0.30 (mistral-small) | European data compliance | OPENAI_API_KEY + LLM_BASE_URL
Fireworks | Cloud API | ~0.5s | ~$0.20 (open-source models) | Fast open-source inference | OPENAI_API_KEY + LLM_BASE_URL
OpenRouter | Cloud proxy | Varies | Varies by model + 5% surcharge | Single API for 100+ models | OPENAI_API_KEY + LLM_BASE_URL
Ollama | Local | ~2–10s | Free (own hardware) | Full privacy, air-gapped, zero cost | None (local)
vLLM | Local | ~1–5s | Free (own hardware) | High-throughput self-hosted | None (local)
LM Studio | Local | ~2–10s | Free (own hardware) | Desktop dev/testing | None (local)
llama.cpp | Local | ~3–15s | Free (own hardware) | Minimal hardware, CPU-only | None (local)
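
To turn the per-1M-token prices into a budget figure, multiply each price by the expected token volume. The sketch below copies three input/output price pairs from the comparison table; the workload numbers in the usage note are made up for illustration, and the prices themselves may have changed since the table was written.

```python
# Per-1M-token prices in USD as (input, output), copied from the comparison table.
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "deepseek-chat": (0.14, 0.28),
    "mistral-small": (0.10, 0.30),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated cost in USD for the given token counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical monthly workload: 10M input tokens, 2M output tokens.
    for name in PRICES:
        print(f"{name}: ${estimate_cost(name, 10_000_000, 2_000_000):.2f}")
```

For that hypothetical workload the estimates come to about $2.70 for gpt-4o-mini, $1.96 for deepseek-chat, and $1.60 for mistral-small. Output-heavy workloads shift the comparison, because output tokens cost two to four times as much as input tokens for these models.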

Verify your result

  • Verify: You set LLM_PROVIDER and, for compatible endpoints, LLM_BASE_URL.
  • Verify: Cloud providers have their required key; local providers need none.
  • Verify: You picked latency/cost trade-offs that match your workload.
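
The first two checks above can be automated with a small preflight function. This is a sketch under stated assumptions, not part of the system: the function name is hypothetical, the key names follow the comparison table, and it assumes that OpenAI-compatible providers have no built-in default URL, so LLM_BASE_URL must be set for them.

```python
NATIVE = {"openai", "anthropic", "google"}
LOCAL = {"ollama", "vllm", "lmstudio", "llamacpp"}
COMPATIBLE_CLOUD = {"openrouter", "together", "groq", "deepseek", "fireworks", "mistral"}

# Key variable per provider, following the comparison table;
# any provider not listed here falls back to OPENAI_API_KEY.
KEY_FOR = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
    "groq": "GROQ_API_KEY",
}


def check_config(env):
    """Return a list of problems found in an environment-style mapping."""
    provider = env.get("LLM_PROVIDER", "").lower()
    if not provider:
        return ["LLM_PROVIDER is not set"]
    if provider not in NATIVE | LOCAL | COMPATIBLE_CLOUD:
        return [f"unknown provider: {provider}"]

    problems = []
    if provider not in NATIVE and not env.get("LLM_BASE_URL"):
        problems.append("LLM_BASE_URL is required for OpenAI-compatible providers")
    if provider not in LOCAL:
        key = KEY_FOR.get(provider, "OPENAI_API_KEY")
        if not env.get(key):
            problems.append(f"{key} is not set")
    return problems
```

Calling `check_config(dict(os.environ))` at startup and refusing to continue when the list is non-empty surfaces configuration mistakes before the first request is sent.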

Common failure modes