Configuration
This page lists every environment variable with its real default and purpose, grouped to match
.env.example. Copy that file to .env and override what you need; see
Installation.
Audience: operators configuring the service. What you will accomplish: set each variable with confidence.
LLM
| Variable | Default | Purpose |
|---|---|---|
| LLM_PROVIDER | openai | Provider: openai, anthropic, google, groq, ollama, openrouter, together, deepseek, mistral, vllm, lmstudio, llamacpp. |
| LLM_MODEL | gpt-4o-mini | Model name for the chosen provider. |
| LLM_BASE_URL | (empty) | Override base URL for OpenAI-compatible endpoints (Ollama, OpenRouter, Together, LM Studio, …). |
| OPENAI_API_KEY | your_openai_api_key_here | OpenAI key (also reused by several OpenAI-compatible providers with LLM_BASE_URL). |
| ANTHROPIC_API_KEY | (empty) | Anthropic key (native provider). |
| GOOGLE_API_KEY | (empty) | Google Gemini key (native provider). |
| GROQ_API_KEY | (empty) | Groq key. |
See LLM providers.
Embeddings
| Variable | Default | Purpose |
|---|---|---|
| EMBEDDING_PROVIDER | openai | openai, fastembed (ONNX, ~50MB, zero CVEs), or huggingface (torch). |
| EMBEDDING_MODEL | text-embedding-3-small | Model for the chosen provider. Changing it requires re-ingesting. |
See Embeddings.
Redis
| Variable | Default | Purpose |
|---|---|---|
| REDIS_HOST | localhost | Redis hostname. |
| REDIS_PORT | 6379 | Redis port. |
| REDIS_PASSWORD | (empty) | Redis password (if required). |
| REDIS_TTL_SECONDS | 86400 | TTL for conversation memory keys. |
Vector DB (ChromaDB)
| Variable | Default | Purpose |
|---|---|---|
| CHROMA_PERSIST_DIR | ./chroma_db | On-disk persistence directory for ChromaDB. |
| CHROMA_COLLECTION | policies | Primary document collection name. |
Retrieval
| Variable | Default | Purpose |
|---|---|---|
| RETRIEVAL_SCORE_THRESHOLD | 0.3 | Minimum cosine similarity for the relevance gate. Raise to 0.7 for stricter grounding. |
| RETRIEVAL_STRATEGY | mmr | mmr (default), hybrid (dense + BM25 via RRF), or hybrid_rerank (reranker integration point). |
See Retrieval.
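
To make the two retrieval settings concrete, here is a minimal Python sketch of a relevance gate and of Reciprocal Rank Fusion (RRF), the technique the hybrid strategy names. It illustrates the ideas under stated assumptions and is not the service's implementation: the function names, the `k=60` constant, the inclusive `>=` comparison, and the sample document IDs are all hypothetical.

```python
from collections import defaultdict


def passes_relevance_gate(similarity: float, threshold: float = 0.3) -> bool:
    """Keep a chunk only if its cosine similarity meets the threshold.

    The inclusive comparison is an assumption made for this sketch.
    """
    return similarity >= threshold


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs with RRF.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used constant, assumed here.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense = ["doc_a", "doc_b", "doc_c"]   # hypothetical dense (embedding) ranking
sparse = ["doc_b", "doc_d", "doc_a"]  # hypothetical BM25 ranking
fused = reciprocal_rank_fusion([dense, sparse])
```

Documents that rank well in both lists (`doc_b`, `doc_a`) rise above documents that appear in only one, which is the reason to fuse dense and BM25 results rather than pick one.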
Query rewrite
| Variable | Default | Purpose |
|---|---|---|
| QUERY_REWRITE_ENABLED | true | Condense multi-turn follow-ups into a standalone search query before retrieval (skipped on the first turn). |
Groundedness
| Variable | Default | Purpose |
|---|---|---|
| GROUNDEDNESS_ENABLED | true | Verify the answer is supported by retrieved chunks; expose meta.grounded. |
| GROUNDEDNESS_MODE | heuristic | heuristic (no extra LLM call) or llm (JSON judge). |
| GROUNDEDNESS_MIN_SCORE | 0.5 | Fraction of answer sentences that must be supported. |
| STRICT_REFUSE_ON_UNGROUNDED | true | In strict mode, refuse an unsupported answer and clear its sources. |
See Groundedness.
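
The following sketch shows one way a sentence-level groundedness score could be computed and compared against GROUNDEDNESS_MIN_SCORE. The word-overlap test, the `min_overlap` parameter, and the function name are assumptions made for illustration; the service's actual heuristic may differ.

```python
import re


def supported_fraction(answer: str, chunks: list[str], min_overlap: float = 0.5) -> float:
    """Return the fraction of answer sentences supported by any chunk.

    "Supported" is a hypothetical word-overlap test: at least `min_overlap`
    of the sentence's distinct words occur in a single retrieved chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences:
        return 0.0
    chunk_words = [set(re.findall(r"\w+", c.lower())) for c in chunks]
    supported = 0
    for sentence in sentences:
        words = set(re.findall(r"\w+", sentence.lower()))
        if words and any(len(words & cw) / len(words) >= min_overlap for cw in chunk_words):
            supported += 1
    return supported / len(sentences)


chunks = ["Returns are accepted within 30 days of delivery."]
answer = "Returns are accepted within 30 days. Shipping is free worldwide."
score = supported_fraction(answer, chunks)
grounded = score >= 0.5  # compare against GROUNDEDNESS_MIN_SCORE
```

Here one of the two sentences is backed by the chunk, so the score is 0.5 and the answer just meets the default threshold. Raising GROUNDEDNESS_MIN_SCORE would mark the same answer as ungrounded.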
Chat mode / self-ingest
| Variable | Default | Purpose |
|---|---|---|
| CHAT_MODE | strict | strict, open, learning, or learning_review (server default; clients can override per request). |
| SELF_INGEST_MIN_LENGTH | 50 | Minimum answer length (chars) required for auto-ingest in learning modes. |
See Chat modes.
Resilience
| Variable | Default | Purpose |
|---|---|---|
| PROVIDER_MAX_RETRIES | 3 | Retries for transient LLM/embedding failures. |
| PROVIDER_RETRY_BASE_DELAY | 0.5 | Base delay in seconds for exponential backoff. |
| CIRCUIT_BREAKER_ENABLED | true | Enable the in-process circuit breaker. |
| CB_FAILURE_THRESHOLD | 5 | Consecutive failures before the circuit opens. |
| CB_RESET_SECONDS | 30 | Cool-down before a half-open trial call. |
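
As a rough illustration of how these settings interact, the sketch below computes an exponential backoff schedule and models a simple circuit breaker. It assumes a doubling factor with no jitter and an injectable clock; the class and function names are hypothetical and do not describe the service's internals.

```python
import time


def backoff_delays(max_retries: int = 3, base_delay: float = 0.5) -> list[float]:
    """Delay before each retry: base_delay * 2**attempt (doubling, no jitter; assumed)."""
    return [base_delay * (2 ** attempt) for attempt in range(max_retries)]


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures, then allows a
    trial call once `reset_seconds` have elapsed (the half-open state)."""

    def __init__(self, failure_threshold: int = 5, reset_seconds: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_seconds = reset_seconds
        self.clock = clock          # injectable so the breaker can be tested
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True  # closed: calls pass through
        # Open: only allow a trial call after the cool-down has elapsed.
        return self.clock() - self.opened_at >= self.reset_seconds

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # (re)start the cool-down
```

With the defaults, a failing call is retried after 0.5, 1.0, and 2.0 seconds, and five consecutive failures stop further calls for 30 seconds.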
Persona / branding
| Variable | Default | Purpose |
|---|---|---|
| ASSISTANT_NAME | our company | Assistant identity used in prompts (default reproduces the original copy). |
| KNOWLEDGE_DOMAIN | (empty) | Optional domain framing, e.g. “returns & shipping policy”. Empty adds none. |
| ESCALATION_MESSAGE | Please contact support. | Refusal/escalation copy. |
Guardrails
| Variable | Default | Purpose |
|---|---|---|
| GUARDRAILS_ENABLED | true | Master switch for input/output guardrails. |
| GUARDRAILS_BLOCK_INJECTION | true | Reject likely prompt-injection / jailbreak inputs (returns 400). |
| GUARDRAILS_MASK_PII | false | Mask emails, phone numbers, and card-like digit runs in output. |
| GUARDRAILS_MAX_ANSWER_CHARS | 4000 | Hard cap on answer length; 0 disables the cap. |
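
The output-side settings can be pictured with a small sketch like the one below. The regular expressions, the replacement tokens, and the function name are assumptions chosen for illustration (phone masking is omitted for brevity); the service's real patterns are not documented here.

```python
import re

# Hypothetical patterns: a simple email shape and runs of 13 to 19 digits
# with optional space or hyphen separators (card-like numbers).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def apply_output_guardrails(answer: str, mask_pii: bool = False,
                            max_chars: int = 4000) -> str:
    """Optionally mask PII, then truncate to max_chars (0 disables the cap)."""
    if mask_pii:
        answer = EMAIL_RE.sub("[email]", answer)
        answer = CARD_RE.sub("[card]", answer)
    if max_chars > 0:
        answer = answer[:max_chars]
    return answer


masked = apply_output_guardrails(
    "Mail jo@example.com or pay 4111 1111 1111 1111.", mask_pii=True
)
```

Masking runs before truncation in this sketch so that a cap cannot cut a sensitive value in half and leave part of it unmasked.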
Ingest
| Variable | Default | Purpose |
|---|---|---|
| MAX_FILE_SIZE_MB | 50 | Maximum download/upload size. |
| DOWNLOAD_TIMEOUT_SECONDS | 120 | Timeout for URL document downloads. |
| INGEST_MODE | inline | inline (in-process BackgroundTasks) or queue (Redis-backed durable worker). |
| INGEST_MAX_ATTEMPTS | 3 | Retry attempts for queued ingest jobs. |
| INGEST_INCOMING_DIR | ./ingest_incoming | Shared staging dir for uploads in queue mode (must be a shared volume). |
See Deployment.
App / Logging
| Variable | Default | Purpose |
|---|---|---|
| DEBUG | false | Enable debug mode. |
| LOG_LEVEL | INFO | Logging level. |
| LOG_FORMAT | text | text (default) or json for log aggregators (Datadog/CloudWatch/ELK). |
| CORS_ORIGINS | [] | Allowed CORS origins. The empty default denies all cross-origin requests; never use ["*"] in production. |
See Observability.
Security
| Variable | Default | Purpose |
|---|---|---|
| API_KEY | (empty) | API key protecting ingest/review endpoints. Empty skips auth (dev mode); DELETE always requires it when set. |
| REQUIRE_AUTH_FOR_INGEST | false | Require the key on non-DELETE ingest/review/feedback-list endpoints. |
| TRUSTED_PROXIES | [] | CIDR ranges of trusted proxies for real-client-IP extraction in rate limiting. |
| ALLOWED_HOSTS | ["*"] | SSRF allowlist for download hosts; ["*"] allows any public host (private IPs always blocked). |
See Security.
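
To show what TRUSTED_PROXIES and ALLOWED_HOSTS mean in practice, here is a hedged sketch built on Python's standard `ipaddress` module. The function names and the exact set of blocked address classes are assumptions; the service may apply different or additional checks.

```python
import ipaddress


def is_blocked_address(ip: str) -> bool:
    """True for addresses a download should never reach (private, loopback, ...)."""
    addr = ipaddress.ip_address(ip)
    return addr.is_private or addr.is_loopback or addr.is_link_local or addr.is_reserved


def host_allowed(host: str, allowed_hosts: list[str]) -> bool:
    """["*"] allows any host name; otherwise require an exact, case-insensitive match."""
    return "*" in allowed_hosts or host.lower() in {h.lower() for h in allowed_hosts}


def from_trusted_proxy(peer_ip: str, trusted_proxies: list[str]) -> bool:
    """True if the direct peer falls inside one of the TRUSTED_PROXIES CIDR ranges."""
    peer = ipaddress.ip_address(peer_ip)
    return any(peer in ipaddress.ip_network(cidr) for cidr in trusted_proxies)
```

A real SSRF check has to resolve the host name first and test every resolved address, because a public-looking name can point at a private IP. Likewise, a forwarded client IP header should only be honored when `from_trusted_proxy` is true for the direct peer.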
Verify your result
- Verify: Your .env sets LLM_PROVIDER + the matching key.
- Verify: Production sets API_KEY, CORS_ORIGINS, and TRUSTED_PROXIES explicitly.
- Verify: You re-ingest documents whenever EMBEDDING_MODEL changes.
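
The first two checks above can be automated with a short script along these lines. This is a sketch, not a tool shipped with the project: `check_config` is a hypothetical helper, and the provider-to-key mapping covers only the providers that have a dedicated key variable in the LLM table.

```python
import os


def check_config(env: dict[str, str], production: bool = False) -> list[str]:
    """Return a list of human-readable problems; an empty list means the checks pass."""
    problems = []
    # Providers with a dedicated key variable listed in the LLM table.
    key_for = {
        "openai": "OPENAI_API_KEY",
        "anthropic": "ANTHROPIC_API_KEY",
        "google": "GOOGLE_API_KEY",
        "groq": "GROQ_API_KEY",
    }
    provider = env.get("LLM_PROVIDER", "openai")
    key_var = key_for.get(provider)
    # Treat the .env.example placeholder as "not set".
    if key_var and env.get(key_var, "") in ("", "your_openai_api_key_here"):
        problems.append(f"{key_var} is not set for LLM_PROVIDER={provider}")
    if production:
        for var in ("API_KEY", "CORS_ORIGINS", "TRUSTED_PROXIES"):
            if var not in env:
                problems.append(f"{var} should be set explicitly in production")
    return problems


if __name__ == "__main__":
    for problem in check_config(dict(os.environ), production=True):
        print("WARN:", problem)
```

The third check, re-ingesting after an EMBEDDING_MODEL change, depends on stored state rather than on environment variables, so it is left out of this sketch.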
Related next steps
- Apply these in Installation.
- Harden the security-relevant ones in Security.
- See the endpoints they affect in the API summary.