Deployment
Run the service reliably in production: the compose topology, durable ingestion, and how to scale.
Audience: operators running the backend. What you will accomplish: choose an ingestion mode, scale workers, and serve streaming through a proxy.
Compose topology (api + worker + redis)
docker-compose.yml runs three services:
- api: the FastAPI app serving /api/v1.
- worker: python -m ingest.worker, the durable ingest consumer.
- redis: memory, rate limiting, the ingest registry, and the durable ingest queue.
The cloud compose file sets INGEST_MODE=queue so ingestion survives API restarts. For a
single-process setup, set INGEST_MODE=inline (the code default) and drop the worker. For
a zero-cloud stack, see docker-compose.local.yml in Installation.
Durable ingestion (INGEST_MODE)
INGEST_MODE controls how queued documents are processed:
| Mode | Behavior |
|---|---|
| inline (default) | Processed in-process via FastAPI BackgroundTasks. Simple; suited to small deployments. Ingestion does not survive an API restart mid-job. |
| queue | Jobs are enqueued onto a Redis list consumed by python -m ingest.worker. Survives API restarts and retries transient failures. |
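As a rough illustration of the table above, the sketch below resolves INGEST_MODE from the environment and falls back to the documented default. The function names resolve_ingest_mode and needs_worker are hypothetical, and the case-insensitive normalization is an assumption rather than confirmed behavior of the service.

```python
import os

# The two modes named in the table above.
VALID_MODES = ("inline", "queue")


def resolve_ingest_mode(env=None):
    """Return the ingestion mode, defaulting to 'inline' when unset.

    Hypothetical helper: trimming and lower-casing the value is an assumption.
    """
    env = os.environ if env is None else env
    mode = env.get("INGEST_MODE", "inline").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"INGEST_MODE must be one of {VALID_MODES}, got {mode!r}")
    return mode


def needs_worker(mode):
    """Only queue mode needs a separate `python -m ingest.worker` process."""
    return mode == "queue"
```

Failing fast on an unknown value is a design choice for the sketch: a typo in the environment then surfaces at startup instead of silently selecting inline mode.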
Key properties of queue mode:
- Retries: failed jobs are retried up to INGEST_MAX_ATTEMPTS (default 3).
- Idempotency: ingestion is incremental and content-hashed, so re-running a job does not duplicate chunks; unchanged content is skipped.
- Shared upload staging: uploaded files are staged to INGEST_INCOMING_DIR (default ./ingest_incoming), which must be a volume shared between the api and worker containers in queue mode.
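The retry and idempotency properties above can be sketched in a few lines. This is a minimal in-memory illustration, not the worker's actual code: TransientError, ingest_once, run_with_retries, and the dict-based store are hypothetical, SHA-256 is an assumed hash, and INGEST_MAX_ATTEMPTS is treated here as a cap on total attempts.

```python
import hashlib

INGEST_MAX_ATTEMPTS = 3  # documented default


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def content_hash(text):
    """Hash the content so unchanged documents can be detected."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def ingest_once(doc_id, text, store):
    """Record the document's hash; return False when the content is unchanged."""
    digest = content_hash(text)
    if store.get(doc_id) == digest:
        return False  # same content as last time: skip, nothing is duplicated
    store[doc_id] = digest
    return True


def run_with_retries(job, max_attempts=INGEST_MAX_ATTEMPTS):
    """Call job() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except TransientError:
            if attempt == max_attempts:
                raise  # attempts exhausted: surface the failure
```

Because ingest_once is a no-op for unchanged content, a job that is retried after a partial failure can be re-run safely.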
# Run the durable worker (queue mode)
python -m ingest.worker

If it fails: If uploads never process, confirm INGEST_INCOMING_DIR is a volume shared by both the api and worker containers.
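One way to confirm the staging directory really is shared is to write a marker file from the api container and look for it from the worker container. The sketch below assumes that approach; incoming_dir, write_probe, and probe_visible are hypothetical helpers and are not part of the service.

```python
import os
import uuid
from pathlib import Path


def incoming_dir(env=None):
    """Resolve INGEST_INCOMING_DIR, using the documented default when unset."""
    env = os.environ if env is None else env
    return Path(env.get("INGEST_INCOMING_DIR", "./ingest_incoming"))


def write_probe(directory):
    """Run in the api container: create a uniquely named marker file."""
    directory.mkdir(parents=True, exist_ok=True)
    name = f".probe-{uuid.uuid4().hex}"
    (directory / name).write_text("probe")
    return name


def probe_visible(directory, name):
    """Run in the worker container: True only if the marker is visible there."""
    return (directory / name).is_file()
```

If probe_visible returns False in the worker for a name that write_probe just returned in the api container, the two containers are most likely not mounting the same volume.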
Scaling
- Redis-shared state: memory, rate-limit counters, and queues live in Redis, so multiple API replicas share state correctly behind a load balancer.
- More API workers: run Uvicorn/Gunicorn with --workers N to use multiple processes.
- More ingest throughput: run additional ingest.worker processes against the same Redis queue.
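The idea behind adding ingest workers is that several consumers drain one shared queue and each job is handed to exactly one of them. The sketch below illustrates that pattern with the standard library only: queue.Queue stands in for the Redis list and threads stand in for separate worker processes, so it is an analogy and not the worker's real implementation. The name run_workers is hypothetical.

```python
import queue
import threading


def run_workers(jobs, worker_count):
    """Drain one shared queue with several workers; return (job, worker) pairs."""
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)

    handled = []
    lock = threading.Lock()

    def worker(index):
        while True:
            try:
                job = pending.get_nowait()  # a job is removed by exactly one worker
            except queue.Empty:
                return  # queue drained: this worker stops
            with lock:
                handled.append((job, index))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled
```

Since popping from the queue is atomic, adding workers raises throughput without any job being processed twice.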
Verify your result
- Verify: You chose inline (single process) or queue (durable worker) deliberately.
- Verify: In queue mode, INGEST_INCOMING_DIR is a shared volume and the worker is running.
- Verify: Multiple replicas share Redis state; SSE buffering is disabled at the proxy.
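For the SSE check, it can help to see what an unbuffered stream looks like on the wire. The sketch below builds response headers commonly used for Server-Sent Events and formats a single event. The helper names sse_headers and format_sse are hypothetical; X-Accel-Buffering: no is a response header that nginx honors to turn off buffering for that response, and other proxies need their own setting. Whether this service sets these headers itself is an assumption.

```python
def sse_headers():
    """Headers commonly sent with a Server-Sent Events response."""
    return {
        "Content-Type": "text/event-stream",
        "Cache-Control": "no-cache",
        "X-Accel-Buffering": "no",  # nginx-specific hint; assumption for this stack
    }


def format_sse(data, event=None):
    """Format one SSE message: optional event name, one data line per text line."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    for part in data.splitlines() or [""]:
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"  # a blank line terminates the event
```

If events arrive in bursts at the client instead of one at a time, the proxy is likely still buffering the response.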
Common failure modes
Related next steps
- Harden the deployment in Security.
- Trace requests in Observability.
- Review every flag in Configuration.