User Guide
The AI Chatbot Backend, explained for the people who use it
A retrieval-augmented (RAG) knowledge assistant that answers from your approved documents, shows the citations behind every answer, and tells you honestly how well each answer is grounded. Multi-mode, multilingual, and streamable.
- Modes: strict · open · learning · learning_review
- Languages: EN · AR · PT
- SSE streaming
- RFC 9457 errors
What this assistant is
The service is a document-grounded question-answering API. You send a question to
POST /api/v1/chat (base URL http://127.0.0.1:8000); it retrieves the most relevant
chunks from your ingested documents, drafts an answer, and verifies that the answer is
actually supported by those chunks before returning it.
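As a rough illustration of the request described above, the sketch below builds a POST to /api/v1/chat using only the Python standard library. The endpoint and base URL come from this guide. The body key `question` is an assumption, as is the placement of `mode`, `top_k`, and `score_threshold` at the top level of the JSON body; check the Chatting page for the real request fields. The helper names `build_chat_request` and `send` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # base URL given in this guide


def build_chat_request(question, mode="strict", top_k=None, score_threshold=None):
    """Build (but do not send) a POST request for /api/v1/chat.

    Assumption: the question travels under the key "question", and the
    tuning fields sit at the top level of the JSON body.
    """
    body = {"question": question, "mode": mode}
    if top_k is not None:
        body["top_k"] = top_k
    if score_threshold is not None:
        body["score_threshold"] = score_threshold
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes the payload easy to inspect or log before anything reaches the server.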
Three properties make it trustworthy rather than just fluent:
- Citations on every answer. Each reply carries structured
sources[] (label, doc_id, score, page, and snippet) so a person can verify it.
- Honest grounding. meta.grounded reports whether the answer is supported, partial, or unsupported. In strict mode an unsupported answer is replaced by a refusal, so a confident-sounding hallucination cannot ship.
- You stay in control. Pick a mode per request, force a language, and tune retrieval (top_k, score_threshold) without changing server config.
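One way a client might use these properties is sketched below: it turns a decoded reply into a confidence label plus printable citation lines. The field names under sources[] and the three meta.grounded values come from this guide. Treating meta.grounded as a plain string, the high/medium/low mapping, and the helper name `summarize_reply` are assumptions made for illustration.

```python
def summarize_reply(reply):
    """Return (confidence, citation_lines) for one decoded chat reply.

    Assumption: reply["meta"]["grounded"] is one of the strings
    "supported", "partial", or "unsupported".
    """
    grounded = reply.get("meta", {}).get("grounded", "unsupported")
    # Hypothetical mapping from grounding status to a UI confidence label;
    # anything other than supported/partial is treated as low confidence.
    confidence = {"supported": "high", "partial": "medium"}.get(grounded, "low")

    lines = []
    for source in reply.get("sources", []):
        page = source.get("page")
        where = f", p. {page}" if page is not None else ""
        lines.append(
            f"[{source['label']}] {source['doc_id']}{where} "
            f"(score {source['score']:.2f})"
        )
    return confidence, lines
```

A missing or unrecognized grounding value falls back to low confidence, which errs on the side of caution in a UI.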
Choose your path
Installation
Run the server from source (conda + pip), with Docker Compose, or fully local with Ollama + FastEmbed — zero cloud keys.
Quickstart
Get a first grounded answer back from a running server with one curl command, then verify the response.
Architecture
FastAPI + middleware + the 8-node LangGraph pipeline + Redis + ChromaDB, and the sync / streaming / async-ingest lifecycles.
Chatting
The chat endpoint, request fields, the four modes, and how to read the answer + sources + grounding.
Streaming
Render tokens as they arrive over Server-Sent Events, with the token → sources → done event order.
Languages
English, Arabic, and European Portuguese — auto-detection, RTL rendering, and multi-turn follow-ups.
Trust & citations
Read citations, turn grounding scores into a confidence indicator, and treat refusal as a feature.
Ingesting documents
Add knowledge by URL or local upload, poll ingest status, and understand API-key auth.
Rating answers
Submit up/down feedback with a correlation id so bad answers feed the review queue.
LLM & embeddings
14 LLM providers via a universal OpenAI-compatible adapter, plus OpenAI / FastEmbed / HuggingFace embeddings.
Deploy & secure
Compose topology, durable ingestion, scaling, the security model and hardening checklist, observability, and evaluation.
Configuration
Every environment variable grouped like .env.example — defaults and purpose for LLM, retrieval, security, and more.
API summary
Endpoint list, request headers, and the response/rate-limit headers you should log.
Troubleshooting
Empty strict answers, 401 on ingest, 422 validation, 429 backoff, and degraded health.
Related next steps
- New here? Begin with the Quickstart.
- Building a chat UI? Read Streaming and Trust & citations.
- Operating the service? See Ingesting documents and Errors & rate limits.