User Guide

The AI Chatbot Backend, explained for the people who use it

A retrieval-augmented generation (RAG) knowledge assistant that answers from your approved documents, shows the citations behind every answer, and tells you honestly how well each answer is grounded. Multi-mode, multilingual, and streamable.

  • Modes: strict · open · learning · learning_review
  • Languages: EN · AR · PT
  • SSE streaming
  • RFC 9457 errors

What this assistant is

The service is a document-grounded question-answering API. You send a question to POST /api/v1/chat (base URL http://127.0.0.1:8000); it retrieves the most relevant chunks from your ingested documents, drafts an answer, and verifies that the answer is actually supported by those chunks before returning it.
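As a rough sketch of what a client call looks like, the Python below builds a POST /api/v1/chat request with only the standard library. The per-request knobs mode, top_k, and score_threshold come from this guide; the body keys "question" and "language" are hypothetical placeholders, and defaulting to strict mode is a choice made for this sketch, so check the Chatting page for the real schema before relying on it.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # default base URL from this guide
MODES = {"strict", "open", "learning", "learning_review"}


def build_chat_request(question, mode="strict", language=None, top_k=None, score_threshold=None):
    """Build (but do not send) a POST /api/v1/chat request.

    Assumption: the body keys "question" and "language" are hypothetical;
    "mode", "top_k", and "score_threshold" are the knobs this guide names.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODES)}")
    body = {"question": question, "mode": mode}
    # Only send overrides that were set, so server-side defaults apply otherwise.
    for key, value in (("language", language), ("top_k", top_k), ("score_threshold", score_threshold)):
        if value is not None:
            body[key] = value
    return urllib.request.Request(
        BASE_URL + "/api/v1/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, **options):
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(question, **options), timeout=60) as resp:
        return json.loads(resp.read())
```

Usage, against a running server: reply = ask("What is the refund window?", mode="strict", top_k=5).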

Three properties make it trustworthy rather than just fluent:

  • Citations on every answer. Each reply carries a structured sources[] list (label, doc_id, score, page, and snippet), so a person can verify it.
  • Honest grounding. meta.grounded reports whether the answer is supported, partial, or unsupported. In strict mode an unsupported answer is replaced by a refusal, so a confident-sounding hallucination cannot ship.
  • You stay in control. Pick a mode per request, force a language, and tune retrieval (top_k, score_threshold) without changing server config.
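To show how these properties surface in a client, here is a minimal Python sketch that prints an answer with a grounding note and numbered citations. It reads sources[] (label, doc_id, score, page, snippet) and meta.grounded as described above; the "answer" key for the reply text is an assumption, and treating a missing grounding flag as unsupported is a conservative choice made for this sketch.

```python
GROUNDING_NOTES = {
    "supported": "The answer is supported by the cited sources.",
    "partial": "Only part of the answer is supported; check the citations.",
    "unsupported": "The retrieved documents do not support this answer.",
}


def render_answer(response):
    """Format a chat reply as plain text: answer, grounding note, numbered citations.

    Assumption: the reply text sits under a hypothetical "answer" key.
    """
    # Treat a missing grounding flag conservatively, as unsupported.
    grounded = response.get("meta", {}).get("grounded", "unsupported")
    note = GROUNDING_NOTES.get(grounded, "Unknown grounding status.")
    lines = [response.get("answer", ""), "", f"Grounding: {grounded}. {note}"]
    for i, src in enumerate(response.get("sources", []), start=1):
        page = f", p. {src['page']}" if src.get("page") is not None else ""
        lines.append(
            f"[{i}] {src['label']} ({src['doc_id']}{page}), score {src['score']:.2f}: {src['snippet']}"
        )
    return "\n".join(lines)
```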

Choose your path

Get started

Installation

Run the server from source (conda + pip), with Docker Compose, or fully locally with Ollama + FastEmbed and no cloud keys at all.

15 min

Quickstart

Get a first grounded answer back from a running server with one curl command, then verify the response.

Concept

Architecture

FastAPI + middleware + the 8-node LangGraph pipeline + Redis + ChromaDB, and the sync / streaming / async-ingest lifecycles.

How-to

Chatting

The chat endpoint, request fields, the four modes, and how to read the answer + sources + grounding.

How-to

Streaming

Render tokens as they arrive over Server-Sent Events, with the token → sources → done event order.

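A minimal sketch of a streaming client in Python, assuming each SSE event carries at least one data line, token data is a plain text chunk, and sources data is a JSON array (all assumptions; the Streaming page documents the real payloads). The parser follows standard SSE framing, and the consumer stops at the done event.

```python
import json


def iter_sse(lines):
    """Yield (event, data) pairs from an iterable of Server-Sent Events lines.

    Standard SSE framing: "event:" and "data:" fields, ":" comment lines,
    and a blank line that dispatches the buffered event.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            if data:  # events without any data line are not dispatched
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # SSE strips one leading space from the value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
    # An unterminated event at end of stream is discarded, as the SSE spec requires.


def consume_chat_stream(lines, on_token=None):
    """Collect a chat stream delivered in token -> sources -> done order.

    Assumptions: token data is a plain text chunk and sources data is JSON.
    """
    answer, sources = [], []
    for event, data in iter_sse(lines):
        if event == "token":
            answer.append(data)
            if on_token:
                on_token(data)  # e.g. append to the UI as the text arrives
        elif event == "sources":
            sources = json.loads(data)
        elif event == "done":
            break
    return "".join(answer), sources
```

With urllib, pass the decoded lines of the HTTP response, for example (line.decode("utf-8") for line in resp).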
Concept

Languages

English, Arabic, and European Portuguese — auto-detection, RTL rendering, and multi-turn follow-ups.

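When rendering Arabic answers, a client has to pick a text direction. One common approach, sketched here in Python as an assumption rather than the service's own logic, is the first-strong-character rule that HTML dir="auto" uses: the first character with a strong bidi class decides, with Arabic letters classed "AL" and Latin letters "L".

```python
import unicodedata


def text_direction(text):
    """Return "rtl" or "ltr" from the first strong directional character.

    Arabic letters have bidi class "AL" (Hebrew is "R"); Latin letters are "L".
    Digits, spaces, and punctuation are weak or neutral and are skipped.
    """
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):
            return "rtl"
        if bidi == "L":
            return "ltr"
    return "ltr"  # no strong characters at all: default to left-to-right
```

A renderer could then emit, for example, f'<p dir="{text_direction(answer)}">'.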
Concept

Trust & citations

Read citations, turn grounding scores into a confidence indicator, and treat refusal as a feature.

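One possible way to turn grounding into a confidence indicator, sketched in Python: combine meta.grounded with the best citation score from sources[]. The badge names and the 0.75 cut-off are hypothetical; score scales depend on your embedding model and retrieval settings, so tune the threshold against your own data.

```python
def confidence_badge(response, strong_score=0.75):
    """Map meta.grounded plus the best citation score to a UI badge.

    strong_score=0.75 is a hypothetical default, not a server setting.
    """
    grounded = response.get("meta", {}).get("grounded")
    scores = [s["score"] for s in response.get("sources", []) if s.get("score") is not None]
    best = max(scores, default=0.0)
    if grounded == "supported":
        return "high" if best >= strong_score else "medium"
    if grounded == "partial":
        return "low"
    # Unsupported or missing: show the refusal / "no grounded answer" state.
    return "none"
```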
How-to

Ingesting documents

Add knowledge by URL or local upload, poll ingest status, and understand API-key auth.

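Because ingestion is asynchronous, a client typically polls until the job finishes. The Python sketch below is a generic poller with capped exponential backoff; fetch_status is a caller-supplied function (for example, one that calls the status endpoint documented on the Ingesting page), and the terminal status values "completed" and "failed" are hypothetical placeholders.

```python
import time


def wait_for_ingest(fetch_status, timeout=300.0, interval=1.0, max_interval=15.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll an ingest job until it reaches a terminal state or times out.

    fetch_status() returns the job's current status string.
    Assumption: "completed" and "failed" are the terminal values.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"ingest still {status!r} after {timeout} s")
        sleep(interval)
        interval = min(interval * 2, max_interval)  # back off between polls
```

The sleep and clock parameters exist so the loop can be tested without real waiting.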
How-to

Rating answers

Submit up/down feedback with a correlation id so bad answers feed the review queue.

Providers

LLM & embeddings

14 LLM providers via a universal OpenAI-compatible adapter, plus OpenAI / FastEmbed / HuggingFace embeddings.

Operations

Deploy & secure

Compose topology, durable ingestion, scaling, the security model and hardening checklist, observability, and evaluation.

Reference

Configuration

Every environment variable, grouped as in .env.example, with its default and purpose for LLM, retrieval, security, and more.

Reference

API summary

Endpoint list, request headers, and the response/rate-limit headers you should log.

Reference

Troubleshooting

Empty strict answers, 401 on ingest, 422 validation, 429 backoff, and degraded health.

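For the 422 and 429 cases, here is a hedged Python sketch of two client helpers. describe_problem reads an RFC 9457 problem details body using only the standard members (status, title, detail, instance). retry_delay honours a Retry-After header if the server sends one (either delay-seconds or an HTTP-date) and otherwise falls back to capped exponential backoff with full jitter. Whether this API sets Retry-After is an assumption; check the API summary page for the headers it actually returns.

```python
import json
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def describe_problem(body):
    """Summarise an RFC 9457 problem details body (application/problem+json)."""
    try:
        problem = json.loads(body)
    except ValueError:  # also covers JSONDecodeError and bad UTF-8
        return "unreadable error body"
    if not isinstance(problem, dict):
        return "unreadable error body"
    text = f"{problem.get('status', '?')} {problem.get('title', 'Error')}"
    if problem.get("detail"):
        text += f": {problem['detail']}"
    if problem.get("instance"):
        text += f" (instance {problem['instance']})"
    return text


def retry_delay(headers, attempt, base=1.0, cap=30.0):
    """Seconds to wait before retrying after a 429.

    Uses Retry-After when present (delay-seconds or an HTTP-date); otherwise
    full-jitter backoff: uniform(0, min(cap, base * 2**attempt)).
    headers can be a dict or the headers object urllib returns.
    """
    value = (headers.get("Retry-After") or "").strip()
    if value.isdigit():
        return float(value)
    if value:
        try:
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            when = None  # malformed header: fall back to backoff
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```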