Retrieval
How the system decides what context to feed the model — and what to do when nothing is relevant.
Audience: developers tuning answer quality. What you will accomplish: understand the score gate, the retrieval strategies, and query rewriting.
The relevance score gate
Retrieval runs in two steps. Step 1 fetches the single closest chunk and checks its cosine
similarity against RETRIEVAL_SCORE_THRESHOLD (default 0.3). Behavior then depends on
CHAT_MODE:
- strict — if even the best match is below threshold, the question is off-topic: no context is sent to the LLM and a refusal prompt is used.
- open / learning — below-threshold matches are still passed as weak grounding signals; the prompt tells the model to use general knowledge when context is weak and to be honest about provenance (best-available behavior).
ChromaDB is configured with cosine distance (hnsw:space: cosine) so scores are
comparable to a 0..1 threshold.
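The gate logic described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: `GateDecision`, `score_gate`, and `similarity_from_distance` are hypothetical names, treating a score exactly at the threshold as passing is an assumption, and so is the conversion from ChromaDB's cosine distance (which ChromaDB reports as 1 minus cosine similarity) into a similarity score.

```python
from dataclasses import dataclass


@dataclass
class GateDecision:
    send_context: bool    # are retrieved chunks passed to the LLM?
    refuse: bool          # is the refusal prompt used?
    weak_grounding: bool  # should the prompt flag the context as weak?


def similarity_from_distance(distance: float) -> float:
    # Assumption: the score compared to the threshold is 1 - cosine distance.
    return 1.0 - distance


def score_gate(best_score: float, mode: str, threshold: float = 0.3) -> GateDecision:
    """Decide what to do with the best match, given CHAT_MODE (hypothetical helper)."""
    if best_score >= threshold:
        # Gate passes: context is used normally in every mode.
        return GateDecision(send_context=True, refuse=False, weak_grounding=False)
    if mode == "strict":
        # Off-topic: no context is sent and the refusal prompt is used.
        return GateDecision(send_context=False, refuse=True, weak_grounding=False)
    if mode in ("open", "learning"):
        # Below threshold, but still passed along as a weak grounding signal.
        return GateDecision(send_context=True, refuse=False, weak_grounding=True)
    raise ValueError(f"unknown chat mode: {mode!r}")


if __name__ == "__main__":
    score = similarity_from_distance(0.85)  # a distant best match
    print(score_gate(score, "strict"))
    print(score_gate(score, "open"))
```

Keeping the decision in one small function makes it easy to see that the threshold is the same in every mode, and only the handling of a failed gate differs.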
Retrieval strategies (MMR vs hybrid)
Step 2 runs only when the gate passes. The strategy is set by RETRIEVAL_STRATEGY:
| Value | What it does |
|---|---|
| mmr (default) | Maximal Marginal Relevance: fetches candidates and picks chunks that are both relevant and diverse, avoiding near-identical paragraphs. |
| hybrid | Dense vectors + BM25 lexical recall fused via Reciprocal Rank Fusion (RRF). Recovers acronyms, SKUs, and exact phrases that pure dense search misses. |
| hybrid_rerank | Hybrid candidates passed through a reranker integration point (default passthrough until a concrete reranker is wired in). |
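To make the fusion step in the hybrid strategy concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists of chunk IDs. The function name `reciprocal_rank_fusion` and the sample IDs are hypothetical, and the constant `k = 60` is the value commonly used for RRF, not one confirmed for this system.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of IDs into one list, best first.

    Each ID earns 1 / (k + rank) from every list it appears in, where rank
    starts at 1. IDs ranked well by several retrievers rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    dense_hits = ["a", "b", "c"]  # order from vector search (illustrative)
    bm25_hits = ["c", "a", "d"]   # order from lexical search (illustrative)
    print(reciprocal_rank_fusion([dense_hits, bm25_hits]))
    # prints ['a', 'c', 'b', 'd']
```

Because RRF uses only ranks, it avoids having to normalize BM25 scores and cosine similarities onto a common scale, which is one reason it is a common choice for combining dense and lexical results.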
Context-aware query rewriting (#1)
When QUERY_REWRITE_ENABLED=true (default), the condense_query node rewrites a
multi-turn follow-up into a standalone search query using the rolling summary plus recent
turns — before retrieval. This fixes the classic failure where “what about overseas?”
retrieves nothing because it has no subject on its own.
Two important properties:
- It is skipped on the first turn (there is nothing to condense).
- Only retrieval uses the rewritten query — generation still uses the original question, so the answer stays faithful to what the user actually asked.
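The two properties above can be sketched as a small function that returns separate queries for retrieval and generation. This is an illustration under stated assumptions: `plan_queries`, `Queries`, and the `rewrite` callable are hypothetical stand-ins (in practice the rewrite would be an LLM call inside the condense_query node), and detecting the first turn by an empty summary and empty history is also an assumption.

```python
from typing import Callable, List, NamedTuple


class Queries(NamedTuple):
    retrieval_query: str      # what the retriever searches for
    generation_question: str  # what the LLM is asked to answer


def plan_queries(
    question: str,
    summary: str,
    recent_turns: List[str],
    rewrite: Callable[[str, List[str], str], str],
    enabled: bool = True,
) -> Queries:
    """Pick the retrieval query; generation always keeps the original question."""
    is_first_turn = not summary and not recent_turns
    if not enabled or is_first_turn:
        # Nothing to condense, or rewriting is switched off.
        return Queries(question, question)
    standalone = rewrite(summary, recent_turns, question)
    # Fall back to the original question if the rewriter returns nothing.
    return Queries(standalone or question, question)


if __name__ == "__main__":
    # Stand-in for an LLM-backed rewriter (hypothetical).
    def fake_rewrite(summary: str, turns: List[str], question: str) -> str:
        return "shipping costs for overseas orders"

    result = plan_queries(
        "what about overseas?",
        "User asked about domestic shipping costs.",
        ["How much is shipping?", "Domestic shipping is a flat rate."],
        fake_rewrite,
    )
    print(result.retrieval_query)      # the standalone search query
    print(result.generation_question)  # the user's original wording
```

Returning both strings from one place makes it hard to pass the rewritten query to the generation step by accident.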
Verify your result
- Verify: You can explain why strict mode returns empty sources on an off-topic question.
- Verify: You know RETRIEVAL_STRATEGY accepts mmr | hybrid | hybrid_rerank and that mmr is the default.
- Verify: You understand that query rewriting affects retrieval only, not the generated answer.
Common failure modes
Related next steps
- See how a passing retrieval is then checked in Groundedness.
- Compare per-mode gate behavior in Chat modes.
- Tune the knobs in Configuration.