Embeddings

Embeddings turn document chunks and questions into vectors for retrieval. Three providers are supported; select one with the EMBEDDING_PROVIDER setting.

Audience: operators tuning retrieval quality, cost, and locality.

What you will accomplish: choose an embedding provider/model and avoid the re-ingest trap (changing the model without re-embedding the documents already ingested).

The three providers

  • OpenAI (default) — no extra dependencies. text-embedding-3-small / text-embedding-3-large. Best when you don’t want to manage local inference.
  • FastEmbed (recommended for local) — ONNX Runtime, no torch dependency. ~50MB download vs ~2GB for torch-based alternatives, zero CVEs (pure Python + ONNX). Any FastEmbed-compatible model works; unknown models trigger a warning but still attempt to load.
  • HuggingFace (optional) — torch-based (sentence-transformers/transformers). Only install if you need a HuggingFace-specific model unavailable in FastEmbed.
# FastEmbed is already in requirements.txt:
#   EMBEDDING_PROVIDER=fastembed
#   EMBEDDING_MODEL=BAAI/bge-small-en-v1.5
 
# HuggingFace (extra deps, pulls torch):
pip install langchain-huggingface sentence-transformers transformers numpy
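
As a rough illustration of how the two settings might be resolved at startup, here is a minimal sketch. The function names, the per-provider default models, and the LangChain wrapper classes used for OpenAI and FastEmbed are assumptions made for this example, not the project's confirmed implementation. Backends are imported lazily so that an unused provider (for example torch-based HuggingFace) never has to be installed.

```python
import os

# Assumption: OpenAI is the default provider (as stated above); the default
# model chosen for each provider is hypothetical and only for illustration.
DEFAULT_MODELS = {
    "openai": "text-embedding-3-small",
    "fastembed": "BAAI/bge-small-en-v1.5",
    "huggingface": "sentence-transformers/all-MiniLM-L6-v2",
}


def resolve_embedding_config(env=None):
    """Return (provider, model) read from EMBEDDING_PROVIDER / EMBEDDING_MODEL."""
    env = os.environ if env is None else env
    provider = env.get("EMBEDDING_PROVIDER", "openai").strip().lower()
    if provider not in DEFAULT_MODELS:
        raise ValueError(f"Unsupported EMBEDDING_PROVIDER: {provider!r}")
    # Fall back to the provider's default when no model is set.
    model = env.get("EMBEDDING_MODEL") or DEFAULT_MODELS[provider]
    return provider, model


def build_embedder(provider, model):
    """Create the embedder; each backend is imported only when selected."""
    if provider == "openai":
        from langchain_openai import OpenAIEmbeddings
        return OpenAIEmbeddings(model=model)
    if provider == "fastembed":
        from langchain_community.embeddings import FastEmbedEmbeddings
        return FastEmbedEmbeddings(model_name=model)
    from langchain_huggingface import HuggingFaceEmbeddings
    return HuggingFaceEmbeddings(model_name=model)
```

Calling resolve_embedding_config() with no argument reads the process environment; passing a plain dict makes the logic easy to test without touching real settings.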

FastEmbed model registry

| Model | Provider | Dimensions | Download | Context (tokens) | Best For |
|---|---|---|---|---|---|
| text-embedding-3-small | OpenAI | 1536 | API-only | 8191 | Default, production reliability |
| text-embedding-3-large | OpenAI | 3072 | API-only | 8191 | Maximum accuracy, higher cost |
| BAAI/bge-small-en-v1.5 | FastEmbed | 384 | ~50MB | 512 | Prototyping, small datasets, low memory |
| BAAI/bge-base-en-v1.5 | FastEmbed | 768 | ~120MB | 512 | Balanced speed/quality (recommended) |
| BAAI/bge-large-en-v1.5 | FastEmbed | 1024 | ~430MB | 512 | Highest local quality, slower inference |
| sentence-transformers/all-MiniLM-L6-v2 | FastEmbed | 384 | ~30MB | 256 | Fast semantic search, versatile |
| sentence-transformers/all-MiniLM-L12-v2 | FastEmbed | 384 | ~60MB | 256 | Slightly better quality than L6 |
| BAAI/bge-m3 | FastEmbed | 1024 | ~570MB | 8192 | Arabic/English mixed content, multilingual |
| nomic-ai/nomic-embed-text-v1.5 | FastEmbed | 768 | ~130MB | 8192 | Long documents (>256 tokens) |
| sentence-transformers/all-MiniLM-L6-v2 | HuggingFace | 384 | ~2GB+ | 256 | Same model, torch-based (AVOID if FastEmbed works) |
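
The registry can be thought of as a simple lookup from model name to vector size and context length. The sketch below copies those two columns from the table and mirrors the documented behavior that unknown models produce a warning rather than an error. The names MODEL_REGISTRY, lookup_model, and dimension_mismatch are hypothetical, and the real registry may be structured differently.

```python
import warnings

# model name -> (dimensions, context length in tokens), copied from the table.
MODEL_REGISTRY = {
    "text-embedding-3-small": (1536, 8191),
    "text-embedding-3-large": (3072, 8191),
    "BAAI/bge-small-en-v1.5": (384, 512),
    "BAAI/bge-base-en-v1.5": (768, 512),
    "BAAI/bge-large-en-v1.5": (1024, 512),
    "sentence-transformers/all-MiniLM-L6-v2": (384, 256),
    "sentence-transformers/all-MiniLM-L12-v2": (384, 256),
    "BAAI/bge-m3": (1024, 8192),
    "nomic-ai/nomic-embed-text-v1.5": (768, 8192),
}


def lookup_model(model):
    """Return (dimensions, context) or None; unknown models only warn."""
    info = MODEL_REGISTRY.get(model)
    if info is None:
        warnings.warn(f"{model!r} is not in the registry; attempting to load anyway")
    return info


def dimension_mismatch(model, stored_dim):
    """True when vectors already in the index cannot have come from `model`."""
    info = lookup_model(model)
    return info is not None and info[0] != stored_dim
```

A mismatch here is a sure sign that the index was built with a different model. The reverse does not hold: several models share a dimension (for example 384), so equal dimensions do not prove the model is unchanged.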

Verify your result

  • Verify: You set EMBEDDING_PROVIDER and a matching EMBEDDING_MODEL.
  • Verify: You prefer FastEmbed for local/zero-CVE embeddings over HuggingFace+torch.
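  • Verify: After changing the model, you re-ingested every document. Vectors produced by different models are not comparable, and their dimensions may differ, so old and new vectors cannot share an index.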
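
One way to make the last check automatic is to record which provider and model built the index and compare that record on startup. The sketch below is only an illustration under assumptions: the marker file, its JSON layout, and both function names are hypothetical and not part of the documented tooling. A name-based marker also catches switches that a dimension check would miss, such as moving between BAAI/bge-small-en-v1.5 and sentence-transformers/all-MiniLM-L6-v2, which are both 384-dimensional.

```python
import json
from pathlib import Path


def needs_reingest(marker_path, provider, model):
    """True if the index was built with another provider/model, or never built."""
    path = Path(marker_path)
    if not path.exists():
        return True
    recorded = json.loads(path.read_text(encoding="utf-8"))
    return recorded != {"provider": provider, "model": model}


def record_ingest(marker_path, provider, model):
    """Write the marker after a full re-ingest has completed."""
    Path(marker_path).write_text(
        json.dumps({"provider": provider, "model": model}), encoding="utf-8"
    )
```

Call record_ingest only after every document has been re-embedded; writing the marker earlier would hide a partially rebuilt index.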

Common failure modes