zvec-db Documentation

Welcome to the zvec-db documentation!


zvec-db is a utility suite for sparse vectorization and document reranking, designed to work with zvec.

Quick Start

Sparse Embedding

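from zvec_db.embedders import BM25Embedder

# Example corpus (assumed here to be a list of raw text strings)
documents = [
    "sparse vectors for keyword search",
    "dense vectors for semantic search",
]

# Training: fit the embedder on the corpus
embedder = BM25Embedder(max_features=4096)
embedder.fit(documents)

# Embedding: map a query to a sparse {index: weight} vector
vector = embedder.embed("search query")
print(vector)  # e.g. {42: 0.523, 108: 0.312, ...}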
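
To make the sparse `{index: weight}` output format concrete, here is a minimal pure-Python sketch of BM25-style term weighting. It is an illustration only and does not reflect how `BM25Embedder` is implemented: the helper names (`build_vocab`, `idf_table`, `bm25_vector`), the whitespace tokenizer, and the defaults `k1=1.5` and `b=0.75` are all assumptions made for this example.

```python
import math
from collections import Counter


def build_vocab(docs, max_features=4096):
    """Map the most frequent terms to integer indices (hypothetical helper)."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    return {term: i for i, (term, _) in enumerate(counts.most_common(max_features))}


def idf_table(docs, vocab):
    """Smoothed BM25 inverse document frequency for every vocabulary term."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        for tok in set(doc.lower().split()):
            if tok in vocab:
                df[tok] += 1
    return {t: math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5)) for t in df}


def bm25_vector(text, vocab, idf, avgdl, k1=1.5, b=0.75):
    """Return a sparse {term_index: weight} dict for one text."""
    toks = text.lower().split()
    tf = Counter(t for t in toks if t in vocab)  # out-of-vocabulary terms are dropped
    norm = k1 * (1 - b + b * len(toks) / avgdl)  # length normalization
    return {vocab[t]: idf[t] * f * (k1 + 1) / (f + norm) for t, f in tf.items()}


docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "birds sing at dawn",
]
vocab = build_vocab(docs)
idf = idf_table(docs, vocab)
avgdl = sum(len(d.split()) for d in docs) / len(docs)

vector = bm25_vector("cat birds", vocab, idf, avgdl)
print(vector)
```

In this sketch the rarer term ("birds", found in one document) receives a higher weight than the more common one ("cat", found in two), which is the behavior BM25 weighting is meant to produce.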

Reranking

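from zvec_db.rerankers import RrfReranker
from zvec.model.doc import Doc

# bm25_docs and dense_docs are assumed to be ranked lists of Doc
# objects, one list per retriever, produced by earlier queries.
reranker = RrfReranker(topn=10)
results = reranker.rerank({
    "bm25": bm25_docs,
    "dense": dense_docs,
})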
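
For readers unfamiliar with reciprocal rank fusion (RRF), the following standalone sketch shows the general technique on plain lists of document IDs. It is not the `RrfReranker` implementation: the function name `rrf_fuse` is hypothetical, and the constant `k=60` is the value conventionally used for RRF, assumed here rather than taken from zvec-db.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, topn=10):
    """Fuse several ranked lists of IDs with reciprocal rank fusion.

    rankings: dict mapping a source name to a list of IDs, best first.
    Returns up to topn (doc_id, score) pairs, highest score first.
    """
    scores = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            # Each source contributes 1 / (k + rank) for every ID it returns.
            scores[doc_id] += 1.0 / (k + rank)
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ordered[:topn]


fused = rrf_fuse({
    "bm25": ["a", "b", "c"],
    "dense": ["b", "c", "d"],
})
print([doc_id for doc_id, _ in fused])  # ['b', 'c', 'a', 'd']
```

Documents returned by both sources ("b" and "c") rise above those returned by only one, because RRF sums contributions across lists and depends only on rank positions, not on the raw scores of each retriever.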

Features

  • 6 Sparse Embedders: Count, BM25, BM25L, BM25+, DisMax, TF-IDF

  • 3 Rerankers: RRF, Weighted, MultiField

  • Normalization: Standard and Bayesian

  • zvec-compatible: Sparse vector formats compatible with zvec

  • Tests: 100+ tests with ~95% coverage

Note

For more examples and guides, see the Installation, Sparse and Dense Embedding, and Reranking sections.
