zvec_db.rerankers

Rerankers for post-processing and combining search results.

class zvec_db.rerankers.RerankFunction(topn=10, rerank_field=None, schema=None, metrics=None)[source]

Abstract base class for reranking search results.

Rerankers refine the output of one or more vector queries by applying a secondary scoring strategy. They are used in the query() method of Collection via the reranker parameter.

Parameters:
  • topn (int, optional) – Number of top documents to return after reranking. Defaults to 10.

  • rerank_field (str | None, optional) – Field name used as input for reranking (e.g., document title or body). Defaults to None.

  • schema (CollectionSchema | None, optional) – Collection schema to automatically extract metrics from. If provided and no explicit metrics are given, metric types are inferred from the schema. Defaults to None.

  • metrics (str | MetricType | dict[str, str | MetricType | None] | None, optional) –

    Metric type(s) for converting distances to similarities. Can be:

    • A single MetricType (e.g., MetricType.COSINE) applied to all sources

    • A dict mapping source names to their metric type (use None or MetricType.IP for sources that don’t need conversion, e.g., BM25 scores)

    • If None and schema is provided, metrics are inferred from the schema (defaults to IP if not specified)

    • If None and no schema, defaults to IP (no conversion needed)

    Defaults to None.

Note

Subclasses must implement the rerank() method.

__init__(topn=10, rerank_field=None, schema=None, metrics=None)[source]
Parameters:
  • topn (int)

  • rerank_field (str | None)

  • schema (CollectionSchema | None)

  • metrics (str | MetricType | dict[str, str | MetricType | None] | None)

property topn: int

Number of top documents to return after reranking.

Type:

int

property rerank_field: str | None

Field name used as reranking input.

Type:

str | None

property schema: CollectionSchema | None

The collection schema if provided.

Type:

CollectionSchema | None

property metrics: dict[str, str | MetricType | None]

Per-source metric types.

Type:

dict[str, str | MetricType | None]

abstractmethod rerank(query_results, query=None)[source]

Rerank documents from one or more vector queries.

Parameters:
  • query_results (dict[str, list[Doc]]) – Mapping from vector field name to list of retrieved documents (sorted by relevance).

  • query (str | None, optional) – The search query. Some rerankers may require this (e.g., CrossEncoder). Defaults to None.

Returns:

Reranked list of documents (length <= topn), with updated score fields.

Return type:

list[Doc]
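To make the abstract contract concrete, here is a minimal sketch of a custom reranker. It only mirrors the interface described above (dict mapping source name to ranked Doc lists in, list of at most topn Docs out); a real implementation would subclass zvec_db.rerankers.RerankFunction, and the Doc stand-in and MaxScoreReranker name are illustrative assumptions, not part of the library.

```python
from __future__ import annotations

from dataclasses import dataclass


# Stand-in for zvec_db's Doc; the real class has at least `id` and `score`
# (an assumption based on the docs above).
@dataclass
class Doc:
    id: str
    score: float


class MaxScoreReranker:
    """Toy reranker: keep each document's best score across sources."""

    def __init__(self, topn: int = 10):
        self.topn = topn

    def rerank(self, query_results: dict[str, list[Doc]],
               query: str | None = None) -> list[Doc]:
        # Deduplicate by id, keeping the highest score seen in any source.
        best: dict[str, Doc] = {}
        for docs in query_results.values():
            for doc in docs:
                if doc.id not in best or doc.score > best[doc.id].score:
                    best[doc.id] = doc
        # Return at most topn documents, sorted by descending score.
        return sorted(best.values(), key=lambda d: d.score, reverse=True)[: self.topn]


docs = MaxScoreReranker(topn=2).rerank({
    "bm25": [Doc("a", 0.9), Doc("b", 0.4)],
    "dense": [Doc("b", 0.8), Doc("c", 0.7)],
})
```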

class zvec_db.rerankers.FusionRerankerBase(topn=10, rerank_field=None, schema=None, metrics=None)[source]

Base class for fusion-based rerankers combining multiple sources.

This class provides shared functionality for rerankers that fuse scores from multiple retrieval sources, including metric conversion and normalization.

Conversion formulas (ensure higher=better):

  • COSINE: (2 - score) / 2 – distance [0, 2] -> similarity [0, 1]

  • L2: -score – inverts order

  • IP: no conversion – already “higher=better” (also for BM25/non-vector scores)

Normalization:

  • COSINE: NEVER normalized (conversion already produces [0, 1])

  • Others: optional normalization (bayes, minmax, percentile, atan, etc.)
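The conversion formulas above can be written out directly; this is an illustrative sketch of the stated formulas, not the library's internal function.

```python
def convert(score: float, metric: str) -> float:
    """Map a raw score so that higher always means more similar."""
    if metric == "cosine":   # cosine distance in [0, 2] -> similarity in [0, 1]
        return (2 - score) / 2
    if metric == "l2":       # L2 distance: negate to invert the ordering
        return -score
    return score             # "ip" (and BM25 scores): already higher=better


assert convert(0.0, "cosine") == 1.0          # identical vectors
assert convert(2.0, "cosine") == 0.0          # opposite vectors
assert convert(3.0, "l2") < convert(1.0, "l2")  # smaller distance ranks higher
```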

Parameters:
  • topn (int)

  • rerank_field (str | None)

  • schema (CollectionSchema | None)

  • metrics (str | MetricType | dict[str, str | MetricType | None] | None)

class zvec_db.rerankers.RrfReranker(topn=10, rerank_field=None, rank_constant=60, weights=None, normalize=None, metrics=None, schema=None)[source]

Reciprocal Rank Fusion (RRF) reranker with optional source weighting.

RRF combines results from multiple ranked lists by computing a fused score based on the reciprocal of each document’s rank:

\[\text{RRF}(d) = \sum_{r \in R} w_r \times \frac{1}{k + \text{rank}(d, r)}\]
where:
  • \(k\) is the rank_constant (default: 60)

  • \(w_r\) is the weight for source \(r\) (default: 1.0)

By default, all sources have equal weight. Use the weights parameter to favor certain sources over others.
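The RRF formula above can be computed by hand for a small example; this sketch uses 1-based ranks and the same k and weight conventions as the formula (the function name is illustrative).

```python
from __future__ import annotations


def rrf(sources: dict[str, list[str]], k: int = 60,
        weights: dict[str, float] | None = None) -> dict[str, float]:
    """Fuse ranked lists of document ids via Reciprocal Rank Fusion."""
    weights = weights or {}
    scores: dict[str, float] = {}
    for name, ranking in sources.items():
        w = weights.get(name, 1.0)           # unlisted sources use weight 1.0
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + w / (k + rank)
    return scores


scores = rrf({"bm25": ["a", "b"], "dense": ["b", "a"]})
# "a" and "b" each appear once at rank 1 and once at rank 2,
# so both get 1/61 + 1/62 and tie.
```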

Parameters:
  • topn (int, optional) – Number of top documents to return. Defaults to 10.

  • rerank_field (Optional[str], optional) – Ignored by RRF. Defaults to None.

  • rank_constant (int, optional) – Smoothing constant \(k\) in the RRF formula. Larger values reduce the impact of early ranks. Defaults to 60.

  • weights (Optional[dict[str, float]], optional) – Weight per source. Sources not listed use weight 1.0. Defaults to None (equal weights).

  • normalize (Optional[Union[bool, str, dict]], optional) – Ignored for RRF. RRF uses ranks, not scores, so normalization has no effect. Setting this parameter will emit a warning. Defaults to None.

  • metrics (Optional[Union[MetricType, dict[str, Union[str, MetricType, None]]]])

  • schema (Optional['CollectionSchema'])

Example

>>> # Basic RRF with default parameters
>>> reranker = RrfReranker(topn=10)
>>> results = reranker.rerank({"bm25": bm25_docs, "dense": dense_docs})
>>> # Weighted RRF: favor dense embeddings (70%) over BM25 (30%)
>>> reranker = RrfReranker(
...     topn=10,
...     weights={"dense": 0.7, "bm25": 0.3}
... )
>>> results = reranker.rerank({"bm25": bm25_docs, "dense": dense_docs})
>>> # Custom rank constant (higher = more uniform ranking)
>>> reranker = RrfReranker(topn=10, rank_constant=100)
>>> results = reranker.rerank({"bm25": bm25_docs, "dense": dense_docs})

Note

RRF uses only document ranks, not raw scores. This makes it robust to score scale differences between sources (e.g., BM25 scores vs. cosine similarities). Normalization is not applicable to RRF.

See also

WeightedReranker: For weighted fusion based on scores rather than ranks.

__init__(topn=10, rerank_field=None, rank_constant=60, weights=None, normalize=None, metrics=None, schema=None)[source]
Parameters:
  • topn (int)

  • rerank_field (Optional[str])

  • rank_constant (int)

  • weights (Optional[dict[str, float]])

  • normalize (Optional[Union[bool, str, dict]])

  • metrics (Optional[Union[MetricType, dict[str, Union[str, MetricType, None]]]])

  • schema (Optional['CollectionSchema'])

property rank_constant: int
property weights: dict[str, float]
property normalize: bool | str | dict | None
rerank(query_results, query=None)[source]

Apply Reciprocal Rank Fusion to combine multiple query results.

Parameters:
  • query_results (dict[str, list[Doc]]) – Results from one or more vector queries. Keys are source names (e.g., “bm25”, “dense”), values are ranked document lists.

  • query (Optional[str], optional) – Ignored. Defaults to None.

Returns:

Reranked documents with RRF scores in the score field, sorted by descending score.

Return type:

list[Doc]

Example

>>> reranker = RrfReranker(topn=5)
>>> results = reranker.rerank({
...     "bm25": bm25_results,
...     "dense": dense_results
... })
>>> print(f"Top document: {results[0].id} (score: {results[0].score:.4f})")
class zvec_db.rerankers.WeightedReranker(topn=10, rerank_field=None, weights=None, normalize=True, metrics=<object object>, schema=None)[source]

Weighted fusion with optional normalization and metric conversion.

This class combines scores from multiple sources using weighted sum:

\[\text{score}(d) = \sum_{s \in S} \text{norm}(\text{score}_s(d)) \times w_s\]

where \(w_s\) is the weight for source \(s\).
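As a hand-worked illustration of the weighted-sum formula, assuming scores have already been converted and normalized to “higher=better” (the function name is illustrative, not the library's):

```python
def weighted_fusion(per_source: dict[str, dict[str, float]],
                    weights: dict[str, float]) -> dict[str, float]:
    """Sum each document's normalized per-source scores, weighted by source."""
    fused: dict[str, float] = {}
    for source, doc_scores in per_source.items():
        w = weights.get(source, 1.0)   # unlisted sources use weight 1.0
        for doc_id, s in doc_scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + w * s
    return fused


fused = weighted_fusion(
    {"bm25": {"a": 0.9, "b": 0.2}, "dense": {"a": 0.1, "b": 0.8}},
    weights={"bm25": 0.7, "dense": 0.3},
)
# a: 0.7*0.9 + 0.3*0.1 = 0.66 ; b: 0.7*0.2 + 0.3*0.8 = 0.38
```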

Features:

  • Optional distance->similarity conversion (COSINE, L2, IP)

  • Optional normalization per source (bayes, minmax, percentile)

  • Smart defaults: COSINE -> no additional normalization, others -> bayes

Distance to similarity conversion:

  • COSINE: (2 - score) / 2 – distance [0, 2] -> similarity [0, 1]

  • L2: -score – inverts order

  • IP: no conversion (already similarity, including BM25 scores)

Note

COSINE metric is NEVER additionally normalized - the conversion formula (2 - score) / 2 already produces scores in [0, 1]. Setting normalize for COSINE sources has no effect.

Normalization methods (applied AFTER conversion, except for COSINE; all assume scores already converted to “higher=better”):

  • bayes (default for non-COSINE): Bayesian sigmoid calibration

  • minmax: (x - min) / (max - min)

  • percentile: rank-based normalization

  • default: index-aware scaling with avgscore

  • atan: arctan-based normalization, 0.5 + atan(s)/pi
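Two of the simpler methods listed above can be sketched from their stated formulas; the library's exact implementations (in particular “bayes” and “default”) are not reproduced here.

```python
import math


def minmax(xs: list[float]) -> list[float]:
    """(x - min) / (max - min), mapping a score list onto [0, 1]."""
    lo, hi = min(xs), max(xs)
    if hi == lo:                 # degenerate case: all scores equal
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def atan_norm(x: float) -> float:
    """0.5 + atan(s)/pi maps any real score into (0, 1)."""
    return 0.5 + math.atan(x) / math.pi


assert minmax([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]
assert atan_norm(0.0) == 0.5
assert 0.0 < atan_norm(-100.0) < atan_norm(100.0) < 1.0
```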

Parameters:
  • topn (int, optional) – Number of top documents to return. Defaults to 10.

  • rerank_field (Optional[str], optional) – Ignored. Defaults to None.

  • weights (Optional[dict[str, float]], optional) – Weight per source. Sources not listed use weight 1.0. Defaults to None (equal weights).

  • normalize (Union[bool, str, dict[str, Any], None], optional) –

    Normalization configuration. Can be:

    • True (default): Smart default – COSINE -> no norm, others -> “bayes”

    • str: Method name (“bayes”, “minmax”, “percentile”, “default”, “atan”)

    • dict: Per-source config, e.g., {“sparse”: “bayes”, “dense”: None}

    • None or False: No normalization (raw scores after conversion)

  • metrics (Optional[Union[MetricType, dict[str, MetricType]]], optional) –

    Metric type(s) for converting distances to similarities. Can be:

    • A single MetricType (e.g., MetricType.COSINE) applied to all sources

    • A dict mapping source names to their metric type (use MetricType.IP for sources that don’t need conversion, e.g., BM25 scores)

    • If None and schema is provided, metrics are inferred from the schema

  • schema (Optional[CollectionSchema], optional) – Collection schema to automatically extract metrics from. If provided and metrics is None, metrics are inferred from the schema (defaults to IP).

Raises:

ValueError – If neither metrics nor schema is provided.

Example

>>> # Already normalized scores [0, 1]
>>> reranker = WeightedReranker(
...     weights={"bm25": 0.7, "dense": 0.3}
... )
>>> results = reranker.rerank({
...     "bm25": bm25_docs_normalized,
...     "dense": dense_docs_normalized
... })
>>> # Raw scores with smart default normalization
>>> reranker = WeightedReranker(
...     weights={"bm25": 0.7, "dense": 0.3},
...     normalize=True  # COSINE -> /2, others -> bayes
... )
>>> results = reranker.rerank({"bm25": bm25_docs, "dense": dense_docs})
>>> # Per-source normalization config
>>> reranker = WeightedReranker(
...     weights={"bm25": 0.7, "dense": 0.3},
...     normalize={"bm25": "bayes", "dense": "cosine"}  # cosine = no-op
... )
>>> # No normalization (raw scores after conversion only)
>>> reranker = WeightedReranker(
...     metrics={"bm25": MetricType.IP},
...     normalize=None
... )
>>> # Schema auto-detection (recommended with zvec)
>>> import zvec
>>> collection = zvec.open("./my_collection")
>>> reranker = WeightedReranker(
...     schema=collection.schema,
...     weights={"dense": 0.7, "bm25": 0.3},
...     normalize=True
... )

Note

Distance to similarity conversion is applied before normalization:

  • COSINE: (2 - score) / 2 (distance [0, 2] -> similarity [0, 1])

  • L2: -score (inverts order)

  • IP: no conversion (already similarity, including BM25 scores)

See also

RrfReranker: Rank-based fusion (uses ranks, not scores).

__init__(topn=10, rerank_field=None, weights=None, normalize=True, metrics=<object object>, schema=None)[source]

Initialize WeightedReranker.

Parameters:
  • topn (int) – Number of top documents to return.

  • rerank_field (Optional[str]) – Ignored.

  • weights (Optional[dict[str, float]]) – Weight per source. Defaults to equal weights.

  • normalize (Union[bool, str, dict[str, Any], None]) –

    Normalization configuration. Can be:

    • True (default): Smart default – COSINE -> no-op, others -> “bayes”

    • "bayes": Bayesian sigmoid calibration for all sources

    • "minmax": (x - min) / (max - min) for all sources

    • "percentile": Rank-based normalization for all sources

    • "cosine": No-op (identity); COSINE scores are already in [0, 1]

    • "default": Min-max with avgscore

    • dict: Per-source config, e.g., {“sparse”: “bayes”, “dense”: “cosine”}

    • None or False: No normalization (raw scores after conversion)

  • metrics (Optional[Union[MetricType, dict[str, Union[str, MetricType, None]]]]) – Metric type(s) for distance-to-similarity conversion. Can be a single MetricType for all sources, or a dict for per-source metrics. If None and schema is provided, metrics are inferred from the schema.

  • schema (Optional[CollectionSchema]) – Collection schema to automatically extract metrics from.

Raises:

ValueError – If neither metrics nor schema is provided.

property weights: dict[str, float]
property normalize: bool | str | dict[str, Any] | None
rerank(query_results, query=None)[source]

Convert scores and compute weighted fusion.

Steps:

  1. Convert metrics to ensure higher=better:

     • COSINE: (2 - score) / 2

     • L2: -score (inverts order)

     • IP: no conversion

  2. Apply normalization per source (COSINE: skipped, others: bayes by default)

  3. Filter out documents with normalized score <= 0

  4. Compute weighted fusion

Parameters:
  • query_results (dict[str, list[Doc]]) – Dictionary mapping source names to lists of documents.

  • query (Optional[str], optional) – Ignored. Defaults to None.

Returns:

Reranked documents with weighted scores.

Return type:

list[Doc]

Note

COSINE scores are NOT additionally normalized after conversion, since (2-score)/2 already produces scores in [0, 1].

class zvec_db.rerankers.MultiFieldWeightedReranker(topn=10, rerank_field=None, weights=None, source_weights=None, field_weights=None, normalize=True, metrics=<object object>, schema=None)[source]

Reranker that combines scores from multiple sources and document fields.

This reranker extends the standard weighted fusion approach by supporting field-level weighting within documents. This is useful when documents have structured fields (e.g., title, content, tags) and you want to weight their contributions differently.

The score fusion is computed as:

\[\text{score}(d) = \sum_{s \in S} w_s \times \sum_{f \in F} w_f \times \text{norm}(\text{score}_{s,f}(d))\]
where:
  • \(w_s\) is the weight for source \(s\)

  • \(w_f\) is the weight for field \(f\)

  • \(\text{norm}\) is the normalization function (Standard or Bayesian)
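The inner field-weighted sum of the formula above can be illustrated for a single source; the field names, scores, and function name below are illustrative assumptions.

```python
def field_weighted_score(fields: dict[str, float],
                         field_weights: dict[str, float]) -> float:
    """Sum per-field scores, weighting each field (unlisted fields -> 1.0)."""
    return sum(field_weights.get(name, 1.0) * score
               for name, score in fields.items())


s = field_weighted_score(
    {"title": 0.9, "body": 0.4, "tags": 0.2},
    {"title": 3.0, "body": 1.0, "tags": 0.5},
)
# 3.0*0.9 + 1.0*0.4 + 0.5*0.2 = 3.2
```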

This is preferred over WeightedReranker when:

  • Documents have structured fields with different importance (title > content).

  • You need fine-grained control over score contributions.

  • Different fields use different scoring scales.

Parameters:
  • topn (int, optional) – Number of top documents to return. Defaults to 10.

  • rerank_field (Optional[str], optional) – Ignored. Defaults to None.

  • source_weights (Optional[dict[str, float]], optional) – Weight per source key. Sources not listed use weight 1.0. Defaults to None (equal weights).

  • field_weights (Optional[dict[str, float]], optional) – Weight per document field. Fields not listed use weight 1.0. The field is retrieved from the doc.fields dictionary. Defaults to None (equal weights for all fields).

  • weights (Optional[dict[str, float]])

  • normalize (Union[bool, str, dict[str, Any], None], optional) – Normalization configuration per source; same options as WeightedReranker. Defaults to True (smart default: COSINE -> no-op, others -> bayes).

  • metrics (Optional[Union[MetricType, dict[str, Union[str, MetricType, None]]]], optional) – Metric type(s) for converting distances to similarities: MetricType.COSINE for cosine distances [0, 2], MetricType.L2 for L2 distances, MetricType.IP for similarities (inner product, including BM25 scores). Can be a single MetricType for all sources or a dict for per-source metrics. If None and schema is provided, metrics are inferred from the schema.

  • schema (Optional[CollectionSchema], optional) – Collection schema to automatically extract metrics from. If provided and metrics is None, metrics are inferred from the schema.

Note

Field scores are expected to be stored in doc.fields[field_name] as numeric values. If a field is missing or has a non-numeric value, it contributes 0 to the score.

Example

>>> reranker = MultiFieldWeightedReranker(
...     topn=20,
...     source_weights={"bm25": 0.7, "dense": 0.3},
...     field_weights={"title": 3.0, "body": 1.0, "tags": 0.5}
... )
>>> results = reranker.rerank({
...     "bm25": bm25_docs,
...     "dense": dense_docs
... })
__init__(topn=10, rerank_field=None, weights=None, source_weights=None, field_weights=None, normalize=True, metrics=<object object>, schema=None)[source]

Initialize MultiFieldWeightedReranker.

Parameters:
  • topn (int) – Number of top documents to return.

  • rerank_field (Optional[str]) – Ignored.

  • source_weights (Optional[dict[str, float]]) – Weight per source. Defaults to equal weights.

  • field_weights (Optional[dict[str, float]]) – Weight per document field.

  • normalize (Union[bool, str, dict[str, Any], None]) –

    Normalization configuration. Can be:

    • True (default): Smart default – COSINE → no-op, others → “bayes”

    • str: Method name (“bayes”, “minmax”, “percentile”, “cosine”)

    • dict: Per-source config, e.g., {“sparse”: “bayes”, “dense”: “cosine”}

    • None or False: No normalization (raw scores after conversion)

    Note: “cosine” is a no-op (identity) since COSINE scores are already in [0, 1] after conversion.

  • metrics (Optional[Union[MetricType, dict[str, Union[str, MetricType, None]]]]) – Metric type(s) for converting distances to similarities. Can be a single MetricType for all sources, or a dict for per-source metrics. If None and schema is provided, metrics are inferred from the schema. Required if schema is not provided.

  • schema (Optional[CollectionSchema]) – Collection schema to automatically extract metrics from. If provided and metrics is None, metrics are inferred from the schema.

  • weights (Optional[dict[str, float]])

Raises:

ValueError – If neither metrics nor schema is provided.

Example

>>> # Automatic metric detection from collection schema
>>> import zvec
>>> collection = zvec.open("./my_collection")
>>> reranker = MultiFieldWeightedReranker(
...     schema=collection.schema,
...     source_weights={"bm25": 0.6, "dense": 0.4},
...     field_weights={"title": 3.0, "content": 1.0},
...     normalize=True  # Default: bayes for all
... )
rerank(query_results, query=None)[source]

Normalize scores per-source and compute weighted fusion with field weighting.

This method performs the following steps:

  1. Iterates through each source in query_results.

  2. For each document, computes a field-weighted score.

  3. Applies normalization per source (smart default: COSINE → /2, others → bayes).

  4. Filters out documents with a normalized score of 0.0.

  5. Delegates to WeightedReranker for source-weighted fusion.

Parameters:
  • query_results (dict[str, list[Doc]]) – Dictionary mapping source names to lists of documents. Each document should have id, score, and fields with numeric values for field scoring.

  • query (str | None)

Returns:

Reranked documents with weighted normalized scores in the score field, sorted by descending score.

Return type:

list[Doc]

Example

>>> query_results = {
...     "sparse_bm25": bm25_docs,
...     "dense_cosine": dense_docs
... }
>>> reranked = reranker.rerank(query_results)
class zvec_db.rerankers.BaseCrossEncoderReranker(query, topn=10, rerank_field=None, fusion_score_weight=1.0)[source]

Abstract base class for cross-encoder reranking.

This class provides the common infrastructure for cross-encoder scoring. Subclasses must implement the _compute_scores_batch() method to define their scoring strategy.

Parameters:
  • query (str) – Query for reranking. Required.

  • topn (int, optional) – Number of top documents to return after reranking. Defaults to 10.

  • rerank_field (Optional[str], optional) – Document field to use for reranking. If None, uses the entire document content. Defaults to None.

  • fusion_score_weight (float, optional) –

    Weight for blending cross-encoder scores with fusion scores.

    Formula: final_score = cross_encoder_score × weight + fusion_score × (1 - weight)

    • weight = 1.0 → 100% cross-encoder, 0% fusion (pure cross-encoder, default)

    • weight = 0.8 → 80% cross-encoder, 20% fusion

    • weight = 0.5 → 50% cross-encoder, 50% fusion

    • weight = 0.0 → 0% cross-encoder, 100% fusion (pure fusion)

    Defaults to 1.0 (pure cross-encoder score).
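The blending formula above, written out as a small sketch (the function name is illustrative):

```python
def blend(cross_encoder_score: float, fusion_score: float,
          weight: float = 1.0) -> float:
    """final_score = cross_encoder_score * weight + fusion_score * (1 - weight)."""
    return cross_encoder_score * weight + fusion_score * (1 - weight)


assert blend(0.9, 0.3, weight=1.0) == 0.9          # pure cross-encoder
assert blend(0.9, 0.3, weight=0.0) == 0.3          # pure fusion
assert abs(blend(0.9, 0.3, weight=0.5) - 0.6) < 1e-12  # 50/50 blend
```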

Note

  • Subclasses must implement _compute_scores_batch() or _compute_score()

  • Cross-encoder reranking is more accurate but slower than score fusion

  • For large document sets, consider using max_batch_size to limit API calls

__init__(query, topn=10, rerank_field=None, fusion_score_weight=1.0)[source]
Parameters:
  • query (str)

  • topn (int)

  • rerank_field (str | None)

  • fusion_score_weight (float)

property query: str

Default query for reranking.

Type:

str

property fusion_score_weight: float

Weight for blending cross-encoder scores with fusion scores.

Type:

float

rerank(query_results, query=None)[source]

Rerank documents using cross-encoder scoring.

Parameters:
  • query_results (dict[str, list[Doc]]) – Results from one or more vector queries.

  • query (Optional[str], optional) – Query for reranking. Overrides constructor value if provided.

Returns:

Reranked documents with cross-encoder scores.

Return type:

list[Doc]

class zvec_db.rerankers.SentenceTransformerReranker(query, topn=10, model_name='cross-encoder/ms-marco-MiniLM-L-6-v2', device=None, max_length=512, rerank_field=None, batch_size=32, show_progress_bar=False, fusion_score_weight=1.0, model_kwargs=None)[source]

Cross-encoder reranker using Sentence Transformers models locally.

This reranker uses the CrossEncoder class from sentence-transformers to compute relevance scores between query and document pairs. Unlike API-based cross-encoders, this runs entirely locally on CPU or GPU.

SentenceTransformer CrossEncoder models output a single score via sigmoid for binary relevance (relevant/not relevant).

Parameters:
  • query (str) – Query for reranking. Required.

  • topn (int, optional) – Number of top documents to return. Defaults to 10.

  • model_name (str, optional) – CrossEncoder model name from HuggingFace. Examples: “cross-encoder/ms-marco-MiniLM-L-6-v2” (fast, good quality), “cross-encoder/ms-marco-TinyBERT-L-2-v2” (very fast), “cross-encoder/stsb-distilroberta-base” (semantic similarity). Defaults to “cross-encoder/ms-marco-MiniLM-L-6-v2”.

  • device (Optional[str], optional) – Device to run model on. “cpu”, “cuda”, or None for auto-detect. Defaults to None.

  • max_length (Optional[int], optional) – Maximum sequence length. Defaults to 512.

  • rerank_field (Optional[str], optional) – Document field to use for scoring. If None, uses the entire document content. Defaults to None.

  • batch_size (int, optional) – Batch size for inference. Defaults to 32.

  • show_progress_bar (bool, optional) – Show progress bar during inference. Defaults to False.

  • fusion_score_weight (float, optional) –

    Weight for blending cross-encoder scores with fusion scores.

    Formula: final_score = cross_encoder_score × weight + fusion_score × (1 - weight)

    • weight = 1.0 → 100% cross-encoder, 0% fusion (default)

    • weight = 0.8 → 80% cross-encoder, 20% fusion

    • weight = 0.5 → 50% cross-encoder, 50% fusion

    • weight = 0.0 → 0% cross-encoder, 100% fusion

    Defaults to 1.0 (pure cross-encoder score).

  • model_kwargs (Optional[Mapping[str, Any]], optional) –

    Additional keyword arguments passed to the CrossEncoder constructor. Useful for options like:

    • torch_dtype: Model dtype (torch.float16, torch.bfloat16, “auto”)

    • trust_remote_code: Trust remote code from HuggingFace Hub

    • token: HuggingFace API token for private models

    • revision: Model revision to load

    • cache_dir: Custom cache directory

    • local_files_only: Load only local files

    • attn_implementation: Attention implementation (e.g., “flash_attention_2”)

    Defaults to None (no additional kwargs).

Example

>>> from zvec_db.rerankers.cross_encoder import SentenceTransformerReranker
>>>
>>> # Binary relevance reranker
>>> reranker = SentenceTransformerReranker(
...     query="machine learning",
...     model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
...     topn=10,
... )
>>>
>>> results = reranker.rerank({"bm25": bm25_docs})
>>>
>>> # Blended scores: 80% cross-encoder + 20% fusion
>>> reranker = SentenceTransformerReranker(
...     query="machine learning",
...     model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
...     topn=10,
...     fusion_score_weight=0.8,
... )
>>> results = reranker.rerank({"bm25": docs})
>>>
>>> # With model_kwargs for private models
>>> reranker = SentenceTransformerReranker(
...     query="machine learning",
...     model_name="org/private-model",
...     model_kwargs={"token": "hf_..."},
... )
>>> results = reranker.rerank({"bm25": docs})
>>>
>>> # With model_kwargs for dtype (float16 for reduced memory)
>>> import torch
>>> reranker = SentenceTransformerReranker(
...     query="machine learning",
...     model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
...     model_kwargs={"torch_dtype": torch.float16},
... )
>>> results = reranker.rerank({"bm25": docs})

Note

  • Requires the sentence-transformers package

  • Models are downloaded automatically on first use

  • GPU acceleration available if CUDA is installed

  • Models output scores in [0, 1] via sigmoid

See also

OpenAIReranker: API-based cross-encoder with LLM.

__init__(query, topn=10, model_name='cross-encoder/ms-marco-MiniLM-L-6-v2', device=None, max_length=512, rerank_field=None, batch_size=32, show_progress_bar=False, fusion_score_weight=1.0, model_kwargs=None)[source]
Parameters:
  • query (str)

  • topn (int)

  • model_name (str)

  • device (str | None)

  • max_length (int | None)

  • rerank_field (str | None)

  • batch_size (int)

  • show_progress_bar (bool)

  • fusion_score_weight (float)

  • model_kwargs (Mapping[str, Any] | None)

fit(documents)[source]

Initialize the reranker by loading the model.

For Sentence Transformers CrossEncoder, this loads the model. No training is performed as models are pre-trained.

Parameters:

documents (list[str]) – List of documents (not used, for API compatibility).

Returns:

For method chaining.

Return type:

self

property batch_size
property device
property max_length
property model_kwargs
property model_name
property show_progress_bar
class zvec_db.rerankers.ClassificationReranker(query, topn=10, model_name='cross-encoder/ms-marco-MiniLM-L-6-v2', device=None, max_length=512, num_classes=None, rerank_field=None, batch_size=32, show_progress_bar=False, fusion_score_weight=1.0, model_kwargs=None)[source]

Multi-class classification reranker using HuggingFace transformers.

This reranker uses a multi-class classification model from HuggingFace (via the transformers library) and computes the expected value of the class distribution:

\[E[\text{score}] = \frac{\sum_{i} \text{prob}_i \times i}{\text{num\_classes} - 1}\]

The model outputs logits for each class (0, 1, 2, …, num_classes-1). Softmax is applied to obtain probabilities, then the expected value is computed and normalized to [0, 1].
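The expected-value scoring described above can be sketched in plain Python (a numerically stable softmax is used; the function name is illustrative):

```python
import math


def expected_value_score(logits: list[float]) -> float:
    """Softmax over class logits, then the probability-weighted mean
    class index, normalized to [0, 1] by dividing by (num_classes - 1)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]   # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    ev = sum(i * p for i, p in enumerate(probs))
    return ev / (len(logits) - 1)


# Nearly all mass on the top class (index 4 of 0..4) -> score near 1.0:
s = expected_value_score([-10.0, -10.0, -10.0, -10.0, 10.0])
assert 0.99 < s <= 1.0
# Uniform logits give the midpoint of the 0..4 scale -> 0.5:
assert abs(expected_value_score([0.0] * 5) - 0.5) < 1e-12
```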

Parameters:
  • query (str) – Query for reranking. Required.

  • topn (int, optional) – Number of top documents to return. Defaults to 10.

  • model_name (str, optional) – Classification model name from HuggingFace. Should be a model fine-tuned for text classification with multiple labels. Examples: “cross-encoder/ms-marco-MiniLM-L-6-v2” (binary), “nboost/pt-bert-base-uncased-msmarco” (binary), or any model with config.num_labels set. Defaults to “cross-encoder/ms-marco-MiniLM-L-6-v2”.

  • device (Optional[str], optional) – Device to run model on. “cpu”, “cuda”, or None for auto-detect. Defaults to None.

  • max_length (Optional[int], optional) – Maximum sequence length. Defaults to 512.

  • num_classes (Optional[int], optional) – Number of classes for classification. If None, inferred from model.config.num_labels. For binary: 2 (classes 0 and 1); for multi-class: e.g., 5 for a 0-4 relevance scale. Defaults to None (auto-infer).

  • rerank_field (Optional[str], optional) – Document field to use for scoring. If None, uses the entire document content. Defaults to None.

  • batch_size (int, optional) – Batch size for inference. Defaults to 32.

  • show_progress_bar (bool, optional) – Show progress bar during inference. Defaults to False.

  • fusion_score_weight (float, optional) –

    Weight for blending cross-encoder scores with fusion scores.

    Formula: final_score = cross_encoder_score × weight + fusion_score × (1 - weight)

    • weight = 1.0 → 100% cross-encoder, 0% fusion (default)

    • weight = 0.8 → 80% cross-encoder, 20% fusion

    • weight = 0.5 → 50% cross-encoder, 50% fusion

    • weight = 0.0 → 0% cross-encoder, 100% fusion

    Defaults to 1.0 (pure cross-encoder score).

  • model_kwargs (Optional[Mapping[str, Any]], optional) –

    Additional keyword arguments passed to AutoModelForSequenceClassification and AutoTokenizer. Useful for options like:

    • torch_dtype: Model dtype (torch.float16, torch.bfloat16, “auto” for auto-detection)

    • trust_remote_code: Trust remote code from HuggingFace Hub

    • token: HuggingFace API token for private models

    • revision: Model revision to load

    • cache_dir: Custom cache directory

    • local_files_only: Load only local files

    • attn_implementation: Attention implementation (e.g., “flash_attention_2”, “sdpa”)

    • load_in_8bit: Enable 8-bit quantization (requires bitsandbytes)

    • load_in_4bit: Enable 4-bit quantization (requires bitsandbytes)

    • device_map: Device mapping for distributed loading (e.g., “auto”, “balanced”)

    Defaults to None (no additional kwargs).

Example

>>> from zvec_db.rerankers.cross_encoder import ClassificationReranker
>>>
>>> # Binary classification (num_classes inferred from model)
>>> reranker = ClassificationReranker(
...     query="machine learning",
...     model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
...     topn=10,
... )
>>>
>>> # Multi-level relevance with explicit num_classes
>>> reranker = ClassificationReranker(
...     query="machine learning",
...     model_name="your-multi-class-classifier",
...     num_classes=5,
...     topn=10,
... )
>>>
>>> reranker.fit([])  # Load model
>>> results = reranker.rerank({"bm25": docs})
>>>
>>> # With model_kwargs for private models or custom options
>>> reranker = ClassificationReranker(
...     query="machine learning",
...     model_name="org/private-model",
...     model_kwargs={"token": "hf_...", "trust_remote_code": True},
... )
>>> reranker.fit([])
>>> results = reranker.rerank({"bm25": docs})
>>>
>>> # With model_kwargs for dtype (float16 for reduced memory)
>>> import torch
>>> reranker = ClassificationReranker(
...     query="machine learning",
...     model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
...     model_kwargs={"torch_dtype": torch.float16},
... )
>>> reranker.fit([])
>>> results = reranker.rerank({"bm25": docs})
>>>
>>> # With model_kwargs for 8-bit quantization (requires bitsandbytes)
>>> reranker = ClassificationReranker(
...     query="machine learning",
...     model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
...     model_kwargs={"load_in_8bit": True},
... )
>>> reranker.fit([])
>>> results = reranker.rerank({"bm25": docs})

Note

  • Requires the transformers and torch packages

  • Model must be trained/fine-tuned for multi-class text classification

  • num_classes is inferred from model.config.num_labels if not provided

  • GPU acceleration available if CUDA is installed

  • Scores are normalized to [0, 1] via expected value

See also

OpenAIDecoderReranker: API-based classification with LLM logprobs.

__init__(query, topn=10, model_name='cross-encoder/ms-marco-MiniLM-L-6-v2', device=None, max_length=512, num_classes=None, rerank_field=None, batch_size=32, show_progress_bar=False, fusion_score_weight=1.0, model_kwargs=None)[source]
Parameters:
  • query (str)

  • topn (int)

  • model_name (str)

  • device (str | None)

  • max_length (int | None)

  • num_classes (int | None)

  • rerank_field (str | None)

  • batch_size (int)

  • show_progress_bar (bool)

  • fusion_score_weight (float)

  • model_kwargs (Mapping[str, Any] | None)

fit(documents)[source]

Initialize the reranker by loading the model.

Parameters:

documents (list[str]) – List of documents (not used; present for API compatibility).

Returns:

The reranker instance, enabling method chaining.

Return type:

self

property batch_size
property device
property max_length
property model_kwargs
property model_name
property num_classes
property show_progress_bar
class zvec_db.rerankers.OpenAIReranker(query, topn=10, base_url='http://localhost:8000/v1', api_key=None, model='BAAI/bge-reranker-v2-m3', endpoint='rerank', timeout=30.0, rerank_field=None, fusion_score_weight=1.0, truncate_prompt_tokens=None, max_retries=3, initial_delay=1.0, max_delay=60.0, exponential_base=2.0, jitter=0.1, retry_config=None)[source]

Cross-encoder reranker using OpenAI-compatible /rerank or /score endpoints.

Uses vLLM’s native endpoints: /rerank for query-document scoring, /score for text pair similarity. Both return scores in [0, 1].

Parameters:
  • query (str) – Query for reranking. Required.

  • topn (int) – Number of top documents to return. Defaults to 10.

  • base_url (str) – API base URL. Defaults to “http://localhost:8000/v1”.

  • api_key (Optional[str]) – API key. Defaults to None.

  • model (str) – Model identifier. Defaults to “BAAI/bge-reranker-v2-m3”.

  • endpoint (Literal["rerank", "score"]) – Endpoint to use. Defaults to “rerank”.

  • timeout (float) – HTTP timeout in seconds. Defaults to 30.0.

  • rerank_field (Optional[str]) – Document field for scoring. Defaults to None.

  • fusion_score_weight (float) – Weight for cross-encoder vs fusion scores. 1.0 = pure cross-encoder, 0.0 = pure fusion. Defaults to 1.0.

  • truncate_prompt_tokens (Optional[int]) – Max tokens for truncation.

  • max_retries (int, optional) – Maximum number of retry attempts for transient failures. Set to 0 to disable retries. Defaults to 3.

  • initial_delay (float, optional) – Initial delay before first retry in seconds. Defaults to 1.0.

  • max_delay (float, optional) – Maximum delay cap in seconds. Defaults to 60.0.

  • exponential_base (float, optional) – Base for exponential backoff. Defaults to 2.0.

  • jitter (float, optional) – Random jitter factor (0.0-1.0) to avoid thundering herd. Defaults to 0.1.

  • retry_config (Optional[RetryConfig], optional) – Pre-configured retry settings. If provided, overrides individual retry parameters. Defaults to None.

Example

>>> from zvec_db.rerankers.cross_encoder import OpenAIReranker
>>> reranker = OpenAIReranker(
...     query="machine learning",
...     endpoint="rerank",
...     base_url="http://localhost:8000",
... )
>>> results = reranker.rerank({"bm25": docs})
>>> # With custom retry settings for production
>>> reranker = OpenAIReranker(
...     query="machine learning",
...     max_retries=5,
...     initial_delay=2.0,
...     max_delay=120.0,
... )

Note: Requires vLLM with /rerank or /score endpoint enabled.
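The retry parameters describe a standard exponential-backoff-with-jitter schedule. A minimal sketch of the delay computation these parameters imply (illustrative only; the actual RetryConfig implementation may differ):

```python
import random

def backoff_delay(attempt, initial_delay=1.0, exponential_base=2.0,
                  max_delay=60.0, jitter=0.1):
    """Delay before retry `attempt` (0-based): exponential growth,
    capped at max_delay, with a random jitter factor to desynchronize
    concurrent clients (avoids the thundering-herd effect)."""
    delay = min(initial_delay * exponential_base ** attempt, max_delay)
    return delay * (1 + random.uniform(-jitter, jitter))

# With the defaults, base delays grow 1s, 2s, 4s, ... (plus up to 10% jitter)
# and never exceed max_delay.
delays = [backoff_delay(a) for a in range(3)]
```

Raising `initial_delay` and `max_delay`, as in the production example above, stretches this schedule for slower-recovering backends.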

__init__(query, topn=10, base_url='http://localhost:8000/v1', api_key=None, model='BAAI/bge-reranker-v2-m3', endpoint='rerank', timeout=30.0, rerank_field=None, fusion_score_weight=1.0, truncate_prompt_tokens=None, max_retries=3, initial_delay=1.0, max_delay=60.0, exponential_base=2.0, jitter=0.1, retry_config=None)[source]
Parameters:
  • query (str)

  • topn (int)

  • base_url (str)

  • api_key (str | None)

  • model (str)

  • endpoint (Literal['rerank', 'score'])

  • timeout (float)

  • rerank_field (str | None)

  • fusion_score_weight (float)

  • truncate_prompt_tokens (int | None)

  • max_retries (int)

  • initial_delay (float)

  • max_delay (float)

  • exponential_base (float)

  • jitter (float)

  • retry_config (RetryConfig | None)

property api_key
property base_url
property endpoint
property model
property timeout
property truncate_prompt_tokens
class zvec_db.rerankers.OpenAIEncoderReranker(query, topn=10, base_url='http://localhost:8000/v1', api_key=None, model='BAAI/bge-reranker-v2-m3', num_classes=None, timeout=30.0, rerank_field=None, fusion_score_weight=1.0, separator=' ', truncate_prompt_tokens=None)[source]

Cross-encoder reranker using the /classify endpoint for encoder models.

Uses vLLM’s /classify endpoint for encoder models (BERT, RoBERTa). Computes expected value score from class probabilities: E[score] = sum(prob_i * i) / (num_classes - 1)

Parameters:
  • query (str) – Query for reranking. Required.

  • topn (int) – Number of top documents to return. Defaults to 10.

  • base_url (str) – API base URL. Defaults to “http://localhost:8000/v1”.

  • api_key (Optional[str]) – API key. Defaults to None.

  • model (str) – Model identifier. Defaults to “BAAI/bge-reranker-v2-m3”.

  • num_classes (Optional[int]) – Number of classes. Auto-detected if None.

  • timeout (float) – HTTP timeout in seconds. Defaults to 30.0.

  • rerank_field (Optional[str]) – Document field for scoring.

  • fusion_score_weight (float) – Cross-encoder vs fusion weight. Default 1.0.

  • separator (str) – Query-document separator. Defaults to “ “.

  • truncate_prompt_tokens (Optional[int]) – Max tokens for truncation.

Example

>>> from zvec_db.rerankers.cross_encoder import OpenAIEncoderReranker
>>> reranker = OpenAIEncoderReranker(
...     query="machine learning",
...     num_classes=2,
...     base_url="http://localhost:8000",
... )
>>> results = reranker.rerank({"bm25": docs})

Note: Requires vLLM with /classify endpoint enabled.

__init__(query, topn=10, base_url='http://localhost:8000/v1', api_key=None, model='BAAI/bge-reranker-v2-m3', num_classes=None, timeout=30.0, rerank_field=None, fusion_score_weight=1.0, separator=' ', truncate_prompt_tokens=None)[source]
Parameters:
  • query (str)

  • topn (int)

  • base_url (str)

  • api_key (str | None)

  • model (str)

  • num_classes (int | None)

  • timeout (float)

  • rerank_field (str | None)

  • fusion_score_weight (float)

  • separator (str)

  • truncate_prompt_tokens (int | None)

property api_key
property base_url
property model
property num_classes
property separator
property timeout
property truncate_prompt_tokens
class zvec_db.rerankers.OpenAIDecoderReranker(query, topn=10, base_url='http://localhost:8000/v1', api_key=None, model='gpt-4o-mini', num_classes=2, timeout=30.0, max_batch_size=None, rerank_field=None, fusion_score_weight=1.0, concurrency=4)[source]

Cross-encoder reranker using LLM logprobs with structured output.

Uses /chat/completions with logprobs and regex-constrained output. Computes expected value score from log probabilities: E[score] = sum(prob_i * i) / (num_classes - 1)

Parameters:
  • query (str) – Query for reranking. Required.

  • topn (int) – Number of top documents to return. Defaults to 10.

  • base_url (str) – API base URL. Defaults to “http://localhost:8000/v1”.

  • api_key (Optional[str]) – API key. Defaults to None.

  • model (str) – Model identifier. Defaults to “gpt-4o-mini”.

  • num_classes (int) – Number of classes. Defaults to 2.

  • timeout (float) – HTTP timeout in seconds. Defaults to 30.0.

  • max_batch_size (Optional[int]) – Max documents per batch. Default None.

  • rerank_field (Optional[str]) – Document field for scoring.

  • fusion_score_weight (float) – Cross-encoder vs fusion weight. Default 1.0.

  • concurrency (int) – Concurrent API calls. Defaults to 4.

Example

>>> from zvec_db.rerankers.cross_encoder import OpenAIDecoderReranker
>>> reranker = OpenAIDecoderReranker(
...     query="machine learning",
...     num_classes=2,
...     model="gpt-4o-mini",
... )
>>> results = reranker.rerank({"bm25": docs})

Note: Requires model with logprobs support (--enable-logprobs for vLLM).

MAX_CLASSES = 10
__init__(query, topn=10, base_url='http://localhost:8000/v1', api_key=None, model='gpt-4o-mini', num_classes=2, timeout=30.0, max_batch_size=None, rerank_field=None, fusion_score_weight=1.0, concurrency=4)[source]
Parameters:
  • query (str)

  • topn (int)

  • base_url (str)

  • api_key (str | None)

  • model (str)

  • num_classes (int)

  • timeout (float)

  • max_batch_size (int | None)

  • rerank_field (str | None)

  • fusion_score_weight (float)

  • concurrency (int)

property api_key
property base_url
property concurrency
property max_batch_size
property model
property num_classes
property timeout
class zvec_db.rerankers.Normalize(config=None)[source]

Callable normaliser for lists of (uid, score) pairs.

Instances behave like functions: call one with a score list and an optional avgscore to obtain a new list with all scores mapped into the closed unit interval [0, 1]. The precise transformation is determined by the configuration supplied at construction time.

Parameters:

config (Union[bool, str, Dict[str, Any], None])

method

Lowercase string naming the chosen normalisation algorithm.

Type:

str

alpha

Scale parameter used in Bayesian modes.

Type:

float

beta

Centre parameter used in Bayesian modes; None triggers median-based automatic selection.

Type:

Optional[float]

__init__(config=None)[source]

Initialise a Normalize instance.

Parameters:

config (bool, str, dict or None, optional) –

Configuration object that selects the normalisation strategy. The following forms are interpreted:

  • None or False : equivalent to "default" (standard index-aware scaling).

  • any other truthy non-dict value : also selects the default behaviour.

  • str : the string value is converted to lower case and used as the method name. Supported methods:

      - "bayes", "bayesian", "bb25" : Bayesian sigmoid calibration

      - "minmax" : (x - min) / (max - min)

      - "percentile" (alias: "rank") : rank-based normalization

      - "default" : standard index-aware scaling

  • dict : a copy of the dictionary is stored, and may contain the keys method (string), alpha (float) and beta (float or None). Any missing keys will be filled with defaults (alpha defaults to 1.0; beta to None).

Notes

The configuration is shallow-copied to prevent external modification from affecting the normaliser’s internal state.

__call__(scores, avgscore=0.0)[source]

Normalise a list of document scores.

Parameters:
  • scores (ScoreList) – Sequence of (uid, score) pairs, typically produced by a retrieval algorithm. It is assumed that the list is sorted in descending order of score; the method will use the first entry to compute the maximum when performing default scaling.

  • avgscore (float, optional) – Average score computed over the entire corpus. This is only used by the default normalisation strategy. In Bayesian modes the value is ignored entirely.

Returns:

New list where each score has been replaced with a value in [0.0, 1.0] according to the chosen transformation.

Return type:

ScoreList

Notes

Multiple normalisation methods are supported:

  • default – scales scores relative to an estimated maximum and clips values. This keeps the relative ordering intact but bounds the range.

  • bayesian – applies a sigmoid function calibrated using the positive scores only. Negative or zero input scores are mapped to 0.0 unconditionally. Robust to outliers.

  • minmax – (x - min) / (max - min). Preserves relative distances.

  • percentile – rank-based normalization. Very robust to outliers.

  • cosine – no-op (identity). COSINE conversion (2-score)/2 already produces scores in [0, 1], so no additional normalization is needed.

  • atan – arctan-based normalization: 1 - 2*atan(s)/pi for L2, 0.5 + atan(s)/pi for IP. Maps unbounded scores to [0, 1].
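The closed-form methods above are straightforward to express in plain Python. The following are illustrative re-implementations (not the library's internals; `percentile` here uses a simplified rank assignment that ignores tie handling):

```python
import math

def minmax(scores):
    """(x - min) / (max - min); preserves relative distances."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]  # degenerate case: all scores equal
    return [(s - lo) / (hi - lo) for s in scores]

def percentile(scores):
    """Rank-based: each score becomes its rank fraction in [0, 1]."""
    order = sorted(scores)
    n = len(scores)
    return [order.index(s) / (n - 1) for s in scores]

def atan_l2(s):
    """L2 distances: smaller is better, so 1 - 2*atan(s)/pi."""
    return 1 - 2 * math.atan(s) / math.pi

def atan_ip(s):
    """Inner-product scores: 0.5 + atan(s)/pi maps all reals into (0, 1)."""
    return 0.5 + math.atan(s) / math.pi

minmax([10.0, 5.0, 0.0])  # [1.0, 0.5, 0.0]
atan_l2(0.0)              # 1.0 (zero distance => perfect score)
```

Note how percentile discards score magnitudes entirely, which is what makes it robust to outliers, while minmax keeps them but is dominated by the extremes.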

class zvec_db.rerankers.PipelineReranker(rerankers, topn=10, rerank_field=None)[source]

Chain multiple rerankers sequentially.

This reranker applies a list of rerankers in sequence, passing the output of one as the input to the next. This is useful for combining different reranking strategies (e.g., RRF followed by cross-encoder).

Parameters:
  • rerankers (list) – List of rerankers to apply in order.

  • topn (int, optional) – Number of final documents to return. Defaults to 10.

  • rerank_field (Optional[str], optional) – Ignored. Defaults to None.

Example

>>> pipeline = PipelineReranker([
...     RrfReranker(topn=50, rank_constant=60),
...     SentenceTransformerReranker(model_name="ms-marco-MiniLM-L-6-v2", topn=10)
... ])
>>> results = collection.query(..., reranker=pipeline)
__init__(rerankers, topn=10, rerank_field=None)[source]

Initialize PipelineReranker with a list of rerankers.

Parameters:
  • rerankers (list) – List of reranker instances to apply in order. Each reranker must implement the rerank() method.

  • topn (int, optional) – Number of final documents to return. Defaults to 10.

  • rerank_field (Optional[str], optional) – Ignored. Defaults to None.

Example

>>> pipeline = PipelineReranker([
...     RrfReranker(topn=50, rank_constant=60),
...     SentenceTransformerReranker(model_name="ms-marco-MiniLM-L-6-v2", topn=10)
... ])
>>> results = collection.query(..., reranker=pipeline)
rerank(query_results, query=None)[source]

Apply rerankers sequentially.

Parameters:
  • query_results (dict[str, list[Doc]]) – Results from vector queries.

  • query (Optional[str], optional) – The search query. Passed to underlying rerankers. Defaults to None.

Returns:

Final re-ranked documents after all rerankers applied.

Return type:

list[Doc]

zvec_db.rerankers.extract_score(doc)[source]

Extract score from a document, handling various numeric types.

Parameters:

doc (Doc) – Document with a score attribute.

Returns:

Score as a float, or 0.0 if score is None or invalid.

Return type:

float

Example

>>> doc = Doc(id="1", score=0.8)
>>> extract_score(doc)
0.8
>>> doc_no_score = Doc(id="2", score=None)
>>> extract_score(doc_no_score)
0.0
zvec_db.rerankers.extract_field_score(doc, field_name)[source]

Extract score from a specific document field.

Parameters:
  • doc (Doc) – Document with fields attribute.

  • field_name (str) – Name of the field to extract score from.

Returns:

Field score as a float, or 0.0 if field is missing or non-numeric.

Return type:

float

Example

>>> doc = Doc(id="1", fields={"title_score": 0.9, "content_score": 0.7})
>>> extract_field_score(doc, "title_score")
0.9
>>> extract_field_score(doc, "missing_field")
0.0
zvec_db.rerankers.get_document_text(doc, rerank_field=None)[source]

Extract document text for scoring or embedding.

This function attempts to extract text content from a document using the following strategy:

  1. If rerank_field is specified and the document has that field, use it.

  2. Otherwise, try common field names: “content”, “text”, “body”, “passage”.

  3. If no field matches, concatenate all fields.

  4. As a last resort, return the document ID as a string.

Parameters:
  • doc (Doc) – Document to extract text from.

  • rerank_field (Optional[str]) – Specific field name to use. If None, uses the fallback strategy. Defaults to None.

Returns:

Extracted document text.

Return type:

str

Example

>>> doc = Doc(id="1", fields={"content": "Hello world", "title": "Test"})
>>> get_document_text(doc)
'Hello world'
>>> get_document_text(doc, rerank_field="title")
'Test'
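The four-step fallback strategy above can be sketched as follows (a hypothetical re-implementation over a plain fields dict, shown for clarity; the library's actual code may differ):

```python
def get_text(fields, doc_id, rerank_field=None,
             fallbacks=("content", "text", "body", "passage")):
    """Mirror of the documented extraction strategy:
    explicit field -> common field names -> all fields -> doc ID."""
    if rerank_field and rerank_field in fields:      # 1. explicit field
        return str(fields[rerank_field])
    for name in fallbacks:                           # 2. common field names
        if name in fields:
            return str(fields[name])
    if fields:                                       # 3. concatenate everything
        return " ".join(str(v) for v in fields.values())
    return str(doc_id)                               # 4. last resort

get_text({"content": "Hello world", "title": "Test"}, "1")  # 'Hello world'
```

Because step 3 concatenates all fields, documents without a recognized text field still produce usable (if noisy) input for cross-encoder scoring.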

Modules

base

Base classes for reranking in zvec-db.

cross_encoder

Cross-encoder rerankers for accurate pairwise scoring.

fusion

Fusion-based rerankers for combining multiple retrieval results.

utils

Utility classes for reranking operations.