Reranking
Overview
The zvec_db.rerankers sub-package provides algorithms to combine and rerank search results from multiple sources.
Available classes:
- RerankFunction - Abstract base class for all rerankers
- FusionRerankerBase - Base class for fusion-based rerankers
- RrfReranker - Reciprocal Rank Fusion (RRF)
- WeightedReranker - Weighted fusion with normalization and metric conversion
- MultiFieldWeightedReranker - Field-based weighting (title, content, tags)
- Normalize - Score normalization utility
- PipelineReranker - Pipeline to chain multiple rerankers
- BaseCrossEncoderReranker - Base class for Cross-Encoder rerankers
- SentenceTransformerReranker - Local Cross-Encoder with Sentence Transformers
- ClassificationReranker - Cross-Encoder with multi-class classification
- OpenAIReranker - Cross-Encoder via OpenAI-compatible API
- OpenAIEncoderReranker - Encoder-based reranker via API
- OpenAIDecoderReranker - Decoder-based reranker via API (chat completions)
Understanding the ``metrics`` parameter:
The metrics parameter controls how scores are converted from distance to similarity:
- metrics=MetricType.COSINE - Apply cosine conversion (2 - score) / 2 to all sources
- metrics={"dense": COSINE, "bm25": None} - Per-source metrics (None = no conversion)
- metrics=None with schema=... - Auto-detect from collection schema
- metrics=None without schema - No conversion (scores assumed to be similarities)
Default: metrics=MetricType.COSINE for compatibility with zvec/Qdrant cosine distances.
For BM25-only, use metrics=None.
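A minimal sketch of these conversion rules (the MetricType stand-in and helper below are illustrative, not the package's internals):

```python
from enum import Enum

class MetricType(Enum):
    # Minimal stand-in for zvec_db's MetricType, for illustration only.
    COSINE = "cosine"
    L2 = "l2"
    IP = "ip"

def to_similarity(score, metric):
    """Convert a raw score so that higher always means better."""
    if metric == MetricType.COSINE:
        return (2 - score) / 2   # cosine distance [0, 2] -> similarity [0, 1]
    if metric == MetricType.L2:
        return -score            # negate so smaller distances rank higher
    return score                 # IP / None (e.g., BM25): already a similarity

# Per-source metrics, mirroring metrics={"dense": COSINE, "bm25": None}
scores = {"dense": 0.4, "bm25": 12.3}
metrics = {"dense": MetricType.COSINE, "bm25": None}
converted = {src: to_similarity(s, metrics[src]) for src, s in scores.items()}
# "dense" becomes 0.8; "bm25" passes through unchanged
```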
Understanding the ``normalize`` parameter:
The normalize parameter controls score normalization after metric conversion:
- normalize=True (default) - Smart default: COSINE → no-op, others → "bayes"
- normalize="bayes" - Bayesian sigmoid calibration (robust to outliers)
- normalize="minmax" - Min-max scaling: (x - min) / (max - min)
- normalize="percentile" - Rank-based normalization (very robust to outliers)
- normalize="cosine" - No-op (identity); COSINE scores are already in [0, 1]
- normalize={"source1": "bayes", "source2": "cosine"} - Per-source configuration
- normalize=None or False - No normalization (scores after conversion only)
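The scale-based methods above can be sketched as plain functions (illustrative; the "bayes" calibration is omitted because its exact formula is not specified here, and the "percentile" variant shown is one common rank-based scheme):

```python
import math

def minmax(scores):
    """Min-max scaling: (x - min) / (max - min)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def atan_norm(scores):
    """Arctan normalization: 0.5 + atan(s) / pi, maps any real score into (0, 1)."""
    return [0.5 + math.atan(s) / math.pi for s in scores]

def percentile(scores):
    """One common rank-based scheme: a score's fractional rank in [0, 1].
    The library's exact "percentile" formula may differ (e.g., tie handling)."""
    order = sorted(scores)
    n = len(scores)
    return [order.index(s) / (n - 1) if n > 1 else 1.0 for s in scores]

raw = [3.0, 12.0, 7.0]
normed = minmax(raw)   # outliers dominate minmax; percentile ignores magnitudes
```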
Base Classes
RerankFunction
- class zvec_db.rerankers.RerankFunction(topn=10, rerank_field=None, schema=None, metrics=None)[source]
Abstract base class for reranking search results.
Rerankers refine the output of one or more vector queries by applying a secondary scoring strategy. They are used in the query() method of Collection via the reranker parameter.
- Parameters:
topn (int, optional) – Number of top documents to return after reranking. Defaults to 10.
rerank_field (str | None, optional) – Field name used as input for reranking (e.g., document title or body). Defaults to None.
schema (CollectionSchema | None, optional) – Collection schema to automatically extract metrics from. If provided and no explicit metrics are given, metric types are inferred from the schema. Defaults to None.
metrics (str | MetricType | dict[str, str | MetricType | None] | None, optional) –
Metric type(s) for converting distances to similarities. Can be:
- A single MetricType (e.g., MetricType.COSINE) applied to all sources
- A dict mapping source names to their metric type (use None or MetricType.IP for sources that don't need conversion, e.g., BM25 scores)
- If None and schema is provided, metrics are inferred from the schema (defaults to IP if not specified)
- If None and no schema, defaults to IP (no conversion needed)
Defaults to None.
Note
Subclasses must implement the rerank() method.
- property schema: CollectionSchema | None
The collection schema if provided.
- Type:
CollectionSchema | None
FusionRerankerBase
- class zvec_db.rerankers.FusionRerankerBase(topn=10, rerank_field=None, schema=None, metrics=None)[source]
Base class for fusion-based rerankers combining multiple sources.
This class provides shared functionality for rerankers that fuse scores from multiple retrieval sources, including metric conversion and normalization.
Conversion formulas (ensure higher=better):
- COSINE: (2 - score) / 2 - distance [0, 2] -> similarity [0, 1]
- L2: -score - inverts order
- IP: no conversion - already "higher=better" (also for BM25/non-vector scores)
Normalization:
- COSINE: never normalized (conversion already produces [0, 1])
- Others: optional normalization (bayes, minmax, percentile, atan, etc.)
- Parameters:
- __init__(topn=10, rerank_field=None, schema=None, metrics=None)
- abstractmethod rerank(query_results, query=None)
Rerank documents from one or more vector queries.
- Parameters:
- Returns:
- Reranked list of documents (length <= topn), with updated score fields.
- Return type:
list[Doc]
Fusion Rerankers
RrfReranker
- class zvec_db.rerankers.RrfReranker(topn=10, rerank_field=None, rank_constant=60, weights=None, normalize=None, metrics=None, schema=None)[source]
Reciprocal Rank Fusion (RRF) reranker with optional source weighting.
RRF combines results from multiple ranked lists by computing a fused score based on the reciprocal of each document’s rank:
\[\text{RRF}(d) = \sum_{r \in R} w_r \times \frac{1}{k + \text{rank}(d, r)}\]
where:
- \(k\) is the rank_constant (default: 60)
- \(w_r\) is the weight for source \(r\) (default: 1.0)
By default, all sources have equal weight. Use the weights parameter to favor certain sources over others.
- Parameters:
topn (int, optional) – Number of top documents to return. Defaults to 10.
rerank_field (Optional[str], optional) – Ignored by RRF. Defaults to None.
rank_constant (int, optional) – Smoothing constant \(k\) in the RRF formula. Larger values reduce the impact of early ranks. Defaults to 60.
weights (Optional[dict[str, float]], optional) – Weight per source. Sources not listed use weight 1.0. Defaults to None (equal weights).
normalize (Optional[Union[bool, str, dict]], optional) – Ignored for RRF. RRF uses ranks, not scores, so normalization has no effect. Setting this parameter will emit a warning. Defaults to None.
metrics (Optional[Union[MetricType, dict[str, Union[str, MetricType, None]]]])
schema (Optional['CollectionSchema'])
Example
>>> # Basic RRF with default parameters
>>> reranker = RrfReranker(topn=10)
>>> results = reranker.rerank({"bm25": bm25_docs, "dense": dense_docs})

>>> # Weighted RRF: favor dense embeddings (70%) over BM25 (30%)
>>> reranker = RrfReranker(
...     topn=10,
...     weights={"dense": 0.7, "bm25": 0.3}
... )
>>> results = reranker.rerank({"bm25": bm25_docs, "dense": dense_docs})

>>> # Custom rank constant (higher = more uniform ranking)
>>> reranker = RrfReranker(topn=10, rank_constant=100)
>>> results = reranker.rerank({"bm25": bm25_docs, "dense": dense_docs})
Note
RRF uses only document ranks, not raw scores. This makes it robust to score scale differences between sources (e.g., BM25 scores vs. cosine similarities). Normalization is not applicable to RRF.
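The RRF formula above can be sketched as a standalone function over ranked ID lists (illustrative only; the library's rerank() operates on Doc objects, and whether ranks start at 0 or 1 internally is an implementation detail):

```python
def rrf_fuse(query_results, rank_constant=60, weights=None, topn=10):
    """Weighted Reciprocal Rank Fusion over {source: [doc_id, ...]} lists,
    each list sorted best-first. Returns (doc_id, fused_score) pairs."""
    weights = weights or {}
    fused = {}
    for source, ranked_ids in query_results.items():
        w = weights.get(source, 1.0)
        for rank, doc_id in enumerate(ranked_ids, start=1):
            # Each source contributes w / (k + rank) for this document.
            fused[doc_id] = fused.get(doc_id, 0.0) + w / (rank_constant + rank)
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:topn]

# Doc "a" ranks first in both lists, so it wins regardless of raw scores.
fused = rrf_fuse({"bm25": ["a", "b", "c"], "dense": ["a", "c", "b"]})
```

Note how only positions matter: the raw BM25 and dense scores never enter the computation, which is exactly why RRF needs no normalization.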
See also
WeightedReranker: For weighted fusion based on scores rather than ranks.
- __init__(topn=10, rerank_field=None, rank_constant=60, weights=None, normalize=None, metrics=None, schema=None)[source]
- rerank(query_results, query=None)[source]
Apply Reciprocal Rank Fusion to combine multiple query results.
- Parameters:
- Returns:
Reranked documents with RRF scores in the score field, sorted by descending score.
- Return type:
list[Doc]
Example
>>> reranker = RrfReranker(topn=5)
>>> results = reranker.rerank({
...     "bm25": bm25_results,
...     "dense": dense_results
... })
>>> print(f"Top document: {results[0].id} (score: {results[0].score:.4f})")
WeightedReranker
- class zvec_db.rerankers.WeightedReranker(topn=10, rerank_field=None, weights=None, normalize=True, metrics=<object object>, schema=None)[source]
Weighted fusion with optional normalization and metric conversion.
This class combines scores from multiple sources using weighted sum:
\[\text{score}(d) = \sum_{s \in S} \text{norm}(\text{score}_s(d)) \times w_s\]where \(w_s\) is the weight for source \(s\).
Features:
- Optional distance->similarity conversion (COSINE, L2, IP)
- Optional normalization per source (bayes, minmax, percentile)
- Smart defaults: COSINE -> no additional normalization, others -> bayes
Distance to similarity conversion:
- COSINE: (2 - score) / 2 - distance [0, 2] -> similarity [0, 1]
- L2: -score - inverts order
- IP: no conversion (already similarity, including BM25 scores)
Note
The COSINE metric is never additionally normalized: the conversion formula (2 - score) / 2 already produces scores in [0, 1]. Setting normalize for COSINE sources has no effect.
Normalization methods (applied AFTER conversion, except for COSINE):
- bayes (default for non-COSINE): Bayesian sigmoid calibration
- minmax: (x - min) / (max - min)
- percentile: rank-based normalization
- default: index-aware scaling with avgscore
- atan: arctan-based normalization 0.5 + atan(s)/pi (assumes scores already converted to "higher=better")
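Once scores have been converted and normalized, the weighted sum itself reduces to a few lines. A sketch over already-converted scores (hypothetical helper, not the class's internals):

```python
def weighted_fuse(query_results, weights, topn=10):
    """query_results: {source: {doc_id: converted_score}}, scores already
    'higher is better'. Returns (doc_id, fused_score) pairs, best first."""
    fused = {}
    for source, doc_scores in query_results.items():
        w = weights.get(source, 1.0)
        for doc_id, s in doc_scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + w * s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:topn]

converted = {
    "dense": {"a": 0.9, "b": 0.6},  # e.g. cosine distances after (2 - d) / 2
    "bm25": {"a": 0.2, "b": 0.8},   # e.g. BM25 scores after normalization
}
fused = weighted_fuse(converted, weights={"dense": 0.7, "bm25": 0.3})
# a: 0.7*0.9 + 0.3*0.2 = 0.69; b: 0.7*0.6 + 0.3*0.8 = 0.66 -> "a" ranks first
```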
- Parameters:
topn (int, optional) – Number of top documents to return. Defaults to 10.
rerank_field (Optional[str], optional) – Ignored. Defaults to None.
weights (Optional[dict[str, float]], optional) – Weight per source. Sources not listed use weight 1.0. Defaults to None (equal weights).
normalize (Union[bool, str, dict[str, Any], None], optional) – Normalization configuration. Can be:
- True (default): Smart default - COSINE -> no norm, others -> "bayes"
- str: Method name ("bayes", "minmax", "percentile", "default", "atan")
- dict: Per-source config, e.g., {"sparse": "bayes", "dense": None}
- None or False: No normalization (raw scores after conversion)
metrics (Optional[Union[MetricType, dict[str, MetricType]]], optional) – Metric type(s) for converting distances to similarities. Can be:
- A single MetricType (e.g., MetricType.COSINE) applied to all sources
- A dict mapping source names to their metric type (use MetricType.IP for sources that don't need conversion, e.g., BM25 scores)
If None and schema is provided, metrics are inferred from the schema.
schema (Optional[CollectionSchema], optional) – Collection schema to automatically extract metrics from. If provided and metrics is None, metrics are inferred from the schema (defaults to IP).
- Raises:
ValueError – If neither metrics nor schema is provided.
Example
>>> # Already normalized scores [0, 1]
>>> reranker = WeightedReranker(
...     weights={"bm25": 0.7, "dense": 0.3}
... )
>>> results = reranker.rerank({
...     "bm25": bm25_docs_normalized,
...     "dense": dense_docs_normalized
... })

>>> # Raw scores with smart default normalization
>>> reranker = WeightedReranker(
...     weights={"bm25": 0.7, "dense": 0.3},
...     normalize=True  # COSINE -> no extra norm, others -> bayes
... )
>>> results = reranker.rerank({"bm25": bm25_docs, "dense": dense_docs})

>>> # Per-source normalization config
>>> reranker = WeightedReranker(
...     weights={"bm25": 0.7, "dense": 0.3},
...     normalize={"bm25": "bayes", "dense": "cosine"}  # cosine = no-op
... )

>>> # No normalization (raw scores after conversion only)
>>> reranker = WeightedReranker(
...     metrics={"bm25": MetricType.IP},
...     normalize=None
... )

>>> # Schema auto-detection (recommended with zvec)
>>> import zvec
>>> collection = zvec.open("./my_collection")
>>> reranker = WeightedReranker(
...     schema=collection.schema,
...     weights={"dense": 0.7, "bm25": 0.3},
...     normalize=True
... )
Note
Distance to similarity conversion is applied before normalization:
- COSINE: (2 - score) / 2 (distance [0, 2] -> similarity [0, 1])
- L2: -score (inverts order)
- IP: no conversion (already similarity, including BM25 scores)
See also
RrfReranker: Rank-based fusion (uses ranks, not scores).
- __init__(topn=10, rerank_field=None, weights=None, normalize=True, metrics=<object object>, schema=None)[source]
Initialize WeightedReranker.
- Parameters:
topn (int) – Number of top documents to return.
rerank_field (Optional[str]) – Ignored.
weights (Optional[dict[str, float]]) – Weight per source. Defaults to equal weights.
normalize (Union[bool, str, dict[str, Any], None]) – Normalization configuration. Can be:
- True (default): Smart default - COSINE -> no-op, others -> "bayes"
- "bayes": Bayesian sigmoid calibration for all sources
- "minmax": (x - min) / (max - min) for all sources
- "percentile": Rank-based normalization for all sources
- "cosine": No-op (identity); COSINE scores are already in [0, 1]
- "default": Min-max with avgscore
- dict: Per-source config, e.g., {"sparse": "bayes", "dense": "cosine"}
- None or False: No normalization (raw scores after conversion)
metrics (Optional[Union[MetricType, dict[str, Union[str, MetricType, None]]]]) – Metric type(s) for distance-to-similarity conversion. Can be a single MetricType for all sources, or a dict for per-source metrics. If None and schema is provided, metrics are inferred from the schema.
schema (Optional[CollectionSchema]) – Collection schema to automatically extract metrics from.
- Raises:
ValueError – If neither metrics nor schema is provided.
- rerank(query_results, query=None)[source]
Convert scores and compute weighted fusion.
Steps:
1. Convert metrics to ensure higher=better:
   - COSINE: (2 - score) / 2
   - L2: -score (inverts order)
   - IP: no conversion
2. Apply normalization per source (COSINE: skipped, others: bayes by default)
3. Filter out documents with normalized score <= 0
4. Compute weighted fusion
- Parameters:
- Returns:
Reranked documents with weighted scores.
- Return type:
list[Doc]
Note
COSINE scores are NOT additionally normalized after conversion, since (2-score)/2 already produces scores in [0, 1].
MultiFieldWeightedReranker
- class zvec_db.rerankers.MultiFieldWeightedReranker(topn=10, rerank_field=None, weights=None, source_weights=None, field_weights=None, normalize=True, metrics=<object object>, schema=None)[source]
Reranker that combines scores from multiple sources and document fields.
This reranker extends the standard weighted fusion approach by supporting field-level weighting within documents. This is useful when documents have structured fields (e.g., title, content, tags) and you want to weight their contributions differently.
The score fusion is computed as:
\[\text{score}(d) = \sum_{s \in S} w_s \times \sum_{f \in F} w_f \times \text{norm}(\text{score}_{s,f}(d))\]
where:
- \(w_s\) is the weight for source \(s\)
- \(w_f\) is the weight for field \(f\)
- \(\text{norm}\) is the normalization function (Standard or Bayesian)
This is preferred over NormalizedWeightedReranker when:
- Documents have structured fields with different importance (title > content).
- You need fine-grained control over score contributions.
- Different fields use different scoring scales.
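The inner field-weighted sum can be sketched as follows (illustrative helper; as documented for this class, missing or non-numeric fields contribute 0):

```python
def field_weighted_score(doc_fields, field_weights):
    """Field-level score: sum of w_f * score_f over the doc's numeric fields."""
    total = 0.0
    for field, w in field_weights.items():
        value = doc_fields.get(field)
        if isinstance(value, (int, float)):
            total += w * value        # weighted contribution of this field
        # missing or non-numeric field -> contributes 0
    return total

# "tags" holds a string, not a score, so it drops out of the sum.
doc = {"title": 0.9, "body": 0.4, "tags": "ml,search"}
score = field_weighted_score(doc, {"title": 3.0, "body": 1.0, "tags": 0.5})
# 3.0 * 0.9 + 1.0 * 0.4 = 3.1
```

The outer per-source weighting then multiplies each document's field-weighted score by \(w_s\), exactly as in the formula above.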
- Parameters:
topn (int, optional) – Number of top documents to return. Defaults to 10.
rerank_field (Optional[str], optional) – Ignored. Defaults to None.
metric (Optional[MetricType], optional) – Metric for RAW scores. Defaults to "cosine" because it is the main use case with zvec/Qdrant:
- MetricType.COSINE: cosine distances [0, 2]
- MetricType.L2: L2 distances
- MetricType.IP: similarities (inner product, including BM25 scores)
source_weights (Optional[dict[str, float]], optional) – Weight per source key. Sources not listed use weight 1.0. Defaults to None (equal weights).
field_weights (Optional[dict[str, float]], optional) – Weight per document field. Fields not listed use weight 1.0. The field value is read from the doc.fields dictionary. Defaults to None (equal weights for all fields).
normalizer_configs (Optional[dict[str, Any]], optional) – A mapping of source keys to their specific normalization configurations.
default_norm_config (Union[bool, str, dict[str, Any]], optional) – The normalization method to use for keys not found in normalizer_configs. Defaults to True (standard normalization).
metrics (Optional[Union[MetricType, dict[str, Union[str, MetricType, None]]]])
schema (Optional[CollectionSchema])
Note
Field scores are expected to be stored in doc.fields[field_name] as numeric values. If a field is missing or has a non-numeric value, it contributes 0 to the score.
Example
>>> reranker = MultiFieldWeightedReranker(
...     topn=20,
...     source_weights={"bm25": 0.7, "dense": 0.3},
...     field_weights={"title": 3.0, "body": 1.0, "tags": 0.5}
... )
>>> results = reranker.rerank({
...     "bm25": bm25_docs,
...     "dense": dense_docs
... })
- __init__(topn=10, rerank_field=None, weights=None, source_weights=None, field_weights=None, normalize=True, metrics=<object object>, schema=None)[source]
Initialize MultiFieldWeightedReranker.
- Parameters:
topn (int) – Number of top documents to return.
rerank_field (Optional[str]) – Ignored.
source_weights (Optional[dict[str, float]]) – Weight per source. Defaults to equal weights.
field_weights (Optional[dict[str, float]]) – Weight per document field.
normalize (Union[bool, str, dict[str, Any], None]) – Normalization configuration. Can be:
- True (default): Smart default - COSINE → no-op, others → "bayes"
- str: Method name ("bayes", "minmax", "percentile", "cosine")
- dict: Per-source config, e.g., {"sparse": "bayes", "dense": "cosine"}
- None or False: No normalization (raw scores after conversion)
Note – "cosine" is a no-op (identity) since COSINE scores are already in [0, 1] after conversion.
metrics (Optional[Union[MetricType, dict[str, Union[str, MetricType, None]]]]) – Metric type(s) for converting distances to similarities. Can be a single MetricType for all sources, or a dict for per-source metrics. If None and schema is provided, metrics are inferred from the schema. Required if schema is not provided.
schema (Optional[CollectionSchema]) – Collection schema to automatically extract metrics from. If provided and metrics is None, metrics are inferred from the schema.
- Raises:
ValueError – If neither metrics nor schema is provided.
Example
>>> # Automatic metric detection from collection schema
>>> import zvec
>>> collection = zvec.open("./my_collection")
>>> reranker = MultiFieldWeightedReranker(
...     schema=collection.schema,
...     source_weights={"bm25": 0.6, "dense": 0.4},
...     field_weights={"title": 3.0, "content": 1.0},
...     normalize=True  # Default: bayes for all
... )
- rerank(query_results, query=None)[source]
Normalize scores per-source and compute weighted fusion with field weighting.
This method performs the following steps:
1. Iterates through each source in query_results.
2. For each document, computes a field-weighted score.
3. Applies normalization per source (smart default: COSINE → no additional normalization, others → bayes).
4. Filters out documents with a normalized score of 0.0.
5. Delegates to WeightedReranker for source-weighted fusion.
- Parameters:
- Returns:
Reranked documents with weighted normalized scores in the score field, sorted by descending score.
- Return type:
list[Doc]
Example
>>> query_results = {
...     "sparse_bm25": bm25_docs,
...     "dense_cosine": dense_docs
... }
>>> reranked = reranker.rerank(query_results)
Cross-Encoder Rerankers
BaseCrossEncoderReranker
- class zvec_db.rerankers.BaseCrossEncoderReranker(query, topn=10, rerank_field=None, fusion_score_weight=1.0)[source]
Abstract base class for cross-encoder reranking.
This class provides the common infrastructure for cross-encoder scoring. Subclasses must implement the _compute_scores_batch() method to define their scoring strategy.
- Parameters:
query (str) – Query for reranking. Required.
topn (int, optional) – Number of top documents to return after reranking. Defaults to 10.
rerank_field (Optional[str], optional) – Document field to use for reranking. If None, uses the entire document content. Defaults to None.
fusion_score_weight (float, optional) –
Weight for blending cross-encoder scores with fusion scores.
Formula: final_score = cross_encoder_score × weight + fusion_score × (1 - weight)
weight = 1.0 → 100% cross-encoder, 0% fusion (pure cross-encoder, default)
weight = 0.8 → 80% cross-encoder, 20% fusion
weight = 0.5 → 50% cross-encoder, 50% fusion
weight = 0.0 → 0% cross-encoder, 100% fusion (pure fusion)
Defaults to 1.0 (pure cross-encoder score).
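The blending formula is simple enough to sketch directly (illustrative):

```python
def blend(cross_encoder_score, fusion_score, weight=1.0):
    """final = cross_encoder * weight + fusion * (1 - weight),
    per the fusion_score_weight formula."""
    return cross_encoder_score * weight + fusion_score * (1 - weight)

pure_ce = blend(0.9, 0.5, weight=1.0)      # 100% cross-encoder
blended = blend(0.9, 0.5, weight=0.8)      # 80/20 mix
pure_fusion = blend(0.9, 0.5, weight=0.0)  # 100% fusion
```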
Note
Subclasses must implement _compute_scores_batch() or _compute_score()
Cross-encoder reranking is more accurate but slower than score fusion
For large document sets, consider using max_batch_size to limit API calls
- property fusion_score_weight: float
Weight for blending cross-encoder scores with fusion scores.
- Type:
float
SentenceTransformerReranker
- class zvec_db.rerankers.SentenceTransformerReranker(query, topn=10, model_name='cross-encoder/ms-marco-MiniLM-L-6-v2', device=None, max_length=512, rerank_field=None, batch_size=32, show_progress_bar=False, fusion_score_weight=1.0, model_kwargs=None)[source]
Cross-encoder reranker using Sentence Transformers models locally.
This reranker uses the CrossEncoder class from sentence-transformers to compute relevance scores between query and document pairs. Unlike API-based cross-encoders, this runs entirely locally on CPU or GPU.
SentenceTransformer CrossEncoder models output a single score via sigmoid for binary relevance (relevant/not relevant).
- Parameters:
query (str) – Query for reranking. Required.
topn (int, optional) – Number of top documents to return. Defaults to 10.
model_name (str, optional) – CrossEncoder model name from HuggingFace. Examples: - “cross-encoder/ms-marco-MiniLM-L-6-v2” (fast, good quality) - “cross-encoder/ms-marco-TinyBERT-L-2-v2” (very fast) - “cross-encoder/stsb-distilroberta-base” (semantic similarity) Defaults to “cross-encoder/ms-marco-MiniLM-L-6-v2”.
device (Optional[str], optional) – Device to run model on. “cpu”, “cuda”, or None for auto-detect. Defaults to None.
max_length (Optional[int], optional) – Maximum sequence length. Defaults to 512.
rerank_field (Optional[str], optional) – Document field to use for scoring. If None, uses the entire document content. Defaults to None.
batch_size (int, optional) – Batch size for inference. Defaults to 32.
show_progress_bar (bool, optional) – Show progress bar during inference. Defaults to False.
fusion_score_weight (float, optional) –
Weight for blending cross-encoder scores with fusion scores.
Formula: final_score = cross_encoder_score × weight + fusion_score × (1 - weight)
weight = 1.0 → 100% cross-encoder, 0% fusion (default)
weight = 0.8 → 80% cross-encoder, 20% fusion
weight = 0.5 → 50% cross-encoder, 50% fusion
weight = 0.0 → 0% cross-encoder, 100% fusion
Defaults to 1.0 (pure cross-encoder score).
model_kwargs (Optional[Mapping[str, Any]], optional) – Additional keyword arguments passed to CrossEncoder constructor. Useful for options like: - torch_dtype: Model dtype (torch.float16, torch.bfloat16, “auto”) - trust_remote_code: Trust remote code from HuggingFace Hub - token: HuggingFace API token for private models - revision: Model revision to load - cache_dir: Custom cache directory - local_files_only: Load only local files - attn_implementation: Attention implementation (e.g., “flash_attention_2”) Defaults to None (no additional kwargs).
Example
>>> from zvec_db.rerankers.cross_encoder import SentenceTransformerReranker
>>>
>>> # Binary relevance reranker
>>> reranker = SentenceTransformerReranker(
...     query="machine learning",
...     model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
...     topn=10,
... )
>>>
>>> results = reranker.rerank({"bm25": bm25_docs})
>>>
>>> # Blended scores: 80% cross-encoder + 20% fusion
>>> reranker = SentenceTransformerReranker(
...     query="machine learning",
...     model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
...     topn=10,
...     fusion_score_weight=0.8,
... )
>>> results = reranker.rerank({"bm25": docs})
>>>
>>> # With model_kwargs for private models
>>> reranker = SentenceTransformerReranker(
...     query="machine learning",
...     model_name="org/private-model",
...     model_kwargs={"token": "hf_..."},
... )
>>> results = reranker.rerank({"bm25": docs})
>>>
>>> # With model_kwargs for dtype (float16 for reduced memory)
>>> import torch
>>> reranker = SentenceTransformerReranker(
...     query="machine learning",
...     model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
...     model_kwargs={"torch_dtype": torch.float16},
... )
>>> results = reranker.rerank({"bm25": docs})
Note
Requires the sentence-transformers package
Models are downloaded automatically on first use
GPU acceleration available if CUDA is installed
Models output scores in [0, 1] via sigmoid
See also
OpenAIReranker: API-based cross-encoder with LLM.
- __init__(query, topn=10, model_name='cross-encoder/ms-marco-MiniLM-L-6-v2', device=None, max_length=512, rerank_field=None, batch_size=32, show_progress_bar=False, fusion_score_weight=1.0, model_kwargs=None)[source]
- fit(documents)[source]
Initialize the reranker by loading the model.
For Sentence Transformers CrossEncoder, this loads the model. No training is performed as models are pre-trained.
- property batch_size
- property device
- property fusion_score_weight: float
Weight for blending cross-encoder scores with fusion scores.
- Type:
float
- property max_length
- property model_kwargs
- property model_name
- rerank(query_results, query=None)
Rerank documents using cross-encoder scoring.
- property schema: CollectionSchema | None
The collection schema if provided.
- Type:
CollectionSchema | None
- property show_progress_bar
ClassificationReranker
- class zvec_db.rerankers.ClassificationReranker(query, topn=10, model_name='cross-encoder/ms-marco-MiniLM-L-6-v2', device=None, max_length=512, num_classes=None, rerank_field=None, batch_size=32, show_progress_bar=False, fusion_score_weight=1.0, model_kwargs=None)[source]
Multi-class classification reranker using HuggingFace transformers.
This reranker uses a multi-class classification model from HuggingFace (via the transformers library) and computes the expected value of the class distribution:
\[E[\text{score}] = \frac{\sum_{i} \text{prob}_i \times i}{\text{num\_classes} - 1}\]
The model outputs logits for each class (0, 1, 2, …, num_classes-1). Softmax is applied to get probabilities, then the expected value is computed and normalized to [0, 1].
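The expected-value computation can be sketched as follows (illustrative; assumes raw class logits as input):

```python
import math

def expected_relevance(logits):
    """Softmax over class logits, then the normalized expected class index."""
    # Numerically stable softmax: shift by the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    num_classes = len(logits)
    # E[score] = sum_i prob_i * i, divided by (num_classes - 1) to land in [0, 1].
    expected = sum(p * i for i, p in enumerate(probs))
    return expected / (num_classes - 1)

# 5-class model strongly favoring the top class -> score near 1.0
score = expected_relevance([-2.0, -1.0, 0.0, 1.0, 8.0])
```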
- Parameters:
query (str) – Query for reranking. Required.
topn (int, optional) – Number of top documents to return. Defaults to 10.
model_name (str, optional) –
Classification model name from HuggingFace. Should be a model fine-tuned for text classification with multiple labels. Examples: “cross-encoder/ms-marco-MiniLM-L-6-v2” (binary),
”nboost/pt-bert-base-uncased-msmarco” (binary), or any model with config.num_labels set.
device (Optional[str], optional) – Device to run model on. “cpu”, “cuda”, or None for auto-detect. Defaults to None.
max_length (Optional[int], optional) – Maximum sequence length. Defaults to 512.
num_classes (Optional[int], optional) – Number of classes for classification. If None, will be inferred from model.config.num_labels. For binary: 2 (classes 0 and 1) For multi-class: e.g., 5 for 0-4 relevance scale. Defaults to None (auto-infer).
rerank_field (Optional[str], optional) – Document field to use for scoring. If None, uses the entire document content. Defaults to None.
batch_size (int, optional) – Batch size for inference. Defaults to 32.
show_progress_bar (bool, optional) – Show progress bar during inference. Defaults to False.
fusion_score_weight (float, optional) –
Weight for blending cross-encoder scores with fusion scores.
Formula: final_score = cross_encoder_score × weight + fusion_score × (1 - weight)
weight = 1.0 → 100% cross-encoder, 0% fusion (default)
weight = 0.8 → 80% cross-encoder, 20% fusion
weight = 0.5 → 50% cross-encoder, 50% fusion
weight = 0.0 → 0% cross-encoder, 100% fusion
Defaults to 1.0 (pure cross-encoder score).
model_kwargs (Optional[Mapping[str, Any]], optional) – Additional keyword arguments passed to AutoModelForSequenceClassification and AutoTokenizer. Useful for options like: - torch_dtype: Model dtype (torch.float16, torch.bfloat16, “auto” for auto-detection) - trust_remote_code: Trust remote code from HuggingFace Hub - token: HuggingFace API token for private models - revision: Model revision to load - cache_dir: Custom cache directory - local_files_only: Load only local files - attn_implementation: Attention implementation (e.g., “flash_attention_2”, “sdpa”) - load_in_8bit: Enable 8-bit quantization (requires bitsandbytes) - load_in_4bit: Enable 4-bit quantization (requires bitsandbytes) - device_map: Device mapping for distributed loading (e.g., “auto”, “balanced”) Defaults to None (no additional kwargs).
Example
>>> from zvec_db.rerankers.cross_encoder import ClassificationReranker
>>>
>>> # Binary classification (num_classes inferred from model)
>>> reranker = ClassificationReranker(
...     query="machine learning",
...     model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
...     topn=10,
... )
>>>
>>> # Multi-level relevance with explicit num_classes
>>> reranker = ClassificationReranker(
...     query="machine learning",
...     model_name="your-multi-class-classifier",
...     num_classes=5,
...     topn=10,
... )
>>>
>>> reranker.fit([])  # Load model
>>> results = reranker.rerank({"bm25": docs})
>>>
>>> # With model_kwargs for private models or custom options
>>> reranker = ClassificationReranker(
...     query="machine learning",
...     model_name="org/private-model",
...     model_kwargs={"token": "hf_...", "trust_remote_code": True},
... )
>>> reranker.fit([])
>>> results = reranker.rerank({"bm25": docs})
>>>
>>> # With model_kwargs for dtype (float16 for reduced memory)
>>> import torch
>>> reranker = ClassificationReranker(
...     query="machine learning",
...     model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
...     model_kwargs={"torch_dtype": torch.float16},
... )
>>> reranker.fit([])
>>> results = reranker.rerank({"bm25": docs})
>>>
>>> # With model_kwargs for 8-bit quantization (requires bitsandbytes)
>>> reranker = ClassificationReranker(
...     query="machine learning",
...     model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
...     model_kwargs={"load_in_8bit": True},
... )
>>> reranker.fit([])
>>> results = reranker.rerank({"bm25": docs})
Note
Requires the transformers and torch packages
Model must be trained/fine-tuned for multi-class text classification
num_classes is inferred from model.config.num_labels if not provided
GPU acceleration available if CUDA is installed
Scores are normalized to [0, 1] via expected value
See also
OpenAIDecoderReranker: API-based classification with LLM logprobs.
- __init__(query, topn=10, model_name='cross-encoder/ms-marco-MiniLM-L-6-v2', device=None, max_length=512, num_classes=None, rerank_field=None, batch_size=32, show_progress_bar=False, fusion_score_weight=1.0, model_kwargs=None)[source]
- property batch_size
- property device
- property fusion_score_weight: float
Weight for blending cross-encoder scores with fusion scores.
- Type:
float
- property max_length
- property model_kwargs
- property model_name
- property num_classes
- rerank(query_results, query=None)
Rerank documents using cross-encoder scoring.
- property schema: CollectionSchema | None
The collection schema if provided.
- Type:
CollectionSchema | None
- property show_progress_bar
OpenAIReranker
- class zvec_db.rerankers.OpenAIReranker(query, topn=10, base_url='http://localhost:8000/v1', api_key=None, model='BAAI/bge-reranker-v2-m3', endpoint='rerank', timeout=30.0, rerank_field=None, fusion_score_weight=1.0, truncate_prompt_tokens=None, max_retries=3, initial_delay=1.0, max_delay=60.0, exponential_base=2.0, jitter=0.1, retry_config=None)[source]
Cross-encoder reranker using OpenAI-compatible /rerank or /score endpoints.
Uses vLLM’s native endpoints: /rerank for query-document scoring, /score for text pair similarity. Both return scores in [0, 1].
- Parameters:
query (str) – Query for reranking. Required.
topn (int) – Number of top documents to return. Defaults to 10.
base_url (str) – API base URL. Defaults to “http://localhost:8000/v1”.
api_key (Optional[str]) – API key. Defaults to None.
model (str) – Model identifier. Defaults to “BAAI/bge-reranker-v2-m3”.
endpoint (Literal["rerank", "score"]) – Endpoint to use. Defaults to “rerank”.
timeout (float) – HTTP timeout in seconds. Defaults to 30.0.
rerank_field (Optional[str]) – Document field for scoring. Defaults to None.
fusion_score_weight (float) – Weight for cross-encoder vs fusion scores. 1.0 = pure cross-encoder, 0.0 = pure fusion. Defaults to 1.0.
truncate_prompt_tokens (Optional[int]) – Max tokens for truncation.
max_retries (int, optional) – Maximum number of retry attempts for transient failures. Set to 0 to disable retries. Defaults to 3.
initial_delay (float, optional) – Initial delay before first retry in seconds. Defaults to 1.0.
max_delay (float, optional) – Maximum delay cap in seconds. Defaults to 60.0.
exponential_base (float, optional) – Base for exponential backoff. Defaults to 2.0.
jitter (float, optional) – Random jitter factor (0.0-1.0) to avoid thundering herd. Defaults to 0.1.
retry_config (Optional[RetryConfig], optional) – Pre-configured retry settings. If provided, overrides individual retry parameters. Defaults to None.
Example
>>> from zvec_db.rerankers.cross_encoder import OpenAIReranker
>>> reranker = OpenAIReranker(
...     query="machine learning",
...     endpoint="rerank",
...     base_url="http://localhost:8000",
... )
>>> results = reranker.rerank({"bm25": docs})
>>> # With custom retry settings for production
>>> reranker = OpenAIReranker(
...     query="machine learning",
...     max_retries=5,
...     initial_delay=2.0,
...     max_delay=120.0,
... )
Note: Requires vLLM with /rerank or /score endpoint enabled.
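A minimal sketch of how the retry parameters above plausibly combine into a per-attempt delay (`backoff_delay` is a hypothetical helper for illustration; the reranker's internal retry logic may differ in detail):

```python
import random

def backoff_delay(attempt, initial_delay=1.0, max_delay=60.0,
                  exponential_base=2.0, jitter=0.1):
    """Return the delay in seconds before retry number `attempt` (0-based)."""
    # Exponential growth from initial_delay, capped at max_delay.
    delay = min(initial_delay * (exponential_base ** attempt), max_delay)
    # Random jitter spreads clients apart to avoid a thundering herd.
    return delay * (1.0 + random.uniform(-jitter, jitter))
```

With the documented defaults, base delays grow 1s, 2s, 4s, ... up to the 60s cap before jitter is applied.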
- __init__(query, topn=10, base_url='http://localhost:8000/v1', api_key=None, model='BAAI/bge-reranker-v2-m3', endpoint='rerank', timeout=30.0, rerank_field=None, fusion_score_weight=1.0, truncate_prompt_tokens=None, max_retries=3, initial_delay=1.0, max_delay=60.0, exponential_base=2.0, jitter=0.1, retry_config=None)[source]
- Parameters:
query (str)
topn (int)
base_url (str)
api_key (str | None)
model (str)
endpoint (Literal['rerank', 'score'])
timeout (float)
rerank_field (str | None)
fusion_score_weight (float)
truncate_prompt_tokens (int | None)
max_retries (int)
initial_delay (float)
max_delay (float)
exponential_base (float)
jitter (float)
retry_config (RetryConfig | None)
- property api_key
- property base_url
- property endpoint
- property fusion_score_weight: float
Weight for blending cross-encoder scores with fusion scores.
- Type:
float
- property model
- rerank(query_results, query=None)
Rerank documents using cross-encoder scoring.
- property schema: CollectionSchema | None
The collection schema if provided.
- Type:
CollectionSchema | None
- property timeout
- property truncate_prompt_tokens
OpenAIEncoderReranker
- class zvec_db.rerankers.OpenAIEncoderReranker(query, topn=10, base_url='http://localhost:8000/v1', api_key=None, model='BAAI/bge-reranker-v2-m3', num_classes=None, timeout=30.0, rerank_field=None, fusion_score_weight=1.0, separator=' ', truncate_prompt_tokens=None)[source]
Cross-encoder reranker using the /classify endpoint for encoder models.
Uses vLLM’s /classify endpoint for encoder models (BERT, RoBERTa). Computes expected value score from class probabilities: E[score] = sum(prob_i * i) / (num_classes - 1)
- Parameters:
query (str) – Query for reranking. Required.
topn (int) – Number of top documents to return. Defaults to 10.
base_url (str) – API base URL. Defaults to “http://localhost:8000/v1”.
api_key (Optional[str]) – API key. Defaults to None.
model (str) – Model identifier. Defaults to “BAAI/bge-reranker-v2-m3”.
num_classes (Optional[int]) – Number of classes. Auto-detected if None.
timeout (float) – HTTP timeout in seconds. Defaults to 30.0.
rerank_field (Optional[str]) – Document field for scoring.
fusion_score_weight (float) – Cross-encoder vs fusion weight. Default 1.0.
separator (str) – Query-document separator. Defaults to “ “.
truncate_prompt_tokens (Optional[int]) – Max tokens for truncation.
Example
>>> from zvec_db.rerankers.cross_encoder import OpenAIEncoderReranker
>>> reranker = OpenAIEncoderReranker(
...     query="machine learning",
...     num_classes=2,
...     base_url="http://localhost:8000",
... )
>>> results = reranker.rerank({"bm25": docs})
Note: Requires vLLM with /classify endpoint enabled.
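The expected-value formula E[score] = sum(prob_i * i) / (num_classes - 1) can be illustrated directly (a standalone sketch; the class applies the same computation to the probabilities returned by /classify):

```python
def expected_value_score(probs):
    """Map a probability distribution over ordered classes to a score in [0, 1].

    probs[i] is the probability of class i; the expectation over class
    indices is divided by (num_classes - 1) to land in the unit interval.
    """
    num_classes = len(probs)
    return sum(i * p for i, p in enumerate(probs)) / (num_classes - 1)
```

For a binary classifier, this reduces to P(class 1); with more classes it rewards probability mass on higher (more relevant) classes.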
- __init__(query, topn=10, base_url='http://localhost:8000/v1', api_key=None, model='BAAI/bge-reranker-v2-m3', num_classes=None, timeout=30.0, rerank_field=None, fusion_score_weight=1.0, separator=' ', truncate_prompt_tokens=None)[source]
- property api_key
- property base_url
- property fusion_score_weight: float
Weight for blending cross-encoder scores with fusion scores.
- Type:
float
- property model
- property num_classes
- rerank(query_results, query=None)
Rerank documents using cross-encoder scoring.
- property schema: CollectionSchema | None
The collection schema if provided.
- Type:
CollectionSchema | None
- property separator
- property timeout
- property truncate_prompt_tokens
OpenAIDecoderReranker
- class zvec_db.rerankers.OpenAIDecoderReranker(query, topn=10, base_url='http://localhost:8000/v1', api_key=None, model='gpt-4o-mini', num_classes=2, timeout=30.0, max_batch_size=None, rerank_field=None, fusion_score_weight=1.0, concurrency=4)[source]
Cross-encoder reranker using LLM logprobs with structured output.
Uses /chat/completions with logprobs and regex-constrained output. Computes expected value score from log probabilities: E[score] = sum(prob_i * i) / (num_classes - 1)
- Parameters:
query (str) – Query for reranking. Required.
topn (int) – Number of top documents to return. Defaults to 10.
base_url (str) – API base URL. Defaults to “http://localhost:8000/v1”.
api_key (Optional[str]) – API key. Defaults to None.
model (str) – Model identifier. Defaults to “gpt-4o-mini”.
num_classes (int) – Number of classes. Defaults to 2.
timeout (float) – HTTP timeout in seconds. Defaults to 30.0.
max_batch_size (Optional[int]) – Max documents per batch. Default None.
rerank_field (Optional[str]) – Document field for scoring.
fusion_score_weight (float) – Cross-encoder vs fusion weight. Default 1.0.
concurrency (int) – Concurrent API calls. Defaults to 4.
Example
>>> from zvec_db.rerankers.cross_encoder import OpenAIDecoderReranker
>>> reranker = OpenAIDecoderReranker(
...     query="machine learning",
...     num_classes=2,
...     model="gpt-4o-mini",
... )
>>> results = reranker.rerank({"bm25": docs})
Note: Requires model with logprobs support (--enable-logprobs for vLLM).
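A sketch of the scoring idea, assuming the chat-completions logprobs payload has already been reduced to a mapping from class-label token ("0", "1", ...) to log probability (`score_from_logprobs` is a hypothetical helper; the class's internal handling of the payload may differ):

```python
import math

def score_from_logprobs(label_logprobs, num_classes):
    """Renormalize over the observed class labels, then take E[score]."""
    # Convert log probabilities back to probabilities.
    probs = {label: math.exp(lp) for label, lp in label_logprobs.items()}
    total = sum(probs.values())
    # Expected class index, scaled into [0, 1] by (num_classes - 1).
    expectation = sum(int(label) * p / total for label, p in probs.items())
    return expectation / (num_classes - 1)
```

Renormalizing over the observed labels guards against the constrained output leaking a little probability mass to other tokens.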
- MAX_CLASSES = 10
- __init__(query, topn=10, base_url='http://localhost:8000/v1', api_key=None, model='gpt-4o-mini', num_classes=2, timeout=30.0, max_batch_size=None, rerank_field=None, fusion_score_weight=1.0, concurrency=4)[source]
- property api_key
- property base_url
- property concurrency
- property fusion_score_weight: float
Weight for blending cross-encoder scores with fusion scores.
- Type:
float
- property max_batch_size
- property model
- property num_classes
- rerank(query_results, query=None)
Rerank documents using cross-encoder scoring.
- property schema: CollectionSchema | None
The collection schema if provided.
- Type:
CollectionSchema | None
- property timeout
Utilities
Normalize
- class zvec_db.rerankers.Normalize(config=None)[source]
Callable normaliser for lists of (uid, score) pairs.
Instances behave like functions: call them with a score list and an optional avgscore, and the result is a new list with all scores mapped into the closed unit interval. The precise transformation is determined by the configuration supplied at construction time.
- beta
Centre parameter used in Bayesian modes; None triggers median-based automatic selection.
- Type:
Optional[float]
- __init__(config=None)[source]
Initialise a Normalize instance.
- Parameters:
config (bool, str, dict or None, optional) –
Configuration object that selects the normalisation strategy. The following forms are interpreted:
None or False: equivalent to "default" - standard index-aware scaling. Any other truthy non-dict value also selects the default behaviour.
str: the string is lower-cased and used as the method name. Supported methods: "bayes", "bayesian", "bb25" - Bayesian sigmoid calibration; "minmax" - (x - min) / (max - min); "percentile" (alias: "rank") - rank-based normalization; "default" - standard index-aware scaling.
dict: a copy of the dictionary is stored; it may contain the keys method (string), alpha (float) and beta (float or None). Any missing keys are filled with defaults (alpha defaults to 1.0; beta to None).
Notes
The configuration is shallow-copied to prevent external modification from affecting the normaliser’s internal state.
- __call__(scores, avgscore=0.0)[source]
Normalise a list of document scores.
- Parameters:
scores (ScoreList) – Sequence of (uid, score) pairs, typically produced by a retrieval algorithm. The list is assumed to be sorted in descending order of score; the method uses the first entry to compute the maximum when performing default scaling.
avgscore (float, optional) – Average score computed over the entire corpus. This is only used by the default normalisation strategy; in Bayesian modes the value is ignored entirely.
- Returns:
New list where each score has been replaced with a value in [0.0, 1.0] according to the chosen transformation.
- Return type:
ScoreList
Notes
Multiple normalisation methods are supported:
default – scales scores relative to an estimated maximum and clips values. This keeps the relative ordering intact but bounds the range.
bayesian – applies a sigmoid function calibrated using the positive scores only. Negative or zero input scores are mapped to 0.0 unconditionally. Robust to outliers.
minmax – (x - min) / (max - min). Preserves relative distances.
percentile – rank-based normalization. Very robust to outliers.
cosine – no-op (identity). COSINE conversion (2-score)/2 already produces scores in [0, 1], so no additional normalization is needed.
atan – arctan-based normalization: 1 - 2*atan(s)/pi for L2, 0.5 + atan(s)/pi for IP. Maps unbounded scores to [0, 1].
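The minmax and percentile transformations above can be sketched as standalone helpers operating on (uid, score) lists (hypothetical function names; Normalize's internal implementation may differ, e.g. in how ties or degenerate inputs are handled):

```python
def minmax_normalize(scores):
    """Min-max scale (uid, score) pairs into [0, 1]."""
    values = [s for _, s in scores]
    lo, hi = min(values), max(values)
    if hi == lo:  # all scores equal: collapse to 1.0
        return [(uid, 1.0) for uid, _ in scores]
    return [(uid, (s - lo) / (hi - lo)) for uid, s in scores]

def percentile_normalize(scores):
    """Rank-based normalization: best rank -> 1.0, worst -> 0.0."""
    if len(scores) == 1:
        return [(scores[0][0], 1.0)]
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    n = len(ranked)
    # Only the rank matters, which is what makes this robust to outliers.
    return [(uid, 1.0 - i / (n - 1)) for i, (uid, _) in enumerate(ranked)]
```

Note how one extreme outlier compresses every other minmax score toward 0, while the percentile result is unchanged.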
PipelineReranker
- class zvec_db.rerankers.PipelineReranker(rerankers, topn=10, rerank_field=None)[source]
Chain multiple rerankers sequentially.
This reranker applies a list of rerankers in sequence, passing the output of one as the input to the next. This is useful for combining different reranking strategies (e.g., RRF followed by cross-encoder).
- Parameters:
rerankers (list) – Rerankers to apply in sequence; each stage receives the previous stage's output.
topn (int) – Number of top documents to return from the final stage. Defaults to 10.
rerank_field (Optional[str]) – Document field for scoring. Defaults to None.
Example
>>> pipeline = PipelineReranker([
...     RrfReranker(topn=50, rank_constant=60),
...     SentenceTransformerReranker(model_name="ms-marco-MiniLM-L-6-v2", topn=10),
... ])
>>> results = collection.query(..., reranker=pipeline)
- __init__(rerankers, topn=10, rerank_field=None)[source]
Initialize PipelineReranker with a list of rerankers.
- Parameters:
Example
>>> pipeline = PipelineReranker([
...     RrfReranker(topn=50, rank_constant=60),
...     SentenceTransformerReranker(model_name="ms-marco-MiniLM-L-6-v2", topn=10),
... ])
>>> results = collection.query(..., reranker=pipeline)