zvec_db.rerankers.fusion

Fusion-based rerankers for combining multiple retrieval results.

class zvec_db.rerankers.fusion.RrfReranker(topn=10, rerank_field=None, rank_constant=60, weights=None, normalize=None, metrics=None, schema=None)[source]

Reciprocal Rank Fusion (RRF) reranker with optional source weighting.

RRF combines results from multiple ranked lists by computing a fused score based on the reciprocal of each document’s rank:

\[\text{RRF}(d) = \sum_{r \in R} w_r \times \frac{1}{k + \text{rank}(d, r)}\]
where:
  • \(k\) is the rank_constant (default: 60)

  • \(w_r\) is the weight for source \(r\) (default: 1.0)

By default, all sources have equal weight. Use the weights parameter to favor certain sources over others.
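The formula above can be illustrated with a minimal, self-contained sketch. This is not the library implementation (the real reranker operates on `Doc` objects); `rrf_fuse` and its plain-dict inputs are illustrative only.

```python
# Hypothetical sketch of Reciprocal Rank Fusion over lists of doc IDs.
# ranked_lists maps source name -> doc IDs in rank order (best first).

def rrf_fuse(ranked_lists, k=60, weights=None):
    """Fuse several ranked lists via RRF: sum of w_r / (k + rank)."""
    weights = weights or {}
    scores = {}
    for source, docs in ranked_lists.items():
        w = weights.get(source, 1.0)  # unlisted sources default to 1.0
        for rank, doc_id in enumerate(docs, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + w / (k + rank)
    # Sort by descending fused score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A document that appears near the top of several lists (here, one ranked by BM25 and one by a dense retriever) accumulates contributions from each and outranks documents seen by only one source.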

Parameters:
  • topn (int, optional) – Number of top documents to return. Defaults to 10.

  • rerank_field (Optional[str], optional) – Ignored by RRF. Defaults to None.

  • rank_constant (int, optional) – Smoothing constant \(k\) in the RRF formula. Larger values reduce the impact of early ranks. Defaults to 60.

  • weights (Optional[dict[str, float]], optional) – Weight per source. Sources not listed use weight 1.0. Defaults to None (equal weights).

  • normalize (Optional[Union[bool, str, dict]], optional) – Ignored for RRF. RRF uses ranks, not scores, so normalization has no effect. Setting this parameter will emit a warning. Defaults to None.

  • metrics (Optional[Union[MetricType, dict[str, Union[str, MetricType, None]]]])

  • schema (Optional['CollectionSchema'])

Example

>>> # Basic RRF with default parameters
>>> reranker = RrfReranker(topn=10)
>>> results = reranker.rerank({"bm25": bm25_docs, "dense": dense_docs})
>>> # Weighted RRF: favor dense embeddings (70%) over BM25 (30%)
>>> reranker = RrfReranker(
...     topn=10,
...     weights={"dense": 0.7, "bm25": 0.3}
... )
>>> results = reranker.rerank({"bm25": bm25_docs, "dense": dense_docs})
>>> # Custom rank constant (higher = more uniform ranking)
>>> reranker = RrfReranker(topn=10, rank_constant=100)
>>> results = reranker.rerank({"bm25": bm25_docs, "dense": dense_docs})

Note

RRF uses only document ranks, not raw scores. This makes it robust to score scale differences between sources (e.g., BM25 scores vs. cosine similarities). Normalization is not applicable to RRF.

See also

WeightedReranker: For weighted fusion based on scores rather than ranks.

__init__(topn=10, rerank_field=None, rank_constant=60, weights=None, normalize=None, metrics=None, schema=None)[source]
Parameters:
  • topn (int)

  • rerank_field (Optional[str])

  • rank_constant (int)

  • weights (Optional[dict[str, float]])

  • normalize (Optional[Union[bool, str, dict]])

  • metrics (Optional[Union[MetricType, dict[str, Union[str, MetricType, None]]]])

  • schema (Optional['CollectionSchema'])

property rank_constant: int
property weights: dict[str, float]
property normalize: bool | str | dict | None
rerank(query_results, query=None)[source]

Apply Reciprocal Rank Fusion to combine multiple query results.

Parameters:
  • query_results (dict[str, list[Doc]]) – Results from one or more vector queries. Keys are source names (e.g., “bm25”, “dense”), values are ranked document lists.

  • query (Optional[str], optional) – Ignored. Defaults to None.

Returns:

Reranked documents with RRF scores in the score field, sorted by descending score.

Return type:

list[Doc]

Example

>>> reranker = RrfReranker(topn=5)
>>> results = reranker.rerank({
...     "bm25": bm25_results,
...     "dense": dense_results
... })
>>> print(f"Top document: {results[0].id} (score: {results[0].score:.4f})")
class zvec_db.rerankers.fusion.WeightedReranker(topn=10, rerank_field=None, weights=None, normalize=True, metrics=<object object>, schema=None)[source]

Weighted fusion with optional normalization and metric conversion.

This class combines scores from multiple sources using weighted sum:

\[\text{score}(d) = \sum_{s \in S} \text{norm}(\text{score}_s(d)) \times w_s\]

where \(w_s\) is the weight for source \(s\).

Features:

  • Optional distance -> similarity conversion (COSINE, L2, IP)

  • Optional normalization per source (bayes, minmax, percentile)

  • Smart defaults: COSINE -> no additional normalization, others -> bayes

Distance to similarity conversion:

  • COSINE: (2 - score) / 2 (distance [0, 2] -> similarity [0, 1])

  • L2: -score (inverts order)

  • IP: no conversion (already a similarity, including BM25 scores)

Note

COSINE metric is NEVER additionally normalized - the conversion formula (2 - score) / 2 already produces scores in [0, 1]. Setting normalize for COSINE sources has no effect.

Normalization methods (applied AFTER conversion, except for COSINE; all assume scores already converted to “higher = better”):

  • bayes (default for non-COSINE): Bayesian sigmoid calibration

  • minmax: (x - min) / (max - min)

  • percentile: rank-based normalization

  • default: index-aware scaling with avgscore

  • atan: arctan-based normalization, 0.5 + atan(s)/pi
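The conversion and fusion steps described above can be sketched in plain Python. This is an illustration, not the library implementation: it uses string metric names instead of `MetricType`, plain dicts instead of `Doc` objects, and min-max in place of the default bayes calibration for brevity.

```python
# Illustrative sketch of weighted score fusion.
# scores_by_source maps source name -> {doc_id: raw score};
# metrics maps source name -> "cosine" | "l2" | "ip".

def to_similarity(score, metric):
    if metric == "cosine":   # distance [0, 2] -> similarity [0, 1]
        return (2 - score) / 2
    if metric == "l2":       # negate so that higher is better
        return -score
    return score             # "ip": already a similarity (incl. BM25)

def minmax(values):
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

def weighted_fuse(scores_by_source, metrics, weights):
    fused = {}
    for source, doc_scores in scores_by_source.items():
        ids = list(doc_scores)
        sims = [to_similarity(doc_scores[i], metrics[source]) for i in ids]
        if metrics[source] != "cosine":  # COSINE is never re-normalized
            sims = minmax(sims)
        w = weights.get(source, 1.0)     # unlisted sources default to 1.0
        for doc_id, s in zip(ids, sims):
            fused[doc_id] = fused.get(doc_id, 0.0) + w * s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

Note how the COSINE branch skips normalization entirely: (2 - score) / 2 already lands in [0, 1], which is why setting normalize for COSINE sources has no effect.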

Parameters:
  • topn (int, optional) – Number of top documents to return. Defaults to 10.

  • rerank_field (Optional[str], optional) – Ignored. Defaults to None.

  • weights (Optional[dict[str, float]], optional) – Weight per source. Sources not listed use weight 1.0. Defaults to None (equal weights).

  • normalize (Union[bool, str, dict[str, Any], None], optional) – Normalization configuration. Can be:

    • True (default): Smart default - COSINE -> no norm, others -> “bayes”

    • str: Method name (“bayes”, “minmax”, “percentile”, “default”, “atan”)

    • dict: Per-source config, e.g., {“sparse”: “bayes”, “dense”: None}

    • None or False: No normalization (raw scores after conversion)

  • metrics (Optional[Union[MetricType, dict[str, MetricType]]], optional) –

    Metric type(s) for converting distances to similarities. Can be:

    • A single MetricType (e.g., MetricType.COSINE) applied to all sources

    • A dict mapping source names to their metric type (use MetricType.IP for sources that don’t need conversion, e.g., BM25 scores)

    If None and schema is provided, metrics are inferred from the schema.

  • schema (Optional[CollectionSchema], optional) – Collection schema to automatically extract metrics from. If provided and metrics is None, metrics are inferred from the schema (defaults to IP).

Raises:

ValueError – If neither metrics nor schema is provided.

Example

>>> # Already normalized scores [0, 1]
>>> reranker = WeightedReranker(
...     weights={"bm25": 0.7, "dense": 0.3}
... )
>>> results = reranker.rerank({
...     "bm25": bm25_docs_normalized,
...     "dense": dense_docs_normalized
... })
>>> # Raw scores with smart default normalization
>>> reranker = WeightedReranker(
...     weights={"bm25": 0.7, "dense": 0.3},
...     normalize=True  # COSINE -> no extra norm, others -> bayes
... )
>>> results = reranker.rerank({"bm25": bm25_docs, "dense": dense_docs})
>>> # Per-source normalization config
>>> reranker = WeightedReranker(
...     weights={"bm25": 0.7, "dense": 0.3},
...     normalize={"bm25": "bayes", "dense": "cosine"}  # cosine = no-op
... )
>>> # No normalization (raw scores after conversion only)
>>> reranker = WeightedReranker(
...     metrics={"bm25": MetricType.IP},
...     normalize=None
... )
>>> # Schema auto-detection (recommended with zvec)
>>> import zvec
>>> collection = zvec.open("./my_collection")
>>> reranker = WeightedReranker(
...     schema=collection.schema,
...     weights={"dense": 0.7, "bm25": 0.3},
...     normalize=True
... )

Note

Distance to similarity conversion is applied before normalization:

  • COSINE: (2 - score) / 2 (distance [0, 2] -> similarity [0, 1])

  • L2: -score (inverts order)

  • IP: no conversion (already a similarity, including BM25 scores)

See also

RrfReranker: Rank-based fusion (uses ranks, not scores).

__init__(topn=10, rerank_field=None, weights=None, normalize=True, metrics=<object object>, schema=None)[source]

Initialize WeightedReranker.

Parameters:
  • topn (int) – Number of top documents to return.

  • rerank_field (Optional[str]) – Ignored.

  • weights (Optional[dict[str, float]]) – Weight per source. Defaults to equal weights.

  • normalize (Union[bool, str, dict[str, Any], None]) – Normalization configuration. Can be:

    • True (default): Smart default - COSINE -> no-op, others -> “bayes”

    • "bayes": Bayesian sigmoid calibration for all sources

    • "minmax": (x - min) / (max - min) for all sources

    • "percentile": Rank-based normalization for all sources

    • "cosine": No-op (identity); COSINE scores are already in [0, 1]

    • "default": Min-max with avgscore

    • dict: Per-source config, e.g., {“sparse”: “bayes”, “dense”: “cosine”}

    • None or False: No normalization (raw scores after conversion)

  • metrics (Optional[Union[MetricType, dict[str, Union[str, MetricType, None]]]]) – Metric type(s) for distance-to-similarity conversion. Can be a single MetricType for all sources, or a dict for per-source metrics. If None and schema is provided, metrics are inferred from the schema.

  • schema (Optional[CollectionSchema]) – Collection schema to automatically extract metrics from.

Raises:

ValueError – If neither metrics nor schema is provided.

property weights: dict[str, float]
property normalize: bool | str | dict[str, Any] | None
rerank(query_results, query=None)[source]

Convert scores and compute weighted fusion.

Steps:

  1. Convert metrics to ensure higher = better:

     • COSINE: (2 - score) / 2

     • L2: -score (inverts order)

     • IP: no conversion

  2. Apply normalization per source (COSINE: skipped; others: bayes by default)

  3. Filter out documents with normalized score <= 0

  4. Compute weighted fusion

Parameters:
  • query_results (dict[str, list[Doc]]) – Dictionary mapping source names to lists of documents.

  • query (Optional[str], optional) – Ignored. Defaults to None.

Returns:

Reranked documents with weighted scores.

Return type:

list[Doc]

Note

COSINE scores are NOT additionally normalized after conversion, since (2-score)/2 already produces scores in [0, 1].

class zvec_db.rerankers.fusion.MultiFieldWeightedReranker(topn=10, rerank_field=None, weights=None, source_weights=None, field_weights=None, normalize=True, metrics=<object object>, schema=None)[source]

Reranker that combines scores from multiple sources and document fields.

This reranker extends the standard weighted fusion approach by supporting field-level weighting within documents. This is useful when documents have structured fields (e.g., title, content, tags) and you want to weight their contributions differently.

The score fusion is computed as:

\[\text{score}(d) = \sum_{s \in S} w_s \times \sum_{f \in F} w_f \times \text{norm}(\text{score}_{s,f}(d))\]
where:
  • \(w_s\) is the weight for source \(s\)

  • \(w_f\) is the weight for field \(f\)

  • \(\text{norm}\) is the normalization function (Standard or Bayesian)

This is preferred over WeightedReranker when:

  • Documents have structured fields with different importance (title > content).

  • You need fine-grained control over score contributions.

  • Different fields use different scoring scales.
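The inner, field-weighted sum of the formula above can be sketched as follows. This is an illustration under the assumption stated in the Note below (field scores live in the doc.fields mapping); `field_weighted_score` is a hypothetical helper, not part of the library API.

```python
# Sketch of the per-document field-weighted score: sum over fields of
# w_f * score_f, where doc_fields is the document's `fields` mapping.

def field_weighted_score(doc_fields, field_weights):
    total = 0.0
    for field, value in doc_fields.items():
        if isinstance(value, (int, float)):  # non-numeric contributes 0
            total += field_weights.get(field, 1.0) * value  # default weight 1.0
    return total
```

A missing or non-numeric field (e.g., a string tag) simply contributes 0, while unlisted numeric fields keep the default weight of 1.0.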

Parameters:
  • topn (int, optional) – Number of top documents to return. Defaults to 10.

  • rerank_field (Optional[str], optional) – Ignored. Defaults to None.

  • source_weights (Optional[dict[str, float]], optional) – Weight per source key. Sources not listed use weight 1.0. Defaults to None (equal weights).

  • field_weights (Optional[dict[str, float]], optional) – Weight per document field. Fields not listed use weight 1.0. The field score is retrieved from the doc.fields dictionary. Defaults to None (equal weights for all fields).

  • weights (Optional[dict[str, float]])

  • normalize (Union[bool, str, dict[str, Any], None], optional) – Normalization configuration; see __init__ for the accepted values. Defaults to True (smart default: COSINE -> no-op, others -> “bayes”).

  • metrics (Optional[Union[MetricType, dict[str, Union[str, MetricType, None]]]], optional) – Metric type(s) for converting distances to similarities: MetricType.COSINE for cosine distances [0, 2], MetricType.L2 for L2 distances, MetricType.IP for scores that are already similarities (inner product, including BM25 scores). If None and schema is provided, metrics are inferred from the schema.

  • schema (Optional[CollectionSchema]) – Collection schema to automatically extract metrics from.

Note

Field scores are expected to be stored in doc.fields[field_name] as numeric values. If a field is missing or has a non-numeric value, it contributes 0 to the score.

Example

>>> reranker = MultiFieldWeightedReranker(
...     topn=20,
...     source_weights={"bm25": 0.7, "dense": 0.3},
...     field_weights={"title": 3.0, "body": 1.0, "tags": 0.5}
... )
>>> results = reranker.rerank({
...     "bm25": bm25_docs,
...     "dense": dense_docs
... })
__init__(topn=10, rerank_field=None, weights=None, source_weights=None, field_weights=None, normalize=True, metrics=<object object>, schema=None)[source]

Initialize MultiFieldWeightedReranker.

Parameters:
  • topn (int) – Number of top documents to return.

  • rerank_field (Optional[str]) – Ignored.

  • source_weights (Optional[dict[str, float]]) – Weight per source. Defaults to equal weights.

  • field_weights (Optional[dict[str, float]]) – Weight per document field.

  • normalize (Union[bool, str, dict[str, Any], None]) – Normalization configuration. Can be:

    • True (default): Smart default - COSINE → no-op, others → “bayes”

    • str: Method name (“bayes”, “minmax”, “percentile”, “cosine”)

    • dict: Per-source config, e.g., {“sparse”: “bayes”, “dense”: “cosine”}

    • None or False: No normalization (raw scores after conversion)

    Note: “cosine” is a no-op (identity) since COSINE scores are already in [0, 1] after conversion.

  • metrics (Optional[Union[MetricType, dict[str, Union[str, MetricType, None]]]]) – Metric type(s) for converting distances to similarities. Can be a single MetricType for all sources, or a dict for per-source metrics. If None and schema is provided, metrics are inferred from the schema. Required if schema is not provided.

  • schema (Optional[CollectionSchema]) – Collection schema to automatically extract metrics from. If provided and metrics is None, metrics are inferred from the schema.

  • weights (Optional[dict[str, float]])

Raises:

ValueError – If neither metrics nor schema is provided.

Example

>>> # Automatic metric detection from collection schema
>>> import zvec
>>> collection = zvec.open("./my_collection")
>>> reranker = MultiFieldWeightedReranker(
...     schema=collection.schema,
...     source_weights={"bm25": 0.6, "dense": 0.4},
...     field_weights={"title": 3.0, "content": 1.0},
...     normalize=True  # Default: bayes for all
... )
rerank(query_results, query=None)[source]

Normalize scores per-source and compute weighted fusion with field weighting.

This method performs the following steps:

  1. Iterates through each source in query_results.

  2. For each document, computes a field-weighted score.

  3. Applies normalization per source (smart default: COSINE → skipped, others → bayes).

  4. Filters out documents with a normalized score of 0.0.

  5. Delegates to WeightedReranker for source-weighted fusion.

Parameters:
  • query_results (dict[str, list[Doc]]) – Dictionary mapping source names to lists of documents. Each document should have id, score, and fields with numeric values for field scoring.

  • query (str | None)

Returns:

Reranked documents with weighted normalized scores in the score field, sorted by descending score.

Return type:

list[Doc]

Example

>>> query_results = {
...     "sparse_bm25": bm25_docs,
...     "dense_cosine": dense_docs
... }
>>> reranked = reranker.rerank(query_results)

Modules

multi_field

rrf

weighted