zvec_db.rerankers.fusion.weighted

Classes

WeightedReranker([topn, rerank_field, ...])

Weighted fusion with optional normalization and metric conversion.

class zvec_db.rerankers.fusion.weighted.WeightedReranker(topn=10, rerank_field=None, weights=None, normalize=True, metrics=<object object>, schema=None)[source]

Weighted fusion with optional normalization and metric conversion.

This class combines scores from multiple sources using weighted sum:

\[\text{score}(d) = \sum_{s \in S} \text{norm}(\text{score}_s(d)) \times w_s\]

where \(w_s\) is the weight for source \(s\).
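As an illustration of the formula above, the following is a minimal sketch of the weighted sum over already-normalized scores. It does not use the zvec_db API: the function name `weighted_sum` and the plain `{source: {doc_id: score}}` input shape are assumptions made for this example only.

```python
from collections import defaultdict


def weighted_sum(normalized, weights=None):
    """Hypothetical helper: score(d) = sum over sources of norm_score_s(d) * w_s."""
    weights = weights or {}
    fused = defaultdict(float)
    for source, doc_scores in normalized.items():
        w = weights.get(source, 1.0)  # sources not listed use weight 1.0
        for doc_id, score in doc_scores.items():
            fused[doc_id] += score * w
    return dict(fused)


# Document "a" appears in both sources, "b" only in "bm25".
fused = weighted_sum(
    {"bm25": {"a": 0.5, "b": 1.0}, "dense": {"a": 1.0}},
    weights={"bm25": 0.25, "dense": 0.5},
)
```

A document missing from a source simply contributes nothing for that source in this sketch; whether the library treats missing documents the same way is not stated here.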

Features:

  • Optional distance-to-similarity conversion (COSINE, L2, IP)

  • Optional normalization per source (bayes, minmax, percentile)

  • Smart defaults: COSINE -> no additional normalization, others -> bayes

Distance to similarity conversion:

  • COSINE: (2 - score) / 2, mapping distance [0, 2] to similarity [0, 1]

  • L2: -score, which inverts the order

  • IP: no conversion (already a similarity, including BM25 scores)
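The conversion rules above can be sketched as a small standalone function. This is only an illustration: the name `to_similarity` is hypothetical, and plain strings stand in for the `MetricType` members.

```python
def to_similarity(score, metric):
    """Hypothetical helper: map a raw score to a higher-is-better value."""
    if metric == "COSINE":
        # Cosine distance in [0, 2] becomes a similarity in [0, 1].
        return (2.0 - score) / 2.0
    if metric == "L2":
        # Negating the distance inverts the order: smaller distance ranks higher.
        return -score
    if metric == "IP":
        # Inner product (and BM25) scores are already similarities.
        return score
    raise ValueError(f"unsupported metric: {metric!r}")
```

For example, a cosine distance of 0.5 maps to a similarity of 0.75, while an L2 distance of 3.0 maps to -3.0.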

Note

COSINE metric is NEVER additionally normalized - the conversion formula (2 - score) / 2 already produces scores in [0, 1]. Setting normalize for COSINE sources has no effect.

Normalization methods (applied AFTER conversion, except for COSINE):

  • bayes (default for non-COSINE): Bayesian sigmoid calibration

  • minmax: (x - min) / (max - min)

  • percentile: rank-based normalization

  • default: index-aware scaling with avgscore

  • atan: arctan-based normalization 0.5 + atan(s)/pi (assumes scores already converted to "higher=better")
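Two of the methods listed above have explicit formulas, and a minimal sketch of them follows. The function names are hypothetical, and the handling of an empty list or of identical scores in `minmax` is an assumption of this example, not documented library behavior.

```python
import math


def minmax(scores):
    """Hypothetical helper: (x - min) / (max - min) over one source's scores."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption: avoid division by zero by mapping identical scores to 1.0.
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def atan_norm(scores):
    """Hypothetical helper: 0.5 + atan(s) / pi, mapping any real score into (0, 1)."""
    return [0.5 + math.atan(s) / math.pi for s in scores]
```

Note that `minmax` depends on the other scores in the same result list, whereas `atan_norm` maps each score independently.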

Parameters:
  • topn (int, optional) – Number of top documents to return. Defaults to 10.

  • rerank_field (Optional[str], optional) – Ignored. Defaults to None.

  • weights (Optional[dict[str, float]], optional) – Weight per source. Sources not listed use weight 1.0. Defaults to None (equal weights).

  • normalize (Union[bool, str, dict[str, Any], None], optional) – Normalization configuration. Can be:

    • True (default): smart default; COSINE -> no normalization, others -> "bayes"

    • str: method name ("bayes", "minmax", "percentile", "cosine", "default", "atan")

    • dict: per-source config, e.g., {"sparse": "bayes", "dense": None}

    • None or False: no normalization (raw scores after conversion)

  • metrics (Optional[Union[MetricType, dict[str, MetricType]]], optional) – Metric type(s) for converting distances to similarities. Can be:

    • A single MetricType (e.g., MetricType.COSINE) applied to all sources

    • A dict mapping source names to their metric type (use MetricType.IP for sources that don't need conversion, e.g., BM25 scores)

    • None: if schema is provided, metrics are inferred from the schema

  • schema (Optional[CollectionSchema], optional) – Collection schema to automatically extract metrics from. If provided and metrics is None, metrics are inferred from the schema (defaults to IP).

Raises:

ValueError – If neither metrics nor schema is provided.

Example

>>> # Already normalized scores [0, 1]
>>> reranker = WeightedReranker(
...     weights={"bm25": 0.7, "dense": 0.3}
... )
>>> results = reranker.rerank({
...     "bm25": bm25_docs_normalized,
...     "dense": dense_docs_normalized
... })
>>> # Raw scores with smart default normalization
>>> reranker = WeightedReranker(
...     weights={"bm25": 0.7, "dense": 0.3},
...     normalize=True  # COSINE -> conversion only, others -> bayes
... )
>>> results = reranker.rerank({"bm25": bm25_docs, "dense": dense_docs})
>>> # Per-source normalization config
>>> reranker = WeightedReranker(
...     weights={"bm25": 0.7, "dense": 0.3},
...     normalize={"bm25": "bayes", "dense": "cosine"}  # cosine = no-op
... )
>>> # No normalization (raw scores after conversion only)
>>> reranker = WeightedReranker(
...     metrics={"bm25": MetricType.IP},
...     normalize=None
... )
>>> # Schema auto-detection (recommended with zvec)
>>> import zvec
>>> collection = zvec.open("./my_collection")
>>> reranker = WeightedReranker(
...     schema=collection.schema,
...     weights={"dense": 0.7, "bm25": 0.3},
...     normalize=True
... )

Note

Distance to similarity conversion is applied before normalization:

  • COSINE: (2 - score) / 2 (distance [0, 2] -> similarity [0, 1])

  • L2: -score (inverts order)

  • IP: no conversion (already similarity, including BM25 scores)

See also

RrfReranker: Rank-based fusion (uses ranks, not scores).

__init__(topn=10, rerank_field=None, weights=None, normalize=True, metrics=<object object>, schema=None)[source]

Initialize WeightedReranker.

Parameters:
  • topn (int) – Number of top documents to return.

  • rerank_field (Optional[str]) – Ignored.

  • weights (Optional[dict[str, float]]) – Weight per source. Defaults to equal weights.

  • normalize (Union[bool, str, dict[str, Any], None]) – Normalization configuration. Can be:

    • True (default): smart default; COSINE -> no-op, others -> "bayes"

    • "bayes": Bayesian sigmoid calibration for all sources

    • "minmax": (x - min) / (max - min) for all sources

    • "percentile": rank-based normalization for all sources

    • "cosine": no-op (identity); COSINE scores are already in [0, 1]

    • "default": min-max with avgscore

    • dict: per-source config, e.g., {"sparse": "bayes", "dense": "cosine"}

    • None or False: no normalization (raw scores after conversion)

  • metrics (Optional[Union[MetricType, dict[str, Union[str, MetricType, None]]]]) – Metric type(s) for distance-to-similarity conversion. Can be a single MetricType for all sources, or a dict for per-source metrics. If None and schema is provided, metrics are inferred from the schema.

  • schema (Optional[CollectionSchema]) – Collection schema to automatically extract metrics from.

Raises:

ValueError – If neither metrics nor schema is provided.
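To make the accepted forms of `normalize` concrete, here is a rough sketch of how a per-source method could be resolved from the documented options. The function `resolve_normalization` is hypothetical, strings stand in for `MetricType`, and the fallback for a source missing from a per-source dict is an assumption of this example.

```python
def resolve_normalization(normalize, source, metric):
    """Hypothetical helper: return a method name for one source, or None for no normalization."""
    if metric == "COSINE":
        return None  # COSINE is never additionally normalized
    if normalize is None or normalize is False:
        return None  # raw scores after conversion
    if normalize is True:
        return "bayes"  # smart default for non-COSINE sources
    if isinstance(normalize, dict):
        if source not in normalize:
            return "bayes"  # assumption: unlisted sources fall back to the smart default
        method = normalize[source]
    else:
        method = normalize  # a single method name applied to all sources
    # "cosine" is documented as a no-op, so it is treated like None here.
    return None if method in (None, "cosine") else method
```

Under these assumptions, `normalize=True` yields "bayes" for a BM25 source and no normalization for a COSINE source.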

property weights: dict[str, float]
property normalize: bool | str | dict[str, Any] | None
rerank(query_results, query=None)[source]

Convert scores and compute weighted fusion.

Steps:

  1. Convert metrics to ensure higher=better:

    • COSINE: (2 - score) / 2

    • L2: -score (inverts order)

    • IP: no conversion

  2. Apply normalization per source (COSINE: skipped, others: bayes by default)

  3. Filter out documents with normalized score <= 0

  4. Compute weighted fusion
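The steps above can be sketched end to end in plain Python. This is not the library's implementation: the function `fuse` is hypothetical, inputs are plain `{doc_id: raw_score}` dicts rather than `Doc` objects, strings stand in for `MetricType`, and the documented atan formula is used as a stand-in for the default bayes calibration, whose formula is not given here.

```python
import math


def fuse(query_results, metrics, weights=None, topn=10):
    """Hypothetical sketch: convert, normalize, filter, then weighted-sum."""
    weights = weights or {}
    fused = {}
    for source, raw in query_results.items():
        metric = metrics.get(source, "IP")
        # Convert so that higher is better.
        if metric == "COSINE":
            sims = {d: (2.0 - s) / 2.0 for d, s in raw.items()}
        elif metric == "L2":
            sims = {d: -s for d, s in raw.items()}
        else:
            sims = dict(raw)
        # Normalize per source; COSINE is skipped (atan is a stand-in for bayes).
        if metric != "COSINE":
            sims = {d: 0.5 + math.atan(s) / math.pi for d, s in sims.items()}
        # Drop scores <= 0, then accumulate the weighted sum.
        w = weights.get(source, 1.0)
        for d, s in sims.items():
            if s > 0:
                fused[d] = fused.get(d, 0.0) + w * s
    ranked = sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:topn]


ranked = fuse(
    {"dense": {"a": 0.0, "b": 2.0, "c": 1.0}, "bm25": {"a": 0.0, "c": 1.0}},
    metrics={"dense": "COSINE", "bm25": "IP"},
    weights={"dense": 0.5, "bm25": 0.5},
)
```

In this toy input, document "b" has cosine distance 2.0, which converts to a similarity of 0 and is therefore filtered out before fusion.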

Parameters:
  • query_results (dict[str, list[Doc]]) – Dictionary mapping source names to lists of documents.

  • query (Optional[str], optional) – Ignored. Defaults to None.

Returns:

Reranked documents with weighted scores.

Return type:

list[Doc]

Note

COSINE scores are NOT additionally normalized after conversion, since (2-score)/2 already produces scores in [0, 1].