zvec_db.rerankers.fusion.weighted

Classes

WeightedReranker([topn, rerank_field, ...])

Weighted fusion with optional normalization and metric conversion.

class zvec_db.rerankers.fusion.weighted.WeightedReranker(topn=10, rerank_field=None, weights=None, normalize=True, metrics=<object object>, schema=None)

Weighted fusion with optional normalization and metric conversion.

This class combines scores from multiple sources using weighted sum:

\[\text{score}(d) = \sum_{s \in S} \text{norm}(\text{score}_s(d)) \times w_s\]

where \(w_s\) is the weight for source \(s\).
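The weighted sum above can be illustrated with a small standalone sketch. It does not use the library; `weighted_fusion` and `top_n` are hypothetical helper names, scores are assumed to be already normalized, and a document missing from a source is assumed to contribute nothing for that source (the docstring does not specify this case).

```python
from collections import defaultdict


def weighted_fusion(scores_by_source, weights=None, default_weight=1.0):
    """Fuse {source: {doc_id: normalized_score}} into {doc_id: fused_score}."""
    weights = weights or {}
    fused = defaultdict(float)
    for source, doc_scores in scores_by_source.items():
        # Sources not listed in `weights` fall back to 1.0, as documented.
        w = weights.get(source, default_weight)
        for doc_id, score in doc_scores.items():
            # Assumption: a document absent from a source adds nothing for it.
            fused[doc_id] += score * w
    return dict(fused)


def top_n(fused, topn=10):
    """Return the `topn` (doc_id, score) pairs with the highest fused score."""
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:topn]


scores = {
    "bm25": {"a": 0.9, "b": 0.2},
    "dense": {"a": 0.4, "b": 0.8, "c": 0.5},
}
fused = weighted_fusion(scores, weights={"bm25": 0.7, "dense": 0.3})
ranking = top_n(fused, topn=2)
```

Here document "a" scores 0.9 × 0.7 + 0.4 × 0.3 and ranks first, ahead of "b".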

Features:

- Optional distance->similarity conversion (COSINE, L2, IP)
- Optional normalization per source (bayes, minmax, percentile, default, atan)
- Smart defaults: COSINE -> no additional normalization, others -> bayes

Distance to similarity conversion:

- COSINE: (2 - score) / 2, mapping distance [0, 2] to similarity [0, 1]
- L2: -score, which inverts the order
- IP: no conversion (already a similarity, including BM25 scores)
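As a rough illustration of these three conversion rules, the sketch below applies them to single scores. It is not the library's code: `to_similarity` is a hypothetical helper, and plain strings stand in for the `MetricType` members.

```python
def to_similarity(score, metric):
    """Convert a raw score to "higher is better" using the documented rules."""
    metric = metric.upper()
    if metric == "COSINE":
        # Cosine distance in [0, 2] maps to similarity in [0, 1].
        return (2.0 - score) / 2.0
    if metric == "L2":
        # Negation inverts the order so that smaller distances rank higher.
        return -score
    if metric == "IP":
        # Inner product (and BM25) scores are already similarities.
        return score
    raise ValueError(f"unsupported metric: {metric}")


cosine_sim = to_similarity(0.5, "COSINE")
l2_sim = to_similarity(3.0, "L2")
bm25_sim = to_similarity(12.5, "IP")
```

Note that only the COSINE rule yields a bounded range; L2 and IP outputs stay unbounded, which is why those sources are normalized afterwards by default.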

Note

COSINE metric is NEVER additionally normalized - the conversion formula (2 - score) / 2 already produces scores in [0, 1]. Setting normalize for COSINE sources has no effect.

Normalization methods (applied AFTER conversion, except for COSINE):

- bayes (default for non-COSINE): Bayesian sigmoid calibration
- minmax: (x - min) / (max - min)
- percentile: rank-based normalization
- default: index-aware scaling with avgscore
- atan: arctan-based normalization, 0.5 + atan(s) / pi (assumes scores already converted to "higher=better")
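Two of these methods come with explicit formulas, so they can be sketched directly. The helpers `minmax` and `atan_norm` below are hypothetical names, not library functions. Returning 0.0 when all scores are equal is an assumption, since the docstring does not say how that case is handled. The bayes, percentile, and default methods are omitted because no formula is given for them here.

```python
import math


def minmax(scores):
    """Scale scores to [0, 1] with (x - min) / (max - min)."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption: with no spread, every score maps to 0.0.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def atan_norm(scores):
    """Squash scores into (0, 1) with 0.5 + atan(s) / pi."""
    # Expects "higher is better" inputs, i.e. scores after metric conversion.
    return [0.5 + math.atan(s) / math.pi for s in scores]


scaled = minmax([2.0, 4.0, 6.0])
squashed = atan_norm([-3.0, 0.0, 3.0])
```

minmax depends on the other scores in the same result list, while atan maps each score independently, so atan outputs stay comparable across queries.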

Parameters:

topn (int, optional) – Number of top documents to return. Defaults to 10.
rerank_field (Optional[str], optional) – Ignored. Defaults to None.
weights (Optional[dict[str, float]], optional) – Weight per source. Sources not listed use weight 1.0. Defaults to None (equal weights).
normalize (Union[bool, str, dict[str, Any], None], optional) – Normalization configuration. Can be:
- True (default): Smart default. COSINE -> no normalization, others -> "bayes"
- str: Method name ("bayes", "minmax", "percentile", "default", "atan")
- dict: Per-source config, e.g., {"sparse": "bayes", "dense": None}
- None or False: No normalization (raw scores after conversion)
metrics (Optional[Union[MetricType, dict[str, MetricType]]], optional) – Metric type(s) for converting distances to similarities. Can be:
- A single MetricType (e.g., MetricType.COSINE) applied to all sources
- A dict mapping source names to their metric type (use MetricType.IP for sources that don't need conversion, e.g., BM25 scores)
- If None and schema is provided, metrics are inferred from the schema
schema (Optional[CollectionSchema], optional) – Collection schema to automatically extract metrics from. If provided and metrics is None, metrics are inferred from the schema (defaults to IP).

Raises:

ValueError – If neither metrics nor schema is provided.

Example

>>> from zvec_db.rerankers.fusion.weighted import WeightedReranker
>>> # Already normalized scores [0, 1]
>>> reranker = WeightedReranker(
...     weights={"bm25": 0.7, "dense": 0.3}
... )
>>> results = reranker.rerank({
...     "bm25": bm25_docs_normalized,
...     "dense": dense_docs_normalized
... })

>>> # Raw scores with smart default normalization
>>> reranker = WeightedReranker(
...     weights={"bm25": 0.7, "dense": 0.3},
...     normalize=True  # COSINE -> (2 - score) / 2 only, others -> bayes
... )
>>> results = reranker.rerank({"bm25": bm25_docs, "dense": dense_docs})

>>> # Per-source normalization config
>>> reranker = WeightedReranker(
...     weights={"bm25": 0.7, "dense": 0.3},
...     normalize={"bm25": "bayes", "dense": "cosine"}  # cosine = no-op
... )

>>> # No normalization (raw scores after conversion only)
>>> reranker = WeightedReranker(
...     metrics={"bm25": MetricType.IP},
...     normalize=None
... )

>>> # Schema auto-detection (recommended with zvec)
>>> import zvec
>>> collection = zvec.open("./my_collection")
>>> reranker = WeightedReranker(
...     schema=collection.schema,
...     weights={"dense": 0.7, "bm25": 0.3},
...     normalize=True
... )

Note

Distance to similarity conversion is applied before normalization:

- COSINE: (2 - score) / 2 (distance [0, 2] -> similarity [0, 1])
- L2: -score (inverts order)
- IP: no conversion (already similarity, including BM25 scores)