zvec_db.rerankers.fusion.weighted
Classes
WeightedReranker | Weighted fusion with optional normalization and metric conversion.
- class zvec_db.rerankers.fusion.weighted.WeightedReranker(topn=10, rerank_field=None, weights=None, normalize=True, metrics=<object object>, schema=None)
Weighted fusion with optional normalization and metric conversion.
This class combines scores from multiple sources using a weighted sum:
\[\text{score}(d) = \sum_{s \in S} \text{norm}(\text{score}_s(d)) \times w_s\]
where \(S\) is the set of sources and \(w_s\) is the weight for source \(s\).
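The weighted sum can be sketched in plain Python, assuming each source has already been reduced to a mapping from document id to a normalized, higher-is-better score. The function name `weighted_sum` and the dict-of-dicts input are hypothetical illustrations, not the zvec_db API. Sources missing from `weights` get weight 1.0 as documented for the `weights` parameter; treating a document absent from a source as contributing nothing is an assumption.

```python
from __future__ import annotations


def weighted_sum(
    normalized: dict[str, dict[str, float]],
    weights: dict[str, float] | None = None,
) -> dict[str, float]:
    """Compute score(d) = sum over sources s of norm_s(d) * w_s.

    Hypothetical helper: sources not in `weights` use weight 1.0, and a
    document that a source did not return contributes 0 for that source.
    """
    weights = weights or {}
    fused: dict[str, float] = {}
    for source, scores in normalized.items():
        w = weights.get(source, 1.0)
        for doc_id, s in scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + s * w
    return fused


scores = weighted_sum(
    {"bm25": {"a": 1.0, "b": 0.5}, "dense": {"a": 0.2, "c": 0.9}},
    weights={"bm25": 0.7, "dense": 0.3},
)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Document "a" appears in both sources, so it collects weighted contributions from each and ranks first.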
Features:
- Optional distance-to-similarity conversion (COSINE, L2, IP)
- Optional per-source normalization (bayes, minmax, percentile)
- Smart defaults: COSINE -> no additional normalization, others -> bayes
Distance to similarity conversion:
- COSINE: (2 - score) / 2, mapping distance [0, 2] -> similarity [0, 1]
- L2: -score, which inverts the order
- IP: no conversion (already a similarity, including BM25 scores)
Note
COSINE metric is NEVER additionally normalized: the conversion formula (2 - score) / 2 already produces scores in [0, 1]. Setting normalize for COSINE sources has no effect.
Normalization methods (applied AFTER conversion, except for COSINE):
- bayes (default for non-COSINE): Bayesian sigmoid calibration
- minmax: (x - min) / (max - min)
- percentile: rank-based normalization
- default: index-aware scaling with avgscore
- atan: arctan-based normalization, 0.5 + atan(s)/pi (assumes scores already converted to “higher=better”)
- cosine: no-op (identity)
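The conversion rules and two of the normalization formulas above are simple enough to sketch directly. This is a minimal illustration, assuming metrics are passed as plain strings ("COSINE", "L2", "IP") rather than `MetricType` values; `to_similarity`, `minmax`, and `atan_norm` are hypothetical names. Returning 1.0 for every score when min-max sees constant input is an assumption (the reference does not say how that case is handled), and the bayes, percentile, and default methods are not sketched because their exact parameters are not given here.

```python
from __future__ import annotations

import math


def to_similarity(score: float, metric: str) -> float:
    """Map a raw score to 'higher is better' using the documented rules."""
    if metric == "COSINE":
        return (2.0 - score) / 2.0  # distance [0, 2] -> similarity [0, 1]
    if metric == "L2":
        return -score  # a smaller distance becomes a larger value
    if metric == "IP":
        return score  # already a similarity (e.g. BM25)
    raise ValueError(f"unknown metric: {metric}")


def minmax(values: list[float]) -> list[float]:
    """(x - min) / (max - min); constant input maps to 1.0 (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def atan_norm(values: list[float]) -> list[float]:
    """0.5 + atan(s) / pi, squashing any real score into (0, 1)."""
    return [0.5 + math.atan(v) / math.pi for v in values]


l2_distances = [0.5, 2.0, 1.0]
sims = [to_similarity(d, "L2") for d in l2_distances]  # [-0.5, -2.0, -1.0]
scaled = minmax(sims)
```

After L2 inversion the closest document (distance 0.5) has the largest value, so min-max scaling assigns it the top score.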
- Parameters:
topn (int, optional) – Number of top documents to return. Defaults to 10.
rerank_field (Optional[str], optional) – Ignored. Defaults to None.
weights (Optional[dict[str, float]], optional) – Weight per source. Sources not listed use weight 1.0. Defaults to None (equal weights).
normalize (Union[bool, str, dict[str, Any], None], optional) – Normalization configuration. Can be:
- True (default): smart default; COSINE -> no normalization, others -> “bayes”
- str: method name (“bayes”, “minmax”, “percentile”, “default”, “atan”)
- dict: per-source config, e.g., {“sparse”: “bayes”, “dense”: None}
- None or False: no normalization (raw scores after conversion)
metrics (Optional[Union[MetricType, dict[str, MetricType]]], optional) – Metric type(s) for converting distances to similarities. Can be:
- a single MetricType (e.g., MetricType.COSINE) applied to all sources
- a dict mapping source names to their metric type (use MetricType.IP for sources that don’t need conversion, e.g., BM25 scores)
If None and schema is provided, metrics are inferred from the schema.
schema (Optional[CollectionSchema], optional) – Collection schema to automatically extract metrics from. If provided and metrics is None, metrics are inferred from the schema (defaults to IP).
- Raises:
ValueError – If neither metrics nor schema is provided.
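The normalize options can be read as a small per-source decision function. The sketch below is illustrative only: `resolve_method` is a hypothetical name, metrics are assumed to be plain strings, and leaving sources that are missing from a dict config unnormalized is an assumption this reference does not state. It encodes the documented rules that COSINE sources are never normalized, that True means "bayes" for everything else, and that "cosine" is an identity method.

```python
from __future__ import annotations

from typing import Any


def resolve_method(normalize: Any, source: str, metric: str) -> str | None:
    """Pick the normalization method for one source (None = no normalization).

    Hypothetical helper mirroring the documented rules:
    - COSINE sources are never normalized after conversion.
    - True         -> "bayes" for every non-COSINE source (smart default).
    - str          -> that method for every non-COSINE source.
    - dict         -> per-source lookup; unlisted sources get None (assumption).
    - None / False -> no normalization.
    """
    if metric == "COSINE":
        return None
    if normalize is True:
        return "bayes"
    if normalize is None or normalize is False:
        return None
    if isinstance(normalize, str):
        method = normalize
    elif isinstance(normalize, dict):
        method = normalize.get(source)
    else:
        raise TypeError(f"unsupported normalize value: {normalize!r}")
    # "cosine" is documented as a no-op, so treat it like no normalization.
    return None if method in (None, "cosine") else method


plan = {
    src: resolve_method({"bm25": "bayes", "dense": None}, src, metric)
    for src, metric in {"bm25": "IP", "dense": "L2"}.items()
}
```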
Example
>>> # Already normalized scores [0, 1]
>>> reranker = WeightedReranker(
...     weights={"bm25": 0.7, "dense": 0.3}
... )
>>> results = reranker.rerank({
...     "bm25": bm25_docs_normalized,
...     "dense": dense_docs_normalized
... })

>>> # Raw scores with smart default normalization
>>> reranker = WeightedReranker(
...     weights={"bm25": 0.7, "dense": 0.3},
...     normalize=True  # COSINE -> (2 - score) / 2 only, others -> bayes
... )
>>> results = reranker.rerank({"bm25": bm25_docs, "dense": dense_docs})

>>> # Per-source normalization config
>>> reranker = WeightedReranker(
...     weights={"bm25": 0.7, "dense": 0.3},
...     normalize={"bm25": "bayes", "dense": "cosine"}  # cosine = no-op
... )

>>> # No normalization (raw scores after conversion only)
>>> reranker = WeightedReranker(
...     metrics={"bm25": MetricType.IP},
...     normalize=None
... )

>>> # Schema auto-detection (recommended with zvec)
>>> import zvec
>>> collection = zvec.open("./my_collection")
>>> reranker = WeightedReranker(
...     schema=collection.schema,
...     weights={"dense": 0.7, "bm25": 0.3},
...     normalize=True
... )
Note
Distance to similarity conversion is applied before normalization:
- COSINE: (2 - score) / 2 (distance [0, 2] -> similarity [0, 1])
- L2: -score (inverts order)
- IP: no conversion (already a similarity, including BM25 scores)
See also
RrfReranker: Rank-based fusion (uses ranks, not scores).
- __init__(topn=10, rerank_field=None, weights=None, normalize=True, metrics=<object object>, schema=None)
Initialize WeightedReranker.
- Parameters:
topn (int) – Number of top documents to return.
rerank_field (Optional[str]) – Ignored.
weights (Optional[dict[str, float]]) – Weight per source. Defaults to equal weights.
normalize (Union[bool, str, dict[str, Any], None]) – Normalization configuration. Can be:
- True (default): smart default; COSINE -> no-op, others -> “bayes”
- "bayes": Bayesian sigmoid calibration for all sources
- "minmax": (x - min) / (max - min) for all sources
- "percentile": rank-based normalization for all sources
- "cosine": no-op (identity); COSINE scores are already in [0, 1]
- "default": min-max with avgscore
- dict: per-source config, e.g., {“sparse”: “bayes”, “dense”: “cosine”}
- None or False: no normalization (raw scores after conversion)
metrics (Optional[Union[MetricType, dict[str, Union[str, MetricType, None]]]]) – Metric type(s) for distance-to-similarity conversion. Can be a single MetricType for all sources, or a dict for per-source metrics. If None and schema is provided, metrics are inferred from the schema.
schema (Optional[CollectionSchema]) – Collection schema to automatically extract metrics from.
- Raises:
ValueError – If neither metrics nor schema is provided.
- rerank(query_results, query=None)
Convert scores and compute weighted fusion.
Steps:
1. Convert metrics to ensure higher=better:
   - COSINE: (2 - score) / 2
   - L2: -score (inverts order)
   - IP: no conversion
2. Apply normalization per source (COSINE: skipped, others: bayes by default).
3. Filter out documents with normalized score <= 0.
4. Compute weighted fusion.
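The four steps above can be strung together in a short end-to-end sketch. It is a hypothetical illustration, not the zvec_db implementation: `fuse` is an invented name, inputs are plain dicts of document id to raw score, and metrics are strings. Because the parameters of the default "bayes" calibration are not documented here, the sketch substitutes the documented atan formula for step 2; that swap is an assumption.

```python
from __future__ import annotations

import math


def fuse(
    raw: dict[str, dict[str, float]],
    metrics: dict[str, str],
    weights: dict[str, float] | None = None,
    topn: int = 10,
) -> list[tuple[str, float]]:
    """Hypothetical sketch of the rerank steps; not the zvec_db implementation."""
    weights = weights or {}
    fused: dict[str, float] = {}
    for source, scores in raw.items():
        metric = metrics[source]
        # Step 1: convert so that higher is better.
        if metric == "COSINE":
            conv = {d: (2.0 - s) / 2.0 for d, s in scores.items()}
        elif metric == "L2":
            conv = {d: -s for d, s in scores.items()}
        else:  # "IP": already a similarity, left unchanged
            conv = dict(scores)
        # Step 2: normalize non-COSINE sources. The documented atan formula
        # stands in for the default "bayes" calibration (assumption).
        if metric != "COSINE":
            conv = {d: 0.5 + math.atan(s) / math.pi for d, s in conv.items()}
        # Steps 3 and 4: drop scores <= 0, then accumulate weighted scores.
        w = weights.get(source, 1.0)
        for d, s in conv.items():
            if s > 0:
                fused[d] = fused.get(d, 0.0) + s * w
    ranked = sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:topn]


results = fuse(
    raw={"dense": {"a": 0.2, "b": 2.0, "c": 1.0}, "bm25": {"a": 3.0, "c": 8.0}},
    metrics={"dense": "COSINE", "bm25": "IP"},
    weights={"dense": 0.7, "bm25": 0.3},
)
```

Here document "b" has cosine distance 2.0, which converts to similarity 0.0 and is dropped by the filter, so only "a" and "c" are ranked.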
- Parameters:
- Returns:
Reranked documents with weighted scores.
- Return type:
list[Doc]
Note
COSINE scores are NOT additionally normalized after conversion, since (2-score)/2 already produces scores in [0, 1].