zvec_db.rerankers.fusion.multi_field
Classes
- class zvec_db.rerankers.fusion.multi_field.MultiFieldWeightedReranker(topn=10, rerank_field=None, weights=None, source_weights=None, field_weights=None, normalize=True, metrics=<object object>, schema=None)
Reranker that combines scores from multiple sources and document fields.
This reranker extends the standard weighted fusion approach by supporting field-level weighting within documents. This is useful when documents have structured fields (e.g., title, content, tags) and you want to weight their contributions differently.
The score fusion is computed as:
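\[\text{score}(d) = \sum_{s \in S} w_s \times \sum_{f \in F} w_f \times \text{norm}\left(\text{score}_{s,f}(d)\right)\]
where:
- \(S\) is the set of sources (the keys of query_results) and \(F\) is the set of document fields
- \(w_s\) is the weight for source \(s\)
- \(w_f\) is the weight for field \(f\)
- \(\text{score}_{s,f}(d)\) is the score of field \(f\) of document \(d\) in source \(s\)
- \(\text{norm}\) is the normalization function (standard or Bayesian; see the normalize parameter)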
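To make the nesting concrete, here is the formula evaluated for one document \(d\) with illustrative, made-up normalized scores (not library output). Assume source weights \(w_{\text{bm25}} = 0.7\) and \(w_{\text{dense}} = 0.3\), field weights \(w_{\text{title}} = 3.0\) and \(w_{\text{body}} = 1.0\), normalized bm25 scores of 0.8 (title) and 0.5 (body), and normalized dense scores of 0.6 (title) and 0.9 (body):

```latex
\[
\begin{aligned}
\text{score}(d) &= 0.7\,(3.0 \times 0.8 + 1.0 \times 0.5) + 0.3\,(3.0 \times 0.6 + 1.0 \times 0.9) \\
                &= 0.7 \times 2.9 + 0.3 \times 2.7 \\
                &= 2.03 + 0.81 = 2.84
\end{aligned}
\]
```

Because the field weights sit inside every source term, a high title score raises the document's contribution from each source in which it appears.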
This is preferred over NormalizedWeightedReranker when:
- Documents have structured fields with different importance (title > content).
- You need fine-grained control over score contributions.
- Different fields use different scoring scales.
- Parameters:
topn (int, optional) – Number of top documents to return. Defaults to 10.
rerank_field (Optional[str], optional) – Ignored. Defaults to None.
metric (Optional[MetricType], optional) – Metric of the RAW scores. Defaults to "cosine", the main use case with zvec/Qdrant. Supported values:
  - MetricType.COSINE: cosine distances in [0, 2]
  - MetricType.L2: L2 distances
  - MetricType.IP: similarities (inner product, including BM25 scores)
source_weights (Optional[dict[str, float]], optional) – Weight per source key. Sources not listed use weight 1.0. Defaults to None (equal weights).
field_weights (Optional[dict[str, float]], optional) – Weight per document field. Fields not listed use weight 1.0. Field values are read from the doc.fields dictionary. Defaults to None (equal weights for all fields).
normalizer_configs (Optional[dict[str, Any]], optional) – Mapping from source keys to their specific normalization configurations.
default_norm_config (Union[bool, str, dict[str, Any]], optional) – Normalization method for keys not found in normalizer_configs. Defaults to True (standard normalization).
metrics (Optional[Union[MetricType, dict[str, Union[str, MetricType, None]]]]) – Metric type(s) for converting distances to similarities; see __init__.
schema (Optional[CollectionSchema]) – Collection schema from which metrics can be inferred; see __init__.
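The metric settings above decide how raw scores become higher-is-better similarities before fusion. The sketch below shows one plausible mapping, consistent with the note under normalize that converted COSINE scores lie in [0, 1]. The function name to_similarity, the lowercase metric strings, and the exact cosine and L2 formulas are assumptions for illustration, not the library's conversion code.

```python
def to_similarity(raw: float, metric: str) -> float:
    """Map a raw score to a higher-is-better similarity (hypothetical formulas)."""
    if metric == "cosine":
        # Assumed: cosine distance in [0, 2] -> similarity in [0, 1]
        return 1.0 - raw / 2.0
    if metric == "l2":
        # Assumed: L2 distance in [0, inf) -> similarity in (0, 1]
        return 1.0 / (1.0 + raw)
    if metric == "ip":
        # Inner product and BM25 scores are already similarities: pass through
        return raw
    raise ValueError(f"unknown metric: {metric!r}")
```

Whatever the library's actual formulas are, the point is the same: distances (lower is better) must be flipped before they can be summed with similarities.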
Note
Field scores are expected to be stored in doc.fields[field_name] as numeric values. If a field is missing or has a non-numeric value, it contributes 0 to the score.
Example
>>> reranker = MultiFieldWeightedReranker(
...     topn=20,
...     source_weights={"bm25": 0.7, "dense": 0.3},
...     field_weights={"title": 3.0, "body": 1.0, "tags": 0.5}
... )
>>> results = reranker.rerank({
...     "bm25": bm25_docs,
...     "dense": dense_docs
... })
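The rule in the Note can be sketched as a plain function over a fields mapping. The helper field_weighted_score is hypothetical; it only mirrors the documented behavior (unlisted fields weigh 1.0, missing or non-numeric fields add 0) and is not the reranker's internal code.

```python
from numbers import Real
from typing import Any, Mapping, Optional


def field_weighted_score(
    fields: Mapping[str, Any],
    field_weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Sum w_f * fields[f] over numeric fields (hypothetical helper)."""
    weights = field_weights or {}
    total = 0.0
    for name, value in fields.items():
        # Non-numeric values (strings, None, bools) contribute 0.
        if isinstance(value, bool) or not isinstance(value, Real):
            continue
        # Fields absent from field_weights default to weight 1.0.
        total += weights.get(name, 1.0) * float(value)
    # Fields listed in field_weights but missing from `fields` add nothing.
    return total
```

For example, field_weighted_score({"title": 0.8, "body": 0.5, "tags": "x"}, {"title": 3.0, "body": 1.0, "tags": 0.5}) gives roughly 2.9: the string-valued tags field is skipped.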
- __init__(topn=10, rerank_field=None, weights=None, source_weights=None, field_weights=None, normalize=True, metrics=<object object>, schema=None)
Initialize MultiFieldWeightedReranker.
- Parameters:
topn (int) – Number of top documents to return.
rerank_field (Optional[str]) – Ignored.
source_weights (Optional[dict[str, float]]) – Weight per source. Defaults to equal weights.
field_weights (Optional[dict[str, float]]) – Weight per document field.
normalize (Union[bool, str, dict[str, Any], None]) – Normalization configuration. Can be:
  - True (default): smart default; COSINE → no-op, others → "bayes"
  - str: method name ("bayes", "minmax", "percentile", "cosine")
  - dict: per-source config, e.g., {"sparse": "bayes", "dense": "cosine"}
  - None or False: no normalization (raw scores after conversion)
  Note: "cosine" is a no-op (identity), since COSINE scores are already in [0, 1] after conversion.
metrics (Optional[Union[MetricType, dict[str, Union[str, MetricType, None]]]]) – Metric type(s) for converting distances to similarities. Can be a single MetricType for all sources, or a dict for per-source metrics. If None and schema is provided, metrics are inferred from the schema. Required if schema is not provided.
schema (Optional[CollectionSchema]) – Collection schema to automatically extract metrics from. If provided and metrics is None, metrics are inferred from the schema.
- Raises:
ValueError – If neither metrics nor schema is provided.
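The accepted forms of normalize amount to a small dispatch rule. The sketch below resolves a method name for one source and applies it. The names resolve_method and apply_normalization, the lowercase metric strings, and the min-max formula are illustrative assumptions; "bayes" and "percentile" are deliberately left unimplemented, and sources missing from a per-source dict are assumed to fall back to the smart default.

```python
from typing import Any, Dict, List, Optional, Union

NormalizeCfg = Union[bool, str, Dict[str, Any], None]


def resolve_method(normalize: NormalizeCfg, source: str, metric: str) -> Optional[str]:
    """Return the method name for one source (None means no normalization)."""
    if normalize is None or normalize is False:
        return None  # raw scores after metric conversion
    if normalize is True:
        # Smart default: COSINE scores pass through, everything else uses "bayes".
        return "cosine" if metric == "cosine" else "bayes"
    if isinstance(normalize, str):
        return normalize  # same method for every source
    if isinstance(normalize, dict):
        # Assumption: unlisted sources fall back to the smart default.
        value = normalize.get(source, True)
        if isinstance(value, dict):
            raise TypeError("nested per-source config dicts are not sketched here")
        return resolve_method(value, source, metric)
    raise TypeError(f"unsupported normalize config: {normalize!r}")


def apply_normalization(scores: List[float], method: Optional[str]) -> List[float]:
    """Apply a resolved method; only identity and min-max are implemented here."""
    if method is None or method == "cosine":
        return list(scores)  # identity ("cosine" is documented as a no-op)
    if method == "minmax":
        lo, hi = min(scores), max(scores)
        if hi == lo:
            return [1.0] * len(scores)  # arbitrary choice for constant inputs
        return [(s - lo) / (hi - lo) for s in scores]
    raise NotImplementedError(f"{method!r} is not implemented in this sketch")
```

Keeping resolution separate from application makes the per-source dict form cheap: each source is resolved once, then its whole score list is normalized in one pass.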
Example
>>> # Automatic metric detection from collection schema
>>> import zvec
>>> collection = zvec.open("./my_collection")
>>> reranker = MultiFieldWeightedReranker(
...     schema=collection.schema,
...     source_weights={"bm25": 0.6, "dense": 0.4},
...     field_weights={"title": 3.0, "content": 1.0},
...     normalize=True  # Default: COSINE → no-op, others → bayes
... )
- rerank(query_results, query=None)
Normalize scores per-source and compute weighted fusion with field weighting.
This method performs the following steps:
1. Iterates through each source in query_results.
2. For each document, computes a field-weighted score.
3. Applies normalization per source (smart default: COSINE → /2, others → bayes).
4. Filters out documents with a normalized score of 0.0.
5. Delegates to WeightedReranker for source-weighted fusion.
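The five steps can be traced end to end on plain Python data. This is a toy sketch, not the library's implementation: documents are hypothetical dicts with "id" and "fields" keys, min-max scaling stands in for the library's normalizers, and step 5 is assumed to be a source-weighted sum in which a document missing from a source contributes nothing for that source.

```python
from typing import Dict, List, Mapping, Optional, Tuple


def toy_rerank(
    query_results: Mapping[str, List[dict]],
    source_weights: Optional[Mapping[str, float]] = None,
    field_weights: Optional[Mapping[str, float]] = None,
    topn: int = 10,
) -> List[Tuple[str, float]]:
    sw = source_weights or {}
    fw = field_weights or {}
    fused: Dict[str, float] = {}
    for source, docs in query_results.items():  # step 1: each source
        # Step 2: field-weighted raw score; non-numeric fields are skipped.
        raw = {
            d["id"]: sum(
                fw.get(k, 1.0) * v
                for k, v in d["fields"].items()
                if isinstance(v, (int, float)) and not isinstance(v, bool)
            )
            for d in docs
        }
        if not raw:
            continue
        # Step 3: per-source min-max (stand-in for the library's normalizers).
        lo, hi = min(raw.values()), max(raw.values())
        norm = {i: (s - lo) / (hi - lo) if hi > lo else 1.0 for i, s in raw.items()}
        # Step 4: drop documents whose normalized score is 0.0.
        norm = {i: s for i, s in norm.items() if s != 0.0}
        # Step 5 (assumed): weighted sum across sources.
        w = sw.get(source, 1.0)
        for i, s in norm.items():
            fused[i] = fused.get(i, 0.0) + w * s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:topn]
```

With min-max scaling, the lowest-scoring document of each source normalizes to 0.0 and is therefore dropped in step 4; the library's own normalizers may behave differently.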
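- Parameters:
query_results (dict[str, list[Doc]]) – Results to fuse, keyed by source name (e.g. "sparse_bm25", "dense_cosine").
query (optional) – Defaults to None.
- Returns:
Reranked documents with weighted normalized scores in the score field, sorted by descending score.
- Return type:
list[Doc]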
Example
>>> query_results = {
...     "sparse_bm25": bm25_docs,
...     "dense_cosine": dense_docs
... }
>>> reranked = reranker.rerank(query_results)