zvec_db.rerankers.fusion.multi_field

Classes

MultiFieldWeightedReranker([topn, ...])

Reranker that combines scores from multiple sources and document fields.

class zvec_db.rerankers.fusion.multi_field.MultiFieldWeightedReranker(topn=10, rerank_field=None, weights=None, source_weights=None, field_weights=None, normalize=True, metrics=<object object>, schema=None)[source]

Reranker that combines scores from multiple sources and document fields.

This reranker extends the standard weighted fusion approach by supporting field-level weighting within documents. This is useful when documents have structured fields (e.g., title, content, tags) and you want to weight their contributions differently.

The score fusion is computed as:

\[\text{score}(d) = \sum_{s \in S} w_s \times \sum_{f \in F} w_f \times \text{norm}(\text{score}_{s,f}(d))\]

where:

\(w_s\) is the weight for source \(s\)
\(w_f\) is the weight for field \(f\)
\(\text{norm}\) is the normalization function (Standard or Bayesian)
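
As a rough illustration of the formula above, the following minimal sketch computes the fused score for a single document from scores that have already been normalized. The helper name fused_score and the nested-dict input layout are assumptions made for this example and are not part of the zvec_db API; unlisted sources and fields fall back to a weight of 1.0, as the parameter descriptions state.

```python
def fused_score(norm_scores, source_weights=None, field_weights=None):
    """Hypothetical helper: weighted sum over sources and fields.

    norm_scores maps source -> {field: already-normalized score}.
    """
    source_weights = source_weights or {}
    field_weights = field_weights or {}
    total = 0.0
    for source, per_field in norm_scores.items():
        w_s = source_weights.get(source, 1.0)  # unlisted sources weigh 1.0
        inner = sum(
            field_weights.get(name, 1.0) * value  # unlisted fields weigh 1.0
            for name, value in per_field.items()
        )
        total += w_s * inner
    return total


# Illustrative numbers only.
norm_scores = {
    "bm25": {"title": 0.5, "body": 0.25},
    "dense": {"title": 1.0, "body": 0.5},
}
score = fused_score(
    norm_scores,
    source_weights={"bm25": 0.5, "dense": 2.0},
    field_weights={"title": 2.0, "body": 1.0},
)
# bm25: 0.5 * (2.0*0.5 + 1.0*0.25) = 0.625; dense: 2.0 * (2.0*1.0 + 1.0*0.5) = 5.0
print(score)  # 5.625
```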

This is preferred over NormalizedWeightedReranker when:

Documents have structured fields with different importance (title > content).
You need fine-grained control over score contributions.
Different fields use different scoring scales.

Parameters:

topn (int, optional) – Number of top documents to return. Defaults to 10.
rerank_field (Optional[str], optional) – Ignored. Defaults to None.
metric (Optional[MetricType], optional) – Metric for RAW scores. Defaults to "cosine", the main use case with zvec/Qdrant. Supported values:
    - MetricType.COSINE: cosine distances in [0, 2]
    - MetricType.L2: L2 distances
    - MetricType.IP: similarities (inner product, including BM25 scores)
source_weights (Optional[dict[str, float]], optional) – Weight per source key. Sources not listed use weight 1.0. Defaults to None (equal weights).
field_weights (Optional[dict[str, float]], optional) – Weight per document field. Fields not listed use weight 1.0. The field is retrieved from doc.fields dictionary. Defaults to None (equal weights for all fields).
normalizer_configs (Optional[dict[str, Any]], optional) – A mapping of source keys to their specific normalization configurations.
default_norm_config (Union[bool, str, dict[str, Any]], optional) – The normalization method to use for keys not found in normalizer_configs. Defaults to True (standard normalization).
weights (Optional[dict[str, float]])
normalize (Union[bool, str, dict[str, Any], None])
metrics (Optional[Union[MetricType, dict[str, Union[str, MetricType, None]]]])
schema (Optional[CollectionSchema])

Note

Field scores are expected to be stored in doc.fields[field_name] as numeric values. If a field is missing or has a non-numeric value, it contributes 0 to the score.
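
The rule in this note can be sketched as follows. This is only an illustration under stated assumptions: field_weighted_score is a hypothetical helper, it iterates over the fields named in field_weights, and it treats bool values as non-numeric; the library may decide these details differently.

```python
def field_weighted_score(fields, field_weights):
    """Hypothetical helper: sum weight * value over the weighted fields.

    Missing or non-numeric field values contribute 0, as the note describes.
    """
    total = 0.0
    for name, weight in field_weights.items():
        value = fields.get(name)
        # Assumption: bool is treated as non-numeric even though it subclasses int.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        total += weight * float(value)
    return total


doc_fields = {"title": 0.5, "body": "n/a"}  # "body" is non-numeric, "tags" is missing
field_score = field_weighted_score(
    doc_fields, {"title": 3.0, "body": 1.0, "tags": 0.5}
)
print(field_score)  # 1.5 (only "title" contributes)
```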

Example

>>> reranker = MultiFieldWeightedReranker(
...     topn=20,
...     source_weights={"bm25": 0.7, "dense": 0.3},
...     field_weights={"title": 3.0, "body": 1.0, "tags": 0.5}
... )
>>> results = reranker.rerank({
...     "bm25": bm25_docs,
...     "dense": dense_docs
... })

__init__(topn=10, rerank_field=None, weights=None, source_weights=None, field_weights=None, normalize=True, metrics=<object object>, schema=None)[source]

Initialize MultiFieldWeightedReranker.

Parameters:

topn (int) – Number of top documents to return.
rerank_field (Optional[str]) – Ignored.
source_weights (Optional[dict[str, float]]) – Weight per source. Defaults to equal weights.
field_weights (Optional[dict[str, float]]) – Weight per document field.
normalize (Union[bool, str, dict[str, Any], None]) – Normalization configuration. Can be:
    - True (default): smart default; COSINE → no-op, others → "bayes"
    - str: method name ("bayes", "minmax", "percentile", "cosine")
    - dict: per-source config, e.g., {"sparse": "bayes", "dense": "cosine"}
    - None or False: no normalization (raw scores after conversion)
    Note: "cosine" is a no-op (identity) since COSINE scores are already in [0, 1] after conversion.
metrics (Optional[Union[MetricType, dict[str, Union[str, MetricType, None]]]]) – Metric type(s) for converting distances to similarities. Can be a single MetricType for all sources, or a dict for per-source metrics. If None and schema is provided, metrics are inferred from the schema. Required if schema is not provided.
schema (Optional[CollectionSchema]) – Collection schema to automatically extract metrics from. If provided and metrics is None, metrics are inferred from the schema.
weights (Optional[dict[str, float]])

Raises:

ValueError – If neither metrics nor schema is provided.

Example

>>> # Automatic metric detection from collection schema
>>> import zvec
>>> collection = zvec.open("./my_collection")
>>> reranker = MultiFieldWeightedReranker(
...     schema=collection.schema,
...     source_weights={"bm25": 0.6, "dense": 0.4},
...     field_weights={"title": 3.0, "content": 1.0},
...     normalize=True  # Default: smart per-metric normalization
... )
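
To make the accepted forms of normalize easier to follow, here is a minimal sketch of how such a value could be resolved to one method name per source. The function resolve_normalizer, the use of plain strings for metrics, and the fallback to the smart default for sources missing from a dict are all assumptions made for this example; the library's internal logic may differ.

```python
def resolve_normalizer(normalize, source, metric):
    """Hypothetical helper: return a method name for one source, or None."""

    def smart_default():
        # Documented smart default: COSINE -> no-op ("cosine"), others -> "bayes".
        return "cosine" if metric == "cosine" else "bayes"

    if normalize is None or normalize is False:
        return None  # no normalization, raw scores after conversion
    if normalize is True:
        return smart_default()
    if isinstance(normalize, str):
        return normalize  # one method for every source
    if isinstance(normalize, dict):
        # Assumption: sources not listed in the dict use the smart default.
        return normalize.get(source, smart_default())
    raise TypeError(f"unsupported normalize value: {normalize!r}")


per_source = {"sparse": "bayes", "dense": "cosine"}
resolved = [
    resolve_normalizer(True, "dense", "cosine"),
    resolve_normalizer(True, "sparse", "ip"),
    resolve_normalizer("minmax", "dense", "cosine"),
    resolve_normalizer(per_source, "sparse", "ip"),
    resolve_normalizer(False, "dense", "cosine"),
]
print(resolved)  # ['cosine', 'bayes', 'minmax', 'bayes', None]
```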

rerank(query_results, query=None)[source]

Normalize scores per-source and compute weighted fusion with field weighting.

This method performs the following steps:

Iterates through each source in query_results.
For each document, computes a field-weighted score.
Applies normalization per source (smart default: COSINE → no-op after distance conversion, others → bayes).
Filters out documents with a normalized score of 0.0.
Delegates to WeightedReranker for source-weighted fusion.
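
The steps above can be sketched end to end as shown below. This is a simplified illustration, not the library's implementation: Doc is a hypothetical stand-in dataclass, min-max scaling stands in for the "bayes" method (whose details are not given here), field values are assumed to be numeric, and the final weighted sum is written inline rather than delegated to WeightedReranker.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:  # hypothetical stand-in for the library's Doc type
    id: str
    score: float = 0.0
    fields: dict = field(default_factory=dict)


def minmax(values):
    """Min-max scaling, used here as a stand-in normalization method."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rerank_sketch(query_results, source_weights, field_weights, topn=10):
    fused = {}
    for source, docs in query_results.items():  # step 1: each source
        if not docs:
            continue
        raw = [  # step 2: field-weighted score per document
            sum(w * d.fields.get(name, 0.0) for name, w in field_weights.items())
            for d in docs
        ]
        normed = minmax(raw)  # step 3: per-source normalization
        w_s = source_weights.get(source, 1.0)
        for d, s in zip(docs, normed):
            if s == 0.0:  # step 4: drop zero-scored documents
                continue
            fused[d.id] = fused.get(d.id, 0.0) + w_s * s  # step 5: weighted fusion
    ranked = sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
    return [Doc(id=doc_id, score=s) for doc_id, s in ranked[:topn]]


results = rerank_sketch(
    {
        "bm25": [Doc("a", fields={"title": 4.0}), Doc("b", fields={"title": 2.0}),
                 Doc("c", fields={"title": 0.0})],
        "dense": [Doc("b", fields={"title": 1.0}), Doc("a", fields={"title": 0.0})],
    },
    source_weights={"bm25": 0.5, "dense": 1.0},
    field_weights={"title": 1.0},
)
print([(d.id, d.score) for d in results])  # [('b', 1.25), ('a', 0.5)]
```

In this toy input, document "c" never appears in the output because its normalized score is 0.0 in the only source that returned it.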

Parameters:

query_results (dict[str, list[Doc]]) – Dictionary mapping source names to lists of documents. Each document should have id, score, and fields with numeric values for field scoring.
query (str | None)

Returns:

Reranked documents with weighted normalized scores in the score field, sorted by descending score.

Return type:

list[Doc]

Example

>>> query_results = {
...     "sparse_bm25": bm25_docs,
...     "dense_cosine": dense_docs
... }
>>> reranked = reranker.rerank(query_results)