zvec_db.rerankers.utils

Utility classes for reranking operations.

class zvec_db.rerankers.utils.Normalize(config=None)

Callable normaliser for lists of (uid, score) pairs.

Instances behave like functions: call one with a score list and an optional avgscore, and it returns a new list with every score mapped into the closed unit interval [0, 1]. The transformation applied is determined by the configuration supplied at construction time.

Parameters:

config (Union[bool, str, Dict[str, Any], None])

method

Lowercase string naming the chosen normalisation algorithm.

Type:

str

alpha

Scale parameter used in Bayesian modes.

Type:

float

beta

Centre parameter used in Bayesian modes; None triggers median-based automatic selection.

Type:

Optional[float]
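
The roles of alpha and beta can be illustrated with a logistic sigmoid. The following is a minimal sketch, assuming the calibration has the form 1 / (1 + exp(-alpha * (s - beta))) and that beta=None falls back to the median of the positive scores. The name bayesian_sigmoid and the exact formula are assumptions made for illustration; the library's actual calibration may differ.

```python
import math
from statistics import median
from typing import List, Optional, Tuple

ScoreList = List[Tuple[str, float]]


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bayesian_sigmoid(scores: ScoreList, alpha: float = 1.0,
                     beta: Optional[float] = None) -> ScoreList:
    """Hypothetical sketch: logistic squashing centred on beta."""
    positives = [s for _, s in scores if s > 0.0]
    if not positives:
        # Nothing to calibrate on, so every score maps to 0.0.
        return [(uid, 0.0) for uid, _ in scores]
    # Assumption: beta=None means "use the median of the positive scores".
    centre = median(positives) if beta is None else beta
    out: ScoreList = []
    for uid, s in scores:
        if s <= 0.0:
            out.append((uid, 0.0))  # non-positive scores map to 0.0
        else:
            out.append((uid, _sigmoid(alpha * (s - centre))))
    return out


example = bayesian_sigmoid([("a", 4.0), ("b", 2.0), ("c", 0.0)])
```

In this sketch a larger alpha makes the curve steeper around the centre, while beta shifts the score that maps to 0.5.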

__init__(config=None)

Initialise a Normalize instance.

Parameters:

config (bool, str, dict or None, optional) –

Configuration object that selects the normalisation strategy. The following forms are interpreted:

  • None or False : equivalent to "default" - standard index-aware scaling.

  • any other truthy value that is neither a str nor a dict (for example True) : also selects the default behaviour.

  • str : the string value is converted to lower case and used as the method name. Supported methods:

      - "bayes", "bayesian", "bb25" : Bayesian sigmoid calibration
      - "minmax" : (x - min) / (max - min)
      - "percentile" (alias: "rank") : rank-based normalization
      - "default" : standard index-aware scaling

  • dict : a copy of the dictionary is stored, and may contain the keys method (string), alpha (float) and beta (float or None). Any missing keys will be filled with defaults (alpha defaults to 1.0; beta to None).

Notes

The configuration is shallow-copied to prevent external modification from affecting the normaliser’s internal state.
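
The accepted configuration forms can be summarised as a small resolution function. This is a sketch only: resolve_config is a hypothetical helper, not part of zvec_db, and the fallback to "default" when a dict has no method key is an assumption, since only the alpha and beta defaults are stated above.

```python
from typing import Any, Dict, Optional, Tuple, Union

Config = Union[bool, str, Dict[str, Any], None]


def resolve_config(config: Config = None) -> Tuple[str, float, Optional[float]]:
    """Hypothetical helper: map a config value to (method, alpha, beta)."""
    if isinstance(config, dict):
        cfg = dict(config)  # shallow copy, so later caller edits have no effect
        # Assumption: a missing "method" key falls back to "default".
        method = str(cfg.get("method", "default")).lower()
        return method, float(cfg.get("alpha", 1.0)), cfg.get("beta")
    if isinstance(config, str):
        return config.lower(), 1.0, None
    # None, False, True and any other non-dict, non-str value.
    return "default", 1.0, None
```

Under these assumptions, resolve_config("MinMax") yields ("minmax", 1.0, None) and resolve_config({"method": "Bayes", "alpha": 2}) yields ("bayes", 2.0, None).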

__call__(scores, avgscore=0.0)

Normalise a list of document scores.

Parameters:
  • scores (ScoreList) – Sequence of (uid, score) pairs, typically produced by a retrieval algorithm. The list is assumed to be sorted in descending order of score; default scaling reads the maximum from the first entry.

  • avgscore (float, optional) – Average score computed over the entire corpus. Used only by the default normalisation strategy; the Bayesian modes ignore it.

Returns:

New list where each score has been replaced with a value in [0.0, 1.0] according to the chosen transformation.

Return type:

ScoreList

Notes

Multiple normalisation methods are supported:

  • default – scales scores relative to an estimated maximum and clips values. This keeps the relative ordering intact but bounds the range.

  • bayesian – applies a sigmoid function calibrated using the positive scores only. Negative or zero input scores are mapped to 0.0 unconditionally. Robust to outliers.

  • minmax – (x - min) / (max - min). Preserves relative distances.

  • percentile – rank-based normalization. Very robust to outliers.

  • cosine – no-op (identity). The COSINE conversion (2 - score) / 2 already produces scores in [0, 1], so no additional normalization is needed.

  • atan – arctan-based normalization: 1 - 2*atan(s)/pi for L2, 0.5 + atan(s)/pi for IP. Maps unbounded scores to [0, 1].
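
Three of the methods listed above have closed forms that are easy to sketch. The functions below are illustrative stand-ins, not the library's code: the names, the metric parameter, the handling of constant or single-element lists, and the tie rule in the rank-based variant are all assumptions.

```python
import math
from typing import List, Tuple

ScoreList = List[Tuple[str, float]]


def minmax(scores: ScoreList) -> ScoreList:
    """(x - min) / (max - min); a constant list maps to 1.0 (assumption)."""
    if not scores:
        return []
    values = [s for _, s in scores]
    lo, hi = min(values), max(values)
    if hi == lo:
        return [(uid, 1.0) for uid, _ in scores]
    return [(uid, (s - lo) / (hi - lo)) for uid, s in scores]


def percentile(scores: ScoreList) -> ScoreList:
    """Rank-based: fraction of the other scores that are strictly lower."""
    n = len(scores)
    if n <= 1:
        return [(uid, 1.0) for uid, _ in scores]
    values = [s for _, s in scores]
    return [(uid, sum(v < s for v in values) / (n - 1)) for uid, s in scores]


def atan_norm(scores: ScoreList, metric: str = "ip") -> ScoreList:
    """1 - 2*atan(s)/pi for L2 distances, 0.5 + atan(s)/pi for inner product."""
    if metric == "l2":
        return [(uid, 1.0 - 2.0 * math.atan(s) / math.pi) for uid, s in scores]
    return [(uid, 0.5 + math.atan(s) / math.pi) for uid, s in scores]


sample = [("a", 10.0), ("b", 6.0), ("c", 2.0)]
minmax_scores = minmax(sample)
rank_scores = percentile(sample)
```

With an outlier such as [("a", 100.0), ("b", 6.0), ("c", 2.0)], the min-max sketch squeezes "b" to 4/98 (about 0.04), while the rank-based sketch still assigns it 0.5, which is the sense in which rank-based normalization is robust to outliers.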

class zvec_db.rerankers.utils.PipelineReranker(rerankers, topn=10, rerank_field=None)

Chain multiple rerankers sequentially.

This reranker applies a list of rerankers in sequence, passing the output of one as the input to the next. This is useful for combining different reranking strategies (e.g., RRF followed by cross-encoder).

Parameters:
  • rerankers (list) – List of rerankers to apply in order.

  • topn (int, optional) – Number of final documents to return. Defaults to 10.

  • rerank_field (Optional[str], optional) – Ignored. Defaults to None.

Example

>>> pipeline = PipelineReranker([
...     RrfReranker(topn=50, rank_constant=60),
...     SentenceTransformerReranker(model_name="ms-marco-MiniLM-L-6-v2", topn=10)
... ])
>>> results = collection.query(..., reranker=pipeline)

__init__(rerankers, topn=10, rerank_field=None)

Initialize PipelineReranker with a list of rerankers.

Parameters:
  • rerankers (list) – List of reranker instances to apply in order. Each reranker must implement the rerank() method.

  • topn (int, optional) – Number of final documents to return. Defaults to 10.

  • rerank_field (Optional[str], optional) – Ignored. Defaults to None.

rerank(query_results, query=None)

Apply rerankers sequentially.

Parameters:
  • query_results (dict[str, list[Doc]]) – Results from vector queries.

  • query (Optional[str], optional) – The search query. Passed to underlying rerankers. Defaults to None.

Returns:

Final re-ranked documents after all rerankers applied.

Return type:

list[Doc]
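
The sequential behaviour can be pictured as a simple loop. The sketch below is not the library's implementation: FakeDoc, KeepTop and run_pipeline are hypothetical stand-ins, and re-wrapping each intermediate list in a one-entry dict (under the made-up key "pipeline") is an assumption about how a list[Doc] output could be fed to the next rerank() call, which expects a dict.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FakeDoc:
    """Stand-in for the library's Doc; only id and score are modelled."""
    id: str
    score: float


class KeepTop:
    """Toy reranker: merge all result lists and keep the topn best scores."""

    def __init__(self, topn: int) -> None:
        self.topn = topn

    def rerank(self, query_results: Dict[str, List[FakeDoc]],
               query: Optional[str] = None) -> List[FakeDoc]:
        merged = [d for docs in query_results.values() for d in docs]
        return sorted(merged, key=lambda d: d.score, reverse=True)[: self.topn]


def run_pipeline(rerankers, query_results, query=None, topn=10):
    """Hypothetical chaining loop: each stage consumes the previous output."""
    current = query_results
    docs: List[FakeDoc] = []
    for reranker in rerankers:
        docs = reranker.rerank(current, query=query)
        # Assumption: the list is re-wrapped as a one-entry dict for the next stage.
        current = {"pipeline": docs}
    return docs[:topn]


results = {
    "dense": [FakeDoc("a", 0.9), FakeDoc("b", 0.4)],
    "sparse": [FakeDoc("c", 0.7)],
}
final = run_pipeline([KeepTop(3), KeepTop(2)], results, query="q")
```

Here the first stage keeps all three documents and the second narrows them to "a" and "c", mirroring the wide-then-narrow pattern of the RRF plus cross-encoder example.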

zvec_db.rerankers.utils.extract_score(doc)

Extract score from a document, handling various numeric types.

Parameters:

doc (Doc) – Document with a score attribute.

Returns:

Score as a float, or 0.0 if score is None or invalid.

Return type:

float
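
One plausible way to handle "various numeric types" is to delegate to float() and treat any failure as invalid. The sketch below is a hypothetical re-implementation for illustration; in particular, treating NaN and infinities as invalid is an assumption, and SimpleNamespace stands in for Doc.

```python
import math
from types import SimpleNamespace


def extract_score_sketch(doc) -> float:
    """Hypothetical sketch: coerce doc.score to float, else return 0.0."""
    raw = getattr(doc, "score", None)
    if raw is None:
        return 0.0
    try:
        # float() accepts int, float, Decimal, NumPy scalars and numeric strings.
        value = float(raw)
    except (TypeError, ValueError, OverflowError):
        return 0.0
    # Assumption: NaN and infinities count as "invalid".
    return value if math.isfinite(value) else 0.0


int_score = extract_score_sketch(SimpleNamespace(score=3))
bad_score = extract_score_sketch(SimpleNamespace(score="n/a"))
```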

Example

>>> doc = Doc(id="1", score=0.8)
>>> extract_score(doc)
0.8
>>> doc_no_score = Doc(id="2", score=None)
>>> extract_score(doc_no_score)
0.0

zvec_db.rerankers.utils.extract_field_score(doc, field_name)

Extract score from a specific document field.

Parameters:
  • doc (Doc) – Document with fields attribute.

  • field_name (str) – Name of the field to extract score from.

Returns:

Field score as a float, or 0.0 if field is missing or non-numeric.

Return type:

float
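
The missing-or-non-numeric rule can be sketched with a type check on the looked-up value. This is an illustrative stand-in, not the library's code: extract_field_score_sketch is a hypothetical name, SimpleNamespace stands in for Doc, and rejecting bool values is an assumption.

```python
from numbers import Real
from types import SimpleNamespace


def extract_field_score_sketch(doc, field_name: str) -> float:
    """Hypothetical sketch: numeric field value as float, else 0.0."""
    fields = getattr(doc, "fields", None) or {}
    value = fields.get(field_name)
    # Assumption: bool is an int subclass in Python, so it is excluded explicitly.
    if isinstance(value, Real) and not isinstance(value, bool):
        return float(value)
    return 0.0


sample_doc = SimpleNamespace(id="1", fields={"title_score": 0.9, "tag": "news"})
title_score = extract_field_score_sketch(sample_doc, "title_score")
```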

Example

>>> doc = Doc(id="1", fields={"title_score": 0.9, "content_score": 0.7})
>>> extract_field_score(doc, "title_score")
0.9
>>> extract_field_score(doc, "missing_field")
0.0

zvec_db.rerankers.utils.get_document_text(doc, rerank_field=None)

Extract document text for scoring or embedding.

This function attempts to extract text content from a document using the following strategy:

  1. If rerank_field is specified and the document has that field, use it.

  2. Otherwise, try common field names: "content", "text", "body", "passage".

  3. If no field matches, concatenate all fields.

  4. As a last resort, return the document ID as a string.
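
The four steps above can be sketched as a chain of early returns. This is a minimal illustration rather than the library's code: get_document_text_sketch is a hypothetical name, SimpleNamespace stands in for Doc, and joining field values with a single space in insertion order is an assumption about step 3.

```python
from types import SimpleNamespace
from typing import Optional

COMMON_FIELDS = ("content", "text", "body", "passage")


def get_document_text_sketch(doc, rerank_field: Optional[str] = None) -> str:
    """Hypothetical sketch of the four-step fallback strategy."""
    fields = getattr(doc, "fields", None) or {}
    # 1. Explicit field, if the document has it.
    if rerank_field is not None and rerank_field in fields:
        return str(fields[rerank_field])
    # 2. Common field names, tried in order.
    for name in COMMON_FIELDS:
        if name in fields:
            return str(fields[name])
    # 3. Concatenate all fields (assumption: space-joined, insertion order).
    if fields:
        return " ".join(str(v) for v in fields.values())
    # 4. Last resort: the document ID as a string.
    return str(doc.id)


no_common = SimpleNamespace(id="7", fields={"title": "T", "author": "A"})
joined = get_document_text_sketch(no_common)
```

In this sketch a rerank_field that is absent from the document silently falls through to step 2 rather than raising, which matches the wording of step 1.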

Parameters:
  • doc (Doc) – Document to extract text from.

  • rerank_field (Optional[str]) – Specific field name to use. If None, uses the fallback strategy. Defaults to None.

Returns:

Extracted document text.

Return type:

str

Example

>>> doc = Doc(id="1", fields={"content": "Hello world", "title": "Test"})
>>> get_document_text(doc)
'Hello world'
>>> get_document_text(doc, rerank_field="title")
'Test'

Modules

base_utils

Common utilities for rerankers in zvec-db.

normalize

Normalization utilities for post-processing raw document relevance scores.

pipeline

PipelineReranker for zvec-db.