zvec_db.rerankers.utils
Utility classes for reranking operations.
- class zvec_db.rerankers.utils.Normalize(config=None)[source]
Callable normaliser for lists of (uid, score) pairs.
Instances behave like functions: call them with a score list and an optional avgscore, and the result will be a new list with all scores mapped into the closed unit interval. The precise transformation is determined by the configuration supplied at construction time.
- beta
Centre parameter used in Bayesian modes; None triggers median-based automatic selection.
- Type:
Optional[float]
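The role of beta as a centre parameter can be illustrated with a minimal sketch. It assumes a logistic sigmoid of the form 1 / (1 + exp(-alpha * (s - beta))), which this reference does not spell out; the function name bayes_normalise is hypothetical and is not part of zvec_db.

```python
import math
from statistics import median
from typing import List, Optional, Tuple

ScoreList = List[Tuple[str, float]]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bayes_normalise(scores: ScoreList, alpha: float = 1.0,
                    beta: Optional[float] = None) -> ScoreList:
    """Hypothetical sketch: sigmoid calibration centred on beta."""
    positives = [s for _, s in scores if s > 0]
    if not positives:
        return [(uid, 0.0) for uid, _ in scores]
    # Assumption: beta=None means "use the median of the positive scores".
    centre = median(positives) if beta is None else beta
    return [
        (uid, _sigmoid(alpha * (s - centre)) if s > 0 else 0.0)
        for uid, s in scores
    ]


result = bayes_normalise([("a", 3.0), ("b", 2.0), ("c", 1.0), ("d", -1.0)])
```

With the median as centre, the middle positive score lands at 0.5, and any non-positive score is mapped to 0.0 regardless of alpha.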
- __init__(config=None)[source]
Initialise a Normalize instance.
- Parameters:
config (bool, str, dict or None, optional) – Configuration object that selects the normalisation strategy. The following forms are interpreted:
- None or False: equivalent to "default" (standard index-aware scaling).
- truthy value that is neither a str nor a dict (for example True): also selects the default behaviour.
- str: the string value is converted to lower case and used as the method name. Supported methods:
  - "bayes", "bayesian", "bb25": Bayesian sigmoid calibration
  - "minmax": (x - min) / (max - min)
  - "percentile" (alias: "rank"): rank-based normalization
  - "default": standard index-aware scaling
- dict: a copy of the dictionary is stored, and may contain the keys method (string), alpha (float) and beta (float or None). Any missing keys will be filled with defaults (alpha defaults to 1.0; beta to None).
Notes
The configuration is shallow-copied to prevent external modification from affecting the normaliser’s internal state.
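The accepted configuration forms can be sketched as a small resolver. This is an illustration only: resolve_config and DEFAULTS are hypothetical names, and the assumption that a dict without a method key falls back to "default" is not stated in this reference.

```python
from typing import Any, Dict

# Assumed defaults; only alpha=1.0 and beta=None are documented.
DEFAULTS: Dict[str, Any] = {"method": "default", "alpha": 1.0, "beta": None}


def resolve_config(config: Any = None) -> Dict[str, Any]:
    """Hypothetical helper mirroring the documented config forms."""
    if isinstance(config, dict):
        resolved = dict(config)  # shallow copy, so the caller's dict is untouched
        for key, value in DEFAULTS.items():
            resolved.setdefault(key, value)
        return resolved
    if isinstance(config, str):
        # Strings are lower-cased and used as the method name.
        return {**DEFAULTS, "method": config.lower()}
    # None, False, and any other non-dict, non-str value select the default.
    return dict(DEFAULTS)


user_cfg = {"method": "bayes"}
resolved = resolve_config(user_cfg)
```

Because the dictionary is copied before defaults are filled in, later changes to user_cfg do not leak into resolved, which matches the note above.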
- __call__(scores, avgscore=0.0)[source]
Normalise a list of document scores.
- Parameters:
scores (ScoreList) – Sequence of (uid, score) pairs, typically produced by a retrieval algorithm. It is assumed that the list is sorted in descending order of score; the method will use the first entry to compute the maximum when performing default scaling.
avgscore (float, optional) – Average score computed over the entire corpus. This is only used by the default normalisation strategy. In Bayesian modes the value is ignored entirely.
- Returns:
New list where each score has been replaced with a value in [0.0, 1.0] according to the chosen transformation.
- Return type:
ScoreList
Notes
Multiple normalisation methods are supported:
- default – scales scores relative to an estimated maximum and clips values. This keeps the relative ordering intact but bounds the range.
- bayesian – applies a sigmoid function calibrated using the positive scores only. Negative or zero input scores are mapped to 0.0 unconditionally. Robust to outliers.
- minmax – (x - min) / (max - min). Preserves relative distances.
- percentile – rank-based normalization. Very robust to outliers.
- cosine – no-op (identity). The COSINE conversion (2 - score) / 2 already produces scores in [0, 1], so no additional normalization is needed.
- atan – arctan-based normalization: 1 - 2*atan(s)/pi for L2, 0.5 + atan(s)/pi for IP. Maps unbounded scores to [0, 1].
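The formulas quoted for minmax, percentile and atan can be written out as a short sketch. The function names are hypothetical, and the handling of edge cases (empty or constant lists, ties in the percentile ranking) is an assumption rather than documented behaviour.

```python
import math
from typing import List, Tuple

ScoreList = List[Tuple[str, float]]


def minmax(scores: ScoreList) -> ScoreList:
    """(x - min) / (max - min) over the scores in the list."""
    if not scores:
        return []
    values = [s for _, s in scores]
    lo, hi = min(values), max(values)
    span = hi - lo
    # Assumption: a constant list maps to 1.0 to avoid dividing by zero.
    return [(uid, (s - lo) / span if span else 1.0) for uid, s in scores]


def percentile(scores: ScoreList) -> ScoreList:
    """Rank-based: lowest score maps to 0.0, highest to 1.0."""
    n = len(scores)
    if n == 0:
        return []
    if n == 1:
        return [(scores[0][0], 1.0)]
    order = sorted(range(n), key=lambda i: scores[i][1])
    rank_of = [0] * n
    for rank, i in enumerate(order):
        rank_of[i] = rank  # assumption: ties are broken by input order
    return [(uid, rank_of[i] / (n - 1)) for i, (uid, _) in enumerate(scores)]


def atan_l2(distance: float) -> float:
    """1 - 2*atan(s)/pi: a distance of 0 maps to 1.0."""
    return 1.0 - 2.0 * math.atan(distance) / math.pi


def atan_ip(score: float) -> float:
    """0.5 + atan(s)/pi: an inner product of 0 maps to 0.5."""
    return 0.5 + math.atan(score) / math.pi


data = [("a", 10.0), ("b", 5.0), ("c", 0.0)]
mm = minmax(data)
pc = percentile(data)
```

On evenly spaced scores such as data, minmax and percentile agree; they differ once an outlier is present, because percentile depends only on rank.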
- class zvec_db.rerankers.utils.PipelineReranker(rerankers, topn=10, rerank_field=None)[source]
Chain multiple rerankers sequentially.
This reranker applies a list of rerankers in sequence, passing the output of one as the input to the next. This is useful for combining different reranking strategies (e.g., RRF followed by cross-encoder).
- Parameters:
Example
>>> pipeline = PipelineReranker([
...     RrfReranker(topn=50, rank_constant=60),
...     SentenceTransformerReranker(model_name="ms-marco-MiniLM-L-6-v2", topn=10)
... ])
>>> results = collection.query(..., reranker=pipeline)
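The chaining idea itself, where each stage receives the previous stage's output, can be sketched without any zvec_db classes. Everything below is hypothetical: the reranker interface is not shown in this reference, so plain callables over (uid, score) lists stand in for real rerankers.

```python
from typing import Callable, List, Sequence, Tuple

Scored = Tuple[str, float]
Stage = Callable[[List[Scored]], List[Scored]]


def run_pipeline(stages: Sequence[Stage], docs: List[Scored],
                 topn: int = 10) -> List[Scored]:
    """Apply each stage to the output of the previous one, then truncate."""
    current = list(docs)
    for stage in stages:
        current = stage(current)
    return current[:topn]


def keep_top(n: int) -> Stage:
    """Toy first stage: sort by score and keep the best n candidates."""
    def stage(docs: List[Scored]) -> List[Scored]:
        return sorted(docs, key=lambda d: d[1], reverse=True)[:n]
    return stage


def boost(uid: str, amount: float) -> Stage:
    """Toy second stage: raise one document's score and re-sort."""
    def stage(docs: List[Scored]) -> List[Scored]:
        rescored = [(u, s + amount if u == uid else s) for u, s in docs]
        return sorted(rescored, key=lambda d: d[1], reverse=True)
    return stage


docs = [("a", 0.2), ("b", 0.9), ("c", 0.5), ("d", 0.1)]
result = run_pipeline([keep_top(3), boost("c", 0.5)], docs, topn=2)
```

The second stage only ever sees the three candidates that survived the first, which is the usual reason to put a cheap, broad reranker ahead of an expensive one.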
- __init__(rerankers, topn=10, rerank_field=None)[source]
Initialize PipelineReranker with a list of rerankers.
- Parameters:
- zvec_db.rerankers.utils.extract_score(doc)[source]
Extract score from a document, handling various numeric types.
- Parameters:
doc (Doc) – Document with a score attribute.
- Returns:
Score as a float, or 0.0 if score is None or invalid.
- Return type:
float
Example
>>> doc = Doc(id="1", score=0.8)
>>> extract_score(doc)
0.8
>>> doc_no_score = Doc(id="2", score=None)
>>> extract_score(doc_no_score)
0.0
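A defensive conversion of this kind might look like the sketch below. FakeDoc is a hypothetical stand-in for Doc, extract_score_sketch is not the library function, and treating NaN as invalid is an assumption.

```python
import math
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeDoc:
    """Hypothetical stand-in for zvec_db's Doc, for illustration only."""
    id: str
    score: Any = None


def extract_score_sketch(doc: Any) -> float:
    raw = getattr(doc, "score", None)
    if raw is None:
        return 0.0
    try:
        # float() accepts int, float, Decimal and NumPy scalar types.
        value = float(raw)
    except (TypeError, ValueError):
        return 0.0
    # Assumption: NaN counts as an invalid score.
    return 0.0 if math.isnan(value) else value
```

Calling extract_score_sketch(FakeDoc(id="1", score=3)) returns the float 3.0, so downstream arithmetic never has to branch on the score's original type.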
- zvec_db.rerankers.utils.extract_field_score(doc, field_name)[source]
Extract score from a specific document field.
- Parameters:
doc (Doc) – Document with fields attribute.
field_name (str) – Name of the field to extract score from.
- Returns:
Field score as a float, or 0.0 if field is missing or non-numeric.
- Return type:
float
Example
>>> doc = Doc(id="1", fields={"title_score": 0.9, "content_score": 0.7})
>>> extract_field_score(doc, "title_score")
0.9
>>> extract_field_score(doc, "missing_field")
0.0
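The missing-or-non-numeric fallback can be sketched over a plain mapping. The helper name field_score_sketch is hypothetical, and rejecting bool values (which Python treats as integers) is an assumption about what "non-numeric" means here.

```python
from typing import Any, Mapping


def field_score_sketch(fields: Mapping[str, Any], field_name: str) -> float:
    value = fields.get(field_name)  # None when the field is missing
    # Assumption: bool is excluded even though it subclasses int.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return 0.0
    return float(value)


fields = {"title_score": 0.9, "content_score": 0.7, "lang": "en"}
title = field_score_sketch(fields, "title_score")
missing = field_score_sketch(fields, "missing_field")
```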
- zvec_db.rerankers.utils.get_document_text(doc, rerank_field=None)[source]
Extract document text for scoring or embedding.
This function attempts to extract text content from a document using the following strategy:
1. If rerank_field is specified and the document has that field, use it.
2. Otherwise, try common field names: "content", "text", "body", "passage".
3. If no field matches, concatenate all fields.
4. As a last resort, return the document ID as a string.
- Parameters:
doc (Doc) – Document to extract text from.
rerank_field (Optional[str]) – Specific field name to use. If None, uses the fallback strategy. Defaults to None.
- Returns:
Extracted document text.
- Return type:
str
Example
>>> doc = Doc(id="1", fields={"content": "Hello world", "title": "Test"})
>>> get_document_text(doc)
'Hello world'
>>> get_document_text(doc, rerank_field="title")
'Test'
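The fallback order described above can be sketched as follows. This is not the library implementation: document_text_sketch is a hypothetical name, it takes the ID and fields directly instead of a Doc, and joining field values with a single space is an assumption about how "concatenate all fields" behaves.

```python
from typing import Any, Mapping, Optional

COMMON_FIELDS = ("content", "text", "body", "passage")


def document_text_sketch(doc_id: str, fields: Mapping[str, Any],
                         rerank_field: Optional[str] = None) -> str:
    # 1. An explicitly requested field wins when the document has it.
    if rerank_field is not None and rerank_field in fields:
        return str(fields[rerank_field])
    # 2. Otherwise try the common field names in order.
    for name in COMMON_FIELDS:
        if name in fields:
            return str(fields[name])
    # 3. No match: concatenate every field value (separator is assumed).
    if fields:
        return " ".join(str(value) for value in fields.values())
    # 4. Last resort: the document ID.
    return str(doc_id)


fields = {"content": "Hello world", "title": "Test"}
default_text = document_text_sketch("1", fields)
title_text = document_text_sketch("1", fields, rerank_field="title")
```

A rerank_field that the document lacks does not raise in this sketch; it simply falls through to step 2.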
Modules
- Common utilities for rerankers in zvec-db.
- Normalization utilities for post-processing raw document relevance scores.
- PipelineReranker for zvec-db.