zvec_db.rerankers.cross_encoder
Cross-encoder rerankers for accurate pairwise scoring.
This module provides cross-encoder-based reranking implementations.
Available Classes
- BaseCrossEncoderReranker
Abstract base class for all cross-encoder rerankers.
- SentenceTransformerReranker
Local binary cross-encoder using Sentence Transformers. Uses sigmoid output for relevance scoring.
- ClassificationReranker
Local multi-class classification using HuggingFace transformers. Uses softmax + expected value for scoring.
- OpenAIReranker
API-based reranker using /rerank or /score endpoints. Supports vLLM and OpenAI-compatible APIs.
- OpenAIEncoderReranker
API-based reranker using /classify endpoint. Uses encoder models (BERT, RoBERTa) with expected value scoring.
- OpenAIDecoderReranker
API-based reranker using /chat/completions with logprobs. Uses LLM with structured output and expected value scoring.
Example Usage
from zvec_db.rerankers.cross_encoder import (
    SentenceTransformerReranker,
    OpenAIReranker,
    OpenAIDecoderReranker,
)

# Local binary cross-encoder
reranker = SentenceTransformerReranker(
    model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
    query="machine learning",
)
results = reranker.rerank({"bm25": docs})

# API /rerank endpoint
reranker = OpenAIReranker(
    endpoint="rerank",
    base_url="http://localhost:8000",
    model="BAAI/bge-reranker-v2-m3",
    query="machine learning",
)
results = reranker.rerank({"bm25": docs})

# LLM with logprobs (binary)
reranker = OpenAIDecoderReranker(
    num_classes=2,
    model="gpt-4o-mini",
    query="machine learning",
)
results = reranker.rerank({"bm25": docs})

# LLM with logprobs (multi-class 0-4)
reranker = OpenAIDecoderReranker(
    num_classes=5,
    model="meta-llama/Llama-3-8b-instruct",
    query="machine learning",
)
results = reranker.rerank({"bm25": docs})
- class zvec_db.rerankers.cross_encoder.BaseCrossEncoderReranker(query, topn=10, rerank_field=None, fusion_score_weight=1.0)[source]
Abstract base class for cross-encoder reranking.
This class provides the common infrastructure for cross-encoder scoring. Subclasses must implement the _compute_scores_batch() method to define their scoring strategy.
- Parameters:
query (str) – Query for reranking. Required.
topn (int, optional) – Number of top documents to return after reranking. Defaults to 10.
rerank_field (Optional[str], optional) – Document field to use for reranking. If None, uses the entire document content. Defaults to None.
fusion_score_weight (float, optional) –
Weight for blending cross-encoder scores with fusion scores.
Formula: final_score = cross_encoder_score × weight + fusion_score × (1 - weight)
weight = 1.0 → 100% cross-encoder, 0% fusion (pure cross-encoder, default)
weight = 0.8 → 80% cross-encoder, 20% fusion
weight = 0.5 → 50% cross-encoder, 50% fusion
weight = 0.0 → 0% cross-encoder, 100% fusion (pure fusion)
Defaults to 1.0 (pure cross-encoder score).
Note
Subclasses must implement _compute_scores_batch() or _compute_score()
Cross-encoder reranking is more accurate but slower than score fusion
For large document sets, consider using max_batch_size to limit API calls
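The blending formula above can be illustrated with a minimal sketch (plain Python arithmetic, not library code):

```python
def blend(cross_encoder_score: float, fusion_score: float, weight: float = 1.0) -> float:
    """final_score = cross_encoder_score * weight + fusion_score * (1 - weight)."""
    return cross_encoder_score * weight + fusion_score * (1 - weight)

# With a cross-encoder score of 0.9 and a fusion score of 0.4:
blend(0.9, 0.4, weight=1.0)  # 0.9  (pure cross-encoder, the default)
blend(0.9, 0.4, weight=0.8)  # ~0.8 (80% cross-encoder + 20% fusion)
blend(0.9, 0.4, weight=0.0)  # 0.4  (pure fusion)
```

Intermediate weights are useful when the fusion ranking already carries signal (e.g. BM25 exact-match evidence) that the cross-encoder alone would discard.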
- property fusion_score_weight: float
Weight for blending cross-encoder scores with fusion scores.
- Type:
float
- class zvec_db.rerankers.cross_encoder.SentenceTransformerReranker(query, topn=10, model_name='cross-encoder/ms-marco-MiniLM-L-6-v2', device=None, max_length=512, rerank_field=None, batch_size=32, show_progress_bar=False, fusion_score_weight=1.0, model_kwargs=None)[source]
Cross-encoder reranker using Sentence Transformers models locally.
This reranker uses the CrossEncoder class from sentence-transformers to compute relevance scores between query and document pairs. Unlike API-based cross-encoders, this runs entirely locally on CPU or GPU.
SentenceTransformer CrossEncoder models output a single score via sigmoid for binary relevance (relevant/not relevant).
- Parameters:
query (str) – Query for reranking. Required.
topn (int, optional) – Number of top documents to return. Defaults to 10.
model_name (str, optional) – CrossEncoder model name from HuggingFace. Examples: “cross-encoder/ms-marco-MiniLM-L-6-v2” (fast, good quality), “cross-encoder/ms-marco-TinyBERT-L-2-v2” (very fast), “cross-encoder/stsb-distilroberta-base” (semantic similarity). Defaults to “cross-encoder/ms-marco-MiniLM-L-6-v2”.
device (Optional[str], optional) – Device to run model on. “cpu”, “cuda”, or None for auto-detect. Defaults to None.
max_length (Optional[int], optional) – Maximum sequence length. Defaults to 512.
rerank_field (Optional[str], optional) – Document field to use for scoring. If None, uses the entire document content. Defaults to None.
batch_size (int, optional) – Batch size for inference. Defaults to 32.
show_progress_bar (bool, optional) – Show progress bar during inference. Defaults to False.
fusion_score_weight (float, optional) –
Weight for blending cross-encoder scores with fusion scores.
Formula: final_score = cross_encoder_score × weight + fusion_score × (1 - weight)
weight = 1.0 → 100% cross-encoder, 0% fusion (default)
weight = 0.8 → 80% cross-encoder, 20% fusion
weight = 0.5 → 50% cross-encoder, 50% fusion
weight = 0.0 → 0% cross-encoder, 100% fusion
Defaults to 1.0 (pure cross-encoder score).
model_kwargs (Optional[Mapping[str, Any]], optional) – Additional keyword arguments passed to the CrossEncoder constructor. Useful options include torch_dtype (model dtype: torch.float16, torch.bfloat16, “auto”), trust_remote_code (trust remote code from the HuggingFace Hub), token (HuggingFace API token for private models), revision (model revision to load), cache_dir (custom cache directory), local_files_only (load only local files), and attn_implementation (attention implementation, e.g. “flash_attention_2”). Defaults to None (no additional kwargs).
Example
>>> from zvec_db.rerankers.cross_encoder import SentenceTransformerReranker
>>>
>>> # Binary relevance reranker
>>> reranker = SentenceTransformerReranker(
...     query="machine learning",
...     model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
...     topn=10,
... )
>>>
>>> results = reranker.rerank({"bm25": bm25_docs})
>>>
>>> # Blended scores: 80% cross-encoder + 20% fusion
>>> reranker = SentenceTransformerReranker(
...     query="machine learning",
...     model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
...     topn=10,
...     fusion_score_weight=0.8,
... )
>>> results = reranker.rerank({"bm25": docs})
>>>
>>> # With model_kwargs for private models
>>> reranker = SentenceTransformerReranker(
...     query="machine learning",
...     model_name="org/private-model",
...     model_kwargs={"token": "hf_..."},
... )
>>> results = reranker.rerank({"bm25": docs})
>>>
>>> # With model_kwargs for dtype (float16 for reduced memory)
>>> import torch
>>> reranker = SentenceTransformerReranker(
...     query="machine learning",
...     model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
...     model_kwargs={"torch_dtype": torch.float16},
... )
>>> results = reranker.rerank({"bm25": docs})
Note
Requires the sentence-transformers package
Models are downloaded automatically on first use
GPU acceleration available if CUDA is installed
Models output scores in [0, 1] via sigmoid
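The sigmoid mapping noted above can be sketched as follows (plain Python; illustrative only, not the library's internal code):

```python
import math

def sigmoid(x: float) -> float:
    # maps a raw model logit to a relevance score in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

# a logit of 0 means maximal uncertainty (score 0.5);
# large positive logits approach 1, large negative logits approach 0
sigmoid(0.0)   # 0.5
sigmoid(8.0)   # close to 1
sigmoid(-8.0)  # close to 0
```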
See also
OpenAIReranker: API-based cross-encoder with LLM.
- __init__(query, topn=10, model_name='cross-encoder/ms-marco-MiniLM-L-6-v2', device=None, max_length=512, rerank_field=None, batch_size=32, show_progress_bar=False, fusion_score_weight=1.0, model_kwargs=None)[source]
- fit(documents)[source]
Initialize the reranker by loading the model.
For Sentence Transformers CrossEncoder, this loads the model. No training is performed as models are pre-trained.
- property batch_size
- property device
- property max_length
- property model_kwargs
- property model_name
- property show_progress_bar
- class zvec_db.rerankers.cross_encoder.ClassificationReranker(query, topn=10, model_name='cross-encoder/ms-marco-MiniLM-L-6-v2', device=None, max_length=512, num_classes=None, rerank_field=None, batch_size=32, show_progress_bar=False, fusion_score_weight=1.0, model_kwargs=None)[source]
Multi-class classification reranker using HuggingFace transformers.
This reranker uses a multi-class classification model from HuggingFace (via the transformers library) and computes the expected value of the class distribution:
\[E[\text{score}] = \frac{\sum_{i} \text{prob}_i \times i}{\text{num\_classes} - 1}\]
The model outputs logits for each class (0, 1, 2, …, num_classes-1). Softmax is applied to obtain probabilities, then the expected value is computed and normalized to [0, 1].
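A minimal sketch of this scoring step (softmax over the class logits, then a normalized expected value; illustrative, not the library's internal implementation):

```python
import math

def expected_value_score(logits: list[float]) -> float:
    # softmax over the per-class logits
    exps = [math.exp(x) for x in logits]
    probs = [e / sum(exps) for e in exps]
    # expected class index, normalized to [0, 1]
    n = len(probs)
    return sum(p * i for i, p in enumerate(probs)) / (n - 1)

# uniform logits over 5 classes give the midpoint score
expected_value_score([0.0] * 5)  # ~0.5
```

Compared with taking the argmax class, the expected value preserves the model's uncertainty, so two documents assigned the same top class can still be ordered.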
- Parameters:
query (str) – Query for reranking. Required.
topn (int, optional) – Number of top documents to return. Defaults to 10.
model_name (str, optional) –
Classification model name from HuggingFace. Should be a model fine-tuned for text classification with multiple labels. Examples: “cross-encoder/ms-marco-MiniLM-L-6-v2” (binary), “nboost/pt-bert-base-uncased-msmarco” (binary), or any model with config.num_labels set.
device (Optional[str], optional) – Device to run model on. “cpu”, “cuda”, or None for auto-detect. Defaults to None.
max_length (Optional[int], optional) – Maximum sequence length. Defaults to 512.
num_classes (Optional[int], optional) – Number of classes for classification. If None, will be inferred from model.config.num_labels. For binary: 2 (classes 0 and 1) For multi-class: e.g., 5 for 0-4 relevance scale. Defaults to None (auto-infer).
rerank_field (Optional[str], optional) – Document field to use for scoring. If None, uses the entire document content. Defaults to None.
batch_size (int, optional) – Batch size for inference. Defaults to 32.
show_progress_bar (bool, optional) – Show progress bar during inference. Defaults to False.
fusion_score_weight (float, optional) –
Weight for blending cross-encoder scores with fusion scores.
Formula: final_score = cross_encoder_score × weight + fusion_score × (1 - weight)
weight = 1.0 → 100% cross-encoder, 0% fusion (default)
weight = 0.8 → 80% cross-encoder, 20% fusion
weight = 0.5 → 50% cross-encoder, 50% fusion
weight = 0.0 → 0% cross-encoder, 100% fusion
Defaults to 1.0 (pure cross-encoder score).
model_kwargs (Optional[Mapping[str, Any]], optional) – Additional keyword arguments passed to AutoModelForSequenceClassification and AutoTokenizer. Useful options include torch_dtype (model dtype: torch.float16, torch.bfloat16, “auto” for auto-detection), trust_remote_code (trust remote code from the HuggingFace Hub), token (HuggingFace API token for private models), revision (model revision to load), cache_dir (custom cache directory), local_files_only (load only local files), attn_implementation (attention implementation, e.g. “flash_attention_2”, “sdpa”), load_in_8bit / load_in_4bit (8-bit / 4-bit quantization, require bitsandbytes), and device_map (device mapping for distributed loading, e.g. “auto”, “balanced”). Defaults to None (no additional kwargs).
Example
>>> from zvec_db.rerankers.cross_encoder import ClassificationReranker
>>>
>>> # Binary classification (num_classes inferred from model)
>>> reranker = ClassificationReranker(
...     query="machine learning",
...     model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
...     topn=10,
... )
>>>
>>> # Multi-level relevance with explicit num_classes
>>> reranker = ClassificationReranker(
...     query="machine learning",
...     model_name="your-multi-class-classifier",
...     num_classes=5,
...     topn=10,
... )
>>>
>>> reranker.fit([])  # Load model
>>> results = reranker.rerank({"bm25": docs})
>>>
>>> # With model_kwargs for private models or custom options
>>> reranker = ClassificationReranker(
...     query="machine learning",
...     model_name="org/private-model",
...     model_kwargs={"token": "hf_...", "trust_remote_code": True},
... )
>>> reranker.fit([])
>>> results = reranker.rerank({"bm25": docs})
>>>
>>> # With model_kwargs for dtype (float16 for reduced memory)
>>> import torch
>>> reranker = ClassificationReranker(
...     query="machine learning",
...     model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
...     model_kwargs={"torch_dtype": torch.float16},
... )
>>> reranker.fit([])
>>> results = reranker.rerank({"bm25": docs})
>>>
>>> # With model_kwargs for 8-bit quantization (requires bitsandbytes)
>>> reranker = ClassificationReranker(
...     query="machine learning",
...     model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
...     model_kwargs={"load_in_8bit": True},
... )
>>> reranker.fit([])
>>> results = reranker.rerank({"bm25": docs})
Note
Requires the transformers and torch packages
Model must be trained/fine-tuned for multi-class text classification
num_classes is inferred from model.config.num_labels if not provided
GPU acceleration available if CUDA is installed
Scores are normalized to [0, 1] via expected value
See also
OpenAIDecoderReranker: API-based classification with LLM logprobs.
- __init__(query, topn=10, model_name='cross-encoder/ms-marco-MiniLM-L-6-v2', device=None, max_length=512, num_classes=None, rerank_field=None, batch_size=32, show_progress_bar=False, fusion_score_weight=1.0, model_kwargs=None)[source]
- property batch_size
- property device
- property max_length
- property model_kwargs
- property model_name
- property num_classes
- property show_progress_bar
- class zvec_db.rerankers.cross_encoder.OpenAIReranker(query, topn=10, base_url='http://localhost:8000/v1', api_key=None, model='BAAI/bge-reranker-v2-m3', endpoint='rerank', timeout=30.0, rerank_field=None, fusion_score_weight=1.0, truncate_prompt_tokens=None, max_retries=3, initial_delay=1.0, max_delay=60.0, exponential_base=2.0, jitter=0.1, retry_config=None)[source]
Cross-encoder reranker using OpenAI-compatible /rerank or /score endpoints.
Uses vLLM’s native endpoints: /rerank for query-document scoring, /score for text pair similarity. Both return scores in [0, 1].
- Parameters:
query (str) – Query for reranking. Required.
topn (int) – Number of top documents to return. Defaults to 10.
base_url (str) – API base URL. Defaults to “http://localhost:8000/v1”.
api_key (Optional[str]) – API key. Defaults to None.
model (str) – Model identifier. Defaults to “BAAI/bge-reranker-v2-m3”.
endpoint (Literal["rerank", "score"]) – Endpoint to use. Defaults to “rerank”.
timeout (float) – HTTP timeout in seconds. Defaults to 30.0.
rerank_field (Optional[str]) – Document field for scoring. Defaults to None.
fusion_score_weight (float) – Weight for cross-encoder vs fusion scores. 1.0 = pure cross-encoder, 0.0 = pure fusion. Defaults to 1.0.
truncate_prompt_tokens (Optional[int]) – Max tokens for truncation.
max_retries (int, optional) – Maximum number of retry attempts for transient failures. Set to 0 to disable retries. Defaults to 3.
initial_delay (float, optional) – Initial delay before first retry in seconds. Defaults to 1.0.
max_delay (float, optional) – Maximum delay cap in seconds. Defaults to 60.0.
exponential_base (float, optional) – Base for exponential backoff. Defaults to 2.0.
jitter (float, optional) – Random jitter factor (0.0-1.0) to avoid thundering herd. Defaults to 0.1.
retry_config (Optional[RetryConfig], optional) – Pre-configured retry settings. If provided, overrides individual retry parameters. Defaults to None.
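How the retry parameters interact can be sketched as follows (a hypothetical illustration of the delay schedule; the library's actual RetryConfig may differ in detail):

```python
import random

def backoff_delays(max_retries=3, initial_delay=1.0, max_delay=60.0,
                   exponential_base=2.0, jitter=0.1):
    # one delay per retry attempt: exponential growth, capped at max_delay,
    # with random jitter to avoid synchronized retries across clients
    delays = []
    for attempt in range(max_retries):
        base = min(initial_delay * exponential_base ** attempt, max_delay)
        delays.append(base * (1 + random.uniform(-jitter, jitter)))
    return delays

# with the defaults: roughly [1, 2, 4] seconds, each +/- 10%
```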
Example
>>> from zvec_db.rerankers.cross_encoder import OpenAIReranker
>>> reranker = OpenAIReranker(
...     query="machine learning",
...     endpoint="rerank",
...     base_url="http://localhost:8000",
... )
>>> results = reranker.rerank({"bm25": docs})

>>> # With custom retry settings for production
>>> reranker = OpenAIReranker(
...     query="machine learning",
...     max_retries=5,
...     initial_delay=2.0,
...     max_delay=120.0,
... )
Note: Requires vLLM with /rerank or /score endpoint enabled.
- __init__(query, topn=10, base_url='http://localhost:8000/v1', api_key=None, model='BAAI/bge-reranker-v2-m3', endpoint='rerank', timeout=30.0, rerank_field=None, fusion_score_weight=1.0, truncate_prompt_tokens=None, max_retries=3, initial_delay=1.0, max_delay=60.0, exponential_base=2.0, jitter=0.1, retry_config=None)[source]
- Parameters:
query (str)
topn (int)
base_url (str)
api_key (str | None)
model (str)
endpoint (Literal['rerank', 'score'])
timeout (float)
rerank_field (str | None)
fusion_score_weight (float)
truncate_prompt_tokens (int | None)
max_retries (int)
initial_delay (float)
max_delay (float)
exponential_base (float)
jitter (float)
retry_config (RetryConfig | None)
- property api_key
- property base_url
- property endpoint
- property model
- property timeout
- property truncate_prompt_tokens
- class zvec_db.rerankers.cross_encoder.OpenAIEncoderReranker(query, topn=10, base_url='http://localhost:8000/v1', api_key=None, model='BAAI/bge-reranker-v2-m3', num_classes=None, timeout=30.0, rerank_field=None, fusion_score_weight=1.0, separator=' ', truncate_prompt_tokens=None)[source]
Cross-encoder reranker using the /classify endpoint for encoder models.
Uses vLLM’s /classify endpoint for encoder models (BERT, RoBERTa). Computes expected value score from class probabilities: E[score] = sum(prob_i * i) / (num_classes - 1)
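For example, with an assumed 5-class probability vector returned by /classify (illustrative numbers, not real model output):

```python
# class probabilities for relevance levels 0..4 (assumed example values)
probs = [0.05, 0.10, 0.15, 0.30, 0.40]
num_classes = len(probs)

# E[score] = sum(prob_i * i) / (num_classes - 1)
score = sum(p * i for i, p in enumerate(probs)) / (num_classes - 1)
# score ~ 0.725: most of the probability mass sits on the high relevance classes
```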
- Parameters:
query (str) – Query for reranking. Required.
topn (int) – Number of top documents to return. Defaults to 10.
base_url (str) – API base URL. Defaults to “http://localhost:8000/v1”.
api_key (Optional[str]) – API key. Defaults to None.
model (str) – Model identifier. Defaults to “BAAI/bge-reranker-v2-m3”.
num_classes (Optional[int]) – Number of classes. Auto-detected if None.
timeout (float) – HTTP timeout in seconds. Defaults to 30.0.
rerank_field (Optional[str]) – Document field for scoring.
fusion_score_weight (float) – Cross-encoder vs fusion weight. Default 1.0.
separator (str) – Query-document separator. Defaults to “ “.
truncate_prompt_tokens (Optional[int]) – Max tokens for truncation.
Example
>>> from zvec_db.rerankers.cross_encoder import OpenAIEncoderReranker
>>> reranker = OpenAIEncoderReranker(
...     query="machine learning",
...     num_classes=2,
...     base_url="http://localhost:8000",
... )
>>> results = reranker.rerank({"bm25": docs})
Note: Requires vLLM with /classify endpoint enabled.
- __init__(query, topn=10, base_url='http://localhost:8000/v1', api_key=None, model='BAAI/bge-reranker-v2-m3', num_classes=None, timeout=30.0, rerank_field=None, fusion_score_weight=1.0, separator=' ', truncate_prompt_tokens=None)[source]
- property api_key
- property base_url
- property model
- property num_classes
- property separator
- property timeout
- property truncate_prompt_tokens
- class zvec_db.rerankers.cross_encoder.OpenAIDecoderReranker(query, topn=10, base_url='http://localhost:8000/v1', api_key=None, model='gpt-4o-mini', num_classes=2, timeout=30.0, max_batch_size=None, rerank_field=None, fusion_score_weight=1.0, concurrency=4)[source]
Cross-encoder reranker using LLM logprobs with structured output.
Uses /chat/completions with logprobs and regex-constrained output. Computes expected value score from log probabilities: E[score] = sum(prob_i * i) / (num_classes - 1)
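The logprobs-to-score step can be sketched as follows (hypothetical token logprobs for the constrained digit output “0”–“4”; illustrative, not the library's exact implementation):

```python
import math

# assumed top-logprobs for the digit token emitted for one document
token_logprobs = {"0": -3.2, "1": -2.1, "2": -1.5, "3": -0.7, "4": -1.0}
num_classes = 5

# convert logprobs to probabilities and renormalize over the observed classes
probs = {t: math.exp(lp) for t, lp in token_logprobs.items()}
z = sum(probs.values())
score = sum((p / z) * int(t) for t, p in probs.items()) / (num_classes - 1)
# score lands in [0, 1]; here the mass concentrates around class 3
```

Renormalizing over the returned top tokens is an assumption of this sketch; it compensates for probability mass assigned to tokens outside the digit set.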
- Parameters:
query (str) – Query for reranking. Required.
topn (int) – Number of top documents to return. Defaults to 10.
base_url (str) – API base URL. Defaults to “http://localhost:8000/v1”.
api_key (Optional[str]) – API key. Defaults to None.
model (str) – Model identifier. Defaults to “gpt-4o-mini”.
num_classes (int) – Number of classes. Defaults to 2.
timeout (float) – HTTP timeout in seconds. Defaults to 30.0.
max_batch_size (Optional[int]) – Max documents per batch. Default None.
rerank_field (Optional[str]) – Document field for scoring.
fusion_score_weight (float) – Cross-encoder vs fusion weight. Default 1.0.
concurrency (int) – Concurrent API calls. Defaults to 4.
Example
>>> from zvec_db.rerankers.cross_encoder import OpenAIDecoderReranker
>>> reranker = OpenAIDecoderReranker(
...     query="machine learning",
...     num_classes=2,
...     model="gpt-4o-mini",
... )
>>> results = reranker.rerank({"bm25": docs})
Note: Requires a model with logprobs support (--enable-logprobs for vLLM).
- MAX_CLASSES = 10
- __init__(query, topn=10, base_url='http://localhost:8000/v1', api_key=None, model='gpt-4o-mini', num_classes=2, timeout=30.0, max_batch_size=None, rerank_field=None, fusion_score_weight=1.0, concurrency=4)[source]
- property api_key
- property base_url
- property concurrency
- property max_batch_size
- property model
- property num_classes
- property timeout
Modules
Base class for cross-encoder reranking.
Multi-class classification reranking using HuggingFace transformers.
OpenAI-compatible API reranker using /rerank and /score endpoints.
OpenAI-compatible API reranker using /chat/completions with logprobs.
OpenAI-compatible API reranker using /classify endpoint for encoder models.
Sentence Transformer binary cross-encoder reranker.