zvec_db.rerankers.cross_encoder

Cross-encoder rerankers for accurate pairwise scoring.

This module provides cross-encoder based reranking implementations.

Available Classes

BaseCrossEncoderReranker

Abstract base class for all cross-encoder rerankers.

SentenceTransformerReranker

Local binary cross-encoder using Sentence Transformers. Uses sigmoid output for relevance scoring.

ClassificationReranker

Local multi-class classification using HuggingFace transformers. Uses softmax + expected value for scoring.

OpenAIReranker

API-based reranker using /rerank or /score endpoints. Supports vLLM and OpenAI-compatible APIs.

OpenAIEncoderReranker

API-based reranker using /classify endpoint. Uses encoder models (BERT, RoBERTa) with expected value scoring.

OpenAIDecoderReranker

API-based reranker using /chat/completions with logprobs. Uses LLM with structured output and expected value scoring.

Example Usage

from zvec_db.rerankers.cross_encoder import (
    SentenceTransformerReranker,
    OpenAIReranker,
    OpenAIDecoderReranker,
)

# Local binary cross-encoder
reranker = SentenceTransformerReranker(
    model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
    query="machine learning"
)
results = reranker.rerank({"bm25": docs})

# API /rerank endpoint
reranker = OpenAIReranker(
    endpoint="rerank",
    base_url="http://localhost:8000",
    model="BAAI/bge-reranker-v2-m3",
    query="machine learning"
)
results = reranker.rerank({"bm25": docs})

# LLM with logprobs (binary)
reranker = OpenAIDecoderReranker(
    num_classes=2,
    model="gpt-4o-mini",
    query="machine learning"
)
results = reranker.rerank({"bm25": docs})

# LLM with logprobs (multi-class 0-4)
reranker = OpenAIDecoderReranker(
    num_classes=5,
    model="meta-llama/Llama-3-8b-instruct",
    query="machine learning"
)
results = reranker.rerank({"bm25": docs})
class zvec_db.rerankers.cross_encoder.BaseCrossEncoderReranker(query, topn=10, rerank_field=None, fusion_score_weight=1.0)[source]

Abstract base class for cross-encoder reranking.

This class provides the common infrastructure for cross-encoder scoring. Subclasses must implement _compute_scores_batch() (or _compute_score()) to define their scoring strategy.

Parameters:
  • query (str) – Query for reranking. Required.

  • topn (int, optional) – Number of top documents to return after reranking. Defaults to 10.

  • rerank_field (Optional[str], optional) – Document field to use for reranking. If None, uses the entire document content. Defaults to None.

  • fusion_score_weight (float, optional) –

    Weight for blending cross-encoder scores with fusion scores.

    Formula: final_score = cross_encoder_score × weight + fusion_score × (1 - weight)

    • weight = 1.0 → 100% cross-encoder, 0% fusion (pure cross-encoder, default)

    • weight = 0.8 → 80% cross-encoder, 20% fusion

    • weight = 0.5 → 50% cross-encoder, 50% fusion

    • weight = 0.0 → 0% cross-encoder, 100% fusion (pure fusion)

    Defaults to 1.0 (pure cross-encoder score).
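
The blending rule above can be sketched in a few lines (illustrative only; the library applies this formula internally when combining scores):

```python
def blend(cross_encoder_score: float, fusion_score: float, weight: float = 1.0) -> float:
    """final_score = cross_encoder_score * weight + fusion_score * (1 - weight)."""
    return cross_encoder_score * weight + fusion_score * (1 - weight)

print(round(blend(0.9, 0.5, weight=1.0), 2))  # pure cross-encoder score
print(round(blend(0.9, 0.5, weight=0.5), 2))  # even blend of both signals
```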

Note

  • Subclasses must implement _compute_scores_batch() or _compute_score()

  • Cross-encoder reranking is more accurate but slower than score fusion

  • For large document sets, consider using max_batch_size to limit API calls

__init__(query, topn=10, rerank_field=None, fusion_score_weight=1.0)[source]
Parameters:
  • query (str)

  • topn (int)

  • rerank_field (str | None)

  • fusion_score_weight (float)

property query: str

Default query for reranking.

Type:

str

property fusion_score_weight: float

Weight for blending cross-encoder scores with fusion scores.

Type:

float

rerank(query_results, query=None)[source]

Rerank documents using cross-encoder scoring.

Parameters:
  • query_results (dict[str, list[Doc]]) – Results from one or more vector queries.

  • query (Optional[str], optional) – Query for reranking. Overrides constructor value if provided.

Returns:

Reranked documents with cross-encoder scores.

Return type:

list[Doc]

class zvec_db.rerankers.cross_encoder.SentenceTransformerReranker(query, topn=10, model_name='cross-encoder/ms-marco-MiniLM-L-6-v2', device=None, max_length=512, rerank_field=None, batch_size=32, show_progress_bar=False, fusion_score_weight=1.0, model_kwargs=None)[source]

Cross-encoder reranker using Sentence Transformers models locally.

This reranker uses the CrossEncoder class from sentence-transformers to compute relevance scores between query and document pairs. Unlike API-based cross-encoders, this runs entirely locally on CPU or GPU.

SentenceTransformer CrossEncoder models output a single score via sigmoid for binary relevance (relevant/not relevant).

Parameters:
  • query (str) – Query for reranking. Required.

  • topn (int, optional) – Number of top documents to return. Defaults to 10.

  • model_name (str, optional) – CrossEncoder model name from HuggingFace. Examples: "cross-encoder/ms-marco-MiniLM-L-6-v2" (fast, good quality), "cross-encoder/ms-marco-TinyBERT-L-2-v2" (very fast), "cross-encoder/stsb-distilroberta-base" (semantic similarity). Defaults to "cross-encoder/ms-marco-MiniLM-L-6-v2".

  • device (Optional[str], optional) – Device to run model on. “cpu”, “cuda”, or None for auto-detect. Defaults to None.

  • max_length (Optional[int], optional) – Maximum sequence length. Defaults to 512.

  • rerank_field (Optional[str], optional) – Document field to use for scoring. If None, uses the entire document content. Defaults to None.

  • batch_size (int, optional) – Batch size for inference. Defaults to 32.

  • show_progress_bar (bool, optional) – Show progress bar during inference. Defaults to False.

  • fusion_score_weight (float, optional) –

    Weight for blending cross-encoder scores with fusion scores.

    Formula: final_score = cross_encoder_score × weight + fusion_score × (1 - weight)

    • weight = 1.0 → 100% cross-encoder, 0% fusion (default)

    • weight = 0.8 → 80% cross-encoder, 20% fusion

    • weight = 0.5 → 50% cross-encoder, 50% fusion

    • weight = 0.0 → 0% cross-encoder, 100% fusion

    Defaults to 1.0 (pure cross-encoder score).

  • model_kwargs (Optional[Mapping[str, Any]], optional) – Additional keyword arguments passed to the CrossEncoder constructor. Useful options include torch_dtype (model dtype: torch.float16, torch.bfloat16, "auto"), trust_remote_code (trust remote code from the HuggingFace Hub), token (HuggingFace API token for private models), revision (model revision to load), cache_dir (custom cache directory), local_files_only (load only local files), and attn_implementation (attention implementation, e.g. "flash_attention_2"). Defaults to None (no additional kwargs).

Example

>>> from zvec_db.rerankers.cross_encoder import SentenceTransformerReranker
>>>
>>> # Binary relevance reranker
>>> reranker = SentenceTransformerReranker(
...     query="machine learning",
...     model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
...     topn=10,
... )
>>>
>>> results = reranker.rerank({"bm25": bm25_docs})
>>>
>>> # Blended scores: 80% cross-encoder + 20% fusion
>>> reranker = SentenceTransformerReranker(
...     query="machine learning",
...     model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
...     topn=10,
...     fusion_score_weight=0.8,
... )
>>> results = reranker.rerank({"bm25": docs})
>>>
>>> # With model_kwargs for private models
>>> reranker = SentenceTransformerReranker(
...     query="machine learning",
...     model_name="org/private-model",
...     model_kwargs={"token": "hf_..."},
... )
>>> results = reranker.rerank({"bm25": docs})
>>>
>>> # With model_kwargs for dtype (float16 for reduced memory)
>>> import torch
>>> reranker = SentenceTransformerReranker(
...     query="machine learning",
...     model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
...     model_kwargs={"torch_dtype": torch.float16},
... )
>>> results = reranker.rerank({"bm25": docs})

Note

  • Requires the sentence-transformers package

  • Models are downloaded automatically on first use

  • GPU acceleration available if CUDA is installed

  • Models output scores in [0, 1] via sigmoid
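
The sigmoid mapping mentioned above can be sketched directly (illustrative; sentence-transformers applies this to the model's single output logit internally):

```python
import math

def sigmoid(logit: float) -> float:
    """Map a raw cross-encoder logit to a relevance score in [0, 1]."""
    return 1.0 / (1.0 + math.exp(-logit))

print(round(sigmoid(2.0), 3))   # strongly positive logit -> score near 1
print(round(sigmoid(-2.0), 3))  # strongly negative logit -> score near 0
```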

See also

OpenAIReranker: API-based cross-encoder using /rerank or /score endpoints.

__init__(query, topn=10, model_name='cross-encoder/ms-marco-MiniLM-L-6-v2', device=None, max_length=512, rerank_field=None, batch_size=32, show_progress_bar=False, fusion_score_weight=1.0, model_kwargs=None)[source]
Parameters:
  • query (str)

  • topn (int)

  • model_name (str)

  • device (str | None)

  • max_length (int | None)

  • rerank_field (str | None)

  • batch_size (int)

  • show_progress_bar (bool)

  • fusion_score_weight (float)

  • model_kwargs (Mapping[str, Any] | None)

fit(documents)[source]

Initialize the reranker by loading the model.

For Sentence Transformers CrossEncoder, this loads the model. No training is performed as models are pre-trained.

Parameters:

documents (list[str]) – List of documents (not used, for API compatibility).

Returns:

The reranker instance, for method chaining.

Return type:

self

property batch_size
property device
property max_length
property model_kwargs
property model_name
property show_progress_bar
class zvec_db.rerankers.cross_encoder.ClassificationReranker(query, topn=10, model_name='cross-encoder/ms-marco-MiniLM-L-6-v2', device=None, max_length=512, num_classes=None, rerank_field=None, batch_size=32, show_progress_bar=False, fusion_score_weight=1.0, model_kwargs=None)[source]

Multi-class classification reranker using HuggingFace transformers.

This reranker uses a multi-class classification model from HuggingFace (via the transformers library) and computes the expected value of the class distribution:

\[E[\text{score}] = \frac{\sum_{i} \text{prob}_i \times i}{\text{num\_classes} - 1}\]

The model outputs logits for each class (0, 1, 2, …, num_classes-1). Softmax is applied to get probabilities, then expected value is computed and normalized to [0, 1].
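
That scoring recipe can be sketched without any model in the loop (a minimal illustration of softmax plus normalized expected value; the logit values are made up):

```python
import math

def expected_value_score(logits: list[float]) -> float:
    """Softmax over class logits, then expected class index normalized to [0, 1]."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    num_classes = len(logits)
    return sum(p * i for i, p in enumerate(probs)) / (num_classes - 1)

# Uniform logits give the midpoint score; mass on the top class pushes toward 1:
print(round(expected_value_score([0.0, 0.0, 0.0, 0.0, 0.0]), 3))
print(round(expected_value_score([-2.0, -1.0, 0.0, 1.0, 8.0]), 3))
```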

Parameters:
  • query (str) – Query for reranking. Required.

  • topn (int, optional) – Number of top documents to return. Defaults to 10.

  • model_name (str, optional) – Classification model name from HuggingFace. Should be a model fine-tuned for text classification with multiple labels. Examples: "cross-encoder/ms-marco-MiniLM-L-6-v2" (binary), "nboost/pt-bert-base-uncased-msmarco" (binary), or any model with config.num_labels set. Defaults to "cross-encoder/ms-marco-MiniLM-L-6-v2".

  • device (Optional[str], optional) – Device to run model on. “cpu”, “cuda”, or None for auto-detect. Defaults to None.

  • max_length (Optional[int], optional) – Maximum sequence length. Defaults to 512.

  • num_classes (Optional[int], optional) – Number of classes for classification. If None, inferred from model.config.num_labels. For binary: 2 (classes 0 and 1); for multi-class: e.g., 5 for a 0-4 relevance scale. Defaults to None (auto-infer).

  • rerank_field (Optional[str], optional) – Document field to use for scoring. If None, uses the entire document content. Defaults to None.

  • batch_size (int, optional) – Batch size for inference. Defaults to 32.

  • show_progress_bar (bool, optional) – Show progress bar during inference. Defaults to False.

  • fusion_score_weight (float, optional) –

    Weight for blending cross-encoder scores with fusion scores.

    Formula: final_score = cross_encoder_score × weight + fusion_score × (1 - weight)

    • weight = 1.0 → 100% cross-encoder, 0% fusion (default)

    • weight = 0.8 → 80% cross-encoder, 20% fusion

    • weight = 0.5 → 50% cross-encoder, 50% fusion

    • weight = 0.0 → 0% cross-encoder, 100% fusion

    Defaults to 1.0 (pure cross-encoder score).

  • model_kwargs (Optional[Mapping[str, Any]], optional) – Additional keyword arguments passed to AutoModelForSequenceClassification and AutoTokenizer. Useful options include torch_dtype (model dtype: torch.float16, torch.bfloat16, "auto" for auto-detection), trust_remote_code (trust remote code from the HuggingFace Hub), token (HuggingFace API token for private models), revision (model revision to load), cache_dir (custom cache directory), local_files_only (load only local files), attn_implementation (attention implementation, e.g. "flash_attention_2", "sdpa"), load_in_8bit / load_in_4bit (8-bit or 4-bit quantization, requires bitsandbytes), and device_map (device mapping for distributed loading, e.g. "auto", "balanced"). Defaults to None (no additional kwargs).

Example

>>> from zvec_db.rerankers.cross_encoder import ClassificationReranker
>>>
>>> # Binary classification (num_classes inferred from model)
>>> reranker = ClassificationReranker(
...     query="machine learning",
...     model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
...     topn=10,
... )
>>>
>>> # Multi-level relevance with explicit num_classes
>>> reranker = ClassificationReranker(
...     query="machine learning",
...     model_name="your-multi-class-classifier",
...     num_classes=5,
...     topn=10,
... )
>>>
>>> reranker.fit([])  # Load model
>>> results = reranker.rerank({"bm25": docs})
>>>
>>> # With model_kwargs for private models or custom options
>>> reranker = ClassificationReranker(
...     query="machine learning",
...     model_name="org/private-model",
...     model_kwargs={"token": "hf_...", "trust_remote_code": True},
... )
>>> reranker.fit([])
>>> results = reranker.rerank({"bm25": docs})
>>>
>>> # With model_kwargs for dtype (float16 for reduced memory)
>>> import torch
>>> reranker = ClassificationReranker(
...     query="machine learning",
...     model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
...     model_kwargs={"torch_dtype": torch.float16},
... )
>>> reranker.fit([])
>>> results = reranker.rerank({"bm25": docs})
>>>
>>> # With model_kwargs for 8-bit quantization (requires bitsandbytes)
>>> reranker = ClassificationReranker(
...     query="machine learning",
...     model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
...     model_kwargs={"load_in_8bit": True},
... )
>>> reranker.fit([])
>>> results = reranker.rerank({"bm25": docs})

Note

  • Requires the transformers and torch packages

  • Model must be trained/fine-tuned for multi-class text classification

  • num_classes is inferred from model.config.num_labels if not provided

  • GPU acceleration available if CUDA is installed

  • Scores are normalized to [0, 1] via expected value

See also

OpenAIDecoderReranker: API-based classification with LLM logprobs.

__init__(query, topn=10, model_name='cross-encoder/ms-marco-MiniLM-L-6-v2', device=None, max_length=512, num_classes=None, rerank_field=None, batch_size=32, show_progress_bar=False, fusion_score_weight=1.0, model_kwargs=None)[source]
Parameters:
  • query (str)

  • topn (int)

  • model_name (str)

  • device (str | None)

  • max_length (int | None)

  • num_classes (int | None)

  • rerank_field (str | None)

  • batch_size (int)

  • show_progress_bar (bool)

  • fusion_score_weight (float)

  • model_kwargs (Mapping[str, Any] | None)

fit(documents)[source]

Initialize the reranker by loading the model.

Parameters:

documents (list[str]) – List of documents (not used, for API compatibility).

Returns:

The reranker instance, for method chaining.

Return type:

self

property batch_size
property device
property max_length
property model_kwargs
property model_name
property num_classes
property show_progress_bar
class zvec_db.rerankers.cross_encoder.OpenAIReranker(query, topn=10, base_url='http://localhost:8000/v1', api_key=None, model='BAAI/bge-reranker-v2-m3', endpoint='rerank', timeout=30.0, rerank_field=None, fusion_score_weight=1.0, truncate_prompt_tokens=None, max_retries=3, initial_delay=1.0, max_delay=60.0, exponential_base=2.0, jitter=0.1, retry_config=None)[source]

Cross-encoder reranker using OpenAI-compatible /rerank or /score endpoints.

Uses vLLM’s native endpoints: /rerank for query-document scoring, /score for text pair similarity. Both return scores in [0, 1].

Parameters:
  • query (str) – Query for reranking. Required.

  • topn (int) – Number of top documents to return. Defaults to 10.

  • base_url (str) – API base URL. Defaults to “http://localhost:8000/v1”.

  • api_key (Optional[str]) – API key. Defaults to None.

  • model (str) – Model identifier. Defaults to “BAAI/bge-reranker-v2-m3”.

  • endpoint (Literal["rerank", "score"]) – Endpoint to use. Defaults to “rerank”.

  • timeout (float) – HTTP timeout in seconds. Defaults to 30.0.

  • rerank_field (Optional[str]) – Document field for scoring. Defaults to None.

  • fusion_score_weight (float) – Weight for cross-encoder vs fusion scores. 1.0 = pure cross-encoder, 0.0 = pure fusion. Defaults to 1.0.

  • truncate_prompt_tokens (Optional[int]) – Max tokens for truncation.

  • max_retries (int, optional) – Maximum number of retry attempts for transient failures. Set to 0 to disable retries. Defaults to 3.

  • initial_delay (float, optional) – Initial delay before first retry in seconds. Defaults to 1.0.

  • max_delay (float, optional) – Maximum delay cap in seconds. Defaults to 60.0.

  • exponential_base (float, optional) – Base for exponential backoff. Defaults to 2.0.

  • jitter (float, optional) – Random jitter factor (0.0-1.0) to avoid thundering herd. Defaults to 0.1.

  • retry_config (Optional[RetryConfig], optional) – Pre-configured retry settings. If provided, overrides individual retry parameters. Defaults to None.

Example

>>> from zvec_db.rerankers.cross_encoder import OpenAIReranker
>>> reranker = OpenAIReranker(
...     query="machine learning",
...     endpoint="rerank",
...     base_url="http://localhost:8000",
... )
>>> results = reranker.rerank({"bm25": docs})
>>> # With custom retry settings for production
>>> reranker = OpenAIReranker(
...     query="machine learning",
...     max_retries=5,
...     initial_delay=2.0,
...     max_delay=120.0,
... )

Note: Requires vLLM with /rerank or /score endpoint enabled.
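
For orientation, the request and response shapes involved look roughly like this (a sketch of the Cohere/Jina-style rerank API that vLLM exposes; the field names are assumptions about that API, not taken from this module's internals):

```python
# Hypothetical /rerank request body: model, query, and candidate documents.
payload = {
    "model": "BAAI/bge-reranker-v2-m3",
    "query": "machine learning",
    "documents": ["intro to deep learning", "pasta recipes"],
}

# A typical response scores each document by its input index:
response = {"results": [{"index": 0, "relevance_score": 0.92},
                        {"index": 1, "relevance_score": 0.03}]}
ranked = sorted(response["results"], key=lambda r: r["relevance_score"], reverse=True)
print([r["index"] for r in ranked])  # best-matching document first
```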

__init__(query, topn=10, base_url='http://localhost:8000/v1', api_key=None, model='BAAI/bge-reranker-v2-m3', endpoint='rerank', timeout=30.0, rerank_field=None, fusion_score_weight=1.0, truncate_prompt_tokens=None, max_retries=3, initial_delay=1.0, max_delay=60.0, exponential_base=2.0, jitter=0.1, retry_config=None)[source]
Parameters:
  • query (str)

  • topn (int)

  • base_url (str)

  • api_key (str | None)

  • model (str)

  • endpoint (Literal['rerank', 'score'])

  • timeout (float)

  • rerank_field (str | None)

  • fusion_score_weight (float)

  • truncate_prompt_tokens (int | None)

  • max_retries (int)

  • initial_delay (float)

  • max_delay (float)

  • exponential_base (float)

  • jitter (float)

  • retry_config (RetryConfig | None)

property api_key
property base_url
property endpoint
property model
property timeout
property truncate_prompt_tokens
class zvec_db.rerankers.cross_encoder.OpenAIEncoderReranker(query, topn=10, base_url='http://localhost:8000/v1', api_key=None, model='BAAI/bge-reranker-v2-m3', num_classes=None, timeout=30.0, rerank_field=None, fusion_score_weight=1.0, separator=' ', truncate_prompt_tokens=None)[source]

Cross-encoder reranker using the /classify endpoint for encoder models.

Uses vLLM’s /classify endpoint for encoder models (BERT, RoBERTa). Computes expected value score from class probabilities: E[score] = sum(prob_i * i) / (num_classes - 1)

Parameters:
  • query (str) – Query for reranking. Required.

  • topn (int) – Number of top documents to return. Defaults to 10.

  • base_url (str) – API base URL. Defaults to “http://localhost:8000/v1”.

  • api_key (Optional[str]) – API key. Defaults to None.

  • model (str) – Model identifier. Defaults to “BAAI/bge-reranker-v2-m3”.

  • num_classes (Optional[int]) – Number of classes. Auto-detected if None.

  • timeout (float) – HTTP timeout in seconds. Defaults to 30.0.

  • rerank_field (Optional[str]) – Document field for scoring.

  • fusion_score_weight (float) – Cross-encoder vs fusion weight. Default 1.0.

  • separator (str) – Query-document separator. Defaults to " " (a single space).

  • truncate_prompt_tokens (Optional[int]) – Max tokens for truncation.

Example

>>> from zvec_db.rerankers.cross_encoder import OpenAIEncoderReranker
>>> reranker = OpenAIEncoderReranker(
...     query="machine learning",
...     num_classes=2,
...     base_url="http://localhost:8000",
... )
>>> results = reranker.rerank({"bm25": docs})

Note: Requires vLLM with /classify endpoint enabled.

__init__(query, topn=10, base_url='http://localhost:8000/v1', api_key=None, model='BAAI/bge-reranker-v2-m3', num_classes=None, timeout=30.0, rerank_field=None, fusion_score_weight=1.0, separator=' ', truncate_prompt_tokens=None)[source]
Parameters:
  • query (str)

  • topn (int)

  • base_url (str)

  • api_key (str | None)

  • model (str)

  • num_classes (int | None)

  • timeout (float)

  • rerank_field (str | None)

  • fusion_score_weight (float)

  • separator (str)

  • truncate_prompt_tokens (int | None)

property api_key
property base_url
property model
property num_classes
property separator
property timeout
property truncate_prompt_tokens
class zvec_db.rerankers.cross_encoder.OpenAIDecoderReranker(query, topn=10, base_url='http://localhost:8000/v1', api_key=None, model='gpt-4o-mini', num_classes=2, timeout=30.0, max_batch_size=None, rerank_field=None, fusion_score_weight=1.0, concurrency=4)[source]

Cross-encoder reranker using LLM logprobs with structured output.

Uses /chat/completions with logprobs and regex-constrained output. Computes expected value score from log probabilities: E[score] = sum(prob_i * i) / (num_classes - 1)
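
The conversion from returned logprobs to a score can be sketched as follows (illustrative only; the token names and logprob values are made up, and real API responses need parsing first):

```python
import math

def score_from_logprobs(token_logprobs: dict[str, float], num_classes: int) -> float:
    """Expected-value score from logprobs of the digit tokens "0".."num_classes-1"."""
    probs = {tok: math.exp(lp) for tok, lp in token_logprobs.items()}
    total = sum(probs.values())  # renormalize over the digit tokens observed
    expected = sum(int(tok) * p / total for tok, p in probs.items())
    return expected / (num_classes - 1)

# Binary case: the model puts most of its probability mass on the "1" token:
print(round(score_from_logprobs({"0": -2.0, "1": 0.0}, num_classes=2), 2))
```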

Parameters:
  • query (str) – Query for reranking. Required.

  • topn (int) – Number of top documents to return. Defaults to 10.

  • base_url (str) – API base URL. Defaults to “http://localhost:8000/v1”.

  • api_key (Optional[str]) – API key. Defaults to None.

  • model (str) – Model identifier. Defaults to “gpt-4o-mini”.

  • num_classes (int) – Number of classes. Defaults to 2.

  • timeout (float) – HTTP timeout in seconds. Defaults to 30.0.

  • max_batch_size (Optional[int]) – Max documents per batch. Default None.

  • rerank_field (Optional[str]) – Document field for scoring.

  • fusion_score_weight (float) – Cross-encoder vs fusion weight. Default 1.0.

  • concurrency (int) – Concurrent API calls. Defaults to 4.

Example

>>> from zvec_db.rerankers.cross_encoder import OpenAIDecoderReranker
>>> reranker = OpenAIDecoderReranker(
...     query="machine learning",
...     num_classes=2,
...     model="gpt-4o-mini",
... )
>>> results = reranker.rerank({"bm25": docs})

Note: Requires a model with logprobs support (--enable-logprobs for vLLM).

MAX_CLASSES = 10
__init__(query, topn=10, base_url='http://localhost:8000/v1', api_key=None, model='gpt-4o-mini', num_classes=2, timeout=30.0, max_batch_size=None, rerank_field=None, fusion_score_weight=1.0, concurrency=4)[source]
Parameters:
  • query (str)

  • topn (int)

  • base_url (str)

  • api_key (str | None)

  • model (str)

  • num_classes (int)

  • timeout (float)

  • max_batch_size (int | None)

  • rerank_field (str | None)

  • fusion_score_weight (float)

  • concurrency (int)

property api_key
property base_url
property concurrency
property max_batch_size
property model
property num_classes
property timeout

Modules

base

Base class for cross-encoder reranking.

classification

Multi-class classification reranking using HuggingFace transformers.

openai

OpenAI-compatible API reranker using /rerank and /score endpoints.

openai_decoder

OpenAI-compatible API reranker using /chat/completions with logprobs.

openai_encoder

OpenAI-compatible API reranker using /classify endpoint for encoder models.

sentence_transformer

Sentence Transformer binary cross-encoder reranker.