zvec_db.rerankers.cross_encoder.sentence_transformer
Sentence Transformer binary cross-encoder reranker.
Classes

SentenceTransformerReranker
    Cross-encoder reranker using Sentence Transformers models locally.
- class zvec_db.rerankers.cross_encoder.sentence_transformer.SentenceTransformerReranker(query, topn=10, model_name='cross-encoder/ms-marco-MiniLM-L-6-v2', device=None, max_length=512, rerank_field=None, batch_size=32, show_progress_bar=False, fusion_score_weight=1.0, model_kwargs=None)[source]
Cross-encoder reranker using Sentence Transformers models locally.
This reranker uses the CrossEncoder class from sentence-transformers to compute relevance scores between query and document pairs. Unlike API-based cross-encoders, this runs entirely locally on CPU or GPU.
SentenceTransformer CrossEncoder models output a single score via sigmoid for binary relevance (relevant/not relevant).
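The sigmoid mapping described above can be sketched in plain Python; the logit values here are illustrative stand-ins, not outputs of a real model:

```python
import math

def sigmoid(logit: float) -> float:
    """Map a raw cross-encoder logit to a relevance score in [0, 1]."""
    return 1.0 / (1.0 + math.exp(-logit))

# Illustrative logits: a relevant query-document pair tends to produce a
# large positive logit (score near 1), an irrelevant pair a large negative
# logit (score near 0).
relevant_score = sigmoid(4.0)     # close to 1
irrelevant_score = sigmoid(-4.0)  # close to 0
```

This is why scores from these models are directly interpretable as binary relevance probabilities, unlike raw logits from regression-style rerankers.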
- Parameters:
query (str) – Query for reranking. Required.
topn (int, optional) – Number of top documents to return. Defaults to 10.
model_name (str, optional) – CrossEncoder model name from HuggingFace. Examples:
- "cross-encoder/ms-marco-MiniLM-L-6-v2" (fast, good quality)
- "cross-encoder/ms-marco-TinyBERT-L-2-v2" (very fast)
- "cross-encoder/stsb-distilroberta-base" (semantic similarity)
Defaults to "cross-encoder/ms-marco-MiniLM-L-6-v2".
device (Optional[str], optional) – Device to run model on. “cpu”, “cuda”, or None for auto-detect. Defaults to None.
max_length (Optional[int], optional) – Maximum sequence length. Defaults to 512.
rerank_field (Optional[str], optional) – Document field to use for scoring. If None, uses the entire document content. Defaults to None.
batch_size (int, optional) – Batch size for inference. Defaults to 32.
show_progress_bar (bool, optional) – Show progress bar during inference. Defaults to False.
fusion_score_weight (float, optional) – Weight for blending cross-encoder scores with fusion scores.
Formula: final_score = cross_encoder_score × weight + fusion_score × (1 - weight)
- weight = 1.0 → 100% cross-encoder, 0% fusion (default)
- weight = 0.8 → 80% cross-encoder, 20% fusion
- weight = 0.5 → 50% cross-encoder, 50% fusion
- weight = 0.0 → 0% cross-encoder, 100% fusion
Defaults to 1.0 (pure cross-encoder score).
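The blending formula above is a simple linear interpolation; a minimal sketch (with made-up scores) looks like:

```python
def blend(cross_encoder_score: float, fusion_score: float,
          weight: float = 1.0) -> float:
    """Linear blend: final = ce_score * weight + fusion_score * (1 - weight)."""
    return cross_encoder_score * weight + fusion_score * (1.0 - weight)

# weight=1.0 keeps only the cross-encoder score;
# weight=0.8 mixes in 20% of the fusion score.
pure = blend(0.9, 0.5, weight=1.0)     # 0.9
blended = blend(0.9, 0.5, weight=0.8)  # 0.82
```

Lower weights are useful when the upstream fusion ranking (e.g. BM25 + vector RRF) already carries signal you do not want the cross-encoder to fully override.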
model_kwargs (Optional[Mapping[str, Any]], optional) – Additional keyword arguments passed to the CrossEncoder constructor. Useful options include:
- torch_dtype: model dtype (torch.float16, torch.bfloat16, "auto")
- trust_remote_code: trust remote code from the HuggingFace Hub
- token: HuggingFace API token for private models
- revision: model revision to load
- cache_dir: custom cache directory
- local_files_only: load only local files
- attn_implementation: attention implementation (e.g., "flash_attention_2")
Defaults to None (no additional kwargs).
Example
>>> from zvec_db.rerankers.cross_encoder import SentenceTransformerReranker
>>>
>>> # Binary relevance reranker
>>> reranker = SentenceTransformerReranker(
...     query="machine learning",
...     model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
...     topn=10,
... )
>>>
>>> results = reranker.rerank({"bm25": bm25_docs})
>>>
>>> # Blended scores: 80% cross-encoder + 20% fusion
>>> reranker = SentenceTransformerReranker(
...     query="machine learning",
...     model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
...     topn=10,
...     fusion_score_weight=0.8,
... )
>>> results = reranker.rerank({"bm25": docs})
>>>
>>> # With model_kwargs for private models
>>> reranker = SentenceTransformerReranker(
...     query="machine learning",
...     model_name="org/private-model",
...     model_kwargs={"token": "hf_..."},
... )
>>> results = reranker.rerank({"bm25": docs})
>>>
>>> # With model_kwargs for dtype (float16 for reduced memory)
>>> import torch
>>> reranker = SentenceTransformerReranker(
...     query="machine learning",
...     model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
...     model_kwargs={"torch_dtype": torch.float16},
... )
>>> results = reranker.rerank({"bm25": docs})
Note
Requires the sentence-transformers package
Models are downloaded automatically on first use
GPU acceleration available if CUDA is installed
Models output scores in [0, 1] via sigmoid
See also
OpenAIReranker: API-based reranker using an LLM as the cross-encoder.
- __init__(query, topn=10, model_name='cross-encoder/ms-marco-MiniLM-L-6-v2', device=None, max_length=512, rerank_field=None, batch_size=32, show_progress_bar=False, fusion_score_weight=1.0, model_kwargs=None)[source]
- fit(documents)[source]
Initialize the reranker by loading the model.
For the Sentence Transformers CrossEncoder, this loads the pre-trained model; no training is performed.
- property batch_size
- property device
- property max_length
- property model_kwargs
- property model_name
- property show_progress_bar