zvec_db.rerankers.cross_encoder.openai

OpenAI-compatible API reranker using /rerank and /score endpoints.

Classes

OpenAIReranker(query[, topn, base_url, ...])

Cross-encoder reranker using OpenAI-compatible /rerank or /score endpoints.

class zvec_db.rerankers.cross_encoder.openai.OpenAIReranker(query, topn=10, base_url='http://localhost:8000/v1', api_key=None, model='BAAI/bge-reranker-v2-m3', endpoint='rerank', timeout=30.0, rerank_field=None, fusion_score_weight=1.0, truncate_prompt_tokens=None, max_retries=3, initial_delay=1.0, max_delay=60.0, exponential_base=2.0, jitter=0.1, retry_config=None)[source]

Cross-encoder reranker using OpenAI-compatible /rerank or /score endpoints.

Uses vLLM’s native endpoints: /rerank for query-document scoring, /score for text pair similarity. Both return scores in [0, 1].
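To make the difference between the two endpoints concrete, here is a minimal sketch of the JSON bodies they typically accept. The field names (`query`, `documents`, `top_n`, `text_1`, `text_2`) follow vLLM's request format as commonly documented; the helper `build_payload` is hypothetical and is not part of zvec_db, and how OpenAIReranker actually builds its requests is an assumption here.

```python
from typing import Any, Dict, List, Optional


def build_payload(
    endpoint: str,
    model: str,
    query: str,
    documents: List[str],
    top_n: Optional[int] = None,
    truncate_prompt_tokens: Optional[int] = None,
) -> Dict[str, Any]:
    """Build an illustrative request body for a vLLM-style endpoint.

    Hypothetical helper: not the library's own implementation.
    """
    if endpoint == "rerank":
        # /rerank scores one query against a list of documents.
        payload: Dict[str, Any] = {
            "model": model,
            "query": query,
            "documents": documents,
        }
        if top_n is not None:
            payload["top_n"] = top_n
    elif endpoint == "score":
        # /score compares text pairs: here the query against each document.
        payload = {"model": model, "text_1": query, "text_2": documents}
    else:
        raise ValueError(f"unsupported endpoint: {endpoint!r}")

    # Optional truncation limit, passed through only when set.
    if truncate_prompt_tokens is not None:
        payload["truncate_prompt_tokens"] = truncate_prompt_tokens
    return payload
```

Calling `build_payload("rerank", "BAAI/bge-reranker-v2-m3", "machine learning", ["doc a", "doc b"], top_n=1)` yields a body with `model`, `query`, `documents`, and `top_n` keys, which would then be sent as JSON to the chosen endpoint under `base_url`.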

Parameters:
  • query (str) – Query for reranking. Required.

  • topn (int) – Number of top documents to return. Defaults to 10.

  • base_url (str) – API base URL. Defaults to “http://localhost:8000/v1”.

  • api_key (Optional[str]) – API key. Defaults to None.

  • model (str) – Model identifier. Defaults to “BAAI/bge-reranker-v2-m3”.

  • endpoint (Literal["rerank", "score"]) – Endpoint to use. Defaults to “rerank”.

  • timeout (float) – HTTP timeout in seconds. Defaults to 30.0.

  • rerank_field (Optional[str]) – Name of the document field whose content is scored. Defaults to None.

  • fusion_score_weight (float) – Weight for cross-encoder vs fusion scores. 1.0 = pure cross-encoder, 0.0 = pure fusion. Defaults to 1.0.

  • truncate_prompt_tokens (Optional[int]) – Maximum number of tokens to keep when truncating the input. Defaults to None.

  • max_retries (int, optional) – Maximum number of retry attempts for transient failures. Set to 0 to disable retries. Defaults to 3.

  • initial_delay (float, optional) – Initial delay before first retry in seconds. Defaults to 1.0.

  • max_delay (float, optional) – Maximum delay cap in seconds. Defaults to 60.0.

  • exponential_base (float, optional) – Base for exponential backoff. Defaults to 2.0.

  • jitter (float, optional) – Random jitter factor (0.0-1.0) to avoid thundering herd. Defaults to 0.1.

  • retry_config (Optional[RetryConfig], optional) – Pre-configured retry settings. If provided, overrides individual retry parameters. Defaults to None.
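The description of fusion_score_weight suggests an interpolation between the two scores. The sketch below assumes a plain linear blend; the function name `blend_scores` is hypothetical, and the library may normalize or combine scores differently.

```python
def blend_scores(
    cross_encoder_score: float,
    fusion_score: float,
    fusion_score_weight: float = 1.0,
) -> float:
    """Linearly interpolate between cross-encoder and fusion scores.

    Assumption: a weight of 1.0 keeps only the cross-encoder score and
    0.0 keeps only the fusion score, as the parameter description states.
    """
    if not 0.0 <= fusion_score_weight <= 1.0:
        raise ValueError("fusion_score_weight must be in [0.0, 1.0]")
    w = fusion_score_weight
    return w * cross_encoder_score + (1.0 - w) * fusion_score
```

Under this assumption, a document with a cross-encoder score of 0.8 and a fusion score of 0.2 keeps 0.8 at the default weight of 1.0 and falls back to 0.2 at a weight of 0.0.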

Example

>>> from zvec_db.rerankers.cross_encoder import OpenAIReranker
>>> reranker = OpenAIReranker(
...     query="machine learning",
...     endpoint="rerank",
...     base_url="http://localhost:8000",
... )
>>> # docs: documents returned by an earlier BM25 search (not defined in this snippet)
>>> results = reranker.rerank({"bm25": docs})
>>> # With custom retry settings for production
>>> reranker = OpenAIReranker(
...     query="machine learning",
...     max_retries=5,
...     initial_delay=2.0,
...     max_delay=120.0,
... )
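To see what retry settings like those above imply for wait times, here is a minimal sketch of exponential backoff with jitter. It assumes the common formula `initial_delay * exponential_base ** attempt`, capped at `max_delay`, with jitter applied as a symmetric fraction of the delay. The function `backoff_delays` is hypothetical; the exact schedule used by OpenAIReranker is not stated in this reference.

```python
import random
from typing import List, Optional


def backoff_delays(
    max_retries: int = 3,
    initial_delay: float = 1.0,
    max_delay: float = 60.0,
    exponential_base: float = 2.0,
    jitter: float = 0.1,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the wait (in seconds) before each retry attempt.

    Illustrative only: assumes a capped exponential schedule with
    symmetric proportional jitter.
    """
    rng = rng or random.Random()
    delays: List[float] = []
    for attempt in range(max_retries):
        # Grow the delay exponentially, then apply the cap.
        base = min(initial_delay * exponential_base ** attempt, max_delay)
        # Spread the delay by up to +/- `jitter` of its value so that
        # many clients do not retry at the same instant.
        delay = base * (1.0 + rng.uniform(-jitter, jitter))
        delays.append(min(delay, max_delay))
    return delays
```

With jitter set to 0.0, the settings `max_retries=5`, `initial_delay=2.0`, `max_delay=120.0` produce waits of 2, 4, 8, 16, and 32 seconds under this assumed formula; setting `max_retries=0` produces no waits at all.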

Note: Requires a vLLM server with the /rerank or /score endpoint enabled.

__init__(query, topn=10, base_url='http://localhost:8000/v1', api_key=None, model='BAAI/bge-reranker-v2-m3', endpoint='rerank', timeout=30.0, rerank_field=None, fusion_score_weight=1.0, truncate_prompt_tokens=None, max_retries=3, initial_delay=1.0, max_delay=60.0, exponential_base=2.0, jitter=0.1, retry_config=None)[source]
Parameters:
  • query (str)

  • topn (int)

  • base_url (str)

  • api_key (str | None)

  • model (str)

  • endpoint (Literal['rerank', 'score'])

  • timeout (float)

  • rerank_field (str | None)

  • fusion_score_weight (float)

  • truncate_prompt_tokens (int | None)

  • max_retries (int)

  • initial_delay (float)

  • max_delay (float)

  • exponential_base (float)

  • jitter (float)

  • retry_config (RetryConfig | None)

property api_key
property base_url
property endpoint
property model
property timeout
property truncate_prompt_tokens