Session 11 · 05 · 20 min

Hybrid Retrieval with RRF

What you'll learn
  • Explain why merging raw BM25 and cosine scores fails
  • Implement Reciprocal Rank Fusion (RRF) to combine ranked lists
  • Build a Retriever class that combines vector and BM25 results

The score scale problem

BM25 scores are unbounded positive numbers (e.g., 3.7, 12.4). Cosine similarity scores are bounded between -1 and 1 (e.g., 0.82, 0.61). You cannot add or average these directly: the larger BM25 values would dominate the sum. Reciprocal Rank Fusion (RRF) solves this by discarding the raw scores and using only each document's rank in each list.

Never add raw scores across retrieval systems
Adding a BM25 score of 8.3 to a cosine score of 0.75 produces a meaningless number. The scales are incompatible. Always fuse via ranks, not raw scores.
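
To make the scale mismatch concrete, here is a minimal sketch using made-up scores. The document IDs and numbers are illustrative assumptions, not output from a real index. It shows that summing raw scores simply reproduces the BM25 ordering, even though the vector search preferred the opposite order.

```python
# Hypothetical scores for three documents; all values are invented for illustration.
bm25_scores = {"doc_a": 12.4, "doc_b": 8.3, "doc_c": 3.7}
cosine_scores = {"doc_a": 0.41, "doc_b": 0.61, "doc_c": 0.82}

# Naive fusion: add the raw scores together.
naive = {doc: bm25_scores[doc] + cosine_scores[doc] for doc in bm25_scores}

# Order documents from best to worst under each scoring scheme.
naive_order = sorted(naive, key=naive.get, reverse=True)
bm25_order = sorted(bm25_scores, key=bm25_scores.get, reverse=True)
cosine_order = sorted(cosine_scores, key=cosine_scores.get, reverse=True)

print(naive_order)   # same ordering as BM25 alone
print(bm25_order)
print(cosine_order)  # reversed: the vector ranking had no influence on the sum
```

The cosine scores differ by at most 0.41 here, while the BM25 scores differ by several points, so the vector signal cannot change the outcome.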

Reciprocal Rank Fusion formula

hybrid.py
# RRF score for a document at rank r (1-indexed):
# rrf(r) = 1 / (k + r)   where k = 60
#
# Total RRF score = sum of rrf scores across all lists
# A document in both lists gets contributions from both
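
The formula above can be sketched as two small helper functions. The names rrf_contribution and rrf_total are hypothetical and used only for this illustration; the ranks in the example are arbitrary.

```python
def rrf_contribution(rank: int, k: int = 60) -> float:
    """Contribution from one list in which the document sits at `rank` (1-indexed)."""
    if rank < 1:
        raise ValueError("rank is 1-indexed and must be >= 1")
    return 1.0 / (k + rank)


def rrf_total(ranks: list[int], k: int = 60) -> float:
    """Sum the contributions from every list the document appears in."""
    return sum(rrf_contribution(r, k) for r in ranks)


# A document ranked 3rd in one list and 7th in the other: 1/63 + 1/67
both = rrf_total([3, 7])
# A document ranked 1st in one list but absent from the other: 1/61
single = rrf_total([1])

print(f"{both:.4f} {single:.4f}")  # prints 0.0308 0.0164
```

Note that a document with two middling ranks outscores a document that tops one list but is missing from the other. This is the behavior that rewards agreement between the two systems.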

Retriever class with RRF

hybrid.py
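from collections import defaultdict

# VectorIndex and BM25Index are not defined in this section. The code assumes
# each exposes add(chunks) and search(query, top_k) returning (score, chunk) pairs.

class Retriever:
    def __init__(self, chunks: list[str]):
        # Index the same chunks in both retrieval systems.
        self.vector_index = VectorIndex()
        self.vector_index.add(chunks)
        self.bm25_index = BM25Index()
        self.bm25_index.add(chunks)

    def search(self, query: str, top_k: int = 5, rrf_k: int = 60) -> list[str]:
        # Over-fetch from each system so fusion has enough candidates.
        vec_results = self.vector_index.search(query, top_k=top_k * 2)
        bm25_results = self.bm25_index.search(query, top_k=top_k * 2)

        # Accumulate 1 / (rrf_k + rank) per chunk; raw scores are ignored.
        rrf_scores: dict[str, float] = defaultdict(float)
        for rank, (_, chunk) in enumerate(vec_results, start=1):
            rrf_scores[chunk] += 1.0 / (rrf_k + rank)
        for rank, (_, chunk) in enumerate(bm25_results, start=1):
            rrf_scores[chunk] += 1.0 / (rrf_k + rank)

        # Highest fused score first; return only the chunk text.
        ranked = sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)
        return [chunk for chunk, _ in ranked[:top_k]]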
  • Each system retrieves top_k*2 results so that fusion has more candidates than the final top_k
  • defaultdict(float) accumulates RRF contributions; a chunk that appears in both lists receives a contribution from each
  • rank is 1-indexed: the top result contributes 1/61, rank 2 contributes 1/62, and so on
  • rrf_k=60 is the value used in the original RRF paper and the most common default; only change it with measured evidence
  • The final output is chunks only; scores are an internal detail of the fusion step
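
To trace the fusion step without building real indexes, the sketch below applies the same loop to two hand-written result lists. The function name rrf_fuse, the chunk labels, and the scores are all assumptions made for this illustration; the lists use the (score, chunk) shape that search() unpacks.

```python
from collections import defaultdict


def rrf_fuse(result_lists, top_k=5, rrf_k=60):
    """Fuse any number of ranked (score, chunk) lists with Reciprocal Rank Fusion."""
    rrf_scores = defaultdict(float)
    for results in result_lists:
        # Only the position in the list matters; the raw score is discarded.
        for rank, (_, chunk) in enumerate(results, start=1):
            rrf_scores[chunk] += 1.0 / (rrf_k + rank)
    ranked = sorted(rrf_scores.items(), key=lambda item: item[1], reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


# Hand-written stand-ins for the two search() results; scores are invented.
vec_results = [(0.82, "chunk A"), (0.74, "chunk B"), (0.61, "chunk C")]
bm25_results = [(12.4, "chunk D"), (8.3, "chunk B"), (3.7, "chunk A")]

fused = rrf_fuse([vec_results, bm25_results], top_k=3)
print(fused)  # prints ['chunk A', 'chunk B', 'chunk D']
```

Chunks A and B appear in both lists, so they outrank chunk D even though D is the top BM25 result.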
Hybrid retrieval usually beats either approach alone
In benchmarks, hybrid retrieval with RRF matches or beats pure semantic search and pure BM25 across almost all query types. It is a common production default for RAG systems.
Knowledge Check
A chunk appears at rank 1 in the BM25 list and rank 4 in the vector list. With k=60, what is its RRF score?
Recap — what you just learned
  • BM25 and cosine scores are on incompatible scales — never add them directly
  • RRF converts each result list to ranks and sums 1/(k+rank) contributions
  • k=60 is the standard default; it dampens the winner-takes-all effect of rank 1
  • The Retriever class runs both indexes and fuses results with RRF in ~15 lines
  • Hybrid retrieval beats either approach alone on mixed query workloads
Next up: Complete RAG Pipeline