Session 11· 05· 20 min
Hybrid Retrieval with RRF
What you'll learn
- ▸Explain why merging raw BM25 and cosine scores fails
- ▸Implement Reciprocal Rank Fusion (RRF) to combine ranked lists
- ▸Build a Retriever class that combines vector and BM25 results
The score scale problem
BM25 scores are unbounded positive numbers (e.g., 3.7, 12.4). Cosine scores are bounded between -1 and 1 (e.g., 0.82, 0.61). You cannot add or average these directly — the BM25 scores would dominate. Reciprocal Rank Fusion solves this by converting scores to ranks first.
Never add raw scores across retrieval systems
Adding a BM25 score of 8.3 to a cosine score of 0.75 produces a meaningless number. The scales are incompatible. Always fuse via ranks, not raw scores.
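To make the problem concrete, here is a minimal sketch using made-up scores for three hypothetical documents (the document IDs and numbers are illustrative and do not come from a real index). Summing the raw scores reproduces the BM25 ordering exactly, so the vector signal is effectively ignored.

```python
# Illustrative, made-up scores for three hypothetical documents.
bm25_scores = {"doc_a": 12.4, "doc_b": 8.3, "doc_c": 3.7}
cosine_scores = {"doc_a": 0.41, "doc_b": 0.61, "doc_c": 0.82}


def order_by(scores: dict[str, float]) -> list[str]:
    """Return document IDs sorted from highest to lowest score."""
    return sorted(scores, key=lambda doc: scores[doc], reverse=True)


# Naive fusion: add the raw scores together.
raw_sum = {doc: bm25_scores[doc] + cosine_scores[doc] for doc in bm25_scores}

bm25_order = order_by(bm25_scores)
cosine_order = order_by(cosine_scores)
raw_sum_order = order_by(raw_sum)

print("BM25 order:   ", bm25_order)
print("Cosine order: ", cosine_order)
print("Raw-sum order:", raw_sum_order)
```

In this toy data the two systems disagree completely, yet the raw sum simply follows BM25 because its numbers are an order of magnitude larger.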
Reciprocal Rank Fusion formula
hybrid.py
# RRF score for a document at rank r (1-indexed):
# rrf(r) = 1 / (k + r) where k = 60
#
# Total RRF score = sum of rrf scores across all lists
# A document in both lists gets contributions from both
Retriever class with RRF
hybrid.py
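from collections import defaultdict

# VectorIndex and BM25Index are not defined in this file; they are assumed
# to be importable from elsewhere in the project. Their search() methods
# are expected to return (score, chunk) tuples, best match first.

class Retriever:
    def __init__(self, chunks: list[str]):
        # Index the same chunks in both systems.
        self.vector_index = VectorIndex()
        self.vector_index.add(chunks)
        self.bm25_index = BM25Index()
        self.bm25_index.add(chunks)

    def search(self, query: str, top_k: int = 5, rrf_k: int = 60) -> list[str]:
        # Over-fetch from each system so fusion has enough candidates.
        vec_results = self.vector_index.search(query, top_k=top_k * 2)
        bm25_results = self.bm25_index.search(query, top_k=top_k * 2)

        # Sum 1 / (rrf_k + rank) over every list a chunk appears in.
        rrf_scores: dict[str, float] = defaultdict(float)
        for rank, (_, chunk) in enumerate(vec_results, start=1):
            rrf_scores[chunk] += 1.0 / (rrf_k + rank)
        for rank, (_, chunk) in enumerate(bm25_results, start=1):
            rrf_scores[chunk] += 1.0 / (rrf_k + rank)

        # Highest fused score first; return only the chunk text.
        ranked = sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)
        return [chunk for chunk, _ in ranked[:top_k]]
①Retrieve top_k*2 from each system to ensure enough candidates for RRF fusion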
②defaultdict(float) accumulates RRF contributions; a chunk that appears in both lists receives a contribution from each
③rank is 1-indexed: the top result contributes 1/61, rank 2 contributes 1/62, etc.
④rrf_k=60 is the widely used default, taken from the original RRF paper; only change it with measured evidence
⑤Final output is chunks only — scores are an internal detail of the fusion step
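The fusion step does not depend on the Retriever class. As a rough sketch, it can be factored into a standalone function that accepts any number of ranked lists. The function name `rrf_fuse` and the sample document IDs below are hypothetical, introduced here only for illustration; they are not part of hybrid.py.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of item IDs with Reciprocal Rank Fusion.

    Each input list is ordered best-first. Returns (item, score) pairs
    sorted by descending fused score.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranked in ranked_lists:
        for rank, item in enumerate(ranked, start=1):
            scores[item] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical results from two retrieval systems, best first.
vector_hits = ["doc_c", "doc_b", "doc_a"]
bm25_hits = ["doc_b", "doc_d", "doc_c"]

fused = rrf_fuse([vector_hits, bm25_hits])
for doc, score in fused:
    print(f"{doc}: {score:.5f}")
```

Here doc_b comes first because it ranks highly in both lists, while doc_a and doc_d, which each appear in only one list, fall to the bottom.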
Hybrid retrieval usually matches or beats either approach alone
In benchmarks, hybrid retrieval with RRF matches or beats pure semantic search and pure BM25 across almost all query types, which is why it is a common production default for RAG systems.
Knowledge Check
A chunk appears at rank 1 in the BM25 list and rank 4 in the vector list. With k=60, what is its RRF score?
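One way to check your answer is to apply the formula directly: the chunk collects 1/(60+1) from the BM25 list and 1/(60+4) from the vector list. A short calculation using the values from the question (the variable names are illustrative):

```python
k = 60
bm25_rank = 1
vector_rank = 4

# Sum the contribution from each list the chunk appears in.
rrf_score = 1 / (k + bm25_rank) + 1 / (k + vector_rank)
print(f"{rrf_score:.4f}")  # prints 0.0320
```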
Recap — what you just learned
- ✓BM25 and cosine scores are on incompatible scales — never add them directly
- ✓RRF converts each result list to ranks and sums 1/(k+rank) contributions
- ✓k=60 is the standard default; it dampens the winner-takes-all effect of rank 1
- ✓The Retriever class runs both indexes and fuses results with RRF in ~15 lines
- ✓Hybrid retrieval beats either approach alone on mixed query workloads
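As a quick illustration of the point about k above, the following sketch compares how much more a rank-1 hit contributes than a rank-10 hit for a few values of k. The values 0 and 10 are chosen only for contrast and are not recommendations; `rrf_contribution` is a hypothetical helper name.

```python
def rrf_contribution(rank: int, k: int) -> float:
    """Contribution of a single list position to the RRF score."""
    return 1.0 / (k + rank)


ratios: dict[int, float] = {}
for k in (0, 10, 60):
    top = rrf_contribution(1, k)
    tenth = rrf_contribution(10, k)
    ratios[k] = top / tenth
    print(f"k={k:>2}: rank 1 counts {ratios[k]:.2f}x as much as rank 10")
```

With k=0 the top hit is worth ten times a rank-10 hit. With k=60 the gap shrinks to about 1.15x, so agreement between lists matters more than a single first place.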