Session 11 · 03 · 25 min

Semantic Search

What you'll learn
  • Build a vector index from scratch using chunking and embeddings
  • Perform semantic search over document chunks
  • Identify when semantic search fails and why

VectorIndex class

vector_index.py
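# Assumes generate_embedding() and cosine_similarity() are defined or imported elsewhere in this module.
class VectorIndex:
    """In-memory index that pairs each text chunk with its embedding."""

    def __init__(self):
        self.chunks: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, chunks: list[str]) -> None:
        for chunk in chunks:
            # Embed first so a failed embedding cannot leave the lists out of sync.
            vector = generate_embedding(chunk)
            self.chunks.append(chunk)
            self.vectors.append(vector)

    def search(self, query: str, top_k: int = 3) -> list[tuple[float, str]]:
        # Only the query is embedded here; chunk vectors were computed in add().
        q_vec = generate_embedding(query)
        scores = [
            (cosine_similarity(q_vec, v), c)
            for v, c in zip(self.vectors, self.chunks)
        ]
        # Highest similarity first.
        scores.sort(key=lambda x: x[0], reverse=True)
        return scores[:top_k]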
  • chunks and vectors are parallel lists: index i in both always refers to the same piece of text
  • add() embeds each chunk immediately, so search() only has to embed the query
  • search() computes cosine similarity against every stored vector, a linear scan that is typically fine for fewer than roughly 10k chunks
  • Results are sorted in descending order of similarity, so index 0 is always the best match
  • top_k=3 is a sensible default; a larger top_k returns more context but raises token cost
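The class above relies on two helpers, generate_embedding() and cosine_similarity(), that are not shown in this lesson. The sketch below is one possible way to fill them in so the index can be run locally. The cosine similarity is the standard formula. The embedding function is only a hypothetical toy stand-in (a hashed bag-of-words with an assumed dimension of 64): it captures word overlap, not meaning, and should be swapped for a real embedding model.

```python
import hashlib
import math

DIM = 64  # assumed toy dimensionality; real embedding models use far more


def generate_embedding(text: str) -> list[float]:
    """Toy stand-in (hypothetical): hashed bag-of-words, NOT a real embedding model."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        # md5 gives a stable bucket for each word across runs.
        digest = hashlib.md5(word.encode("utf-8")).digest()
        idx = int.from_bytes(digest[:4], "big") % DIM
        vec[idx] += 1.0
    return vec


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)
```

With these two functions in scope, VectorIndex().add([...]) followed by search("...") runs end to end. Because the toy embedding only counts shared words, the ranking it produces says nothing about how a real model would behave.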

When semantic search fails

vector_index.py
# This query will likely fail: "E_DEADLOCK_0x8F3" is an exact token, not a concept
results = index.search('What causes error E_DEADLOCK_0x8F3?')
# The query vector may have no meaningful relationship to the vectors of
# chunks that contain the literal string "E_DEADLOCK_0x8F3"
Semantic search blind spots
Error codes, function names, product identifiers, and rare proper nouns are poorly served by semantic search. The embedding model has typically seen too few examples of such tokens during training to place them in a meaningful region of vector space, so a chunk containing the exact string is not guaranteed to score highly. For such queries, keyword search (BM25) is usually much more effective; it is covered in lesson 04.
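One way to act on this blind spot before BM25 is introduced is to detect queries that contain identifier-like tokens and check for literal matches. The sketch below is a rough heuristic under stated assumptions, not part of the lesson's code: the patterns, find_exact_tokens(), and literal_matches() are all hypothetical names chosen for illustration, and the regular expressions will produce some false positives (for example, abbreviations such as "e.g").

```python
import re

# Hypothetical heuristic patterns for tokens that embeddings tend to handle badly.
EXACT_TOKEN_PATTERNS = [
    # Upper-case codes joined by underscores, e.g. E_DEADLOCK_0x8F3
    re.compile(r"\b[A-Z][A-Z0-9]*(?:_[A-Z0-9x]+)+\b"),
    # Dotted identifiers, e.g. numpy.einsum
    re.compile(r"\b[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+\b"),
]


def find_exact_tokens(query: str) -> list[str]:
    """Return identifier-like tokens found in the query, in order of discovery."""
    found: list[str] = []
    for pattern in EXACT_TOKEN_PATTERNS:
        for match in pattern.finditer(query):
            token = match.group(0)
            if token not in found:
                found.append(token)
    return found


def literal_matches(chunks: list[str], tokens: list[str]) -> list[str]:
    """Return the chunks that contain at least one token as a literal substring."""
    return [chunk for chunk in chunks if any(token in chunk for token in tokens)]


query = "What causes error E_DEADLOCK_0x8F3?"
tokens = find_exact_tokens(query)
if tokens:
    # A query like this is a candidate for keyword search rather than vectors alone.
    print("Exact tokens detected:", tokens)
```

A plain substring check is far cruder than BM25, but it makes the failure mode visible: if literal_matches() finds a chunk that the vector search did not return in its top results, the query is one that semantic search alone serves poorly.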
Knowledge Check
A user searches for "numpy.einsum signature". Semantic search returns chunks about "matrix multiplication" and "tensor operations" instead. Why?
Recap — what you just learned
  • VectorIndex stores parallel lists of chunks and their embeddings for O(n) similarity search
  • At query time, embed the query and rank all chunks by cosine similarity
  • Semantic search excels at concept matching but misses exact tokens like error codes and function names
  • For exact-token queries, add BM25 keyword search (next lesson)
Next up: Keyword Search with BM25