What Is Approximate Nearest Neighbor Search?
Approximate Nearest Neighbor (ANN) search finds the vectors in a dataset that are closest to a query vector — without exhaustively comparing against every stored vector. The trade-off: a small, controllable probability of missing the exact nearest neighbor in exchange for search times that are orders of magnitude faster.
In a vector database with 10 million 1536-dimensional vectors, exact nearest-neighbor (brute-force) search computes 10 million full distance calculations per query — on the order of tens of billions of floating-point operations. ANN algorithms cut this to a few thousand distance computations while typically retaining >95% recall.
Exact vs Approximate Nearest Neighbor
| | Exact (Flat) | Approximate (ANN) |
|---|---|---|
| Recall | 100% | 95–99% (tunable) |
| Query time | O(n × d) | O(log n) or O(polylog n) |
| At 1M vectors, 1536d | ~seconds | ~milliseconds |
| Use case | Tiny datasets | Production scale |
The accuracy loss is almost always acceptable for RAG — missing 1 in 50 nearest neighbors rarely affects answer quality.
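The exact column of the table is easy to picture in code. A minimal sketch of a flat (brute-force) scan on toy data — every stored vector gets one dot product, which is exactly the O(n × d) cost ANN indexes avoid:

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.standard_normal((10_000, 64)).astype(np.float32)   # toy corpus
query = db[123] + 0.01 * rng.standard_normal(64).astype(np.float32)

# Exact (flat) search: one dot product per stored vector -> O(n * d).
# Vectors are L2-normalized so inner product ranks the same as cosine.
db_n = db / np.linalg.norm(db, axis=1, keepdims=True)
q_n = query / np.linalg.norm(query)
scores = db_n @ q_n                    # n similarity scores
top10 = np.argsort(-scores)[:10]      # indices of the 10 nearest vectors
```

At 10k × 64 dims this is instant; at 10M × 1536 dims the same line of code is the "seconds" row of the table above.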
ANN Algorithm Families
Graph-Based: HNSW
Builds a multi-layer navigable graph (Hierarchical Navigable Small World). A query enters at the sparse top layer, greedily follows edges toward the query vector, and descends layer by layer until it reaches the dense bottom layer. HNSW is the most widely used ANN algorithm in production.
- Pros: Highest recall/speed trade-off, supports incremental inserts
- Cons: High memory usage (~100 bytes per node overhead)
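The core move in HNSW is greedy graph descent: from the current node, hop to whichever neighbor is closer to the query, and stop at a local minimum. A single-layer sketch of that idea (real HNSW builds the graph incrementally across multiple layers and keeps a candidate beam of size ef; this toy keeps only one candidate):

```python
import numpy as np

rng = np.random.default_rng(1)
points = rng.standard_normal((500, 8)).astype(np.float32)

# Crude navigable graph: connect each node to its M nearest neighbors.
# M is the same degree parameter that hnswlib exposes.
M = 8
d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
neighbors = np.argsort(d2, axis=1)[:, 1:M + 1]   # column 0 is the node itself

def greedy_search(query, entry=0):
    """Greedy descent: hop to whichever neighbor is closer to the query."""
    current = entry
    current_dist = ((points[current] - query) ** 2).sum()
    while True:
        cand = neighbors[current]
        cand_dist = ((points[cand] - query) ** 2).sum(axis=1)
        best = cand_dist.argmin()
        if cand_dist[best] >= current_dist:
            return current            # local minimum: no neighbor is closer
        current, current_dist = cand[best], cand_dist[best]

approx = greedy_search(points[42] + 0.001)
```

Each query touches only a handful of nodes instead of all 500 — that locality is where the speedup comes from, and the hierarchy of layers is what lets the real algorithm avoid bad local minima.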
Inverted File Index (IVF)
Clusters vectors into Voronoi cells at build time using k-means. At query time, searches only the nearest N clusters.
- Pros: Lower memory than HNSW
- Cons: Requires offline training, slower on inserts, lower recall at same speed
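The IVF mechanics fit in a few lines. A NumPy sketch (a real deployment would use FAISS's IndexIVFFlat rather than hand-rolled k-means; the nlist/nprobe names mirror FAISS's parameters):

```python
import numpy as np

rng = np.random.default_rng(2)
db = rng.standard_normal((5_000, 32)).astype(np.float32)

# Offline "training": k-means partitions the space into nlist Voronoi cells.
nlist, nprobe = 50, 5
centroids = db[rng.choice(len(db), nlist, replace=False)].copy()
for _ in range(10):                                  # a few Lloyd iterations
    assign = ((db[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)
    for c in range(nlist):
        if (assign == c).any():
            centroids[c] = db[assign == c].mean(0)
assign = ((db[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)

# Inverted lists: cell id -> indices of the vectors stored in that cell.
lists = {c: np.where(assign == c)[0] for c in range(nlist)}

def ivf_search(query, k=10):
    """Scan only the nprobe cells whose centroids are closest to the query."""
    cells = ((centroids - query) ** 2).sum(1).argsort()[:nprobe]
    cand = np.concatenate([lists[c] for c in cells])
    dist = ((db[cand] - query) ** 2).sum(1)
    return cand[dist.argsort()[:k]]
```

With nprobe=5 of 50 cells, each query scans roughly a tenth of the data; raising nprobe trades speed back for recall, which is IVF's main tuning knob.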
Product Quantization (PQ)
Compresses vectors by splitting them into sub-vectors and quantizing each. Often combined with IVF (IVF-PQ).
- Pros: Massive memory reduction (4–64×)
- Cons: Lower recall due to compression loss
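Where the 4–64× memory reduction comes from is easiest to see by encoding a few vectors. A toy sketch (real PQ, as in FAISS's IndexIVFPQ, uses 256 centroids per subspace so each code is exactly one byte; 16 keeps this example fast):

```python
import numpy as np

rng = np.random.default_rng(3)
db = rng.standard_normal((2_000, 32)).astype(np.float32)

# Split each 32-d vector into m sub-vectors; quantize each sub-vector to
# one of ksub codebook entries learned by k-means in that subspace.
m, ksub = 4, 16
dsub = db.shape[1] // m                       # 8 dims per sub-vector
codebooks = np.empty((m, ksub, dsub), np.float32)
codes = np.empty((len(db), m), np.uint8)
for j in range(m):
    sub = db[:, j * dsub:(j + 1) * dsub]
    cents = sub[rng.choice(len(sub), ksub, replace=False)].copy()
    for _ in range(10):                       # tiny k-means per subspace
        assign = ((sub[:, None] - cents[None]) ** 2).sum(-1).argmin(1)
        for c in range(ksub):
            if (assign == c).any():
                cents[c] = sub[assign == c].mean(0)
    codebooks[j] = cents
    codes[:, j] = ((sub[:, None] - cents[None]) ** 2).sum(-1).argmin(1)

# Each vector shrinks from 32 floats (128 bytes) to m = 4 one-byte codes,
# a 32x reduction. Decoding looks codes back up in the codebooks:
recon = np.concatenate([codebooks[j, codes[:, j]] for j in range(m)], axis=1)
```

The gap between `recon` and `db` is the compression loss behind PQ's lower recall; IVF-PQ accepts it in exchange for fitting billions of vectors in RAM.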
LSH (Locality Sensitive Hashing)
Maps similar vectors to the same hash bucket with high probability.
- Pros: Theoretically elegant, simple implementation
- Cons: Lower accuracy than HNSW in practice, largely superseded
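The classic construction for cosine similarity is random-hyperplane LSH: each hyperplane contributes one sign bit, and vectors separated by a small angle agree on most bits. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
dim, nbits = 64, 16

# nbits random hyperplanes; the hash of v is the sign pattern of its
# projections, so nearby vectors collide with high probability.
planes = rng.standard_normal((dim, nbits)).astype(np.float32)

def lsh_key(v):
    return tuple((v @ planes > 0).astype(int))

db = rng.standard_normal((1_000, dim)).astype(np.float32)
buckets = {}
for i, v in enumerate(db):
    buckets.setdefault(lsh_key(v), []).append(i)

# A near-duplicate of db[5] hashes into the same bucket.
probe = lsh_key(db[5] + 1e-4 * rng.standard_normal(dim).astype(np.float32))
```

Query time is then just a bucket lookup plus a scan of its contents; practical LSH uses many such tables to boost recall, which is part of why HNSW overtook it.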
Key Libraries
| Library | Algorithms | Notes |
|---|---|---|
| FAISS (Meta) | HNSW, IVF, IVF-PQ | GPU support, highly optimized |
| hnswlib | HNSW | Lightweight, C++/Python |
| ScaNN (Google) | Asymmetric hashing | High throughput |
| Annoy (Spotify) | Tree-based | Read-heavy workloads |
Most vector databases either embed one of these libraries or ship their own HNSW implementation: Typesense, for example, builds its vector index on hnswlib, while Weaviate and Qdrant implement HNSW natively.
Tuning Recall vs Speed
ANN algorithms expose parameters to trade recall for speed:
```python
# hnswlib example
import hnswlib

# embeddings: your (n, 1536) float32 array; ids: matching integer labels
index = hnswlib.Index(space="cosine", dim=1536)
index.init_index(max_elements=1_000_000, ef_construction=200, M=16)
index.add_items(embeddings, ids)

index.set_ef(50)  # higher ef = better recall, slower queries
labels, distances = index.knn_query(query_vector, k=10)
```
Benchmark your specific workload to find the ef value that satisfies your latency SLA while maintaining acceptable recall.
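Whichever library you use, the benchmark itself is simple: compute ground truth with an exact scan on a held-out sample of queries, then measure recall@k at each candidate setting. A minimal sketch (`recall_at_k` is a hypothetical helper, not part of any library):

```python
import numpy as np

def recall_at_k(ann_ids, exact_ids):
    """Fraction of the true k nearest neighbors the ANN search returned.

    ann_ids, exact_ids: (num_queries, k) arrays of result indices.
    """
    hits = sum(len(set(a) & set(e)) for a, e in zip(ann_ids, exact_ids))
    return hits / exact_ids.size

# Ground truth from an exact scan over a sample of queries.
rng = np.random.default_rng(5)
db = rng.standard_normal((1_000, 16)).astype(np.float32)
queries = rng.standard_normal((20, 16)).astype(np.float32)
exact = np.argsort(((queries[:, None] - db[None]) ** 2).sum(-1), axis=1)[:, :10]
```

To sweep ef, loop over candidate values, call `set_ef`, time the queries, and record `recall_at_k` alongside latency; the knee of that curve is usually the setting you want.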
ANN in KnowledgeSDK
Typesense (the search backend powering KnowledgeSDK) uses HNSW for its vector index. When you call POST /v1/extract, chunk embeddings are inserted into the HNSW graph incrementally. When you call POST /v1/search, the HNSW traversal finds the approximate nearest neighbors to your query embedding in milliseconds; traversal cost grows only roughly logarithmically with the number of knowledge items in your collection.
This means KnowledgeSDK's search latency stays low even as your knowledge base grows to hundreds of thousands of documents.