What Is Approximate Nearest Neighbor Search?
Approximate Nearest Neighbor (ANN) search finds the vectors in a dataset that are closest to a query vector — without exhaustively comparing against every stored vector. The trade-off: a small, controllable probability of missing the exact nearest neighbor in exchange for search times that are orders of magnitude faster.
In a vector database with 10 million 1536-dimensional vectors, exact nearest-neighbor (brute-force) search computes 10 million full distance calculations per query — on the order of tens of billions of floating-point operations. ANN algorithms cut this to a few thousand distance computations while typically retaining >95% recall.
Exact vs Approximate Nearest Neighbor
| | Exact (Flat) | Approximate (ANN) |
|---|---|---|
| Recall | 100% | 95–99% (tunable) |
| Query time | O(n × d) | O(log n) or O(polylog n) |
| At 1M vectors, 1536d | ~seconds | ~milliseconds |
| Use case | Tiny datasets | Production scale |
The accuracy loss is almost always acceptable for RAG — missing 1 in 50 nearest neighbors rarely affects answer quality.
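The exact column of the table is easy to picture in code. A minimal sketch of a flat (brute-force) scan on toy data — every stored vector gets one dot product, which is exactly the O(n × d) cost ANN indexes avoid:

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.standard_normal((10_000, 64)).astype(np.float32)   # toy corpus
query = db[123] + 0.01 * rng.standard_normal(64).astype(np.float32)

# Exact (flat) search: one dot product per stored vector -> O(n * d).
# Vectors are L2-normalized so inner product ranks the same as cosine.
db_n = db / np.linalg.norm(db, axis=1, keepdims=True)
q_n = query / np.linalg.norm(query)
scores = db_n @ q_n                    # n similarity scores
top10 = np.argsort(-scores)[:10]      # indices of the 10 nearest vectors
```

At 10k × 64 dims this is instant; at 10M × 1536 dims the same line of code is the "seconds" row of the table above.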
ANN Algorithm Families
Graph-Based: HNSW
Builds a multi-layer navigable graph (Hierarchical Navigable Small World). A query enters at the sparse top layer, greedily follows edges toward the query vector, and descends layer by layer until it reaches the dense bottom layer. HNSW is the most widely used ANN algorithm in production.
- Pros: Highest recall/speed trade-off, supports incremental inserts
- Cons: High memory usage (~100 bytes per node overhead)
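The core move in HNSW is greedy graph descent: from the current node, hop to whichever neighbor is closer to the query, and stop at a local minimum. A single-layer sketch of that idea (real HNSW builds the graph incrementally across multiple layers and keeps a candidate beam of size ef; this toy keeps only one candidate):

```python
import numpy as np

rng = np.random.default_rng(1)
points = rng.standard_normal((500, 8)).astype(np.float32)

# Crude navigable graph: connect each node to its M nearest neighbors.
# M is the same degree parameter that hnswlib exposes.
M = 8
d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
neighbors = np.argsort(d2, axis=1)[:, 1:M + 1]   # column 0 is the node itself

def greedy_search(query, entry=0):
    """Greedy descent: hop to whichever neighbor is closer to the query."""
    current = entry
    current_dist = ((points[current] - query) ** 2).sum()
    while True:
        cand = neighbors[current]
        cand_dist = ((points[cand] - query) ** 2).sum(axis=1)
        best = cand_dist.argmin()
        if cand_dist[best] >= current_dist:
            return current            # local minimum: no neighbor is closer
        current, current_dist = cand[best], cand_dist[best]

approx = greedy_search(points[42] + 0.001)
```

Each query touches only a handful of nodes instead of all 500 — that locality is where the speedup comes from, and the hierarchy of layers is what lets the real algorithm avoid bad local minima.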
Inverted File Index (IVF)
Clusters vectors into Voronoi cells at build time using k-means. At query time, searches only the nearest N clusters.
- Pros: Lower memory than HNSW
- Cons: Requires offline training, slower on inserts, lower recall at same speed
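The IVF mechanics fit in a few lines. A NumPy sketch (a real deployment would use FAISS's IndexIVFFlat rather than hand-rolled k-means; the nlist/nprobe names mirror FAISS's parameters):

```python
import numpy as np

rng = np.random.default_rng(2)
db = rng.standard_normal((5_000, 32)).astype(np.float32)

# Offline "training": k-means partitions the space into nlist Voronoi cells.
nlist, nprobe = 50, 5
centroids = db[rng.choice(len(db), nlist, replace=False)].copy()
for _ in range(10):                                  # a few Lloyd iterations
    assign = ((db[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)
    for c in range(nlist):
        if (assign == c).any():
            centroids[c] = db[assign == c].mean(0)
assign = ((db[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)

# Inverted lists: cell id -> indices of the vectors stored in that cell.
lists = {c: np.where(assign == c)[0] for c in range(nlist)}

def ivf_search(query, k=10):
    """Scan only the nprobe cells whose centroids are closest to the query."""
    cells = ((centroids - query) ** 2).sum(1).argsort()[:nprobe]
    cand = np.concatenate([lists[c] for c in cells])
    dist = ((db[cand] - query) ** 2).sum(1)
    return cand[dist.argsort()[:k]]
```

With nprobe=5 of 50 cells, each query scans roughly a tenth of the data; raising nprobe trades speed back for recall, which is IVF's main tuning knob.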
Product Quantization (PQ)
Compresses vectors by splitting them into sub-vectors and quantizing each. Often combined with IVF (IVF-PQ).
- Pros: Massive memory reduction (4–64×)
- Cons: Lower recall due to compression loss
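Where the 4–64× memory reduction comes from is easiest to see by encoding a few vectors. A toy sketch (real PQ, as in FAISS's IndexIVFPQ, uses 256 centroids per subspace so each code is exactly one byte; 16 keeps this example fast):

```python
import numpy as np

rng = np.random.default_rng(3)
db = rng.standard_normal((2_000, 32)).astype(np.float32)

# Split each 32-d vector into m sub-vectors; quantize each sub-vector to
# one of ksub codebook entries learned by k-means in that subspace.
m, ksub = 4, 16
dsub = db.shape[1] // m                       # 8 dims per sub-vector
codebooks = np.empty((m, ksub, dsub), np.float32)
codes = np.empty((len(db), m), np.uint8)
for j in range(m):
    sub = db[:, j * dsub:(j + 1) * dsub]
    cents = sub[rng.choice(len(sub), ksub, replace=False)].copy()
    for _ in range(10):                       # tiny k-means per subspace
        assign = ((sub[:, None] - cents[None]) ** 2).sum(-1).argmin(1)
        for c in range(ksub):
            if (assign == c).any():
                cents[c] = sub[assign == c].mean(0)
    codebooks[j] = cents
    codes[:, j] = ((sub[:, None] - cents[None]) ** 2).sum(-1).argmin(1)

# Each vector shrinks from 32 floats (128 bytes) to m = 4 one-byte codes,
# a 32x reduction. Decoding looks codes back up in the codebooks:
recon = np.concatenate([codebooks[j, codes[:, j]] for j in range(m)], axis=1)
```

The gap between `recon` and `db` is the compression loss behind PQ's lower recall; IVF-PQ accepts it in exchange for fitting billions of vectors in RAM.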
LSH (Locality Sensitive Hashing)
Maps similar vectors to the same hash bucket with high probability.
- Pros: Theoretically elegant, simple implementation
- Cons: Lower accuracy than HNSW in practice, largely superseded
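The classic construction for cosine similarity is random-hyperplane LSH: each hyperplane contributes one sign bit, and vectors separated by a small angle agree on most bits. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
dim, nbits = 64, 16

# nbits random hyperplanes; the hash of v is the sign pattern of its
# projections, so nearby vectors collide with high probability.
planes = rng.standard_normal((dim, nbits)).astype(np.float32)

def lsh_key(v):
    return tuple((v @ planes > 0).astype(int))

db = rng.standard_normal((1_000, dim)).astype(np.float32)
buckets = {}
for i, v in enumerate(db):
    buckets.setdefault(lsh_key(v), []).append(i)

# A near-duplicate of db[5] hashes into the same bucket.
probe = lsh_key(db[5] + 1e-4 * rng.standard_normal(dim).astype(np.float32))
```

Query time is then just a bucket lookup plus a scan of its contents; practical LSH uses many such tables to boost recall, which is part of why HNSW overtook it.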
Key Libraries
| Library | Algorithms | Notes |
|---|---|---|
| FAISS (Meta) | HNSW, IVF, IVF-PQ | GPU support, highly optimized |
| hnswlib | HNSW | Lightweight, C++/Python |
| ScaNN (Google) | Asymmetric hashing | High throughput |
| Annoy (Spotify) | Tree-based | Read-heavy workloads |
Most vector databases either embed one of these libraries or ship their own HNSW implementation: Typesense, for example, builds its vector index on hnswlib, while Weaviate and Qdrant implement HNSW natively.
Tuning Recall vs Speed
ANN algorithms expose parameters to trade recall for speed:
```python
# hnswlib example
import hnswlib

# embeddings: your (n, 1536) float32 array; ids: matching integer labels
index = hnswlib.Index(space="cosine", dim=1536)
index.init_index(max_elements=1_000_000, ef_construction=200, M=16)
index.add_items(embeddings, ids)

index.set_ef(50)  # higher ef = better recall, slower queries
labels, distances = index.knn_query(query_vector, k=10)
```
Benchmark your specific workload to find the ef value that satisfies your latency SLA while maintaining acceptable recall.
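Whichever library you use, the benchmark itself is simple: compute ground truth with an exact scan on a held-out sample of queries, then measure recall@k at each candidate setting. A minimal sketch (`recall_at_k` is a hypothetical helper, not part of any library):

```python
import numpy as np

def recall_at_k(ann_ids, exact_ids):
    """Fraction of the true k nearest neighbors the ANN search returned.

    ann_ids, exact_ids: (num_queries, k) arrays of result indices.
    """
    hits = sum(len(set(a) & set(e)) for a, e in zip(ann_ids, exact_ids))
    return hits / exact_ids.size

# Ground truth from an exact scan over a sample of queries.
rng = np.random.default_rng(5)
db = rng.standard_normal((1_000, 16)).astype(np.float32)
queries = rng.standard_normal((20, 16)).astype(np.float32)
exact = np.argsort(((queries[:, None] - db[None]) ** 2).sum(-1), axis=1)[:, :10]
```

To sweep ef, loop over candidate values, call `set_ef`, time the queries, and record `recall_at_k` alongside latency; the knee of that curve is usually the setting you want.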
ANN in KnowledgeSDK
Typesense (the search backend powering KnowledgeSDK) uses HNSW for its vector index. When you call POST /v1/extract, chunk embeddings are inserted into the HNSW graph incrementally. When you call POST /v1/search, the HNSW traversal finds the approximate nearest neighbors to your query embedding in milliseconds; traversal cost grows only roughly logarithmically with the number of knowledge items in your collection.
This means KnowledgeSDK's search latency stays low even as your knowledge base grows to hundreds of thousands of documents.