What Is an Embedding?
An embedding is a dense vector — an ordered list of floating-point numbers — that represents a piece of content (text, image, audio) in a high-dimensional geometric space. Points that are close together in that space have similar meaning; points far apart are semantically different.
For example, the sentences "How do I cancel my plan?" and "I want to unsubscribe" will produce vectors that are very close to each other, even though they share no words.
How Embeddings Are Generated
An embedding model (typically a transformer) maps input text to a fixed-size vector:
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="How do I cancel my subscription?",
)
vector = response.data[0].embedding  # list of 1536 floats
```
Common embedding models:
| Model | Dimensions | Provider |
|---|---|---|
| text-embedding-3-small | 1536 | OpenAI |
| text-embedding-3-large | 3072 | OpenAI |
| embed-english-v3.0 | 1024 | Cohere |
| all-MiniLM-L6-v2 | 384 | Sentence Transformers |
Why Dimensionality Matters
Higher dimensions generally capture more nuance but cost more to store and query. Storing 1536-dimensional float32 embeddings for 1 million chunks takes roughly 6 GB. Many systems use quantization (int8 or binary) to cut this by 4–32x with minimal accuracy loss.
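As a sanity check on the storage figure above, a short sketch. The int8 step is a simplified symmetric quantization scheme, not what any particular vector database ships:

```python
import numpy as np

n_chunks = 1_000_000
dims = 1536

# float32 storage: 4 bytes per dimension
float32_bytes = n_chunks * dims * 4
print(f"{float32_bytes / 1e9:.1f} GB")  # 6.1 GB

# Naive symmetric int8 quantization of one vector: scale so the
# largest component maps to 127, then round. Production systems
# usually calibrate scales more carefully.
vec = np.random.default_rng(0).normal(size=dims).astype(np.float32)
scale = np.abs(vec).max() / 127
quantized = np.round(vec / scale).astype(np.int8)  # 4x smaller than float32
restored = quantized.astype(np.float32) * scale    # approximate original
```

Binary quantization (one bit per dimension) pushes the saving to 32x, at a larger accuracy cost that re-ranking can partially recover.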
Embedding Properties
- Directionality — the angle between vectors encodes semantic similarity (see cosine similarity)
- Compositionality — related concepts cluster together; analogies can sometimes be solved by vector arithmetic
- Model-specificity — vectors from different models are not comparable; always use the same model for indexing and querying
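The directionality property is what cosine similarity measures. A minimal illustration with hand-picked toy vectors (real embeddings have hundreds or thousands of dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-d "embeddings" standing in for real model output.
cancel = np.array([0.9, 0.1, 0.2])
unsubscribe = np.array([0.8, 0.2, 0.25])
weather = np.array([-0.1, 0.9, -0.4])

print(cosine_similarity(cancel, unsubscribe))  # close to 1.0
print(cosine_similarity(cancel, weather))      # near 0 or negative
```

Because the angle, not the magnitude, carries the signal, many systems normalize vectors to unit length so cosine similarity reduces to a plain dot product.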
Bi-Encoders vs Cross-Encoders
- Bi-encoder — encodes query and document independently into vectors, then compares. Fast but less accurate. Used for initial retrieval.
- Cross-encoder — processes query and document together, producing a relevance score. Slower but more accurate. Used for re-ranking.
Most RAG pipelines use a bi-encoder for retrieval and a cross-encoder for re-ranking.
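The two-stage pattern can be sketched end to end. `bi_encode` and `cross_score` below are toy stand-ins (word hashing and word overlap) for real bi-encoder and cross-encoder models:

```python
import numpy as np

def _bucket(word: str) -> int:
    """Deterministic word-to-bucket mapping for the toy encoder."""
    return sum(ord(c) for c in word) % 64

def bi_encode(text: str) -> np.ndarray:
    """Toy bi-encoder: hash words into a 64-d bag-of-words vector."""
    vec = np.zeros(64)
    for word in text.lower().split():
        vec[_bucket(word)] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def cross_score(query: str, doc: str) -> float:
    """Toy cross-encoder: Jaccard word overlap between query and doc."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0

docs = [
    "How to cancel a subscription",
    "Resetting your password",
    "Billing and refunds",
]
query = "cancel my subscription"

# Stage 1: fast bi-encoder retrieval -- embed query and documents
# independently, compare with dot products, keep the top-k candidates.
doc_vecs = [bi_encode(d) for d in docs]
q_vec = bi_encode(query)
candidates = sorted(range(len(docs)), key=lambda i: -float(doc_vecs[i] @ q_vec))[:2]

# Stage 2: slower cross-encoder re-ranking over just the candidates.
reranked = sorted(candidates, key=lambda i: -cross_score(query, docs[i]))
print(docs[reranked[0]])
```

The key design point is that stage 1 lets document vectors be precomputed and indexed, while stage 2 only pays the expensive joint scoring for a handful of candidates.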
Embeddings in KnowledgeSDK
When you call POST /v1/extract on a URL, KnowledgeSDK automatically:
- Scrapes and cleans the page content
- Splits it into chunks
- Embeds each chunk using a high-quality embedding model
- Stores the vectors in your dedicated Typesense collection
When you call POST /v1/search, your query is embedded with the same model and compared against stored chunk vectors. You get back the most semantically relevant passages without writing a single line of embedding code.
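As a sketch of calling these endpoints: the paths `/v1/extract` and `/v1/search` come from this page, while the base URL, auth header, and payload field names are illustrative assumptions to check against the API reference:

```python
import json

# Hypothetical base URL -- replace with the real KnowledgeSDK endpoint.
BASE_URL = "https://api.example.com"

def extract_request(url: str) -> dict:
    """Build a request for POST /v1/extract (field names are illustrative)."""
    return {"method": "POST", "url": f"{BASE_URL}/v1/extract", "json": {"url": url}}

def search_request(query: str, limit: int = 5) -> dict:
    """Build a request for POST /v1/search (field names are illustrative)."""
    return {
        "method": "POST",
        "url": f"{BASE_URL}/v1/search",
        "json": {"query": query, "limit": limit},
    }

req = search_request("How do I cancel my subscription?")
print(json.dumps(req, indent=2))
# Send with e.g. requests.request(**req, headers={"Authorization": "Bearer <key>"})
```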
Practical Tips
- Always use the same model for indexing and querying — mixing models produces garbage results
- Embed at the right granularity — too short (single sentence) loses context; too long (full page) dilutes specificity
- Re-embed after model upgrades — new model versions produce incompatible vector spaces
- Cache embeddings — embedding is the most expensive step; cache results for repeated content