RAG & Retrieval · Beginner

Also known as: vector embedding, text embedding

Embedding

A dense numerical vector representation of text, images, or other data that captures semantic meaning in a high-dimensional space.

What Is an Embedding?

An embedding is a dense vector — an ordered list of floating-point numbers — that represents a piece of content (text, image, audio) in a high-dimensional geometric space. Points that are close together in that space have similar meaning; points far apart are semantically different.

For example, the sentences "How do I cancel my plan?" and "I want to unsubscribe" will produce vectors that are very close to each other, even though they share no words.

How Embeddings Are Generated

An embedding model (typically a transformer) maps input text to a fixed-size vector:

import openai

response = openai.embeddings.create(
    model="text-embedding-3-small",
    input="How do I cancel my subscription?"
)

vector = response.data[0].embedding  # list of 1536 floats

Common embedding models:

Model                  | Dimensions | Provider
text-embedding-3-small | 1536       | OpenAI
text-embedding-3-large | 3072       | OpenAI
embed-english-v3.0     | 1024       | Cohere
all-MiniLM-L6-v2       | 384        | Sentence Transformers

Why Dimensionality Matters

Higher dimensions generally capture more nuance but cost more to store and query. A 1536-dimension embedding for 1 million chunks requires roughly 6 GB of float32 storage. Many systems use quantization (int8 or binary) to reduce this by 4–32x with minimal accuracy loss.
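The storage figure above can be verified with simple arithmetic. This sketch counts raw vector bytes only (index overhead excluded) and assumes float32 vectors:

```python
# Back-of-envelope storage math for 1M chunks of 1536-dim vectors.
DIMS = 1536
CHUNKS = 1_000_000
BYTES_PER_FLOAT32 = 4

float32_bytes = DIMS * CHUNKS * BYTES_PER_FLOAT32  # full-precision storage
int8_bytes = DIMS * CHUNKS * 1                     # int8 quantization: 4x smaller
binary_bytes = DIMS * CHUNKS // 8                  # 1 bit per dimension: 32x smaller

print(f"float32: {float32_bytes / 1e9:.2f} GB")  # 6.14 GB
print(f"int8:    {int8_bytes / 1e9:.2f} GB")     # 1.54 GB
print(f"binary:  {binary_bytes / 1e9:.2f} GB")   # 0.19 GB
```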

Embedding Properties

  • Directionality — the angle between vectors encodes semantic similarity (see cosine similarity)
  • Compositionality — related concepts cluster together; analogies can sometimes be solved by vector arithmetic
  • Model-specificity — vectors from different models are not comparable; always use the same model for indexing and querying
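The directionality property is what cosine similarity measures. A minimal sketch, using made-up 3-dimensional toy vectors (real embeddings have hundreds to thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings (values are illustrative).
cancel = [0.9, 0.1, 0.2]  # "How do I cancel my plan?"
unsub  = [0.8, 0.2, 0.3]  # "I want to unsubscribe"
recipe = [0.1, 0.9, 0.1]  # "Best pasta recipe"

print(cosine_similarity(cancel, unsub))   # high: similar meaning
print(cosine_similarity(cancel, recipe))  # low: unrelated
```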

Bi-Encoders vs Cross-Encoders

  • Bi-encoder — encodes query and document independently into vectors, then compares. Fast but less accurate. Used for initial retrieval.
  • Cross-encoder — processes query and document together, producing a relevance score. Slower but more accurate. Used for re-ranking.

Most RAG pipelines use a bi-encoder for retrieval and a cross-encoder for re-ranking.
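The retrieve-then-rerank pattern can be sketched in a few lines. The `embed()` and `cross_score()` functions below are toy stand-ins for real models, chosen only to make the two-stage structure runnable:

```python
def embed(text):
    """Stand-in bi-encoder: a crude bag-of-characters vector."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_score(query, doc):
    """Stand-in cross-encoder: looks at query and doc *together* (word overlap)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

docs = ["cancel your subscription", "pasta recipes", "unsubscribe from the plan"]
index = [(d, embed(d)) for d in docs]  # documents embedded once, ahead of time

query = "how to cancel subscription"
qv = embed(query)

# Stage 1: cheap bi-encoder retrieval over the whole index.
candidates = sorted(index, key=lambda item: dot(qv, item[1]), reverse=True)[:2]

# Stage 2: expensive cross-encoder re-ranking over the shortlist only.
reranked = sorted(candidates, key=lambda item: cross_score(query, item[0]), reverse=True)
print(reranked[0][0])  # -> cancel your subscription
```

In production the same shape holds: the bi-encoder scores millions of precomputed vectors cheaply, while the cross-encoder runs a full forward pass per (query, document) pair and is reserved for the top few candidates.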

Embeddings in KnowledgeSDK

When you call POST /v1/extract on a URL, KnowledgeSDK automatically:

  1. Scrapes and cleans the page content
  2. Splits it into chunks
  3. Embeds each chunk using a high-quality embedding model
  4. Stores the vectors in your dedicated Typesense collection

When you call POST /v1/search, your query is embedded with the same model and compared against stored chunk vectors. You get back the most semantically relevant passages without writing a single line of embedding code.
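A request to these endpoints might be shaped roughly as follows. The header and payload field names here are illustrative assumptions, not the confirmed schema; consult the KnowledgeSDK API reference for the exact request format:

```python
import json

API_KEY = "sk-..."  # hypothetical placeholder for your API key

# Assumed payload shapes for the two endpoints described above.
extract_payload = {"url": "https://example.com/docs/billing"}  # field name assumed
search_payload = {"query": "how do I cancel my subscription"}  # field name assumed
headers = {
    "Authorization": f"Bearer {API_KEY}",  # auth scheme assumed
    "Content-Type": "application/json",
}

# The calls would then look roughly like (using the requests library):
#   requests.post(".../v1/extract", headers=headers, json=extract_payload)
#   requests.post(".../v1/search",  headers=headers, json=search_payload)
print(json.dumps(search_payload))
```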

Practical Tips

  • Always use the same model for indexing and querying — mixing models produces garbage results
  • Embed at the right granularity — too short (single sentence) loses context; too long (full page) dilutes specificity
  • Re-embed after model upgrades — new model versions produce incompatible vector spaces
  • Cache embeddings — embedding is the most expensive step; cache results for repeated content
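The caching and re-embedding tips combine naturally: keying the cache on the (model, text) pair means a model upgrade invalidates old entries automatically. A minimal sketch, with a fake embedding function standing in for the real API call:

```python
import hashlib

_cache = {}

def cached_embed(text, model, embed_fn):
    """Embed text once per (model, text) pair; reuse the cached vector after."""
    key = hashlib.sha256(f"{model}:{text}".encode()).hexdigest()
    if key not in _cache:
        _cache[key] = embed_fn(text)
    return _cache[key]

calls = []
def fake_embed(text):
    calls.append(text)           # stand-in for an expensive embedding API call
    return [float(len(text))]    # toy one-dimensional "vector"

cached_embed("hello", "model-v1", fake_embed)
cached_embed("hello", "model-v1", fake_embed)  # served from cache, no new call
cached_embed("hello", "model-v2", fake_embed)  # new model key -> re-embedded
print(len(calls))  # 2
```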

Related Terms

Vector Database
A specialized database that stores high-dimensional embedding vectors and enables fast similarity search.
Semantic Search
A search approach that finds results based on meaning and intent rather than exact keyword matching.
Cosine Similarity
A metric that measures the angle between two vectors, commonly used to compare how semantically similar two embeddings are.
