RAG & Retrieval · Beginner

Also known as: text chunking, document splitting

Chunking

The process of splitting long documents into smaller, overlapping or non-overlapping segments before embedding and indexing.

What Is Chunking?

Chunking is the process of dividing long documents into smaller segments before embedding and storing them in a vector database. It is one of the most impactful — and most underestimated — decisions in any RAG pipeline.

LLMs and embedding models both have token limits. A 50-page PDF cannot be embedded as a single unit, and even if it could, the resulting vector would be too diffuse to match specific queries accurately.

Why Chunk Size Matters

  • Too small (e.g., 50 tokens) — individual chunks lose context, and retrieved passages give the LLM too little to work with
  • Too large (e.g., 2,000 tokens) — vectors are diluted; a chunk spanning 10 different topics matches queries about any one of them only weakly
  • Sweet spot — typically 256–512 tokens for most use cases, with overlap to prevent boundary artifacts

Common Chunking Strategies

Fixed-Size Chunking

Split every N tokens regardless of sentence or paragraph boundaries. Simple and fast, but can cut mid-sentence.

def chunk_fixed(text, size=400, overlap=50):
    """Split text into fixed-size token chunks with overlap.

    Assumes a `tokenize` function that maps text to a list of tokens.
    """
    tokens = tokenize(text)
    chunks = []
    step = size - overlap  # each chunk starts `overlap` tokens before the previous one ends
    for i in range(0, len(tokens), step):
        chunks.append(tokens[i:i + size])
    return chunks

Sentence-Aware Chunking

Split on sentence boundaries, accumulating until a token budget is reached. Produces more coherent chunks.
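A minimal sketch, using whitespace word count as a rough stand-in for a real tokenizer and a naive regex for sentence boundaries (a production pipeline would use the embedding model's tokenizer and a proper sentence splitter):

```python
import re

def chunk_sentences(text, budget=400):
    """Accumulate whole sentences into chunks of at most `budget` tokens."""
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())  # rough token count
        if current and current_len + n > budget:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Note that a single sentence longer than the budget still becomes its own (oversized) chunk; real implementations fall back to a harder split in that case.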

Recursive Character Splitting

This strategy tries paragraph breaks (\n\n) first, then sentence boundaries (". "), then words, falling back to single characters. It is the default strategy in LangChain's RecursiveCharacterTextSplitter.
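A simplified, character-based sketch of the same idea (the real LangChain implementation also merges small pieces back together and preserves separators, which is omitted here):

```python
def chunk_recursive(text, size=400, separators=("\n\n", ". ", " ", "")):
    """Recursively split on the coarsest separator that yields small-enough pieces."""
    if len(text) <= size:
        return [text]
    sep, *rest = separators
    if sep == "":
        # Last resort: hard character split.
        return [text[i:i + size] for i in range(0, len(text), size)]
    chunks = []
    for part in text.split(sep):
        if len(part) <= size:
            chunks.append(part)
        else:
            # Piece still too big: retry with the next, finer separator.
            chunks.extend(chunk_recursive(part, size, tuple(rest)))
    return chunks
```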

Semantic Chunking

Embeds each sentence and splits when cosine similarity between adjacent sentences drops below a threshold. More expensive but topic-coherent.
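The idea can be sketched with a toy bag-of-words "embedding" standing in for a real embedding model; both the embedding and the threshold below are purely illustrative:

```python
import math
import re
from collections import Counter

def embed(sentence):
    """Toy bag-of-words vector; a real pipeline would call an embedding model."""
    return Counter(re.findall(r"\w+", sentence.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def chunk_semantic(sentences, threshold=0.2):
    """Start a new chunk wherever adjacent-sentence similarity drops below threshold."""
    chunks, current = [], [sentences[0]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(cur)
    chunks.append(" ".join(current))
    return chunks
```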

Structured Chunking

For documents with known structure (Markdown, HTML), split on headings (##, ###) or HTML section tags. Preserves natural logical units.
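For Markdown, a regex split on heading lines is often enough; a minimal sketch:

```python
import re

def chunk_markdown(text):
    """Split a Markdown document into sections at ## / ### headings.

    Each chunk keeps its heading line so the section stays self-describing.
    """
    # Split immediately before lines starting with "## " or "### " (multiline ^ anchor).
    parts = re.split(r"(?m)^(?=#{2,3} )", text)
    return [p.strip() for p in parts if p.strip()]
```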

Overlap

Most strategies include an overlap — the last N tokens of chunk i are repeated at the start of chunk i+1. This prevents important context from falling into the gap between chunks.

Chunk 1: [tokens 1–400]
Chunk 2: [tokens 350–750]   ← 50-token overlap
Chunk 3: [tokens 700–1100]

What to Attach as Metadata

Every chunk should store:

  • Source URL or document ID
  • Page number or section heading
  • Creation timestamp
  • Any category or tag from the source

This metadata enables filtered retrieval and source attribution in LLM responses.
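In code, a chunk record might look like the following; the field names here are illustrative, not a required schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

@dataclass
class Chunk:
    text: str
    source_id: str                 # source URL or document ID
    section: Optional[str] = None  # page number or section heading
    created_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    tags: List[str] = field(default_factory=list)  # categories or tags from the source
```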

Chunking with KnowledgeSDK

When you call POST /v1/extract, KnowledgeSDK automatically handles chunking optimized for web content — splitting on semantic boundaries (headings, paragraphs) while respecting token budgets. Each chunk is stored as a knowledge_item with source metadata attached.

curl -X POST https://api.knowledgesdk.com/v1/extract \
  -H "x-api-key: knowledgesdk_live_..." \
  -d '{"url": "https://docs.example.com/guide"}'

The chunks are immediately searchable via POST /v1/search without any additional configuration.

Chunking Best Practices

  • Start with 512 tokens, 50-token overlap, and iterate based on retrieval quality
  • Use sentence-aware or heading-aware splitting for documentation
  • Always store the parent document reference alongside each chunk
  • Evaluate chunking quality by checking whether retrieved chunks are self-contained and relevant

Related Terms

RAG & Retrieval · Beginner
Retrieval-Augmented Generation
A technique that grounds LLM responses by retrieving relevant documents from an external knowledge base before generation.

RAG & Retrieval · Beginner
Sliding Window Chunking
A chunking strategy where consecutive chunks overlap by a fixed number of tokens to preserve context at chunk boundaries.

RAG & Retrieval · Intermediate
Parent-Child Chunking
A hierarchical chunking strategy that indexes small child chunks for retrieval but returns their larger parent context to the LLM.

RAG & Retrieval · Beginner
Indexing
The process of transforming raw content into a searchable structure — embeddings, inverted indexes, or graph nodes — that enables fast retrieval.
