What Is Parent-Child Chunking?
Parent-child chunking is a hierarchical document splitting strategy that decouples the retrieval unit from the context unit. Small "child" chunks are embedded and indexed for precise retrieval, but when a child chunk is retrieved, its larger "parent" chunk (or full document section) is returned to the LLM instead.
This gives you the precision of small-chunk retrieval with the rich context of large-chunk injection.
The Core Problem It Solves
There is a fundamental tension in chunking:
- Small chunks (50–100 tokens) → precise embeddings, accurate retrieval, but insufficient context for the LLM to generate a good answer
- Large chunks (500–1000 tokens) → rich context for the LLM, but diluted embeddings that retrieve less precisely
Parent-child chunking resolves this tension by using small chunks for retrieval and large chunks for generation.
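As a rough sketch of the splitting step (plain Python; token counts are approximated by word counts, and all names are illustrative, not from any particular library):

```python
def make_parent_child_chunks(text, parent_size=500, child_size=100):
    """Split text into large parent chunks, then split each parent
    into small child chunks, keeping a child -> parent mapping."""
    words = text.split()
    parents, children, child_to_parent = [], [], {}
    for p_start in range(0, len(words), parent_size):
        parent_id = len(parents)
        parent_words = words[p_start:p_start + parent_size]
        parents.append(" ".join(parent_words))
        for c_start in range(0, len(parent_words), child_size):
            child_id = len(children)
            children.append(" ".join(parent_words[c_start:c_start + child_size]))
            child_to_parent[child_id] = parent_id
    return parents, children, child_to_parent
```

Only the `children` list gets embedded and indexed; the `child_to_parent` mapping is what lets retrieval jump back up to the larger chunk at generation time.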
How It Works
```
Document: Full page (2000 tokens)
│
├── Parent Chunk A (500 tokens): "Billing and Subscription"
│   ├── Child Chunk A1 (100 tokens): "Monthly billing cycle"
│   ├── Child Chunk A2 (100 tokens): "Annual subscription discount"
│   └── Child Chunk A3 (100 tokens): "Invoice delivery"
│
└── Parent Chunk B (500 tokens): "Cancellation Policy"
    ├── Child Chunk B1 (100 tokens): "How to cancel"
    └── Child Chunk B2 (100 tokens): "Refund window"
```
Index (for retrieval): child chunks A1, A2, A3, B1, B2
Stored (for context): parent chunks A, B
When a query matches child chunk B1, the system fetches parent chunk B and injects the full 500-token section into the LLM prompt.
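That lookup step can be sketched in a few lines. This is a minimal illustration that assumes the child chunks and a child→parent mapping have already been built; word-overlap scoring stands in for real embedding similarity:

```python
def retrieve_parent(query, children, child_to_parent, parents):
    """Score each child chunk against the query (word overlap here,
    as a stand-in for embedding similarity), then return the PARENT
    of the best-matching child rather than the child itself."""
    query_words = set(query.lower().split())
    scores = [len(query_words & set(c.lower().split())) for c in children]
    best_child = max(range(len(children)), key=lambda i: scores[i])
    return parents[child_to_parent[best_child]]
```

The key design choice is the last line: the index answers "which chunk matched?", but the prompt receives the matched chunk's parent.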
Variants
Sentence Window Retrieval
The most granular form: each sentence is a child chunk. When retrieved, a window of N sentences around it (the "parent") is returned. A common window size is ±2 sentences.
```python
# Retrieve by sentence, return sentence + context window
def get_sentence_window(sentence_id, window=2):
    # all_sentences: the document's sentences, in order
    start = max(0, sentence_id - window)
    end = sentence_id + window + 1
    return " ".join(all_sentences[start:end])
```
Section-to-Document
Child chunks are document sections; parent is the full document. Useful when document-level context is needed (e.g., legal documents, research papers).
Recursive Hierarchy
Multiple levels: sentence → paragraph → section → document. The system fetches the appropriate level based on retrieval confidence — a high-scoring match may need only its paragraph, while a weak match is expanded to the section or document for more context.
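One way to realize that level selection is a score-thresholded walk up the hierarchy. This is a minimal sketch, not a library API; the thresholds and the `parent_of`/`texts` mappings are illustrative assumptions:

```python
def expand_by_confidence(chunk_id, score, parent_of, texts,
                         thresholds=(0.85, 0.7)):
    """Return text at a hierarchy level chosen by retrieval score.

    score >= thresholds[0] -> the matched chunk alone is enough
    score >= thresholds[1] -> expand one level up
    otherwise              -> expand two levels up for maximum context

    parent_of maps a chunk ID to its parent's ID (None at the top);
    texts maps a chunk ID to its text.
    """
    levels_up = 0 if score >= thresholds[0] else (1 if score >= thresholds[1] else 2)
    node = chunk_id
    for _ in range(levels_up):
        if parent_of.get(node) is None:
            break  # already at the top of the hierarchy
        node = parent_of[node]
    return texts[node]
```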
Implementation with LlamaIndex
LlamaIndex has native support for parent-child chunking:
```python
from llama_index.core.node_parser import HierarchicalNodeParser, get_leaf_nodes
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, StorageContext
from llama_index.core.retrievers import AutoMergingRetriever
from llama_index.core.storage.docstore import SimpleDocumentStore

documents = SimpleDirectoryReader("docs").load_data()  # load your source files

parser = HierarchicalNodeParser.from_defaults(
    chunk_sizes=[2048, 512, 128]  # document → section → sentence
)
nodes = parser.get_nodes_from_documents(documents)
leaf_nodes = get_leaf_nodes(nodes)  # only index the smallest chunks

# All nodes (parents included) must live in the docstore so the
# retriever can merge leaf hits back into their parent chunks
docstore = SimpleDocumentStore()
docstore.add_documents(nodes)
storage_context = StorageContext.from_defaults(docstore=docstore)

index = VectorStoreIndex(leaf_nodes, storage_context=storage_context)
base_retriever = index.as_retriever(similarity_top_k=6)
retriever = AutoMergingRetriever(base_retriever, storage_context)
```
When to Use Parent-Child Chunking
Parent-child chunking is most valuable when:
- Your source documents have clear hierarchical structure (headings, sections, subsections)
- Small chunks retrieve precisely but the LLM needs more surrounding context
- You are building a Q&A system over long technical documentation
- Retrieved chunks are frequently cited out of context and produce poor LLM answers
Parent-Child Chunking and KnowledgeSDK
When building RAG on top of KnowledgeSDK, you can implement a parent-child pattern by calling POST /v1/extract to index full pages, then using POST /v1/search to retrieve relevant passages. The search results include source URLs, so you can optionally fetch the full parent section from your own document store when finer-grained context is needed. This hybrid approach gives you KnowledgeSDK's managed retrieval with flexible context expansion in your application layer.
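A sketch of that hybrid flow is below. The request and response shapes for POST /v1/search are assumptions — check the KnowledgeSDK API reference for the actual payloads — and `fetch_parent_section` is a hypothetical hook into your own document store, keyed by the source URL that search results include:

```python
import json
import urllib.request

API_BASE = "https://api.knowledgesdk.example/v1"  # placeholder base URL

def knowledge_search(query, api_key):
    """Call KnowledgeSDK's POST /v1/search (payload shape assumed)."""
    req = urllib.request.Request(
        f"{API_BASE}/search",
        data=json.dumps({"query": query}).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["results"]  # assumed response shape

def expand_to_parents(results, fetch_parent_section):
    """Swap each retrieved passage for its parent section, looked up
    by source URL in your own document store; fall back to the
    original passage when no parent section is found."""
    return [
        {**r, "content": fetch_parent_section(r["source_url"]) or r["content"]}
        for r in results
    ]
```

Because the expansion step lives in your application layer, you can tune it per query — return raw passages for simple lookups, or expand to full parent sections for questions that need surrounding context.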