What Is Parent-Child Chunking?
Parent-child chunking is a hierarchical document splitting strategy that decouples the retrieval unit from the context unit. Small "child" chunks are embedded and indexed for precise retrieval, but when a child chunk is retrieved, its larger "parent" chunk (or full document section) is returned to the LLM instead.
This gives you the precision of small-chunk retrieval with the rich context of large-chunk injection.
The Core Problem It Solves
There is a fundamental tension in chunking:
- Small chunks (50–100 tokens) → precise embeddings, accurate retrieval, but insufficient context for the LLM to generate a good answer
- Large chunks (500–1000 tokens) → rich context for the LLM, but diluted embeddings that retrieve less precisely
Parent-child chunking resolves this tension by using small chunks for retrieval and large chunks for generation.
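As a rough sketch of the splitting step (plain Python; token counts are approximated by word counts, and all names are illustrative, not from any particular library):

```python
def make_parent_child_chunks(text, parent_size=500, child_size=100):
    """Split text into large parent chunks, then split each parent
    into small child chunks, keeping a child -> parent mapping."""
    words = text.split()
    parents, children, child_to_parent = [], [], {}
    for p_start in range(0, len(words), parent_size):
        parent_id = len(parents)
        parent_words = words[p_start:p_start + parent_size]
        parents.append(" ".join(parent_words))
        for c_start in range(0, len(parent_words), child_size):
            child_id = len(children)
            children.append(" ".join(parent_words[c_start:c_start + child_size]))
            child_to_parent[child_id] = parent_id
    return parents, children, child_to_parent
```

Only the `children` list gets embedded and indexed; the `child_to_parent` mapping is what lets retrieval jump back up to the larger chunk at generation time.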
How It Works
```
Document: Full page (2000 tokens)
│
├── Parent Chunk A (500 tokens): "Billing and Subscription"
│   ├── Child Chunk A1 (100 tokens): "Monthly billing cycle"
│   ├── Child Chunk A2 (100 tokens): "Annual subscription discount"
│   └── Child Chunk A3 (100 tokens): "Invoice delivery"
│
└── Parent Chunk B (500 tokens): "Cancellation Policy"
    ├── Child Chunk B1 (100 tokens): "How to cancel"
    └── Child Chunk B2 (100 tokens): "Refund window"
```
Index (for retrieval): child chunks A1, A2, A3, B1, B2
Stored (for context): parent chunks A, B
When a query matches child chunk B1, the system fetches parent chunk B and injects the full 500-token section into the LLM prompt.
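That lookup step can be sketched in a few lines. This is a minimal illustration that assumes the child chunks and a child→parent mapping have already been built; word-overlap scoring stands in for real embedding similarity:

```python
def retrieve_parent(query, children, child_to_parent, parents):
    """Score each child chunk against the query (word overlap here,
    as a stand-in for embedding similarity), then return the PARENT
    of the best-matching child rather than the child itself."""
    query_words = set(query.lower().split())
    scores = [len(query_words & set(c.lower().split())) for c in children]
    best_child = max(range(len(children)), key=lambda i: scores[i])
    return parents[child_to_parent[best_child]]
```

The key design choice is the last line: the index answers "which chunk matched?", but the prompt receives the matched chunk's parent.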
Variants
Sentence Window Retrieval
The most granular form: each sentence is a child chunk. When retrieved, a window of N sentences around it (the "parent") is returned. A common window size is ±2 sentences.
```python
# Retrieve by sentence, return sentence + context window
def get_sentence_window(sentence_id, window=2):
    # all_sentences: the document's sentences, in order
    start = max(0, sentence_id - window)
    end = sentence_id + window + 1
    return " ".join(all_sentences[start:end])
```
Section-to-Document
Child chunks are document sections; parent is the full document. Useful when document-level context is needed (e.g., legal documents, research papers).
Recursive Hierarchy
Multiple levels: sentence → paragraph → section → document. The system fetches the appropriate level based on retrieval confidence — a high-scoring match may need only its paragraph, while a weak match is expanded to the section or document for more context.
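One way to realize that level selection is a score-thresholded walk up the hierarchy. This is a minimal sketch, not a library API; the thresholds and the `parent_of`/`texts` mappings are illustrative assumptions:

```python
def expand_by_confidence(chunk_id, score, parent_of, texts,
                         thresholds=(0.85, 0.7)):
    """Return text at a hierarchy level chosen by retrieval score.

    score >= thresholds[0] -> the matched chunk alone is enough
    score >= thresholds[1] -> expand one level up
    otherwise              -> expand two levels up for maximum context

    parent_of maps a chunk ID to its parent's ID (None at the top);
    texts maps a chunk ID to its text.
    """
    levels_up = 0 if score >= thresholds[0] else (1 if score >= thresholds[1] else 2)
    node = chunk_id
    for _ in range(levels_up):
        if parent_of.get(node) is None:
            break  # already at the top of the hierarchy
        node = parent_of[node]
    return texts[node]
```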
Implementation with LlamaIndex
LlamaIndex has native support for parent-child chunking:
```python
from llama_index.core.node_parser import HierarchicalNodeParser, get_leaf_nodes
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, StorageContext
from llama_index.core.retrievers import AutoMergingRetriever
from llama_index.core.storage.docstore import SimpleDocumentStore

documents = SimpleDirectoryReader("docs").load_data()  # load your source files

parser = HierarchicalNodeParser.from_defaults(
    chunk_sizes=[2048, 512, 128]  # document → section → sentence
)
nodes = parser.get_nodes_from_documents(documents)
leaf_nodes = get_leaf_nodes(nodes)  # only index the smallest chunks

# All nodes (parents included) must live in the docstore so the
# retriever can merge leaf hits back into their parent chunks
docstore = SimpleDocumentStore()
docstore.add_documents(nodes)
storage_context = StorageContext.from_defaults(docstore=docstore)

index = VectorStoreIndex(leaf_nodes, storage_context=storage_context)
base_retriever = index.as_retriever(similarity_top_k=6)
retriever = AutoMergingRetriever(base_retriever, storage_context)
```
When to Use Parent-Child Chunking
Parent-child chunking is most valuable when:
- Your source documents have clear hierarchical structure (headings, sections, subsections)
- Small chunks retrieve precisely but the LLM needs more surrounding context
- You are building a Q&A system over long technical documentation
- Retrieved chunks are frequently cited out of context and produce poor LLM answers
Parent-Child Chunking and KnowledgeSDK
When building RAG on top of KnowledgeSDK, you can implement a parent-child pattern by calling POST /v1/extract to index full pages, then using POST /v1/search to retrieve relevant passages. The search results include source URLs, so you can optionally fetch the full parent section from your own document store when finer-grained context is needed. This hybrid approach gives you KnowledgeSDK's managed retrieval with flexible context expansion in your application layer.
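A sketch of that hybrid flow is below. The request and response shapes for POST /v1/search are assumptions — check the KnowledgeSDK API reference for the actual payloads — and `fetch_parent_section` is a hypothetical hook into your own document store, keyed by the source URL that search results include:

```python
import json
import urllib.request

API_BASE = "https://api.knowledgesdk.example/v1"  # placeholder base URL

def knowledge_search(query, api_key):
    """Call KnowledgeSDK's POST /v1/search (payload shape assumed)."""
    req = urllib.request.Request(
        f"{API_BASE}/search",
        data=json.dumps({"query": query}).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["results"]  # assumed response shape

def expand_to_parents(results, fetch_parent_section):
    """Swap each retrieved passage for its parent section, looked up
    by source URL in your own document store; fall back to the
    original passage when no parent section is found."""
    return [
        {**r, "content": fetch_parent_section(r["source_url"]) or r["content"]}
        for r in results
    ]
```

Because the expansion step lives in your application layer, you can tune it per query — return raw passages for simple lookups, or expand to full parent sections for questions that need surrounding context.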