What Is Sliding Window Chunking?
Sliding window chunking is a text splitting strategy where each chunk overlaps with the previous chunk by a fixed number of tokens. Instead of cutting documents into clean, non-overlapping segments, consecutive chunks share a "window" of repeated content at their boundaries.
This overlap prevents important information — especially context-setting sentences that appear near chunk boundaries — from being split in a way that renders either chunk unusable.
The Problem It Solves
Consider a document about a software feature:
... The feature uses OAuth 2.0 for authentication. | This means you must
register your app in the developer portal before making API calls.
You'll receive a client_id and client_secret. ...
If the chunk boundary (|) falls in the middle of a conceptual unit, the first chunk ends with incomplete context and the second chunk begins without the setup. The LLM receives an incoherent fragment.
With sliding window, the second chunk repeats the tail of the first chunk, bridging the gap.
How It Works
Document tokens: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, ...]
Chunk size: 6 tokens
Overlap: 2 tokens
Chunk 1: [1, 2, 3, 4, 5, 6]
Chunk 2: [5, 6, 7, 8, 9, 10] ← tokens 5-6 repeated
Chunk 3: [9, 10, 11, 12, 13, 14] ← tokens 9-10 repeated
The overlap region ensures that a sentence spanning a boundary appears fully in at least one chunk.
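The diagram above can be reproduced with a few lines of Python. This is a sketch of the windowing arithmetic only (chunk size 6, overlap 2, so a stride of 4), operating on a toy list of token IDs rather than real text:

```python
def window_indices(tokens, chunk_size=6, overlap=2):
    """Return the windows a sliding-window chunker produces over a token list."""
    chunks, start = [], 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        chunks.append(tokens[start:end])
        if end == len(tokens):
            break  # final window reached
        start += chunk_size - overlap  # stride = 4
    return chunks

chunks = window_indices(list(range(1, 15)))
# chunks == [[1, 2, 3, 4, 5, 6], [5, 6, 7, 8, 9, 10], [9, 10, 11, 12, 13, 14]]
```

Each window starts `chunk_size - overlap` tokens after the previous one, which is exactly why the last `overlap` tokens of one chunk reappear at the start of the next.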
Choosing Chunk Size and Overlap
| Parameter | Recommendation | Notes |
|---|---|---|
| Chunk size | 256–512 tokens | Larger for dense prose, smaller for Q&A |
| Overlap | 10–20% of chunk size | 50 tokens on a 400-token chunk is typical |
If the overlap is too small, boundary information is still lost; if it is too large, you pay for excessive duplication, index bloat, and diluted embeddings.
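The storage cost of a given setting is easy to estimate: every stride of `chunk_size - overlap` new tokens stores `chunk_size` tokens in the index, so the index holds roughly `chunk_size / (chunk_size - overlap)` copies of each token. A quick sketch:

```python
def index_bloat(chunk_size, overlap):
    """Approximate index size relative to the raw document (1.0 = no overlap)."""
    return chunk_size / (chunk_size - overlap)

print(round(index_bloat(400, 50), 3))  # → 1.143, i.e. a ~14% larger index
```

The typical 50-token overlap on a 400-token chunk lands comfortably inside the ~10–20% bloat range quoted above.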
Implementation
# Assumes a tokenizer with encode/decode; tiktoken is used here for illustration.
import tiktoken

tokenizer = tiktoken.get_encoding("cl100k_base")

def sliding_window_chunks(text, chunk_size=400, overlap=50):
    tokens = tokenizer.encode(text)
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        chunks.append(tokenizer.decode(tokens[start:end]))
        if end == len(tokens):
            break  # final chunk reached
        start += chunk_size - overlap  # advance by the stride
    return chunks
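To verify the overlap property without a model tokenizer, the same loop can run over whitespace tokens (a toy stand-in for `tokenizer.encode`, used here only for illustration):

```python
def sliding_window_words(text, chunk_size=8, overlap=2):
    """Same algorithm as above, with whitespace-split words as 'tokens'."""
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        end = min(start + chunk_size, len(words))
        chunks.append(words[start:end])
        if end == len(words):
            break
        start += chunk_size - overlap
    return chunks

chunks = sliding_window_words(" ".join(f"w{i}" for i in range(20)))
# The last `overlap` words of each chunk reappear at the start of the next:
assert chunks[0][-2:] == chunks[1][:2]
```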
LangChain's RecursiveCharacterTextSplitter implements this with a chunk_overlap parameter. Note that count_tokens is a user-supplied function; by default the splitter measures length in characters, so pass a token counter to work in token units:

from langchain.text_splitter import RecursiveCharacterTextSplitter

def count_tokens(text):
    # User-supplied token counter, e.g. via the tokenizer defined above.
    return len(tokenizer.encode(text))

splitter = RecursiveCharacterTextSplitter(
    chunk_size=400,
    chunk_overlap=50,
    length_function=count_tokens,  # measure length in tokens, not characters
)
chunks = splitter.split_text(document)
Sliding Window vs Fixed Split
| | Fixed (no overlap) | Sliding window |
|---|---|---|
| Boundary artifacts | Yes | Minimized |
| Index size | Smaller | Larger (~10–20%) |
| Retrieval coherence | Lower | Higher |
| Implementation | Trivial | Simple |
Trade-offs
- Slightly larger index — each overlapping token is stored twice, increasing storage by roughly overlap / (chunk_size − overlap), e.g. ~14% for a 50-token overlap on 400-token chunks
- Duplicate retrieval — adjacent chunks may be retrieved for the same query; deduplicate before LLM injection
- Diluted embeddings — if overlap is too large, chunk embeddings become too similar to adjacent chunks, reducing retrieval specificity
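The duplicate-retrieval point above can be handled with a simple pre-injection filter. This sketch uses word-level Jaccard similarity with an illustrative threshold; both the measure and the 0.6 cutoff are assumptions, not a fixed recipe:

```python
def dedup_chunks(chunks, threshold=0.6):
    """Keep chunks in retrieval order, dropping any that largely repeat a kept one."""
    kept = []
    for chunk in chunks:
        words = set(chunk.split())
        is_dup = any(
            len(words & set(k.split())) / max(len(words | set(k.split())), 1) >= threshold
            for k in kept
        )
        if not is_dup:
            kept.append(chunk)
    return kept
```

In practice you might instead compare embedding similarity, or simply track chunk indices and drop immediate neighbors of an already-selected chunk.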
Sliding Window in KnowledgeSDK
When POST /v1/extract processes a URL, KnowledgeSDK applies overlap-aware chunking tuned for web content. The extracted knowledge_items are chunked at semantic boundaries with enough overlap to preserve context across sections, so retrieved chunks are coherent and usable without additional post-processing.