What Is Sliding Window Chunking?
Sliding window chunking is a text splitting strategy where each chunk overlaps with the previous chunk by a fixed number of tokens. Instead of cutting documents into clean, non-overlapping segments, consecutive chunks share a "window" of repeated content at their boundaries.
This overlap prevents important information — especially context-setting sentences that appear near chunk boundaries — from being split in a way that renders either chunk unusable.
The Problem It Solves
Consider a document about a software feature:
... The feature uses OAuth 2.0 for authentication. | This means you must
register your app in the developer portal before making API calls.
You'll receive a client_id and client_secret. ...
If the chunk boundary (|) falls in the middle of a conceptual unit, the first chunk ends with incomplete context and the second chunk begins without the setup. The LLM receives an incoherent fragment.
With sliding window, the second chunk repeats the tail of the first chunk, bridging the gap.
How It Works
Document tokens: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, ...]
Chunk size: 6 tokens
Overlap: 2 tokens
Chunk 1: [1, 2, 3, 4, 5, 6]
Chunk 2: [5, 6, 7, 8, 9, 10] ← tokens 5-6 repeated
Chunk 3: [9, 10, 11, 12, 13, 14] ← tokens 9-10 repeated
The overlap region ensures that a sentence spanning a boundary appears fully in at least one chunk.
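The diagram above can be reproduced with a few lines of Python. This is a sketch of the windowing arithmetic only (chunk size 6, overlap 2, so a stride of 4), operating on a toy list of token IDs rather than real text:

```python
def window_indices(tokens, chunk_size=6, overlap=2):
    """Return the windows a sliding-window chunker produces over a token list."""
    chunks, start = [], 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        chunks.append(tokens[start:end])
        if end == len(tokens):
            break  # final window reached
        start += chunk_size - overlap  # stride = 4
    return chunks

chunks = window_indices(list(range(1, 15)))
# chunks == [[1, 2, 3, 4, 5, 6], [5, 6, 7, 8, 9, 10], [9, 10, 11, 12, 13, 14]]
```

Each window starts `chunk_size - overlap` tokens after the previous one, which is exactly why the last `overlap` tokens of one chunk reappear at the start of the next.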
Choosing Chunk Size and Overlap
| Parameter | Recommendation | Notes |
|---|---|---|
| Chunk size | 256–512 tokens | Larger for dense prose, smaller for Q&A |
| Overlap | 10–20% of chunk size | 50 tokens on a 400-token chunk is typical |
If the overlap is too small, boundary information is still lost; if it is too large, you pay for excessive duplication, index bloat, and diluted embeddings.
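The storage cost of a given setting is easy to estimate: every stride of `chunk_size - overlap` new tokens stores `chunk_size` tokens in the index, so the index holds roughly `chunk_size / (chunk_size - overlap)` copies of each token. A quick sketch:

```python
def index_bloat(chunk_size, overlap):
    """Approximate index size relative to the raw document (1.0 = no overlap)."""
    return chunk_size / (chunk_size - overlap)

print(round(index_bloat(400, 50), 3))  # → 1.143, i.e. a ~14% larger index
```

The typical 50-token overlap on a 400-token chunk lands comfortably inside the ~10–20% bloat range quoted above.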
Implementation
# Assumes a tokenizer with encode/decode; tiktoken is used here for illustration.
import tiktoken

tokenizer = tiktoken.get_encoding("cl100k_base")

def sliding_window_chunks(text, chunk_size=400, overlap=50):
    tokens = tokenizer.encode(text)
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        chunks.append(tokenizer.decode(tokens[start:end]))
        if end == len(tokens):
            break  # final chunk reached
        start += chunk_size - overlap  # advance by the stride
    return chunks
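To verify the overlap property without a model tokenizer, the same loop can run over whitespace tokens (a toy stand-in for `tokenizer.encode`, used here only for illustration):

```python
def sliding_window_words(text, chunk_size=8, overlap=2):
    """Same algorithm as above, with whitespace-split words as 'tokens'."""
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        end = min(start + chunk_size, len(words))
        chunks.append(words[start:end])
        if end == len(words):
            break
        start += chunk_size - overlap
    return chunks

chunks = sliding_window_words(" ".join(f"w{i}" for i in range(20)))
# The last `overlap` words of each chunk reappear at the start of the next:
assert chunks[0][-2:] == chunks[1][:2]
```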
LangChain's RecursiveCharacterTextSplitter implements this with a chunk_overlap parameter. Note that count_tokens is a user-supplied function; by default the splitter measures length in characters, so pass a token counter to work in token units:

from langchain.text_splitter import RecursiveCharacterTextSplitter

def count_tokens(text):
    # User-supplied token counter, e.g. via the tokenizer defined above.
    return len(tokenizer.encode(text))

splitter = RecursiveCharacterTextSplitter(
    chunk_size=400,
    chunk_overlap=50,
    length_function=count_tokens,  # measure length in tokens, not characters
)
chunks = splitter.split_text(document)
Sliding Window vs Fixed Split
| | Fixed (no overlap) | Sliding window |
|---|---|---|
| Boundary artifacts | Yes | Minimized |
| Index size | Smaller | Larger (~10–20%) |
| Retrieval coherence | Lower | Higher |
| Implementation | Trivial | Simple |
Trade-offs
- Slightly larger index — each overlapping token is stored twice, increasing storage by roughly overlap / (chunk_size − overlap), e.g. ~14% for a 50-token overlap on 400-token chunks
- Duplicate retrieval — adjacent chunks may be retrieved for the same query; deduplicate before LLM injection
- Diluted embeddings — if overlap is too large, chunk embeddings become too similar to adjacent chunks, reducing retrieval specificity
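The duplicate-retrieval point above can be handled with a simple pre-injection filter. This sketch uses word-level Jaccard similarity with an illustrative threshold; both the measure and the 0.6 cutoff are assumptions, not a fixed recipe:

```python
def dedup_chunks(chunks, threshold=0.6):
    """Keep chunks in retrieval order, dropping any that largely repeat a kept one."""
    kept = []
    for chunk in chunks:
        words = set(chunk.split())
        is_dup = any(
            len(words & set(k.split())) / max(len(words | set(k.split())), 1) >= threshold
            for k in kept
        )
        if not is_dup:
            kept.append(chunk)
    return kept
```

In practice you might instead compare embedding similarity, or simply track chunk indices and drop immediate neighbors of an already-selected chunk.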
Sliding Window in KnowledgeSDK
When POST /v1/extract processes a URL, KnowledgeSDK applies overlap-aware chunking tuned for web content. The extracted knowledge_items are chunked at semantic boundaries with enough overlap to preserve context across sections, so retrieved chunks are coherent and usable without additional post-processing.