RAG & Retrieval · beginner

Also known as: overlapping chunks, sliding window

Sliding Window Chunking

A chunking strategy where consecutive chunks overlap by a fixed number of tokens to preserve context at chunk boundaries.

What Is Sliding Window Chunking?

Sliding window chunking is a text splitting strategy where each chunk overlaps with the previous chunk by a fixed number of tokens. Instead of cutting documents into clean, non-overlapping segments, consecutive chunks share a "window" of repeated content at their boundaries.

This overlap prevents important information — especially context-setting sentences that appear near chunk boundaries — from being split in a way that renders either chunk unusable.

The Problem It Solves

Consider a document about a software feature:

... The feature uses OAuth 2.0 for authentication. | This means you must
register your app in the developer portal before | making API calls.
You'll receive a client_id and client_secret. ...

If the chunk boundary (|) falls in the middle of a conceptual unit, the first chunk ends with incomplete context and the second chunk begins without the setup. The LLM receives an incoherent fragment.

With sliding window, the second chunk repeats the tail of the first chunk, bridging the gap.

How It Works

Document tokens: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, ...]

Chunk size: 6 tokens
Overlap:    2 tokens

Chunk 1: [1, 2, 3, 4, 5, 6]
Chunk 2:          [5, 6, 7, 8, 9, 10]   ← tokens 5-6 repeated
Chunk 3:                    [9, 10, 11, 12, 13, 14]   ← tokens 9-10 repeated

The overlap region ensures that a sentence spanning a boundary appears fully in at least one chunk.
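The index arithmetic in the diagram can be sketched directly: each new chunk starts (chunk_size − overlap) tokens after the previous one. The helper below uses 0-based, end-exclusive spans, so the diagram's 1-based chunks [1–6], [5–10], [9–14] appear as (0, 6), (4, 10), (8, 14).

```python
def chunk_spans(n_tokens, chunk_size=6, overlap=2):
    """Return (start, end) token index pairs, 0-based and end-exclusive."""
    spans = []
    step = chunk_size - overlap  # how far each chunk advances
    start = 0
    while start < n_tokens:
        end = min(start + chunk_size, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break
        start += step
    return spans

print(chunk_spans(14))  # [(0, 6), (4, 10), (8, 14)]
```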

Choosing Chunk Size and Overlap

Parameter    Recommendation          Notes
Chunk size   256–512 tokens          Larger for dense prose, smaller for Q&A
Overlap      10–20% of chunk size    50 tokens on a 400-token chunk is typical

If the overlap is too small, boundary information is still lost; if it is too large, you get excessive duplication, index bloat, and diluted embeddings.
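The storage cost of a given setting is easy to estimate: each chunk advances by (chunk_size − overlap) unique tokens but stores chunk_size tokens, so the exact overhead is overlap / (chunk_size − overlap) — slightly above the overlap/chunk_size rule of thumb.

```python
def index_overhead(chunk_size, overlap):
    """Fraction of extra tokens stored versus a no-overlap split.

    Each chunk advances by (chunk_size - overlap) unique tokens but
    stores chunk_size tokens, so stored/unique = chunk_size / step.
    """
    step = chunk_size - overlap
    return chunk_size / step - 1

print(round(index_overhead(400, 50), 3))  # 0.143 → ~14% more tokens stored
```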

Implementation

import tiktoken

# Any subword tokenizer works; tiktoken's cl100k_base encoding is used
# here as a concrete, runnable choice.
tokenizer = tiktoken.get_encoding("cl100k_base")

def sliding_window_chunks(text, chunk_size=400, overlap=50):
    tokens = tokenizer.encode(text)
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        chunks.append(tokenizer.decode(tokens[start:end]))
        if end == len(tokens):
            break
        start += chunk_size - overlap  # advance by (chunk_size - overlap)
    return chunks
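A quick property check of the same loop, using whitespace words as stand-in tokens so the sketch runs with no dependencies: consecutive chunks should share exactly their boundary tokens.

```python
def word_chunks(text, chunk_size=6, overlap=2):
    """Same sliding-window loop, with words standing in for tokens."""
    tokens = text.split()
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        chunks.append(" ".join(tokens[start:end]))
        if end == len(tokens):
            break
        start += chunk_size - overlap
    return chunks

text = " ".join(f"w{i}" for i in range(14))
chunks = word_chunks(text)
# consecutive chunks share their two boundary words
print(chunks[0].split()[-2:])  # ['w4', 'w5']
print(chunks[1].split()[:2])   # ['w4', 'w5']
```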

LangChain's RecursiveCharacterTextSplitter implements this with a chunk_overlap parameter:

from langchain.text_splitter import RecursiveCharacterTextSplitter

# length_function defaults to len (characters); pass your own counter to
# measure chunk_size and chunk_overlap in tokens instead.
def count_tokens(text):
    return len(text.split())  # stand-in; use a real tokenizer in practice

splitter = RecursiveCharacterTextSplitter(
    chunk_size=400,
    chunk_overlap=50,
    length_function=count_tokens,
)
chunks = splitter.split_text(document)

Sliding Window vs Fixed Split

                      Fixed (no overlap)   Sliding Window
Boundary artifacts    Yes                  Minimized
Index size            Smaller              Larger (~10–20%)
Retrieval coherence   Lower                Higher
Implementation        Trivial              Simple

Trade-offs

  • Slightly larger index — each overlapping token is stored twice, increasing storage by ~(overlap / chunk_size) × 100%
  • Duplicate retrieval — adjacent chunks may be retrieved for the same query; deduplicate before LLM injection
  • Diluted embeddings — if overlap is too large, chunk embeddings become too similar to adjacent chunks, reducing retrieval specificity
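The duplicate-retrieval point above can be handled with a small merge pass before prompt injection. This sketch assumes each retrieved hit carries its document ID and token-span metadata (`doc_id`, `start`, `end` — hypothetical field names your index would need to store); overlapping neighbors from the same document are collapsed into one span.

```python
def merge_overlapping(hits):
    """Merge retrieved chunks whose token spans overlap within a document.

    hits: list of dicts with 'doc_id', 'start', 'end' (token offsets).
    """
    merged = []
    for hit in sorted(hits, key=lambda h: (h["doc_id"], h["start"])):
        last = merged[-1] if merged else None
        if last and last["doc_id"] == hit["doc_id"] and hit["start"] <= last["end"]:
            last["end"] = max(last["end"], hit["end"])  # extend the span
        else:
            merged.append(dict(hit))
    return merged

hits = [
    {"doc_id": "a", "start": 350, "end": 750},
    {"doc_id": "a", "start": 0, "end": 400},
]
print(merge_overlapping(hits))  # one span: doc "a", tokens 0-750
```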

Sliding Window in KnowledgeSDK

When POST /v1/extract processes a URL, KnowledgeSDK applies overlap-aware chunking tuned for web content. The extracted knowledge_items are chunked at semantic boundaries with enough overlap to preserve context across sections, so retrieved chunks are coherent and usable without additional post-processing.

Related Terms

Chunking (RAG & Retrieval · beginner)
The process of splitting long documents into smaller, overlapping or non-overlapping segments before embedding and indexing.

Parent-Child Chunking (RAG & Retrieval · intermediate)
A hierarchical chunking strategy that indexes small child chunks for retrieval but returns their larger parent context to the LLM.

Retrieval-Augmented Generation (RAG & Retrieval · beginner)
A technique that grounds LLM responses by retrieving relevant documents from an external knowledge base before generation.

Sparse Retrieval (Skill (Agent))
