What Is Context Engineering?
Context engineering is the discipline of deciding what information to put into an LLM's context window — and what to leave out — in order to produce the best possible response. As AI systems have grown more capable, it has become clear that the quality of an LLM's output is often determined less by the model itself and more by the quality of the context it is given.
Where prompt engineering focuses on how to phrase instructions and questions, context engineering focuses on what information surrounds those instructions. The two disciplines are complementary.
Why Context Engineering Matters
An LLM can only reason about what is in its context window. If the relevant information is missing, the model is likely to hallucinate or produce a generic answer. If the context is cluttered with irrelevant information, the model may get confused or lose track of what actually matters.
Context engineering treats the context window as a resource to be managed:
- Too little — The model lacks grounding and guesses.
- Too much — The model loses focus and relevant facts get buried.
- Wrong information — The model reasons confidently toward the wrong answer.
- Right information, well-structured — The model produces accurate, grounded, useful output.
Key Techniques
Selective Retrieval
Rather than dumping an entire document into context, retrieve only the most relevant passages using vector search. KnowledgeSDK's /v1/search endpoint enables semantic retrieval so only the pertinent sections of a knowledge base enter the context window.
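The principle can be sketched locally. The snippet below ranks candidate passages by cosine similarity to a query embedding and keeps only the top matches; in practice a service such as KnowledgeSDK's /v1/search would perform this ranking server-side over real embeddings. The three-dimensional vectors and passage texts here are toy data for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, passages, k=2):
    """Rank (embedding, text) passages by similarity and keep only the top k."""
    ranked = sorted(passages, key=lambda p: cosine(query_vec, p[0]), reverse=True)
    return [text for _, text in ranked[:k]]

passages = [
    ([1.0, 0.0, 0.0], "Billing: invoices are issued monthly."),
    ([0.0, 1.0, 0.0], "Security: all data is encrypted at rest."),
    ([0.9, 0.1, 0.0], "Billing: refunds take 5-7 business days."),
]

# Only the two billing passages enter the context window; the unrelated
# security passage is left out entirely.
context = top_k([1.0, 0.0, 0.0], passages, k=2)
```

The key design point is the `k` cutoff: everything below it stays out of the prompt, which is what keeps the context window from filling with marginally relevant text.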
Structured Formatting
Information formatted as structured JSON or markdown tables is easier for models to parse than unstructured prose. KnowledgeSDK's /v1/extract outputs structured data specifically to make downstream LLM reasoning more reliable.
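As a minimal sketch of this idea, the helper below renders a list of records as a markdown table before it enters the context. The sample rows are invented for illustration; the exact schema /v1/extract returns is not specified here.

```python
def to_markdown_table(rows):
    """Render a list of dicts (e.g. structured extraction output)
    as a markdown table the model can parse reliably."""
    headers = list(rows[0].keys())
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines)

extracted = [
    {"product": "Basic", "price": "$10/mo"},
    {"product": "Pro", "price": "$25/mo"},
]
table = to_markdown_table(extracted)
```

Compared with the same facts buried in prose, the table gives the model explicit column names and one fact per cell, which makes lookups and comparisons far less error-prone.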
Context Prioritization
Place the most important information early in the context (for models with primacy bias) or late (for models with recency bias). Most current frontier models handle both positions well, but information buried in the middle of a long context is the most likely to be overlooked, so test placement with your target model.
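A simple way to make placement an explicit, testable decision is to assemble the context from tagged blocks. The `priority` field and block texts below are hypothetical; the point is that ordering becomes a single switch you can flip per model.

```python
def assemble_context(blocks, position="early"):
    """Order context blocks so the highest-priority material lands
    where the target model attends best ("early" or "late")."""
    ordered = sorted(blocks, key=lambda b: b["priority"], reverse=True)
    if position == "late":
        ordered.reverse()  # most important block ends up last
    return "\n\n".join(b["text"] for b in ordered)

blocks = [
    {"priority": 1, "text": "Background: company history."},
    {"priority": 3, "text": "Key fact: the refund window is 30 days."},
    {"priority": 2, "text": "Related policy details."},
]

# With position="early" the key fact leads; with "late" it closes the context.
ctx = assemble_context(blocks, position="early")
```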
Summarization
When a source document is too long to fit in context, summarize it first. An agent can produce a compressed version of a scraped webpage before passing it to the main reasoning step.
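The gating logic is simple to sketch: pass the document through unchanged if it fits a budget, otherwise compress it first. The extractive summarizer below is a crude stand-in; a real agent would call an LLM for the compression step.

```python
def naive_summarize(text, keep=2):
    """Crude extractive stand-in for an LLM summarization call:
    keep only the first few sentences."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return ". ".join(sentences[:keep]) + "."

def fit_to_budget(text, budget_chars=200, summarize=naive_summarize):
    """Return the text unchanged if it fits the budget; otherwise
    compress it before it enters the main reasoning step."""
    return text if len(text) <= budget_chars else summarize(text)

page = "First key point. " * 30   # stands in for a long scraped webpage
short = fit_to_budget(page)       # compressed before reaching the model
```

A character budget is used here for simplicity; production systems would count tokens with the model's own tokenizer instead.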
Memory Management
In long-running agents, earlier conversation turns and tool results must be selectively retained or compressed to prevent context overflow. This is the agent memory problem, and it is fundamentally a context engineering challenge.
Context Engineering in Agentic Systems
In an agent loop, every tool call result becomes new content that must be managed in context. A well-engineered agent:
- Stores raw tool outputs in an external memory store.
- Retrieves only the relevant portions when needed.
- Compresses or summarizes older turns.
- Maintains a concise working memory of the current task state.
This prevents the context from becoming a cluttered transcript and keeps the model focused on the immediate reasoning task.
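The four behaviors above can be sketched in one small class. Everything here (class name, stub format, the drop-oldest policy) is illustrative: raw tool output goes to an external store, the in-context transcript keeps only short stubs, turns beyond a cap are evicted, and a working-memory dict tracks current task state.

```python
class AgentMemory:
    """Minimal sketch of agent memory management: raw outputs external,
    short stubs in context, a cap on transcript length, concise state."""

    def __init__(self, max_turns=4):
        self.store = {}           # external store: ref -> full raw output
        self.turns = []           # short stubs kept in the context transcript
        self.working_memory = {}  # concise state of the current task
        self.max_turns = max_turns

    def record_tool_result(self, tool, raw_output):
        ref = f"{tool}:{len(self.store)}"
        self.store[ref] = raw_output                        # raw goes external
        self.turns.append(f"{ref} -> {raw_output[:40]}...")  # stub stays in context
        while len(self.turns) > self.max_turns:
            self.turns.pop(0)  # a real agent would summarize, not just drop
        return ref

    def retrieve(self, ref):
        # Pull the full output back into context only when it is needed.
        return self.store[ref]

mem = AgentMemory(max_turns=2)
refs = [mem.record_tool_result("web_search", f"result {i} " * 20) for i in range(3)]
# mem.turns now holds only the two newest stubs, but every raw result
# remains retrievable from the external store via its ref.
```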
Practical Advice
- Always inspect what your agent actually sees at each step — log the exact context being sent to the model.
- Prefer extracting structured data over passing raw HTML or unformatted text.
- Use system prompts to establish stable, high-priority context (persona, goals, constraints).
- Leave room for tool outputs — overloaded system prompts leave no space for retrieved information.
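The first piece of advice can be wired in with a thin logging wrapper around every model call. The `model_client` parameter is a placeholder for whatever client your stack uses; the point is that the exact message list is logged before each call.

```python
import json
import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("agent.context")

def call_model(messages, model_client=None):
    """Log the exact context before every model call.
    `model_client` is a hypothetical callable; wire in your real client."""
    log.debug("context sent to model:\n%s", json.dumps(messages, indent=2))
    if model_client is None:
        return "(dry run)"
    return model_client(messages)

messages = [
    {"role": "system", "content": "You are a support agent."},
    {"role": "user", "content": "When do refunds arrive?"},
]
call_model(messages)
```

Logging the serialized message list, rather than your own description of it, is what catches the surprises: truncated retrievals, duplicated turns, or a system prompt that crowded out the tool output.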