What Is Memory in AI Agents?
An LLM has no persistent memory between API calls. Every request starts fresh — the model knows nothing about previous interactions unless you explicitly include that history in the prompt. Agent memory refers to the systems and strategies developers build to give agents continuity: the ability to remember what they have done, learned, and been told across multiple turns, sessions, or even across separate task runs.
Memory is what turns a stateless language model into a system that feels like it knows you and can work on a task over time.
Types of Agent Memory
In-Context Memory (Working Memory)
The simplest form: everything stored directly in the current context window. The agent's full conversation history, tool results, and intermediate reasoning are included in every prompt. Fast to implement but limited — context windows have a finite size, and large histories become expensive and degrade quality.
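A minimal sketch of this pattern, assuming an OpenAI-style chat interface where the full message history is replayed on every call (call_model is a hypothetical stand-in for a real LLM API):

```python
# In-context (working) memory: the entire history is resent with every request.
# call_model is a hypothetical placeholder for any chat-completion API call.

def call_model(messages):
    # Placeholder: a real implementation would send `messages` to an LLM here.
    return f"(reply to: {messages[-1]['content']})"

class WorkingMemory:
    def __init__(self, system_prompt):
        # The system prompt is always the first message in the window.
        self.messages = [{"role": "system", "content": system_prompt}]

    def ask(self, user_text):
        self.messages.append({"role": "user", "content": user_text})
        reply = call_model(self.messages)  # whole history goes in each time
        self.messages.append({"role": "assistant", "content": reply})
        return reply

mem = WorkingMemory("You are a research assistant.")
mem.ask("Who founded Acme Corp?")
mem.ask("When was it founded?")  # "it" resolves only because turn 1 is still in context
```

Note that the history grows with every turn, which is exactly why this approach eventually hits the cost and quality limits described above.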
External Memory (Long-Term Memory)
Information persisted outside the model in a database or vector store. The agent writes important facts or summaries to external storage and retrieves them on demand using search. This scales far beyond any context window and persists across sessions.
- Episodic memory — Records of specific past events or interactions ("Last time the user asked about Acme Corp, I found these three competitors").
- Semantic memory — General facts and knowledge extracted from documents and stored as embeddings for retrieval.
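A toy sketch of semantic memory retrieval: facts are embedded and fetched by similarity to a query. Here bag-of-words vectors and cosine similarity stand in for a real embedding model and vector database, purely for illustration:

```python
import math
from collections import Counter

# Toy semantic memory: facts are "embedded" as bag-of-words vectors and
# retrieved by cosine similarity. A production system would use a real
# embedding model and a vector store instead.

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticMemory:
    def __init__(self):
        self.entries = []  # list of (embedding, fact) pairs

    def write(self, fact):
        self.entries.append((embed(fact), fact))

    def search(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [fact for _, fact in ranked[:k]]

memory = SemanticMemory()
memory.write("Acme Corp's three main competitors are Initech, Globex, and Umbrella.")
memory.write("The user prefers weekly summary reports on Mondays.")
```

Only the top-k retrieved facts would be placed into the prompt, not the whole store.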
Procedural Memory
Encoded in the model weights themselves or in the system prompt as persistent instructions. This is how the agent knows how to behave — its identity, capabilities, and rules — rather than what it knows about the world.
The Memory Problem in Long-Running Agents
As an agent executes a long task, its context fills with tool call results, intermediate reasoning, and conversation history. Eventually it hits the context window limit. Good memory management strategies include:
- Summarization — Periodically compress earlier conversation turns into a concise summary.
- Selective retention — Only keep the most relevant tool results; offload the rest to external storage.
- Retrieval on demand — Store all observations externally and retrieve only what is needed at each step.
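The summarization strategy can be sketched as follows. The thresholds and the summarize() function are illustrative placeholders; a real agent would ask the model itself to produce an abstractive summary of the old turns:

```python
# Summarization sketch: once the history exceeds a budget, the oldest turns
# are collapsed into a single summary message. summarize() is a hypothetical
# stand-in for an LLM call that would write a real abstractive summary.

MAX_MESSAGES = 6   # compress once the history grows past this
KEEP_RECENT = 4    # how many recent messages survive verbatim

def summarize(messages):
    # Placeholder: a real agent would prompt the model to summarize these turns.
    return f"Summary of {len(messages)} earlier messages."

def compact(history):
    if len(history) <= MAX_MESSAGES:
        return history
    old, recent = history[:-KEEP_RECENT], history[-KEEP_RECENT:]
    summary_msg = {"role": "system", "content": summarize(old)}
    return [summary_msg] + recent  # one summary replaces many old turns

history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
history = compact(history)
```

Running compact periodically keeps the window bounded while preserving a compressed trace of everything the agent has done.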
Memory and KnowledgeSDK
KnowledgeSDK's /v1/search endpoint enables semantic memory retrieval: extracted knowledge is stored in a per-account vector index, and agents can search it with natural-language queries to retrieve relevant facts without needing to re-extract source pages.
A practical pattern:
- The agent extracts a company's website with /v1/extract — structured data is indexed automatically.
- In a later session, the agent queries /v1/search for facts about that company — no re-extraction needed.
- Only the retrieved snippets enter the context window, keeping it lean.
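The pattern above could be sketched like this. Only the two endpoint paths (/v1/extract and /v1/search) come from this article; the base URL, payload field names, and prompt format are illustrative assumptions — consult the KnowledgeSDK API reference for the real schema:

```python
# Sketch of the extract-then-search pattern. The endpoint paths appear in the
# text above; every field name and the base URL below are assumptions made
# for illustration, not the documented KnowledgeSDK schema.

BASE_URL = "https://api.example.com"  # placeholder base URL

def extract_request(url):
    # Session 1: extract a site; results are indexed automatically server-side.
    return {"endpoint": BASE_URL + "/v1/extract", "json": {"url": url}}

def search_request(query, limit=3):
    # Later session: query the index instead of re-extracting the source pages.
    return {"endpoint": BASE_URL + "/v1/search", "json": {"query": query, "limit": limit}}

def build_prompt(question, snippets):
    # Only the retrieved snippets enter the context window, keeping it lean.
    context = "\n".join("- " + s for s in snippets)
    return f"Facts retrieved from memory:\n{context}\n\nQuestion: {question}"

req = search_request("Who are Acme Corp's competitors?")
prompt = build_prompt("Who are Acme Corp's competitors?",
                      ["Acme competes with Initech.", "Acme competes with Globex."])
```

Each request dict would be POSTed with an HTTP client; the point of the sketch is that the second session sends a small search payload and a few snippets, never the full extracted site.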
Choosing the Right Memory Strategy
| Use Case | Recommended Approach |
|---|---|
| Short single-turn tasks | In-context memory only |
| Multi-turn conversations | In-context + summarization |
| Long-running research agents | External vector memory + selective retrieval |
| Shared knowledge across agents | Centralized external store with semantic search |