What Is Memory in AI Agents?
An LLM has no persistent memory between API calls. Every request starts fresh — the model knows nothing about previous interactions unless you explicitly include that history in the prompt. Agent memory refers to the systems and strategies developers build to give agents continuity: the ability to remember what they have done, learned, and been told across multiple turns, sessions, or even across separate task runs.
Memory is what turns a stateless language model into a system that feels like it knows you and can work on a task over time.
Types of Agent Memory
In-Context Memory (Working Memory)
The simplest form: everything stored directly in the current context window. The agent's full conversation history, tool results, and intermediate reasoning are included in every prompt. Fast to implement but limited — context windows have a finite size, and large histories become expensive and degrade quality.
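A minimal sketch of this pattern, assuming an OpenAI-style chat interface where the full message history is replayed on every call (call_model is a hypothetical stand-in for a real LLM API):

```python
# In-context (working) memory: the entire history is resent with every request.
# call_model is a hypothetical placeholder for any chat-completion API call.

def call_model(messages):
    # Placeholder: a real implementation would send `messages` to an LLM here.
    return f"(reply to: {messages[-1]['content']})"

class WorkingMemory:
    def __init__(self, system_prompt):
        # The system prompt is always the first message in the window.
        self.messages = [{"role": "system", "content": system_prompt}]

    def ask(self, user_text):
        self.messages.append({"role": "user", "content": user_text})
        reply = call_model(self.messages)  # whole history goes in each time
        self.messages.append({"role": "assistant", "content": reply})
        return reply

mem = WorkingMemory("You are a research assistant.")
mem.ask("Who founded Acme Corp?")
mem.ask("When was it founded?")  # "it" resolves only because turn 1 is still in context
```

Note that the history grows with every turn, which is exactly why this approach eventually hits the cost and quality limits described above.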
External Memory (Long-Term Memory)
Information persisted outside the model in a database or vector store. The agent writes important facts or summaries to external storage and retrieves them on demand using search. This scales far beyond any context window and persists across sessions.
- Episodic memory — Records of specific past events or interactions ("Last time the user asked about Acme Corp, I found these three competitors").
- Semantic memory — General facts and knowledge extracted from documents and stored as embeddings for retrieval.
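A toy sketch of semantic memory retrieval: facts are embedded and fetched by similarity to a query. Here bag-of-words vectors and cosine similarity stand in for a real embedding model and vector database, purely for illustration:

```python
import math
from collections import Counter

# Toy semantic memory: facts are "embedded" as bag-of-words vectors and
# retrieved by cosine similarity. A production system would use a real
# embedding model and a vector store instead.

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticMemory:
    def __init__(self):
        self.entries = []  # list of (embedding, fact) pairs

    def write(self, fact):
        self.entries.append((embed(fact), fact))

    def search(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [fact for _, fact in ranked[:k]]

memory = SemanticMemory()
memory.write("Acme Corp's three main competitors are Initech, Globex, and Umbrella.")
memory.write("The user prefers weekly summary reports on Mondays.")
```

Only the top-k retrieved facts would be placed into the prompt, not the whole store.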
Procedural Memory
Encoded in the model weights themselves or in the system prompt as persistent instructions. This is how the agent knows how to behave — its identity, capabilities, and rules — rather than what it knows about the world.
The Memory Problem in Long-Running Agents
As an agent executes a long task, its context fills with tool call results, intermediate reasoning, and conversation history. Eventually it hits the context window limit. Good memory management strategies include:
- Summarization — Periodically compress earlier conversation turns into a concise summary.
- Selective retention — Only keep the most relevant tool results; offload the rest to external storage.
- Retrieval on demand — Store all observations externally and retrieve only what is needed at each step.
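The summarization strategy can be sketched as follows. The thresholds and the summarize() function are illustrative placeholders; a real agent would ask the model itself to produce an abstractive summary of the old turns:

```python
# Summarization sketch: once the history exceeds a budget, the oldest turns
# are collapsed into a single summary message. summarize() is a hypothetical
# stand-in for an LLM call that would write a real abstractive summary.

MAX_MESSAGES = 6   # compress once the history grows past this
KEEP_RECENT = 4    # how many recent messages survive verbatim

def summarize(messages):
    # Placeholder: a real agent would prompt the model to summarize these turns.
    return f"Summary of {len(messages)} earlier messages."

def compact(history):
    if len(history) <= MAX_MESSAGES:
        return history
    old, recent = history[:-KEEP_RECENT], history[-KEEP_RECENT:]
    summary_msg = {"role": "system", "content": summarize(old)}
    return [summary_msg] + recent  # one summary replaces many old turns

history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
history = compact(history)
```

Running compact periodically keeps the window bounded while preserving a compressed trace of everything the agent has done.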
Memory and KnowledgeSDK
KnowledgeSDK's /v1/search endpoint enables semantic memory retrieval: extracted knowledge is stored in a per-account vector index, and agents can search it with natural-language queries to retrieve relevant facts without needing to re-extract source pages.
A practical pattern:
- The agent extracts a company's website with /v1/extract — structured data is indexed automatically.
- In a later session, the agent queries /v1/search for facts about that company — no re-extraction needed.
- Only the retrieved snippets enter the context window, keeping it lean.
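The pattern above could be sketched like this. Only the two endpoint paths (/v1/extract and /v1/search) come from this article; the base URL, payload field names, and prompt format are illustrative assumptions — consult the KnowledgeSDK API reference for the real schema:

```python
# Sketch of the extract-then-search pattern. The endpoint paths appear in the
# text above; every field name and the base URL below are assumptions made
# for illustration, not the documented KnowledgeSDK schema.

BASE_URL = "https://api.example.com"  # placeholder base URL

def extract_request(url):
    # Session 1: extract a site; results are indexed automatically server-side.
    return {"endpoint": BASE_URL + "/v1/extract", "json": {"url": url}}

def search_request(query, limit=3):
    # Later session: query the index instead of re-extracting the source pages.
    return {"endpoint": BASE_URL + "/v1/search", "json": {"query": query, "limit": limit}}

def build_prompt(question, snippets):
    # Only the retrieved snippets enter the context window, keeping it lean.
    context = "\n".join("- " + s for s in snippets)
    return f"Facts retrieved from memory:\n{context}\n\nQuestion: {question}"

req = search_request("Who are Acme Corp's competitors?")
prompt = build_prompt("Who are Acme Corp's competitors?",
                      ["Acme competes with Initech.", "Acme competes with Globex."])
```

Each request dict would be POSTed with an HTTP client; the point of the sketch is that the second session sends a small search payload and a few snippets, never the full extracted site.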
Choosing the Right Memory Strategy
| Use Case | Recommended Approach |
|---|---|
| Short single-turn tasks | In-context memory only |
| Multi-turn conversations | In-context + summarization |
| Long-running research agents | External vector memory + selective retrieval |
| Shared knowledge across agents | Centralized external store with semantic search |