March 20, 2026 · 8 min read

Semantic Memory for AI Agents: Beyond Conversation History

Conversation history is just one type of agent memory. Semantic memory — structured knowledge about the world — is what lets agents reason about facts, not just recall chat logs.


Most tutorials on AI agent memory cover the same pattern: take the last N messages, stuff them into the context window, let the model "remember" what was said. This is conversation history — simple, useful, and widely understood.

What gets far less attention is semantic memory: the structured knowledge about the world that lets an agent reason about facts rather than just recall what was said in previous turns. The difference matters more than most developers realize, and the architecture required to support it is meaningfully different.

The Agent Memory Taxonomy

Cognitive science distinguishes four memory systems in humans: working memory, plus three forms of long-term memory (episodic, semantic, and procedural). The same taxonomy maps cleanly onto AI agent architecture.

1. Working memory — what is active right now. In agent terms, this is the context window: the current conversation, the retrieved documents, the tool call results that are live in the prompt. It is fast, immediately accessible, and limited in size.

2. Episodic memory — what happened in past experiences. For agents, this is the conversation history across sessions: what the user said last week, what decisions were made in previous interactions, what topics have already been covered. Tools like Mem0, Zep, and Supermemory specialize in this layer.

3. Semantic memory — structured knowledge about the world. This is not what happened — it is what is true. "The company's pricing plan costs $29/month." "The API rate limit is 100 requests per minute." "Competitor X launched a new feature last Tuesday." These are facts about the world that exist independently of any conversation.

4. Procedural memory — how to do things. In agents, this manifests as the system prompt (role definition, behavioral rules), few-shot examples (demonstrated patterns), and tool definitions (available capabilities). It is implicit knowledge baked into the agent's configuration rather than retrieved dynamically.

Most developer tutorials address working memory (context window management) and episodic memory (conversation history persistence). Semantic memory is the layer that most production agents are missing.

What Semantic Memory Looks Like in Practice

Semantic memory is not a chat log. It is not a user fact ("Alice works at Acme Corp"). It is knowledge about the external world that the agent needs to reason about accurately.

Examples of semantic memory facts:

  • "The Starter plan includes 1,000 API calls per month and costs $29."
  • "The competitor's latest release added support for Python 3.12."
  • "The company's headquarters moved to Austin in 2024."
  • "The regulatory requirement changed in Q1 2026."

These facts are true regardless of what any user has ever said in any conversation. They need to be sourced from documents, websites, and databases — not extracted from chat history.

An agent without semantic memory has to either hallucinate these facts (dangerous), refuse to answer questions about them (frustrating), or be given all relevant facts in a static system prompt (brittle, stale, and quickly too large to fit).

Where Semantic Memory Comes From

Episodic memory is populated by conversations — the agent observes what users say and stores it. Semantic memory requires a different ingestion pipeline.

Sources of semantic memory:

  • Your own documentation and knowledge base — product docs, API references, support articles, FAQs
  • Competitor and market intelligence — extracted from competitor websites, product pages, pricing pages
  • Industry content — standards documents, regulatory filings, news, research papers
  • External data feeds — structured data from APIs, converted to indexed text

All of these need to be extracted, chunked, embedded, and stored in a searchable vector store. The retrieval loop — query arrives, search semantic memory, inject relevant facts into context — is the same regardless of source.

The Semantic Memory Retrieval Loop

The loop is simple but powerful:

  1. User sends a message to the agent
  2. Search semantic memory with the user's query
  3. Retrieve the top-K most relevant fact chunks
  4. Inject retrieved facts into the system prompt as context
  5. LLM generates a response grounded in retrieved facts

This is standard RAG, applied specifically to the semantic memory layer. The architectural insight is framing it as memory — something the agent knows about the world — rather than just "search."
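The five steps above can be sketched directly. The `MemoryStore` and `Llm` interfaces here are hypothetical stand-ins for whatever vector store and model client you use:

```typescript
// Hypothetical interfaces: any vector store with semantic search and any
// chat-completion client will do.
interface MemoryStore { search(query: string, limit: number): Promise<string[]> }
interface Llm { complete(system: string, user: string): Promise<string> }

async function answerWithSemanticMemory(
  memory: MemoryStore,
  llm: Llm,
  userMessage: string, // Step 1: the incoming user message
): Promise<string> {
  // Steps 2-3: search semantic memory, keep the top-K fact chunks
  const facts = await memory.search(userMessage, 5);

  // Step 4: inject the retrieved facts into the system prompt
  const system = `Answer using the facts below.\n\nFacts:\n${facts.join('\n')}`;

  // Step 5: generate a response grounded in those facts
  return llm.complete(system, userMessage);
}
```

Everything else in this post is a refinement of this loop: where the facts come from, how they stay fresh, and how they compose with the other memory types.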

KnowledgeSDK as a Semantic Memory Layer

KnowledgeSDK's POST /v1/extract endpoint handles the ingestion side: extract a URL, and its content is automatically indexed in your private collection. POST /v1/search handles retrieval: query your collection semantically to find relevant facts.

This makes KnowledgeSDK a direct implementation of the semantic memory layer:

  • Ingestion: POST /v1/extract → web content becomes indexed knowledge
  • Retrieval: POST /v1/search → query returns relevant facts from that knowledge

The collection acts as your agent's semantic memory store. Facts are extracted from the web (or any URL) and retrieved on-demand during agent execution.
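The same two calls work over plain HTTP for stacks where the Node SDK is not an option. The endpoint paths come from this post; the base URL and the request/response field names are assumptions, so verify them against the API reference:

```typescript
// Sketch only: the /v1/extract and /v1/search paths are from the post,
// but the base URL and body field names are assumptions.
const BASE = 'https://api.knowledgesdk.com';

function buildRequest(path: '/v1/extract' | '/v1/search', body: object) {
  return {
    url: `${BASE}${path}`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${process.env.KNOWLEDGE_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    },
  };
}

// Ingestion: extract a URL into the collection
async function extract(url: string) {
  const { url: endpoint, init } = buildRequest('/v1/extract', { url });
  return fetch(endpoint, init);
}

// Retrieval: semantic search over the collection
async function search(query: string, limit = 5) {
  const { url: endpoint, init } = buildRequest('/v1/search', { query, limit });
  const res = await fetch(endpoint, init);
  return res.json();
}
```

Building the request in a helper keeps auth and serialization in one place and makes the sketch testable without network access.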

Contrast with Episodic Memory Tools

Mem0, Zep, and Supermemory are episodic memory tools. They track what was said within your application — user preferences expressed in conversation, decisions made across sessions, the relationship history between user and agent.

The distinction is the data source:

  • Episodic tools: populated by what users say → stores user-stated facts
  • Semantic tools: populated by external sources → stores world facts

An agent needs both. A user asking "what plan should I upgrade to?" needs:

  • Episodic context: what they told you about their usage needs last week (Mem0/Zep)
  • Semantic context: what the plans actually include and cost right now (KnowledgeSDK)

Neither alone is sufficient. The best agents compose all memory types.

Building a Semantic Memory Store

Four steps to implement the semantic memory layer:

Step 1: Define your knowledge domains. What does your agent need to know about the world? Competitor products, your own documentation, industry standards, regulatory requirements. Each domain is a set of source URLs or documents.

Step 2: Extract each domain's sources.

import KnowledgeSDK from '@knowledgesdk/node';
const ks = new KnowledgeSDK({ apiKey: process.env.KNOWLEDGE_API_KEY });

// Each domain is a named set of source URLs the agent should know about
const knowledgeDomains = {
  ownDocs: ['https://docs.yourproduct.com', 'https://docs.yourproduct.com/api'],
  competitors: ['https://competitor-a.com/pricing', 'https://competitor-b.com/features'],
  industry: ['https://industry-standard.org/specs'],
};

// Extract every source URL; each one is indexed into your collection
for (const urls of Object.values(knowledgeDomains)) {
  for (const url of urls) {
    await ks.extract({ url });
  }
}

Step 3: Search on every agent query.
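As a sketch, Step 3 can reuse the `ks` client from Step 2. The `results[].content` response shape here is an assumption borrowed from the composed agent example later in this post:

```typescript
// Assumed client shape, matching the composed example later in the post
interface SemanticStore {
  search(args: { query: string; limit: number }): Promise<{ results: { content: string }[] }>;
}

// Steps 2-4 of the retrieval loop: search, take top-K, build a context block
async function retrieveFacts(ks: SemanticStore, query: string): Promise<string> {
  const { results } = await ks.search({ query, limit: 5 });
  return results.map(r => r.content).join('\n\n');
}
```

The returned block drops straight into the system prompt alongside whatever episodic context you retrieve.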

Step 4: Update periodically or on change. Semantic memory goes stale as the world changes. Implement TTL-based re-indexing for each domain based on how frequently that type of content changes.

Freshness: Semantic Memory's Achilles Heel

Episodic memory records what was said — that is immutable. Semantic memory records what is true — and that changes. A pricing page from six months ago is not just stale; it is actively misleading.

The fix is TTL-based re-indexing. Tag each extracted URL with a freshness tier (high-churn: daily, medium-churn: weekly, low-churn: monthly), and run scheduled re-extraction jobs for each tier. KnowledgeSDK's re-extraction automatically replaces the previous indexed content for the same URL — no manual deduplication required.
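As a sketch, the tiering can be a small lookup plus a filter that a scheduler runs periodically. The interval values and the `lastExtracted` bookkeeping are assumptions, not part of the KnowledgeSDK API:

```typescript
// Freshness tiers in milliseconds -- tune per content type (assumed values)
const TIER_TTL_MS = {
  high: 24 * 60 * 60 * 1000,        // daily: pricing pages, news
  medium: 7 * 24 * 60 * 60 * 1000,  // weekly: product docs
  low: 30 * 24 * 60 * 60 * 1000,    // monthly: standards, specs
} as const;

type Tier = keyof typeof TIER_TTL_MS;
interface TrackedUrl { url: string; tier: Tier; lastExtracted: number }

// Return the URLs whose TTL has elapsed and are due for re-extraction
function dueForReindex(urls: TrackedUrl[], now: number): TrackedUrl[] {
  return urls.filter(u => now - u.lastExtracted >= TIER_TTL_MS[u.tier]);
}

// Run on a schedule: re-extracting a URL replaces its previously indexed
// content, so no manual deduplication is needed
async function reindex(
  ks: { extract(args: { url: string }): Promise<unknown> },
  urls: TrackedUrl[],
) {
  for (const u of dueForReindex(urls, Date.now())) {
    await ks.extract({ url: u.url });
    u.lastExtracted = Date.now();
  }
}
```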

Composing All Four Memory Types

The agents with the highest response quality use all four memory types together:

// Assumes pre-initialized clients: `mem0` (episodic memory), `ks`
// (KnowledgeSDK semantic memory), and `llm` (chat completion)
async function agent(userId: string, message: string): Promise<string> {
  // Working memory: current conversation (already in context)

  // Episodic memory: what this user has told us previously
  const userHistory = await mem0.search({ userId, query: message });

  // Semantic memory: what the world looks like right now
  const worldKnowledge = await ks.search({ query: message, limit: 5 });

  // Procedural memory: how the agent should behave (system prompt)
  const systemPrompt = `
You are a helpful assistant. Use the following context to answer accurately.

User history: ${userHistory.map(m => m.memory).join('\n')}

Current world knowledge:
${worldKnowledge.results.map(r => r.content).join('\n\n')}
  `.trim();

  const response = await llm.complete({
    system: systemPrompt,
    messages: [{ role: 'user', content: message }],
  });

  // Store this interaction in episodic memory for next time
  await mem0.add({ userId, messages: [{ role: 'user', content: message }] });

  return response.content;
}

This agent knows what the user has told it (episodic), what is true about the world (semantic), and how it should behave (procedural) — all within a single context window (working memory).

The semantic memory layer is what elevates an agent from a sophisticated chatbot to a genuinely knowledgeable assistant. Conversation history alone produces agents that remember you but cannot tell you what is actually true. Semantic memory produces agents that know things — and stay current as those things change.


Related Articles

  • Should You Build Your Own Knowledge Extraction Pipeline?
  • Context Engineering: The Developer's Complete Guide (2026)
  • Memory Layer vs Knowledge Extraction: Which Does Your AI Agent Need?
  • Semantic Scraping: Beyond Raw HTML Extraction for AI Applications