tutorial · March 20, 2026 · 8 min read

Building Stateful AI Agents With Live Web Knowledge

Stateless agents forget everything. Stateful agents with web knowledge are unstoppable. Here's how to build agents that persist context AND stay current with the web.

Most AI agent tutorials fall into one of two categories: they show you how to persist conversation history across sessions, or they show you how to plug in a web search tool. Very few show you how to do both — and how to make them work together properly.

This article covers both dimensions. By the end, you'll have a blueprint for agents that remember what users said last week AND know what the web says right now.

The Two Dimensions of Agent State

When developers talk about "stateful agents," they usually mean memory of past interactions. The agent remembers that you prefer Python over JavaScript, that your company is called Acme Corp, that you asked about competitor pricing last Tuesday.

But there's a second dimension of state that gets far less attention: knowledge of the current web. What did your competitor announce this week? What does their pricing page say today vs. three months ago? What's in the latest version of a library's changelog?

A truly capable agent needs both:

  1. Memory state — persistent recall of past interactions and user context
  2. Knowledge state — current, accurate understanding of external web content

Most production agents handle dimension 1 reasonably well. Dimension 2 is where they fall apart, because the web changes and training data doesn't.

The Stateless Problem

A stateless agent starts fresh every turn. No memory of previous sessions, no awareness of web content beyond its training cutoff. This design is simple and fast — and it fails in almost every real use case.

Here's what stateless looks like in practice. A user asks a competitive intelligence agent: "How does Acme's pricing compare to their competitors?" A stateless agent either:

  • Hallucinates based on potentially outdated training data
  • Admits it doesn't know and asks the user to paste in the competitor's pricing page
  • Returns confidently wrong information because it's reasoning from stale knowledge

None of these outcomes is acceptable in a production tool. Users stop trusting agents that get basic factual questions wrong.

Architecture: Memory + Knowledge

The solution is a two-store architecture:

User Message
     │
     ▼
  Agent Core
     │
     ├──► Memory Store (past interactions)
     │         └── Returns: conversation history, user preferences, prior conclusions
     │
     ├──► Knowledge Store (web content)
     │         └── Returns: current web content, indexed pages, fresh extractions
     │
     ▼
  LLM Call (system + memory + knowledge + user message)
     │
     ▼
  Response → stored back to Memory Store

The Memory Store handles persistence: what has this user asked before, what conclusions were reached, what preferences have they expressed. This can be a simple database table for small-scale apps, or a purpose-built memory layer like Mem0 or Zep for production.

The Knowledge Store handles web currency: what does the web say right now about topics relevant to this query. This is where KnowledgeSDK lives.
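The split between the two stores can be made concrete as a pair of interfaces. The names and method signatures below are illustrative, not a published API — they just capture the minimal contract each store has to satisfy:

```typescript
// Message shape shared by both stores.
interface Message {
  role: "user" | "assistant" | "system";
  content: string;
}

// Minimal contracts for the two stores. Names and signatures
// here are illustrative, not a published API.
interface MemoryStore {
  load(sessionId: string): Promise<Message[]>; // past turns
  append(sessionId: string, msg: Message): Promise<void>;
}

interface KnowledgeStore {
  search(query: string): Promise<{ title: string; url: string; content: string }[]>;
}

// Trivial in-memory MemoryStore, enough to exercise the contract
// before swapping in a real database.
class InMemoryMemoryStore implements MemoryStore {
  private sessions = new Map<string, Message[]>();

  async load(sessionId: string): Promise<Message[]> {
    return this.sessions.get(sessionId) ?? [];
  }

  async append(sessionId: string, msg: Message): Promise<void> {
    const history = await this.load(sessionId);
    this.sessions.set(sessionId, [...history, msg]);
  }
}
```

Anything that implements these two interfaces can back the architecture diagram above — which is what makes the later swap to production storage a drop-in change.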

Setting Up Web Knowledge with KnowledgeSDK

Before your agent can search web knowledge, you need to index it. The pattern is: identify the URLs relevant to your use case, extract them once, then set up refresh logic.

import KnowledgeSDK from "@knowledgesdk/node";

const ks = new KnowledgeSDK({ apiKey: process.env.KNOWLEDGESDK_API_KEY! });

// Index your knowledge base upfront
const competitorUrls = [
  "https://competitor-a.com/pricing",
  "https://competitor-a.com/features",
  "https://competitor-b.com/pricing",
  "https://competitor-b.com/changelog",
];

for (const url of competitorUrls) {
  const result = await ks.extract({ url });
  console.log(`Indexed: ${url} (${result.title})`);
}

POST /v1/extract does the heavy lifting: renders JavaScript, extracts clean text, chunks it, embeds it, and indexes it. The content is immediately searchable via POST /v1/search.
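If you're calling the API without the Node SDK, the same operation is a plain HTTP POST. The base URL, header names, and body fields below are assumptions for illustration — check the API reference for the actual contract:

```typescript
// Build the request options for POST /v1/extract without the SDK.
// NOTE: the auth header and body shape are assumptions, not
// confirmed API documentation.
function buildExtractRequest(url: string, apiKey: string) {
  return {
    method: "POST" as const,
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url }),
  };
}

// Usage (hypothetical base URL):
// await fetch(
//   "https://api.knowledgesdk.com/v1/extract",
//   buildExtractRequest("https://competitor-a.com/pricing", apiKey)
// );
```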

Building the Stateful Agent

Here's a complete Node.js implementation of a stateful research agent with both memory and web knowledge:

import KnowledgeSDK from "@knowledgesdk/node";
import OpenAI from "openai";

const ks = new KnowledgeSDK({ apiKey: process.env.KNOWLEDGESDK_API_KEY! });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! });

interface Message {
  role: "user" | "assistant" | "system";
  content: string;
}

// In-memory store for this example; use a DB in production
const sessionHistory: Record<string, Message[]> = {};

async function agentTurn(
  sessionId: string,
  userMessage: string
): Promise<string> {
  // 1. Load conversation history for this session
  if (!sessionHistory[sessionId]) {
    sessionHistory[sessionId] = [];
  }
  const history = sessionHistory[sessionId];

  // 2. Query the knowledge store for relevant web content
  const knowledgeResults = await ks.search({
    query: userMessage,
    limit: 3,
  });

  const webContext = knowledgeResults.results
    .map((r) => `### ${r.title}\nSource: ${r.url}\n\n${r.content}`)
    .join("\n\n---\n\n");

  // 3. Build the message array for the LLM
  const messages: Message[] = [
    {
      role: "system",
      content: `You are a competitive intelligence assistant.
You have access to live web knowledge indexed from competitor sites.
Always cite the source URL when referencing web content.

## Current Web Knowledge
${webContext || "No relevant web content found for this query."}`,
    },
    ...history, // full conversation history for this session
    {
      role: "user",
      content: userMessage,
    },
  ];

  // 4. Call the LLM
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages,
  });

  const assistantMessage = response.choices[0].message.content ?? "";

  // 5. Persist both turns to session history
  history.push({ role: "user", content: userMessage });
  history.push({ role: "assistant", content: assistantMessage });

  return assistantMessage;
}

The key pattern: web knowledge is fetched fresh on every turn, but conversation history accumulates across turns. The agent always sees what the web says right now, plus everything that's been discussed before.
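Step 3 of `agentTurn` is worth isolating: given accumulated history, fresh web context, and the new user message, it deterministically produces the message array the LLM sees. Pulled out as a pure function (with the `Message` shape repeated so the sketch stands alone), it's easy to unit-test without touching either store:

```typescript
// Same shape as the Message interface above.
interface Msg {
  role: "user" | "assistant" | "system";
  content: string;
}

// Pure assembly step: system prompt carrying web knowledge,
// then accumulated history, then the new user message.
function buildMessages(
  history: Msg[],
  webContext: string,
  userMessage: string
): Msg[] {
  return [
    {
      role: "system",
      content: `You are a competitive intelligence assistant.

## Current Web Knowledge
${webContext || "No relevant web content found for this query."}`,
    },
    ...history,
    { role: "user", content: userMessage },
  ];
}
```

Keeping assembly pure means memory persistence and knowledge retrieval can each be mocked independently when testing the agent loop.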

Real Use Case: Competitive Intelligence Agent

Consider a competitive intelligence agent for a SaaS company. The knowledge base is indexed with competitor pricing pages, feature comparison pages, and changelog entries.

Turn 1: "What's Acme Corp's starter plan?"

  • Memory: empty (first session)
  • Web knowledge: returns pricing page chunk → "$29/month, 5 users, 10GB storage"
  • Response: accurate, sourced answer

Turn 2: "How does that compare to what you found last week?"

  • Memory: has the previous answer in history
  • Web knowledge: queries again, returns current pricing
  • Response: "Last week you saw $29/month. Currently it still shows $29/month — no change detected."

Turn 3 (a month later, new session): "What's Acme's pricing now?"

  • Memory: this is a new session, so history is clean
  • Web knowledge: returns the latest indexed version of the pricing page
  • Response: accurate current answer, even if pricing changed

The memory dimension ensures continuity within a session. The knowledge dimension ensures accuracy regardless of when the session happens.

Keeping Knowledge Fresh

A static index goes stale. Competitor pricing changes. Changelogs get new entries. New features get announced.

The simplest freshness strategy is scheduled re-extraction:

import cron from "node-cron";

// Re-extract high-change pages daily
cron.schedule("0 6 * * *", async () => {
  const highChangePagesUrls = [
    "https://competitor.com/pricing",
    "https://competitor.com/changelog",
  ];

  for (const url of highChangePagesUrls) {
    await ks.extract({ url });
    console.log(`Re-indexed: ${url}`);
  }
});

For production, you want smarter freshness: webhook triggers that fire when a monitored page changes, so you re-extract only when content actually updates rather than on a blind schedule. KnowledgeSDK supports this pattern — the re-extraction call overwrites the previous indexed version, so search always returns current content.
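The event-driven variant can be sketched with Node's built-in HTTP server. The webhook payload shape below is an assumption — the real KnowledgeSDK event format may differ — but the pattern is the point: re-extract only the page the event names.

```typescript
import http from "node:http";

// Hypothetical webhook payload — the real KnowledgeSDK event
// shape may differ; this sketches the pattern, not the API.
interface PageChangeEvent {
  event: string; // e.g. "page.changed"
  url: string;   // the monitored page that changed
}

// Re-extract only when the event signals an actual content change.
function urlToReextract(payload: PageChangeEvent): string | null {
  return payload.event === "page.changed" ? payload.url : null;
}

// Minimal receiver; call server.listen(3000) to run it for real.
const server = http.createServer(async (req, res) => {
  if (req.method !== "POST" || req.url !== "/webhooks/knowledge") {
    res.writeHead(404).end();
    return;
  }
  let body = "";
  for await (const chunk of req) body += chunk;
  const url = urlToReextract(JSON.parse(body));
  if (url) {
    // await ks.extract({ url }); // ks = KnowledgeSDK client from earlier
    console.log(`Re-indexed on change: ${url}`);
  }
  res.writeHead(204).end();
});
```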

Performance: Cache the Knowledge Queries

If your agent operates in a high-traffic environment, hitting the knowledge API on every single turn adds latency. A simple cache with a short TTL solves this:

const knowledgeCache = new Map<string, { results: any; ts: number }>();
const CACHE_TTL = 5 * 60 * 1000; // 5 minutes

async function searchWithCache(query: string) {
  const cached = knowledgeCache.get(query);
  if (cached && Date.now() - cached.ts < CACHE_TTL) {
    return cached.results;
  }

  const results = await ks.search({ query, limit: 3 });
  knowledgeCache.set(query, { results, ts: Date.now() });
  return results;
}

A five-minute TTL is short enough that users rarely see stale results, and long enough to dramatically reduce API calls during a busy session.

Moving to Production

The in-memory session store in the example above won't survive a server restart. For production:

  • Memory persistence: store conversation history in Postgres or Redis, keyed by session ID
  • Memory scaling: use Mem0 or Zep for semantic memory compression — they summarize old turns rather than keeping them verbatim, saving context window budget
  • Knowledge freshness: set up scheduled re-extraction for pages that change frequently; use KnowledgeSDK webhooks for event-driven updates
  • Search caching: Redis with a 5-minute TTL per query string is a solid starting point

The architecture doesn't change — just swap in production-grade storage for the in-memory stubs.
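To show what the memory-compression bullet means in miniature: libraries like Mem0 and Zep produce an actual LLM-generated summary of old turns, but the shape of the operation can be sketched with a naive stand-in that just collapses older turns into one placeholder message:

```typescript
interface Turn {
  role: "user" | "assistant";
  content: string;
}

// Naive stand-in for semantic memory compression: keep the last
// `keep` turns verbatim and collapse everything older into a
// single placeholder turn. Mem0/Zep would replace the placeholder
// with a real summary; this sketch only truncates.
function compressHistory(history: Turn[], keep: number): Turn[] {
  if (history.length <= keep) return history;
  const dropped = history.length - keep;
  const summary: Turn = {
    role: "assistant",
    content: `[Summary of ${dropped} earlier turns omitted for context budget]`,
  };
  return [summary, ...history.slice(-keep)];
}
```

Run this before each LLM call and the context window stays bounded no matter how long the session gets.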

Summary

Most agents are either stateful (they remember conversations) or web-aware (they can access current information), but rarely both. The agents that are genuinely useful in production are the ones that handle both dimensions well.

The pattern is straightforward: a memory store for conversation history, a knowledge store for web content, and an assembly step that pulls from both before each LLM call. KnowledgeSDK handles the knowledge store end of that equation — extract once, search many times, re-extract when things change.

Build both dimensions from the start and your agents won't just remember what users told them — they'll actually know what's true right now.

