Tutorial · March 20, 2026 · 13 min read

Context Engineering with Live Web Data: Keep Your AI Agents Current

Context engineering is the defining AI skill of 2026. Learn how to pipe live web data into agent context using KnowledgeSDK — just-in-time scraping, webhooks, and temporal metadata.


The most important shift in AI engineering in early 2026 was not a new model release. It was a conceptual reframe: the recognition that what you put in the context window matters more than which model you use.

Anthropic called this "context engineering" in their Q1 2026 blog post — the discipline of constructing context windows that give models the right information, in the right format, at the right level of granularity. The framing resonated immediately with developers who had been noticing the same thing empirically: swapping from GPT-4o to Claude 3.5 Sonnet gave marginal improvements, but improving the quality of the retrieved context transformed output quality.

Context engineering encompasses many things: how you chunk documents, how you rank retrieved results, how you format system prompts, how you structure conversation history. But one dimension that is often underweighted is freshness.

A context window filled with stale information degrades answer quality in ways that feel mysterious until you identify the root cause. An AI agent that confidently answers based on web data from six months ago is not less intelligent — it is working with bad inputs.

This article covers the patterns for piping live web data into agent context windows using KnowledgeSDK, and how to make freshness a first-class property of your context engineering practice.


The Freshness Problem in AI Contexts

Language models have a training cutoff. This is widely understood. Less understood is that RAG systems and retrieval pipelines have their own effective knowledge cutoffs — determined not by the model's training date, but by the last time your retrieval index was updated.

Consider a customer support agent backed by a documentation RAG system. If the documentation was indexed three months ago, the agent might:

  • Describe API endpoints that have since been deprecated
  • Quote pricing that has changed
  • Reference features that were renamed in a recent release
  • Omit new features that were launched after indexing

From the user's perspective, the agent sounds authoritative — it is citing real documentation. But the documentation it is citing is stale. The agent is confidently wrong.

This is a context engineering failure. The information in the context window does not reflect current reality.

Freshness is a dimension of context quality. A retrieval system that returns highly relevant but six-month-old content is failing in a way that high relevance scores will not reveal.
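The effective knowledge cutoff of a retrieval index can be made explicit: it is simply the oldest indexing timestamp among the documents you just retrieved. A minimal sketch (the list-of-ISO-timestamps input shape is an assumption for illustration, not a specific KnowledgeSDK schema):

```python
from datetime import datetime, timezone


def effective_knowledge_cutoff(indexed_at_timestamps: list[str]) -> datetime:
    """The oldest indexing time across retrieved documents: no answer built
    from this context can reflect source changes made after this moment."""
    parsed = [
        datetime.fromisoformat(ts.replace("Z", "+00:00"))
        for ts in indexed_at_timestamps
    ]
    return min(parsed)


cutoff = effective_knowledge_cutoff(
    ["2026-03-01T08:00:00Z", "2026-01-15T12:30:00Z"]
)
age_days = (datetime.now(timezone.utc) - cutoff).days
print(f"Effective knowledge cutoff: {cutoff:%Y-%m-%d} ({age_days} days ago)")
```

Surfacing this number per response is often the fastest way to discover that a "live" pipeline is quietly serving months-old context.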


Three Patterns for Live Web Data in Agent Context

There are three distinct patterns for incorporating live web data into agent contexts, each with different latency, freshness, and complexity trade-offs.

Pattern 1: Just-in-Time Scraping

Fetch web content at query time, immediately before generating the response. The context always reflects the current state of the page.

User Query
    │
    ▼
Determine relevant URLs
    │
    ▼
Extract URLs (KnowledgeSDK /v1/extract)
    │
    ▼
Inject fresh content into context window
    │
    ▼
LLM generates response

Trade-offs:

  • Maximum freshness (content is seconds old)
  • Adds 1–3 seconds of latency per page
  • Higher API cost per query (every query triggers scraping)
  • Works for any URL — no pre-indexing required
# Python — Just-in-time scraping pattern
import os
from knowledgesdk import KnowledgeSDK
from openai import OpenAI

client = KnowledgeSDK(api_key=os.environ["KNOWLEDGESDK_API_KEY"])
openai = OpenAI(api_key=os.environ["OPENAI_API_KEY"])


def answer_with_live_context(question: str, source_urls: list[str]) -> str:
    """Answer a question using live-scraped web content."""

    # Fetch current content from each URL
    context_parts = []
    for url in source_urls:
        result = client.scrape(url=url)
        scraped_at = result.metadata.get("scraped_at", "unknown time")
        context_parts.append(
            f"[Fetched at {scraped_at}]\nSource: {url}\n\n{result.markdown[:2500]}"
        )

    context = "\n\n---\n\n".join(context_parts)

    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": f"""You are a research assistant. Answer using the live web content below.
All content was fetched within the last few seconds and reflects the current state of these pages.

{context}""",
            },
            {"role": "user", "content": question},
        ],
    )

    return response.choices[0].message.content


# Example: Live competitor pricing check
answer = answer_with_live_context(
    question="What is OpenAI's current pricing for GPT-4o?",
    source_urls=["https://openai.com/pricing"],
)
print(answer)
// TypeScript — Just-in-time scraping in a Next.js API route
import { KnowledgeSDK } from "@knowledgesdk/node";
import { openai } from "@ai-sdk/openai";
import { streamText } from "ai";

const client = new KnowledgeSDK({ apiKey: process.env.KNOWLEDGESDK_API_KEY! });

export async function POST(req: Request) {
  const { question, sourceUrls } = await req.json();

  // Fetch all URLs in parallel for minimum latency
  const scraped = await Promise.all(
    (sourceUrls as string[]).map(async (url) => {
      const result = await client.scrape({ url });
      return `[Fetched: ${new Date().toISOString()}]\nSource: ${url}\n\n${result.markdown?.slice(0, 2500)}`;
    })
  );

  const context = scraped.join("\n\n---\n\n");

  // Stream the response for better perceived latency
  const result = streamText({
    model: openai("gpt-4o"),
    system: `You are a research assistant answering questions from live web content.
All content was fetched within the last few seconds.

${context}`,
    prompt: question,
  });

  return result.toDataStreamResponse();
}

Pattern 2: Background Refresh with Webhooks

Pre-index your source URLs, then use webhooks to re-index whenever the content changes. Queries hit the index (fast) rather than scraping live (slow).

[Setup]
Source URLs → Index (fast retrieval, fresh enough)
     │
     │  Content changes on source site
     ▼
Webhook fires → Re-index changed page

[Query]
User Query → Search Index → Fresh Content → LLM Response

Trade-offs:

  • Near-real-time freshness (minutes after source changes)
  • Sub-200ms retrieval latency at query time
  • Lower cost per query (scraping happens only when content changes)
  • Requires webhook setup and a receive endpoint
# Python — Set up background refresh for monitored pages
import os

from flask import Flask, request
from knowledgesdk import KnowledgeSDK
from openai import OpenAI

client = KnowledgeSDK(api_key=os.environ["KNOWLEDGESDK_API_KEY"])
openai = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
app = Flask(__name__)

# Initial indexing of source pages
MONITORED_URLS = [
    "https://openai.com/pricing",
    "https://anthropic.com/pricing",
    "https://stripe.com/pricing",
    "https://vercel.com/pricing",
]

def initial_index():
    """Index all monitored pages on startup."""
    for url in MONITORED_URLS:
        result = client.extract(url=url)
        print(f"Indexed: {url}")

    # Set up webhook to detect future changes
    webhook = client.webhooks.create(
        url="https://your-app.com/webhooks/content-changed",
        events=["page.changed"],
        urls=MONITORED_URLS,
    )
    print(f"Monitoring {len(MONITORED_URLS)} pages for changes (webhook: {webhook.id})")


@app.route("/webhooks/content-changed", methods=["POST"])
def handle_content_change():
    """Re-index a page when KnowledgeSDK detects it changed."""
    payload = request.json
    changed_url = payload["url"]
    change_type = payload.get("change_type", "unknown")

    print(f"Content changed [{change_type}]: {changed_url}")

    # Re-extract and re-index the changed page
    result = client.extract(url=changed_url)
    print(f"Re-indexed: {changed_url}")

    # Optionally: invalidate any caches that contained this content
    # invalidate_cache(changed_url)

    return {"status": "ok"}


def answer_from_index(question: str) -> str:
    """Answer using the pre-indexed (and auto-refreshed) knowledge."""
    results = client.search(query=question, limit=5)

    context = "\n\n---\n\n".join(
        f"Source: {r.url}\nIndexed: {r.metadata.get('indexed_at', 'unknown')}\n\n{r.content}"
        for r in results.results
    )

    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": f"Answer using the indexed content below:\n\n{context}",
            },
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

Pattern 3: Temporal Metadata Injection

Annotate every piece of retrieved content with its freshness metadata, and inject this metadata into the system prompt so the model can reason about data age and flag when information might be outdated.

This pattern works with both just-in-time scraping and pre-indexed content. The key addition is making the model aware of how old its context is.

# Python — Temporal metadata injection
# (reuses the `client` and `openai` instances defined in Pattern 1)
from datetime import datetime, timezone
from typing import Optional
import json


def format_context_with_temporal_metadata(
    content: str,
    source_url: str,
    scraped_at: Optional[str] = None,
    published_at: Optional[str] = None,
) -> str:
    """Format content with temporal metadata for context injection."""

    now = datetime.now(timezone.utc).isoformat()
    metadata = {
        "source": source_url,
        "retrieved_at": scraped_at or now,
        "published_at": published_at,
        "age_warning": None,
    }

    # Calculate content age and add warning if stale
    if scraped_at:
        scraped_dt = datetime.fromisoformat(scraped_at.replace("Z", "+00:00"))
        age_hours = (datetime.now(timezone.utc) - scraped_dt).total_seconds() / 3600

        if age_hours > 168:  # Older than 1 week
            metadata["age_warning"] = f"CAUTION: This content is {int(age_hours/24)} days old and may be outdated."
        elif age_hours > 24:  # Older than 1 day
            metadata["age_warning"] = f"NOTE: This content is {int(age_hours)} hours old."

    metadata_block = json.dumps(metadata, indent=2)
    return f"```metadata\n{metadata_block}\n```\n\n{content}"


def answer_with_temporal_awareness(question: str, source_urls: list[str]) -> str:
    """Answer with explicit temporal awareness in the context."""

    context_parts = []
    for url in source_urls:
        result = client.scrape(url=url)
        formatted = format_context_with_temporal_metadata(
            content=result.markdown[:2500],
            source_url=url,
            scraped_at=result.metadata.get("scraped_at"),
            published_at=result.metadata.get("published_at"),
        )
        context_parts.append(formatted)

    context = "\n\n---\n\n".join(context_parts)

    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": f"""You are a research assistant with access to web content.
Pay attention to the metadata blocks — they tell you when content was retrieved and whether it might be stale.
If content has an age_warning, mention this caveat in your answer.
If you are uncertain whether information is current, say so.

Web Content:
{context}""",
            },
            {"role": "user", "content": question},
        ],
    )

    return response.choices[0].message.content
// TypeScript — Temporal metadata injection
// (reuses the KnowledgeSDK `client` and `openai` provider from the Pattern 1 snippet)
import { generateText } from "ai";
interface ContentWithMetadata {
  source: string;
  retrievedAt: string;
  publishedAt?: string;
  ageWarningHours?: number;
  content: string;
}

function formatWithTemporalMetadata(item: ContentWithMetadata): string {
  const ageMs = Date.now() - new Date(item.retrievedAt).getTime();
  const ageHours = ageMs / (1000 * 60 * 60);

  let ageNote = "";
  if (ageHours > 168) {
    ageNote = `CAUTION: ${Math.floor(ageHours / 24)}-day-old content — may be outdated.`;
  } else if (ageHours > 24) {
    ageNote = `NOTE: ${Math.floor(ageHours)}-hour-old content.`;
  } else {
    ageNote = "Content is fresh (retrieved within the last 24 hours).";
  }

  return `[Source: ${item.source}]
[Retrieved: ${item.retrievedAt}]
[Status: ${ageNote}]

${item.content}`;
}

async function answerWithTemporalAwareness(
  question: string,
  sourceUrls: string[]
): Promise<string> {
  const scraped = await Promise.all(
    sourceUrls.map(async (url) => {
      const result = await client.scrape({ url });
      return formatWithTemporalMetadata({
        source: url,
        retrievedAt: new Date().toISOString(),
        content: result.markdown?.slice(0, 2500) ?? "",
      });
    })
  );

  const context = scraped.join("\n\n---\n\n");

  const { text } = await generateText({
    model: openai("gpt-4o"),
    system: `You are a research assistant. The content below includes temporal metadata.
If content has a CAUTION or NOTE label, acknowledge the potential staleness in your answer.

${context}`,
    prompt: question,
  });

  return text;
}

Choosing Between the Three Patterns

Scenario                                     | Recommended Pattern
---------------------------------------------|-------------------------------
Answering questions about any arbitrary URL  | Just-in-time scraping
Competitor or pricing monitoring             | Background refresh + webhooks
Documentation assistant                      | Background refresh + webhooks
News aggregation agent                       | Just-in-time scraping
Research agent with unpredictable sources    | Just-in-time scraping
Internal knowledge base                      | Background refresh + webhooks
Financial data (requires high freshness)     | Just-in-time scraping
Large corpus, cost-sensitive                 | Background refresh + webhooks

Most production systems combine all three: background refresh for known high-value sources, just-in-time scraping for user-provided URLs, and temporal metadata injection throughout to keep the model honest about data age.
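The hybrid setup reduces to a small routing decision: indexed retrieval for known, monitored sources; just-in-time scraping for everything else. A minimal sketch (the `KNOWN_SOURCES` set is an illustrative placeholder, not part of any SDK):

```python
from urllib.parse import urlparse

# Domains covered by background refresh + webhooks (illustrative list)
KNOWN_SOURCES = {"openai.com", "anthropic.com", "stripe.com"}


def route_retrieval(url: str) -> str:
    """Decide which pattern serves a given source URL."""
    domain = urlparse(url).netloc.removeprefix("www.")
    if domain in KNOWN_SOURCES:
        return "index"   # fast retrieval, kept fresh by webhooks
    return "scrape"      # unknown source: fetch just-in-time


print(route_retrieval("https://openai.com/pricing"))     # index
print(route_retrieval("https://example.com/changelog"))  # scrape
```

In either branch, temporal metadata injection applies unchanged, since it only annotates whatever content the router returns.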


Why Context Engineering Matters More Than Model Selection

A concrete illustration: consider an AI agent that answers questions about competitor pricing. Two configurations:

Configuration A: GPT-4o with pricing data indexed four months ago
Configuration B: GPT-4o mini with pricing data scraped in real time

Configuration B produces more accurate, trustworthy answers — despite using a cheaper model. The model processes the context; the context determines the answer. Fresh, relevant context from a smaller model beats stale context from a frontier model.

This is the core insight of context engineering applied to web data. The model is not the bottleneck for data-intensive AI applications. The retrieval pipeline is.

Improving context quality has higher ROI than upgrading models for most production AI use cases. And web data freshness is one of the highest-leverage dimensions of context quality.


Building a Context Quality Score

A practical addition to any retrieval pipeline is a context quality score — a lightweight metric that tells you how confident you should be in a response before it reaches the user.

# Python — Context quality scoring
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ContextQualityScore:
    relevance_score: float  # 0-1: semantic similarity to query
    freshness_score: float  # 0-1: how recent the content is
    source_count: int       # number of distinct sources
    overall_score: float    # weighted combination


def score_context_quality(
    query: str,
    retrieved_results: list,
    max_age_hours: float = 24.0,
) -> ContextQualityScore:
    """Score the quality of a retrieved context."""

    if not retrieved_results:
        return ContextQualityScore(0, 0, 0, 0)

    # Relevance: average of semantic similarity scores
    relevance_scores = [r.score for r in retrieved_results if hasattr(r, "score")]
    avg_relevance = sum(relevance_scores) / len(relevance_scores) if relevance_scores else 0.5

    # Freshness: how recent is the content?
    freshness_scores = []
    for result in retrieved_results:
        indexed_at = result.metadata.get("indexed_at")
        if indexed_at:
            age_hours = (
                datetime.now(timezone.utc)
                - datetime.fromisoformat(indexed_at.replace("Z", "+00:00"))
            ).total_seconds() / 3600
            # Score decays from 1.0 (fresh) to 0.0 (max_age_hours old)
            freshness = max(0, 1 - (age_hours / max_age_hours))
            freshness_scores.append(freshness)

    avg_freshness = sum(freshness_scores) / len(freshness_scores) if freshness_scores else 0.5

    # Overall: weight relevance higher than freshness for most use cases
    overall = (avg_relevance * 0.6) + (avg_freshness * 0.4)

    return ContextQualityScore(
        relevance_score=avg_relevance,
        freshness_score=avg_freshness,
        source_count=len(retrieved_results),
        overall_score=overall,
    )


def answer_with_quality_gate(question: str, min_quality: float = 0.6) -> dict:
    """Answer only if context quality meets the threshold."""
    results = client.search(query=question, limit=5)
    quality = score_context_quality(question, results.results)

    if quality.overall_score < min_quality:
        # Fall back to just-in-time scraping for fresher context
        # or flag for human review
        return {
            "answer": None,
            "quality_score": quality.overall_score,
            "needs_refresh": True,
            "reason": f"Context quality {quality.overall_score:.2f} below threshold {min_quality}",
        }

    context = "\n\n".join(r.content for r in results.results)
    # generate_answer: your LLM call, e.g. the chat-completion helper from Pattern 1
    answer = generate_answer(question, context)

    return {
        "answer": answer,
        "quality_score": quality.overall_score,
        "needs_refresh": False,
        "freshness_score": quality.freshness_score,
    }

Context Engineering as a Discipline

The patterns in this article are part of a broader shift in how AI systems are built. In 2024, the conversation was about prompt engineering — how to phrase instructions to get better outputs. In 2025, it shifted to RAG architecture — how to retrieve and inject relevant documents.

In 2026, context engineering is the synthesis: treating the entire context window as a design surface. What information should be in it? How fresh should it be? How should it be formatted? How much should you inject before it becomes noise?
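The "how much before it becomes noise" question usually reduces to a context budget. A rough sketch of greedy budget-fitting, using a crude characters-per-token estimate rather than a real tokenizer (both the ratio and the default budget are illustrative assumptions):

```python
def fit_to_budget(
    chunks: list[str], max_tokens: int = 8000, chars_per_token: int = 4
) -> list[str]:
    """Greedily keep chunks (assumed pre-sorted by relevance) until the
    approximate token budget is exhausted."""
    budget_chars = max_tokens * chars_per_token
    kept, used = [], 0
    for chunk in chunks:
        if used + len(chunk) > budget_chars:
            break
        kept.append(chunk)
        used += len(chunk)
    return kept


chunks = ["a" * 10_000, "b" * 10_000, "c" * 20_000]
print(len(fit_to_budget(chunks, max_tokens=5000)))  # 2 — the third chunk exceeds the budget
```

A production version would substitute a real tokenizer for the character estimate, but the shape of the decision is the same: rank first, then cut at the budget.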

Web data introduces a unique constraint: the source of truth is not under your control and changes without notice. Pricing pages update. Documentation changes. Blog posts get edited. API behaviors change.

Building systems that treat freshness as a first-class property — using just-in-time scraping for high-stakes queries, background refresh for monitored sources, and temporal metadata to make the model aware of its own uncertainty — is what separates production-grade AI agents from demo prototypes.

Start piping live web data into your agent context — get started free at knowledgesdk.com

