The most common complaint about AI assistants in business software is that they're generic. They don't know your industry. They don't know your company's products. They don't know your competitors. They give the same answer to a SaaS founder as they give to a brick-and-mortar retailer, because they have no context about either.
The standard solution people reach for is fine-tuning: train the model on your specific data and it'll be more relevant. Fine-tuning works — but it's expensive (thousands of dollars in compute), slow (days to weeks for a training run), and stale the moment the web changes. A fine-tuned model trained on competitor data from six months ago is already outdated.
There's a faster, cheaper, and more accurate approach: dynamic web context injection. Instead of baking knowledge into the model weights, you fetch the right web content at query time and inject it into the prompt.
## Three Approaches to AI Personalization
Option 1: Fine-tuning — train a model on your domain data. Pro: model "knows" your domain without explicit prompting. Con: expensive ($1,000–$10,000+ per run), takes days, becomes stale immediately, requires significant labeled data, and doesn't adapt to changing web content without re-training.
Option 2: Long system prompts — write a detailed system prompt that describes your user's industry, company, products, and competitors. Pro: cheap and immediate. Con: manually maintained (someone has to update it), token-heavy (more tokens = higher cost + longer latency), and doesn't scale when you have thousands of different user contexts.
Option 3: Dynamic web context — at query time, search a pre-indexed collection of relevant websites for the user's context and inject the top results into the prompt. Pro: always fresh (re-index nightly or on-change), cost-efficient (extraction is cents per URL, search is milliseconds), scales to thousands of user contexts without custom prompts.
Option 3 is almost always the right starting point. Fine-tuning makes sense only when you have a very stable domain with a massive amount of training data and extremely high volume at inference time.
## What Dynamic Web Context Looks Like
The mental model: every user has a private collection of websites that define their context. When they ask a question, you search their collection first and inject the relevant results into the system prompt.
For a sales rep using an AI assistant:
- Their collection includes: their company's product pages, the prospect's website, the prospect's recent press releases, the competitor the prospect mentioned.
- Query: "How should I position our product against [Competitor] for this prospect?"
- The AI sees the prospect's business focus, the competitor's feature list, and your product's differentiators — all live, not baked-in.
For an e-commerce operator:
- Their collection includes: their product catalog pages, three competitor storefronts, relevant industry news.
- Query: "Is our pricing competitive for waterproof hiking boots?"
- The AI sees current competitor pricing, not pricing from six months ago.
## Implementation
The implementation has three steps: extract relevant URLs when a user onboards, search their collection on every query, and inject the top results into the system prompt.
### Step 1: Extract Relevant URLs on User Onboarding
When a new user signs up or adds context, extract their relevant URLs into a private collection:
```typescript
import KnowledgeSDK from "@knowledgesdk/node";

const client = new KnowledgeSDK({ apiKey: process.env.KNOWLEDGESDK_API_KEY! });

async function onboardUser(userId: string, urls: string[]): Promise<void> {
  console.log(`Extracting ${urls.length} URLs for user ${userId}...`);
  for (const url of urls) {
    await client.extract({
      url,
      metadata: { userId, addedAt: new Date().toISOString() },
    });
    console.log(`Indexed: ${url}`);
  }
  console.log(`User ${userId} context ready.`);
}

// When a sales rep adds a prospect
await onboardUser("user_abc", [
  "https://prospect.com",
  "https://prospect.com/about",
  "https://news.ycombinator.com/item?id=prospect-funding-news",
  "https://competitor.com/pricing",
]);
```
Each URL takes a few seconds to extract and is immediately searchable. No infrastructure to set up, no scraping code to write.
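The loop above extracts one URL at a time, which is fine for a handful of links but slow for larger onboarding lists. A sketch of batched parallel extraction over the raw HTTP API; the batch size of 5 and the synchronous `/v1/extract` path (mirroring the SDK's `extract` call) are assumptions to tune against your actual endpoint and rate limits:

```typescript
// Split an array into fixed-size batches (pure helper).
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Extract URLs in parallel batches of 5 so onboarding with dozens of
// links finishes in a few round-trips instead of one per URL.
// The /v1/extract path is an assumption mirroring the SDK call above.
async function onboardUserParallel(userId: string, urls: string[]): Promise<void> {
  for (const batch of chunk(urls, 5)) {
    await Promise.all(
      batch.map((url) =>
        fetch("https://api.knowledgesdk.com/v1/extract", {
          method: "POST",
          headers: {
            "x-api-key": process.env.KNOWLEDGESDK_API_KEY!,
            "Content-Type": "application/json",
          },
          body: JSON.stringify({ url, metadata: { userId } }),
        })
      )
    );
  }
}
```

Waiting for each batch before starting the next keeps concurrency bounded, which matters if the extraction API rate-limits per key.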
### Step 2: Search on Every Query
Before calling the LLM, search the user's collection:
```typescript
async function getWebContext(userId: string, query: string): Promise<string> {
  const response = await fetch("https://api.knowledgesdk.com/v1/search", {
    method: "POST",
    headers: {
      "x-api-key": process.env.KNOWLEDGESDK_API_KEY!,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      query,
      limit: 3,
      filter: { userId }, // search only this user's collection
    }),
  });

  const { results } = (await response.json()) as {
    results: Array<{ title: string; content: string; url: string }>;
  };
  if (results.length === 0) return "";

  return results
    .map((r) => `[Source: ${r.url}]\n${r.content}`)
    .join("\n\n---\n\n");
}
```
### Step 3: Inject Into the System Prompt
```typescript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic();

async function personalizedAnswer(
  userId: string,
  userQuery: string
): Promise<string> {
  // Retrieve web context (fast: ~50-100ms)
  const webContext = await getWebContext(userId, userQuery);

  const systemPrompt = webContext
    ? `You are a helpful AI assistant. Answer based on the user's specific context below.
If the context doesn't cover the question, say so clearly.

CONTEXT FROM USER'S WEB SOURCES:
${webContext}`
    : "You are a helpful AI assistant.";

  const response = await anthropic.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    system: systemPrompt,
    messages: [{ role: "user", content: userQuery }],
  });

  return response.content[0].type === "text" ? response.content[0].text : "";
}
```
The LLM receives current, user-specific web context on every query. No fine-tuning required.
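One guard worth adding: even three search results can be long, and an oversized system prompt raises both cost and latency. A minimal truncation sketch; the 12,000-character cap (roughly 3,000 tokens at the common 4-characters-per-token rule of thumb) is an assumption to tune for your model and prompt budget:

```typescript
// Cap injected web context at a rough character budget so the system
// prompt stays within a sane token range. Cuts at the last source
// separator before the limit so a document is never half-truncated.
// The 12,000-char default (~3,000 tokens) is an assumption to tune.
function capContext(context: string, maxChars = 12_000): string {
  if (context.length <= maxChars) return context;
  const cut = context.lastIndexOf("\n\n---\n\n", maxChars);
  return cut > 0 ? context.slice(0, cut) : context.slice(0, maxChars);
}
```

Calling `capContext(webContext)` before building the system prompt keeps the cheapest failure mode (dropping the lowest-ranked source) instead of the worst one (a context-window error mid-conversation).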
## Keeping It Fresh
Web content changes. A competitor updates their pricing page. A prospect announces a new funding round. A product releases a new version.
Two strategies for freshness:
Scheduled re-extraction: Run a nightly job that re-extracts each URL in every user's collection. With KnowledgeSDK's async extraction (POST /v1/extract/async), you can queue hundreds of re-extractions efficiently:
```typescript
async function nightlyRefresh(userUrls: Map<string, string[]>): Promise<void> {
  const jobs: string[] = [];
  for (const [userId, urls] of userUrls) {
    for (const url of urls) {
      const response = await fetch("https://api.knowledgesdk.com/v1/extract/async", {
        method: "POST",
        headers: {
          "x-api-key": process.env.KNOWLEDGESDK_API_KEY!,
          "Content-Type": "application/json", // required for the JSON body
        },
        body: JSON.stringify({ url, metadata: { userId } }),
      });
      const { jobId } = (await response.json()) as { jobId: string };
      jobs.push(jobId);
    }
  }
  console.log(`Queued ${jobs.length} re-extractions for nightly refresh.`);
}
```
Webhook-based re-extraction: For high-priority URLs (competitor pricing pages, news feeds), configure a webhook that triggers re-extraction when the content changes. This gives you near-real-time freshness for the URLs that matter most.
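A minimal sketch of such a webhook receiver, assuming a JSON payload with `url` and `event` fields; the payload shape is an assumption, so check your change-monitoring provider's actual webhook format:

```typescript
import { createServer } from "node:http";

// Assumed change-notification payload — verify against your
// monitoring provider's documented webhook format.
interface ChangeEvent {
  url: string;
  event: string; // e.g. "content.changed"
}

// Decide whether an event warrants re-extraction (pure, testable).
function shouldReextract(e: ChangeEvent): boolean {
  return e.event === "content.changed" && e.url.startsWith("https://");
}

// Minimal webhook endpoint: on a content change, queue an async
// re-extraction via the /v1/extract/async endpoint shown earlier.
const server = createServer((req, res) => {
  let body = "";
  req.on("data", (c) => (body += c));
  req.on("end", async () => {
    const event = JSON.parse(body) as ChangeEvent;
    if (shouldReextract(event)) {
      await fetch("https://api.knowledgesdk.com/v1/extract/async", {
        method: "POST",
        headers: {
          "x-api-key": process.env.KNOWLEDGESDK_API_KEY!,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({ url: event.url }),
      });
    }
    res.writeHead(204).end();
  });
});
// server.listen(3000);
```

In production you would also verify the webhook's signature before trusting the payload.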
## Cost Comparison
Let's be concrete about costs. Fine-tuning GPT-4o-mini on a custom dataset:
- Data preparation: significant engineering time
- Training run: $1,000–$5,000 depending on dataset size
- Re-training when data is stale: repeat cost every 1–3 months
- Total first year: $5,000–$20,000+ for a well-maintained fine-tune
Dynamic web context with KnowledgeSDK:
- Initial extraction: ~$0.01–0.05 per URL (extraction cost)
- Storage: negligible
- Search per query: sub-cent
- Monthly re-extraction: same as initial, amortized over the month
- Total for 1,000 users with 10 URLs each, re-extracted weekly: ~43,000 extractions per month, or roughly $430/month at the low end ($0.01 per URL); substantially less when shared URLs (competitor sites, industry news) are deduplicated across users
For most applications, dynamic web context is one to two orders of magnitude cheaper than fine-tuning over the first year, and it stays current automatically.
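To run the same arithmetic for your own numbers, a small estimator helps; the per-URL price plugged in below is the low end of the range quoted above, not a guaranteed rate:

```typescript
// Back-of-envelope extraction cost estimator. All inputs are yours
// to vary; no deduplication of shared URLs is assumed here.
function monthlyExtractionCost(
  users: number,
  urlsPerUser: number,
  refreshesPerMonth: number,
  costPerUrl: number
): number {
  return users * urlsPerUser * refreshesPerMonth * costPerUrl;
}

// 1,000 users × 10 URLs, refreshed weekly (~4×/month) at $0.01/URL:
const estimate = monthlyExtractionCost(1000, 10, 4, 0.01); // ≈ $400
```

Deduplicating URLs that many users share pushes the real figure well below this naive upper bound.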
## Scaling Considerations
Small teams (under 1,000 users): One collection per user works well. Metadata filtering in /v1/search keeps search scoped to each user's context.
Mid-scale (1,000–50,000 users): Consider grouping users by segment (industry, company size, product tier) and maintaining shared collections per segment, with individual user collections for personalization on top. This reduces the number of unique URLs you need to extract and monitor.
Enterprise (50,000+ users): Shift to a single global collection with rich metadata tagging (industry, company, competitor, etc.). Search filters at query time scope results to the relevant context. This is more complex to maintain but dramatically reduces extraction costs at scale.
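At that scale, scoping happens entirely through search filters rather than separate collections. A sketch, where the filter fields (`industry`, `competitorOf`) and the scope-merging helper are illustrative assumptions; use whatever metadata tags you attach at extraction time:

```typescript
// Combine a shared segment scope with per-user tags; user tags win
// on conflicts (pure helper).
function mergeScopes(
  segment: Record<string, string>,
  user: Record<string, string>
): Record<string, string> {
  return { ...segment, ...user };
}

// Search the global collection scoped by metadata filters.
// Filter field names are illustrative — match your own tagging scheme.
async function searchScoped(
  query: string,
  filter: Record<string, string>
): Promise<Array<{ title: string; content: string; url: string }>> {
  const response = await fetch("https://api.knowledgesdk.com/v1/search", {
    method: "POST",
    headers: {
      "x-api-key": process.env.KNOWLEDGESDK_API_KEY!,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ query, limit: 3, filter }),
  });
  const { results } = (await response.json()) as {
    results: Array<{ title: string; content: string; url: string }>;
  };
  return results;
}

// e.g. a segment-level scope narrowed to one user:
// await searchScoped("pricing pressure in outdoor retail",
//   mergeScopes({ industry: "outdoor-retail" }, { userId: "user_abc" }));
```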
## The Personalization Stack
Dynamic web context gives you the foundation for AI personalization that's:
- Always fresh — web context is re-extracted on your schedule, not frozen at training time
- User-specific — each user's collection is scoped to their context
- Cost-effective — extraction costs cents; no $10,000 training runs
- Incrementally improvable — add more URLs, refine search, adjust injection — no retraining required
Fine-tuning still makes sense in specific cases: extremely high-volume, stable domains where per-query latency savings matter, or where the domain is so specialized that the base model performs poorly even with good context. But for the vast majority of "make my AI relevant to my business" use cases, dynamic web context is the right starting point.
Your users don't need a model trained on their industry. They need a model that can see their industry, right now.