There is a pattern that plays out at nearly every company that builds a support chatbot. The team exports 50 FAQ entries, indexes them in a vector database, and ships. The demo looks good — it handles the most common questions confidently.
Then users ask real questions. "Does your API support webhooks for subscription events?" "What happens to my data if I downgrade from Pro to Starter?" "Can I export my data in CSV format?" None of these are in the FAQ. The bot either hallucinates an answer or routes to a human.
The problem isn't the AI. It's the knowledge it has access to. A support agent indexed against 50 FAQ entries can only answer 50 questions reliably. Your documentation site, help center, API reference, changelog, and product pages collectively contain the answers to thousands of questions. The knowledge is already written down. The bot just can't see it.
What to Actually Index
Most teams stop at the FAQ page because it's easy. The right strategy is to index everything a support agent might need to answer a customer question:
- Documentation site — the full /docs tree, including every sub-page
- Help center / knowledge base — if you have a separate help.yourproduct.com
- API reference — for developer-facing products, API docs are where the technical questions live
- Product pages — feature descriptions, plan comparisons, pricing details
- Changelog — customers ask "when was X feature added?" and "does this work with the latest version?"
- Blog posts for use cases — "can I use your product for X?" is often answered in a use-case blog post
The guiding principle: index anything that would appear on a well-structured FAQ if someone had time to write it all down. Nobody ever does — which is why the full site exists.
Architecture
The flow for a well-built support agent:
- User submits a support question
- Semantic search over indexed knowledge returns the most relevant chunks
- If top result score exceeds the confidence threshold, generate an answer with citations
- If no good match, route to a human agent with the searched context as background
- When the human resolves the ticket, optionally add the Q&A pair to the index
The escalation step is critical. A support agent that says "I don't have reliable information about that — let me connect you with someone" is far better than one that answers confidently and incorrectly.
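The routing decision in steps 3 and 4 reduces to a small pure function. This is an illustrative sketch, not part of any SDK — the result shape (`score`, sorted descending) and the threshold value are assumptions:

```javascript
// Decide whether search results are strong enough to answer,
// or whether the question should go to a human.
// `results` is assumed to be an array of { score, content, source_url }
// sorted by descending score.
function routeQuestion(results, threshold = 0.70) {
  const topScore = results[0]?.score ?? 0;
  if (topScore < threshold) {
    // Hand the searched context to the human agent as background
    return { type: 'escalate', topScore, background: results.slice(0, 3) };
  }
  return {
    type: 'answer',
    topScore,
    context: results.filter(r => r.score >= threshold),
  };
}
```

Keeping this decision in one place makes the threshold easy to tune later without touching the handler.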
Step 1: Extract All Docs Pages
const ks = require('@knowledgesdk/node');

const client = new ks.KnowledgeSDK({ apiKey: 'knowledgesdk_live_...' });

// Index the full documentation site
const docsSitemap = await client.sitemap({ url: 'https://docs.yourproduct.com' });
for (const url of docsSitemap.urls) {
  await client.extractAsync({ url }); // queues extraction; processing continues in the background
}
console.log(`Queued ${docsSitemap.urls.length} documentation pages for indexing`);

// Also index the help center if separate
const helpSitemap = await client.sitemap({ url: 'https://help.yourproduct.com' });
for (const url of helpSitemap.urls) {
  await client.extractAsync({ url });
}

// Index product and pricing pages synchronously — there are only a few
const productUrls = [
  'https://yourproduct.com/pricing',
  'https://yourproduct.com/features',
  'https://yourproduct.com/changelog',
];
for (const url of productUrls) {
  await client.extract({ url });
}
For a typical SaaS product with 200–500 documentation pages, this initial indexing completes in 10–20 minutes. After that, search queries run in under 300ms.
Step 2: The Support Bot Handler
const express = require('express');
const OpenAI = require('openai');
const ks = require('@knowledgesdk/node');

const app = express();
app.use(express.json());

const client = new ks.KnowledgeSDK({ apiKey: process.env.KNOWLEDGESDK_API_KEY });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const CONFIDENCE_THRESHOLD = 0.70; // Route to human below this score

app.post('/support/ask', async (req, res) => {
  const { question, sessionId } = req.body;

  // Search indexed knowledge
  const searchResults = await client.search({
    query: question,
    limit: 5,
  });

  const topScore = searchResults.results[0]?.score ?? 0;

  // Escalate if no good match found
  if (topScore < CONFIDENCE_THRESHOLD) {
    return res.json({
      type: 'escalate',
      message: "I don't have reliable information about that in our documentation. Let me connect you with a support agent who can help.",
      agentContext: {
        question,
        topScore,
        topResults: searchResults.results.slice(0, 3),
      },
    });
  }

  // Build context with source attribution
  const relevant = searchResults.results.filter(r => r.score >= CONFIDENCE_THRESHOLD);

  const context = relevant
    .map(r => `[Source: ${r.source_url}]\n${r.content}`)
    .join('\n\n---\n\n');

  const citations = relevant.map(r => ({ url: r.source_url, title: r.title }));

  // Generate answer
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o',
    temperature: 0.1,
    messages: [
      {
        role: 'system',
        content: `You are a helpful support agent for ${process.env.PRODUCT_NAME}.
Answer questions using ONLY the provided documentation excerpts.
If the documentation doesn't cover the question, say you don't have that information and offer to connect the user with a support agent.
Always cite which documentation page supports your answer using [Source: URL] format.
Be concise and specific. Do not add information not present in the sources.`,
      },
      {
        role: 'user',
        content: `Customer question: ${question}\n\nRelevant documentation:\n${context}`,
      },
    ],
  });

  return res.json({
    type: 'answer',
    answer: completion.choices[0].message.content,
    citations,
    confidence: topScore,
  });
});

app.listen(3000);
The Escalation Logic
The confidence threshold of 0.70 is a starting point. Tune it based on your domain:
- For technical products where wrong answers are costly (wrong API usage, data loss), raise the threshold to 0.80
- For simple consumer products where questions are predictable, 0.65 may be sufficient
- After launch, track escalation rate — if it's above 30%, your index is missing content; if it's below 5%, your threshold may be too low
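Tracking the escalation rate can be as simple as a sliding window over recent interactions. A minimal in-memory sketch — a real deployment would persist this in whatever metrics store you already use, and the window size here is arbitrary:

```javascript
// Track escalation rate over the last N interactions so threshold
// drift is visible.
class EscalationTracker {
  constructor(windowSize = 500) {
    this.windowSize = windowSize;
    this.outcomes = []; // true = escalated, false = answered by the bot
  }

  record(escalated) {
    this.outcomes.push(escalated);
    if (this.outcomes.length > this.windowSize) this.outcomes.shift();
  }

  rate() {
    if (this.outcomes.length === 0) return 0;
    const escalations = this.outcomes.filter(Boolean).length;
    return escalations / this.outcomes.length;
  }

  // Flag the two failure modes from the tuning guidance above
  diagnosis() {
    const r = this.rate();
    if (r > 0.30) return 'index is likely missing content';
    if (r < 0.05) return 'threshold may be too low';
    return 'healthy';
  }
}
```

Call `record()` once per `/support/ask` request and check `diagnosis()` on a dashboard or in a daily report.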
When you escalate, pass the full context to the human agent. The agent receives: the customer's question, the top search results that were found (even if insufficient), and the confidence score. They can see what the bot found and either supplement it or correct the knowledge gap by adding new documentation.
The Citation Pattern
Every answer should include citations. This serves two purposes:
For users: They can verify the answer by reading the source. A user who gets an answer like "Yes, you can export data as CSV from the Settings → Data Export page [Source: docs.yourproduct.com/settings/data-export]" can click through and confirm. This builds trust in the bot's answers.
For your team: When a user disputes an answer, you can trace it to the exact documentation chunk that generated it. If the documentation was wrong or ambiguous, that's an easy fix.
Format citations as a list below the answer:
Answer: Yes, webhooks are supported for subscription events. You can configure them in Settings → Webhooks, and they'll fire on plan upgrades, downgrades, and cancellations.
Sources:
- Webhooks Configuration Guide (docs.yourproduct.com/webhooks)
- Subscription Events Reference (docs.yourproduct.com/api/subscription-events)
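Rendering that format is a one-liner worth centralizing. A sketch, assuming the `{ title, url }` citation shape built in the handler above:

```javascript
// Render an answer plus its citations in the "Sources:" list format.
// `citations` is assumed to be an array of { title, url }.
function formatAnswerWithSources(answer, citations) {
  if (citations.length === 0) return answer;
  const sources = citations
    .map(c => `- ${c.title} (${c.url})`)
    .join('\n');
  return `${answer}\n\nSources:\n${sources}`;
}
```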
Keeping Knowledge Fresh
Documentation changes with every product update. New features launch. Pricing changes. API endpoints get deprecated. A support bot running on stale docs gives confidently wrong answers about features that now work differently.
Two mechanisms to keep your index current:
Webhook-triggered re-extraction: Subscribe to publish events from your CMS or documentation platform. When a docs page is updated, re-extract it immediately.
app.post('/webhooks/docs-updated', async (req, res) => {
  const { url, updatedAt } = req.body;

  // Re-extract the updated page
  await client.extract({ url });

  console.log(`Re-indexed: ${url} (updated ${updatedAt})`);
  res.sendStatus(200);
});
Scheduled weekly re-index: For pages not covered by webhooks (third-party help centers, external docs), run a weekly re-index job. Most documentation doesn't change daily; weekly freshness is adequate for all but the fastest-moving products.
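A minimal sketch of the weekly sweep, assuming the same client methods used in Step 1 (`sitemap`, `extractAsync`); the schedule itself is just a timer, and in production you would more likely use a cron scheduler or your job queue:

```javascript
// Re-extract every page under a site root. Assumes the client exposes
// sitemap() and extractAsync() as shown in Step 1.
async function reindexSite(client, rootUrl) {
  const { urls } = await client.sitemap({ url: rootUrl });
  for (const url of urls) {
    await client.extractAsync({ url }); // queue for background re-extraction
  }
  return urls.length;
}

// Example schedule — once a week:
// setInterval(() => reindexSite(client, 'https://help.yourproduct.com'),
//   7 * 24 * 60 * 60 * 1000);
```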
The Metrics That Actually Matter
The metrics to watch for a support agent with web knowledge:
- First-contact resolution rate: Percentage of tickets resolved without human escalation. This is the primary success metric.
- Deflection rate: Percentage of support volume handled by the bot vs. routed to humans. A well-indexed bot should deflect 60–80% of routine questions.
- CSAT for bot-resolved tickets: If users rate bot answers lower than human answers, investigate: are citations wrong? Is the confidence threshold too low?
- Escalation reasons: Track what categories of questions escalate. If "billing questions" always escalate, add your billing FAQ and pricing docs to the index.
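Escalation reasons are easy to surface with a per-category counter. An in-memory sketch — the category labels are whatever your ticketing system assigns, not anything the bot produces:

```javascript
// Count escalations per question category so recurring knowledge
// gaps surface.
const escalationsByCategory = new Map();

function recordEscalation(category) {
  escalationsByCategory.set(category, (escalationsByCategory.get(category) ?? 0) + 1);
}

// Return categories sorted by escalation count, worst first
function topEscalationReasons(limit = 5) {
  return [...escalationsByCategory.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([category, count]) => ({ category, count }));
}
```

If `topEscalationReasons()` keeps showing the same category, that is the documentation to write or index next.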
Common Mistakes
Indexing only the FAQ page: The FAQ contains 50 questions. Your docs site contains answers to 5,000 questions. Index everything.
Not indexing the changelog: Users frequently ask "does X work with version Y?" or "when was Z feature added?" The changelog contains these answers. Index it.
No confidence threshold: A bot that always answers, even when it shouldn't, damages trust faster than a bot that sometimes says "I don't know." Set a threshold and route low-confidence queries to humans.
Not passing context to human agents: When a ticket escalates, the human agent needs context. They should see what the bot searched for and what it found (or didn't find). This makes handoffs faster and also surfaces documentation gaps — if the same question escalates repeatedly because the index has no good answer, that's a signal to write the missing documentation.
The investment in indexing your full documentation site pays back in the first week. Support volume drops. Response times drop. Human agents spend time on genuinely complex tickets rather than answering "how do I reset my password?" for the hundredth time.
The knowledge was already there. The bot just needed access to it.