Build an MCP Knowledge Server with KnowledgeSDK
Model Context Protocol (MCP) has become the standard way to extend AI clients — Claude, Cursor, Windsurf, and others — with custom tools and data sources. Instead of copy-pasting documentation into every chat, you define a server that your AI client can query on demand.
This tutorial walks you through building an MCP server that gives your AI client access to a live, searchable web knowledge base. By the end, Claude Code or Cursor will be able to search indexed web content — competitor docs, API references, changelog entries — without you manually pasting anything.
What MCP Is
MCP (Model Context Protocol) is an open standard, originally developed by Anthropic, that defines how AI assistants communicate with external tools and data sources. It's a client-server protocol: the AI client (Claude, Cursor) sends requests to an MCP server you control, and the server responds with data or action results.
The client doesn't need to know how your server works internally. It just knows which tools are available and what parameters they accept. When the model decides it needs information from your knowledge base, it calls your tool — your server handles the rest.
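Concretely, the wire format is JSON-RPC 2.0. When the model decides to use a tool, the client sends a tools/call request and your server replies with the result. The tool name and arguments below anticipate the server we'll build in this tutorial:

// Client → server: invoke a tool
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search_knowledge",
    "arguments": { "query": "authentication headers" }
  }
}

// Server → client: the tool result
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": [{ "type": "text", "text": "...matching knowledge base chunks..." }]
  }
}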
Why Combine MCP with Web Knowledge
Here's the problem MCP + web knowledge solves: AI coding assistants are great at reasoning, but they only know what's in their context window. If you're building against a third-party API, your agent doesn't automatically know what's in the latest docs. If you're doing competitive analysis, your agent doesn't know what the competitor's site says today.
The old solution: paste the docs in yourself every time. The better solution: an MCP server that fetches and searches that content on demand.
With this setup, Claude Code can call search_knowledge("authentication headers") and immediately get current, accurate content from your indexed knowledge base — without you lifting a finger.
Architecture
Claude / Cursor
│
│ MCP Protocol (stdio or SSE)
▼
Your MCP Server (Node.js)
│
│ HTTPS API calls
▼
KnowledgeSDK API
│
├── POST /v1/extract → scrape + index URL
└── POST /v1/search → hybrid search over indexed content
Your MCP server is the translation layer. It receives tool calls from Claude, translates them into KnowledgeSDK API calls, and returns the results in MCP format.
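You can also exercise those two endpoints directly to get a feel for what the server will do on your behalf. The base URL, auth scheme, and body shapes below are illustrative assumptions; check the KnowledgeSDK docs for the exact contract:

# Scrape + index a page
curl -X POST https://api.knowledgesdk.com/v1/extract \
  -H "Authorization: Bearer $KNOWLEDGESDK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://docs.example.com/auth"}'

# Search previously indexed content
curl -X POST https://api.knowledgesdk.com/v1/search \
  -H "Authorization: Bearer $KNOWLEDGESDK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "authentication headers", "limit": 3}'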
Step 1: Project Setup
Create a new directory and initialize the project:
mkdir mcp-knowledge-server
cd mcp-knowledge-server
npm init -y
npm install @modelcontextprotocol/sdk @knowledgesdk/node
npm install -D typescript @types/node tsx
Create a tsconfig.json:
{
"compilerOptions": {
"target": "ES2022",
"module": "Node16",
"moduleResolution": "Node16",
"outDir": "dist",
"strict": true,
"esModuleInterop": true
},
"include": ["src/**/*"]
}
Add to package.json:
{
"scripts": {
"build": "tsc",
"start": "node dist/index.js",
"dev": "tsx src/index.ts"
}
}
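The server we're about to write reads its API key from the KNOWLEDGESDK_API_KEY environment variable, so export it before using the dev script:

export KNOWLEDGESDK_API_KEY=knowledgesdk_live_your_key_here
npm run dev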
Step 2: Build the MCP Server
Create src/index.ts:
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
CallToolRequestSchema,
ListToolsRequestSchema,
} from "@modelcontextprotocol/sdk/types.js";
import KnowledgeSDK from "@knowledgesdk/node";
const ks = new KnowledgeSDK({
apiKey: process.env.KNOWLEDGESDK_API_KEY!,
});
const server = new Server(
{ name: "knowledge-server", version: "1.0.0" },
{ capabilities: { tools: {} } }
);
// Register available tools
server.setRequestHandler(ListToolsRequestSchema, async () => {
return {
tools: [
{
name: "extract_url",
description:
"Extract and index a URL into the knowledge base. " +
"Use this to add new web pages, documentation, or articles. " +
"The content will be immediately searchable after extraction.",
inputSchema: {
type: "object",
properties: {
url: {
type: "string",
description: "The URL to extract and index",
},
},
required: ["url"],
},
},
{
name: "search_knowledge",
description:
"Search the indexed knowledge base for relevant content. " +
"Uses hybrid semantic + keyword search. " +
"Returns the most relevant chunks from previously extracted URLs.",
inputSchema: {
type: "object",
properties: {
query: {
type: "string",
description: "The search query",
},
limit: {
type: "number",
description: "Number of results to return (default: 3, max: 10)",
},
},
required: ["query"],
},
},
{
name: "scrape_url",
description:
"Scrape a URL and return its full content as markdown in the response. " +
"Use this for one-off reads when you want a page's content directly rather than searching for it later.",
inputSchema: {
type: "object",
properties: {
url: {
type: "string",
description: "The URL to scrape",
},
},
required: ["url"],
},
},
],
};
});
// Handle tool calls
server.setRequestHandler(CallToolRequestSchema, async (request) => {
const { name, arguments: args } = request.params;
try {
if (name === "extract_url") {
const { url } = args as { url: string };
const result = await ks.extract({ url });
return {
content: [
{
type: "text",
text: `Successfully extracted and indexed:\n- Title: ${result.title}\n- URL: ${url}\n- Status: indexed and searchable`,
},
],
};
}
if (name === "search_knowledge") {
const { query, limit = 3 } = args as { query: string; limit?: number };
// Clamp to the documented range (default 3, max 10)
const safeLimit = Math.min(Math.max(1, limit), 10);
const results = await ks.search({ query, limit: safeLimit });
if (results.results.length === 0) {
return {
content: [
{
type: "text",
text: "No results found. Try extracting relevant URLs first with the extract_url tool.",
},
],
};
}
const formatted = results.results
.map(
(r, i) =>
`## Result ${i + 1}: ${r.title}\nSource: ${r.url}\n\n${r.content}`
)
.join("\n\n---\n\n");
return {
content: [
{
type: "text",
text: `Found ${results.results.length} results:\n\n${formatted}`,
},
],
};
}
if (name === "scrape_url") {
const { url } = args as { url: string };
const result = await ks.extract({ url });
return {
content: [
{
type: "text",
text: `# ${result.title}\nSource: ${url}\n\n${result.markdown}`,
},
],
};
}
throw new Error(`Unknown tool: ${name}`);
} catch (error) {
const message = error instanceof Error ? error.message : "Unknown error";
return {
content: [{ type: "text", text: `Error: ${message}` }],
isError: true,
};
}
});
// Start the server
async function main() {
const transport = new StdioServerTransport();
await server.connect(transport);
console.error("Knowledge MCP server running on stdio");
}
main().catch((error) => {
console.error("Fatal error:", error);
process.exit(1);
});
Build the server:
npm run build
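Before registering the server with a client, you can smoke-test it with the MCP Inspector, which spins up a browser UI for listing and calling your tools:

export KNOWLEDGESDK_API_KEY=knowledgesdk_live_your_key_here
npx @modelcontextprotocol/inspector node dist/index.js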
Step 3: Register with Claude Code
Create a .mcp.json file in your project root (or register the server at user scope with claude mcp add so it's available in every project):
{
"mcpServers": {
"knowledge": {
"command": "node",
"args": ["/absolute/path/to/mcp-knowledge-server/dist/index.js"],
"env": {
"KNOWLEDGESDK_API_KEY": "knowledgesdk_live_your_key_here"
}
}
}
}
Restart Claude Code, then run /mcp to confirm the server is connected. The knowledge server tools will appear in the available tools list.
For Cursor, add the same server to ~/.cursor/mcp.json (or via Cursor → Settings → MCP), nested under the mcpServers key:
{
  "mcpServers": {
    "knowledge": {
      "command": "node",
      "args": ["/absolute/path/to/mcp-knowledge-server/dist/index.js"],
      "env": {
        "KNOWLEDGESDK_API_KEY": "knowledgesdk_live_your_key_here"
      }
    }
  }
}
Step 4: Test It
Open a conversation with Claude Code and try:
Index the KnowledgeSDK API documentation at https://docs.knowledgesdk.com
Claude will call extract_url, and you'll see a confirmation that the content was indexed. Then:
Search the knowledge base for how authentication works
Claude will call search_knowledge("how authentication works") and return the relevant chunks from the indexed docs.
The workflow is seamless: Claude decides when to call the tools based on the conversation; you don't have to instruct it manually.
Advanced: Add a List Tool
Extend the server with a tool that shows what's been indexed:
// Add to the tools list:
{
name: "list_indexed_content",
description: "List recently indexed URLs in the knowledge base",
inputSchema: {
type: "object",
properties: {},
},
}
// Add to the tool handler:
if (name === "list_indexed_content") {
const items = await ks.list({ limit: 20 });
const formatted = items.results
.map((item) => `- ${item.title} (${item.url}) — indexed ${item.crawledAt}`)
.join("\n");
return {
content: [{ type: "text", text: `Indexed content:\n${formatted}` }],
};
}
Advanced: TTL-Based Re-Extraction
For content that changes frequently, add a re-extraction tool that Claude can invoke:
{
name: "refresh_url",
description: "Re-extract a URL to get the latest version of its content",
inputSchema: {
type: "object",
properties: {
url: { type: "string", description: "URL to re-extract" },
},
required: ["url"],
},
}
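A matching handler can be a thin wrapper around extract. This sketch assumes that re-extracting an already-indexed URL replaces the stored copy rather than creating a duplicate:

// Add to the tool handler:
if (name === "refresh_url") {
  const { url } = args as { url: string };
  // Assumption: re-extraction overwrites the stale indexed copy
  const result = await ks.extract({ url });
  return {
    content: [
      {
        type: "text",
        text: `Re-extracted ${url}. Latest version ("${result.title}") is now indexed.`,
      },
    ],
  };
}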
Or trigger re-extraction automatically on a schedule with a cron job running alongside your MCP server.
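For example, a minimal refresh script (the file name and URL list are placeholders) run nightly by cron:

// scripts/refresh-urls.ts: keeps indexed content fresh on a schedule
import KnowledgeSDK from "@knowledgesdk/node";

const ks = new KnowledgeSDK({ apiKey: process.env.KNOWLEDGESDK_API_KEY! });

// URLs to keep fresh; in practice you might pull these from ks.list()
const TRACKED_URLS = [
  "https://competitor-a.com/pricing",
  "https://competitor-b.com/changelog",
];

async function main() {
  for (const url of TRACKED_URLS) {
    try {
      await ks.extract({ url }); // re-extraction refreshes the stored copy
      console.log(`Refreshed ${url}`);
    } catch (error) {
      console.error(`Failed to refresh ${url}:`, error);
    }
  }
}

main().catch((error) => {
  console.error("Refresh run failed:", error);
  process.exit(1);
});

And a crontab entry to run it every night at 02:00:

0 2 * * * cd /absolute/path/to/mcp-knowledge-server && KNOWLEDGESDK_API_KEY=knowledgesdk_live_your_key_here npx tsx scripts/refresh-urls.ts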
Real Use Case: Living Competitor Intelligence
Here's where this setup shines in practice. Index your top competitors' pricing, features, and changelog pages:
Extract https://competitor-a.com/pricing
Extract https://competitor-a.com/features
Extract https://competitor-b.com/pricing
Extract https://competitor-b.com/changelog
Now, whenever you ask Claude Code to help you write positioning copy, draft a comparison page, or analyze your competitive gaps — it can search that knowledge base directly. The docs are always current because you're re-extracting on a schedule, not relying on training data from 18 months ago.
Your coding agent just became a well-informed product analyst.
Summary
MCP + KnowledgeSDK is a powerful combination. MCP gives AI clients a standard way to call your tools. KnowledgeSDK handles the hard part of turning arbitrary web URLs into a searchable, current knowledge base.
The server we built exposes three core tools: extract a URL, search indexed content, and read a page's markdown directly. That covers the full lifecycle of web knowledge for an AI agent. Add it to Claude Code or Cursor once, and every future conversation has access to a live knowledge base without you pasting anything manually.