If you have built a RAG pipeline from scratch, you know the step count. A realistic implementation involves at minimum: an HTTP client to fetch the page, an HTML parser to extract content, a markdown converter, a chunking strategy, an embedding model call, a vector database write, and a search endpoint. Seven distinct operations across at least three services.
This article explains what each step does, why it exists, and how KnowledgeSDK collapses the entire pipeline into two API calls.
The Traditional RAG Pipeline
Here is what a minimal do-it-yourself pipeline looks like in practice:
```typescript
import axios from "axios";
import TurndownService from "turndown";
import OpenAI from "openai";
import { Pinecone } from "@pinecone-database/pinecone";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });

async function indexUrl(url: string) {
  // Step 1: Fetch the page
  const response = await axios.get(url, {
    headers: { "User-Agent": "Mozilla/5.0" },
    timeout: 10000,
  });

  // Step 2: Strip HTML and convert to markdown
  const turndown = new TurndownService();
  const markdown = turndown.turndown(response.data);

  // Step 3: Chunk the content
  const chunks = chunkText(markdown, { maxTokens: 512, overlap: 64 });

  // Step 4: Generate embeddings for each chunk
  const embeddingResponse = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: chunks,
  });
  const embeddings = embeddingResponse.data.map((d) => d.embedding);

  // Step 5: Store in vector database
  await pinecone.index("knowledge").upsert(
    chunks.map((chunk, i) => ({
      id: `${url}-chunk-${i}`,
      values: embeddings[i],
      metadata: { url, text: chunk },
    }))
  );

  console.log(`Indexed ${chunks.length} chunks from ${url}`);
}

function chunkText(text: string, options: { maxTokens: number; overlap: number }): string[] {
  // Naive word-count approximation. A production version counts real tokens
  // (e.g. with tiktoken) and typically runs 40-80 lines.
  const words = text.split(/\s+/);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += options.maxTokens - options.overlap) {
    chunks.push(words.slice(i, i + options.maxTokens).join(" "));
  }
  return chunks;
}
```
This is the simplified version. A production-quality pipeline also handles: JavaScript-rendered pages (requires a headless browser like Playwright), anti-bot bypass, retry logic, error handling for malformed HTML, token counting for accurate chunking, and rate limiting on the embedding API.
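The retry and rate-limiting concerns above can be sketched with a small backoff wrapper. This is a minimal illustration, not tied to any particular API client; a production version would also distinguish retryable errors (429, 5xx) from permanent ones and add jitter:

```typescript
// Retry a flaky async call with exponential backoff: base, 2x, 4x, ...
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === maxAttempts - 1) break; // out of attempts
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}
```

Wrapping the embedding call as `withRetry(() => openai.embeddings.create(...))` absorbs transient rate-limit failures without complicating the pipeline logic.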
Dependency count: an HTTP client, an HTML-to-markdown converter, the OpenAI API, and Pinecone (or equivalent): four external dependencies, each with its own configuration, pricing, and failure modes.
What KnowledgeSDK Collapses
POST /v1/extract does everything in the DIY pipeline as a single API call:
- Fetches the URL with a headless browser (handles JavaScript rendering)
- Applies anti-bot bypass where needed
- Converts the rendered HTML to clean markdown
- Chunks the content using a token-aware strategy
- Generates embeddings via text-embedding-3-small (1536 dimensions)
- Stores vectors in pgvector with HNSW indexing
- Makes the content immediately searchable
POST /v1/search runs hybrid retrieval — vector similarity search plus ILIKE keyword fallback — over your indexed content.
The full pipeline that required 7 steps and 3 services becomes:
```typescript
import KnowledgeSDK from "@knowledgesdk/node";

const client = new KnowledgeSDK({ apiKey: process.env.KNOWLEDGESDK_API_KEY });

// Index a URL — all 7 steps happen server-side
await client.extract("https://docs.stripe.com/api/payment_intents");

// Search your indexed content
const results = await client.search("how do I handle payment confirmation?", {
  limit: 5,
});

for (const item of results.items) {
  console.log(`[${item.score.toFixed(2)}] ${item.title}`);
  console.log(item.snippet);
  console.log(`Source: ${item.sourceUrl}\n`);
}
```
The same flow in Python:

```python
import os

from knowledgesdk import KnowledgeSDK

client = KnowledgeSDK(api_key=os.environ["KNOWLEDGESDK_API_KEY"])

# Index the URL
client.extract("https://docs.stripe.com/api/payment_intents")

# Search
results = client.search("how do I handle payment confirmation?", limit=5)

for item in results.items:
    print(f"[{item.score:.2f}] {item.title}")
    print(item.snippet)
    print(f"Source: {item.source_url}\n")
```
What Happens Under the Hood
Understanding the pipeline internals helps you make informed decisions about when to use a managed solution versus building your own.
pgvector with HNSW indexing. KnowledgeSDK uses PostgreSQL with the pgvector extension and HNSW (Hierarchical Navigable Small World) indexing for approximate nearest neighbor search. In benchmarks against same-region deployments, this achieves roughly 10-20 ms search latency at ~94% recall (the standard accuracy measure for approximate nearest neighbor search). No separate vector database service is required.
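Conceptually, the distance pgvector computes for cosine search is simple; here is a plain TypeScript rendering for illustration (HNSW just makes the nearest-neighbor lookup over these distances approximate and fast):

```typescript
// Cosine distance = 1 - cosine similarity, the quantity pgvector's
// cosine-distance operator computes between a query vector and each
// stored embedding.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```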
text-embedding-3-small. Embeddings are generated via OpenAI's text-embedding-3-small model (1536 dimensions), routed through the Vercel AI gateway. Embedding vectors for repeated queries are cached in Redis with a 24-hour TTL — search queries you run frequently skip the embedding API call on subsequent requests.
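The query-embedding cache can be sketched with a simple TTL map. The real service uses Redis, but the logic is the same; the function and parameter names here are illustrative, not KnowledgeSDK's internals:

```typescript
// TTL cache for query embeddings: identical queries within the window
// skip the embedding API call entirely.
const TTL_MS = 24 * 60 * 60 * 1000; // 24-hour TTL
const embeddingCache = new Map<string, { embedding: number[]; expiresAt: number }>();

async function embedWithCache(
  query: string,
  embed: (q: string) => Promise<number[]>, // the actual embedding API call
  now: () => number = Date.now
): Promise<number[]> {
  const hit = embeddingCache.get(query);
  if (hit && hit.expiresAt > now()) return hit.embedding; // cache hit
  const embedding = await embed(query);
  embeddingCache.set(query, { embedding, expiresAt: now() + TTL_MS });
  return embedding;
}
```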
Hybrid search. The search endpoint runs both vector similarity (embedding <=> ?::vector cosine distance) and keyword fallback (ILIKE pattern matching). This handles edge cases where a very specific technical term does not have a strong vector match — keyword fallback catches it.
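The merge step can be sketched as a pure function: vector hits keep their ranking, keyword-only hits are appended as the fallback, de-duplicated by id. The actual ranking logic is internal to the service; this is just the shape of the idea:

```typescript
interface Hit {
  id: string;
  score: number;
}

// Merge vector-similarity hits with keyword-fallback hits. Vector results
// lead; keyword results that did not already match by vector are appended.
function mergeHybrid(vectorHits: Hit[], keywordHits: Hit[], limit: number): Hit[] {
  const seen = new Set(vectorHits.map((h) => h.id));
  const merged = [...vectorHits];
  for (const hit of keywordHits) {
    if (!seen.has(hit.id)) {
      seen.add(hit.id);
      merged.push(hit);
    }
  }
  return merged.slice(0, limit);
}
```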
A Complete Production Example
Here is an end-to-end implementation for indexing a competitor's documentation site and searching it:
```typescript
import KnowledgeSDK from "@knowledgesdk/node";

const client = new KnowledgeSDK({ apiKey: process.env.KNOWLEDGESDK_API_KEY });

// Index a documentation site
async function indexDocumentation(baseUrl: string) {
  // First, discover all URLs on the site
  const sitemap = await client.sitemap(baseUrl);
  const docUrls = sitemap.urls.filter((url) => url.includes("/docs/"));
  console.log(`Found ${docUrls.length} documentation pages`);

  // Extract each page asynchronously
  const jobs = await Promise.all(
    docUrls.slice(0, 50).map((url) =>
      client.extractAsync(url, {
        callbackUrl: `${process.env.YOUR_APP_URL}/webhooks/indexed`,
      })
    )
  );
  console.log(`Queued ${jobs.length} extraction jobs`);
}

// Search the indexed documentation
async function searchDocs(query: string) {
  const results = await client.search(query, {
    limit: 5,
    filter: { domain: "docs.competitor.com" },
  });

  if (results.items.length === 0) {
    return "No results found in the indexed documentation.";
  }

  return results.items
    .map((item) => `## ${item.title}\n${item.snippet}\nSource: ${item.sourceUrl}`)
    .join("\n\n");
}

// Example usage
await indexDocumentation("https://docs.competitor.com");

// Later — after extraction completes
const answer = await searchDocs("how do they handle webhook retry logic?");
console.log(answer);
```
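The callbackUrl in the indexing function implies a webhook receiver on your side. The payload shape below is an assumption for illustration, not the documented schema; the handler is written as a pure function so it can sit behind any HTTP framework:

```typescript
// Hypothetical webhook payload — field names are assumptions, not the
// documented KnowledgeSDK schema.
interface IndexedWebhookPayload {
  jobId: string;
  url: string;
  status: "completed" | "failed";
  chunkCount?: number;
}

// Returns a log line describing the outcome of one extraction job.
function handleIndexedWebhook(payload: IndexedWebhookPayload): string {
  if (payload.status === "failed") {
    return `Extraction failed for ${payload.url} (job ${payload.jobId})`;
  }
  return `Indexed ${payload.chunkCount ?? 0} chunks from ${payload.url}`;
}
```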
DIY Pipeline Cost Estimate
For a team running 1,000 URL extractions per month and 10,000 searches:
DIY pipeline:
- Playwright/headless browser infrastructure: ~$50-100/month (self-hosted) or $200+ (managed)
- OpenAI embeddings (1,000 docs × ~4K tokens each, about 4M tokens): well under $1 at current text-embedding-3-small pricing
- Pinecone starter: $70/month
- Postgres hosting (if separate from your main DB): $15-25/month
- Development and maintenance time: significant
KnowledgeSDK:
- Starter plan: $29/month (includes extractions, embeddings, search, webhooks, MCP)
- No infrastructure to configure or maintain
The cost difference is most obvious in developer time. The DIY pipeline is maintainable once built, but building it correctly — handling JS rendering, anti-bot, proper chunking, embedding caching — takes days, not hours.
When You Would Still Build DIY
There are legitimate reasons to build the pipeline yourself:
Custom chunking strategies. If your content requires domain-specific chunking (code-aware chunking for a developer docs corpus, section-aware chunking for legal documents), the managed extraction may not match your requirements.
Specific embedding models. If your retrieval accuracy depends on a particular model (Voyage 3.5 for code, Gemini embedding-001 for multilingual content), you need control over the embedding step.
Full data ownership. If your compliance requirements prohibit sending content to a third-party API, a fully self-hosted pipeline is necessary.
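As a taste of what domain-specific chunking means in practice, here is a minimal section-aware splitter that breaks markdown at heading boundaries rather than at arbitrary token windows. It is a sketch of the strategy, not a production chunker:

```typescript
// Split markdown into chunks at heading boundaries, so each chunk is a
// coherent section. A real implementation would further split oversized
// sections against a token budget.
function chunkByHeadings(markdown: string): string[] {
  const lines = markdown.split("\n");
  const chunks: string[] = [];
  let current: string[] = [];
  for (const line of lines) {
    // A new heading (#, ##, ...) closes the previous section.
    if (/^#{1,6}\s/.test(line) && current.length > 0) {
      chunks.push(current.join("\n").trim());
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) chunks.push(current.join("\n").trim());
  return chunks.filter((c) => c.length > 0);
}
```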
For most developers building AI agents that need web knowledge retrieval, none of these conditions apply. The managed pipeline is faster to build, easier to maintain, and cheaper at early-to-mid scale.
Summary
The seven-step DIY pipeline (fetch, parse, convert, chunk, embed, store, search) is a reasonable engineering exercise and provides maximum flexibility. It is also significant infrastructure to build and maintain.
For the common use case — "I need URLs to be searchable for my AI agent" — KnowledgeSDK collapses that pipeline into two API calls: extract and search. The tradeoff is control for simplicity, which is the right tradeoff for most production AI agents.
```shell
npm install @knowledgesdk/node
pip install knowledgesdk
```