LlamaIndex is one of the most widely used frameworks for building RAG (Retrieval-Augmented Generation) pipelines in Python. It handles document loading, chunking, indexing, and querying — but it does not handle live web scraping. When your RAG pipeline needs to answer questions about current web content (competitor documentation, product pages, news articles), you need to solve the data collection problem yourself.
This tutorial shows you how to combine LlamaIndex with KnowledgeSDK to build a live web RAG pipeline that:
- Scrapes competitor documentation with full JavaScript rendering
- Indexes it in KnowledgeSDK's hybrid search engine
- Answers developer questions using LlamaIndex's query engine
- Stays current with webhook-triggered updates
By the end, you will have a working RAG agent that can answer questions like "How does Competitor X handle authentication?" or "What rate limits does their API have?" — using real-time web content.
Why Not Just Use LlamaIndex's Built-In Web Reader?
LlamaIndex ships a SimpleWebPageReader and a BeautifulSoupWebReader. They work for simple static pages. Here is where they fall short:
No JavaScript rendering. Modern documentation sites (Docusaurus, GitBook, Notion-embedded docs) require a browser to render. BeautifulSoup reads raw HTML, which is often empty for SPA-based sites.
No anti-bot bypass. Fetch the wrong page at the wrong rate and you get blocked. LlamaIndex's built-in readers have no proxy rotation or stealth mode.
No incremental updates. To keep your index fresh, you have to re-run the entire document loading step. There is no mechanism to detect which pages have changed.
You still need a vector database. LlamaIndex can index locally or push to Pinecone, Weaviate, etc. You pay for and maintain that separately.
KnowledgeSDK solves all four problems: headless browser scraping, anti-bot bypass, webhook-triggered updates, and built-in hybrid search.
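You can see the JavaScript-rendering gap for yourself with a quick check: strip the tags from a page's raw HTML and count the visible text. For an SPA, the server-sent HTML is often little more than an empty root div, which is all a BeautifulSoup-style reader would ever index. A minimal sketch (the heuristic and the 200-character threshold are illustrative, not part of either library):

```python
import re

def looks_like_spa_shell(raw_html: str, min_visible_chars: int = 200) -> bool:
    """Heuristic: does the raw HTML carry meaningful visible text,
    or is it an empty app shell that needs a browser to render?"""
    # Drop script/style bodies, then strip all remaining tags
    no_scripts = re.sub(r"<(script|style)\b.*?</\1>", " ", raw_html,
                        flags=re.DOTALL | re.IGNORECASE)
    visible = re.sub(r"<[^>]+>", " ", no_scripts)
    visible = re.sub(r"\s+", " ", visible).strip()
    return len(visible) < min_visible_chars

# Typical SPA shell: a raw-HTML reader would index almost nothing here
shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
print(looks_like_spa_shell(shell))  # True
```

If this returns True for a target page, you need headless-browser rendering to get any usable content out of it.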
Architecture Overview
```
Web (competitor docs)
  → KnowledgeSDK /v1/scrape (JS rendering + anti-bot)
  → KnowledgeSDK knowledge base (auto-indexed)
  → KnowledgeSDK /v1/search (hybrid semantic + keyword)
  → LlamaIndex RetrieverQueryEngine
  → LLM response
```
The key insight: KnowledgeSDK acts as both the web reader and the vector store. LlamaIndex provides the query engine and LLM integration layer.
Prerequisites
```bash
pip install llama-index knowledgesdk openai
```
You will need:
- A KnowledgeSDK API key (free tier at knowledgesdk.com)
- An OpenAI API key (for the LLM response generation)
Step 1: Scrape and Index Web Content
First, let us scrape a documentation site and index it in KnowledgeSDK.
```python
# Python — Step 1: Scrape competitor docs into KnowledgeSDK
from knowledgesdk import KnowledgeSDK

client = KnowledgeSDK(api_key="sk_ks_your_key")

# Pages to index — could be competitor docs, your own docs, any web source
pages_to_scrape = [
    "https://docs.competitor.com/getting-started",
    "https://docs.competitor.com/authentication",
    "https://docs.competitor.com/api-reference",
    "https://docs.competitor.com/rate-limits",
    "https://docs.competitor.com/webhooks",
    "https://docs.competitor.com/sdks",
    "https://docs.competitor.com/pricing",
]

print(f"Scraping {len(pages_to_scrape)} pages...")
for url in pages_to_scrape:
    result = client.scrape(url)
    print(f"Indexed: {url} ({len(result.markdown)} chars)")

print("All pages indexed. Ready for search.")
```
Or use /v1/extract to crawl an entire site at once:
```python
# Python — Full site extraction (one call instead of looping)
extraction = client.extract("https://docs.competitor.com", options={
    "maxPages": 50
})

print(f"Extracted {extraction.pageCount} pages")
print(f"Total content: {extraction.totalCharacters} characters")
```
Step 2: Create a Custom LlamaIndex Retriever
LlamaIndex has a BaseRetriever interface that lets you plug in any search backend. We will implement one that queries KnowledgeSDK's /v1/search endpoint.
```python
# Python — KnowledgeSDK retriever for LlamaIndex
from typing import List

from knowledgesdk import KnowledgeSDK
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle, TextNode


class KnowledgeSDKRetriever(BaseRetriever):
    """LlamaIndex retriever backed by KnowledgeSDK's hybrid search."""

    def __init__(
        self,
        api_key: str,
        top_k: int = 5,
        min_score: float = 0.0
    ):
        self.client = KnowledgeSDK(api_key=api_key)
        self.top_k = top_k
        self.min_score = min_score
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        query = query_bundle.query_str

        # Call KnowledgeSDK's hybrid search
        results = self.client.search(query, limit=self.top_k)

        nodes = []
        for item in results.items:
            if item.score < self.min_score:
                continue
            # Create a LlamaIndex TextNode from each search result
            node = TextNode(
                text=item.content or item.snippet,
                metadata={
                    "url": item.url,
                    "title": item.title,
                    "score": item.score,
                    "category": getattr(item, "category", None)
                }
            )
            nodes.append(NodeWithScore(node=node, score=item.score))
        return nodes
```
Step 3: Build the Query Engine
Now assemble the full query engine using our custom retriever and an OpenAI LLM.
```python
# Python — Full RAG pipeline with LlamaIndex + KnowledgeSDK
from llama_index.core import Settings
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import get_response_synthesizer
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Configure LlamaIndex settings
Settings.llm = OpenAI(
    model="gpt-4o",
    api_key="your-openai-api-key",
    temperature=0.1
)

# Only needed if you also use LlamaIndex's own indexing features —
# KnowledgeSDK handles embeddings for retrieval internally (see FAQ)
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    api_key="your-openai-api-key"
)

# Create our KnowledgeSDK retriever
retriever = KnowledgeSDKRetriever(
    api_key="sk_ks_your_key",
    top_k=5
)

# Create the response synthesizer
synthesizer = get_response_synthesizer(
    response_mode="tree_summarize",  # Good for multi-document synthesis
    verbose=True
)

# Build the query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=synthesizer
)

# Ask questions about your scraped web content
response = query_engine.query(
    "What authentication methods does their API support?"
)

print(str(response))
print("\nSources:")
for node in response.source_nodes:
    print(f"  - {node.metadata['title']} ({node.metadata['url']})")
```
Step 4: Streaming Responses
For production applications, streaming is better UX. LlamaIndex supports streaming queries natively.
```python
# Python — Streaming response with LlamaIndex + KnowledgeSDK
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import get_response_synthesizer

streaming_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=get_response_synthesizer(
        response_mode="tree_summarize",
        streaming=True
    )
)

# Stream the response token by token
streaming_response = streaming_engine.query(
    "What are the rate limits on their API?"
)

print("Answer: ", end="", flush=True)
for token in streaming_response.response_gen:
    print(token, end="", flush=True)
print()  # newline
```
Step 5: Keeping the Index Fresh with Webhooks
The knowledge base needs to stay current. KnowledgeSDK webhooks let you trigger re-scrapes automatically when pages change.
First, create webhooks for your monitored pages:
```python
# Python — Set up webhooks for automatic index refresh
from knowledgesdk import KnowledgeSDK

client = KnowledgeSDK(api_key="sk_ks_your_key")

monitored_pages = [
    "https://docs.competitor.com/api-reference",
    "https://docs.competitor.com/rate-limits",
    "https://competitor.com/pricing"
]

for url in monitored_pages:
    webhook = client.webhooks.create(
        url=url,
        callback_url="https://your-app.com/webhooks/content-changed",
        events=["content.changed"]
    )
    print(f"Monitoring {url}: webhook {webhook.id}")
```
Then handle the webhook in your server to trigger a re-scrape:
```python
# Python — FastAPI webhook handler for automatic re-indexing
from fastapi import FastAPI, Request
from knowledgesdk import KnowledgeSDK

app = FastAPI()
client = KnowledgeSDK(api_key="sk_ks_your_key")

@app.post("/webhooks/content-changed")
async def handle_content_change(request: Request):
    payload = await request.json()
    changed_url = payload.get("url")
    event = payload.get("event")

    if event == "content.changed" and changed_url:
        print(f"Content changed at {changed_url}, re-indexing...")
        # Re-scrape the changed page — KnowledgeSDK updates the index
        result = client.scrape(changed_url)
        print(f"Re-indexed {changed_url}: {len(result.markdown)} chars")

    return {"status": "ok"}
```
Now your RAG pipeline automatically stays current whenever monitored pages update.
Step 6: Complete Working Example
Here is the full end-to-end pipeline:
```python
# Python — Complete LlamaIndex + KnowledgeSDK RAG pipeline
import os
from typing import List

from knowledgesdk import KnowledgeSDK
from llama_index.core import Settings
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import get_response_synthesizer
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle, TextNode
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# --- Configuration ---
KS_API_KEY = os.environ["KNOWLEDGESDK_API_KEY"]
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]

# --- LlamaIndex Settings ---
Settings.llm = OpenAI(model="gpt-4o", api_key=OPENAI_API_KEY)
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    api_key=OPENAI_API_KEY
)

# --- KnowledgeSDK Retriever ---
class KnowledgeSDKRetriever(BaseRetriever):
    def __init__(self, api_key: str, top_k: int = 5):
        self.client = KnowledgeSDK(api_key=api_key)
        self.top_k = top_k
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        results = self.client.search(query_bundle.query_str, limit=self.top_k)
        return [
            NodeWithScore(
                node=TextNode(
                    text=item.content or item.snippet,
                    metadata={"url": item.url, "title": item.title}
                ),
                score=item.score
            )
            for item in results.items
        ]

# --- Initialize knowledge base ---
def build_knowledge_base(urls: List[str]):
    client = KnowledgeSDK(api_key=KS_API_KEY)
    for url in urls:
        print(f"Scraping: {url}")
        client.scrape(url)
    print(f"Knowledge base ready: {len(urls)} pages indexed")

# --- Build query engine ---
def create_query_engine():
    retriever = KnowledgeSDKRetriever(api_key=KS_API_KEY)
    synthesizer = get_response_synthesizer(response_mode="tree_summarize")
    return RetrieverQueryEngine(retriever=retriever, response_synthesizer=synthesizer)

# --- Main ---
if __name__ == "__main__":
    # Build knowledge base from target pages
    build_knowledge_base([
        "https://docs.competitor.com/getting-started",
        "https://docs.competitor.com/authentication",
        "https://docs.competitor.com/api-reference",
        "https://docs.competitor.com/pricing"
    ])

    # Create query engine
    engine = create_query_engine()

    # Interactive Q&A
    questions = [
        "How do I authenticate with their API?",
        "What are the rate limits?",
        "Do they have a Python SDK?",
        "What is the pricing for the Pro plan?"
    ]

    for question in questions:
        print(f"\nQ: {question}")
        response = engine.query(question)
        print(f"A: {response}")
        print("Sources:", [n.metadata["url"] for n in response.source_nodes])
```
Comparing: LlamaIndex Built-In Web Reader vs KnowledgeSDK
| Capability | LlamaIndex SimpleWebPageReader | KnowledgeSDK Retriever |
|---|---|---|
| Static page HTML | Yes | Yes |
| JavaScript rendering | No | Yes |
| Anti-bot bypass | No | Yes |
| Pagination handling | No | Yes |
| Full site crawl | No | Yes |
| Index persistence | Via vector store | Built-in |
| Hybrid search | No (vector only) | Yes |
| Incremental updates | Manual | Webhook-triggered |
| External vector DB needed | Yes (Pinecone etc.) | No |
| Setup time | Medium | Low |
The built-in reader works for simple, static pages. For production systems with modern web targets, KnowledgeSDK handles everything the built-in reader cannot.
Using KnowledgeSDK as a Node Index
If you prefer to stay closer to LlamaIndex's index abstraction, you can also use VectorStoreIndex with a custom vector store adapter backed by KnowledgeSDK. But the RetrieverQueryEngine pattern shown above is simpler and gives you direct access to KnowledgeSDK's hybrid search.
Performance Notes
KnowledgeSDK's /v1/search returns results in under 100 ms for knowledge bases up to tens of thousands of documents. LlamaIndex's tree_summarize response synthesizer adds LLM latency on top; end-to-end query time for a typical question is 1-3 seconds with GPT-4o.
For lower latency, use response_mode="compact" instead of "tree_summarize":
```python
synthesizer = get_response_synthesizer(response_mode="compact")
```
FAQ
Do I need a separate embedding model if KnowledgeSDK handles search?
No. KnowledgeSDK manages embeddings internally. You only need an LLM (like GPT-4o) for the response synthesis step. The Settings.embed_model in LlamaIndex is used only if you use LlamaIndex's own indexing features — with the custom retriever approach, it is not needed.
Can I combine KnowledgeSDK with LlamaIndex's local vector stores?
Yes. You could scrape with KnowledgeSDK, retrieve the markdown content, and feed it into LlamaIndex's VectorStoreIndex for local indexing. This gives you offline search capability, but you lose KnowledgeSDK's hybrid search and take on managing the embedding pipeline and local index yourself.
How do I handle rate limits on the KnowledgeSDK API?
KnowledgeSDK's search endpoint is designed for high query volumes. If you are scraping many pages simultaneously, use async calls with concurrency limits. The Python SDK supports both sync and async usage.
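A generic pattern for those concurrency limits: wrap each call in an asyncio.Semaphore so no more than N scrapes run at once. The helper below is plain asyncio and works with any coroutine, including the SDK's async scrape calls; the fake_scrape worker here is a stand-in for a real scrape, not part of the SDK:

```python
import asyncio

async def gather_bounded(coros, limit: int = 5):
    """Run coroutines concurrently, but at most `limit` at a time."""
    sem = asyncio.Semaphore(limit)

    async def bounded(coro):
        async with sem:
            return await coro

    # gather preserves input order in its results
    return await asyncio.gather(*(bounded(c) for c in coros))

# Demo with a stand-in for an async scrape call
async def fake_scrape(url: str) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    return f"scraped:{url}"

urls = [f"https://docs.competitor.com/page-{i}" for i in range(10)]
results = asyncio.run(gather_bounded([fake_scrape(u) for u in urls], limit=3))
print(len(results))  # 10
```

Swap fake_scrape for the real async scrape call and tune `limit` to whatever your plan's rate limit allows.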
Can I filter searches to specific sources?
Yes. KnowledgeSDK search results include the source URL in metadata. You can post-filter results in the _retrieve method, or add domain-specific terms to the query to bias retrieval toward particular sources in your knowledge base.
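For example, a post-filter on the result URLs' hosts can be dropped into _retrieve before the nodes are built. The function below is a self-contained sketch that works on any results carrying a "url" field; the allowed_domains set is whatever hosts you want to keep:

```python
from urllib.parse import urlparse

def filter_by_domain(results: list[dict], allowed_domains: set[str]) -> list[dict]:
    """Keep only results whose URL host is an allowed domain or a subdomain of one."""
    kept = []
    for item in results:
        host = urlparse(item["url"]).netloc.lower()
        if any(host == d or host.endswith("." + d) for d in allowed_domains):
            kept.append(item)
    return kept

results = [
    {"url": "https://docs.competitor.com/auth", "title": "Auth"},
    {"url": "https://blog.other.com/post", "title": "Post"},
]
print(filter_by_domain(results, {"competitor.com"}))  # keeps only the docs page
```

Matching on the parsed host (rather than a substring of the URL) avoids false positives like "competitor.com" appearing in a path or query string.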
What is the maximum knowledge base size?
KnowledgeSDK's search is designed to scale to tens of thousands of documents per API key. For very large knowledge bases, pagination on search results is supported.
Does this work with open-source LLMs instead of GPT-4o?
Yes. LlamaIndex supports Ollama, Hugging Face, Mistral, and many other LLM backends. Replace OpenAI(model="gpt-4o") with your preferred LLM provider. The KnowledgeSDK retriever is LLM-agnostic.
Is there an async version of the KnowledgeSDK Python SDK?
The KnowledgeSDK Python SDK supports async usage. Use await client.scrape_async(url) and await client.search_async(query) for non-blocking calls in async frameworks.
Conclusion
LlamaIndex provides an excellent query engine and LLM orchestration layer. KnowledgeSDK fills the gap that LlamaIndex's built-in web readers leave open: production-quality JavaScript rendering, anti-bot bypass, automatic indexing, and hybrid search.
The combination gives you a live web RAG pipeline where:
- KnowledgeSDK handles web data collection and search
- LlamaIndex handles LLM integration and response synthesis
- Webhooks keep the knowledge base current automatically
No separate vector database. No embedding pipeline. No proxy configuration. Just a clean, maintainable RAG pipeline grounded in real, current web content.
Get your KnowledgeSDK API key and build your first live web RAG pipeline today.
```bash
pip install knowledgesdk llama-index llama-index-llms-openai
```