LlamaIndex is one of the most widely used frameworks for building RAG (Retrieval-Augmented Generation) pipelines in Python. It handles document loading, chunking, indexing, and querying — but it does not handle live web scraping. When your RAG pipeline needs to answer questions about current web content (competitor documentation, product pages, news articles), you need to solve the data collection problem yourself.
This tutorial shows you how to combine LlamaIndex with KnowledgeSDK to build a live web RAG pipeline that:
- Scrapes competitor documentation with full JavaScript rendering
- Indexes it in KnowledgeSDK's hybrid search engine
- Answers developer questions using LlamaIndex's query engine
- Stays current with webhook-triggered updates
By the end, you will have a working RAG agent that can answer questions like "How does Competitor X handle authentication?" or "What rate limits does their API have?" — using real-time web content.
Why Not Just Use LlamaIndex's Built-In Web Reader?
LlamaIndex ships a SimpleWebPageReader and a BeautifulSoupWebReader. They work for simple static pages. Here is where they fall short:
No JavaScript rendering. Modern documentation sites (Docusaurus, GitBook, Notion-embedded docs) require a browser to render. BeautifulSoup reads raw HTML, which is often empty for SPA-based sites.
No anti-bot bypass. Fetch the wrong page at the wrong rate and you get blocked. LlamaIndex's built-in readers have no proxy rotation or stealth mode.
No incremental updates. To keep your index fresh, you have to re-run the entire document loading step. There is no mechanism to detect which pages have changed.
You still need a vector database. LlamaIndex can index locally or push to Pinecone, Weaviate, etc. You pay for and maintain that separately.
KnowledgeSDK solves all four problems: headless browser scraping, anti-bot bypass, webhook-triggered updates, and built-in hybrid search.
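You can see the JavaScript-rendering gap for yourself with a quick check: strip the tags from a page's raw HTML and count the visible text. For an SPA, the server-sent HTML is often little more than an empty root div, which is all a BeautifulSoup-style reader would ever index. A minimal sketch (the heuristic and the 200-character threshold are illustrative, not part of either library):

```python
import re

def looks_like_spa_shell(raw_html: str, min_visible_chars: int = 200) -> bool:
    """Heuristic: does the raw HTML carry meaningful visible text,
    or is it an empty app shell that needs a browser to render?"""
    # Drop script/style bodies, then strip all remaining tags
    no_scripts = re.sub(r"<(script|style)\b.*?</\1>", " ", raw_html,
                        flags=re.DOTALL | re.IGNORECASE)
    visible = re.sub(r"<[^>]+>", " ", no_scripts)
    visible = re.sub(r"\s+", " ", visible).strip()
    return len(visible) < min_visible_chars

# Typical SPA shell: a raw-HTML reader would index almost nothing here
shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
print(looks_like_spa_shell(shell))  # True
```

If this returns True for a target page, you need headless-browser rendering to get any usable content out of it.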
Architecture Overview
```
Web (competitor docs)
  → KnowledgeSDK /v1/scrape (JS rendering + anti-bot)
  → KnowledgeSDK knowledge base (auto-indexed)
  → KnowledgeSDK /v1/search (hybrid semantic + keyword)
  → LlamaIndex RetrieverQueryEngine
  → LLM response
```
The key insight: KnowledgeSDK acts as both the web reader and the vector store. LlamaIndex provides the query engine and LLM integration layer.
Prerequisites
```bash
pip install llama-index knowledgesdk openai
```
You will need:
- A KnowledgeSDK API key (free tier at knowledgesdk.com)
- An OpenAI API key (for the LLM response generation)
Step 1: Scrape and Index Web Content
First, let us scrape a documentation site and index it in KnowledgeSDK.
```python
# Python — Step 1: Scrape competitor docs into KnowledgeSDK
from knowledgesdk import KnowledgeSDK

client = KnowledgeSDK(api_key="sk_ks_your_key")

# Pages to index — could be competitor docs, your own docs, any web source
pages_to_scrape = [
    "https://docs.competitor.com/getting-started",
    "https://docs.competitor.com/authentication",
    "https://docs.competitor.com/api-reference",
    "https://docs.competitor.com/rate-limits",
    "https://docs.competitor.com/webhooks",
    "https://docs.competitor.com/sdks",
    "https://docs.competitor.com/pricing",
]

print(f"Scraping {len(pages_to_scrape)} pages...")
for url in pages_to_scrape:
    result = client.scrape(url)
    print(f"Indexed: {url} ({len(result.markdown)} chars)")

print("All pages indexed. Ready for search.")
```
Or use /v1/extract to crawl an entire site at once:
```python
# Python — Full site extraction (one call instead of looping)
extraction = client.extract("https://docs.competitor.com", options={
    "maxPages": 50
})

print(f"Extracted {extraction.pageCount} pages")
print(f"Total content: {extraction.totalCharacters} characters")
```
Step 2: Create a Custom LlamaIndex Retriever
LlamaIndex has a BaseRetriever interface that lets you plug in any search backend. We will implement one that queries KnowledgeSDK's /v1/search endpoint.
```python
# Python — KnowledgeSDK retriever for LlamaIndex
from typing import List

from knowledgesdk import KnowledgeSDK
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle, TextNode


class KnowledgeSDKRetriever(BaseRetriever):
    """LlamaIndex retriever backed by KnowledgeSDK's hybrid search."""

    def __init__(
        self,
        api_key: str,
        top_k: int = 5,
        min_score: float = 0.0
    ):
        self.client = KnowledgeSDK(api_key=api_key)
        self.top_k = top_k
        self.min_score = min_score
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        query = query_bundle.query_str

        # Call KnowledgeSDK's hybrid search
        results = self.client.search(query, limit=self.top_k)

        nodes = []
        for item in results.items:
            if item.score < self.min_score:
                continue
            # Create a LlamaIndex TextNode from each search result
            node = TextNode(
                text=item.content or item.snippet,
                metadata={
                    "url": item.url,
                    "title": item.title,
                    "score": item.score,
                    "category": getattr(item, "category", None)
                }
            )
            nodes.append(NodeWithScore(node=node, score=item.score))
        return nodes
```
Step 3: Build the Query Engine
Now assemble the full query engine using our custom retriever and an OpenAI LLM.
```python
# Python — Full RAG pipeline with LlamaIndex + KnowledgeSDK
from llama_index.core import Settings
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import get_response_synthesizer
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Configure LlamaIndex settings
Settings.llm = OpenAI(
    model="gpt-4o",
    api_key="your-openai-api-key",
    temperature=0.1
)

# Only needed if you also use LlamaIndex's own indexing features —
# KnowledgeSDK handles embeddings for retrieval internally (see FAQ)
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    api_key="your-openai-api-key"
)

# Create our KnowledgeSDK retriever
retriever = KnowledgeSDKRetriever(
    api_key="sk_ks_your_key",
    top_k=5
)

# Create the response synthesizer
synthesizer = get_response_synthesizer(
    response_mode="tree_summarize",  # Good for multi-document synthesis
    verbose=True
)

# Build the query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=synthesizer
)

# Ask questions about your scraped web content
response = query_engine.query(
    "What authentication methods does their API support?"
)

print(str(response))
print("\nSources:")
for node in response.source_nodes:
    print(f"  - {node.metadata['title']} ({node.metadata['url']})")
```
Step 4: Streaming Responses
For production applications, streaming is better UX. LlamaIndex supports streaming queries natively.
```python
# Python — Streaming response with LlamaIndex + KnowledgeSDK
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import get_response_synthesizer

streaming_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=get_response_synthesizer(
        response_mode="tree_summarize",
        streaming=True
    )
)

# Stream the response token by token
streaming_response = streaming_engine.query(
    "What are the rate limits on their API?"
)

print("Answer: ", end="", flush=True)
for token in streaming_response.response_gen:
    print(token, end="", flush=True)
print()  # newline
```
Step 5: Keeping the Index Fresh with Webhooks
The knowledge base needs to stay current. KnowledgeSDK webhooks let you trigger re-scrapes automatically when pages change.
First, create webhooks for your monitored pages:
```python
# Python — Set up webhooks for automatic index refresh
from knowledgesdk import KnowledgeSDK

client = KnowledgeSDK(api_key="sk_ks_your_key")

monitored_pages = [
    "https://docs.competitor.com/api-reference",
    "https://docs.competitor.com/rate-limits",
    "https://competitor.com/pricing"
]

for url in monitored_pages:
    webhook = client.webhooks.create(
        url=url,
        callback_url="https://your-app.com/webhooks/content-changed",
        events=["content.changed"]
    )
    print(f"Monitoring {url}: webhook {webhook.id}")
```
Then handle the webhook in your server to trigger a re-scrape:
```python
# Python — FastAPI webhook handler for automatic re-indexing
from fastapi import FastAPI, Request
from knowledgesdk import KnowledgeSDK

app = FastAPI()
client = KnowledgeSDK(api_key="sk_ks_your_key")

@app.post("/webhooks/content-changed")
async def handle_content_change(request: Request):
    payload = await request.json()
    changed_url = payload.get("url")
    event = payload.get("event")

    if event == "content.changed" and changed_url:
        print(f"Content changed at {changed_url}, re-indexing...")
        # Re-scrape the changed page — KnowledgeSDK updates the index
        result = client.scrape(changed_url)
        print(f"Re-indexed {changed_url}: {len(result.markdown)} chars")

    return {"status": "ok"}
```
Now your RAG pipeline automatically stays current whenever monitored pages update.
Step 6: Complete Working Example
Here is the full end-to-end pipeline:
```python
# Python — Complete LlamaIndex + KnowledgeSDK RAG pipeline
import os
from typing import List

from knowledgesdk import KnowledgeSDK
from llama_index.core import Settings
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import get_response_synthesizer
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle, TextNode
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# --- Configuration ---
KS_API_KEY = os.environ["KNOWLEDGESDK_API_KEY"]
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]

# --- LlamaIndex Settings ---
Settings.llm = OpenAI(model="gpt-4o", api_key=OPENAI_API_KEY)
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    api_key=OPENAI_API_KEY
)

# --- KnowledgeSDK Retriever ---
class KnowledgeSDKRetriever(BaseRetriever):
    def __init__(self, api_key: str, top_k: int = 5):
        self.client = KnowledgeSDK(api_key=api_key)
        self.top_k = top_k
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        results = self.client.search(query_bundle.query_str, limit=self.top_k)
        return [
            NodeWithScore(
                node=TextNode(
                    text=item.content or item.snippet,
                    metadata={"url": item.url, "title": item.title}
                ),
                score=item.score
            )
            for item in results.items
        ]

# --- Initialize knowledge base ---
def build_knowledge_base(urls: List[str]):
    client = KnowledgeSDK(api_key=KS_API_KEY)
    for url in urls:
        print(f"Scraping: {url}")
        client.scrape(url)
    print(f"Knowledge base ready: {len(urls)} pages indexed")

# --- Build query engine ---
def create_query_engine():
    retriever = KnowledgeSDKRetriever(api_key=KS_API_KEY)
    synthesizer = get_response_synthesizer(response_mode="tree_summarize")
    return RetrieverQueryEngine(retriever=retriever, response_synthesizer=synthesizer)

# --- Main ---
if __name__ == "__main__":
    # Build knowledge base from target pages
    build_knowledge_base([
        "https://docs.competitor.com/getting-started",
        "https://docs.competitor.com/authentication",
        "https://docs.competitor.com/api-reference",
        "https://docs.competitor.com/pricing"
    ])

    # Create query engine
    engine = create_query_engine()

    # Interactive Q&A
    questions = [
        "How do I authenticate with their API?",
        "What are the rate limits?",
        "Do they have a Python SDK?",
        "What is the pricing for the Pro plan?"
    ]

    for question in questions:
        print(f"\nQ: {question}")
        response = engine.query(question)
        print(f"A: {response}")
        print("Sources:", [n.metadata["url"] for n in response.source_nodes])
```
Comparing: LlamaIndex Built-In Web Reader vs KnowledgeSDK
| Capability | LlamaIndex SimpleWebPageReader | KnowledgeSDK Retriever |
|---|---|---|
| Static page HTML | Yes | Yes |
| JavaScript rendering | No | Yes |
| Anti-bot bypass | No | Yes |
| Pagination handling | No | Yes |
| Full site crawl | No | Yes |
| Index persistence | Via vector store | Built-in |
| Hybrid search | No (vector only) | Yes |
| Incremental updates | Manual | Webhook-triggered |
| External vector DB needed | Yes (Pinecone etc.) | No |
| Setup time | Medium | Low |
The built-in reader works for simple, static pages. For production systems with modern web targets, KnowledgeSDK handles everything the built-in reader cannot.
Using KnowledgeSDK as a Node Index
If you prefer to stay closer to LlamaIndex's index abstraction, you can also use VectorStoreIndex with a custom vector store adapter backed by KnowledgeSDK. But the RetrieverQueryEngine pattern shown above is simpler and gives you direct access to KnowledgeSDK's hybrid search.
Performance Notes
KnowledgeSDK's /v1/search returns results in under 100 ms for knowledge bases up to tens of thousands of documents. LlamaIndex's tree_summarize response synthesizer adds LLM latency on top; end-to-end query time for a typical question is 1-3 seconds with GPT-4o.
For lower latency, use response_mode="compact" instead of "tree_summarize":
```python
synthesizer = get_response_synthesizer(response_mode="compact")
```
FAQ
Do I need a separate embedding model if KnowledgeSDK handles search?
No. KnowledgeSDK manages embeddings internally. You only need an LLM (like GPT-4o) for the response synthesis step. The Settings.embed_model in LlamaIndex is used only if you use LlamaIndex's own indexing features — with the custom retriever approach, it is not needed.
Can I combine KnowledgeSDK with LlamaIndex's local vector stores?
Yes. You could scrape with KnowledgeSDK, retrieve the markdown content, and feed it into LlamaIndex's VectorStoreIndex for local indexing. This gives you offline search capability, but you lose KnowledgeSDK's hybrid search and take on managing the embedding pipeline and local index yourself.
How do I handle rate limits on the KnowledgeSDK API?
KnowledgeSDK's search endpoint is designed for high query volumes. If you are scraping many pages simultaneously, use async calls with concurrency limits. The Python SDK supports both sync and async usage.
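A generic pattern for those concurrency limits: wrap each call in an asyncio.Semaphore so no more than N scrapes run at once. The helper below is plain asyncio and works with any coroutine, including the SDK's async scrape calls; the fake_scrape worker here is a stand-in for a real scrape, not part of the SDK:

```python
import asyncio

async def gather_bounded(coros, limit: int = 5):
    """Run coroutines concurrently, but at most `limit` at a time."""
    sem = asyncio.Semaphore(limit)

    async def bounded(coro):
        async with sem:
            return await coro

    # gather preserves input order in its results
    return await asyncio.gather(*(bounded(c) for c in coros))

# Demo with a stand-in for an async scrape call
async def fake_scrape(url: str) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    return f"scraped:{url}"

urls = [f"https://docs.competitor.com/page-{i}" for i in range(10)]
results = asyncio.run(gather_bounded([fake_scrape(u) for u in urls], limit=3))
print(len(results))  # 10
```

Swap fake_scrape for the real async scrape call and tune `limit` to whatever your plan's rate limit allows.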
Can I filter searches to specific sources?
Yes. KnowledgeSDK search results include the source URL in metadata. You can post-filter results in the _retrieve method, or add domain-specific terms to the query to bias retrieval toward particular sources in your knowledge base.
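For example, a post-filter on the result URLs' hosts can be dropped into _retrieve before the nodes are built. The function below is a self-contained sketch that works on any results carrying a "url" field; the allowed_domains set is whatever hosts you want to keep:

```python
from urllib.parse import urlparse

def filter_by_domain(results: list[dict], allowed_domains: set[str]) -> list[dict]:
    """Keep only results whose URL host is an allowed domain or a subdomain of one."""
    kept = []
    for item in results:
        host = urlparse(item["url"]).netloc.lower()
        if any(host == d or host.endswith("." + d) for d in allowed_domains):
            kept.append(item)
    return kept

results = [
    {"url": "https://docs.competitor.com/auth", "title": "Auth"},
    {"url": "https://blog.other.com/post", "title": "Post"},
]
print(filter_by_domain(results, {"competitor.com"}))  # keeps only the docs page
```

Matching on the parsed host (rather than a substring of the URL) avoids false positives like "competitor.com" appearing in a path or query string.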
What is the maximum knowledge base size?
KnowledgeSDK's search is designed to scale to tens of thousands of documents per API key. For very large knowledge bases, pagination on search results is supported.
Does this work with open-source LLMs instead of GPT-4o?
Yes. LlamaIndex supports Ollama, Hugging Face, Mistral, and many other LLM backends. Replace OpenAI(model="gpt-4o") with your preferred LLM provider. The KnowledgeSDK retriever is LLM-agnostic.
Is there an async version of the KnowledgeSDK Python SDK?
The KnowledgeSDK Python SDK supports async usage. Use await client.scrape_async(url) and await client.search_async(query) for non-blocking calls in async frameworks.
Conclusion
LlamaIndex provides an excellent query engine and LLM orchestration layer. KnowledgeSDK fills the gap that LlamaIndex's built-in web readers leave open: production-quality JavaScript rendering, anti-bot bypass, automatic indexing, and hybrid search.
The combination gives you a live web RAG pipeline where:
- KnowledgeSDK handles web data collection and search
- LlamaIndex handles LLM integration and response synthesis
- Webhooks keep the knowledge base current automatically
No separate vector database. No embedding pipeline. No proxy configuration. Just a clean, maintainable RAG pipeline grounded in real, current web content.
Get your KnowledgeSDK API key and build your first live web RAG pipeline today.
```bash
pip install knowledgesdk llama-index llama-index-llms-openai
```