The usual path from "I want to search this website" to "I can actually search this website" involves picking an embedding model, setting up a vector database, building a scraper that handles JavaScript rendering, writing a chunking pipeline, scheduling re-indexing jobs, and wiring a search endpoint to it all.
That's weeks of work if you're doing it carefully. Months if you're doing it right.
There is a shorter path: two API calls.
The 60-Second Claim
Here is what actually happens in those 60 seconds when you call /v1/extract:
1. JavaScript rendering — KnowledgeSDK uses a headless browser to render the full page, including content loaded by JS frameworks
2. Clean extraction — navigation, ads, footers, and boilerplate are stripped; the content is returned as clean markdown
3. Chunking — the document is split into semantically coherent chunks with appropriate overlap
4. Embedding — each chunk is embedded using a high-quality embedding model
5. Indexing — chunks are stored in a hybrid index (semantic + BM25 keyword) keyed to your API key
6. Ready — any subsequent search query over your API key searches these chunks
Steps 1 and 2 complete before the call returns, which is why the extract response includes the full markdown immediately. Steps 3 through 5 run in the background and finish shortly after. By the time a user is reading the extracted content, it's searchable.
For small pages, the entire process — including indexing — typically completes in 15–30 seconds. For larger, JavaScript-heavy pages, allow up to 90 seconds (see the response-time table below).
The 2-Call Pattern
You need exactly two calls to go from URL to searchable knowledge.
Using curl:
# Call 1: Extract and index (15-60 seconds)
curl -X POST https://api.knowledgesdk.com/v1/extract \
-H "x-api-key: knowledgesdk_live_your_key_here" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/pricing"}'
# Call 2: Search (< 100ms)
curl -X POST https://api.knowledgesdk.com/v1/search \
-H "x-api-key: knowledgesdk_live_your_key_here" \
-H "Content-Type: application/json" \
-d '{"query": "what does the pro plan include?"}'
The extract response includes the full markdown content, page title, word count, and metadata. The search response returns ranked chunks with content, source URL, and relevance scores.
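Abridged, illustrative response shapes (the field names match what the SDK examples below read; the values are made up, and real responses may carry additional metadata):

// Extract response (abridged, illustrative)
{
  "title": "Pricing",
  "word_count": 1240,
  "content": "# Pricing\n\nOur plans..."
}

// Search response (abridged, illustrative)
{
  "results": [
    {
      "title": "Pricing",
      "content": "The Pro plan includes...",
      "source_url": "https://example.com/pricing",
      "score": 0.92
    }
  ]
}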
Using Node.js:
const ks = require('@knowledgesdk/node');

const client = new ks.KnowledgeSDK({ apiKey: 'knowledgesdk_live_your_key_here' });

// await isn't allowed at the top level of a CommonJS module, so wrap in an async function
async function main() {
  // Extract and index
  const extracted = await client.extract({ url: 'https://example.com/pricing' });
  console.log(`Extracted: ${extracted.title} (${extracted.word_count} words)`);

  // Search
  const results = await client.search({ query: 'what does the pro plan include?' });
  for (const result of results.results) {
    console.log(`[${result.title}] ${result.source_url}`);
    console.log(result.content);
    console.log(`Score: ${result.score}`);
  }
}

main();
Using Python:
from knowledgesdk import KnowledgeSDK

client = KnowledgeSDK(api_key="knowledgesdk_live_your_key_here")

# Extract and index
extracted = client.extract(url="https://example.com/pricing")
print(f"Extracted: {extracted.title} ({extracted.word_count} words)")

# Search
results = client.search(query="what does the pro plan include?")
for result in results.results:
    print(f"[{result.title}] {result.source_url}")
    print(result.content)
    print(f"Score: {result.score}")
The API is identical across languages. Extract, then search. That's it.
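One caveat: indexing finishes shortly after the extract call returns, so a search fired immediately afterward can miss the newest chunks. A minimal way to handle that is to retry briefly until results appear (the retry helper below is our own, not part of the SDK):

// Retry search until background indexing has caught up (our own helper, not an SDK feature)
async function searchWhenReady(client, query, retries = 10, delayMs = 3000) {
  for (let i = 0; i < retries; i++) {
    const results = await client.search({ query });
    if (results.results.length > 0) return results; // index is ready
    await new Promise(r => setTimeout(r, delayMs)); // still indexing; wait and retry
  }
  throw new Error('No results after waiting for indexing');
}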
What You Get in Search Results
Each search result includes:
- title — the page title of the source document
- content — the relevant chunk of text (typically 200–800 words)
- source_url — the exact URL the chunk came from
- score — relevance score (0–1, higher is better)
The source_url field is particularly useful for attribution. When you build a chatbot or research tool on top of the search results, you can show users exactly where each answer came from.
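For example, a deduplicated citation list is a few lines of your own code (not an SDK feature):

// Build a deduplicated, numbered citation list from search results
const citations = [...new Set(results.results.map(r => r.source_url))];
console.log(citations.map((url, i) => `[${i + 1}] ${url}`).join('\n'));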
For Large Sites: Sitemap + Async
For a single page, the sync extract call is all you need. For entire documentation sites, help centers, or multi-page product catalogs, combine the sitemap endpoint with async extraction.
// Discover all URLs on a site
const sitemap = await client.sitemap({ url: 'https://docs.yourproduct.com' });
console.log(`Found ${sitemap.urls.length} pages`);

// Extract each page asynchronously
const jobs = [];
for (const url of sitemap.urls) {
  const job = await client.extractAsync({ url });
  jobs.push({ jobId: job.jobId, url });
}

// Poll for completion (or use a webhook)
for (const { jobId, url } of jobs) {
  let status;
  do {
    const job = await client.getJob({ jobId });
    status = job.status;
    if (status !== 'complete' && status !== 'failed') {
      await new Promise(r => setTimeout(r, 3000)); // still running; wait before polling again
    }
  } while (status !== 'complete' && status !== 'failed');
  console.log(`${status}: ${url}`);
}

// Now search across all indexed pages
const results = await client.search({ query: 'how do I reset my password?' });
The async endpoint returns a jobId immediately. You poll /v1/jobs/{jobId} to check status, or set a callbackUrl in the extract request to receive a webhook when each job completes.
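If you prefer webhooks over polling, a minimal receiver looks something like this (a sketch assuming Express; the payload field names are assumptions, not a documented schema):

const express = require('express');
const app = express();
app.use(express.json());

// Receives the callback sent when an extract job finishes.
// The payload fields (jobId, status, url) are assumed for illustration.
app.post('/knowledgesdk-callback', (req, res) => {
  const { jobId, status, url } = req.body;
  console.log(`Job ${jobId} for ${url} finished with status: ${status}`);
  res.sendStatus(200);
});

app.listen(3000);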
Response Times
To set expectations:
| Operation | Typical time |
|---|---|
| /v1/extract (small page, < 500 words) | 10–20 seconds |
| /v1/extract (medium page, 500–2000 words) | 20–40 seconds |
| /v1/extract (large page or heavy JS) | 40–90 seconds |
| /v1/search | < 100ms |
| /v1/sitemap | 5–15 seconds |
Search is fast because all the expensive work (embedding, indexing) happened at extraction time. Every search query runs against a pre-built hybrid index.
Three Things You Can Build in 60 More Seconds
Once you've done the initial extract, here are three use cases you can wire up quickly:
1. Competitor monitoring
Extract a competitor's pricing page. Set a daily cron job to re-extract it. Diff the markdown between runs. If the content changes, send yourself a Slack message.
// Daily job
const result = await client.extract({ url: 'https://competitor.com/pricing' });
// Compare result.content to yesterday's snapshot
// Alert on significant diff
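Fleshed out slightly (a sketch; the snapshot file and the Slack webhook URL are our own choices, not part of the SDK, and fetch requires Node 18+):

const fs = require('fs/promises');

async function checkPricing(client) {
  const result = await client.extract({ url: 'https://competitor.com/pricing' });
  const previous = await fs.readFile('pricing-snapshot.md', 'utf8').catch(() => '');
  if (result.content !== previous) {
    await fs.writeFile('pricing-snapshot.md', result.content);
    // Notify via a Slack incoming webhook (placeholder URL)
    await fetch('https://hooks.slack.com/services/XXX/YYY/ZZZ', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text: 'Competitor pricing page changed' }),
    });
  }
}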
2. Documentation chatbot
Extract all your docs pages. Add a search call before your LLM call. Pass retrieved chunks as context. Return the answer with source_url citations.
// User asks a question
const results = await client.search({ query: userQuestion });
const context = results.results.map(r => r.content).join('\n\n');
// Pass context to your LLM
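Completing the picture (a sketch; callLLM stands in for whatever model client you use and is not part of the SDK):

// Build a prompt that carries citations through to the answer
const prompt = [
  'Answer the question using only the context below.',
  'Cite sources by their [n] markers.',
  '',
  results.results.map((r, i) => `[${i + 1}] (${r.source_url})\n${r.content}`).join('\n\n'),
  '',
  `Question: ${userQuestion}`,
].join('\n');

const answer = await callLLM(prompt); // hypothetical wrapper around your LLM provider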
3. Research agent
Give your AI agent the ability to extract any URL it discovers during research. Every extracted URL automatically becomes searchable — the corpus grows as the agent works.
// Agent discovers a URL worth reading
await client.extract({ url: discoveredUrl });
// Now searchable alongside everything else the agent has read
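Wired up as a tool for a function-calling agent, it might look like this (the tool schema shape depends on your agent framework; this one is illustrative):

// Illustrative tool definition for a function-calling LLM agent
const extractTool = {
  name: 'extract_url',
  description: 'Fetch a web page, convert it to markdown, and add it to the searchable corpus',
  parameters: { type: 'object', properties: { url: { type: 'string' } }, required: ['url'] },
  handler: async ({ url }) => {
    const page = await client.extract({ url });
    return `Indexed "${page.title}" (${page.word_count} words)`;
  },
};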
Getting Started
You need one thing: an API key. Get one at knowledgesdk.com, paste it into either of the code examples above, and run them against any URL you want to search.
The first extract call is the slowest part. Everything after that — all your searches — typically returns in under 100 milliseconds. The infrastructure is already there. The embeddings are already stored. You're just querying an index that was built for you automatically.
That's what 60 seconds buys you.