Perplexity API Alternative: Build AI Search for Specific Websites
Perplexity has done something genuinely impressive: it made web search feel like talking to an intelligent researcher rather than querying a database. The Sonar API packages that capability into a developer-friendly endpoint — you send a question, it searches the web, cites sources, and returns a grounded answer.
For general knowledge questions, this is excellent. But for a growing class of AI applications, "the entire internet" is the wrong retrieval scope.
Consider these use cases:
- A support chatbot that should only cite answers from your company's documentation
- A competitive intelligence tool that monitors three specific competitor websites
- A research assistant that answers questions using only peer-reviewed papers from a particular journal
- An internal knowledge agent that searches your engineering wiki and Notion pages
- A customer success tool that answers only from your product changelog and release notes
In all of these cases, Perplexity Sonar gives you too much surface area. A general web search will return results from Stack Overflow, Reddit, random blogs, and outdated tutorials — none of which are the authoritative sources you want your agent citing.
The architecture you need is a crawl-then-search model: define exactly which URLs are in your search index, and query only those.
How Perplexity Sonar Works
Perplexity's Sonar API uses a live web search layer. When you send a query, it:
- Searches the public internet using Perplexity's search index
- Retrieves content from the top results
- Generates a cited answer using an LLM with that web content as context
This is an excellent architecture for questions where the best answer might exist anywhere on the public internet. It is the wrong architecture when you need to constrain the retrieval to a specific domain.
```python
# Python — Perplexity Sonar API (general web search)
import os

import requests

response = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['PERPLEXITY_API_KEY']}"},
    json={
        "model": "sonar-pro",
        "messages": [
            {"role": "user", "content": "What is Stripe's current pricing for international cards?"}
        ],
    },
)

# This searches the entire web — it might return outdated info,
# unofficial blog posts, or competitor comparisons, and you can't
# constrain it to only stripe.com.
print(response.json()["choices"][0]["message"]["content"])
```
The Sonar API does support a search_domain_filter parameter that lets you restrict results to specific domains. This helps, but it relies on those domains being indexed in Perplexity's search index — it does not give you control over which specific pages are indexed or how fresh the content is.
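To make that concrete, here is a hedged sketch of a request body with the domain filter applied (check Perplexity's API reference for current limits on the domain list). Send it with the same `requests.post` call shown above:

```python
# Sketch of a Sonar request payload using search_domain_filter.
# The field takes a list of allowed domains, per the paragraph above.
payload = {
    "model": "sonar-pro",
    "messages": [
        {"role": "user", "content": "What fees does Stripe charge for international cards?"}
    ],
    # Only consider search results from stripe.com
    "search_domain_filter": ["stripe.com"],
}
```

This narrows retrieval to stripe.com results, but which stripe.com pages are available, and how fresh they are, still depends on Perplexity's index.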
The Crawl-Then-Search Architecture
For domain-specific AI search, the correct architecture is:
```
[Setup Phase — run once or on a schedule]

    Define source URLs
            │
            ▼
    Crawl + Extract (KnowledgeSDK /v1/extract)
            │
            ▼
    Indexed Knowledge Base (semantic + keyword)

[Query Phase — every user query]

    User Question
            │
            ▼
    Semantic Search (/v1/search)
            │
            ▼
    Relevant Chunks
            │
            ▼
    LLM generates cited answer
```
This gives you:
- Exact source control: only the URLs you specify are in the index
- Freshness control: re-crawl on your schedule, or use webhooks to re-index when content changes
- Zero hallucination from irrelevant sources: the LLM can only cite what is in your index
- Consistent performance: you are not dependent on whether a domain is in Perplexity's index
Architecture Comparison
| Dimension | Perplexity Sonar API | KnowledgeSDK (Crawl-then-Search) |
|---|---|---|
| Retrieval scope | Entire public internet | Your defined URLs only |
| Source control | Domain filter only | Exact URL list |
| Content freshness | Perplexity's index freshness | Controlled by you |
| Setup required | None | Crawl step (minutes) |
| Works on private/internal URLs | No | Yes (with auth) |
| Semantic search quality | Good (general) | Excellent (domain-specific) |
| Citation accuracy | Varies | High (deterministic sources) |
| Cost per query | ~$0.005–$0.01 | ~$0.001–$0.005 |
| Customization | Limited | Full control |
| Best for | General knowledge questions | Domain-specific, authoritative answers |
Tutorial: Build a Private Perplexity for Stripe Documentation
Let us build a domain-specific AI search tool that answers questions using only Stripe's documentation. The result: a chatbot that always cites stripe.com pages and never hallucinates answers from unofficial sources.
Step 1: Index the Source URLs
```python
# Python — Index Stripe documentation pages
import os

from knowledgesdk import KnowledgeSDK

client = KnowledgeSDK(api_key=os.environ["KNOWLEDGESDK_API_KEY"])

# Define the documentation pages you want to index
stripe_docs = [
    "https://stripe.com/docs/payments/payment-intents",
    "https://stripe.com/docs/payments/accept-a-payment",
    "https://stripe.com/docs/billing/subscriptions/overview",
    "https://stripe.com/docs/connect/overview",
    "https://stripe.com/docs/radar/rules",
    "https://stripe.com/docs/webhooks",
    "https://stripe.com/docs/api",
    "https://stripe.com/pricing",
]

# Extract and index each page
for url in stripe_docs:
    result = client.extract(url=url)
    print(f"Indexed: {url} ({result.word_count} words)")

print(f"Indexed {len(stripe_docs)} Stripe documentation pages")
```
```typescript
// TypeScript — Index documentation pages
import { KnowledgeSDK } from "@knowledgesdk/node";

const client = new KnowledgeSDK({
  apiKey: process.env.KNOWLEDGESDK_API_KEY!,
});

const stripeDocs = [
  "https://stripe.com/docs/payments/payment-intents",
  "https://stripe.com/docs/payments/accept-a-payment",
  "https://stripe.com/docs/billing/subscriptions/overview",
  "https://stripe.com/docs/connect/overview",
  "https://stripe.com/docs/radar/rules",
  "https://stripe.com/docs/webhooks",
  "https://stripe.com/docs/api",
  "https://stripe.com/pricing",
];

// Index pages in parallel for speed
const results = await Promise.all(
  stripeDocs.map((url) => client.extract({ url }))
);

console.log(`Indexed ${results.length} Stripe documentation pages`);
```
Step 2: Query Your Private Search Index
```python
# Python — Search and answer using only indexed content
from openai import OpenAI

openai = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def stripe_docs_search(question: str) -> str:
    # Search only the indexed Stripe documentation
    search_results = client.search(
        query=question,
        limit=5,
    )
    if not search_results.results:
        return "I could not find relevant information in the Stripe documentation."

    # Build context from search results
    context_parts = []
    for result in search_results.results:
        context_parts.append(
            f"Source: {result.url}\n"
            f"Title: {result.title}\n\n"
            f"{result.content}"
        )
    context = "\n\n---\n\n".join(context_parts)

    # Generate an answer grounded in Stripe docs only
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": """You are a Stripe documentation assistant.
Answer questions using ONLY the documentation excerpts provided below.
Always cite the source URL when you provide information.
If the documentation doesn't cover the question, say so clearly — do not guess.

Stripe Documentation:
""" + context,
            },
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

# Example queries
questions = [
    "What is the difference between Payment Intents and Charges?",
    "How do I handle webhook signature verification?",
    "What fees does Stripe charge for international cards?",
]

for question in questions:
    print(f"\nQ: {question}")
    print(f"A: {stripe_docs_search(question)}")
```
```typescript
// TypeScript — Full search-and-answer implementation
import { KnowledgeSDK } from "@knowledgesdk/node";
import { openai } from "@ai-sdk/openai";
import { generateText } from "ai";

const client = new KnowledgeSDK({
  apiKey: process.env.KNOWLEDGESDK_API_KEY!,
});

async function stripeDocsSearch(question: string): Promise<string> {
  // Search only indexed Stripe documentation
  const searchResults = await client.search({
    query: question,
    limit: 5,
  });
  if (!searchResults.results.length) {
    return "I could not find relevant information in the Stripe documentation.";
  }

  const context = searchResults.results
    .map((r) => `Source: ${r.url}\nTitle: ${r.title}\n\n${r.content}`)
    .join("\n\n---\n\n");

  const { text } = await generateText({
    model: openai("gpt-4o"),
    system: `You are a Stripe documentation assistant.
Answer questions using ONLY the documentation excerpts provided below.
Always cite the source URL. If the docs don't cover the question, say so.

Stripe Documentation:
${context}`,
    prompt: question,
  });
  return text;
}

// Example
const answer = await stripeDocsSearch(
  "What fees does Stripe charge for international cards?"
);
console.log(answer);
```
Step 3: Keep the Index Fresh with Webhooks
Documentation pages change. Stripe updates their pricing, adds new features, and revises their guides. Set up a webhook to re-index pages when their content changes:
```python
# Python — Set up a webhook to monitor documentation pages for changes
webhook = client.webhooks.create(
    url="https://your-app.com/webhooks/doc-updated",
    events=["page.changed"],
    urls=stripe_docs,  # The same list you indexed earlier
)

print(f"Monitoring {len(stripe_docs)} pages for changes")
print(f"Webhook ID: {webhook.id}")
```

```python
# Python — Handle the webhook callback (e.g., a Flask route)
from flask import Flask, request

app = Flask(__name__)

@app.route("/webhooks/doc-updated", methods=["POST"])
def handle_doc_update():
    payload = request.json
    changed_url = payload["url"]

    # Re-index the changed page
    result = client.extract(url=changed_url)
    print(f"Re-indexed: {changed_url}")
    return {"status": "ok"}
```
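In production, you would also want to confirm the callback really came from your webhook provider before re-indexing. Assuming payloads are signed with a shared secret delivered in a signature header (an assumption here; check the provider's webhook docs for its actual scheme), verification is a standard HMAC comparison:

```python
import hashlib
import hmac

def verify_signature(secret: str, body: bytes, signature: str) -> bool:
    """Return True if the HMAC-SHA256 digest of the raw body matches."""
    # Hypothetical scheme: hex-encoded HMAC-SHA256 of the raw request body
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking the digest via timing
    return hmac.compare_digest(expected, signature)
```

In the Flask route, call it with the raw request bytes (`request.get_data()`) before trusting `payload["url"]`.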
With this setup, your search index automatically stays current without manual re-crawling.
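For sources where change notifications are not an option, the "re-crawl on your schedule" approach reduces to a small helper. A minimal sketch (`refresh_index` is a hypothetical helper of ours, not an SDK call; `client` and `stripe_docs` come from Step 1):

```python
# Scheduled re-crawl fallback: re-extract every URL so the index
# reflects the latest content. Run from cron or any job scheduler.
def refresh_index(client, urls):
    """Re-extract each URL and return how many pages were refreshed."""
    for url in urls:
        client.extract(url=url)
    return len(urls)

# e.g. invoked once a day:
# refresh_index(client, stripe_docs)
```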
Extending to Multiple Domains
The same pattern scales to multiple documentation sources. Build a developer tool that searches across several API providers simultaneously:
```python
# Python — Multi-domain documentation search
from urllib.parse import urlparse

doc_sources = {
    "stripe": [
        "https://stripe.com/docs/api",
        "https://stripe.com/docs/webhooks",
    ],
    "twilio": [
        "https://www.twilio.com/docs/sms",
        "https://www.twilio.com/docs/voice",
    ],
    "sendgrid": [
        "https://docs.sendgrid.com/for-developers/sending-email/api-getting-started",
    ],
}

# Index all sources
for provider, urls in doc_sources.items():
    for url in urls:
        client.extract(url=url)
        print(f"Indexed [{provider}]: {url}")

def multi_provider_search(question: str, providers: list[str] | None = None) -> str:
    """Search across indexed documentation, optionally limited to providers."""
    search_kwargs = {"query": question, "limit": 8}
    if providers:
        # Restrict the search to the requested providers' domains
        domains = sorted({urlparse(u).netloc for p in providers for u in doc_sources[p]})
        search_kwargs["filter"] = {"domain": {"$in": domains}}
    results = client.search(**search_kwargs)

    context = "\n\n---\n\n".join(
        f"[{r.url}]\n{r.content}" for r in results.results
    )
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": f"Answer using only the documentation below:\n\n{context}",
            },
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```
When to Use Perplexity Sonar vs KnowledgeSDK
This is not an either-or choice: both tools have clear use cases.
Use Perplexity Sonar when:
- Your users ask general knowledge questions that benefit from broad web coverage
- You do not care which specific sites the answers come from — any authoritative source is acceptable
- You want a zero-setup search integration and freshness is handled by Perplexity's crawlers
- Your use case is consumer-facing general Q&A, not domain-specific tooling
Use KnowledgeSDK when:
- You need strict source control — answers must cite only your approved URLs
- You want to search internal, private, or authenticated content
- You are building a customer support bot, documentation assistant, or competitive intelligence tool
- You want to control re-indexing schedules and get webhook alerts when source content changes
- Your users need high-confidence citations from authoritative sources in your domain
The key mental model: Perplexity Sonar is a search engine you connect to. KnowledgeSDK is a search index you control.
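One practical consequence of that mental model is that the two can coexist behind a router: domain-specific questions go to the private index, everything else to general web search. A toy sketch (the keyword heuristic is purely illustrative; a real system might use an LLM classifier instead):

```python
def route_query(question: str, domain_keywords: set[str]) -> str:
    """Pick a search backend by checking for domain-specific keywords."""
    words = {w.strip("?.,!").lower() for w in question.split()}
    if words & domain_keywords:
        return "knowledgesdk"  # source-controlled private index
    return "sonar"  # general web search
```

With `domain_keywords={"stripe", "webhooks"}`, a question about Stripe webhooks routes to the private index, while general-knowledge questions fall through to web search.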
Real-World Applications
Teams are building domain-specific AI search with this pattern for:
- Developer documentation assistants: Answer questions using only a specific SDK's documentation, so the bot never cites outdated Stack Overflow answers
- Internal knowledge bases: Index Notion, Confluence, and internal wikis so employees can ask natural-language questions
- Competitive intelligence: Monitor competitor product pages and answer questions about their feature set with citations
- Legal and compliance search: Answer questions using only approved regulatory documents, not general web results
- Customer support bots: Answer product questions using only official documentation and changelog entries
In each case, source control is the critical requirement — and that is exactly what the crawl-then-search model delivers.
Start building domain-specific AI search at knowledgesdk.com