Agentic RAG: Building Self-Correcting Retrieval Pipelines with Live Web Data
Learn how to build a CRAG (Corrective RAG) pipeline that falls back to live web scraping when your vector index is stale. Full Python code with LangGraph and KnowledgeSDK.
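The corrective loop that teaser describes reduces to a small piece of control flow: retrieve, grade, and fall back to the live web when the index looks stale. A minimal sketch, with stub functions standing in for the vector store, the LLM grader, and the web-scraping fallback (none of these names are real KnowledgeSDK or LangGraph APIs):

```python
# Control-flow sketch of a Corrective RAG (CRAG) loop.
# grade() and both retrievers are stand-in stubs; a real pipeline
# would call a vector store, an LLM relevance grader, and a live
# web-scraping API for the fallback path.

def grade(docs: list[str], query: str) -> str:
    """Crude stand-in grader: 'relevant' if any doc contains every query term."""
    terms = query.lower().split()
    hit = any(all(t in d.lower() for t in terms) for d in docs)
    return "relevant" if hit else "stale"

def retrieve_from_index(query: str) -> list[str]:
    # Stub for a vector-index lookup (possibly out of date).
    return ["Our 2023 pricing page lists the old plans."]

def retrieve_from_web(query: str) -> list[str]:
    # Stub for the live-web fallback (scrape + extract fresh content).
    return ["2026 pricing: Pro plan is $49/mo."]

def corrective_retrieve(query: str) -> list[str]:
    docs = retrieve_from_index(query)
    if grade(docs, query) == "relevant":
        return docs
    # Index looks stale for this query: fall back to fresh web content.
    return retrieve_from_web(query)

docs = corrective_retrieve("2026 pricing")
```

In a real pipeline the grader is an LLM call and the branch becomes a conditional edge in a LangGraph graph, but the decision structure is exactly this.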
Step-by-step tutorial: extract any website into a searchable knowledge base using KnowledgeSDK — no infrastructure, no vector DB setup, just a few API calls.
Your compliance docs live on your website. Build a chatbot that reads them automatically, stays current when pages change, and answers questions with source citations.
Context engineering is the defining AI skill of 2026. Learn how to pipe live web data into agent context using KnowledgeSDK — just-in-time scraping, webhooks, and temporal metadata.
Build a GraphRAG pipeline with KnowledgeSDK: scrape any website to clean markdown, extract entities with Claude or GPT-4o, and load into Neo4j or LightRAG.
Stop re-crawling your entire knowledge base every 24 hours. Use KnowledgeSDK webhooks to update only changed pages in Pinecone or Weaviate — 10x cheaper.
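The incremental update in that teaser comes down to a webhook handler that touches only what changed. A sketch under stated assumptions: the payload shape (`changed` / `removed`) is illustrative, not a documented KnowledgeSDK schema, and a plain dict stands in for a Pinecone or Weaviate client:

```python
# Sketch of incremental index maintenance driven by a change webhook.
# The payload shape ({"changed": [...], "removed": [...]}) is an
# assumption for illustration; `index` stands in for a vector-store
# client with upsert/delete operations.

def apply_webhook(payload: dict, index: dict[str, str]) -> dict[str, str]:
    """Upsert changed pages and drop removed ones; untouched pages stay as-is."""
    for page in payload.get("changed", []):
        index[page["url"]] = page["content"]   # re-embed + upsert in real life
    for url in payload.get("removed", []):
        index.pop(url, None)                   # delete stale vectors
    return index

index = {"/docs/a": "old A", "/docs/b": "old B"}
payload = {
    "changed": [{"url": "/docs/a", "content": "new A"}],
    "removed": ["/docs/b"],
}
index = apply_webhook(payload, index)
```

The cost saving follows directly: embedding work is proportional to the size of `changed`, not to the size of the corpus.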
End-to-end tutorial: scrape any website with KnowledgeSDK, extract entities and relationships with an LLM, and load the result into Neo4j for multi-hop graph queries.
Extract entities and relationships from any website, build a Neo4j knowledge graph, and query it for multi-hop reasoning in your RAG pipeline.
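The load step in those GraphRAG teasers is mostly translating LLM-extracted triples into idempotent Cypher. A hypothetical helper (not a KnowledgeSDK or Neo4j driver API) showing how one (subject, relation, object) triple becomes a `MERGE` statement:

```python
# Hypothetical helper: map an LLM-extracted triple to Cypher MERGE
# statements, so repeated loads don't duplicate nodes or edges.

def triple_to_cypher(subj: str, rel: str, obj: str) -> str:
    """Build an idempotent MERGE; rel must be a valid Cypher relationship type."""
    return (
        f'MERGE (a:Entity {{name: "{subj}"}}) '
        f'MERGE (b:Entity {{name: "{obj}"}}) '
        f"MERGE (a)-[:{rel}]->(b)"
    )

stmt = triple_to_cypher("Stripe", "PROVIDES", "Payments API")
```

You would execute these statements through the official `neo4j` Python driver in a transaction; production code should also parameterize the names rather than interpolating strings.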
A practical guide to markdown extraction APIs — what they do, how they differ, and how to use them to feed clean text to your LLMs, RAG pipelines, and AI agents.
Step-by-step: build a Model Context Protocol server that gives Claude, Cursor, or any MCP client access to a live web knowledge base powered by KnowledgeSDK.
Skip CSS selectors and XPath forever. Use natural language or JSON schema to extract structured data from any webpage with LLM-powered APIs.
Learn how to extract structured JSON from visual web pages using screenshots and vision LLMs. Full Node.js and Python code, plus benchmarks across 3 page types.
SERP APIs return search result lists. Content scraping APIs return full page content. Learn when to use each and how to combine them for AI agent workflows.
A practical tutorial for extracting and crawling all URLs from a website's sitemap — with rate limiting, error handling, and clean markdown output for AI applications.
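The first step of that sitemap tutorial, extracting every `<loc>` URL, can be done with the standard library alone; a hardcoded document stands in for the HTTP fetch here. Note that real sitemaps declare the sitemaps.org namespace, which `ElementTree` requires you to pass explicitly:

```python
# Parse a sitemap.xml into a URL list with the stdlib; the hardcoded
# SITEMAP string stands in for an HTTP fetch. The sitemaps.org
# namespace must be supplied to findall() or no elements match.
import xml.etree.ElementTree as ET

SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs</loc></url>
</urlset>"""

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text: str) -> list[str]:
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.findall(".//sm:loc", NS)]

urls = sitemap_urls(SITEMAP)
```

Rate limiting and error handling then wrap the per-URL fetch loop, not this parsing step.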
Stateless agents forget everything. Stateful agents with web knowledge are unstoppable. Here's how to build agents that persist context AND stay current with the web.
Learn how to extract structured JSON data from any website using KnowledgeSDK. No CSS selectors, no broken scrapers — just a schema and an API call.
The fastest way to turn any website into a searchable knowledge base: one API call to extract, one to search. No infrastructure, no embedding pipeline. Just results.
A developer guide to web scraping in Node.js for AI applications — from Axios/Cheerio basics to production-ready knowledge extraction APIs with TypeScript.
A practical guide to web scraping in Python for LLM applications — from DIY with BeautifulSoup to production-ready knowledge extraction APIs.
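The DIY end of that spectrum is HTML-to-text extraction. The tutorial itself uses BeautifulSoup; here is a dependency-free sketch of the same idea with the stdlib `HTMLParser`, dropping `<script>`/`<style>` and keeping only visible text for the LLM:

```python
# Dependency-free sketch of HTML-to-text extraction using the stdlib
# HTMLParser: skip <script>/<style> contents, collect visible text.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we're not inside a skipped element.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)

text = html_to_text("<h1>Docs</h1><script>var x=1;</script><p>Hello LLM</p>")
```

This handles static pages only; JavaScript-rendered SPAs and protected sites are where the hosted-API half of the guide takes over.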
Learn how to scrape Stripe, GitHub, and other API docs to build a living knowledge base for AI agents. Handle multi-page docs, versioning, and auth.
Learn to scrape URLs to clean markdown, build a semantic search index, and subscribe to webhooks using the KnowledgeSDK Python SDK with async support.
A complete guide to scraping any website to clean markdown in 2026. Covers static pages, React SPAs, paginated content, and Cloudflare-protected sites with code examples.
A complete tutorial for building a web-scraped RAG pipeline: from scraping competitor docs to semantic search and GPT-4o integration. Compare DIY vs KnowledgeSDK approaches.