knowledgesdk.com/glossary/api-scraping
Web Scraping & Extraction | intermediate

Also known as: private API scraping, XHR scraping

API Scraping

Extracting data by calling a website's internal or undocumented APIs rather than parsing its HTML.

What Is API Scraping?

API scraping is the practice of extracting data by calling a website's internal, private, or undocumented HTTP APIs — typically JSON endpoints — rather than parsing the visible HTML of the page. Modern web applications often load their data from a backend API and render it client-side with JavaScript. By intercepting or reverse-engineering those API calls, a scraper can retrieve clean, structured JSON directly, bypassing the need to parse HTML at all.

How It Differs from HTML Scraping

Approach      | Data Source            | Output Format            | Brittleness
HTML scraping | Rendered DOM           | Raw HTML → parsed fields | High (breaks on redesign)
API scraping  | Internal API endpoints | JSON / structured data   | Medium (breaks on API changes)

API scraping produces cleaner, more structured data with less parsing effort. The trade-off is that the endpoints are undocumented, so they may change without notice, and many require authentication tokens.
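
To make the contrast concrete, here is a minimal sketch that reads the same price from an HTML fragment and from a JSON payload. Both payloads, the class names, and the function names are invented for illustration, and the regular expression stands in for a real HTML parser only to keep the example short.

```javascript
// Hypothetical payloads describing the same product, as each approach would see it.
const html =
  '<div class="product"><h2 class="title">USB-C Cable</h2>' +
  '<span class="price">$9.99</span></div>';
const json = '{"products":[{"title":"USB-C Cable","price":9.99}]}';

// HTML scraping: the extraction depends on markup details (tag and class names),
// so a redesign that renames the class makes it return null.
function priceFromHtml(markup) {
  const match = markup.match(/<span class="price">\$([\d.]+)<\/span>/);
  return match ? Number(match[1]) : null;
}

// API scraping: the field is already named and typed in the response body.
function priceFromJson(body) {
  return JSON.parse(body).products[0].price;
}
```

Renaming the class in the HTML fragment breaks priceFromHtml, while priceFromJson keeps working until the API itself changes its field names.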

How to Find Internal API Endpoints

The standard technique is to use browser developer tools to inspect network requests while browsing the target site:

  1. Open Chrome DevTools → Network tab
  2. Filter by Fetch/XHR to see only API requests
  3. Browse the site normally — search, scroll, click
  4. Identify JSON-returning requests that carry the data you need
  5. Copy the request URL, headers, and cookies
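
When a browsing session produces many requests, steps 2 to 4 can be partly automated. The sketch below assumes you exported the session from the Network tab as a HAR file (a JSON log of requests) and parsed it with JSON.parse; it lists each distinct endpoint whose response was JSON. The function name and the shape of the returned objects are illustrative choices, not part of any tool.

```javascript
// Return the distinct JSON-returning endpoints found in a parsed HAR export.
function findJsonEndpoints(har) {
  const seen = new Set();
  const endpoints = [];
  for (const entry of har.log.entries) {
    // Keep only responses whose declared MIME type mentions JSON.
    const mimeType = (entry.response.content.mimeType || '').toLowerCase();
    if (!mimeType.includes('json')) continue;

    // Group requests by method + path so query variations count once.
    const url = new URL(entry.request.url);
    const endpoint = url.origin + url.pathname;
    const key = `${entry.request.method} ${endpoint}`;
    if (seen.has(key)) continue;
    seen.add(key);

    endpoints.push({
      method: entry.request.method,
      endpoint,
      queryParams: [...url.searchParams.keys()],
      status: entry.response.status,
    });
  }
  return endpoints;
}

// Usage (the file name is hypothetical):
// const har = JSON.parse(require('fs').readFileSync('session.har', 'utf8'));
// console.log(findJsonEndpoints(har));
```

The query parameter names it reports are a starting point for working out which values control filtering and pagination.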

For example, a social media site's feed might be loaded by:

GET https://api.example.com/v2/feed?user_id=12345&limit=20
Authorization: Bearer eyJhbGc...

Common Patterns in API Scraping

  • Pagination tokens: APIs often return a next_cursor or page_token with paginated results; your scraper must send it back on each following request until none is returned to collect all records
  • Authentication headers: many internal APIs require session cookies or bearer tokens obtained by simulating a login
  • Rate limiting: internal APIs are usually rate-limited; space requests out and back off on HTTP 429 responses, or your requests may be throttled and your token revoked
  • GraphQL endpoints: some sites use GraphQL, which lets you query exactly the fields you need
  • WebSocket streams: real-time data (prices, scores, feeds) may arrive over a WebSocket rather than REST
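
The pagination and rate-limiting patterns can be combined in one loop. The following is a rough sketch, not a definitive client. It assumes the endpoint accepts the token in a query parameter named cursor and returns a body shaped like { items, next_cursor }; the function name, the option names, and the items field are all assumptions, so adjust them to whatever the real endpoint uses.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Follow next_cursor until the API stops returning one, pausing between
// requests and backing off when the server answers 429 Too Many Requests.
// fetchImpl is injectable so the loop can be exercised without a network.
async function collectAll(baseUrl, options = {}) {
  const { fetchImpl = globalThis.fetch, headers = {}, delayMs = 1000, maxRetries = 3 } = options;
  const records = [];
  let cursor = null;
  let retries = 0;

  while (true) {
    const url = new URL(baseUrl);
    if (cursor) url.searchParams.set('cursor', cursor); // assumed parameter name
    const response = await fetchImpl(url.toString(), { headers });

    if (response.status === 429) {
      if (retries >= maxRetries) throw new Error('Still rate limited after retries');
      retries += 1;
      // Use Retry-After when it is given in seconds; otherwise back off exponentially.
      const retryAfter = Number(response.headers.get('Retry-After'));
      await sleep(retryAfter > 0 ? retryAfter * 1000 : delayMs * 2 ** retries);
      continue; // retry the same cursor
    }
    if (!response.ok) throw new Error(`Request failed with status ${response.status}`);
    retries = 0;

    const { items = [], next_cursor = null } = await response.json();
    records.push(...items);
    if (!next_cursor) break; // last page reached
    cursor = next_cursor;
    await sleep(delayMs); // fixed pause between pages
  }
  return records;
}
```

A call might look like collectAll('https://api.example.com/v2/feed?limit=20', { headers: { Authorization: 'Bearer <session-token>' } }), with the token left as a placeholder.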

Example: Collecting Product Data via Internal API

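// Endpoint discovered in the browser DevTools Network tab
async function fetchProducts(page = 1) {
  const response = await fetch(
    `https://api.shop.example.com/products?category=electronics&page=${page}`,
    {
      headers: {
        'Authorization': 'Bearer <session-token>', // placeholder: supply a real session token
        'x-client-version': '3.14.0',
      },
    }
  );
  // Fail loudly on errors such as 401 or 429 instead of parsing an error body as data
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  // products holds this page's records; next_page points at the following page
  const { products, next_page } = await response.json();
  return { products, next_page };
}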

When to Use API Scraping vs. HTML Scraping

Use API scraping when:

  • The site loads data via clearly identifiable XHR/fetch requests
  • You need large volumes of records (JSON responses are usually smaller and quicker to process than full HTML pages)
  • The page is a complex SPA where HTML scraping is unreliable

Use HTML scraping (or KnowledgeSDK's POST /v1/scrape / POST /v1/extract) when:

  • Content is server-rendered and lives directly in the HTML
  • No clear API endpoint is discoverable
  • You need the full rendered page including text, metadata, and structure
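
One quick way to tell the two cases apart is to check whether a value you can see in the browser is already present in the raw HTML the server returns. The sketch below is a heuristic under that assumption only; the function name and the returned labels are made up for illustration, and a miss can also mean the value is embedded in a script tag with different escaping.

```javascript
// Suggest an approach based on whether a known on-page value appears in the
// raw (unrendered) HTML response.
function suggestApproach(rawHtml, knownValue) {
  // Collapse whitespace and ignore case so formatting differences do not matter.
  const normalize = (s) => s.replace(/\s+/g, ' ').trim().toLowerCase();
  return normalize(rawHtml).includes(normalize(knownValue))
    ? 'html-scraping' // content is server-rendered
    : 'api-scraping'; // content is probably loaded later by an XHR/fetch call
}
```

If the result is 'api-scraping', the Network tab inspection described earlier is the next step for locating the request that carries the data.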

Legal and Ethical Considerations

Accessing undocumented private APIs without authorization may violate a site's terms of service and, depending on the jurisdiction, laws such as the U.S. Computer Fraud and Abuse Act (CFAA) or its equivalents elsewhere. Always review the site's terms before proceeding, and prefer official public APIs when available.

Related Terms

  • Web Scraping (Web Scraping & Extraction, beginner): The automated extraction of data from websites by programmatically fetching and parsing HTML content.
  • Structured Data Extraction (Web Scraping & Extraction, intermediate): Pulling specific fields such as prices, names, and dates from web pages into structured formats like JSON or CSV.
  • Headless Browser (Web Scraping & Extraction, intermediate): A web browser that runs without a graphical user interface, used to render JavaScript-heavy pages for scraping.