What Is User-Agent Spoofing?
User-Agent spoofing is the practice of setting the HTTP User-Agent request header to a value that mimics a real web browser or another legitimate client, rather than revealing that the request comes from an automated scraper. By presenting a realistic browser User-Agent string, a scraper can avoid the simplest form of bot detection — blocking requests from known or absent User-Agent strings.
What Is the User-Agent Header?
Every HTTP request includes a User-Agent header that identifies the client making the request. Real browsers send descriptive strings like:
# Chrome 120 on macOS
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
# Firefox 121 on Windows
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0
# iPhone Safari
Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1
By default, HTTP libraries send values like python-requests/2.31.0 or axios/1.6.0 — immediately identifiable as non-browser clients, which many sites block outright.
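You can confirm this locally. Even Python's standard-library urllib announces itself the same way (the snippet below just inspects the default header, without making a request):

```python
import urllib.request

# Every opener urllib builds carries a default User-agent header
# identifying the client as Python, not a browser.
opener = urllib.request.build_opener()
default_ua = dict(opener.addheaders)['User-agent']
print(default_ua)  # e.g. "Python-urllib/3.11"
```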
How User-Agent Spoofing Works
The technique is simple: override the default User-Agent header in your HTTP client:
# Python requests
import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/120.0.0.0 Safari/537.36'
}
response = requests.get('https://example.com', headers=headers)
// Node.js fetch
const response = await fetch('https://example.com', {
  headers: {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
  }
});
Beyond User-Agent: Full Header Spoofing
Modern anti-bot systems look at more than just User-Agent. A complete browser impersonation requires mimicking the full set of headers a real browser sends:
User-Agent: Mozilla/5.0 ...
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: none
Sec-Fetch-User: ?1
Missing or mismatched Sec-Fetch-* headers are a common signal that a request did not originate from a real browser navigation.
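Put together in Python, the full header set above can be passed as a single dict to any HTTP client (values mirror the Chrome-on-macOS example; pass it as the `headers` argument of, for example, requests.get):

```python
# Header set mimicking a top-level Chrome navigation on macOS.
browser_headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/120.0.0.0 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,'
              'image/avif,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    # Sec-Fetch-* headers describe the navigation context; their absence
    # is a common bot signal.
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
}
```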
Limitations of User-Agent Spoofing
User-Agent spoofing is a basic technique that bypasses only the most naive bot detection:
- TLS fingerprinting — the TLS handshake signature of requests or axios differs from Chrome's, regardless of the User-Agent
- Browser fingerprinting — headless browsers have detectable properties (missing plugins, specific GPU signatures, navigator.webdriver === true)
- Behavioral analysis — real users move a mouse and scroll; HTTP clients do not
- IP reputation — datacenter IPs are flagged regardless of the User-Agent string
User-Agent Rotation
To avoid patterns, scrapers often rotate through a pool of realistic User-Agent strings, changing the value per request or per session:
const agents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36...',
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36...',
];
const ua = agents[Math.floor(Math.random() * agents.length)];
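The same rotation pattern in Python, picking one User-Agent per session rather than per request (a sketch — the pool contents are illustrative and truncated):

```python
import random

# Pool of realistic browser User-Agent strings (truncated for brevity).
AGENT_POOL = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36...',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36...',
]

def session_headers() -> dict:
    """Pick one User-Agent to reuse for the lifetime of a session."""
    return {'User-Agent': random.choice(AGENT_POOL)}
```

Rotating per session rather than per request keeps the User-Agent consistent with any cookies the session accumulates, which is itself a signal anti-bot systems check.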
KnowledgeSDK's Approach
When you use KnowledgeSDK's POST /v1/scrape or POST /v1/extract, requests are made from a fully managed headless browser with authentic browser headers and TLS fingerprints — far more convincing than User-Agent string manipulation alone. This means you get consistent extraction results without managing header pools or worrying about fingerprint detection.
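A minimal sketch of assembling a call to the scrape endpoint. Only the POST /v1/scrape path comes from this page — the base URL, bearer-token Authorization header, and url payload field are assumptions, so confirm the real shapes against the KnowledgeSDK API reference:

```python
import json

# Hypothetical values — check the KnowledgeSDK API reference.
BASE_URL = 'https://api.knowledgesdk.example'  # assumed base URL
API_KEY = 'YOUR_API_KEY'                       # assumed auth scheme

def build_scrape_request(target_url: str):
    """Assemble the endpoint, headers, and JSON body for POST /v1/scrape."""
    endpoint = f'{BASE_URL}/v1/scrape'
    headers = {
        'Authorization': f'Bearer {API_KEY}',
        'Content-Type': 'application/json',
    }
    body = json.dumps({'url': target_url})  # payload field name is assumed
    return endpoint, headers, body
```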