What Is User-Agent Spoofing?
User-Agent spoofing is the practice of setting the HTTP User-Agent request header to a value that mimics a real web browser or another legitimate client, rather than revealing that the request comes from an automated scraper. By presenting a realistic browser User-Agent string, a scraper can avoid the simplest form of bot detection — blocking requests from known or absent User-Agent strings.
What Is the User-Agent Header?
Every HTTP request includes a User-Agent header that identifies the client making the request. Real browsers send descriptive strings like:
# Chrome 120 on macOS
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
# Firefox 121 on Windows
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0
# iPhone Safari
Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1
By default, HTTP libraries send values like python-requests/2.31.0 or axios/1.6.0 — immediately identifiable as non-browser clients, which many sites block outright.
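You can confirm this locally. Even Python's standard-library urllib announces itself the same way (the snippet below just inspects the default header, without making a request):

```python
import urllib.request

# Every opener urllib builds carries a default User-agent header
# identifying the client as Python, not a browser.
opener = urllib.request.build_opener()
default_ua = dict(opener.addheaders)['User-agent']
print(default_ua)  # e.g. "Python-urllib/3.11"
```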
How User-Agent Spoofing Works
The technique is simple: override the default User-Agent header in your HTTP client:
# Python requests
import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/120.0.0.0 Safari/537.36'
}
response = requests.get('https://example.com', headers=headers)
// Node.js fetch
const response = await fetch('https://example.com', {
  headers: {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
  }
});
Beyond User-Agent: Full Header Spoofing
Modern anti-bot systems look at more than just User-Agent. A complete browser impersonation requires mimicking the full set of headers a real browser sends:
User-Agent: Mozilla/5.0 ...
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: none
Sec-Fetch-User: ?1
Missing or mismatched Sec-Fetch-* headers are a common signal that a request did not originate from a real browser navigation.
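Put together in Python, the full header set above can be passed as a single dict to any HTTP client (values mirror the Chrome-on-macOS example; pass it as the `headers` argument of, for example, requests.get):

```python
# Header set mimicking a top-level Chrome navigation on macOS.
browser_headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/120.0.0.0 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,'
              'image/avif,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    # Sec-Fetch-* headers describe the navigation context; their absence
    # is a common bot signal.
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
}
```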
Limitations of User-Agent Spoofing
User-Agent spoofing is a basic technique that bypasses only the most naive bot detection:
- TLS fingerprinting — the TLS handshake signature of requests or axios differs from Chrome's, regardless of the User-Agent
- Browser fingerprinting — headless browsers have detectable properties (missing plugins, specific GPU signatures, navigator.webdriver === true)
- Behavioral analysis — real users move a mouse and scroll; HTTP clients do not
- IP reputation — datacenter IPs are flagged regardless of the User-Agent string
User-Agent Rotation
To avoid patterns, scrapers often rotate through a pool of realistic User-Agent strings, changing the value per request or per session:
const agents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36...',
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36...',
];
const ua = agents[Math.floor(Math.random() * agents.length)];
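The same rotation pattern in Python, picking one User-Agent per session rather than per request (a sketch — the pool contents are illustrative and truncated):

```python
import random

# Pool of realistic browser User-Agent strings (truncated for brevity).
AGENT_POOL = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36...',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36...',
]

def session_headers() -> dict:
    """Pick one User-Agent to reuse for the lifetime of a session."""
    return {'User-Agent': random.choice(AGENT_POOL)}
```

Rotating per session rather than per request keeps the User-Agent consistent with any cookies the session accumulates, which is itself a signal anti-bot systems check.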
KnowledgeSDK's Approach
When you use KnowledgeSDK's POST /v1/scrape or POST /v1/extract, requests are made from a fully managed headless browser with authentic browser headers and TLS fingerprints — far more convincing than User-Agent string manipulation alone. This means you get consistent extraction results without managing header pools or worrying about fingerprint detection.
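A minimal sketch of assembling a call to the scrape endpoint. Only the POST /v1/scrape path comes from this page — the base URL, bearer-token Authorization header, and url payload field are assumptions, so confirm the real shapes against the KnowledgeSDK API reference:

```python
import json

# Hypothetical values — check the KnowledgeSDK API reference.
BASE_URL = 'https://api.knowledgesdk.example'  # assumed base URL
API_KEY = 'YOUR_API_KEY'                       # assumed auth scheme

def build_scrape_request(target_url: str):
    """Assemble the endpoint, headers, and JSON body for POST /v1/scrape."""
    endpoint = f'{BASE_URL}/v1/scrape'
    headers = {
        'Authorization': f'Bearer {API_KEY}',
        'Content-Type': 'application/json',
    }
    body = json.dumps({'url': target_url})  # payload field name is assumed
    return endpoint, headers, body
```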