Price intelligence is one of the oldest use cases for web scraping, but it has never been harder to get right. Modern e-commerce sites — Shopify stores, Amazon, direct-to-consumer brands — render prices with JavaScript, hide them behind A/B tests, and protect their pages with sophisticated bot detection. A naive scraper that worked in 2022 breaks instantly today.
This guide covers how to build a production-ready price monitoring agent in 2026: one that handles JS-rendered prices, stores historical data in Postgres, fires webhooks on price drops, and respects the websites it scrapes.
What We're Building
By the end of this tutorial you'll have:
- A price extraction pipeline using KnowledgeSDK's scrape API
- A Postgres schema for storing price history
- A comparison engine that detects meaningful price changes
- A webhook system that alerts your application when prices drop
- Rate limiting and deduplication to keep the scraper sustainable
Why E-Commerce Prices Are Hard to Scrape
JavaScript rendering. Most product pages today load prices asynchronously. The initial HTML contains a placeholder; the actual price is injected by React or Vue after an API call completes. A curl request returns $-- instead of $29.99.
Dynamic pricing. Prices change based on your location, your login state, your browsing history, and the time of day. The price your scraper sees at 9am may differ from what a real user sees at 9pm, or even at the same moment from a different location.
Anti-bot protection. Retailers use Cloudflare, PerimeterX, and DataDome to detect and block scrapers. IP-based blocking, browser fingerprinting, and CAPTCHA challenges are all in play.
DOM instability. Retailers A/B test their product page layouts constantly. A CSS selector that targets the price today may stop working after the next frontend deploy.
KnowledgeSDK handles the headless browser execution and anti-bot challenges, returning clean markdown from any product page. Your code only needs to parse the structured output.
Database Schema
Start with a Postgres schema to store products and price history:
CREATE TABLE products (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
url TEXT NOT NULL UNIQUE,
name TEXT,
retailer TEXT,
target_price DECIMAL(10, 2), -- alert threshold
check_interval_hours INTEGER DEFAULT 24,
active BOOLEAN DEFAULT true,
last_checked_at TIMESTAMPTZ,
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE price_history (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
product_id UUID REFERENCES products(id) ON DELETE CASCADE,
price DECIMAL(10, 2) NOT NULL,
currency CHAR(3) DEFAULT 'USD',
in_stock BOOLEAN,
raw_price_text TEXT, -- preserve the original scraped text
scraped_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_price_history_product_id ON price_history(product_id);
CREATE INDEX idx_price_history_scraped_at ON price_history(scraped_at DESC);
CREATE TABLE price_alerts (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
product_id UUID REFERENCES products(id) ON DELETE CASCADE,
old_price DECIMAL(10, 2),
new_price DECIMAL(10, 2),
change_pct DECIMAL(5, 2),
alert_type TEXT CHECK (alert_type IN ('price_drop', 'price_increase', 'back_in_stock')),
notified_at TIMESTAMPTZ DEFAULT NOW()
);
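One query you will run constantly against this schema is fetching the latest recorded price for each product. Postgres's DISTINCT ON handles this in a single pass; a sketch against the tables above:

```sql
-- Latest recorded price per product (Postgres-specific DISTINCT ON)
SELECT DISTINCT ON (product_id)
  product_id, price, in_stock, scraped_at
FROM price_history
ORDER BY product_id, scraped_at DESC;
```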
The Price Extraction Pipeline
Node.js
import KnowledgeSDK from '@knowledgesdk/node';
import pg from 'pg';
const client = new KnowledgeSDK({ apiKey: process.env.KNOWLEDGESDK_API_KEY });
const db = new pg.Pool({ connectionString: process.env.DATABASE_URL });
// Extract price from scraped markdown using pattern matching
function parsePriceFromMarkdown(markdown) {
// Common price patterns
const patterns = [
/\$\s*([\d,]+\.?\d{0,2})/, // $29.99 or $1,299
/USD\s*([\d,]+\.?\d{0,2})/, // USD 29.99
/Price:\s*\$?([\d,]+\.?\d{0,2})/i, // Price: $29.99
/([\d,]+\.?\d{0,2})\s*USD/, // 29.99 USD
];
for (const pattern of patterns) {
const match = markdown.match(pattern);
if (match) {
const rawPrice = match[0];
const numericPrice = parseFloat(match[1].replace(/,/g, ''));
if (!isNaN(numericPrice) && numericPrice > 0 && numericPrice < 100000) {
return { price: numericPrice, rawText: rawPrice };
}
}
}
return null;
}
function detectStockStatus(markdown) {
const outOfStockPatterns = [/out of stock/i, /sold out/i, /unavailable/i, /currently unavailable/i];
const inStockPatterns = [/add to cart/i, /buy now/i, /in stock/i, /add to bag/i];
for (const pattern of outOfStockPatterns) {
if (pattern.test(markdown)) return false;
}
for (const pattern of inStockPatterns) {
if (pattern.test(markdown)) return true;
}
return null; // Unknown
}
async function checkProductPrice(product) {
console.log(`Checking price for: ${product.url}`);
let scrapeResult;
try {
scrapeResult = await client.scrape({ url: product.url });
} catch (err) {
console.error(`Scrape failed for ${product.url}:`, err.message);
return null;
}
const priceData = parsePriceFromMarkdown(scrapeResult.markdown);
const inStock = detectStockStatus(scrapeResult.markdown);
if (!priceData) {
console.warn(`Could not parse price from ${product.url}`);
return null;
}
// Store price in history
await db.query(
`INSERT INTO price_history (product_id, price, in_stock, raw_price_text)
VALUES ($1, $2, $3, $4)`,
[product.id, priceData.price, inStock, priceData.rawText]
);
// Update last checked timestamp and product name if available
await db.query(
`UPDATE products SET last_checked_at = NOW(), name = COALESCE($2, name)
WHERE id = $1`,
[product.id, scrapeResult.title ?? null]
);
return { price: priceData.price, inStock };
}
async function detectPriceChanges(product, currentPrice) {
// Get the previous price
const { rows } = await db.query(
`SELECT price, scraped_at FROM price_history
WHERE product_id = $1
ORDER BY scraped_at DESC
LIMIT 2`,
[product.id]
);
if (rows.length < 2) return null; // Not enough history
const previousPrice = parseFloat(rows[1].price);
const changePct = ((currentPrice - previousPrice) / previousPrice) * 100;
// Only alert on meaningful changes (>= 1%)
if (Math.abs(changePct) < 1) return null;
const alertType = changePct < 0 ? 'price_drop' : 'price_increase';
// Check if price dropped below target
const targetAlert = product.target_price && currentPrice <= product.target_price;
await db.query(
`INSERT INTO price_alerts (product_id, old_price, new_price, change_pct, alert_type)
VALUES ($1, $2, $3, $4, $5)`,
[product.id, previousPrice, currentPrice, changePct.toFixed(2), alertType]
);
return {
type: alertType,
oldPrice: previousPrice,
newPrice: currentPrice,
changePct: changePct.toFixed(1),
targetHit: targetAlert,
};
}
Python
import os
import re
import asyncio
from decimal import Decimal
from datetime import datetime
from typing import Optional, Dict, Any
import asyncpg
from knowledgesdk import KnowledgeSDK
client = KnowledgeSDK(api_key=os.environ["KNOWLEDGESDK_API_KEY"])
def parse_price_from_markdown(markdown: str) -> Optional[Dict]:
patterns = [
r'\$\s*([\d,]+\.?\d{0,2})',
r'USD\s*([\d,]+\.?\d{0,2})',
r'Price:\s*\$?([\d,]+\.?\d{0,2})',
r'([\d,]+\.?\d{0,2})\s*USD',
]
for pattern in patterns:
match = re.search(pattern, markdown)
if match:
raw_price = match.group(0)
numeric_str = match.group(1).replace(",", "")
try:
price = Decimal(numeric_str)
if 0 < price < 100000:
return {"price": price, "raw_text": raw_price}
except Exception:
continue
return None
def detect_stock_status(markdown: str) -> Optional[bool]:
out_patterns = [r"out of stock", r"sold out", r"unavailable"]
in_patterns = [r"add to cart", r"buy now", r"in stock", r"add to bag"]
for pattern in out_patterns:
if re.search(pattern, markdown, re.IGNORECASE):
return False
for pattern in in_patterns:
if re.search(pattern, markdown, re.IGNORECASE):
return True
return None
async def check_product_price(db: asyncpg.Pool, product: Dict) -> Optional[Dict]:
print(f"Checking: {product['url']}")
try:
result = await asyncio.to_thread(client.scrape, url=product["url"])  # run the (blocking) scrape off the event loop
except Exception as e:
print(f"Scrape failed for {product['url']}: {e}")
return None
price_data = parse_price_from_markdown(result["markdown"])
in_stock = detect_stock_status(result["markdown"])
if not price_data:
print(f"Could not parse price from {product['url']}")
return None
await db.execute(
"""INSERT INTO price_history (product_id, price, in_stock, raw_price_text)
VALUES ($1, $2, $3, $4)""",
product["id"], price_data["price"], in_stock, price_data["raw_text"]
)
await db.execute(
"UPDATE products SET last_checked_at = NOW() WHERE id = $1",
product["id"]
)
return {"price": price_data["price"], "in_stock": in_stock}
The Monitoring Agent Loop
The agent runs on a schedule, checks due products, detects changes, and fires alerts:
async function runPriceMonitorAgent() {
console.log(`[${new Date().toISOString()}] Price monitor agent starting...`);
// Get products due for checking
const { rows: products } = await db.query(`
SELECT * FROM products
WHERE active = true
AND (
last_checked_at IS NULL
OR last_checked_at < NOW() - INTERVAL '1 hour' * check_interval_hours
)
ORDER BY last_checked_at ASC NULLS FIRST
LIMIT 50
`);
console.log(`Found ${products.length} products to check`);
// Process with controlled concurrency
const BATCH_SIZE = 3;
const alerts = [];
for (let i = 0; i < products.length; i += BATCH_SIZE) {
const batch = products.slice(i, i + BATCH_SIZE);
await Promise.allSettled(
batch.map(async (product) => {
const result = await checkProductPrice(product);
if (!result) return null;
const change = await detectPriceChanges(product, result.price);
if (change) {
alerts.push({ product, change });
}
return result;
})
);
// Respect rate limits between batches
if (i + BATCH_SIZE < products.length) {
await new Promise(resolve => setTimeout(resolve, 2000));
}
}
// Fire webhook alerts
for (const { product, change } of alerts) {
await fireWebhookAlert(product, change);
}
console.log(`Agent complete. ${alerts.length} price changes detected.`);
}
async function fireWebhookAlert(product, change) {
const payload = {
event: 'price.changed',
product: {
id: product.id,
url: product.url,
name: product.name,
},
change: {
type: change.type,
oldPrice: change.oldPrice,
newPrice: change.newPrice,
changePct: change.changePct,
targetHit: change.targetHit,
},
timestamp: new Date().toISOString(),
};
// Fire to your application webhook
if (process.env.ALERT_WEBHOOK_URL) {
await fetch(process.env.ALERT_WEBHOOK_URL, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(payload),
});
}
console.log(`Alert fired: ${change.type} for ${product.url} (${change.oldPrice} → ${change.newPrice})`);
}
// Run on schedule (or via cron job)
setInterval(runPriceMonitorAgent, 60 * 60 * 1000); // Every hour
runPriceMonitorAgent(); // Run immediately on startup
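If the receiving application needs to trust these alerts, sign the payload before sending it. This is an optional sketch; the function name, header convention, and secret handling are our own, not a KnowledgeSDK feature:

```javascript
import crypto from 'node:crypto';

// Sign a webhook payload with HMAC-SHA256 so the receiver can verify
// the alert really came from the monitor. The secret is shared out of band.
function signWebhookPayload(payload, secret) {
  const body = JSON.stringify(payload);
  const signature = crypto
    .createHmac('sha256', secret)
    .update(body)
    .digest('hex');
  return { body, signature };
}
```

Send the signature in a header such as X-Alert-Signature; the receiver recomputes the HMAC over the raw body and compares it with a constant-time check (crypto.timingSafeEqual).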
Handling Amazon and Major Retailer Edge Cases
Large retailers present specific challenges:
Amazon. Prices change dozens of times per day. Amazon aggressively blocks scrapers. Consider using official data sources (Amazon Product Advertising API) for Amazon products and reserving KnowledgeSDK for direct retailer and Shopify sites where official APIs don't exist.
Shopify stores. Most Shopify stores expose a products.json endpoint (e.g., store.com/products/product-handle.json) that returns structured product data including price and variants. Scrape this instead of the product page HTML when possible — it's more reliable and faster.
async function scrapeShopifyProduct(storeUrl, productHandle) {
const apiUrl = `${storeUrl}/products/${productHandle}.json`;
try {
const result = await client.scrape({ url: apiUrl });
const data = JSON.parse(result.markdown); // Shopify returns JSON
const variant = data.product.variants[0];
return {
price: parseFloat(variant.price),
compareAtPrice: variant.compare_at_price ? parseFloat(variant.compare_at_price) : null,
available: variant.available,
title: data.product.title,
};
} catch {
// Fall back to HTML scraping (checkProductPrice expects a full product
// row with an id, so look the record up first in real code)
return checkProductPrice({ url: `${storeUrl}/products/${productHandle}` });
}
}
Production Patterns
Respect robots.txt. Check robots.txt before adding a site to your monitor. If a site explicitly disallows crawlers in its robots.txt, respect that. Focus on sites that allow automated access.
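As a rough illustration of that check, here is a deliberately tiny matcher for Disallow rules under User-agent: *. It ignores Allow rules, wildcards, and per-bot groups, so use a real robots.txt parser in production; the function name is ours:

```javascript
// Minimal robots.txt check: is `path` covered by a Disallow rule in the
// `User-agent: *` group? Ignores Allow, wildcards, and per-bot groups.
function isPathDisallowed(robotsTxt, path) {
  let inWildcardGroup = false;
  for (const raw of robotsTxt.split('\n')) {
    const line = raw.trim();
    if (/^user-agent:/i.test(line)) {
      inWildcardGroup = line.slice(line.indexOf(':') + 1).trim() === '*';
    } else if (inWildcardGroup && /^disallow:/i.test(line)) {
      const rule = line.slice(line.indexOf(':') + 1).trim();
      if (rule && path.startsWith(rule)) return true;
    }
  }
  return false;
}
```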
Rate limiting. Use the check_interval_hours column to spread load. Don't check every product every hour — that's unnecessary and adds load to both your system and the target sites. Most price monitoring applications check once or twice per day.
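The interval condition from the agent's SQL query can also be mirrored in application code, for example when deciding client-side whether to enqueue a product. A sketch (the helper name is ours):

```javascript
// True when a product is due for a check, mirroring the SQL condition
// `last_checked_at < NOW() - INTERVAL '1 hour' * check_interval_hours`.
function isDueForCheck(lastCheckedAt, checkIntervalHours, now = new Date()) {
  if (!lastCheckedAt) return true; // never checked
  const elapsedMs = now.getTime() - new Date(lastCheckedAt).getTime();
  return elapsedMs >= checkIntervalHours * 60 * 60 * 1000;
}
```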
Deduplication. Before inserting a new price record, check if the price actually changed. Store only meaningful data:
async function shouldStorePrice(productId, newPrice) {
const { rows } = await db.query(
`SELECT price FROM price_history WHERE product_id = $1 ORDER BY scraped_at DESC LIMIT 1`,
[productId]
);
if (rows.length === 0) return true; // First check
return Math.abs(parseFloat(rows[0].price) - newPrice) >= 0.01; // Only store if the price moved by at least a cent
}
Error handling and retries. Network issues, temporary anti-bot challenges, and server errors will happen. Implement exponential backoff for retries and track error rates per product.
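A minimal sketch of such a retry wrapper; the names and defaults are illustrative, and it wraps any async operation:

```javascript
// Retry an async operation with exponential backoff plus a little jitter.
// Example: const result = await withRetries(() => client.scrape({ url }));
async function withRetries(fn, { maxAttempts = 4, baseDelayMs = 1000 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts) throw err; // out of attempts, surface the error
      const delayMs = baseDelayMs * 2 ** (attempt - 1) + Math.random() * 250;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```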
FAQ
Is price scraping legal? Scraping publicly visible prices is generally lawful in the US; in hiQ Labs v. LinkedIn (9th Cir. 2022), the court held that scraping publicly accessible data does not violate the Computer Fraud and Abuse Act. Laws vary by jurisdiction and continue to evolve, so always review the site's terms of service before scraping, and only collect data that is publicly accessible without authentication.
How do I handle prices in multiple currencies? Store the raw scraped currency alongside the numeric value. Use an exchange rate API to normalize prices to a base currency for comparison. KnowledgeSDK returns the raw text, so you can detect the currency symbol or code.
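A sketch of that detection step; the mapping is deliberately small, and since a bare $ is ambiguous between USD, CAD, and AUD, explicit ISO codes win:

```javascript
// Detect a currency from raw scraped price text. Explicit ISO codes take
// priority; symbols are a fallback and the map here is intentionally small.
function detectCurrency(rawText) {
  const codeMatch = rawText.match(/\b(USD|EUR|GBP|JPY|CAD|AUD)\b/);
  if (codeMatch) return codeMatch[1];
  const symbols = { '€': 'EUR', '£': 'GBP', '¥': 'JPY', '$': 'USD' };
  for (const [symbol, code] of Object.entries(symbols)) {
    if (rawText.includes(symbol)) return code;
  }
  return null;
}
```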
How accurate is price parsing from markdown? High accuracy for standard price formats. Edge cases include prices shown as ranges ($19.99–$39.99), prices hidden in bundles, and membership prices. Always store raw_price_text so you can audit and refine your parsing logic.
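For the range case, capturing both bounds is straightforward; a sketch (store both and decide per product which bound to track):

```javascript
// Parse a price range like "$19.99–$39.99" (en dash or hyphen) into bounds.
function parsePriceRange(text) {
  const match = text.match(
    /\$\s*([\d,]+\.?\d{0,2})\s*[–-]\s*\$\s*([\d,]+\.?\d{0,2})/
  );
  if (!match) return null;
  return {
    low: parseFloat(match[1].replace(/,/g, '')),
    high: parseFloat(match[2].replace(/,/g, '')),
  };
}
```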
What's a good check interval for price monitoring? For fast-moving prices (electronics, flights): every 4–6 hours. For stable products (books, furniture): once or twice per day. Higher frequency increases load and the chance of detection — match your interval to the price volatility of the product.
Can I monitor prices on sites that require a login? If you have a legitimate account and the terms of service allow it, you can pass session credentials in the request headers. Do not bypass authentication systems you are not authorized to use.
Price monitoring at scale is a solved problem — the tools exist, the patterns are proven, and the data value is clear. Set up your first product tracker in minutes at knowledgesdk.com/setup.