Use case · March 19, 2026 · 13 min read

Build an E-Commerce Price Monitoring Agent (2026)

Build a production-grade e-commerce price monitoring agent: scrape JS-rendered prices, store history in Postgres, trigger webhooks on price drops.

Price intelligence is one of the oldest use cases for web scraping, but it has never been harder to get right. Modern e-commerce sites — Shopify stores, Amazon, direct-to-consumer brands — render prices with JavaScript, hide them behind A/B tests, and protect their pages with sophisticated bot detection. A naive scraper that worked in 2022 breaks instantly today.

This guide covers how to build a production-ready price monitoring agent in 2026: one that handles JS-rendered prices, stores historical data in Postgres, fires webhooks on price drops, and respects the websites it scrapes.

What We're Building

By the end of this tutorial you'll have:

  • A price extraction pipeline using KnowledgeSDK's scrape API
  • A Postgres schema for storing price history
  • A comparison engine that detects meaningful price changes
  • A webhook system that alerts your application when prices drop
  • Rate limiting and deduplication to keep the scraper sustainable

Why E-Commerce Prices Are Hard to Scrape

JavaScript rendering. Most product pages today load prices asynchronously. The initial HTML contains a placeholder; the actual price is injected by React or Vue after an API call completes. A curl request returns $-- instead of $29.99.

Dynamic pricing. Prices change based on your location, your login state, your browsing history, and the time of day. What you scrape at 9am may differ from the actual price a user sees at 9pm.

Anti-bot protection. Retailers use Cloudflare, PerimeterX, and DataDome to detect and block scrapers. IP-based blocking, browser fingerprinting, and CAPTCHA challenges are all in play.

DOM instability. Retailers A/B test their product page layouts constantly. A CSS selector that targets the price today may stop working after the next frontend deploy.

KnowledgeSDK handles the headless browser execution and anti-bot challenges, returning clean markdown from any product page. Your code only needs to parse the structured output.

Database Schema

Start with a Postgres schema to store products and price history:

CREATE TABLE products (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  url TEXT NOT NULL UNIQUE,
  name TEXT,
  retailer TEXT,
  target_price DECIMAL(10, 2), -- alert threshold
  check_interval_hours INTEGER DEFAULT 24,
  active BOOLEAN DEFAULT true,
  last_checked_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE price_history (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  product_id UUID REFERENCES products(id) ON DELETE CASCADE,
  price DECIMAL(10, 2) NOT NULL,
  currency CHAR(3) DEFAULT 'USD',
  in_stock BOOLEAN,
  raw_price_text TEXT, -- preserve the original scraped text
  scraped_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_price_history_product_id ON price_history(product_id);
CREATE INDEX idx_price_history_scraped_at ON price_history(scraped_at DESC);

CREATE TABLE price_alerts (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  product_id UUID REFERENCES products(id) ON DELETE CASCADE,
  old_price DECIMAL(10, 2),
  new_price DECIMAL(10, 2),
  change_pct DECIMAL(5, 2),
  alert_type TEXT CHECK (alert_type IN ('price_drop', 'price_increase', 'back_in_stock')),
  notified_at TIMESTAMPTZ DEFAULT NOW()
);
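With this schema in place, the most common read is "latest price per product." A sketch using Postgres's DISTINCT ON, assuming the pg pool (db) set up in the next section:

```javascript
// Fetch the most recent recorded price for every active product.
// DISTINCT ON (product_id) keeps only the first row per product under
// the ORDER BY — i.e. the newest price_history entry for each product.
async function latestPrices(db) {
  const { rows } = await db.query(`
    SELECT DISTINCT ON (ph.product_id)
           ph.product_id, p.name, ph.price, ph.in_stock, ph.scraped_at
    FROM price_history ph
    JOIN products p ON p.id = ph.product_id
    WHERE p.active = true
    ORDER BY ph.product_id, ph.scraped_at DESC
  `);
  return rows;
}
```

DISTINCT ON is Postgres-specific; on other databases the same query needs a window function (ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY scraped_at DESC)).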

The Price Extraction Pipeline

Node.js

import KnowledgeSDK from '@knowledgesdk/node';
import pg from 'pg';

const client = new KnowledgeSDK({ apiKey: process.env.KNOWLEDGESDK_API_KEY });
const db = new pg.Pool({ connectionString: process.env.DATABASE_URL });

// Extract price from scraped markdown using pattern matching
function parsePriceFromMarkdown(markdown, url) {
  // Common price patterns
  const patterns = [
    /\$\s*([\d,]+\.?\d{0,2})/,          // $29.99 or $1,299
    /USD\s*([\d,]+\.?\d{0,2})/,          // USD 29.99
    /Price:\s*\$?([\d,]+\.?\d{0,2})/i,   // Price: $29.99
    /([\d,]+\.?\d{0,2})\s*USD/,          // 29.99 USD
  ];

  for (const pattern of patterns) {
    const match = markdown.match(pattern);
    if (match) {
      const rawPrice = match[0];
      const numericPrice = parseFloat(match[1].replace(/,/g, ''));
      if (!isNaN(numericPrice) && numericPrice > 0 && numericPrice < 100000) {
        return { price: numericPrice, rawText: rawPrice };
      }
    }
  }

  return null;
}

function detectStockStatus(markdown) {
  const outOfStockPatterns = [/out of stock/i, /sold out/i, /unavailable/i, /currently unavailable/i];
  const inStockPatterns = [/add to cart/i, /buy now/i, /in stock/i, /add to bag/i];

  for (const pattern of outOfStockPatterns) {
    if (pattern.test(markdown)) return false;
  }

  for (const pattern of inStockPatterns) {
    if (pattern.test(markdown)) return true;
  }

  return null; // Unknown
}

async function checkProductPrice(product) {
  console.log(`Checking price for: ${product.url}`);

  let scrapeResult;
  try {
    scrapeResult = await client.scrape({ url: product.url });
  } catch (err) {
    console.error(`Scrape failed for ${product.url}:`, err.message);
    return null;
  }

  const priceData = parsePriceFromMarkdown(scrapeResult.markdown, product.url);
  const inStock = detectStockStatus(scrapeResult.markdown);

  if (!priceData) {
    console.warn(`Could not parse price from ${product.url}`);
    return null;
  }

  // Store price in history
  await db.query(
    `INSERT INTO price_history (product_id, price, in_stock, raw_price_text)
     VALUES ($1, $2, $3, $4)`,
    [product.id, priceData.price, inStock, priceData.rawText]
  );

  // Update last checked timestamp and product name if available
  await db.query(
    `UPDATE products SET last_checked_at = NOW(), name = COALESCE($2, name)
     WHERE id = $1`,
    [product.id, scrapeResult.title ?? null]
  );

  return { price: priceData.price, inStock };
}

async function detectPriceChanges(product, currentPrice) {
  // Get the previous price
  const { rows } = await db.query(
    `SELECT price, scraped_at FROM price_history
     WHERE product_id = $1
     ORDER BY scraped_at DESC
     LIMIT 2`,
    [product.id]
  );

  if (rows.length < 2) return null; // Not enough history

  const previousPrice = parseFloat(rows[1].price);
  const changePct = ((currentPrice - previousPrice) / previousPrice) * 100;

  // Only alert on meaningful changes (>= 1%)
  if (Math.abs(changePct) < 1) return null;

  const alertType = changePct < 0 ? 'price_drop' : 'price_increase';

  // Check if price dropped below target
  const targetAlert = product.target_price && currentPrice <= product.target_price;

  await db.query(
    `INSERT INTO price_alerts (product_id, old_price, new_price, change_pct, alert_type)
     VALUES ($1, $2, $3, $4, $5)`,
    [product.id, previousPrice, currentPrice, changePct.toFixed(2), alertType]
  );

  return {
    type: alertType,
    oldPrice: previousPrice,
    newPrice: currentPrice,
    changePct: changePct.toFixed(1),
    targetHit: targetAlert,
  };
}

Python

import os
import re
import asyncio
from decimal import Decimal
from typing import Optional, Dict
import asyncpg
from knowledgesdk import KnowledgeSDK

client = KnowledgeSDK(api_key=os.environ["KNOWLEDGESDK_API_KEY"])

def parse_price_from_markdown(markdown: str) -> Optional[Dict]:
    patterns = [
        r'\$\s*([\d,]+\.?\d{0,2})',
        r'USD\s*([\d,]+\.?\d{0,2})',
        r'Price:\s*\$?([\d,]+\.?\d{0,2})',
        r'([\d,]+\.?\d{0,2})\s*USD',
    ]

    for pattern in patterns:
        match = re.search(pattern, markdown)
        if match:
            raw_price = match.group(0)
            numeric_str = match.group(1).replace(",", "")
            try:
                price = Decimal(numeric_str)
                if 0 < price < 100000:
                    return {"price": price, "raw_text": raw_price}
            except Exception:
                continue

    return None

def detect_stock_status(markdown: str) -> Optional[bool]:
    out_patterns = [r"out of stock", r"sold out", r"unavailable"]
    in_patterns = [r"add to cart", r"buy now", r"in stock", r"add to bag"]

    for pattern in out_patterns:
        if re.search(pattern, markdown, re.IGNORECASE):
            return False
    for pattern in in_patterns:
        if re.search(pattern, markdown, re.IGNORECASE):
            return True
    return None

async def check_product_price(db: asyncpg.Pool, product: Dict) -> Optional[Dict]:
    print(f"Checking: {product['url']}")

    try:
        # Run the blocking SDK call off the event loop
        result = await asyncio.to_thread(client.scrape, url=product["url"])
    except Exception as e:
        print(f"Scrape failed for {product['url']}: {e}")
        return None

    price_data = parse_price_from_markdown(result["markdown"])
    in_stock = detect_stock_status(result["markdown"])

    if not price_data:
        print(f"Could not parse price from {product['url']}")
        return None

    await db.execute(
        """INSERT INTO price_history (product_id, price, in_stock, raw_price_text)
           VALUES ($1, $2, $3, $4)""",
        product["id"], price_data["price"], in_stock, price_data["raw_text"]
    )

    await db.execute(
        "UPDATE products SET last_checked_at = NOW() WHERE id = $1",
        product["id"]
    )

    return {"price": price_data["price"], "in_stock": in_stock}

The Monitoring Agent Loop

The agent runs on a schedule, checks due products, detects changes, and fires alerts:

async function runPriceMonitorAgent() {
  console.log(`[${new Date().toISOString()}] Price monitor agent starting...`);

  // Get products due for checking
  const { rows: products } = await db.query(`
    SELECT * FROM products
    WHERE active = true
    AND (
      last_checked_at IS NULL
      OR last_checked_at < NOW() - INTERVAL '1 hour' * check_interval_hours
    )
    ORDER BY last_checked_at ASC NULLS FIRST
    LIMIT 50
  `);

  console.log(`Found ${products.length} products to check`);

  // Process with controlled concurrency
  const BATCH_SIZE = 3;
  const alerts = [];

  for (let i = 0; i < products.length; i += BATCH_SIZE) {
    const batch = products.slice(i, i + BATCH_SIZE);

    await Promise.allSettled(
      batch.map(async (product) => {
        const result = await checkProductPrice(product);
        if (!result) return null;

        const change = await detectPriceChanges(product, result.price);
        if (change) {
          alerts.push({ product, change });
        }
        return result;
      })
    );

    // Respect rate limits between batches
    if (i + BATCH_SIZE < products.length) {
      await new Promise(resolve => setTimeout(resolve, 2000));
    }
  }

  // Fire webhook alerts
  for (const { product, change } of alerts) {
    await fireWebhookAlert(product, change);
  }

  console.log(`Agent complete. ${alerts.length} price changes detected.`);
}

async function fireWebhookAlert(product, change) {
  const payload = {
    event: 'price.changed',
    product: {
      id: product.id,
      url: product.url,
      name: product.name,
    },
    change: {
      type: change.type,
      oldPrice: change.oldPrice,
      newPrice: change.newPrice,
      changePct: change.changePct,
      targetHit: change.targetHit,
    },
    timestamp: new Date().toISOString(),
  };

  // Fire to your application webhook
  if (process.env.ALERT_WEBHOOK_URL) {
    await fetch(process.env.ALERT_WEBHOOK_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
    });
  }

  console.log(`Alert fired: ${change.type} for ${product.url} (${change.oldPrice} → ${change.newPrice})`);
}

// Run on schedule (or via cron job)
setInterval(runPriceMonitorAgent, 60 * 60 * 1000); // Every hour
runPriceMonitorAgent(); // Run immediately on startup

Handling Amazon and Major Retailer Edge Cases

Large retailers present specific challenges:

Amazon. Prices change dozens of times per day. Amazon aggressively blocks scrapers. Consider using official data sources (Amazon Product Advertising API) for Amazon products and reserving KnowledgeSDK for direct retailer and Shopify sites where official APIs don't exist.

Shopify stores. Most Shopify stores expose a products.json endpoint (e.g., store.com/products/product-handle.json) that returns structured product data including price and variants. Scrape this instead of the product page HTML when possible — it's more reliable and faster.

async function scrapeShopifyProduct(storeUrl, productHandle) {
  const apiUrl = `${storeUrl}/products/${productHandle}.json`;

  try {
    const result = await client.scrape({ url: apiUrl });
    const data = JSON.parse(result.markdown); // Shopify returns JSON
    const variant = data.product.variants[0];

    return {
      price: parseFloat(variant.price),
      compareAtPrice: variant.compare_at_price ? parseFloat(variant.compare_at_price) : null,
      available: variant.available,
      title: data.product.title,
    };
  } catch {
    // Fall back to HTML scraping. Note: checkProductPrice writes history
    // keyed by product.id, so in practice pass the full product row here.
    return checkProductPrice({ url: `${storeUrl}/products/${productHandle}` });
  }
}

Production Patterns

Respect robots.txt. Check robots.txt before adding a site to your monitor. If a site explicitly disallows crawlers in its robots.txt, respect that. Focus on sites that allow automated access.
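A minimal pre-check fetches ${origin}/robots.txt and scans the User-agent: * group. This is a deliberately simplified sketch — a production monitor should use a full parser (e.g. the robots-parser npm package), since real files also have wildcards, Allow rules, and crawl delays:

```javascript
// Returns true if a "Disallow" rule in the "User-agent: *" group of a
// robots.txt body covers the given path. Simplified: no wildcard or
// Allow-rule handling.
function isDisallowed(robotsTxt, path) {
  let appliesToAll = false;
  for (const rawLine of robotsTxt.split("\n")) {
    const line = rawLine.split("#")[0].trim(); // strip comments
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (field === "user-agent") {
      appliesToAll = value === "*";
    } else if (appliesToAll && field === "disallow") {
      if (value && path.startsWith(value)) return true;
    }
  }
  return false;
}
```

Fetch the file once when a site is added to your monitor, and skip (or flag) product URLs whose paths are disallowed.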

Rate limiting. Use the check_interval_hours column to spread load. Don't check every product every hour — that's unnecessary and adds load to both your system and the target sites. Most price monitoring applications check once or twice per day.
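One way to keep a large product list from coming due all at once is to add jitter when computing each product's next check time. A small sketch (nextCheckTime is a hypothetical helper, not part of the schema above):

```javascript
// Compute the next check timestamp (ms since epoch) for a product,
// spreading load by adding ±10% random jitter to its interval.
function nextCheckTime(lastCheckedMs, intervalHours, jitterFraction = 0.1) {
  const intervalMs = intervalHours * 60 * 60 * 1000;
  // Jitter uniformly distributed in [-jitterFraction, +jitterFraction]
  const jitter = (Math.random() * 2 - 1) * jitterFraction * intervalMs;
  return lastCheckedMs + intervalMs + jitter;
}
```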

Deduplication. Before inserting a new price record, check if the price actually changed. Store only meaningful data:

async function shouldStorePrice(productId, newPrice) {
  const { rows } = await db.query(
    `SELECT price FROM price_history WHERE product_id = $1 ORDER BY scraped_at DESC LIMIT 1`,
    [productId]
  );

  if (rows.length === 0) return true; // First check
  return Math.abs(parseFloat(rows[0].price) - newPrice) > 0.01; // Only store if changed
}

Error handling and retries. Network issues, temporary anti-bot challenges, and server errors will happen. Implement exponential backoff for retries and track error rates per product.
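A generic retry wrapper with exponential backoff might look like this (withRetries is a hypothetical helper you would wrap around client.scrape calls):

```javascript
// Retry an async operation with exponential backoff plus a little jitter:
// waits ~1s, 2s, 4s... between attempts, then rethrows the last error.
async function withRetries(fn, { maxAttempts = 4, baseDelayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === maxAttempts - 1) break; // out of attempts
      const delayMs = baseDelayMs * 2 ** attempt + Math.random() * 250;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

Usage inside checkProductPrice: `scrapeResult = await withRetries(() => client.scrape({ url: product.url }));`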

FAQ

Is price scraping legal? Scraping publicly visible prices is generally legal in most jurisdictions and has been affirmed by courts in the US (hiQ Labs v. LinkedIn, 2022). Always review the site's terms of service before scraping, and only collect data that is publicly accessible without authentication.

How do I handle prices in multiple currencies? Store the raw scraped currency alongside the numeric value. Use an exchange rate API to normalize prices to a base currency for comparison. KnowledgeSDK returns the raw text, so you can detect the currency symbol or code.
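A simple starting point is to sniff the symbol or ISO code from raw_price_text (detectCurrency is a hypothetical helper — extend the checks for the currencies you track):

```javascript
// Guess the currency from raw scraped price text. "$" is checked last
// because it appears inside compound symbols like CA$, AU$, US$.
function detectCurrency(rawText) {
  if (/€|EUR/.test(rawText)) return "EUR";
  if (/£|GBP/.test(rawText)) return "GBP";
  if (/¥|JPY/.test(rawText)) return "JPY";
  if (/\$|USD/.test(rawText)) return "USD";
  return null; // unknown — keep the raw text for manual review
}
```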

How accurate is price parsing from markdown? High accuracy for standard price formats. Edge cases include prices shown as ranges ($19.99–$39.99), prices hidden in bundles, and membership prices. Always store raw_price_text so you can audit and refine your parsing logic.
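For range-priced listings, one approach is to capture every dollar amount and keep both bounds — a sketch, on the assumption that the low value is usually the advertised "from" price:

```javascript
// Extract all dollar amounts from text and return low/high bounds,
// e.g. "From $19.99 – $39.99" → { low: 19.99, high: 39.99 }.
function parsePriceRange(text) {
  const amounts = [...text.matchAll(/\$\s*([\d,]+\.?\d{0,2})/g)]
    .map((m) => parseFloat(m[1].replace(/,/g, "")))
    .filter((n) => !Number.isNaN(n) && n > 0);
  if (amounts.length === 0) return null;
  return { low: Math.min(...amounts), high: Math.max(...amounts) };
}
```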

What's a good check interval for price monitoring? For fast-moving prices (electronics, flights): every 4–6 hours. For stable products (books, furniture): once or twice per day. Higher frequency increases load and the chance of detection — match your interval to the price volatility of the product.

Can I monitor prices on sites that require a login? If you have a legitimate account and the terms of service allow it, you can pass session credentials in the request headers. Do not bypass authentication systems you are not authorized to use.


Price monitoring at scale is a solved problem — the tools exist, the patterns are proven, and the data value is clear. Set up your first product tracker in minutes at knowledgesdk.com/setup.

Try it now

Scrape, search, and monitor any website with one API.

Get your API key in 30 seconds. First 1,000 requests free.
