use-case · March 20, 2026 · 11 min read

Price Monitoring with AI Agents: Scraping + Alerting Architecture

Build an AI-powered price monitoring system that tracks competitor pricing in real time and sends intelligent alerts — using web scraping APIs and webhooks.


Price monitoring is one of the highest-ROI applications of web scraping. E-commerce teams track competitor product prices to stay competitive. SaaS companies watch rivals' pricing pages for plan restructuring. Investors monitor commodity and subscription pricing as market signals. The information is public. The challenge is extracting it reliably and acting on it intelligently.

The traditional approach — a cron job that scrapes a URL, diffs the result, and sends an email — works until it doesn't. Pages get restructured. Anti-bot systems get smarter. You end up spending more time maintaining the scraper than using the data. In 2026, there's a better architecture: scraping APIs for reliable extraction, webhooks for change detection, and LLMs for intelligent interpretation of what changed.

This guide walks through that architecture end to end, with working code you can deploy today.

Architecture Overview

Competitor URLs
      │
      ▼
┌─────────────────┐
│  KnowledgeSDK   │  ← Handles JS rendering, anti-bot, extraction
│  Webhook Config │  ← Monitors URLs for changes on your schedule
└────────┬────────┘
         │ Change detected
         ▼
┌─────────────────┐
│  Your Webhook   │  ← Receives diff payload
│  Endpoint       │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  OpenAI / LLM   │  ← Interprets what changed (price, tier, feature)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Alert System   │  ← Slack, email, PagerDuty
└─────────────────┘

The key insight: you don't need to poll. Webhooks invert the model — the system calls you when something changes, rather than you checking repeatedly.

What to Monitor

Not all pricing content is equally valuable to track. Prioritize:

Competitor pricing pages. The canonical source of truth for SaaS pricing. Changes here often signal strategic shifts — new tiers, price increases, repositioning. A competitor moving from per-seat to usage-based pricing is a significant signal.

Product listing pages (e-commerce). Product prices, availability, promotional badges, and shipping costs. Amazon changes prices millions of times per day; even tracking a curated subset of ASINs is valuable.

SaaS feature comparison pages. What's included in each plan matters as much as the price. An LLM can detect when a feature moved from Pro to Enterprise tier.

Landing page CTAs and pricing anchors. "Starting at" prices, "Most Popular" badge positioning, and trial offer terms often change before the full pricing page does.

The Extraction Challenge

Pricing pages are among the hardest pages to scrape reliably. Several factors compound:

JavaScript rendering. Most modern pricing pages load content dynamically. Server-side HTML often shows loading states. You need a headless browser that executes JavaScript before capturing content.

Anti-bot protection. High-value pages (especially e-commerce product pages) are aggressively protected by Cloudflare, DataDome, or Akamai. Naive scrapers get blocked within a few requests.

Dynamic pricing. Some platforms show different prices based on location, account type, or session history. You need consistent extraction conditions to produce comparable data over time.

KnowledgeSDK's extraction layer handles JS rendering and anti-bot bypass natively, returning clean markdown — which is exactly what you want to feed to an LLM for interpretation.

Setting Up Webhooks for Change Detection

KnowledgeSDK's webhook system monitors URLs on a schedule and calls your endpoint when content changes. You configure it once; the platform handles polling, diffing, and notification.

import KnowledgeSDK from '@knowledgesdk/node';

const ks = new KnowledgeSDK({ apiKey: 'knowledgesdk_live_...' });

// Register a URL for change monitoring
const webhook = await ks.webhooks.create({
  url: 'https://your-app.com/webhooks/price-change',
  events: ['content.changed'],
  monitors: [
    { url: 'https://competitor.com/pricing', schedule: 'every_hour' },
    { url: 'https://rival.com/pricing', schedule: 'every_6_hours' },
  ]
});

console.log('Webhook registered:', webhook.id);

When a monitored URL changes, KnowledgeSDK calls your endpoint with a payload containing the previous content, the new content, and a diff summary.
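The webhook handler later in this guide destructures `url`, `previousContent`, and `newContent` from that payload. As a sketch, its shape might look like the interface below; the three fields named above come from the handler, while `diffSummary` is an assumed name for the diff summary field.

```typescript
// Hypothetical shape of the change-notification payload. The first three
// field names match what the webhook handler destructures; the diff
// summary field name is an assumption.
interface ChangePayload {
  url: string;             // the monitored URL that changed
  previousContent: string; // markdown captured on the prior run
  newContent: string;      // markdown captured on this run
  diffSummary?: string;    // short description of what changed
}

const example: ChangePayload = {
  url: 'https://competitor.com/pricing',
  previousContent: '## Pro: $29/mo',
  newContent: '## Pro: $39/mo',
  diffSummary: 'Pro plan price changed',
};
```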

LLM-Powered Price Extraction from Markdown

Raw markdown from a pricing page is verbose. You want structured data: plan names, prices, key features. An LLM extracts this reliably from markdown in a way that regex-based parsers can't.

import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function extractPricingData(markdown: string) {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      {
        role: 'system',
        content: `Extract pricing information from the following webpage content.
Return a JSON object with: plans (array of {name, price_monthly, price_annual, features[], is_popular}),
currency, and any promotional offers mentioned.`
      },
      { role: 'user', content: markdown }
    ],
    response_format: { type: 'json_object' }
  });

  return JSON.parse(response.choices[0].message.content!);
}
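One wrinkle: the LLM returns prices as whatever the page shows ("$29", "1,299.50", "Contact us"). A small normalizer (a hypothetical helper, not part of any SDK) keeps downstream comparisons consistent:

```typescript
// Parse a price field from LLM output into a number, or null for
// non-numeric values like "Contact us" or "Custom".
function normalizePrice(raw: string | number | null): number | null {
  if (raw == null) return null;
  if (typeof raw === 'number') return raw;
  const match = raw.replace(/,/g, '').match(/\d+(\.\d+)?/);
  return match ? parseFloat(match[0]) : null;
}

normalizePrice('$29');        // 29
normalizePrice('1,299.50');   // 1299.5
normalizePrice('Contact us'); // null
```

Running the extracted plans through this before storage means "did the price change?" becomes a numeric comparison rather than a string diff.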

Alert Logic: What Counts as a Significant Change?

Not every content change is worth alerting on. A competitor updating their blog post navigation or footer links isn't actionable. A 20% price increase on their top tier is.

Use the LLM to classify the significance of a change:

async function classifyPriceChange(before: string, after: string) {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      {
        role: 'system',
        content: `You are analyzing changes to a competitor's pricing page.
Classify the change as: CRITICAL (price change, new/removed tier),
SIGNIFICANT (feature moved between tiers, new add-on),
MINOR (copy change, layout update), or NOISE (navigation/footer changes).
Return a JSON object with keys "severity" (one of those four labels) and
"summary" (one sentence describing what changed).`
      },
      {
        role: 'user',
        content: `BEFORE:\n${before}\n\nAFTER:\n${after}`
      }
    ],
    response_format: { type: 'json_object' }
  });

  return JSON.parse(response.choices[0].message.content!);
}

Only page CRITICAL and SIGNIFICANT changes to your on-call channel. Log MINOR changes. Discard NOISE.
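That routing policy can be captured in a small helper (names here are illustrative, not part of any SDK):

```typescript
type Severity = 'CRITICAL' | 'SIGNIFICANT' | 'MINOR' | 'NOISE';
type Action = 'page' | 'log' | 'discard';

// Map an LLM-assigned severity to an alerting action:
// page the on-call channel, log quietly, or drop entirely.
function routeChange(severity: Severity): Action {
  if (severity === 'CRITICAL' || severity === 'SIGNIFICANT') return 'page';
  if (severity === 'MINOR') return 'log';
  return 'discard';
}
```

Keeping the policy in one pure function makes it trivial to adjust thresholds later (for example, paging on MINOR changes for your closest competitor only).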

Full Webhook Handler Example

import { NextRequest, NextResponse } from 'next/server';
import { WebClient } from '@slack/web-api';

const slack = new WebClient(process.env.SLACK_TOKEN);

export async function POST(req: NextRequest) {
  const payload = await req.json();

  // Verify webhook signature
  const signature = req.headers.get('x-knowledgesdk-signature');
  // ... verify signature against your webhook secret

  const { url, previousContent, newContent } = payload;

  // Classify the change with LLM
  const classification = await classifyPriceChange(previousContent, newContent);

  if (['CRITICAL', 'SIGNIFICANT'].includes(classification.severity)) {
    // Extract structured pricing from new content
    const pricing = await extractPricingData(newContent);

    await slack.chat.postMessage({
      channel: '#competitive-intel',
      text: `*Pricing change detected* on ${url}`,
      blocks: [
        {
          type: 'section',
          text: {
            type: 'mrkdwn',
            text: `*${classification.severity}*: ${classification.summary}\n*URL*: ${url}`
          }
        },
        {
          type: 'section',
          text: {
            type: 'mrkdwn',
            text: `*New pricing*:\n${pricing.plans.map((p: any) =>
              `• ${p.name}: $${p.price_monthly}/mo`
            ).join('\n')}`
          }
        }
      ]
    });
  }

  return NextResponse.json({ received: true });
}
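The signature check stubbed out above typically means computing an HMAC over the raw request body and comparing it against the `x-knowledgesdk-signature` header. The exact scheme isn't shown in this guide, so treat the following as a generic sketch (algorithm and encoding are assumptions):

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Compute an HMAC-SHA256 of the raw request body with your webhook secret
// and compare it to the signature header in constant time.
function verifySignature(rawBody: string, signature: string, secret: string): boolean {
  const expected = createHmac('sha256', secret).update(rawBody).digest('hex');
  if (expected.length !== signature.length) return false;
  return timingSafeEqual(Buffer.from(expected), Buffer.from(signature));
}
```

Note that verification must run against the raw body bytes, before JSON parsing; re-serializing the parsed payload can change whitespace and break the comparison.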

Scaling the Architecture

For monitoring dozens of competitors across hundreds of URLs, a few architectural considerations:

Deduplicate alerts. Set a cooldown period (e.g., 24 hours) per URL to avoid alert fatigue when pages change frequently.
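A minimal in-memory cooldown might look like this (illustrative only; a production system would back this with Redis or a database so state survives restarts):

```typescript
const COOLDOWN_MS = 24 * 60 * 60 * 1000; // 24 hours per URL

// Last alert timestamp per monitored URL.
const lastAlertAt = new Map<string, number>();

// Return true if an alert should fire for this URL, recording the
// alert time; return false while the URL is still in cooldown.
function shouldAlert(url: string, now: number = Date.now()): boolean {
  const last = lastAlertAt.get(url);
  if (last !== undefined && now - last < COOLDOWN_MS) return false;
  lastAlertAt.set(url, now);
  return true;
}
```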

Store historical snapshots. Beyond alerting, maintain a time-series database of pricing snapshots. This lets you answer "when did Competitor X last raise prices?" and surface trends.

Use semantic search for trend analysis. Once you've accumulated historical extracted data in KnowledgeSDK's search index, you can query across time: "show me all pricing changes where a competitor added a usage-based tier."

KnowledgeSDK's POST /v1/search endpoint enables this — semantic search over your extracted content corpus, not just keyword matching.

Price monitoring done right is a continuous intelligence feed, not a one-time snapshot. The architecture above gives you a system that scales from 10 competitors to 10,000 monitored URLs, stays reliable in the face of anti-bot evolution, and surfaces only the signals that actually matter.

Try it now

Scrape, search, and monitor any website with one API.

Get your API key in 30 seconds. First 1,000 requests free.

GET API KEY →
