Automated Competitive Intelligence: Build a Scraper That Never Sleeps
Manual competitive monitoring is a losing strategy. By the time a human analyst notices a competitor changed their pricing, the competitor has already run an A/B test, iterated on positioning, and moved on to the next experiment. Companies that win at competitive intelligence do it with automation — systems that watch competitors continuously and surface changes the moment they happen.
Web scraping is the core technology behind this. Your competitors publish their strategy on their own websites: pricing pages, product descriptions, job listings, press releases, case studies. They can't hide what they're publicly announcing. The question is whether you're reading it manually once a month or automatically within minutes of publication.
This guide walks through building a real competitive intelligence pipeline — one that crawls competitor sites, detects changes, and delivers alerts before your team would have noticed anything manually.
What Makes Competitive Intelligence Worth Automating
Before building, it's worth being specific about what you're monitoring and why. The highest-value signals from competitor websites are:
Pricing changes: Competitors rarely announce price changes via press release. They just update the pricing page. Automated monitoring catches this within hours. If a competitor drops prices, you need to know before your sales team loses a deal to a price objection they didn't see coming.
Product and feature updates: New feature launches often appear on product pages, changelog pages, or in updated documentation before any official announcement. Early detection gives you time to prepare positioning responses.
Job listings: Hiring patterns reveal strategic intent. A competitor suddenly posting 10 ML engineer roles signals investment in AI. A sudden wave of sales hires in a new geography signals expansion. Job boards update daily.
Press releases and news: Corporate newsrooms are a goldmine. Partnerships, customer wins, funding announcements — all of this appears on competitor domains before it's picked up by media.
Case studies and testimonials: New customer case studies reveal which verticals competitors are winning in, what problems they're solving, and what ROI claims they're making.
System Architecture
A production competitive intelligence system has four components:
- Discovery: Map all relevant URLs on each competitor's site
- Extraction: Pull structured content from each page
- Change detection: Compare current content against stored snapshots
- Alerting: Deliver changes to the right people in the right format
Scheduler (cron/Inngest)
→ KnowledgeSDK /v1/sitemap (discover URLs)
→ KnowledgeSDK /v1/extract (extract content)
→ Diff engine (compare against stored snapshots)
→ Webhook handler (receive change notifications)
→ Alert delivery (Slack, email, PagerDuty)
KnowledgeSDK's webhook system handles the change detection layer — you register a URL and get called back when content changes, rather than polling on a schedule yourself.
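If you run part of the change detection yourself (as the "Diff engine" box in the diagram suggests), it can start as something very simple: a normalized content hash compared against the previous snapshot. This is a minimal sketch, not KnowledgeSDK's internal mechanism; the whitespace normalization is an assumption to avoid alerting on trivial reformatting.

```typescript
import { createHash } from 'node:crypto';

// Fingerprint page content so snapshots can be compared cheaply.
// Whitespace is collapsed so markup reflows don't trigger false alerts.
function contentFingerprint(content: string): string {
  const normalized = content.replace(/\s+/g, ' ').trim();
  return createHash('sha256').update(normalized).digest('hex');
}

// A page "changed" when its fingerprint differs from the stored snapshot's.
function hasChanged(previousContent: string, currentContent: string): boolean {
  return contentFingerprint(previousContent) !== contentFingerprint(currentContent);
}
```

Hash comparison only tells you *that* something changed, not *what* — which is why the alerting step below still diffs the full before/after text.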
Step 1: Discover Competitor URLs
Start by mapping the structure of each competitor's site. KnowledgeSDK's sitemap endpoint returns all discoverable URLs:
import KnowledgeSDK from '@knowledgesdk/node';
const ks = new KnowledgeSDK({ apiKey: process.env.KNOWLEDGESDK_API_KEY! });
async function discoverCompetitorPages(domain: string) {
const sitemap = await ks.sitemap({ url: `https://${domain}` });
// Filter to high-value page types
const priorityPages = sitemap.urls.filter(url => {
const path = new URL(url).pathname.toLowerCase();
return (
path.includes('/pricing') ||
path.includes('/product') ||
path.includes('/features') ||
path.includes('/customers') ||
path.includes('/case-studies') ||
path.includes('/news') ||
path.includes('/blog')
);
});
return priorityPages;
}
// Discover pages for multiple competitors
const competitors = ['competitor-a.com', 'competitor-b.com', 'competitor-c.com'];
for (const domain of competitors) {
const pages = await discoverCompetitorPages(domain);
console.log(`Found ${pages.length} priority pages on ${domain}`);
await storeUrlsForMonitoring(domain, pages);
}
Step 2: Register Webhooks for Change Detection
Instead of polling competitor pages on a schedule and diffing yourself, register KnowledgeSDK webhooks to be notified when content changes:
// Register a webhook for a competitor's pricing page
async function watchPricingPage(competitorUrl: string, callbackUrl: string) {
const webhook = await ks.webhooks.create({
url: competitorUrl,
callbackUrl,
events: ['content.changed'],
checkInterval: 'hourly', // check every hour
});
console.log(`Watching ${competitorUrl} — webhook ID: ${webhook.id}`);
return webhook;
}
// Watch all discovered priority pages
const priorityPages = await discoverCompetitorPages('competitor.com');
for (const pageUrl of priorityPages) {
await watchPricingPage(pageUrl, 'https://yourapp.com/webhooks/competitor-change');
}
Step 3: Handle Change Notifications
When KnowledgeSDK detects a content change, it calls your webhook with the before and after content:
// Next.js (App Router) route handler — adapt the signature for Express if needed
export async function POST(req: Request) {
const payload = await req.json();
const { url, event, content, previousContent, changedAt } = payload;
if (event !== 'content.changed') return new Response('ok');
// Generate a diff summary using an LLM
const summary = await generateChangeSummary({
url,
before: previousContent,
after: content,
});
// Route to appropriate alert channel based on page type
const path = new URL(url).pathname;
if (path.includes('/pricing')) {
await sendSlackAlert({
channel: '#competitive-intel',
message: `Pricing change detected at ${url}`,
summary,
urgency: 'high',
});
} else if (path.includes('/blog') || path.includes('/news')) {
await sendSlackAlert({
channel: '#market-intel',
message: `New content published at ${url}`,
summary,
urgency: 'normal',
});
}
// Store snapshot for trend analysis
await storeSnapshot({ url, content, changedAt });
return new Response('ok');
}
import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
async function generateChangeSummary({ url, before, after }: {
url: string;
before: string;
after: string;
}) {
// Use your LLM to summarize what changed
const response = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: [{
role: 'user',
content: `Summarize the key changes between these two versions of ${url}. Focus on pricing, feature changes, and strategic positioning.
BEFORE:
${before.slice(0, 2000)}
AFTER:
${after.slice(0, 2000)}`,
}],
});
return response.choices[0].message.content;
}
Step 4: Build a Search Layer for Historical Analysis
Beyond alerting on individual changes, storing extracted content in a searchable index lets you answer strategic questions across your competitive dataset:
// After extracting competitor content, index it for search
async function indexCompetitorContent(domain: string) {
const result = await ks.extract({
url: `https://${domain}`,
crawlSubpages: true,
});
// Content is automatically indexed for semantic search
return result;
}
// Answer strategic questions across all competitor data
const results = await ks.search({
query: 'What security certifications do our competitors claim?',
limit: 10,
});
// Or: 'What are competitors charging for enterprise plans?'
// Or: 'Which competitors are targeting healthcare customers?'
What to Monitor and How Often
| Page Type | Check Frequency | Alert Priority | What to Look For |
|---|---|---|---|
| Pricing pages | Every hour | Critical | Price changes, new plans, removed tiers |
| Product/feature pages | Every 4 hours | High | New features, changed descriptions |
| Homepage | Every 4 hours | Medium | Positioning changes, new messaging |
| Job listings | Daily | Medium | Hiring signals, new role types |
| Blog / news | Daily | Low | Strategic announcements, thought leadership |
| Case studies | Weekly | Low | New customer verticals, ROI claims |
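One way to apply the table when registering webhooks is a small policy function that maps a page's path to its cadence and alert priority. The interval strings and path patterns here are assumptions — align them with whatever values your webhook API and competitors' URL structures actually use.

```typescript
type CheckInterval = 'hourly' | 'every-4-hours' | 'daily' | 'weekly';
type AlertPriority = 'critical' | 'high' | 'medium' | 'low';

// Derive check frequency and alert priority from the page path,
// mirroring the monitoring table above.
function monitoringPolicy(pageUrl: string): { interval: CheckInterval; priority: AlertPriority } {
  const path = new URL(pageUrl).pathname.toLowerCase();
  if (path.includes('/pricing')) return { interval: 'hourly', priority: 'critical' };
  if (path.includes('/product') || path.includes('/features'))
    return { interval: 'every-4-hours', priority: 'high' };
  if (path.includes('/careers') || path.includes('/jobs'))
    return { interval: 'daily', priority: 'medium' };
  if (path.includes('/blog') || path.includes('/news'))
    return { interval: 'daily', priority: 'low' };
  if (path.includes('/case-studies') || path.includes('/customers'))
    return { interval: 'weekly', priority: 'low' };
  // Homepage and anything uncategorized: moderate cadence
  return { interval: 'every-4-hours', priority: 'medium' };
}
```

This keeps cadence decisions in one place, so tuning check frequency (and its cost) doesn't require touching the registration loop.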
Integrating Alerts with Slack
A Slack integration turns raw data into actionable intelligence for your team:
async function sendSlackAlert({
channel,
message,
summary,
urgency,
}: {
channel: string;
message: string;
summary: string;
urgency: 'high' | 'normal';
}) {
// NOTE: a Slack incoming webhook URL is bound to one channel at creation;
// to route by channel, keep a webhook URL per channel and pick the right one here.
await fetch('https://hooks.slack.com/services/YOUR/WEBHOOK/URL', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
channel,
blocks: [
{
type: 'header',
text: {
type: 'plain_text',
text: urgency === 'high' ? `🚨 ${message}` : `📊 ${message}`,
},
},
{
type: 'section',
text: { type: 'mrkdwn', text: summary },
},
],
}),
});
}
Competitive Intelligence vs. Data Theft
A note on ethics and legality: automated competitive monitoring of publicly accessible websites is a long-established business practice and is generally legal. You're reading what your competitors chose to make public.
The lines that matter:
- Monitor public pages, not authenticated or gated content
- Respect robots.txt: don't crawl paths that are explicitly disallowed
- Don't reverse-engineer APIs or circumvent technical access controls
- Don't republish scraped content verbatim; use it for internal analysis
KnowledgeSDK's extraction follows responsible crawling practices by default. The goal is competitive awareness, not copyright infringement.
The ROI Calculation
A team doing manual competitive monitoring spends 2-4 hours per week per analyst. For a team of 3 analysts, that's 6-12 hours per week — and they're still missing changes between review cycles.
Automated monitoring with KnowledgeSDK costs $29/month on the Starter plan and runs continuously. The first time it catches a competitor pricing change before your sales team loses a deal, it has paid for years of the subscription.
The scraper that never sleeps doesn't just save time — it catches things that manual monitoring would never catch at all.