
Building a Competitive Intelligence Agent: Monitoring Competitor Websites for Changes

Build an AI-powered competitive intelligence agent that monitors competitor websites, detects meaningful changes, diffs content against prior snapshots, classifies significance with an LLM, and alerts stakeholders.

Why Automated Competitive Intelligence

Manually checking competitor websites for pricing changes, new product launches, messaging shifts, and feature updates is tedious and unreliable. By the time someone notices a competitor dropped their price by 20%, you have already lost deals. An AI-powered competitive intelligence agent monitors target websites continuously, detects meaningful changes, classifies their significance, and alerts the right people immediately.

The key challenge is not scraping — it is separating signal from noise. Websites change constantly due to personalization, A/B tests, rotating banners, and footer updates. A good competitive intelligence agent understands which changes matter and which are irrelevant.

Data Model and Storage

Start with a data model that tracks monitored pages, content snapshots, detected changes, and alert rules.

from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional
import sqlite3
import hashlib

@dataclass
class MonitoredPage:
    id: str
    competitor: str
    url: str
    page_type: str  # pricing, features, blog, careers
    check_interval_minutes: int = 60
    last_checked: Optional[datetime] = None
    content_hash: Optional[str] = None

@dataclass
class ContentSnapshot:
    page_id: str
    content: str
    content_hash: str
    captured_at: datetime
    metadata: dict = field(default_factory=dict)

@dataclass
class DetectedChange:
    page_id: str
    change_type: str  # pricing, feature, messaging, structural
    severity: str     # high, medium, low
    summary: str
    old_content: str
    new_content: str
    detected_at: datetime
    notified: bool = False

class IntelDatabase:
    def __init__(self, db_path: str = "competitive_intel.db"):
        self.conn = sqlite3.connect(db_path)
        self._init_tables()

    def _init_tables(self):
        self.conn.executescript("""
            CREATE TABLE IF NOT EXISTS monitored_pages (
                id TEXT PRIMARY KEY,
                competitor TEXT NOT NULL,
                url TEXT NOT NULL,
                page_type TEXT NOT NULL,
                check_interval_minutes INTEGER DEFAULT 60,
                last_checked TEXT,
                content_hash TEXT
            );
            CREATE TABLE IF NOT EXISTS snapshots (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                page_id TEXT NOT NULL,
                content TEXT NOT NULL,
                content_hash TEXT NOT NULL,
                captured_at TEXT NOT NULL,
                FOREIGN KEY (page_id) REFERENCES monitored_pages(id)
            );
            CREATE TABLE IF NOT EXISTS changes (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                page_id TEXT NOT NULL,
                change_type TEXT NOT NULL,
                severity TEXT NOT NULL,
                summary TEXT NOT NULL,
                old_content TEXT,
                new_content TEXT,
                detected_at TEXT NOT NULL,
                notified INTEGER DEFAULT 0,
                FOREIGN KEY (page_id) REFERENCES monitored_pages(id)
            );
            CREATE INDEX IF NOT EXISTS idx_changes_detected
                ON changes(detected_at);
            CREATE INDEX IF NOT EXISTS idx_snapshots_page
                ON snapshots(page_id, captured_at);
        """)

    def save_snapshot(self, snapshot: ContentSnapshot):
        self.conn.execute(
            "INSERT INTO snapshots "
            "(page_id, content, content_hash, captured_at) "
            "VALUES (?, ?, ?, ?)",
            (snapshot.page_id, snapshot.content,
             snapshot.content_hash,
             snapshot.captured_at.isoformat()),
        )
        self.conn.execute(
            "UPDATE monitored_pages SET content_hash = ?, "
            "last_checked = ? WHERE id = ?",
            (snapshot.content_hash,
             snapshot.captured_at.isoformat(),
             snapshot.page_id),
        )
        self.conn.commit()

    def get_previous_snapshot(self, page_id: str):
        row = self.conn.execute(
            "SELECT content, content_hash FROM snapshots "
            "WHERE page_id = ? ORDER BY captured_at DESC LIMIT 1",
            (page_id,),
        ).fetchone()
        return row
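The snapshot round-trip can be exercised in isolation against an in-memory database. This is a reduced sketch of the `snapshots` table only; the page id and plan prices are made up for illustration:

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE snapshots (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        page_id TEXT NOT NULL,
        content TEXT NOT NULL,
        content_hash TEXT NOT NULL,
        captured_at TEXT NOT NULL
    )
""")

# Two captures of the same page, newest last
for content in ("Pro plan: $49/mo", "Pro plan: $39/mo"):
    conn.execute(
        "INSERT INTO snapshots "
        "(page_id, content, content_hash, captured_at) "
        "VALUES (?, ?, ?, ?)",
        ("acme-pricing", content, str(hash(content)),
         datetime.utcnow().isoformat()),
    )

# get_previous_snapshot logic: latest row wins (id breaks timestamp ties)
row = conn.execute(
    "SELECT content FROM snapshots WHERE page_id = ? "
    "ORDER BY captured_at DESC, id DESC LIMIT 1",
    ("acme-pricing",),
).fetchone()
print(row[0])  # → Pro plan: $39/mo
```

The `id DESC` tiebreaker matters when two captures land within the same timestamp resolution.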

Content Fetching and Change Detection

The scraper fetches page content and compares it against the previous snapshot. Only meaningful text content is compared — scripts, styles, and boilerplate are stripped out.

import difflib
import hashlib

import httpx
from bs4 import BeautifulSoup

class ContentFetcher:
    def __init__(self):
        self.client = httpx.AsyncClient(
            timeout=30,
            headers={"User-Agent": (
                "Mozilla/5.0 (compatible; CompetitiveIntel/1.0)"
            )},
            follow_redirects=True,
        )

    async def fetch_page_content(self, url: str) -> str:
        """Fetch and extract meaningful text content."""
        resp = await self.client.get(url)
        resp.raise_for_status()
        return self._extract_text(resp.text)

    def _extract_text(self, html: str) -> str:
        """Strip HTML to meaningful text content."""
        soup = BeautifulSoup(html, "html.parser")

        # Remove scripts, styles, and nav/footer boilerplate
        for tag in soup(["script", "style", "nav", "footer",
                         "header", "noscript"]):
            tag.decompose()

        text = soup.get_text(separator="\n", strip=True)
        # Collapse multiple blank lines
        lines = [line.strip() for line in text.splitlines()]
        return "\n".join(line for line in lines if line)
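If BeautifulSoup is unavailable, the same stripping idea can be sketched with the stdlib `html.parser`. This is a simplified version that skips text inside the boilerplate tags and ignores edge cases like unclosed tags:

```python
from html.parser import HTMLParser

# Tags whose text content should be dropped, as in _extract_text above
SKIP = {"script", "style", "nav", "footer", "header", "noscript"}

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0          # how many skipped tags we are inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

p = TextExtractor()
p.feed("<nav>Home</nav><h1>Pricing</h1>"
       "<script>x=1</script><p>Pro: $49</p>")
print("\n".join(p.chunks))  # → Pricing\nPro: $49
```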


class ChangeDetector:
    def __init__(self):
        self.fetcher = ContentFetcher()

    def compute_hash(self, content: str) -> str:
        return hashlib.sha256(content.encode()).hexdigest()

    def compute_diff(self, old_content: str,
                      new_content: str) -> dict:
        """Compute a structured diff between content versions."""
        old_lines = old_content.splitlines()
        new_lines = new_content.splitlines()

        differ = difflib.unified_diff(
            old_lines, new_lines, lineterm=""
        )
        diff_lines = list(differ)

        added = [l[1:] for l in diff_lines if l.startswith("+")
                 and not l.startswith("+++")]
        removed = [l[1:] for l in diff_lines if l.startswith("-")
                   and not l.startswith("---")]

        return {
            "added_lines": added,
            "removed_lines": removed,
            "total_changes": len(added) + len(removed),
            "diff_text": "\n".join(diff_lines),
        }

    def is_significant_change(self, diff: dict,
                                threshold: int = 3) -> bool:
        """Filter out minor changes like timestamp updates."""
        if diff["total_changes"] < threshold:
            return False
        # Filter noise: date-only changes, counter updates
        noise_patterns = [
            r"\d{4}-\d{2}-\d{2}",
            r"\d+ (views|visitors|users)",
            r"copyright \d{4}",
        ]
        import re
        meaningful_changes = 0
        for line in diff["added_lines"] + diff["removed_lines"]:
            is_noise = any(
                re.fullmatch(p, line.strip(), re.IGNORECASE)
                for p in noise_patterns
            )
            if not is_noise and len(line.strip()) > 10:
                meaningful_changes += 1

        return meaningful_changes >= threshold
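To see the diff-and-filter pipeline in isolation, here is a self-contained sketch using the same `difflib` parsing and noise patterns as the class above. The price strings are made up for illustration:

```python
import difflib
import re

old = "Starter: $29/mo\nPro: $99/mo\n2025-01-10"
new = "Starter: $29/mo\nPro: $79/mo\n2025-01-17"

diff_lines = list(difflib.unified_diff(
    old.splitlines(), new.splitlines(), lineterm=""))
added = [l[1:] for l in diff_lines
         if l.startswith("+") and not l.startswith("+++")]
removed = [l[1:] for l in diff_lines
           if l.startswith("-") and not l.startswith("---")]

# Same noise filter as is_significant_change: drop date-only and
# counter lines, keep substantive edits
noise = [r"\d{4}-\d{2}-\d{2}", r"\d+ (views|visitors|users)",
         r"copyright \d{4}"]

def is_noise(line: str) -> bool:
    return any(re.fullmatch(p, line.strip(), re.IGNORECASE)
               for p in noise)

meaningful = [l for l in added + removed
              if not is_noise(l) and len(l.strip()) > 10]
print(meaningful)  # only the two price lines survive
```

Note that `re.fullmatch` only catches lines that are *entirely* noise; a line like "Updated 2025-01-17" would pass through, which is why the severity classifier downstream still matters.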

AI-Powered Change Classification

When a meaningful change is detected, an LLM classifies its type, assesses its significance, and generates a human-readable summary.

from openai import AsyncOpenAI
import json

class ChangeClassifier:
    def __init__(self, client: AsyncOpenAI):
        self.client = client

    async def classify_change(self, competitor: str,
                                page_type: str,
                                diff: dict) -> DetectedChange:
        """Use LLM to classify and summarize a detected change."""
        added_text = "\n".join(diff["added_lines"][:50])
        removed_text = "\n".join(diff["removed_lines"][:50])

        response = await self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": (
                    "You are a competitive intelligence analyst. "
                    "Classify the website change and explain its "
                    "business significance. Return JSON with: "
                    "change_type (pricing/feature/messaging/"
                    "structural/hiring), severity (high/medium/low), "
                    "summary (2-3 sentence analysis)."
                )},
                {"role": "user", "content": (
                    f"Competitor: {competitor}\n"
                    f"Page type: {page_type}\n\n"
                    f"Content removed:\n{removed_text}\n\n"
                    f"Content added:\n{added_text}"
                )},
            ],
            response_format={"type": "json_object"},
            temperature=0,
        )

        result = json.loads(response.choices[0].message.content)

        return DetectedChange(
            page_id="",  # Set by caller
            change_type=result.get("change_type", "structural"),
            severity=result.get("severity", "medium"),
            summary=result.get("summary", "Change detected"),
            old_content="\n".join(diff["removed_lines"]),
            new_content="\n".join(diff["added_lines"]),
            detected_at=datetime.utcnow(),
        )
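The classifier trusts the model to return well-formed JSON with the expected values. In practice it pays to validate before building a `DetectedChange`. A small sketch, where the allowed-value sets mirror the system prompt above and `parse_classification` is a hypothetical helper:

```python
import json

# Allowed values, matching the system prompt's instructions
ALLOWED_TYPES = {"pricing", "feature", "messaging",
                 "structural", "hiring"}
ALLOWED_SEVERITIES = {"high", "medium", "low"}

def parse_classification(raw: str) -> dict:
    """Parse model output defensively; fall back to safe defaults."""
    try:
        result = json.loads(raw)
    except json.JSONDecodeError:
        result = {}
    change_type = result.get("change_type")
    severity = result.get("severity")
    return {
        "change_type": change_type
            if change_type in ALLOWED_TYPES else "structural",
        "severity": severity
            if severity in ALLOWED_SEVERITIES else "medium",
        "summary": result.get("summary") or "Change detected",
    }

good = parse_classification(
    '{"change_type": "pricing", "severity": "high", '
    '"summary": "Price cut"}')
bad = parse_classification("not json at all")
print(good["severity"], bad["severity"])  # → high medium
```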

Alerting System

When high-severity changes are detected, the alert system notifies stakeholders through email, Slack, or other channels.

class AlertManager:
    def __init__(self, slack_webhook: Optional[str] = None):
        self.slack_webhook = slack_webhook
        self.http = httpx.AsyncClient()

    async def send_alert(self, change: DetectedChange,
                          page: MonitoredPage):
        """Send alert through configured channels."""
        message = self._format_message(change, page)

        if self.slack_webhook:
            await self._send_slack(message)

    def _format_message(self, change: DetectedChange,
                         page: MonitoredPage) -> str:
        severity_emoji = {
            "high": "[!]", "medium": "[*]", "low": "[-]"
        }
        marker = severity_emoji.get(change.severity, "[-]")
        return (
            f"{marker} Competitive Intelligence Alert\n"
            f"Competitor: {page.competitor}\n"
            f"Page: {page.url}\n"
            f"Type: {change.change_type}\n"
            f"Severity: {change.severity}\n"
            f"Summary: {change.summary}"
        )

    async def _send_slack(self, message: str):
        if not self.slack_webhook:
            return
        await self.http.post(
            self.slack_webhook,
            json={"text": message},
        )
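One design choice worth making explicit is routing by severity. The sketch below is an assumed policy, not taken from the code above: high-severity changes go straight to Slack, medium ones accumulate into a daily digest, and low ones are only logged.

```python
from collections import defaultdict

digest = defaultdict(list)

def route(severity: str, message: str) -> str:
    """Assumed policy: high -> Slack now, medium -> digest, low -> log."""
    if severity == "high":
        return "slack"
    if severity == "medium":
        digest["daily"].append(message)
        return "digest"
    return "log"

print(route("high", "Pricing page: 20% cut on Pro plan"))  # → slack
print(route("medium", "Features page reworded"))           # → digest
print(len(digest["daily"]))                                # → 1
```

A digest channel keeps medium-severity noise out of real-time alerts while still surfacing it once a day.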

Running the Monitor

The main loop orchestrates fetching, diffing, classifying, and alerting on a configurable schedule.

import asyncio

async def run_competitive_monitor(pages: list[MonitoredPage],
                                  slack_webhook: Optional[str] = None):
    """Main competitive intelligence monitoring loop."""
    db = IntelDatabase()
    detector = ChangeDetector()
    classifier = ChangeClassifier(AsyncOpenAI())
    alerter = AlertManager(slack_webhook=slack_webhook)

    while True:
        for page in pages:
            # Skip pages that are not yet due for a re-check
            if (page.last_checked is not None and
                    (datetime.utcnow() - page.last_checked)
                    .total_seconds()
                    < page.check_interval_minutes * 60):
                continue
            try:
                content = await detector.fetcher.fetch_page_content(
                    page.url
                )
                page.last_checked = datetime.utcnow()
                new_hash = detector.compute_hash(content)

                if new_hash == page.content_hash:
                    continue  # No change

                previous = db.get_previous_snapshot(page.id)
                snapshot = ContentSnapshot(
                    page_id=page.id,
                    content=content,
                    content_hash=new_hash,
                    captured_at=datetime.utcnow(),
                )
                db.save_snapshot(snapshot)
                # Update the in-memory hash immediately so an
                # insignificant change is not re-diffed on every pass
                page.content_hash = new_hash

                if previous is None:
                    continue  # First snapshot, nothing to diff

                diff = detector.compute_diff(previous[0], content)
                if not detector.is_significant_change(diff):
                    continue

                change = await classifier.classify_change(
                    page.competitor, page.page_type, diff
                )
                change.page_id = page.id

                if change.severity in ("high", "medium"):
                    await alerter.send_alert(change, page)

            except Exception as e:
                print(f"Error monitoring {page.url}: {e}")

        # Wake up every minute; per-page intervals gate the real work
        await asyncio.sleep(60)

FAQ

How often should I check competitor websites?

It depends on how time-sensitive the information is. For pricing pages, check every 1-4 hours. For feature pages and blog posts, daily checks are sufficient. For careers pages (which indicate hiring direction), weekly is fine. Respect robots.txt and avoid rates that could be flagged as abusive.

How do I handle competitors that use client-side rendering?

Use Playwright or Puppeteer to render JavaScript before extracting content. Many modern sites load pricing and feature information dynamically. Fetch with a headless browser, wait for network idle, then extract the visible text. This adds latency but ensures you capture the same content a human visitor would see.

How do I reduce false positives from A/B tests and personalization?

Request pages without cookies to get the default, non-personalized version. Make multiple requests over a short period and only flag a change if it appears consistently. For A/B tests, you may see content oscillate between two versions. Track content hashes over the last several checks and only alert on changes that persist for more than two consecutive checks.
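The persistence rule described above can be sketched with a short hash history. The hash strings and persistence window here are illustrative:

```python
from collections import deque

PERSIST = 3                      # checks a new hash must hold in a row
baseline = "hash-A"              # last content hash we alerted on
recent = deque(maxlen=PERSIST)   # rolling window of observed hashes

def observe(h: str) -> bool:
    """Return True only when a new hash has persisted PERSIST checks."""
    global baseline
    recent.append(h)
    if (len(recent) == PERSIST and len(set(recent)) == 1
            and recent[0] != baseline):
        baseline = recent[0]
        return True
    return False

# A/B test oscillation: the hash never persists, so never alerts
flips = [observe(h) for h in ["hash-B", "hash-A", "hash-B", "hash-A"]]
# A real change: the new hash holds for three checks, alerting once
real = [observe(h) for h in ["hash-B", "hash-B", "hash-B"]]
print(any(flips), real)  # → False [False, False, True]
```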


#CompetitiveIntelligence #WebMonitoring #ChangeDetection #AIAgents #ContentDiffing #MarketIntelligence #BusinessAutomation


CallSphere Team

Expert insights on AI voice agents and customer communication automation.
