Building a Price Monitoring Agent: Automated Price Tracking Across E-Commerce Sites
Build a production-grade price monitoring agent that scrapes multiple e-commerce sites, extracts prices with AI, detects changes, sends alerts, and maintains a historical price database for trend analysis.
Why Price Monitoring Needs AI
Traditional price scrapers rely on CSS selectors or XPath expressions to extract price values from product pages. This works until the site redesigns its layout, introduces dynamic pricing loaded via JavaScript, or renders prices inside images. AI-powered price monitoring agents solve these problems by using language models to interpret page content semantically rather than structurally.
A production price monitoring agent needs five capabilities: multi-site scraping with site-specific adapters, intelligent price extraction that handles edge cases, change detection with configurable thresholds, alerting through multiple channels, and historical price storage for trend analysis.
Core Data Model
Start with a clean data model that separates the concepts of products, price snapshots, and alert rules.
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional
import sqlite3
import json


@dataclass
class Product:
    id: str
    name: str
    url: str
    site: str
    current_price: Optional[float] = None
    currency: str = "USD"
    last_checked: Optional[datetime] = None


@dataclass
class PriceSnapshot:
    product_id: str
    price: float
    currency: str
    timestamp: datetime
    raw_text: str = ""


@dataclass
class AlertRule:
    product_id: str
    condition: str  # "drop_below", "drop_percent", "any_change"
    threshold: float = 0.0
    notify_channels: list[str] = field(
        default_factory=lambda: ["email"]
    )


class PriceDatabase:
    def __init__(self, db_path: str = "prices.db"):
        self.conn = sqlite3.connect(db_path)
        self._init_tables()

    def _init_tables(self):
        self.conn.executescript("""
            CREATE TABLE IF NOT EXISTS products (
                id TEXT PRIMARY KEY,
                name TEXT NOT NULL,
                url TEXT NOT NULL,
                site TEXT NOT NULL,
                current_price REAL,
                currency TEXT DEFAULT 'USD',
                last_checked TEXT
            );
            CREATE TABLE IF NOT EXISTS price_snapshots (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                product_id TEXT NOT NULL,
                price REAL NOT NULL,
                currency TEXT NOT NULL,
                timestamp TEXT NOT NULL,
                raw_text TEXT,
                FOREIGN KEY (product_id) REFERENCES products(id)
            );
            CREATE INDEX IF NOT EXISTS idx_snapshots_product_time
                ON price_snapshots(product_id, timestamp);
        """)

    def record_price(self, snapshot: PriceSnapshot):
        self.conn.execute(
            "INSERT INTO price_snapshots "
            "(product_id, price, currency, timestamp, raw_text) "
            "VALUES (?, ?, ?, ?, ?)",
            (snapshot.product_id, snapshot.price, snapshot.currency,
             snapshot.timestamp.isoformat(), snapshot.raw_text),
        )
        self.conn.execute(
            "UPDATE products SET current_price = ?, last_checked = ? "
            "WHERE id = ?",
            (snapshot.price, snapshot.timestamp.isoformat(),
             snapshot.product_id),
        )
        self.conn.commit()
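With this schema in place, trend analysis is plain SQL over the snapshots table. The sketch below is standalone (in-memory database, made-up sample rows for a hypothetical `widget-1` product) and shows the kind of min/max/latest query the history enables:

```python
import sqlite3

# Standalone demo: same snapshots schema as above, in-memory, sample data.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE price_snapshots (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        product_id TEXT NOT NULL,
        price REAL NOT NULL,
        currency TEXT NOT NULL,
        timestamp TEXT NOT NULL,
        raw_text TEXT
    )
""")
samples = [
    ("widget-1", 29.99, "USD", "2024-01-01T00:00:00"),
    ("widget-1", 24.99, "USD", "2024-01-08T00:00:00"),
    ("widget-1", 27.49, "USD", "2024-01-15T00:00:00"),
]
conn.executemany(
    "INSERT INTO price_snapshots (product_id, price, currency, timestamp) "
    "VALUES (?, ?, ?, ?)", samples,
)

# Price range plus the most recent price for one product.
lowest, highest, latest = conn.execute(
    "SELECT MIN(price), MAX(price), "
    "(SELECT price FROM price_snapshots WHERE product_id = ? "
    " ORDER BY timestamp DESC LIMIT 1) "
    "FROM price_snapshots WHERE product_id = ?",
    ("widget-1", "widget-1"),
).fetchone()
print(lowest, highest, latest)  # 24.99 29.99 27.49
```

The `idx_snapshots_product_time` index defined earlier makes both the `ORDER BY timestamp` lookup and per-product scans cheap as the table grows.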
AI-Powered Price Extraction
The key differentiator of an AI-powered price monitor is its ability to extract prices from any page without hand-crafted selectors. The agent sends the page content to an LLM and asks it to identify the current selling price.
from openai import AsyncOpenAI
import json
import re


class AIPriceExtractor:
    def __init__(self, client: AsyncOpenAI):
        self.client = client

    async def extract_price(self, page_text: str,
                            product_name: str) -> dict:
        """Extract price from page text using LLM."""
        response = await self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": (
                    "Extract the current selling price from the "
                    "product page text. Return JSON with keys: "
                    "price (float), currency (string), "
                    "original_text (the raw price string). "
                    "If there is a sale price, use the sale price."
                )},
                {"role": "user", "content": (
                    f"Product: {product_name}\n\n"
                    f"Page content:\n{page_text[:3000]}"
                )},
            ],
            response_format={"type": "json_object"},
            temperature=0,
        )
        result = json.loads(response.choices[0].message.content)
        return result

    def parse_price_fallback(self, text: str) -> Optional[float]:
        """Regex fallback when LLM is unavailable."""
        patterns = [
            r'\$[\d,]+\.?\d*',
            r'USD\s*[\d,]+\.?\d*',
            r'Price:\s*[\d,]+\.?\d*',
        ]
        for pattern in patterns:
            match = re.search(pattern, text)
            if match:
                price_str = re.sub(r'[^\d.]', '', match.group())
                return float(price_str)
        return None
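The regex fallback is easy to verify in isolation. This snippet duplicates the pattern list in a standalone function and runs it against a few made-up price strings:

```python
import re
from typing import Optional

def parse_price_fallback(text: str) -> Optional[float]:
    # Same patterns as AIPriceExtractor.parse_price_fallback above.
    patterns = [
        r'\$[\d,]+\.?\d*',
        r'USD\s*[\d,]+\.?\d*',
        r'Price:\s*[\d,]+\.?\d*',
    ]
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            # Strip everything except digits and the decimal point.
            price_str = re.sub(r'[^\d.]', '', match.group())
            return float(price_str)
    return None

print(parse_price_fallback("Now only $1,299.00 with free shipping"))  # 1299.0
print(parse_price_fallback("Price: 45.50"))                           # 45.5
print(parse_price_fallback("Out of stock"))                           # None
```

Note the fallback assumes dot-decimal, comma-grouped formats; European-style prices such as `1.299,00 €` would need additional patterns.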
Multi-Site Scraping Engine
Each e-commerce site has different loading behavior, anti-bot measures, and page structures. The scraping engine uses Playwright for JavaScript-heavy sites and falls back to HTTP requests for static pages.
from playwright.async_api import async_playwright
import httpx


class PriceScraper:
    def __init__(self, extractor: AIPriceExtractor):
        self.extractor = extractor
        self.http_client = httpx.AsyncClient(
            timeout=30,
            headers={"User-Agent": (
                "Mozilla/5.0 (compatible; PriceMonitor/1.0)"
            )},
        )

    async def scrape_product(self, product: Product) -> PriceSnapshot:
        """Scrape price for a single product."""
        # Try simple HTTP first (faster, cheaper)
        try:
            page_text = await self._fetch_http(product.url)
            if self._looks_like_price_page(page_text):
                return await self._extract(product, page_text)
        except Exception:
            pass
        # Fall back to browser for JS-rendered pages
        page_text = await self._fetch_browser(product.url)
        return await self._extract(product, page_text)

    async def _fetch_http(self, url: str) -> str:
        resp = await self.http_client.get(url)
        resp.raise_for_status()
        return resp.text

    async def _fetch_browser(self, url: str) -> str:
        async with async_playwright() as p:
            browser = await p.chromium.launch(headless=True)
            page = await browser.new_page()
            await page.goto(url, wait_until="networkidle")
            text = await page.inner_text("body")
            await browser.close()
            return text

    async def _extract(self, product: Product,
                       page_text: str) -> PriceSnapshot:
        result = await self.extractor.extract_price(
            page_text, product.name
        )
        return PriceSnapshot(
            product_id=product.id,
            price=result["price"],
            currency=result.get("currency", product.currency),
            timestamp=datetime.utcnow(),
            raw_text=result.get("original_text", ""),
        )

    def _looks_like_price_page(self, html: str) -> bool:
        """Quick check if HTTP response has price-like content."""
        return bool(re.search(r'[$€£]\s*\d', html))
Change Detection and Alerting
The change detection layer compares each new price snapshot against the previous one and evaluates alert rules to determine if a notification should be sent.
class ChangeDetector:
    def __init__(self, db: PriceDatabase):
        self.db = db

    def check_alerts(self, product: Product,
                     new_price: float,
                     rules: list[AlertRule]) -> list[dict]:
        """Evaluate alert rules against price change."""
        previous = product.current_price
        if previous is None:
            return []
        alerts = []
        for rule in rules:
            triggered = False
            if rule.condition == "any_change" and new_price != previous:
                triggered = True
            elif rule.condition == "drop_below":
                triggered = new_price < rule.threshold
            elif rule.condition == "drop_percent":
                pct_change = ((previous - new_price) / previous) * 100
                triggered = pct_change >= rule.threshold
            if triggered:
                alerts.append({
                    "product": product.name,
                    "old_price": previous,
                    "new_price": new_price,
                    "rule": rule.condition,
                    "channels": rule.notify_channels,
                })
        return alerts
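The rule logic is pure arithmetic, so it is easy to sanity-check on its own. This sketch reimplements the three conditions as a standalone function (the sample prices are made up):

```python
def rule_triggered(previous: float, new_price: float,
                   condition: str, threshold: float = 0.0) -> bool:
    # Mirrors the three conditions in ChangeDetector.check_alerts.
    if condition == "any_change":
        return new_price != previous
    if condition == "drop_below":
        return new_price < threshold
    if condition == "drop_percent":
        # Percentage drop relative to the previous price.
        return ((previous - new_price) / previous) * 100 >= threshold
    return False

print(rule_triggered(100.0, 89.0, "drop_percent", 10))  # True  (11% drop)
print(rule_triggered(100.0, 95.0, "drop_percent", 10))  # False (5% drop)
print(rule_triggered(100.0, 49.99, "drop_below", 50))   # True
```

One design caveat: `any_change` compares floats with `!=`, so a sub-cent rounding difference in extraction counts as a change; rounding to two decimals before comparison avoids spurious alerts.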
Running the Monitor on a Schedule
Tie everything together with an async scheduler that runs price checks at configurable intervals.
import asyncio


async def run_price_monitor(products: list[Product],
                            rules: list[AlertRule],
                            interval_minutes: int = 60):
    """Main monitoring loop."""
    db = PriceDatabase()
    extractor = AIPriceExtractor(AsyncOpenAI())
    scraper = PriceScraper(extractor)
    detector = ChangeDetector(db)
    while True:
        for product in products:
            try:
                snapshot = await scraper.scrape_product(product)
                alerts = detector.check_alerts(
                    product, snapshot.price, rules
                )
                db.record_price(snapshot)
                for alert in alerts:
                    await send_notification(alert)
            except Exception as e:
                print(f"Error checking {product.name}: {e}")
        await asyncio.sleep(interval_minutes * 60)
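The loop calls `send_notification`, which is left undefined above. A minimal sketch, assuming a channel-to-handler mapping; the email and webhook handlers here are placeholders, not real integrations:

```python
import asyncio

async def notify_email(alert: dict) -> str:
    # Placeholder: wire up SMTP or a transactional email API here.
    return f"email: {alert['product']} now {alert['new_price']}"

async def notify_webhook(alert: dict) -> str:
    # Placeholder: POST the alert JSON to a webhook URL here.
    return f"webhook: {alert['product']} now {alert['new_price']}"

HANDLERS = {"email": notify_email, "webhook": notify_webhook}

async def send_notification(alert: dict) -> list[str]:
    """Dispatch one alert to every channel listed on its rule."""
    results = []
    for channel in alert.get("channels", []):
        handler = HANDLERS.get(channel)
        if handler:
            results.append(await handler(alert))
    return results

alert = {"product": "Widget", "old_price": 100.0,
         "new_price": 89.0, "rule": "drop_percent",
         "channels": ["email", "webhook"]}
sent = asyncio.run(send_notification(alert))
print(sent)
```

Unknown channels are silently skipped here; logging them instead would surface misconfigured alert rules.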
FAQ
How do I avoid getting blocked by e-commerce sites?
Respect robots.txt directives, use reasonable request intervals (at least 30-60 seconds between requests to the same domain), rotate user agents, and consider using the site's official API or affiliate feeds when available. For production use, services like ScrapingBee or Browserless can handle anti-bot measures.
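One way to enforce per-domain spacing is a small async throttle in front of the scraper. The sketch below shortens the interval to 0.1 seconds for the demo (a real deployment would use the 30-60 seconds suggested above) and delays each request until its domain's cooldown has passed:

```python
import asyncio
import time
from urllib.parse import urlparse

class DomainThrottle:
    """Ensure a minimum interval between requests to the same domain."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._last: dict[str, float] = {}
        self._lock = asyncio.Lock()

    async def wait(self, url: str) -> None:
        domain = urlparse(url).netloc
        async with self._lock:
            now = time.monotonic()
            # Next allowed slot for this domain; reserve it before sleeping
            # so concurrent callers queue up instead of racing.
            ready_at = self._last.get(domain, 0.0) + self.min_interval
            delay = max(0.0, ready_at - now)
            self._last[domain] = max(now, ready_at)
        if delay > 0:
            await asyncio.sleep(delay)

async def demo() -> float:
    throttle = DomainThrottle(min_interval=0.1)  # 0.1s only for the demo
    start = time.monotonic()
    for url in ["https://shop-a.example/p/1",
                "https://shop-a.example/p/2",   # same domain: must wait
                "https://shop-b.example/p/9"]:  # new domain: no wait
        await throttle.wait(url)
    return time.monotonic() - start

elapsed = asyncio.run(demo())
print(f"{elapsed:.2f}s")
```

Calling `await throttle.wait(product.url)` at the top of `scrape_product` would apply this spacing to both the HTTP and browser paths.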
How accurate is LLM-based price extraction compared to CSS selectors?
LLM extraction is more robust across different sites but slightly less precise on well-structured pages. The best approach is a hybrid: maintain CSS selectors for your highest-volume sites and use LLM extraction as a fallback and for new sites. Test extraction accuracy regularly against a ground truth dataset.
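That hybrid can be a small registry: a site-specific parser is tried first, and anything unregistered (or any parser miss) falls through to the generic path. In this sketch, plain regexes stand in for real CSS selectors and a stub stands in for the LLM call; the site names and markup are illustrative:

```python
import re
from typing import Callable, Optional

# Site-specific parsers: fast and precise, but brittle across redesigns.
SITE_PARSERS: dict[str, Callable[[str], Optional[float]]] = {
    "shop-a": lambda html: (
        float(m.group(1)) if (m := re.search(r'data-price="([\d.]+)"', html))
        else None
    ),
}

def llm_extract_stub(page_text: str) -> Optional[float]:
    # Stand-in for AIPriceExtractor.extract_price; grabs the first $ amount.
    m = re.search(r'\$([\d,]+\.?\d*)', page_text)
    return float(m.group(1).replace(",", "")) if m else None

def extract_price(site: str, page_text: str) -> Optional[float]:
    """Selector-first extraction with a generic fallback."""
    parser = SITE_PARSERS.get(site)
    if parser:
        price = parser(page_text)
        if price is not None:
            return price
    return llm_extract_stub(page_text)

print(extract_price("shop-a", '<span data-price="19.99">$19.99</span>'))  # 19.99
print(extract_price("unknown-shop", "Sale! $24.50 today only"))           # 24.5
```

Because both paths return the same shape, the rest of the pipeline does not need to know which one produced the price, and a disagreement between the two on the same page is a useful signal for your accuracy testing.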
How should I store historical price data at scale?
For small to medium volumes (thousands of products), SQLite or PostgreSQL with a time-indexed snapshots table works well. For larger volumes, consider a time-series database like TimescaleDB, which is PostgreSQL-compatible but optimized for time-series queries and data retention policies.
CallSphere Team