Building a Price Monitoring Agent: Automated Price Tracking Across E-Commerce Sites
Build a production-grade price monitoring agent that scrapes multiple e-commerce sites, extracts prices with AI, detects changes, sends alerts, and maintains a historical price database for trend analysis.
Why Price Monitoring Needs AI
Traditional price scrapers rely on CSS selectors or XPath expressions to extract price values from product pages. This works until the site redesigns its layout, introduces dynamic pricing loaded via JavaScript, or renders prices inside images. AI-powered price monitoring agents solve these problems by using language models to interpret page content semantically rather than structurally.
A production price monitoring agent needs five capabilities: multi-site scraping with site-specific adapters, intelligent price extraction that handles edge cases, change detection with configurable thresholds, alerting through multiple channels, and historical price storage for trend analysis.
Core Data Model
Start with a clean data model that separates the concepts of products, price snapshots, and alert rules.
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional
import sqlite3
import json


@dataclass
class Product:
    id: str
    name: str
    url: str
    site: str
    current_price: Optional[float] = None
    currency: str = "USD"
    last_checked: Optional[datetime] = None


@dataclass
class PriceSnapshot:
    product_id: str
    price: float
    currency: str
    timestamp: datetime
    raw_text: str = ""


@dataclass
class AlertRule:
    product_id: str
    condition: str  # "drop_below", "drop_percent", "any_change"
    threshold: float = 0.0
    notify_channels: list[str] = field(
        default_factory=lambda: ["email"]
    )


class PriceDatabase:
    def __init__(self, db_path: str = "prices.db"):
        self.conn = sqlite3.connect(db_path)
        self._init_tables()

    def _init_tables(self):
        self.conn.executescript("""
            CREATE TABLE IF NOT EXISTS products (
                id TEXT PRIMARY KEY,
                name TEXT NOT NULL,
                url TEXT NOT NULL,
                site TEXT NOT NULL,
                current_price REAL,
                currency TEXT DEFAULT 'USD',
                last_checked TEXT
            );
            CREATE TABLE IF NOT EXISTS price_snapshots (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                product_id TEXT NOT NULL,
                price REAL NOT NULL,
                currency TEXT NOT NULL,
                timestamp TEXT NOT NULL,
                raw_text TEXT,
                FOREIGN KEY (product_id) REFERENCES products(id)
            );
            CREATE INDEX IF NOT EXISTS idx_snapshots_product_time
                ON price_snapshots(product_id, timestamp);
        """)

    def record_price(self, snapshot: PriceSnapshot):
        self.conn.execute(
            "INSERT INTO price_snapshots "
            "(product_id, price, currency, timestamp, raw_text) "
            "VALUES (?, ?, ?, ?, ?)",
            (snapshot.product_id, snapshot.price, snapshot.currency,
             snapshot.timestamp.isoformat(), snapshot.raw_text),
        )
        self.conn.execute(
            "UPDATE products SET current_price = ?, last_checked = ? "
            "WHERE id = ?",
            (snapshot.price, snapshot.timestamp.isoformat(),
             snapshot.product_id),
        )
        self.conn.commit()
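With this schema in place, trend analysis is plain SQL over the snapshots table. The sketch below is standalone (in-memory database, made-up sample rows for a hypothetical `widget-1` product) and shows the kind of min/max/latest query the history enables:

```python
import sqlite3

# Standalone demo: same snapshots schema as above, in-memory, sample data.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE price_snapshots (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        product_id TEXT NOT NULL,
        price REAL NOT NULL,
        currency TEXT NOT NULL,
        timestamp TEXT NOT NULL,
        raw_text TEXT
    )
""")
samples = [
    ("widget-1", 29.99, "USD", "2024-01-01T00:00:00"),
    ("widget-1", 24.99, "USD", "2024-01-08T00:00:00"),
    ("widget-1", 27.49, "USD", "2024-01-15T00:00:00"),
]
conn.executemany(
    "INSERT INTO price_snapshots (product_id, price, currency, timestamp) "
    "VALUES (?, ?, ?, ?)", samples,
)

# Price range plus the most recent price for one product.
lowest, highest, latest = conn.execute(
    "SELECT MIN(price), MAX(price), "
    "(SELECT price FROM price_snapshots WHERE product_id = ? "
    " ORDER BY timestamp DESC LIMIT 1) "
    "FROM price_snapshots WHERE product_id = ?",
    ("widget-1", "widget-1"),
).fetchone()
print(lowest, highest, latest)  # 24.99 29.99 27.49
```

The `idx_snapshots_product_time` index defined earlier makes both the `ORDER BY timestamp` lookup and per-product scans cheap as the table grows.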
AI-Powered Price Extraction
The key differentiator of an AI-powered price monitor is its ability to extract prices from any page without hand-crafted selectors. The agent sends the page content to an LLM and asks it to identify the current selling price.
from openai import AsyncOpenAI
import json
import re


class AIPriceExtractor:
    def __init__(self, client: AsyncOpenAI):
        self.client = client

    async def extract_price(self, page_text: str,
                            product_name: str) -> dict:
        """Extract price from page text using LLM."""
        response = await self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": (
                    "Extract the current selling price from the "
                    "product page text. Return JSON with keys: "
                    "price (float), currency (string), "
                    "original_text (the raw price string). "
                    "If there is a sale price, use the sale price."
                )},
                {"role": "user", "content": (
                    f"Product: {product_name}\n\n"
                    f"Page content:\n{page_text[:3000]}"
                )},
            ],
            response_format={"type": "json_object"},
            temperature=0,
        )
        result = json.loads(response.choices[0].message.content)
        return result

    def parse_price_fallback(self, text: str) -> Optional[float]:
        """Regex fallback when LLM is unavailable."""
        patterns = [
            r'\$[\d,]+\.?\d*',
            r'USD\s*[\d,]+\.?\d*',
            r'Price:\s*[\d,]+\.?\d*',
        ]
        for pattern in patterns:
            match = re.search(pattern, text)
            if match:
                price_str = re.sub(r'[^\d.]', '', match.group())
                return float(price_str)
        return None
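The regex fallback is easy to verify in isolation. This snippet duplicates the pattern list in a standalone function and runs it against a few made-up price strings:

```python
import re
from typing import Optional

def parse_price_fallback(text: str) -> Optional[float]:
    # Same patterns as AIPriceExtractor.parse_price_fallback above.
    patterns = [
        r'\$[\d,]+\.?\d*',
        r'USD\s*[\d,]+\.?\d*',
        r'Price:\s*[\d,]+\.?\d*',
    ]
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            # Strip everything except digits and the decimal point.
            price_str = re.sub(r'[^\d.]', '', match.group())
            return float(price_str)
    return None

print(parse_price_fallback("Now only $1,299.00 with free shipping"))  # 1299.0
print(parse_price_fallback("Price: 45.50"))                           # 45.5
print(parse_price_fallback("Out of stock"))                           # None
```

Note the fallback assumes dot-decimal, comma-grouped formats; European-style prices such as `1.299,00 €` would need additional patterns.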
Multi-Site Scraping Engine
Each e-commerce site has different loading behavior, anti-bot measures, and page structures. The scraping engine uses Playwright for JavaScript-heavy sites and falls back to HTTP requests for static pages.
from playwright.async_api import async_playwright
import httpx


class PriceScraper:
    def __init__(self, extractor: AIPriceExtractor):
        self.extractor = extractor
        self.http_client = httpx.AsyncClient(
            timeout=30,
            headers={"User-Agent": (
                "Mozilla/5.0 (compatible; PriceMonitor/1.0)"
            )},
        )

    async def scrape_product(self, product: Product) -> PriceSnapshot:
        """Scrape price for a single product."""
        # Try simple HTTP first (faster, cheaper)
        try:
            page_text = await self._fetch_http(product.url)
            if self._looks_like_price_page(page_text):
                return await self._extract(product, page_text)
        except Exception:
            pass
        # Fall back to browser for JS-rendered pages
        page_text = await self._fetch_browser(product.url)
        return await self._extract(product, page_text)

    async def _fetch_http(self, url: str) -> str:
        resp = await self.http_client.get(url)
        resp.raise_for_status()
        return resp.text

    async def _fetch_browser(self, url: str) -> str:
        async with async_playwright() as p:
            browser = await p.chromium.launch(headless=True)
            page = await browser.new_page()
            await page.goto(url, wait_until="networkidle")
            text = await page.inner_text("body")
            await browser.close()
            return text

    async def _extract(self, product: Product,
                       page_text: str) -> PriceSnapshot:
        result = await self.extractor.extract_price(
            page_text, product.name
        )
        return PriceSnapshot(
            product_id=product.id,
            price=result["price"],
            currency=result.get("currency", product.currency),
            timestamp=datetime.utcnow(),
            raw_text=result.get("original_text", ""),
        )

    def _looks_like_price_page(self, html: str) -> bool:
        """Quick check if HTTP response has price-like content."""
        return bool(re.search(r'[$€£]\s*\d', html))
Change Detection and Alerting
The change detection layer compares each new price snapshot against the previous one and evaluates alert rules to determine if a notification should be sent.
class ChangeDetector:
    def __init__(self, db: PriceDatabase):
        self.db = db

    def check_alerts(self, product: Product,
                     new_price: float,
                     rules: list[AlertRule]) -> list[dict]:
        """Evaluate alert rules against price change."""
        previous = product.current_price
        if previous is None:
            return []
        alerts = []
        for rule in rules:
            triggered = False
            if rule.condition == "any_change" and new_price != previous:
                triggered = True
            elif rule.condition == "drop_below":
                triggered = new_price < rule.threshold
            elif rule.condition == "drop_percent":
                pct_change = ((previous - new_price) / previous) * 100
                triggered = pct_change >= rule.threshold
            if triggered:
                alerts.append({
                    "product": product.name,
                    "old_price": previous,
                    "new_price": new_price,
                    "rule": rule.condition,
                    "channels": rule.notify_channels,
                })
        return alerts
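The rule logic is pure arithmetic, so it is easy to sanity-check on its own. This sketch reimplements the three conditions as a standalone function (the sample prices are made up):

```python
def rule_triggered(previous: float, new_price: float,
                   condition: str, threshold: float = 0.0) -> bool:
    # Mirrors the three conditions in ChangeDetector.check_alerts.
    if condition == "any_change":
        return new_price != previous
    if condition == "drop_below":
        return new_price < threshold
    if condition == "drop_percent":
        # Percentage drop relative to the previous price.
        return ((previous - new_price) / previous) * 100 >= threshold
    return False

print(rule_triggered(100.0, 89.0, "drop_percent", 10))  # True  (11% drop)
print(rule_triggered(100.0, 95.0, "drop_percent", 10))  # False (5% drop)
print(rule_triggered(100.0, 49.99, "drop_below", 50))   # True
```

One design caveat: `any_change` compares floats with `!=`, so a sub-cent rounding difference in extraction counts as a change; rounding to two decimals before comparison avoids spurious alerts.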
Running the Monitor on a Schedule
Tie everything together with an async scheduler that runs price checks at configurable intervals.
import asyncio


async def run_price_monitor(products: list[Product],
                            rules: list[AlertRule],
                            interval_minutes: int = 60):
    """Main monitoring loop."""
    db = PriceDatabase()
    extractor = AIPriceExtractor(AsyncOpenAI())
    scraper = PriceScraper(extractor)
    detector = ChangeDetector(db)
    while True:
        for product in products:
            try:
                snapshot = await scraper.scrape_product(product)
                alerts = detector.check_alerts(
                    product, snapshot.price, rules
                )
                db.record_price(snapshot)
                for alert in alerts:
                    await send_notification(alert)
            except Exception as e:
                print(f"Error checking {product.name}: {e}")
        await asyncio.sleep(interval_minutes * 60)
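The loop calls `send_notification`, which is left undefined above. A minimal sketch, assuming a channel-to-handler mapping; the email and webhook handlers here are placeholders, not real integrations:

```python
import asyncio

async def notify_email(alert: dict) -> str:
    # Placeholder: wire up SMTP or a transactional email API here.
    return f"email: {alert['product']} now {alert['new_price']}"

async def notify_webhook(alert: dict) -> str:
    # Placeholder: POST the alert JSON to a webhook URL here.
    return f"webhook: {alert['product']} now {alert['new_price']}"

HANDLERS = {"email": notify_email, "webhook": notify_webhook}

async def send_notification(alert: dict) -> list[str]:
    """Dispatch one alert to every channel listed on its rule."""
    results = []
    for channel in alert.get("channels", []):
        handler = HANDLERS.get(channel)
        if handler:
            results.append(await handler(alert))
    return results

alert = {"product": "Widget", "old_price": 100.0,
         "new_price": 89.0, "rule": "drop_percent",
         "channels": ["email", "webhook"]}
sent = asyncio.run(send_notification(alert))
print(sent)
```

Unknown channels are silently skipped here; logging them instead would surface misconfigured alert rules.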
FAQ
How do I avoid getting blocked by e-commerce sites?
Respect robots.txt directives, use reasonable request intervals (at least 30-60 seconds between requests to the same domain), rotate user agents, and consider using the site's official API or affiliate feeds when available. For production use, services like ScrapingBee or Browserless can handle anti-bot measures.
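One way to enforce per-domain spacing is a small async throttle in front of the scraper. The sketch below shortens the interval to 0.1 seconds for the demo (a real deployment would use the 30-60 seconds suggested above) and delays each request until its domain's cooldown has passed:

```python
import asyncio
import time
from urllib.parse import urlparse

class DomainThrottle:
    """Ensure a minimum interval between requests to the same domain."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._last: dict[str, float] = {}
        self._lock = asyncio.Lock()

    async def wait(self, url: str) -> None:
        domain = urlparse(url).netloc
        async with self._lock:
            now = time.monotonic()
            # Next allowed slot for this domain; reserve it before sleeping
            # so concurrent callers queue up instead of racing.
            ready_at = self._last.get(domain, 0.0) + self.min_interval
            delay = max(0.0, ready_at - now)
            self._last[domain] = max(now, ready_at)
        if delay > 0:
            await asyncio.sleep(delay)

async def demo() -> float:
    throttle = DomainThrottle(min_interval=0.1)  # 0.1s only for the demo
    start = time.monotonic()
    for url in ["https://shop-a.example/p/1",
                "https://shop-a.example/p/2",   # same domain: must wait
                "https://shop-b.example/p/9"]:  # new domain: no wait
        await throttle.wait(url)
    return time.monotonic() - start

elapsed = asyncio.run(demo())
print(f"{elapsed:.2f}s")
```

Calling `await throttle.wait(product.url)` at the top of `scrape_product` would apply this spacing to both the HTTP and browser paths.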
How accurate is LLM-based price extraction compared to CSS selectors?
LLM extraction is more robust across different sites but slightly less precise on well-structured pages. The best approach is a hybrid: maintain CSS selectors for your highest-volume sites and use LLM extraction as a fallback and for new sites. Test extraction accuracy regularly against a ground truth dataset.
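That hybrid can be a small registry: a site-specific parser is tried first, and anything unregistered (or any parser miss) falls through to the generic path. In this sketch, plain regexes stand in for real CSS selectors and a stub stands in for the LLM call; the site names and markup are illustrative:

```python
import re
from typing import Callable, Optional

# Site-specific parsers: fast and precise, but brittle across redesigns.
SITE_PARSERS: dict[str, Callable[[str], Optional[float]]] = {
    "shop-a": lambda html: (
        float(m.group(1)) if (m := re.search(r'data-price="([\d.]+)"', html))
        else None
    ),
}

def llm_extract_stub(page_text: str) -> Optional[float]:
    # Stand-in for AIPriceExtractor.extract_price; grabs the first $ amount.
    m = re.search(r'\$([\d,]+\.?\d*)', page_text)
    return float(m.group(1).replace(",", "")) if m else None

def extract_price(site: str, page_text: str) -> Optional[float]:
    """Selector-first extraction with a generic fallback."""
    parser = SITE_PARSERS.get(site)
    if parser:
        price = parser(page_text)
        if price is not None:
            return price
    return llm_extract_stub(page_text)

print(extract_price("shop-a", '<span data-price="19.99">$19.99</span>'))  # 19.99
print(extract_price("unknown-shop", "Sale! $24.50 today only"))           # 24.5
```

Because both paths return the same shape, the rest of the pipeline does not need to know which one produced the price, and a disagreement between the two on the same page is a useful signal for your accuracy testing.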
How should I store historical price data at scale?
For small to medium volumes (thousands of products), SQLite or PostgreSQL with a time-indexed snapshots table works well. For larger volumes, consider a time-series database like TimescaleDB, which is PostgreSQL-compatible but optimized for time-series queries and data retention policies.
CallSphere Team