Playwright Network Interception: Capturing API Calls and Modifying Requests
Master Playwright's network interception API to capture API responses, log request/response data, mock endpoints, and extract structured data from XHR and fetch calls in your AI agents.
Why Network Interception Matters for AI Agents
Modern web applications load data through API calls: REST endpoints, GraphQL queries, and WebSocket connections. Rather than scraping the rendered HTML, an AI agent can intercept these network requests and read the structured JSON directly. This is usually faster and more reliable than DOM parsing, and it yields cleaner data.
Playwright's route() API, together with the page's request and response events, gives you control over network traffic: intercepting requests, modifying headers, mocking responses, and logging API activity. This post covers practical patterns for AI agents that need to work with network traffic.
Listening to Network Events
The simplest approach is passively listening to requests and responses:
from playwright.sync_api import sync_playwright

def log_request(request):
    if "api" in request.url:
        print(f">> {request.method} {request.url}")

def log_response(response):
    if "api" in response.url:
        print(f"<< {response.status} {response.url}")

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()

    # Register event listeners
    page.on("request", log_request)
    page.on("response", log_response)

    page.goto("https://example.com")
    page.wait_for_load_state("networkidle")
    browser.close()
This logs every request and response whose URL contains "api" while the page loads. For AI agents, this reveals the data endpoints a site uses without any DOM inspection.
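Matching on "api" in the URL misses endpoints that are named differently (a /graphql path, for example). One alternative, sketched here, is to filter on Playwright's request.resource_type, which reports "xhr" or "fetch" for script-initiated calls. The helper names is_data_request and attach_logger are hypothetical, and the list of static suffixes is an assumption you would tune per site.

```python
def is_data_request(resource_type: str, url: str) -> bool:
    """Return True for script-initiated calls (XHR/fetch) that are not static assets."""
    static_suffixes = (".js", ".css", ".png", ".jpg", ".svg", ".woff2")
    path = url.split("?", 1)[0].lower()  # ignore the query string
    return resource_type in ("xhr", "fetch") and not path.endswith(static_suffixes)


def attach_logger(page, seen: list) -> None:
    """Record (method, url) for every data request made by a Playwright page."""
    def on_request(request):
        if is_data_request(request.resource_type, request.url):
            seen.append((request.method, request.url))

    page.on("request", on_request)
```

Call attach_logger(page, seen) before page.goto() so that load-time requests are recorded too.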
Capturing API Response Data
Intercept specific API calls and extract the JSON data:
from playwright.sync_api import sync_playwright
import json

captured_data = []

def capture_api_response(response):
    """Capture JSON responses from API endpoints."""
    if "/api/" in response.url and response.status == 200:
        try:
            body = response.json()
            captured_data.append({
                "url": response.url,
                "status": response.status,
                "data": body,
            })
        except Exception:
            pass  # Not a JSON response

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.on("response", capture_api_response)

    page.goto("https://example.com")
    page.wait_for_load_state("networkidle")

    # Trigger actions that fire API calls
    page.get_by_role("button", name="Load More").click()
    page.wait_for_load_state("networkidle")

    print(f"Captured {len(captured_data)} API responses")
    for item in captured_data:
        print(f"  {item['url']}: {json.dumps(item['data'])[:200]}")

    browser.close()
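Relying on a try/except to detect non-JSON bodies works, but it attempts to parse every matching response. A possible refinement is to check the content-type header first. The sketch below assumes the server sets that header accurately; is_json_response and capture_if_json are hypothetical helper names.

```python
def is_json_response(headers: dict) -> bool:
    """Check the content-type header (Playwright lower-cases header names)."""
    content_type = headers.get("content-type", "")
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "application/json" or media_type.endswith("+json")


def capture_if_json(response, sink: list) -> None:
    """Append the parsed body to sink when the response is successful JSON."""
    if not (response.ok and is_json_response(response.headers)):
        return
    try:
        body = response.json()
    except Exception:
        return  # body unavailable or not valid JSON despite the header
    sink.append({"url": response.url, "status": response.status, "data": body})
```

Register it with page.on("response", lambda r: capture_if_json(r, captured)), where captured is a list you own.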
Waiting for Specific API Responses
Instead of listening to all traffic, wait for a specific API call:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")

    # Wait for a specific API response after triggering an action
    with page.expect_response("**/api/search**") as response_info:
        page.get_by_label("Search").fill("playwright")
        page.get_by_label("Search").press("Enter")

    response = response_info.value
    search_results = response.json()
    print(f"Found {len(search_results['items'])} results")

    # Wait with a predicate function
    with page.expect_response(
        lambda resp: "/api/products" in resp.url and resp.status == 200
    ) as response_info:
        page.get_by_text("View Products").click()

    products = response_info.value.json()
    print(f"Loaded {len(products)} products")

    browser.close()
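Predicates like the lambda above tend to get repeated across an agent's code. A small factory, sketched here, builds them from a path prefix and HTTP method and pairs them with an explicit timeout. The names response_matcher and search_and_capture are hypothetical, and the "Search" label and /api/search path are the same illustrative values used above.

```python
from urllib.parse import urlparse


def response_matcher(path_prefix: str, method: str = "GET"):
    """Build a predicate for page.expect_response() that checks path, method, and status."""
    def matches(response) -> bool:
        path = urlparse(response.url).path
        return bool(
            path.startswith(path_prefix)
            and response.request.method == method
            and response.ok
        )

    return matches


def search_and_capture(page, term: str, timeout_ms: float = 10_000):
    """Type a search term and return the parsed JSON of the matching response."""
    with page.expect_response(
        response_matcher("/api/search"), timeout=timeout_ms
    ) as response_info:
        page.get_by_label("Search").fill(term)
        page.get_by_label("Search").press("Enter")
    return response_info.value.json()
```

Comparing against the parsed path rather than the full URL avoids accidental matches on query strings or on asset paths that merely contain the fragment.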
Route Interception: Modifying Requests
The route() API lets you intercept and modify requests before they reach the server:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()

    # Add custom headers to all API requests
    def add_auth_header(route):
        # Build a new dict rather than mutating the request's own headers
        headers = {
            **route.request.headers,
            "authorization": "Bearer my-agent-token",
            "x-agent-id": "playwright-ai-agent",
        }
        route.continue_(headers=headers)

    page.route("**/api/**", add_auth_header)
    page.goto("https://example.com")
    browser.close()
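Keeping the header logic in a plain function makes it easy to unit-test without a browser. The following is a minimal sketch of that split; with_agent_headers and make_auth_handler are hypothetical names, and the token value is whatever your agent is configured with.

```python
def with_agent_headers(original: dict, token: str) -> dict:
    """Return a copy of the headers with the agent's auth and ID headers added."""
    return {
        **original,
        "authorization": f"Bearer {token}",
        "x-agent-id": "playwright-ai-agent",
    }


def make_auth_handler(token: str):
    """Create a route handler that forwards the request with the extra headers."""
    def handler(route):
        route.continue_(headers=with_agent_headers(route.request.headers, token))

    return handler
```

Register it with page.route("**/api/**", make_auth_handler(token)). Note that this glob matches /api/ paths on any host, so if the page also calls third-party APIs you may want a narrower pattern to avoid sending the token to them.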
Mocking API Responses
AI agents can mock API responses for testing or to simulate specific scenarios:
import json
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()

    # Mock an API endpoint with custom data
    def mock_products_api(route):
        mock_data = {
            "products": [
                {"id": 1, "name": "Test Product", "price": 29.99},
                {"id": 2, "name": "Mock Product", "price": 49.99},
            ],
            "total": 2,
        }
        route.fulfill(
            status=200,
            content_type="application/json",
            body=json.dumps(mock_data),
        )

    page.route("**/api/products**", mock_products_api)
    page.goto("https://example.com/products")

    # The page now displays mock data
    page.screenshot(path="mocked_products.png")
    browser.close()
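Fully synthetic mocks can drift from what the real endpoint returns. An alternative, sketched below, is to fetch the real response and patch it before handing it to the page. This assumes a Playwright release that provides route.fetch() and the json argument to route.fulfill() (1.29 or later, to the best of my knowledge). The payload shape, with "products" and "total" keys, is carried over from the mock above, and add_test_product is a hypothetical helper.

```python
def add_test_product(payload: dict) -> dict:
    """Return a copy of the payload with one extra product and an updated total."""
    products = list(payload.get("products", []))
    products.append({"id": 999, "name": "Injected Product", "price": 0.0})
    return {**payload, "products": products, "total": len(products)}


def patch_products_api(route):
    """Fetch the real response, then fulfill with a modified JSON body."""
    response = route.fetch()
    route.fulfill(response=response, json=add_test_product(response.json()))
```

Register it with page.route("**/api/products**", patch_products_api). Status and headers come from the real response, and only the body is replaced.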
Blocking Unwanted Resources
Speed up page loads by blocking ads, tracking scripts, and large images:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()

    # Block by resource type
    def block_unnecessary(route):
        if route.request.resource_type in ["image", "media", "font"]:
            route.abort()
        else:
            route.continue_()

    page.route("**/*", block_unnecessary)

    # Block specific domains. Handlers registered later take precedence when
    # several patterns match, so these run before the catch-all above.
    # The extra "*" before each domain also matches subdomains such as "www.".
    page.route("**/*google-analytics.com/**", lambda route: route.abort())
    page.route("**/*facebook.net/**", lambda route: route.abort())
    page.route("**/*doubleclick.net/**", lambda route: route.abort())

    page.goto("https://example.com")
    browser.close()
This can substantially reduce page load time and bandwidth usage for AI agents that only need text content.
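Glob patterns for domains are easy to get subtly wrong, so one option is to make the blocking decision in Python against the parsed hostname and register a single handler. This is a sketch under the assumption that the three domains and three resource types above are the ones you want to block; should_block and block_handler are hypothetical names.

```python
from urllib.parse import urlparse

BLOCKED_TYPES = {"image", "media", "font"}
BLOCKED_DOMAINS = ("google-analytics.com", "facebook.net", "doubleclick.net")


def should_block(resource_type: str, url: str) -> bool:
    """Block heavy resource types and any request to a blocked domain or its subdomains."""
    host = urlparse(url).hostname or ""
    on_blocked_domain = any(
        host == domain or host.endswith("." + domain) for domain in BLOCKED_DOMAINS
    )
    return resource_type in BLOCKED_TYPES or on_blocked_domain


def block_handler(route):
    """Route handler: abort blocked requests, let everything else through."""
    if should_block(route.request.resource_type, route.request.url):
        route.abort()
    else:
        route.continue_()
```

Register it once with page.route("**/*", block_handler).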
Building an API Data Extraction Agent
Here is a complete agent that navigates a site, captures all API data, and structures it:
from dataclasses import dataclass
from playwright.sync_api import sync_playwright

@dataclass
class APICapture:
    url: str
    method: str
    status: int
    request_headers: dict
    response_headers: dict
    # response.json() can return a list as well as a dict
    body: dict | list | str | None = None

class APIExtractorAgent:
    def __init__(self):
        # Plain list; dataclasses.field() only applies inside a @dataclass body
        self.captures: list[APICapture] = []

    def _on_response(self, response):
        request = response.request
        try:
            body = response.json()
        except Exception:
            body = None  # Not JSON, or the body is unavailable

        self.captures.append(APICapture(
            url=request.url,
            method=request.method,
            status=response.status,
            request_headers=dict(request.headers),
            response_headers=dict(response.headers),
            body=body,
        ))

    def extract(self, url: str, actions=None) -> list[APICapture]:
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            page.on("response", self._on_response)

            page.goto(url, wait_until="networkidle")

            if actions:
                actions(page)
                page.wait_for_load_state("networkidle")

            browser.close()

        # Keep only the responses that parsed as JSON
        return [c for c in self.captures if c.body is not None]

# Usage
agent = APIExtractorAgent()
api_data = agent.extract(
    "https://example.com",
    actions=lambda page: page.get_by_text("Load Data").click()
)

for capture in api_data:
    print(f"{capture.method} {capture.url} -> {capture.status}")
    if isinstance(capture.body, dict):
        print(f"  Keys: {list(capture.body.keys())}")
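Once captures are collected, an agent usually needs to group them by endpoint and persist them. The sketch below shows one way to do that with the standard library. Capture is a stand-in dataclass carrying a subset of the APICapture fields so the example is self-contained, and group_by_endpoint and to_json_lines are hypothetical helpers.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass
from urllib.parse import urlparse


@dataclass
class Capture:
    """Stand-in for APICapture with only the fields this example needs."""
    url: str
    method: str
    status: int
    body: object = None


def group_by_endpoint(captures) -> dict:
    """Group captures under a 'METHOD /path' key, ignoring query strings."""
    grouped = defaultdict(list)
    for capture in captures:
        key = f"{capture.method} {urlparse(capture.url).path}"
        grouped[key].append(capture)
    return dict(grouped)


def to_json_lines(captures) -> str:
    """Serialize captures as one JSON object per line (JSON Lines)."""
    return "\n".join(json.dumps(asdict(c), default=str) for c in captures)
```

Grouping on the path rather than the full URL collapses paginated calls such as ?page=1 and ?page=2 into one endpoint, which is often what an agent wants when discovering a site's API surface.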
Handling WebSocket Connections
Playwright can also monitor WebSocket traffic:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()

    def on_websocket(ws):
        print(f"WebSocket opened: {ws.url}")
        # Frame payloads arrive as str (text frames) or bytes (binary frames)
        ws.on("framereceived", lambda payload: print(
            f"  WS received: {payload[:100]}"
        ))
        ws.on("framesent", lambda payload: print(
            f"  WS sent: {payload[:100]}"
        ))
        # Accept and ignore any arguments passed to the close handler
        ws.on("close", lambda *_: print("  WS closed"))

    page.on("websocket", on_websocket)
    page.goto("https://example.com")
    page.wait_for_timeout(5000)
    browser.close()
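Printing truncated frames is enough for exploration, but an agent typically wants the parsed messages. Many WebSocket APIs send JSON text frames, though that is an assumption about the target site. The sketch below decodes frames defensively and skips anything that is not JSON; decode_frame and collect_json_frames are hypothetical helpers.

```python
import json


def decode_frame(payload):
    """Parse a WebSocket frame (str or bytes) as JSON; return None if it is not JSON."""
    if isinstance(payload, bytes):
        text = payload.decode("utf-8", errors="replace")
    else:
        text = payload
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


def collect_json_frames(ws, sink: list) -> None:
    """Append every JSON message received on a Playwright WebSocket to sink."""
    def on_frame(payload):
        message = decode_frame(payload)
        if message is not None:
            sink.append(message)

    ws.on("framereceived", on_frame)
```

Call collect_json_frames(ws, messages) from inside the page's "websocket" handler, where messages is a list you own.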
FAQ
How do I capture API calls that happen during page load versus after user interaction?
Register your event listeners before calling page.goto() to capture load-time API calls. For calls triggered by user interaction, use page.expect_response() wrapped around the triggering action. Combining both gives you complete visibility into all network activity throughout the session.
Can I modify POST request bodies with route interception?
Yes. In your route handler, access the original request body with route.request.post_data, parse it, modify the data, and pass it to route.continue_(post_data=modified_body). This is useful for AI agents that need to inject additional parameters into form submissions or API calls.
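As a rough sketch of that flow, assuming the request body is a JSON object: the handler below parses route.request.post_data, merges in extra fields, and forwards the result. The "source" field and the helper names inject_fields and add_agent_metadata are illustrative, not part of any real API.

```python
import json


def inject_fields(post_data, extra: dict) -> str:
    """Merge extra fields into a JSON object body and return the new body text."""
    payload = json.loads(post_data) if post_data else {}
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object body")
    payload.update(extra)
    return json.dumps(payload)


def add_agent_metadata(route):
    """Route handler: add a field to JSON POST bodies, pass everything else through."""
    request = route.request
    content_type = request.headers.get("content-type", "")
    if request.method == "POST" and "application/json" in content_type:
        try:
            body = inject_fields(request.post_data, {"source": "ai-agent"})
        except ValueError:
            route.continue_()  # body is not a JSON object; forward unchanged
            return
        route.continue_(post_data=body)
    else:
        route.continue_()
```

Register it with page.route("**/api/**", add_agent_metadata). Form-encoded bodies would need urllib.parse.parse_qs and urlencode instead of the json module.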
Does network interception work with HTTP/2 and HTTP/3?
Playwright handles HTTP/2 transparently — all interception APIs work the same regardless of the HTTP version. HTTP/3 (QUIC) support depends on the browser being used and is still evolving. For most practical purposes, the interception API abstracts away protocol differences entirely.
CallSphere Team
Expert insights on AI voice agents and customer communication automation.