Playwright Network Interception: Capturing API Calls and Modifying Requests

Master Playwright's network interception API to capture API responses, log request/response data, mock endpoints, and extract structured data from XHR and fetch calls in your AI agents.

Why Network Interception Matters for AI Agents

Modern web applications load data through API calls: REST endpoints, GraphQL queries, and WebSocket connections. Rather than scraping the rendered HTML, an AI agent can intercept these network requests and read the structured JSON directly. This is usually faster and less brittle than DOM parsing, because the data arrives already structured and does not depend on page layout.

Playwright exposes network traffic in two ways: page events for passively observing requests and responses, and the route() API for intercepting requests, modifying headers, mocking responses, and blocking resources. This post covers practical patterns for AI agents that need to work with network traffic.

Listening to Network Events

The simplest approach is passively listening to requests and responses:

from playwright.sync_api import sync_playwright

def log_request(request):
    if "api" in request.url:
        print(f">> {request.method} {request.url}")

def log_response(response):
    if "api" in response.url:
        print(f"<< {response.status} {response.url}")

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()

    # Register event listeners
    page.on("request", log_request)
    page.on("response", log_response)

    page.goto("https://example.com")
    page.wait_for_load_state("networkidle")

    browser.close()

This logs every request and response whose URL contains "api" while the page loads. For AI agents, this reveals the data endpoints a site uses without any DOM inspection.
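
The substring check on "api" can miss endpoints with other URL shapes and can match static assets that happen to contain those letters. One possible alternative, sketched below, filters on the resource type that Playwright reports for each request. The helper names `is_data_request` and `log_data_request` are hypothetical and not part of Playwright.

```python
# Resource types Playwright reports for script-initiated data calls.
DATA_RESOURCE_TYPES = {"xhr", "fetch"}

def is_data_request(request) -> bool:
    """Return True for XHR/fetch requests, regardless of URL shape."""
    return request.resource_type in DATA_RESOURCE_TYPES

def log_data_request(request) -> None:
    """Event handler: log only XHR/fetch traffic."""
    if is_data_request(request):
        print(f">> {request.method} {request.url}")

# Usage (inside a Playwright session):
#   page.on("request", log_data_request)
```

This relies only on the `resource_type`, `method`, and `url` attributes of the request object, so it can be combined with the URL check above if both conditions are wanted.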

Capturing API Response Data

Intercept specific API calls and extract the JSON data:

from playwright.sync_api import sync_playwright
import json

captured_data = []

def capture_api_response(response):
    """Capture JSON responses from API endpoints."""
    if "/api/" in response.url and response.status == 200:
        try:
            body = response.json()
            captured_data.append({
                "url": response.url,
                "status": response.status,
                "data": body,
            })
        except Exception:
            pass  # Not a JSON response

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()

    page.on("response", capture_api_response)

    page.goto("https://example.com")
    page.wait_for_load_state("networkidle")

    # Trigger actions that fire API calls
    page.get_by_role("button", name="Load More").click()
    page.wait_for_load_state("networkidle")

    print(f"Captured {len(captured_data)} API responses")
    for item in captured_data:
        print(f"  {item['url']}: {json.dumps(item['data'])[:200]}")

    browser.close()
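
The try/except above silently swallows every error, not only non-JSON bodies. A slightly stricter sketch checks the Content-Type header before parsing. It assumes the server labels JSON responses correctly, and `is_json_response` and `capture_if_json` are hypothetical helper names.

```python
def is_json_response(response) -> bool:
    """True when the Content-Type header declares a JSON payload."""
    content_type = response.headers.get("content-type", "").lower()
    media_type = content_type.split(";", 1)[0].strip()
    return media_type == "application/json" or media_type.endswith("+json")

def capture_if_json(response, sink: list) -> None:
    """Append the parsed body to sink for successful JSON responses only."""
    if response.ok and is_json_response(response):
        sink.append({
            "url": response.url,
            "status": response.status,
            "data": response.json(),
        })

# Usage (inside a Playwright session):
#   captured = []
#   page.on("response", lambda r: capture_if_json(r, captured))
```

A malformed body will still raise inside `response.json()`, so a narrow try/except around that call may still be worth keeping in production code.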

Waiting for Specific API Responses

Instead of listening to all traffic, wait for a specific API call:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")

    # Wait for a specific API response after triggering an action
    with page.expect_response("**/api/search**") as response_info:
        page.get_by_label("Search").fill("playwright")
        page.get_by_label("Search").press("Enter")

    response = response_info.value
    search_results = response.json()
    print(f"Found {len(search_results['items'])} results")

    # Wait with a predicate function
    with page.expect_response(
        lambda resp: "/api/products" in resp.url and resp.status == 200
    ) as response_info:
        page.get_by_text("View Products").click()

    products = response_info.value.json()
    print(f"Loaded {len(products)} products")

    browser.close()
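
A substring test on the full URL can also match query strings or unrelated paths. As a rough sketch, a small predicate factory can compare the URL path instead. `response_matcher` is a hypothetical helper, not a Playwright API. It returns a plain callable of the kind `page.expect_response()` accepts.

```python
from urllib.parse import urlparse

def response_matcher(path_prefix: str, status: int = 200):
    """Build a predicate that matches on URL path prefix and status code."""
    def matches(response) -> bool:
        path = urlparse(response.url).path
        return path.startswith(path_prefix) and response.status == status
    return matches

# Usage (inside a Playwright session):
#   with page.expect_response(response_matcher("/api/products")) as info:
#       page.get_by_text("View Products").click()
#   products = info.value.json()
```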

Route Interception: Modifying Requests

The route() API lets you intercept and modify requests before they reach the server:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()

    # Add custom headers to all API requests
    def add_auth_header(route):
        # Copy the original headers rather than mutating the request's dict
        headers = {
            **route.request.headers,
            "authorization": "Bearer my-agent-token",
            "x-agent-id": "playwright-ai-agent",
        }
        route.continue_(headers=headers)

    page.route("**/api/**", add_auth_header)

    page.goto("https://example.com")

    browser.close()

Mocking API Responses

AI agents can mock API responses for testing or to simulate specific scenarios:

import json
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()

    # Mock an API endpoint with custom data
    def mock_products_api(route):
        mock_data = {
            "products": [
                {"id": 1, "name": "Test Product", "price": 29.99},
                {"id": 2, "name": "Mock Product", "price": 49.99},
            ],
            "total": 2,
        }
        route.fulfill(
            status=200,
            content_type="application/json",
            body=json.dumps(mock_data),
        )

    page.route("**/api/products**", mock_products_api)
    page.goto("https://example.com/products")

    # The page now displays mock data
    page.screenshot(path="mocked_products.png")

    browser.close()
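
A full mock replaces the server entirely. When only part of the payload needs to change, one option is to fetch the real response and patch it before fulfilling. The sketch below assumes a Playwright version that provides `route.fetch()` and the `json` argument of `route.fulfill()` (added in 1.29, as far as I know), and it assumes the endpoint returns the same `products`/`total` shape used in the mock above. The handler name and the injected item are illustrative.

```python
def add_injected_product(route) -> None:
    """Fetch the real response, append one item, and fulfill with the result."""
    response = route.fetch()  # performs the real network request
    payload = response.json()
    payload.setdefault("products", []).append(
        {"id": 999, "name": "Injected Product", "price": 0.0}
    )
    payload["total"] = len(payload["products"])
    # Reuse the original status and headers; replace only the JSON body.
    route.fulfill(response=response, json=payload)

# Usage (inside a Playwright session):
#   page.route("**/api/products**", add_injected_product)
```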

Blocking Unwanted Resources

Speed up page loads by blocking ads, tracking scripts, and large images:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()

    # Block by resource type
    def block_unnecessary(route):
        if route.request.resource_type in ["image", "media", "font"]:
            route.abort()
        else:
            route.continue_()

    page.route("**/*", block_unnecessary)

    # Block specific domains. The leading * also matches subdomains
    # such as www.google-analytics.com or connect.facebook.net.
    page.route("**/*google-analytics.com/**", lambda route: route.abort())
    page.route("**/*facebook.net/**", lambda route: route.abort())
    page.route("**/*doubleclick.net/**", lambda route: route.abort())

    page.goto("https://example.com")
    browser.close()

For AI agents that only need text content, this can substantially reduce page load time and bandwidth usage.
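
The blocking rules can also be kept in one place as a pure function, which makes them easy to test without a browser. This is a minimal sketch. The blocklists mirror the examples above, and `should_block` and `block_handler` are hypothetical names.

```python
from urllib.parse import urlparse

BLOCKED_TYPES = {"image", "media", "font"}
BLOCKED_HOSTS = {"google-analytics.com", "facebook.net", "doubleclick.net"}

def should_block(resource_type: str, url: str) -> bool:
    """Decide whether a request should be aborted."""
    if resource_type in BLOCKED_TYPES:
        return True
    host = urlparse(url).hostname or ""
    # Match the domain itself or any of its subdomains.
    return any(host == d or host.endswith("." + d) for d in BLOCKED_HOSTS)

def block_handler(route) -> None:
    """Route handler that applies should_block() to every request."""
    request = route.request
    if should_block(request.resource_type, request.url):
        route.abort()
    else:
        route.continue_()

# Usage (inside a Playwright session):
#   page.route("**/*", block_handler)
```

Matching on the parsed hostname avoids depending on glob details and keeps lookalike domains from being blocked by accident.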

Building an API Data Extraction Agent

Here is a complete agent that navigates a site, captures all API data, and structures it:

import json
from dataclasses import dataclass
from playwright.sync_api import sync_playwright

@dataclass
class APICapture:
    url: str
    method: str
    status: int
    request_headers: dict
    response_headers: dict
    body: dict | list | None = None  # parsed JSON (object or array), if any

class APIExtractorAgent:
    def __init__(self):
        # Responses captured so far, in arrival order
        self.captures: list[APICapture] = []

    def _on_response(self, response):
        request = response.request
        try:
            body = response.json()
        except Exception:
            body = None

        self.captures.append(APICapture(
            url=request.url,
            method=request.method,
            status=response.status,
            request_headers=dict(request.headers),
            response_headers=dict(response.headers),
            body=body,
        ))

    def extract(self, url: str, actions=None) -> list[APICapture]:
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            page.on("response", self._on_response)

            page.goto(url, wait_until="networkidle")

            if actions:
                actions(page)
                page.wait_for_load_state("networkidle")

            browser.close()

        return [c for c in self.captures if c.body is not None]

# Usage
agent = APIExtractorAgent()
api_data = agent.extract(
    "https://example.com",
    actions=lambda page: page.get_by_text("Load Data").click()
)

for capture in api_data:
    print(f"{capture.method} {capture.url} -> {capture.status}")
    if isinstance(capture.body, dict):
        print(f"  Keys: {list(capture.body.keys())}")
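
Once captures are collected, a quick summary of which endpoints were hit can help an agent decide what to call directly next time. The sketch below groups captures by method and path, ignoring query strings. `endpoint_counts` is a hypothetical helper that only needs objects with `url` and `method` attributes, such as the captures above.

```python
from collections import Counter
from urllib.parse import urlparse

def endpoint_counts(captures) -> Counter:
    """Count captures per (method, host + path), ignoring query strings."""
    counts = Counter()
    for capture in captures:
        parsed = urlparse(capture.url)
        counts[(capture.method, parsed.netloc + parsed.path)] += 1
    return counts

# Usage:
#   for (method, endpoint), n in endpoint_counts(api_data).most_common():
#       print(f"{n:3d}  {method} {endpoint}")
```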

Handling WebSocket Connections

Playwright can also monitor WebSocket traffic:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()

    def on_websocket(ws):
        print(f"WebSocket opened: {ws.url}")
        ws.on("framereceived", lambda payload: print(
            f"  WS received: {payload[:100]}"
        ))
        ws.on("framesent", lambda payload: print(
            f"  WS sent: {payload[:100]}"
        ))
        ws.on("close", lambda: print("  WS closed"))

    page.on("websocket", on_websocket)
    page.goto("https://example.com")
    page.wait_for_timeout(5000)

    browser.close()
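
Many WebSocket APIs send JSON text frames, but frame payloads can arrive as either str or bytes. A small decoder, sketched below under the assumption that the frames of interest are UTF-8 JSON, turns them into Python objects and returns None for anything else. `decode_frame` is a hypothetical helper name.

```python
import json

def decode_frame(payload):
    """Parse a WebSocket frame as JSON; return None if it is not JSON."""
    if isinstance(payload, bytes):
        try:
            payload = payload.decode("utf-8")
        except UnicodeDecodeError:
            return None  # binary frame that is not UTF-8 text
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        return None  # text frame that is not JSON, e.g. a keep-alive

# Usage (inside the on_websocket handler):
#   messages = []
#   ws.on("framereceived", lambda payload: messages.append(decode_frame(payload)))
```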

FAQ

How do I capture API calls that happen during page load versus after user interaction?

Register your event listeners before calling page.goto() to capture load-time API calls. For calls triggered by user interaction, use page.expect_response() wrapped around the triggering action. Combining both gives you complete visibility into all network activity throughout the session.

Can I modify POST request bodies with route interception?

Yes. In your route handler, access the original request body with route.request.post_data, parse it, modify the data, and pass it to route.continue_(post_data=modified_body). This is useful for AI agents that need to inject additional parameters into form submissions or API calls.
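
The steps in this answer could be sketched as the handler below. It assumes the request body is a JSON object. The `source` field and the handler name are illustrative, not part of any real API. Requests that are not JSON POSTs are passed through unchanged.

```python
import json

def inject_agent_field(route) -> None:
    """Add a field to JSON POST bodies; pass everything else through."""
    request = route.request
    raw = request.post_data  # None when the request has no body
    if request.method != "POST" or raw is None:
        route.continue_()
        return
    try:
        body = json.loads(raw)
    except json.JSONDecodeError:
        route.continue_()  # not JSON (e.g. form-encoded): leave untouched
        return
    if not isinstance(body, dict):
        route.continue_()  # JSON array or scalar: nothing to add a key to
        return
    body["source"] = "ai-agent"  # hypothetical extra parameter
    route.continue_(post_data=json.dumps(body))

# Usage (inside a Playwright session):
#   page.route("**/api/**", inject_agent_field)
```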

Does network interception work with HTTP/2 and HTTP/3?

Playwright handles HTTP/2 transparently — all interception APIs work the same regardless of the HTTP version. HTTP/3 (QUIC) support depends on the browser being used and is still evolving. For most practical purposes, the interception API abstracts away protocol differences entirely.


#NetworkInterception #APICapture #Playwright #RequestMocking #WebScraping #AIAgents #HTTPMonitoring

CallSphere Team

Expert insights on AI voice agents and customer communication automation.
