Getting Started with Playwright for AI Browser Automation: Installation and First Script

Why Playwright Is the Best Choice for AI Browser Automation

AI agents increasingly need to interact with the real web: filling out forms, reading dynamic content, clicking through multi-step workflows, and extracting data from JavaScript-heavy single-page applications. HTTP client libraries such as requests or httpx cannot handle these tasks on their own because they fetch raw responses without executing JavaScript or rendering the DOM.

Playwright solves this by providing a full browser automation framework that controls Chromium, Firefox, and WebKit through a single API. Unlike Selenium, Playwright was built from the ground up for modern web applications, with features like auto-waiting, network interception, and isolated browser contexts. For AI agents, this means more reliable, repeatable interaction with most websites.

In this tutorial, you will go from zero to a working Playwright automation script that navigates to a page, extracts content, and captures a screenshot.

Prerequisites

Before you begin, make sure you have:

Python 3.9 or later installed (Python 3.8 works only with older Playwright releases)
pip for package management
Basic familiarity with Python async/await (helpful but not required)

Step 1: Install Playwright

Playwright for Python is distributed as a pip package. Install it along with its browser binaries:

pip install playwright
playwright install

The playwright install command downloads Chromium, Firefox, and WebKit browser binaries. These are self-contained — they do not interfere with any browsers already installed on your system.

If you only need Chromium (the most common choice for automation), you can save disk space:

playwright install chromium

Verify the installation:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")
    print(page.title())
    browser.close()

Run this script and you should see Example Domain printed to the console.
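
If you want to confirm that the pip package is present without launching a browser, a minimal standard-library sketch follows. The helper name installed_version is illustrative and not part of Playwright, and the check covers only the Python package, not the browser binaries downloaded by playwright install.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "playwright") -> Optional[str]:
    """Return the installed version of a pip package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the result; this does not verify that browser binaries are installed.
version = installed_version()
print(f"playwright: {version or 'not installed'}")
```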

Step 2: Understanding the Playwright Object Model

Playwright organizes its API into a clear hierarchy:

Playwright — the entry point that provides browser type objects
Browser — a running browser instance (Chromium, Firefox, or WebKit)
BrowserContext — an isolated browser session (like an incognito window)
Page — a single tab within a context

This hierarchy matters for AI agents because contexts provide isolation. Each agent session can have its own cookies, storage, and authentication state without interference.

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # Launch a browser
    browser = p.chromium.launch(headless=True)

    # Create an isolated context
    context = browser.new_context(
        viewport={"width": 1280, "height": 720},
        user_agent="Mozilla/5.0 (AI Agent; Playwright)"
    )

    # Open a page in that context
    page = context.new_page()
    page.goto("https://example.com")

    print(f"Title: {page.title()}")
    print(f"URL: {page.url}")

    context.close()
    browser.close()
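
To make the isolation claim concrete, here is a small sketch that gives each agent session its own context and cookie, then reports what each context can see. The function name cookies_per_session, the cookie name "session", and the token values are assumptions made for illustration; the Playwright calls used are new_context(), add_cookies(), cookies(), and close().

```python
from typing import Dict, List


def cookies_per_session(
    browser, sessions: Dict[str, str], url: str = "https://example.com"
) -> Dict[str, List[str]]:
    """Create one context per session, set a cookie in each, and return
    the cookie values visible from every context."""
    contexts = {}
    try:
        for name, token in sessions.items():
            context = browser.new_context()
            # Cookies must be scoped with either a url or a domain/path pair.
            context.add_cookies([{"name": "session", "value": token, "url": url}])
            contexts[name] = context
        return {
            name: [cookie["value"] for cookie in ctx.cookies()]
            for name, ctx in contexts.items()
        }
    finally:
        # Close every context, even if something above raised.
        for ctx in contexts.values():
            ctx.close()


# Usage (requires Playwright and a browser binary):
# from playwright.sync_api import sync_playwright
# with sync_playwright() as p:
#     browser = p.chromium.launch()
#     print(cookies_per_session(browser, {"agent_a": "token-a", "agent_b": "token-b"}))
#     browser.close()
```

Each context should list only its own token, which is the property that lets several agent sessions share one browser process.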

Step 3: Navigating and Waiting

One of Playwright's most useful features is its built-in waiting. When you call page.goto(), Playwright waits until the page fires the load event by default, and actions such as click() and fill() likewise wait for their target element to be ready. You can customize the navigation wait:

# Wait until there have been no network connections for at least 500 ms
# (the Playwright docs discourage relying on this; prefer waiting for a specific element)
page.goto("https://example.com", wait_until="networkidle")

# Wait only until the DOM content is loaded
page.goto("https://example.com", wait_until="domcontentloaded")

# Set a custom timeout (in milliseconds)
page.goto("https://example.com", timeout=30000)
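
Navigation can still fail on a slow or flaky network, so agents often wrap page.goto() in a retry loop. The sketch below is one way to do that; goto_with_retry and its attempts and backoff_s parameters are hypothetical names, and the linear backoff is an arbitrary choice rather than a Playwright recommendation.

```python
import time

try:
    from playwright.sync_api import Error as PlaywrightError
except ImportError:  # fallback so the helper can be read without Playwright installed
    PlaywrightError = Exception


def goto_with_retry(page, url, attempts=3, timeout_ms=30000,
                    wait_until="load", backoff_s=1.0):
    """Call page.goto(), retrying on Playwright errors (timeouts included)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return page.goto(url, timeout=timeout_ms, wait_until=wait_until)
        except PlaywrightError as exc:
            last_error = exc
            if attempt < attempts:
                # Linear backoff: wait a little longer after each failure.
                time.sleep(backoff_s * attempt)
    raise last_error
```

Playwright's TimeoutError is a subclass of its Error class, so catching Error covers both timeouts and network failures.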

For AI agents that need to interact with elements after navigation, you can wait for specific conditions:

# Wait for a specific element to appear
page.wait_for_selector("h1")

# Wait for a specific URL pattern
page.wait_for_url("**/dashboard**")

# Wait for the page to reach a load state
page.wait_for_load_state("networkidle")
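
An agent frequently does not know which of several layouts a page will render. A minimal sketch for that case tries a list of candidate selectors and returns the first one that appears. The helper name wait_for_first is an assumption; it relies only on page.wait_for_selector() and Playwright's TimeoutError.

```python
try:
    from playwright.sync_api import TimeoutError as PlaywrightTimeoutError
except ImportError:  # fallback so the helper can be read without Playwright installed
    PlaywrightTimeoutError = TimeoutError


def wait_for_first(page, selectors, timeout_ms=5000):
    """Return the first selector that appears on the page, or None.

    Selectors are tried one after another, so the worst-case wait is
    len(selectors) * timeout_ms.
    """
    for selector in selectors:
        try:
            page.wait_for_selector(selector, timeout=timeout_ms)
            return selector
        except PlaywrightTimeoutError:
            continue  # this candidate never showed up; try the next one
    return None


# Usage, given a Playwright page:
# found = wait_for_first(page, ["#results", ".no-results", "h1"])
```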

Step 4: Locating Elements with Selectors

Playwright supports multiple selector strategies. For AI agents, the most reliable approach combines CSS selectors with text-based and role-based locators:

# CSS selector
page.locator("div.content h1").text_content()

# Text selector — finds elements containing the text
page.locator("text=Learn More").click()

# Role-based selector — semantic and accessible
page.get_by_role("button", name="Submit")
page.get_by_role("heading", name="Welcome")

# Label-based — great for form fields
page.get_by_label("Email address").fill("user@example.com")

# Placeholder-based
page.get_by_placeholder("Search...").fill("AI agents")

# Test ID — most reliable for testing
page.get_by_test_id("submit-button").click()
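
The label-based and role-based locators above combine naturally into a form-filling helper. The following is a sketch under the assumption that every field has an accessible label and the form has a button with an accessible name; fill_and_submit and the example labels are illustrative.

```python
from typing import Mapping


def fill_and_submit(page, fields: Mapping[str, str], submit_name: str = "Submit") -> None:
    """Fill each form field by its label text, then click the submit button."""
    for label, value in fields.items():
        # get_by_label matches the input associated with this label text.
        page.get_by_label(label).fill(value)
    # Locate the button by ARIA role and accessible name, not by CSS class.
    page.get_by_role("button", name=submit_name).click()


# Usage, given a Playwright page showing a hypothetical login form:
# fill_and_submit(page, {"Email address": "user@example.com", "Password": "secret"},
#                 submit_name="Sign in")
```

Because the helper depends on labels and roles instead of CSS classes, it tends to survive cosmetic redesigns of the page.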

Step 5: Taking a Screenshot

Screenshots are essential for AI agents, especially when feeding page visuals to multimodal models like GPT-4 Vision for analysis:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")

    # Full page screenshot
    page.screenshot(path="full_page.png", full_page=True)

    # Viewport-only screenshot
    page.screenshot(path="viewport.png")

    # Screenshot a specific element
    page.locator("h1").screenshot(path="heading.png")

    browser.close()
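
When the screenshot is headed straight to a multimodal model, writing a file is often unnecessary: page.screenshot() returns the image bytes when no path is given. The sketch below encodes those bytes as a base64 data URL, a form many vision APIs accept. The helper name screenshot_data_url is an assumption, and the exact payload format your model provider expects may differ.

```python
import base64


def screenshot_data_url(page, full_page: bool = False) -> str:
    """Capture a PNG screenshot in memory and return it as a data URL."""
    png_bytes = page.screenshot(type="png", full_page=full_page)
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{encoded}"


# Usage, given a Playwright page:
# data_url = screenshot_data_url(page)
# ...pass data_url as the image input of your model request
```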

Complete First Script

Here is a complete script that ties everything together — navigating, extracting data, and capturing a screenshot:

from playwright.sync_api import sync_playwright

def run_browser_agent():
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(
            viewport={"width": 1920, "height": 1080}
        )
        page = context.new_page()

        page.goto("https://news.ycombinator.com", wait_until="networkidle")

        # Extract the top 5 story titles
        stories = page.locator(".titleline > a").all()[:5]
        for i, story in enumerate(stories, 1):
            title = story.text_content()
            href = story.get_attribute("href")
            print(f"{i}. {title} -> {href}")

        # Take a screenshot for visual analysis
        page.screenshot(path="hackernews.png", full_page=False)
        print("Screenshot saved to hackernews.png")

        context.close()
        browser.close()

run_browser_agent()
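
Many agent frameworks are asynchronous, so here is a sketch of the same extraction using Playwright's async API. It returns structured data instead of printing, which is usually easier for an agent to consume. The function names extract_stories and main are illustrative; the selector is the one used in the script above.

```python
import asyncio
from typing import Dict, List, Optional


async def extract_stories(page, limit: int = 5) -> List[Dict[str, Optional[str]]]:
    """Return the title and link of the first `limit` stories on the page."""
    links = await page.locator(".titleline > a").all()
    stories = []
    for link in links[:limit]:
        stories.append({
            "title": await link.text_content(),
            "href": await link.get_attribute("href"),
        })
    return stories


async def main() -> None:
    # Imported here so extract_stories can be reused on its own.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto("https://news.ycombinator.com")
        for story in await extract_stories(page):
            print(story)
        await browser.close()


# To run: asyncio.run(main())
```

In the async API, methods that talk to the browser (goto, all, text_content, get_attribute) are awaited, while page.locator() only builds a locator object and is called without await.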

FAQ

Why choose Playwright over Selenium for AI agents?

Playwright offers auto-waiting, network interception, and multi-browser-context support out of the box. It does not require a separate WebDriver binary, handles modern SPAs more reliably, and its API is designed for the async patterns that AI agent frameworks use. Selenium is still viable for legacy projects, but Playwright is the better choice for new automation work.
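
As a small illustration of network interception, the sketch below defines a route handler that aborts requests for heavy resources and lets everything else through, which can speed up text-only extraction. The handler name block_heavy_resources and the chosen set of resource types are assumptions; the handler is registered with page.route().

```python
# Resource types to skip; adjust to what your agent actually needs.
BLOCKED_RESOURCE_TYPES = {"image", "media", "font"}


def block_heavy_resources(route) -> None:
    """Route handler: abort heavy requests, continue all others."""
    if route.request.resource_type in BLOCKED_RESOURCE_TYPES:
        route.abort()
    else:
        route.continue_()


# Usage, given a Playwright page (register before navigating):
# page.route("**/*", block_heavy_resources)
# page.goto("https://example.com")
```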

Can Playwright run in Docker or headless servers?

Yes. Playwright provides official Docker images and runs headless by default, so no display server is needed. For CI/CD or cloud deployments, keep headless=True and run playwright install --with-deps chromium, which installs the browser together with the OS libraries it needs.

Does Playwright work with all websites?

Playwright can automate any website that runs in Chromium, Firefox, or WebKit. Some sites employ bot detection that may block automated browsers. Playwright provides features like custom user agents, viewport configuration, and network interception that help work around basic detection, though advanced anti-bot systems may require additional strategies.
