Getting Started with Playwright for AI Browser Automation: Installation and First Script
Learn how to install Playwright for Python, launch browsers programmatically, navigate to pages, locate elements with selectors, and capture screenshots in your first browser automation script.
Why Playwright Is the Best Choice for AI Browser Automation
AI agents increasingly need to interact with the real web — filling out forms, reading dynamic content, clicking through multi-step workflows, and extracting data from JavaScript-heavy single-page applications. Traditional HTTP-based scraping libraries like requests or httpx cannot handle these tasks because they do not execute JavaScript or render the DOM.
Playwright solves this by providing a full browser automation framework that controls Chromium, Firefox, and WebKit through a single API. Unlike Selenium, Playwright was built from the ground up for modern web applications with features like auto-waiting, network interception, and multi-browser-context isolation. For AI agents, this means more reliable and repeatable interaction with most websites, with far fewer hand-written sleeps and retries.
In this tutorial, you will go from zero to a working Playwright automation script that navigates to a page, extracts content, and captures a screenshot.
Prerequisites
Before you begin, make sure you have:
- Python 3.8 or later installed
- pip for package management
- Basic familiarity with Python async/await (helpful but not required)
Step 1: Install Playwright
Playwright for Python is distributed as a pip package. Install it along with its browser binaries:
pip install playwright
playwright install
The playwright install command downloads Chromium, Firefox, and WebKit browser binaries. These are self-contained — they do not interfere with any browsers already installed on your system.
If you only need Chromium (the most common choice for automation), you can save disk space:
playwright install chromium
Verify the installation:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")
    print(page.title())
    browser.close()
Run this script and you should see Example Domain printed to the console.
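If the import in the verification script fails, it can help to confirm that the package is visible to the interpreter you are actually running. The following is a minimal sketch that uses only the standard library; the helper name `installed_version` is hypothetical and not part of Playwright, and it checks only the Python package, not the downloaded browser binaries.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a pip package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("playwright")
if version is None:
    print("playwright is not installed in this environment")
else:
    print(f"playwright {version} is installed")
```

If this reports that the package is missing even though pip install succeeded, pip and python are probably pointing at different environments.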
Step 2: Understanding the Playwright Object Model
Playwright organizes its API into a clear hierarchy:
- Playwright — the entry point that provides browser type objects
- Browser — a running browser instance (Chromium, Firefox, or WebKit)
- BrowserContext — an isolated browser session (like an incognito window)
- Page — a single tab within a context
This hierarchy matters for AI agents because contexts provide isolation. Each agent session can have its own cookies, storage, and authentication state without interference.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # Launch a browser
    browser = p.chromium.launch(headless=True)

    # Create an isolated context
    context = browser.new_context(
        viewport={"width": 1280, "height": 720},
        user_agent="Mozilla/5.0 (AI Agent; Playwright)"
    )

    # Open a page in that context
    page = context.new_page()
    page.goto("https://example.com")
    print(f"Title: {page.title()}")
    print(f"URL: {page.url}")

    context.close()
    browser.close()
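To make the isolation point concrete, here is a small sketch that gives each agent its own context and cookie. The function names, the agent names, and the "session" cookie are all hypothetical and chosen for illustration; the Playwright calls used are `browser.new_context()`, `context.add_cookies()`, and `context.cookies()`.

```python
def open_sessions(browser, cookies_by_agent, url="https://example.com"):
    """Create one isolated context per agent and seed it with that agent's cookie.

    `browser` is a Playwright Browser. `cookies_by_agent` maps a hypothetical
    agent name to the value of an illustrative "session" cookie.
    """
    contexts = {}
    for agent, value in cookies_by_agent.items():
        context = browser.new_context()
        # Each cookie needs either a url or a domain/path pair.
        context.add_cookies([{"name": "session", "value": value, "url": url}])
        contexts[agent] = context
    return contexts


def session_values(contexts):
    """Read back the "session" cookie value stored in each context."""
    values = {}
    for agent, context in contexts.items():
        cookies = {c["name"]: c["value"] for c in context.cookies()}
        values[agent] = cookies.get("session")
    return values
```

With a real browser from `p.chromium.launch()`, calling `open_sessions(browser, {"agent_a": "111", "agent_b": "222"})` and then `session_values(...)` on the result should report a different value per agent, because cookies set in one context are not visible in another. Remember to close each context when the agent is done.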
Step 3: Navigating and Waiting
One of Playwright's most powerful features is its auto-waiting mechanism. Actions such as click() and fill() wait for the target element to become actionable, and page.goto() waits until the page fires its load event by default. You can customize the navigation wait:
# Wait until there have been no network connections for at least 500 ms
# (Playwright's documentation discourages relying on this state)
page.goto("https://example.com", wait_until="networkidle")
# Wait only until the DOM content is loaded
page.goto("https://example.com", wait_until="domcontentloaded")
# Set a custom timeout (in milliseconds)
page.goto("https://example.com", timeout=30000)
For AI agents that need to interact with elements after navigation, you can wait for specific conditions:
# Wait for a specific element to appear
page.wait_for_selector("h1")
# Wait for a specific URL pattern
page.wait_for_url("**/dashboard**")
# Wait for the page to reach a load state
page.wait_for_load_state("networkidle")
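Navigation and waits can still time out on slow or flaky pages, so an agent usually wants a retry around them. The sketch below is one possible shape, not a Playwright feature: `goto_with_retry` and its parameters are hypothetical names, and the exception type to retry on is passed in by the caller.

```python
import time


def goto_with_retry(page, url, ready_selector, retry_on, attempts=3,
                    timeout_ms=10_000, backoff_s=1.0):
    """Navigate to `url` and wait for `ready_selector`, retrying on `retry_on`.

    `retry_on` is the exception type (or tuple of types) that triggers a retry.
    Returns the number of attempts used and re-raises the last error if all fail.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            page.goto(url, wait_until="domcontentloaded", timeout=timeout_ms)
            page.wait_for_selector(ready_selector, timeout=timeout_ms)
            return attempt
        except retry_on:
            if attempt == attempts:
                raise
            # Linear backoff: wait a little longer after each failed attempt.
            time.sleep(backoff_s * attempt)
```

With Playwright installed, you would typically import `TimeoutError` from `playwright.sync_api` and pass it as `retry_on`, for example `goto_with_retry(page, "https://example.com", "h1", retry_on=TimeoutError)`.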
Step 4: Locating Elements with Selectors
Playwright supports multiple selector strategies. For AI agents, role-, label-, and text-based locators tend to be the most resilient because they track what the user sees, while CSS selectors remain useful when no semantic hook exists:
# CSS selector
page.locator("div.content h1").text_content()
# Text selector — finds elements containing the text
page.locator("text=Learn More").click()
# Role-based selector — semantic and accessible
page.get_by_role("button", name="Submit")
page.get_by_role("heading", name="Welcome")
# Label-based — great for form fields
page.get_by_label("Email address").fill("user@example.com")
# Placeholder-based
page.get_by_placeholder("Search...").fill("AI agents")
# Test ID — most reliable for testing
page.get_by_test_id("submit-button").click()
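An agent often decides at runtime which strategy to use, for example from a model's structured output. One way to handle that, sketched here under the assumption that the agent emits a strategy name and a value, is a small dispatch helper. `resolve_locator` and the strategy names are hypothetical; the methods they map to are the Playwright page methods shown above, plus `get_by_text`.

```python
# Hypothetical strategy names mapped to real Playwright Page method names.
LOCATOR_METHODS = {
    "css": "locator",
    "role": "get_by_role",
    "label": "get_by_label",
    "placeholder": "get_by_placeholder",
    "text": "get_by_text",
    "test_id": "get_by_test_id",
}


def resolve_locator(page, strategy, value, **options):
    """Turn a (strategy, value) pair into the matching locator call on `page`."""
    try:
        method_name = LOCATOR_METHODS[strategy]
    except KeyError:
        raise ValueError(f"unknown locator strategy: {strategy!r}") from None
    # Extra keyword options (such as name="Submit" for roles) are passed through.
    return getattr(page, method_name)(value, **options)
```

For instance, `resolve_locator(page, "role", "button", name="Submit").click()` is equivalent to `page.get_by_role("button", name="Submit").click()`.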
Step 5: Taking a Screenshot
Screenshots are essential for AI agents, especially when feeding page visuals to multimodal models like GPT-4 Vision for analysis:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")

    # Full page screenshot
    page.screenshot(path="full_page.png", full_page=True)

    # Viewport-only screenshot
    page.screenshot(path="viewport.png")

    # Screenshot a specific element
    page.locator("h1").screenshot(path="heading.png")

    browser.close()
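When the screenshot is going straight to a multimodal model, you may not need a file at all: `page.screenshot()` returns the image bytes when called without `path`. The sketch below encodes those bytes as a base64 data URL. The helper name is hypothetical, and the data URL format is an assumption about what the receiving API accepts, so check your model provider's documentation.

```python
import base64


def screenshot_data_url(page, full_page=False):
    """Capture a PNG screenshot in memory and return it as a base64 data URL."""
    # Without a `path` argument the screenshot is returned as bytes.
    png_bytes = page.screenshot(type="png", full_page=full_page)
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{encoded}"
```

Keeping the image in memory avoids leftover files when many agent sessions run in parallel.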
Complete First Script
Here is a complete script that ties everything together — navigating, extracting data, and capturing a screenshot:
from playwright.sync_api import sync_playwright


def run_browser_agent():
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(
            viewport={"width": 1920, "height": 1080}
        )
        page = context.new_page()
        page.goto("https://news.ycombinator.com", wait_until="networkidle")

        # Extract the top 5 story titles
        stories = page.locator(".titleline > a").all()[:5]
        for i, story in enumerate(stories, 1):
            title = story.text_content()
            href = story.get_attribute("href")
            print(f"{i}. {title} -> {href}")

        # Take a screenshot for visual analysis
        page.screenshot(path="hackernews.png", full_page=False)
        print("Screenshot saved to hackernews.png")

        context.close()
        browser.close()


run_browser_agent()
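Printed lines are fine for a first run, but an agent usually wants structured data it can pass to the next step. As a sketch, the extraction loop above could be factored into a function that returns a list of dictionaries with absolute URLs. The function name and output shape are assumptions for illustration; the selector is the same one used in the script.

```python
import json
from urllib.parse import urljoin


def extract_stories(page, base_url, limit=5):
    """Return up to `limit` stories as dicts with a title and an absolute URL."""
    stories = []
    for link in page.locator(".titleline > a").all()[:limit]:
        title = (link.text_content() or "").strip()
        href = link.get_attribute("href") or ""
        # Some links may be relative, so resolve them against the page URL.
        stories.append({"title": title, "url": urljoin(base_url, href)})
    return stories


def to_json(stories):
    """Serialize the extracted stories for logging or for a model prompt."""
    return json.dumps(stories, indent=2, ensure_ascii=False)
```

Inside `run_browser_agent()`, you could call `extract_stories(page, page.url)` after navigation and print `to_json(...)` on the result instead of formatting each line by hand.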
FAQ
Why choose Playwright over Selenium for AI agents?
Playwright offers auto-waiting, network interception, and multi-browser-context support out of the box. It does not require a separate WebDriver binary, handles modern SPAs more reliably, and its API is designed for the async patterns that AI agent frameworks use. Selenium is still viable for legacy projects, but Playwright is the better choice for new automation work.
Can Playwright run in Docker or headless servers?
Yes. Playwright provides official Docker images and runs headless by default, so you do not need to set headless=True explicitly. For CI/CD or cloud deployments on Linux, install the browser together with its system dependencies using playwright install --with-deps chromium, which installs the required OS libraries on supported distributions.
Does Playwright work with all websites?
Playwright can automate any website that runs in Chromium, Firefox, or WebKit. Some sites employ bot detection that may block automated browsers. Playwright provides features like custom user agents, viewport configuration, and network interception that help work around basic detection, though advanced anti-bot systems may require additional strategies.
CallSphere Team
Expert insights on AI voice agents and customer communication automation.