Building an AI Testing Agent: Automated QA That Explores and Finds Bugs
Build an AI-powered testing agent that performs exploratory testing, automatically generates test cases, classifies discovered bugs, and produces structured reports for development teams.
Beyond Scripted Test Suites
Traditional automated testing follows scripts: visit this URL, click this button, assert that element appears. This approach catches regressions but never discovers new bugs because it only tests paths that a human already thought to check. AI testing agents flip this model. They explore the application like a curious tester, trying unexpected inputs, clicking buttons in unusual orders, and flagging behavior that looks wrong.
The difference is profound. A scripted test suite with 500 tests will always run the same 500 paths. An AI testing agent generates novel test paths on every run, covering UI states and interaction sequences that no human thought to script.
Architecture of an AI Testing Agent
An AI testing agent consists of four components: an explorer that navigates the application, a test case generator that produces structured test scenarios, a bug classifier that determines whether observed behavior is actually a defect, and a report generator that produces actionable output.
from dataclasses import dataclass, field
from enum import Enum
from datetime import datetime
from typing import Optional


class BugSeverity(Enum):
    CRITICAL = "critical"  # App crashes, data loss
    HIGH = "high"          # Feature broken, no workaround
    MEDIUM = "medium"      # Feature broken, workaround exists
    LOW = "low"            # Cosmetic, minor usability


@dataclass
class TestAction:
    action_type: str  # click, fill, navigate, scroll
    target: str
    value: Optional[str] = None
    screenshot_before: Optional[str] = None
    screenshot_after: Optional[str] = None


@dataclass
class BugReport:
    title: str
    severity: BugSeverity
    description: str
    steps_to_reproduce: list[TestAction]
    expected_behavior: str
    actual_behavior: str
    screenshot_path: Optional[str] = None
    url: str = ""
    discovered_at: datetime = field(
        default_factory=datetime.utcnow
    )


@dataclass
class ExplorationState:
    visited_urls: set = field(default_factory=set)
    clicked_elements: set = field(default_factory=set)
    forms_submitted: int = 0
    bugs_found: list[BugReport] = field(default_factory=list)
    action_history: list[TestAction] = field(default_factory=list)
    error_count: int = 0
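For illustration, a bug discovered during exploration could be recorded with these types like so (all values below are hypothetical):

# Hypothetical example: a form-validation bug captured during exploration
example_bug = BugReport(
    title="Signup form accepts empty email",
    severity=BugSeverity.MEDIUM,
    description="Submitting the signup form with an empty email field "
                "succeeds instead of showing a validation error.",
    steps_to_reproduce=[
        TestAction(action_type="navigate", target="/signup"),
        TestAction(action_type="fill", target="#email", value=""),
        TestAction(action_type="click", target="button[type=submit]"),
    ],
    expected_behavior="Validation error shown for the empty email field",
    actual_behavior="Form submits and a confirmation page loads",
    url="https://staging.example.com/signup",
)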
The Exploration Engine
The explorer navigates the application systematically, prioritizing unvisited pages and untested interaction patterns. It uses an LLM to decide what to do next based on the current page state and exploration history.
from playwright.async_api import async_playwright, Page
from openai import AsyncOpenAI
import json


class ExplorationEngine:
    def __init__(self, client: AsyncOpenAI, base_url: str):
        self.client = client
        self.base_url = base_url
        self.state = ExplorationState()

    async def explore(self, max_steps: int = 100):
        """Main exploration loop."""
        async with async_playwright() as p:
            browser = await p.chromium.launch(headless=True)
            page = await browser.new_page()

            # Catch console errors and unhandled exceptions
            console_errors = []
            page.on("console", lambda msg: (
                console_errors.append(msg.text)
                if msg.type == "error" else None
            ))
            page.on("pageerror", lambda err: (
                console_errors.append(str(err))
            ))

            await page.goto(self.base_url)
            self.state.visited_urls.add(self.base_url)

            for step in range(max_steps):
                try:
                    action = await self._decide_next_action(page)
                    await self._execute_action(page, action)

                    # Check for bugs after each action
                    bugs = await self._check_for_bugs(
                        page, action, console_errors
                    )
                    self.state.bugs_found.extend(bugs)
                    console_errors.clear()
                except Exception:
                    self.state.error_count += 1
                    if self.state.error_count > 10:
                        break

            await browser.close()

        return self.state

    async def _decide_next_action(self, page: Page) -> TestAction:
        """Use LLM to decide the next exploration action."""
        # Get interactive elements
        elements = await page.evaluate("""
            () => {
                const els = document.querySelectorAll(
                    'a, button, input, select, textarea, '
                    + '[onclick], [role="button"]'
                );
                return Array.from(els).slice(0, 50).map(el => ({
                    tag: el.tagName,
                    text: el.textContent?.trim().slice(0, 50),
                    type: el.type || '',
                    href: el.href || '',
                    id: el.id,
                    name: el.name,
                    selector: el.id ? '#' + el.id
                        : el.name ? '[name="' + el.name + '"]'
                        : el.tagName.toLowerCase(),
                }));
            }
        """)

        visited_summary = (
            f"Visited {len(self.state.visited_urls)} pages, "
            f"clicked {len(self.state.clicked_elements)} elements, "
            f"found {len(self.state.bugs_found)} bugs so far."
        )

        response = await self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": (
                    "You are a QA tester exploring a web app to find "
                    "bugs. Choose the next action to maximize test "
                    "coverage. Prioritize untested elements and "
                    "edge cases. Return JSON: action_type, target "
                    "(selector), value (for inputs)."
                )},
                {"role": "user", "content": (
                    f"Current URL: {page.url}\n"
                    f"Page title: {await page.title()}\n"
                    f"Progress: {visited_summary}\n"
                    f"Available elements:\n"
                    f"{json.dumps(elements[:30], indent=2)}"
                )},
            ],
            response_format={"type": "json_object"},
            temperature=0.7,  # Some randomness for exploration
        )

        data = json.loads(response.choices[0].message.content)
        return TestAction(
            action_type=data.get("action_type", "click"),
            target=data.get("target", ""),
            value=data.get("value"),
        )
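The loop above also calls two helpers that are not shown: _execute_action, which performs the chosen action with Playwright, and _check_for_bugs, which can simply delegate to the BugDetector introduced in the next section. A minimal sketch of both methods on ExplorationEngine, assuming only the four action types listed in TestAction and standard Playwright calls (goto, fill, click, mouse.wheel):

    async def _execute_action(self, page: Page, action: TestAction):
        """Carry out one exploration action and record it in the state."""
        if action.action_type == "navigate":
            if action.target.startswith("http"):
                url = action.target
            else:
                url = self.base_url.rstrip("/") + "/" + action.target.lstrip("/")
            await page.goto(url)
        elif action.action_type == "fill":
            await page.fill(action.target, action.value or "")
        elif action.action_type == "scroll":
            await page.mouse.wheel(0, 800)  # scroll roughly one viewport
        else:  # treat anything else as a click
            await page.click(action.target, timeout=5000)
            self.state.clicked_elements.add(action.target)
        self.state.visited_urls.add(page.url)
        self.state.action_history.append(action)

    async def _check_for_bugs(self, page: Page, action: TestAction,
                              console_errors: list[str]) -> list[BugReport]:
        """Delegate bug detection to the BugDetector defined below."""
        detector = BugDetector(self.client)
        return await detector.check_for_bugs(page, action, console_errors)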
Bug Detection and Classification
After each action, the agent checks for bugs by analyzing the page state. It looks for HTTP errors, JavaScript console errors, visual anomalies, broken layouts, and unexpected behavior.
class BugDetector:
    def __init__(self, client: AsyncOpenAI):
        self.client = client

    async def check_for_bugs(self, page: Page,
                             action: TestAction,
                             console_errors: list[str]) -> list[BugReport]:
        """Analyze current page state for potential bugs."""
        bugs = []

        # Check 1: HTTP error pages
        status_check = await self._check_http_status(page)
        if status_check:
            bugs.append(status_check)

        # Check 2: Console errors
        for error in console_errors:
            if self._is_significant_error(error):
                bugs.append(BugReport(
                    title=f"JavaScript error: {error[:80]}",
                    severity=BugSeverity.HIGH,
                    description=f"Console error after action: {error}",
                    steps_to_reproduce=[action],
                    expected_behavior="No JavaScript errors",
                    actual_behavior=f"Console error: {error}",
                    url=page.url,
                ))

        # Check 3: Visual/functional bugs via LLM
        screenshot = await page.screenshot()
        visual_bugs = await self._llm_visual_check(
            page, screenshot, action
        )
        bugs.extend(visual_bugs)
        return bugs

    async def _check_http_status(self, page: Page) -> Optional[BugReport]:
        """Check for 4xx/5xx error pages."""
        content = await page.text_content("body") or ""
        error_patterns = [
            "500 Internal Server Error",
            "404 Not Found",
            "403 Forbidden",
            "502 Bad Gateway",
        ]
        for pattern in error_patterns:
            if pattern.lower() in content.lower():
                return BugReport(
                    title=f"HTTP error page: {pattern}",
                    severity=BugSeverity.HIGH,
                    description=f"Page shows {pattern}",
                    steps_to_reproduce=[],
                    expected_behavior="Page loads successfully",
                    actual_behavior=f"Error page: {pattern}",
                    url=page.url,
                )
        return None

    def _is_significant_error(self, error: str) -> bool:
        """Filter out noise from console errors."""
        noise_patterns = [
            "favicon.ico",
            "third-party",
            "analytics",
            "deprecated",
        ]
        return not any(p in error.lower() for p in noise_patterns)
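check_for_bugs references a _llm_visual_check helper that is not defined above. A minimal sketch, assuming a vision-capable chat model (gpt-4o-mini here) that accepts the screenshot as a base64 data URL; the prompt and JSON schema are illustrative choices, not a fixed API:

    async def _llm_visual_check(self, page: Page, screenshot: bytes,
                                action: TestAction) -> list[BugReport]:
        """Ask a vision model whether the current page looks visually broken."""
        import base64  # alongside the json import used earlier
        image_url = ("data:image/png;base64,"
                     + base64.b64encode(screenshot).decode())
        response = await self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": (
                    "You are a QA reviewer. Inspect the screenshot for broken "
                    "layouts, missing content, overlapping elements, or visible "
                    "error messages. Return JSON: {\"bugs\": [{\"title\": str, "
                    "\"description\": str, \"severity\": "
                    "\"critical|high|medium|low\"}]}. Return an empty list if "
                    "nothing looks wrong."
                )},
                {"role": "user", "content": [
                    {"type": "text", "text": (
                        f"URL: {page.url}\n"
                        f"Last action: {action.action_type} on {action.target}"
                    )},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ]},
            ],
            response_format={"type": "json_object"},
        )
        data = json.loads(response.choices[0].message.content)
        bugs = []
        for item in data.get("bugs", []):
            try:
                severity = BugSeverity(item.get("severity", "low"))
            except ValueError:
                severity = BugSeverity.LOW  # fall back if the model invents a level
            bugs.append(BugReport(
                title=item.get("title", "Visual issue"),
                severity=severity,
                description=item.get("description", ""),
                steps_to_reproduce=[action],
                expected_behavior="Page renders correctly after the action",
                actual_behavior=item.get("description", ""),
                url=page.url,
            ))
        return bugs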
Report Generation
The report generator compiles all discovered bugs into a structured, actionable report.
class TestReportGenerator:
    def generate_report(self, state: ExplorationState) -> str:
        """Generate a structured test report."""
        lines = [
            "# AI Exploratory Test Report",
            f"Generated: {datetime.utcnow().isoformat()}",
            "",
            "## Summary",
            f"- Pages visited: {len(state.visited_urls)}",
            f"- Elements tested: {len(state.clicked_elements)}",
            f"- Forms submitted: {state.forms_submitted}",
            f"- Bugs found: {len(state.bugs_found)}",
            "",
        ]

        # Group bugs by severity
        for severity in BugSeverity:
            severity_bugs = [
                b for b in state.bugs_found
                if b.severity == severity
            ]
            if not severity_bugs:
                continue
            lines.append(f"## {severity.value.upper()} ({len(severity_bugs)})")
            for i, bug in enumerate(severity_bugs, 1):
                lines.extend([
                    f"### {i}. {bug.title}",
                    f"**URL:** {bug.url}",
                    f"**Description:** {bug.description}",
                    f"**Expected:** {bug.expected_behavior}",
                    f"**Actual:** {bug.actual_behavior}",
                    "",
                ])
        return "\n".join(lines)
Running the Full Testing Pipeline
from pathlib import Path


async def run_ai_testing(target_url: str,
                         max_steps: int = 200) -> str:
    """Run a complete AI testing session."""
    client = AsyncOpenAI()
    engine = ExplorationEngine(client, target_url)

    state = await engine.explore(max_steps=max_steps)

    reporter = TestReportGenerator()
    report = reporter.generate_report(state)

    Path("test_report.md").write_text(report)
    print(f"Testing complete. Found {len(state.bugs_found)} bugs.")
    return report
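To kick off a session from a script (the staging URL is a placeholder):

import asyncio

if __name__ == "__main__":
    # Point the agent at a staging environment, never production
    asyncio.run(run_ai_testing("https://staging.example.com", max_steps=200))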
FAQ
How does AI exploratory testing compare to traditional test suites in terms of bug detection rate?
AI exploratory testing excels at finding bugs in areas that scripted tests never cover — unusual navigation sequences, unexpected input combinations, and edge cases in form validation. In practice, AI exploratory testing finds 15-30% more unique bugs than scripted suites alone, but it is not a replacement. The best approach combines both: scripted tests for regression coverage and AI exploration for novel bug discovery.
How do I prevent the AI tester from performing destructive actions like deleting data?
Implement an action filter that blocks dangerous operations before execution. Maintain a blocklist of selectors and action patterns (delete buttons, admin operations, payment submissions) and require explicit opt-in for destructive tests. Run the agent against a staging environment with seed data that can be reset after each test session.
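A minimal sketch of such a filter, called before each action is executed; the blocked patterns below are illustrative and should be tuned per application:

# Illustrative safety filter; extend the patterns for your own application
BLOCKED_PATTERNS = [
    "delete", "remove", "destroy",   # destructive operations
    "checkout", "pay", "purchase",   # payment flows
    "/admin",                        # admin-only areas
]


def is_action_allowed(action: TestAction, allow_destructive: bool = False) -> bool:
    """Return False for actions that look destructive or otherwise risky."""
    if allow_destructive:
        return True
    haystack = f"{action.target} {action.value or ''}".lower()
    return not any(pattern in haystack for pattern in BLOCKED_PATTERNS)


# In the exploration loop, skip anything the filter rejects:
# if not is_action_allowed(action):
#     continue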
Can AI testing agents generate regression test scripts from their explorations?
Yes. When the agent discovers a bug, it has a complete record of the actions that led to it. These can be converted to Playwright or Selenium test scripts that reproduce the bug deterministically. This converts exploratory findings into permanent regression tests.
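A minimal sketch of that conversion, rendering a bug's recorded steps as a standalone synchronous Playwright script (it covers only the action types used in this article):

def actions_to_playwright_script(bug: BugReport, base_url: str) -> str:
    """Render a bug's steps_to_reproduce as a standalone Playwright test."""
    lines = [
        "from playwright.sync_api import sync_playwright",
        "",
        "def test_reproduce_bug():",
        f"    # {bug.title}",
        "    with sync_playwright() as p:",
        "        browser = p.chromium.launch()",
        "        page = browser.new_page()",
        f"        page.goto({base_url!r})",
    ]
    for step in bug.steps_to_reproduce:
        if step.action_type == "navigate":
            lines.append(f"        page.goto({step.target!r})")
        elif step.action_type == "fill":
            lines.append(f"        page.fill({step.target!r}, {(step.value or '')!r})")
        else:  # click and anything else recorded as a click
            lines.append(f"        page.click({step.target!r})")
    lines += [
        f"        # Expected: {bug.expected_behavior}",
        f"        # Actual (bug): {bug.actual_behavior}",
        "        browser.close()",
    ]
    return "\n".join(lines)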