Test Generation with AI Agents: Automatic Unit and Integration Test Creation
Learn to build an AI agent that analyzes source code, identifies untested paths, and generates high-quality unit and integration tests with proper assertions, fixtures, and edge case coverage.
Why AI-Generated Tests Are Different from Copilot Suggestions
Inline code suggestions can produce individual test functions, but they lack awareness of your overall test strategy. A test generation agent analyzes your entire codebase, identifies what is already covered, finds untested code paths, and produces a coherent test suite that fills the gaps. It reasons about which tests provide the most value rather than blindly generating tests for every function.
Analyzing Code Coverage Gaps
The agent starts by understanding what exists and what is missing. It combines static analysis with coverage data to identify high-value testing targets.
import ast
import json  # used by _run_coverage (not shown)
import os
import subprocess
from dataclasses import dataclass

from openai import OpenAI

client = OpenAI()


@dataclass
class TestTarget:
    file_path: str
    function_name: str
    source_code: str
    complexity: int
    has_existing_tests: bool


class TestGenerationAgent:
    def __init__(self, source_dir: str, test_dir: str, model: str = "gpt-4o"):
        self.source_dir = source_dir
        self.test_dir = test_dir
        self.model = model

    def find_untested_functions(self) -> list[TestTarget]:
        # Maps file path -> set of uncovered line numbers
        coverage = self._run_coverage()
        targets = []
        for root, _, files in os.walk(self.source_dir):
            for fname in files:
                if not fname.endswith(".py") or fname.startswith("test_"):
                    continue
                path = os.path.join(root, fname)
                with open(path) as f:
                    source = f.read()
                tree = ast.parse(source)
                for node in ast.walk(tree):
                    if not isinstance(node, ast.FunctionDef):
                        continue
                    func_lines = set(range(node.lineno, node.end_lineno + 1))
                    uncovered = coverage.get(path, set())
                    has_gap = bool(func_lines & uncovered)
                    targets.append(TestTarget(
                        file_path=path,
                        function_name=node.name,
                        source_code=ast.get_source_segment(source, node),
                        complexity=self._calc_complexity(node),
                        has_existing_tests=not has_gap,
                    ))
        targets.sort(key=lambda t: t.complexity, reverse=True)
        return [t for t in targets if not t.has_existing_tests]
The agent prioritizes high-complexity functions without test coverage, ensuring the generated tests deliver maximum value.
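The two helpers referenced above, `_run_coverage` and `_calc_complexity`, are left undefined. Here is one way they might look as standalone functions, assuming coverage.py is installed and using its JSON report format (`missing_lines` per file); the complexity measure is a rough cyclomatic count:

```python
import ast
import json
import subprocess


def calc_complexity(node: ast.AST) -> int:
    """Rough cyclomatic complexity: 1 plus the number of branching constructs."""
    branch_types = (ast.If, ast.For, ast.While, ast.Try,
                    ast.BoolOp, ast.ExceptHandler)
    return 1 + sum(isinstance(n, branch_types) for n in ast.walk(node))


def run_coverage(report_path: str = "coverage.json") -> dict[str, set[int]]:
    """Run pytest under coverage.py and map each file to its uncovered lines."""
    subprocess.run(
        ["python", "-m", "coverage", "run", "-m", "pytest"],
        capture_output=True, text=True,
    )
    subprocess.run(
        ["python", "-m", "coverage", "json", "-o", report_path],
        capture_output=True, text=True,
    )
    with open(report_path) as f:
        report = json.load(f)
    # coverage.py's JSON report lists uncovered lines under "missing_lines"
    return {
        path: set(data["missing_lines"])
        for path, data in report.get("files", {}).items()
    }
```

Inside the class these would become `_calc_complexity` and `_run_coverage` methods; any complexity metric that ranks branchy functions higher works equally well here.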
Generating Tests with Proper Fixtures
Good tests need proper setup and teardown. The agent generates pytest fixtures alongside the test functions.
def generate_tests(self, target: TestTarget) -> str:
    system_prompt = """You are an expert Python test engineer.
Generate pytest tests for the provided function.

REQUIREMENTS:
- Use pytest fixtures for setup and teardown
- Test the happy path first, then edge cases
- Include at least one test for error handling
- Use descriptive test names: test_<function>_<scenario>
- Add brief docstrings explaining what each test verifies
- Use pytest.raises for expected exceptions
- Use parametrize for testing multiple inputs
- Mock external dependencies (database, APIs, filesystem)
- Do NOT import the function being tested. I will add the import.

Output ONLY Python test code. No markdown fences."""
    context = self._gather_context(target)
    response = client.chat.completions.create(
        model=self.model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": (
                f"Function to test:\n{target.source_code}\n\n"
                f"Context (related types and imports):\n{context}"
            )},
        ],
        temperature=0.2,
    )
    test_code = response.choices[0].message.content.strip()
    module_path = self._to_import_path(target.file_path)
    import_line = f"from {module_path} import {target.function_name}"
    return f"{import_line}\n\n{test_code}"
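`_gather_context` and `_to_import_path` are likewise not shown. These sketches implement them as module-level functions (the class methods would read the source from `target.file_path` first); both are assumptions about what the article intends:

```python
import ast
import os


def to_import_path(file_path: str, source_dir: str = ".") -> str:
    """Convert a path like ./app/services/billing.py into app.services.billing."""
    rel = os.path.relpath(file_path, source_dir)
    return os.path.splitext(rel)[0].replace(os.sep, ".")


def gather_context(source: str, max_lines: int = 40) -> str:
    """Collect module-level imports and class definitions as prompt context."""
    tree = ast.parse(source)
    lines: list[str] = []
    for node in tree.body:
        # Imports and type definitions are what generated tests usually need
        if isinstance(node, (ast.Import, ast.ImportFrom, ast.ClassDef)):
            segment = ast.get_source_segment(source, node)
            if segment:
                lines.extend(segment.splitlines())
    return "\n".join(lines[:max_lines])
```

Capping the context at a few dozen lines keeps the prompt focused; for large modules you might instead pass only the types the target function actually references.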
Validating Test Quality
Generated tests must actually run and assert meaningful things. The agent validates each test suite by executing it and analyzing the results.
import tempfile  # add alongside the module-level imports


def validate_tests(self, test_code: str, target: TestTarget) -> dict:
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".py", dir=self.test_dir,
        prefix="test_gen_", delete=False,
    ) as f:
        f.write(test_code)
        test_path = f.name
    try:
        result = subprocess.run(
            ["python", "-m", "pytest", test_path, "-v", "--tb=short"],
            capture_output=True, text=True, timeout=60,
        )
        passed = result.returncode == 0
        test_count = result.stdout.count(" PASSED")
        fail_count = result.stdout.count(" FAILED")
        quality = self._assess_quality(test_code)
        return {
            "passed": passed,
            "tests_passed": test_count,
            "tests_failed": fail_count,
            "output": result.stdout[-2000:],
            "quality_score": quality,
        }
    finally:
        os.unlink(test_path)


def _assess_quality(self, test_code: str) -> dict:
    has_parametrize = "@pytest.mark.parametrize" in test_code
    has_fixtures = "@pytest.fixture" in test_code
    has_edge_cases = any(
        kw in test_code.lower()
        for kw in ["none", "empty", "zero", "negative", "boundary"]
    )
    assertion_count = test_code.count("assert ")
    mock_count = test_code.count("mock") + test_code.count("patch")
    return {
        "uses_parametrize": has_parametrize,
        "uses_fixtures": has_fixtures,
        "tests_edge_cases": has_edge_cases,
        "assertion_count": assertion_count,
        "mock_count": mock_count,
    }
Running the Full Pipeline
agent = TestGenerationAgent("./app", "./tests")
untested = agent.find_untested_functions()
print(f"Found {len(untested)} untested functions")

for target in untested[:5]:
    print(f"\nGenerating tests for {target.function_name}...")
    test_code = agent.generate_tests(target)
    validation = agent.validate_tests(test_code, target)
    if validation["passed"]:
        output_path = f"./tests/test_{target.function_name}.py"
        with open(output_path, "w") as f:
            f.write(test_code)
        print(f"  Saved {validation['tests_passed']} passing tests")
    else:
        print(f"  {validation['tests_failed']} tests failed, skipping")
FAQ
How do I prevent the agent from generating trivial tests that just check if a function exists?
The quality assessment step catches this. Check for meaningful assertions: a test that only calls a function without asserting on the result contributes nothing to the assertion count. Set a minimum assertion count per test as a threshold before accepting a generated suite.
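A minimal gate along those lines, built on the quality report that `_assess_quality` returns (the 1.5 threshold is illustrative, not a recommendation):

```python
def accept_tests(quality: dict, tests_passed: int,
                 min_assertions_per_test: float = 1.5) -> bool:
    """Reject suites whose tests run functions without asserting on the results."""
    if tests_passed == 0:
        return False
    # Average assertions per passing test must clear the threshold
    ratio = quality["assertion_count"] / tests_passed
    return ratio >= min_assertions_per_test
```

In the pipeline above, this check would sit next to `validation["passed"]` before the generated file is written to disk.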
Should AI-generated tests replace hand-written tests?
No. AI-generated tests are best for establishing a baseline of coverage quickly. They catch regressions and document existing behavior. But hand-written tests are still essential for verifying business logic, testing complex interactions, and encoding domain-specific edge cases that the LLM might not infer from code alone.
How do I handle tests that need a real database or external service?
Prompt the agent to use mocks and fixtures for external dependencies by default. For integration tests, provide the agent with your existing conftest.py so it can reuse database fixtures, test clients, and service stubs that are already configured for your project.
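One way to surface those existing fixtures is to append the project's conftest.py to the prompt context. This sketch assumes the conftest lives at the top of `test_dir` and uses a simple truncation limit:

```python
import os


def conftest_context(test_dir: str, limit: int = 2000) -> str:
    """Read the project's conftest.py (if any) so generated tests reuse its fixtures."""
    path = os.path.join(test_dir, "conftest.py")
    if not os.path.exists(path):
        return ""
    with open(path) as f:
        # Truncate so a large conftest does not crowd out the function under test
        return f.read()[:limit]
```

The returned text would be concatenated onto the context string that `generate_tests` already sends, alongside the related types and imports.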
CallSphere Team