Building a Bug Fixing Agent: Automated Diagnosis and Repair of Code Issues
Build an AI agent that takes error messages, traces through code to find root causes, generates fixes, and verifies them with regression tests. A practical guide to automated debugging.
From Error Message to Working Fix
When a bug is reported, a developer follows a predictable workflow: read the error, find the relevant code, understand what went wrong, write a fix, and verify it does not break anything else. A bug fixing agent automates this entire loop. It takes an error message or test failure as input, traces the root cause through your codebase, generates a patch, and runs your test suite to confirm the fix.
The Bug Fixing Pipeline
The agent operates in four phases: error analysis, code localization, fix generation, and regression testing.
import os
import subprocess
from dataclasses import dataclass

from openai import OpenAI

client = OpenAI()


@dataclass
class BugReport:
    error_message: str
    stack_trace: str | None = None
    failing_test: str | None = None
    reproduction_steps: str | None = None


@dataclass
class BugFix:
    file_path: str
    original_code: str
    fixed_code: str
    explanation: str
    root_cause: str
    confidence: float


class BugFixingAgent:
    def __init__(self, project_dir: str, model: str = "gpt-4o"):
        self.project_dir = project_dir
        self.model = model

    def diagnose_and_fix(self, report: BugReport) -> BugFix | None:
        root_cause = self._analyze_error(report)
        relevant_files = self._locate_code(root_cause, report)
        fix = self._generate_fix(root_cause, relevant_files, report)
        if fix and self._verify_fix(fix, report):
            return fix
        return None
Error Analysis: Understanding What Went Wrong
The first step is parsing the error to understand the category of bug and where it likely originates.
    def _analyze_error(self, report: BugReport) -> dict:
        context = f"Error: {report.error_message}"
        if report.stack_trace:
            context += f"\n\nStack trace:\n{report.stack_trace}"
        if report.reproduction_steps:
            context += f"\n\nReproduction steps:\n{report.reproduction_steps}"
        response = client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": """Analyze this error report and
identify the root cause. Return JSON with:
- "error_type": category (e.g., TypeError, logic_error, race_condition)
- "root_cause": one-sentence explanation of why this happens
- "likely_files": list of file patterns to search
- "likely_functions": list of function names involved
- "search_terms": list of strings to grep for in the codebase"""},
                {"role": "user", "content": context},
            ],
            temperature=0,
            response_format={"type": "json_object"},
        )
        import json
        return json.loads(response.choices[0].message.content)
The structured output tells the next phase exactly where to look in the codebase.
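Even with JSON mode, the model is not guaranteed to return every key, so it is worth normalizing the analysis before the localization phase consumes it. A small sketch (validate_analysis is a hypothetical helper, not shown in the agent above):

```python
def validate_analysis(analysis: dict) -> dict:
    """Fill in safe defaults so downstream phases never hit a KeyError
    on a malformed or incomplete model response."""
    defaults = {
        "error_type": "unknown",
        "root_cause": "",
        "likely_files": [],
        "likely_functions": [],
        "search_terms": [],
    }
    # Keys the model did return win; missing keys fall back to defaults
    return {**defaults, **analysis}
```

Calling this on the parsed JSON means `analysis.get("search_terms", [])` style guards become unnecessary further down the pipeline.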
Code Localization: Finding the Bug
With search terms from the analysis, the agent locates the relevant source files.
    def _locate_code(self, analysis: dict, report: BugReport) -> dict:
        relevant_code = {}
        # First, pull exact file paths out of the stack trace
        if report.stack_trace:
            for line in report.stack_trace.split("\n"):
                if "File " in line and self.project_dir in line:
                    parts = line.strip().split('"')
                    if len(parts) >= 2:
                        file_path = parts[1]
                        if os.path.exists(file_path):
                            with open(file_path) as f:
                                relevant_code[file_path] = f.read()
        # Then grep the project for each search term from the analysis
        for term in analysis.get("search_terms", []):
            result = subprocess.run(
                ["grep", "-rl", term, self.project_dir,
                 "--include=*.py", "--exclude-dir=__pycache__"],
                capture_output=True, text=True,
            )
            for file_path in result.stdout.strip().split("\n"):
                if file_path and file_path not in relevant_code:
                    with open(file_path) as f:
                        relevant_code[file_path] = f.read()
        return relevant_code
The agent combines stack trace file references with grep-based search to build a complete picture of the relevant code.
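The `grep -rl` call assumes a Unix-like environment. On systems without grep, the same search can be done in pure Python; a portable sketch (search_files is an illustrative stand-in for the subprocess call, not part of the agent above):

```python
from pathlib import Path


def search_files(root: str, term: str, suffix: str = ".py") -> list[str]:
    """Pure-Python substitute for `grep -rl`: return paths of files
    under `root` whose text contains `term`."""
    matches = []
    for path in Path(root).rglob(f"*{suffix}"):
        if "__pycache__" in path.parts:
            continue  # mirror grep's --exclude-dir
        try:
            if term in path.read_text(encoding="utf-8", errors="ignore"):
                matches.append(str(path))
        except OSError:
            continue  # skip unreadable files rather than crashing
    return matches
```

The trade-off is speed on large trees; grep (or ripgrep) is much faster, so the pure-Python version is best treated as a fallback.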
Fix Generation
With the root cause understood and the relevant code loaded, the agent generates a targeted fix.
    def _generate_fix(
        self, analysis: dict, code_files: dict, report: BugReport
    ) -> BugFix | None:
        files_context = ""
        for path, content in code_files.items():
            files_context += f"\n--- {path} ---\n{content}\n"
        response = client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": """You are a senior developer
fixing a bug. Based on the error analysis and source code, generate
a minimal fix. Return JSON with:
- "file_path": which file to modify
- "original_code": exact code to replace (copy precisely)
- "fixed_code": the corrected code
- "explanation": what the fix does
- "root_cause": why the original code was wrong
- "confidence": 0.0 to 1.0 how confident you are
IMPORTANT: original_code must be an EXACT match of existing code.
Make the smallest change possible. Do not refactor unrelated code."""},
                {"role": "user", "content": (
                    f"Error analysis: {analysis}\n\n"
                    f"Error: {report.error_message}\n\n"
                    f"Source files:\n{files_context}"
                )},
            ],
            temperature=0,
            response_format={"type": "json_object"},
        )
        import json
        data = json.loads(response.choices[0].message.content)
        try:
            return BugFix(**data)
        except TypeError:
            # The model returned unexpected or missing keys; treat as no fix
            return None
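Before a generated BugFix is applied, it helps to show a reviewer (or a log) exactly what would change. A sketch using the standard-library difflib (preview_patch is an illustrative helper, not part of the agent above):

```python
import difflib


def preview_patch(original_code: str, fixed_code: str, file_path: str) -> str:
    """Render the proposed change as a unified diff for human review."""
    diff = difflib.unified_diff(
        original_code.splitlines(keepends=True),
        fixed_code.splitlines(keepends=True),
        fromfile=f"a/{file_path}",
        tofile=f"b/{file_path}",
    )
    return "".join(diff)
```

Printing this diff before verification makes it easy to spot a fix that touched more than it should have.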
Regression Testing
The fix is applied temporarily and the test suite runs to ensure nothing else breaks.
    def _verify_fix(self, fix: BugFix, report: BugReport) -> bool:
        with open(fix.file_path) as f:
            original_content = f.read()
        if fix.original_code not in original_content:
            return False
        patched = original_content.replace(
            fix.original_code, fix.fixed_code, 1
        )
        try:
            with open(fix.file_path, "w") as f:
                f.write(patched)
            cmd = ["python", "-m", "pytest", "--tb=short", "-q"]
            if report.failing_test:
                cmd.append(report.failing_test)
            result = subprocess.run(
                cmd, capture_output=True, text=True,
                timeout=120, cwd=self.project_dir,
            )
            return result.returncode == 0
        except subprocess.TimeoutExpired:
            # A hung test run counts as a failed verification
            return False
        finally:
            # Always restore the original file; the fix is applied for
            # real only after verification passes
            with open(fix.file_path, "w") as f:
                f.write(original_content)
The original file is always restored in the finally block, so even if the fix fails, your codebase is unchanged. The fix is only committed if all tests pass.
FAQ
How do I prevent the agent from making changes that break other parts of the code?
The regression testing step catches this. Run the full test suite, not just the failing test. If any test that was passing before now fails, reject the fix. For extra safety, use git to create a temporary branch, apply the fix, and run tests in isolation.
What if the bug is in a dependency rather than my own code?
The error analysis step should detect this. If all search terms point to code inside site-packages or a third-party library, the agent reports that the root cause is external and suggests a workaround or version pin instead of trying to modify library code.
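A simple path heuristic covers the common case, assuming dependencies are installed under a site-packages or dist-packages directory (is_third_party is an illustrative helper, not part of the agent above):

```python
def is_third_party(file_path: str) -> bool:
    """Heuristic: treat files under an installed-packages directory as
    external dependencies the agent should not patch."""
    # Installed dependencies typically live under one of these directories
    return "site-packages" in file_path or "dist-packages" in file_path
```

Running this check on every file the localization phase collects lets the agent flag an external root cause before wasting a fix-generation call on library code.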
Can this agent handle intermittent or flaky bugs?
Intermittent bugs like race conditions are harder because they may not reproduce on a single test run. For these, extend the agent to run the failing test multiple times and to analyze thread or async patterns in the code. The analysis prompt can specifically look for shared mutable state, missing locks, or unguarded async operations.
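The rerun idea can be sketched independently of pytest by treating the test as a callable that reports pass or fail; in practice run_test would shell out to pytest with the failing test's id (is_flaky is a hypothetical helper, not part of the agent above):

```python
from collections.abc import Callable


def is_flaky(run_test: Callable[[], bool], runs: int = 5) -> bool:
    """Execute a test runner repeatedly. A deterministic test yields the
    same outcome every run; a mix of passes and failures across runs
    points to an intermittent bug."""
    outcomes = {run_test() for _ in range(runs)}
    return len(outcomes) > 1
```

Note that a consistently failing test also returns False here; flakiness specifically means the outcome varies across identical runs.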
CallSphere Team