Building a Bug Fixing Agent: Automated Diagnosis and Repair of Code Issues
Build an AI agent that takes error messages, traces through code to find root causes, generates fixes, and verifies them with regression tests. A practical guide to automated debugging.
From Error Message to Working Fix
When a bug is reported, a developer follows a predictable workflow: read the error, find the relevant code, understand what went wrong, write a fix, and verify it does not break anything else. A bug fixing agent automates this entire loop. It takes an error message or test failure as input, traces the root cause through your codebase, generates a patch, and runs your test suite to confirm the fix.
The Bug Fixing Pipeline
The agent operates in four phases: error analysis, code localization, fix generation, and regression testing.
import os
import subprocess
from dataclasses import dataclass

from openai import OpenAI

client = OpenAI()


@dataclass
class BugReport:
    error_message: str
    stack_trace: str | None = None
    failing_test: str | None = None
    reproduction_steps: str | None = None


@dataclass
class BugFix:
    file_path: str
    original_code: str
    fixed_code: str
    explanation: str
    root_cause: str
    confidence: float


class BugFixingAgent:
    def __init__(self, project_dir: str, model: str = "gpt-4o"):
        self.project_dir = project_dir
        self.model = model

    def diagnose_and_fix(self, report: BugReport) -> BugFix | None:
        root_cause = self._analyze_error(report)
        relevant_files = self._locate_code(root_cause, report)
        fix = self._generate_fix(root_cause, relevant_files, report)
        if fix and self._verify_fix(fix, report):
            return fix
        return None
Error Analysis: Understanding What Went Wrong
The first step is parsing the error to understand the category of bug and where it likely originates.
    def _analyze_error(self, report: BugReport) -> dict:
        context = f"Error: {report.error_message}"
        if report.stack_trace:
            context += f"\n\nStack trace:\n{report.stack_trace}"
        if report.reproduction_steps:
            context += f"\n\nReproduction steps:\n{report.reproduction_steps}"
        response = client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": """Analyze this error report and
identify the root cause. Return JSON with:
- "error_type": category (e.g., TypeError, logic_error, race_condition)
- "root_cause": one-sentence explanation of why this happens
- "likely_files": list of file patterns to search
- "likely_functions": list of function names involved
- "search_terms": list of strings to grep for in the codebase"""},
                {"role": "user", "content": context},
            ],
            temperature=0,
            response_format={"type": "json_object"},
        )
        import json
        return json.loads(response.choices[0].message.content)
The structured output tells the next phase exactly where to look in the codebase.
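Even with JSON mode, the model is not guaranteed to return every key, so it is worth normalizing the analysis before the localization phase consumes it. A small sketch (validate_analysis is a hypothetical helper, not shown in the agent above):

```python
def validate_analysis(analysis: dict) -> dict:
    """Fill in safe defaults so downstream phases never hit a KeyError
    on a malformed or incomplete model response."""
    defaults = {
        "error_type": "unknown",
        "root_cause": "",
        "likely_files": [],
        "likely_functions": [],
        "search_terms": [],
    }
    # Keys the model did return win; missing keys fall back to defaults
    return {**defaults, **analysis}
```

Calling this on the parsed JSON means `analysis.get("search_terms", [])` style guards become unnecessary further down the pipeline.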
Code Localization: Finding the Bug
With search terms from the analysis, the agent locates the relevant source files.
    def _locate_code(self, analysis: dict, report: BugReport) -> dict:
        relevant_code = {}
        # First, pull exact file paths out of the stack trace
        if report.stack_trace:
            for line in report.stack_trace.split("\n"):
                if "File " in line and self.project_dir in line:
                    parts = line.strip().split('"')
                    if len(parts) >= 2:
                        file_path = parts[1]
                        if os.path.exists(file_path):
                            with open(file_path) as f:
                                relevant_code[file_path] = f.read()
        # Then grep the project for each search term from the analysis
        for term in analysis.get("search_terms", []):
            result = subprocess.run(
                ["grep", "-rl", term, self.project_dir,
                 "--include=*.py", "--exclude-dir=__pycache__"],
                capture_output=True, text=True,
            )
            for file_path in result.stdout.strip().split("\n"):
                if file_path and file_path not in relevant_code:
                    with open(file_path) as f:
                        relevant_code[file_path] = f.read()
        return relevant_code
The agent combines stack trace file references with grep-based search to build a complete picture of the relevant code.
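The `grep -rl` call assumes a Unix-like environment. On systems without grep, the same search can be done in pure Python; a portable sketch (search_files is an illustrative stand-in for the subprocess call, not part of the agent above):

```python
from pathlib import Path


def search_files(root: str, term: str, suffix: str = ".py") -> list[str]:
    """Pure-Python substitute for `grep -rl`: return paths of files
    under `root` whose text contains `term`."""
    matches = []
    for path in Path(root).rglob(f"*{suffix}"):
        if "__pycache__" in path.parts:
            continue  # mirror grep's --exclude-dir
        try:
            if term in path.read_text(encoding="utf-8", errors="ignore"):
                matches.append(str(path))
        except OSError:
            continue  # skip unreadable files rather than crashing
    return matches
```

The trade-off is speed on large trees; grep (or ripgrep) is much faster, so the pure-Python version is best treated as a fallback.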
Fix Generation
With the root cause understood and the relevant code loaded, the agent generates a targeted fix.
    def _generate_fix(
        self, analysis: dict, code_files: dict, report: BugReport
    ) -> BugFix | None:
        files_context = ""
        for path, content in code_files.items():
            files_context += f"\n--- {path} ---\n{content}\n"
        response = client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": """You are a senior developer
fixing a bug. Based on the error analysis and source code, generate
a minimal fix. Return JSON with:
- "file_path": which file to modify
- "original_code": exact code to replace (copy precisely)
- "fixed_code": the corrected code
- "explanation": what the fix does
- "root_cause": why the original code was wrong
- "confidence": 0.0 to 1.0 how confident you are
IMPORTANT: original_code must be an EXACT match of existing code.
Make the smallest change possible. Do not refactor unrelated code."""},
                {"role": "user", "content": (
                    f"Error analysis: {analysis}\n\n"
                    f"Error: {report.error_message}\n\n"
                    f"Source files:\n{files_context}"
                )},
            ],
            temperature=0,
            response_format={"type": "json_object"},
        )
        import json
        data = json.loads(response.choices[0].message.content)
        try:
            return BugFix(**data)
        except TypeError:
            # The model returned unexpected or missing keys; treat as no fix
            return None
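Before a generated BugFix is applied, it helps to show a reviewer (or a log) exactly what would change. A sketch using the standard-library difflib (preview_patch is an illustrative helper, not part of the agent above):

```python
import difflib


def preview_patch(original_code: str, fixed_code: str, file_path: str) -> str:
    """Render the proposed change as a unified diff for human review."""
    diff = difflib.unified_diff(
        original_code.splitlines(keepends=True),
        fixed_code.splitlines(keepends=True),
        fromfile=f"a/{file_path}",
        tofile=f"b/{file_path}",
    )
    return "".join(diff)
```

Printing this diff before verification makes it easy to spot a fix that touched more than it should have.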
Regression Testing
The fix is applied temporarily and the test suite runs to ensure nothing else breaks.
    def _verify_fix(self, fix: BugFix, report: BugReport) -> bool:
        with open(fix.file_path) as f:
            original_content = f.read()
        if fix.original_code not in original_content:
            return False
        patched = original_content.replace(
            fix.original_code, fix.fixed_code, 1
        )
        try:
            with open(fix.file_path, "w") as f:
                f.write(patched)
            cmd = ["python", "-m", "pytest", "--tb=short", "-q"]
            if report.failing_test:
                cmd.append(report.failing_test)
            result = subprocess.run(
                cmd, capture_output=True, text=True,
                timeout=120, cwd=self.project_dir,
            )
            return result.returncode == 0
        except subprocess.TimeoutExpired:
            # A hung test run counts as a failed verification
            return False
        finally:
            # Always restore the original file; the fix is applied for
            # real only after verification passes
            with open(fix.file_path, "w") as f:
                f.write(original_content)
The original file is always restored in the finally block, so even if the fix fails, your codebase is unchanged. The fix is only committed if all tests pass.
FAQ
How do I prevent the agent from making changes that break other parts of the code?
The regression testing step catches this. Run the full test suite, not just the failing test. If any test that was passing before now fails, reject the fix. For extra safety, use git to create a temporary branch, apply the fix, and run tests in isolation.
What if the bug is in a dependency rather than my own code?
The error analysis step should detect this. If all search terms point to code inside site-packages or a third-party library, the agent reports that the root cause is external and suggests a workaround or version pin instead of trying to modify library code.
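A simple path heuristic covers the common case, assuming dependencies are installed under a site-packages or dist-packages directory (is_third_party is an illustrative helper, not part of the agent above):

```python
def is_third_party(file_path: str) -> bool:
    """Heuristic: treat files under an installed-packages directory as
    external dependencies the agent should not patch."""
    # Installed dependencies typically live under one of these directories
    return "site-packages" in file_path or "dist-packages" in file_path
```

Running this check on every file the localization phase collects lets the agent flag an external root cause before wasting a fix-generation call on library code.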
Can this agent handle intermittent or flaky bugs?
Intermittent bugs like race conditions are harder because they may not reproduce on a single test run. For these, extend the agent to run the failing test multiple times and to analyze thread or async patterns in the code. The analysis prompt can specifically look for shared mutable state, missing locks, or unguarded async operations.
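The rerun idea can be sketched independently of pytest by treating the test as a callable that reports pass or fail; in practice run_test would shell out to pytest with the failing test's id (is_flaky is a hypothetical helper, not part of the agent above):

```python
from collections.abc import Callable


def is_flaky(run_test: Callable[[], bool], runs: int = 5) -> bool:
    """Execute a test runner repeatedly. A deterministic test yields the
    same outcome every run; a mix of passes and failures across runs
    points to an intermittent bug."""
    outcomes = {run_test() for _ in range(runs)}
    return len(outcomes) > 1
```

Note that a consistently failing test also returns False here; flakiness specifically means the outcome varies across identical runs.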
CallSphere Team