Self-Reflection in AI Agents: Building Systems That Learn from Mistakes
Explore how self-reflection transforms AI agents from one-shot executors into iterative improvers — covering critique loops, retry-with-feedback, score-and-improve patterns, and practical Python implementations.
The Problem with One-Shot Execution
Most AI agents generate a response and move on. If the output is wrong, incomplete, or poorly formatted, the user has to notice the problem and ask for a correction. This is fragile. Humans miss errors, and the feedback loop is slow.
Self-reflection changes this by adding an internal quality check. Before returning a result to the user, the agent evaluates its own output, identifies weaknesses, and improves it — all within the same execution loop. The result is higher quality output with fewer round trips.
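Before looking at concrete model calls, the loop can be stated abstractly: generate once, then alternate critique and revision until the critic approves or the budget runs out. Here is a minimal, model-agnostic sketch; the `generate`, `critique`, and `revise` callables are placeholders you would back with LLM calls:

```python
def reflect(task, generate, critique, revise, max_cycles=3):
    """Generic reflection loop: produce a draft, then critique and
    revise until the critic approves or the budget is exhausted."""
    draft = generate(task)
    for _ in range(max_cycles):
        feedback = critique(task, draft)
        if feedback is None:  # critic approves, nothing left to fix
            return draft
        draft = revise(task, draft, feedback)
    return draft  # best attempt within budget
```

Keeping the loop separate from the model calls makes it trivial to unit-test with stub functions and to swap in different critic models later.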
The Basic Critique Loop
The simplest self-reflection pattern uses two LLM calls: one to generate, one to critique.
```python
from openai import OpenAI

client = OpenAI()

def generate_with_reflection(task: str, max_reflections: int = 3) -> str:
    # Step 1: Generate initial output
    draft = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a technical writer."},
            {"role": "user", "content": task},
        ],
    ).choices[0].message.content

    for _ in range(max_reflections):
        # Step 2: Critique the output
        critique = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": (
                    "You are a critical reviewer. Evaluate the following output for:"
                    "\n1. Factual accuracy"
                    "\n2. Completeness (does it address all aspects of the task?)"
                    "\n3. Clarity and structure"
                    "\n4. Any errors or inconsistencies"
                    "\nIf the output is satisfactory, respond with exactly: APPROVED"
                    "\nOtherwise, list specific improvements needed."
                )},
                {"role": "user", "content": f"Task: {task}\n\nOutput:\n{draft}"},
            ],
        ).choices[0].message.content

        # If approved, return the draft. Match only a leading APPROVED so
        # feedback like "This is NOT APPROVED because..." doesn't slip through.
        if critique.strip().upper().startswith("APPROVED"):
            return draft

        # Step 3: Improve based on critique
        draft = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": (
                    "You are a technical writer. "
                    "Revise your output based on the feedback provided."
                )},
                {"role": "user", "content": (
                    f"Original task: {task}\n\n"
                    f"Your previous draft:\n{draft}\n\n"
                    f"Reviewer feedback:\n{critique}\n\n"
                    "Please produce an improved version addressing all feedback."
                )},
            ],
        ).choices[0].message.content

    return draft  # Return best attempt after max reflections
```
Each iteration tends to improve the output because the critique identifies specific issues for the revision to address. In practice, most outputs reach "APPROVED" quality within one or two reflection cycles.
Score-and-Improve Pattern
For more structured reflection, assign numerical scores to specific quality dimensions. This gives you quantifiable improvement tracking and clearer termination criteria.
```python
import json

def score_and_improve(task: str, output: str, threshold: float = 8.0) -> dict:
    """Score output on multiple dimensions, improve if below threshold."""
    # Score the output, pinning down the JSON shape so parsing is reliable
    scoring_response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "Score the following output on a scale of 1-10 for each "
                "dimension: accuracy, completeness, clarity, actionability.\n"
                "Return JSON shaped like: "
                '{"accuracy": {"score": 7, "justification": "..."}, ...}'
            )},
            {"role": "user", "content": f"Task: {task}\nOutput: {output}"},
        ],
        response_format={"type": "json_object"},
    )
    scores = json.loads(scoring_response.choices[0].message.content)

    # Calculate the average score across all dimensions
    dimensions = ["accuracy", "completeness", "clarity", "actionability"]
    avg_score = sum(scores.get(d, {}).get("score", 0) for d in dimensions) / len(dimensions)

    if avg_score >= threshold:
        return {"output": output, "scores": scores, "improved": False}

    # Identify weak dimensions for targeted improvement
    weak_dims = [d for d in dimensions if scores.get(d, {}).get("score", 0) < threshold]
    feedback = "\n".join(
        f"- {d}: {scores.get(d, {}).get('justification', 'Needs improvement')}"
        for d in weak_dims
    )

    # Generate improved output focusing on weak areas
    improved = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Improve the output, focusing on the weak areas."},
            {"role": "user", "content": (
                f"Task: {task}\nCurrent output: {output}\n\n"
                f"Areas needing improvement:\n{feedback}"
            )},
        ],
    ).choices[0].message.content

    return {"output": improved, "scores": scores, "improved": True}
```
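A single score-and-improve pass may not clear the threshold, so in practice you drive it in a loop. This is a dependency-free sketch of that driver; `score_fn` and `improve_fn` are hypothetical injected callables standing in for the LLM calls above, which keeps the convergence logic testable without network access:

```python
def iterate_until_threshold(output, score_fn, improve_fn, threshold=8.0, max_rounds=3):
    """Repeatedly score and improve until the score clears the threshold
    or the round budget runs out.

    score_fn(output) -> float; improve_fn(output, score) -> new output.
    """
    history = []  # keep the score trajectory for convergence checks
    for _ in range(max_rounds):
        score = score_fn(output)
        history.append(score)
        if score >= threshold:
            break
        output = improve_fn(output, score)
    return output, history
```

The returned `history` also gives you the data needed to detect plateaus, which matters for the non-convergence question in the FAQ.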
Retry-with-Feedback for Tool Failures
Self-reflection is not just for text generation. It is equally powerful for recovering from tool execution failures. Instead of blindly retrying, the agent reflects on why the tool call failed and adjusts its approach.
```python
def reflective_tool_execution(agent_messages, tool_name, tool_args, max_retries=3):
    """Execute a tool with reflective retry on failure.

    `execute_tool` is your own dispatcher: it runs the named tool and
    returns a dict, using an "error" key to signal failure.
    """
    for attempt in range(max_retries):
        result = execute_tool(tool_name, tool_args)
        if "error" not in result:
            return result  # Success

        # Reflect on the failure before retrying
        reflection = client.chat.completions.create(
            model="gpt-4o",
            messages=agent_messages + [
                {"role": "system", "content": (
                    f"Your tool call to '{tool_name}' with args {json.dumps(tool_args)} "
                    f"failed with error: {result['error']}\n\n"
                    "Analyze why this failed and suggest corrected arguments. "
                    "Return JSON with 'analysis' and 'corrected_args' fields."
                )},
            ],
            response_format={"type": "json_object"},
        )
        reflection_data = json.loads(reflection.choices[0].message.content)
        tool_args = reflection_data.get("corrected_args", tool_args)

    return {"error": f"Failed after {max_retries} reflective retries"}
```
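Reflective retry only pays off when corrected arguments can actually fix the problem; a rate limit or outage fails identically no matter what arguments you send. One option is a small classifier that gates the retry loop. The marker strings below are illustrative assumptions; match them to whatever error messages your own tools emit:

```python
# Hypothetical error markers: tune these to your tools' actual messages.
RETRYABLE_MARKERS = ("invalid argument", "not found", "validation")
FATAL_MARKERS = ("rate limit", "timeout", "permission denied")

def is_retryable(error_message: str) -> bool:
    """True only when the failure looks argument-related, so that
    reflecting on corrected arguments has a chance of helping."""
    msg = error_message.lower()
    if any(m in msg for m in FATAL_MARKERS):
        return False  # retrying with new args won't fix infrastructure issues
    return any(m in msg for m in RETRYABLE_MARKERS)
```

Calling `is_retryable(result["error"])` before the reflection call avoids burning LLM tokens and retry budget on failures that argument changes cannot fix.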
Building a Self-Improving Agent Loop
Combining reflection with the standard agent loop creates an agent that continuously improves within a single task execution:
```python
def self_improving_agent(goal: str, tools: list, max_steps: int = 15) -> str:
    messages = [
        {"role": "system", "content": (
            "You are a careful agent. After completing a task, evaluate "
            "your own work before presenting it to the user. If your output "
            "has gaps or errors, fix them before responding."
        )},
        {"role": "user", "content": goal},
    ]

    for step in range(max_steps):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
        )
        msg = response.choices[0].message
        messages.append(msg)

        if not msg.tool_calls:
            # Before returning, run a self-check on the draft answer
            check = client.chat.completions.create(
                model="gpt-4o",
                messages=messages + [{
                    "role": "user",
                    "content": (
                        "Review your response. Is it complete, accurate, and "
                        "does it fully address the original goal? If yes, "
                        "respond with exactly: FINAL. "
                        "If not, explain what needs fixing."
                    ),
                }],
            ).choices[0].message.content

            if check.strip().upper().startswith("FINAL"):
                return msg.content

            # Continue improving based on the self-review
            messages.append({"role": "user", "content": f"Self-review: {check}. Please improve."})
            continue

        # Execute tool calls and feed results back into the conversation
        for tc in msg.tool_calls:
            args = json.loads(tc.function.arguments)
            result = execute_tool(tc.function.name, args)
            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "content": json.dumps(result),
            })

    # The last message may be a plain dict or an SDK message object
    last = messages[-1]
    content = last.get("content") if isinstance(last, dict) else last.content
    return content or "Task incomplete."
```
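The `tools` argument above uses the Chat Completions function-calling schema. As a reference point, a definition for a hypothetical `search_docs` tool (the name and parameters here are made up for illustration) would look like:

```python
# Hypothetical tool definition in the Chat Completions function-calling format
tools = [{
    "type": "function",
    "function": {
        "name": "search_docs",  # must match the name your execute_tool dispatches on
        "description": "Search internal documentation for a query.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]
```

The model fills in `query` when it decides to call the tool; your `execute_tool` dispatcher maps the name back to a real function.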
FAQ
Does self-reflection double the cost of every agent call?
Not quite double, because critique prompts are typically shorter than generation prompts. Expect 40-70% additional token cost per reflection cycle. The tradeoff is worth it for high-stakes outputs (reports, code, customer communications) where quality matters more than cost. Skip reflection for low-stakes tasks like simple lookups.
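For budgeting, the extra cost scales roughly linearly with the number of reflection cycles. This back-of-envelope helper is illustrative only; the default 0.55 overhead factor is an assumption sitting in the middle of the 40-70% range mentioned above:

```python
def estimated_tokens(base_tokens: int, cycles: int, overhead: float = 0.55) -> int:
    """Rough token estimate: each reflection cycle adds a critique plus a
    revision, modeled here as `overhead` times the base call's tokens."""
    return round(base_tokens * (1 + cycles * overhead))
```

So a 1,000-token task with two reflection cycles lands around 2,100 tokens at the assumed overhead; measure your own critique/revision prompt sizes to calibrate the factor.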
Can the same model effectively critique its own output?
Yes, with caveats. The same model can catch structural issues, missing information, and formatting problems reliably. It is less effective at catching its own factual hallucinations because the same knowledge gaps that caused the error also affect the critique. For critical accuracy requirements, use a separate verification step with tool-based fact checking.
How do I prevent reflection loops that never converge?
Set a strict maximum on reflection cycles (2-3 is usually sufficient). Use the score-and-improve pattern with a numerical threshold so you have an objective stopping criterion. If scores are not improving between iterations, break the loop — further reflection is unlikely to help, and the issue may require a fundamentally different approach.
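These stopping rules are easy to encode directly if you track scores across iterations (as the score-and-improve pattern does). A minimal sketch, with the 0.25 minimum-improvement delta chosen arbitrarily for illustration:

```python
def should_continue(score_history, threshold=8.0, min_delta=0.25, max_rounds=3):
    """Stop when the threshold is met, the budget is spent, or scores plateau."""
    if not score_history:
        return True  # no attempts yet
    if score_history[-1] >= threshold:
        return False  # good enough
    if len(score_history) >= max_rounds:
        return False  # budget exhausted
    if len(score_history) >= 2 and score_history[-1] - score_history[-2] < min_delta:
        return False  # not improving; a different approach is needed
    return True
```

Checking `should_continue` at the top of each reflection cycle guarantees termination while still allowing early exit on a plateau.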
#SelfReflection #AIAgents #CritiqueLoops #QualityAssurance #Python #AgenticAI #LearnAI #AIEngineering
Written by
CallSphere Team
Expert insights on AI voice agents and customer communication automation.