Self-Reflection in AI Agents: Building Systems That Learn from Mistakes
Explore how self-reflection transforms AI agents from one-shot executors into iterative improvers — covering critique loops, retry-with-feedback, score-and-improve patterns, and practical Python implementations.
The Problem with One-Shot Execution
Most AI agents generate a response and move on. If the output is wrong, incomplete, or poorly formatted, the user has to notice the problem and ask for a correction. This is fragile. Humans miss errors, and the feedback loop is slow.
Self-reflection changes this by adding an internal quality check. Before returning a result to the user, the agent evaluates its own output, identifies weaknesses, and improves it — all within the same execution loop. The result is higher quality output with fewer round trips.
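Before looking at concrete model calls, the loop can be stated abstractly: generate once, then alternate critique and revision until the critic approves or the budget runs out. Here is a minimal, model-agnostic sketch; the `generate`, `critique`, and `revise` callables are placeholders you would back with LLM calls:

```python
def reflect(task, generate, critique, revise, max_cycles=3):
    """Generic reflection loop: produce a draft, then critique and
    revise until the critic approves or the budget is exhausted."""
    draft = generate(task)
    for _ in range(max_cycles):
        feedback = critique(task, draft)
        if feedback is None:  # critic approves, nothing left to fix
            return draft
        draft = revise(task, draft, feedback)
    return draft  # best attempt within budget
```

Keeping the loop separate from the model calls makes it trivial to unit-test with stub functions and to swap in different critic models later.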
The Basic Critique Loop
The simplest self-reflection pattern uses two LLM calls: one to generate, one to critique.
```python
from openai import OpenAI

client = OpenAI()

def generate_with_reflection(task: str, max_reflections: int = 3) -> str:
    # Step 1: Generate initial output
    draft = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a technical writer."},
            {"role": "user", "content": task},
        ],
    ).choices[0].message.content

    for _ in range(max_reflections):
        # Step 2: Critique the output
        critique = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": (
                    "You are a critical reviewer. Evaluate the following output for:"
                    "\n1. Factual accuracy"
                    "\n2. Completeness (does it address all aspects of the task?)"
                    "\n3. Clarity and structure"
                    "\n4. Any errors or inconsistencies"
                    "\nIf the output is satisfactory, respond with exactly: APPROVED"
                    "\nOtherwise, list specific improvements needed."
                )},
                {"role": "user", "content": f"Task: {task}\n\nOutput:\n{draft}"},
            ],
        ).choices[0].message.content

        # If approved, return the draft. Match only a leading APPROVED so
        # feedback like "This is NOT APPROVED because..." doesn't slip through.
        if critique.strip().upper().startswith("APPROVED"):
            return draft

        # Step 3: Improve based on critique
        draft = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": (
                    "You are a technical writer. "
                    "Revise your output based on the feedback provided."
                )},
                {"role": "user", "content": (
                    f"Original task: {task}\n\n"
                    f"Your previous draft:\n{draft}\n\n"
                    f"Reviewer feedback:\n{critique}\n\n"
                    "Please produce an improved version addressing all feedback."
                )},
            ],
        ).choices[0].message.content

    return draft  # Return best attempt after max reflections
```
Each iteration tends to improve the output because the critique identifies specific issues for the revision to address. In practice, most outputs reach "APPROVED" quality within one or two reflection cycles.
Score-and-Improve Pattern
For more structured reflection, assign numerical scores to specific quality dimensions. This gives you quantifiable improvement tracking and clearer termination criteria.
```python
import json

def score_and_improve(task: str, output: str, threshold: float = 8.0) -> dict:
    """Score output on multiple dimensions, improve if below threshold."""
    # Score the output, pinning down the JSON shape so parsing is reliable
    scoring_response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "Score the following output on a scale of 1-10 for each "
                "dimension: accuracy, completeness, clarity, actionability.\n"
                "Return JSON shaped like: "
                '{"accuracy": {"score": 7, "justification": "..."}, ...}'
            )},
            {"role": "user", "content": f"Task: {task}\nOutput: {output}"},
        ],
        response_format={"type": "json_object"},
    )
    scores = json.loads(scoring_response.choices[0].message.content)

    # Calculate the average score across all dimensions
    dimensions = ["accuracy", "completeness", "clarity", "actionability"]
    avg_score = sum(scores.get(d, {}).get("score", 0) for d in dimensions) / len(dimensions)

    if avg_score >= threshold:
        return {"output": output, "scores": scores, "improved": False}

    # Identify weak dimensions for targeted improvement
    weak_dims = [d for d in dimensions if scores.get(d, {}).get("score", 0) < threshold]
    feedback = "\n".join(
        f"- {d}: {scores.get(d, {}).get('justification', 'Needs improvement')}"
        for d in weak_dims
    )

    # Generate improved output focusing on weak areas
    improved = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Improve the output, focusing on the weak areas."},
            {"role": "user", "content": (
                f"Task: {task}\nCurrent output: {output}\n\n"
                f"Areas needing improvement:\n{feedback}"
            )},
        ],
    ).choices[0].message.content

    return {"output": improved, "scores": scores, "improved": True}
```
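A single score-and-improve pass may not clear the threshold, so in practice you drive it in a loop. This is a dependency-free sketch of that driver; `score_fn` and `improve_fn` are hypothetical injected callables standing in for the LLM calls above, which keeps the convergence logic testable without network access:

```python
def iterate_until_threshold(output, score_fn, improve_fn, threshold=8.0, max_rounds=3):
    """Repeatedly score and improve until the score clears the threshold
    or the round budget runs out.

    score_fn(output) -> float; improve_fn(output, score) -> new output.
    """
    history = []  # keep the score trajectory for convergence checks
    for _ in range(max_rounds):
        score = score_fn(output)
        history.append(score)
        if score >= threshold:
            break
        output = improve_fn(output, score)
    return output, history
```

The returned `history` also gives you the data needed to detect plateaus, which matters for the non-convergence question in the FAQ.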
Retry-with-Feedback for Tool Failures
Self-reflection is not just for text generation. It is equally powerful for recovering from tool execution failures. Instead of blindly retrying, the agent reflects on why the tool call failed and adjusts its approach.
```python
def reflective_tool_execution(agent_messages, tool_name, tool_args, max_retries=3):
    """Execute a tool with reflective retry on failure.

    `execute_tool` is your own dispatcher: it runs the named tool and
    returns a dict, using an "error" key to signal failure.
    """
    for attempt in range(max_retries):
        result = execute_tool(tool_name, tool_args)
        if "error" not in result:
            return result  # Success

        # Reflect on the failure before retrying
        reflection = client.chat.completions.create(
            model="gpt-4o",
            messages=agent_messages + [
                {"role": "system", "content": (
                    f"Your tool call to '{tool_name}' with args {json.dumps(tool_args)} "
                    f"failed with error: {result['error']}\n\n"
                    "Analyze why this failed and suggest corrected arguments. "
                    "Return JSON with 'analysis' and 'corrected_args' fields."
                )},
            ],
            response_format={"type": "json_object"},
        )
        reflection_data = json.loads(reflection.choices[0].message.content)
        tool_args = reflection_data.get("corrected_args", tool_args)

    return {"error": f"Failed after {max_retries} reflective retries"}
```
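Reflective retry only pays off when corrected arguments can actually fix the problem; a rate limit or outage fails identically no matter what arguments you send. One option is a small classifier that gates the retry loop. The marker strings below are illustrative assumptions; match them to whatever error messages your own tools emit:

```python
# Hypothetical error markers: tune these to your tools' actual messages.
RETRYABLE_MARKERS = ("invalid argument", "not found", "validation")
FATAL_MARKERS = ("rate limit", "timeout", "permission denied")

def is_retryable(error_message: str) -> bool:
    """True only when the failure looks argument-related, so that
    reflecting on corrected arguments has a chance of helping."""
    msg = error_message.lower()
    if any(m in msg for m in FATAL_MARKERS):
        return False  # retrying with new args won't fix infrastructure issues
    return any(m in msg for m in RETRYABLE_MARKERS)
```

Calling `is_retryable(result["error"])` before the reflection call avoids burning LLM tokens and retry budget on failures that argument changes cannot fix.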
Building a Self-Improving Agent Loop
Combining reflection with the standard agent loop creates an agent that continuously improves within a single task execution:
```python
def self_improving_agent(goal: str, tools: list, max_steps: int = 15) -> str:
    messages = [
        {"role": "system", "content": (
            "You are a careful agent. After completing a task, evaluate "
            "your own work before presenting it to the user. If your output "
            "has gaps or errors, fix them before responding."
        )},
        {"role": "user", "content": goal},
    ]

    for step in range(max_steps):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
        )
        msg = response.choices[0].message
        messages.append(msg)

        if not msg.tool_calls:
            # Before returning, run a self-check on the draft answer
            check = client.chat.completions.create(
                model="gpt-4o",
                messages=messages + [{
                    "role": "user",
                    "content": (
                        "Review your response. Is it complete, accurate, and "
                        "does it fully address the original goal? If yes, "
                        "respond with exactly: FINAL. "
                        "If not, explain what needs fixing."
                    ),
                }],
            ).choices[0].message.content

            if check.strip().upper().startswith("FINAL"):
                return msg.content

            # Continue improving based on the self-review
            messages.append({"role": "user", "content": f"Self-review: {check}. Please improve."})
            continue

        # Execute tool calls and feed results back into the conversation
        for tc in msg.tool_calls:
            args = json.loads(tc.function.arguments)
            result = execute_tool(tc.function.name, args)
            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "content": json.dumps(result),
            })

    # The last message may be a plain dict or an SDK message object
    last = messages[-1]
    content = last.get("content") if isinstance(last, dict) else last.content
    return content or "Task incomplete."
```
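The `tools` argument above uses the Chat Completions function-calling schema. As a reference point, a definition for a hypothetical `search_docs` tool (the name and parameters here are made up for illustration) would look like:

```python
# Hypothetical tool definition in the Chat Completions function-calling format
tools = [{
    "type": "function",
    "function": {
        "name": "search_docs",  # must match the name your execute_tool dispatches on
        "description": "Search internal documentation for a query.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]
```

The model fills in `query` when it decides to call the tool; your `execute_tool` dispatcher maps the name back to a real function.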
FAQ
Does self-reflection double the cost of every agent call?
Not quite double, because critique prompts are typically shorter than generation prompts. Expect 40-70% additional token cost per reflection cycle. The tradeoff is worth it for high-stakes outputs (reports, code, customer communications) where quality matters more than cost. Skip reflection for low-stakes tasks like simple lookups.
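For budgeting, the extra cost scales roughly linearly with the number of reflection cycles. This back-of-envelope helper is illustrative only; the default 0.55 overhead factor is an assumption sitting in the middle of the 40-70% range mentioned above:

```python
def estimated_tokens(base_tokens: int, cycles: int, overhead: float = 0.55) -> int:
    """Rough token estimate: each reflection cycle adds a critique plus a
    revision, modeled here as `overhead` times the base call's tokens."""
    return round(base_tokens * (1 + cycles * overhead))
```

So a 1,000-token task with two reflection cycles lands around 2,100 tokens at the assumed overhead; measure your own critique/revision prompt sizes to calibrate the factor.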
Can the same model effectively critique its own output?
Yes, with caveats. The same model can catch structural issues, missing information, and formatting problems reliably. It is less effective at catching its own factual hallucinations because the same knowledge gaps that caused the error also affect the critique. For critical accuracy requirements, use a separate verification step with tool-based fact checking.
How do I prevent reflection loops that never converge?
Set a strict maximum on reflection cycles (2-3 is usually sufficient). Use the score-and-improve pattern with a numerical threshold so you have an objective stopping criterion. If scores are not improving between iterations, break the loop — further reflection is unlikely to help, and the issue may require a fundamentally different approach.
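These stopping rules are easy to encode directly if you track scores across iterations (as the score-and-improve pattern does). A minimal sketch, with the 0.25 minimum-improvement delta chosen arbitrarily for illustration:

```python
def should_continue(score_history, threshold=8.0, min_delta=0.25, max_rounds=3):
    """Stop when the threshold is met, the budget is spent, or scores plateau."""
    if not score_history:
        return True  # no attempts yet
    if score_history[-1] >= threshold:
        return False  # good enough
    if len(score_history) >= max_rounds:
        return False  # budget exhausted
    if len(score_history) >= 2 and score_history[-1] - score_history[-2] < min_delta:
        return False  # not improving; a different approach is needed
    return True
```

Checking `should_continue` at the top of each reflection cycle guarantees termination while still allowing early exit on a plateau.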
#SelfReflection #AIAgents #CritiqueLoops #QualityAssurance #Python #AgenticAI #LearnAI #AIEngineering
Written by
CallSphere Team
Expert insights on AI voice agents and customer communication automation.