Debugging Tool Call Failures: Tracing Why Agent Tools Return Errors or Wrong Results
Master techniques for diagnosing tool call failures in AI agents, from call logging and parameter inspection to mock execution and replay testing for reliable tool integrations.
Tools Are the Hands of Your Agent
AI agents do not just generate text — they act. They call APIs, query databases, read files, and execute business logic through tool functions. When a tool call fails, the agent either retries blindly, hallucinates a result, or gives up entirely. None of these outcomes are acceptable in production.
Debugging tool call failures requires visibility into what the model requested, what parameters it sent, and what the tool function actually received and returned.
Building a Tool Call Interceptor
The first step is to wrap your tool execution with comprehensive logging. This interceptor captures every detail of the tool call lifecycle:
import json
import time
import traceback
from typing import Any, Callable
from dataclasses import dataclass, field

@dataclass
class ToolCallRecord:
    tool_name: str
    arguments: dict
    result: Any = None
    error: str | None = None
    duration_ms: float = 0
    timestamp: float = field(default_factory=time.time)

class ToolDebugger:
    def __init__(self):
        self.call_history: list[ToolCallRecord] = []

    def wrap(self, tool_fn: Callable, tool_name: str) -> Callable:
        async def wrapper(**kwargs):
            record = ToolCallRecord(
                tool_name=tool_name,
                arguments=kwargs,
            )
            start = time.perf_counter()
            try:
                result = await tool_fn(**kwargs)
                record.result = result
                record.duration_ms = (time.perf_counter() - start) * 1000
                return result
            except Exception as e:
                record.error = f"{type(e).__name__}: {e}"
                record.duration_ms = (time.perf_counter() - start) * 1000
                raise
            finally:
                self.call_history.append(record)
        return wrapper

    def print_history(self):
        for i, rec in enumerate(self.call_history):
            status = "OK" if rec.error is None else f"FAIL: {rec.error}"
            print(f"[{i}] {rec.tool_name} ({rec.duration_ms:.0f}ms) -> {status}")
            print(f"    Args: {json.dumps(rec.arguments, indent=2)}")
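To see the interceptor pattern end to end, here is a condensed, dict-based variant of the same idea that runs standalone (`get_weather` is a hypothetical tool used only for this demo):

```python
import asyncio
import time

call_history: list[dict] = []

def wrap(tool_fn, tool_name):
    async def wrapper(**kwargs):
        record = {"tool": tool_name, "args": kwargs, "error": None}
        start = time.perf_counter()
        try:
            return await tool_fn(**kwargs)
        except Exception as e:
            record["error"] = f"{type(e).__name__}: {e}"
            raise
        finally:
            # Record duration and append whether the call succeeded or failed.
            record["duration_ms"] = (time.perf_counter() - start) * 1000
            call_history.append(record)
    return wrapper

async def get_weather(city: str) -> dict:
    # Stand-in tool for illustration.
    return {"city": city, "temp_c": 21}

async def main():
    result = await wrap(get_weather, "get_weather")(city="Berlin")
    print(result)

asyncio.run(main())
```

The `finally` block is the important detail: the record is appended whether the tool succeeds or raises, so failed calls never silently vanish from the history.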
Inspecting Parameter Mismatches
The most common tool call failure is a parameter mismatch. The model sends arguments that do not match what the function expects. This happens when tool descriptions are ambiguous:
from agents import function_tool

# Bad: ambiguous parameter name
@function_tool
def search_orders(query: str) -> str:
    """Search customer orders."""
    # Model might send a natural language query OR an order ID
    pass

# Good: explicit parameters with clear types
@function_tool
def search_orders(
    customer_email: str,
    status: str = "all",
    limit: int = 10,
) -> str:
    """Search orders by customer email.

    Args:
        customer_email: The customer email address to search for.
        status: Filter by status. One of: all, pending, shipped, delivered.
        limit: Maximum number of results to return. Default 10.
    """
    pass
When parameter mismatches occur, compare what the model sent against your function signature. Log the raw tool_calls from the API response:
async def inspect_tool_calls(response):
    for choice in response.choices:
        msg = choice.message
        if msg.tool_calls:
            for tc in msg.tool_calls:
                print(f"Tool: {tc.function.name}")
                print(f"Raw args: {tc.function.arguments}")
                try:
                    parsed = json.loads(tc.function.arguments)
                    print(f"Parsed: {json.dumps(parsed, indent=2)}")
                except json.JSONDecodeError as e:
                    print(f"INVALID JSON: {e}")
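Once the arguments parse as JSON, you can diff them against the Python function's signature before executing anything. A sketch using the standard library's `inspect` module (`diff_args` is an illustrative helper name, not part of any SDK):

```python
import inspect

def search_orders(customer_email: str, status: str = "all", limit: int = 10) -> str:
    return f"searching {customer_email}"

def diff_args(tool_fn, model_args: dict) -> dict:
    """Compare model-sent arguments against the function signature."""
    params = inspect.signature(tool_fn).parameters
    # Required parameters the model failed to supply.
    missing = [
        name for name, p in params.items()
        if p.default is inspect.Parameter.empty and name not in model_args
    ]
    # Arguments the model invented that the function does not accept.
    unexpected = [name for name in model_args if name not in params]
    return {"missing": missing, "unexpected": unexpected}

# Model sent "email" instead of "customer_email":
print(diff_args(search_orders, {"email": "a@b.com", "limit": 5}))
# -> {'missing': ['customer_email'], 'unexpected': ['email']}
```

Running this check before dispatch turns a cryptic `TypeError` deep in your tool code into a precise report of which parameter names the model got wrong.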
Replay Testing
Once you have captured a failed tool call, replay it in isolation to confirm the root cause:
class ToolReplayTester:
    def __init__(self, debugger: ToolDebugger):
        self.debugger = debugger

    async def replay(self, index: int, tool_registry: dict):
        record = self.debugger.call_history[index]
        tool_fn = tool_registry.get(record.tool_name)
        if not tool_fn:
            print(f"Tool '{record.tool_name}' not found in registry")
            return
        print(f"Replaying: {record.tool_name}")
        print(f"With args: {json.dumps(record.arguments, indent=2)}")
        try:
            result = await tool_fn(**record.arguments)
            print(f"Result: {result}")
        except Exception as e:
            print(f"Error: {e}")
            traceback.print_exc()
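Replay is most useful when failed calls survive process restarts. One way to get that, sketched here with the standard library only, is persisting records as JSON Lines and loading back only the failures (the file name and record fields are illustrative):

```python
import json

def save_records(records: list[dict], path: str) -> None:
    # One JSON object per line keeps the log append-friendly and greppable.
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

def load_failed(path: str) -> list[dict]:
    # Only failed calls are interesting for replay.
    with open(path) as f:
        return [rec for line in f if (rec := json.loads(line))["error"] is not None]

records = [
    {"tool": "search_orders", "args": {"email": "a@b.com"}, "error": "TypeError: ..."},
    {"tool": "send_email", "args": {"to": "a@b.com"}, "error": None},
]
save_records(records, "tool_calls.jsonl")
print(load_failed("tool_calls.jsonl"))  # only the failed search_orders record
```

With records on disk, you can replay a failure from a production session in a local debugger hours later, without reproducing the whole conversation.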
Mock Execution for Isolation
When a tool depends on external services, create mock versions that return controlled data. This isolates whether the failure is in your tool logic or the external dependency:
def create_mock_tool(tool_name: str, mock_response: Any):
    async def mock_fn(**kwargs):
        print(f"[MOCK] {tool_name} called with: {kwargs}")
        return mock_response
    return mock_fn

# Replace real tools with mocks for debugging
tool_registry = {
    "search_orders": create_mock_tool(
        "search_orders",
        {"orders": [{"id": "123", "status": "shipped"}]},
    ),
    "send_email": create_mock_tool(
        "send_email",
        {"sent": True, "message_id": "mock-001"},
    ),
}
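Mocks can also simulate the failure itself. A variant that raises a fake transient error on the first N calls lets you exercise the agent's retry and error-handling paths deterministically (`create_flaky_mock` is an illustrative name, not an existing API):

```python
import asyncio
from typing import Any

def create_flaky_mock(tool_name: str, mock_response: Any, fail_times: int = 1):
    """Mock that raises a simulated transient error before succeeding."""
    calls = {"n": 0}

    async def mock_fn(**kwargs):
        calls["n"] += 1
        if calls["n"] <= fail_times:
            raise TimeoutError(f"[MOCK] {tool_name} simulated timeout")
        return mock_response

    return mock_fn

async def demo():
    flaky = create_flaky_mock("search_orders", {"orders": []}, fail_times=1)
    try:
        await flaky(customer_email="a@b.com")
    except TimeoutError as e:
        print(e)  # first call fails with the simulated timeout
    print(await flaky(customer_email="a@b.com"))  # second call succeeds

asyncio.run(demo())
```

Because the failure count is deterministic, the same test run always hits the same code path, which is exactly what flaky external services deny you.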
FAQ
Why does the model sometimes send invalid JSON in tool call arguments?
This typically happens with older or smaller models when tool schemas are complex. Use strict mode in your function definitions if your API supports it, which forces the model to produce valid JSON matching your schema. Also simplify parameter types — avoid deeply nested objects when flat parameters work.
How do I handle the case where the model calls a tool with correct parameters but the tool returns unexpected results?
Add assertion-style checks inside your tool functions that validate the result before returning it. Log both the input parameters and the raw result from any external API your tool calls. This creates an audit trail that shows exactly where the data transformation went wrong.
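A sketch of what such a check can look like (the field names and the order schema are illustrative; the point is to fail loudly inside the tool rather than hand malformed data back to the model):

```python
def validate_order_result(result: dict) -> dict:
    """Validate a tool result before returning it to the agent."""
    if "orders" not in result:
        raise ValueError(f"Tool result missing 'orders' key: {result!r}")
    for order in result["orders"]:
        if "id" not in order or "status" not in order:
            raise ValueError(f"Malformed order entry: {order!r}")
    return result

print(validate_order_result({"orders": [{"id": "123", "status": "shipped"}]}))
```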
Should I let the agent retry failed tool calls automatically?
Yes, but with limits. Allow one or two retries for transient failures like network timeouts. For parameter errors, return a clear error message describing what went wrong so the model can self-correct its arguments. Never allow unlimited retries as this wastes tokens and can cause infinite loops.
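A minimal retry wrapper along these lines, assuming transient failures surface as `TimeoutError`/`ConnectionError` and parameter errors as `TypeError`/`ValueError` (the exception mapping, limits, and backoff are illustrative choices):

```python
import asyncio

TRANSIENT = (TimeoutError, ConnectionError)

async def call_with_retry(tool_fn, max_retries: int = 2, **kwargs):
    """Retry transient failures; turn parameter errors into a message the model can act on."""
    for attempt in range(max_retries + 1):
        try:
            return await tool_fn(**kwargs)
        except TRANSIENT:
            if attempt == max_retries:
                return {"error": "tool unavailable after retries"}
            await asyncio.sleep(0.01 * (attempt + 1))  # simple linear backoff
        except (TypeError, ValueError) as e:
            # Do not retry: tell the model what was wrong so it can fix its arguments.
            return {"error": f"invalid arguments: {e}"}

async def always_times_out(**kwargs):
    raise TimeoutError("upstream timeout")

print(asyncio.run(call_with_retry(always_times_out, max_retries=1)))
# -> {'error': 'tool unavailable after retries'}
```

Returning an error dict instead of raising keeps the failure inside the conversation, where the model can read it and adjust its next tool call.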
#Debugging #ToolCalling #AIAgents #Testing #Troubleshooting #AgenticAI #LearnAI #AIEngineering
CallSphere Team
Expert insights on AI voice agents and customer communication automation.