API Error Design for AI Agent Services: Problem Details, Error Codes, and Retry Hints
Design machine-readable API error responses for AI agents using RFC 7807 Problem Details, structured error codes, and retry hints. Build error responses that agents can parse and act on programmatically.
Why Error Design Matters More for AI Agents
When a human encounters an API error, they read the message, understand the context, and decide what to do. An AI agent has none of that intuition. It needs structured, machine-readable error responses that tell it exactly what went wrong, whether to retry, and how long to wait. Poor error design turns every transient failure into a hard failure for autonomous agents.
A solid baseline for AI agents is RFC 7807 (Problem Details for HTTP APIs, since obsoleted by RFC 9457, which keeps the same core fields), augmented with agent-specific fields like retry hints and error taxonomies.
RFC 7807 Problem Details Format
RFC 7807 defines a standard JSON structure for API errors. It includes a type URI for machine identification, a human-readable title and detail, the HTTP status code, and an optional instance URI pointing to the specific occurrence.
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from fastapi.exceptions import RequestValidationError
from pydantic import BaseModel

app = FastAPI()


class ProblemDetail(BaseModel):
    type: str
    title: str
    status: int
    detail: str
    instance: str | None = None
    # Agent-specific extensions
    error_code: str | None = None
    retryable: bool = False
    retry_after_seconds: int | None = None


def problem_response(
    status: int,
    error_type: str,
    title: str,
    detail: str,
    error_code: str | None = None,
    retryable: bool = False,
    retry_after: int | None = None,
    instance: str | None = None,
) -> JSONResponse:
    body = ProblemDetail(
        type=f"https://api.example.com/errors/{error_type}",
        title=title,
        status=status,
        detail=detail,
        instance=instance,
        error_code=error_code,
        retryable=retryable,
        retry_after_seconds=retry_after,
    )
    headers = {}
    if retry_after is not None:
        headers["Retry-After"] = str(retry_after)
    return JSONResponse(
        status_code=status,
        content=body.model_dump(exclude_none=True),
        media_type="application/problem+json",
        headers=headers,
    )
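To make the shape of the response body concrete, here is a minimal standard-library sketch that assembles the same kind of document without FastAPI or Pydantic. The helper name `build_problem` is hypothetical and exists only for illustration; the field names and the `https://api.example.com/errors/` prefix mirror the code above.

```python
import json


def build_problem(status, error_type, title, detail, **extensions):
    """Assemble a Problem Details dict, dropping members whose value is None.

    Hypothetical helper for illustration; it mimics exclude_none=True.
    """
    body = {
        "type": f"https://api.example.com/errors/{error_type}",
        "title": title,
        "status": status,
        "detail": detail,
        **extensions,
    }
    return {key: value for key, value in body.items() if value is not None}


problem = build_problem(
    429,
    "rate-limit-exceeded",
    "Rate Limit Exceeded",
    "You have exceeded 100 requests per minute.",
    instance=None,  # omitted from the output because it is None
    error_code="rate_limit.exceeded",
    retryable=True,
    retry_after_seconds=30,
)

# Print the JSON an agent would receive in the response body.
print(json.dumps(problem, indent=2))
```

An agent only needs `error_code`, `retryable`, and `retry_after_seconds` to act; `title` and `detail` remain useful for logs and for humans debugging the agent.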
Error Taxonomy for AI Agent Services
Define a clear error taxonomy so agents can programmatically classify errors and decide on the appropriate recovery strategy.
class ErrorCodes:
    # Authentication & Authorization
    AUTH_TOKEN_EXPIRED = "auth.token_expired"
    AUTH_TOKEN_INVALID = "auth.token_invalid"
    AUTH_INSUFFICIENT_SCOPE = "auth.insufficient_scope"

    # Rate Limiting
    RATE_LIMIT_EXCEEDED = "rate_limit.exceeded"
    RATE_LIMIT_TOKENS = "rate_limit.token_budget_exceeded"

    # Model Errors
    MODEL_OVERLOADED = "model.overloaded"
    MODEL_NOT_FOUND = "model.not_found"
    MODEL_CONTEXT_LENGTH = "model.context_length_exceeded"

    # Validation
    VALIDATION_FAILED = "validation.failed"
    VALIDATION_CONTENT_FILTER = "validation.content_filter"

    # Resource Errors
    RESOURCE_NOT_FOUND = "resource.not_found"
    RESOURCE_CONFLICT = "resource.conflict"
    RESOURCE_QUOTA_EXCEEDED = "resource.quota_exceeded"

    # Internal
    INTERNAL_ERROR = "internal.error"
    INTERNAL_TIMEOUT = "internal.timeout"
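Because each code has the form `category.specific`, an agent can pick a recovery strategy from the category prefix and override it for individual codes. The sketch below is one possible mapping, not a prescribed one: the `Recovery` enum, both lookup tables, and the choice of strategy per category are assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class Recovery(Enum):
    """Hypothetical recovery strategies an agent might support."""
    REFRESH_CREDENTIALS = "refresh_credentials"
    BACKOFF_AND_RETRY = "backoff_and_retry"
    FIX_REQUEST = "fix_request"
    GIVE_UP = "give_up"


# Exact-code overrides are checked before the category default.
CODE_OVERRIDES = {
    "auth.token_expired": Recovery.REFRESH_CREDENTIALS,
    "model.overloaded": Recovery.BACKOFF_AND_RETRY,
}

# Assumed defaults per category (the part before the first dot).
CATEGORY_DEFAULTS = {
    "auth": Recovery.GIVE_UP,
    "rate_limit": Recovery.BACKOFF_AND_RETRY,
    "model": Recovery.FIX_REQUEST,
    "validation": Recovery.FIX_REQUEST,
    "resource": Recovery.GIVE_UP,
    "internal": Recovery.BACKOFF_AND_RETRY,
}


def recovery_for(error_code: Optional[str]) -> Recovery:
    """Classify an error code; unknown or missing codes fall back to GIVE_UP."""
    if not error_code:
        return Recovery.GIVE_UP
    if error_code in CODE_OVERRIDES:
        return CODE_OVERRIDES[error_code]
    category = error_code.split(".", 1)[0]
    return CATEGORY_DEFAULTS.get(category, Recovery.GIVE_UP)
```

Keeping the category stable lets the server add new specific codes later without breaking agents that only understand the prefix.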
Applying the Error Pattern to Endpoints
Here is how these error responses look in practice across common failure scenarios in an AI agent API.
# estimate_tokens, is_rate_limited, call_llm, and ModelOverloadedError are
# assumed to be defined elsewhere in the service.
@app.post("/v1/chat/completions")
async def chat_completions(request: dict):
    model = request.get("model")
    messages = request.get("messages", [])

    # Non-retryable: the agent must fix the request before sending it again.
    if not model:
        return problem_response(
            status=422,
            error_type="validation-error",
            title="Validation Failed",
            detail="The 'model' field is required.",
            error_code=ErrorCodes.VALIDATION_FAILED,
        )

    token_count = estimate_tokens(messages)
    if token_count > 128000:
        return problem_response(
            status=400,
            error_type="context-length-exceeded",
            title="Context Length Exceeded",
            detail=(
                f"Request contains {token_count} tokens, "
                f"exceeding the model maximum of 128000."
            ),
            error_code=ErrorCodes.MODEL_CONTEXT_LENGTH,
        )

    # Retryable: the same request can succeed after the hinted delay.
    if is_rate_limited(request):
        return problem_response(
            status=429,
            error_type="rate-limit-exceeded",
            title="Rate Limit Exceeded",
            detail="You have exceeded 100 requests per minute.",
            error_code=ErrorCodes.RATE_LIMIT_EXCEEDED,
            retryable=True,
            retry_after=30,
        )

    try:
        result = await call_llm(model, messages)
        return result
    except ModelOverloadedError:
        return problem_response(
            status=503,
            error_type="model-overloaded",
            title="Model Overloaded",
            detail="The model is currently at capacity. Please retry.",
            error_code=ErrorCodes.MODEL_OVERLOADED,
            retryable=True,
            retry_after=5,
        )
Global Exception Handlers
Register global exception handlers to ensure every error follows the Problem Details format, even unhandled exceptions.
import logging


@app.exception_handler(RequestValidationError)
async def validation_exception_handler(
    request: Request, exc: RequestValidationError
):
    # Flatten each validation error into "path -> to -> field: message".
    errors = exc.errors()
    detail_parts = []
    for err in errors:
        field = " -> ".join(str(loc) for loc in err["loc"])
        detail_parts.append(f"{field}: {err['msg']}")
    return problem_response(
        status=422,
        error_type="validation-error",
        title="Request Validation Failed",
        detail="; ".join(detail_parts),
        error_code=ErrorCodes.VALIDATION_FAILED,
    )


@app.exception_handler(Exception)
async def generic_exception_handler(request: Request, exc: Exception):
    # Log the full exception internally; never send it to the client.
    logging.exception("Unhandled exception")
    return problem_response(
        status=500,
        error_type="internal-error",
        title="Internal Server Error",
        detail="An unexpected error occurred. Please retry or contact support.",
        error_code=ErrorCodes.INTERNAL_ERROR,
        retryable=True,
        retry_after=10,
    )
Client-Side Error Handling for Agents
On the agent side, the structured error format enables intelligent retry logic.
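import asyncio

import httpx


class AgentAPIError(Exception):
    def __init__(self, code: str | None, detail: str | None, status: int):
        super().__init__(f"{status} {code}: {detail}")
        self.code = code
        self.detail = detail
        self.status = status


async def call_with_retry(url: str, body: dict, max_retries: int = 3):
    # Reuse one client for all attempts so connections are pooled and closed.
    async with httpx.AsyncClient() as client:
        for attempt in range(max_retries + 1):
            response = await client.post(url, json=body)
            if response.status_code < 400:
                return response.json()
            error = response.json()
            retryable = error.get("retryable", False)
            # Fall back to exponential backoff when the server gives no hint.
            retry_after = error.get("retry_after_seconds", 2 ** attempt)
            if not retryable or attempt == max_retries:
                raise AgentAPIError(
                    code=error.get("error_code"),
                    detail=error.get("detail"),
                    status=response.status_code,
                )
            await asyncio.sleep(retry_after)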
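The client above reads only the retry_after_seconds body field, but the server also sets the standard Retry-After header, which intermediaries such as proxies may emit on their own without a Problem Details body. A possible fallback is to parse that header, which HTTP allows to be either a number of seconds or an HTTP date. The function name `parse_retry_after` and its `now` parameter are assumptions introduced for this sketch.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(
    value: Optional[str], now: Optional[datetime] = None
) -> Optional[float]:
    """Convert a Retry-After header value into a delay in seconds.

    Returns None when the header is missing or cannot be parsed.
    """
    if not value:
        return None
    value = value.strip()
    if value.isdigit():
        # Delay-seconds form, e.g. "30".
        return float(value)
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry immediately", never a negative sleep.
    return max(0.0, (when - now).total_seconds())
```

With httpx, the header would be read as `response.headers.get("Retry-After")` and passed to this function when the body carries no retry_after_seconds value.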
FAQ
Why use RFC 7807 instead of a custom error format?
RFC 7807 is an IETF standard (since updated as RFC 9457), and a number of web frameworks, client libraries, and API gateways can produce or parse it. Using it means your errors follow a documented structure instead of one that every client has to learn from scratch. The application/problem+json media type signals to clients that the response follows a known structure. The specification explicitly allows extension members, so you can add custom fields like retryable and error_code without breaking the standard.
How should AI agents decide whether to retry an error?
Agents should check the retryable field first. If true, use the retry_after_seconds value as the delay. If the field is absent, use HTTP status code heuristics: 429 (rate limit) and 503 (service unavailable) are generally retryable; 400, 401, 403, 404, and 422 are not. Always cap retries with a maximum attempt count and total timeout to prevent infinite retry loops.
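The decision order described here can be sketched as a single pure function. The name `retry_decision`, the `max_delay` cap, and the exponential fallback when no delay hint is present are assumptions for illustration; the status sets come from the heuristics in the answer. A total timeout is left to the caller.

```python
# Status codes treated as retryable when the body carries no explicit hint.
RETRYABLE_STATUSES = {429, 503}


def retry_decision(status, problem, attempt, max_retries=3, max_delay=60.0):
    """Return (should_retry, delay_seconds) for a failed request.

    status   -- HTTP status code of the response
    problem  -- parsed Problem Details body (may be an empty dict)
    attempt  -- zero-based index of the attempt that just failed
    """
    # Cap the number of attempts first to rule out infinite loops.
    if attempt >= max_retries:
        return False, 0.0

    # Prefer the explicit field; fall back to status heuristics if absent.
    retryable = problem.get("retryable")
    if retryable is None:
        retryable = status in RETRYABLE_STATUSES
    if not retryable:
        return False, 0.0

    # Prefer the server's hint; otherwise back off exponentially.
    delay = problem.get("retry_after_seconds")
    if delay is None:
        delay = 2 ** attempt
    return True, min(float(delay), max_delay)
```

Keeping the decision free of I/O makes it easy to unit test separately from the HTTP client that performs the sleep and the retry.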
Should I include stack traces in error responses?
Never in production. Stack traces expose internal implementation details, file paths, library versions, and potentially sensitive data. Log the full stack trace server-side with a correlation ID, and include that correlation ID in the instance field of the Problem Details response so your support team can locate the relevant logs.
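One way to wire this up, sketched with the standard library only: generate a correlation ID, log the traceback under that ID, and return a body that carries the ID but none of the exception details. The function name `internal_error_problem`, the logger name, and the `urn:uuid:` form of the instance value are assumptions for illustration.

```python
import logging
import uuid

logger = logging.getLogger("api.errors")  # hypothetical logger name


def internal_error_problem(exc: BaseException) -> dict:
    """Log exc with a fresh correlation ID and build a safe 500 body."""
    correlation_id = str(uuid.uuid4())

    # The full traceback stays server-side, tagged with the correlation ID.
    logger.error(
        "Unhandled exception [correlation_id=%s]",
        correlation_id,
        exc_info=exc,
    )

    # The client sees only generic text plus the ID to quote to support.
    return {
        "type": "https://api.example.com/errors/internal-error",
        "title": "Internal Server Error",
        "status": 500,
        "detail": "An unexpected error occurred. Please retry or contact support.",
        "instance": f"urn:uuid:{correlation_id}",
        "error_code": "internal.error",
    }
```

Support staff can then search the logs for the ID taken from the instance field of a reported error.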
CallSphere Team