API Error Design for AI Agent Services: Problem Details, Error Codes, and Retry Hints

Design machine-readable API error responses for AI agents using RFC 7807 Problem Details, structured error codes, and retry hints. Build error responses that agents can parse and act on programmatically.

Why Error Design Matters More for AI Agents

When a human encounters an API error, they read the message, understand the context, and decide what to do. An AI agent has none of that intuition. It needs structured, machine-readable error responses that tell it exactly what went wrong, whether to retry, and how long to wait. Poor error design turns every transient failure into a hard failure for autonomous agents.

A strong baseline for agent-facing API errors is RFC 7807 (Problem Details for HTTP APIs), augmented with agent-specific fields such as retry hints and an error taxonomy.

RFC 7807 Problem Details Format

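RFC 7807 (since superseded by RFC 9457, which keeps the same format) defines a standard JSON structure for API errors. It includes a type URI for machine identification, a human-readable title and detail, the HTTP status code, and an optional instance URI pointing to the specific occurrence. The spec also permits extension members, which is where the agent-specific fields below belong.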

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from fastapi.exceptions import RequestValidationError
from pydantic import BaseModel

app = FastAPI()

class ProblemDetail(BaseModel):
    type: str
    title: str
    status: int
    detail: str
    instance: str | None = None
    # Agent-specific extensions
    error_code: str | None = None
    retryable: bool = False
    retry_after_seconds: int | None = None

def problem_response(
    status: int,
    error_type: str,
    title: str,
    detail: str,
    error_code: str | None = None,
    retryable: bool = False,
    retry_after: int | None = None,
    instance: str | None = None,
) -> JSONResponse:
    body = ProblemDetail(
        type=f"https://api.example.com/errors/{error_type}",
        title=title,
        status=status,
        detail=detail,
        instance=instance,
        error_code=error_code,
        retryable=retryable,
        retry_after_seconds=retry_after,
    )
    headers = {}
    if retry_after is not None:
        headers["Retry-After"] = str(retry_after)

    return JSONResponse(
        status_code=status,
        content=body.model_dump(exclude_none=True),
        media_type="application/problem+json",
        headers=headers,
    )
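
To make the wire format concrete, here is a minimal stdlib-only sketch of the JSON body a rate-limit error might carry. The helper name build_problem_body is hypothetical and simply mirrors the field names of ProblemDetail above; it is not part of FastAPI or the RFC.

```python
import json

def build_problem_body(status, error_type, title, detail, **extensions):
    """Hypothetical helper: assemble a Problem Details dict, dropping None values."""
    body = {
        "type": f"https://api.example.com/errors/{error_type}",
        "title": title,
        "status": status,
        "detail": detail,
        **extensions,
    }
    # Mirror model_dump(exclude_none=True): omit members that carry no value
    return {key: value for key, value in body.items() if value is not None}

example_body = build_problem_body(
    status=429,
    error_type="rate-limit-exceeded",
    title="Rate Limit Exceeded",
    detail="You have exceeded 100 requests per minute.",
    instance=None,  # dropped from the output because it is None
    error_code="rate_limit.exceeded",
    retryable=True,
    retry_after_seconds=30,
)
print(json.dumps(example_body, indent=2))
```

An agent reading this body can branch on error_code and retryable without parsing the human-readable detail string.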

Error Taxonomy for AI Agent Services

Define a clear error taxonomy so agents can programmatically classify errors and decide on the appropriate recovery strategy.

class ErrorCodes:
    # Authentication & Authorization
    AUTH_TOKEN_EXPIRED = "auth.token_expired"
    AUTH_TOKEN_INVALID = "auth.token_invalid"
    AUTH_INSUFFICIENT_SCOPE = "auth.insufficient_scope"

    # Rate Limiting
    RATE_LIMIT_EXCEEDED = "rate_limit.exceeded"
    RATE_LIMIT_TOKENS = "rate_limit.token_budget_exceeded"

    # Model Errors
    MODEL_OVERLOADED = "model.overloaded"
    MODEL_NOT_FOUND = "model.not_found"
    MODEL_CONTEXT_LENGTH = "model.context_length_exceeded"

    # Validation
    VALIDATION_FAILED = "validation.failed"
    VALIDATION_CONTENT_FILTER = "validation.content_filter"

    # Resource Errors
    RESOURCE_NOT_FOUND = "resource.not_found"
    RESOURCE_CONFLICT = "resource.conflict"
    RESOURCE_QUOTA_EXCEEDED = "resource.quota_exceeded"

    # Internal
    INTERNAL_ERROR = "internal.error"
    INTERNAL_TIMEOUT = "internal.timeout"
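
One way an agent might use this taxonomy is to map the dotted prefix of each code to a recovery action. The sketch below assumes the code strings listed above; the action names (refresh_credentials, backoff_and_retry, fix_request, report_to_caller) and the specific policy choices are hypothetical and would need to match your own agent's capabilities.

```python
from typing import Optional

# Hypothetical default action per error category (the part before the dot).
RECOVERY_BY_CATEGORY = {
    "auth": "refresh_credentials",
    "rate_limit": "backoff_and_retry",
    "model": "backoff_and_retry",
    "validation": "fix_request",
    "resource": "report_to_caller",
    "internal": "backoff_and_retry",
}

# Exact codes whose handling differs from their category default.
RECOVERY_OVERRIDES = {
    "auth.token_invalid": "report_to_caller",
    "auth.insufficient_scope": "report_to_caller",
    "model.not_found": "fix_request",
    "model.context_length_exceeded": "fix_request",
}

def recovery_action(error_code: Optional[str]) -> str:
    """Pick a recovery action from an error code, defaulting to escalation."""
    if not error_code:
        return "report_to_caller"
    if error_code in RECOVERY_OVERRIDES:
        return RECOVERY_OVERRIDES[error_code]
    category = error_code.split(".", 1)[0]
    # Unknown categories are escalated rather than retried blindly
    return RECOVERY_BY_CATEGORY.get(category, "report_to_caller")
```

Keeping the category in the code prefix means new codes such as a future rate_limit.* variant inherit a sensible default without a client update.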

Applying the Error Pattern to Endpoints

Here is how these error responses look in practice across common failure scenarios in an AI agent API.

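# estimate_tokens, is_rate_limited, call_llm, and ModelOverloadedError are
# application-specific helpers assumed to be defined elsewhere.
@app.post("/v1/chat/completions")
async def chat_completions(request: dict):
    model = request.get("model")
    messages = request.get("messages", [])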

    if not model:
        return problem_response(
            status=422,
            error_type="validation-error",
            title="Validation Failed",
            detail="The 'model' field is required.",
            error_code=ErrorCodes.VALIDATION_FAILED,
        )

    token_count = estimate_tokens(messages)
    if token_count > 128000:
        return problem_response(
            status=400,
            error_type="context-length-exceeded",
            title="Context Length Exceeded",
            detail=(
                f"Request contains {token_count} tokens, "
                f"exceeding the model maximum of 128000."
            ),
            error_code=ErrorCodes.MODEL_CONTEXT_LENGTH,
        )

    if is_rate_limited(request):
        return problem_response(
            status=429,
            error_type="rate-limit-exceeded",
            title="Rate Limit Exceeded",
            detail="You have exceeded 100 requests per minute.",
            error_code=ErrorCodes.RATE_LIMIT_EXCEEDED,
            retryable=True,
            retry_after=30,
        )

    try:
        result = await call_llm(model, messages)
        return result
    except ModelOverloadedError:
        return problem_response(
            status=503,
            error_type="model-overloaded",
            title="Model Overloaded",
            detail="The model is currently at capacity. Please retry.",
            error_code=ErrorCodes.MODEL_OVERLOADED,
            retryable=True,
            retry_after=5,
        )

Global Exception Handlers

Register global exception handlers to ensure every error follows the Problem Details format, even unhandled exceptions.

@app.exception_handler(RequestValidationError)
async def validation_exception_handler(
    request: Request, exc: RequestValidationError
):
    errors = exc.errors()
    detail_parts = []
    for err in errors:
        field = " -> ".join(str(loc) for loc in err["loc"])
        detail_parts.append(f"{field}: {err['msg']}")

    return problem_response(
        status=422,
        error_type="validation-error",
        title="Request Validation Failed",
        detail="; ".join(detail_parts),
        error_code=ErrorCodes.VALIDATION_FAILED,
    )

import logging

logger = logging.getLogger(__name__)

@app.exception_handler(Exception)
async def generic_exception_handler(request: Request, exc: Exception):
    # Log the full traceback server-side; never return it to the client
    logger.error("Unhandled exception", exc_info=exc)

    return problem_response(
        status=500,
        error_type="internal-error",
        title="Internal Server Error",
        detail="An unexpected error occurred. Please retry or contact support.",
        error_code=ErrorCodes.INTERNAL_ERROR,
        retryable=True,
        retry_after=10,
    )

Client-Side Error Handling for Agents

On the agent side, the structured error format enables intelligent retry logic.

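import asyncio

import httpx

class AgentAPIError(Exception):
    """Raised when the API returns a non-retryable error or retries run out."""

    def __init__(self, code: str | None, detail: str | None, status: int):
        super().__init__(f"{status} {code}: {detail}")
        self.code = code
        self.detail = detail
        self.status = status

async def call_with_retry(url: str, body: dict, max_retries: int = 3):
    # Reuse one client (and its connection pool) across all attempts
    async with httpx.AsyncClient() as client:
        for attempt in range(max_retries + 1):
            response = await client.post(url, json=body)

            if response.status_code < 400:
                return response.json()

            error = response.json()
            retryable = error.get("retryable", False)
            # Prefer the server's hint; otherwise back off exponentially
            retry_after = error.get("retry_after_seconds", 2 ** attempt)

            if not retryable or attempt == max_retries:
                raise AgentAPIError(
                    code=error.get("error_code"),
                    detail=error.get("detail"),
                    status=response.status_code,
                )

            await asyncio.sleep(retry_after)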

FAQ

Why use RFC 7807 instead of a custom error format?

RFC 7807 is an IETF standard, and a number of HTTP frameworks and API tools already recognize it. Adopting it means clients can reuse a known error shape instead of learning a bespoke one for each service. The application/problem+json media type signals to clients that the response follows that structure. You can extend it with custom fields like retryable and error_code without breaking the standard, because the spec explicitly allows extension members.

How should AI agents decide whether to retry an error?

Agents should check the retryable field first. If true, use the retry_after_seconds value as the delay. If the field is absent, use HTTP status code heuristics: 429 (rate limit) and 503 (service unavailable) are generally retryable; 400, 401, 403, 404, and 422 are not. Always cap retries with a maximum attempt count and total timeout to prevent infinite retry loops.
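
The decision procedure described here can be sketched as three small functions. This is a minimal illustration, assuming the retryable and retry_after_seconds field names used in this article; the function names, the 60-second delay cap, and the budget parameters are hypothetical choices.

```python
import time
from typing import Optional

# Fallback heuristic, used only when the body has no "retryable" field.
RETRYABLE_STATUSES = {429, 503}

def should_retry(status: int, body: dict) -> bool:
    """Prefer the explicit retryable flag; fall back to status code heuristics."""
    if "retryable" in body:
        return bool(body["retryable"])
    return status in RETRYABLE_STATUSES

def next_delay(body: dict, attempt: int, max_delay: float = 60.0) -> float:
    """Use the server's hint if present, else exponential backoff, capped."""
    hint = body.get("retry_after_seconds")
    delay = float(hint) if hint is not None else float(2 ** attempt)
    return min(delay, max_delay)

def retry_budget_left(
    attempt: int,
    max_attempts: int,
    started_at: float,
    total_timeout: float,
    delay: float,
    now: Optional[float] = None,
) -> bool:
    """True only if one more attempt fits both the attempt cap and the time cap."""
    now = time.monotonic() if now is None else now
    within_attempts = attempt + 1 < max_attempts
    within_time = (now - started_at) + delay <= total_timeout
    return within_attempts and within_time
```

A retry loop would call should_retry first, then next_delay, and sleep only if retry_budget_left returns True; otherwise it surfaces the error to the caller.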

Should I include stack traces in error responses?

Never in production. Stack traces expose internal implementation details, file paths, library versions, and potentially sensitive data. Log the full stack trace server-side with a correlation ID, and include that correlation ID in the instance field of the Problem Details response so your support team can locate the relevant logs.
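
As a minimal sketch of that pattern, the helper below logs the traceback under a fresh correlation ID and returns a URI suitable for the instance field. The function name, the logger name, and the choice of a urn:uuid: URI are assumptions for illustration; any stable identifier your logging pipeline can search on would work.

```python
import logging
import uuid

logger = logging.getLogger("agent_api")  # hypothetical logger name

def log_and_build_instance(exc: BaseException) -> str:
    """Log the traceback under a new correlation ID and return it as a URN."""
    correlation_id = str(uuid.uuid4())
    # The traceback stays in server logs; only the ID is sent to the client
    logger.error(
        "Unhandled exception [correlation_id=%s]", correlation_id, exc_info=exc
    )
    return f"urn:uuid:{correlation_id}"

# Usage: capture an exception and derive the value for the instance field
try:
    raise ValueError("simulated failure")
except ValueError as err:
    instance = log_and_build_instance(err)
```

The returned value can be passed as the instance argument of a problem response, so a support engineer can search the logs for the same ID the agent reports.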



CallSphere Team
