
Debugging Production Agent Issues: Log Analysis, Trace Correlation, and Root Cause Identification

Build a production observability stack for AI agents with structured logging, distributed trace correlation, timeline reconstruction, and systematic root cause identification techniques.

Production Debugging Is a Different Game

Debugging an agent in development is straightforward — you can add print statements, step through code, and reproduce the issue on demand. Production debugging is fundamentally different. You cannot reproduce most issues because they depend on specific user inputs, timing, model randomness, and external service states that no longer exist.

Your only witness to what happened is your observability data: logs, traces, and metrics. If you did not capture the right data at the right granularity, the bug is unsolvable. Building an effective observability stack for AI agents requires planning for what will go wrong before it does.

Structured Logging for Agents

Unstructured log messages like "Processing request" are useless in production. Every log entry needs context — who, what, when, and how:

import json
import logging
import uuid
from contextvars import ContextVar
from datetime import datetime, timezone
from functools import wraps

# Conversation-scoped correlation ID
correlation_id: ContextVar[str] = ContextVar("correlation_id", default="")
agent_name: ContextVar[str] = ContextVar("agent_name", default="")

class AgentLogger:
    def __init__(self, name: str):
        self.logger = logging.getLogger(name)

    def _build_entry(self, event: str, **kwargs) -> dict:
        return {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "event": event,
            "correlation_id": correlation_id.get(),
            "agent": agent_name.get(),
            **kwargs,
        }

    def info(self, event: str, **kwargs):
        self.logger.info(json.dumps(self._build_entry(event, **kwargs)))

    def error(self, event: str, **kwargs):
        self.logger.error(json.dumps(self._build_entry(event, **kwargs)))

    def tool_call(self, tool_name: str, args: dict, result=None, error=None, duration_ms=0.0):
        self.info(
            "tool_call",
            tool=tool_name,
            arguments=args,
            # `is not None` so falsy results (0, "", []) still get a preview
            result_preview=str(result)[:200] if result is not None else None,
            error=str(error) if error is not None else None,
            duration_ms=round(duration_ms, 1),
        )

    def llm_call(self, model: str, prompt_tokens: int, completion_tokens: int, duration_ms: float):
        self.info(
            "llm_call",
            model=model,
            prompt_tokens=prompt_tokens,
            completion_tokens=completion_tokens,
            duration_ms=round(duration_ms, 1),
        )

log = AgentLogger("agent")
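Because both values live in ContextVars, every entry built inside the same conversation carries the same IDs automatically. A trimmed-down standalone sketch of the line shape `_build_entry` produces (the IDs and numbers are invented):

```python
import json
from contextvars import ContextVar

correlation_id: ContextVar[str] = ContextVar("correlation_id", default="")
agent_name: ContextVar[str] = ContextVar("agent_name", default="")

correlation_id.set("c0ffee12")
agent_name.set("support")

# Same field layout as AgentLogger._build_entry above
entry = {
    "event": "tool_call",
    "correlation_id": correlation_id.get(),
    "agent": agent_name.get(),
    "tool": "lookup_order",
    "duration_ms": 84.2,
}
print(json.dumps(entry))
# → {"event": "tool_call", "correlation_id": "c0ffee12", "agent": "support", "tool": "lookup_order", "duration_ms": 84.2}
```

One JSON object per line is the point: it stays greppable by hand and machine-parseable by your log pipeline at the same time.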

Implementing Trace Correlation

A single user conversation generates dozens of log entries across multiple agents and tools. Correlation IDs tie them together:

from contextlib import contextmanager

@contextmanager
def conversation_trace(conversation_id: str | None = None):
    cid = conversation_id or str(uuid.uuid4())
    token = correlation_id.set(cid)
    log.info("conversation_start", conversation_id=cid)
    try:
        yield cid
    except Exception as e:
        log.error("conversation_error", error=str(e), error_type=type(e).__name__)
        raise
    finally:
        log.info("conversation_end", conversation_id=cid)
        correlation_id.reset(token)

def trace_agent(func):
    @wraps(func)
    async def wrapper(*args, **kwargs):
        name = kwargs.get("agent_name", func.__name__)
        token = agent_name.set(name)
        log.info("agent_start", agent=name)
        try:
            result = await func(*args, **kwargs)
            log.info("agent_complete", agent=name)
            return result
        except Exception as e:
            log.error("agent_error", agent=name, error=str(e))
            raise
        finally:
            agent_name.reset(token)
    return wrapper

# Usage: the decorator reads agent_name from kwargs; if the caller does not
# pass it explicitly, it falls back to the function name
@trace_agent
async def handle_support_request(user_message: str, agent_name: str = "support"):
    # All logs inside this function include the correlation ID and agent name
    log.info("processing_message", message_length=len(user_message))
    # ... agent logic
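
Once every line carries the correlation ID, pulling a single conversation out of a mixed log file is a simple filter. A minimal sketch (the sample log lines are invented, matching the logger's field names above):

```python
import json

def lines_for_conversation(log_lines, cid):
    """Yield parsed entries belonging to one conversation."""
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines (stack traces, partial writes)
        if entry.get("correlation_id") == cid:
            yield entry

raw_log = [
    '{"event": "agent_start", "correlation_id": "abc", "agent": "support"}',
    'Traceback (most recent call last): ...',
    '{"event": "agent_start", "correlation_id": "xyz", "agent": "billing"}',
    '{"event": "agent_complete", "correlation_id": "abc", "agent": "support"}',
]
events = [e["event"] for e in lines_for_conversation(raw_log, "abc")]
# → ["agent_start", "agent_complete"]
```

In practice the same filter is usually a query in your log backend rather than Python, but the principle is identical: the correlation ID is the join key.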

Building a Timeline Reconstructor

When investigating an incident, you need to reconstruct the exact sequence of events from logs:


from datetime import datetime
from dataclasses import dataclass

@dataclass
class TimelineEvent:
    timestamp: datetime
    event: str
    agent: str
    correlation_id: str
    details: dict

class TimelineReconstructor:
    def __init__(self):
        self.events: list[TimelineEvent] = []

    def add_from_log_line(self, log_line: str):
        try:
            data = json.loads(log_line)
            event = TimelineEvent(
                timestamp=datetime.fromisoformat(data.get("timestamp", "")),
                event=data.get("event", "unknown"),
                agent=data.get("agent", ""),
                correlation_id=data.get("correlation_id", ""),
                details={
                    k: v for k, v in data.items()
                    if k not in ("timestamp", "event", "agent", "correlation_id")
                },
            )
            self.events.append(event)
        except (json.JSONDecodeError, ValueError):
            pass  # skip malformed lines (stack traces, partial writes)

    def reconstruct(self, correlation_id: str) -> list[TimelineEvent]:
        filtered = [e for e in self.events if e.correlation_id == correlation_id]
        return sorted(filtered, key=lambda e: e.timestamp)

    def print_timeline(self, events: list[TimelineEvent]):
        if not events:
            print("No events found")
            return
        base = events[0].timestamp
        for e in events:
            offset_ms = (e.timestamp - base).total_seconds() * 1000
            print(f"  +{offset_ms:8.0f}ms | [{e.agent:15s}] {e.event}")
            for k, v in e.details.items():
                print(f"             |   {k}: {v}")
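The offsets in `print_timeline` are milliseconds relative to the first event, which makes gaps and slow spans jump out visually. A standalone sketch of the same offset math (the timestamps are invented):

```python
from datetime import datetime

stamps = [
    ("2024-05-01T12:00:00.000", "conversation_start"),
    ("2024-05-01T12:00:00.450", "tool_call"),
    ("2024-05-01T12:00:02.100", "conversation_end"),
]
base = datetime.fromisoformat(stamps[0][0])
offsets = [
    (round((datetime.fromisoformat(ts) - base).total_seconds() * 1000), event)
    for ts, event in stamps
]
# → [(0, 'conversation_start'), (450, 'tool_call'), (2100, 'conversation_end')]
```

Here the 1,650 ms gap between `tool_call` and `conversation_end` is the kind of thing you investigate first: was the tool slow, or did the model take that long to respond?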

Alerting on Agent Anomalies

Set up alerts that catch problems before users report them:

class AgentAnomalyDetector:
    def __init__(self):
        self.baselines = {}

    def set_baseline(self, metric: str, p50: float, p99: float):
        self.baselines[metric] = {"p50": p50, "p99": p99}

    def check(self, metric: str, value: float) -> str | None:
        baseline = self.baselines.get(metric)
        if not baseline:
            return None
        if value > baseline["p99"] * 2:
            return f"CRITICAL: {metric}={value:.1f} (2x p99={baseline['p99']})"
        if value > baseline["p99"]:
            return f"WARNING: {metric}={value:.1f} (above p99={baseline['p99']})"
        return None

# Setup
detector = AgentAnomalyDetector()
detector.set_baseline("turn_count", p50=3, p99=12)
detector.set_baseline("total_tokens", p50=4000, p99=25000)
detector.set_baseline("latency_ms", p50=2000, p99=8000)

# Check after each conversation
alert = detector.check("turn_count", 18)
if alert:
    log.error("anomaly_detected", alert=alert)
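The p50/p99 baselines above have to come from somewhere. One lightweight option is deriving them from a window of recent historical values with the standard library; a sketch, assuming you already collect per-conversation metrics (the sample latencies are invented):

```python
import statistics

def compute_baseline(values: list[float]) -> dict:
    """Derive p50/p99 from a sample of historical metric values."""
    # n=100 yields 99 cut points: index 49 is p50, index 98 is p99
    qs = statistics.quantiles(values, n=100)
    return {"p50": qs[49], "p99": qs[98]}

latencies = [100.0 * i for i in range(1, 101)]  # 100 ms .. 10,000 ms
baseline = compute_baseline(latencies)
```

Recompute baselines on a schedule (daily or weekly) so they track gradual drift, and keep the window large enough that a single bad day does not become the new normal.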

FAQ

What log retention period should I use for agent conversations?

Keep detailed logs (full messages, tool calls, results) for 7 to 14 days for active debugging. Keep summarized logs (token counts, latency, error rates, correlation IDs) for 90 days for trend analysis. Archive full conversation logs for 30 days to support investigation of incidents that are reported after the fact.

How do I correlate agent logs with external service logs like database queries or API calls?

Pass the correlation ID as a header or parameter to every external call. For database queries, add it as a SQL comment. For HTTP calls, add it as an X-Correlation-ID header. This lets you join agent logs with infrastructure logs to build a complete picture of what happened during a request.
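
As a sketch, propagating the ID might look like this (the header name follows the common `X-Correlation-ID` convention; the outbound HTTP call is illustrative, and SQL comments are standard syntax):

```python
from contextvars import ContextVar

correlation_id: ContextVar[str] = ContextVar("correlation_id", default="")
correlation_id.set("c0ffee12")

# HTTP: attach the ID as a header on every outbound call
headers = {"X-Correlation-ID": correlation_id.get()}
# e.g. requests.get(url, headers=headers)  # hypothetical outbound call

# SQL: embed the ID as a comment so it surfaces in slow-query logs
query = f"/* cid={correlation_id.get()} */ SELECT * FROM orders WHERE id = %s"
```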

Should I log the full LLM prompt and response in production?

Log full prompts and responses for error cases and sampled successful cases (1 to 5 percent). Do not log everything — it generates enormous storage costs and may contain sensitive user data. Redact PII before logging and use a separate secure store for full conversation archives.
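
A minimal sketch of that policy: always keep error cases, sample a small fraction of successes, and redact obvious PII before anything is written (the regexes and the 2 percent rate are illustrative, not production-grade redaction):

```python
import random
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact(text: str) -> str:
    """Replace email addresses and phone numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

def should_log_full(is_error: bool, sample_rate: float = 0.02) -> bool:
    """Always log errors in full; sample ~2% of successful conversations."""
    return is_error or random.random() < sample_rate

msg = "Contact me at jane@example.com or +1 (555) 123-4567"
redacted = redact(msg)
# → "Contact me at [EMAIL] or [PHONE]"
```

For real deployments, prefer a dedicated PII detection library over hand-rolled regexes; the pattern above only illustrates where redaction sits in the pipeline.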


#Debugging #Observability #Production #Logging #AIAgents #AgenticAI #LearnAI #AIEngineering


CallSphere Team

Expert insights on AI voice agents and customer communication automation.
