Python Logging for AI Applications: Structured Logs with structlog and loguru
Configure production-grade logging for AI applications using structlog and loguru with structured JSON output, context binding, correlation IDs, and cost-aware filtering.
Why Standard Logging Fails for AI Applications
AI agent applications produce complex, nested execution traces. A single user query might trigger five tool calls, three LLM completions, and two database lookups — each with different latencies, token counts, and costs. The standard Python logging module's flat string messages cannot capture this structure in a way that is queryable in production.
Structured logging emits JSON objects instead of formatted strings. Each log entry is a dictionary with typed fields that log aggregation systems like Datadog, Elasticsearch, and Loki can index and query. This transforms debugging from "grep through text files" to "query for all requests that exceeded 500 tokens and cost more than $0.05."
structlog: The Production Standard
structlog wraps Python's standard logging with a processor pipeline that builds structured context incrementally.
import logging

import structlog

# Configure once at application startup
structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.StackInfoRenderer(),
        structlog.processors.format_exc_info,
        structlog.processors.JSONRenderer(),
    ],
    wrapper_class=structlog.make_filtering_bound_logger(logging.INFO),
    context_class=dict,
    logger_factory=structlog.PrintLoggerFactory(),
)
log = structlog.get_logger()
# Structured fields are first-class
log.info("llm_completion", model="gpt-4o", tokens=1523, latency_ms=2100, cost=0.045)
# Output: {"event": "llm_completion", "model": "gpt-4o", "tokens": 1523,
# "latency_ms": 2100, "cost": 0.045, "level": "info",
# "timestamp": "2026-03-17T10:30:00Z"}
Context Binding for Agent Traces
The most powerful structlog feature for AI applications is context binding. Bind context to a logger once, and every entry emitted through that logger — including in functions you hand it to — carries those fields.
import structlog
from uuid import uuid4

async def handle_agent_request(user_id: str, query: str):
    request_id = str(uuid4())
    # Bind context that persists across all log calls in this scope
    log = structlog.get_logger().bind(
        request_id=request_id,
        user_id=user_id,
    )
    log.info("agent_request_started", query=query)

    # These logs automatically include request_id and user_id
    tools = await select_tools(query, log)
    log.info("tools_selected", tool_count=len(tools))

    result = await run_agent(query, tools, log)
    log.info("agent_request_completed", response_length=len(result))
    return result

async def select_tools(query: str, log):
    log.debug("tool_selection_started")
    # Output includes request_id and user_id because the bound logger was passed in
    return ["web_search", "calculator"]
loguru: Simple and Powerful
loguru takes a different approach — one global logger with a fluent API. It is excellent for smaller projects and prototyping.
from loguru import logger
import sys

# Remove default handler and add structured JSON output
logger.remove()
logger.add(
    sys.stdout,
    format="{message}",
    serialize=True,  # JSON output
    level="INFO",
)

# Add file rotation for production
logger.add(
    "logs/agent_{time}.log",
    rotation="100 MB",
    retention="7 days",
    compression="gz",
    serialize=True,
)

# Context binding with loguru
def process_tool_call(tool_name: str, args: dict):
    with logger.contextualize(tool=tool_name):
        logger.info("tool_call_started", arguments=args)
        result = execute_tool(tool_name, args)
        logger.info("tool_call_completed", result_length=len(str(result)))
        return result
Cost Tracking with Custom Processors
AI applications need cost observability. Build a custom structlog processor that calculates and attaches cost data.
import structlog

COST_PER_1K_TOKENS = {
    "gpt-4o": {"input": 0.0025, "output": 0.01},
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
    "claude-3-5-sonnet": {"input": 0.003, "output": 0.015},
}

def add_cost_estimate(logger, method_name, event_dict):
    model = event_dict.get("model")
    input_tokens = event_dict.get("input_tokens", 0)
    output_tokens = event_dict.get("output_tokens", 0)
    if model and model in COST_PER_1K_TOKENS:
        rates = COST_PER_1K_TOKENS[model]
        cost = (input_tokens / 1000 * rates["input"]) + (
            output_tokens / 1000 * rates["output"]
        )
        event_dict["estimated_cost_usd"] = round(cost, 6)
    return event_dict

# Add to processor chain
structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        add_cost_estimate,  # custom processor
        structlog.processors.JSONRenderer(),
    ],
)

log = structlog.get_logger()
log.info("llm_call", model="gpt-4o", input_tokens=500, output_tokens=1200)
# Automatically includes: "estimated_cost_usd": 0.01325
Filtering Sensitive Data
AI logs often contain user queries and model responses that may include PII. Filter these before they reach storage.
import re

SENSITIVE_PATTERNS = [
    (re.compile(r"sk-[a-zA-Z0-9]{20,}"), "sk-***REDACTED***"),
    (re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"), "***EMAIL***"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "***SSN***"),
]

def redact_sensitive(logger, method_name, event_dict):
    for key, value in event_dict.items():
        if isinstance(value, str):
            for pattern, replacement in SENSITIVE_PATTERNS:
                value = pattern.sub(replacement, value)
            event_dict[key] = value
    return event_dict
FAQ
Should I use structlog or loguru for production AI applications?
structlog is better for production systems because it integrates with Python's standard logging ecosystem, supports asyncio-safe context variables, and works well in multi-service architectures. loguru is better for single-service applications, scripts, and rapid prototyping where its simpler API saves setup time.
How do I correlate logs across multiple agent steps?
Generate a unique request ID at the entry point and bind it to the logger context. Every downstream function receives the bound logger or uses structlog's contextvars integration, which automatically propagates context across async boundaries. This lets you filter all logs for a single agent execution in your log aggregation tool.
How much logging is too much in an AI application?
Log every LLM call with model, tokens, and latency at INFO level. Log tool calls and their results at INFO. Log internal decision-making at DEBUG. Never log full prompt contents at INFO in production — they consume storage rapidly and may contain sensitive data. Use DEBUG level for full prompt logging during development.
#Python #Logging #Observability #Structlog #AgenticAI #LearnAI #AIEngineering
CallSphere Team