Python Logging for AI Applications: Structured Logs with structlog and loguru
Configure production-grade logging for AI applications using structlog and loguru with structured JSON output, context binding, correlation IDs, and cost-aware filtering.
Why Standard Logging Fails for AI Applications
AI agent applications produce complex, nested execution traces. A single user query might trigger five tool calls, three LLM completions, and two database lookups — each with different latencies, token counts, and costs. The standard Python logging module's flat string messages cannot capture this structure in a way that is queryable in production.
Structured logging emits JSON objects instead of formatted strings. Each log entry is a dictionary with typed fields that log aggregation systems like Datadog, Elasticsearch, and Loki can index and query. This transforms debugging from "grep through text files" to "query for all requests that exceeded 500 tokens and cost more than $0.05."
structlog: The Production Standard
structlog wraps Python's standard logging with a processor pipeline that builds structured context incrementally.
import logging

import structlog

# Configure once at application startup
structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.StackInfoRenderer(),
        structlog.processors.format_exc_info,
        structlog.processors.JSONRenderer(),
    ],
    wrapper_class=structlog.make_filtering_bound_logger(logging.INFO),
    context_class=dict,
    logger_factory=structlog.PrintLoggerFactory(),
)
log = structlog.get_logger()
# Structured fields are first-class
log.info("llm_completion", model="gpt-4o", tokens=1523, latency_ms=2100, cost=0.045)
# Output: {"event": "llm_completion", "model": "gpt-4o", "tokens": 1523,
# "latency_ms": 2100, "cost": 0.045, "level": "info",
# "timestamp": "2026-03-17T10:30:00Z"}
Context Binding for Agent Traces
The most powerful structlog feature for AI applications is context binding. Bind context to a logger once, and every entry emitted through that logger — including in functions you hand it to — carries those fields.
import structlog
from uuid import uuid4

async def handle_agent_request(user_id: str, query: str):
    request_id = str(uuid4())
    # Bind context that persists across all log calls in this scope
    log = structlog.get_logger().bind(
        request_id=request_id,
        user_id=user_id,
    )
    log.info("agent_request_started", query=query)

    # These logs automatically include request_id and user_id
    tools = await select_tools(query, log)
    log.info("tools_selected", tool_count=len(tools))

    result = await run_agent(query, tools, log)
    log.info("agent_request_completed", response_length=len(result))
    return result

async def select_tools(query: str, log):
    log.debug("tool_selection_started")
    # Output includes request_id and user_id because the bound logger was passed in
    return ["web_search", "calculator"]
loguru: Simple and Powerful
loguru takes a different approach — one global logger with a fluent API. It is excellent for smaller projects and prototyping.
from loguru import logger
import sys

# Remove default handler and add structured JSON output
logger.remove()
logger.add(
    sys.stdout,
    format="{message}",
    serialize=True,  # JSON output
    level="INFO",
)

# Add file rotation for production
logger.add(
    "logs/agent_{time}.log",
    rotation="100 MB",
    retention="7 days",
    compression="gz",
    serialize=True,
)

# Context binding with loguru
def process_tool_call(tool_name: str, args: dict):
    with logger.contextualize(tool=tool_name):
        logger.info("tool_call_started", arguments=args)
        result = execute_tool(tool_name, args)
        logger.info("tool_call_completed", result_length=len(str(result)))
        return result
Cost Tracking with Custom Processors
AI applications need cost observability. Build a custom structlog processor that calculates and attaches cost data.
import structlog

COST_PER_1K_TOKENS = {
    "gpt-4o": {"input": 0.0025, "output": 0.01},
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
    "claude-3-5-sonnet": {"input": 0.003, "output": 0.015},
}

def add_cost_estimate(logger, method_name, event_dict):
    model = event_dict.get("model")
    input_tokens = event_dict.get("input_tokens", 0)
    output_tokens = event_dict.get("output_tokens", 0)
    if model and model in COST_PER_1K_TOKENS:
        rates = COST_PER_1K_TOKENS[model]
        cost = (input_tokens / 1000 * rates["input"]) + (
            output_tokens / 1000 * rates["output"]
        )
        event_dict["estimated_cost_usd"] = round(cost, 6)
    return event_dict

# Add to processor chain
structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        add_cost_estimate,  # custom processor
        structlog.processors.JSONRenderer(),
    ],
)

log = structlog.get_logger()
log.info("llm_call", model="gpt-4o", input_tokens=500, output_tokens=1200)
# Automatically includes: "estimated_cost_usd": 0.01325
Filtering Sensitive Data
AI logs often contain user queries and model responses that may include PII. Filter these before they reach storage.
import re

SENSITIVE_PATTERNS = [
    (re.compile(r"sk-[a-zA-Z0-9]{20,}"), "sk-***REDACTED***"),
    (re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"), "***EMAIL***"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "***SSN***"),
]

def redact_sensitive(logger, method_name, event_dict):
    for key, value in event_dict.items():
        if isinstance(value, str):
            for pattern, replacement in SENSITIVE_PATTERNS:
                value = pattern.sub(replacement, value)
            event_dict[key] = value
    return event_dict
FAQ
Should I use structlog or loguru for production AI applications?
structlog is better for production systems because it integrates with Python's standard logging ecosystem, supports asyncio-safe context variables, and works well in multi-service architectures. loguru is better for single-service applications, scripts, and rapid prototyping where its simpler API saves setup time.
How do I correlate logs across multiple agent steps?
Generate a unique request ID at the entry point and bind it to the logger context. Every downstream function receives the bound logger or uses structlog's contextvars integration, which automatically propagates context across async boundaries. This lets you filter all logs for a single agent execution in your log aggregation tool.
How much logging is too much in an AI application?
Log every LLM call with model, tokens, and latency at INFO level. Log tool calls and their results at INFO. Log internal decision-making at DEBUG. Never log full prompt contents at INFO in production — they consume storage rapidly and may contain sensitive data. Use DEBUG level for full prompt logging during development.
#Python #Logging #Observability #Structlog #AgenticAI #LearnAI #AIEngineering
CallSphere Team