Observability for AI Voice Agents: Distributed Tracing, Metrics, and Logs
A complete observability stack for AI voice agents — distributed tracing across STT/LLM/TTS, metrics, logs, and SLO dashboards.
The "it's slow sometimes" ticket
The worst voice-agent ticket you will ever get is "it's slow sometimes." Without proper observability you cannot tell if it was the carrier, the STT stage, the LLM first token, the tool call, or the TTS stream. With proper observability you can pull up one trace and see exactly which stage blew its budget.
This post walks through the observability stack CallSphere runs in production — distributed traces, RED metrics, structured logs, and SLO dashboards that fire alerts before customers notice.
per-call trace
│
├── span: network_in
├── span: stt
├── span: llm_first_token
├── span: tool_call (repeated)
├── span: tts_first_frame
└── span: network_out
Architecture overview
┌─────────────┐ OTLP ┌─────────────┐
│ Voice edge │────────► │ Collector │
└─────────────┘ └──────┬──────┘
│
┌──────────────────┼──────────────────┐
▼ ▼ ▼
┌───────────┐ ┌───────────┐ ┌───────────┐
│ Traces │ │ Metrics │ │ Logs │
│ (Tempo) │ │ (Prom) │ │ (Loki) │
└───────────┘ └───────────┘ └───────────┘
│
▼
┌───────────┐
│ Grafana │
│ + alerts │
└───────────┘
Prerequisites
- OpenTelemetry SDK in your edge service.
- A collector (OTel Collector).
- Storage backends: Tempo/Jaeger for traces, Prometheus for metrics, Loki for logs.
- Grafana for dashboards.
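A minimal collector pipeline matching the diagram might look like the following. This is a sketch: the Loki exporter ships in the collector-contrib distribution, and all endpoints here are assumptions you would replace with your own.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
  prometheus:
    endpoint: 0.0.0.0:8889
  loki:
    endpoint: http://loki:3100/loki/api/v1/push

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      exporters: [loki]
```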
Step-by-step walkthrough
1. Instrument spans per stage
import time

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="collector:4317", insecure=True))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("voice-edge")

async def handle_turn(audio):
    with tracer.start_as_current_span("turn") as span:
        span.set_attribute("call_id", current_call_id())

        with tracer.start_as_current_span("stt") as s:
            text = await stt(audio)
            s.set_attribute("stt.chars", len(text))

        with tracer.start_as_current_span("llm") as s:
            llm_started = time.time()
            first_token_at = None
            async for token in llm_stream(text):
                if first_token_at is None:
                    first_token_at = time.time()
                    # Span start_time is in nanoseconds since the epoch, so
                    # measure first-token latency from our own wall-clock anchor.
                    s.set_attribute("llm.first_token_ms", (first_token_at - llm_started) * 1000)
2. Use the Call SID as the trace ID
The carrier Call SID is the one ID that everyone — ops, support, legal — agrees on. Derive the trace ID from it so you can paste a Call SID into Grafana and get the whole pipeline.
import hashlib

from opentelemetry.trace import SpanContext, TraceFlags

def trace_id_from_call_sid(sid: str) -> int:
    # Deterministic 128-bit trace ID derived from the carrier Call SID.
    return int.from_bytes(hashlib.sha256(sid.encode()).digest()[:16], "big")
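Because the derivation is deterministic, the same SID always maps to the same 128-bit ID; formatted as 32 hex characters it is exactly the string you paste into Tempo or Jaeger search. A self-contained sketch (the helper is repeated so this snippet runs on its own):

```python
import hashlib

def trace_id_from_call_sid(sid: str) -> int:
    # Deterministic 128-bit trace ID from the carrier Call SID.
    return int.from_bytes(hashlib.sha256(sid.encode()).digest()[:16], "big")

def trace_id_hex(sid: str) -> str:
    # Tempo/Jaeger expect the 128-bit trace ID as 32 lowercase hex characters.
    return format(trace_id_from_call_sid(sid), "032x")

tid = trace_id_hex("CA123")
```

Support can now turn any Call SID into a trace URL without a lookup table.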
3. Emit RED metrics
Rate, Errors, Duration — for every stage.
from prometheus_client import Counter, Histogram
STT_LAT = Histogram("stt_duration_seconds", "STT stage duration", buckets=[0.05, 0.1, 0.2, 0.5, 1, 2])
LLM_FT = Histogram("llm_first_token_seconds", "LLM first-token latency", buckets=[0.1, 0.2, 0.3, 0.5, 1])
ERRORS = Counter("stage_errors_total", "Errors by stage", ["stage"])
4. Structured logs with trace context
import structlog
log = structlog.get_logger()
log.info("call_end", call_id=sid, trace_id=tid, outcome="resolved", duration_sec=184)
5. Define SLOs
- Turn latency p95 < 1.2s
- STT error rate < 0.5%
- LLM 5xx < 0.1%
- Carrier answer rate > 99%
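Each SLO implies an error budget, and burn rate is how fast you are spending it. A small worked example (pure arithmetic, not CallSphere code):

```python
def burn_rate(observed_error_ratio: float, error_budget: float) -> float:
    # How many times faster than "exactly on budget" we are burning.
    # 1.0 means the budget lasts the full SLO window; 4.0 means it
    # runs out in a quarter of the window.
    return observed_error_ratio / error_budget

# STT SLO: error rate < 0.5%, i.e. an error budget of 0.005.
# If the last hour shows 2% STT errors, we are burning 4x budget:
rate = burn_rate(0.02, 0.005)
```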
6. Build dashboards and burn-rate alerts
Use multi-window multi-burn-rate alerts so you catch fast and slow SLO burns before they become incidents.
groups:
- name: voice-slo
rules:
- alert: HighTurnLatency
expr: histogram_quantile(0.95, sum(rate(turn_duration_seconds_bucket[5m])) by (le)) > 1.2
for: 5m
labels: {severity: page}
annotations: {summary: "Turn p95 latency over 1.2s"}
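Extending the rule file above to an actual multi-window pair for the STT error SLO (99.5% success): `stt_requests_total` is an assumed counter name, and 14.4x / 6x are the conventional fast/slow burn factors from SRE practice.

```yaml
    - alert: STTErrorBudgetFastBurn
      # Page: burning 14.4x budget on both a 1h and a 5m window.
      expr: |
        (sum(rate(stage_errors_total{stage="stt"}[1h])) / sum(rate(stt_requests_total[1h]))) > (14.4 * 0.005)
        and
        (sum(rate(stage_errors_total{stage="stt"}[5m])) / sum(rate(stt_requests_total[5m]))) > (14.4 * 0.005)
      labels: {severity: page}
      annotations: {summary: "STT error budget: fast burn"}
    - alert: STTErrorBudgetSlowBurn
      # Ticket: burning 6x budget on both a 6h and a 30m window.
      expr: |
        (sum(rate(stage_errors_total{stage="stt"}[6h])) / sum(rate(stt_requests_total[6h]))) > (6 * 0.005)
        and
        (sum(rate(stage_errors_total{stage="stt"}[30m])) / sum(rate(stt_requests_total[30m]))) > (6 * 0.005)
      labels: {severity: ticket}
      annotations: {summary: "STT error budget: slow burn"}
```

The short window stops you from paging on a burst that already ended; the long window stops a brief dip from hiding a sustained burn.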
Production considerations
- Sampling: sample 100% of errors, 10% of successes to control cost.
- Cardinality: do not tag metrics with caller phone numbers.
- Log volume: audio is not a log. Keep transcripts in a dedicated store.
- Trace retention: 14 days is usually enough; longer for incident review.
- Privacy: redact PII in spans and logs.
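The sampling bullet can be implemented at the collector with the contrib `tail_sampling` processor, which sees the whole trace before deciding. A sketch (policy names are arbitrary):

```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: keep-all-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: sample-successes
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
```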
CallSphere's real implementation
CallSphere instruments its voice edge with OpenTelemetry and routes traces, metrics, and logs through a collector into Tempo, Prometheus, and Loki. Every call's Twilio SID is used as the trace root, so support tickets referencing a specific call SID pull up the full pipeline in one click. RED metrics exist for every stage of the STT → LLM → TTS pipeline powered by the OpenAI Realtime API (gpt-4o-realtime-preview-2025-06-03) at 24kHz PCM16 with server VAD.
Multi-window burn-rate alerts fire on turn latency, tool error rate, and guardrail rejection rate across all verticals — 14 healthcare tools, 10 real estate agents, 4 salon agents, 7 after-hours escalation tools, 10+ RAG IT helpdesk tools, and the 5-specialist ElevenLabs sales pod. A GPT-4o-mini post-call pipeline produces analytics that are also exported as metrics, so sentiment trends appear on the same dashboards as SRE metrics. CallSphere supports 57+ languages and keeps its sub-second end-to-end latency visible in Grafana at all times.
Common pitfalls
- Metrics without traces: you know something is wrong but not where.
- Unbounded label cardinality: Prometheus will fall over.
- Logs without trace IDs: you cannot correlate.
- Alerting on raw counts: you will page on random spikes.
- No SLO: you cannot tell the difference between a blip and a burn.
FAQ
Should I use OpenTelemetry or a vendor SDK?
OpenTelemetry. It decouples you from any single vendor.
Is Grafana enough or do I need Honeycomb / Lightstep?
Grafana is enough for most teams. Honeycomb shines for exploratory trace analysis.
How do I correlate a caller complaint to a trace?
Caller number → recent calls table → Call SID → trace.
Should audio frames be traced?
No. Trace at the event level, not the frame level.
Can I use trace IDs for billing reconciliation?
Yes — join trace IDs to your call log and carrier CDRs.
Next steps
Want full-stack observability on your voice agent? Book a demo, explore the technology page, or see pricing.
#CallSphere #Observability #OpenTelemetry #VoiceAI #SLO #Tracing #AIVoiceAgents
Written by
CallSphere Team
Expert insights on AI voice agents and customer communication automation.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.