Distributed Tracing Across AI Agent Microservices: Jaeger and OpenTelemetry
Implement distributed tracing across AI agent microservices using OpenTelemetry and Jaeger. Learn trace propagation, span design, context injection, and how to visualize end-to-end agent request flows.
Why Distributed Tracing Is Non-Negotiable for Agent Systems
When a user sends a message to an AI agent backed by microservices, the request flows through 4 to 8 services: the API gateway, conversation manager, RAG retrieval, tool execution, memory store, and possibly an LLM proxy. When the response takes 5 seconds instead of 1 second, which service is the bottleneck? Without distributed tracing, answering this question requires correlating logs from multiple services by timestamp — a fragile and time-consuming process.
Distributed tracing assigns a unique trace ID to each incoming request and propagates it through every service. Each service records spans — timed operations within the trace — that show exactly where time was spent.
Setting Up OpenTelemetry in Python
OpenTelemetry is the industry-standard framework for distributed tracing. Here is a reusable setup module for AI agent services:
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import (
    OTLPSpanExporter,
)
from opentelemetry.sdk.resources import Resource
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.propagate import set_global_textmap
from opentelemetry.propagators.b3 import B3MultiFormat
def setup_tracing(service_name: str, otlp_endpoint: str = "jaeger:4317"):
    resource = Resource.create({
        "service.name": service_name,
        "service.version": "2.1.0",
        "deployment.environment": "production",
    })
    provider = TracerProvider(resource=resource)
    exporter = OTLPSpanExporter(endpoint=otlp_endpoint, insecure=True)
    provider.add_span_processor(BatchSpanProcessor(exporter))
    trace.set_tracer_provider(provider)

    # Propagate trace context in B3 format (compatible with Jaeger)
    set_global_textmap(B3MultiFormat())

    # Auto-instrument outgoing HTTP calls
    HTTPXClientInstrumentor().instrument()

    return trace.get_tracer(service_name)
Integrate it into a FastAPI service:
from fastapi import FastAPI
from pydantic import BaseModel
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor


class RetrievalRequest(BaseModel):
    query: str
    top_k: int = 5


app = FastAPI(title="RAG Retrieval Service")
tracer = setup_tracing("rag-retrieval")

# Auto-instrument all FastAPI endpoints
FastAPIInstrumentor.instrument_app(app)


@app.post("/retrieve")
async def retrieve(request: RetrievalRequest):
    with tracer.start_as_current_span("retrieve_documents") as span:
        span.set_attribute("query.length", len(request.query))
        span.set_attribute("top_k", request.top_k)

        with tracer.start_as_current_span("generate_embedding"):
            embedding = await embedder.encode(request.query)

        with tracer.start_as_current_span("vector_search") as search_span:
            candidates = await vector_store.search(
                embedding, top_k=request.top_k * 3
            )
            search_span.set_attribute("candidates.count", len(candidates))

        with tracer.start_as_current_span("rerank") as rerank_span:
            reranked = await reranker.rerank(request.query, candidates)
            rerank_span.set_attribute("reranked.count", len(reranked))

        results = reranked[: request.top_k]
        span.set_attribute("results.count", len(results))
        return {"documents": results}
Trace Propagation Between Services
The critical piece is propagating trace context when one service calls another. The HTTPXClientInstrumentor from the setup module injects trace headers into instrumented clients automatically; when you build requests by hand, inject the context explicitly:
import httpx
from opentelemetry.propagate import inject


class TracedServiceClient:
    def __init__(self, base_url: str):
        self.base_url = base_url
        self.client = httpx.AsyncClient(timeout=15.0)

    async def call(self, path: str, payload: dict) -> dict:
        """Make an HTTP call with trace context propagated."""
        headers = {}
        inject(headers)  # Injects trace context into the headers
        resp = await self.client.post(
            f"{self.base_url}{path}",
            json=payload,
            headers=headers,
        )
        resp.raise_for_status()
        return resp.json()
When the receiving service extracts these headers (which the FastAPI auto-instrumentation does), it creates child spans under the same trace. The result is a complete picture: one trace showing the API gateway receiving the request, the conversation manager processing it, the RAG service retrieving context, and the LLM generating a response — all connected.
Designing Spans for Agent Workflows
Not every function call deserves a span. Create spans around operations that consume meaningful time or represent logical steps in the agent workflow:
async def handle_user_message(self, session_id: str, message: str):
    with tracer.start_as_current_span("handle_message") as root:
        root.set_attribute("session.id", session_id)

        with tracer.start_as_current_span("classify_intent") as cs:
            intent = await self.router.classify(message)
            cs.set_attribute("intent", intent.name)

        if intent.requires_tool:
            with tracer.start_as_current_span("execute_tool") as ts:
                ts.set_attribute("tool.name", intent.tool_name)
                result = await self.tool_client.call(
                    "/execute",
                    {"tool": intent.tool_name, "params": intent.params},
                )
                ts.set_attribute("tool.success", result["success"])

        with tracer.start_as_current_span("retrieve_context"):
            context_docs = await self.rag_client.call(
                "/retrieve", {"query": message, "top_k": 5}
            )

        with tracer.start_as_current_span("generate_response") as gs:
            response = await self.llm.generate(message, context_docs, intent)
            gs.set_attribute("tokens.used", response.tokens_used)
            gs.set_attribute("model", response.model)

        return response
Jaeger Deployment for Visualization
Deploy Jaeger alongside your agent services to visualize traces:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger
  namespace: agent-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jaeger
  template:
    metadata:
      labels:
        app: jaeger
    spec:
      containers:
        - name: jaeger
          image: jaegertracing/all-in-one:1.54
          ports:
            - containerPort: 16686  # UI
            - containerPort: 4317   # OTLP gRPC
          env:
            - name: COLLECTOR_OTLP_ENABLED
              value: "true"
---
apiVersion: v1
kind: Service
metadata:
  name: jaeger
  namespace: agent-system
spec:
  selector:
    app: jaeger
  ports:
    - name: ui
      port: 16686
    - name: otlp
      port: 4317
Open the Jaeger UI at port 16686 to search for traces by service name, operation, or duration. The waterfall view shows exactly how time is distributed across services for each request.
FAQ
How much overhead does distributed tracing add to request latency?
With the default BatchSpanProcessor, overhead is minimal — typically under 1ms per span. Spans are buffered in memory and exported in batches to the collector, so the export does not block the request path. The primary cost is memory for buffering spans. For high-throughput agent systems, configure the batch processor's max_queue_size and max_export_batch_size to control memory usage.
Should I trace LLM API calls to external providers like OpenAI?
Yes. Wrap your LLM client calls in spans to capture the latency of external API calls, which often dominate total request time. Record the model name, token count, and response latency as span attributes. Do not record the actual prompt or response content in span attributes — this can leak sensitive user data into your tracing backend.
How do I correlate traces with application logs?
Inject the trace ID into your structured log output. Most logging libraries support this through OpenTelemetry's log integration. Add a custom log formatter that includes trace_id and span_id in every log line. In Jaeger, you can then jump from a trace to the corresponding logs, and from a log entry to the containing trace.
CallSphere Team
Expert insights on AI voice agents and customer communication automation.