Distributed Tracing Across AI Agent Microservices: Jaeger and OpenTelemetry
Implement distributed tracing across AI agent microservices using OpenTelemetry and Jaeger. Learn trace propagation, span design, context injection, and how to visualize end-to-end agent request flows.
Why Distributed Tracing Is Non-Negotiable for Agent Systems
When a user sends a message to an AI agent backed by microservices, the request flows through 4 to 8 services: the API gateway, conversation manager, RAG retrieval, tool execution, memory store, and possibly an LLM proxy. When the response takes 5 seconds instead of 1 second, which service is the bottleneck? Without distributed tracing, answering this question requires correlating logs from multiple services by timestamp — a fragile and time-consuming process.
Distributed tracing assigns a unique trace ID to each incoming request and propagates it through every service. Each service records spans — timed operations within the trace — that show exactly where time was spent.
Setting Up OpenTelemetry in Python
OpenTelemetry is the industry-standard framework for distributed tracing. Here is a reusable setup module for AI agent services:
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import (
    OTLPSpanExporter,
)
from opentelemetry.sdk.resources import Resource
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.propagate import set_global_textmap
from opentelemetry.propagators.b3 import B3MultiFormat
def setup_tracing(service_name: str, otlp_endpoint: str = "jaeger:4317"):
    resource = Resource.create({
        "service.name": service_name,
        "service.version": "2.1.0",
        "deployment.environment": "production",
    })
    provider = TracerProvider(resource=resource)
    exporter = OTLPSpanExporter(endpoint=otlp_endpoint, insecure=True)
    provider.add_span_processor(BatchSpanProcessor(exporter))
    trace.set_tracer_provider(provider)

    # Propagate trace context in B3 format (compatible with Jaeger)
    set_global_textmap(B3MultiFormat())

    # Auto-instrument outgoing HTTP calls
    HTTPXClientInstrumentor().instrument()

    return trace.get_tracer(service_name)
Integrate it into a FastAPI service:
from fastapi import FastAPI
from pydantic import BaseModel
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor


class RetrievalRequest(BaseModel):
    query: str
    top_k: int = 5


app = FastAPI(title="RAG Retrieval Service")
tracer = setup_tracing("rag-retrieval")

# Auto-instrument all FastAPI endpoints
FastAPIInstrumentor.instrument_app(app)


@app.post("/retrieve")
async def retrieve(request: RetrievalRequest):
    with tracer.start_as_current_span("retrieve_documents") as span:
        span.set_attribute("query.length", len(request.query))
        span.set_attribute("top_k", request.top_k)

        with tracer.start_as_current_span("generate_embedding"):
            embedding = await embedder.encode(request.query)

        with tracer.start_as_current_span("vector_search") as search_span:
            candidates = await vector_store.search(
                embedding, top_k=request.top_k * 3
            )
            search_span.set_attribute("candidates.count", len(candidates))

        with tracer.start_as_current_span("rerank") as rerank_span:
            reranked = await reranker.rerank(request.query, candidates)
            rerank_span.set_attribute("reranked.count", len(reranked))

        results = reranked[: request.top_k]
        span.set_attribute("results.count", len(results))
        return {"documents": results}
Trace Propagation Between Services
The critical piece is propagating trace context when one service calls another. The HTTPXClientInstrumentor from the setup module injects trace headers into instrumented clients automatically; when you build requests by hand, inject the context explicitly:
import httpx
from opentelemetry.propagate import inject


class TracedServiceClient:
    def __init__(self, base_url: str):
        self.base_url = base_url
        self.client = httpx.AsyncClient(timeout=15.0)

    async def call(self, path: str, payload: dict) -> dict:
        """Make an HTTP call with trace context propagated."""
        headers = {}
        inject(headers)  # Injects trace context into the headers
        resp = await self.client.post(
            f"{self.base_url}{path}",
            json=payload,
            headers=headers,
        )
        resp.raise_for_status()
        return resp.json()
When the receiving service extracts these headers (which the FastAPI auto-instrumentation does), it creates child spans under the same trace. The result is a complete picture: one trace showing the API gateway receiving the request, the conversation manager processing it, the RAG service retrieving context, and the LLM generating a response — all connected.
Designing Spans for Agent Workflows
Not every function call deserves a span. Create spans around operations that consume meaningful time or represent logical steps in the agent workflow:
async def handle_user_message(self, session_id: str, message: str):
    with tracer.start_as_current_span("handle_message") as root:
        root.set_attribute("session.id", session_id)

        with tracer.start_as_current_span("classify_intent") as cs:
            intent = await self.router.classify(message)
            cs.set_attribute("intent", intent.name)

        if intent.requires_tool:
            with tracer.start_as_current_span("execute_tool") as ts:
                ts.set_attribute("tool.name", intent.tool_name)
                result = await self.tool_client.call(
                    "/execute",
                    {"tool": intent.tool_name, "params": intent.params},
                )
                ts.set_attribute("tool.success", result["success"])

        with tracer.start_as_current_span("retrieve_context"):
            context_docs = await self.rag_client.call(
                "/retrieve", {"query": message, "top_k": 5}
            )

        with tracer.start_as_current_span("generate_response") as gs:
            response = await self.llm.generate(message, context_docs, intent)
            gs.set_attribute("tokens.used", response.tokens_used)
            gs.set_attribute("model", response.model)

        return response
Jaeger Deployment for Visualization
Deploy Jaeger alongside your agent services to visualize traces:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger
  namespace: agent-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jaeger
  template:
    metadata:
      labels:
        app: jaeger
    spec:
      containers:
        - name: jaeger
          image: jaegertracing/all-in-one:1.54
          ports:
            - containerPort: 16686  # UI
            - containerPort: 4317   # OTLP gRPC
          env:
            - name: COLLECTOR_OTLP_ENABLED
              value: "true"
---
apiVersion: v1
kind: Service
metadata:
  name: jaeger
  namespace: agent-system
spec:
  selector:
    app: jaeger
  ports:
    - name: ui
      port: 16686
    - name: otlp
      port: 4317
Open the Jaeger UI at port 16686 to search for traces by service name, operation, or duration. The waterfall view shows exactly how time is distributed across services for each request.
FAQ
How much overhead does distributed tracing add to request latency?
With the default BatchSpanProcessor, overhead is minimal — typically under 1ms per span. Spans are buffered in memory and exported in batches to the collector, so the export does not block the request path. The primary cost is memory for buffering spans. For high-throughput agent systems, configure the batch processor's max_queue_size and max_export_batch_size to control memory usage.
Should I trace LLM API calls to external providers like OpenAI?
Yes. Wrap your LLM client calls in spans to capture the latency of external API calls, which often dominate total request time. Record the model name, token count, and response latency as span attributes. Do not record the actual prompt or response content in span attributes — this can leak sensitive user data into your tracing backend.
How do I correlate traces with application logs?
Inject the trace ID into your structured log output. Most logging libraries support this through OpenTelemetry's log integration. Add a custom log formatter that includes trace_id and span_id in every log line. In Jaeger, you can then jump from a trace to the corresponding logs, and from a log entry to the containing trace.
CallSphere Team
Expert insights on AI voice agents and customer communication automation.