Building Custom Agent Dashboards: Visualizing Conversations, Costs, and Latency
Build production-grade Grafana dashboards for AI agent systems that visualize conversation throughput, per-model costs, LLM latency percentiles, and tool usage patterns using Prometheus metrics.
The Key Metrics Every Agent Dashboard Needs
Generic application dashboards track request rate, error rate, and latency. Agent dashboards need those plus metrics unique to LLM workloads: token consumption, cost per conversation, tool call success rates, and conversation completion rates. Without these, you are flying blind on the dimensions that matter most for agent reliability and cost control.
The foundation is a metrics collection layer that captures these signals at the right granularity, and a visualization layer that makes patterns visible at a glance.
Exposing Prometheus Metrics from Your Agent
Use the prometheus_client library to define counters, histograms, and gauges that capture agent-specific signals.
from prometheus_client import Counter, Histogram, Gauge, start_http_server

# Conversation metrics
conversations_total = Counter(
    "agent_conversations_total",
    "Total conversations started",
    ["agent_name", "status"],
)

# LLM call metrics
llm_call_duration = Histogram(
    "agent_llm_call_duration_seconds",
    "LLM call latency in seconds",
    ["model", "agent_name"],
    buckets=[0.1, 0.25, 0.5, 1.0, 2.0, 5.0, 10.0, 30.0],
)

tokens_used = Counter(
    "agent_tokens_total",
    "Total tokens consumed",
    ["model", "token_type"],  # token_type: prompt or completion
)

# Tool metrics
tool_calls_total = Counter(
    "agent_tool_calls_total",
    "Total tool invocations",
    ["tool_name", "status"],
)

# Active conversations gauge
active_conversations = Gauge(
    "agent_active_conversations",
    "Currently active conversations",
    ["agent_name"],
)

# Start the metrics HTTP server (exposes /metrics) on port 9090
start_http_server(9090)
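Prometheus then needs a scrape job pointing at that endpoint. A minimal fragment might look like the following (the job name and target host are placeholders for your deployment):

```yaml
# prometheus.yml (fragment)
scrape_configs:
  - job_name: "agent-metrics"
    scrape_interval: 15s
    static_configs:
      - targets: ["agent-host:9090"]  # host running start_http_server(9090)
```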
Instrumenting the Agent Loop
Wrap the core agent operations to emit metrics on every call.
import time

async def instrumented_llm_call(model: str, messages: list, agent_name: str):
    start = time.perf_counter()
    try:
        response = await llm_client.chat.completions.create(
            model=model, messages=messages
        )
        duration = time.perf_counter() - start
        llm_call_duration.labels(model=model, agent_name=agent_name).observe(duration)
        tokens_used.labels(model=model, token_type="prompt").inc(
            response.usage.prompt_tokens
        )
        tokens_used.labels(model=model, token_type="completion").inc(
            response.usage.completion_tokens
        )
        return response
    except Exception:
        # Record latency for failed calls too, so error spikes stay visible
        duration = time.perf_counter() - start
        llm_call_duration.labels(model=model, agent_name=agent_name).observe(duration)
        raise
async def instrumented_tool_call(tool_name: str, arguments: dict):
    try:
        result = await execute_tool(tool_name, arguments)
        tool_calls_total.labels(tool_name=tool_name, status="success").inc()
        return result
    except Exception:
        tool_calls_total.labels(tool_name=tool_name, status="error").inc()
        raise

async def run_conversation(user_id: str, message: str, agent_name: str):
    active_conversations.labels(agent_name=agent_name).inc()
    try:
        result = await agent.run(message)
        conversations_total.labels(agent_name=agent_name, status="completed").inc()
        return result
    except Exception:
        conversations_total.labels(agent_name=agent_name, status="failed").inc()
        raise
    finally:
        # Always decrement, whether the conversation completed or failed
        active_conversations.labels(agent_name=agent_name).dec()
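The counter and gauge behavior above can be verified locally with prometheus_client alone, no agent or LLM required. This sketch uses stand-in metric names (demo_*) and REGISTRY.get_sample_value to read back current values:

```python
from prometheus_client import Counter, Gauge, REGISTRY

# Hypothetical stand-ins for the metrics defined earlier
demo_conversations = Counter(
    "demo_conversations_total", "Conversations", ["agent_name", "status"]
)
demo_active = Gauge("demo_active_conversations", "Active", ["agent_name"])

def demo_run(agent_name: str, ok: bool) -> None:
    # Mirrors run_conversation: inc the gauge, count the outcome, always dec
    demo_active.labels(agent_name=agent_name).inc()
    try:
        status = "completed" if ok else "failed"
        demo_conversations.labels(agent_name=agent_name, status=status).inc()
    finally:
        demo_active.labels(agent_name=agent_name).dec()

demo_run("support", ok=True)
demo_run("support", ok=False)
demo_run("support", ok=True)

completed = REGISTRY.get_sample_value(
    "demo_conversations_total", {"agent_name": "support", "status": "completed"}
)
active = REGISTRY.get_sample_value(
    "demo_active_conversations", {"agent_name": "support"}
)
print(completed, active)  # 2.0 0.0
```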
Building the Grafana Dashboard
Configure Prometheus as a Grafana data source, then create panels using PromQL queries for each KPI.
Conversation throughput — conversations per minute, grouped by agent (rate() yields per-second values, so multiply by 60):
sum by (agent_name) (rate(agent_conversations_total[5m])) * 60
LLM latency P95 — the 95th percentile response time by model:
histogram_quantile(0.95, sum by (le, model) (rate(agent_llm_call_duration_seconds_bucket[5m])))
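histogram_quantile estimates the percentile by finding the bucket containing the target rank and interpolating linearly within it. The idea can be reproduced in a few lines; the bucket bounds match the Histogram defined earlier, and the cumulative counts are made up for illustration:

```python
def bucket_quantile(q: float, uppers: list[float], cumulative: list[int]) -> float:
    # Prometheus-style estimate: find the first bucket whose cumulative count
    # reaches rank q * total, then interpolate linearly inside that bucket.
    total = cumulative[-1]
    rank = q * total
    lower, prev_count = 0.0, 0
    for upper, count in zip(uppers, cumulative):
        if count >= rank:
            return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)
        lower, prev_count = upper, count
    return uppers[-1]

uppers = [0.1, 0.25, 0.5, 1.0, 2.0, 5.0, 10.0, 30.0]
cumulative = [10, 30, 60, 80, 90, 96, 99, 100]  # made-up cumulative counts
p95 = bucket_quantile(0.95, uppers, cumulative)
print(p95)  # 4.5
```

With rank 95 falling in the (2, 5] bucket, the estimate lands at 4.5 seconds; this is a simplified sketch that omits Prometheus's handling of the +Inf bucket.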
Token burn rate — tokens per minute, split by prompt vs completion:
sum by (model, token_type) (rate(agent_tokens_total[5m])) * 60
Cost estimation panel — multiply token rates by per-token pricing using a recording rule or Grafana transformation. Because rate() is per second, the result is dollars per second; the GPT-4o prices below are examples, so verify them against your provider's current rates:
rate(agent_tokens_total{token_type="prompt", model="gpt-4o"}[5m]) * 0.0000025
+
rate(agent_tokens_total{token_type="completion", model="gpt-4o"}[5m]) * 0.00001
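The same arithmetic can be sanity-checked offline. A small helper (the pricing table is illustrative; check current provider rates) converts token counts into dollars:

```python
# Illustrative per-token prices in USD; verify against current provider rates
PRICING = {
    "gpt-4o": {"prompt": 0.0000025, "completion": 0.00001},
}

def conversation_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    # Cost = prompt tokens * prompt rate + completion tokens * completion rate
    rates = PRICING[model]
    return prompt_tokens * rates["prompt"] + completion_tokens * rates["completion"]

# 2,000 prompt tokens and 500 completion tokens on gpt-4o
cost = conversation_cost("gpt-4o", 2000, 500)
print(round(cost, 6))  # 0.01
```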
Tool error rate — the fraction of tool calls that fail, per tool (summing both sides keeps the label sets matched for division):
sum by (tool_name) (rate(agent_tool_calls_total{status="error"}[5m]))
/ sum by (tool_name) (rate(agent_tool_calls_total[5m]))
Setting Up Alerts
Define Prometheus alerting rules that fire when agent KPIs breach thresholds.
# prometheus-alerts.yaml
groups:
  - name: agent_alerts
    rules:
      - alert: HighLLMLatency
        expr: histogram_quantile(0.95, sum by (le) (rate(agent_llm_call_duration_seconds_bucket[5m]))) > 5
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: "LLM P95 latency exceeds 5 seconds"
      - alert: HighToolErrorRate
        expr: >
          sum by (tool_name) (rate(agent_tool_calls_total{status="error"}[10m]))
          / sum by (tool_name) (rate(agent_tool_calls_total[10m])) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Tool error rate above 10%"
FAQ
How many Prometheus labels should I use per metric?
Keep label cardinality low. Labels like model, agent_name, and status are fine because they have a small, bounded set of values. Never use labels with high cardinality like user_id or conversation_id — these will cause Prometheus memory and performance issues. Track per-user data in a separate analytics database instead.
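The series-count arithmetic behind that advice is easy to demonstrate: the number of time series per metric is roughly the product of each label's distinct value count, so one unbounded label dwarfs everything else. A quick sketch with hypothetical label sets:

```python
from math import prod

def series_count(label_values: dict[str, int]) -> int:
    # Approximate number of time series = product of distinct values per label
    return prod(label_values.values())

bounded = series_count({"model": 4, "agent_name": 10, "status": 3})
unbounded = series_count({"model": 4, "agent_name": 10, "user_id": 100_000})
print(bounded, unbounded)  # 120 4000000
```

Three bounded labels stay at 120 series; swapping one for user_id pushes the same metric to four million.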
Should I track metrics in the agent code or use a sidecar?
Instrument directly in the agent code for LLM-specific metrics like token counts and tool call results, because only the application has that context. Use a sidecar or service mesh for infrastructure metrics like HTTP request rate and network latency. The two approaches complement each other.
How do I estimate costs when using multiple models?
Create a pricing lookup that maps model names to per-token costs, then apply it as a Grafana transformation or Prometheus recording rule. Update the pricing table whenever your provider changes rates. Some teams store costs in a database and join with token metrics in Grafana for more flexibility.
#Dashboards #Grafana #Prometheus #Monitoring #AIAgents #AgenticAI #LearnAI #AIEngineering
CallSphere Team