Cost Tracking for AI Agents: Per-User, Per-Feature Token Usage Analytics
Build a complete cost tracking system for AI agents that attributes token usage to individual users and features, sets budget alerts, and provides dashboards for controlling LLM spend in production.
Why Cost Tracking Is Critical for Production Agents
LLM costs scale with usage in ways that are easy to underestimate. A single GPT-4o call might cost fractions of a cent, but an agent that makes three LLM calls per user message — one for routing, one for the specialist, one for summarization — multiplied by thousands of daily users creates a bill that grows faster than most teams expect. Without per-user, per-feature cost attribution, you cannot answer basic questions: Which users drive the most cost? Which agent features are expensive relative to their value? Are costs growing faster than revenue?
A cost tracking system captures token usage at the call level, attributes it to users and features, stores it for analysis, and alerts when budgets are at risk.
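To make that scaling concrete, here is a quick back-of-envelope estimate. The call counts, token sizes, and prices below are illustrative assumptions, not your actual workload:

```python
# Illustrative estimate: 3 LLM calls per user message at GPT-4o-class pricing.
CALLS_PER_MESSAGE = 3
PROMPT_TOKENS, COMPLETION_TOKENS = 1_500, 300        # per call, assumed
PRICE_PROMPT, PRICE_COMPLETION = 0.0000025, 0.00001  # USD per token

cost_per_message = CALLS_PER_MESSAGE * (
    PROMPT_TOKENS * PRICE_PROMPT + COMPLETION_TOKENS * PRICE_COMPLETION
)
# 5,000 daily users x 10 messages/day x 30 days
monthly_cost = cost_per_message * 5_000 * 10 * 30
print(f"${cost_per_message:.4f}/message, ~${monthly_cost:,.0f}/month")
```

Roughly two cents per message looks harmless, but under these assumptions it compounds to about $30,000 a month, which is why per-user and per-feature attribution matters.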
The Token Usage Data Model
Start with a database table that records every LLM call with enough context for flexible analysis.
# SQLAlchemy model for token usage tracking
from sqlalchemy import Column, String, Integer, Float, DateTime, Index
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base
from datetime import datetime
import uuid

Base = declarative_base()

class TokenUsage(Base):
    __tablename__ = "token_usage"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    timestamp = Column(DateTime, nullable=False, default=datetime.utcnow)
    user_id = Column(String, nullable=False, index=True)
    conversation_id = Column(String, nullable=False, index=True)
    agent_name = Column(String, nullable=False)
    feature = Column(String, nullable=False)  # e.g., "routing", "support", "summarization"
    model = Column(String, nullable=False)
    prompt_tokens = Column(Integer, nullable=False)
    completion_tokens = Column(Integer, nullable=False)
    total_tokens = Column(Integer, nullable=False)
    cost_usd = Column(Float, nullable=False)

    __table_args__ = (
        Index("idx_usage_user_timestamp", "user_id", "timestamp"),
        Index("idx_usage_feature_timestamp", "feature", "timestamp"),
    )
Recording Token Usage from LLM Calls
Wrap your LLM client to automatically record usage after every call. Maintain a pricing table that maps models to per-token costs.
MODEL_PRICING = {
    # model: (cost_per_prompt_token, cost_per_completion_token), in USD
    "gpt-4o": (0.0000025, 0.00001),
    "gpt-4o-mini": (0.00000015, 0.0000006),
    "claude-sonnet-4-20250514": (0.000003, 0.000015),
    "claude-3-5-haiku-20241022": (0.0000008, 0.000004),
}

def calculate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    # Unknown models fall back to mid-tier pricing so costs are never recorded as zero
    pricing = MODEL_PRICING.get(model, (0.000003, 0.000015))
    return (prompt_tokens * pricing[0]) + (completion_tokens * pricing[1])
async def tracked_llm_call(
    model: str,
    messages: list,
    user_id: str,
    conversation_id: str,
    feature: str,
    agent_name: str,
    db_session,
):
    # llm_client is your provider SDK client (e.g., an AsyncOpenAI instance)
    response = await llm_client.chat.completions.create(
        model=model, messages=messages
    )
    usage = response.usage
    cost = calculate_cost(model, usage.prompt_tokens, usage.completion_tokens)
    record = TokenUsage(
        user_id=user_id,
        conversation_id=conversation_id,
        agent_name=agent_name,
        feature=feature,
        model=model,
        prompt_tokens=usage.prompt_tokens,
        completion_tokens=usage.completion_tokens,
        total_tokens=usage.total_tokens,
        cost_usd=cost,
    )
    db_session.add(record)
    await db_session.commit()
    return response
Building Usage Analytics Queries
With usage data in PostgreSQL, you can answer the key cost questions with straightforward SQL.
from sqlalchemy import text
from datetime import datetime, timedelta

async def get_daily_cost_by_feature(db_session, days: int = 30):
    """Cost per feature per day for the last N days."""
    cutoff = datetime.utcnow() - timedelta(days=days)
    result = await db_session.execute(
        text("""
            SELECT
                date_trunc('day', timestamp) AS day,
                feature,
                SUM(cost_usd) AS total_cost,
                SUM(total_tokens) AS total_tokens,
                COUNT(*) AS call_count
            FROM token_usage
            WHERE timestamp >= :cutoff
            GROUP BY day, feature
            ORDER BY day DESC, total_cost DESC
        """),
        {"cutoff": cutoff},
    )
    return result.fetchall()

async def get_top_users_by_cost(db_session, limit: int = 20):
    """Top N users by total LLM cost in the current month."""
    month_start = datetime.utcnow().replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    result = await db_session.execute(
        text("""
            SELECT
                user_id,
                SUM(cost_usd) AS total_cost,
                SUM(total_tokens) AS total_tokens,
                COUNT(DISTINCT conversation_id) AS conversations
            FROM token_usage
            WHERE timestamp >= :month_start
            GROUP BY user_id
            ORDER BY total_cost DESC
            LIMIT :limit
        """),
        {"month_start": month_start, "limit": limit},
    )
    return result.fetchall()
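A dashboard usually wants these rows pivoted into per-feature time series rather than a flat list. A small helper for that (hypothetical; it assumes rows shaped like the daily-by-feature query output) could look like:

```python
from collections import defaultdict
from datetime import date

def pivot_daily_costs(rows):
    """Pivot (day, feature, cost, tokens, calls) rows into per-feature series."""
    series = defaultdict(dict)
    for day, feature, total_cost, total_tokens, call_count in rows:
        series[feature][day.isoformat()] = round(total_cost, 4)
    return dict(series)

# Sample rows standing in for the query result
rows = [
    (date(2025, 1, 2), "support", 4.5, 1_800_000, 900),
    (date(2025, 1, 2), "routing", 1.23456, 410_000, 2_700),
    (date(2025, 1, 1), "routing", 0.9, 300_000, 2_000),
]
print(pivot_daily_costs(rows))
```

Each feature becomes one chart series keyed by ISO date, which maps directly onto most charting libraries' input format.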
Budget Alerts
Check user and global budgets after every LLM call. When a threshold is exceeded, send alerts and optionally throttle the user.
MONTHLY_BUDGET_USD = 5000.0
PER_USER_DAILY_LIMIT_USD = 2.0

async def check_budgets(user_id: str, db_session):
    """Check both global and per-user budgets after each call."""
    # Check per-user daily spend
    today_start = datetime.utcnow().replace(hour=0, minute=0, second=0, microsecond=0)
    user_result = await db_session.execute(
        text("""
            SELECT COALESCE(SUM(cost_usd), 0)
            FROM token_usage
            WHERE user_id = :user_id AND timestamp >= :today_start
        """),
        {"user_id": user_id, "today_start": today_start},
    )
    user_daily_cost = user_result.scalar()
    if user_daily_cost >= PER_USER_DAILY_LIMIT_USD:
        await send_alert(
            severity="warning",
            message=f"User {user_id} exceeded daily limit: ${user_daily_cost:.2f}",
        )
        raise BudgetExceededError(f"Daily usage limit reached for user {user_id}")

    # Check global monthly spend
    month_start = datetime.utcnow().replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    global_result = await db_session.execute(
        text("SELECT COALESCE(SUM(cost_usd), 0) FROM token_usage WHERE timestamp >= :month_start"),
        {"month_start": month_start},
    )
    monthly_cost = global_result.scalar()
    if monthly_cost >= MONTHLY_BUDGET_USD * 0.8:
        await send_alert(
            severity="critical",
            message=f"Monthly budget 80% consumed: ${monthly_cost:.2f} / ${MONTHLY_BUDGET_USD:,.0f}",
        )
Exposing a Cost Dashboard API
Serve the analytics data through a FastAPI endpoint so your dashboard frontend can display it.
from fastapi import APIRouter, Depends

router = APIRouter(prefix="/api/costs")

@router.get("/daily-by-feature")
async def daily_costs(days: int = 30, db=Depends(get_db)):
    # get_db yields an async DB session (defined elsewhere in your app)
    rows = await get_daily_cost_by_feature(db, days)
    return [
        {"day": str(r.day.date()), "feature": r.feature,
         "cost": round(r.total_cost, 4), "tokens": r.total_tokens}
        for r in rows
    ]

@router.get("/top-users")
async def top_users(limit: int = 20, db=Depends(get_db)):
    rows = await get_top_users_by_cost(db, limit)
    return [
        {"user_id": r.user_id, "cost": round(r.total_cost, 4),
         "tokens": r.total_tokens, "conversations": r.conversations}
        for r in rows
    ]
FAQ
How accurate is token-based cost tracking compared to the actual invoice?
Token-based tracking is typically within 2-5% of the actual invoice. Discrepancies come from retries that consume tokens before failing, cached completions that some providers discount, and rounding differences. Reconcile your tracked costs against the provider invoice monthly and adjust your pricing table if needed.
Should I track costs synchronously or asynchronously?
Use asynchronous recording. Write the usage record to a queue or background task so it does not add latency to the user response. A simple approach is to use asyncio.create_task() to fire the database write without awaiting it in the request path. For high-throughput systems, batch writes via a message queue like Redis Streams or Kafka.
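A minimal sketch of the create_task approach, where the record_usage coroutine stands in for the real database write:

```python
import asyncio

recorded = []

async def record_usage(record: dict) -> None:
    await asyncio.sleep(0)     # stands in for the async DB insert
    recorded.append(record)

async def handle_message() -> dict:
    response = {"text": "hi"}  # stands in for the LLM response
    # Fire-and-forget: the write happens off the request path. Keep a
    # reference so the task is not garbage-collected before it runs.
    task = asyncio.create_task(record_usage({"cost_usd": 0.002}))
    # The response could be returned immediately; we await here only so
    # this demo is deterministic.
    await task
    return response

asyncio.run(handle_message())
print(recorded)
```

In a real service you would collect these tasks in a set (discarding them on completion) rather than awaiting them, so a burst of requests never blocks on usage writes.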
How do I handle cost tracking when the agent retries a failed LLM call?
Track every attempt, including retries. Each attempt consumes tokens and incurs cost, even if the response is discarded. Add a retry_attempt field to your usage table so you can analyze retry rates and their cost impact separately from successful first-attempt calls.
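A sketch of attempt-level tracking follows; the flaky call and token counts are invented for illustration, and the point is simply that failed attempts get logged with their retry_attempt index too:

```python
import asyncio

async def flaky_llm_call(attempt: int):
    """Hypothetical call: the first attempt times out but still consumed tokens."""
    if attempt == 0:
        raise TimeoutError(800)  # 800 prompt tokens billed before the timeout
    return "ok", 1_200

async def call_with_retries(max_attempts: int = 3):
    usage_log = []
    for attempt in range(max_attempts):
        try:
            result, tokens = await flaky_llm_call(attempt)
            usage_log.append({"retry_attempt": attempt, "total_tokens": tokens})
            return result, usage_log
        except TimeoutError as exc:
            # The failed attempt still consumed tokens: record it anyway.
            usage_log.append({"retry_attempt": attempt, "total_tokens": exc.args[0]})
    raise RuntimeError("all attempts failed")

result, log = asyncio.run(call_with_retries())
print(result, log)
```

With retry_attempt in the usage table, a simple GROUP BY retry_attempt query shows what share of spend goes to retries.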
CallSphere Team
Expert insights on AI voice agents and customer communication automation.