
Cost Tracking for AI Agents: Per-User, Per-Feature Token Usage Analytics

Build a complete cost tracking system for AI agents that attributes token usage to individual users and features, sets budget alerts, and provides dashboards for controlling LLM spend in production.

Why Cost Tracking Is Critical for Production Agents

LLM costs scale with usage in ways that are easy to underestimate. A single GPT-4o call might cost fractions of a cent, but an agent that makes three LLM calls per user message — one for routing, one for the specialist, one for summarization — multiplied by thousands of daily users creates a bill that grows faster than most teams expect. Without per-user, per-feature cost attribution, you cannot answer basic questions: Which users drive the most cost? Which agent features are expensive relative to their value? Are costs growing faster than revenue?

A cost tracking system captures token usage at the call level, attributes it to users and features, stores it for analysis, and alerts when budgets are at risk.

The Token Usage Data Model

Start with a database table that records every LLM call with enough context for flexible analysis.

# SQLAlchemy model for token usage tracking
from sqlalchemy import Column, String, Integer, Float, DateTime, Index
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base
from datetime import datetime
import uuid

Base = declarative_base()

class TokenUsage(Base):
    __tablename__ = "token_usage"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    timestamp = Column(DateTime, nullable=False, default=datetime.utcnow)
    user_id = Column(String, nullable=False, index=True)
    conversation_id = Column(String, nullable=False, index=True)
    agent_name = Column(String, nullable=False)
    feature = Column(String, nullable=False)  # e.g., "routing", "support", "summarization"
    model = Column(String, nullable=False)
    prompt_tokens = Column(Integer, nullable=False)
    completion_tokens = Column(Integer, nullable=False)
    total_tokens = Column(Integer, nullable=False)
    cost_usd = Column(Float, nullable=False)

    __table_args__ = (
        Index("idx_usage_user_timestamp", "user_id", "timestamp"),
        Index("idx_usage_feature_timestamp", "feature", "timestamp"),
    )

Recording Token Usage from LLM Calls

Wrap your LLM client to automatically record usage after every call. Maintain a pricing table that maps models to per-token costs.

MODEL_PRICING = {
    # model: (cost_per_prompt_token, cost_per_completion_token)
    # Provider prices change over time; verify these against current pricing pages.
    "gpt-4o": (0.0000025, 0.00001),
    "gpt-4o-mini": (0.00000015, 0.0000006),
    "claude-sonnet-4-20250514": (0.000003, 0.000015),
    "claude-haiku-35": (0.0000008, 0.000004),
}

def calculate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    # Unknown models fall back to conservative default pricing so cost
    # is never silently recorded as zero.
    pricing = MODEL_PRICING.get(model, (0.000003, 0.000015))
    return (prompt_tokens * pricing[0]) + (completion_tokens * pricing[1])

async def tracked_llm_call(
    model: str,
    messages: list,
    user_id: str,
    conversation_id: str,
    feature: str,
    agent_name: str,
    db_session,
):
    # Assumes `llm_client` is an AsyncOpenAI-compatible client created elsewhere.
    response = await llm_client.chat.completions.create(
        model=model, messages=messages
    )

    usage = response.usage
    cost = calculate_cost(model, usage.prompt_tokens, usage.completion_tokens)

    record = TokenUsage(
        user_id=user_id,
        conversation_id=conversation_id,
        agent_name=agent_name,
        feature=feature,
        model=model,
        prompt_tokens=usage.prompt_tokens,
        completion_tokens=usage.completion_tokens,
        total_tokens=usage.total_tokens,
        cost_usd=cost,
    )
    db_session.add(record)
    await db_session.commit()

    return response
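As a quick sanity check on the pricing arithmetic, here is a self-contained version of `calculate_cost` (constants repeated from the table above) with the math worked out:

```python
# Self-contained sanity check of the cost arithmetic above.
MODEL_PRICING = {
    # model: (cost_per_prompt_token, cost_per_completion_token)
    "gpt-4o": (0.0000025, 0.00001),
}

def calculate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    pricing = MODEL_PRICING.get(model, (0.000003, 0.000015))
    return (prompt_tokens * pricing[0]) + (completion_tokens * pricing[1])

# 1,000 prompt tokens and 500 completion tokens on gpt-4o:
# 1000 * 0.0000025 + 500 * 0.00001 = 0.0025 + 0.005 = 0.0075
cost = calculate_cost("gpt-4o", 1000, 500)
print(f"${cost:.4f}")  # → $0.0075
```

At fractions of a cent per call, the cost only becomes visible in aggregate, which is exactly why every call needs to be recorded.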

Building Usage Analytics Queries

With usage data in PostgreSQL, you can answer the key cost questions with straightforward SQL.


from sqlalchemy import func, text
from datetime import datetime, timedelta

async def get_daily_cost_by_feature(db_session, days: int = 30):
    """Cost per feature per day for the last N days."""
    cutoff = datetime.utcnow() - timedelta(days=days)
    result = await db_session.execute(
        text("""
            SELECT
                date_trunc('day', timestamp) AS day,
                feature,
                SUM(cost_usd) AS total_cost,
                SUM(total_tokens) AS total_tokens,
                COUNT(*) AS call_count
            FROM token_usage
            WHERE timestamp >= :cutoff
            GROUP BY day, feature
            ORDER BY day DESC, total_cost DESC
        """),
        {"cutoff": cutoff},
    )
    return result.fetchall()

async def get_top_users_by_cost(db_session, limit: int = 20):
    """Top N users by total LLM cost in the current month."""
    month_start = datetime.utcnow().replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    result = await db_session.execute(
        text("""
            SELECT
                user_id,
                SUM(cost_usd) AS total_cost,
                SUM(total_tokens) AS total_tokens,
                COUNT(DISTINCT conversation_id) AS conversations
            FROM token_usage
            WHERE timestamp >= :month_start
            GROUP BY user_id
            ORDER BY total_cost DESC
            LIMIT :limit
        """),
        {"month_start": month_start, "limit": limit},
    )
    return result.fetchall()

Budget Alerts

Check user and global budgets after every LLM call. When a threshold is exceeded, send alerts and optionally throttle the user.

MONTHLY_BUDGET_USD = 5000.0
PER_USER_DAILY_LIMIT_USD = 2.0

async def check_budgets(user_id: str, db_session):
    """Check both global and per-user budgets after each call."""
    # Check per-user daily spend
    today_start = datetime.utcnow().replace(hour=0, minute=0, second=0, microsecond=0)
    user_result = await db_session.execute(
        text("""
            SELECT COALESCE(SUM(cost_usd), 0)
            FROM token_usage
            WHERE user_id = :user_id AND timestamp >= :today_start
        """),
        {"user_id": user_id, "today_start": today_start},
    )
    user_daily_cost = user_result.scalar()

    if user_daily_cost >= PER_USER_DAILY_LIMIT_USD:
        await send_alert(
            severity="warning",
            message=f"User {user_id} exceeded daily limit: ${user_daily_cost:.2f}",
        )
        raise BudgetExceededError(f"Daily usage limit reached for user {user_id}")

    # Check global monthly spend
    month_start = datetime.utcnow().replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    global_result = await db_session.execute(
        text("SELECT COALESCE(SUM(cost_usd), 0) FROM token_usage WHERE timestamp >= :month_start"),
        {"month_start": month_start},
    )
    monthly_cost = global_result.scalar()

    if monthly_cost >= MONTHLY_BUDGET_USD * 0.8:
        await send_alert(
            severity="critical",
            message=f"Monthly budget 80% consumed: ${monthly_cost:.2f} / ${MONTHLY_BUDGET_USD}",
        )
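The budget check above assumes a `send_alert` helper and a `BudgetExceededError` exception exist; neither comes from a library, so a minimal sketch might look like this (the logging destination is a placeholder):

```python
# Minimal, hypothetical versions of the helpers the budget check relies on.
import logging

logger = logging.getLogger("cost_tracking")

class BudgetExceededError(Exception):
    """Raised when a per-user or global spend threshold is exceeded."""

async def send_alert(severity: str, message: str) -> None:
    # Swap the log call for a Slack webhook, PagerDuty event, or email
    # in production; logging keeps this sketch dependency-free.
    logger.warning("[%s] %s", severity.upper(), message)
```

Raising `BudgetExceededError` from `check_budgets` lets the request handler return a clean "usage limit reached" response instead of silently running up spend.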

Exposing a Cost Dashboard API

Serve the analytics data through a FastAPI endpoint so your dashboard frontend can display it.

from fastapi import APIRouter, Depends

router = APIRouter(prefix="/api/costs")

@router.get("/daily-by-feature")
async def daily_costs(days: int = 30, db=Depends(get_db)):
    rows = await get_daily_cost_by_feature(db, days)
    return [
        {"day": str(r.day.date()), "feature": r.feature,
         "cost": round(r.total_cost, 4), "tokens": r.total_tokens}
        for r in rows
    ]

@router.get("/top-users")
async def top_users(limit: int = 20, db=Depends(get_db)):
    rows = await get_top_users_by_cost(db, limit)
    return [
        {"user_id": r.user_id, "cost": round(r.total_cost, 4),
         "tokens": r.total_tokens, "conversations": r.conversations}
        for r in rows
    ]

FAQ

How accurate is token-based cost tracking compared to the actual invoice?

Token-based tracking is typically within 2-5% of the actual invoice. Discrepancies come from retries that consume tokens before failing, cached completions that some providers discount, and rounding differences. Reconcile your tracked costs against the provider invoice monthly and adjust your pricing table if needed.
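One way to apply that monthly reconciliation is to compute a correction factor from the invoice and scale the pricing table by it (the function name and workflow here are illustrative, not a standard API):

```python
def reconciliation_factor(invoice_usd: float, tracked_usd: float) -> float:
    """Ratio between what the provider billed and what tracking recorded."""
    if tracked_usd == 0:
        return 1.0  # nothing tracked yet; leave prices unchanged
    return invoice_usd / tracked_usd

# Example: the invoice says $5,120 but tracking recorded $5,000, so
# tracking understates spend by 2.4%; scale per-token prices by ~1.024.
factor = reconciliation_factor(5120.0, 5000.0)
```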

Should I track costs synchronously or asynchronously?

Use asynchronous recording. Write the usage record to a queue or background task so it does not add latency to the user response. A simple approach is to use asyncio.create_task() to fire the database write without awaiting it in the request path. For high-throughput systems, batch writes via a message queue like Redis Streams or Kafka.
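A minimal sketch of the fire-and-forget pattern, where an in-memory list stands in for the real database insert (names here are illustrative):

```python
import asyncio

USAGE_LOG: list[dict] = []

async def record_usage(record: dict) -> None:
    # Stand-in for the real write (db_session.add + commit, or a queue push).
    await asyncio.sleep(0.01)  # simulate database latency
    USAGE_LOG.append(record)

async def handle_message(user_id: str) -> str:
    reply = "agent reply"  # stand-in for the actual LLM call
    # Schedule the write without awaiting it; keep a reference so the
    # task is not garbage-collected before it runs.
    task = asyncio.create_task(record_usage({"user_id": user_id, "cost_usd": 0.002}))
    return reply

async def main() -> str:
    reply = await handle_message("user-123")
    await asyncio.sleep(0.05)  # let background writes drain before shutdown
    return reply

print(asyncio.run(main()))
```

The trade-off: if the process dies before the task runs, the record is lost. A durable message queue closes that gap for high-throughput systems.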

How do I handle cost tracking when the agent retries a failed LLM call?

Track every attempt, including retries. Each attempt consumes tokens and incurs cost, even if the response is discarded. Add a retry_attempt field to your usage table so you can analyze retry rates and their cost impact separately from successful first-attempt calls.
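A sketch of that idea: a retry wrapper that logs a `retry_attempt` number for every attempt, successful or not (the wrapper name and in-memory log are assumptions for the demo):

```python
import asyncio

ATTEMPT_LOG: list[dict] = []

async def call_with_retries(make_call, user_id: str, max_attempts: int = 3):
    for attempt in range(1, max_attempts + 1):
        try:
            response = await make_call()
            ATTEMPT_LOG.append({"user_id": user_id, "retry_attempt": attempt,
                                "total_tokens": response["total_tokens"]})
            return response
        except TimeoutError:
            # The failed attempt may still have consumed tokens; providers
            # usually report no usage on timeouts, so 0 is a floor estimate.
            ATTEMPT_LOG.append({"user_id": user_id, "retry_attempt": attempt,
                                "total_tokens": 0})
            if attempt == max_attempts:
                raise

# Demo: the first call times out, the second succeeds.
results = iter([TimeoutError(), {"total_tokens": 120}])

async def flaky_call():
    result = next(results)
    if isinstance(result, Exception):
        raise result
    return result

asyncio.run(call_with_retries(flaky_call, "user-123"))
```

Grouping the log by `retry_attempt` then shows what share of spend goes to retries versus first-attempt successes.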


#CostTracking #TokenUsage #Analytics #AIAgents #BudgetManagement #AgenticAI #LearnAI #AIEngineering


CallSphere Team

Expert insights on AI voice agents and customer communication automation.
