
Usage-Based Billing for AI Agent Platforms: Metering Conversations, Tokens, and API Calls

Implement accurate usage-based billing for an AI agent SaaS platform, including real-time metering of LLM tokens and API calls, Stripe integration for invoicing, and strategies for cost transparency.

Why Usage-Based Billing Wins for Agent Platforms

Flat-rate pricing for AI agent platforms is a trap. A customer running a simple FAQ bot consumes pennies per month in LLM costs, while a customer with a complex research agent that chains ten tool calls per conversation can cost you dollars per interaction. If both pay the same flat fee, you either price out the light users or hemorrhage money on the heavy users.

Usage-based billing aligns your revenue with your costs. Customers pay for what they use, which means you can offer a generous free tier to drive adoption without worrying about a single heavy user bankrupting you. The implementation challenge is accurate, real-time metering that customers trust.

Defining Billable Units

The first design decision is what you meter. Agent platforms have three natural billable dimensions — conversations, LLM tokens, and tool or API calls — which break down into five billable units once input and output tokens are priced separately:

# billing_units.py — Billable unit definitions
from enum import Enum
from pydantic import BaseModel
from datetime import datetime
import uuid

class BillableUnit(str, Enum):
    LLM_INPUT_TOKEN = "llm_input_token"
    LLM_OUTPUT_TOKEN = "llm_output_token"
    TOOL_EXECUTION = "tool_execution"
    CONVERSATION = "conversation"
    API_CALL = "api_call"

class UsageEvent(BaseModel):
    id: uuid.UUID
    tenant_id: uuid.UUID
    agent_id: uuid.UUID
    conversation_id: uuid.UUID
    unit: BillableUnit
    quantity: int
    unit_cost_micros: int  # Total cost for this event in micro-dollars (unit price × quantity; 1 micro-dollar = $0.000001)
    timestamp: datetime
    metadata: dict = {}

# Pricing tiers per unit (in micro-dollars)
PRICING = {
    "free": {
        BillableUnit.LLM_INPUT_TOKEN: 0,
        BillableUnit.LLM_OUTPUT_TOKEN: 0,
        BillableUnit.CONVERSATION: 0,
        "limits": {"conversations_per_month": 100, "tokens_per_month": 500_000},
    },
    "pro": {
        BillableUnit.LLM_INPUT_TOKEN: 3,   # $0.000003 per input token
        BillableUnit.LLM_OUTPUT_TOKEN: 15,  # $0.000015 per output token
        BillableUnit.CONVERSATION: 1000,     # $0.001 per conversation
        "limits": {"conversations_per_month": 50_000, "tokens_per_month": 50_000_000},
    },
    "enterprise": {
        # Custom pricing negotiated per contract
    },
}

The micro-dollar approach avoids floating-point precision issues. Every calculation stays in integers until the final invoice rendering, where you divide by 1,000,000 to get dollar amounts.
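The integer pipeline is easy to sketch end to end. A minimal illustration (helper names are mine, not from the platform code):

```python
# Minimal sketch of the integer micro-dollar pipeline: accumulate in
# integers, convert only at render time. Helper names are illustrative.
MICROS_PER_DOLLAR = 1_000_000
MICROS_PER_CENT = 10_000

def total_cost_micros(tokens: int, unit_cost_micros: int) -> int:
    # Pure integer multiply — exact, no float rounding along the way
    return tokens * unit_cost_micros

def render_dollars(micros: int) -> str:
    # The one and only division happens at invoice rendering
    return f"${micros / MICROS_PER_DOLLAR:.2f}"

# 1,000,000 output tokens at 15 micro-dollars each
cost = total_cost_micros(1_000_000, 15)
print(cost)                     # 15000000
print(cost // MICROS_PER_CENT)  # 1500 — cents, the unit Stripe expects
print(render_dollars(cost))     # $15.00
```

Keeping cents as a derived view (`micros // 10_000`) rather than a second stored unit avoids double-rounding when the same event feeds both dashboards and invoices.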

Real-Time Usage Metering

The metering pipeline must be fast and reliable. You cannot block agent responses to write billing records. The solution is an asynchronous event pipeline:


# metering.py — Async usage metering pipeline
import asyncio
import uuid
from datetime import datetime

from billing_units import UsageEvent

class UsageMeter:
    def __init__(self, event_store, pricing_config):
        self.event_store = event_store
        self.pricing = pricing_config
        self._buffer: list[UsageEvent] = []
        self._flush_interval = 5  # seconds
        self._buffer_limit = 500

    async def record(self, tenant_id, agent_id, conversation_id, unit, quantity, plan):
        unit_cost = self.pricing.get(plan, {}).get(unit, 0)
        event = UsageEvent(
            id=uuid.uuid4(),
            tenant_id=tenant_id,
            agent_id=agent_id,
            conversation_id=conversation_id,
            unit=unit,
            quantity=quantity,
            unit_cost_micros=unit_cost * quantity,
            timestamp=datetime.utcnow(),
        )
        self._buffer.append(event)

        if len(self._buffer) >= self._buffer_limit:
            await self._flush()

    async def _flush(self):
        if not self._buffer:
            return
        events = self._buffer.copy()
        self._buffer.clear()
        await self.event_store.bulk_insert(events)

    async def start_periodic_flush(self):
        while True:
            await asyncio.sleep(self._flush_interval)
            await self._flush()

The meter buffers events in memory and flushes them in batches. This keeps the hot path fast — recording a usage event is just a list append — while ensuring events reach persistent storage within seconds.
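To see the buffering behavior in isolation, here is a trimmed, self-contained version of the meter driven by an in-memory stand-in for the event store (`InMemoryEventStore`, `BufferedMeter`, and the plain-dict events are illustrative, not part of the platform code):

```python
# Trimmed stand-alone version of the buffered meter, demonstrating the
# flush-on-threshold behavior. The in-memory store is a test stand-in.
import asyncio

class InMemoryEventStore:
    def __init__(self):
        self.rows = []

    async def bulk_insert(self, events):
        self.rows.extend(events)

class BufferedMeter:
    """Same buffering logic as UsageMeter, minus pricing and event models."""
    def __init__(self, store, buffer_limit=3):
        self.store = store
        self._buffer = []
        self._buffer_limit = buffer_limit

    async def record(self, event):
        self._buffer.append(event)  # hot path: just a list append
        if len(self._buffer) >= self._buffer_limit:
            await self._flush()

    async def _flush(self):
        if not self._buffer:
            return
        events, self._buffer = self._buffer, []
        await self.store.bulk_insert(events)

async def demo():
    store = InMemoryEventStore()
    meter = BufferedMeter(store, buffer_limit=3)
    for i in range(7):
        await meter.record({"seq": i, "quantity": 1})
    assert len(store.rows) == 6  # two full batches of 3 flushed
    await meter._flush()         # final flush, e.g. on shutdown
    return store

store = asyncio.run(demo())
print(len(store.rows))  # 7
```

A production meter also needs a shutdown hook that calls the final flush before the process exits; otherwise the last few seconds of buffered events are lost.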

Integrating Metering into the Agent Runtime

The agent runtime emits usage events at each step of execution:

# runtime_billing.py — Billing-aware agent runtime wrapper
import uuid

from billing_units import PRICING, BillableUnit
from metering import UsageMeter

class UsageLimitExceeded(Exception):
    """Raised when a tenant hits its plan's monthly usage limits."""

class BillingAwareRuntime:
    def __init__(self, runtime, meter: UsageMeter):
        self.runtime = runtime
        self.meter = meter

    async def execute(self, agent, messages, tenant):
        conversation_id = uuid.uuid4()
        plan = tenant["plan"]
        tenant_id = tenant["id"]

        # Check limits before execution
        current_usage = await self.meter.event_store.get_monthly_usage(tenant_id)
        limits = PRICING[plan].get("limits", {})
        if limits and current_usage.conversations >= limits["conversations_per_month"]:
            raise UsageLimitExceeded("Monthly conversation limit reached")

        # Record conversation start
        await self.meter.record(
            tenant_id, agent.id, conversation_id,
            BillableUnit.CONVERSATION, 1, plan,
        )

        # Execute and capture token usage
        result = await self.runtime.run(agent, messages)

        # Record token usage from the LLM response
        await self.meter.record(
            tenant_id, agent.id, conversation_id,
            BillableUnit.LLM_INPUT_TOKEN, result.input_tokens, plan,
        )
        await self.meter.record(
            tenant_id, agent.id, conversation_id,
            BillableUnit.LLM_OUTPUT_TOKEN, result.output_tokens, plan,
        )

        # Record tool executions as one batched event
        if result.tool_calls:
            await self.meter.record(
                tenant_id, agent.id, conversation_id,
                BillableUnit.TOOL_EXECUTION, len(result.tool_calls), plan,
            )

        return result

Stripe Invoice Generation

At the end of each billing period, aggregate usage events into a Stripe invoice:

# invoicing.py — Stripe usage-based invoice generation
import stripe

class InvoiceGenerator:
    def __init__(self, stripe_api_key: str):
        stripe.api_key = stripe_api_key

    async def generate_monthly_invoice(self, tenant_id, billing_period_start, billing_period_end):
        # aggregate_usage and get_tenant are data-access helpers (not shown here)
        usage = await self.aggregate_usage(tenant_id, billing_period_start, billing_period_end)
        tenant = await self.get_tenant(tenant_id)

        invoice = stripe.Invoice.create(
            customer=tenant.stripe_customer_id,
            auto_advance=True,
            collection_method="charge_automatically",
        )

        if usage.total_input_tokens > 0:
            stripe.InvoiceItem.create(
                customer=tenant.stripe_customer_id,
                invoice=invoice.id,
                description=f"LLM Input Tokens: {usage.total_input_tokens:,}",
                amount=usage.input_token_cost_micros // 10000,  # Stripe uses cents
                currency="usd",
            )

        if usage.total_output_tokens > 0:
            stripe.InvoiceItem.create(
                customer=tenant.stripe_customer_id,
                invoice=invoice.id,
                description=f"LLM Output Tokens: {usage.total_output_tokens:,}",
                amount=usage.output_token_cost_micros // 10000,
                currency="usd",
            )

        if usage.total_conversations > 0:
            stripe.InvoiceItem.create(
                customer=tenant.stripe_customer_id,
                invoice=invoice.id,
                description=f"Conversations: {usage.total_conversations:,}",
                amount=usage.conversation_cost_micros // 10000,
                currency="usd",
            )

        stripe.Invoice.finalize_invoice(invoice.id)
        return invoice

FAQ

How do I handle billing for agent retries and errors?

Only bill for successful completions. If the LLM returns an error or the agent loop hits a maximum iteration limit, do not charge the customer for that failed attempt. However, do record the usage event with a status: "failed" flag so you can monitor error rates and their cost impact on your infrastructure.

Should I show customers real-time usage or only on their invoice?

Show real-time usage. Build a usage dashboard that updates every few minutes with a clear projection of their current month bill. Surprises on invoices destroy trust. The metering buffer adds at most a few seconds of delay, which is close enough to real-time for dashboard purposes.
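The month-end projection for that dashboard can be a simple linear extrapolation of month-to-date spend (a sketch, keeping the integer micro-dollar convention; the function name is mine):

```python
# Sketch of the dashboard's month-end projection: extrapolate month-to-date
# spend linearly. calendar.monthrange is stdlib; spend is in micro-dollars.
import calendar
from datetime import date

def projected_bill_micros(spend_to_date_micros: int, today: date) -> int:
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    # Integer math throughout, consistent with the micro-dollar convention
    return spend_to_date_micros * days_in_month // today.day

# $12.00 spent by June 10 of a 30-day month -> $36.00 projected
print(projected_bill_micros(12_000_000, date(2024, 6, 10)))  # 36000000
```

Linear extrapolation over-projects early in the month for bursty workloads; labeling the number as an estimate on the dashboard avoids alarming customers.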

How do I set pricing that covers my LLM costs with margin?

Calculate your blended cost per token across all models you support, then apply a 3-5x markup for your margin. For example, if GPT-4o input tokens cost you $2.50 per million, charge $10-12.50 per million. The markup covers your infrastructure, support, and the value-add of your platform tooling. Review and adjust quarterly as model pricing changes.
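The arithmetic above has a convenient property worth making explicit: because 1 dollar is 1,000,000 micro-dollars and prices are quoted per 1,000,000 tokens, the factors cancel, so dollars-per-million-tokens equals micro-dollars-per-token. A sketch (function name is mine):

```python
# Dollars-per-million-tokens equals micro-dollars-per-token, since the
# 1,000,000 factors cancel — so the marked-up provider price plugs
# straight into a micro-dollar PRICING table.
def marked_up_micros_per_token(provider_cost_per_million_usd: float,
                               markup: float) -> int:
    return round(provider_cost_per_million_usd * markup)

# $2.50/M input tokens with a 4x markup -> 10 micro-dollars/token ($10/M)
print(marked_up_micros_per_token(2.50, 4))   # 10
# $10.00/M output tokens with a 3x markup -> 30 micro-dollars/token ($30/M)
print(marked_up_micros_per_token(10.00, 3))  # 30
```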


#Billing #UsageMetering #Stripe #AIAgents #SaaS #AgenticAI #LearnAI #AIEngineering

Written by

CallSphere Team

Expert insights on AI voice agents and customer communication automation.
