Usage-Based Billing for AI Agent Platforms: Metering Conversations, Tokens, and API Calls
Implement accurate usage-based billing for an AI agent SaaS platform, including real-time metering of LLM tokens and API calls, Stripe integration for invoicing, and strategies for cost transparency.
Why Usage-Based Billing Wins for Agent Platforms
Flat-rate pricing for AI agent platforms is a trap. A customer running a simple FAQ bot consumes pennies per month in LLM costs, while a customer with a complex research agent that chains ten tool calls per conversation can cost you dollars per interaction. If both pay the same flat fee, you either price out the light users or hemorrhage money on the heavy users.
Usage-based billing aligns your revenue with your costs. Customers pay for what they use, which means you can offer a generous free tier to drive adoption without worrying about a single heavy user bankrupting you. The implementation challenge is accurate, real-time metering that customers trust.
Defining Billable Units
The first design decision is what you meter. Agent platforms have three natural billable dimensions: LLM tokens (input and output, priced separately), conversations, and executions (tool calls and API calls):
# billing_units.py — Billable unit definitions
from enum import Enum
from pydantic import BaseModel
from datetime import datetime
import uuid

class BillableUnit(str, Enum):
    LLM_INPUT_TOKEN = "llm_input_token"
    LLM_OUTPUT_TOKEN = "llm_output_token"
    TOOL_EXECUTION = "tool_execution"
    CONVERSATION = "conversation"
    API_CALL = "api_call"

class UsageEvent(BaseModel):
    id: uuid.UUID
    tenant_id: uuid.UUID
    agent_id: uuid.UUID
    conversation_id: uuid.UUID
    unit: BillableUnit
    quantity: int
    unit_cost_micros: int  # Cost in micro-dollars (1/1,000,000 of a dollar)
    timestamp: datetime
    metadata: dict = {}

# Pricing tiers per unit (in micro-dollars)
PRICING = {
    "free": {
        BillableUnit.LLM_INPUT_TOKEN: 0,
        BillableUnit.LLM_OUTPUT_TOKEN: 0,
        BillableUnit.CONVERSATION: 0,
        "limits": {"conversations_per_month": 100, "tokens_per_month": 500_000},
    },
    "pro": {
        BillableUnit.LLM_INPUT_TOKEN: 3,     # $0.000003 per input token
        BillableUnit.LLM_OUTPUT_TOKEN: 15,   # $0.000015 per output token
        BillableUnit.CONVERSATION: 1000,     # $0.001 per conversation
        "limits": {"conversations_per_month": 50_000, "tokens_per_month": 50_000_000},
    },
    "enterprise": {
        # Custom pricing negotiated per contract
    },
}
The micro-dollar approach avoids floating-point precision issues. Every calculation stays in integers until the final invoice rendering, where you divide by 1,000,000 to get dollar amounts.
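To make that concrete, here is a small self-contained sketch of the integer-only billing math (function names are illustrative, not part of the codebase above):

```python
# Illustrative only: billing math stays in integer micro-dollars end to end.
MICROS_PER_DOLLAR = 1_000_000

def pro_bill_micros(input_tokens: int, output_tokens: int) -> int:
    """Bill at the 'pro' rates: 3 micros per input token, 15 per output token."""
    return input_tokens * 3 + output_tokens * 15

def render_dollars(micros: int) -> str:
    # Divide exactly once, at invoice-rendering time; no intermediate rounding.
    dollars, rem = divmod(micros, MICROS_PER_DOLLAR)
    return f"${dollars}.{rem:06d}"

total = pro_bill_micros(1_000_000, 200_000)  # 3_000_000 + 3_000_000 micros
print(render_dollars(total))                 # $6.000000
```

Because every intermediate value is an integer, summing millions of per-token charges can never drift the way accumulated floats do.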
Real-Time Usage Metering
The metering pipeline must be fast and reliable. You cannot block agent responses to write billing records. The solution is an asynchronous event pipeline:
# metering.py — Async usage metering pipeline
import asyncio
import uuid
from datetime import datetime, timezone

from billing_units import UsageEvent

class UsageMeter:
    def __init__(self, event_store, pricing_config):
        self.event_store = event_store
        self.pricing = pricing_config
        self._buffer: list[UsageEvent] = []
        self._flush_interval = 5  # seconds
        self._buffer_limit = 500

    async def record(self, tenant_id, agent_id, conversation_id, unit, quantity, plan):
        unit_cost = self.pricing.get(plan, {}).get(unit, 0)
        event = UsageEvent(
            id=uuid.uuid4(),
            tenant_id=tenant_id,
            agent_id=agent_id,
            conversation_id=conversation_id,
            unit=unit,
            quantity=quantity,
            unit_cost_micros=unit_cost * quantity,
            timestamp=datetime.now(timezone.utc),  # utcnow() is deprecated
        )
        self._buffer.append(event)
        if len(self._buffer) >= self._buffer_limit:
            await self._flush()

    async def _flush(self):
        if not self._buffer:
            return
        # Swap the buffer out before awaiting, so events recorded while the
        # insert is in flight land in a fresh list and are never lost.
        events, self._buffer = self._buffer, []
        await self.event_store.bulk_insert(events)

    async def start_periodic_flush(self):
        while True:
            await asyncio.sleep(self._flush_interval)
            await self._flush()
The meter buffers events in memory and flushes them in batches. This keeps the hot path fast — recording a usage event is just a list append — while ensuring events reach persistent storage within seconds.
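The same buffer-and-flush pattern, stripped down to its essentials so the flow is easy to trace (all names here are illustrative):

```python
# Minimal sketch of the buffer-and-flush pattern used by the meter above.
import asyncio

class BufferedWriter:
    """Buffers items in memory and writes them to a sink in batches."""
    def __init__(self, sink, limit: int = 3):
        self.sink = sink
        self.limit = limit
        self.buf: list = []

    async def record(self, item):
        self.buf.append(item)              # hot path: just a list append
        if len(self.buf) >= self.limit:
            await self.flush()

    async def flush(self):
        if not self.buf:
            return
        batch, self.buf = self.buf, []     # swap before awaiting so records
        await self.sink(batch)             # arriving mid-flush aren't lost

batches = []

async def sink(batch):
    batches.append(batch)

async def demo():
    w = BufferedWriter(sink, limit=3)
    for i in range(7):
        await w.record(i)
    await w.flush()                        # final flush, like the periodic timer

asyncio.run(demo())
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

The swap-before-await detail matters: copying and then clearing after the insert completes would silently drop any events recorded during the write.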
Integrating Metering into the Agent Runtime
The agent runtime emits usage events at each step of execution:
# runtime_billing.py — Billing-aware agent runtime wrapper
import uuid

from billing_units import BillableUnit, PRICING
from metering import UsageMeter

class UsageLimitExceeded(Exception):
    """Raised when a tenant has exhausted a plan limit."""

class BillingAwareRuntime:
    def __init__(self, runtime, meter: UsageMeter):
        self.runtime = runtime
        self.meter = meter

    async def execute(self, agent, messages, tenant):
        conversation_id = uuid.uuid4()
        plan = tenant["plan"]
        tenant_id = tenant["id"]

        # Check limits before execution
        current_usage = await self.meter.event_store.get_monthly_usage(tenant_id)
        limits = PRICING[plan].get("limits", {})
        if limits and current_usage.conversations >= limits["conversations_per_month"]:
            raise UsageLimitExceeded("Monthly conversation limit reached")

        # Execute first; a failed run raises here and is never billed
        result = await self.runtime.run(agent, messages)

        # Record the conversation
        await self.meter.record(
            tenant_id, agent.id, conversation_id,
            BillableUnit.CONVERSATION, 1, plan,
        )

        # Record token usage from the LLM response
        await self.meter.record(
            tenant_id, agent.id, conversation_id,
            BillableUnit.LLM_INPUT_TOKEN, result.input_tokens, plan,
        )
        await self.meter.record(
            tenant_id, agent.id, conversation_id,
            BillableUnit.LLM_OUTPUT_TOKEN, result.output_tokens, plan,
        )

        # Record tool executions
        for _tool_call in result.tool_calls:
            await self.meter.record(
                tenant_id, agent.id, conversation_id,
                BillableUnit.TOOL_EXECUTION, 1, plan,
            )
        return result
Stripe Invoice Generation
At the end of each billing period, aggregate usage events into a Stripe invoice:
# invoicing.py — Stripe usage-based invoice generation
import stripe

class InvoiceGenerator:
    def __init__(self, stripe_api_key: str):
        stripe.api_key = stripe_api_key

    async def generate_monthly_invoice(self, tenant_id, billing_period_start, billing_period_end):
        # aggregate_usage and get_tenant are data-access helpers (not shown)
        usage = await self.aggregate_usage(tenant_id, billing_period_start, billing_period_end)
        tenant = await self.get_tenant(tenant_id)

        invoice = stripe.Invoice.create(
            customer=tenant.stripe_customer_id,
            auto_advance=True,
            collection_method="charge_automatically",
        )
        if usage.total_input_tokens > 0:
            stripe.InvoiceItem.create(
                customer=tenant.stripe_customer_id,
                invoice=invoice.id,
                description=f"LLM Input Tokens: {usage.total_input_tokens:,}",
                amount=usage.input_token_cost_micros // 10_000,  # micro-dollars -> cents
                currency="usd",
            )
        if usage.total_output_tokens > 0:
            stripe.InvoiceItem.create(
                customer=tenant.stripe_customer_id,
                invoice=invoice.id,
                description=f"LLM Output Tokens: {usage.total_output_tokens:,}",
                amount=usage.output_token_cost_micros // 10_000,
                currency="usd",
            )
        if usage.total_conversations > 0:
            stripe.InvoiceItem.create(
                customer=tenant.stripe_customer_id,
                invoice=invoice.id,
                description=f"Conversations: {usage.total_conversations:,}",
                amount=usage.conversation_cost_micros // 10_000,
                currency="usd",
            )
        stripe.Invoice.finalize_invoice(invoice.id)
        return invoice
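One conversion detail worth pinning down: 1 cent = 10,000 micro-dollars, and plain floor division silently drops fractions of a cent on every line item. A small sketch of a rounding variant (the function name is illustrative):

```python
# Sketch: micro-dollars -> Stripe cents with half-up rounding.
CENT_MICROS = 10_000  # 1 cent = 10,000 micro-dollars

def micros_to_cents(micros: int) -> int:
    # Round half up instead of truncating, so line items don't
    # systematically undercharge by up to a cent each.
    return (micros + CENT_MICROS // 2) // CENT_MICROS

print(micros_to_cents(123_456))  # 12.3456 cents -> 12
print(micros_to_cents(125_000))  # 12.5 cents    -> 13
```

At a few line items per invoice the difference is negligible, but it keeps the aggregate honest across thousands of tenants.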
FAQ
How do I handle billing for agent retries and errors?
Only bill for successful completions. If the LLM returns an error or the agent loop hits a maximum iteration limit, do not charge the customer for that failed attempt. However, do record the usage event with a status: "failed" flag so you can monitor error rates and their cost impact on your infrastructure.
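One way to sketch that separation of "what the customer owes" from "what the attempt cost you" (the event shape here is illustrative, not the UsageEvent model above):

```python
# Illustrative: record failed attempts at zero customer cost, flagged for monitoring.
def make_event(quantity: int, unit_cost_micros: int, succeeded: bool) -> dict:
    return {
        "quantity": quantity,
        # The customer is only charged for successful completions.
        "billed_micros": unit_cost_micros * quantity if succeeded else 0,
        # Keep the raw cost so you can track what failures cost *you*.
        "infra_cost_micros": unit_cost_micros * quantity,
        "status": "ok" if succeeded else "failed",
    }

print(make_event(1_000, 3, succeeded=False))
```

Summing `infra_cost_micros` where `status == "failed"` gives you the monthly dollar cost of retries and errors, even though none of it appears on an invoice.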
Should I show customers real-time usage or only on their invoice?
Show real-time usage. Build a usage dashboard that updates every few minutes with a clear projection of their current month bill. Surprises on invoices destroy trust. The metering buffer adds at most a few seconds of delay, which is close enough to real-time for dashboard purposes.
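A simple way to produce that projection is linear extrapolation from month-to-date spend (a rough sketch under the assumption that usage is roughly uniform across the month; the function name is illustrative):

```python
# Hypothetical month-end bill projection from month-to-date spend.
from datetime import datetime
import calendar

def project_month_end_micros(spent_micros: int, now: datetime) -> int:
    days_in_month = calendar.monthrange(now.year, now.month)[1]
    # Fraction of the month elapsed so far (day granularity plus hours).
    elapsed = (now.day - 1 + now.hour / 24) / days_in_month
    if elapsed <= 0:
        return spent_micros  # too early in the month to extrapolate
    return int(spent_micros / elapsed)

# Halfway through a 30-day month, $6 spent projects to ~$12.
print(project_month_end_micros(6_000_000, datetime(2025, 6, 16, 0)))  # 12000000
```

For spiky workloads a trailing-average or per-weekday model projects better, but even a linear estimate beats an invoice surprise.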
How do I set pricing that covers my LLM costs with margin?
Calculate your blended cost per token across all models you support, then apply a 3-5x markup for your margin. For example, if GPT-4o input tokens cost you $2.50 per million, charge $10-12.50 per million. The markup covers your infrastructure, support, and the value-add of your platform tooling. Review and adjust quarterly as model pricing changes.
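The blended-cost calculation is a traffic-weighted average across your model mix. A worked sketch (model names, costs, and traffic shares are illustrative assumptions):

```python
# Illustrative: blended provider cost per million tokens, then apply markup.
def blended_cost_per_million(model_mix: dict[str, tuple[float, float]]) -> float:
    """model_mix maps model name -> (provider cost per 1M tokens, traffic share)."""
    return sum(cost * share for cost, share in model_mix.values())

mix = {
    "gpt-4o":      (2.50, 0.7),  # assumed: 70% of traffic
    "gpt-4o-mini": (0.15, 0.3),  # assumed: 30% of traffic
}
blended = blended_cost_per_million(mix)  # 2.50*0.7 + 0.15*0.3 = 1.795
print(round(blended * 4, 2))             # 4x markup -> 7.18 per million
```

Rerun this whenever a provider reprices or your traffic mix shifts; a markup set against last quarter's mix can quietly erode.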
#Billing #UsageMetering #Stripe #AIAgents #SaaS #AgenticAI #LearnAI #AIEngineering
Written by
CallSphere Team
Expert insights on AI voice agents and customer communication automation.