Capstone: Building a Multi-Tenant AI Agent SaaS with Usage-Based Billing

SaaS Architecture for AI Agents

Building a multi-tenant AI agent platform requires solving four hard problems simultaneously: tenant isolation (one customer's data and agents must never leak to another), dynamic agent configuration (tenants create agents without writing code), usage metering (track every LLM call, tool invocation, and conversation), and billing (charge based on actual consumption).

This capstone builds a platform where each tenant signs up, creates agents through a web-based builder, deploys them to their own endpoints, and pays based on usage. The architecture uses a shared PostgreSQL database with row-level tenant isolation, a FastAPI backend, and Stripe for billing.

Data Model with Tenant Isolation

Every tenant-owned table includes a tenant_id column, and every query against those tables is filtered by the authenticated tenant's ID.

# models.py
from sqlalchemy import Boolean, Column, DateTime, Float, ForeignKey, Integer, String, Text, func
from sqlalchemy.dialects.postgresql import JSONB, UUID
from sqlalchemy.orm import declarative_base
import uuid

# Shared declarative base for all ORM models
Base = declarative_base()

class Tenant(Base):
    __tablename__ = "tenants"
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    name = Column(String(200), nullable=False)
    slug = Column(String(100), unique=True, nullable=False)
    stripe_customer_id = Column(String(100), nullable=True)
    plan = Column(String(50), default="free")  # free, starter, pro, enterprise
    api_key = Column(String(100), unique=True)
    # func.now() renders as the SQL function; a plain string would be sent as a quoted literal
    created_at = Column(DateTime, server_default=func.now())

class AgentConfig(Base):
    __tablename__ = "agent_configs"
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    tenant_id = Column(UUID(as_uuid=True), ForeignKey("tenants.id"), index=True, nullable=False)
    name = Column(String(200))
    instructions = Column(Text)
    model = Column(String(50), default="gpt-4o")
    # Callable default so rows do not share one mutable list
    tools = Column(JSONB, default=list)  # list of enabled tool configs
    is_active = Column(Boolean, default=True)
    created_at = Column(DateTime, server_default=func.now())

class UsageRecord(Base):
    __tablename__ = "usage_records"
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    tenant_id = Column(UUID(as_uuid=True), ForeignKey("tenants.id"), index=True, nullable=False)
    agent_id = Column(UUID(as_uuid=True), ForeignKey("agent_configs.id"))
    event_type = Column(String(50))  # "llm_call", "tool_call", "conversation"
    tokens_input = Column(Integer, default=0)
    tokens_output = Column(Integer, default=0)
    cost_cents = Column(Float, default=0)  # fractional cents; one call often costs less than a cent
    metadata_ = Column(JSONB, default=dict)
    created_at = Column(DateTime, server_default=func.now())

Tenant-Scoped Dependency Injection

Use a FastAPI dependency that resolves the tenant from the API key, then route every database query through a helper that filters by that tenant.

# core/auth.py
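from fastapi import Depends, HTTPException, Security
from fastapi.security import APIKeyHeader

from models import Tenant
from core.database import get_db  # assumed location of the session dependency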

api_key_header = APIKeyHeader(name="X-API-Key")

async def get_current_tenant(
    api_key: str = Security(api_key_header),
    db=Depends(get_db),
) -> Tenant:
    tenant = db.query(Tenant).filter(Tenant.api_key == api_key).first()
    if not tenant:
        raise HTTPException(status_code=401, detail="Invalid API key")
    return tenant

class TenantScoped:
    """Utility to scope queries to the current tenant."""
    def __init__(self, db, tenant: Tenant):
        self.db = db
        self.tenant_id = tenant.id

    def query(self, model):
        return self.db.query(model).filter(model.tenant_id == self.tenant_id)
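
To see why the tenant filter belongs on every lookup, here is a minimal, self-contained sketch. It uses the standard-library sqlite3 module as a stand-in for PostgreSQL and SQLAlchemy, and the simplified schema and the `scoped_get_agent` helper are illustrative assumptions rather than part of the platform code.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the schema is a simplified assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agent_configs (id TEXT PRIMARY KEY, tenant_id TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO agent_configs VALUES (?, ?, ?)",
    [("a1", "tenant-a", "Support Bot"), ("b1", "tenant-b", "Sales Bot")],
)

def scoped_get_agent(conn, tenant_id, agent_id):
    """Hypothetical helper: look up an agent only within the caller's tenant."""
    row = conn.execute(
        "SELECT name FROM agent_configs WHERE id = ? AND tenant_id = ?",
        (agent_id, tenant_id),
    ).fetchone()
    return row[0] if row else None

own = scoped_get_agent(conn, "tenant-a", "a1")      # tenant A reads its own agent
foreign = scoped_get_agent(conn, "tenant-a", "b1")  # tenant A guesses tenant B's agent ID
```

Because the tenant filter is part of the lookup itself, guessing another tenant's agent ID produces the same "not found" result as an ID that does not exist.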

Dynamic Agent Builder

Tenants configure agents through the admin dashboard. The backend loads agent configurations from the database and instantiates them on demand.

# services/agent_factory.py
from agents import Agent, function_tool

# Registry of available tools that tenants can enable.
# The tool functions are assumed to be defined elsewhere and wrapped with @function_tool.
TOOL_REGISTRY = {
    "search_kb": search_knowledge_base,
    "send_email": send_email_tool,
    "create_ticket": create_ticket_tool,
    "lookup_order": lookup_order_tool,
    "check_calendar": check_calendar_tool,
}

def build_agent_from_config(config: AgentConfig) -> Agent:
    """Dynamically build an Agent from a database configuration."""
    enabled_tools = []
    for tool_config in config.tools:
        tool_name = tool_config["name"]
        if tool_name in TOOL_REGISTRY:
            enabled_tools.append(TOOL_REGISTRY[tool_name])

    return Agent(
        name=config.name,
        instructions=config.instructions,
        model=config.model,
        tools=enabled_tools,
    )
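
The factory above skips tool names that are not in the registry without telling anyone. One option is to reject unknown names when the tenant saves the configuration, so typos surface in the builder instead of at run time. The sketch below assumes a hypothetical `validate_tool_configs` helper and uses a stand-in registry.

```python
def validate_tool_configs(tool_configs, registry):
    """Return (valid_names, unknown_names) for a tenant-submitted tool list."""
    valid, unknown = [], []
    for tool_config in tool_configs:
        name = tool_config.get("name")
        if name in registry:
            valid.append(name)
        else:
            unknown.append(name)
    return valid, unknown

# Stand-in registry: the real TOOL_REGISTRY maps names to tool functions.
registry = {"search_kb": object(), "send_email": object()}
valid, unknown = validate_tool_configs(
    [{"name": "search_kb"}, {"name": "delete_database"}], registry
)
```

A save endpoint could return a 422 response listing `unknown` whenever that list is not empty.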

Usage Metering

Every LLM call and tool invocation is recorded for billing.

# services/metering.py
from models import UsageRecord

# Prices in USD per 100k tokens
TOKEN_COSTS = {
    "gpt-4o": {"input": 0.25, "output": 1.00},
    "gpt-4o-mini": {"input": 0.015, "output": 0.06},
}

async def record_usage(
    db, tenant_id: str, agent_id: str,
    event_type: str, tokens_in: int, tokens_out: int, model: str
):
    costs = TOKEN_COSTS.get(model, TOKEN_COSTS["gpt-4o"])
    cost = (tokens_in * costs["input"] + tokens_out * costs["output"]) / 100_000

    record = UsageRecord(
        tenant_id=tenant_id,
        agent_id=agent_id,
        event_type=event_type,
        tokens_input=tokens_in,
        tokens_output=tokens_out,
        cost_cents=cost * 100,  # store in cents
    )
    db.add(record)
    db.commit()

Stripe Billing Integration

Sync usage to Stripe on a schedule using Stripe metered billing. The sync function in this section looks back one day, so it assumes a job that runs once every 24 hours.

# services/billing.py
import os
from datetime import datetime, timedelta

import stripe
from sqlalchemy import func

from models import Tenant, UsageRecord

stripe.api_key = os.environ["STRIPE_SECRET_KEY"]

async def sync_usage_to_stripe(tenant_id: str, db):
    """Report usage to Stripe for metered billing."""
    tenant = db.query(Tenant).get(tenant_id)
    # Skip unknown tenants and tenants without a Stripe customer
    if tenant is None or not tenant.stripe_customer_id:
        return

    # Calculate usage since last sync
    period_start = datetime.utcnow() - timedelta(days=1)
    total_cost = db.query(func.sum(UsageRecord.cost_cents)).filter(
        UsageRecord.tenant_id == tenant_id,
        UsageRecord.created_at >= period_start,
    ).scalar() or 0

    # Report to Stripe
    stripe.billing.MeterEvent.create(
        event_name="ai_agent_usage",
        payload={
            "value": str(int(total_cost)),
            "stripe_customer_id": tenant.stripe_customer_id,
        },
    )

async def get_tenant_usage_summary(tenant_id: str, days: int, db) -> dict:
    since = datetime.utcnow() - timedelta(days=days)
    records = db.query(UsageRecord).filter(
        UsageRecord.tenant_id == tenant_id,
        UsageRecord.created_at >= since,
    ).all()
    return {
        "total_cost_cents": sum(r.cost_cents for r in records),
        "total_llm_calls": sum(1 for r in records if r.event_type == "llm_call"),
        "total_tokens_input": sum(r.tokens_input for r in records),
        "total_tokens_output": sum(r.tokens_output for r in records),
        "total_conversations": sum(1 for r in records if r.event_type == "conversation"),
    }
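
If the daily sync job is retried, the same usage could be reported twice. Stripe's meter events accept an optional identifier that is used to deduplicate events, so one way to guard against double billing is to derive that identifier from the tenant and the day being reported. The sketch below only builds the arguments and does not call Stripe; `build_meter_event`, the identifier format, and the example customer ID are assumptions, and the exact deduplication window should be checked against Stripe's API reference.

```python
from datetime import date

def build_meter_event(stripe_customer_id, tenant_id, day, total_cost_cents):
    """Build keyword arguments for a metered-usage report (hypothetical helper)."""
    return {
        "event_name": "ai_agent_usage",
        # Deterministic per tenant and day, so a retried job resends the same identifier.
        "identifier": f"usage-{tenant_id}-{day.isoformat()}",
        "payload": {
            # Round once, after summing, instead of truncating fractional cents.
            "value": str(round(total_cost_cents)),
            "stripe_customer_id": stripe_customer_id,
        },
    }

first = build_meter_event("cus_example", "tenant-a", date(2025, 1, 15), 1234.6)
retry = build_meter_event("cus_example", "tenant-a", date(2025, 1, 15), 1234.6)
```

The resulting dictionary could be passed as `stripe.billing.MeterEvent.create(**first)`.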

Tenant API Endpoint

Each tenant reaches its agents through the chat endpoint, authenticated by its API key and limited to its own agent configurations.

# routes/agent_api.py
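from uuid import UUID

from agents import Runner
from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel

from core.auth import TenantScoped, get_current_tenant
from core.database import get_db  # assumed location of the session dependency
from models import AgentConfig, Tenant
from services.agent_factory import build_agent_from_config
from services.metering import record_usage

router = APIRouter()

class ChatRequest(BaseModel):
    """Request body; fields inferred from how the handler uses them."""
    agent_id: UUID
    message: str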

@router.post("/v1/chat")
async def chat(
    body: ChatRequest,
    tenant: Tenant = Depends(get_current_tenant),
    db=Depends(get_db),
):
    scoped = TenantScoped(db, tenant)
    config = scoped.query(AgentConfig).filter(
        AgentConfig.id == body.agent_id
    ).first()
    if not config:
        raise HTTPException(404, "Agent not found")

    agent = build_agent_from_config(config)
    result = await Runner.run(agent, body.message)

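    # Record usage, summing tokens across every model response in the run
    # (a run that calls tools produces more than one response)
    tokens_in = sum(r.usage.input_tokens for r in result.raw_responses)
    tokens_out = sum(r.usage.output_tokens for r in result.raw_responses)
    await record_usage(
        db, str(tenant.id), str(config.id),
        "llm_call", tokens_in, tokens_out, config.model
    )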

    return {"reply": result.final_output, "agent": config.name}

FAQ

How do I prevent one tenant's heavy usage from affecting others?

Implement per-tenant rate limiting using a Redis-backed token bucket. Each tenant gets a request-per-minute and tokens-per-day limit based on their plan tier. When a tenant exceeds their limit, return a 429 status code with a Retry-After header.
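
A minimal sketch of the token bucket idea follows. It keeps state in memory so that it runs on its own; a production version would hold the same two values (token count and last update time) in Redis per tenant. The class name, the limits, and the injectable `now` parameter are assumptions made for illustration.

```python
import time

class TokenBucket:
    """In-memory token bucket; a production version would keep this state in Redis."""

    def __init__(self, capacity, refill_per_second, now=None):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)
        self.updated_at = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Return (allowed, retry_after_seconds) for one request."""
        now = time.monotonic() if now is None else now
        elapsed = now - self.updated_at
        # Refill in proportion to elapsed time, never above capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True, 0.0
        return False, (1 - self.tokens) / self.refill_per_second

# Hypothetical plan limit: bursts of 2 requests, refilled at 1 request per second.
bucket = TokenBucket(capacity=2, refill_per_second=1.0, now=0.0)
results = [bucket.allow(now=0.0), bucket.allow(now=0.0), bucket.allow(now=0.0)]
later = bucket.allow(now=1.0)
```

When `allow` returns `False`, the second value can be rounded up and sent as the Retry-After header on the 429 response.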

How do I handle tenant data deletion for compliance?

Implement a cascade delete that removes all tenant data: agent configs, usage records, conversations, and any uploaded knowledge base documents. Use a soft-delete first (mark as deleted with a timestamp) and run a hard-delete job after a 30-day grace period. Log the deletion for audit compliance.
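
The two-phase deletion can be sketched as follows. The `deleted_at` field is an assumption (the data model in this article does not include it), and `TenantRecord` is a simplified stand-in for a row in the tenants table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

GRACE_PERIOD = timedelta(days=30)

@dataclass
class TenantRecord:
    """Simplified stand-in for a tenants table row."""
    id: str
    deleted_at: Optional[datetime] = None

def soft_delete(tenant, now):
    """Mark the tenant as deleted without removing any data yet."""
    tenant.deleted_at = now

def due_for_hard_delete(tenants, now):
    """Return IDs of tenants whose grace period has elapsed."""
    return [
        t.id for t in tenants
        if t.deleted_at is not None and now - t.deleted_at >= GRACE_PERIOD
    ]

start = datetime(2025, 1, 1, tzinfo=timezone.utc)
tenants = [TenantRecord("tenant-a"), TenantRecord("tenant-b")]
soft_delete(tenants[0], start)

after_10_days = due_for_hard_delete(tenants, start + timedelta(days=10))
after_30_days = due_for_hard_delete(tenants, start + timedelta(days=30))
```

A scheduled job would run the cascade delete for each returned ID and write an audit log entry for it.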

How do I let tenants bring their own API keys?

Store tenant-provided API keys encrypted in the database. When building an agent for that tenant, configure the OpenAI client with their key instead of yours. This shifts LLM costs to the tenant while you charge only for platform usage. Validate the key on save by making a minimal API call.
