
Building a Multi-Tenant AI Agent Platform: Isolating Customers in Shared Infrastructure

Design and build a multi-tenant AI agent platform with proper tenant isolation, resource quotas, data segregation, per-tenant billing, and shared infrastructure that scales efficiently without cross-tenant data leakage.

Why Multi-Tenancy Is Hard for AI Agents

Multi-tenant AI agent platforms share infrastructure across customers to reduce costs, but AI agents introduce unique isolation challenges. An agent's system prompt contains business-specific knowledge. Conversation histories contain customer PII. Tool configurations expose internal APIs. A cross-tenant data leak in an AI agent is not just a privacy violation — it could expose one customer's business logic and customer data to another.

The three pillars of AI agent multi-tenancy are data isolation (no tenant can read another tenant's data), resource isolation (one tenant's usage spike does not degrade another's experience), and configuration isolation (each tenant's agent behaves according to their specific settings).

Data Isolation with Row-Level Security

The most practical approach for most platforms is a shared database with row-level security (RLS). Every table includes a tenant_id column, and PostgreSQL enforces that queries only return rows matching the current tenant:

# Database schema with tenant isolation
SCHEMA = """
CREATE TABLE tenants (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT NOT NULL,
    plan TEXT NOT NULL DEFAULT 'free',
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE conversations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    user_id TEXT NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE messages (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    conversation_id UUID NOT NULL REFERENCES conversations(id),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    role TEXT NOT NULL,
    content TEXT NOT NULL,
    tokens_used INTEGER DEFAULT 0,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Row-Level Security
ALTER TABLE conversations ENABLE ROW LEVEL SECURITY;
ALTER TABLE messages ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation_conversations ON conversations
    USING (tenant_id = current_setting('app.current_tenant')::UUID);

CREATE POLICY tenant_isolation_messages ON messages
    USING (tenant_id = current_setting('app.current_tenant')::UUID);

-- Index for tenant-scoped queries
CREATE INDEX idx_messages_tenant_conv
    ON messages (tenant_id, conversation_id, created_at);
"""

Set the tenant context on every database connection before executing queries:

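from contextlib import asynccontextmanager

# db_pool is assumed to be an asyncpg-style connection pool created at startup.

@asynccontextmanager
async def tenant_connection(tenant_id: str):
    conn = await db_pool.acquire()
    try:
        # set_config() accepts bind parameters (SET does not), so the tenant ID
        # is never interpolated into the SQL text.
        await conn.execute(
            "SELECT set_config('app.current_tenant', $1, false)",
            tenant_id,
        )
        yield conn
    finally:
        # Clear the setting so the next user of this pooled connection
        # cannot inherit this tenant's context.
        await conn.execute("RESET app.current_tenant")
        await db_pool.release(conn)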

# Usage
async def get_conversation_history(
    tenant_id: str, conversation_id: str
) -> list:
    async with tenant_connection(tenant_id) as conn:
        # RLS automatically filters to this tenant
        rows = await conn.fetch(
            "SELECT role, content FROM messages "
            "WHERE conversation_id = $1 ORDER BY created_at",
            conversation_id,
        )
        return [dict(r) for r in rows]

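Even if a bug in your application code passes a conversation ID that belongs to another tenant, RLS makes the query return zero rows rather than that tenant's data. This guarantee holds only when the application connects as a role that is subject to the policies: superusers and roles with BYPASSRLS always bypass RLS, and table owners bypass it unless FORCE ROW LEVEL SECURITY is set on the table.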
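
Because the tenant ID ends up in a session setting that the RLS policies cast to UUID, it can help to reject malformed values at the application boundary, before a connection is even acquired. The helper below is a minimal sketch of that idea; the name normalize_tenant_id is hypothetical and not part of the original code.

```python
import uuid


def normalize_tenant_id(raw: str) -> str:
    """Return the canonical lowercase form of a tenant UUID.

    Raises ValueError for anything that does not parse as a UUID,
    so malformed input never reaches the database layer.
    """
    try:
        return str(uuid.UUID(raw))
    except (ValueError, AttributeError, TypeError) as exc:
        raise ValueError(f"invalid tenant id: {raw!r}") from exc
```

Calling this at the start of each request means a bad tenant ID fails fast with a clear error instead of surfacing later as a database cast failure.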


Resource Quotas and Rate Limiting

Each tenant needs resource limits to prevent one customer from consuming all capacity. Implement tiered quotas based on the customer's plan:

from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class TenantQuota:
    messages_per_minute: int
    messages_per_day: int
    max_tokens_per_message: int
    max_concurrent_sessions: int
    monthly_token_budget: int

PLAN_QUOTAS = {
    "free": TenantQuota(
        messages_per_minute=10,
        messages_per_day=100,
        max_tokens_per_message=2000,
        max_concurrent_sessions=5,
        monthly_token_budget=500_000,
    ),
    "pro": TenantQuota(
        messages_per_minute=60,
        messages_per_day=5000,
        max_tokens_per_message=8000,
        max_concurrent_sessions=50,
        monthly_token_budget=10_000_000,
    ),
    "enterprise": TenantQuota(
        messages_per_minute=300,
        messages_per_day=50000,
        max_tokens_per_message=16000,
        max_concurrent_sessions=500,
        monthly_token_budget=100_000_000,
    ),
}

class QuotaEnforcer:
    def __init__(self, redis_client):
        self.redis = redis_client

    async def check_quota(self, tenant_id: str, plan: str) -> bool:
        quota = PLAN_QUOTAS[plan]

        # Per-minute limit (fixed window): the counter expires 60 seconds
        # after the first request that created it.
        minute_key = f"rate:{tenant_id}:minute"
        current = await self.redis.incr(minute_key)
        if current == 1:
            await self.redis.expire(minute_key, 60)
        if current > quota.messages_per_minute:
            return False

        # Daily limit, keyed by the current UTC date
        today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
        day_key = f"rate:{tenant_id}:day:{today}"
        daily = await self.redis.incr(day_key)
        if daily == 1:
            await self.redis.expire(day_key, 86400)
        if daily > quota.messages_per_day:
            return False

        return True
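
The per-minute counter above resets 60 seconds after the first request in each window, which makes it a fixed-window scheme: a tenant can send a full burst at the end of one window and another at the start of the next. If that matters, a sliding-window log is one alternative. The sketch below keeps the log in process memory purely for illustration; the class name and the injectable clock are assumptions, and a shared deployment would need to keep the timestamps in a shared store such as Redis instead.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """In-memory sliding-window log limiter, one log per tenant."""

    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so tests can control time
        self._events = defaultdict(deque)

    def allow(self, tenant_id: str) -> bool:
        now = self.clock()
        events = self._events[tenant_id]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.limit:
            return False
        events.append(now)
        return True
```

Rejected requests are not recorded, so a tenant that keeps retrying while blocked does not extend its own lockout.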

Tenant-Specific Agent Configuration

Each tenant configures their agent differently — custom system prompts, enabled tools, model preferences, branding. Store this configuration separately and load it per request:

import json

class TenantAgentConfig:
    def __init__(self, redis_client, db_pool):
        self.redis = redis_client
        self.db = db_pool

    async def get_config(self, tenant_id: str) -> dict:
        # Cache keys include the tenant ID so configs never mix across tenants.
        cache_key = f"tenant:config:{tenant_id}"
        cached = await self.redis.get(cache_key)
        if cached:
            return json.loads(cached)

        async with tenant_connection(tenant_id) as conn:
            config = await conn.fetchrow(
                "SELECT system_prompt, model, enabled_tools, "
                "temperature, max_turns FROM agent_configs "
                "WHERE tenant_id = $1 AND active = true",
                tenant_id,
            )

        # fetchrow returns None when no row matches; fail explicitly
        # instead of crashing in dict(None).
        if config is None:
            raise LookupError(f"no active agent config for tenant {tenant_id}")

        config_dict = dict(config)
        # Cache for 5 minutes
        await self.redis.setex(cache_key, 300, json.dumps(config_dict))
        return config_dict

Per-Tenant Billing with Token Tracking

Track every LLM API call with the tenant ID to enable accurate billing:

class UsageMeter:
    def __init__(self, db_pool):
        self.db = db_pool

    async def record_usage(
        self,
        tenant_id: str,
        model: str,
        input_tokens: int,
        output_tokens: int,
        conversation_id: str,
    ):
        async with self.db.acquire() as conn:
            await conn.execute(
                "INSERT INTO usage_records "
                "(tenant_id, model, input_tokens, output_tokens, "
                "conversation_id, cost_cents, recorded_at) "
                "VALUES ($1, $2, $3, $4, $5, $6, NOW())",
                tenant_id,
                model,
                input_tokens,
                output_tokens,
                conversation_id,
                self._calculate_cost(model, input_tokens, output_tokens),
            )

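    def _calculate_cost(
        self, model: str, input_tokens: int, output_tokens: int
    ) -> float:
        # Illustrative (input, output) rates in USD per 100,000 tokens.
        # Check your provider's current pricing before billing on these.
        rates = {
            "gpt-4o-mini": (0.015, 0.06),
            "gpt-4o": (0.25, 1.00),
        }
        # Unknown models fall back to the most expensive rate.
        input_rate, output_rate = rates.get(model, (0.25, 1.00))
        cost_usd = (
            (input_tokens / 100_000) * input_rate
            + (output_tokens / 100_000) * output_rate
        )
        # The column is cost_cents, so convert dollars to cents.
        return cost_usd * 100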
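
Once usage is recorded per call, billing comes down to aggregating those records by tenant. In the database this would likely be a GROUP BY tenant_id query over usage_records; the sketch below shows the same aggregation in plain Python. The UsageRecord type and summarize_usage function are hypothetical names introduced for illustration, mirroring the columns written by record_usage.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class UsageRecord:
    tenant_id: str
    model: str
    input_tokens: int
    output_tokens: int
    cost_cents: float


def summarize_usage(records: Iterable[UsageRecord]) -> Dict[str, dict]:
    """Sum tokens and cost per tenant for one billing period."""
    totals = defaultdict(
        lambda: {"input_tokens": 0, "output_tokens": 0, "cost_cents": 0.0}
    )
    for record in records:
        entry = totals[record.tenant_id]
        entry["input_tokens"] += record.input_tokens
        entry["output_tokens"] += record.output_tokens
        entry["cost_cents"] += record.cost_cents
    return dict(totals)
```

Floating-point sums are fine for a sketch, but an invoicing system would more commonly store and sum costs as integers or decimals to avoid rounding drift.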

FAQ

Should I use a shared database or separate databases per tenant?

Use a shared database with row-level security for most cases. It is simpler to manage, migrate, and back up. Use separate databases only for enterprise customers with strict compliance requirements (healthcare, finance) or when a single tenant's data volume justifies dedicated infrastructure.

How do I prevent one tenant's agent from accidentally accessing another tenant's tools?

Load the tool configuration per-tenant at request time and only register the tools that tenant has enabled. Never use a global tool registry shared across tenants. If tools access external APIs, use tenant-specific API keys stored encrypted in the database.
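
A minimal sketch of that per-request registration follows, assuming the tenant's enabled_tools value is a list of tool names. The tool functions, TOOL_CATALOG, build_tenant_tools, and call_tool are all hypothetical names. The catalog here only maps names to implementations; the registry the agent actually sees is built fresh for each request from the tenant's own list.

```python
from typing import Callable, Dict, Iterable

ToolFn = Callable[[str], str]


def lookup_order(arg: str) -> str:
    return f"order:{arg}"


def issue_refund(arg: str) -> str:
    return f"refund:{arg}"


# Catalog of implementations the platform ships; never handed to an agent directly.
TOOL_CATALOG: Dict[str, ToolFn] = {
    "lookup_order": lookup_order,
    "issue_refund": issue_refund,
}


def build_tenant_tools(enabled_tools: Iterable[str]) -> Dict[str, ToolFn]:
    """Build a fresh registry containing only the tools this tenant enabled."""
    names = list(enabled_tools)
    unknown = [name for name in names if name not in TOOL_CATALOG]
    if unknown:
        raise KeyError(f"unknown tools in tenant config: {unknown}")
    return {name: TOOL_CATALOG[name] for name in names}


def call_tool(tools: Dict[str, ToolFn], name: str, arg: str) -> str:
    """Dispatch a tool call, refusing anything outside the tenant's registry."""
    if name not in tools:
        raise PermissionError(f"tool {name!r} is not enabled for this tenant")
    return tools[name](arg)
```

Checking at dispatch time as well as at registration time means a model that hallucinates a tool name the tenant never enabled gets a refusal rather than an execution.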

What happens when a tenant exceeds their quota?

Return a 429 status code with a Retry-After header indicating when they can resume. For soft limits (approaching the monthly budget), send a notification to the tenant admin and optionally downgrade to a cheaper model rather than hard-blocking. For hard limits (daily rate limits), block immediately to protect infrastructure.
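
The decision logic described here can be sketched as a pure function that is independent of any web framework. Everything below is an assumption made for illustration: the QuotaDecision type, the decide function, its parameters, and the 90% soft threshold.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class QuotaDecision:
    status: int
    headers: Dict[str, str] = field(default_factory=dict)
    model: Optional[str] = None  # model to use when the request is allowed


def decide(
    requests_in_window: int,
    per_minute_limit: int,
    seconds_until_reset: float,
    tokens_used_this_month: int,
    monthly_token_budget: int,
    preferred_model: str,
    fallback_model: str,
    soft_threshold: float = 0.9,
) -> QuotaDecision:
    # Hard limit: block and tell the client when to retry (whole seconds).
    if requests_in_window > per_minute_limit:
        retry_after = max(1, math.ceil(seconds_until_reset))
        return QuotaDecision(429, {"Retry-After": str(retry_after)})
    # Soft limit: keep serving, but switch to the cheaper model.
    if tokens_used_this_month >= soft_threshold * monthly_token_budget:
        return QuotaDecision(200, model=fallback_model)
    return QuotaDecision(200, model=preferred_model)
```

Notifying the tenant admin when the soft threshold is crossed is left out of the sketch; it would typically be triggered once per billing period rather than on every request.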



CallSphere Team
