Technology · 10 min read

Building Multi-Tenant Agentic AI Platforms: Architecture and Tenant Isolation

Architect multi-tenant agentic AI platforms with tenant isolation, per-tenant agent config, usage metering, and white-label agent interfaces.

The Multi-Tenant Challenge for AI Platforms

Building an agentic AI product for a single customer is hard. Building a platform that serves many customers — each with different agent configurations, data, compliance requirements, and usage patterns — is an order of magnitude harder.

Multi-tenancy for AI platforms introduces challenges beyond traditional SaaS because agent systems involve:

- Sensitive conversation data: tenant conversations must never leak across tenants.
- Per-tenant customization: different tenants need different agent behaviors, tools, and knowledge bases.
- Unpredictable resource consumption: one tenant's complex agent workflow can consume a disproportionate share of LLM tokens.
- Compliance isolation: tenants in regulated industries may require complete data separation.

This guide covers the architectural decisions, isolation patterns, configuration management, and operational considerations for building multi-tenant agentic AI platforms. CallSphere's own multi-vertical platform — serving healthcare, real estate, and IT helpdesk customers — provides real-world context for these patterns.

Tenant Isolation Strategies

The fundamental architectural decision is how to isolate tenant data. Three models exist, each with different tradeoffs.

Database-Level Isolation

Each tenant gets their own database (or database schema). This provides the strongest isolation but highest operational overhead.

from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncEngine, create_async_engine

class TenantNotFoundError(Exception):
    pass

class DatabasePerTenantManager:
    def __init__(self, admin_db_url: str):
        self.admin_engine = create_async_engine(admin_db_url)
        # Cache of lazily created engines, keyed by tenant ID
        self.tenant_engines: dict[str, AsyncEngine] = {}

    async def get_tenant_engine(self, tenant_id: str) -> AsyncEngine:
        if tenant_id not in self.tenant_engines:
            # Look up the tenant's database URL in the admin DB
            async with self.admin_engine.begin() as conn:
                result = await conn.execute(
                    text("SELECT db_url FROM tenants WHERE id = :id"),
                    {"id": tenant_id},
                )
                row = result.fetchone()
                if not row:
                    raise TenantNotFoundError(tenant_id)

                self.tenant_engines[tenant_id] = create_async_engine(
                    row.db_url,
                    pool_size=5,
                    max_overflow=10,
                )

        return self.tenant_engines[tenant_id]

    async def create_tenant_database(self, tenant_id: str) -> str:
        # Validate tenant_id (e.g. alphanumeric only) before interpolating
        # it into DDL -- identifiers cannot be passed as bound parameters
        db_name = f"tenant_{tenant_id}"

        # CREATE DATABASE cannot run inside a transaction, so use an
        # autocommit connection instead of engine.begin()
        autocommit_engine = self.admin_engine.execution_options(
            isolation_level="AUTOCOMMIT"
        )
        async with autocommit_engine.connect() as conn:
            await conn.execute(text(f'CREATE DATABASE "{db_name}"'))

        # In production, pull credentials from a secrets manager rather
        # than hardcoding them in the connection URL
        db_url = f"postgresql+asyncpg://agent_user:password@db-host:5432/{db_name}"

        # Run migrations on the new database
        await self.run_migrations(db_url)

        # Register in admin DB
        async with self.admin_engine.begin() as conn:
            await conn.execute(
                text("INSERT INTO tenants (id, db_url) VALUES (:id, :url)"),
                {"id": tenant_id, "url": db_url},
            )

        return db_url

Best for: Regulated industries (healthcare, finance) where data isolation is a compliance requirement, or tenants with very different data volumes.

Row-Level Isolation

All tenants share a single database. Every table includes a tenant_id column, and every query is filtered by tenant_id. This is operationally simpler but requires discipline to prevent data leaks.

from datetime import datetime

from sqlalchemy import JSON, Column, DateTime, String, event
from sqlalchemy.orm import Session, declarative_base, with_loader_criteria

Base = declarative_base()

class TenantMixin:
    """Mixin that adds tenant_id to any model."""
    tenant_id = Column(String, nullable=False, index=True)

class Conversation(Base, TenantMixin):
    __tablename__ = "conversations"
    id = Column(String, primary_key=True)
    user_id = Column(String, nullable=False)
    messages = Column(JSON, default=list)  # callable default, not a shared []
    created_at = Column(DateTime, default=datetime.utcnow)

# Automatic tenant filtering using SQLAlchemy events: every ORM SELECT
# against a TenantMixin model gets a tenant_id criterion appended
@event.listens_for(Session, "do_orm_execute")
def _apply_tenant_filter(execute_state):
    tenant_id = execute_state.session.info.get("tenant_id")
    if tenant_id and execute_state.is_select:
        execute_state.statement = execute_state.statement.options(
            with_loader_criteria(
                TenantMixin,
                lambda cls: cls.tenant_id == tenant_id,
                include_aliases=True,
            )
        )

class TenantQueryFilter:
    def __init__(self, session_factory):
        self.session_factory = session_factory

    def get_session(self, tenant_id: str) -> Session:
        session = self.session_factory()
        # Store tenant_id on the session; the do_orm_execute listener
        # above reads it to filter every query automatically
        session.info["tenant_id"] = tenant_id
        return session

# Middleware that enforces tenant context on every request
from fastapi.responses import JSONResponse

class TenantMiddleware:
    async def __call__(self, request, call_next):
        tenant_id = self.extract_tenant_id(request)
        if not tenant_id:
            return JSONResponse(
                status_code=401,
                content={"error": "Tenant identification required"},
            )

        # Set tenant context for this request
        request.state.tenant_id = tenant_id
        response = await call_next(request)
        return response

    def extract_tenant_id(self, request) -> str | None:
        # From subdomain: acme.platform.com -> acme
        # (strip any port; ignore reserved subdomains like "www")
        host = request.headers.get("host", "").split(":")[0]
        parts = host.split(".")
        if len(parts) >= 3 and parts[0] != "www":
            return parts[0]

        # Fall back to an explicit header
        return request.headers.get("X-Tenant-ID")

Best for: Most SaaS platforms where tenants have similar data structures and compliance requirements are standard.
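Application-side filtering can also be backed up in the database itself. On PostgreSQL, Row-Level Security makes the tenant predicate mandatory, so a query that forgets the tenant_id filter returns nothing instead of leaking rows. A sketch (the policy name and the `app.tenant_id` setting are illustrative):

```sql
ALTER TABLE conversations ENABLE ROW LEVEL SECURITY;
ALTER TABLE conversations FORCE ROW LEVEL SECURITY;

-- Only rows whose tenant_id matches the session setting are visible
CREATE POLICY tenant_isolation ON conversations
    USING (tenant_id = current_setting('app.tenant_id'));
```

At the start of each request, the application runs `SET LOCAL app.tenant_id = '<tenant>'` inside the transaction; FORCE ensures the policy also applies to the table owner.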

Hybrid Isolation

Combine both approaches: most tenants share a database with row-level isolation, while high-value or regulated tenants get dedicated databases. This balances operational simplicity with compliance needs.


class HybridTenantRouter:
    def __init__(self, shared_engine, dedicated_engines: dict):
        self.shared_engine = shared_engine
        self.dedicated_engines = dedicated_engines

    async def get_engine(self, tenant_id: str) -> AsyncEngine:
        if tenant_id in self.dedicated_engines:
            return self.dedicated_engines[tenant_id]
        return self.shared_engine

Per-Tenant Agent Configuration

Each tenant needs customized agent behavior — different system prompts, tool sets, knowledge bases, and business rules.

Configuration Schema

from typing import Literal

from pydantic import BaseModel

class TenantAgentConfig(BaseModel):
    tenant_id: str
    agent_type: str  # "customer_support", "sales", "helpdesk"

    # Agent personality
    system_prompt_template: str
    agent_name: str = "Assistant"
    tone: Literal["professional", "friendly", "casual"] = "professional"
    language: str = "en"

    # Capabilities
    enabled_tools: list[str] = []
    max_tool_calls_per_turn: int = 5
    knowledge_base_ids: list[str] = []

    # LLM configuration
    model_provider: str = "openai"
    model_name: str = "gpt-4o"
    temperature: float = 0.3
    max_tokens: int = 2048

    # Business rules
    escalation_keywords: list[str] = []
    business_hours: dict | None = None
    max_conversation_turns: int = 50

    # Branding
    greeting_message: str = "Hello! How can I help you today?"
    fallback_message: str = "I'm sorry, I couldn't help with that. Let me connect you with a human agent."

class TenantConfigStore:
    def __init__(self, db):
        self.db = db
        self._cache: dict[str, TenantAgentConfig] = {}

    async def get_config(
        self,
        tenant_id: str,
        agent_type: str
    ) -> TenantAgentConfig:
        cache_key = f"{tenant_id}:{agent_type}"
        if cache_key in self._cache:
            return self._cache[cache_key]

        row = await self.db.query_one(
            "SELECT config FROM tenant_agent_configs "
            "WHERE tenant_id = $1 AND agent_type = $2",
            tenant_id, agent_type,
        )
        if row is None:
            raise KeyError(f"No agent config for {cache_key}")

        config = TenantAgentConfig(**row["config"])
        # NOTE: invalidate this cache (or add a TTL) when configs change
        self._cache[cache_key] = config
        return config

Dynamic Agent Assembly

At runtime, assemble the agent using the tenant's configuration.

class TenantAgentFactory:
    def __init__(self, config_store, tool_registry, knowledge_store):
        self.config_store = config_store
        self.tool_registry = tool_registry
        self.knowledge_store = knowledge_store

    async def create_agent(
        self,
        tenant_id: str,
        agent_type: str,
        session_context: dict,
    ) -> Agent:
        config = await self.config_store.get_config(tenant_id, agent_type)

        # Build system prompt from template
        system_prompt = config.system_prompt_template.format(
            agent_name=config.agent_name,
            tenant_name=session_context.get("tenant_name", ""),
            business_rules=await self.get_business_rules(tenant_id),
        )

        # Get tenant-specific tools
        tools = self.tool_registry.get_tools(
            tool_names=config.enabled_tools,
            tenant_context={"tenant_id": tenant_id},
        )

        # Get tenant knowledge base
        knowledge_base = None
        if config.knowledge_base_ids:
            knowledge_base = await self.knowledge_store.get_merged(
                config.knowledge_base_ids
            )

        return Agent(
            system_prompt=system_prompt,
            tools=tools,
            knowledge_base=knowledge_base,
            model=config.model_name,
            temperature=config.temperature,
            max_tokens=config.max_tokens,
        )

Usage Metering and Rate Limiting

Multi-tenant platforms must track and limit resource consumption per tenant. LLM tokens are the primary cost driver, but tool executions, storage, and API calls also need metering.

Token Usage Tracking

from datetime import datetime

class UsageMeter:
    def __init__(self, db, redis):
        self.db = db
        self.redis = redis

    async def record_llm_usage(
        self,
        tenant_id: str,
        model: str,
        input_tokens: int,
        output_tokens: int,
    ):
        # Real-time counter in Redis for rate limiting
        period = datetime.utcnow().strftime("%Y-%m-%d-%H")
        key = f"usage:{tenant_id}:{period}"
        total_tokens = input_tokens + output_tokens
        await self.redis.incrby(key, total_tokens)
        await self.redis.expire(key, 86400 * 7)  # 7-day TTL

        # Persistent storage for billing
        await self.db.execute(
            "INSERT INTO usage_records "
            "(tenant_id, model, input_tokens, output_tokens, timestamp) "
            "VALUES ($1, $2, $3, $4, $5)",
            tenant_id, model, input_tokens, output_tokens,
            datetime.utcnow(),
        )

    async def check_rate_limit(
        self,
        tenant_id: str,
        limit_tokens_per_hour: int,
    ) -> bool:
        period = datetime.utcnow().strftime("%Y-%m-%d-%H")
        key = f"usage:{tenant_id}:{period}"
        current = int(await self.redis.get(key) or 0)
        # Check-then-act is approximate: concurrent requests can briefly
        # overshoot the limit. That is acceptable for cost control; use a
        # Lua script that increments and checks atomically if a hard cap
        # is required.
        return current < limit_tokens_per_hour

Tiered Rate Limits

TIER_LIMITS = {
    "free": {
        "tokens_per_hour": 50_000,
        "tokens_per_month": 500_000,
        "max_concurrent_sessions": 5,
        "models_allowed": ["gpt-4o-mini"],
    },
    "pro": {
        "tokens_per_hour": 500_000,
        "tokens_per_month": 10_000_000,
        "max_concurrent_sessions": 50,
        "models_allowed": ["gpt-4o-mini", "gpt-4o", "claude-sonnet-4-20250514"],
    },
    "enterprise": {
        "tokens_per_hour": 5_000_000,
        "tokens_per_month": 100_000_000,
        "max_concurrent_sessions": 500,
        "models_allowed": ["gpt-4o-mini", "gpt-4o", "claude-sonnet-4-20250514", "claude-opus-4-20250514"],
    },
}
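The tier table only matters with an enforcement step in the request path. A minimal sketch of that gate, assuming the live counters come from the usage meter described above (the function name, return shape, and the trimmed tier table are illustrative):

```python
# Hypothetical pre-request gate combining tier limits with live counters.
TIER_LIMITS = {
    "free": {
        "tokens_per_hour": 50_000,
        "max_concurrent_sessions": 5,
        "models_allowed": ["gpt-4o-mini"],
    },
    "pro": {
        "tokens_per_hour": 500_000,
        "max_concurrent_sessions": 50,
        "models_allowed": ["gpt-4o-mini", "gpt-4o"],
    },
}

def check_request_allowed(
    tier: str,
    model: str,
    tokens_this_hour: int,
    active_sessions: int,
) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed agent request."""
    limits = TIER_LIMITS[tier]
    if model not in limits["models_allowed"]:
        return False, "model_not_in_tier"
    if tokens_this_hour >= limits["tokens_per_hour"]:
        return False, "hourly_token_limit_reached"
    if active_sessions >= limits["max_concurrent_sessions"]:
        return False, "concurrent_session_limit_reached"
    return True, "ok"
```

Surfacing rejections as a 429 with the reason string lets tenant dashboards distinguish a quota problem from an outage.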

Custom Model Routing Per Tenant

Different tenants may need different LLM providers based on compliance, cost, or performance requirements. A tenant in the EU might require a model hosted in Europe. A cost-sensitive tenant might prefer a smaller, cheaper model.

import logging

logger = logging.getLogger(__name__)

class TenantModelRouter:
    def __init__(self, providers: dict):
        self.providers = providers  # provider_name -> client

    async def route(
        self,
        tenant_id: str,
        config: TenantAgentConfig,
        messages: list[dict],
        **kwargs,
    ) -> str:
        provider = self.providers.get(config.model_provider)
        if not provider:
            raise ValueError(
                f"Provider {config.model_provider} not configured"
            )

        # Check model access against the tenant's subscription tier
        # (get_tenant_tier is a billing lookup, not shown here)
        tier = await self.get_tenant_tier(tenant_id)
        allowed_models = TIER_LIMITS[tier]["models_allowed"]
        if config.model_name not in allowed_models:
            # Fall back to the best allowed model; tier lists are ordered
            # cheapest to most capable, so the last entry is the best
            fallback_model = allowed_models[-1]
            logger.warning(
                f"Tenant {tenant_id} requested {config.model_name} "
                f"but tier {tier} only allows {allowed_models}. "
                f"Using {fallback_model}."
            )
            model = fallback_model
        else:
            model = config.model_name

        return await provider.chat(
            model=model,
            messages=messages,
            **kwargs,
        )

Data Isolation for Knowledge Bases

Each tenant's knowledge base must be completely isolated. In vector databases, use namespaces or separate collections per tenant.

class TenantKnowledgeBase:
    def __init__(self, vector_store):
        self.vector_store = vector_store

    async def search(
        self,
        tenant_id: str,
        query_embedding: list[float],
        top_k: int = 5,
    ) -> list[dict]:
        return await self.vector_store.query(
            vector=query_embedding,
            top_k=top_k,
            namespace=f"tenant_{tenant_id}",
        )

    async def ingest_document(
        self,
        tenant_id: str,
        document_id: str,
        chunks: list[dict],
        embeddings: list[list[float]],
    ):
        vectors = [
            {
                "id": f"{tenant_id}_{document_id}_{i}",
                "values": embedding,
                "metadata": {
                    "tenant_id": tenant_id,
                    "document_id": document_id,
                    **chunk["metadata"],
                    "content": chunk["content"],
                },
            }
            for i, (chunk, embedding) in enumerate(zip(chunks, embeddings))
        ]

        await self.vector_store.upsert(
            vectors=vectors,
            namespace=f"tenant_{tenant_id}",
        )

White-Labeling Agent Interfaces

Tenants often want the agent to appear as their own product, not yours. White-labeling involves customizing the agent's visual appearance, domain, and branding.

Configuration-Driven Theming

class TenantBranding(BaseModel):
    tenant_id: str
    display_name: str
    logo_url: str | None = None
    primary_color: str = "#0066cc"
    secondary_color: str = "#f5f5f5"
    font_family: str = "Inter, sans-serif"
    custom_css: str | None = None
    custom_domain: str | None = None
    favicon_url: str | None = None
    chat_widget_position: Literal["bottom-right", "bottom-left"] = "bottom-right"
    powered_by_visible: bool = True

# API endpoint for client to fetch branding
@app.get("/api/branding/{tenant_id}")
async def get_branding(tenant_id: str):
    config = await branding_store.get(tenant_id)
    return config.model_dump(exclude={"tenant_id"})

Custom Domain Routing

class CustomDomainRouter:
    def __init__(self, db):
        self.db = db
        self._domain_map: dict[str, str] = {}

    async def resolve_tenant(self, hostname: str) -> str | None:
        if hostname in self._domain_map:
            return self._domain_map[hostname]

        tenant = await self.db.query_one(
            "SELECT tenant_id FROM custom_domains WHERE domain = $1",
            hostname,
        )
        if tenant:
            self._domain_map[hostname] = tenant["tenant_id"]
            return tenant["tenant_id"]

        return None

CallSphere's Multi-Vertical Architecture

CallSphere's platform serves multiple industry verticals — healthcare, real estate, and IT helpdesk — from a shared infrastructure. Each vertical is effectively a tenant type with specialized agent configurations, tools, and knowledge bases. The healthcare vertical uses HIPAA-compliant data isolation with dedicated databases per clinic. The real estate vertical shares a database with row-level isolation per brokerage. The IT helpdesk vertical uses namespace-based knowledge base isolation within a shared vector store.

This hybrid approach allows the platform to meet diverse compliance requirements while keeping operational overhead manageable. New verticals are onboarded by defining a tenant configuration template, a set of industry-specific tools, and a knowledge base ingestion pipeline.
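Onboarding by template can be as simple as a base configuration dict per vertical that each new tenant overrides. A sketch with illustrative field values (none of these names come from CallSphere's actual templates):

```python
# Hypothetical vertical template: shared defaults plus per-tenant overrides.
HEALTHCARE_TEMPLATE = {
    "agent_type": "patient_intake",
    "isolation": "database",  # dedicated DB per clinic
    "enabled_tools": ["schedule_appointment", "verify_insurance"],
    "tone": "professional",
    "escalation_keywords": ["emergency", "chest pain"],
}

def onboard_tenant(tenant_id: str, template: dict, overrides: dict) -> dict:
    """Build a tenant config: template defaults first, then overrides."""
    return {**template, **overrides, "tenant_id": tenant_id}

clinic = onboard_tenant(
    "clinic-042",
    HEALTHCARE_TEMPLATE,
    {"tone": "friendly", "enabled_tools": ["schedule_appointment"]},
)
```

Keeping the merge order explicit (template, then overrides, then identity fields) makes it obvious which settings a tenant is allowed to change.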

Frequently Asked Questions

Which tenant isolation model should I start with?

Start with row-level isolation in a shared database. It is the simplest to implement and operate, and it scales well for most use cases. Add database-level isolation as a premium feature for tenants with specific compliance requirements. Do not prematurely over-engineer — row-level isolation with proper access controls is sufficient for the majority of multi-tenant applications.

How do you prevent one tenant from consuming all the resources?

Implement tiered rate limits on LLM tokens per hour and per month, concurrent session limits, and storage quotas. Use Redis for real-time rate limiting and a persistent database for billing and quota enforcement. Monitor resource consumption per tenant and alert on anomalies. For compute-intensive operations (embedding generation, large document ingestion), use tenant-specific job queues with concurrency limits.
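The per-tenant queue idea can be sketched with one semaphore per tenant; a real deployment would use a job system, but the concurrency cap looks the same (the class name and demo are illustrative):

```python
import asyncio
from collections import defaultdict

class TenantJobLimiter:
    """Cap concurrent background jobs per tenant with a semaphore each."""

    def __init__(self, per_tenant_concurrency: int = 2):
        self._sems: dict[str, asyncio.Semaphore] = defaultdict(
            lambda: asyncio.Semaphore(per_tenant_concurrency)
        )

    async def run(self, tenant_id: str, job):
        # One tenant queueing 100 ingestion jobs only ever runs 2 at once;
        # other tenants' semaphores are unaffected
        async with self._sems[tenant_id]:
            return await job()

async def _demo() -> int:
    limiter = TenantJobLimiter(per_tenant_concurrency=2)
    active = 0
    peak = 0

    async def ingest():
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stand-in for real work
        active -= 1

    await asyncio.gather(*(limiter.run("acme", ingest) for _ in range(10)))
    return peak

peak = asyncio.run(_demo())
```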

How do you handle tenant data deletion and GDPR compliance?

For row-level isolation, delete all rows with the tenant's ID across all tables. For database-level isolation, drop the tenant's database. For vector stores, delete the tenant's namespace. Implement a tenant deletion pipeline that cascades through all data stores, generates a deletion report for compliance documentation, and runs on a scheduled basis after a cooling-off period (typically 30 days after deletion request).
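The deletion pipeline described above can be sketched as a registry of per-store delete steps that produces a report for compliance records (store names, the report shape, and the stand-in delete functions are illustrative):

```python
from datetime import datetime, timedelta, timezone
from typing import Callable

class TenantDeletionPipeline:
    """Cascade a tenant deletion across registered data stores."""

    def __init__(self, cooling_off_days: int = 30):
        self.cooling_off = timedelta(days=cooling_off_days)
        self._steps: list[tuple[str, Callable]] = []

    def register(self, store_name: str, delete_fn):
        # delete_fn(tenant_id) -> number of records removed
        self._steps.append((store_name, delete_fn))

    def is_due(self, requested_at: datetime, now: datetime) -> bool:
        # Only execute after the cooling-off period has elapsed
        return now - requested_at >= self.cooling_off

    def run(self, tenant_id: str) -> dict:
        report = {"tenant_id": tenant_id, "stores": {}}
        for store_name, delete_fn in self._steps:
            report["stores"][store_name] = delete_fn(tenant_id)
        report["completed_at"] = datetime.now(timezone.utc).isoformat()
        return report

pipeline = TenantDeletionPipeline(cooling_off_days=30)
pipeline.register("postgres_rows", lambda t: 120)   # stand-in delete fns
pipeline.register("vector_namespace", lambda t: 1)
report = pipeline.run("acme")
```

Persisting the returned report gives the deletion evidence that GDPR audits typically ask for.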

Can different tenants use different LLM models?

Yes, and this is a common requirement. Store the model preference in the tenant configuration and route requests accordingly. Ensure your prompts work across all supported models, or maintain provider-specific prompt variants. Tier model access based on the tenant's subscription level — budget models for free tier, premium models for paid tiers.

How do you test multi-tenant agent systems?

Test at three levels: unit tests verify tenant isolation logic (queries always include tenant_id filters), integration tests verify that one tenant cannot access another tenant's data or configurations, and load tests verify that high-usage tenants do not degrade performance for others. Create dedicated test tenants with known data for automated testing and never use production tenant data in test environments.
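The integration-level check can be expressed against any store interface. Here is a self-contained sketch with an in-memory stand-in (the fake store exists only for illustration; a real test would exercise the actual session factory and middleware):

```python
class FakeConversationStore:
    """In-memory stand-in mimicking a tenant-filtered query layer."""

    def __init__(self):
        self._rows: list[dict] = []

    def insert(self, tenant_id: str, conversation_id: str):
        self._rows.append({"tenant_id": tenant_id, "id": conversation_id})

    def list_for_tenant(self, tenant_id: str) -> list[dict]:
        # The behavior under test: every read path must scope by tenant_id
        return [r for r in self._rows if r["tenant_id"] == tenant_id]

def test_cross_tenant_isolation():
    store = FakeConversationStore()
    store.insert("acme", "conv-1")
    store.insert("globex", "conv-2")

    acme_rows = store.list_for_tenant("acme")
    # The tenant sees its own data...
    assert [r["id"] for r in acme_rows] == ["conv-1"]
    # ...and never another tenant's
    assert all(r["tenant_id"] == "acme" for r in acme_rows)

test_cross_tenant_isolation()
```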

CallSphere Team

Expert insights on AI voice agents and customer communication automation.
