
AI Gateway Patterns: Centralizing LLM Access Across Your Organization

Learn how to build and deploy an AI gateway that centralizes LLM access with unified authentication, rate limiting, cost tracking, and provider abstraction for enterprise teams.

The Problem: LLM Sprawl

As organizations adopt AI across teams, a familiar pattern emerges: each team creates its own API keys, builds its own prompt pipelines, and integrates directly with LLM providers. Within months, the organization faces:

  • No cost visibility: Nobody knows who is spending what on which models
  • Inconsistent security: API keys scattered across repositories and environment variables
  • No rate limiting: One team's burst traffic affects everyone's rate limits
  • Provider lock-in: Every team is tightly coupled to a specific provider
  • No audit trail: No centralized log of what data is being sent to LLMs

An AI gateway solves all of these problems by providing a single, managed entry point for all LLM traffic across the organization.

AI Gateway Architecture

An AI gateway sits between your applications and LLM providers, acting as a reverse proxy with AI-specific capabilities:

[Team A App] ---+
                |
[Team B App] ---+---> [AI Gateway] ---> [Anthropic API]
                |         |          +-> [OpenAI API]
[Team C App] ---+         |          +-> [Self-hosted Models]
                          |
                    [Gateway Features]
                    - Authentication
                    - Rate Limiting
                    - Cost Tracking
                    - Logging & Audit
                    - Caching
                    - Fallback/Routing

Core Components

from fastapi import FastAPI, Request, Depends, HTTPException
import httpx
import os
import time
import uuid

app = FastAPI(title="AI Gateway")

# ─── Authentication ───
async def authenticate(request: Request) -> dict:
    """Validate team API key and return team context."""
    api_key = request.headers.get("X-Gateway-Key")
    if not api_key:
        raise HTTPException(status_code=401, detail="Missing X-Gateway-Key header")

    team = await get_team_by_key(api_key)
    if not team:
        raise HTTPException(status_code=401, detail="Invalid API key")

    return team

# ─── Rate Limiting ───
class RateLimiter:
    def __init__(self, redis_client):
        self.redis = redis_client

    async def check_rate_limit(self, team_id: str, model: str) -> bool:
        """Check if team is within their rate limits."""
        key = f"rate:{team_id}:{model}:{int(time.time()) // 60}"
        current = await self.redis.incr(key)
        await self.redis.expire(key, 120)  # 2 minute TTL

        limit = await self.get_team_limit(team_id, model)
        if current > limit:
            return False
        return True

    async def get_team_limit(self, team_id: str, model: str) -> int:
        """Get per-minute rate limit for team and model."""
        limits = await self.redis.hget(f"limits:{team_id}", model)
        return int(limits) if limits else 60  # default: 60 RPM
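The limiter above is a fixed-window counter: every request in the same clock minute increments one Redis key. The window arithmetic can be demonstrated without Redis by substituting a plain dict for `redis.incr`:

```python
import time
from collections import defaultdict

counters: dict[str, int] = defaultdict(int)

def check_rate_limit(team_id: str, model: str, limit: int, now: float) -> bool:
    """Fixed-window check: bump this minute's counter, compare to limit."""
    key = f"rate:{team_id}:{model}:{int(now) // 60}"
    counters[key] += 1
    return counters[key] <= limit

# Three requests in the same minute against a limit of 2:
t = 1_700_000_000.0
results = [check_rate_limit("team-a", "gpt-4o", limit=2, now=t) for _ in range(3)]
# results → [True, True, False]
# A request one minute later lands in a fresh window and passes again:
assert check_rate_limit("team-a", "gpt-4o", limit=2, now=t + 60)
```

Fixed windows allow brief bursts at window boundaries; if that matters, a sliding-window or token-bucket variant is the usual upgrade.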

# ─── Cost Tracking ───
class CostTracker:
    PRICING = {
        "claude-sonnet-4-20250514": {"input": 3.0, "output": 15.0},
        "claude-3-5-haiku-20241022": {"input": 0.8, "output": 4.0},
        "gpt-4o": {"input": 2.5, "output": 10.0},
    }

    async def record_usage(self, team_id: str, model: str,
                           input_tokens: int, output_tokens: int):
        """Record token usage and calculate cost."""
        pricing = self.PRICING.get(model, {"input": 0, "output": 0})
        cost = (input_tokens / 1_000_000 * pricing["input"] +
                output_tokens / 1_000_000 * pricing["output"])

        await db.execute(
            """INSERT INTO usage_log (team_id, model, input_tokens, output_tokens,
               cost_usd, timestamp) VALUES ($1, $2, $3, $4, $5, NOW())""",
            team_id, model, input_tokens, output_tokens, cost
        )
        return cost
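The pricing table is expressed in dollars per million tokens, so a call that consumes 10,000 input and 2,000 output tokens at the Sonnet rates above costs 10,000/1M × $3.00 + 2,000/1M × $15.00 ≈ $0.06. As a standalone check of the arithmetic:

```python
PRICING = {"claude-sonnet-4-20250514": {"input": 3.0, "output": 15.0}}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD, with pricing expressed per million tokens."""
    p = PRICING[model]
    return (input_tokens / 1_000_000 * p["input"] +
            output_tokens / 1_000_000 * p["output"])

cost = estimate_cost("claude-sonnet-4-20250514", 10_000, 2_000)
# cost ≈ 0.06 USD
```

Keeping this math in one place in the gateway, rather than in every team's app, is what makes org-wide cost reporting trustworthy.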

# ─── Provider Routing ───
class ProviderRouter:
    """Route requests to the appropriate LLM provider."""

    PROVIDER_MAP = {
        "claude-sonnet-4-20250514": {
            "provider": "anthropic",
            "url": "https://api.anthropic.com/v1/messages",
        },
        "claude-3-5-haiku-20241022": {
            "provider": "anthropic",
            "url": "https://api.anthropic.com/v1/messages",
        },
        "gpt-4o": {
            "provider": "openai",
            "url": "https://api.openai.com/v1/chat/completions",
        },
    }

    async def route(self, model: str, request_body: dict) -> dict:
        config = self.PROVIDER_MAP.get(model)
        if not config:
            raise HTTPException(status_code=400, detail=f"Unsupported model: {model}")

        if config["provider"] == "anthropic":
            return await self._call_anthropic(config["url"], request_body)
        elif config["provider"] == "openai":
            return await self._call_openai(config["url"], request_body)

    async def _call_anthropic(self, url: str, body: dict) -> dict:
        async with httpx.AsyncClient() as client:
            response = await client.post(
                url,
                headers={
                    "x-api-key": os.environ["ANTHROPIC_API_KEY"],
                    "anthropic-version": "2023-06-01",
                    "content-type": "application/json",
                },
                json=body,
                timeout=120.0,
            )
            response.raise_for_status()
            return response.json()

    async def _call_openai(self, url: str, body: dict) -> dict:
        async with httpx.AsyncClient() as client:
            response = await client.post(
                url,
                headers={
                    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
                    "content-type": "application/json",
                },
                json=body,
                timeout=120.0,
            )
            response.raise_for_status()
            return response.json()

The Main Gateway Endpoint

@app.post("/v1/chat")
async def gateway_chat(request: Request, team: dict = Depends(authenticate)):
    """Main gateway endpoint -- proxies to the appropriate LLM provider."""
    body = await request.json()
    model = body.get("model", "claude-sonnet-4-20250514")

    # Rate limiting
    if not await rate_limiter.check_rate_limit(team["id"], model):
        raise HTTPException(status_code=429, detail="Rate limit exceeded")

    # Budget check
    if team.get("monthly_budget"):
        current_spend = await cost_tracker.get_monthly_spend(team["id"])
        if current_spend >= team["monthly_budget"]:
            raise HTTPException(status_code=402, detail="Monthly budget exhausted")

    # Audit logging (before sending to provider)
    request_id = str(uuid.uuid4())
    await audit_logger.log_request(request_id, team["id"], model, body)

    # PII detection (optional)
    if team.get("pii_detection_enabled"):
        pii_findings = await detect_pii(body)
        if pii_findings:
            await audit_logger.log_pii_warning(request_id, pii_findings)

    # Route to provider
    start_time = time.time()
    response = await router.route(model, body)
    latency = time.time() - start_time

    # Track costs
    input_tokens = response.get("usage", {}).get("input_tokens", 0)
    output_tokens = response.get("usage", {}).get("output_tokens", 0)
    cost = await cost_tracker.record_usage(team["id"], model, input_tokens, output_tokens)

    # Audit logging (after response)
    await audit_logger.log_response(request_id, latency, input_tokens, output_tokens, cost)

    # Add gateway metadata to response
    response["_gateway"] = {
        "request_id": request_id,
        "latency_ms": round(latency * 1000),
        "cost_usd": round(cost, 6),
    }

    return response

Fallback and Resilience Patterns

A critical gateway feature is automatic failover when a provider experiences issues:

class ResilientRouter:
    """Router with automatic failover between providers."""

    FALLBACK_CHAIN = {
        "claude-sonnet-4-20250514": ["claude-sonnet-4-20250514", "gpt-4o"],
        "gpt-4o": ["gpt-4o", "claude-sonnet-4-20250514"],
    }

    async def route_with_fallback(self, model: str, body: dict) -> dict:
        chain = self.FALLBACK_CHAIN.get(model, [model])
        last_error = None

        for fallback_model in chain:
            try:
                response = await self.route(fallback_model, body)
                if fallback_model != model:
                    logger.warning(f"Used fallback model {fallback_model} for {model}")
                return response
            except (httpx.TimeoutException, httpx.HTTPStatusError) as e:
                last_error = e
                logger.error(f"Provider error for {fallback_model}: {e}")
                continue

        raise HTTPException(status_code=502, detail=f"All providers failed: {last_error}")
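The failover loop can be exercised without live providers by stubbing the per-model call. In this standalone sketch, a dict of callables stands in for `route`; the chain skips the failing primary and returns the fallback's response:

```python
FALLBACK_CHAIN = {"gpt-4o": ["gpt-4o", "claude-sonnet-4-20250514"]}

def route_with_fallback(model: str, callers: dict) -> tuple[str, dict]:
    """Try each model in the chain; return (model_used, response) or raise."""
    last_error = None
    for fallback_model in FALLBACK_CHAIN.get(model, [model]):
        try:
            return fallback_model, callers[fallback_model]()
        except Exception as e:  # in the gateway: httpx.TimeoutException, HTTPStatusError
            last_error = e
    raise RuntimeError(f"All providers failed: {last_error}")

def failing():
    raise TimeoutError("provider timeout")

used, resp = route_with_fallback("gpt-4o", {
    "gpt-4o": failing,
    "claude-sonnet-4-20250514": lambda: {"content": "ok"},
})
# used → "claude-sonnet-4-20250514"
```

One caveat the fallback chain glosses over: prompts tuned for one model family often degrade on another, so cross-provider fallback is best reserved for genuine outages rather than routine load shedding.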

Self-Service Admin Dashboard

The gateway should include an admin API for teams to manage their own usage:

@app.get("/admin/usage")
async def get_usage(team: dict = Depends(authenticate)):
    """Get team's usage statistics."""
    monthly_spend = await cost_tracker.get_monthly_spend(team["id"])
    return {
        "team_id": team["id"],
        "current_month": {
            "total_requests": await cost_tracker.get_monthly_requests(team["id"]),
            "total_tokens": await cost_tracker.get_monthly_tokens(team["id"]),
            "total_cost_usd": monthly_spend,
            "budget_remaining": team.get("monthly_budget", 0) - monthly_spend,
        },
        "by_model": await cost_tracker.get_monthly_breakdown_by_model(team["id"]),
        "daily_trend": await cost_tracker.get_daily_trend(team["id"], days=30),
    }

@app.get("/admin/audit-log")
async def get_audit_log(team: dict = Depends(authenticate),
                         limit: int = 100, offset: int = 0):
    """Get team's audit log."""
    return await audit_logger.get_team_logs(team["id"], limit, offset)

Open Source AI Gateway Options

Several open-source projects provide AI gateway functionality out of the box:

Project                 Language     Key Features
LiteLLM Proxy           Python       100+ LLM providers, cost tracking, key management
Portkey                 TypeScript   Caching, fallbacks, load balancing, observability
Kong AI Gateway         Lua/Go       Enterprise API gateway with AI plugins
Cloudflare AI Gateway   Managed      Caching, rate limiting, analytics

For most teams, starting with LiteLLM Proxy is the fastest path to a functional AI gateway:

# litellm-config.yaml
model_list:
  - model_name: claude-sonnet
    litellm_params:
      model: claude-sonnet-4-20250514
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: gpt-4o
    litellm_params:
      model: gpt-4o
      api_key: os.environ/OPENAI_API_KEY

general_settings:
  master_key: sk-gateway-master-key
  database_url: postgresql://user:pass@localhost:5432/litellm

litellm_settings:
  cache: true
  cache_params:
    type: redis
    host: localhost
    port: 6379
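With that config, the proxy speaks the OpenAI chat-completions format, so any OpenAI-compatible client can point at it. The sketch below builds such a request with the standard library; the `localhost:4000` address matches LiteLLM's default port, and the key is the example master key from the config above:

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:4000/chat/completions"  # assumed LiteLLM default port

def build_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """OpenAI-style chat payload addressed to the LiteLLM proxy."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_request("claude-sonnet", "Hello", "sk-gateway-master-key")
# resp = urllib.request.urlopen(req)  # requires the proxy to be running
```

Because apps only see the gateway's `model_name` aliases (`claude-sonnet`, `gpt-4o`), you can swap the underlying model or provider in the config without touching any client code.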

Conclusion

An AI gateway is essential infrastructure for any organization running multiple LLM-powered applications. It provides unified authentication, cost visibility, rate limiting, audit logging, and provider abstraction -- all of which become critical as AI adoption scales. Start with an open-source solution like LiteLLM, customize as your needs grow, and make the gateway the single path for all LLM traffic in your organization.
