AI Gateway Patterns: Centralizing LLM Access Across Your Organization
Learn how to build and deploy an AI gateway that centralizes LLM access with unified authentication, rate limiting, cost tracking, and provider abstraction for enterprise teams.
The Problem: LLM Sprawl
As organizations adopt AI across teams, a familiar pattern emerges: each team creates its own API keys, builds its own prompt pipelines, and integrates directly with LLM providers. Within months, the organization faces:
- No cost visibility: Nobody knows who is spending what on which models
- Inconsistent security: API keys scattered across repositories and environment variables
- No rate limiting: One team's burst traffic affects everyone's rate limits
- Provider lock-in: Every team is tightly coupled to a specific provider
- No audit trail: No centralized log of what data is being sent to LLMs
An AI gateway addresses these problems by providing a single, managed entry point for all LLM traffic across the organization.
AI Gateway Architecture
An AI gateway sits between your applications and LLM providers, acting as a reverse proxy with AI-specific capabilities:
[Team A App] ---+
                |
[Team B App] ---+---> [AI Gateway] ---+---> [Anthropic API]
                |          |          +---> [OpenAI API]
[Team C App] ---+          |          +---> [Self-hosted Models]
                           |
                  [Gateway Features]
                  - Authentication
                  - Rate Limiting
                  - Cost Tracking
                  - Logging & Audit
                  - Caching
                  - Fallback/Routing
Core Components
from fastapi import FastAPI, Request, Depends, HTTPException
from fastapi.middleware.cors import CORSMiddleware
import httpx
import json
import os
import time
import uuid
from typing import Optional
app = FastAPI(title="AI Gateway")
# ─── Authentication ───
async def authenticate(request: Request) -> dict:
"""Validate team API key and return team context."""
api_key = request.headers.get("X-Gateway-Key")
if not api_key:
raise HTTPException(status_code=401, detail="Missing X-Gateway-Key header")
team = await get_team_by_key(api_key)
if not team:
raise HTTPException(status_code=401, detail="Invalid API key")
return team
# ─── Rate Limiting ───
class RateLimiter:
def __init__(self, redis_client):
self.redis = redis_client
async def check_rate_limit(self, team_id: str, model: str) -> bool:
"""Check if team is within their rate limits."""
key = f"rate:{team_id}:{model}:{int(time.time()) // 60}"
current = await self.redis.incr(key)
await self.redis.expire(key, 120) # 2 minute TTL
limit = await self.get_team_limit(team_id, model)
if current > limit:
return False
return True
async def get_team_limit(self, team_id: str, model: str) -> int:
"""Get per-minute rate limit for team and model."""
limits = await self.redis.hget(f"limits:{team_id}", model)
return int(limits) if limits else 60 # default: 60 RPM
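The limiter above uses a fixed one-minute window: the Redis key embeds `time // 60`, so each team/model pair gets one counter per minute. A standalone version of the key scheme makes the rollover behavior (and the classic fixed-window caveat) easy to see:

```python
def minute_bucket_key(team_id: str, model: str, now: float) -> str:
    """Reproduce the gateway's fixed-window rate key: one counter per
    team/model/minute. Counters roll over at minute boundaries, so a
    burst straddling a boundary can briefly pass up to 2x the limit --
    acceptable for most gateways, or swap in a sliding window if not."""
    return f"rate:{team_id}:{model}:{int(now) // 60}"

# Two timestamps inside the same minute share one counter:
k1 = minute_bucket_key("team-a", "gpt-4o", 120.0)
k2 = minute_bucket_key("team-a", "gpt-4o", 179.9)
# The next minute gets a fresh counter:
k3 = minute_bucket_key("team-a", "gpt-4o", 180.0)
```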
# ─── Cost Tracking ───
class CostTracker:
PRICING = {
"claude-sonnet-4-20250514": {"input": 3.0, "output": 15.0},
"claude-haiku-3-5-20241022": {"input": 0.8, "output": 4.0},
"gpt-4o": {"input": 2.5, "output": 10.0},
}
async def record_usage(self, team_id: str, model: str,
input_tokens: int, output_tokens: int):
"""Record token usage and calculate cost."""
pricing = self.PRICING.get(model, {"input": 0, "output": 0})
cost = (input_tokens / 1_000_000 * pricing["input"] +
output_tokens / 1_000_000 * pricing["output"])
await db.execute(
"""INSERT INTO usage_log (team_id, model, input_tokens, output_tokens,
cost_usd, timestamp) VALUES ($1, $2, $3, $4, $5, NOW())""",
team_id, model, input_tokens, output_tokens, cost
)
return cost
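The per-request cost arithmetic is simple enough to sanity-check by hand. The same calculation as `record_usage`, pulled out as a pure function (without the database write):

```python
# USD per million tokens, mirroring CostTracker.PRICING above.
PRICING = {
    "claude-sonnet-4-20250514": {"input": 3.0, "output": 15.0},
    "gpt-4o": {"input": 2.5, "output": 10.0},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Same arithmetic as CostTracker.record_usage; unknown models cost 0."""
    p = PRICING.get(model, {"input": 0, "output": 0})
    return (input_tokens / 1_000_000 * p["input"]
            + output_tokens / 1_000_000 * p["output"])

# 10k input + 2k output on Sonnet: 0.01 * 3.0 + 0.002 * 15.0 = $0.06
cost = estimate_cost("claude-sonnet-4-20250514", 10_000, 2_000)
```

Defaulting unknown models to zero cost keeps requests flowing when a new model is added before its pricing entry, at the price of undercounting spend -- some gateways prefer to reject unknown models instead.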
# ─── Provider Routing ───
class ProviderRouter:
"""Route requests to the appropriate LLM provider."""
PROVIDER_MAP = {
"claude-sonnet-4-20250514": {
"provider": "anthropic",
"url": "https://api.anthropic.com/v1/messages",
},
"claude-haiku-3-5-20241022": {
"provider": "anthropic",
"url": "https://api.anthropic.com/v1/messages",
},
"gpt-4o": {
"provider": "openai",
"url": "https://api.openai.com/v1/chat/completions",
},
}
async def route(self, model: str, request_body: dict) -> dict:
config = self.PROVIDER_MAP.get(model)
if not config:
raise HTTPException(status_code=400, detail=f"Unsupported model: {model}")
if config["provider"] == "anthropic":
return await self._call_anthropic(config["url"], request_body)
elif config["provider"] == "openai":
return await self._call_openai(config["url"], request_body)
    async def _call_anthropic(self, url: str, body: dict) -> dict:
        async with httpx.AsyncClient() as client:
            response = await client.post(
                url,
                headers={
                    "x-api-key": os.environ["ANTHROPIC_API_KEY"],
                    "anthropic-version": "2023-06-01",
                    "content-type": "application/json",
                },
                json=body,
                timeout=120.0,
            )
            response.raise_for_status()
            return response.json()

    async def _call_openai(self, url: str, body: dict) -> dict:
        async with httpx.AsyncClient() as client:
            response = await client.post(
                url,
                headers={
                    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
                    "content-type": "application/json",
                },
                json=body,
                timeout=120.0,
            )
            response.raise_for_status()
            return response.json()
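One wrinkle the router has to absorb: providers do not share a usage schema. Anthropic responses report `usage.input_tokens`/`usage.output_tokens`, while OpenAI chat completions report `usage.prompt_tokens`/`usage.completion_tokens`. Since the cost-tracking code downstream reads the Anthropic-style names, a small normalization step keeps OpenAI usage from silently counting as zero:

```python
def normalize_usage(provider: str, usage: dict) -> dict:
    """Map provider-specific usage fields onto the gateway's canonical
    input_tokens/output_tokens shape. Anthropic already uses these names;
    OpenAI chat completions use prompt_tokens/completion_tokens."""
    if provider == "openai":
        return {
            "input_tokens": usage.get("prompt_tokens", 0),
            "output_tokens": usage.get("completion_tokens", 0),
        }
    return {
        "input_tokens": usage.get("input_tokens", 0),
        "output_tokens": usage.get("output_tokens", 0),
    }

norm = normalize_usage("openai", {"prompt_tokens": 42, "completion_tokens": 7})
```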
The Main Gateway Endpoint
@app.post("/v1/chat")
async def gateway_chat(request: Request, team: dict = Depends(authenticate)):
"""Main gateway endpoint -- proxies to the appropriate LLM provider."""
body = await request.json()
model = body.get("model", "claude-sonnet-4-20250514")
# Rate limiting
if not await rate_limiter.check_rate_limit(team["id"], model):
raise HTTPException(status_code=429, detail="Rate limit exceeded")
# Budget check
if team.get("monthly_budget"):
current_spend = await cost_tracker.get_monthly_spend(team["id"])
if current_spend >= team["monthly_budget"]:
raise HTTPException(status_code=402, detail="Monthly budget exhausted")
# Audit logging (before sending to provider)
request_id = str(uuid.uuid4())
await audit_logger.log_request(request_id, team["id"], model, body)
# PII detection (optional)
if team.get("pii_detection_enabled"):
pii_findings = await detect_pii(body)
if pii_findings:
await audit_logger.log_pii_warning(request_id, pii_findings)
# Route to provider
start_time = time.time()
response = await router.route(model, body)
latency = time.time() - start_time
    # Track costs (assumes Anthropic-style usage fields; OpenAI responses use
    # prompt_tokens/completion_tokens, so normalize usage per provider first)
    input_tokens = response.get("usage", {}).get("input_tokens", 0)
    output_tokens = response.get("usage", {}).get("output_tokens", 0)
cost = await cost_tracker.record_usage(team["id"], model, input_tokens, output_tokens)
# Audit logging (after response)
await audit_logger.log_response(request_id, latency, input_tokens, output_tokens, cost)
# Add gateway metadata to response
response["_gateway"] = {
"request_id": request_id,
"latency_ms": round(latency * 1000),
"cost_usd": round(cost, 6),
}
return response
Fallback and Resilience Patterns
A critical gateway feature is automatic failover when a provider experiences issues:
class ResilientRouter:
"""Router with automatic failover between providers."""
FALLBACK_CHAIN = {
"claude-sonnet-4-20250514": ["claude-sonnet-4-20250514", "gpt-4o"],
"gpt-4o": ["gpt-4o", "claude-sonnet-4-20250514"],
}
async def route_with_fallback(self, model: str, body: dict) -> dict:
chain = self.FALLBACK_CHAIN.get(model, [model])
last_error = None
for fallback_model in chain:
try:
response = await self.route(fallback_model, body)
if fallback_model != model:
logger.warning(f"Used fallback model {fallback_model} for {model}")
return response
except (httpx.TimeoutException, httpx.HTTPStatusError) as e:
last_error = e
logger.error(f"Provider error for {fallback_model}: {e}")
continue
raise HTTPException(status_code=502, detail=f"All providers failed: {last_error}")
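The fallback loop is easy to exercise in isolation. A toy version with a stubbed `route()` shows a request landing on the fallback model when the primary's provider is down (`DemoRouter` and its failure simulation are illustrative only, not part of the gateway):

```python
import asyncio

class DemoRouter:
    """Toy stand-in for ResilientRouter: same fallback loop, but route()
    is stubbed so failover can be exercised without any network calls."""
    FALLBACK_CHAIN = {
        "claude-sonnet-4-20250514": ["claude-sonnet-4-20250514", "gpt-4o"],
    }

    def __init__(self, failing: set):
        self.failing = failing  # models whose provider is "down"

    async def route(self, model: str, body: dict) -> dict:
        if model in self.failing:
            raise RuntimeError(f"{model} unavailable")
        return {"model": model, "content": "ok"}

    async def route_with_fallback(self, model: str, body: dict) -> dict:
        last_error = None
        for fallback_model in self.FALLBACK_CHAIN.get(model, [model]):
            try:
                return await self.route(fallback_model, body)
            except RuntimeError as e:
                last_error = e
        raise RuntimeError(f"All providers failed: {last_error}")

# Primary provider is down, so the request lands on the fallback model:
router = DemoRouter(failing={"claude-sonnet-4-20250514"})
result = asyncio.run(router.route_with_fallback("claude-sonnet-4-20250514", {}))
```

One design note the toy version surfaces: falling back across providers changes model behavior mid-incident, so fallback chains should pair models of comparable capability and compatible prompt formats.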
Self-Service Admin Dashboard
The gateway should include an admin API for teams to manage their own usage:
@app.get("/admin/usage")
async def get_usage(team: dict = Depends(authenticate)):
"""Get team's usage statistics."""
return {
"team_id": team["id"],
"current_month": {
"total_requests": await cost_tracker.get_monthly_requests(team["id"]),
"total_tokens": await cost_tracker.get_monthly_tokens(team["id"]),
"total_cost_usd": await cost_tracker.get_monthly_spend(team["id"]),
"budget_remaining": team.get("monthly_budget", 0) -
await cost_tracker.get_monthly_spend(team["id"]),
},
"by_model": await cost_tracker.get_monthly_breakdown_by_model(team["id"]),
"daily_trend": await cost_tracker.get_daily_trend(team["id"], days=30),
}
@app.get("/admin/audit-log")
async def get_audit_log(team: dict = Depends(authenticate),
limit: int = 100, offset: int = 0):
"""Get team's audit log."""
return await audit_logger.get_team_logs(team["id"], limit, offset)
Open Source AI Gateway Options
Several open-source projects provide AI gateway functionality out of the box:
| Project | Language | Key Features |
|---|---|---|
| LiteLLM Proxy | Python | 100+ LLM providers, cost tracking, key management |
| Portkey | TypeScript | Caching, fallbacks, load balancing, observability |
| Kong AI Gateway | Lua/Go | Enterprise API gateway with AI plugins |
| Cloudflare AI Gateway | Managed | Caching, rate limiting, analytics |
For most teams, starting with LiteLLM Proxy is the fastest path to a functional AI gateway:
# litellm-config.yaml
model_list:
- model_name: claude-sonnet
litellm_params:
model: claude-sonnet-4-20250514
api_key: os.environ/ANTHROPIC_API_KEY
- model_name: gpt-4o
litellm_params:
model: gpt-4o
api_key: os.environ/OPENAI_API_KEY
general_settings:
  master_key: sk-gateway-master-key  # example only -- load from a secret store in production
database_url: postgresql://user:pass@localhost:5432/litellm
litellm_settings:
cache: true
cache_params:
type: redis
host: localhost
port: 6379
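Once the proxy is running, applications call it through its OpenAI-compatible chat completions API, authenticating with their issued key. A sketch of the request shape (the base URL assumes LiteLLM's default local port, and in practice teams would use per-team virtual keys rather than the master key):

```python
def proxy_request(model: str, messages: list, gateway_key: str,
                  base_url: str = "http://localhost:4000"):
    """Build the URL, headers, and JSON body for a call to a LiteLLM
    proxy's OpenAI-compatible endpoint. Send with any HTTP client.
    The base_url assumes LiteLLM's default local port."""
    return (
        f"{base_url}/v1/chat/completions",
        {"Authorization": f"Bearer {gateway_key}"},
        {"model": model, "messages": messages},
    )

url, headers, body = proxy_request(
    "claude-sonnet",  # the model_name alias from the config above
    [{"role": "user", "content": "Hello"}],
    "sk-gateway-master-key",
)
```

Because the proxy speaks the OpenAI wire format, existing OpenAI SDK clients can be pointed at it by changing only the base URL and API key -- no application code changes.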
Conclusion
An AI gateway is essential infrastructure for any organization running multiple LLM-powered applications. It provides unified authentication, cost visibility, rate limiting, audit logging, and provider abstraction -- all of which become critical as AI adoption scales. Start with an open-source solution like LiteLLM, customize as your needs grow, and make the gateway the single path for all LLM traffic in your organization.