
Secure API Gateway for AI Agents: Kong, Traefik, and Custom Gateway Patterns

Set up a secure API gateway for AI agent systems using Kong, Traefik, and custom FastAPI patterns. Covers authentication plugins, rate limiting, request transformation, and routing strategies.

Why AI Agent Platforms Need an API Gateway

An API gateway is a single entry point that sits in front of your AI agent services and handles cross-cutting concerns: authentication, rate limiting, request routing, logging, and protocol translation. Without a gateway, every agent service must independently implement these concerns, leading to inconsistency and duplicated security logic.

For AI agent platforms specifically, a gateway provides three critical capabilities: it enforces rate limits to prevent a single tenant from exhausting GPU resources, it routes requests to different agent versions for A/B testing, and it transforms requests between the public API format and the internal service format.

Gateway Architecture for Multi-Agent Systems

A typical architecture places the gateway between the public internet and your internal agent services:

Client --> API Gateway --> Triage Agent --> Research Agent
                       --> Tool Executor
                       --> Conversation Service
                       --> Billing Service

The gateway handles TLS termination, authentication, rate limiting, and routing. Internal services communicate via mTLS or service tokens as discussed in previous posts.

Kong Gateway Configuration

Kong is a widely deployed API gateway with a rich plugin ecosystem. Configure it for an AI agent platform using its declarative YAML format:

# kong.yml
_format_version: "3.0"

services:
  - name: agent-api
    url: http://agent-service:8000
    routes:
      - name: agent-routes
        paths:
          - /api/agents
        strip_path: false
    plugins:
      - name: jwt
        config:
          claims_to_verify:
            - exp
          header_names:
            - Authorization
      - name: rate-limiting
        config:
          minute: 60
          hour: 1000
          policy: redis
          redis_host: redis
          redis_port: 6379
      - name: correlation-id
        config:
          header_name: X-Gateway-Request-Id
          generator: uuid
          echo_downstream: true
      - name: cors
        config:
          origins:
            - "https://app.example.com"
          methods:
            - GET
            - POST
            - PUT
            - DELETE
          headers:
            - Authorization
            - Content-Type
            - X-Session-Id
          max_age: 3600

Traefik Configuration for Kubernetes

Traefik integrates natively with Kubernetes through IngressRoute custom resources, making it a natural choice for agent platforms running on K8s:


# traefik-ingress.yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: agent-api
  namespace: ai-agents
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`api.agents.example.com`) && PathPrefix(`/api/agents`)
      kind: Rule
      services:
        - name: agent-service
          port: 8000
      middlewares:
        - name: agent-auth
        - name: agent-rate-limit
        - name: agent-headers
  tls:
    certResolver: letsencrypt
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: agent-rate-limit
  namespace: ai-agents
spec:
  rateLimit:
    average: 60
    burst: 20
    period: 1m
    sourceCriterion:
      requestHeaderName: X-API-Key
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: agent-headers
  namespace: ai-agents
spec:
  headers:
    customRequestHeaders:
      X-Gateway: "traefik"
    customResponseHeaders:
      X-Content-Type-Options: "nosniff"
      X-Frame-Options: "DENY"
      Strict-Transport-Security: "max-age=31536000; includeSubDomains"

Building a Custom FastAPI Gateway

For full control, build a lightweight gateway directly in FastAPI. This is ideal when your routing logic depends on request content (like routing to different agent versions based on the model parameter):

# gateway/main.py
import time
import uuid
import httpx
from fastapi import FastAPI, Request, HTTPException, Depends
from fastapi.responses import StreamingResponse

app = FastAPI(title="Agent API Gateway")

# Service registry
SERVICES = {
    "agents": "http://agent-service:8000",
    "tools": "http://tool-service:8001",
    "conversations": "http://conversation-service:8002",
}


@app.middleware("http")
async def gateway_middleware(request: Request, call_next):
    # Add request tracking headers
    request_id = str(uuid.uuid4())
    start_time = time.time()

    response = await call_next(request)

    # Add response headers
    duration_ms = (time.time() - start_time) * 1000
    response.headers["X-Request-Id"] = request_id
    response.headers["X-Response-Time-Ms"] = f"{duration_ms:.2f}"
    return response

Content-Based Routing

Route requests to different backend services based on the request body. This is useful for directing agent execution requests to specialized model servers:

@app.post("/api/agents/execute")
async def route_agent_execution(
    request: Request,
    user: TokenPayload = Depends(get_current_user),  # auth dependency from the earlier posts
):
    body = await request.json()
    model = body.get("model", "default")

    # Route to different backends based on model
    routing_table = {
        "gpt-4": "http://openai-agent-service:8000",
        "claude-3": "http://anthropic-agent-service:8000",
        "local-llama": "http://local-agent-service:8000",
        "default": SERVICES["agents"],
    }

    target_url = routing_table.get(model, routing_table["default"])

    # Only forward the Authorization header if the client sent one;
    # httpx rejects header values of None
    headers = {"X-Org-Id": user.org_id, "X-User-Id": user.sub}
    if auth := request.headers.get("Authorization"):
        headers["Authorization"] = auth

    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"{target_url}/api/agents/execute",
            json=body,
            headers=headers,
            timeout=120.0,
        )

    # Propagate upstream errors instead of masking them as 200s
    if response.status_code >= 400:
        raise HTTPException(status_code=response.status_code, detail=response.text)
    return response.json()

Gateway-Level Rate Limiting with Redis

Implement tiered rate limiting based on the user's subscription plan:

import redis.asyncio as redis

redis_client = redis.from_url("redis://redis:6379/0")

PLAN_LIMITS = {
    "free": {"rpm": 10, "rpd": 100},
    "pro": {"rpm": 60, "rpd": 5000},
    "enterprise": {"rpm": 300, "rpd": 50000},
}


async def check_rate_limit(user: TokenPayload = Depends(get_current_user)):
    # get_user_plan is the plan lookup (e.g. against the billing service)
    plan = await get_user_plan(user.sub)
    limits = PLAN_LIMITS.get(plan, PLAN_LIMITS["free"])

    # Read the clock once so both keys fall in the same window
    now = int(time.time())
    minute_key = f"rl:{user.sub}:minute:{now // 60}"
    day_key = f"rl:{user.sub}:day:{now // 86400}"

    pipe = redis_client.pipeline()
    pipe.incr(minute_key)
    pipe.expire(minute_key, 60)
    pipe.incr(day_key)
    pipe.expire(day_key, 86400)
    results = await pipe.execute()

    minute_count = results[0]
    day_count = results[2]

    if minute_count > limits["rpm"]:
        raise HTTPException(
            status_code=429,
            detail="Rate limit exceeded (per minute)",
            headers={"Retry-After": "60"},
        )
    if day_count > limits["rpd"]:
        raise HTTPException(
            status_code=429,
            detail="Daily rate limit exceeded",
            headers={"Retry-After": "3600"},
        )

FAQ

Should I use Kong, Traefik, or a custom gateway?

Use Kong if you need a mature plugin ecosystem with built-in support for JWT, OAuth2, OIDC, and advanced rate limiting out of the box. Use Traefik if you are on Kubernetes and want auto-discovery of services through ingress annotations. Build a custom FastAPI gateway when you need content-based routing, complex request transformation, or business logic in the gateway layer. Many teams start with Traefik for basic routing and add a thin FastAPI gateway behind it for application-specific logic.

How do I handle streaming responses through a gateway?

AI agent responses often stream via SSE (Server-Sent Events). Your gateway must proxy the response as a stream without buffering the entire body. In a custom FastAPI gateway, use httpx.AsyncClient.stream() and return a StreamingResponse. In Kong and Traefik, disable response buffering for streaming endpoints. Test latency carefully — gateways that buffer before forwarding add significant time-to-first-token latency.

How should I version my AI agent API through the gateway?

Use URL path versioning (/v1/agents, /v2/agents) routed to different backend services. The gateway maintains a routing table that maps version prefixes to the appropriate service version. Support a Sunset response header on deprecated versions to give clients advance notice. Allow enterprise customers to pin to specific versions while gradually migrating the default version for new users.
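As a sketch, that routing table can be a plain prefix map in the gateway; the service names here are hypothetical:

```python
VERSION_ROUTES = {
    "/v1/agents": "http://agent-service-v1:8000",
    "/v2/agents": "http://agent-service-v2:8000",
}
DEFAULT_VERSION = "/v2/agents"


def resolve_version(path: str) -> str:
    """Pick a backend by version prefix; unversioned paths get the default."""
    for prefix, backend in VERSION_ROUTES.items():
        if path.startswith(prefix):
            return backend
    return VERSION_ROUTES[DEFAULT_VERSION]
```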


#APIGateway #Kong #Traefik #FastAPI #AIAgents #RateLimiting #AgenticAI #LearnAI #AIEngineering


CallSphere Team

Expert insights on AI voice agents and customer communication automation.
