
Secure API Gateway for AI Agents: Kong, Traefik, and Custom Gateway Patterns

Set up a secure API gateway for AI agent systems using Kong, Traefik, and custom FastAPI patterns. Covers authentication plugins, rate limiting, request transformation, and routing strategies.

Why AI Agent Platforms Need an API Gateway

An API gateway is a single entry point that sits in front of your AI agent services and handles cross-cutting concerns: authentication, rate limiting, request routing, logging, and protocol translation. Without a gateway, every agent service must independently implement these concerns, leading to inconsistency and duplicated security logic.

For AI agent platforms specifically, a gateway provides three critical capabilities: it enforces rate limits to prevent a single tenant from exhausting GPU resources, it routes requests to different agent versions for A/B testing, and it transforms requests between the public API format and the internal service format.

Gateway Architecture for Multi-Agent Systems

A typical architecture places the gateway between the public internet and your internal agent services:

Client --> API Gateway --> Triage Agent --> Research Agent
                       --> Tool Executor
                       --> Conversation Service
                       --> Billing Service

The gateway handles TLS termination, authentication, rate limiting, and routing. Internal services communicate via mTLS or service tokens as discussed in previous posts.

Kong Gateway Configuration

Kong is a widely deployed API gateway with a rich plugin ecosystem. Configure it for an AI agent platform using its declarative YAML format:

# kong.yml
_format_version: "3.0"

services:
  - name: agent-api
    url: http://agent-service:8000
    routes:
      - name: agent-routes
        paths:
          - /api/agents
        strip_path: false
    plugins:
      - name: jwt
        config:
          claims_to_verify:
            - exp
          header_names:
            - Authorization
      - name: rate-limiting
        config:
          minute: 60
          hour: 1000
          policy: redis
          redis_host: redis
          redis_port: 6379
      - name: correlation-id
        config:
          header_name: X-Gateway-Request-Id
          generator: uuid
          echo_downstream: true
      - name: cors
        config:
          origins:
            - "https://app.example.com"
          methods:
            - GET
            - POST
            - PUT
            - DELETE
          headers:
            - Authorization
            - Content-Type
            - X-Session-Id
          max_age: 3600

Traefik Configuration for Kubernetes

Traefik integrates natively with Kubernetes through IngressRoute custom resources, making it a natural choice for agent platforms running on K8s:


# traefik-ingress.yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: agent-api
  namespace: ai-agents
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`api.agents.example.com`) && PathPrefix(`/api/agents`)
      kind: Rule
      services:
        - name: agent-service
          port: 8000
      middlewares:
        - name: agent-auth
        - name: agent-rate-limit
        - name: agent-headers
  tls:
    certResolver: letsencrypt
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: agent-rate-limit
  namespace: ai-agents
spec:
  rateLimit:
    average: 60
    burst: 20
    period: 1m
    sourceCriterion:
      requestHeaderName: X-API-Key
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: agent-headers
  namespace: ai-agents
spec:
  headers:
    customRequestHeaders:
      X-Gateway: "traefik"
    customResponseHeaders:
      X-Content-Type-Options: "nosniff"
      X-Frame-Options: "DENY"
      Strict-Transport-Security: "max-age=31536000; includeSubDomains"

Building a Custom FastAPI Gateway

For full control, build a lightweight gateway directly in FastAPI. This is ideal when your routing logic depends on request content (like routing to different agent versions based on the model parameter):

# gateway/main.py
import time
import uuid
import httpx
from fastapi import FastAPI, Request, HTTPException, Depends
from fastapi.responses import StreamingResponse

app = FastAPI(title="Agent API Gateway")

# Service registry
SERVICES = {
    "agents": "http://agent-service:8000",
    "tools": "http://tool-service:8001",
    "conversations": "http://conversation-service:8002",
}


@app.middleware("http")
async def gateway_middleware(request: Request, call_next):
    # Add request tracking headers
    request_id = str(uuid.uuid4())
    start_time = time.time()

    response = await call_next(request)

    # Add response headers
    duration_ms = (time.time() - start_time) * 1000
    response.headers["X-Request-Id"] = request_id
    response.headers["X-Response-Time-Ms"] = f"{duration_ms:.2f}"
    return response

Content-Based Routing

Route requests to different backend services based on the request body. This is useful for directing agent execution requests to specialized model servers:

@app.post("/api/agents/execute")
async def route_agent_execution(
    request: Request,
    user: TokenPayload = Depends(get_current_user),  # auth dependency from the earlier posts
):
    body = await request.json()
    model = body.get("model", "default")

    # Route to different backends based on model
    routing_table = {
        "gpt-4": "http://openai-agent-service:8000",
        "claude-3": "http://anthropic-agent-service:8000",
        "local-llama": "http://local-agent-service:8000",
        "default": SERVICES["agents"],
    }

    target_url = routing_table.get(model, routing_table["default"])

    # Only forward the Authorization header if the client sent one;
    # httpx rejects header values of None
    headers = {"X-Org-Id": user.org_id, "X-User-Id": user.sub}
    if auth := request.headers.get("Authorization"):
        headers["Authorization"] = auth

    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"{target_url}/api/agents/execute",
            json=body,
            headers=headers,
            timeout=120.0,
        )

    # Propagate upstream errors instead of masking them as 200s
    if response.status_code >= 400:
        raise HTTPException(status_code=response.status_code, detail=response.text)
    return response.json()

Gateway-Level Rate Limiting with Redis

Implement tiered rate limiting based on the user's subscription plan:

import redis.asyncio as redis

redis_client = redis.from_url("redis://redis:6379/0")

PLAN_LIMITS = {
    "free": {"rpm": 10, "rpd": 100},
    "pro": {"rpm": 60, "rpd": 5000},
    "enterprise": {"rpm": 300, "rpd": 50000},
}


async def check_rate_limit(user: TokenPayload = Depends(get_current_user)):
    # get_user_plan is the plan lookup (e.g. against the billing service)
    plan = await get_user_plan(user.sub)
    limits = PLAN_LIMITS.get(plan, PLAN_LIMITS["free"])

    # Read the clock once so both keys fall in the same window
    now = int(time.time())
    minute_key = f"rl:{user.sub}:minute:{now // 60}"
    day_key = f"rl:{user.sub}:day:{now // 86400}"

    pipe = redis_client.pipeline()
    pipe.incr(minute_key)
    pipe.expire(minute_key, 60)
    pipe.incr(day_key)
    pipe.expire(day_key, 86400)
    results = await pipe.execute()

    minute_count = results[0]
    day_count = results[2]

    if minute_count > limits["rpm"]:
        raise HTTPException(
            status_code=429,
            detail="Rate limit exceeded (per minute)",
            headers={"Retry-After": "60"},
        )
    if day_count > limits["rpd"]:
        raise HTTPException(
            status_code=429,
            detail="Daily rate limit exceeded",
            headers={"Retry-After": "3600"},
        )

FAQ

Should I use Kong, Traefik, or a custom gateway?

Use Kong if you need a mature plugin ecosystem with built-in support for JWT, OAuth2, OIDC, and advanced rate limiting out of the box. Use Traefik if you are on Kubernetes and want auto-discovery of services through ingress annotations. Build a custom FastAPI gateway when you need content-based routing, complex request transformation, or business logic in the gateway layer. Many teams start with Traefik for basic routing and add a thin FastAPI gateway behind it for application-specific logic.

How do I handle streaming responses through a gateway?

AI agent responses often stream via SSE (Server-Sent Events). Your gateway must proxy the response as a stream without buffering the entire body. In a custom FastAPI gateway, use httpx.AsyncClient.stream() and return a StreamingResponse. In Kong and Traefik, disable response buffering for streaming endpoints. Test latency carefully — gateways that buffer before forwarding add significant time-to-first-token latency.

How should I version my AI agent API through the gateway?

Use URL path versioning (/v1/agents, /v2/agents) routed to different backend services. The gateway maintains a routing table that maps version prefixes to the appropriate service version. Support a Sunset response header on deprecated versions to give clients advance notice. Allow enterprise customers to pin to specific versions while gradually migrating the default version for new users.
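As a sketch, that routing table can be a plain prefix map in the gateway; the service names here are hypothetical:

```python
VERSION_ROUTES = {
    "/v1/agents": "http://agent-service-v1:8000",
    "/v2/agents": "http://agent-service-v2:8000",
}
DEFAULT_VERSION = "/v2/agents"


def resolve_version(path: str) -> str:
    """Pick a backend by version prefix; unversioned paths get the default."""
    for prefix, backend in VERSION_ROUTES.items():
        if path.startswith(prefix):
            return backend
    return VERSION_ROUTES[DEFAULT_VERSION]
```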


#APIGateway #Kong #Traefik #FastAPI #AIAgents #RateLimiting #AgenticAI #LearnAI #AIEngineering


CallSphere Team

Expert insights on AI voice agents and customer communication automation.
