API Security Headers for AI Agent Services: CORS, CSP, and Rate Limit Headers

Configure essential security headers for AI agent APIs including CORS policies, Content Security Policy, rate limit communication headers, and other protective headers with FastAPI middleware examples.

Security Headers: Your API's First Line of Defense

HTTP security headers protect your AI agent API from common attack vectors: cross-origin abuse, content injection, information leakage, and protocol downgrade attacks. Unlike authentication and authorization (which verify who is making the request and what they are allowed to do), security headers define how the request and response should be handled by browsers, proxies, and clients.

For AI agent APIs, security headers serve a dual purpose. They protect browser-based agent interfaces from XSS and clickjacking, and they communicate rate limiting information so agents can throttle themselves before they start receiving 429 responses.

CORS Configuration

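Cross-Origin Resource Sharing controls which origins can call your API from a browser and read the response. For AI agent APIs, you need to balance accessibility (agents running on various domains) with security (preventing unauthorized cross-origin requests). CORS is enforced only by browsers, so it does not restrict server-to-server agents; those still have to be controlled through authentication.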

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Production CORS: restrict to known origins
app.add_middleware(
    CORSMiddleware,
    allow_origins=[
        "https://app.example.com",
        "https://dashboard.example.com",
        "https://playground.example.com",
    ],
    allow_credentials=True,
    allow_methods=["GET", "POST", "PUT", "PATCH", "DELETE"],
    allow_headers=[
        "Authorization",
        "Content-Type",
        "X-API-Key",
        "X-Request-ID",
        "Idempotency-Key",
    ],
    expose_headers=[
        "X-Request-ID",
        "X-RateLimit-Limit",
        "X-RateLimit-Remaining",
        "X-RateLimit-Reset",
        "Retry-After",
    ],
    max_age=3600,
)

The expose_headers configuration is often overlooked. By default, browsers expose only the CORS-safelisted response headers (Cache-Control, Content-Language, Content-Length, Content-Type, Expires, Last-Modified, and Pragma) to JavaScript. Without listing your rate limit headers here, browser-based agents cannot read them, even though server-to-server agents can.
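To make the per-request decision more concrete, here is a minimal sketch of what a CORS layer does with the Origin header: echo it back only when it is on the allowlist, and otherwise add nothing so the browser blocks the read. The function name cors_response_headers and the constants are hypothetical, and the sketch skips preflight handling, which CORSMiddleware takes care of for you.

```python
from typing import Optional

# Hypothetical allowlist and exposed headers, mirroring the configuration above.
ALLOWED_ORIGINS = {
    "https://app.example.com",
    "https://dashboard.example.com",
}
EXPOSED_HEADERS = [
    "X-Request-ID",
    "X-RateLimit-Limit",
    "X-RateLimit-Remaining",
    "X-RateLimit-Reset",
    "Retry-After",
]


def cors_response_headers(origin: Optional[str]) -> dict:
    """Return the CORS headers to attach for a simple (non-preflight) request."""
    # No Origin header, or an origin that is not allowlisted: add nothing.
    if origin is None or origin not in ALLOWED_ORIGINS:
        return {}
    return {
        # Echo the specific origin; a wildcard cannot be used with credentials.
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Credentials": "true",
        "Access-Control-Expose-Headers": ", ".join(EXPOSED_HEADERS),
        # The response depends on the request's Origin, so caches must key on it.
        "Vary": "Origin",
    }
```

Because the allowed origin is echoed rather than fixed, the Vary: Origin header matters: without it, a shared cache could serve one origin's response to another.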

Rate Limit Headers

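Rate limiting is essential for AI agent APIs where a single agent can generate hundreds of requests per minute. Communicate limits clearly using the widely adopted X-RateLimit-* headers, which are a de facto convention rather than a formal standard, together with the standard Retry-After header, so agents can self-regulate.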

from starlette.middleware.base import BaseHTTPMiddleware
from fastapi import Request
from fastapi.responses import JSONResponse
import time

class RateLimitMiddleware(BaseHTTPMiddleware):
    def __init__(self, app, requests_per_minute: int = 60):
        super().__init__(app)
        self.rpm = requests_per_minute
        # In-memory buckets are per-process and never evicted, so this is for
        # illustration only. In production, use Redis with a sliding window.
        self.buckets: dict[str, dict] = {}

    async def dispatch(self, request: Request, call_next):
        client_id = self._get_client_id(request)
        now = time.time()

        bucket = self.buckets.get(client_id, {
            "count": 0, "reset_at": now + 60,
        })

        if now > bucket["reset_at"]:
            bucket = {"count": 0, "reset_at": now + 60}

        bucket["count"] += 1
        self.buckets[client_id] = bucket

        remaining = max(0, self.rpm - bucket["count"])
        reset_at = int(bucket["reset_at"])

        rate_headers = {
            "X-RateLimit-Limit": str(self.rpm),
            "X-RateLimit-Remaining": str(remaining),
            "X-RateLimit-Reset": str(reset_at),
        }

        if bucket["count"] > self.rpm:
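            # Round up so clients are never told to retry before the window
            # resets (a plain int() could yield 0 seconds).
            retry_after = int(bucket["reset_at"] - now) + 1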
            return JSONResponse(
                status_code=429,
                content={
                    "type": "https://api.example.com/errors/rate-limit",
                    "title": "Rate Limit Exceeded",
                    "detail": f"Limit: {self.rpm} requests/minute",
                    "retryable": True,
                    "retry_after_seconds": retry_after,
                },
                headers={
                    **rate_headers,
                    "Retry-After": str(retry_after),
                },
            )

        response = await call_next(request)
        for key, value in rate_headers.items():
            response.headers[key] = value
        return response

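    def _get_client_id(self, request: Request) -> str:
        # Assumes the API key has already been validated; otherwise a caller
        # can send random keys to obtain a fresh bucket on every request.
        api_key = request.headers.get("X-API-Key", "")
        if api_key:
            return f"key:{api_key}"
        # X-Forwarded-For is client-controlled and may hold a comma-separated
        # chain. Rely on it only behind a trusted proxy that overwrites it.
        forwarded = request.headers.get("X-Forwarded-For", "")
        client_ip = forwarded.split(",")[0].strip()
        if not client_ip and request.client:
            client_ip = request.client.host
        return f"ip:{client_ip or 'unknown'}"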

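# Starlette runs the most recently added middleware first, so this limiter
# wraps CORSMiddleware: its 429 responses carry no CORS headers, and preflight
# OPTIONS requests count against the quota. Register CORSMiddleware after the
# limiter if browser clients need to read 429 responses.
app.add_middleware(RateLimitMiddleware, requests_per_minute=100)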
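The middleware above uses a fixed one-minute window, while its comment points to a sliding window for production. As a rough illustration of the difference, here is a minimal in-memory sliding-window log. The class name SlidingWindowLimiter is hypothetical, and a real deployment would keep this state in a shared store such as Redis rather than in process memory.

```python
import time
from collections import deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client in any `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        # client_id -> timestamps of accepted requests, oldest first
        self.hits = {}

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        # `now` is injectable so the behavior can be tested deterministically.
        now = time.monotonic() if now is None else now
        timestamps = self.hits.setdefault(client_id, deque())
        # Drop accepted requests that have slid out of the window.
        while timestamps and timestamps[0] <= now - self.window:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False  # rejected requests are not recorded
        timestamps.append(now)
        return True
```

Unlike a fixed window, this never lets a client send a full quota at the end of one minute and another full quota at the start of the next. The cost is memory proportional to the limit for each client.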

Comprehensive Security Headers Middleware

Beyond CORS and rate limiting, add headers that prevent common web attacks and information leakage.

class SecurityHeadersMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        response = await call_next(request)

        # Prevent MIME type sniffing
        response.headers["X-Content-Type-Options"] = "nosniff"

        # Prevent clickjacking
        response.headers["X-Frame-Options"] = "DENY"

        # Control referrer information
        response.headers["Referrer-Policy"] = "strict-origin-when-cross-origin"

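        # Force HTTPS. "preload" only takes effect once the domain is submitted
        # to the browser preload list and is slow to undo, so add it deliberately.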
        response.headers["Strict-Transport-Security"] = (
            "max-age=31536000; includeSubDomains; preload"
        )

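        # Remove server identification if the app set it. Starlette's
        # MutableHeaders has no pop(), so delete the key explicitly. uvicorn
        # adds its own Server header after the app runs; disable that with
        # the --no-server-header flag.
        if "server" in response.headers:
            del response.headers["server"]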

        # Permissions Policy - disable unused browser features
        response.headers["Permissions-Policy"] = (
            "camera=(), microphone=(), geolocation=(), "
            "payment=(), usb=(), magnetometer=()"
        )

        # Content Security Policy for API responses
        if "text/html" in response.headers.get("content-type", ""):
            response.headers["Content-Security-Policy"] = (
                "default-src 'none'; "
                "script-src 'self'; "
                "style-src 'self' 'unsafe-inline'; "
                "img-src 'self' data:; "
                "font-src 'self'; "
                "connect-src 'self'"
            )

        return response

app.add_middleware(SecurityHeadersMiddleware)
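The Content-Security-Policy value above is assembled by hand, which makes it easy to drop a semicolon when directives change. One possible approach is a small helper that builds the header from a mapping. The name build_csp is hypothetical, and the frame-ancestors policy for JSON-only responses is a suggestion of this sketch rather than part of the middleware above.

```python
def build_csp(directives: dict) -> str:
    """Join {directive: [sources]} into a Content-Security-Policy value."""
    return "; ".join(
        f"{name} {' '.join(sources)}" for name, sources in directives.items()
    )


# The HTML policy from the middleware above, expressed as data.
HTML_CSP = build_csp({
    "default-src": ["'none'"],
    "script-src": ["'self'"],
    "style-src": ["'self'", "'unsafe-inline'"],
    "img-src": ["'self'", "data:"],
    "font-src": ["'self'"],
    "connect-src": ["'self'"],
})

# Assumption: a locked-down policy for responses that never render markup.
API_CSP = build_csp({
    "default-src": ["'none'"],
    "frame-ancestors": ["'none'"],
})
```

Keeping the policy as data also makes it straightforward to vary a single directive per environment without touching the string formatting.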

Request ID Tracking

Assign a unique ID to every request for distributed tracing. If the client sends an X-Request-ID header, propagate it; otherwise, generate one. This is invaluable for debugging agent interactions across multiple services.

import uuid

class RequestIDMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        request_id = request.headers.get(
            "X-Request-ID", str(uuid.uuid4())
        )
        request.state.request_id = request_id

        response = await call_next(request)
        response.headers["X-Request-ID"] = request_id
        return response

app.add_middleware(RequestIDMiddleware)
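Because the client controls the incoming X-Request-ID value, it may be worth validating it before it reaches logs or response headers. The following is a minimal sketch of that check. The helper name normalize_request_id, the allowed character set, and the 128-character cap are all assumptions to adjust for your own tracing system.

```python
import re
import uuid
from typing import Optional

# Assumption: IDs are short tokens of letters, digits, dot, underscore, hyphen.
_SAFE_REQUEST_ID = re.compile(r"[A-Za-z0-9._-]{1,128}")


def normalize_request_id(raw: Optional[str]) -> str:
    """Keep a well-formed client ID; otherwise generate a fresh UUID."""
    # fullmatch rejects control characters, whitespace, and overlong values,
    # which guards against log injection and header smuggling attempts.
    if raw and _SAFE_REQUEST_ID.fullmatch(raw):
        return raw
    return str(uuid.uuid4())
```

Inside the middleware, the call would look like normalize_request_id(request.headers.get("X-Request-ID")) in place of the direct header lookup.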

FAQ

Should I use wildcard CORS (*) for my AI agent API?

Never use wildcard CORS in production for APIs that use cookies or bearer tokens. Browsers reject a response that combines Access-Control-Allow-Origin: * with credentials, so a wildcard origin with allow_credentials=True does not work for credentialed requests. For public APIs that use API keys in headers rather than cookies, a wildcard origin is acceptable but still not recommended. List specific allowed origins and use environment variables to configure them per deployment environment.
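As a rough sketch of the environment-variable approach, the helper below reads a comma-separated list and refuses a wildcard. The variable name CORS_ALLOWED_ORIGINS and the function name load_allowed_origins are hypothetical.

```python
import os


def load_allowed_origins(env_var: str = "CORS_ALLOWED_ORIGINS") -> list:
    """Parse a comma-separated origin list from the environment."""
    raw = os.environ.get(env_var, "")
    # Trim whitespace and trailing slashes; browsers send origins without one.
    origins = [item.strip().rstrip("/") for item in raw.split(",") if item.strip()]
    if "*" in origins:
        # Fail fast instead of silently deploying a wildcard policy.
        raise ValueError(f"{env_var} must list explicit origins, not '*'")
    return origins
```

The result can be passed directly as allow_origins=load_allowed_origins() when registering CORSMiddleware. An unset variable yields an empty list, which allows no cross-origin callers.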

What is the difference between X-RateLimit headers and the standard Retry-After header?

They serve complementary purposes. The X-RateLimit-* headers are informational and sent on every response, telling the client their current quota status (limit, remaining, reset time). The Retry-After header is directive: it is typically sent with 429 or 503 responses (it can also accompany 3xx redirects) and tells the client how long to wait before retrying, either as a number of seconds or as an HTTP date. Always include both: the rate limit headers for proactive throttling and Retry-After for reactive recovery.
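On the agent side, the two kinds of header can be combined into a single wait decision. The sketch below is one way to do it, assuming X-RateLimit-Reset holds a Unix timestamp, as in the middleware earlier in this article. The function name seconds_to_wait is hypothetical, and the HTTP-date form of Retry-After is not handled.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(
    status: int, headers: Mapping[str, str], now: Optional[float] = None
) -> float:
    """Return how long a client should pause before its next request."""
    now = time.time() if now is None else now
    # Reactive: the server rejected the request and said when to come back.
    if status in (429, 503):
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            return float(retry_after)
    # Proactive: the quota is exhausted, so wait for the window to reset.
    remaining = headers.get("X-RateLimit-Remaining", "")
    reset = headers.get("X-RateLimit-Reset", "")
    if remaining.isdigit() and int(remaining) == 0 and reset.isdigit():
        return max(0.0, float(reset) - now)
    return 0.0
```

HTTP header names are case-insensitive, so pass the headers object from your HTTP client, which normally handles that, rather than a plain dict with differently cased keys.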

Should I apply rate limiting per API key or per IP address?

Apply rate limiting per API key for authenticated requests and per IP for unauthenticated requests. API key-based limiting is more accurate since multiple users may share an IP (corporate NATs, VPNs). Consider tiered rate limits based on the subscription plan: a free tier might get 10 requests per minute while an enterprise tier gets 1000. Always communicate the current tier's limits in the rate limit response headers.
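A tiered setup can be as simple as a lookup table keyed by plan. The sketch below uses the example figures from this answer. The plan names, the PLAN_LIMITS table, and the rate_limit_headers helper are hypothetical, and how a plan is resolved from an API key is left to your own auth layer.

```python
# Requests per minute by plan, using the example figures above.
PLAN_LIMITS = {"free": 10, "enterprise": 1000}
# Unknown plans fall back to the most restrictive tier.
DEFAULT_LIMIT = PLAN_LIMITS["free"]


def rate_limit_headers(plan: str, used: int, reset_at: int) -> dict:
    """Build the quota headers for a client on the given plan."""
    limit = PLAN_LIMITS.get(plan, DEFAULT_LIMIT)
    return {
        "X-RateLimit-Limit": str(limit),
        # Never report a negative remaining count once the quota is exceeded.
        "X-RateLimit-Remaining": str(max(0, limit - used)),
        "X-RateLimit-Reset": str(reset_at),
    }
```

Reporting the tier's own limit in X-RateLimit-Limit lets the same agent code adapt automatically when an account is upgraded.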


#APISecurity #CORS #RateLimiting #HTTPHeaders #FastAPI #AgenticAI #LearnAI #AIEngineering

CallSphere Team
