CDN and Edge Caching for Agent Static Assets: Reducing Global Latency
Set up CDN and edge caching for your AI agent's static assets, API responses, and pre-computed results. With the right cache headers, edge functions, and geographic routing, you can cut global latency from hundreds of milliseconds to single digits.
Why CDN Matters for AI Agent Systems
AI agent interfaces are web applications. Users load JavaScript bundles, CSS files, and HTML pages before they can even send their first message. If your agent's frontend is served from a single origin in us-east-1 and your user is in Tokyo, every static asset request adds 200-300ms of round-trip latency.
A CDN (Content Delivery Network) caches your static assets at edge locations worldwide. A user in Tokyo gets assets from an edge server in Tokyo — 10ms instead of 200ms. This is not just a frontend concern. Agent systems also benefit from edge-caching API responses, pre-computed embeddings, and knowledge base snapshots.
Setting Cache Headers for Static Assets
The foundation of CDN caching is correct HTTP cache headers. Different asset types need different caching strategies.
from fastapi import FastAPI, Request
from fastapi.staticfiles import StaticFiles

app = FastAPI()

# Serve static files with aggressive caching
app.mount("/static", StaticFiles(directory="static"), name="static")

@app.middleware("http")
async def add_cache_headers(request: Request, call_next):
    response = await call_next(request)
    path = request.url.path

    if path.startswith("/static/"):
        # Static assets with content hashes: cache forever
        if any(path.endswith(ext) for ext in [".js", ".css", ".woff2"]):
            response.headers["Cache-Control"] = "public, max-age=31536000, immutable"
        # Images: cache for 1 week
        elif any(path.endswith(ext) for ext in [".png", ".jpg", ".svg"]):
            response.headers["Cache-Control"] = "public, max-age=604800"
    elif path.startswith("/api/knowledge/"):
        # Knowledge base responses: cache at edge for 5 minutes, browser for 1 minute
        response.headers["Cache-Control"] = "public, s-maxage=300, max-age=60"
        response.headers["CDN-Cache-Control"] = "max-age=300"
    elif path.startswith("/api/chat"):
        # Chat responses: never cache ("no-store" alone is sufficient)
        response.headers["Cache-Control"] = "no-store"
    return response
The key distinction is max-age (browser cache) versus s-maxage (CDN/proxy cache). You can tell the CDN to cache for 5 minutes while telling the browser to cache for only 1 minute — this gives you faster invalidation at the browser while still benefiting from edge caching.
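To make the split concrete, here is a tiny helper (the function name is my own, not a library API) that builds a Cache-Control header giving the browser and the CDN different TTLs:

```python
def split_cache_headers(browser_ttl: int, cdn_ttl: int) -> dict[str, str]:
    """Build headers that give the browser and the CDN different TTLs."""
    return {
        # Browsers honor max-age; CDNs prefer s-maxage when it is present
        "Cache-Control": f"public, max-age={browser_ttl}, s-maxage={cdn_ttl}",
    }

headers = split_cache_headers(browser_ttl=60, cdn_ttl=300)
# Browser re-validates after 60s; the CDN keeps serving its copy for 300s
```

With these values, a stale browser copy expires within a minute, while the origin only sees one request per edge location every five minutes.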
Edge Functions for Dynamic Caching
Edge functions run at CDN edge locations and can make caching decisions dynamically. This is powerful for agent systems that serve personalized but cacheable content.
# Cloudflare Worker example (JavaScript at the edge)
# This concept applies to any edge function platform.
# Python equivalent for understanding the logic:

class EdgeCacheRouter:
    """Simulates edge function caching logic."""

    def __init__(self):
        # A real edge runtime would use the platform's cache API with TTLs;
        # a plain dict keeps the example simple.
        self.cache = {}

    async def handle_request(self, request: dict) -> dict:
        path = request["path"]
        user_id = request.get("headers", {}).get("x-user-id")

        # FAQ and knowledge base: cache per path (shared across users)
        if path.startswith("/api/knowledge/"):
            cache_key = f"knowledge:{path}"
            if cache_key in self.cache:
                return self.cache[cache_key]
            response = await self.fetch_origin(request)
            self.cache[cache_key] = response
            return response

        # User-specific but cacheable data: cache per user+path
        if path.startswith("/api/user-context/"):
            cache_key = f"user:{user_id}:{path}"
            if cache_key in self.cache:
                return self.cache[cache_key]
            response = await self.fetch_origin(request)
            self.cache[cache_key] = response
            return response

        # Chat messages: always pass through to origin
        return await self.fetch_origin(request)

    async def fetch_origin(self, request: dict) -> dict:
        """Forward the request to the origin server."""
        raise NotImplementedError  # implementation depends on the platform
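The routing above boils down to a cache-key policy: a shared key for knowledge content, a per-user key for user context, and no key at all for chat. That policy can be isolated into one pure function (names and paths are illustrative), which is easy to unit-test independently of any edge runtime:

```python
from typing import Optional

def edge_cache_key(path: str, user_id: Optional[str] = None) -> Optional[str]:
    """Return the edge cache key for a request, or None if uncacheable."""
    if path.startswith("/api/knowledge/"):
        return f"knowledge:{path}"           # shared across all users
    if path.startswith("/api/user-context/") and user_id:
        return f"user:{user_id}:{path}"      # scoped to one user
    return None                              # chat and everything else: pass through

print(edge_cache_key("/api/knowledge/faq"))             # knowledge:/api/knowledge/faq
print(edge_cache_key("/api/user-context/prefs", "u1"))  # user:u1:/api/user-context/prefs
print(edge_cache_key("/api/chat"))                      # None
```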
Caching Pre-Computed Agent Responses
For common queries, you can pre-compute agent responses and cache them at the edge. This turns a 2-second LLM call into a 10ms edge cache hit.
import asyncio
import hashlib
import json
from typing import Optional

class PrecomputedResponseCache:
    def __init__(self, redis_client, cdn_purge_client):
        self.redis = redis_client
        self.cdn = cdn_purge_client

    async def precompute_common_queries(self, agent, queries: list[str]):
        """Pre-run the agent for common queries and cache the results."""
        for query in queries:
            result = await agent.run(query)
            cache_key = hashlib.sha256(query.lower().strip().encode()).hexdigest()
            await self.redis.set(
                f"precomputed:{cache_key}",
                json.dumps({
                    "query": query,
                    "response": result,
                    "precomputed": True,
                }),
                ex=3600,  # 1 hour TTL
            )

    async def get_precomputed(self, query: str) -> Optional[str]:
        cache_key = hashlib.sha256(query.lower().strip().encode()).hexdigest()
        cached = await self.redis.get(f"precomputed:{cache_key}")
        if cached:
            return json.loads(cached)["response"]
        return None

# Pre-compute the top most common queries nightly, e.g. the top 100:
common_queries = [
    "What is your return policy?",
    "How do I track my order?",
    "What are your business hours?",
    "How do I cancel my subscription?",
]
# await only works inside a coroutine, so drive it from a scheduled job:
# asyncio.run(cache.precompute_common_queries(agent, common_queries))
Geographic Optimization: Routing to Nearest Origin
When edge caching is not enough (the request must reach your origin server), geographic routing sends the request to the nearest origin.
from fastapi import FastAPI, Request

app = FastAPI()

# Map regions to nearest LLM API endpoints (if using multiple regions).
# These URLs are illustrative placeholders -- substitute your provider's
# actual regional endpoints or your own regional gateways.
REGION_ENDPOINTS = {
    "us": "https://us.api.openai.com",
    "eu": "https://eu.api.openai.com",
    "asia": "https://asia.api.openai.com",
}

def get_nearest_region(request: Request) -> str:
    """Determine the nearest region from request headers."""
    # CDNs typically inject geographic headers (cf-ipcountry on Cloudflare,
    # cloudfront-viewer-country on CloudFront)
    country = request.headers.get("cf-ipcountry", "US")
    region_map = {
        "US": "us", "CA": "us", "MX": "us",
        "GB": "eu", "DE": "eu", "FR": "eu", "NL": "eu",
        "JP": "asia", "KR": "asia", "SG": "asia", "AU": "asia",
    }
    return region_map.get(country, "us")

@app.post("/api/chat")
async def chat(request: Request):
    region = get_nearest_region(request)
    endpoint = REGION_ENDPOINTS[region]
    # Route the LLM call to the nearest endpoint
    # (forward_to_llm is defined elsewhere in the app)
    return await forward_to_llm(endpoint, request)
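The handler relies on a forward_to_llm helper that is not shown. One stdlib-only sketch (the function, the API path, and the threading approach are all assumptions, not a prescribed implementation) pushes the blocking HTTP call onto a thread so the event loop stays responsive:

```python
import asyncio
import json
import urllib.request
from urllib.parse import urljoin

def regional_url(endpoint: str, path: str = "v1/chat/completions") -> str:
    """Join the regional base endpoint with the API path (path is illustrative)."""
    return urljoin(endpoint.rstrip("/") + "/", path)

async def forward_to_llm(endpoint: str, payload: dict) -> dict:
    """Send the chat payload to the chosen regional endpoint."""
    def _post() -> dict:
        req = urllib.request.Request(
            regional_url(endpoint),
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.load(resp)
    # Blocking I/O runs in a worker thread, keeping the event loop free
    return await asyncio.to_thread(_post)
```

In production you would more likely reach for an async HTTP client with connection pooling, but the routing logic is the same: pick the endpoint by region, then forward.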
Cache Invalidation Strategy
The hardest part of caching is knowing when to invalidate. For agent systems, use event-driven invalidation.
class CacheInvalidator:
    def __init__(self, redis_client, cdn_client, precompute_cache, agent):
        self.redis = redis_client
        self.cdn = cdn_client
        self.precompute_cache = precompute_cache  # PrecomputedResponseCache from above
        self.agent = agent

    async def on_knowledge_base_updated(self, category: str):
        """Invalidate caches when knowledge base content changes."""
        # Clear Redis cache for this category
        # (KEYS blocks Redis on large datasets; prefer SCAN in production)
        keys = await self.redis.keys(f"knowledge:{category}:*")
        if keys:
            await self.redis.delete(*keys)

        # Purge CDN cache for knowledge endpoints
        await self.cdn.purge_by_prefix(f"/api/knowledge/{category}")

        # Re-precompute affected cached responses
        # (get_queries_for_category looks up which cached queries touch this category)
        affected_queries = await self.get_queries_for_category(category)
        await self.precompute_cache.precompute_common_queries(
            self.agent, affected_queries
        )

    async def on_policy_changed(self):
        """Nuclear option: clear all cached responses."""
        await self.redis.flushdb()
        await self.cdn.purge_all()
FAQ
Should I put my LLM API calls behind a CDN?
No. LLM API calls are dynamic, personalized, and non-cacheable. What you should cache at the edge are: static frontend assets (JavaScript, CSS, images), knowledge base API responses, pre-computed answers to common queries, and user context data that changes infrequently.
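That rule of thumb reduces to a simple cacheability check. A minimal sketch (the prefixes are illustrative, matching the routes used earlier in this article):

```python
# Prefixes that may be served from the edge cache
EDGE_CACHEABLE_PREFIXES = ("/static/", "/api/knowledge/", "/api/user-context/")
# Prefixes that must always reach the origin
NEVER_CACHE_PREFIXES = ("/api/chat",)

def is_edge_cacheable(path: str) -> bool:
    """Decide whether a request path may be served from the edge cache."""
    if path.startswith(NEVER_CACHE_PREFIXES):
        return False
    return path.startswith(EDGE_CACHEABLE_PREFIXES)

print(is_edge_cacheable("/static/app.js"))        # True
print(is_edge_cacheable("/api/knowledge/faq"))    # True
print(is_edge_cacheable("/api/chat"))             # False
```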
How do I measure CDN cache hit rate?
Most CDN providers expose cache hit ratio in their analytics dashboards. You can also check the cf-cache-status header (Cloudflare) or x-cache header (CloudFront) in responses. A healthy agent system should have 80-95% cache hit rate for static assets and 30-60% for API responses.
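If you want to track hit rate yourself, you can normalize the provider-specific headers into a single status (the helper is my own; the header formats match Cloudflare's cf-cache-status values and CloudFront's "Hit from cloudfront" style x-cache values):

```python
def cache_status(headers: dict[str, str]) -> str:
    """Normalize the cache-status header across CDN providers."""
    h = {k.lower(): v for k, v in headers.items()}
    if "cf-cache-status" in h:        # Cloudflare: HIT / MISS / EXPIRED / BYPASS ...
        return h["cf-cache-status"].upper()
    if "x-cache" in h:                # CloudFront: "Hit from cloudfront" etc.
        return "HIT" if h["x-cache"].lower().startswith("hit") else "MISS"
    return "UNKNOWN"

print(cache_status({"CF-Cache-Status": "HIT"}))           # HIT
print(cache_status({"X-Cache": "Miss from cloudfront"}))  # MISS
```

Feed these statuses into your metrics pipeline and alert when the hit rate drops below your baseline.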
What is the difference between Cache-Control and CDN-Cache-Control?
Cache-Control is the standard HTTP header respected by both browsers and CDNs. CDN-Cache-Control (supported by Cloudflare and others) overrides Cache-Control specifically for the CDN while leaving browser caching unchanged. This lets you set a 5-minute CDN cache with a 30-second browser cache, giving you fast invalidation at the browser while still reducing origin load.
#CDN #EdgeComputing #Caching #GlobalLatency #Python #AgenticAI #LearnAI #AIEngineering
CallSphere Team
Expert insights on AI voice agents and customer communication automation.