CDN and Edge Caching for Agent Static Assets: Reducing Global Latency
Set up CDN and edge caching for your AI agent's static assets, API responses, and pre-computed results. With the right cache headers, edge functions, and geographic routing, you can cut global latency from hundreds of milliseconds to single digits.
Why CDN Matters for AI Agent Systems
AI agent interfaces are web applications. Users load JavaScript bundles, CSS files, and HTML pages before they can even send their first message. If your agent's frontend is served from a single origin in us-east-1 and your user is in Tokyo, every static asset request adds 200-300ms of round-trip latency.
A CDN (Content Delivery Network) caches your static assets at edge locations worldwide. A user in Tokyo gets assets from an edge server in Tokyo — 10ms instead of 200ms. This is not just a frontend concern. Agent systems also benefit from edge-caching API responses, pre-computed embeddings, and knowledge base snapshots.
Setting Cache Headers for Static Assets
The foundation of CDN caching is correct HTTP cache headers. Different asset types need different caching strategies.
from fastapi import FastAPI, Request
from fastapi.staticfiles import StaticFiles

app = FastAPI()

# Serve static files with aggressive caching
app.mount("/static", StaticFiles(directory="static"), name="static")

@app.middleware("http")
async def add_cache_headers(request: Request, call_next):
    response = await call_next(request)
    path = request.url.path

    if path.startswith("/static/"):
        # Static assets with content hashes: cache forever
        if any(path.endswith(ext) for ext in [".js", ".css", ".woff2"]):
            response.headers["Cache-Control"] = "public, max-age=31536000, immutable"
        # Images: cache for 1 week
        elif any(path.endswith(ext) for ext in [".png", ".jpg", ".svg"]):
            response.headers["Cache-Control"] = "public, max-age=604800"
    elif path.startswith("/api/knowledge/"):
        # Knowledge base responses: cache at edge for 5 minutes, browser for 1 minute
        response.headers["Cache-Control"] = "public, s-maxage=300, max-age=60"
        response.headers["CDN-Cache-Control"] = "max-age=300"
    elif path.startswith("/api/chat"):
        # Chat responses: never cache ("no-store" alone is sufficient)
        response.headers["Cache-Control"] = "no-store"
    return response
The key distinction is max-age (browser cache) versus s-maxage (CDN/proxy cache). You can tell the CDN to cache for 5 minutes while telling the browser to cache for only 1 minute — this gives you faster invalidation at the browser while still benefiting from edge caching.
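To make the split concrete, here is a tiny helper (the function name is my own, not a library API) that builds a Cache-Control header giving the browser and the CDN different TTLs:

```python
def split_cache_headers(browser_ttl: int, cdn_ttl: int) -> dict[str, str]:
    """Build headers that give the browser and the CDN different TTLs."""
    return {
        # Browsers honor max-age; CDNs prefer s-maxage when it is present
        "Cache-Control": f"public, max-age={browser_ttl}, s-maxage={cdn_ttl}",
    }

headers = split_cache_headers(browser_ttl=60, cdn_ttl=300)
# Browser re-validates after 60s; the CDN keeps serving its copy for 300s
```

With these values, a stale browser copy expires within a minute, while the origin only sees one request per edge location every five minutes.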
Edge Functions for Dynamic Caching
Edge functions run at CDN edge locations and can make caching decisions dynamically. This is powerful for agent systems that serve personalized but cacheable content.
# Cloudflare Worker example (JavaScript at the edge)
# This concept applies to any edge function platform.
# Python equivalent for understanding the logic:

class EdgeCacheRouter:
    """Simulates edge function caching logic."""

    def __init__(self):
        # A real edge runtime would use the platform's cache API with TTLs;
        # a plain dict keeps the example simple.
        self.cache = {}

    async def handle_request(self, request: dict) -> dict:
        path = request["path"]
        user_id = request.get("headers", {}).get("x-user-id")

        # FAQ and knowledge base: cache per path (shared across users)
        if path.startswith("/api/knowledge/"):
            cache_key = f"knowledge:{path}"
            if cache_key in self.cache:
                return self.cache[cache_key]
            response = await self.fetch_origin(request)
            self.cache[cache_key] = response
            return response

        # User-specific but cacheable data: cache per user+path
        if path.startswith("/api/user-context/"):
            cache_key = f"user:{user_id}:{path}"
            if cache_key in self.cache:
                return self.cache[cache_key]
            response = await self.fetch_origin(request)
            self.cache[cache_key] = response
            return response

        # Chat messages: always pass through to origin
        return await self.fetch_origin(request)

    async def fetch_origin(self, request: dict) -> dict:
        """Forward the request to the origin server."""
        raise NotImplementedError  # implementation depends on the platform
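The routing above boils down to a cache-key policy: a shared key for knowledge content, a per-user key for user context, and no key at all for chat. That policy can be isolated into one pure function (names and paths are illustrative), which is easy to unit-test independently of any edge runtime:

```python
from typing import Optional

def edge_cache_key(path: str, user_id: Optional[str] = None) -> Optional[str]:
    """Return the edge cache key for a request, or None if uncacheable."""
    if path.startswith("/api/knowledge/"):
        return f"knowledge:{path}"           # shared across all users
    if path.startswith("/api/user-context/") and user_id:
        return f"user:{user_id}:{path}"      # scoped to one user
    return None                              # chat and everything else: pass through

print(edge_cache_key("/api/knowledge/faq"))             # knowledge:/api/knowledge/faq
print(edge_cache_key("/api/user-context/prefs", "u1"))  # user:u1:/api/user-context/prefs
print(edge_cache_key("/api/chat"))                      # None
```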
Caching Pre-Computed Agent Responses
For common queries, you can pre-compute agent responses and cache them at the edge. This turns a 2-second LLM call into a 10ms edge cache hit.
import asyncio
import hashlib
import json
from typing import Optional

class PrecomputedResponseCache:
    def __init__(self, redis_client, cdn_purge_client):
        self.redis = redis_client
        self.cdn = cdn_purge_client

    async def precompute_common_queries(self, agent, queries: list[str]):
        """Pre-run the agent for common queries and cache the results."""
        for query in queries:
            result = await agent.run(query)
            cache_key = hashlib.sha256(query.lower().strip().encode()).hexdigest()
            await self.redis.set(
                f"precomputed:{cache_key}",
                json.dumps({
                    "query": query,
                    "response": result,
                    "precomputed": True,
                }),
                ex=3600,  # 1 hour TTL
            )

    async def get_precomputed(self, query: str) -> Optional[str]:
        cache_key = hashlib.sha256(query.lower().strip().encode()).hexdigest()
        cached = await self.redis.get(f"precomputed:{cache_key}")
        if cached:
            return json.loads(cached)["response"]
        return None

# Pre-compute the top most common queries nightly, e.g. the top 100:
common_queries = [
    "What is your return policy?",
    "How do I track my order?",
    "What are your business hours?",
    "How do I cancel my subscription?",
]
# await only works inside a coroutine, so drive it from a scheduled job:
# asyncio.run(cache.precompute_common_queries(agent, common_queries))
Geographic Optimization: Routing to Nearest Origin
When edge caching is not enough (the request must reach your origin server), geographic routing sends the request to the nearest origin.
from fastapi import FastAPI, Request

app = FastAPI()

# Map regions to nearest LLM API endpoints (if using multiple regions).
# These URLs are illustrative placeholders -- substitute your provider's
# actual regional endpoints or your own regional gateways.
REGION_ENDPOINTS = {
    "us": "https://us.api.openai.com",
    "eu": "https://eu.api.openai.com",
    "asia": "https://asia.api.openai.com",
}

def get_nearest_region(request: Request) -> str:
    """Determine the nearest region from request headers."""
    # CDNs typically inject geographic headers (cf-ipcountry on Cloudflare,
    # cloudfront-viewer-country on CloudFront)
    country = request.headers.get("cf-ipcountry", "US")
    region_map = {
        "US": "us", "CA": "us", "MX": "us",
        "GB": "eu", "DE": "eu", "FR": "eu", "NL": "eu",
        "JP": "asia", "KR": "asia", "SG": "asia", "AU": "asia",
    }
    return region_map.get(country, "us")

@app.post("/api/chat")
async def chat(request: Request):
    region = get_nearest_region(request)
    endpoint = REGION_ENDPOINTS[region]
    # Route the LLM call to the nearest endpoint
    # (forward_to_llm is defined elsewhere in the app)
    return await forward_to_llm(endpoint, request)
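The handler relies on a forward_to_llm helper that is not shown. One stdlib-only sketch (the function, the API path, and the threading approach are all assumptions, not a prescribed implementation) pushes the blocking HTTP call onto a thread so the event loop stays responsive:

```python
import asyncio
import json
import urllib.request
from urllib.parse import urljoin

def regional_url(endpoint: str, path: str = "v1/chat/completions") -> str:
    """Join the regional base endpoint with the API path (path is illustrative)."""
    return urljoin(endpoint.rstrip("/") + "/", path)

async def forward_to_llm(endpoint: str, payload: dict) -> dict:
    """Send the chat payload to the chosen regional endpoint."""
    def _post() -> dict:
        req = urllib.request.Request(
            regional_url(endpoint),
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.load(resp)
    # Blocking I/O runs in a worker thread, keeping the event loop free
    return await asyncio.to_thread(_post)
```

In production you would more likely reach for an async HTTP client with connection pooling, but the routing logic is the same: pick the endpoint by region, then forward.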
Cache Invalidation Strategy
The hardest part of caching is knowing when to invalidate. For agent systems, use event-driven invalidation.
class CacheInvalidator:
    def __init__(self, redis_client, cdn_client, precompute_cache, agent):
        self.redis = redis_client
        self.cdn = cdn_client
        self.precompute_cache = precompute_cache  # PrecomputedResponseCache from above
        self.agent = agent

    async def on_knowledge_base_updated(self, category: str):
        """Invalidate caches when knowledge base content changes."""
        # Clear Redis cache for this category
        # (KEYS blocks Redis on large datasets; prefer SCAN in production)
        keys = await self.redis.keys(f"knowledge:{category}:*")
        if keys:
            await self.redis.delete(*keys)

        # Purge CDN cache for knowledge endpoints
        await self.cdn.purge_by_prefix(f"/api/knowledge/{category}")

        # Re-precompute affected cached responses
        # (get_queries_for_category looks up which cached queries touch this category)
        affected_queries = await self.get_queries_for_category(category)
        await self.precompute_cache.precompute_common_queries(
            self.agent, affected_queries
        )

    async def on_policy_changed(self):
        """Nuclear option: clear all cached responses."""
        await self.redis.flushdb()
        await self.cdn.purge_all()
FAQ
Should I put my LLM API calls behind a CDN?
No. LLM API calls are dynamic, personalized, and non-cacheable. What you should cache at the edge are: static frontend assets (JavaScript, CSS, images), knowledge base API responses, pre-computed answers to common queries, and user context data that changes infrequently.
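That rule of thumb reduces to a simple cacheability check. A minimal sketch (the prefixes are illustrative, matching the routes used earlier in this article):

```python
# Prefixes that may be served from the edge cache
EDGE_CACHEABLE_PREFIXES = ("/static/", "/api/knowledge/", "/api/user-context/")
# Prefixes that must always reach the origin
NEVER_CACHE_PREFIXES = ("/api/chat",)

def is_edge_cacheable(path: str) -> bool:
    """Decide whether a request path may be served from the edge cache."""
    if path.startswith(NEVER_CACHE_PREFIXES):
        return False
    return path.startswith(EDGE_CACHEABLE_PREFIXES)

print(is_edge_cacheable("/static/app.js"))        # True
print(is_edge_cacheable("/api/knowledge/faq"))    # True
print(is_edge_cacheable("/api/chat"))             # False
```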
How do I measure CDN cache hit rate?
Most CDN providers expose cache hit ratio in their analytics dashboards. You can also check the cf-cache-status header (Cloudflare) or x-cache header (CloudFront) in responses. A healthy agent system should have 80-95% cache hit rate for static assets and 30-60% for API responses.
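If you want to track hit rate yourself, you can normalize the provider-specific headers into a single status (the helper is my own; the header formats match Cloudflare's cf-cache-status values and CloudFront's "Hit from cloudfront" style x-cache values):

```python
def cache_status(headers: dict[str, str]) -> str:
    """Normalize the cache-status header across CDN providers."""
    h = {k.lower(): v for k, v in headers.items()}
    if "cf-cache-status" in h:        # Cloudflare: HIT / MISS / EXPIRED / BYPASS ...
        return h["cf-cache-status"].upper()
    if "x-cache" in h:                # CloudFront: "Hit from cloudfront" etc.
        return "HIT" if h["x-cache"].lower().startswith("hit") else "MISS"
    return "UNKNOWN"

print(cache_status({"CF-Cache-Status": "HIT"}))           # HIT
print(cache_status({"X-Cache": "Miss from cloudfront"}))  # MISS
```

Feed these statuses into your metrics pipeline and alert when the hit rate drops below your baseline.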
What is the difference between Cache-Control and CDN-Cache-Control?
Cache-Control is the standard HTTP header respected by both browsers and CDNs. CDN-Cache-Control (supported by Cloudflare and others) overrides Cache-Control specifically for the CDN while leaving browser caching unchanged. This lets you set a 5-minute CDN cache with a 30-second browser cache, giving you fast invalidation at the browser while still reducing origin load.
#CDN #EdgeComputing #Caching #GlobalLatency #Python #AgenticAI #LearnAI #AIEngineering
CallSphere Team
Expert insights on AI voice agents and customer communication automation.