Serverless AI Agents: Running Agents on AWS Lambda and Cloud Functions
Deploy AI agents as serverless functions on AWS Lambda and Google Cloud Functions with cold start optimization, timeout handling, stateless architecture, and cost-effective scaling strategies.
Why Serverless for AI Agents
Serverless platforms scale to zero when there is no traffic and scale to thousands of concurrent executions when demand spikes — without you managing a single server. For AI agent workloads with unpredictable traffic patterns, this translates to significant cost savings. You pay only for the milliseconds your agent is actively processing, not for idle pods waiting for requests.
However, serverless introduces constraints that require careful design: cold starts add latency, execution timeouts limit long-running agent tasks, there is no persistent local state, and you cannot maintain WebSocket connections. Understanding these tradeoffs helps you decide which agent workloads belong on Lambda and which need dedicated infrastructure.
When Serverless Works for AI Agents
Serverless is a good fit when your agent: handles simple single-turn queries with response times under 60 seconds, has bursty traffic with quiet periods, does not require persistent in-memory state between requests, and calls external LLM APIs rather than running local models.
Serverless is a poor fit when: you need WebSocket streaming, responses take longer than the platform timeout, the agent requires GPU inference, or you need persistent connections to databases that cannot handle connection surge.
AWS Lambda Agent with Python
Here is a complete Lambda function that runs an AI agent:
# lambda_function.py
import json
import os
import time
import uuid

import boto3
from agents import Agent, Runner

# Initialize outside the handler for reuse across invocations
agent = Agent(
    name="assistant",
    instructions="You are a helpful assistant. Keep responses concise.",
    model=os.environ.get("AGENT_MODEL", "gpt-4o-mini"),
)

# DynamoDB for session persistence
dynamodb = boto3.resource("dynamodb")
sessions_table = dynamodb.Table(os.environ["SESSIONS_TABLE"])


def get_session_history(session_id: str) -> list:
    """Load conversation history from DynamoDB."""
    try:
        response = sessions_table.get_item(Key={"session_id": session_id})
        return response.get("Item", {}).get("history", [])
    except Exception:
        return []


def save_session_history(session_id: str, history: list):
    """Persist conversation history to DynamoDB."""
    sessions_table.put_item(Item={
        "session_id": session_id,
        "history": history,
        "ttl": int(time.time()) + 3600,  # 1 hour TTL
    })


def handler(event, context):
    try:
        body = json.loads(event.get("body") or "{}")
        message = body.get("message", "")
        session_id = body.get("session_id") or str(uuid.uuid4())

        if not message:
            return {
                "statusCode": 400,
                "body": json.dumps({"error": "message is required"}),
            }

        history = get_session_history(session_id)

        # Lambda's Python runtime does not await async handlers, so use the
        # SDK's synchronous entry point; prior turns are passed as input items
        result = Runner.run_sync(
            agent,
            history + [{"role": "user", "content": message}],
        )

        new_history = result.to_input_list()
        save_session_history(session_id, new_history)

        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({
                "session_id": session_id,
                "reply": result.final_output,
                "remaining_time_ms": context.get_remaining_time_in_millis(),
            }),
        }
    except Exception as e:
        return {
            "statusCode": 500,
            "body": json.dumps({"error": str(e)}),
        }
Infrastructure as Code with SAM
Define your Lambda and API Gateway with AWS SAM:
# template.yaml
AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31

Globals:
  Function:
    Timeout: 90
    MemorySize: 512
    Runtime: python3.12
    Environment:
      Variables:
        AGENT_MODEL: gpt-4o-mini
        SESSIONS_TABLE: !Ref SessionsTable

Resources:
  AgentFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: lambda_function.handler
      CodeUri: src/
      Policies:
        - DynamoDBCrudPolicy:
            TableName: !Ref SessionsTable
      Events:
        AgentApi:
          Type: Api
          Properties:
            Path: /agent/chat
            Method: post

  SessionsTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName: agent-sessions
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: session_id
          AttributeType: S
      KeySchema:
        - AttributeName: session_id
          KeyType: HASH
      TimeToLiveSpecification:
        AttributeName: ttl
        Enabled: true
Deploy with:
sam build
sam deploy --guided
Cold Start Optimization
Cold starts happen when Lambda creates a new execution environment. For Python-based agents, this adds 1-3 seconds of latency. Minimize it:
# Move all imports and initialization outside the handler
import json  # These run during cold start, then the environment is cached
import os

import boto3
from agents import Agent, Runner

agent = Agent(...)  # Initialized once, reused across warm invocations
dynamodb = boto3.resource("dynamodb")  # Connection reused


def handler(event, context):
    # Only request-specific logic here
    pass
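A complementary technique is lazy initialization: when a dependency is only needed on some code paths, create it on first use rather than at import time, so cold starts that never touch that path skip the cost. A minimal sketch, with `object()` standing in for something expensive like a boto3 client:

```python
_client = None


def get_client():
    """Create the expensive client on first use; warm invocations reuse it."""
    global _client
    if _client is None:
        # In a real function this would be e.g. boto3.resource("dynamodb")
        _client = object()
    return _client


# First call pays the initialization cost; every later call returns the
# same cached object
assert get_client() is get_client()
```

The tradeoff: lazy init moves latency from cold start to the first request that hits the code path, so reserve it for rarely used dependencies.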
Use provisioned concurrency to keep warm instances ready:
# In the SAM template
AgentFunction:
  Type: AWS::Serverless::Function
  Properties:
    AutoPublishAlias: live  # SAM requires a published alias for provisioned concurrency
    ProvisionedConcurrencyConfig:
      ProvisionedConcurrentExecutions: 5
This keeps 5 instances warm at all times, eliminating cold starts for the first 5 concurrent requests.
Handling Timeouts Gracefully
Lambda has a maximum timeout of 15 minutes, but API Gateway REST APIs cap the integration timeout at 29 seconds by default, so that is your effective limit for synchronous HTTP requests. Check remaining time and fail gracefully:
def handler(event, context):
    remaining_ms = context.get_remaining_time_in_millis()

    if remaining_ms < 10000:  # Less than 10 seconds left
        return {
            "statusCode": 503,
            "body": json.dumps({
                "error": "Insufficient time remaining",
                "suggestion": "Use async processing for complex queries",
            }),
        }

    # For long-running tasks, use Step Functions instead
    pass
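One way to act on that check is to route the request to an asynchronous path, for example enqueueing to SQS and letting a worker Lambda finish the job. A sketch of the routing decision with the sync and queue calls injected as callables so it runs without AWS (names and the threshold are illustrative):

```python
def route_request(remaining_ms: int, run_sync, enqueue,
                  threshold_ms: int = 15000) -> dict:
    """Run the agent inline if there is enough time left, otherwise hand
    the job to a queue and return an id the client can poll."""
    if remaining_ms >= threshold_ms:
        return {"mode": "sync", "result": run_sync()}
    return {"mode": "queued", "job_id": enqueue()}


# Plenty of time left: answer inline
print(route_request(60000, lambda: "answer", lambda: "job-1"))
# {'mode': 'sync', 'result': 'answer'}

# Nearly out of time: enqueue for a worker Lambda to finish
print(route_request(5000, lambda: "answer", lambda: "job-1"))
# {'mode': 'queued', 'job_id': 'job-1'}
```

In production, `enqueue` would call `sqs.send_message` and the client would poll a status endpoint (or receive a webhook) for the result.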
Cost Comparison: Serverless vs. Kubernetes
For an agent service handling 10,000 requests per day with an average execution time of 5 seconds:
AWS Lambda: 10,000 requests x 5 seconds x 512 MB = 25,000 GB-seconds/day. At $0.0000166667 per GB-second, that is roughly $12.50/month plus API Gateway costs.
Kubernetes (2 pods, t3.medium): 2 x $30/month = $60/month, running 24/7 regardless of traffic.
Lambda wins for bursty, low-to-moderate traffic. Kubernetes wins for sustained high traffic where pods stay utilized.
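The arithmetic above generalizes to any workload. A small calculator for the Lambda compute charge, using the $0.0000166667 per GB-second rate quoted above (request charges and API Gateway are extra):

```python
GB_SECOND_PRICE = 0.0000166667  # USD per GB-second, x86 Lambda


def lambda_monthly_cost(requests_per_day: int, avg_seconds: float,
                        memory_mb: int, days: int = 30) -> float:
    """Monthly Lambda compute cost in USD for the given workload."""
    gb_seconds = requests_per_day * avg_seconds * (memory_mb / 1024) * days
    return gb_seconds * GB_SECOND_PRICE


# The example from the text: 10,000 requests/day, 5 s each, 512 MB
print(round(lambda_monthly_cost(10_000, 5, 512), 2))  # 12.5
```

Plugging in your own traffic numbers makes the Lambda-versus-Kubernetes crossover point easy to find: raise `requests_per_day` until the result exceeds your fixed cluster cost.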
Stateless Design Pattern
Since Lambda instances are ephemeral, externalize all state:
# Session state -> DynamoDB
# Cache -> ElastiCache/Redis
# File uploads -> S3
# Task queues -> SQS
# Conversation history -> DynamoDB with TTL
Never rely on /tmp storage or global variables persisting between invocations — they might, but Lambda provides no guarantee.
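To make the externalized-state pattern concrete, here is an in-memory stand-in for the DynamoDB session table that mimics its TTL behavior; swapping the dict for a real table keeps the interface identical (the class and method names are illustrative, not part of any SDK):

```python
import time


class SessionStore:
    """In-memory stand-in for a DynamoDB table with a TTL attribute."""

    def __init__(self, ttl_seconds: int = 3600):
        self.ttl_seconds = ttl_seconds
        self._items: dict[str, tuple[float, list]] = {}

    def save(self, session_id: str, history: list) -> None:
        # Store an expiry timestamp alongside the data, like a TTL attribute
        self._items[session_id] = (time.time() + self.ttl_seconds, history)

    def load(self, session_id: str) -> list:
        expires_at, history = self._items.get(session_id, (0.0, []))
        if time.time() >= expires_at:
            return []  # expired or missing: same as a DynamoDB TTL delete
        return history


store = SessionStore(ttl_seconds=3600)
store.save("s1", [{"role": "user", "content": "hi"}])
print(store.load("s1"))      # [{'role': 'user', 'content': 'hi'}]
print(store.load("missing"))  # []
```

Returning an empty list for expired or missing sessions mirrors the `get_session_history` fallback in the handler: the agent simply starts a fresh conversation.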
FAQ
Can I stream AI agent responses from AWS Lambda?
Lambda itself does not support SSE or WebSocket streaming. However, you can use Lambda Function URLs with response streaming enabled — this allows chunked transfer encoding. Alternatively, use API Gateway WebSocket APIs backed by Lambda for bidirectional streaming, though this adds architectural complexity. For simple streaming, consider keeping a dedicated FastAPI service for the streaming endpoint while using Lambda for batch processing.
How do I handle Lambda's 6 MB response payload limit?
For AI agents, 6 MB is typically more than enough for text responses. If your agent generates large outputs (like code generation or document creation), write the output to S3 and return a pre-signed URL in the Lambda response. Set the URL to expire after a reasonable period, like 15 minutes.
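A sketch of that routing decision, with the S3 upload injected as a callable so the logic is testable without AWS (in production the callable would `put_object` to S3 and return `generate_presigned_url(..., ExpiresIn=900)`):

```python
MAX_INLINE_BYTES = 6 * 1024 * 1024  # Lambda's synchronous response payload limit


def package_response(output: str, upload_and_sign) -> dict:
    """Return the text inline when it fits, otherwise a pre-signed URL."""
    if len(output.encode("utf-8")) <= MAX_INLINE_BYTES:
        return {"reply": output}
    return {"reply_url": upload_and_sign(output)}


# A short answer fits inline
print(package_response("short answer", lambda o: "https://example.com/signed"))
# {'reply': 'short answer'}

# A 7 MB document falls back to the pre-signed URL path
big = "x" * (7 * 1024 * 1024)
print(package_response(big, lambda o: "https://example.com/signed"))
# {'reply_url': 'https://example.com/signed'}
```

Measuring the encoded byte length, not `len(output)`, matters for non-ASCII text, where characters can take multiple bytes.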
Is provisioned concurrency worth the cost for AI agent Lambdas?
It depends on your latency requirements. Provisioned concurrency costs roughly the same as running an equivalent EC2 instance 24/7. If your agents serve user-facing requests where a 2-3 second cold start is unacceptable, provisioned concurrency is worth it. If the agent runs background tasks where latency is not critical, on-demand concurrency is more cost-effective. Start without it and add provisioned concurrency only for latency-sensitive paths.
#Serverless #AWSLambda #AIAgents #CloudFunctions #CostOptimization #AgenticAI #LearnAI #AIEngineering
CallSphere Team
Expert insights on AI voice agents and customer communication automation.