
Deploying TypeScript AI Agents: Vercel, Railway, and Docker Strategies

A practical guide to deploying TypeScript AI agents in production. Compare Vercel serverless, Railway containers, and Docker self-hosted strategies. Covers environment configuration, scaling, health checks, monitoring, and cost optimization.

Deployment Considerations for AI Agents

AI agent applications have unique deployment requirements that differ from typical web apps. Long-running requests (LLM calls take 2-30 seconds), streaming responses that hold connections open, high memory usage during conversation context assembly, and the need for secrets management for API keys all influence your platform choice.

This guide compares three popular deployment strategies for TypeScript AI agents and provides production-ready configurations for each.

Strategy 1: Vercel Serverless

Best for: Next.js agent applications with moderate traffic and short-to-medium agent interactions.

Vercel's serverless functions handle scaling automatically and integrate tightly with Next.js. The key limitation is function execution timeout — 10 seconds on the Hobby plan, 60 seconds on Pro, and 300 seconds on Enterprise.

// app/api/agent/route.ts
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";

// Extend the default timeout for agent routes
export const maxDuration = 60;

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: openai("gpt-4o"),
    messages,
    maxSteps: 5,
  });

  return result.toDataStreamResponse();
}

Environment variables are configured in the Vercel dashboard or via CLI:

vercel env add OPENAI_API_KEY production

Deployment is a single command:

vercel --prod

Advantages: Zero infrastructure management, automatic scaling, built-in CDN for static assets, preview deployments for every PR.

Limitations: Execution timeout caps, no persistent connections (WebSockets require separate infrastructure), cold starts add latency to the first request.

Strategy 2: Railway Containers

Best for: Agent applications that need persistent processes, WebSocket support, or longer execution times.

Railway runs your application in a container with no execution time limits. You get a persistent process that can maintain in-memory state, WebSocket connections, and background jobs.
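Because the process stays alive between requests, state that would be lost between serverless invocations can live in memory. Here is a minimal sketch of a per-session conversation store with TTL eviction — the class and method names are illustrative, not from any library:

```typescript
// In-memory conversation store -- only viable in a persistent process.
// All names here are illustrative; adapt to your own session model.
interface StoredSession {
  messages: { role: string; content: string }[];
  lastAccess: number;
}

class SessionStore {
  private sessions = new Map<string, StoredSession>();

  constructor(private ttlMs: number = 30 * 60 * 1000) {}

  append(sessionId: string, message: { role: string; content: string }) {
    const session = this.sessions.get(sessionId) ?? {
      messages: [],
      lastAccess: Date.now(),
    };
    session.messages.push(message);
    session.lastAccess = Date.now();
    this.sessions.set(sessionId, session);
  }

  get(sessionId: string): StoredSession | undefined {
    return this.sessions.get(sessionId);
  }

  // Call periodically (e.g. from setInterval) to drop idle sessions.
  evictExpired(now: number = Date.now()) {
    for (const [id, session] of this.sessions) {
      if (now - session.lastAccess > this.ttlMs) this.sessions.delete(id);
    }
  }
}
```

Note the trade-off: on a serverless platform each invocation may land on a fresh instance, so a store like this must move to Redis or a database; on Railway it works as-is for a single replica.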

Create a Dockerfile for your agent application:


FROM node:20-alpine AS builder

WORKDIR /app
COPY package.json pnpm-lock.yaml ./
RUN corepack enable && pnpm install --frozen-lockfile
COPY . .
RUN pnpm build

FROM node:20-alpine AS runner
WORKDIR /app

RUN addgroup --system --gid 1001 nodejs
RUN adduser --system --uid 1001 agent

COPY --from=builder --chown=agent:nodejs /app/.next/standalone ./
COPY --from=builder --chown=agent:nodejs /app/.next/static ./.next/static
COPY --from=builder --chown=agent:nodejs /app/public ./public

USER agent
EXPOSE 3000
ENV PORT=3000
ENV HOSTNAME="0.0.0.0"

CMD ["node", "server.js"]

Configure Next.js for standalone output:

// next.config.mjs
const nextConfig = {
  output: "standalone",
};

export default nextConfig;

Railway automatically detects the Dockerfile and deploys. Set environment variables in the Railway dashboard and connect a database if needed.
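If you prefer the CLI to the dashboard, a deploy might look like this (these are Railway CLI commands; project and service selection happens interactively during `railway link`):

```shell
# Authenticate and link the local directory to a Railway project
railway login
railway link

# Build from the Dockerfile and deploy the current directory
railway up
```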

Advantages: No timeout limits, persistent process, WebSocket support, easy database provisioning, generous free tier.

Limitations: Single region by default (replicas must be added manually), and you manage scaling configuration yourself.

Strategy 3: Docker Self-Hosted

Best for: Full control over infrastructure, multi-service architectures, or compliance requirements.

For self-hosted deployments, use Docker Compose for development and Kubernetes for production.

Development compose file:

# docker-compose.yml
services:
  agent-app:
    build: .
    ports:
      - "3000:3000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - DATABASE_URL=postgresql://agent:secret@postgres:5432/agentdb
      - REDIS_URL=redis://redis:6379/0
    depends_on:
      - postgres
      - redis

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: agent
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: agentdb
    volumes:
      - pgdata:/var/lib/postgresql/data

  redis:
    image: redis:7-alpine
    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru

volumes:
  pgdata:

For Kubernetes, create a deployment with resource limits and health checks:

# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-agent
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ai-agent
  template:
    metadata:
      labels:
        app: ai-agent
    spec:
      containers:
        - name: agent
          image: registry.example.com/ai-agent:latest
          ports:
            - containerPort: 3000
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: agent-secrets
                  key: openai-api-key

Health Check Endpoint

Every deployment strategy needs a health check:

// app/api/health/route.ts
import { NextResponse } from "next/server";

export async function GET() {
  const checks = {
    status: "healthy",
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    memory: process.memoryUsage(),
  };

  // Optionally verify LLM connectivity
  try {
    const res = await fetch("https://api.openai.com/v1/models", {
      headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
      signal: AbortSignal.timeout(5000),
    });
    // fetch only throws on network errors, so check the status explicitly
    checks.status = res.ok ? "healthy" : "degraded";
  } catch {
    checks.status = "degraded";
  }

  return NextResponse.json(checks, {
    status: checks.status === "healthy" ? 200 : 503,
  });
}

Monitoring and Observability

Track agent performance with structured logging:

// lib/logger.ts
interface AgentEvent {
  type: "request" | "tool_call" | "completion" | "error";
  agentName: string;
  duration?: number;
  tokenUsage?: { prompt: number; completion: number };
  toolName?: string;
  error?: string;
}

export function logAgentEvent(event: AgentEvent) {
  // Structured JSON logging for log aggregation tools
  console.log(JSON.stringify({
    ...event,
    timestamp: new Date().toISOString(),
    environment: process.env.NODE_ENV,
  }));
}
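In practice the logger might wrap each agent call so duration and failures are recorded automatically. A sketch of such a wrapper — the function name is an assumption, and the logger is inlined here only to keep the example self-contained:

```typescript
// Self-contained sketch: logAgentEvent is inlined so the example runs
// standalone; in the app it would be imported from lib/logger.ts.
interface AgentEvent {
  type: "request" | "tool_call" | "completion" | "error";
  agentName: string;
  duration?: number;
  error?: string;
}

function logAgentEvent(event: AgentEvent) {
  console.log(JSON.stringify({ ...event, timestamp: new Date().toISOString() }));
}

// Hypothetical wrapper: times an agent call, logs completion or error,
// and rethrows so callers still see the failure.
async function withAgentLogging<T>(
  agentName: string,
  fn: () => Promise<T>,
): Promise<T> {
  const start = Date.now();
  logAgentEvent({ type: "request", agentName });
  try {
    const result = await fn();
    logAgentEvent({ type: "completion", agentName, duration: Date.now() - start });
    return result;
  } catch (err) {
    logAgentEvent({
      type: "error",
      agentName,
      duration: Date.now() - start,
      error: err instanceof Error ? err.message : String(err),
    });
    throw err;
  }
}
```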

Set up alerts on key metrics: error rate above 5%, average response time above 10 seconds, and memory usage above 80% of limits.

Cost Optimization

AI agent costs are dominated by LLM API usage, not compute. Optimize by:

  1. Caching common queries — Use Redis to cache responses for identical or similar inputs
  2. Choosing the right model — Use GPT-4o-mini for simple tasks and GPT-4o for complex reasoning
  3. Trimming conversation context — Send only the last N messages plus the system prompt, not the entire history
  4. Setting max_tokens — Prevent runaway responses from consuming excessive tokens
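Point 3 can be a small pure function: keep the system prompt and only the most recent N messages. The function name and message shape below are assumptions for illustration — token-count-based trimming is more precise, but message-count trimming is a simple starting point:

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Keep any system messages plus the last `maxMessages` conversation turns.
function trimContext(messages: ChatMessage[], maxMessages: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  return [...system, ...rest.slice(-maxMessages)];
}
```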

FAQ

Which platform should I start with?

Start with Vercel if you are building a Next.js agent app and your interactions complete within 60 seconds. Move to Railway or Docker when you need WebSocket support, background jobs, or longer execution times. The application code remains the same across platforms — only the deployment configuration changes.

How do I handle API key rotation without downtime?

All three platforms support updating environment variables without rebuilding. On Vercel, update via the dashboard and redeploy. On Railway, update the variable and the service restarts automatically. On Kubernetes, update the secret and perform a rolling restart. Never store API keys in code or Docker images.
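On Kubernetes, the rotation described above might look like this, assuming the `agent-secrets` secret and `ai-agent` deployment names from the earlier manifest (`NEW_OPENAI_API_KEY` is a placeholder for wherever the new key comes from):

```shell
# Update the secret in place without writing the key to a manifest on disk
kubectl create secret generic agent-secrets \
  --from-literal=openai-api-key="$NEW_OPENAI_API_KEY" \
  --dry-run=client -o yaml | kubectl apply -f -

# Env vars from secretKeyRef are read at pod start, so roll the pods
kubectl rollout restart deployment/ai-agent
```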

How many concurrent agent sessions can a single instance handle?

A Node.js instance handles concurrent requests well because agent work is I/O-bound (waiting for LLM API responses). A single instance with 512MB RAM can comfortably handle 50-100 concurrent streaming agent sessions. The bottleneck is typically LLM API rate limits, not your server's capacity.


#Deployment #Vercel #Railway #Docker #TypeScript #AIAgents #DevOps #AgenticAI #LearnAI #AIEngineering


CallSphere Team

Expert insights on AI voice agents and customer communication automation.
