WebSocket Servers for AI Agents: Real-Time Bidirectional Agent Communication

Build real-time AI agent interfaces using WebSocket connections in FastAPI with connection lifecycle management, heartbeat mechanisms, and structured message protocols.

Why WebSockets for AI Agents

REST endpoints work well for simple request-response agent interactions, but they fall short when you need real-time, bidirectional communication. Think of a coding assistant that streams tokens as it generates code, receives user interruptions mid-generation, and pushes tool execution updates back to the client, all within a single persistent connection.

WebSockets maintain a long-lived TCP connection between client and server, allowing both sides to send messages at any time without the overhead of repeated HTTP handshakes. For AI agents, this means token-by-token streaming, live status updates during tool calls, and the ability for users to cancel or redirect the agent mid-response.

Basic WebSocket Setup in FastAPI

FastAPI has native WebSocket support. Here is a minimal agent WebSocket endpoint:

# app/routes/ws_agent.py
import json

from fastapi import APIRouter, WebSocket, WebSocketDisconnect

router = APIRouter()

@router.websocket("/ws/agent")
async def agent_websocket(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            raw = await websocket.receive_text()
            try:
                message = json.loads(raw)
            except json.JSONDecodeError:
                # Malformed JSON should not tear down the whole connection
                await websocket.send_json({"type": "error", "content": "Invalid JSON"})
                continue

            # Dispatch on message type. handle_chat is defined in the streaming
            # section; handle_cancel is left to the application.
            if message.get("type") == "chat":
                await handle_chat(websocket, message)
            elif message.get("type") == "cancel":
                await handle_cancel(websocket, message)
            elif message.get("type") == "ping":
                await websocket.send_json({"type": "pong"})

    except WebSocketDisconnect:
        print("Client disconnected")

Defining a Message Protocol

Establish a clear protocol so clients and servers communicate consistently:

# app/models/ws_messages.py
from pydantic import BaseModel
from typing import Optional, Literal

class ClientMessage(BaseModel):
    type: Literal["chat", "cancel", "ping"]
    session_id: Optional[str] = None
    content: Optional[str] = None

class ServerMessage(BaseModel):
    type: Literal["token", "complete", "error", "status", "pong"]
    session_id: str
    content: Optional[str] = None
    metadata: Optional[dict] = None

Validate every incoming message:

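from pydantic import ValidationError

@router.websocket("/ws/agent")
async def agent_websocket(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            raw = await websocket.receive_text()
            try:
                # Raises ValidationError for bad JSON as well as bad fields
                message = ClientMessage.model_validate_json(raw)
            except ValidationError:
                # Report the problem and keep the connection open
                await websocket.send_json({
                    "type": "error",
                    "content": "Invalid message format",
                    "session_id": "",
                })
                continue

            if message.type == "chat":
                await handle_chat(websocket, message)
    except WebSocketDisconnect:
        pass  # Client went away; nothing to clean up yet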
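
To see what the protocol accepts and rejects, here is a small sketch that repeats the two models and wraps them in helpers. It assumes Pydantic v2 (the version that provides model_validate_json), and the helper names parse_client_message and error_payload are hypothetical, not part of the code above.

```python
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError


class ClientMessage(BaseModel):
    type: Literal["chat", "cancel", "ping"]
    session_id: Optional[str] = None
    content: Optional[str] = None


class ServerMessage(BaseModel):
    type: Literal["token", "complete", "error", "status", "pong"]
    session_id: str
    content: Optional[str] = None
    metadata: Optional[dict] = None


def parse_client_message(raw: str) -> Optional[ClientMessage]:
    """Return the parsed message, or None if it violates the protocol."""
    try:
        return ClientMessage.model_validate_json(raw)
    except ValidationError:
        # Covers malformed JSON, unknown "type" values, and wrong field types
        return None


def error_payload(session_id: str, reason: str) -> dict:
    """Build an outbound error via ServerMessage so it is validated too."""
    msg = ServerMessage(type="error", session_id=session_id, content=reason)
    # exclude_none keeps unset optional fields off the wire
    return msg.model_dump(exclude_none=True)
```

Building outbound payloads through ServerMessage, rather than hand-written dicts, keeps both directions of the protocol checked against the same definitions.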

Streaming Agent Responses Token by Token

Stream the agent output as it generates, giving users immediate feedback:

import uuid

from agents import Agent, Runner
from fastapi import WebSocket
from openai.types.responses import ResponseTextDeltaEvent

from app.models.ws_messages import ClientMessage

agent = Agent(name="assistant", instructions="You are a helpful assistant.")

async def handle_chat(websocket: WebSocket, message: ClientMessage):
    # Reuse the client's session id, or mint one for a new conversation
    session_id = message.session_id or str(uuid.uuid4())

    await websocket.send_json({
        "type": "status",
        "session_id": session_id,
        "content": "thinking",
    })

    # content is optional in the protocol, so fall back to an empty prompt
    result = Runner.run_streamed(agent, message.content or "")

    async for event in result.stream_events():
        # Text deltas arrive as raw response events; delta is a plain string
        if event.type == "raw_response_event" and isinstance(
            event.data, ResponseTextDeltaEvent
        ):
            await websocket.send_json({
                "type": "token",
                "session_id": session_id,
                "content": event.data.delta,
            })

    await websocket.send_json({
        "type": "complete",
        "session_id": session_id,
        "content": result.final_output,
    })
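
One consequence of awaiting handle_chat inside the receive loop is that the server cannot read a cancel message until generation has finished. A possible way around this, sketched below under assumptions, is to run each generation as an asyncio task and cancel that task when a cancel message arrives. GenerationRegistry and fake_token_stream are hypothetical names; the fake stream stands in for the agent's streamed output so the sketch runs without an LLM.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Dict

Send = Callable[[dict], Awaitable[None]]


async def fake_token_stream(prompt: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for the agent's streamed output."""
    for word in prompt.split():
        await asyncio.sleep(0.01)  # simulate model latency
        yield word


class GenerationRegistry:
    """Tracks one running generation task per session so it can be cancelled."""

    def __init__(self) -> None:
        self.tasks: Dict[str, asyncio.Task] = {}

    def start(self, session_id: str, prompt: str, send: Send) -> asyncio.Task:
        self.cancel(session_id)  # a new chat supersedes the previous one
        task = asyncio.create_task(self._run(session_id, prompt, send))
        self.tasks[session_id] = task
        return task

    def cancel(self, session_id: str) -> bool:
        """Cancel the session's generation; return True if one was running."""
        task = self.tasks.pop(session_id, None)
        if task is None or task.done():
            return False
        task.cancel()
        return True

    async def _run(self, session_id: str, prompt: str, send: Send) -> None:
        try:
            async for token in fake_token_stream(prompt):
                await send({"type": "token", "session_id": session_id, "content": token})
            await send({"type": "complete", "session_id": session_id})
        except asyncio.CancelledError:
            # Tell the client the run stopped early, then propagate the cancel
            await send({"type": "status", "session_id": session_id, "content": "cancelled"})
            raise
        finally:
            # Only remove the entry if it still points at this task
            if self.tasks.get(session_id) is asyncio.current_task():
                del self.tasks[session_id]
```

In the endpoint, a chat message would call start with websocket.send_json as the send callable and return to the receive loop immediately, while a cancel message would call cancel for the same session id. The same cancel call belongs in the disconnect handler so an abandoned connection stops consuming tokens.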

Connection Manager for Multiple Clients

Track active connections so you can broadcast updates or clean up stale sessions:

# app/services/connection_manager.py
from fastapi import WebSocket
import asyncio

class ConnectionManager:
    def __init__(self):
        self.active: dict[str, WebSocket] = {}
        self.locks: dict[str, asyncio.Lock] = {}

    async def connect(self, session_id: str, websocket: WebSocket):
        await websocket.accept()
        self.active[session_id] = websocket
        self.locks[session_id] = asyncio.Lock()

    def disconnect(self, session_id: str):
        self.active.pop(session_id, None)
        self.locks.pop(session_id, None)

    async def send(self, session_id: str, data: dict):
        ws = self.active.get(session_id)
        if ws:
            async with self.locks[session_id]:
                await ws.send_json(data)

manager = ConnectionManager()
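
The manager above sends to one session at a time. If you also want the broadcast mentioned earlier, one option is a helper along these lines. This is a sketch, not part of the class: broadcast is a hypothetical function that takes the session ids and a send callable (manager.send fits that shape), so it goes through the same per-session locks.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


async def broadcast(
    session_ids: Iterable[str],
    send: Callable[[str, dict], Awaitable[None]],
    data: dict,
) -> List[str]:
    """Send data to each session concurrently; return ids whose send failed."""
    ids = list(session_ids)  # snapshot, since connections may change while we await
    results = await asyncio.gather(
        *(send(sid, data) for sid in ids),
        return_exceptions=True,  # one dead socket must not abort the rest
    )
    return [sid for sid, res in zip(ids, results) if isinstance(res, Exception)]
```

Calling it as broadcast(manager.active, manager.send, payload) returns the session ids whose send raised, which you can then pass to manager.disconnect to clean up stale entries.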

Heartbeat Mechanism

Detect dead connections before they cause resource leaks:

import asyncio

async def heartbeat_loop(websocket: WebSocket, interval: int = 30):
    """Send periodic pings; a failed send means the connection is dead."""
    try:
        while True:
            await asyncio.sleep(interval)
            await websocket.send_json({"type": "ping"})
    except Exception:
        pass  # Connection closed

@router.websocket("/ws/agent")
async def agent_websocket(websocket: WebSocket):
    await websocket.accept()
    heartbeat_task = asyncio.create_task(
        heartbeat_loop(websocket, interval=30)
    )
    try:
        while True:
            raw = await websocket.receive_text()
            message = ClientMessage.model_validate_json(raw)
            if message.type == "chat":
                await handle_chat(websocket, message)
    except WebSocketDisconnect:
        pass
    finally:
        # Stop the heartbeat on every exit path, not only a clean disconnect
        heartbeat_task.cancel()
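
The loop above only notices a dead connection when a send fails. A half-open connection can keep accepting writes for a while, so it may help to also track when each session was last heard from and close the ones that have gone silent. The sketch below shows one way to do that bookkeeping; LivenessTracker is a hypothetical helper, and the 90-second timeout is an assumed value (three missed 30-second pings). Note that ClientMessage as defined earlier has no pong type, so you would either add one or treat any inbound message as proof of life.

```python
import time
from typing import Callable, Dict, List


class LivenessTracker:
    """Records when each session was last heard from."""

    def __init__(
        self,
        timeout: float = 90.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.timeout = timeout
        self.clock = clock  # injectable so tests can control time
        self.last_seen: Dict[str, float] = {}

    def touch(self, session_id: str) -> None:
        """Call on every inbound message, including replies to pings."""
        self.last_seen[session_id] = self.clock()

    def forget(self, session_id: str) -> None:
        """Call when a session disconnects."""
        self.last_seen.pop(session_id, None)

    def stale(self) -> List[str]:
        """Session ids that have been silent for longer than the timeout."""
        now = self.clock()
        return [
            sid for sid, seen in self.last_seen.items()
            if now - seen > self.timeout
        ]
```

A periodic task can call stale() and close those sockets, while the receive loop calls touch() after each message.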

FAQ

How do I handle authentication on WebSocket connections?

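WebSocket connections start as an HTTP upgrade request, so you can authenticate during the handshake. Pass a JWT as a query parameter (/ws/agent?token=xxx) or in a header. The browser WebSocket API cannot set custom headers, so browser clients usually rely on the query parameter or a cookie; keep in mind that query strings often end up in access logs. Validate the token in the WebSocket endpoint before calling websocket.accept(). Reject invalid tokens by closing the connection with an application-defined code such as 4001. When the close happens before accept(), the client typically sees the handshake rejected with HTTP 403 rather than the close code itself.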
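
As a rough sketch of that flow, the helper below checks the token query parameter before accepting. The authenticate function and the verify_token callable are hypothetical; verify_token stands in for whatever JWT validation you use. The websocket argument only needs the query_params, close, and accept members that FastAPI's WebSocket provides.

```python
from typing import Callable


async def authenticate(websocket, verify_token: Callable[[str], bool]) -> bool:
    """Accept the connection only if the token query parameter verifies."""
    token = websocket.query_params.get("token")
    if token is None or not verify_token(token):
        # 4001 is an application-defined close code (the 4000-4999 range
        # is reserved for private use)
        await websocket.close(code=4001)
        return False
    await websocket.accept()
    return True
```

An endpoint would call this first and return immediately when it yields False, instead of calling websocket.accept() unconditionally.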

What happens when the WebSocket connection drops mid-agent-response?

The server receives a WebSocketDisconnect exception. Cancel any running agent tasks for that session to avoid wasting LLM tokens. On the client side, implement automatic reconnection with exponential backoff and include the session_id when reconnecting so the server can resume the conversation from where it left off. Resuming only works if the server stores conversation context keyed by session_id, since the context is otherwise lost with the connection.
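
The backoff schedule itself is simple to compute. The sketch below uses Python for consistency with the rest of the article, although your client may well be written in another language. backoff_delays is a hypothetical helper, and the base, cap, and attempt count are assumed values to tune for your application.

```python
import random
from typing import Callable, Iterator


def backoff_delays(
    base: float = 0.5,
    cap: float = 30.0,
    attempts: int = 8,
    rng: Callable[[], float] = random.random,
) -> Iterator[float]:
    """Yield reconnect delays in seconds: exponential, capped, with full jitter."""
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: pick a random delay up to the ceiling so that many
        # clients dropped at once do not all reconnect at the same instant
        yield rng() * ceiling
```

A client would sleep for each yielded delay before the next connection attempt and stop iterating as soon as a connection succeeds.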

How many concurrent WebSocket connections can a single FastAPI server handle?

A single uvicorn worker can handle thousands of concurrent WebSocket connections since they are I/O-bound. The bottleneck is typically the LLM API rate limit, not the WebSocket connections themselves. To scale out, run multiple uvicorn workers (for example, --workers 4) behind a load balancer with sticky sessions. Keep in mind that an in-memory ConnectionManager is per-process, so each worker only sees its own connections.


CallSphere Team