Production LangGraph: Deploying Stateful Agents with LangGraph Cloud
Deploy LangGraph agents to production using LangGraph Cloud with API endpoints, cron triggers, monitoring, scaling strategies, and operational best practices for stateful agent workflows.
From Development to Production
Building a LangGraph agent locally is straightforward. Running one in production — handling concurrent users, persisting state across restarts, monitoring execution, recovering from failures, and scaling under load — requires careful architecture. LangGraph Cloud provides managed infrastructure for deploying stateful agents, but you can also self-host with the right patterns.
Structuring Your Project for Deployment
LangGraph Cloud expects a specific project layout:
# langgraph.json — deployment configuration
{
  "dependencies": ["."],
  "graphs": {
    "my_agent": "./agent/graph.py:graph"
  },
  "env": ".env"
}
The graphs field maps endpoint names to compiled graph objects. Your graph module exports the compiled graph:
# agent/graph.py
from typing import TypedDict, Annotated

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode


class AgentState(TypedDict):
    messages: Annotated[list, add_messages]


@tool
def lookup_order(order_id: str) -> str:
    """Look up order details by ID."""
    # Production implementation here
    return f"Order {order_id}: shipped, arriving March 20"


tools = [lookup_order]
llm = ChatOpenAI(model="gpt-4o-mini").bind_tools(tools)
tool_node = ToolNode(tools)


def agent(state: AgentState) -> dict:
    return {"messages": [llm.invoke(state["messages"])]}


def should_continue(state: AgentState) -> str:
    last = state["messages"][-1]
    if hasattr(last, "tool_calls") and last.tool_calls:
        return "tools"
    return "end"


builder = StateGraph(AgentState)
builder.add_node("agent", agent)
builder.add_node("tools", tool_node)
builder.add_edge(START, "agent")
builder.add_conditional_edges("agent", should_continue, {
    "tools": "tools",
    "end": END,
})
builder.add_edge("tools", "agent")
graph = builder.compile()
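Before deploying, it is worth unit-testing the routing function in isolation. A minimal sketch, using a hypothetical FakeMessage stub in place of real LangChain message objects:

```python
# FakeMessage is a stand-in for a LangChain AIMessage (hypothetical, test-only)
class FakeMessage:
    def __init__(self, tool_calls=None):
        self.tool_calls = tool_calls or []


def should_continue(state) -> str:
    # Same routing logic as the graph above
    last = state["messages"][-1]
    if hasattr(last, "tool_calls") and last.tool_calls:
        return "tools"
    return "end"


# A message without tool calls ends the loop; one with tool calls routes to tools
assert should_continue({"messages": [FakeMessage()]}) == "end"
assert should_continue({"messages": [FakeMessage([{"name": "lookup_order"}])]}) == "tools"
```

Catching a routing bug here is much cheaper than discovering it through a misbehaving deployment.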
Deploying to LangGraph Cloud
Deploy using the LangGraph CLI:
pip install langgraph-cli
langgraph up
This runs the LangGraph API server locally so you can test against the same runtime the platform uses. To deploy to LangGraph Cloud, push the project to a Git repository and create a deployment from the LangSmith UI; the platform builds and hosts the server for you.
The deployment creates API endpoints for your graph with built-in persistence, streaming, and thread management.
API Endpoints
Once deployed, LangGraph Cloud exposes REST endpoints:
# Create a new thread
curl -X POST https://your-deployment.langgraph.app/threads \
  -H "Content-Type: application/json" \
  -d '{}'

# Run the agent on a thread
curl -X POST https://your-deployment.langgraph.app/threads/{thread_id}/runs \
  -H "Content-Type: application/json" \
  -d '{
    "assistant_id": "my_agent",
    "input": {
      "messages": [{"role": "human", "content": "Track order 12345"}]
    }
  }'

# Stream responses
curl -X POST https://your-deployment.langgraph.app/threads/{thread_id}/runs/stream \
  -H "Content-Type: application/json" \
  -d '{
    "assistant_id": "my_agent",
    "input": {
      "messages": [{"role": "human", "content": "What is the status?"}]
    },
    "stream_mode": "messages"
  }'
Using the Python SDK
The LangGraph SDK provides a typed client for interacting with deployed agents:
from langgraph_sdk import get_client

client = get_client(url="https://your-deployment.langgraph.app")

# Create a thread
thread = await client.threads.create()

# Run the agent and wait for the final state
# (runs.create only enqueues a background run; runs.wait returns the output)
result = await client.runs.wait(
    thread_id=thread["thread_id"],
    assistant_id="my_agent",
    input={"messages": [{"role": "human", "content": "Track order 12345"}]},
)

# Stream responses
async for chunk in client.runs.stream(
    thread_id=thread["thread_id"],
    assistant_id="my_agent",
    input={"messages": [{"role": "human", "content": "Any updates?"}]},
    stream_mode="messages",
):
    print(chunk)
Cron Triggers for Scheduled Agents
Run agents on a schedule for monitoring, reporting, or maintenance tasks:
# langgraph.json
{
  "dependencies": ["."],
  "graphs": {
    "monitor": "./agent/monitor.py:graph"
  },
  "crons": {
    "daily_report": {
      "graph": "monitor",
      "schedule": "0 9 * * *",
      "input": {
        "messages": [{"role": "human", "content": "Generate daily status report"}]
      }
    }
  }
}
The cron trigger creates a new thread for each execution, runs the graph, and stores the result. You can query past cron runs through the API.
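The schedule uses standard five-field cron syntax: minute, hour, day-of-month, month, day-of-week. A quick sketch decoding the expression used above:

```python
def parse_cron(expr: str) -> dict:
    """Split a five-field cron expression into named fields."""
    minute, hour, day_of_month, month, day_of_week = expr.split()
    return {
        "minute": minute,
        "hour": hour,
        "day_of_month": day_of_month,
        "month": month,
        "day_of_week": day_of_week,
    }


fields = parse_cron("0 9 * * *")
# "0 9 * * *" fires at minute 0 of hour 9, every day
assert fields["minute"] == "0" and fields["hour"] == "9"
```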
Monitoring and Observability
LangGraph integrates with LangSmith for tracing and monitoring:
# Set environment variables for tracing
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-key"
os.environ["LANGCHAIN_PROJECT"] = "production-agent"
Every graph execution is traced end-to-end, showing node timings, LLM calls, tool invocations, and state transitions. Set up alerts for error rates, latency spikes, and token usage.
Self-Hosted Production Patterns
If you prefer to self-host rather than use LangGraph Cloud, here are the essential patterns:
# Use PostgreSQL for production checkpointing
import os

from langgraph.checkpoint.postgres import PostgresSaver

DB_URI = os.environ["DATABASE_URL"]

# Keep the checkpointer open for the life of the server. Compiling the graph
# inside a short-lived `with` block would close the connection before any
# request arrives, so enter the context manager once at startup instead.
# (For a fully async stack, AsyncPostgresSaver from
# langgraph.checkpoint.postgres.aio is the async counterpart.)
checkpointer_cm = PostgresSaver.from_conn_string(DB_URI)
checkpointer = checkpointer_cm.__enter__()
checkpointer.setup()
graph = builder.compile(checkpointer=checkpointer)

# Wrap in FastAPI for HTTP access
from fastapi import FastAPI

app = FastAPI()

@app.post("/chat/{thread_id}")
async def chat(thread_id: str, message: str):
    config = {"configurable": {"thread_id": thread_id}}
    result = await graph.ainvoke(
        {"messages": [{"role": "human", "content": message}]},
        config,
    )
    return {"response": result["messages"][-1].content}
Use PostgreSQL for state persistence, Redis for caching, and a process manager like Gunicorn with Uvicorn workers for concurrency.
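A minimal gunicorn.conf.py along those lines; the values are illustrative, not tuned:

```python
# gunicorn.conf.py — illustrative settings for serving the FastAPI wrapper
workers = 4                                      # scale with available CPU cores
worker_class = "uvicorn.workers.UvicornWorker"   # async workers for FastAPI
bind = "0.0.0.0:8000"
timeout = 120   # agent runs can be slow; keep this above the worst-case run time
```

Start the server with `gunicorn -c gunicorn.conf.py app:app`, substituting your module path.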
Scaling Considerations
Stateful agents require careful scaling. Each thread is independent, so you can distribute threads across workers. But a single thread's execution must happen on one worker since the in-progress state is in memory. Use sticky sessions or a queue-based architecture where each run is claimed by exactly one worker.
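The claim semantics can be sketched with a plain in-process queue; a real deployment would use something like Postgres SELECT ... FOR UPDATE SKIP LOCKED or a message broker, but the invariant is the same:

```python
import queue
import threading

runs = queue.Queue()
for run_id in ["run-1", "run-2", "run-3", "run-4"]:
    runs.put(run_id)

claimed = {}                     # run_id -> worker name
claimed_lock = threading.Lock()


def worker(name: str):
    while True:
        try:
            run_id = runs.get_nowait()   # atomic: each run is claimed exactly once
        except queue.Empty:
            return
        with claimed_lock:
            claimed[run_id] = name       # the run executes on this worker only


threads = [threading.Thread(target=worker, args=(f"w{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every run was handled once, each by a single worker
assert len(claimed) == 4
```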
FAQ
How much does LangGraph Cloud cost?
LangGraph Cloud pricing is based on compute time and storage. Check the LangSmith pricing page for current rates. For high-volume deployments, self-hosting with PostgreSQL and your own compute is typically more cost-effective.
Can I deploy multiple graph versions simultaneously?
Yes. LangGraph Cloud supports versioned deployments. You can route traffic between versions using assistant IDs, enabling canary deployments and A/B testing of different agent configurations.
How do I handle secrets and API keys in production?
Never hardcode secrets. Use environment variables configured through the .env file referenced in langgraph.json or through your cloud provider's secrets management. LangGraph Cloud encrypts environment variables at rest and injects them at runtime.
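A complementary pattern is validating required secrets at startup, so a missing key fails immediately rather than mid-conversation. A sketch with a hypothetical require_env helper:

```python
import os


def require_env(name: str) -> str:
    """Return the named environment variable, or fail fast at startup."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# At startup, before serving any traffic:
# openai_key = require_env("OPENAI_API_KEY")
```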
#LangGraph #Production #Deployment #LangGraphCloud #Python #AgenticAI #LearnAI #AIEngineering
CallSphere Team
Expert insights on AI voice agents and customer communication automation.