Production LangGraph: Deploying Stateful Agents with LangGraph Cloud
Deploy LangGraph agents to production using LangGraph Cloud with API endpoints, cron triggers, monitoring, scaling strategies, and operational best practices for stateful agent workflows.
From Development to Production
Building a LangGraph agent locally is straightforward. Running one in production — handling concurrent users, persisting state across restarts, monitoring execution, recovering from failures, and scaling under load — requires careful architecture. LangGraph Cloud provides managed infrastructure for deploying stateful agents, but you can also self-host with the right patterns.
Structuring Your Project for Deployment
LangGraph Cloud expects a specific project layout:
# langgraph.json — deployment configuration
{
  "dependencies": ["."],
  "graphs": {
    "my_agent": "./agent/graph.py:graph"
  },
  "env": ".env"
}
The graphs field maps endpoint names to compiled graph objects. Your graph module exports the compiled graph:
# agent/graph.py
from typing import TypedDict, Annotated

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode


class AgentState(TypedDict):
    messages: Annotated[list, add_messages]


@tool
def lookup_order(order_id: str) -> str:
    """Look up order details by ID."""
    # Production implementation here
    return f"Order {order_id}: shipped, arriving March 20"


tools = [lookup_order]
llm = ChatOpenAI(model="gpt-4o-mini").bind_tools(tools)
tool_node = ToolNode(tools)


def agent(state: AgentState) -> dict:
    return {"messages": [llm.invoke(state["messages"])]}


def should_continue(state: AgentState) -> str:
    last = state["messages"][-1]
    if hasattr(last, "tool_calls") and last.tool_calls:
        return "tools"
    return "end"


builder = StateGraph(AgentState)
builder.add_node("agent", agent)
builder.add_node("tools", tool_node)
builder.add_edge(START, "agent")
builder.add_conditional_edges("agent", should_continue, {
    "tools": "tools",
    "end": END,
})
builder.add_edge("tools", "agent")
graph = builder.compile()
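Before deploying, it is worth unit-testing the routing function in isolation. A minimal sketch, using a hypothetical FakeMessage stub in place of real LangChain message objects:

```python
# FakeMessage is a stand-in for a LangChain AIMessage (hypothetical, test-only)
class FakeMessage:
    def __init__(self, tool_calls=None):
        self.tool_calls = tool_calls or []


def should_continue(state) -> str:
    # Same routing logic as the graph above
    last = state["messages"][-1]
    if hasattr(last, "tool_calls") and last.tool_calls:
        return "tools"
    return "end"


# A message without tool calls ends the loop; one with tool calls routes to tools
assert should_continue({"messages": [FakeMessage()]}) == "end"
assert should_continue({"messages": [FakeMessage([{"name": "lookup_order"}])]}) == "tools"
```

Catching a routing bug here is much cheaper than discovering it through a misbehaving deployment.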
Deploying to LangGraph Cloud
Deploy using the LangGraph CLI:
pip install langgraph-cli
langgraph up
This runs the LangGraph API server locally so you can test against the same runtime the platform uses. To deploy to LangGraph Cloud, push the project to a Git repository and create a deployment from the LangSmith UI; the platform builds and hosts the server for you.
The deployment creates API endpoints for your graph with built-in persistence, streaming, and thread management.
API Endpoints
Once deployed, LangGraph Cloud exposes REST endpoints:
# Create a new thread
curl -X POST https://your-deployment.langgraph.app/threads \
  -H "Content-Type: application/json" \
  -d '{}'

# Run the agent on a thread
curl -X POST https://your-deployment.langgraph.app/threads/{thread_id}/runs \
  -H "Content-Type: application/json" \
  -d '{
    "assistant_id": "my_agent",
    "input": {
      "messages": [{"role": "human", "content": "Track order 12345"}]
    }
  }'

# Stream responses
curl -X POST https://your-deployment.langgraph.app/threads/{thread_id}/runs/stream \
  -H "Content-Type: application/json" \
  -d '{
    "assistant_id": "my_agent",
    "input": {
      "messages": [{"role": "human", "content": "What is the status?"}]
    },
    "stream_mode": "messages"
  }'
Using the Python SDK
The LangGraph SDK provides a typed client for interacting with deployed agents:
from langgraph_sdk import get_client

client = get_client(url="https://your-deployment.langgraph.app")

# Create a thread
thread = await client.threads.create()

# Run the agent and wait for the final state
# (runs.create only enqueues a background run; runs.wait returns the output)
result = await client.runs.wait(
    thread_id=thread["thread_id"],
    assistant_id="my_agent",
    input={"messages": [{"role": "human", "content": "Track order 12345"}]},
)

# Stream responses
async for chunk in client.runs.stream(
    thread_id=thread["thread_id"],
    assistant_id="my_agent",
    input={"messages": [{"role": "human", "content": "Any updates?"}]},
    stream_mode="messages",
):
    print(chunk)
Cron Triggers for Scheduled Agents
Run agents on a schedule for monitoring, reporting, or maintenance tasks:
# langgraph.json
{
  "dependencies": ["."],
  "graphs": {
    "monitor": "./agent/monitor.py:graph"
  },
  "crons": {
    "daily_report": {
      "graph": "monitor",
      "schedule": "0 9 * * *",
      "input": {
        "messages": [{"role": "human", "content": "Generate daily status report"}]
      }
    }
  }
}
The cron trigger creates a new thread for each execution, runs the graph, and stores the result. You can query past cron runs through the API.
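The schedule uses standard five-field cron syntax: minute, hour, day-of-month, month, day-of-week. A quick sketch decoding the expression used above:

```python
def parse_cron(expr: str) -> dict:
    """Split a five-field cron expression into named fields."""
    minute, hour, day_of_month, month, day_of_week = expr.split()
    return {
        "minute": minute,
        "hour": hour,
        "day_of_month": day_of_month,
        "month": month,
        "day_of_week": day_of_week,
    }


fields = parse_cron("0 9 * * *")
# "0 9 * * *" fires at minute 0 of hour 9, every day
assert fields["minute"] == "0" and fields["hour"] == "9"
```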
Monitoring and Observability
LangGraph integrates with LangSmith for tracing and monitoring:
# Set environment variables for tracing
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-key"
os.environ["LANGCHAIN_PROJECT"] = "production-agent"
Every graph execution is traced end-to-end, showing node timings, LLM calls, tool invocations, and state transitions. Set up alerts for error rates, latency spikes, and token usage.
Self-Hosted Production Patterns
If you prefer to self-host rather than use LangGraph Cloud, here are the essential patterns:
# Use PostgreSQL for production checkpointing
import os

from langgraph.checkpoint.postgres import PostgresSaver

DB_URI = os.environ["DATABASE_URL"]

# Keep the checkpointer open for the life of the server. Compiling the graph
# inside a short-lived `with` block would close the connection before any
# request arrives, so enter the context manager once at startup instead.
# (For a fully async stack, AsyncPostgresSaver from
# langgraph.checkpoint.postgres.aio is the async counterpart.)
checkpointer_cm = PostgresSaver.from_conn_string(DB_URI)
checkpointer = checkpointer_cm.__enter__()
checkpointer.setup()
graph = builder.compile(checkpointer=checkpointer)

# Wrap in FastAPI for HTTP access
from fastapi import FastAPI

app = FastAPI()

@app.post("/chat/{thread_id}")
async def chat(thread_id: str, message: str):
    config = {"configurable": {"thread_id": thread_id}}
    result = await graph.ainvoke(
        {"messages": [{"role": "human", "content": message}]},
        config,
    )
    return {"response": result["messages"][-1].content}
Use PostgreSQL for state persistence, Redis for caching, and a process manager like Gunicorn with Uvicorn workers for concurrency.
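A minimal gunicorn.conf.py along those lines; the values are illustrative, not tuned:

```python
# gunicorn.conf.py — illustrative settings for serving the FastAPI wrapper
workers = 4                                      # scale with available CPU cores
worker_class = "uvicorn.workers.UvicornWorker"   # async workers for FastAPI
bind = "0.0.0.0:8000"
timeout = 120   # agent runs can be slow; keep this above the worst-case run time
```

Start the server with `gunicorn -c gunicorn.conf.py app:app`, substituting your module path.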
Scaling Considerations
Stateful agents require careful scaling. Each thread is independent, so you can distribute threads across workers. But a single thread's execution must happen on one worker since the in-progress state is in memory. Use sticky sessions or a queue-based architecture where each run is claimed by exactly one worker.
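The claim semantics can be sketched with a plain in-process queue; a real deployment would use something like Postgres SELECT ... FOR UPDATE SKIP LOCKED or a message broker, but the invariant is the same:

```python
import queue
import threading

runs = queue.Queue()
for run_id in ["run-1", "run-2", "run-3", "run-4"]:
    runs.put(run_id)

claimed = {}                     # run_id -> worker name
claimed_lock = threading.Lock()


def worker(name: str):
    while True:
        try:
            run_id = runs.get_nowait()   # atomic: each run is claimed exactly once
        except queue.Empty:
            return
        with claimed_lock:
            claimed[run_id] = name       # the run executes on this worker only


threads = [threading.Thread(target=worker, args=(f"w{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every run was handled once, each by a single worker
assert len(claimed) == 4
```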
FAQ
How much does LangGraph Cloud cost?
LangGraph Cloud pricing is based on compute time and storage. Check the LangSmith pricing page for current rates. For high-volume deployments, self-hosting with PostgreSQL and your own compute is typically more cost-effective.
Can I deploy multiple graph versions simultaneously?
Yes. LangGraph Cloud supports versioned deployments. You can route traffic between versions using assistant IDs, enabling canary deployments and A/B testing of different agent configurations.
How do I handle secrets and API keys in production?
Never hardcode secrets. Use environment variables configured through the .env file referenced in langgraph.json or through your cloud provider's secrets management. LangGraph Cloud encrypts environment variables at rest and injects them at runtime.
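A complementary pattern is validating required secrets at startup, so a missing key fails immediately rather than mid-conversation. A sketch with a hypothetical require_env helper:

```python
import os


def require_env(name: str) -> str:
    """Return the named environment variable, or fail fast at startup."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# At startup, before serving any traffic:
# openai_key = require_env("OPENAI_API_KEY")
```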
#LangGraph #Production #Deployment #LangGraphCloud #Python #AgenticAI #LearnAI #AIEngineering
CallSphere Team
Expert insights on AI voice agents and customer communication automation.