LangGraph Checkpointing: Persistence and Time Travel for Agent Workflows
Implement persistence and time travel in LangGraph using MemorySaver, SqliteSaver, and PostgresSaver to checkpoint agent state, replay past executions, and recover from failures.
Why Checkpointing Matters
Without checkpointing, a LangGraph workflow is ephemeral: if the process crashes mid-execution, all state is lost and you must start over. Checkpointing solves this by saving a snapshot of the graph state at every step of execution (in a simple linear graph, after each node runs). This enables three critical capabilities: crash recovery, conversation memory across sessions, and time travel to inspect or replay past states.
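To make the crash-recovery idea concrete, here is a minimal stdlib-only model. It is not LangGraph's implementation. Three hypothetical nodes (fetch, transform, summarize) run in order, and a snapshot is saved after each one. After a simulated crash, the rerun starts from the last snapshot instead of from scratch.

```python
import copy

# Hypothetical nodes: each takes the current state and returns a partial update.
def fetch(state):
    return {"data": [1, 2, 3]}

def transform(state):
    return {"doubled": [x * 2 for x in state["data"]]}

def summarize(state):
    return {"total": sum(state["doubled"])}

NODES = [("fetch", fetch), ("transform", transform), ("summarize", summarize)]

def run(state, checkpoints, fail_at=None):
    """Run NODES in order, appending a snapshot to `checkpoints` after each node.

    If `checkpoints` is non-empty, restore the latest snapshot and skip the
    nodes that already completed. `fail_at` simulates a crash before that node.
    """
    done = len(checkpoints)
    if done:
        state = copy.deepcopy(checkpoints[-1]["state"])
    for name, node in NODES[done:]:
        if name == fail_at:
            raise RuntimeError(f"crash before {name}")
        state = {**state, **node(state)}
        checkpoints.append({"after": name, "state": copy.deepcopy(state)})
    return state

checkpoints = []
try:
    run({}, checkpoints, fail_at="summarize")
except RuntimeError as err:
    print("first run failed:", err)

print([c["after"] for c in checkpoints])  # ['fetch', 'transform']

# The retry restores the "transform" snapshot and runs only "summarize".
final = run({}, checkpoints)
print(final["total"])  # 12
```

The work done before the crash is not repeated. LangGraph's checkpointers apply the same principle to full graph state.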
MemorySaver: In-Memory Checkpointing
The simplest checkpointer keeps state in an in-memory Python dictionary, which makes it well suited to development and testing. Recent releases export it as InMemorySaver and keep MemorySaver as an alias:
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, START, END
from typing import TypedDict, Annotated
from langgraph.graph.message import add_messages
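class State(TypedDict):
    # add_messages appends new messages instead of overwriting the list
    messages: Annotated[list, add_messages]
def echo(state: State) -> dict:
    last = state["messages"][-1].content
    # add_messages converts this role/content dict into an AIMessage
    return {"messages": [{"role": "assistant", "content": f"Echo: {last}"}]}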
builder = StateGraph(State)
builder.add_node("echo", echo)
builder.add_edge(START, "echo")
builder.add_edge("echo", END)
memory = MemorySaver()
graph = builder.compile(checkpointer=memory)
All state is lost when the process exits. Use this only for development.
Thread IDs for Conversation Isolation
Each conversation gets its own thread ID. This lets multiple users share the same graph instance:
from langchain_core.messages import HumanMessage
# Conversation 1
config1 = {"configurable": {"thread_id": "user-alice"}}
graph.invoke({"messages": [HumanMessage(content="Hi, I'm Alice")]}, config1)
graph.invoke({"messages": [HumanMessage(content="What's my name?")]}, config1)
# Conversation 2 — completely isolated
config2 = {"configurable": {"thread_id": "user-bob"}}
graph.invoke({"messages": [HumanMessage(content="Hi, I'm Bob")]}, config2)
Each thread maintains its own state history. Alice and Bob never see each other's messages.
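A dictionary keyed by thread ID is enough to provide this isolation. The sketch below is a toy model of that idea, not MemorySaver's actual internals. ToyThreadStore and chat_turn are hypothetical names, and messages are plain strings for brevity.

```python
import copy
from collections import defaultdict

class ToyThreadStore:
    """Hypothetical dict-backed store: thread_id -> list of state snapshots (oldest first)."""

    def __init__(self):
        self._threads = defaultdict(list)

    def put(self, thread_id, state):
        # Store a deep copy so later mutations cannot alter saved history.
        self._threads[thread_id].append(copy.deepcopy(state))

    def latest(self, thread_id):
        history = self._threads.get(thread_id)  # .get() does not create empty threads
        return copy.deepcopy(history[-1]) if history else {"messages": []}

def chat_turn(store, thread_id, text):
    # Load only this thread's latest state, append the message, and checkpoint it.
    state = store.latest(thread_id)
    state["messages"].append(text)
    store.put(thread_id, state)
    return state

store = ToyThreadStore()
chat_turn(store, "user-alice", "Hi, I'm Alice")
chat_turn(store, "user-alice", "What's my name?")
chat_turn(store, "user-bob", "Hi, I'm Bob")

print(store.latest("user-alice")["messages"])  # ["Hi, I'm Alice", "What's my name?"]
print(store.latest("user-bob")["messages"])    # ["Hi, I'm Bob"]
```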
SqliteSaver: Persistent Local Storage
For persistence that survives process restarts, use the SQLite checkpointer, which is installed separately as the langgraph-checkpoint-sqlite package:
from langgraph.checkpoint.sqlite import SqliteSaver
import sqlite3
conn = sqlite3.connect("checkpoints.db", check_same_thread=False)
sqlite_saver = SqliteSaver(conn)
graph = builder.compile(checkpointer=sqlite_saver)
# State persists to disk
config = {"configurable": {"thread_id": "persistent-thread"}}
graph.invoke({"messages": [HumanMessage(content="Remember this")]}, config)
# After a restart, reconnect to checkpoints.db, recompile the graph, and reuse
# the same thread_id: the conversation continues where it left off
result = graph.invoke(
{"messages": [HumanMessage(content="What did I say?")]},
config,
)
The SQLite file contains the full state history for every thread, including all intermediate checkpoints.
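A file-backed store survives restarts because the data lives outside the process. The stdlib sqlite3 sketch below shows this: it writes two checkpoints, closes every connection, and then reads the history back through a new connection. The toy_checkpoints table and the save/history helpers are hypothetical; SqliteSaver creates and manages its own schema and serializer.

```python
import json
import os
import sqlite3
import tempfile

# Hypothetical schema for illustration; SqliteSaver defines its own tables.
SCHEMA = """
CREATE TABLE IF NOT EXISTS toy_checkpoints (
    thread_id TEXT NOT NULL,
    step      INTEGER NOT NULL,
    state     TEXT NOT NULL,
    PRIMARY KEY (thread_id, step)
)
"""

def save(db_path, thread_id, step, state):
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(SCHEMA)
            conn.execute(
                "INSERT INTO toy_checkpoints VALUES (?, ?, ?)",
                (thread_id, step, json.dumps(state)),
            )
    finally:
        conn.close()

def history(db_path, thread_id):
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT state FROM toy_checkpoints WHERE thread_id = ? ORDER BY step",
            (thread_id,),
        ).fetchall()
    finally:
        conn.close()
    return [json.loads(state) for (state,) in rows]

db_path = os.path.join(tempfile.mkdtemp(), "toy_checkpoints.db")
save(db_path, "persistent-thread", 0, {"messages": ["Remember this"]})
save(db_path, "persistent-thread", 1, {"messages": ["Remember this", "Echo: Remember this"]})

# Each save() closed its connection, much like a process exiting; a brand-new
# connection still sees every checkpoint in order.
for state in history(db_path, "persistent-thread"):
    print(state["messages"])
```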
PostgresSaver: Production Persistence
For production deployments, use PostgreSQL through the langgraph-checkpoint-postgres package:
from langgraph.checkpoint.postgres import PostgresSaver
DB_URI = "postgresql://user:password@localhost:5432/langgraph_db"
with PostgresSaver.from_conn_string(DB_URI) as pg_saver:
    pg_saver.setup()  # Creates tables on first run
    # The connection closes when this block exits, so use the graph inside it
    graph = builder.compile(checkpointer=pg_saver)
    config = {"configurable": {"thread_id": "prod-session-123"}}
    result = graph.invoke(
        {"messages": [HumanMessage(content="Process this order")]},
        config,
    )
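PostgresSaver relies on PostgreSQL for safe concurrent access and transactional writes. from_conn_string opens a single connection. For connection pooling, pass a psycopg_pool ConnectionPool to the PostgresSaver constructor instead; the LangGraph persistence docs list the required connection settings. Call setup() once to create the required checkpoint tables.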
Time Travel: Inspecting Past States
Every step of execution creates a checkpoint, and get_state_history returns them newest first. You can list and inspect all checkpoints for a thread:
config = {"configurable": {"thread_id": "my-thread"}}
# Get current state
current = graph.get_state(config)
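print("Current messages:", len(current.values.get("messages", [])))
# List all checkpoints (state history), newest first
history = list(graph.get_state_history(config))
for i, state in enumerate(history):
    # The earliest "input" checkpoint may not contain any messages yet
    print(f"Checkpoint {i}: {len(state.values.get('messages', []))} messages")
    # metadata["source"] is e.g. "input", "loop", or "update"; state.next names the node(s) to run next
    print(f"  Source: {state.metadata.get('source', 'unknown')}, next: {state.next}")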
Replaying from a Past Checkpoint
You can resume execution from any historical checkpoint by providing its ID:
# History is ordered newest first, so index 2 is two checkpoints before the latest
history = list(graph.get_state_history(config))
past_state = history[2]
# Resume from that point with new input
past_config = {
"configurable": {
"thread_id": "my-thread",
"checkpoint_id": past_state.config["configurable"]["checkpoint_id"],
}
}
result = graph.invoke(
{"messages": [HumanMessage(content="Try a different approach")]},
past_config,
)
This creates a new branch in the state history. The original checkpoints remain untouched, giving you a full audit trail of every execution path. To replay from the checkpoint without adding new input, pass None as the input instead.
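You can model the branching behavior as a tree in which every checkpoint records its parent. This is a simplified sketch under that assumption, not LangGraph's storage code. ToyCheckpointTree is hypothetical. Walking parent links from the head yields history newest first, which mirrors the order of get_state_history.

```python
import itertools

class ToyCheckpointTree:
    """Hypothetical model: each checkpoint stores its parent ID, so forks form a tree."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.checkpoints = {}  # checkpoint_id -> {"parent": id or None, "messages": [...]}
        self.head = None       # latest checkpoint on the thread

    def append(self, message, from_id=None):
        """Create a checkpoint on top of `from_id` (default: the current head)."""
        parent = self.head if from_id is None else from_id
        base = self.checkpoints[parent]["messages"] if parent is not None else []
        cid = next(self._ids)
        # A new list is built, so the parent's messages are never mutated.
        self.checkpoints[cid] = {"parent": parent, "messages": base + [message]}
        self.head = cid
        return cid

    def history(self, cid=None):
        """Follow parent links from `cid` (default: head), yielding newest first."""
        cid = self.head if cid is None else cid
        while cid is not None:
            yield cid, self.checkpoints[cid]
            cid = self.checkpoints[cid]["parent"]

tree = ToyCheckpointTree()
for msg in ["step A", "step B", "step C"]:
    tree.append(msg)

history = list(tree.history())  # checkpoint IDs 3, 2, 1
past_id = history[2][0]         # two checkpoints before the latest
tree.append("different approach", from_id=past_id)

print([cid for cid, _ in tree.history()])  # [4, 1]
print(tree.checkpoints[3]["messages"])     # ['step A', 'step B', 'step C']
print(tree.checkpoints[4]["messages"])     # ['step A', 'different approach']
```

Checkpoint 3 still exists with its original messages, so both execution paths remain inspectable after the fork.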
FAQ
Does checkpointing add significant overhead?
MemorySaver avoids disk and network I/O, so its overhead is small, though it still serializes each checkpoint. SqliteSaver and PostgresSaver add serialization and I/O time that grows with state size. For typical chat agents with dozens of messages, each checkpoint usually takes a few milliseconds, but measure in your own environment. For agents with very large state objects, keep state lean and store bulk data externally.
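One rough way to reason about this cost is to check how many bytes each snapshot serializes to. This sketch uses json as a stand-in for the saver's serializer (an assumption; LangGraph uses its own). It compares a state that inlines a large payload with one that stores only a hypothetical key into external storage.

```python
import json

def checkpoint_bytes(state):
    # Size of one serialized snapshot; every checkpoint pays roughly this cost.
    return len(json.dumps(state).encode("utf-8"))

document = "x" * 1_000_000  # stand-in for a large payload, e.g. a retrieved file

inline_state = {"messages": ["summarize the report"], "document": document}
# "reports/q2.txt" is a hypothetical key into external storage (e.g. an object store).
lean_state = {"messages": ["summarize the report"], "document_key": "reports/q2.txt"}

print(checkpoint_bytes(inline_state) > 1_000_000)  # True
print(checkpoint_bytes(lean_state) < 200)          # True
```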
Can I delete old checkpoints to save storage?
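Recent versions of the checkpointers provide delete_thread(thread_id) to remove all checkpoints for a thread, but there is no built-in time-based retention policy. For PostgresSaver, you can write SQL queries that delete checkpoints older than a retention period. For SqliteSaver, you can run a similar cleanup job against the database file directly.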
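A retention job might look like the following sqlite3 sketch. It runs against a hypothetical toy_checkpoints table with a created_at column. The real saver tables use their own schema, so inspect it (and test against a copy) before deleting anything. The job removes whole threads whose newest checkpoint is older than the cutoff, rather than individual rows, so no surviving thread is left with a gap in its history.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

def prune_idle_threads(conn, retention_days, now):
    """Delete every checkpoint of threads whose newest checkpoint is older than the cutoff."""
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    with conn:  # commit the DELETE on success
        cur = conn.execute(
            """
            DELETE FROM toy_checkpoints
            WHERE thread_id IN (
                SELECT thread_id FROM toy_checkpoints
                GROUP BY thread_id
                HAVING MAX(created_at) < ?
            )
            """,
            (cutoff,),
        )
    return cur.rowcount

# Hypothetical table; ISO-8601 UTC strings compare correctly as text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE toy_checkpoints (thread_id TEXT, checkpoint_id TEXT, created_at TEXT)"
)
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
conn.executemany(
    "INSERT INTO toy_checkpoints VALUES (?, ?, ?)",
    [
        ("active-thread", "c1", (now - timedelta(days=40)).isoformat()),
        ("active-thread", "c2", (now - timedelta(days=2)).isoformat()),
        ("idle-thread", "c3", (now - timedelta(days=90)).isoformat()),
    ],
)
deleted = prune_idle_threads(conn, retention_days=30, now=now)
print(deleted)  # 1: only idle-thread's checkpoint is removed
```

The 40-day-old checkpoint c1 survives because its thread was active two days ago.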
Is the checkpoint format portable between saver backends?
No. Each saver stores checkpoints in its own schema and storage layout, so you cannot copy checkpoints from SQLite to PostgreSQL directly. To migrate, read the checkpoints from one saver and write them to the other programmatically.
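You can see the read-and-rewrite approach with two hypothetical backends that encode the same state differently (pickle versus JSON). Copying raw bytes between them would fail. Decoding with the source format and re-encoding with the destination's format works. This is a conceptual sketch, not a LangGraph migration tool.

```python
import json
import pickle

class PickleStore:
    """Hypothetical backend that stores states as pickle bytes."""

    def __init__(self):
        self.rows = {}  # (thread_id, step) -> bytes

    def put(self, thread_id, step, state):
        self.rows[(thread_id, step)] = pickle.dumps(state)

    def items(self):
        # Decode with this backend's own format before handing data out.
        for (thread_id, step), blob in sorted(self.rows.items()):
            yield thread_id, step, pickle.loads(blob)

class JsonStore:
    """Hypothetical backend that stores states as JSON text."""

    def __init__(self):
        self.rows = {}  # (thread_id, step) -> str

    def put(self, thread_id, step, state):
        self.rows[(thread_id, step)] = json.dumps(state)

    def get(self, thread_id, step):
        return json.loads(self.rows[(thread_id, step)])

def migrate(src, dst):
    """Copy every checkpoint as decoded Python data, letting dst re-encode it."""
    count = 0
    for thread_id, step, state in src.items():
        dst.put(thread_id, step, state)
        count += 1
    return count

src = PickleStore()
src.put("t1", 0, {"messages": ["hi"]})
src.put("t1", 1, {"messages": ["hi", "Echo: hi"]})
dst = JsonStore()
n = migrate(src, dst)
print(n)                  # 2
print(dst.get("t1", 1))   # {'messages': ['hi', 'Echo: hi']}
```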
CallSphere Team
Expert insights on AI voice agents and customer communication automation.