LangGraph Checkpointing: Persistence and Time Travel for Agent Workflows

Why Checkpointing Matters

Without checkpointing, a LangGraph workflow is ephemeral: if the process crashes mid-execution, all state is lost and you must start over. Checkpointing solves this by saving a snapshot of the graph state at every execution step. This enables three capabilities: crash recovery, conversation memory across sessions, and time travel to inspect or replay past states.

MemorySaver: In-Memory Checkpointing

MemorySaver, the simplest checkpointer, keeps state in an in-memory Python dictionary. It is well suited to development and testing:

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, START, END
from typing import TypedDict, Annotated
from langgraph.graph.message import add_messages

class State(TypedDict):
    messages: Annotated[list, add_messages]

def echo(state: State) -> dict:
    last = state["messages"][-1].content
    return {"messages": [{"role": "assistant", "content": f"Echo: {last}"}]}

builder = StateGraph(State)
builder.add_node("echo", echo)
builder.add_edge(START, "echo")
builder.add_edge("echo", END)

memory = MemorySaver()
graph = builder.compile(checkpointer=memory)

Because MemorySaver keeps everything in process memory, all state is lost when the process exits. Use it only for development and testing.

Thread IDs for Conversation Isolation

Each conversation gets its own thread ID, passed under the "configurable" key of the config. Checkpoints are stored per thread, so multiple users can share the same compiled graph:

from langchain_core.messages import HumanMessage

# Conversation 1
config1 = {"configurable": {"thread_id": "user-alice"}}
graph.invoke({"messages": [HumanMessage(content="Hi, I'm Alice")]}, config1)
graph.invoke({"messages": [HumanMessage(content="What's my name?")]}, config1)

# Conversation 2 — completely isolated
config2 = {"configurable": {"thread_id": "user-bob"}}
graph.invoke({"messages": [HumanMessage(content="Hi, I'm Bob")]}, config2)

Each thread maintains its own state history. Alice and Bob never see each other's messages.

SqliteSaver: Persistent Local Storage

For persistence that survives process restarts, use the SQLite checkpointer from the separate langgraph-checkpoint-sqlite package:

from langgraph.checkpoint.sqlite import SqliteSaver
import sqlite3

conn = sqlite3.connect("checkpoints.db", check_same_thread=False)
sqlite_saver = SqliteSaver(conn)

graph = builder.compile(checkpointer=sqlite_saver)

# State persists to disk
config = {"configurable": {"thread_id": "persistent-thread"}}
graph.invoke({"messages": [HumanMessage(content="Remember this")]}, config)

# Later, even after restart, the conversation continues
result = graph.invoke(
    {"messages": [HumanMessage(content="What did I say?")]},
    config,
)

The SQLite file contains the full state history for every thread, including all intermediate checkpoints.


PostgresSaver: Production Persistence

For production deployments, use PostgreSQL via the separate langgraph-checkpoint-postgres package:

from langgraph.checkpoint.postgres import PostgresSaver

DB_URI = "postgresql://user:password@localhost:5432/langgraph_db"

with PostgresSaver.from_conn_string(DB_URI) as pg_saver:
    pg_saver.setup()  # Creates tables on first run
    graph = builder.compile(checkpointer=pg_saver)

    config = {"configurable": {"thread_id": "prod-session-123"}}
    result = graph.invoke(
        {"messages": [HumanMessage(content="Process this order")]},
        config,
    )

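PostgresSaver relies on PostgreSQL for durable, transactional writes and safe concurrent access from multiple processes. from_conn_string opens a single connection. For connection pooling, construct PostgresSaver directly from a psycopg_pool ConnectionPool, using the connection options the LangGraph documentation specifies (autocommit=True and row_factory=dict_row). Call setup() once to create the required checkpoint tables.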

Time Travel: Inspecting Past States

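LangGraph saves a checkpoint at every execution step, so you can list and inspect all checkpoints for a thread. get_state() returns the latest snapshot, and get_state_history() yields snapshots newest first: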

config = {"configurable": {"thread_id": "my-thread"}}

# Get the latest state snapshot
current = graph.get_state(config)
print("Current messages:", len(current.values.get("messages", [])))

# List all checkpoints (state history), newest first
history = list(graph.get_state_history(config))
for i, snapshot in enumerate(history):
    metadata = snapshot.metadata or {}
    print(f"Checkpoint {i}: {len(snapshot.values.get('messages', []))} messages")
    # "source" is "input", "loop", "update", or "fork"; it is not a node name
    print(f"  Step {metadata.get('step')}, source: {metadata.get('source')}, next: {snapshot.next}")

Replaying from a Past Checkpoint

You can resume execution from any historical checkpoint by providing its ID:

# History is ordered newest first, so index 2 is two checkpoints back
history = list(graph.get_state_history(config))
past_state = history[2]

# Resume from that point with new input
past_config = {
    "configurable": {
        "thread_id": "my-thread",
        "checkpoint_id": past_state.config["configurable"]["checkpoint_id"],
    }
}

result = graph.invoke(
    {"messages": [HumanMessage(content="Try a different approach")]},
    past_config,
)

This creates a new branch in the state history. The original checkpoints remain untouched, giving you a full audit trail of every execution path.

FAQ

Does checkpointing add significant overhead?

MemorySaver adds little overhead because checkpoints never leave process memory. SqliteSaver and PostgresSaver add serialization and I/O time that grows with state size. For typical chat agents with dozens of messages, this is usually on the order of milliseconds per checkpoint, but measure it for your own workload. If your state objects are very large, keep the state lean: store bulk data externally and keep only references, such as IDs or URLs, in the graph state.

Can I delete old checkpoints to save storage?

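The core library has no built-in retention or age-based pruning. Recent checkpointer versions do expose delete_thread(thread_id), which removes every checkpoint for a single thread. For finer-grained cleanup with PostgresSaver, you can write SQL queries that delete checkpoints older than a retention period. For SqliteSaver, you can run a cleanup job against the database file directly.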

Is the checkpoint format portable between saver backends?

Not directly. Each saver stores checkpoints in its own database schema, so you cannot copy checkpoint data from SQLite into PostgreSQL as-is. To migrate, read checkpoints from one saver and write them to the other programmatically.

