LangChain Memory: ConversationBufferMemory, Summary, and Vector Store Memory
Explore LangChain's memory types for building conversational AI — from simple buffer memory to summarization and vector-store-backed long-term memory with persistence strategies.
Why Agents Need Memory
Large language models are stateless. Each API call starts fresh with no knowledge of previous interactions. For multi-turn conversations or agents that need to reference past information, you must explicitly manage state. LangChain provides memory abstractions that handle this — storing conversation history, summarizing it, or persisting it in a vector store for semantic retrieval.
Understanding the tradeoffs between memory types is essential. Too much context fills your token window and increases costs. Too little context makes the assistant forget important details mid-conversation.
ConversationBufferMemory
The simplest memory type stores every message verbatim.
from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain
memory = ConversationBufferMemory(return_messages=True)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = ConversationChain(llm=llm, memory=memory, verbose=True)
chain.invoke({"input": "My name is Alice."})
chain.invoke({"input": "What is my name?"})
# The model correctly responds "Alice" because it sees the full history
Setting return_messages=True stores the history as message objects rather than a single concatenated string, which is what chat models expect. The downside is obvious: as the conversation grows, the history eventually exceeds the model's context window.
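The difference between the two representations can be sketched in plain Python. This is a conceptual illustration of what the flag toggles, not LangChain's internal classes:

```python
# Conceptual sketch of the two history shapes that return_messages toggles.
history = [
    ("human", "My name is Alice."),
    ("ai", "Nice to meet you, Alice!"),
]

# return_messages=True: structured messages, passed to chat models as-is
as_messages = [{"role": role, "content": text} for role, text in history]

# return_messages=False: one flattened string, suited to completion-style prompts
as_string = "\n".join(f"{role.capitalize()}: {text}" for role, text in history)

print(as_messages[0]["role"])       # human
print(as_string.splitlines()[0])    # Human: My name is Alice.
```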
ConversationBufferWindowMemory
This variant keeps only the last k turns, discarding older messages.
from langchain.memory import ConversationBufferWindowMemory
memory = ConversationBufferWindowMemory(k=5, return_messages=True)
Setting k=5 retains the most recent 5 exchanges. This bounds token usage but means the agent will forget information from earlier in the conversation.
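The trimming behavior itself is easy to picture. A plain-Python sketch of a fixed-size window (not LangChain's internals) using a bounded deque:

```python
from collections import deque

K = 5  # number of (human, ai) exchanges to keep
window = deque(maxlen=K)  # appending beyond maxlen silently drops the oldest entry

for i in range(8):  # simulate 8 exchanges
    window.append((f"human message {i}", f"ai reply {i}"))

print(len(window))    # 5
print(window[0][0])   # human message 3 -- exchanges 0-2 were discarded
```

This is exactly the failure mode to plan for: anything the user said before the window started is gone, so window memory suits tasks where only recent context matters.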
ConversationSummaryMemory
Instead of dropping old messages, this memory type summarizes the conversation history using an LLM. The summary is updated after each turn.
from langchain.memory import ConversationSummaryMemory
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
memory = ConversationSummaryMemory(
    llm=llm,
    return_messages=True,
)
# After many turns, instead of storing all messages,
# the memory holds a running summary like:
# "The user's name is Alice. She asked about Python decorators
# and was interested in async patterns."
The tradeoff is that summarization costs extra LLM calls and may lose nuance. It works well for long conversations where the gist matters more than exact wording.
ConversationSummaryBufferMemory
This hybrid keeps recent messages in full while summarizing older ones. You set a max_token_limit — once the buffer exceeds that limit, the oldest messages are summarized.
from langchain.memory import ConversationSummaryBufferMemory
memory = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=500,
    return_messages=True,
)
This gives you the best of both worlds: precise recent context and compressed long-term context.
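The eviction logic can be sketched in plain Python, using word count as a crude stand-in for tokens. The summarize function here is a placeholder; in ConversationSummaryBufferMemory that step is an LLM call:

```python
def summarize(summary: str, evicted: list[str]) -> str:
    # Placeholder: a real implementation would call an LLM to fold
    # the evicted messages into the running summary.
    joined = " / ".join(evicted)
    return summary + " | " + joined if summary else joined

def trim_to_budget(messages: list[str], summary: str, max_tokens: int):
    """Evict the oldest messages into the summary until the buffer fits."""
    count = lambda msgs: sum(len(m.split()) for m in msgs)  # crude token proxy
    evicted = []
    while messages and count(messages) > max_tokens:
        evicted.append(messages.pop(0))
    if evicted:
        summary = summarize(summary, evicted)
    return messages, summary

msgs = ["one two three", "four five six", "seven eight"]
msgs, summary = trim_to_budget(msgs, "", max_tokens=5)
print(msgs)     # ['four five six', 'seven eight'] -- recent messages kept verbatim
print(summary)  # 'one two three' -- the oldest message was folded into the summary
```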
Vector Store Memory
For agents that need to recall specific facts from potentially thousands of past interactions, vector store memory embeds conversation snippets and retrieves them via semantic search.
from langchain.memory import VectorStoreRetrieverMemory
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.docstore.in_memory import InMemoryDocstore
import faiss

# FAISS cannot be built from an empty text list, so create an empty index directly
embeddings = OpenAIEmbeddings()
index = faiss.IndexFlatL2(1536)  # dimension of OpenAI's text-embedding-ada-002
vectorstore = FAISS(
    embedding_function=embeddings,
    index=index,
    docstore=InMemoryDocstore({}),
    index_to_docstore_id={},
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
memory = VectorStoreRetrieverMemory(retriever=retriever)
# Save facts
memory.save_context(
    {"input": "I prefer Python over JavaScript"},
    {"output": "Noted, you prefer Python."},
)
memory.save_context(
    {"input": "My project deadline is March 30th"},
    {"output": "Got it, your deadline is March 30th."},
)
# Later, only semantically relevant memories are retrieved
relevant = memory.load_memory_variables(
    {"input": "What programming language should we use?"}
)
print(relevant)
# Returns the Python preference memory, not the deadline memory
Vector store memory scales to thousands of interactions because retrieval is based on relevance, not recency.
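Relevance-based retrieval boils down to nearest-neighbor search over embeddings. A toy sketch with hand-made 3-dimensional vectors makes the mechanism concrete; real systems use learned embeddings with hundreds or thousands of dimensions:

```python
import math

# Toy "embeddings": the dimensions loosely represent
# (programming languages, deadlines, other topics)
memories = {
    "I prefer Python over JavaScript": [0.9, 0.1, 0.0],
    "My project deadline is March 30th": [0.1, 0.9, 0.0],
}

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Stand-in embedding for "What programming language should we use?"
query = [0.8, 0.2, 0.0]

best = max(memories, key=lambda text: cosine(query, memories[text]))
print(best)  # I prefer Python over JavaScript
```

The deadline memory scores low against this query no matter how recently it was stored, which is why this approach scales where recency-based buffers do not.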
Memory with LCEL Chains
In modern LCEL-based chains, you typically manage history explicitly using RunnableWithMessageHistory.
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
store = {}
def get_session_history(session_id: str):
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder("history"),
    ("human", "{input}"),
])
chain = prompt | ChatOpenAI(model="gpt-4o-mini")
with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)
# Each session maintains its own history
response = with_history.invoke(
    {"input": "My name is Bob"},
    config={"configurable": {"session_id": "user-123"}},
)
This approach gives you full control over where history is stored — in memory, Redis, a database, or any custom backend.
FAQ
Which memory type should I use for a production chatbot?
For most production chatbots, start with ConversationSummaryBufferMemory or the LCEL RunnableWithMessageHistory with a persistent backend like Redis or PostgreSQL. The summary buffer approach balances cost, context window usage, and information retention. For applications that need to recall specific facts across many sessions, add vector store memory.
Can I combine multiple memory types?
Yes. A common pattern is to use buffer memory for the current conversation and vector store memory for cross-session recall. You can inject both into the prompt — recent messages from the buffer and relevant past facts from the vector store.
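Assembling a prompt from both sources might look like the sketch below. The retrieve_facts function is a stand-in for a vector-store lookup (here just naive keyword-overlap scoring), and all names are illustrative:

```python
import re

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def retrieve_facts(query: str, facts: list[str], k: int = 2) -> list[str]:
    # Stand-in for a vector-store lookup: score facts by keyword overlap.
    q = tokens(query)
    scored = sorted(facts, key=lambda f: len(q & tokens(f)), reverse=True)
    return [f for f in scored if q & tokens(f)][:k]

recent_turns = ["Human: Can you review my code?", "AI: Sure, paste it here."]
long_term_facts = [
    "User prefers Python over JavaScript.",
    "User's project deadline is March 30th.",
]

query = "Should we write it in Python?"
context = (
    "Relevant facts:\n" + "\n".join(retrieve_facts(query, long_term_facts))
    + "\n\nRecent conversation:\n" + "\n".join(recent_turns)
)
print(context)
```

Both pieces of context then go into the prompt together: recent turns keep the conversation coherent, while retrieved facts bring back details from past sessions.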
How do I persist memory across server restarts?
In-memory stores like ChatMessageHistory are lost on restart. Use persistent backends: RedisChatMessageHistory, SQLChatMessageHistory, or implement a custom BaseChatMessageHistory class that reads from and writes to your database.
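As a minimal sketch of what a custom backend involves, the class below persists to a JSON file and mirrors the interface shape of BaseChatMessageHistory (messages, add_message, clear) without the LangChain dependency. A real implementation would subclass BaseChatMessageHistory and store proper message objects:

```python
import json
from pathlib import Path

class FileChatHistory:
    """Toy chat history that survives restarts by writing to a JSON file."""

    def __init__(self, path: str):
        self.path = Path(path)

    @property
    def messages(self) -> list[dict]:
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text())

    def add_message(self, role: str, content: str) -> None:
        msgs = self.messages
        msgs.append({"role": role, "content": content})
        self.path.write_text(json.dumps(msgs))

    def clear(self) -> None:
        self.path.unlink(missing_ok=True)

history = FileChatHistory("session-user-123.json")
history.clear()
history.add_message("human", "My name is Bob")

# A brand-new instance (e.g. after a server restart) still sees the message
restored = FileChatHistory("session-user-123.json")
print(restored.messages[0]["content"])  # My name is Bob
```

Swap the file for Redis or a database table and the pattern is the same: get_session_history simply returns an instance keyed by session_id.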
#LangChain #Memory #ConversationalAI #VectorStore #Python #AgenticAI #LearnAI #AIEngineering
CallSphere Team
Expert insights on AI voice agents and customer communication automation.