AI Agent Memory Systems: Short-Term, Long-Term, and Episodic Memory
Explore the three types of memory that AI agents use — short-term, long-term, and episodic — with practical Python implementations and guidance on when to use each type.
Why Agents Need Memory
An LLM without memory is like a person with amnesia — brilliant in the moment but unable to learn from past interactions. Every API call starts from scratch. The model does not remember what tools it called, what the user said yesterday, or what mistakes it made last time.
Memory gives agents continuity. It allows them to reference previous conversations, avoid repeating errors, accumulate knowledge over time, and provide personalized responses. Without memory, every agent interaction is isolated and context-free.
The Three Types of Agent Memory
Agent memory systems draw from cognitive science, mapping roughly to how human memory works.
1. Short-Term Memory (Working Memory)
Short-term memory is the conversation history — the messages array that gets sent to the LLM on every call. It holds the current task context: what the user asked, what tools were called, and what results came back.
```python
class ShortTermMemory:
    """Manages the conversation context window."""

    def __init__(self, max_messages: int = 50):
        self.messages: list[dict] = []
        self.max_messages = max_messages

    def add(self, message: dict):
        self.messages.append(message)
        # Evict oldest messages if over limit (keep system prompt)
        if len(self.messages) > self.max_messages:
            system = [m for m in self.messages if m["role"] == "system"]
            others = [m for m in self.messages if m["role"] != "system"]
            # Keep system messages + most recent others
            self.messages = system + others[-(self.max_messages - len(system)):]

    def get_context(self) -> list[dict]:
        return self.messages.copy()
```
When to use: Always. Every agent has short-term memory — it is the conversation itself. The challenge is managing its size as conversations grow, since LLMs have finite context windows.
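One common alternative to simply dropping old messages is to compact them: collapse everything but the most recent turns into a single summary note. Here is a minimal sketch of that idea; the string-truncation "summary" is a placeholder, and in practice you would call an LLM to produce a real summary of the older messages.

```python
def compact_history(messages: list[dict], keep_recent: int = 10) -> list[dict]:
    """Replace all but the most recent non-system messages with one summary note."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    if len(others) <= keep_recent:
        return messages
    older, recent = others[:-keep_recent], others[-keep_recent:]
    # Placeholder summary: in practice, send `older` to an LLM and ask for a recap
    summary = "Earlier conversation covered: " + "; ".join(
        m["content"][:40] for m in older
    )
    return system + [{"role": "system", "content": summary}] + recent
```

This keeps token usage bounded while preserving a trace of the earlier conversation, at the cost of an extra summarization call whenever the history is compacted.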
2. Long-Term Memory (Semantic Memory)
Long-term memory persists across conversations. It stores facts, user preferences, learned procedures, and domain knowledge that the agent can retrieve when relevant. Typically implemented with a vector database.
```python
from openai import OpenAI

client = OpenAI()

class LongTermMemory:
    """Vector-based persistent memory store."""

    def __init__(self):
        self.memories: list[dict] = []  # In production, use a vector DB

    def store(self, content: str, metadata: dict | None = None):
        embedding = client.embeddings.create(
            model="text-embedding-3-small",
            input=content,
        ).data[0].embedding
        self.memories.append({
            "content": content,
            "embedding": embedding,
            "metadata": metadata or {},
        })

    def recall(self, query: str, top_k: int = 5) -> list[str]:
        query_embedding = client.embeddings.create(
            model="text-embedding-3-small",
            input=query,
        ).data[0].embedding
        # Cosine similarity search
        scored = []
        for mem in self.memories:
            score = self._cosine_similarity(query_embedding, mem["embedding"])
            scored.append((score, mem["content"]))
        scored.sort(reverse=True, key=lambda x: x[0])
        return [content for _, content in scored[:top_k]]

    @staticmethod
    def _cosine_similarity(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = sum(x ** 2 for x in a) ** 0.5
        norm_b = sum(x ** 2 for x in b) ** 0.5
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```
When to use: When your agent interacts with the same user across multiple sessions, needs to remember facts from previous conversations, or must access a large knowledge base that does not fit in the context window.
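To see the recall mechanics without an API key, the same ranking logic can be demonstrated with toy bag-of-words vectors in place of real embeddings. The vocabulary and memory strings below are invented purely for illustration; real embeddings capture semantic similarity far beyond shared words.

```python
def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (same formula as above)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

# Toy stand-in for an embedding model: count occurrences of known words
vocab = ["python", "memory", "coffee"]

def embed(text: str) -> list[float]:
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]

memories = ["python memory tips", "coffee brewing guide"]
query = embed("agent memory in python")
ranked = sorted(memories, key=lambda m: cosine(query, embed(m)), reverse=True)
# The python/memory note ranks above the coffee note for this query
```

The shape of the computation is identical to `LongTermMemory.recall`: embed the query, score every stored memory by cosine similarity, and return the top matches.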
3. Episodic Memory (Experience Memory)
Episodic memory stores complete past experiences — entire task executions with their inputs, steps taken, outcomes, and what went right or wrong. This lets agents learn from their own history.
```python
from datetime import datetime
from dataclasses import dataclass, field

@dataclass
class Episode:
    task: str
    steps: list[dict]
    outcome: str  # "success", "failure", "partial"
    lessons: list[str]
    timestamp: str = field(default_factory=lambda: datetime.now().isoformat())

class EpisodicMemory:
    """Stores and retrieves complete task episodes."""

    def __init__(self):
        self.episodes: list[Episode] = []

    def record(self, episode: Episode):
        self.episodes.append(episode)

    def recall_similar(self, task: str, max_episodes: int = 3) -> list[Episode]:
        """Find episodes with similar tasks. In production, use embeddings."""
        # Simple keyword matching — replace with vector search
        scored = []
        task_words = set(task.lower().split())
        for ep in self.episodes:
            ep_words = set(ep.task.lower().split())
            overlap = len(task_words & ep_words) / max(len(task_words), 1)
            scored.append((overlap, ep))
        scored.sort(reverse=True, key=lambda x: x[0])
        return [ep for _, ep in scored[:max_episodes]]

    def get_lessons_for_task(self, task: str) -> list[str]:
        """Extract lessons learned from similar past tasks."""
        similar = self.recall_similar(task)
        lessons = []
        for ep in similar:
            lessons.extend(ep.lessons)
        return lessons
When to use: When your agent performs recurring tasks and you want it to improve over time. Episodic memory is particularly valuable for agents that handle operations tasks (deployments, incident response) where learning from past incidents directly improves future performance.
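Since `get_lessons_for_task` concatenates lessons from several episodes, the list can contain near-duplicates (the same lesson learned twice). A small post-processing step, sketched here as a hypothetical helper, deduplicates and caps the lessons before they reach the prompt:

```python
def dedupe_lessons(lessons: list[str], max_lessons: int = 5) -> list[str]:
    """Drop duplicate lessons (case-insensitive) and cap how many get injected."""
    seen: set[str] = set()
    unique: list[str] = []
    for lesson in lessons:
        key = lesson.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(lesson)
    return unique[:max_lessons]
```

Capping the count matters as much as deduplication: each injected lesson consumes context-window tokens on every call.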
Combining All Three Memory Types
In practice, a well-designed agent uses all three types together:
```python
def build_agent_context(
    user_input: str,
    short_term: ShortTermMemory,
    long_term: LongTermMemory,
    episodic: EpisodicMemory,
) -> list[dict]:
    # Start with short-term (conversation history)
    messages = short_term.get_context()

    # Inject relevant long-term memories
    relevant_facts = long_term.recall(user_input, top_k=3)
    if relevant_facts:
        memory_text = "Relevant information from previous interactions:\n"
        memory_text += "\n".join(f"- {fact}" for fact in relevant_facts)
        # Insert after the system prompt, assumed to sit at index 0
        messages.insert(1, {"role": "system", "content": memory_text})

    # Inject lessons from similar past tasks
    lessons = episodic.get_lessons_for_task(user_input)
    if lessons:
        lesson_text = "Lessons from similar past tasks:\n"
        lesson_text += "\n".join(f"- {lesson}" for lesson in lessons)
        messages.insert(1, {"role": "system", "content": lesson_text})

    return messages
```
This function builds the complete context for each LLM call by layering memories: the current conversation (short-term), relevant facts (long-term), and past experience (episodic). The LLM receives all of this as context and can draw on any memory type during reasoning.
FAQ
How do I decide which memory type to implement first?
Start with short-term memory — you already have it (the messages array). Add long-term memory next if your agent serves repeat users or needs access to a knowledge base. Add episodic memory last, as it requires tracking complete task executions and extracting lessons, which adds significant complexity.
Will memory make my agent slower?
Long-term memory recall adds latency (typically 50-200ms for a vector database query). In most cases the accuracy and personalization gains outweigh that cost. You can mitigate it by running memory retrieval in parallel with other operations, caching frequent queries, and limiting the number of memories injected into context.
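Running the retrievals in parallel can be as simple as a thread pool, since the calls are I/O-bound. A minimal sketch, where `recall_facts` and `recall_lessons` are hypothetical stand-ins for the long-term and episodic lookups:

```python
from concurrent.futures import ThreadPoolExecutor

def recall_facts(query: str) -> list[str]:
    # Stand-in for long_term.recall(query) — normally a vector DB / API call
    return [f"fact about {query}"]

def recall_lessons(query: str) -> list[str]:
    # Stand-in for episodic.get_lessons_for_task(query)
    return [f"lesson for {query}"]

def parallel_recall(query: str) -> tuple[list[str], list[str]]:
    """Overlap the two lookups so total latency is max, not sum, of the two."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        facts_future = pool.submit(recall_facts, query)
        lessons_future = pool.submit(recall_lessons, query)
        return facts_future.result(), lessons_future.result()
```

With two 100ms lookups, this brings total retrieval time down to roughly 100ms instead of 200ms.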
How do I prevent memory from growing indefinitely?
Implement memory eviction policies. For short-term memory, use a sliding window or summarize older messages. For long-term memory, set a maximum size and evict based on recency, relevance score, or access frequency. For episodic memory, keep only episodes from the last N days or the top-K most relevant episodes per task category.
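For the episodic case, a recency cutoff is a one-liner against the `timestamp` field shown in the `Episode` dataclass above. This sketch operates on plain dicts carrying the same ISO-format timestamp:

```python
from datetime import datetime, timedelta

def prune_old_episodes(episodes: list[dict], max_age_days: int = 30) -> list[dict]:
    """Keep only episodes recorded within the last max_age_days."""
    cutoff = datetime.now() - timedelta(days=max_age_days)
    return [
        ep for ep in episodes
        if datetime.fromisoformat(ep["timestamp"]) >= cutoff
    ]
```

Run this periodically (or on every `record`) so the store never grows without bound.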
CallSphere Team