Episodic Memory in AI Agents: Learning from Past Interactions and Outcomes
Discover how to implement episodic memory for AI agents — storing complete interaction episodes, retrieving similar past experiences, and creating feedback loops that improve agent performance over time.
What Is Episodic Memory?
While semantic memory stores facts and knowledge, episodic memory records complete experiences — the full story of what happened, what actions were taken, and what the outcome was. In human cognition, episodic memory is what lets you recall not just that Paris is the capital of France, but the specific trip you took there and what you learned along the way.
For AI agents, episodic memory means storing entire interaction episodes — including the task, the sequence of actions, tool calls, intermediate results, and the final outcome. When the agent encounters a similar situation later, it can retrieve relevant past episodes and use them to make better decisions.
Defining an Episode
An episode captures the full arc of a task execution: the initial request, every step the agent took, and the result.
```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List, Optional


class Outcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    PARTIAL = "partial"


@dataclass
class ActionStep:
    action: str  # "tool_call", "llm_response", "user_input"
    detail: str  # what specifically happened
    timestamp: datetime = field(default_factory=datetime.utcnow)
    result: Optional[str] = None
    error: Optional[str] = None


@dataclass
class Episode:
    task_description: str
    steps: List[ActionStep] = field(default_factory=list)
    outcome: Outcome = Outcome.PARTIAL
    outcome_detail: str = ""
    lessons_learned: str = ""
    embedding: Optional[List[float]] = None
    created_at: datetime = field(default_factory=datetime.utcnow)
    tags: List[str] = field(default_factory=list)

    def add_step(
        self,
        action: str,
        detail: str,
        result: Optional[str] = None,
        error: Optional[str] = None,
    ):
        self.steps.append(ActionStep(
            action=action, detail=detail, result=result, error=error
        ))

    def complete(self, outcome: Outcome, detail: str = "", lessons: str = ""):
        self.outcome = outcome
        self.outcome_detail = detail
        self.lessons_learned = lessons
```
Building an Episodic Memory Store
The store manages episodes with both structured queries (by outcome, tags) and semantic search (by task similarity).
```python
import json
from pathlib import Path

# Assumes embed_text(text) -> List[float] and
# cosine_similarity(a, b) -> float are available.


class EpisodicMemoryStore:
    def __init__(self, storage_path: str = "episodes.json"):
        self.storage_path = Path(storage_path)
        self.episodes: List[Episode] = []
        self._load()

    def record(self, episode: Episode):
        """Store a completed episode with its embedding."""
        # Embed task description + outcome + lessons for retrieval
        summary = (
            f"Task: {episode.task_description}. "
            f"Outcome: {episode.outcome.value}. "
            f"Lessons: {episode.lessons_learned}"
        )
        episode.embedding = embed_text(summary)
        self.episodes.append(episode)
        self._save()

    def find_similar(
        self,
        task_description: str,
        top_k: int = 3,
        outcome_filter: Optional[Outcome] = None,
    ) -> List[Episode]:
        """Find past episodes similar to a given task."""
        query_embedding = embed_text(task_description)
        scored = []
        for ep in self.episodes:
            if outcome_filter and ep.outcome != outcome_filter:
                continue
            if ep.embedding is None:
                continue
            sim = cosine_similarity(query_embedding, ep.embedding)
            scored.append((ep, sim))
        scored.sort(key=lambda x: x[1], reverse=True)
        return [ep for ep, _ in scored[:top_k]]

    def get_success_patterns(self, task_description: str) -> List[Episode]:
        """Retrieve only successful episodes similar to the current task."""
        return self.find_similar(
            task_description, top_k=3, outcome_filter=Outcome.SUCCESS
        )

    def get_failure_warnings(self, task_description: str) -> List[Episode]:
        """Retrieve failed episodes to learn what to avoid."""
        return self.find_similar(
            task_description, top_k=2, outcome_filter=Outcome.FAILURE
        )

    def _save(self):
        data = []
        for ep in self.episodes:
            data.append({
                "task_description": ep.task_description,
                "steps": [
                    {"action": s.action, "detail": s.detail,
                     "result": s.result, "error": s.error}
                    for s in ep.steps
                ],
                "outcome": ep.outcome.value,
                "outcome_detail": ep.outcome_detail,
                "lessons_learned": ep.lessons_learned,
                "tags": ep.tags,
                "created_at": ep.created_at.isoformat(),
            })
        self.storage_path.write_text(json.dumps(data, indent=2))

    def _load(self):
        if not self.storage_path.exists():
            return
        data = json.loads(self.storage_path.read_text())
        for item in data:
            ep = Episode(
                task_description=item["task_description"],
                outcome=Outcome(item["outcome"]),
                outcome_detail=item.get("outcome_detail", ""),
                lessons_learned=item.get("lessons_learned", ""),
                tags=item.get("tags", []),
            )
            if "created_at" in item:
                ep.created_at = datetime.fromisoformat(item["created_at"])
            for step_data in item.get("steps", []):
                ep.add_step(**step_data)
            # Re-embed on load, using the same summary format as record()
            summary = (
                f"Task: {ep.task_description}. "
                f"Outcome: {ep.outcome.value}. "
                f"Lessons: {ep.lessons_learned}"
            )
            ep.embedding = embed_text(summary)
            self.episodes.append(ep)
```
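The store above calls two helpers that are not defined in this article: `embed_text` and `cosine_similarity`. In production, `embed_text` would call a real embedding model (a hosted API or a local sentence-transformer). As a minimal, self-contained stand-in, here is a sketch using a hashed bag-of-words vector — the `dim` parameter and the hashing scheme are illustrative assumptions, not part of the original design:

```python
import hashlib
import math
from typing import List


def embed_text(text: str, dim: int = 256) -> List[float]:
    # Toy hashed bag-of-words embedding: each token increments one
    # bucket; the vector is L2-normalized. A real system would call
    # an embedding model here instead.
    vec = [0.0] * dim
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    # Standard cosine similarity with a zero-vector guard.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)
```

Swapping these for a real embedding model changes nothing else in the store, since `find_similar` only depends on the two function signatures.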
Integrating Episodic Memory into Agent Loops
The real power comes from using past episodes to inform current decisions. Before the agent acts, it retrieves similar past experiences and includes them in its prompt.
```python
async def run_agent_with_memory(
    task: str,
    agent_llm,
    episodic_store: EpisodicMemoryStore,
) -> Episode:
    current_episode = Episode(task_description=task)

    # Retrieve relevant past experiences
    successes = episodic_store.get_success_patterns(task)
    failures = episodic_store.get_failure_warnings(task)

    context_parts = []
    if successes:
        context_parts.append("Relevant successful approaches from past tasks:")
        for ep in successes:
            context_parts.append(
                f"- Task: {ep.task_description} -> {ep.lessons_learned}"
            )
    if failures:
        context_parts.append("Past failures to avoid:")
        for ep in failures:
            context_parts.append(
                f"- Task: {ep.task_description} -> {ep.lessons_learned}"
            )
    memory_context = "\n".join(context_parts)

    # Build prompt with episodic context
    prompt = f"""Task: {task}

{memory_context}

Based on the task and any relevant past experience above, plan and execute."""

    # Execute the agent loop (simplified: a real loop would run tools,
    # inspect results, and judge the outcome instead of assuming SUCCESS)
    result = await agent_llm.run(prompt)
    current_episode.add_step("llm_response", result.output)
    current_episode.complete(
        outcome=Outcome.SUCCESS,
        detail=result.output[:200],
        lessons=f"Approach that worked for '{task[:50]}': {result.output[:100]}",
    )

    # Store the episode for future reference
    episodic_store.record(current_episode)
    return current_episode
```
The Learning Loop
Episodic memory creates a natural learning loop: the agent tries something, records the outcome, and uses that record to improve future attempts. Over dozens or hundreds of episodes, the agent accumulates practical wisdom about what works and what fails in its specific domain.
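The loop can be seen in miniature without any of the machinery above. This self-contained sketch replaces the `Episode` class and embedding retrieval with plain dicts and word-overlap scoring — every name here is illustrative, not from the original code:

```python
# Hypothetical stand-ins for Episode / EpisodicMemoryStore above.
episodes = []


def record(task, outcome, lesson):
    episodes.append({"task": task, "outcome": outcome, "lesson": lesson})


def lessons_for(task):
    # Naive "retrieval": score past episodes by shared words with the
    # new task, most overlap first (a crude proxy for embeddings).
    words = set(task.lower().split())
    scored = [
        (len(words & set(e["task"].lower().split())), e) for e in episodes
    ]
    return [e["lesson"] for score, e in sorted(scored, key=lambda x: -x[0])
            if score > 0]


# First attempts: one failure, one success, each with an explicit lesson
record("export report as CSV", "failure",
       "stream rows; loading all rows in memory crashed")
record("export report as PDF", "success",
       "render in 500-row pages")

# A later, similar task retrieves both lessons, closest match first
print(lessons_for("export invoices as CSV"))
```

The CSV failure ranks above the PDF success because it shares more words with the new task — which is exactly the behavior the embedding-based `find_similar` generalizes.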
Key principles for effective episodic learning:
- Record both successes and failures — failures are often more informative
- Extract explicit lessons — do not just store what happened; store what was learned
- Keep episodes searchable — embed the task description plus outcome for accurate retrieval
- Prune old episodes — remove outdated episodes when the environment or tools change
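The pruning principle can be sketched as a periodic sweep. This version operates on dicts mirroring the `Episode` fields (`created_at`, `tags`); the 90-day default and the idea of a `retired_tags` set are illustrative assumptions:

```python
from datetime import datetime, timedelta, timezone


def prune_episodes(episodes, max_age_days=90, retired_tags=frozenset()):
    """Keep episodes that are recent and mention no retired tools/tags."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
    return [
        ep for ep in episodes
        if ep["created_at"] >= cutoff
        and not (retired_tags & set(ep["tags"]))
    ]
```

Running this on a schedule (or on every `_save`) keeps retrieval from surfacing advice about tools the agent no longer has.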
FAQ
How is episodic memory different from few-shot prompting?
Few-shot prompting uses fixed, hand-crafted examples. Episodic memory dynamically retrieves the most relevant past experiences for each new task. The examples evolve as the agent gains more experience, making it adaptive rather than static.
How many past episodes should I inject into the prompt?
Start with 2-3 successful examples and 1-2 failure warnings. More than 5 total episodes risks consuming too much of the context window. Prioritize the most similar and most recent episodes.
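One way to enforce that limit mechanically is to rank episode snippets by similarity and greedily pack them under a budget. This sketch uses a character count as a rough proxy for a token budget (`max_chars` and the greedy skip-if-too-big policy are illustrative choices):

```python
def fit_to_budget(snippets, max_chars=2000):
    """Greedily keep the highest-ranked snippets that fit the budget.

    `snippets` is assumed to be pre-sorted, most relevant first.
    """
    kept, used = [], 0
    for s in snippets:
        if used + len(s) > max_chars:
            continue  # skip oversized entries; a smaller one may still fit
        kept.append(s)
        used += len(s)
    return kept
```

A real implementation would count tokens with the model's tokenizer, but the shape of the policy is the same.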
Can episodic memory replace fine-tuning?
For many use cases, yes. Episodic memory achieves similar personalization without the cost and complexity of fine-tuning. However, fine-tuning changes the model's weights permanently, while episodic memory requires retrieval at runtime. For very frequent patterns, fine-tuning may be more efficient.
CallSphere Team
Expert insights on AI voice agents and customer communication automation.