Procedural Memory for AI Agents: Learning and Remembering How to Execute Tasks

Declarative vs Procedural Memory

Most agent memory systems store facts — what the agent knows. "The user's timezone is PST." "The database uses PostgreSQL." This is declarative memory. But agents also need to remember how to do things. How to deploy a service. How to debug a failing test. How to file a bug report in the team's specific format.

Procedural memory stores sequences of actions that accomplish a task. Once an agent successfully completes a complex procedure, it records the steps so it can replay and refine the procedure next time instead of reasoning from scratch.

Skill Storage

A procedure is a named sequence of steps, each with an action type, parameters, expected outcomes, and timing metadata.

from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional
from enum import Enum


class StepStatus(Enum):
    PENDING = "pending"
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED = "skipped"


@dataclass
class ProcedureStep:
    action: str
    parameters: dict[str, Any]
    expected_outcome: str = ""
    actual_outcome: str = ""
    status: StepStatus = StepStatus.PENDING
    duration_ms: float = 0
    error: str = ""
    notes: str = ""


@dataclass
class Procedure:
    name: str
    description: str
    steps: list[ProcedureStep] = field(default_factory=list)
    created_at: datetime = field(default_factory=datetime.now)
    last_executed: Optional[datetime] = None
    execution_count: int = 0
    success_rate: float = 0.0
    avg_duration_ms: float = 0.0
    tags: list[str] = field(default_factory=list)
    version: int = 1


class ProceduralMemory:
    def __init__(self):
        self.procedures: dict[str, Procedure] = {}
        self.execution_log: list[dict] = []

    def store_procedure(
        self,
        name: str,
        description: str,
        steps: list[dict],
        tags: list[str] | None = None,
    ) -> Procedure:
        proc_steps = [
            ProcedureStep(
                action=s["action"],
                parameters=s.get("parameters", {}),
                expected_outcome=s.get("expected_outcome", ""),
            )
            for s in steps
        ]
        proc = Procedure(
            name=name,
            description=description,
            steps=proc_steps,
            tags=tags or [],
        )
        self.procedures[name] = proc
        return proc

Procedure Recording

The most natural way to build procedural memory is recording. As the agent executes a task, it logs each step automatically. After successful completion, the recorded steps become a stored procedure.

class ProcedureRecorder:
    def __init__(self, name: str, description: str):
        self.name = name
        self.description = description
        self.steps: list[ProcedureStep] = []
        self.start_time: datetime | None = None

    def start(self):
        self.start_time = datetime.now()
        self.steps = []

    def record_step(
        self,
        action: str,
        parameters: dict,
        outcome: str = "",
        status: StepStatus = StepStatus.SUCCESS,
        duration_ms: float = 0,
    ):
        step = ProcedureStep(
            action=action,
            parameters=parameters,
            actual_outcome=outcome,
            status=status,
            duration_ms=duration_ms,
        )
        self.steps.append(step)

    def finalize(
        self, memory: ProceduralMemory
    ) -> Procedure | None:
        if not self.steps:
            return None

        successful_steps = [
            ProcedureStep(
                action=s.action,
                parameters=s.parameters,
                expected_outcome=s.actual_outcome,
            )
            for s in self.steps
            if s.status == StepStatus.SUCCESS
        ]

        if not successful_steps:
            return None

        proc = Procedure(
            name=self.name,
            description=self.description,
            steps=successful_steps,
        )
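        proc.execution_count = 1
        proc.success_rate = 1.0
        # Seed the average with the recorded run; otherwise the first
        # replay is averaged against a duration of zero
        proc.avg_duration_ms = sum(s.duration_ms for s in self.steps)
        proc.last_executed = datetime.now()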
        memory.procedures[self.name] = proc
        return proc
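
The recorder above relies on the caller to pass in each outcome and duration by hand. One way to make the logging automatic is to wrap every action call in a helper that times it and captures failures. The sketch below is an assumption, not part of the classes above: run_and_record and the plain-dict step log are hypothetical stand-ins so the snippet runs on its own.

```python
import time
from typing import Any, Callable

# Hypothetical stand-in for ProcedureRecorder.steps: a list of plain dicts
StepLog = list[dict[str, Any]]


def run_and_record(
    log: StepLog,
    action: str,
    parameters: dict[str, Any],
    fn: Callable[..., Any],
) -> Any:
    """Call fn(**parameters), timing it and appending one entry to log."""
    start = time.perf_counter()
    entry: dict[str, Any] = {"action": action, "parameters": parameters}
    try:
        result = fn(**parameters)
        entry["status"] = "success"
        entry["outcome"] = str(result)
        return result
    except Exception as exc:
        entry["status"] = "failed"
        entry["error"] = str(exc)
        raise
    finally:
        # Runs on both paths, so failed steps are timed as well
        entry["duration_ms"] = (time.perf_counter() - start) * 1000
        log.append(entry)


def add(a: int, b: int) -> int:
    return a + b


def divide(a: int, b: int) -> float:
    return a / b


log: StepLog = []
run_and_record(log, "add", {"a": 2, "b": 3}, add)
try:
    run_and_record(log, "divide", {"a": 1, "b": 0}, divide)
except ZeroDivisionError:
    pass  # the failure is already in the log
```

Each log entry carries the same fields record_step expects, so a recorder could feed them straight into ProcedureStep objects.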

Replay

When the agent encounters a familiar task, it retrieves the stored procedure and replays the steps rather than reasoning from scratch. Each step is executed with the recorded parameters, and outcomes are compared against expectations.

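# Method of ProceduralMemory, shown on its own; indent it into the class body.
async def replay_procedure(
    self,
    name: str,
    executor,  # async callable: (action, params) -> outcome
    adapt_params: dict | None = None,  # overrides: {action: {param: value}}
) -> dict:
    proc = self.procedures.get(name)
    if not proc:
        return {"success": False, "error": "Procedure not found"}

    results = []
    all_success = True
    total_ms = 0.0

    for i, step in enumerate(proc.steps):
        # Copy so per-run overrides never mutate the stored procedure
        params = dict(step.parameters)
        if adapt_params:
            params.update(adapt_params.get(step.action, {}))

        start = datetime.now()
        try:
            outcome = await executor(step.action, params)
            duration = (datetime.now() - start).total_seconds() * 1000
            results.append({
                "step": i + 1,
                "action": step.action,
                "status": "success",
                "outcome": str(outcome),
                # Exact-match check against the recorded outcome
                "matches_expected": str(outcome) == step.expected_outcome,
                "duration_ms": duration,
            })
        except Exception as e:
            # Later steps still run; break here if they depend on this one
            duration = (datetime.now() - start).total_seconds() * 1000
            all_success = False
            results.append({
                "step": i + 1,
                "action": step.action,
                "status": "failed",
                "error": str(e),
                "duration_ms": duration,
            })
        # Count failed steps too, so avg_duration_ms reflects the whole run
        total_ms += duration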

    # Update procedure statistics
    proc.execution_count += 1
    proc.last_executed = datetime.now()
    total_runs = proc.execution_count
    if all_success:
        proc.success_rate = (
            (proc.success_rate * (total_runs - 1) + 1.0)
            / total_runs
        )
    else:
        proc.success_rate = (
            (proc.success_rate * (total_runs - 1))
            / total_runs
        )
    proc.avg_duration_ms = (
        (proc.avg_duration_ms * (total_runs - 1) + total_ms)
        / total_runs
    )

    return {"success": all_success, "steps": results}
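
The statistics update at the end of replay_procedure is an incremental mean: the old average is weighted by the number of earlier runs, and the new observation is folded in. A minimal sketch of the same arithmetic on its own, with update_mean as a hypothetical helper name:

```python
def update_mean(old_mean: float, new_count: int, value: float) -> float:
    """Fold one new observation into a running mean.

    new_count is the number of observations including this one.
    """
    return (old_mean * (new_count - 1) + value) / new_count


# A procedure recorded once (success_rate 1.0), then replayed three times:
# a failure, a success, and another failure.
rate = 1.0
count = 1
for succeeded in (False, True, False):
    count += 1
    rate = update_mean(rate, count, 1.0 if succeeded else 0.0)

print(count, rate)  # 4 runs, rate close to 0.5
```

The same helper covers avg_duration_ms by passing the run's total duration as the value, which avoids storing every past run.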

Optimization Over Time

Each execution updates the procedure's statistics, and those statistics can drive refinement. Steps that consistently fail can be removed or replaced. Steps that are slow can be flagged for optimization. The agent can also merge similar procedures, keeping the most efficient variant.

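# Method of ProceduralMemory, shown on its own; indent it into the class body.
def find_similar(
    self, description: str, threshold: int = 2
) -> list[Procedure]:
    """Find procedures whose descriptions share at least `threshold` words.

    Matching is a plain word overlap, so common words such as "to" count.
    """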
    query_words = set(description.lower().split())
    results = []
    for proc in self.procedures.values():
        proc_words = set(proc.description.lower().split())
        overlap = len(query_words & proc_words)
        if overlap >= threshold:
            results.append(proc)
    results.sort(key=lambda p: p.success_rate, reverse=True)
    return results


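# Method of ProceduralMemory, shown on its own; indent it into the class body.
def optimize_procedure(self, name: str) -> Procedure | None:
    proc = self.procedures.get(name)
    if not proc or proc.execution_count < 3:
        return None  # Need enough data to optimize

    # Drop steps whose stored status is FAILED. This is a simplification:
    # status holds a single value, not a failure count, so the caller must
    # write each replay outcome back to step.status for this to have any
    # effect. Removing steps that fail more often than they succeed would
    # need per-step success and failure counters.
    optimized_steps = [
        step for step in proc.steps
        if step.status != StepStatus.FAILED
    ]

    # Only bump the version when the step list actually changed
    if len(optimized_steps) != len(proc.steps):
        proc.steps = optimized_steps
        proc.version += 1
    return proc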
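
Removing steps that fail more often than they succeed requires per-step history, which a single status field cannot hold. The sketch below keeps success and failure counts per action and derives both the pruning and the slow-step decisions from them. StepStats, prune_and_flag, the warm_cache action, and the 1000 ms threshold are illustrative assumptions, not part of the classes above.

```python
from dataclasses import dataclass


@dataclass
class StepStats:
    """Hypothetical per-action counters accumulated across replays."""
    successes: int = 0
    failures: int = 0
    total_ms: float = 0.0

    def record(self, ok: bool, duration_ms: float) -> None:
        if ok:
            self.successes += 1
        else:
            self.failures += 1
        self.total_ms += duration_ms

    @property
    def avg_ms(self) -> float:
        runs = self.successes + self.failures
        return self.total_ms / runs if runs else 0.0


def prune_and_flag(
    actions: list[str],
    stats: dict[str, StepStats],
    slow_ms: float = 1000.0,
) -> tuple[list[str], list[str]]:
    """Return (actions to keep, kept actions flagged as slow)."""
    keep, slow = [], []
    for action in actions:
        s = stats.get(action, StepStats())
        if s.failures > s.successes:
            continue  # fails more often than it succeeds
        keep.append(action)
        if s.avg_ms > slow_ms:
            slow.append(action)
    return keep, slow


stats = {
    "run_tests": StepStats(),
    "warm_cache": StepStats(),
    "build_image": StepStats(),
}
for ok, ms in [(True, 400.0), (True, 420.0), (True, 410.0)]:
    stats["run_tests"].record(ok, ms)
for ok, ms in [(False, 50.0), (False, 60.0), (True, 55.0)]:
    stats["warm_cache"].record(ok, ms)
for ok, ms in [(True, 2500.0), (True, 2700.0), (True, 2600.0)]:
    stats["build_image"].record(ok, ms)

keep, slow = prune_and_flag(["run_tests", "warm_cache", "build_image"], stats)
print(keep)  # ['run_tests', 'build_image']
print(slow)  # ['build_image']
```

Automatic pruning deserves care: a step like run_tests that fails is often doing its job, so in practice removal would likely be limited to steps the agent has marked as optional.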

Practical Example

memory = ProceduralMemory()

# Record a deployment procedure
recorder = ProcedureRecorder(
    "deploy_backend", "Deploy backend service to production"
)
recorder.start()

recorder.record_step(
    "run_tests", {"suite": "all"}, "All 142 tests passed"
)
recorder.record_step(
    "build_image", {"tag": "v1.2.3"}, "Image built successfully"
)
recorder.record_step(
    "push_image", {"registry": "gcr.io/myproject"}, "Pushed"
)
recorder.record_step(
    "apply_k8s", {"manifest": "deploy.yaml"}, "Rollout started"
)
recorder.record_step(
    "verify_health", {"url": "/health"}, "200 OK"
)

recorder.finalize(memory)

# Next time — replay instead of reasoning from scratch
# result = await memory.replay_procedure("deploy_backend", executor)

FAQ

How does procedural memory differ from a simple script?

A script is static — it runs the same steps every time. Procedural memory is adaptive. The agent can modify parameters based on context, skip steps that are not needed, and improve the procedure based on execution history. It is a living script that learns.

When should an agent create a new procedure vs reuse an existing one?

Use the find_similar method to check for existing procedures before recording a new one. If a similar procedure exists with a high success rate, replay it with adapted parameters. Create a new procedure only when the task is genuinely novel.

Can procedures compose — calling one procedure from within another?

Yes. Treat each procedure as a callable action. A "deploy_full_stack" procedure can include a step whose action is "replay_procedure" and whose parameters name "deploy_backend", provided the executor routes that action back to the replay method. This creates reusable, composable skill libraries.
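
A minimal sketch of that composition, assuming procedures are reduced to lists of (action, params) pairs and that the nested procedure's name is passed under a name parameter. PROCEDURES, run_action, and replay are hypothetical names, and the stub run_action only records which action it was asked to run.

```python
import asyncio
from typing import Any

Step = tuple[str, dict[str, Any]]

PROCEDURES: dict[str, list[Step]] = {
    "deploy_backend": [
        ("build_image", {"tag": "v1.2.3"}),
        ("apply_k8s", {"manifest": "deploy.yaml"}),
    ],
    "deploy_full_stack": [
        ("replay_procedure", {"name": "deploy_backend"}),
        ("verify_health", {"url": "/health"}),
    ],
}

trace: list[str] = []


async def run_action(action: str, params: dict[str, Any]) -> None:
    # Stub for real side effects: just note the action that ran
    trace.append(action)


async def replay(name: str, active: tuple[str, ...] = ()) -> None:
    """Replay a procedure, expanding nested replay_procedure steps."""
    if name in active:
        # Guard against a procedure that (indirectly) includes itself
        raise RecursionError(f"cycle: {' -> '.join(active + (name,))}")
    for action, params in PROCEDURES[name]:
        if action == "replay_procedure":
            await replay(params["name"], active + (name,))
        else:
            await run_action(action, params)


asyncio.run(replay("deploy_full_stack"))
print(trace)  # ['build_image', 'apply_k8s', 'verify_health']
```

The cycle guard matters once procedures reference each other: without it, two procedures that include one another would recurse until the interpreter gives up.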

