Procedural Memory for AI Agents: Learning and Remembering How to Execute Tasks
Build procedural memory systems that let AI agents record, store, replay, and optimize multi-step task procedures, enabling skill learning and execution improvement over time.
Declarative vs Procedural Memory
Most agent memory systems store facts — what the agent knows. "The user's timezone is PST." "The database uses PostgreSQL." This is declarative memory. But agents also need to remember how to do things. How to deploy a service. How to debug a failing test. How to file a bug report in the team's specific format.
Procedural memory stores sequences of actions that accomplish a task. Once an agent successfully completes a complex procedure, it records the steps so it can replay and refine the procedure next time instead of reasoning from scratch.
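As a rough illustration of the distinction, declarative memory can be pictured as a lookup of facts, while procedural memory is an ordered list of actions. The names `facts` and `file_bug_report`, and the action names inside it, are hypothetical and not part of the system built below.

```python
# Declarative memory: facts the agent knows (values taken from the text).
facts = {
    "user_timezone": "PST",
    "database": "PostgreSQL",
}

# Procedural memory: an ordered list of actions that accomplishes a task.
# The action names and parameters here are hypothetical.
file_bug_report = [
    {"action": "collect_logs", "parameters": {"lines": 200}},
    {"action": "fill_template", "parameters": {"template": "team_bug_report"}},
    {"action": "submit_ticket", "parameters": {"project": "BACKEND"}},
]

# A fact is looked up; a procedure is walked through in order.
print(facts["database"])
for number, step in enumerate(file_bug_report, start=1):
    print(number, step["action"])
```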
Skill Storage
A procedure is a named sequence of steps, each with an action type, parameters, expected outcomes, and timing metadata.
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional
from enum import Enum


class StepStatus(Enum):
    PENDING = "pending"
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED = "skipped"


@dataclass
class ProcedureStep:
    action: str
    parameters: dict[str, Any]
    expected_outcome: str = ""
    actual_outcome: str = ""
    status: StepStatus = StepStatus.PENDING
    duration_ms: float = 0
    error: str = ""
    notes: str = ""


@dataclass
class Procedure:
    name: str
    description: str
    steps: list[ProcedureStep] = field(default_factory=list)
    created_at: datetime = field(default_factory=datetime.now)
    last_executed: Optional[datetime] = None
    execution_count: int = 0
    success_rate: float = 0.0
    avg_duration_ms: float = 0.0
    tags: list[str] = field(default_factory=list)
    version: int = 1


class ProceduralMemory:
    def __init__(self):
        self.procedures: dict[str, Procedure] = {}
        self.execution_log: list[dict] = []

    def store_procedure(
        self,
        name: str,
        description: str,
        steps: list[dict],
        tags: list[str] | None = None,
    ) -> Procedure:
        proc_steps = [
            ProcedureStep(
                action=s["action"],
                parameters=s.get("parameters", {}),
                expected_outcome=s.get("expected_outcome", ""),
            )
            for s in steps
        ]
        proc = Procedure(
            name=name,
            description=description,
            steps=proc_steps,
            tags=tags or [],
        )
        self.procedures[name] = proc
        return proc
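The `ProceduralMemory` class above keeps procedures in a dictionary, so they are lost when the process exits. One possible way to persist them, which is an assumption and not part of the original design, is a JSON round trip. The sketch below uses trimmed-down copies of the dataclasses, and the helper names `procedure_to_json` and `procedure_from_json` are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime
from enum import Enum
from typing import Any


class StepStatus(Enum):
    PENDING = "pending"
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED = "skipped"


# Trimmed-down stand-ins for the article's dataclasses.
@dataclass
class ProcedureStep:
    action: str
    parameters: dict[str, Any]
    expected_outcome: str = ""
    status: StepStatus = StepStatus.PENDING


@dataclass
class Procedure:
    name: str
    description: str
    steps: list[ProcedureStep] = field(default_factory=list)
    created_at: datetime = field(default_factory=datetime.now)
    version: int = 1


def _encode(value: Any) -> Any:
    """Convert the two non-JSON types used above into plain strings."""
    if isinstance(value, Enum):
        return value.value
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


def procedure_to_json(proc: Procedure) -> str:
    # asdict() recurses into nested dataclasses; _encode handles the rest.
    return json.dumps(asdict(proc), default=_encode)


def procedure_from_json(text: str) -> Procedure:
    data = json.loads(text)
    steps = [
        ProcedureStep(
            action=s["action"],
            parameters=s["parameters"],
            expected_outcome=s["expected_outcome"],
            status=StepStatus(s["status"]),
        )
        for s in data["steps"]
    ]
    return Procedure(
        name=data["name"],
        description=data["description"],
        steps=steps,
        created_at=datetime.fromisoformat(data["created_at"]),
        version=data["version"],
    )
```

This only works while step parameters are themselves JSON-serializable; richer parameter types would need their own encoding.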
Procedure Recording
The most natural way to build procedural memory is recording. As the agent executes a task, it logs each step automatically. After completion, the steps that succeeded become a stored procedure, and each step's actual outcome is saved as the expected outcome for future runs.
class ProcedureRecorder:
    def __init__(self, name: str, description: str):
        self.name = name
        self.description = description
        self.steps: list[ProcedureStep] = []
        self.start_time: datetime | None = None

    def start(self):
        self.start_time = datetime.now()
        self.steps = []

    def record_step(
        self,
        action: str,
        parameters: dict,
        outcome: str = "",
        status: StepStatus = StepStatus.SUCCESS,
        duration_ms: float = 0,
    ):
        step = ProcedureStep(
            action=action,
            parameters=parameters,
            actual_outcome=outcome,
            status=status,
            duration_ms=duration_ms,
        )
        self.steps.append(step)

    def finalize(
        self, memory: ProceduralMemory
    ) -> Procedure | None:
        if not self.steps:
            return None
        # Keep only the steps that worked; what actually happened becomes
        # the expectation for future replays.
        successful_steps = [
            ProcedureStep(
                action=s.action,
                parameters=s.parameters,
                expected_outcome=s.actual_outcome,
            )
            for s in self.steps
            if s.status == StepStatus.SUCCESS
        ]
        if not successful_steps:
            return None
        proc = Procedure(
            name=self.name,
            description=self.description,
            steps=successful_steps,
        )
        # Seed the statistics from this first, recorded run
        proc.execution_count = 1
        proc.success_rate = 1.0
        proc.avg_duration_ms = sum(s.duration_ms for s in self.steps)
        proc.last_executed = datetime.now()
        memory.procedures[self.name] = proc
        return proc
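The recorder above expects the caller to pass in the outcome and duration of every step. A small wrapper can capture both automatically. This is a minimal sketch: `MiniRecorder` is a simplified stand-in for `ProcedureRecorder` that uses plain strings instead of `StepStatus`, and `run_and_record` is a hypothetical helper, not part of the original code.

```python
import time
from typing import Any, Callable


class MiniRecorder:
    """Simplified stand-in for ProcedureRecorder; stores steps as dicts."""

    def __init__(self) -> None:
        self.steps: list[dict[str, Any]] = []

    def record_step(
        self,
        action: str,
        parameters: dict[str, Any],
        outcome: str = "",
        status: str = "success",
        duration_ms: float = 0.0,
    ) -> None:
        self.steps.append({
            "action": action,
            "parameters": parameters,
            "outcome": outcome,
            "status": status,
            "duration_ms": duration_ms,
        })


def run_and_record(
    recorder: MiniRecorder,
    action: str,
    parameters: dict[str, Any],
    fn: Callable[..., Any],
) -> Any:
    """Call fn(**parameters) and log its outcome, status, and duration."""
    start = time.perf_counter()
    try:
        result = fn(**parameters)
    except Exception as exc:
        elapsed_ms = (time.perf_counter() - start) * 1000
        recorder.record_step(action, parameters, str(exc), "failed", elapsed_ms)
        raise  # let the agent decide how to react to the failure
    elapsed_ms = (time.perf_counter() - start) * 1000
    recorder.record_step(action, parameters, str(result), "success", elapsed_ms)
    return result


recorder = MiniRecorder()
run_and_record(
    recorder, "run_tests", {"suite": "all"},
    lambda suite: f"{suite} tests passed",
)
print(recorder.steps[0]["outcome"])
```

Failed calls are still logged before the exception propagates, so the recording shows which step broke and how long it ran.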
Replay
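When the agent encounters a familiar task, it retrieves the stored procedure and replays the steps rather than reasoning from scratch. Each step is executed with the recorded parameters, optionally overridden per action through adapt_params, and its outcome or error is collected along with timing so the caller can check the results against the recorded expectations.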
    # Method of ProceduralMemory (indented to sit inside the class body)
    async def replay_procedure(
        self,
        name: str,
        executor,  # async callable: (action, params) -> outcome
        adapt_params: dict | None = None,
    ) -> dict:
        proc = self.procedures.get(name)
        if not proc:
            return {"success": False, "error": "Procedure not found"}

        results = []
        all_success = True
        total_ms = 0.0
        for i, step in enumerate(proc.steps):
            # Copy the stored parameters, then apply per-action overrides
            params = dict(step.parameters)
            if adapt_params:
                params.update(adapt_params.get(step.action, {}))

            start = datetime.now()
            try:
                outcome = await executor(step.action, params)
                entry = {"status": "success", "outcome": str(outcome)}
            except Exception as e:
                all_success = False
                entry = {"status": "failed", "error": str(e)}
            # Time failed steps too, so the average reflects the whole run
            duration = (datetime.now() - start).total_seconds() * 1000
            total_ms += duration
            results.append({
                "step": i + 1,
                "action": step.action,
                **entry,
                "duration_ms": duration,
            })

        # Update procedure statistics (running means over all runs)
        proc.execution_count += 1
        proc.last_executed = datetime.now()
        total_runs = proc.execution_count
        proc.success_rate = (
            proc.success_rate * (total_runs - 1)
            + (1.0 if all_success else 0.0)
        ) / total_runs
        proc.avg_duration_ms = (
            proc.avg_duration_ms * (total_runs - 1) + total_ms
        ) / total_runs
        return {"success": all_success, "steps": results}
Optimization Over Time
Each execution adds data that can be used to refine the procedure. Steps that consistently fail can be removed or replaced, and slow steps can be flagged for optimization. The agent can also look up similar procedures and keep the variant with the highest success rate.
    # Methods of ProceduralMemory (indented to sit inside the class body)
    def find_similar(
        self, description: str, threshold: int = 2
    ) -> list[Procedure]:
        """Find procedures with overlapping keywords."""
        query_words = set(description.lower().split())
        results = []
        for proc in self.procedures.values():
            proc_words = set(proc.description.lower().split())
            overlap = len(query_words & proc_words)
            if overlap >= threshold:
                results.append(proc)
        # Most reliable procedures first
        results.sort(key=lambda p: p.success_rate, reverse=True)
        return results

    def optimize_procedure(self, name: str) -> Procedure | None:
        proc = self.procedures.get(name)
        if not proc or proc.execution_count < 3:
            return None  # Need enough data to optimize

        # Drop steps currently marked FAILED. replay_procedure does not
        # update step.status, so callers must mark failing steps first.
        kept = [s for s in proc.steps if s.status != StepStatus.FAILED]
        if len(kept) != len(proc.steps):
            proc.steps = kept
            proc.version += 1  # bump the version only when something changed
        return proc
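Removing steps that "consistently fail" requires per-step history across runs. The following is a minimal sketch under the assumption that the caller keeps the `steps` lists returned by `replay_procedure` (each entry carrying an `action` and a `status`). The function name `prune_failing_steps` and the more-than-half failure rule are illustrative choices, not part of the original code.

```python
from collections import Counter
from typing import Any


def prune_failing_steps(
    steps: list[str],
    run_history: list[list[dict[str, Any]]],
    min_runs: int = 3,
) -> list[str]:
    """Drop actions that failed in more than half of their recorded attempts."""
    if len(run_history) < min_runs:
        return list(steps)  # not enough data yet
    attempts: Counter = Counter()
    failures: Counter = Counter()
    for run in run_history:
        for result in run:
            attempts[result["action"]] += 1
            if result["status"] == "failed":
                failures[result["action"]] += 1
    return [
        action
        for action in steps
        # Keep steps with no history, or with failures in at most half the tries
        if attempts[action] == 0 or failures[action] * 2 <= attempts[action]
    ]


# Three recorded runs; "warm_cache" is a hypothetical step that fails twice.
history = [
    [{"action": "run_tests", "status": "success"},
     {"action": "warm_cache", "status": "failed"}],
    [{"action": "run_tests", "status": "success"},
     {"action": "warm_cache", "status": "failed"}],
    [{"action": "run_tests", "status": "success"},
     {"action": "warm_cache", "status": "success"}],
]
steps = ["run_tests", "warm_cache"]
print(prune_failing_steps(steps, history))
```

Dropping a step is only safe when later steps do not depend on it, so a real agent would likely flag candidates for review before removing them.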
Practical Example
memory = ProceduralMemory()

# Record a deployment procedure
recorder = ProcedureRecorder(
    "deploy_backend", "Deploy backend service to production"
)
recorder.start()
recorder.record_step(
    "run_tests", {"suite": "all"}, "All 142 tests passed"
)
recorder.record_step(
    "build_image", {"tag": "v1.2.3"}, "Image built successfully"
)
recorder.record_step(
    "push_image", {"registry": "gcr.io/myproject"}, "Pushed"
)
recorder.record_step(
    "apply_k8s", {"manifest": "deploy.yaml"}, "Rollout started"
)
recorder.record_step(
    "verify_health", {"url": "/health"}, "200 OK"
)
recorder.finalize(memory)

# Next time: replay instead of reasoning from scratch
# result = await memory.replay_procedure("deploy_backend", executor)
FAQ
How does procedural memory differ from a simple script?
A script is static: it runs the same steps every time. Procedural memory is adaptive. The agent can override parameters based on context and, using the execution history, drop or revise steps that keep failing. In effect, it is a script that is revised as it is used.
When should an agent create a new procedure vs reuse an existing one?
Use the find_similar method to check for existing procedures before recording a new one. If a similar procedure exists with a high success rate, replay it with adapted parameters. Create a new procedure only when the task is genuinely novel.
Can procedures compose — calling one procedure from within another?
Yes. Treat each procedure as a callable action. A "deploy_full_stack" procedure can include a step whose action is "replay_procedure" with a parameter of "deploy_backend", provided the executor handles that action by replaying the named procedure. This creates reusable, composable skill libraries.
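One way to make this concrete is to special-case the "replay_procedure" action during replay. The sketch below uses a plain dict of procedures rather than `ProceduralMemory`. The `name` parameter key, the `run_action` leaf executor, and the `max_depth` guard are assumptions added for illustration.

```python
import asyncio
from typing import Any

# Procedures as lists of step dicts; "deploy_full_stack" nests "deploy_backend".
PROCEDURES: dict[str, list[dict[str, Any]]] = {
    "deploy_backend": [
        {"action": "build_image", "parameters": {"tag": "v1.2.3"}},
        {"action": "apply_k8s", "parameters": {"manifest": "deploy.yaml"}},
    ],
    "deploy_full_stack": [
        {"action": "replay_procedure", "parameters": {"name": "deploy_backend"}},
        {"action": "verify_health", "parameters": {"url": "/health"}},
    ],
}

executed: list[str] = []


async def run_action(action: str, params: dict[str, Any]) -> str:
    """Hypothetical leaf executor that only records what it was asked to do."""
    executed.append(action)
    return "ok"


async def replay(name: str, depth: int = 0, max_depth: int = 5) -> None:
    """Replay a procedure, recursing into nested "replay_procedure" steps."""
    if depth > max_depth:
        # Guards against procedures that reference each other in a cycle
        raise RecursionError(f"Procedure nesting deeper than {max_depth}")
    for step in PROCEDURES[name]:
        if step["action"] == "replay_procedure":
            await replay(step["parameters"]["name"], depth + 1, max_depth)
        else:
            await run_action(step["action"], step["parameters"])


asyncio.run(replay("deploy_full_stack"))
print(executed)
```

The nested procedure's steps run in place, so the printed order is `build_image`, `apply_k8s`, `verify_health`.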
CallSphere Team