UFO Memory and Learning: How the Agent Remembers Successful Task Patterns
Learn how Microsoft UFO's experience learning system stores successful task executions, retrieves relevant past patterns for new tasks, and optimizes performance through memory-based action prediction.
Why Agent Memory Matters
Without memory, every UFO task starts from scratch. The agent has no recollection of successfully completing the same task yesterday or of discovering that a particular sequence of clicks is the fastest way to apply a filter in Excel. Every execution involves the same number of LLM calls, the same trial-and-error, and the same cost.
UFO addresses this with an experience learning system that records successful task executions and retrieves relevant experiences when handling new tasks. This is functionally a Retrieval-Augmented Generation (RAG) system applied to UI automation memory.
How Experience Learning Works
UFO's memory system operates in three phases: record, index, and retrieve.
Phase 1: Recording Experiences
After a task completes successfully, UFO serializes the entire execution trace — every observation, action, and outcome — into a structured experience record:
from dataclasses import dataclass, field
from datetime import datetime
import json


@dataclass
class TaskExperience:
    """A recorded successful task execution."""

    task_id: str
    task_description: str
    application: str
    steps: list[dict]
    total_steps: int
    start_time: datetime
    end_time: datetime
    success: bool
    metadata: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        return {
            "task_id": self.task_id,
            "task_description": self.task_description,
            "application": self.application,
            "steps": self.steps,
            "total_steps": self.total_steps,
            "duration_seconds": (self.end_time - self.start_time).total_seconds(),
            "success": self.success,
            "metadata": self.metadata,
        }
import os
import uuid


def record_experience(task: str, execution_trace: list[dict]) -> TaskExperience:
    """Record a successful task execution for future reference."""
    experience = TaskExperience(
        task_id=str(uuid.uuid4()),
        task_description=task,
        application=execution_trace[0].get("application", "Unknown"),
        steps=[
            {
                "step_number": step["step"],
                "observation": step["thought"],
                "action_type": step["action_type"],
                "target_control": step.get("control_text", ""),
                "parameters": step.get("parameters", {}),
                "result": step.get("result", "success"),
            }
            for step in execution_trace
        ],
        total_steps=len(execution_trace),
        start_time=execution_trace[0]["timestamp"],
        end_time=execution_trace[-1]["timestamp"],
        success=True,
    )

    # Save to disk (create the database directory on first use)
    os.makedirs("experience_db", exist_ok=True)
    save_path = f"experience_db/{experience.task_id}.json"
    with open(save_path, "w") as f:
        json.dump(experience.to_dict(), f, indent=2, default=str)

    return experience
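Given this recorder, a stored file in experience_db/ looks roughly like the following — every value below is invented for illustration:

```python
import json

# Illustrative shape of one serialized record in experience_db/ (all values invented).
example_record = {
    "task_id": "1f3c9a2e-7b41-4d9a-9c55-0e8d2f6a1b77",
    "task_description": "Apply a date filter to the sales sheet",
    "application": "Excel",
    "steps": [
        {
            "step_number": 1,
            "observation": "Open the Data tab on the ribbon",
            "action_type": "click",
            "target_control": "Data",
            "parameters": {},
            "result": "success",
        }
    ],
    "total_steps": 1,
    "duration_seconds": 4.2,
    "success": True,
    "metadata": {},
}

print(json.dumps(example_record, indent=2))
```

The record keeps the agent's per-step reasoning ("observation") alongside the action, which is what makes it useful later as a few-shot example.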
Phase 2: Indexing With Embeddings
Stored experiences are indexed using text embeddings so they can be retrieved by semantic similarity:
import json
from pathlib import Path

import numpy as np
from openai import OpenAI

client = OpenAI()


def create_experience_index(experiences_dir: str) -> dict:
    """Build a vector index of task experiences."""
    index = {"embeddings": [], "task_ids": [], "descriptions": []}

    for exp_file in Path(experiences_dir).glob("*.json"):
        with open(exp_file) as f:
            exp = json.load(f)

        # Create embedding from task description + key actions
        summary = build_experience_summary(exp)
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=summary,
        )
        index["embeddings"].append(response.data[0].embedding)
        index["task_ids"].append(exp["task_id"])
        index["descriptions"].append(summary)

    # Convert to numpy for efficient similarity search
    index["embeddings"] = np.array(index["embeddings"])
    return index


def build_experience_summary(experience: dict) -> str:
    """Create a searchable summary of an experience."""
    steps_summary = " -> ".join(
        f"{s['action_type']}({s['target_control']})"
        for s in experience["steps"][:10]
    )
    return (
        f"Task: {experience['task_description']} "
        f"App: {experience['application']} "
        f"Steps: {steps_summary}"
    )
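To see what actually gets embedded, here is the summary text for a hypothetical two-step Excel experience, built with the same string logic as above:

```python
# Hypothetical experience record; only the fields used by the summary are shown.
experience = {
    "task_description": "Apply a date filter",
    "application": "Excel",
    "steps": [
        {"action_type": "click", "target_control": "Data"},
        {"action_type": "click", "target_control": "Filter"},
    ],
}

steps_summary = " -> ".join(
    f"{s['action_type']}({s['target_control']})" for s in experience["steps"][:10]
)
summary = (
    f"Task: {experience['task_description']} "
    f"App: {experience['application']} "
    f"Steps: {steps_summary}"
)

print(summary)
# Task: Apply a date filter App: Excel Steps: click(Data) -> click(Filter)
```

Embedding this compact summary, rather than the full JSON record, keeps the index focused on what matters for retrieval: the task intent, the application, and the shape of the action sequence.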
Phase 3: Retrieving Relevant Experiences
When a new task arrives, UFO searches the index for similar past experiences:
def retrieve_relevant_experiences(
    new_task: str,
    index: dict,
    top_k: int = 3,
    similarity_threshold: float = 0.75,
) -> list[dict]:
    """Find past experiences relevant to the new task."""
    # Embed the new task
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=new_task,
    )
    query_embedding = np.array(response.data[0].embedding)

    # Cosine similarity search
    similarities = np.dot(index["embeddings"], query_embedding) / (
        np.linalg.norm(index["embeddings"], axis=1)
        * np.linalg.norm(query_embedding)
    )

    # Filter by threshold and get top-k
    candidates = [
        (i, sim) for i, sim in enumerate(similarities)
        if sim >= similarity_threshold
    ]
    candidates.sort(key=lambda x: x[1], reverse=True)
    top_candidates = candidates[:top_k]

    # Load full experience records
    results = []
    for idx, score in top_candidates:
        task_id = index["task_ids"][idx]
        with open(f"experience_db/{task_id}.json") as f:
            exp = json.load(f)
        exp["similarity_score"] = float(score)
        results.append(exp)

    return results
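The cosine-similarity step can be checked on toy vectors. The example below uses 3-dimensional vectors for readability (real embeddings are 1536-dimensional), with the comments assigning each vector a made-up task:

```python
import numpy as np

# Three stored embeddings and a query; toy 3-D vectors for illustration.
embeddings = np.array([
    [1.0, 0.0, 0.0],   # "apply a filter in Excel"
    [0.8, 0.6, 0.0],   # "sort a column in Excel"
    [0.0, 0.0, 1.0],   # "compose an email in Outlook"
])
query = np.array([1.0, 0.1, 0.0])  # embedding of the new task

# Same cosine-similarity expression as in retrieve_relevant_experiences
similarities = embeddings @ query / (
    np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query)
)
best = int(np.argmax(similarities))
print(best)  # 0 (the filter task is the closest match)
```

The threshold in `retrieve_relevant_experiences` matters here: with a 0.75 cutoff, the unrelated Outlook task (similarity 0.0) would never reach the prompt, which prevents misleading examples from polluting the model's context.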
Injecting Memory Into the Prompt
Retrieved experiences are included in the GPT-4V prompt as few-shot examples, giving the model a proven action sequence to follow:
def build_prompt_with_memory(
    task: str,
    screenshot: str,
    controls: list[dict],
    relevant_experiences: list[dict],
) -> str:
    """Build the action prompt enriched with past experiences.

    The screenshot image and the control list are attached to the model
    request separately; this function assembles only the text portion.
    """
    experience_text = ""
    if relevant_experiences:
        experience_text = "\n\nRelevant past experiences:\n"
        for exp in relevant_experiences:
            experience_text += f"\nTask: {exp['task_description']}\n"
            experience_text += f"Similarity: {exp['similarity_score']:.2f}\n"
            experience_text += "Successful steps:\n"
            for step in exp["steps"]:
                experience_text += (
                    f"  {step['step_number']}. {step['action_type']}"
                    f"({step['target_control']}) - {step['observation']}\n"
                )

    return f"""Task: {task}
{experience_text}
Based on the annotated screenshot and any relevant past experience,
select the next action. Past experiences are suggestions — adapt them
to the current UI state if controls have changed."""
Performance Impact
Memory reduces both cost and execution time:
- Fewer exploratory actions — the agent follows proven paths instead of experimenting
- Lower token usage — successful patterns provide shorter reasoning chains
- Better first-attempt accuracy — relevant examples guide the model toward correct actions
# Measuring memory impact
def compare_with_without_memory(task: str):
    """Run the same task with and without memory retrieval."""
    # Without memory
    result_no_mem = run_ufo_task(task, use_memory=False)

    # With memory
    result_with_mem = run_ufo_task(task, use_memory=True)

    print(f"Without memory: {result_no_mem['steps']} steps, "
          f"${result_no_mem['cost']:.3f}")
    print(f"With memory: {result_with_mem['steps']} steps, "
          f"${result_with_mem['cost']:.3f}")
    print(f"Step reduction: "
          f"{(1 - result_with_mem['steps'] / result_no_mem['steps']) * 100:.0f}%")
In practice, memory-augmented execution typically reduces step count by 20-40% for tasks similar to previously recorded experiences.
FAQ
How much storage does the experience database require?
Each experience record is a JSON file of 5-50 KB depending on task complexity. The embeddings index adds roughly 6 KB per experience (1536-dimensional float32 vector). A database of 1,000 experiences takes approximately 50-60 MB total — negligible on modern systems.
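The arithmetic behind those figures is easy to verify; the sizes below are the estimates quoted above, not measurements:

```python
# Back-of-envelope storage estimate, using the figures quoted above.
embedding_bytes = 1536 * 4            # one float32 vector = 6,144 bytes (~6 KB)
record_bytes_upper = 50 * 1024        # upper-bound JSON record (~50 KB)
n_experiences = 1000

total_mb = n_experiences * (embedding_bytes + record_bytes_upper) / 1024**2
print(f"{total_mb:.0f} MB")  # ~55 MB at the upper bound
```

Simpler tasks with 5 KB records bring the total closer to 11 MB, so the quoted 50-60 MB is the worst case.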
Does UFO learn from failed tasks?
By default, UFO only records successful completions. However, you can configure it to also record failures and use them as negative examples in the prompt — telling the model "this approach was tried and failed" to steer it toward alternative strategies.
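A minimal sketch of turning a failed record into such a negative-example prompt line might look like this — the record shape and the `failure_reason` field are assumptions, not part of UFO's stock format:

```python
# Hypothetical failed-experience record; "failure_reason" is an assumed field.
failed = {
    "task_description": "Apply a filter in Excel",
    "failure_reason": "the Filter button was disabled before a data range was selected",
    "steps": [
        {"action_type": "click", "target_control": "Filter"},
    ],
}

warning = (
    f"Avoid this failed approach for '{failed['task_description']}': "
    + " -> ".join(f"{s['action_type']}({s['target_control']})" for s in failed["steps"])
    + f" (failed because {failed['failure_reason']})"
)
print(warning)
```

Including the failure reason, not just the failed action sequence, gives the model something actionable: it knows what precondition to satisfy before retrying the same control.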
Can experiences transfer between machines with different screen resolutions?
Experiences are stored as abstract action sequences (click control type X, type text Y) rather than pixel coordinates, so they transfer well between machines. The vision model adapts to different layouts and resolutions when following experience-suggested action sequences.
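The distinction can be illustrated with two action encodings; both records below are hypothetical, and only the first survives a resolution change:

```python
# Abstract action: names the control, so it resolves on any layout.
abstract_action = {"action_type": "click", "control_type": "Button", "control_text": "Save"}

# Pixel action: hard-codes coordinates, so it breaks at other resolutions.
pixel_action = {"action_type": "click", "x": 1742, "y": 38}


def is_portable(action: dict) -> bool:
    """Portable actions reference controls, not screen coordinates."""
    return "x" not in action and "y" not in action


print(is_portable(abstract_action), is_portable(pixel_action))  # True False
```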
#AgentMemory #ExperienceLearning #RAG #TaskPatterns #MicrosoftUFO #PerformanceOptimization #AIMemory #VectorSearch
CallSphere Team
Expert insights on AI voice agents and customer communication automation.