Memory Consolidation: Compressing and Summarizing Agent Memories Over Time
Build a memory consolidation pipeline that compresses detailed agent memories into summaries, preserving essential information while reducing storage and improving retrieval quality.
Why Raw Memories Do Not Scale
An agent that records every interaction verbatim will accumulate thousands of memory items within days. Searching through raw conversation turns is slow, expensive, and produces noisy results. The agent ends up retrieving five slightly different wordings of the same fact instead of one clean summary.
Memory consolidation solves this by periodically compressing groups of related memories into concise summaries. The detailed records are archived or deleted, and the summary takes their place. This mirrors how human memory works during sleep — the brain replays experiences and encodes the essential patterns while discarding surface details.
Consolidation Triggers
Consolidation should not run after every interaction. It needs a trigger. Common triggers include:
Count-based — consolidate after every N new memories are added to a category.
Time-based — consolidate all memories older than a threshold (e.g., 24 hours).
Size-based — consolidate when the memory store exceeds a storage budget.
from datetime import datetime, timedelta
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    content: str
    created_at: datetime
    category: str = "general"
    consolidated: bool = False
    metadata: dict = field(default_factory=dict)


class ConsolidationTrigger:
    def __init__(
        self,
        count_threshold: int = 20,
        age_threshold_hours: int = 24,
        size_threshold: int = 100,
    ):
        self.count_threshold = count_threshold
        self.age_threshold = timedelta(hours=age_threshold_hours)
        self.size_threshold = size_threshold

    def should_consolidate(
        self, memories: list[MemoryItem]
    ) -> bool:
        unconsolidated = [
            m for m in memories if not m.consolidated
        ]
        if len(unconsolidated) >= self.count_threshold:
            return True
        if len(memories) >= self.size_threshold:
            return True
        now = datetime.now()
        old_items = [
            m for m in unconsolidated
            if (now - m.created_at) > self.age_threshold
        ]
        if len(old_items) >= 5:
            return True
        return False
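To sanity-check the trigger logic, here is a small standalone run. The class definitions are repeated so the snippet executes on its own, and the low `count_threshold` is purely a demo value:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class MemoryItem:
    content: str
    created_at: datetime
    category: str = "general"
    consolidated: bool = False
    metadata: dict = field(default_factory=dict)


class ConsolidationTrigger:
    def __init__(self, count_threshold: int = 20,
                 age_threshold_hours: int = 24,
                 size_threshold: int = 100):
        self.count_threshold = count_threshold
        self.age_threshold = timedelta(hours=age_threshold_hours)
        self.size_threshold = size_threshold

    def should_consolidate(self, memories: list[MemoryItem]) -> bool:
        unconsolidated = [m for m in memories if not m.consolidated]
        if len(unconsolidated) >= self.count_threshold:
            return True
        if len(memories) >= self.size_threshold:
            return True
        now = datetime.now()
        old = [m for m in unconsolidated
               if (now - m.created_at) > self.age_threshold]
        return len(old) >= 5


# Demo: a low threshold so the count-based trigger fires quickly.
trigger = ConsolidationTrigger(count_threshold=3)
now = datetime.now()
memories = [MemoryItem(f"note {i}", now) for i in range(2)]
before = trigger.should_consolidate(memories)   # 2 < 3 unconsolidated -> False
memories.append(MemoryItem("note 2", now))
after = trigger.should_consolidate(memories)    # 3 >= 3 -> True
```

Note that the age-based branch also hardcodes a minimum of five stale items before it fires; in practice you may want that as a constructor parameter too.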
Summary Generation
The consolidation engine groups related memories and generates a summary using an LLM. The prompt instructs the model to extract key facts, decisions, and preferences while discarding filler.
from openai import AsyncOpenAI


async def consolidate_memories(
    memories: list[MemoryItem],
    client: AsyncOpenAI,
) -> str:
    combined_text = "\n".join(
        f"- [{m.created_at.isoformat()}] {m.content}"
        for m in memories
    )
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a memory consolidation engine. "
                    "Compress the following memory items into a "
                    "concise summary that preserves all key facts, "
                    "user preferences, decisions, and action items. "
                    "Remove redundancy and filler. Output only the "
                    "summary, no preamble."
                ),
            },
            {
                "role": "user",
                "content": combined_text,
            },
        ],
        temperature=0.1,
    )
    return response.choices[0].message.content
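One caveat the function above glosses over: a large enough group of memories will not fit in a single prompt. A simple mitigation, sketched here with a rough character budget (`max_chars` is an illustrative stand-in for real token counting), is to split the group into batches, summarize each batch, and feed the partial summaries back through the same function:

```python
def batch_memories(lines: list[str], max_chars: int = 8000) -> list[list[str]]:
    """Greedily pack memory lines into batches under a character budget."""
    batches: list[list[str]] = []
    current: list[str] = []
    size = 0
    for line in lines:
        # Start a new batch when adding this line would exceed the budget.
        if current and size + len(line) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        batches.append(current)
    return batches


# Two 5,000-char lines cannot share an 8,000-char batch; the third fits
# alongside the second.
batches = batch_memories(["a" * 5000, "b" * 5000, "c" * 2000])
```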
Detail Preservation
Not every detail should be compressed away. Some memories contain exact values that summaries tend to round or generalize: specific dates, numerical thresholds, email addresses, URLs. A detail preservation step extracts and stores these separately.
import re


def extract_preservable_details(
    memories: list[MemoryItem],
) -> list[dict]:
    details = []
    patterns = {
        "date": r"\d{4}-\d{2}-\d{2}",
        "number": r"\b\d+\.?\d*\b",
        "email": r"[\w.-]+@[\w.-]+",
        "url": r"https?://[^\s]+",
    }
    for mem in memories:
        for detail_type, pattern in patterns.items():
            matches = re.findall(pattern, mem.content)
            for match in matches:
                details.append({
                    "type": detail_type,
                    "value": match,
                    "source": mem.content[:80],
                })
    return details
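Running the extractor on a synthetic memory shows what it keeps, including one quirk worth knowing: the `number` pattern also matches the digit groups inside a date, so the same value can surface under more than one type. Definitions are repeated so the snippet runs standalone:

```python
import re
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class MemoryItem:
    content: str
    created_at: datetime
    category: str = "general"
    consolidated: bool = False
    metadata: dict = field(default_factory=dict)


def extract_preservable_details(memories: list[MemoryItem]) -> list[dict]:
    details = []
    patterns = {
        "date": r"\d{4}-\d{2}-\d{2}",
        "number": r"\b\d+\.?\d*\b",
        "email": r"[\w.-]+@[\w.-]+",
        "url": r"https?://[^\s]+",
    }
    for mem in memories:
        for detail_type, pattern in patterns.items():
            for match in re.findall(pattern, mem.content):
                details.append({
                    "type": detail_type,
                    "value": match,
                    "source": mem.content[:80],  # provenance snippet
                })
    return details


mem = MemoryItem(
    "Deadline 2025-03-14, budget is 1200, contact ops@example.com",
    datetime.now(),
)
found = extract_preservable_details([mem])
types = {d["type"] for d in found}   # {"date", "number", "email"}
```

The date surfaces twice: once whole under `date`, and again as `2025`, `03`, and `14` under `number`. Deduplicating overlapping matches is left as a refinement.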
The Full Consolidation Pipeline
Putting it together, the pipeline groups memories by category, generates summaries, preserves critical details, and replaces the originals.
class MemoryConsolidator:
    def __init__(self, client: AsyncOpenAI):
        self.client = client
        self.trigger = ConsolidationTrigger()

    async def run(
        self, store: list[MemoryItem]
    ) -> list[MemoryItem]:
        if not self.trigger.should_consolidate(store):
            return store
        # Group by category
        groups: dict[str, list[MemoryItem]] = {}
        fresh: list[MemoryItem] = []
        for mem in store:
            if mem.consolidated:
                fresh.append(mem)
                continue
            groups.setdefault(mem.category, []).append(mem)
        # Consolidate each group
        for category, items in groups.items():
            if len(items) < 3:
                fresh.extend(items)
                continue
            summary = await consolidate_memories(
                items, self.client
            )
            details = extract_preservable_details(items)
            consolidated = MemoryItem(
                content=summary,
                created_at=datetime.now(),
                category=category,
                consolidated=True,
                metadata={
                    "source_count": len(items),
                    "preserved_details": details,
                },
            )
            fresh.append(consolidated)
        return fresh
Storage Optimization
After consolidation, the raw memories can be archived to cold storage (a separate database table or file) rather than deleted entirely. This gives you an audit trail while keeping the active memory store lean.
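A minimal version of that archive step, assuming a local JSONL file as the cold store (a real deployment would more likely use a separate database table):

```python
import json
import tempfile
from dataclasses import dataclass, field, asdict
from datetime import datetime
from pathlib import Path


@dataclass
class MemoryItem:
    content: str
    created_at: datetime
    category: str = "general"
    consolidated: bool = False
    metadata: dict = field(default_factory=dict)


def archive_memories(memories: list[MemoryItem], path: Path) -> int:
    """Append raw memories to a JSONL cold-storage file before pruning them."""
    with path.open("a", encoding="utf-8") as f:
        for m in memories:
            record = asdict(m)
            record["created_at"] = m.created_at.isoformat()  # JSON-safe timestamp
            f.write(json.dumps(record) + "\n")
    return len(memories)


archive_path = Path(tempfile.mkdtemp()) / "memory_archive.jsonl"
written = archive_memories(
    [MemoryItem("old note", datetime(2025, 1, 1))], archive_path
)
```

Append-only JSONL keeps writes cheap and the audit trail grep-able; each line is one archived memory with its original timestamp.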
A typical consolidation cycle reduces memory count by 60 to 80 percent. Running it daily keeps the active store small enough for fast retrieval while preserving all the information that matters.
FAQ
Does summarization lose important nuance?
It can if the prompt is not carefully written. The detail preservation step catches structured data like dates and numbers. For subjective nuance, instruct the LLM to preserve sentiment and reasoning, not just facts. Test by comparing agent behavior before and after consolidation.
How often should consolidation run?
For active agents, once per day or once per 50 new memories is a good starting point. Agents with bursty usage patterns benefit from count-based triggers so consolidation runs after intense sessions rather than during quiet periods.
Can I consolidate already-consolidated memories?
Yes. This is called multi-level consolidation. Daily summaries can be consolidated into weekly summaries, and weekly summaries into monthly summaries. Each level compresses further, creating a pyramid of increasingly abstract knowledge.
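The bucketing step behind that pyramid can be sketched as grouping daily summaries by ISO week, then feeding each bucket through the same summarizer to produce the weekly level (the week granularity here is illustrative; any calendar period works):

```python
from collections import defaultdict
from datetime import date


def bucket_by_week(items: list[tuple[date, str]]) -> dict[tuple[int, int], list[str]]:
    """Group (date, summary) pairs by ISO (year, week) for weekly consolidation."""
    buckets: dict[tuple[int, int], list[str]] = defaultdict(list)
    for d, summary in items:
        iso = d.isocalendar()
        buckets[(iso[0], iso[1])].append(summary)
    return dict(buckets)


daily = [
    (date(2025, 3, 10), "Mon summary"),  # ISO week 11
    (date(2025, 3, 12), "Wed summary"),  # ISO week 11
    (date(2025, 3, 17), "Mon summary"),  # ISO week 12
]
weekly_buckets = bucket_by_week(daily)
```

Each bucket's summaries would then be wrapped as memory items and passed to the same consolidation function used at the daily level.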
CallSphere Team