
Memory Consolidation: Compressing and Summarizing Agent Memories Over Time

Build a memory consolidation pipeline that compresses detailed agent memories into summaries, preserving essential information while reducing storage and improving retrieval quality.

Why Raw Memories Do Not Scale

An agent that records every interaction verbatim will accumulate thousands of memory items within days. Searching through raw conversation turns is slow, expensive, and produces noisy results. The agent ends up retrieving five slightly different wordings of the same fact instead of one clean summary.

Memory consolidation solves this by periodically compressing groups of related memories into concise summaries. The detailed records are archived or deleted, and the summary takes their place. This mirrors how human memory works during sleep — the brain replays experiences and encodes the essential patterns while discarding surface details.

Consolidation Triggers

Consolidation should not run after every interaction. It needs a trigger. Common triggers include:

Count-based — consolidate after every N new memories are added to a category.

Time-based — consolidate all memories older than a threshold (e.g., 24 hours).

Size-based — consolidate when the memory store exceeds a storage budget.

from datetime import datetime, timedelta
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    content: str
    created_at: datetime
    category: str = "general"
    consolidated: bool = False
    metadata: dict = field(default_factory=dict)


class ConsolidationTrigger:
    def __init__(
        self,
        count_threshold: int = 20,
        age_threshold_hours: int = 24,
        size_threshold: int = 100,
    ):
        self.count_threshold = count_threshold
        self.age_threshold = timedelta(hours=age_threshold_hours)
        self.size_threshold = size_threshold

    def should_consolidate(
        self, memories: list[MemoryItem]
    ) -> bool:
        unconsolidated = [
            m for m in memories if not m.consolidated
        ]
        if len(unconsolidated) >= self.count_threshold:
            return True
        if len(memories) >= self.size_threshold:
            return True
        now = datetime.now()
        old_items = [
            m for m in unconsolidated
            if (now - m.created_at) > self.age_threshold
        ]
        # Require several stale items before triggering, so one
        # old stray memory does not force a full consolidation pass.
        if len(old_items) >= 5:
            return True
        return False
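The decision rule can be exercised without the full dataclass, since the trigger only inspects the `created_at` and `consolidated` attributes. The following standalone sketch inlines the same three checks; the `mem` helper and the free-standing `should_consolidate` function are illustrative, not part of the class above:

```python
from datetime import datetime, timedelta
from types import SimpleNamespace


def mem(hours_old: float, consolidated: bool = False):
    # Duck-typed stand-in for MemoryItem: only the two fields
    # the trigger reads are present.
    return SimpleNamespace(
        created_at=datetime.now() - timedelta(hours=hours_old),
        consolidated=consolidated,
    )


def should_consolidate(
    memories,
    count_threshold: int = 20,
    age_threshold: timedelta = timedelta(hours=24),
    size_threshold: int = 100,
) -> bool:
    # Inline copy of ConsolidationTrigger.should_consolidate
    unconsolidated = [m for m in memories if not m.consolidated]
    if len(unconsolidated) >= count_threshold:
        return True
    if len(memories) >= size_threshold:
        return True
    now = datetime.now()
    old = [m for m in unconsolidated if now - m.created_at > age_threshold]
    return len(old) >= 5


recent = [mem(1) for _ in range(10)]   # 10 fresh items: below every threshold
stale = [mem(30) for _ in range(6)]    # 6 items older than 24 hours

print(should_consolidate(recent))           # no trigger fires
print(should_consolidate(recent + stale))   # age-based trigger fires
```

Note that all three conditions are checked on every call, so whichever threshold is crossed first wins.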

Summary Generation

The consolidation engine groups related memories and generates a summary using an LLM. The prompt instructs the model to extract key facts, decisions, and preferences while discarding filler.


from openai import AsyncOpenAI


async def consolidate_memories(
    memories: list[MemoryItem],
    client: AsyncOpenAI,
) -> str:
    combined_text = "\n".join(
        f"- [{m.created_at.isoformat()}] {m.content}"
        for m in memories
    )
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a memory consolidation engine. "
                    "Compress the following memory items into a "
                    "concise summary that preserves all key facts, "
                    "user preferences, decisions, and action items. "
                    "Remove redundancy and filler. Output only the "
                    "summary, no preamble."
                ),
            },
            {
                "role": "user",
                "content": combined_text,
            },
        ],
        temperature=0.1,
    )
    return response.choices[0].message.content

Detail Preservation

Not every detail should be compressed away. Some memories contain exact values that summaries tend to round or generalize — API keys, specific dates, numerical thresholds. A detail preservation step extracts and stores these separately.

import re


def extract_preservable_details(
    memories: list[MemoryItem],
) -> list[dict]:
    details = []
    patterns = {
        "date": r"\d{4}-\d{2}-\d{2}",
        "number": r"\b\d+\.?\d*\b",
        "email": r"[\w.-]+@[\w.-]+",
        "url": r"https?://[^\s]+",
    }
    for mem in memories:
        for detail_type, pattern in patterns.items():
            matches = re.findall(pattern, mem.content)
            for match in matches:
                details.append({
                    "type": detail_type,
                    "value": match,
                    "source": mem.content[:80],
                })
    return details
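The regex patterns can be sanity-checked in isolation with a standalone variant that operates on a raw string rather than `MemoryItem` objects (`extract_details` is a hypothetical helper for this purpose). One caveat visible here: the `number` pattern also matches the digit groups inside a date, so downstream consumers should expect some overlap between detail types:

```python
import re

PATTERNS = {
    "date": r"\d{4}-\d{2}-\d{2}",
    "number": r"\b\d+\.?\d*\b",
    "email": r"[\w.-]+@[\w.-]+",
    "url": r"https?://[^\s]+",
}


def extract_details(text: str) -> list[dict]:
    # Same extraction logic as above, minus the MemoryItem wrapper.
    return [
        {"type": detail_type, "value": value}
        for detail_type, pattern in PATTERNS.items()
        for value in re.findall(pattern, text)
    ]


details = extract_details(
    "Deploy on 2024-03-15, budget is 4500, contact ops@example.com"
)
print({d["type"] for d in details})  # date, number, and email all matched
```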

The Full Consolidation Pipeline

Putting it together, the pipeline groups memories by category, generates summaries, preserves critical details, and replaces the originals.

class MemoryConsolidator:
    def __init__(self, client: AsyncOpenAI):
        self.client = client
        self.trigger = ConsolidationTrigger()

    async def run(
        self, store: list[MemoryItem]
    ) -> list[MemoryItem]:
        if not self.trigger.should_consolidate(store):
            return store

        # Group by category
        groups: dict[str, list[MemoryItem]] = {}
        fresh: list[MemoryItem] = []
        for mem in store:
            if mem.consolidated:
                fresh.append(mem)
                continue
            groups.setdefault(mem.category, []).append(mem)

        # Consolidate each group
        for category, items in groups.items():
            if len(items) < 3:
                fresh.extend(items)
                continue
            summary = await consolidate_memories(
                items, self.client
            )
            details = extract_preservable_details(items)
            consolidated = MemoryItem(
                content=summary,
                created_at=datetime.now(),
                category=category,
                consolidated=True,
                metadata={
                    "source_count": len(items),
                    "preserved_details": details,
                },
            )
            fresh.append(consolidated)

        return fresh
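To see the grouping and replacement behavior without an API call, the same logic can be dry-run with a stub summarizer. Here `fake_summarize` is a hypothetical stand-in for the LLM-backed `consolidate_memories`, and `dry_run` mirrors the body of `MemoryConsolidator.run` minus the trigger and detail extraction:

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class MemoryItem:
    content: str
    created_at: datetime
    category: str = "general"
    consolidated: bool = False
    metadata: dict = field(default_factory=dict)


def fake_summarize(items: list[MemoryItem]) -> str:
    # Stub standing in for the LLM call.
    return f"Summary of {len(items)} items"


def dry_run(store: list[MemoryItem]) -> list[MemoryItem]:
    groups: dict[str, list[MemoryItem]] = {}
    fresh: list[MemoryItem] = []
    for m in store:
        if m.consolidated:
            fresh.append(m)
        else:
            groups.setdefault(m.category, []).append(m)
    for category, items in groups.items():
        if len(items) < 3:  # too few to be worth summarizing
            fresh.extend(items)
            continue
        fresh.append(MemoryItem(
            content=fake_summarize(items),
            created_at=datetime.now(),
            category=category,
            consolidated=True,
            metadata={"source_count": len(items)},
        ))
    return fresh


store = [MemoryItem(f"fact {i}", datetime.now(), "prefs") for i in range(5)]
store += [MemoryItem("one-off note", datetime.now(), "misc")]
result = dry_run(store)
print(len(result))  # five "prefs" items collapse to one summary;
                    # the lone "misc" item passes through untouched
```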

Storage Optimization

After consolidation, the raw memories can be archived to cold storage (a separate database table or file) rather than deleted entirely. This gives you an audit trail while keeping the active memory store lean.
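One way to sketch the archiving step, assuming a JSONL file as the cold-storage format (the `archive_memories` helper and the file name are illustrative, not prescribed by the pipeline above):

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


def archive_memories(raw_items: list[dict], archive_path: Path) -> None:
    # Append raw records as JSON lines before their summary replaces
    # them; default=str serializes datetime values.
    with archive_path.open("a", encoding="utf-8") as f:
        for item in raw_items:
            f.write(json.dumps(item, default=str) + "\n")


archive_path = Path(tempfile.mkdtemp()) / "memory_archive.jsonl"
archive_memories(
    [
        {"content": "User prefers dark mode",
         "created_at": datetime(2024, 3, 15, 9, 0)},
        {"content": "Confirmed dark mode preference",
         "created_at": datetime(2024, 3, 15, 14, 0)},
    ],
    archive_path,
)
lines = archive_path.read_text(encoding="utf-8").splitlines()
print(len(lines))
```

Append-only JSONL keeps writes cheap and the archive greppable; a database table works just as well if you need queryable history.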

A typical consolidation cycle reduces memory count by 60 to 80 percent. Running it daily keeps the active store small enough for fast retrieval while preserving all the information that matters.

FAQ

Does summarization lose important nuance?

It can if the prompt is not carefully written. The detail preservation step catches structured data like dates and numbers. For subjective nuance, instruct the LLM to preserve sentiment and reasoning, not just facts. Test by comparing agent behavior before and after consolidation.

How often should consolidation run?

For active agents, once per day or once per 50 new memories is a good starting point. Agents with bursty usage patterns benefit from count-based triggers so consolidation runs after intense sessions rather than during quiet periods.

Can I consolidate already-consolidated memories?

Yes. This is called multi-level consolidation. Daily summaries can be consolidated into weekly summaries, and weekly summaries into monthly summaries. Each level compresses further, creating a pyramid of increasingly abstract knowledge.
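A minimal sketch of this roll-up, assuming each summary carries a `level` field (0 for raw memories, 1 for daily, 2 for weekly, and so on). A plain string join stands in for the LLM summarizer to keep the example self-contained:

```python
def roll_up(summaries: list[dict], batch_size: int = 7, summarize=None):
    # summarize stands in for an LLM call such as consolidate_memories.
    summarize = summarize or (lambda texts: " / ".join(texts))
    cutoff = len(summaries) - len(summaries) % batch_size
    rolled = []
    for i in range(0, cutoff, batch_size):
        batch = summaries[i:i + batch_size]
        rolled.append({
            "level": batch[0]["level"] + 1,
            "content": summarize([b["content"] for b in batch]),
        })
    leftover = summaries[cutoff:]  # too few to roll up yet
    return rolled, leftover


daily = [{"level": 1, "content": f"day {i}"} for i in range(9)]
weekly, leftover = roll_up(daily)
print(len(weekly), len(leftover))  # one weekly summary, two days left over
```

Each level of the pyramid runs the same function, so daily-to-weekly and weekly-to-monthly consolidation need no extra machinery.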



CallSphere Team
