
Memory Search Strategies: Recency, Relevance, and Importance-Weighted Retrieval

Implement and tune multi-signal memory retrieval for AI agents using recency, relevance, and importance scoring functions with combined ranking and parameter tuning strategies.

The Retrieval Quality Problem

An agent's memory is only as good as its retrieval. Storing a thousand perfectly organized memories means nothing if the agent pulls back the wrong five when answering a question. Most naive implementations use a single signal — either recency (most recent first) or relevance (best embedding match). Both fail in predictable ways.

Recency-only retrieval ignores critical old memories. Relevance-only retrieval surfaces stale facts that matched the query words but are no longer accurate. Production agents need multi-signal ranking that balances recency, relevance, and importance.

The Three Scoring Functions

Each signal produces a score between 0 and 1 for every memory candidate.

Recency Score

Recency decays exponentially from the memory's last access time. Recent memories score near 1.0, and old memories approach 0.0.

import math
from datetime import datetime
from dataclasses import dataclass, field


@dataclass
class Memory:
    content: str
    embedding: list[float]
    created_at: datetime
    last_accessed: datetime
    importance: float = 0.5
    access_count: int = 0


def recency_score(
    memory: Memory,
    now: datetime,
    half_life_hours: float = 24.0,
) -> float:
    hours_elapsed = (
        (now - memory.last_accessed).total_seconds() / 3600
    )
    decay_rate = math.log(2) / half_life_hours
    return math.exp(-decay_rate * hours_elapsed)

The half-life parameter controls the decay speed. A 24-hour half-life means a memory accessed yesterday gets a recency score of 0.5. A 168-hour half-life (one week) gives the same memory a score of about 0.91.
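To get a feel for the half-life parameter, the decay formula can be evaluated directly for a few elapsed times:

```python
import math

def recency(hours_elapsed: float, half_life_hours: float) -> float:
    # Same exponential decay as recency_score, parameterized by elapsed hours
    return math.exp(-math.log(2) / half_life_hours * hours_elapsed)

# 24h ago with a 24-hour half-life: exactly one half-life has passed
print(round(recency(24, 24.0), 2))    # 0.5
# 24h ago with a 168-hour (one-week) half-life: decay is much gentler
print(round(recency(24, 168.0), 2))   # 0.91
# A week-old memory under the 24-hour half-life is effectively gone
print(round(recency(168, 24.0), 4))   # 0.0078
```

The takeaway: pick the half-life to match how quickly your agent's context goes stale, since a week-old memory is nearly invisible under a 24-hour half-life but barely penalized under a one-week one.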

Relevance Score

Relevance measures how semantically close a memory is to the current query. In production, this is the cosine similarity between the query embedding and the memory embedding.

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevance_score(
    memory: Memory,
    query_embedding: list[float],
) -> float:
    sim = cosine_similarity(memory.embedding, query_embedding)
    # Normalize from [-1, 1] to [0, 1]
    return (sim + 1) / 2

Importance Score

Importance is a property of the memory itself, not the query. It reflects how critical this information is regardless of context. User preferences, explicit instructions, and key decisions have high importance. Transient observations have low importance.

def importance_score(memory: Memory) -> float:
    base = memory.importance
    # Boost based on access frequency
    access_boost = min(memory.access_count * 0.02, 0.2)
    return min(base + access_boost, 1.0)

Combined Ranking

The three signals are combined with configurable weights. This lets you tune the retrieval behavior for different use cases.


@dataclass
class RetrievalWeights:
    recency: float = 0.3
    relevance: float = 0.5
    importance: float = 0.2

    def __post_init__(self):
        total = self.recency + self.relevance + self.importance
        self.recency /= total
        self.relevance /= total
        self.importance /= total


def combined_score(
    memory: Memory,
    query_embedding: list[float],
    now: datetime,
    weights: RetrievalWeights,
    half_life_hours: float = 24.0,
) -> float:
    r = recency_score(memory, now, half_life_hours)
    rel = relevance_score(memory, query_embedding)
    imp = importance_score(memory)
    return (
        weights.recency * r
        + weights.relevance * rel
        + weights.importance * imp
    )


def retrieve(
    memories: list[Memory],
    query_embedding: list[float],
    weights: RetrievalWeights | None = None,
    top_k: int = 5,
    half_life_hours: float = 24.0,
) -> list[Memory]:
    weights = weights or RetrievalWeights()
    now = datetime.now()
    scored = [
        (
            combined_score(
                m, query_embedding, now, weights, half_life_hours
            ),
            m,
        )
        for m in memories
    ]
    scored.sort(key=lambda x: x[0], reverse=True)
    results = []
    for _, mem in scored[:top_k]:
        mem.last_accessed = now
        mem.access_count += 1
        results.append(mem)
    return results
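To see the pipeline end to end, here is a compact, self-contained sketch that inlines the three signals from above. The toy 2-dimensional embeddings and the tuple-style weights are simplifications for illustration, not the full dataclass API:

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Memory:
    content: str
    embedding: list[float]
    created_at: datetime
    last_accessed: datetime
    importance: float = 0.5
    access_count: int = 0

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def combined_score(m, q, now, w=(0.3, 0.5, 0.2), half_life_hours=24.0):
    # w = (recency, relevance, importance) weights as a plain tuple
    rec = math.exp(-math.log(2) / half_life_hours
                   * (now - m.last_accessed).total_seconds() / 3600)
    rel = (cosine_similarity(m.embedding, q) + 1) / 2
    imp = min(m.importance + min(m.access_count * 0.02, 0.2), 1.0)
    return w[0] * rec + w[1] * rel + w[2] * imp

now = datetime.now()
memories = [
    Memory("user prefers email follow-ups", [1.0, 0.0],
           now - timedelta(days=30), now - timedelta(days=30),
           importance=0.9),
    Memory("yesterday's ticket was about billing", [0.0, 1.0],
           now - timedelta(days=1), now - timedelta(days=1)),
]
query = [0.9, 0.1]  # toy query embedding close to the first memory
ranked = sorted(memories,
                key=lambda m: combined_score(m, query, now),
                reverse=True)
print(ranked[0].content)  # "user prefers email follow-ups"
```

Note that the month-old preference outranks yesterday's ticket: its recency score is essentially zero, but relevance and importance carry it, which is exactly the failure mode recency-only retrieval would hit.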

Tuning the Weights

Different agent scenarios need different weight profiles.

Customer support agents should weight importance heavily (0.4) so that account details and policies always surface. Recency matters moderately (0.3) because recent tickets provide context.

Research agents should weight relevance heavily (0.6) since the user is searching for specific knowledge. Recency and importance split the remainder.

Personal assistants should weight recency highly (0.4) because users usually ask about recent events. Importance handles persistent preferences.

# Weight profiles for common scenarios
SUPPORT_WEIGHTS = RetrievalWeights(
    recency=0.3, relevance=0.3, importance=0.4
)
RESEARCH_WEIGHTS = RetrievalWeights(
    recency=0.15, relevance=0.6, importance=0.25
)
ASSISTANT_WEIGHTS = RetrievalWeights(
    recency=0.4, relevance=0.35, importance=0.25
)

A/B Testing Your Retrieval

To tune weights empirically, log what the agent retrieves and whether the user's question was answered successfully. Compare retrieval quality across weight configurations.

@dataclass
class RetrievalLog:
    query: str
    weights_used: RetrievalWeights
    retrieved_ids: list[str]
    user_satisfied: bool | None = None

    def to_dict(self) -> dict:
        return {
            "query": self.query,
            "weights": {
                "recency": self.weights_used.recency,
                "relevance": self.weights_used.relevance,
                "importance": self.weights_used.importance,
            },
            "retrieved_count": len(self.retrieved_ids),
            "satisfied": self.user_satisfied,
        }

Collect these logs, segment by weight configuration, and compare the satisfaction rate. Shift weights toward configurations that produce higher satisfaction.
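A minimal sketch of that comparison, using hypothetical config labels and hand-written (label, satisfied) tuples in place of real RetrievalLog records:

```python
from collections import defaultdict

# Each entry: (weight-config label, user_satisfied); labels are hypothetical
logs = [
    ("support", True), ("support", True), ("support", False),
    ("research", True), ("research", None),  # None = no feedback yet
]

totals = defaultdict(lambda: [0, 0])  # label -> [satisfied, rated]
for label, satisfied in logs:
    if satisfied is None:
        continue  # skip entries without explicit feedback
    totals[label][1] += 1
    totals[label][0] += int(satisfied)

for label, (good, rated) in totals.items():
    print(f"{label}: {good / rated:.0%} over {rated} rated queries")
```

In practice you would want a minimum sample size per configuration before trusting the comparison, since a 100% rate over a handful of rated queries is noise.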

FAQ

Should the weights be static or adaptive?

Start with static weights tuned per use case. Adaptive weights that shift based on query type add complexity. For example, a question starting with "what did I just say" should boost recency, while "what is our refund policy" should boost importance. Implementing query-type detection is a good optimization once the static baseline works well.
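A sketch of what that query-type detection could look like once you get there. The keyword lists and the boosted weight triples are illustrative assumptions, not tuned values:

```python
def weights_for_query(query: str) -> tuple[float, float, float]:
    """Return (recency, relevance, importance) weights via keyword heuristics."""
    q = query.lower()
    if any(k in q for k in ("just say", "just said", "a moment ago")):
        return (0.6, 0.3, 0.1)   # conversational recall: boost recency
    if any(k in q for k in ("policy", "always", "preference")):
        return (0.1, 0.4, 0.5)   # durable facts: boost importance
    return (0.3, 0.5, 0.2)       # static default from above

print(weights_for_query("What did I just say?"))        # (0.6, 0.3, 0.1)
print(weights_for_query("What is our refund policy?"))  # (0.1, 0.4, 0.5)
```

A classifier or a cheap LLM call can replace the keyword lists later; the point is that the rest of the retrieval pipeline stays unchanged because it already accepts weights as a parameter.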

What if two memories score identically?

Break ties with creation time — newer memories first. In practice, exact ties are rare because the three signals create a high-resolution scoring space. If you see many ties, your embeddings may lack discriminative power.
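In Python this tie-break falls out of tuple sort keys, since sorting on (score, created_at) in reverse puts newer memories ahead of equally scored older ones. A small sketch with hand-picked scores:

```python
from datetime import datetime, timedelta

now = datetime.now()
# (combined score, creation time, label); the first two scores tie exactly
candidates = [
    (0.72, now - timedelta(days=3), "older memory"),
    (0.72, now - timedelta(days=1), "newer memory"),
    (0.65, now, "lower-scored memory"),
]
# Sort by score descending, then creation time descending (newer first)
candidates.sort(key=lambda c: (c[0], c[1]), reverse=True)
print([c[2] for c in candidates])
# ['newer memory', 'older memory', 'lower-scored memory']
```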

How many memories should I retrieve?

Start with 5 and adjust. Too few and the agent misses context. Too many and you waste context window tokens on low-value memories. Monitor context window utilization and reduce top_k if the agent is frequently truncating.


#MemoryRetrieval #SearchRanking #AgentMemory #Python #AgenticAI #LearnAI #AIEngineering


CallSphere Team
