Memory Search Strategies: Recency, Relevance, and Importance-Weighted Retrieval
Implement and tune multi-signal memory retrieval for AI agents using recency, relevance, and importance scoring functions with combined ranking and parameter tuning strategies.
The Retrieval Quality Problem
An agent's memory is only as good as its retrieval. Storing a thousand perfectly organized memories means nothing if the agent pulls back the wrong five when answering a question. Most naive implementations use a single signal — either recency (most recent first) or relevance (best embedding match). Both fail in predictable ways.
Recency-only retrieval ignores critical old memories. Relevance-only retrieval surfaces stale facts that matched the query words but are no longer accurate. Production agents need multi-signal ranking that balances recency, relevance, and importance.
The Three Scoring Functions
Each signal produces a score between 0 and 1 for every memory candidate.
Recency Score
Recency decays exponentially from the memory's last access time. Recent memories score near 1.0, and old memories approach 0.0.
```python
import math
from datetime import datetime
from dataclasses import dataclass


@dataclass
class Memory:
    content: str
    embedding: list[float]
    created_at: datetime
    last_accessed: datetime
    importance: float = 0.5
    access_count: int = 0


def recency_score(
    memory: Memory,
    now: datetime,
    half_life_hours: float = 24.0,
) -> float:
    hours_elapsed = (now - memory.last_accessed).total_seconds() / 3600
    decay_rate = math.log(2) / half_life_hours
    return math.exp(-decay_rate * hours_elapsed)
```
The half-life parameter controls the decay speed. A 24-hour half-life means a memory accessed yesterday gets a recency score of 0.5. A 168-hour half-life (one week) gives the same memory a score of about 0.91.
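To see the effect concretely, here is a quick check of the decay at different half-lives. The definitions are trimmed-down copies of the code above so the snippet runs standalone:

```python
import math
from datetime import datetime, timedelta
from dataclasses import dataclass


@dataclass
class Memory:  # trimmed copy of the full Memory class above
    content: str
    last_accessed: datetime


def recency_score(memory, now, half_life_hours=24.0):
    hours = (now - memory.last_accessed).total_seconds() / 3600
    return math.exp(-math.log(2) / half_life_hours * hours)


now = datetime(2024, 1, 2, 12, 0)
yesterday = Memory("note", now - timedelta(hours=24))

print(round(recency_score(yesterday, now, 24.0), 2))   # 0.5
print(round(recency_score(yesterday, now, 168.0), 2))  # 0.91
```

With the one-week half-life, a day-old memory has barely decayed; with the one-day half-life it has already lost half its recency weight.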
Relevance Score
Relevance measures how semantically close a memory is to the current query. In production, this is the cosine similarity between the query embedding and the memory embedding.
```python
def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevance_score(
    memory: Memory,
    query_embedding: list[float],
) -> float:
    sim = cosine_similarity(memory.embedding, query_embedding)
    # Normalize from [-1, 1] to [0, 1]
    return (sim + 1) / 2
```
Importance Score
Importance is a property of the memory itself, not the query. It reflects how critical this information is regardless of context. User preferences, explicit instructions, and key decisions have high importance. Transient observations have low importance.
```python
def importance_score(memory: Memory) -> float:
    base = memory.importance
    # Boost based on access frequency, capped at +0.2
    access_boost = min(memory.access_count * 0.02, 0.2)
    return min(base + access_boost, 1.0)
```
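The frequency boost saturates at 10 accesses (10 × 0.02 = 0.2), and the total is clamped at 1.0. A standalone version of the function, taking plain values instead of a Memory, shows the caps:

```python
def importance_score(base: float, access_count: int) -> float:
    # Standalone variant of importance_score, taking plain values.
    access_boost = min(access_count * 0.02, 0.2)
    return min(base + access_boost, 1.0)


print(importance_score(0.5, 0))    # 0.5
print(importance_score(0.5, 5))    # 0.6
print(importance_score(0.5, 50))   # 0.7  (boost capped at 0.2)
print(importance_score(0.95, 50))  # 1.0  (total capped at 1.0)
```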
Combined Ranking
The three signals are combined with configurable weights. This lets you tune the retrieval behavior for different use cases.
```python
@dataclass
class RetrievalWeights:
    recency: float = 0.3
    relevance: float = 0.5
    importance: float = 0.2

    def __post_init__(self):
        # Normalize so the weights always sum to 1.0
        total = self.recency + self.relevance + self.importance
        self.recency /= total
        self.relevance /= total
        self.importance /= total


def combined_score(
    memory: Memory,
    query_embedding: list[float],
    now: datetime,
    weights: RetrievalWeights,
    half_life_hours: float = 24.0,
) -> float:
    r = recency_score(memory, now, half_life_hours)
    rel = relevance_score(memory, query_embedding)
    imp = importance_score(memory)
    return (
        weights.recency * r
        + weights.relevance * rel
        + weights.importance * imp
    )
```
```python
def retrieve(
    memories: list[Memory],
    query_embedding: list[float],
    weights: RetrievalWeights | None = None,
    top_k: int = 5,
    half_life_hours: float = 24.0,
) -> list[Memory]:
    weights = weights or RetrievalWeights()
    now = datetime.now()
    scored = [
        (combined_score(m, query_embedding, now, weights, half_life_hours), m)
        for m in memories
    ]
    scored.sort(key=lambda x: x[0], reverse=True)
    results = []
    for _, mem in scored[:top_k]:
        # Retrieval itself refreshes the recency and frequency signals
        mem.last_accessed = now
        mem.access_count += 1
        results.append(mem)
    return results
```
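Putting the pipeline together, here is a minimal end-to-end sketch. The definitions are condensed copies of the code above (with weights as a plain tuple) so the example runs standalone, and the 2-d embeddings are toy stand-ins for real model output:

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Memory:  # condensed copy of the Memory class above
    content: str
    embedding: list[float]
    created_at: datetime
    last_accessed: datetime
    importance: float = 0.5
    access_count: int = 0


def recency_score(m, now, half_life=24.0):
    hours = (now - m.last_accessed).total_seconds() / 3600
    return math.exp(-math.log(2) / half_life * hours)


def relevance_score(m, q):
    dot = sum(x * y for x, y in zip(m.embedding, q))
    na = math.sqrt(sum(x * x for x in m.embedding))
    nb = math.sqrt(sum(x * x for x in q))
    sim = dot / (na * nb) if na and nb else 0.0
    return (sim + 1) / 2


def importance_score(m):
    return min(m.importance + min(m.access_count * 0.02, 0.2), 1.0)


def retrieve(memories, q, top_k=5, w=(0.3, 0.5, 0.2)):
    # w = (recency, relevance, importance) weights as a plain tuple
    now = datetime.now()
    ranked = sorted(
        memories,
        key=lambda m: (
            w[0] * recency_score(m, now)
            + w[1] * relevance_score(m, q)
            + w[2] * importance_score(m)
        ),
        reverse=True,
    )
    return ranked[:top_k]


now = datetime.now()
memories = [
    Memory("User prefers email over phone", [1.0, 0.0],
           now - timedelta(days=30), now - timedelta(days=30), importance=0.9),
    Memory("Mentioned the weather today", [0.0, 1.0],
           now - timedelta(hours=1), now - timedelta(hours=1), importance=0.1),
]

# Query close to the "email preference" embedding
top = retrieve(memories, [0.9, 0.1], top_k=1)
print(top[0].content)  # "User prefers email over phone"
```

Even though the preference memory is a month old and its recency score is near zero, its relevance and importance carry it past the fresh but trivial observation.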
Tuning the Weights
Different agent scenarios need different weight profiles.
Customer support agents should weight importance heavily (0.4) so that account details and policies always surface. Recency matters moderately (0.3) because recent tickets provide context.
Research agents should weight relevance heavily (0.6) since the user is searching for specific knowledge. Recency and importance split the remainder.
Personal assistants should weight recency highly (0.4) because users usually ask about recent events. Importance handles persistent preferences.
```python
# Weight profiles for common scenarios
SUPPORT_WEIGHTS = RetrievalWeights(recency=0.3, relevance=0.3, importance=0.4)
RESEARCH_WEIGHTS = RetrievalWeights(recency=0.15, relevance=0.6, importance=0.25)
ASSISTANT_WEIGHTS = RetrievalWeights(recency=0.4, relevance=0.35, importance=0.25)
```
A/B Testing Your Retrieval
To tune weights empirically, log what the agent retrieves and whether the user's question was answered successfully. Compare retrieval quality across weight configurations.
```python
@dataclass
class RetrievalLog:
    query: str
    weights_used: RetrievalWeights
    retrieved_ids: list[str]
    user_satisfied: bool | None = None

    def to_dict(self) -> dict:
        return {
            "query": self.query,
            "weights": {
                "recency": self.weights_used.recency,
                "relevance": self.weights_used.relevance,
                "importance": self.weights_used.importance,
            },
            "retrieved_count": len(self.retrieved_ids),
            "satisfied": self.user_satisfied,
        }
```
Collect these logs, segment by weight configuration, and compare the satisfaction rate. Shift weights toward configurations that produce higher satisfaction.
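One way to do that segmentation, assuming the logs were collected as RetrievalLog.to_dict() dicts (trimmed here to the fields the comparison actually uses):

```python
from collections import defaultdict


def satisfaction_by_config(logs: list[dict]) -> dict[tuple, float]:
    """Group logged retrievals by weight configuration and compute
    the fraction of satisfied outcomes for each configuration."""
    buckets: dict[tuple, list[bool]] = defaultdict(list)
    for log in logs:
        w = log["weights"]
        key = (w["recency"], w["relevance"], w["importance"])
        if log["satisfied"] is not None:  # skip unlabeled queries
            buckets[key].append(log["satisfied"])
    return {
        key: sum(outcomes) / len(outcomes)
        for key, outcomes in buckets.items()
    }


logs = [
    {"weights": {"recency": 0.3, "relevance": 0.5, "importance": 0.2},
     "satisfied": True},
    {"weights": {"recency": 0.3, "relevance": 0.5, "importance": 0.2},
     "satisfied": False},
    {"weights": {"recency": 0.15, "relevance": 0.6, "importance": 0.25},
     "satisfied": True},
]
print(satisfaction_by_config(logs))
# {(0.3, 0.5, 0.2): 0.5, (0.15, 0.6, 0.25): 1.0}
```

In a real deployment you would want far more samples per configuration before shifting weights; a handful of logs per bucket is noise, not signal.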
FAQ
Should the weights be static or adaptive?
Start with static weights tuned per use case. Adaptive weights that shift based on query type add complexity. For example, a question starting with "what did I just say" should boost recency, while "what is our refund policy" should boost importance. Implementing query-type detection is a good optimization once the static baseline works well.
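A minimal sketch of that kind of query-type detection, with the keyword lists purely illustrative assumptions:

```python
def adaptive_weights(query: str) -> tuple[float, float, float]:
    """Return (recency, relevance, importance) weights, shifted by
    crude keyword heuristics. The keywords are illustrative only."""
    q = query.lower()
    if any(kw in q for kw in ("just say", "just said", "a moment ago")):
        return (0.6, 0.3, 0.1)   # recency-boosted
    if any(kw in q for kw in ("policy", "always", "never", "rule")):
        return (0.1, 0.4, 0.5)   # importance-boosted
    return (0.3, 0.5, 0.2)       # static default


print(adaptive_weights("what did I just say?"))        # (0.6, 0.3, 0.1)
print(adaptive_weights("what is our refund policy?"))  # (0.1, 0.4, 0.5)
```

A production version would more likely use an embedding classifier or an LLM call than substring matching, but the interface stays the same: map the query to a weight profile, then retrieve.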
What if two memories score identically?
Break ties with creation time — newer memories first. In practice, exact ties are rare because the three signals create a high-resolution scoring space. If you see many ties, your embeddings may lack discriminative power.
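Tie-breaking can be folded directly into the sort key by sorting on (score, created_at) descending. A sketch, using a minimal stand-in for the Memory class:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Mem:  # minimal stand-in for the full Memory class
    content: str
    created_at: datetime


scored = [
    (0.75, Mem("older note", datetime(2024, 1, 1))),
    (0.75, Mem("newer note", datetime(2024, 6, 1))),
    (0.90, Mem("best match", datetime(2023, 1, 1))),
]

# Sort by score, then creation time, both descending: exact score
# ties resolve in favor of the newer memory.
scored.sort(key=lambda x: (x[0], x[1].created_at), reverse=True)
print([m.content for _, m in scored])
# ['best match', 'newer note', 'older note']
```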
How many memories should I retrieve?
Start with 5 and adjust. Too few and the agent misses context. Too many and you waste context window tokens on low-value memories. Monitor context window utilization and reduce top_k if the agent is frequently truncating.