User Cohort Analysis for AI Agents: Segmenting Users by Behavior and Outcomes
Learn how to define user cohorts for AI agent interactions, perform retention analysis, cluster users by behavior patterns, and use cohort insights to personalize agent responses and improve outcomes.
Why Cohort Analysis Matters for AI Agents
Aggregate metrics hide important patterns. An overall 75% resolution rate might consist of 95% for returning users and 55% for first-time users (with the two groups equal in size). Without cohort analysis, you would never know that your agent's onboarding experience needs work while its handling of experienced users is excellent.
Cohort analysis groups users by shared characteristics — when they first interacted, how frequently they return, what topics they ask about — and tracks how each group's outcomes differ over time.
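To make the 75% example concrete, here is a minimal sketch of how a blended rate is composed from per-segment rates. The function name and the user counts are made up for illustration; only the 95% and 55% figures come from the example above.

```python
def blended_rate(segments: dict[str, tuple[int, float]]) -> float:
    """Weighted average of per-segment rates.

    Each value is (user_count, rate_in_percent). Counts are hypothetical.
    """
    total_users = sum(count for count, _ in segments.values())
    if total_users == 0:
        return 0.0
    weighted = sum(count * rate for count, rate in segments.values())
    return round(weighted / total_users, 1)


# Equal-sized groups reproduce the 75% headline figure.
equal_mix = {"returning": (500, 95.0), "first_time": (500, 55.0)}
print(blended_rate(equal_mix))  # 75.0

# Shifting the mix toward first-time users lowers the aggregate
# even though neither segment's own rate changed.
new_heavy = {"returning": (200, 95.0), "first_time": (800, 55.0)}
print(blended_rate(new_heavy))  # 63.0
```

The second call shows why the aggregate alone is misleading: it can move purely because the user mix changed.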
Defining Cohorts
The most common cohort definition is based on when a user first interacted with the agent. This acquisition cohort lets you track whether improvements to the agent benefit new users or only existing ones.
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from collections import defaultdict


@dataclass
class UserProfile:
    user_id: str
    first_interaction: str  # ISO date
    total_conversations: int = 0
    resolved_conversations: int = 0
    topics: list[str] = field(default_factory=list)
    avg_messages_per_conversation: float = 0.0
    last_interaction: str = ""


def build_user_profiles(
    events: list[dict],
) -> dict[str, UserProfile]:
    profiles: dict[str, UserProfile] = {}
    conversations_by_user: dict[str, dict] = defaultdict(dict)

    for event in sorted(events, key=lambda e: e["timestamp"]):
        uid = event["user_id"]
        cid = event["conversation_id"]

        if uid not in profiles:
            profiles[uid] = UserProfile(
                user_id=uid,
                first_interaction=event["timestamp"][:10],
            )
        profiles[uid].last_interaction = event["timestamp"][:10]

        if cid not in conversations_by_user[uid]:
            conversations_by_user[uid][cid] = {
                "message_count": 0,
                "resolved": False,
                "topic": event.get("metadata", {}).get("topic", "unknown"),
            }
        conversations_by_user[uid][cid]["message_count"] += 1
        if event.get("event_type") == "resolution":
            conversations_by_user[uid][cid]["resolved"] = True

    for uid, convs in conversations_by_user.items():
        profile = profiles[uid]
        profile.total_conversations = len(convs)
        profile.resolved_conversations = sum(
            1 for c in convs.values() if c["resolved"]
        )
        profile.topics = list(set(c["topic"] for c in convs.values()))
        total_msgs = sum(c["message_count"] for c in convs.values())
        profile.avg_messages_per_conversation = round(
            total_msgs / len(convs), 1
        )

    return profiles
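Once each user has a first-interaction date, assigning them to an acquisition cohort is a matter of truncating that date to a period. The sketch below is one way to do it; `cohort_key` and `group_by_cohort` are hypothetical helpers, and the choice of Monday as the start of the week is an assumption.

```python
from datetime import date, timedelta


def cohort_key(first_interaction: str, period: str = "week") -> str:
    """Map an ISO date (YYYY-MM-DD) to the label of its acquisition cohort."""
    day = date.fromisoformat(first_interaction)
    if period == "week":
        # Assumption: weeks start on Monday; weekday() is 0 for Monday.
        return (day - timedelta(days=day.weekday())).isoformat()
    if period == "month":
        return day.replace(day=1).isoformat()
    raise ValueError(f"unsupported period: {period!r}")


def group_by_cohort(
    first_seen: dict[str, str], period: str = "week"
) -> dict[str, list[str]]:
    """Group user IDs by the cohort their first interaction falls into."""
    cohorts: dict[str, list[str]] = {}
    for user_id, first_date in first_seen.items():
        cohorts.setdefault(cohort_key(first_date, period), []).append(user_id)
    return cohorts


# Hypothetical users and dates.
first_seen = {"u1": "2024-03-11", "u2": "2024-03-14", "u3": "2024-03-19"}
print(group_by_cohort(first_seen, "week"))
print(group_by_cohort(first_seen, "month"))
```

Weekly keys put `u1` and `u2` in the same cohort and `u3` in the next one; monthly keys put all three together.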
Acquisition Cohort Retention
Retention analysis tracks what percentage of users from each weekly cohort return in subsequent weeks. This reveals whether your agent builds a habit or loses users after a single interaction.
def compute_retention_table(
    profiles: dict[str, UserProfile],
    events: list[dict],
    cohort_period: str = "week",
) -> dict[str, list[float]]:
    # Only weekly cohorts are implemented; fail loudly instead of
    # silently ignoring an unsupported period.
    if cohort_period != "week":
        raise ValueError("only cohort_period='week' is supported")

    def week_key(date_str: str) -> str:
        # Label each week by the ISO date of its Monday.
        dt = datetime.fromisoformat(date_str)
        start = dt - timedelta(days=dt.weekday())
        return start.strftime("%Y-%m-%d")

    cohort_users: dict[str, set] = defaultdict(set)
    for uid, profile in profiles.items():
        cohort_users[week_key(profile.first_interaction)].add(uid)

    user_active_weeks: dict[str, set] = defaultdict(set)
    for event in events:
        week = week_key(event["timestamp"][:10])
        user_active_weeks[event["user_id"]].add(week)

    if not cohort_users:
        return {}

    # Walk every calendar week from the earliest cohort to the latest
    # activity, including weeks with no new users, so that index i in a
    # row always means "i weeks after acquisition".
    active_weeks = set().union(*user_active_weeks.values())
    first_week = min(cohort_users)
    last_week = max(active_weeks | set(cohort_users))
    all_weeks = []
    current = datetime.fromisoformat(first_week)
    while current.strftime("%Y-%m-%d") <= last_week:
        all_weeks.append(current.strftime("%Y-%m-%d"))
        current += timedelta(days=7)

    retention_table = {}
    for cohort_week in sorted(cohort_users):
        users = cohort_users[cohort_week]
        retention = []
        for week in all_weeks:
            if week < cohort_week:
                continue
            active = sum(
                1 for uid in users
                if week in user_active_weeks.get(uid, set())
            )
            retention.append(round(active / len(users) * 100, 1))
        retention_table[cohort_week] = retention

    return retention_table
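A retention table is easier to read when you collapse it into a single curve. The following sketch averages each column across cohorts, assuming each row is indexed by weeks since acquisition (index 0 is the acquisition week). `average_retention_by_offset` is a hypothetical helper, and the sample table is invented.

```python
def average_retention_by_offset(
    retention_table: dict[str, list[float]],
) -> list[float]:
    """Average each column across the cohorts old enough to have it.

    Note: this is an unweighted mean, so a small cohort counts as much
    as a large one.
    """
    max_len = max((len(row) for row in retention_table.values()), default=0)
    averages = []
    for offset in range(max_len):
        column = [
            row[offset]
            for row in retention_table.values()
            if len(row) > offset
        ]
        averages.append(round(sum(column) / len(column), 1))
    return averages


# Invented sample: newer cohorts have fewer weeks of history.
sample = {
    "2024-01-01": [100.0, 40.0, 30.0],
    "2024-01-08": [100.0, 50.0],
    "2024-01-15": [100.0],
}
print(average_retention_by_offset(sample))  # [100.0, 45.0, 30.0]
```

If cohort sizes vary a lot, weighting each column by cohort size is probably the better choice; the unweighted version is kept here for brevity.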
Behavior-Based Clustering
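Beyond time-based cohorts, you can group users by behavior: power users who interact daily, casual users who come weekly, and one-time users who never return. The simplest version uses fixed thresholds on total conversation count rather than a clustering algorithm, which is what the code below does, adding a "regular" tier between power and casual users.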
def segment_users(
    profiles: dict[str, UserProfile],
) -> dict[str, list[str]]:
    segments: dict[str, list[str]] = {
        "power_users": [],
        "regular_users": [],
        "casual_users": [],
        "one_time_users": [],
    }

    for uid, profile in profiles.items():
        if profile.total_conversations >= 20:
            segments["power_users"].append(uid)
        elif profile.total_conversations >= 5:
            segments["regular_users"].append(uid)
        elif profile.total_conversations >= 2:
            segments["casual_users"].append(uid)
        else:
            segments["one_time_users"].append(uid)

    return segments


def segment_metrics(
    segments: dict[str, list[str]],
    profiles: dict[str, UserProfile],
) -> dict[str, dict]:
    metrics = {}
    for segment, user_ids in segments.items():
        if not user_ids:
            continue
        segment_profiles = [profiles[uid] for uid in user_ids]
        total_convs = sum(p.total_conversations for p in segment_profiles)
        resolved = sum(p.resolved_conversations for p in segment_profiles)

        metrics[segment] = {
            "user_count": len(user_ids),
            "avg_conversations": round(
                total_convs / len(user_ids), 1
            ),
            "resolution_rate": round(
                resolved / total_convs * 100, 1
            ) if total_convs else 0,
            "avg_messages": round(
                sum(p.avg_messages_per_conversation for p in segment_profiles)
                / len(segment_profiles), 1
            ),
        }
    return metrics
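Total conversation count ignores how long a user has been around: 20 conversations in a week and 20 over a year are different behaviors. A possible refinement, sketched below, labels users by conversations per active week, which is closer to the daily and weekly framing above. The function name, the labels, and the thresholds (5 per week, 1 per week) are illustrative assumptions, not values from any benchmark.

```python
from datetime import date


def frequency_segment(
    first_interaction: str,
    last_interaction: str,
    total_conversations: int,
) -> str:
    """Label a user by conversation frequency (illustrative thresholds)."""
    if total_conversations <= 1:
        return "one_time"
    span_days = (
        date.fromisoformat(last_interaction)
        - date.fromisoformat(first_interaction)
    ).days
    # Floor the span at one week so a burst on a single day is not
    # extrapolated into an unrealistically high weekly rate.
    weeks_active = max(span_days / 7, 1.0)
    per_week = total_conversations / weeks_active
    if per_week >= 5:
        return "power"
    if per_week >= 1:
        return "casual"
    return "occasional"


# Hypothetical users.
print(frequency_segment("2024-01-01", "2024-01-29", 28))  # power
print(frequency_segment("2024-01-01", "2024-01-29", 4))   # casual
print(frequency_segment("2024-01-01", "2024-03-11", 3))   # occasional
print(frequency_segment("2024-01-01", "2024-01-01", 1))   # one_time
```

The three inputs correspond to the `first_interaction`, `last_interaction`, and `total_conversations` fields of `UserProfile`.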
Using Cohort Insights for Personalization
The most actionable output of cohort analysis is agent personalization. When you know a user is a first-timer, you can have the agent give fuller explanations and introduce what it can do. When you know they are a power user, you can skip the preamble and get straight to business.
def get_personalization_context(
    user_id: str, profiles: dict[str, UserProfile]
) -> dict:
    profile = profiles.get(user_id)
    if not profile:
        return {"segment": "new", "style": "verbose", "skip_intro": False}

    if profile.total_conversations >= 20:
        return {
            "segment": "power_user",
            "style": "concise",
            "skip_intro": True,
            "known_topics": profile.topics,
        }
    elif profile.total_conversations >= 5:
        return {
            "segment": "regular",
            "style": "balanced",
            "skip_intro": True,
            "known_topics": profile.topics,
        }
    else:
        return {
            "segment": "new",
            "style": "verbose",
            "skip_intro": False,
        }
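How the context reaches the agent depends on your stack. One plausible approach is to translate it into system prompt instructions, as in this sketch. `build_system_prompt`, the base prompt, and the instruction wording are all hypothetical; only the dictionary keys (`style`, `skip_intro`, `known_topics`) come from the function above.

```python
def build_system_prompt(
    context: dict,
    base_prompt: str = "You are a helpful support agent.",
) -> str:
    """Turn a personalization context into prompt text (wording is illustrative)."""
    lines = [base_prompt]

    style = context.get("style", "verbose")
    if style == "concise":
        lines.append("Answer briefly and skip background explanations.")
    elif style == "balanced":
        lines.append("Answer directly, adding context only when it helps.")
    else:
        lines.append("Explain each step and offer follow-up guidance.")

    if not context.get("skip_intro", False):
        lines.append("Start by briefly introducing what you can help with.")

    # known_topics is only present for returning users.
    topics = context.get("known_topics") or []
    if topics:
        lines.append(
            "Topics this user has asked about before: "
            + ", ".join(sorted(topics))
            + "."
        )
    return "\n".join(lines)


power_context = {
    "segment": "power_user",
    "style": "concise",
    "skip_intro": True,
    "known_topics": ["billing", "api"],
}
print(build_system_prompt(power_context))
```

Keeping the mapping in one function makes it straightforward to A/B test different instruction wording per segment.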
FAQ
How do I handle users who interact across multiple channels?
Implement a user identity resolution layer that maps multiple identifiers (email, phone, device ID) to a single canonical user ID. Without this, you will overcount one-time users and undercount returning users. Start with deterministic matching on email or phone, then layer in probabilistic matching using device fingerprints or behavior patterns.
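The deterministic part of identity resolution can be sketched with a union-find structure: any two identifiers seen together on one record are merged into the same group. `IdentityResolver` and the `email:` / `phone:` / `device:` prefixes are assumptions made for illustration, and probabilistic matching is out of scope here.

```python
class IdentityResolver:
    """Deterministic identity resolution using union-find (a sketch)."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def _find(self, key: str) -> str:
        self._parent.setdefault(key, key)
        root = key
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path at the root.
        while self._parent[key] != root:
            next_key = self._parent[key]
            self._parent[key] = root
            key = next_key
        return root

    def link(self, *identifiers: str) -> None:
        """Record that these identifiers were seen on the same record."""
        if not identifiers:
            return
        first_root = self._find(identifiers[0])
        for other in identifiers[1:]:
            other_root = self._find(other)
            if other_root != first_root:
                self._parent[other_root] = first_root

    def canonical_id(self, identifier: str) -> str:
        """Return the group representative (an arbitrary member of the group)."""
        return self._find(identifier)


# Hypothetical identifiers.
resolver = IdentityResolver()
resolver.link("email:ana@example.com", "phone:+15550100")
resolver.link("phone:+15550100", "device:abc123")
resolver.link("email:bo@example.com")

same = resolver.canonical_id("device:abc123") == resolver.canonical_id(
    "email:ana@example.com"
)
print(same)  # True
```

In practice you would likely map each group representative to a stable internal user ID, since the representative can change as groups merge.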
What cohort size is too small to draw conclusions from?
Cohorts with fewer than 30 users produce unreliable percentages. A single user's behavior can swing the retention rate by 3 or more percentage points. If your weekly cohorts are that small, aggregate into monthly cohorts instead. For statistical tests comparing cohorts, aim for at least 100 users per group.
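The 3-point figure follows from simple arithmetic: one user out of 30 is 100 / 30, or about 3.3 percentage points. The sketch below computes that swing and rolls weekly cohorts up to monthly ones when any of them falls under the threshold. Both helper names are hypothetical, and assigning a week to the month of its start date is a simplification.

```python
def per_user_swing(cohort_size: int) -> float:
    """Percentage points one user's behavior moves the retention rate."""
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    return round(100 / cohort_size, 1)


def roll_up_to_monthly(
    weekly_sizes: dict[str, int], min_size: int = 30
) -> dict[str, int]:
    """Merge weekly cohorts into monthly ones if any is below min_size.

    Keys are ISO week-start dates; each week is counted in the month
    its start date falls in (a simplification).
    """
    if all(size >= min_size for size in weekly_sizes.values()):
        return dict(weekly_sizes)
    monthly: dict[str, int] = {}
    for week_start, size in weekly_sizes.items():
        month = week_start[:7]  # "YYYY-MM"
        monthly[month] = monthly.get(month, 0) + size
    return monthly


print(per_user_swing(30))   # 3.3
print(per_user_swing(100))  # 1.0

# Invented cohort sizes.
weekly = {"2024-01-01": 12, "2024-01-08": 9, "2024-01-15": 14, "2024-02-05": 20}
print(roll_up_to_monthly(weekly))  # {'2024-01': 35, '2024-02': 20}
```

In this example February is still under 30 after the roll-up, so it would need a coarser period or more data before you read much into it.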
Should I rebuild cohort data from scratch or maintain it incrementally?
Maintain incrementally for efficiency, but run a full rebuild weekly as a consistency check. Incremental updates process only new events and are fast. The weekly full rebuild catches any data quality issues, late-arriving events, or schema changes that the incremental pipeline might miss.
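A minimal sketch of that pattern: keep per-user state that is updated from new events only, and periodically compare it against a rebuild from the full event log. The function names and the choice of state (a set of conversation IDs per user) are assumptions; the `user_id` and `conversation_id` keys match the event shape used earlier.

```python
def apply_events(state: dict[str, set[str]], new_events: list[dict]) -> None:
    """Incremental update: fold only the new events into existing state."""
    for event in new_events:
        state.setdefault(event["user_id"], set()).add(event["conversation_id"])


def find_drift(
    state: dict[str, set[str]], all_events: list[dict]
) -> dict[str, tuple[int, int]]:
    """Rebuild from scratch and report users whose counts disagree.

    Returns {user_id: (incremental_count, rebuilt_count)}.
    """
    rebuilt: dict[str, set[str]] = {}
    apply_events(rebuilt, all_events)
    drift = {}
    for uid in state.keys() | rebuilt.keys():
        incremental_count = len(state.get(uid, set()))
        rebuilt_count = len(rebuilt.get(uid, set()))
        if incremental_count != rebuilt_count:
            drift[uid] = (incremental_count, rebuilt_count)
    return drift


# Hypothetical events: the second one arrives late and is missed
# by the incremental pipeline.
on_time = [{"user_id": "u1", "conversation_id": "c1"}]
late = [{"user_id": "u1", "conversation_id": "c2"}]

state: dict[str, set[str]] = {}
apply_events(state, on_time)
print(find_drift(state, on_time + late))  # {'u1': (1, 2)}
```

An empty result from `find_drift` means the incremental state matches the rebuild; anything else points at the users to investigate.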
Written by
CallSphere Team
Expert insights on AI voice agents and customer communication automation.