Agent Conversation Analytics: Understanding User Behavior and Agent Performance
Build conversation analytics for AI agents that measure success rates, identify drop-off points, track user satisfaction, and surface patterns that drive product and prompt improvements.
Beyond Uptime: Understanding How Agents Actually Perform
An agent can be online, fast, and error-free while still failing its users. If 40% of conversations end with the user rephrasing their question three times and then leaving, your monitoring will show green dashboards while your users are frustrated. Conversation analytics bridges this gap by measuring what matters from the user's perspective: Did the agent solve the problem? How many turns did it take? Where did users give up?
These analytics feed directly into product decisions — which features to build, which prompts to rewrite, and where to invest in better tooling.
Defining Conversation Events
Capture structured events throughout the conversation lifecycle. These events form the raw data for all downstream analytics.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
import uuid


class ConversationEvent(Enum):
    STARTED = "started"
    USER_MESSAGE = "user_message"
    AGENT_RESPONSE = "agent_response"
    TOOL_CALLED = "tool_called"
    HANDOFF_REQUESTED = "handoff_requested"
    FEEDBACK_RECEIVED = "feedback_received"
    COMPLETED = "completed"
    ABANDONED = "abandoned"


@dataclass
class EventRecord:
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    conversation_id: str = ""
    user_id: str = ""
    event_type: ConversationEvent = ConversationEvent.STARTED
    # Timezone-aware timestamps; datetime.utcnow() is deprecated and naive
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    metadata: dict = field(default_factory=dict)


class ConversationTracker:
    def __init__(self, event_store):
        self.store = event_store

    async def record(
        self,
        conversation_id: str,
        user_id: str,
        event_type: ConversationEvent,
        **metadata,
    ):
        event = EventRecord(
            conversation_id=conversation_id,
            user_id=user_id,
            event_type=event_type,
            metadata=metadata,
        )
        await self.store.insert(event)
        return event
Instrumenting the Agent
Emit events at each meaningful point in the conversation flow.
tracker = ConversationTracker(event_store)


async def run_conversation(user_message: str, user_id: str, conversation_id: str):
    await tracker.record(
        conversation_id, user_id,
        ConversationEvent.STARTED,
        channel="web",
    )
    turn_count = 0
    while True:
        turn_count += 1
        await tracker.record(
            conversation_id, user_id,
            ConversationEvent.USER_MESSAGE,
            message_length=len(user_message),
            turn=turn_count,
        )

        response = await agent.run(user_message)

        if response.tool_calls:
            for tc in response.tool_calls:
                await tracker.record(
                    conversation_id, user_id,
                    ConversationEvent.TOOL_CALLED,
                    tool_name=tc.function.name,
                    turn=turn_count,
                )

        await tracker.record(
            conversation_id, user_id,
            ConversationEvent.AGENT_RESPONSE,
            response_length=len(response.content),
            turn=turn_count,
            model=response.model,
        )

        if is_conversation_complete(response):
            await tracker.record(
                conversation_id, user_id,
                ConversationEvent.COMPLETED,
                total_turns=turn_count,
            )
            break

        user_message = await get_next_user_message()
        if user_message is None:  # User left
            await tracker.record(
                conversation_id, user_id,
                ConversationEvent.ABANDONED,
                abandoned_at_turn=turn_count,
            )
            break

    return response.content
Key Analytics Queries
With events stored in a database, calculate the metrics that matter.
from sqlalchemy import text


async def get_conversation_metrics(db, days: int = 7):
    """Core conversation performance metrics."""
    result = await db.execute(text("""
        WITH conversations AS (
            SELECT
                conversation_id,
                MIN(CASE WHEN event_type = 'started' THEN timestamp END) AS start_time,
                MAX(CASE WHEN event_type = 'completed' THEN timestamp END) AS end_time,
                BOOL_OR(event_type = 'completed') AS was_completed,
                BOOL_OR(event_type = 'abandoned') AS was_abandoned,
                BOOL_OR(event_type = 'handoff_requested') AS had_handoff,
                COUNT(CASE WHEN event_type = 'user_message' THEN 1 END) AS user_turns
            FROM conversation_events
            -- Bind parameters are not expanded inside quoted literals like
            -- INTERVAL ':days days', so build the interval with make_interval
            WHERE timestamp >= NOW() - make_interval(days => :days)
            GROUP BY conversation_id
        )
        SELECT
            COUNT(*) AS total_conversations,
            ROUND(AVG(CASE WHEN was_completed THEN 1.0 ELSE 0.0 END) * 100, 1) AS completion_rate,
            ROUND(AVG(CASE WHEN was_abandoned THEN 1.0 ELSE 0.0 END) * 100, 1) AS abandonment_rate,
            ROUND(AVG(CASE WHEN had_handoff THEN 1.0 ELSE 0.0 END) * 100, 1) AS handoff_rate,
            ROUND(AVG(user_turns), 1) AS avg_turns,
            ROUND(AVG(EXTRACT(EPOCH FROM (end_time - start_time))), 0) AS avg_duration_seconds
        FROM conversations
    """), {"days": days})
    return result.fetchone()
async def get_drop_off_analysis(db, days: int = 7):
    """Find which turn users most commonly abandon at."""
    result = await db.execute(text("""
        SELECT
            (metadata->>'abandoned_at_turn')::int AS abandon_turn,
            COUNT(*) AS abandon_count
        FROM conversation_events
        WHERE event_type = 'abandoned'
          AND timestamp >= NOW() - make_interval(days => :days)
        GROUP BY abandon_turn
        ORDER BY abandon_count DESC
        LIMIT 10
    """), {"days": days})
    return result.fetchall()
Measuring User Satisfaction
Capture explicit feedback (thumbs up/down) and infer implicit satisfaction from behavior signals.
async def calculate_satisfaction_score(db, conversation_id: str) -> float:
    """Combine explicit and implicit satisfaction signals."""
    events = await db.execute(text("""
        SELECT event_type, metadata
        FROM conversation_events
        WHERE conversation_id = :cid
        ORDER BY timestamp
    """), {"cid": conversation_id})
    rows = events.fetchall()

    signals = []
    for row in rows:
        if row.event_type == "feedback_received":
            rating = row.metadata.get("rating")
            if rating == "positive":
                signals.append(1.0)
            elif rating == "negative":
                signals.append(0.0)

    # Implicit signals
    user_messages = [r for r in rows if r.event_type == "user_message"]
    completed = any(r.event_type == "completed" for r in rows)
    handoff = any(r.event_type == "handoff_requested" for r in rows)

    if completed and len(user_messages) <= 3:
        signals.append(0.9)  # Resolved quickly
    elif handoff:
        signals.append(0.3)  # Needed human help
    elif not completed:
        signals.append(0.1)  # Abandoned

    # High turn counts are a proxy for rephrasing: each user message
    # beyond the third chips away at the implied satisfaction
    if len(user_messages) > 2:
        rephrase_penalty = max(0, (len(user_messages) - 3) * 0.1)
        signals.append(max(0.0, 0.8 - rephrase_penalty))

    return sum(signals) / len(signals) if signals else 0.5
FAQ
How do I detect that a user is rephrasing their question out of frustration?
Compare consecutive user messages using embedding similarity. If two sequential messages have cosine similarity above 0.85 but the words are different, the user is likely rephrasing because the agent did not understand or adequately address their first attempt. Track the rephrase rate as a key quality indicator — a rising rephrase rate is an early warning of prompt or retrieval degradation.
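The comparison described above can be sketched in a few lines. This assumes an `embed` callable that maps a message to a vector — a placeholder for whatever embedding model you use, not a specific library's API — and the 0.85 threshold is a starting point to tune:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def count_rephrases(messages, embed, threshold=0.85):
    """Count consecutive user messages that are semantically close
    but not verbatim repeats -- the likely-rephrase signal."""
    vectors = [embed(m) for m in messages]
    rephrases = 0
    for i in range(1, len(messages)):
        same_text = messages[i] == messages[i - 1]
        similar = cosine_similarity(vectors[i - 1], vectors[i]) >= threshold
        if similar and not same_text:
            rephrases += 1
    return rephrases
```

Dividing the rephrase count by total user turns per conversation, then averaging across conversations, gives the rephrase rate to track over time.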
What is a good conversation completion rate?
It depends on the agent's domain. Customer support agents that handle well-scoped tasks should target 70-85% completion. General-purpose assistants might see 50-60% because users often explore or ask questions outside the agent's scope. More important than the absolute number is the trend — a 5% drop in completion rate over a week signals a real problem worth investigating.
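The trend check from that answer is easy to make concrete. A minimal sketch, with function names and the 5-point default being illustrative choices of ours, that compares week-over-week completion rates built from (completed, total) conversation counts:

```python
def completion_rate(completed: int, total: int) -> float:
    """Completion rate as a percentage, guarding against empty weeks."""
    return 100.0 * completed / total if total else 0.0


def completion_drop_alert(last_week, this_week, max_drop_pts=5.0):
    """Flag a week-over-week fall in completion rate larger than
    `max_drop_pts` percentage points; each argument is (completed, total)."""
    prev = completion_rate(*last_week)
    curr = completion_rate(*this_week)
    return (prev - curr) > max_drop_pts
```

Comparing rates rather than raw counts keeps the alert stable when traffic volume fluctuates between weeks.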
Should I track analytics per agent or per conversation?
Both. Per-conversation analytics help you debug individual interactions and identify specific failure patterns. Per-agent analytics reveal systemic trends — which agent types perform best, which need prompt improvements, and how performance compares across models. Aggregate first by agent, then drill down into conversations for root cause analysis.
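One way to do that roll-up in application code — a sketch, with the function name and input shape being our own convention — is to average per-conversation satisfaction scores by agent before drilling into individual conversations:

```python
from collections import defaultdict
from statistics import mean


def aggregate_scores_by_agent(scores):
    """Roll (agent_id, satisfaction_score) pairs up into per-agent
    averages -- the first view to scan before root-causing
    individual conversations."""
    by_agent = defaultdict(list)
    for agent_id, score in scores:
        by_agent[agent_id].append(score)
    return {agent: round(mean(vals), 2) for agent, vals in by_agent.items()}
```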
CallSphere Team
Expert insights on AI voice agents and customer communication automation.