
Building a Survey Analysis Agent: AI-Powered Qualitative Data Processing

Build an AI agent that processes survey responses at scale — categorizing open-ended answers, detecting sentiment, extracting recurring themes, and generating executive-ready reports with statistical backing.

The Qualitative Data Problem

Quantitative survey data (ratings, multiple choice) is easy to analyze — pivot tables and averages handle it well. But the richest insights hide in open-ended responses: "What would you improve about our product?" Reading and manually coding 5,000 free-text responses takes weeks. An AI survey analysis agent categorizes responses, measures sentiment, extracts themes, and generates reports in minutes.

The agent combines rule-based tools for structured data with LLM-powered tools for the qualitative analysis that makes survey data truly valuable.

Loading Survey Data

The first tool loads survey responses and separates quantitative from qualitative fields:

import pandas as pd
import json
from agents import Agent, Runner, function_tool

_survey_data: dict = {}

@function_tool
def load_survey(file_path: str) -> str:
    """Load survey responses from a CSV file. Identifies quantitative
    and qualitative (text) columns automatically."""
    try:
        df = pd.read_csv(file_path)
    except Exception as e:
        return f"Error loading survey: {e}"

    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    text_cols = df.select_dtypes(include="object").columns.tolist()

    _survey_data["df"] = df
    _survey_data["numeric_cols"] = numeric_cols
    _survey_data["text_cols"] = text_cols

    profile = (
        f"Survey loaded: {len(df)} responses\n"
        f"Quantitative columns ({len(numeric_cols)}): {', '.join(numeric_cols)}\n"
        f"Text columns ({len(text_cols)}): {', '.join(text_cols)}\n"
    )
    if text_cols:  # guard: the survey may have no free-text columns at all
        profile += f"\nSample text responses from '{text_cols[0]}' (first 3):\n"
        for i, val in enumerate(df[text_cols[0]].dropna().head(3)):
            profile += f"  {i+1}. {str(val)[:200]}\n"

    return profile
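For a quick local check, here is a tiny CSV in the shape load_survey expects (the file name and columns are illustrative, not from a real dataset); the column split below is the same select_dtypes call the tool uses internally:

```python
import pandas as pd

# Illustrative fixture: two numeric columns, one free-text column
sample = pd.DataFrame({
    "satisfaction": [4, 2, 5],
    "nps": [9, 3, 10],
    "improvement": [
        "Cheaper pricing tiers please",
        "Onboarding took too long",
        "Add a mobile app",
    ],
})
sample.to_csv("customer_feedback_q1.csv", index=False)

# The same quantitative/qualitative split load_survey performs
df = pd.read_csv("customer_feedback_q1.csv")
numeric_cols = df.select_dtypes(include="number").columns.tolist()
text_cols = df.select_dtypes(include="object").columns.tolist()
print(numeric_cols, text_cols)  # ['satisfaction', 'nps'] ['improvement']
```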

Quantitative Summary Tool

Handle the easy part first — aggregate ratings, NPS scores, and numeric fields:

@function_tool
def quantitative_summary() -> str:
    """Generate statistical summaries for all numeric survey columns."""
    if "df" not in _survey_data:
        return "No survey loaded."

    df = _survey_data["df"]
    numeric_cols = _survey_data["numeric_cols"]

    if not numeric_cols:
        return "No numeric columns found in survey."

    lines = ["Quantitative Summary:"]
    for col in numeric_cols:
        series = df[col].dropna()
        lines.append(
            f"\n  {col}:\n"
            f"    Mean: {series.mean():.2f}\n"
            f"    Median: {series.median():.2f}\n"
            f"    Std Dev: {series.std():.2f}\n"
            f"    Min: {series.min()}, Max: {series.max()}\n"
            f"    Response count: {len(series)}"
        )

    return "\n".join(lines)
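Note that the summary treats every numeric column the same, but NPS is not a plain mean. If your survey includes a 0-10 likelihood-to-recommend column, a small helper (a sketch, not part of the agent's toolset above) computes it by the standard definition, percent promoters (9-10) minus percent detractors (0-6):

```python
def net_promoter_score(scores: list[int]) -> float:
    """Compute NPS from 0-10 likelihood-to-recommend ratings."""
    if not scores:
        return 0.0
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)

# 2 promoters (10, 9), 2 detractors (6, 3), 6 responses -> NPS of 0.0
print(net_promoter_score([10, 9, 8, 7, 6, 3]))  # 0.0
```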

Categorization Tool

This tool processes batches of open-ended responses through the LLM to assign categories:

_categorized: list[dict] = []

@function_tool
def categorize_responses(
    column: str, categories: str, batch_size: int = 20
) -> str:
    """Categorize text responses into predefined categories.
    Returns a summary of category distribution.
    Categories should be comma-separated."""
    if "df" not in _survey_data:
        return "No survey loaded."

    df = _survey_data["df"]
    if column not in df.columns:
        return f"Column '{column}' not found."

    responses = df[column].dropna().tolist()
    cat_list = [c.strip() for c in categories.split(",")]

    # Reset accumulated results and stage the first batch for the agent loop
    _categorized.clear()
    batch = responses[:batch_size]

    return (
        f"Ready to categorize {len(responses)} responses into: {cat_list}\n"
        f"First batch ({len(batch)} responses):\n"
        + "\n".join(f"  [{i}] {r[:150]}" for i, r in enumerate(batch))
        + "\n\nAssign each response a category from the list above. "
        'Return as JSON: {"0": "category", "1": "category", ...}'
    )
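As written, categorize_responses only stages a batch and prompts the agent; nothing writes assignments back into _categorized. One way to close the loop, sketched here and not part of the original code, is a companion tool that parses the agent's JSON (assumed to be an object mapping response index to category) and tallies the running distribution:

```python
import json
from collections import Counter

_categorized: list[dict] = []

# In the agent this would carry the @function_tool decorator and be
# registered in the tools list alongside categorize_responses.
def record_categories(assignments_json: str) -> str:
    """Store {"index": "category"} assignments returned by the model
    and report the running category distribution."""
    try:
        assignments = json.loads(assignments_json)
    except json.JSONDecodeError as e:
        return f"Invalid JSON: {e}"
    _categorized.extend(
        {"index": int(idx), "category": cat}
        for idx, cat in assignments.items()
    )
    counts = Counter(item["category"] for item in _categorized)
    return "Distribution so far: " + ", ".join(
        f"{cat}: {n}" for cat, n in counts.most_common()
    )

print(record_categories('{"0": "pricing", "1": "onboarding", "2": "pricing"}'))
```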

Sentiment Analysis Tool

Measure the emotional tone of responses using a structured scoring approach:

@function_tool
def analyze_sentiment(column: str, sample_size: int = 50) -> str:
    """Analyze sentiment distribution across text responses.
    Returns responses grouped for LLM-based sentiment scoring."""
    if "df" not in _survey_data:
        return "No survey loaded."

    df = _survey_data["df"]
    responses = df[column].dropna().tolist()
    sample = responses[:sample_size]

    return (
        f"Analyze sentiment for {len(sample)} responses from '{column}'.\n"
        f"Score each as: positive, neutral, or negative.\n\n"
        + "\n".join(f"  [{i}] {r[:200]}" for i, r in enumerate(sample))
        + "\n\nReturn counts: {positive: N, neutral: N, negative: N} "
        "and list the 3 most positive and 3 most negative verbatims."
    )

Theme Extraction Tool

Beyond predefined categories, the agent should discover emergent themes:

@function_tool
def extract_themes(column: str, num_themes: int = 5) -> str:
    """Extract the top recurring themes from open-ended responses.
    Provides response samples for LLM-based theme identification."""
    if "df" not in _survey_data:
        return "No survey loaded."

    df = _survey_data["df"]
    responses = df[column].dropna().tolist()

    return (
        f"Identify the top {num_themes} themes from {len(responses)} responses.\n"
        f"For each theme provide: name, description, frequency estimate, "
        f"and 2 representative quotes.\n\n"
        f"Responses (showing first 30):\n"
        + "\n".join(f"  [{i}] {r[:200]}" for i, r in enumerate(responses[:30]))
    )

Assembling the Survey Agent

survey_agent = Agent(
    name="Survey Analyst",
    instructions="""You are a survey analysis agent. When given survey data:
1. Call load_survey to understand the structure.
2. Call quantitative_summary for all numeric metrics.
3. For each text column, call analyze_sentiment to gauge overall tone.
4. Call extract_themes to discover what respondents care about most.
5. If the user specifies categories, use categorize_responses.
6. Produce a final report with:
   - Executive Summary (3-5 bullet points)
   - Quantitative Highlights
   - Sentiment Overview
   - Key Themes (with supporting quotes)
   - Recommendations based on the data""",
    tools=[
        load_survey, quantitative_summary, categorize_responses,
        analyze_sentiment, extract_themes,
    ],
)

Running the Analysis

result = Runner.run_sync(
    survey_agent,
    "Analyze the file customer_feedback_q1.csv. I want to understand "
    "overall satisfaction, what themes emerge from the open-ended feedback, "
    "and what our top 3 priorities for improvement should be.",
)
print(result.final_output)

The agent loads the data, summarizes the 1-5 satisfaction ratings (mean: 3.7), runs sentiment analysis on the comments (62% positive, 15% negative), extracts five themes (pricing concerns, onboarding friction, feature requests for mobile, praise for support team, integration gaps), and recommends priorities based on frequency and sentiment intensity.

FAQ

How does this handle surveys in multiple languages?

The LLM naturally processes text in many languages. For best results, add an instruction: "Detect the language of each response and analyze it in that language, then translate theme names and quotes to English for the report." This handles multilingual surveys without pre-translation.

Can the agent process thousands of responses without hitting token limits?

Process responses in batches. The categorization and sentiment tools shown above use a batch_size parameter. The agent processes each batch, accumulates results in tool state, and synthesizes at the end. For very large surveys (10,000+ responses), pre-filter with keyword matching before LLM analysis.
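A minimal version of that keyword pre-filter (illustrative only, not one of the agent's tools): responses that already match a known theme keyword can be bucketed cheaply, and only the rest are sent to the LLM:

```python
def prefilter(
    responses: list[str], keywords: list[str]
) -> tuple[list[str], list[str]]:
    """Split responses into (needs_llm, keyword_matched)."""
    lowered = [k.lower() for k in keywords]
    matched, needs_llm = [], []
    for r in responses:
        # Cheap substring match; swap in regex or stemming as needed
        if any(k in r.lower() for k in lowered):
            matched.append(r)
        else:
            needs_llm.append(r)
    return needs_llm, matched

needs_llm, matched = prefilter(
    ["Pricing is too high", "Love it!", "The price model confuses me"],
    ["pricing", "price"],
)
print(len(matched), len(needs_llm))  # 2 1
```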

How do I validate the accuracy of AI-generated categories?

Run a calibration step: manually code 50-100 responses and compare them against the agent's categorization. Calculate inter-rater agreement (Cohen's kappa). If agreement is above 0.7, the agent is reliable for the remaining responses.
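Cohen's kappa is straightforward to compute yourself (sklearn.metrics.cohen_kappa_score gives the same number if you already have scikit-learn installed); a pure-Python sketch for two raters coding the same responses, with hypothetical labels:

```python
from collections import Counter

def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal label frequencies
    expected = sum(
        (counts_a[c] / n) * (counts_b[c] / n)
        for c in set(counts_a) | set(counts_b)
    )
    return (observed - expected) / (1 - expected)

human = ["pricing", "support", "pricing", "bug", "support", "pricing"]
agent = ["pricing", "support", "pricing", "support", "support", "pricing"]
print(round(cohens_kappa(human, agent), 2))  # 0.71 -- just above threshold
```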


#SurveyAnalysis #SentimentAnalysis #QualitativeData #NLP #AIAgents #AgenticAI #LearnAI #AIEngineering

CallSphere Team

Expert insights on AI voice agents and customer communication automation.
