Building a Data Analysis Agent: Natural Language to SQL and Visualizations
Learn how to build an AI agent that converts natural language questions into SQL queries, executes them against a database, generates charts from the results, and provides plain-English interpretations of the data.
Why Data Analysis Needs an Agent Layer
Most organizations store critical business data in SQL databases, but only a fraction of employees know how to write SQL. A data analysis agent bridges this gap by accepting natural language questions like "What were our top 5 products by revenue last quarter?" and returning both the answer and a visualization — no SQL knowledge required.
This is not just a text-to-SQL translator. A proper data analysis agent forms a loop: it understands the schema, generates a query, executes it, checks for errors, builds a chart if appropriate, and explains the results in plain language. Each of these steps requires a different tool, and the agent orchestrates them autonomously.
Architecture Overview
The agent needs four core tools:
- Schema inspector — retrieves table names, columns, and types from the database
- SQL executor — runs a generated query and returns rows
- Chart generator — creates visualizations from tabular results
- Interpreter — produces a natural language summary of the data
Here is the foundational setup:
import sqlite3
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import io
import base64
from agents import Agent, Runner, function_tool
DB_PATH = "sales.db"
def get_connection():
    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row
    return conn
Tool 1: Schema Inspector
The agent must understand the database structure before writing SQL. This tool returns every table with its columns and types:
@function_tool
def inspect_schema() -> str:
    """Return all table names, column names, and column types in the database."""
    conn = get_connection()
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
    tables = [row["name"] for row in cursor.fetchall()]
    schema_parts = []
    for table in tables:
        cursor.execute(f"PRAGMA table_info({table})")
        cols = cursor.fetchall()
        col_defs = [f"  {c['name']} ({c['type']})" for c in cols]
        schema_parts.append(f"Table: {table}\n" + "\n".join(col_defs))
    conn.close()
    return "\n\n".join(schema_parts)
Tool 2: SQL Executor with Safety Checks
Never let an LLM run arbitrary write operations on your database. The executor validates that the query is read-only before running it:
@function_tool
def execute_sql(query: str) -> str:
    """Execute a read-only SQL query and return up to 50 rows as text."""
    normalized = query.strip().upper()
    if not normalized.startswith("SELECT"):
        return "Error: Only SELECT queries are allowed."
    conn = get_connection()
    try:
        cursor = conn.cursor()
        cursor.execute(query)
        rows = cursor.fetchmany(50)
        if not rows:
            return "Query returned 0 rows."
        headers = rows[0].keys()
        lines = ["\t".join(headers)]
        for row in rows:
            lines.append("\t".join(str(row[h]) for h in headers))
        return "\n".join(lines)
    except Exception as e:
        return f"SQL Error: {e}"
    finally:
        conn.close()
Tool 3: Chart Generator
When the data suits a visual representation, the agent can produce a bar chart, line chart, or pie chart:
@function_tool
def generate_chart(
    chart_type: str, labels: list[str], values: list[float], title: str
) -> str:
    """Create a chart and return it as a base64-encoded PNG image.
    chart_type must be one of: bar, line, pie."""
    fig, ax = plt.subplots(figsize=(8, 5))
    if chart_type == "bar":
        ax.bar(labels, values)
    elif chart_type == "line":
        ax.plot(labels, values, marker="o")
    elif chart_type == "pie":
        ax.pie(values, labels=labels, autopct="%1.1f%%")
    else:
        return "Error: Unsupported chart type."
    ax.set_title(title)
    fig.tight_layout()
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    buf.seek(0)
    return base64.b64encode(buf.read()).decode()
Assembling the Agent
Wire the tools together and give the agent clear instructions about its workflow:
data_analyst = Agent(
    name="Data Analyst",
    instructions="""You are a data analysis agent. When the user asks a question:
1. Call inspect_schema to understand the database structure.
2. Write a SQL query to answer the question. Use execute_sql to run it.
3. If the query fails, read the error and fix the SQL.
4. If the results suit a chart, call generate_chart.
5. Always end with a plain-English interpretation of the findings.""",
    tools=[inspect_schema, execute_sql, generate_chart],
)

result = Runner.run_sync(
    data_analyst, "Which product category had the highest revenue last month?"
)
print(result.final_output)
The agent loop handles the rest. It inspects the schema, discovers the orders and products tables, writes a JOIN with a date filter, executes the query, generates a bar chart, and summarizes: "Electronics led with $42,300 in revenue last month, followed by Apparel at $31,800."
Production Hardening Tips
Query cost limits. Wrap the executor with a timeout and a row cap so a poorly written query cannot lock the database or return millions of rows.
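One way to sketch this in SQLite is a progress handler that interrupts any statement exceeding a wall-clock budget, combined with fetchmany for the row cap. The limits below are illustrative defaults, not values from the executor above:

```python
import sqlite3
import time

MAX_ROWS = 50
MAX_SECONDS = 2.0


def run_bounded_query(conn: sqlite3.Connection, query: str) -> list:
    """Execute a query under a wall-clock budget and a hard row cap."""
    deadline = time.monotonic() + MAX_SECONDS
    # Returning a nonzero value from the progress handler aborts the
    # statement with an OperationalError ("interrupted").
    conn.set_progress_handler(
        lambda: 1 if time.monotonic() > deadline else 0, 1000
    )
    try:
        return conn.execute(query).fetchmany(MAX_ROWS)
    finally:
        conn.set_progress_handler(None, 0)
```

For server databases like Postgres, the equivalent knob is a statement timeout set on the connection rather than a progress handler.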
Schema caching. Call inspect_schema once per session and inject the result into the agent instructions rather than calling the tool on every question.
Parameterized queries. For databases with user-supplied filter values, extend the executor to accept parameters and use parameterized queries to prevent SQL injection.
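A sketch of that extension, using SQLite's `?` placeholders so user-supplied values never appear in the SQL text (this is a plain function standing in for the decorated tool):

```python
import sqlite3


def execute_sql_params(
    conn: sqlite3.Connection, query: str, params: list
) -> list:
    """Run a read-only query with bound parameters.

    Values travel separately from the SQL string, so a malicious
    filter value cannot change the query's structure.
    """
    if not query.lstrip().upper().startswith("SELECT"):
        raise ValueError("Only SELECT queries are allowed.")
    return conn.execute(query, params).fetchmany(50)
```

In the tool-decorated version, the agent would pass `params` as a second argument alongside the query string.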
FAQ
Can this agent handle JOINs across multiple tables?
Yes. By providing the full schema — including foreign key relationships — the LLM reliably generates multi-table JOINs. Include sample rows in the schema description if the column names are ambiguous.
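In SQLite, foreign key relationships can be surfaced with `PRAGMA foreign_key_list`; a sketch of a helper (not part of the tools above) that turns them into JOIN hints the schema description could include:

```python
import sqlite3


def foreign_key_hints(conn: sqlite3.Connection, table: str) -> list:
    """List a table's foreign keys so the LLM can infer JOIN paths."""
    hints = []
    # foreign_key_list rows: (id, seq, ref_table, from_col, to_col, ...)
    for row in conn.execute(f"PRAGMA foreign_key_list({table})"):
        hints.append(f"{table}.{row[3]} -> {row[2]}.{row[4]}")
    return hints
```

Appending lines like `orders.product_id -> products.id` to each table's schema block gives the model the join keys explicitly instead of forcing it to guess from column names.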
How do I prevent the agent from running destructive queries?
The executor shown above rejects any query that does not start with SELECT. For stronger guarantees, connect with a read-only database user that has no INSERT, UPDATE, or DELETE privileges at the database level.
What if the generated SQL is incorrect?
The agent loop naturally handles this. When execute_sql returns an error message, the LLM reads it, identifies the issue (wrong column name, missing GROUP BY), and generates a corrected query on the next iteration.
CallSphere Team