Building a Data Analysis Agent: Natural Language to SQL and Visualizations
Learn how to build an AI agent that converts natural language questions into SQL queries, executes them against a database, generates charts from the results, and provides plain-English interpretations of the data.
Why Data Analysis Needs an Agent Layer
Most organizations store critical business data in SQL databases, but only a fraction of employees know how to write SQL. A data analysis agent bridges this gap by accepting natural language questions like "What were our top 5 products by revenue last quarter?" and returning both the answer and a visualization — no SQL knowledge required.
This is not just a text-to-SQL translator. A proper data analysis agent forms a loop: it understands the schema, generates a query, executes it, checks for errors, builds a chart if appropriate, and explains the results in plain language. Each of these steps requires a different tool, and the agent orchestrates them autonomously.
Architecture Overview
The agent needs four core tools:
- Schema inspector — retrieves table names, columns, and types from the database
- SQL executor — runs a generated query and returns rows
- Chart generator — creates visualizations from tabular results
- Interpreter — produces a natural language summary of the data
Here is the foundational setup:
import sqlite3
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import io
import base64
from agents import Agent, Runner, function_tool
DB_PATH = "sales.db"
def get_connection():
    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row
    return conn
Tool 1: Schema Inspector
The agent must understand the database structure before writing SQL. This tool returns every table with its columns and types:
@function_tool
def inspect_schema() -> str:
    """Return all table names, column names, and column types in the database."""
    conn = get_connection()
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
    tables = [row["name"] for row in cursor.fetchall()]
    schema_parts = []
    for table in tables:
        cursor.execute(f"PRAGMA table_info({table})")
        cols = cursor.fetchall()
        col_defs = [f"  {c['name']} ({c['type']})" for c in cols]
        schema_parts.append(f"Table: {table}\n" + "\n".join(col_defs))
    conn.close()
    return "\n\n".join(schema_parts)
Tool 2: SQL Executor with Safety Checks
Never let an LLM run arbitrary write operations on your database. The executor validates that the query is read-only before running it:
@function_tool
def execute_sql(query: str) -> str:
    """Execute a read-only SQL query and return up to 50 rows as text."""
    normalized = query.strip().upper()
    if not normalized.startswith("SELECT"):
        return "Error: Only SELECT queries are allowed."
    conn = get_connection()
    try:
        cursor = conn.cursor()
        cursor.execute(query)
        rows = cursor.fetchmany(50)
        if not rows:
            return "Query returned 0 rows."
        headers = rows[0].keys()
        lines = ["\t".join(headers)]
        for row in rows:
            lines.append("\t".join(str(row[h]) for h in headers))
        return "\n".join(lines)
    except Exception as e:
        return f"SQL Error: {e}"
    finally:
        conn.close()
Tool 3: Chart Generator
When the data suits a visual representation, the agent can produce a bar chart, line chart, or pie chart:
@function_tool
def generate_chart(
    chart_type: str, labels: list[str], values: list[float], title: str
) -> str:
    """Create a chart and return it as a base64-encoded PNG image.
    chart_type must be one of: bar, line, pie."""
    fig, ax = plt.subplots(figsize=(8, 5))
    if chart_type == "bar":
        ax.bar(labels, values)
    elif chart_type == "line":
        ax.plot(labels, values, marker="o")
    elif chart_type == "pie":
        ax.pie(values, labels=labels, autopct="%1.1f%%")
    else:
        return "Error: Unsupported chart type."
    ax.set_title(title)
    fig.tight_layout()
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    buf.seek(0)
    return base64.b64encode(buf.read()).decode()
Assembling the Agent
Wire the tools together and give the agent clear instructions about its workflow:
data_analyst = Agent(
    name="Data Analyst",
    instructions="""You are a data analysis agent. When the user asks a question:
1. Call inspect_schema to understand the database structure.
2. Write a SQL query to answer the question. Use execute_sql to run it.
3. If the query fails, read the error and fix the SQL.
4. If the results suit a chart, call generate_chart.
5. Always end with a plain-English interpretation of the findings.""",
    tools=[inspect_schema, execute_sql, generate_chart],
)

result = Runner.run_sync(
    data_analyst, "Which product category had the highest revenue last month?"
)
print(result.final_output)
The agent loop handles the rest. It inspects the schema, discovers the orders and products tables, writes a JOIN with a date filter, executes the query, generates a bar chart, and summarizes: "Electronics led with $42,300 in revenue last month, followed by Apparel at $31,800."
Production Hardening Tips
Query cost limits. Wrap the executor with a timeout and a row cap so a poorly written query cannot lock the database or return millions of rows.
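One way to sketch this in SQLite is a progress handler that interrupts any statement exceeding a wall-clock budget, combined with fetchmany for the row cap. The limits below are illustrative defaults, not values from the executor above:

```python
import sqlite3
import time

MAX_ROWS = 50
MAX_SECONDS = 2.0


def run_bounded_query(conn: sqlite3.Connection, query: str) -> list:
    """Execute a query under a wall-clock budget and a hard row cap."""
    deadline = time.monotonic() + MAX_SECONDS
    # Returning a nonzero value from the progress handler aborts the
    # statement with an OperationalError ("interrupted").
    conn.set_progress_handler(
        lambda: 1 if time.monotonic() > deadline else 0, 1000
    )
    try:
        return conn.execute(query).fetchmany(MAX_ROWS)
    finally:
        conn.set_progress_handler(None, 0)
```

For server databases like Postgres, the equivalent knob is a statement timeout set on the connection rather than a progress handler.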
Schema caching. Call inspect_schema once per session and inject the result into the agent instructions rather than calling the tool on every question.
Parameterized queries. For databases with user-supplied filter values, extend the executor to accept parameters and use parameterized queries to prevent SQL injection.
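A sketch of that extension, using SQLite's `?` placeholders so user-supplied values never appear in the SQL text (this is a plain function standing in for the decorated tool):

```python
import sqlite3


def execute_sql_params(
    conn: sqlite3.Connection, query: str, params: list
) -> list:
    """Run a read-only query with bound parameters.

    Values travel separately from the SQL string, so a malicious
    filter value cannot change the query's structure.
    """
    if not query.lstrip().upper().startswith("SELECT"):
        raise ValueError("Only SELECT queries are allowed.")
    return conn.execute(query, params).fetchmany(50)
```

In the tool-decorated version, the agent would pass `params` as a second argument alongside the query string.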
FAQ
Can this agent handle JOINs across multiple tables?
Yes. By providing the full schema — including foreign key relationships — the LLM reliably generates multi-table JOINs. Include sample rows in the schema description if the column names are ambiguous.
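In SQLite, foreign key relationships can be surfaced with `PRAGMA foreign_key_list`; a sketch of a helper (not part of the tools above) that turns them into JOIN hints the schema description could include:

```python
import sqlite3


def foreign_key_hints(conn: sqlite3.Connection, table: str) -> list:
    """List a table's foreign keys so the LLM can infer JOIN paths."""
    hints = []
    # foreign_key_list rows: (id, seq, ref_table, from_col, to_col, ...)
    for row in conn.execute(f"PRAGMA foreign_key_list({table})"):
        hints.append(f"{table}.{row[3]} -> {row[2]}.{row[4]}")
    return hints
```

Appending lines like `orders.product_id -> products.id` to each table's schema block gives the model the join keys explicitly instead of forcing it to guess from column names.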
How do I prevent the agent from running destructive queries?
The executor shown above rejects any query that does not start with SELECT. For stronger guarantees, connect with a read-only database user that has no INSERT, UPDATE, or DELETE privileges at the database level.
What if the generated SQL is incorrect?
The agent loop naturally handles this. When execute_sql returns an error message, the LLM reads it, identifies the issue (wrong column name, missing GROUP BY), and generates a corrected query on the next iteration.
CallSphere Team