Building AI Report Generation for SaaS: Natural Language to Analytics
Implement a natural language report builder that lets SaaS users ask questions in plain English and get back charts, tables, and exportable reports from their product data.
Why Natural Language Reports Matter
Most SaaS products have a reporting section that requires users to select filters, choose chart types, and configure date ranges manually. Power users love it. Everyone else avoids it. When a VP asks "How did our conversion rate change after we launched the new pricing page?", they want to type that question and get an answer — not spend 15 minutes configuring a funnel report.
AI report generation bridges this gap by translating natural language into database queries, visualizations, and exportable documents.
Safe Data Access Layer
The AI must query your database without risking SQL injection or unauthorized data access. Build a restricted query layer that only allows SELECT statements on approved tables.
import re
from dataclasses import dataclass

from sqlalchemy import text


@dataclass
class TableSchema:
    name: str
    columns: list[dict]  # {"name": str, "type": str, "description": str}
    description: str


ALLOWED_TABLES: dict[str, TableSchema] = {
    "deals": TableSchema(
        name="deals",
        columns=[
            {"name": "id", "type": "UUID", "description": "Deal ID"},
            {"name": "name", "type": "VARCHAR", "description": "Deal name"},
            {"name": "value", "type": "DECIMAL", "description": "Deal value in USD"},
            {"name": "stage", "type": "VARCHAR", "description": "Pipeline stage"},
            {"name": "created_at", "type": "TIMESTAMP", "description": "Creation date"},
            {"name": "closed_at", "type": "TIMESTAMP", "description": "Close date"},
            {"name": "tenant_id", "type": "UUID", "description": "Tenant ID"},
        ],
        description="Sales deals and opportunities",
    ),
    "contacts": TableSchema(
        name="contacts",
        columns=[
            {"name": "id", "type": "UUID", "description": "Contact ID"},
            {"name": "name", "type": "VARCHAR", "description": "Full name"},
            {"name": "email", "type": "VARCHAR", "description": "Email address"},
            {"name": "company", "type": "VARCHAR", "description": "Company name"},
            {"name": "created_at", "type": "TIMESTAMP", "description": "Creation date"},
            {"name": "tenant_id", "type": "UUID", "description": "Tenant ID"},
        ],
        description="Contact records",
    ),
}


def validate_query(sql: str, tenant_id: str) -> str:
    """Validate and sandbox the generated SQL."""
    sql_upper = sql.strip().upper()
    # Only allow SELECT statements
    if not sql_upper.startswith("SELECT"):
        raise ValueError("Only SELECT queries are allowed.")
    # Block dangerous keywords
    forbidden = ["INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "TRUNCATE",
                 "CREATE", "GRANT", "REVOKE", "EXEC"]
    for keyword in forbidden:
        if re.search(rf"\b{keyword}\b", sql_upper):
            raise ValueError(f"Forbidden keyword: {keyword}")
    # Coarse tenant check: require both the column name and the caller's
    # tenant value. Real isolation should come from row-level security or
    # a per-tenant connection, not string matching.
    if "tenant_id" not in sql.lower() or tenant_id not in sql:
        raise ValueError("Query must include tenant_id filter.")
    return sql
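A quick sanity check of the rules above, using a trimmed copy of validate_query (the keyword list is shortened for brevity; behavior is what matters here):

```python
import re

def validate_query(sql: str, tenant_id: str) -> str:
    sql_upper = sql.strip().upper()
    if not sql_upper.startswith("SELECT"):
        raise ValueError("Only SELECT queries are allowed.")
    for keyword in ["INSERT", "UPDATE", "DELETE", "DROP", "TRUNCATE"]:
        if re.search(rf"\b{keyword}\b", sql_upper):
            raise ValueError(f"Forbidden keyword: {keyword}")
    if "tenant_id" not in sql.lower():
        raise ValueError("Query must include tenant_id filter.")
    return sql

ok = "SELECT stage, COUNT(*) FROM deals WHERE tenant_id = 't1' GROUP BY stage"
assert validate_query(ok, "t1") == ok

for bad in [
    "DELETE FROM deals WHERE tenant_id = 't1'",   # not a SELECT
    "SELECT * FROM deals; DROP TABLE deals",       # forbidden keyword
    "SELECT COUNT(*) FROM deals",                  # missing tenant filter
]:
    try:
        validate_query(bad, "t1")
        raise AssertionError("should have been rejected")
    except ValueError:
        pass
```

Note that the multi-statement case is caught here only because DROP appears in the keyword list; for defense in depth, also run the query on a read-only database role.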
Text-to-SQL with Schema Context
Feed the LLM your table schemas so it generates accurate queries. Always include column descriptions; they often give the model more signal about intent than terse column names alone.
async def generate_report_query(question: str, tenant_id: str,
                                llm_client) -> dict:
    schema_description = ""
    for table in ALLOWED_TABLES.values():
        cols = ", ".join(
            [f"{c['name']} ({c['type']}: {c['description']})"
             for c in table.columns]
        )
        schema_description += f"\nTable: {table.name} - {table.description}\n"
        schema_description += f"  Columns: {cols}\n"

    prompt = f"""You are a SQL query generator for a SaaS analytics system.
Generate a PostgreSQL query to answer the user's question.

RULES:
- Only use tables and columns from the schema below
- ALWAYS filter by tenant_id = '{tenant_id}'
- Use aggregate functions (COUNT, SUM, AVG) for summary questions
- Include ORDER BY and LIMIT where appropriate
- Return JSON with: "sql", "chart_type" (bar, line, pie, table, number),
  "title", "x_axis", "y_axis"

SCHEMA:
{schema_description}

User question: {question}"""

    response = await llm_client.chat(
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return response
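LLM clients differ in whether chat() hands back a parsed dict or a raw JSON string, so it is worth normalizing and validating the plan before execution. A sketch, where parse_query_plan is a hypothetical helper you would call on the response:

```python
import json

REQUIRED_KEYS = {"sql", "chart_type", "title"}
VALID_CHARTS = {"bar", "line", "pie", "table", "number"}

def parse_query_plan(response) -> dict:
    """Accept either a JSON string or an already-parsed dict."""
    plan = json.loads(response) if isinstance(response, str) else dict(response)
    missing = REQUIRED_KEYS - plan.keys()
    if missing:
        raise ValueError(f"Query plan missing keys: {missing}")
    if plan["chart_type"] not in VALID_CHARTS:
        plan["chart_type"] = "table"  # fall back to a safe default
    return plan

# An unknown chart type degrades gracefully to a table
plan = parse_query_plan(
    '{"sql": "SELECT COUNT(*) FROM deals WHERE tenant_id = \'t1\'", '
    '"chart_type": "sparkline", "title": "Deal count"}'
)
assert plan["chart_type"] == "table"
```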
Executing Queries and Building Charts
Run the validated query and transform results into chart-ready data structures.
from datetime import datetime
from decimal import Decimal
from enum import Enum

from pydantic import BaseModel


class ChartType(str, Enum):
    BAR = "bar"
    LINE = "line"
    PIE = "pie"
    TABLE = "table"
    NUMBER = "number"


class ReportResult(BaseModel):
    title: str
    chart_type: ChartType
    data: list[dict]
    x_axis: str | None = None
    y_axis: str | None = None
    summary: str


def serialize(obj):
    """Convert datetime and Decimal values to JSON-safe types."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, Decimal):
        return float(obj)
    return obj


async def execute_report(query_plan: dict, tenant_id: str,
                         db_engine) -> ReportResult:
    sql = validate_query(query_plan["sql"], tenant_id)

    async with db_engine.connect() as conn:
        result = await conn.execute(text(sql))
        rows = [dict(row._mapping) for row in result.fetchall()]

    # For single-number results
    if query_plan.get("chart_type") == "number" and len(rows) == 1:
        value = next(iter(rows[0].values()))
        return ReportResult(
            title=query_plan["title"],
            chart_type=ChartType.NUMBER,
            data=[{"value": serialize(value)}],
            summary=f"{query_plan['title']}: {value}",
        )

    # Serialize datetime and decimal values so the payload is JSON-safe
    serialized_rows = [
        {k: serialize(v) for k, v in row.items()} for row in rows
    ]
    return ReportResult(
        title=query_plan["title"],
        chart_type=ChartType(query_plan.get("chart_type", "table")),
        data=serialized_rows,
        x_axis=query_plan.get("x_axis"),
        y_axis=query_plan.get("y_axis"),
        summary=f"Found {len(rows)} records for: {query_plan['title']}",
    )
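On the frontend, a ReportResult maps directly onto the labels/values payload most chart libraries expect. A minimal sketch (to_chart_payload is a hypothetical helper; the exact shape depends on your chart library):

```python
def to_chart_payload(data: list[dict], x_axis: str, y_axis: str) -> dict:
    """Reshape row dicts into the labels/values arrays chart libraries
    such as Chart.js or Recharts typically consume."""
    return {
        "labels": [row[x_axis] for row in data],
        "values": [row[y_axis] for row in data],
    }

rows = [
    {"stage": "Qualified", "total": 12},
    {"stage": "Won", "total": 5},
]
payload = to_chart_payload(rows, x_axis="stage", y_axis="total")
assert payload["labels"] == ["Qualified", "Won"]
assert payload["values"] == [12, 5]
```

This is why the query plan carries x_axis and y_axis: the backend never has to guess which columns drive the chart.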
Export to Multiple Formats
Users need reports in PDF, CSV, and email-ready formats.
import csv
import html
import io


def export_csv(report: ReportResult) -> str:
    if not report.data:
        return ""
    output = io.StringIO()
    writer = csv.DictWriter(output, fieldnames=report.data[0].keys())
    writer.writeheader()
    writer.writerows(report.data)
    return output.getvalue()


def export_html_table(report: ReportResult) -> str:
    if not report.data:
        return "<p>No data available.</p>"
    headers = list(report.data[0].keys())
    # Escape all values: column names and data may contain user input
    out = f"<h2>{html.escape(report.title)}</h2><table border='1'><tr>"
    out += "".join(f"<th>{html.escape(h)}</th>" for h in headers)
    out += "</tr>"
    for row in report.data:
        out += "<tr>"
        out += "".join(
            f"<td>{html.escape(str(row.get(h, '')))}</td>" for h in headers
        )
        out += "</tr>"
    out += "</table>"
    return out
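One detail worth remembering: csv.DictWriter takes its columns from the first row, so every row should share the same keys. A quick check of the output shape, using a trimmed copy of the CSV logic on a plain row list:

```python
import csv
import io

def rows_to_csv(rows: list[dict]) -> str:
    """Trimmed copy of export_csv operating on bare row dicts."""
    if not rows:
        return ""
    output = io.StringIO()
    writer = csv.DictWriter(output, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
    return output.getvalue()

csv_text = rows_to_csv([
    {"stage": "Qualified", "total": 12},
    {"stage": "Won", "total": 5},
])
assert csv_text.splitlines() == ["stage,total", "Qualified,12", "Won,5"]
```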
The Complete Report API
from fastapi import Depends, FastAPI
from pydantic import BaseModel

app = FastAPI()


class ReportRequest(BaseModel):
    question: str


@app.post("/api/reports/generate", response_model=ReportResult)
async def generate_report(
    req: ReportRequest,
    # get_current_tenant, get_llm_client, and get_db_engine are
    # app-specific dependency providers (auth, LLM client, DB engine)
    tenant_id: str = Depends(get_current_tenant),
    llm_client=Depends(get_llm_client),
    db_engine=Depends(get_db_engine),
):
    query_plan = await generate_report_query(
        question=req.question,
        tenant_id=tenant_id,
        llm_client=llm_client,
    )
    report = await execute_report(query_plan, tenant_id, db_engine)
    return report
FAQ
How do I prevent the AI from generating expensive queries?
Add a query cost estimator using EXPLAIN before execution. Set a maximum estimated cost threshold (e.g., 10,000 cost units) and reject queries that exceed it. Also enforce a hard row limit with LIMIT 10000 appended to every query, and set a statement timeout at the database level (e.g., 30 seconds).
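The cost check described above can be sketched as follows. The plan shape matches PostgreSQL's EXPLAIN (FORMAT JSON) output; enforce_cost_limit is a hypothetical helper you would call with the parsed plan before running the real query:

```python
MAX_COST = 10_000.0  # planner cost units, not milliseconds

def total_cost_from_plan(explain_json: list) -> float:
    """Pull the planner's estimate out of EXPLAIN (FORMAT JSON) output."""
    return explain_json[0]["Plan"]["Total Cost"]

def enforce_cost_limit(explain_json: list, max_cost: float = MAX_COST) -> None:
    cost = total_cost_from_plan(explain_json)
    if cost > max_cost:
        raise ValueError(f"Estimated cost {cost} exceeds limit {max_cost}")

# Sample plan in PostgreSQL's EXPLAIN (FORMAT JSON) shape
sample_plan = [{"Plan": {"Node Type": "Seq Scan", "Total Cost": 25000.0}}]
try:
    enforce_cost_limit(sample_plan)
    raise AssertionError("expensive query should be rejected")
except ValueError:
    pass
```

In the actual endpoint you would obtain the plan with `EXPLAIN (FORMAT JSON) {sql}` over the same connection, before executing the query itself.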
What if the AI generates an incorrect query?
Show the generated SQL to the user alongside the results, with an "Edit Query" option. Log all generated queries with the original question for quality monitoring. Build a feedback loop where users can flag incorrect results, and use those examples to improve the system prompt with few-shot examples of correct query patterns.
How do I handle questions that span multiple tables?
Include JOIN relationships in your schema description. Specify which columns are foreign keys and how tables relate. The LLM handles multi-table queries well when the schema description includes lines like "deals.contact_id references contacts.id" — this gives it the explicit relationship it needs to write correct JOINs.
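Extending the schema prompt from earlier with explicit relationship lines could look like this (RELATIONSHIPS is an assumed addition to the prompt-building code, and contact_id is an assumed foreign-key column on deals):

```python
# Foreign-key relationships, phrased the way the FAQ suggests
RELATIONSHIPS = [
    "deals.contact_id references contacts.id",
]

def relationships_block(relationships: list[str]) -> str:
    """Render relationship lines for inclusion in the SQL-generation prompt."""
    return "\n".join(["RELATIONSHIPS:"] + [f"- {r}" for r in relationships])

block = relationships_block(RELATIONSHIPS)
assert "deals.contact_id references contacts.id" in block
```

This block would be appended to schema_description in generate_report_query so the model sees tables and their join paths together.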
CallSphere Team
Expert insights on AI voice agents and customer communication automation.