Text-to-SQL Fundamentals: Converting Natural Language Questions to Database Queries
Learn what text-to-SQL is, how the architecture works from schema understanding to query generation, and why it is one of the most practical applications of large language models in enterprise software.
What Is Text-to-SQL?
Text-to-SQL is the task of converting a natural language question into a valid SQL query that can be executed against a relational database. Instead of requiring users to learn SQL syntax, they can ask questions like "How many orders were placed last month?" and receive both the generated query and its results.
This capability has been studied in NLP research for decades, but large language models have made it genuinely practical. Modern LLMs can parse complex questions, reason about database schemas, and produce syntactically correct SQL with high accuracy — often exceeding 85% on standard benchmarks.
The Text-to-SQL Pipeline
A complete text-to-SQL system involves four stages that run sequentially for each user question.
Stage 1: Schema Understanding. The system loads the database schema — table names, column names, data types, primary keys, and foreign key relationships. This schema context is essential because the LLM needs to know what tables and columns exist to write valid queries.
Stage 2: Question Analysis. The natural language question is parsed to identify the intent (aggregation, filtering, sorting), the entities being referenced, and any implicit constraints. The phrase "top customers" implies ordering and a LIMIT clause.
Stage 3: Query Generation. The LLM receives the schema context and the analyzed question, then produces a SQL query. This is typically done through a carefully engineered prompt that includes the schema, instructions, and optionally a few examples.
Stage 4: Execution and Formatting. The generated SQL is validated, executed against the database, and the results are formatted into a human-readable response.
A Minimal Implementation
Here is the simplest possible text-to-SQL system using an LLM:
import openai
import sqlite3

def get_schema(db_path: str) -> str:
    """Extract CREATE TABLE statements from a SQLite database."""
    conn = sqlite3.connect(db_path)
    cursor = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type='table' AND sql IS NOT NULL"
    )
    schema = "\n\n".join(row[0] for row in cursor.fetchall())
    conn.close()
    return schema

def text_to_sql(question: str, schema: str) -> str:
    """Convert a natural language question to SQL."""
    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": f"""You are a SQL expert. Given the following database schema,
convert the user's question into a valid SQL query.

Schema:
{schema}

Return ONLY the SQL query, no explanation.""",
            },
            {"role": "user", "content": question},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

# Usage
schema = get_schema("sales.db")
sql = text_to_sql("What are the top 5 products by revenue?", schema)
print(sql)
This produces a query like:

SELECT product_name, SUM(price * quantity) AS revenue
FROM orders
GROUP BY product_name
ORDER BY revenue DESC
LIMIT 5;
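The minimal implementation stops at query generation. Stage 4 (execution and formatting) might be sketched like this — `run_query` is a hypothetical helper, and opening SQLite in read-only URI mode plus capping the row count are design assumptions, not part of the original example:

```python
import sqlite3

def run_query(db_path: str, sql: str, max_rows: int = 100) -> str:
    """Execute a generated query and format the rows as plain text."""
    # Open the database in read-only mode so generated SQL cannot write.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        cursor = conn.execute(sql)
        columns = [desc[0] for desc in cursor.description]
        rows = cursor.fetchmany(max_rows)  # cap result size
    finally:
        conn.close()
    lines = [" | ".join(columns)]
    lines += [" | ".join(str(value) for value in row) for row in rows]
    return "\n".join(lines)
```

Feeding the formatted rows back to the LLM for a natural-language summary is a common final step, but the plain table is often enough for analytics use cases.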
Why Schema Context Matters
The most common failure mode in text-to-SQL is hallucinated column names. Without schema context, an LLM might generate SELECT customer_name FROM users when the actual column is full_name in a table called customers. Providing the full CREATE TABLE statements eliminates most of these errors.
Including foreign key relationships is equally important. When a user asks "Which customers have not placed any orders?", the LLM needs to know that orders.customer_id references customers.id to generate the correct LEFT JOIN with a NULL check.
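In SQLite the CREATE TABLE text usually already carries REFERENCES clauses, but relationships can also be surfaced explicitly as extra prompt context. A sketch using PRAGMA foreign_key_list (the `get_foreign_keys` helper and its output format are assumptions):

```python
import sqlite3

def get_foreign_keys(db_path: str) -> str:
    """List relationships as 'table.column -> ref_table.ref_column' lines."""
    conn = sqlite3.connect(db_path)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'"
    )]
    lines = []
    for table in tables:
        # Row layout: id, seq, ref_table, from_col, to_col, on_update, on_delete, match
        # (table names come from sqlite_master, so the interpolation is trusted)
        for fk in conn.execute(f"PRAGMA foreign_key_list({table})"):
            lines.append(f"{table}.{fk[3]} -> {fk[2]}.{fk[4]}")
    conn.close()
    return "\n".join(lines)
```

Appending these lines under the schema in the system prompt gives the model the join paths in a compact, unambiguous form.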
Challenges in Real-World Systems
Ambiguity. The question "Show me sales for last quarter" requires knowing the current date and what "quarter" means in the business context. Does "sales" mean revenue, order count, or unit volume?
Complex queries. Questions involving multiple aggregations, window functions, or correlated subqueries push the boundaries of current LLM accuracy. A question like "What percentage of each department's budget has been spent this fiscal year?" requires CTEs or subqueries that LLMs sometimes get wrong.
Performance. The generated SQL might be correct but inefficient. An LLM might produce a correlated subquery where a JOIN would perform better. Production systems need query analysis to catch these cases.
FAQ
How accurate is text-to-SQL with current LLMs?
GPT-4 class models achieve 80-87% execution accuracy on the Spider benchmark, which tests across 200 different databases. On simpler single-table queries, accuracy often exceeds 95%. Production systems improve on these numbers by adding schema context, few-shot examples, and error correction loops.
Can text-to-SQL work with any database?
Yes. The approach is database-agnostic as long as you can extract the schema and execute queries. PostgreSQL, MySQL, SQLite, SQL Server, and BigQuery all work. You may need to adjust the system prompt to specify dialect-specific syntax like ILIKE in PostgreSQL versus LOWER() LIKE in MySQL.
Is text-to-SQL safe to run against production databases?
Not without safeguards. You must enforce read-only access, query complexity limits, and result size caps. Never allow DELETE, UPDATE, INSERT, or DDL statements from AI-generated queries. Use a read replica or a dedicated analytics database.
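A minimal statement guard might look like the sketch below: it accepts only a single SELECT (or WITH ... SELECT) statement and rejects write keywords. This is one assumed layer, not a complete defense — read-only database credentials remain the real safeguard:

```python
import re

# Keywords that should never appear in an AI-generated analytics query.
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|TRUNCATE|ATTACH|PRAGMA|REPLACE)\b",
    re.IGNORECASE,
)

def is_safe_select(sql: str) -> bool:
    """Allow only a single SELECT (or WITH ... SELECT) statement."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # reject multi-statement payloads
        return False
    if not re.match(r"(?is)^\s*(WITH\b.*?\bSELECT|SELECT)\b", stripped):
        return False
    return not FORBIDDEN.search(stripped)
```

Keyword denylists can be bypassed in pathological cases, which is why this check belongs in front of a read-only connection rather than in place of one.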