Text-to-SQL Error Correction: Self-Healing Queries That Fix Their Own Mistakes
Build a text-to-SQL system with automatic error detection, self-correction loops, and intelligent retry strategies that fix query mistakes without user intervention.
Why Error Correction Matters
Even the best text-to-SQL systems produce invalid queries 10-20% of the time. Column name mismatches, syntax errors, incorrect data type comparisons, and ambiguous GROUP BY clauses are common failure modes. Without error correction, these failures become dead ends for the user.
A self-healing query system catches execution errors, feeds them back to the LLM with the original context, and asks for a corrected version. This simple loop recovers from 60-80% of first-attempt failures, pushing overall system accuracy from 85% to 95%+.
Error Categories in AI-Generated SQL
Understanding error types helps you build targeted correction strategies:
Syntax errors — Missing parentheses, incorrect keyword order, invalid function names. These produce immediate database errors with clear messages.
Schema errors — Referencing non-existent tables or columns. The error message includes the problematic identifier, making correction straightforward.
Type errors — Comparing incompatible types, like WHERE date_column = 42. These may produce errors or silently return wrong results.
Logic errors — Valid SQL that returns incorrect results. The query runs successfully but answers the wrong question. These are the hardest to detect.
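The first three categories can usually be recognized mechanically from the database's error text, which lets you route each failure to a targeted correction strategy. A minimal sketch, assuming heuristic string matching against typical PostgreSQL error messages (adjust the patterns for your database):

```python
import re

def classify_sql_error(message: str) -> str:
    """Roughly classify a PostgreSQL error message into the categories
    above. Heuristic string matching; logic errors never raise an
    exception, so they can only be caught by result validation."""
    msg = message.lower()
    if "syntax error" in msg:
        return "syntax"
    if re.search(r'(relation|column|function) "[^"]*" does not exist', msg):
        return "schema"
    if "operator does not exist" in msg or "invalid input syntax" in msg:
        return "type"
    return "unknown"
```

Knowing the category is useful for telemetry too: a spike in schema errors usually means the schema description in the prompt has drifted from the live database.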
Building the Self-Correction Loop
```python
import openai
import psycopg2
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryAttempt:
    sql: str
    error: Optional[str]
    results: Optional[list[dict]]


class SelfHealingSQLAgent:
    def __init__(self, conn_string: str, schema: str):
        self.conn_string = conn_string
        self.schema = schema
        self.client = openai.OpenAI()
        self.max_retries = 3

    def _generate(self, question: str,
                  previous_attempts: list[QueryAttempt]) -> str:
        """Generate SQL, incorporating error context from previous attempts."""
        system_msg = f"""You are a PostgreSQL expert. Generate a SQL query for
the user's question using this schema:

{self.schema}

Return ONLY the SQL query."""

        # Build error context from previous attempts
        if previous_attempts:
            error_context = "\n\nPrevious attempts that failed:\n"
            for i, attempt in enumerate(previous_attempts, 1):
                error_context += f"\nAttempt {i}:\n"
                error_context += f"SQL: {attempt.sql}\n"
                error_context += f"Error: {attempt.error}\n"
            error_context += "\nFix the issues and generate a corrected query."
            system_msg += error_context

        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": system_msg},
                {"role": "user", "content": question},
            ],
            temperature=0,
        )
        return response.choices[0].message.content.strip()

    def _execute(self, sql: str) -> QueryAttempt:
        """Execute SQL and return the attempt result."""
        conn = None
        try:
            conn = psycopg2.connect(self.conn_string)
            cur = conn.cursor()
            cur.execute(sql)
            columns = [desc[0] for desc in cur.description]
            rows = [dict(zip(columns, row)) for row in cur.fetchall()]
            return QueryAttempt(sql=sql, error=None, results=rows)
        except Exception as e:
            return QueryAttempt(sql=sql, error=str(e), results=None)
        finally:
            # Close the connection even when execution fails, so failed
            # attempts don't leak connections across retries.
            if conn is not None:
                conn.close()

    def ask(self, question: str) -> dict:
        """Ask a question with automatic error correction."""
        attempts = []
        for retry in range(self.max_retries):
            sql = self._generate(question, attempts)
            attempt = self._execute(sql)
            attempts.append(attempt)
            if attempt.error is None:
                return {
                    "sql": sql,
                    "results": attempt.results,
                    "total_attempts": retry + 1,
                    "corrections": [a.error for a in attempts[:-1]],
                }
        # All retries exhausted
        return {
            "error": "Failed after all retry attempts",
            "attempts": [
                {"sql": a.sql, "error": a.error} for a in attempts
            ],
        }
```
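One practical wrinkle: even with "Return ONLY the SQL query" in the prompt, chat models sometimes wrap their answer in markdown code fences, which the database will reject as a syntax error. A small sanitizer (a hypothetical helper, not wired into the class above) avoids burning a retry on that failure mode:

```python
def extract_sql(raw: str) -> str:
    """Strip markdown code fences and surrounding whitespace from an
    LLM response so only the SQL statement remains."""
    text = raw.strip()
    if text.startswith("```"):
        lines = text.splitlines()
        # Drop the opening fence (possibly "```sql") ...
        if lines and lines[0].startswith("```"):
            lines = lines[1:]
        # ... and the closing fence, if present.
        if lines and lines[-1].strip() == "```":
            lines = lines[:-1]
        text = "\n".join(lines).strip()
    return text
```

Calling this on the return value of `_generate` is cheap insurance; it changes nothing when the model already returns bare SQL.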
Detecting Logic Errors with Result Validation
Syntax and schema errors produce exceptions. Logic errors are harder because the query succeeds but returns wrong data. Use heuristic checks to catch common logic errors:
```python
class ResultValidator:
    """Detect potential logic errors in query results."""

    def validate(self, question: str, sql: str,
                 results: list[dict]) -> tuple[bool, str]:
        checks = [
            self._check_empty_results,
            self._check_unreasonable_counts,
            self._check_null_heavy_results,
            self._check_aggregation_mismatch,
        ]
        for check in checks:
            is_valid, reason = check(question, sql, results)
            if not is_valid:
                return False, reason
        return True, "Results look reasonable"

    def _check_empty_results(self, question: str, sql: str,
                             results: list[dict]) -> tuple[bool, str]:
        if not results:
            # Empty results might be correct, but flag for review
            return False, ("Query returned 0 rows. This may indicate an "
                           "overly restrictive WHERE clause or a wrong table.")
        return True, "OK"

    def _check_unreasonable_counts(self, question: str, sql: str,
                                   results: list[dict]) -> tuple[bool, str]:
        upper_q = question.upper()
        if any(word in upper_q for word in ["HOW MANY", "COUNT", "TOTAL"]):
            # For count questions, check if result is a single row
            if len(results) > 1 and "count" not in str(results[0].keys()).lower():
                return False, ("Question asks for a count but query returned "
                               "multiple rows without aggregation.")
        return True, "OK"

    def _check_null_heavy_results(self, question: str, sql: str,
                                  results: list[dict]) -> tuple[bool, str]:
        if results:
            null_count = sum(
                1 for row in results
                for v in row.values() if v is None
            )
            total_values = len(results) * len(results[0])
            if total_values > 0 and null_count / total_values > 0.5:
                return False, ("Over 50% of result values are NULL, "
                               "suggesting a wrong JOIN or missing data.")
        return True, "OK"

    def _check_aggregation_mismatch(self, question: str, sql: str,
                                    results: list[dict]) -> tuple[bool, str]:
        upper_q = question.upper()
        has_group_word = any(w in upper_q for w in ["EACH", "PER", "BY", "EVERY"])
        if has_group_word and len(results) <= 1:
            return False, ("Question implies grouping ('each', 'per', 'by') "
                           "but only 1 row returned. Missing GROUP BY?")
        return True, "OK"
```
Integrating Result Validation into the Agent
```python
class SelfHealingSQLAgentV2(SelfHealingSQLAgent):
    def __init__(self, conn_string: str, schema: str):
        super().__init__(conn_string, schema)
        self.result_validator = ResultValidator()

    def ask(self, question: str) -> dict:
        attempts = []
        for retry in range(self.max_retries):
            sql = self._generate(question, attempts)
            attempt = self._execute(sql)
            if attempt.error:
                attempts.append(attempt)
                continue

            # Validate results for logic errors
            is_valid, reason = self.result_validator.validate(
                question, sql, attempt.results
            )
            if not is_valid:
                attempt.error = f"Result validation: {reason}"
                attempts.append(attempt)
                continue

            attempts.append(attempt)
            return {
                "sql": sql,
                "results": attempt.results,
                "total_attempts": retry + 1,
            }
        return {"error": "All attempts failed", "attempts": len(attempts)}
```
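Callers need to handle both shapes the agent can return: a success dict with `sql`, `results`, and `total_attempts`, or a failure dict with `error`. A sketch of one way to consume them (`render_answer` is an illustrative helper, assuming the dict shapes shown above):

```python
def render_answer(response: dict) -> str:
    """Format the agent's response dict (success or failure) for display."""
    if "error" in response:
        # All retries exhausted; surface a graceful failure to the user.
        return f"Sorry, I couldn't answer that ({response['error']})."
    rows = response["results"]
    attempts = response["total_attempts"]
    note = "" if attempts == 1 else f" (after {attempts} attempts)"
    return f"{len(rows)} row(s) returned{note}."
```

Keeping the failure path explicit matters: a self-healing loop reduces dead ends but does not eliminate them, and the UI should say so honestly rather than show an empty table.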
FAQ
How many retries should I allow?
Three retries is the sweet spot. Most recoverable errors are fixed on the second attempt when the LLM sees the error message. A third attempt handles edge cases where the second fix introduces a new issue. Beyond three retries, the success rate drops sharply — if the model cannot fix it in three tries, it likely needs human intervention or a clearer question.
Does error correction increase latency significantly?
On average, 85% of queries succeed on the first attempt, so the typical user experiences no added latency. For the 15% that need correction, each retry adds 1-2 seconds (one LLM call plus one database query). The total worst-case for three retries is about 6 seconds, which is acceptable for analytical queries where users expect some processing time.
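Those numbers can be sanity-checked with simple arithmetic. The per-attempt cost and first-attempt success rate below come from the figures above; treating every failing query as using the full retry budget is a deliberately pessimistic assumption:

```python
attempt_cost = 2.0   # seconds per attempt (one LLM call + one DB query)
max_retries = 3
p_first = 0.85       # share of queries that succeed on the first attempt

# Worst case: all three attempts are used.
worst_case = attempt_cost * max_retries          # 6.0 seconds

# Expected *added* latency averaged over all queries, assuming every
# failing query pessimistically consumes both extra attempts.
expected_added = (1 - p_first) * attempt_cost * (max_retries - 1)  # 0.6 s
```

So even under pessimistic assumptions, retries add well under a second of latency on average.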
Can I use the error correction history to improve the model over time?
Yes. Log all correction events — the original question, the failed SQL, the error message, and the corrected SQL. This creates a training dataset of common mistakes. You can use it for few-shot examples in the prompt or for fine-tuning to reduce first-attempt errors over time.
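A JSON Lines file is a convenient format for this log, since each correction event is one self-contained record and the file can later be fed to few-shot selection or fine-tuning pipelines as-is. A minimal sketch (`log_correction` is an illustrative helper, not part of the agent above):

```python
import io
import json

def log_correction(fh, question: str, failed_sql: str,
                   error: str, corrected_sql: str) -> None:
    """Append one correction event as a JSON line; the resulting file
    doubles as a training dataset of common mistakes."""
    fh.write(json.dumps({
        "question": question,
        "failed_sql": failed_sql,
        "error": error,
        "corrected_sql": corrected_sql,
    }) + "\n")

# Demo with an in-memory buffer; in production, open a real file in
# append mode instead.
buf = io.StringIO()
log_correction(
    buf,
    "How many orders were placed?",
    "SELECT COUNT(*) FROM order",
    'relation "order" does not exist',
    "SELECT COUNT(*) FROM orders",
)
```

Recording the full (question, failed SQL, error, corrected SQL) tuple is what makes the log reusable: the failed/corrected pair is exactly the contrast a few-shot example needs.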
#ErrorCorrection #SelfHealing #TextToSQL #QueryRetry #AgenticAI #LLMDebugging #SQLFixing #AIReliability
CallSphere Team
Expert insights on AI voice agents and customer communication automation.