Text-to-SQL with Open-Source Models: DeFog SQLCoder, NSQL, and CodeLlama
Compare open-source text-to-SQL models including DeFog SQLCoder, NSQL, and CodeLlama. Learn how to deploy them locally, fine-tune on your schema, and evaluate accuracy against commercial alternatives.
Why Open-Source Text-to-SQL Models?
Commercial APIs like GPT-4 and Claude deliver excellent text-to-SQL accuracy, but they come with trade-offs: data leaves your network, costs scale with usage, and latency depends on external servers. Open-source models address all three problems. You run them on your own infrastructure, pay no per-query fees (only infrastructure costs), and get predictable, low-latency responses.
The accuracy gap between open-source and commercial models has narrowed dramatically. DeFog SQLCoder 2, built on CodeLlama-34B, comes within a few points of GPT-4 on the Spider benchmark while running entirely on hardware you control.
The Open-Source Text-to-SQL Landscape
DeFog SQLCoder is the most widely adopted open-source text-to-SQL model. SQLCoder 2 (34B parameters) was the first open model to match GPT-4 on standard benchmarks. The smaller SQLCoder-7B variant runs on consumer GPUs while maintaining strong accuracy on single-table queries.
NSQL (NumbersStation) is a family of models specifically fine-tuned for SQL generation. NSQL-6B and NSQL-350M offer options at different performance-accuracy trade-off points. The 350M model is small enough to run on CPUs.
CodeLlama and StarCoder are general code generation models that perform reasonably well on SQL tasks without SQL-specific fine-tuning. They are good starting points if you want a single model for multiple code generation tasks.
Deploying SQLCoder Locally with vLLM
vLLM provides high-throughput inference with PagedAttention, making it ideal for serving text-to-SQL models in production.
```python
# Install vLLM:
#   pip install vllm
#
# Start the server (in a terminal):
#   python -m vllm.entrypoints.openai.api_server \
#       --model defog/sqlcoder-34b-alpha \
#       --tensor-parallel-size 2 \
#       --max-model-len 8192 \
#       --port 8000

# Client code using the OpenAI-compatible API
import openai

client = openai.OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",  # vLLM does not validate the API key
)

def text_to_sql_local(question: str, schema: str) -> str:
    """Generate SQL using a locally hosted SQLCoder model."""
    prompt = f"""### Task
Generate a SQL query to answer [QUESTION]{question}[/QUESTION]

### Database Schema
The query will run on a database with the following schema:
{schema}

### Answer
Given the database schema, here is the SQL query that answers [QUESTION]{question}[/QUESTION]
[SQL]
"""
    response = client.completions.create(
        model="defog/sqlcoder-34b-alpha",
        prompt=prompt,
        max_tokens=512,
        temperature=0,  # greedy decoding for reproducible SQL
        stop=["[/SQL]", "###", "\n\n"],
    )
    return response.choices[0].text.strip()
```
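Before running model output against a live database, it is worth adding a lightweight guard. The sketch below is our own addition, not part of the SQLCoder or vLLM toolchain: it rejects anything that is not a single SELECT statement and dry-runs the query with SQLite's EXPLAIN, which parses and plans without executing.

```python
import sqlite3

def is_safe_select(sql: str, conn: sqlite3.Connection) -> bool:
    """Return True if sql is a single SELECT that parses against the schema."""
    stripped = sql.strip().rstrip(";")
    # Reject multi-statement payloads and anything that is not a SELECT
    if ";" in stripped or not stripped.lower().startswith("select"):
        return False
    try:
        # EXPLAIN parses and plans the query without executing it
        conn.execute(f"EXPLAIN {stripped}")
        return True
    except sqlite3.Error:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, name TEXT, price REAL)")

print(is_safe_select("SELECT name FROM products WHERE price < 10;", conn))  # True
print(is_safe_select("DROP TABLE products;", conn))                         # False
```

For production traffic, pair a check like this with a read-only database role so that even a query that slips past the filter cannot modify data.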
Running SQLCoder-7B on Consumer Hardware
The 7B parameter model runs on a single GPU with 16GB VRAM, making it accessible for development and smaller deployments.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "defog/sqlcoder-7b-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

def generate_sql(question: str, schema: str) -> str:
    prompt = f"""### Task
Generate a SQL query to answer the following question:
`{question}`

### Database Schema
{schema}

### SQL
"""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            do_sample=False,  # greedy decoding for deterministic output
            num_beams=1,
            pad_token_id=tokenizer.eos_token_id,
        )
    generated = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    )
    # Keep only the first statement of the generated text
    sql = generated.split(";")[0] + ";"
    return sql.strip()

# Usage
schema = """CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    category TEXT,
    price DECIMAL(8,2),
    stock_quantity INTEGER
);"""

sql = generate_sql("What are the cheapest 5 products in Electronics?", schema)
print(sql)
```
Accuracy Comparison
Based on the Spider benchmark (execution accuracy):
| Model | Parameters | Accuracy | Hardware Requirement |
|---|---|---|---|
| GPT-4o | Unknown | ~86% | API only |
| DeFog SQLCoder 2 | 34B | ~84% | 2x A100 (80GB) |
| DeFog SQLCoder-7B-2 | 7B | ~78% | 1x RTX 4090 (24GB) |
| NSQL-6B | 6B | ~75% | 1x RTX 3090 (24GB) |
| CodeLlama-34B | 34B | ~72% | 2x A100 (80GB) |
| NSQL-350M | 350M | ~62% | CPU (16GB RAM) |
These numbers are for zero-shot evaluation. With few-shot examples specific to your schema, accuracy typically improves by 5-10 percentage points across all models.
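One low-effort way to capture that improvement is to prepend a handful of solved question/SQL pairs to the prompt. The sketch below is illustrative: the prompt layout and example pairs are our own, not a DeFog-prescribed format.

```python
def build_few_shot_prompt(question: str, schema: str, examples: list[dict]) -> str:
    """Prepend solved question/SQL pairs to steer the model toward local conventions."""
    shots = "\n\n".join(
        f"-- Question: {ex['question']}\n{ex['sql']}" for ex in examples
    )
    return (
        f"### Database Schema\n{schema}\n\n"
        f"### Examples\n{shots}\n\n"
        f"-- Question: {question}\n"
    )

examples = [
    {"question": "How many products are in stock?",
     "sql": "SELECT COUNT(*) FROM products WHERE stock_quantity > 0;"},
]
prompt = build_few_shot_prompt(
    "What is the average price per category?",
    "CREATE TABLE products (id INTEGER, category TEXT, price DECIMAL, stock_quantity INTEGER);",
    examples,
)
print(prompt)
```

Keep the examples short and pick ones that demonstrate your naming conventions and common joins; a few well-chosen shots usually matter more than many generic ones.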
Fine-Tuning on Your Schema
The biggest accuracy boost comes from fine-tuning on question-query pairs specific to your database. Even 100 examples can dramatically improve accuracy on your schema.
```python
# Prepare training data in the format SQLCoder expects
training_data = [
    {
        "question": "How many active users signed up last week?",
        "schema": "CREATE TABLE users (id INT, status VARCHAR, created_at DATE);",
        "sql": "SELECT COUNT(*) FROM users WHERE status = 'active' AND created_at >= CURRENT_DATE - INTERVAL '7 days'",
    },
    # Add 100+ examples covering your common query patterns
]

# Fine-tune using QLoRA for efficiency:
#   pip install peft trl datasets
from peft import LoraConfig
from trl import SFTTrainer

lora_config = LoraConfig(
    r=16,                    # rank of the low-rank update matrices
    lora_alpha=32,           # scaling factor applied to the LoRA updates
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)

# Configure SFTTrainer with your training_data and lora_config
```
Fine-tuning with QLoRA requires only about 10GB of GPU memory and completes in 1-2 hours on 100 examples.
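Supervised fine-tuning trainers like TRL's SFTTrainer consume plain text records, so each question/schema/SQL triple needs to be serialized into a single training string, ideally in the same prompt format you use at inference. A minimal sketch (the template below is our own; adapt it to whichever prompt format you deploy with):

```python
def to_training_text(example: dict) -> str:
    """Serialize one question/schema/sql triple into a prompt-plus-completion string."""
    return (
        f"### Task\nGenerate a SQL query to answer the following question:\n"
        f"`{example['question']}`\n\n"
        f"### Database Schema\n{example['schema']}\n\n"
        f"### SQL\n{example['sql']};"
    )

training_data = [
    {
        "question": "How many active users signed up last week?",
        "schema": "CREATE TABLE users (id INT, status VARCHAR, created_at DATE);",
        "sql": "SELECT COUNT(*) FROM users WHERE status = 'active'",
    },
]
texts = [to_training_text(ex) for ex in training_data]
print(texts[0])
```

Matching the training template to the inference prompt matters: a model fine-tuned on one layout and prompted with another gives up much of the accuracy gain.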
FAQ
When should I use open-source models instead of GPT-4?
Choose open-source when data privacy is a hard requirement (healthcare, finance), when you need predictable per-query costs at high volume (10,000+ queries/day), or when you need sub-200ms latency for interactive applications. Choose commercial APIs when accuracy on complex queries is critical and you have fewer than 1,000 queries per day.
Can I run SQLCoder without a GPU?
NSQL-350M runs on CPUs at acceptable speeds for batch processing. For SQLCoder-7B, you can use quantized versions (GGUF format with llama.cpp) that run on CPUs with 32GB RAM, though latency will be 5-10 seconds per query instead of sub-second.
How do I evaluate accuracy on my specific database?
Create a test set of 50-100 question-query pairs covering your most common query patterns. Run each question through the model, execute both the generated SQL and the reference SQL, and compare results. Execution accuracy (do the results match?) is more meaningful than exact match (is the SQL identical?).
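The evaluation loop described above can be sketched with SQLite. The table, data, and test pair below are illustrative; the comparison ignores row order, a common relaxation in Spider-style execution accuracy.

```python
import sqlite3

def execution_match(conn, generated_sql: str, reference_sql: str) -> bool:
    """Compare result sets of generated vs. reference SQL, ignoring row order."""
    try:
        got = sorted(conn.execute(generated_sql).fetchall())
    except sqlite3.Error:
        return False  # generated SQL failed to parse or execute
    want = sorted(conn.execute(reference_sql).fetchall())
    return got == want

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, status TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "active"), (2, "inactive"), (3, "active")])

test_set = [
    # (model output, hand-written reference)
    ("SELECT COUNT(*) FROM users WHERE status = 'active'",
     "SELECT COUNT(id) FROM users WHERE status = 'active'"),
]
accuracy = sum(execution_match(conn, gen, ref) for gen, ref in test_set) / len(test_set)
print(f"Execution accuracy: {accuracy:.0%}")
```

Note how the two queries differ textually but return the same result, which is exactly the case execution accuracy credits and exact match penalizes.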
#SQLCoder #OpenSourceAI #TextToSQL #LocalLLM #DeFog #NSQL #FineTuning #AgenticAI