Text-to-SQL with Open-Source Models: DeFog SQLCoder, NSQL, and CodeLlama
Compare open-source text-to-SQL models including DeFog SQLCoder, NSQL, and CodeLlama. Learn how to deploy them locally, fine-tune on your schema, and evaluate accuracy against commercial alternatives.
Why Open-Source Text-to-SQL Models?
Commercial APIs like GPT-4 and Claude deliver excellent text-to-SQL accuracy, but they come with trade-offs: data leaves your network, costs scale with usage, and latency depends on external servers. Open-source models address all three problems. You run them on your own infrastructure, pay no per-query fees (only infrastructure costs), and get predictable, low-latency responses.
The accuracy gap between open-source and commercial models has narrowed dramatically. DeFog SQLCoder 2, built on CodeLlama-34B, comes within a few points of GPT-4 on the Spider benchmark while running entirely on hardware you control.
The Open-Source Text-to-SQL Landscape
DeFog SQLCoder is the most widely adopted open-source text-to-SQL model. SQLCoder 2 (34B parameters) was the first open model to match GPT-4 on standard benchmarks. The smaller SQLCoder-7B variant runs on consumer GPUs while maintaining strong accuracy on single-table queries.
NSQL (NumbersStation) is a family of models specifically fine-tuned for SQL generation. NSQL-6B and NSQL-350M offer options at different performance-accuracy trade-off points. The 350M model is small enough to run on CPUs.
CodeLlama and StarCoder are general code generation models that perform reasonably well on SQL tasks without SQL-specific fine-tuning. They are good starting points if you want a single model for multiple code generation tasks.
Deploying SQLCoder Locally with vLLM
vLLM provides high-throughput inference with PagedAttention, making it ideal for serving text-to-SQL models in production.
```python
# Install vLLM:
#   pip install vllm
#
# Start the server (in a terminal):
#   python -m vllm.entrypoints.openai.api_server \
#       --model defog/sqlcoder-34b-alpha \
#       --tensor-parallel-size 2 \
#       --max-model-len 8192 \
#       --port 8000

# Client code using the OpenAI-compatible API
import openai

client = openai.OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",  # vLLM does not validate the API key
)

def text_to_sql_local(question: str, schema: str) -> str:
    """Generate SQL using a locally hosted SQLCoder model."""
    prompt = f"""### Task
Generate a SQL query to answer [QUESTION]{question}[/QUESTION]

### Database Schema
The query will run on a database with the following schema:
{schema}

### Answer
Given the database schema, here is the SQL query that answers [QUESTION]{question}[/QUESTION]
[SQL]
"""
    response = client.completions.create(
        model="defog/sqlcoder-34b-alpha",
        prompt=prompt,
        max_tokens=512,
        temperature=0,  # greedy decoding for reproducible SQL
        stop=["[/SQL]", "###", "\n\n"],
    )
    return response.choices[0].text.strip()
```
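Before running model output against a live database, it is worth adding a lightweight guard. The sketch below is our own addition, not part of the SQLCoder or vLLM toolchain: it rejects anything that is not a single SELECT statement and dry-runs the query with SQLite's EXPLAIN, which parses and plans without executing.

```python
import sqlite3

def is_safe_select(sql: str, conn: sqlite3.Connection) -> bool:
    """Return True if sql is a single SELECT that parses against the schema."""
    stripped = sql.strip().rstrip(";")
    # Reject multi-statement payloads and anything that is not a SELECT
    if ";" in stripped or not stripped.lower().startswith("select"):
        return False
    try:
        # EXPLAIN parses and plans the query without executing it
        conn.execute(f"EXPLAIN {stripped}")
        return True
    except sqlite3.Error:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, name TEXT, price REAL)")

print(is_safe_select("SELECT name FROM products WHERE price < 10;", conn))  # True
print(is_safe_select("DROP TABLE products;", conn))                         # False
```

For production traffic, pair a check like this with a read-only database role so that even a query that slips past the filter cannot modify data.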
Running SQLCoder-7B on Consumer Hardware
The 7B parameter model runs on a single GPU with 16GB VRAM, making it accessible for development and smaller deployments.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "defog/sqlcoder-7b-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

def generate_sql(question: str, schema: str) -> str:
    prompt = f"""### Task
Generate a SQL query to answer the following question:
`{question}`

### Database Schema
{schema}

### SQL
"""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            do_sample=False,  # greedy decoding for deterministic output
            num_beams=1,
            pad_token_id=tokenizer.eos_token_id,
        )
    generated = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    )
    # Keep only the first statement of the generated text
    sql = generated.split(";")[0] + ";"
    return sql.strip()

# Usage
schema = """CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    category TEXT,
    price DECIMAL(8,2),
    stock_quantity INTEGER
);"""

sql = generate_sql("What are the cheapest 5 products in Electronics?", schema)
print(sql)
```
Accuracy Comparison
Based on the Spider benchmark (execution accuracy):
| Model | Parameters | Accuracy | Hardware Requirement |
|---|---|---|---|
| GPT-4o | Unknown | ~86% | API only |
| DeFog SQLCoder 2 | 34B | ~84% | 2x A100 (80GB) |
| DeFog SQLCoder-7B-2 | 7B | ~78% | 1x RTX 4090 (24GB) |
| NSQL-6B | 6B | ~75% | 1x RTX 3090 (24GB) |
| CodeLlama-34B | 34B | ~72% | 2x A100 (80GB) |
| NSQL-350M | 350M | ~62% | CPU (16GB RAM) |
These numbers are for zero-shot evaluation. With few-shot examples specific to your schema, accuracy typically improves by 5-10 percentage points across all models.
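One low-effort way to capture that improvement is to prepend a handful of solved question/SQL pairs to the prompt. The sketch below is illustrative: the prompt layout and example pairs are our own, not a DeFog-prescribed format.

```python
def build_few_shot_prompt(question: str, schema: str, examples: list[dict]) -> str:
    """Prepend solved question/SQL pairs to steer the model toward local conventions."""
    shots = "\n\n".join(
        f"-- Question: {ex['question']}\n{ex['sql']}" for ex in examples
    )
    return (
        f"### Database Schema\n{schema}\n\n"
        f"### Examples\n{shots}\n\n"
        f"-- Question: {question}\n"
    )

examples = [
    {"question": "How many products are in stock?",
     "sql": "SELECT COUNT(*) FROM products WHERE stock_quantity > 0;"},
]
prompt = build_few_shot_prompt(
    "What is the average price per category?",
    "CREATE TABLE products (id INTEGER, category TEXT, price DECIMAL, stock_quantity INTEGER);",
    examples,
)
print(prompt)
```

Keep the examples short and pick ones that demonstrate your naming conventions and common joins; a few well-chosen shots usually matter more than many generic ones.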
Fine-Tuning on Your Schema
The biggest accuracy boost comes from fine-tuning on question-query pairs specific to your database. Even 100 examples can dramatically improve accuracy on your schema.
```python
# Prepare training data in the format SQLCoder expects
training_data = [
    {
        "question": "How many active users signed up last week?",
        "schema": "CREATE TABLE users (id INT, status VARCHAR, created_at DATE);",
        "sql": "SELECT COUNT(*) FROM users WHERE status = 'active' AND created_at >= CURRENT_DATE - INTERVAL '7 days'",
    },
    # Add 100+ examples covering your common query patterns
]

# Fine-tune using QLoRA for efficiency:
#   pip install peft trl datasets
from peft import LoraConfig
from trl import SFTTrainer

lora_config = LoraConfig(
    r=16,                    # rank of the low-rank update matrices
    lora_alpha=32,           # scaling factor applied to the LoRA updates
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)

# Configure SFTTrainer with your training_data and lora_config
```
Fine-tuning with QLoRA requires only about 10GB of GPU memory and completes in 1-2 hours on 100 examples.
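Supervised fine-tuning trainers like TRL's SFTTrainer consume plain text records, so each question/schema/SQL triple needs to be serialized into a single training string, ideally in the same prompt format you use at inference. A minimal sketch (the template below is our own; adapt it to whichever prompt format you deploy with):

```python
def to_training_text(example: dict) -> str:
    """Serialize one question/schema/sql triple into a prompt-plus-completion string."""
    return (
        f"### Task\nGenerate a SQL query to answer the following question:\n"
        f"`{example['question']}`\n\n"
        f"### Database Schema\n{example['schema']}\n\n"
        f"### SQL\n{example['sql']};"
    )

training_data = [
    {
        "question": "How many active users signed up last week?",
        "schema": "CREATE TABLE users (id INT, status VARCHAR, created_at DATE);",
        "sql": "SELECT COUNT(*) FROM users WHERE status = 'active'",
    },
]
texts = [to_training_text(ex) for ex in training_data]
print(texts[0])
```

Matching the training template to the inference prompt matters: a model fine-tuned on one layout and prompted with another gives up much of the accuracy gain.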
FAQ
When should I use open-source models instead of GPT-4?
Choose open-source when data privacy is a hard requirement (healthcare, finance), when you need predictable per-query costs at high volume (10,000+ queries/day), or when you need sub-200ms latency for interactive applications. Choose commercial APIs when accuracy on complex queries is critical and you have fewer than 1,000 queries per day.
Can I run SQLCoder without a GPU?
NSQL-350M runs on CPUs at acceptable speeds for batch processing. For SQLCoder-7B, you can use quantized versions (GGUF format with llama.cpp) that run on CPUs with 32GB RAM, though latency will be 5-10 seconds per query instead of sub-second.
How do I evaluate accuracy on my specific database?
Create a test set of 50-100 question-query pairs covering your most common query patterns. Run each question through the model, execute both the generated SQL and the reference SQL, and compare results. Execution accuracy (do the results match?) is more meaningful than exact match (is the SQL identical?).
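The evaluation loop described above can be sketched with SQLite. The table, data, and test pair below are illustrative; the comparison ignores row order, a common relaxation in Spider-style execution accuracy.

```python
import sqlite3

def execution_match(conn, generated_sql: str, reference_sql: str) -> bool:
    """Compare result sets of generated vs. reference SQL, ignoring row order."""
    try:
        got = sorted(conn.execute(generated_sql).fetchall())
    except sqlite3.Error:
        return False  # generated SQL failed to parse or execute
    want = sorted(conn.execute(reference_sql).fetchall())
    return got == want

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, status TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "active"), (2, "inactive"), (3, "active")])

test_set = [
    # (model output, hand-written reference)
    ("SELECT COUNT(*) FROM users WHERE status = 'active'",
     "SELECT COUNT(id) FROM users WHERE status = 'active'"),
]
accuracy = sum(execution_match(conn, gen, ref) for gen, ref in test_set) / len(test_set)
print(f"Execution accuracy: {accuracy:.0%}")
```

Note how the two queries differ textually but return the same result, which is exactly the case execution accuracy credits and exact match penalizes.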
#SQLCoder #OpenSourceAI #TextToSQL #LocalLLM #DeFog #NSQL #FineTuning #AgenticAI