Database Query Tools: Letting AI Agents Read from PostgreSQL and SQLite
Learn how to build safe database query tools that let AI agents retrieve data from PostgreSQL and SQLite. Covers parameterized queries, read-only enforcement, result formatting, and guardrails against destructive operations.
The Case for Database Tools
Most useful AI agents need access to data. Rather than pre-loading everything into the prompt, database query tools let agents fetch exactly the data they need on demand. This keeps context windows lean and allows agents to answer questions about large datasets that would never fit in a single prompt.
The challenge is safety. You are giving an LLM the ability to run SQL against your database. This post shows how to do it with proper guardrails.
Designing the Tool Schema
The schema should guide the LLM to produce well-structured queries:
```python
query_tool_schema = {
    "type": "function",
    "function": {
        "name": "query_database",
        "description": (
            "Execute a read-only SQL query against the application database "
            "and return results. Only SELECT statements are allowed. Use this "
            "to look up customer data, order history, product information, or "
            "aggregate statistics. Always include a LIMIT clause to avoid "
            "returning too many rows."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "sql": {
                    "type": "string",
                    "description": "A SQL SELECT query. Must start with SELECT. Include LIMIT to cap results.",
                },
                "params": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Positional parameters for the query, replacing $1, $2, etc. placeholders.",
                },
            },
            "required": ["sql"],
        },
    },
}
```
Including the params field encourages the LLM to use parameterized queries instead of string interpolation, though you should enforce this server-side too.
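For example, a well-formed set of arguments from the model might look like the following (the table and column names are purely illustrative; the point is that the email value travels in `params` rather than being interpolated into the SQL string):

```python
# Illustrative arguments a model might emit for query_database.
# The user-supplied email is passed as a positional parameter ($1),
# never spliced into the SQL text itself.
tool_call_args = {
    "sql": (
        "SELECT id, status, total FROM orders "
        "WHERE customer_email = $1 LIMIT 20"
    ),
    "params": ["jane@example.com"],
}
```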
The Safety Layer: Read-Only Enforcement
Never trust that the LLM will only generate SELECT statements. Validate every query before execution:
```python
import re

import sqlparse  # third-party: pip install sqlparse

FORBIDDEN_KEYWORDS = {
    "INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "CREATE",
    "TRUNCATE", "GRANT", "REVOKE", "EXEC", "EXECUTE",
    "INTO",  # catches SELECT INTO
}

def validate_query(sql: str) -> tuple[bool, str]:
    normalized = sql.strip().upper()
    if not normalized.startswith("SELECT"):
        return False, "Only SELECT queries are allowed"
    # Reject stacked statements like "SELECT 1; DROP TABLE users"
    parsed = sqlparse.parse(sql)
    if len(parsed) != 1:
        return False, "Only single statements are allowed"
    words = set(re.findall(r"\b[A-Z]+\b", normalized))
    dangerous = words.intersection(FORBIDDEN_KEYWORDS)
    if dangerous:
        return False, f"Forbidden keywords detected: {dangerous}"
    if "LIMIT" not in normalized:
        return False, "Query must include a LIMIT clause"
    return True, "OK"
```
This is defense in depth. Even if the LLM generates a valid-looking query, you parse it, check for forbidden keywords, enforce single-statement execution, and require a LIMIT clause.
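To see the rejection behavior concretely, here is a simplified, dependency-free variant of the validator (the sqlparse single-statement check is replaced with a naive semicolon test, so treat this as an illustration rather than a production implementation):

```python
import re

FORBIDDEN_KEYWORDS = {
    "INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "CREATE",
    "TRUNCATE", "GRANT", "REVOKE", "EXEC", "EXECUTE", "INTO",
}

def validate_query_simple(sql: str) -> tuple[bool, str]:
    normalized = sql.strip().upper().rstrip(";")
    if not normalized.startswith("SELECT"):
        return False, "Only SELECT queries are allowed"
    if ";" in normalized:  # naive stand-in for sqlparse's statement count
        return False, "Only single statements are allowed"
    dangerous = set(re.findall(r"\b[A-Z]+\b", normalized)) & FORBIDDEN_KEYWORDS
    if dangerous:
        return False, f"Forbidden keywords detected: {dangerous}"
    if "LIMIT" not in normalized:
        return False, "Query must include a LIMIT clause"
    return True, "OK"

print(validate_query_simple("SELECT name FROM users LIMIT 10"))  # (True, 'OK')
print(validate_query_simple("DROP TABLE users"))                 # rejected: not a SELECT
print(validate_query_simple("SELECT name FROM users"))           # rejected: no LIMIT
```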
PostgreSQL Implementation
For PostgreSQL, use asyncpg with a read-only connection:
```python
import asyncpg
import json

class PostgreSQLTool:
    def __init__(self, dsn: str):
        self.dsn = dsn
        self.pool = None

    async def connect(self):
        self.pool = await asyncpg.create_pool(
            self.dsn,
            min_size=2,
            max_size=10,
            command_timeout=10,
            # Runs on every new connection, so the entire pool is
            # read-only -- not just the first connection acquired.
            init=self._make_read_only,
        )

    @staticmethod
    async def _make_read_only(conn):
        await conn.execute("SET default_transaction_read_only = ON")

    async def execute_query(self, sql: str, params: list | None = None) -> str:
        is_valid, message = validate_query(sql)
        if not is_valid:
            return f"Query rejected: {message}"
        try:
            async with self.pool.acquire() as conn:
                await conn.execute("SET statement_timeout = '5s'")
                rows = await conn.fetch(sql, *(params or []))
                results = [dict(row) for row in rows]
                return json.dumps(results, default=str, indent=2)
        except asyncpg.PostgresError as e:
            return f"Database error: {e}"
```
Key safety measures here: a connection pool with a maximum size to prevent resource exhaustion, a 5-second statement timeout to kill runaway queries, and read-only transaction mode applied to every pooled connection via the pool's init callback.
SQLite Implementation
SQLite is simpler but needs the same guardrails:
```python
import sqlite3
import json

class SQLiteTool:
    def __init__(self, db_path: str):
        self.db_path = db_path

    def execute_query(self, sql: str, params: list | None = None) -> str:
        is_valid, message = validate_query(sql)
        if not is_valid:
            return f"Query rejected: {message}"
        try:
            conn = sqlite3.connect(
                f"file:{self.db_path}?mode=ro",
                uri=True,
                timeout=5,
            )
            try:
                conn.row_factory = sqlite3.Row
                cursor = conn.execute(sql, params or [])
                rows = [dict(row) for row in cursor.fetchall()]
            finally:
                conn.close()  # close even if the query raises
            return json.dumps(rows, default=str, indent=2)
        except sqlite3.Error as e:
            return f"Database error: {e}"
```
The mode=ro URI parameter opens the database read-only at the SQLite driver level, so even a write statement that slipped past validation cannot modify the file. Note that sqlite3 uses ? placeholders rather than PostgreSQL's $1 style, so adjust the tool's parameter description accordingly.
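A quick check confirms the behavior: through a mode=ro connection, reads succeed while writes raise sqlite3.OperationalError. (The throwaway file path below is just for the demonstration.)

```python
import os
import sqlite3
import tempfile

# Build a throwaway database, then reopen it with mode=ro.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
rw = sqlite3.connect(path)
rw.execute("CREATE TABLE items (id INTEGER, name TEXT)")
rw.execute("INSERT INTO items VALUES (1, 'widget')")
rw.commit()
rw.close()

ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
rows = ro.execute("SELECT name FROM items").fetchall()  # reads succeed

try:
    ro.execute("INSERT INTO items VALUES (2, 'gadget')")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True  # "attempt to write a readonly database"
ro.close()
```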
Formatting Results for the LLM
Raw JSON works, but you can help the LLM interpret results by adding metadata:
```python
def format_results(rows: list, sql: str) -> str:
    if not rows:
        return "No results found for the query."
    output = f"Query returned {len(rows)} row(s).\n\n"
    columns = list(rows[0].keys())
    output += "Columns: " + ", ".join(columns) + "\n\n"
    output += json.dumps(rows, default=str, indent=2)
    if len(rows) >= 100:
        output += "\n\nNote: Results may be truncated. Consider adding more specific WHERE conditions."
    return output
```
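For instance, with a couple of made-up rows (the helper is repeated here so the snippet runs standalone):

```python
import json

def format_results(rows: list, sql: str) -> str:
    if not rows:
        return "No results found for the query."
    output = f"Query returned {len(rows)} row(s).\n\n"
    columns = list(rows[0].keys())
    output += "Columns: " + ", ".join(columns) + "\n\n"
    output += json.dumps(rows, default=str, indent=2)
    if len(rows) >= 100:
        output += "\n\nNote: Results may be truncated. Consider adding more specific WHERE conditions."
    return output

sample = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
summary = format_results(sample, "SELECT id, name FROM users LIMIT 10")
print(summary.splitlines()[0])  # Query returned 2 row(s).
```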
Providing Schema Context
The LLM needs to know your database structure to write correct queries. Include the schema in the system prompt or provide a separate tool:
```python
schema_tool_schema = {
    "type": "function",
    "function": {
        "name": "get_database_schema",
        "description": (
            "Returns the database table names, column names, and column "
            "types. Call this before writing a query if you are unsure "
            "about the table structure."
        ),
        "parameters": {
            "type": "object",
            "properties": {},
            "required": [],
        },
    },
}
```
This lets the agent discover the schema on demand rather than stuffing it all into the system prompt upfront.
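For SQLite, a minimal implementation of this tool might introspect sqlite_master and PRAGMA table_info, as in the sketch below (the PostgreSQL equivalent would query information_schema.columns instead):

```python
import json
import sqlite3

def get_database_schema(db_path: str) -> str:
    """Return table, column, and type information as JSON for the agent."""
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        # Table names come from sqlite_master, so the PRAGMA f-string
        # below is not fed user- or model-controlled input.
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )]
        schema = {
            table: [
                {"column": col[1], "type": col[2]}
                for col in conn.execute(f"PRAGMA table_info({table})")
            ]
            for table in tables
        }
    finally:
        conn.close()
    return json.dumps(schema, indent=2)
```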
FAQ
Should I let the LLM write raw SQL or use an abstraction layer?
Raw SQL with guardrails is the most flexible approach and what most production systems use. Abstraction layers like predefined query templates are safer but limit the agent to only the queries you anticipated. Start with raw SQL plus validation, and add templates for common queries as you identify them.
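A hybrid approach might look like the sketch below, where the registry and its entries are hypothetical: the agent selects a vetted query by name and supplies only parameter values, never raw SQL.

```python
# Hypothetical registry of vetted queries. Each entry was written and
# reviewed by a human; the agent only chooses a name and fills in values.
QUERY_TEMPLATES = {
    "orders_by_customer": (
        "SELECT id, status, total FROM orders "
        "WHERE customer_email = $1 LIMIT $2"
    ),
    "top_products": (
        "SELECT name, units_sold FROM products "
        "ORDER BY units_sold DESC LIMIT $1"
    ),
}

def resolve_template(name: str, params: list) -> str:
    """Look up a vetted query and check the parameter count."""
    if name not in QUERY_TEMPLATES:
        raise KeyError(f"Unknown template: {name}")
    sql = QUERY_TEMPLATES[name]
    expected = sql.count("$")  # crude placeholder count, fine for $1..$9
    if len(params) != expected:
        raise ValueError(f"Expected {expected} params, got {len(params)}")
    return sql
```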
How do I prevent the agent from reading sensitive tables?
Create a dedicated database user with SELECT permissions only on approved tables. This is enforced at the database level, which is more reliable than keyword filtering. Additionally, use a schema tool that only exposes the tables the agent is allowed to query.
What if the query returns thousands of rows?
Enforce a maximum LIMIT in your validation layer (e.g., cap at 100 rows regardless of what the LLM requests). Truncate results before returning them to the LLM and include a note that results were truncated. Large result sets waste context tokens and rarely improve the agent's answer.
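The clamping step can be a small rewrite applied after validation. A sketch (it assumes validate_query has already guaranteed a numeric LIMIT clause is present):

```python
import re

MAX_ROWS = 100

def enforce_limit(sql: str, max_rows: int = MAX_ROWS) -> str:
    """Rewrite the query's LIMIT so it never exceeds max_rows."""
    def clamp(match: re.Match) -> str:
        requested = int(match.group(1))
        return f"LIMIT {min(requested, max_rows)}"
    return re.sub(r"LIMIT\s+(\d+)", clamp, sql, flags=re.IGNORECASE)
```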
#Database #PostgreSQL #SQLite #ToolDesign #AIAgents #AgenticAI #LearnAI #AIEngineering