
Database Query Tools: Letting AI Agents Read from PostgreSQL and SQLite

Learn how to build safe database query tools that let AI agents retrieve data from PostgreSQL and SQLite. Covers parameterized queries, read-only enforcement, result formatting, and guardrails against destructive operations.

The Case for Database Tools

Most useful AI agents need access to data. Rather than pre-loading everything into the prompt, database query tools let agents fetch exactly the data they need on demand. This keeps context windows lean and allows agents to answer questions about large datasets that would never fit in a single prompt.

The challenge is safety. You are giving an LLM the ability to run SQL against your database. This post shows how to do it with proper guardrails.

Designing the Tool Schema

The schema should guide the LLM to produce well-structured queries:

query_tool_schema = {
    "type": "function",
    "function": {
        "name": "query_database",
        "description": "Execute a read-only SQL query against the application database and return results. Only SELECT statements are allowed. Use this to look up customer data, order history, product information, or aggregate statistics. Always include a LIMIT clause to avoid returning too many rows.",
        "parameters": {
            "type": "object",
            "properties": {
                "sql": {
                    "type": "string",
                    "description": "A SQL SELECT query. Must start with SELECT. Include LIMIT to cap results."
                },
                "params": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Positional parameters for the query, replacing $1, $2 etc placeholders."
                }
            },
            "required": ["sql"]
        }
    }
}

Including the params field encourages the LLM to use parameterized queries instead of string interpolation, though you should enforce this server-side too.
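For instance, a well-formed tool call from the model might carry arguments like the following (the `orders` table and `customer_id` column are illustrative, not from a real schema):

```python
# Arguments an LLM might emit for the query_database tool.
tool_call_args = {
    "sql": "SELECT id, total FROM orders WHERE customer_id = $1 LIMIT 20",
    "params": ["cust_42"],
}

# The user-supplied value never touches the SQL string itself, so a
# malicious customer_id cannot alter the query's structure.
assert "$1" in tool_call_args["sql"]
```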

The Safety Layer: Read-Only Enforcement

Never trust that the LLM will only generate SELECT statements. Validate every query before execution:

import re
import sqlparse

FORBIDDEN_KEYWORDS = {
    "INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "CREATE",
    "TRUNCATE", "GRANT", "REVOKE", "EXEC", "EXECUTE",
    "INTO",  # catches SELECT INTO
}

def validate_query(sql: str) -> tuple[bool, str]:
    normalized = sql.strip().upper()

    if not normalized.startswith("SELECT"):
        return False, "Only SELECT queries are allowed"

    parsed = sqlparse.parse(sql)
    if len(parsed) != 1:
        return False, "Only single statements are allowed"

    words = set(re.findall(r'\b[A-Z]+\b', normalized))

    dangerous = words.intersection(FORBIDDEN_KEYWORDS)
    if dangerous:
        return False, f"Forbidden keywords detected: {dangerous}"

    if "LIMIT" not in normalized:
        return False, "Query must include a LIMIT clause"

    return True, "OK"

This is defense in depth. Even if the LLM generates a valid-looking query, you parse it, check for forbidden keywords, enforce single-statement execution, and require a LIMIT clause.
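A few spot checks make the rejection behavior concrete. This sketch re-implements the keyword, single-statement, and LIMIT checks inline (without sqlparse) so it runs standalone; the full validator above is stricter:

```python
import re

FORBIDDEN = {"INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "CREATE",
             "TRUNCATE", "GRANT", "REVOKE", "EXEC", "EXECUTE", "INTO"}

def quick_check(sql: str) -> bool:
    # Simplified validate_query: SELECT-only, no forbidden keywords,
    # a crude multi-statement check, and a mandatory LIMIT.
    normalized = sql.strip().upper()
    if not normalized.startswith("SELECT"):
        return False
    if ";" in normalized.rstrip(";"):
        return False
    if set(re.findall(r"\b[A-Z]+\b", normalized)) & FORBIDDEN:
        return False
    return "LIMIT" in normalized

assert quick_check("SELECT name FROM users LIMIT 10")
assert not quick_check("DELETE FROM users")                        # not a SELECT
assert not quick_check("SELECT * FROM users")                      # missing LIMIT
assert not quick_check("SELECT * INTO backup FROM users LIMIT 5")  # SELECT INTO
```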

PostgreSQL Implementation

For PostgreSQL, use asyncpg with a read-only connection:


import asyncpg
import json

class PostgreSQLTool:
    def __init__(self, dsn: str):
        self.dsn = dsn
        self.pool = None

    async def connect(self):
        self.pool = await asyncpg.create_pool(
            self.dsn,
            min_size=2,
            max_size=10,
            command_timeout=10,
            # Run on every new connection, so the entire pool is
            # read-only -- not just one connection acquired afterwards.
            init=self._make_read_only,
        )

    @staticmethod
    async def _make_read_only(conn):
        await conn.execute("SET default_transaction_read_only = ON")

    async def execute_query(self, sql: str, params: list | None = None) -> str:
        is_valid, message = validate_query(sql)
        if not is_valid:
            return f"Query rejected: {message}"

        try:
            async with self.pool.acquire() as conn:
                await conn.execute("SET statement_timeout = '5s'")
                rows = await conn.fetch(sql, *(params or []))

                results = [dict(row) for row in rows]
                return json.dumps(results, default=str, indent=2)
        except asyncpg.PostgresError as e:
            return f"Database error: {str(e)}"

Key safety measures here: connection pool with a maximum size to prevent resource exhaustion, a 5-second statement timeout to kill runaway queries, and read-only transaction mode at the connection level.
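One detail worth calling out: asyncpg returns values such as datetime and Decimal that json.dumps cannot serialize on its own, which is why the code passes default=str to coerce them:

```python
import json
from datetime import datetime
from decimal import Decimal

# A row as asyncpg might return it (values are illustrative).
row = {"id": 7, "total": Decimal("19.99"), "created_at": datetime(2024, 1, 15, 9, 30)}
print(json.dumps(row, default=str))
# -> {"id": 7, "total": "19.99", "created_at": "2024-01-15 09:30:00"}
```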

SQLite Implementation

SQLite is simpler but needs the same guardrails:

import sqlite3
import json

class SQLiteTool:
    def __init__(self, db_path: str):
        self.db_path = db_path

    def execute_query(self, sql: str, params: list | None = None) -> str:
        is_valid, message = validate_query(sql)
        if not is_valid:
            return f"Query rejected: {message}"

        try:
            conn = sqlite3.connect(
                f"file:{self.db_path}?mode=ro",
                uri=True,
                timeout=5,
            )
            conn.row_factory = sqlite3.Row
            try:
                cursor = conn.execute(sql, params or [])
                rows = [dict(row) for row in cursor.fetchall()]
            finally:
                conn.close()  # always release the handle, even on errors

            return json.dumps(rows, default=str, indent=2)
        except sqlite3.Error as e:
            return f"Database error: {str(e)}"

The mode=ro URI parameter opens the database read-only at the SQLite driver level, so any write that slips past validation fails with an OperationalError.

Formatting Results for the LLM

Raw JSON works, but you can help the LLM interpret results by adding metadata:

def format_results(rows: list, sql: str) -> str:
    if not rows:
        return "No results found for the query."

    output = f"Query returned {len(rows)} row(s).\n\n"
    columns = list(rows[0].keys())
    output += "Columns: " + ", ".join(columns) + "\n\n"
    output += json.dumps(rows, default=str, indent=2)

    if len(rows) >= 100:
        output += "\n\nNote: Results may be truncated. Consider adding more specific WHERE conditions."

    return output
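Applied to a small result set, the formatter yields a header the LLM can parse at a glance (the function is reproduced inline here so the snippet runs standalone; the sample rows are illustrative):

```python
import json

def format_results(rows: list, sql: str) -> str:
    # Same formatter as above, inlined for a runnable example.
    if not rows:
        return "No results found for the query."
    output = f"Query returned {len(rows)} row(s).\n\n"
    output += "Columns: " + ", ".join(rows[0].keys()) + "\n\n"
    output += json.dumps(rows, default=str, indent=2)
    if len(rows) >= 100:
        output += "\n\nNote: Results may be truncated. Consider adding more specific WHERE conditions."
    return output

sample = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
formatted = format_results(sample, "SELECT id, name FROM users LIMIT 10")
print(formatted.splitlines()[0])  # -> Query returned 2 row(s).
```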

Providing Schema Context

The LLM needs to know your database structure to write correct queries. Include the schema in the system prompt or provide a separate tool:

schema_tool_schema = {
    "type": "function",
    "function": {
        "name": "get_database_schema",
        "description": "Returns the database table names, column names, and column types. Call this before writing a query if you are unsure about the table structure.",
        "parameters": {
            "type": "object",
            "properties": {},
            "required": []
        }
    }
}

This lets the agent discover the schema on demand rather than stuffing it all into the system prompt upfront.
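For SQLite, one possible handler for this tool walks the catalog via sqlite_master and PRAGMA table_info (the users table below is illustrative):

```python
import sqlite3

def get_database_schema(conn: sqlite3.Connection) -> str:
    # List each table with its column names and types.
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    for (table,) in tables:
        cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
        col_desc = ", ".join(f"{c[1]} {c[2]}" for c in cols)  # name, type
        lines.append(f"{table}({col_desc})")
    return "\n".join(lines)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
schema = get_database_schema(conn)
print(schema)  # -> users(id INTEGER, email TEXT)
```

For PostgreSQL, the equivalent information lives in the information_schema.columns view.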

FAQ

Should I let the LLM write raw SQL or use an abstraction layer?

Raw SQL with guardrails is the most flexible approach and what most production systems use. Abstraction layers like predefined query templates are safer but limit the agent to only the queries you anticipated. Start with raw SQL plus validation, and add templates for common queries as you identify them.

How do I prevent the agent from reading sensitive tables?

Create a dedicated database user with SELECT permissions only on approved tables. This is enforced at the database level, which is more reliable than keyword filtering. Additionally, use a schema tool that only exposes the tables the agent is allowed to query.
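At the database level, this looks like a dedicated role with narrow grants. A PostgreSQL sketch (role, database, and table names are hypothetical):

```sql
-- Hypothetical role and table names; adjust to your schema.
CREATE ROLE agent_readonly LOGIN PASSWORD '...';
GRANT CONNECT ON DATABASE app TO agent_readonly;
GRANT USAGE ON SCHEMA public TO agent_readonly;
GRANT SELECT ON customers, orders, products TO agent_readonly;
-- No grants on sensitive tables (e.g. a payment_methods table), so
-- even a query that slips past keyword filtering cannot read them.
```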

What if the query returns thousands of rows?

Enforce a maximum LIMIT in your validation layer (e.g., cap at 100 rows regardless of what the LLM requests). Truncate results before returning them to the LLM and include a note that results were truncated. Large result sets waste context tokens and rarely improve the agent's answer.
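A hard cap can be implemented as a rewrite step before execution. A regex-based sketch (MAX_ROWS is an assumed policy value, and the pattern deliberately ignores edge cases like LIMIT inside subqueries):

```python
import re

MAX_ROWS = 100

def cap_limit(sql: str) -> str:
    # Clamp an existing LIMIT to MAX_ROWS, or append one if missing.
    match = re.search(r"\bLIMIT\s+(\d+)", sql, re.IGNORECASE)
    if match:
        if int(match.group(1)) > MAX_ROWS:
            return sql[:match.start(1)] + str(MAX_ROWS) + sql[match.end(1):]
        return sql
    return sql.rstrip().rstrip(";") + f" LIMIT {MAX_ROWS}"

print(cap_limit("SELECT * FROM orders LIMIT 5000"))
# -> SELECT * FROM orders LIMIT 100
print(cap_limit("SELECT * FROM orders"))
# -> SELECT * FROM orders LIMIT 100
```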


#Database #PostgreSQL #SQLite #ToolDesign #AIAgents #AgenticAI #LearnAI #AIEngineering
