Learn Agentic AI

Database Query Optimization for Agent Knowledge Retrieval: Indexes, Caching, and Denormalization

Optimize the database layer that powers your AI agent's knowledge retrieval with query profiling, index design, materialized views, and query caching strategies that cut latency from seconds to milliseconds.

Why Database Performance Matters for AI Agents

When an AI agent calls a tool to look up customer data, search a knowledge base, or retrieve transaction history, that tool call usually hits a database. A tool call that takes 50ms feels instant. One that takes 2 seconds makes the entire agent feel broken — and the LLM is waiting idle the entire time.

Most database performance problems in agent systems come from three sources: missing indexes, the N+1 query pattern, and full table scans on large knowledge bases. Fixing these is often the highest-ROI optimization you can make.

Query Profiling: Finding the Slow Queries

Before optimizing, measure. Use EXPLAIN ANALYZE in PostgreSQL to understand exactly how the database executes your queries.

import json

import asyncpg

async def profile_query(pool: asyncpg.Pool, query: str, *args) -> dict:
    """Run EXPLAIN ANALYZE on a query and return the execution plan.

    Note: ANALYZE actually executes the query, so avoid profiling
    writes against production data.
    """
    explain_query = f"EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) {query}"
    result = await pool.fetchval(explain_query, *args)

    # EXPLAIN returns its JSON plan as text — parse it before indexing
    plan = json.loads(result)[0]
    return {
        "total_time_ms": plan["Execution Time"],
        "planning_time_ms": plan["Planning Time"],
        "plan": plan["Plan"],
    }

# Usage
profile = await profile_query(
    pool,
    "SELECT * FROM knowledge_base WHERE content ILIKE $1",
    "%return policy%",
)
print(f"Query took {profile['total_time_ms']:.1f}ms")
# If this shows "Seq Scan" on a large table, you need an index

Index Design for Agent Queries

Agents typically run three types of queries: exact lookup (find by ID), keyword search (find by content), and filtered listing (find by status + date range). Each needs a different index strategy.

# Index creation script for a typical agent knowledge base

INDEXES = [
    # Exact lookup by slug or ID — B-tree (default)
    "CREATE INDEX IF NOT EXISTS idx_kb_slug ON knowledge_base (slug);",

    # Full-text search on content — GIN index with tsvector
    """CREATE INDEX IF NOT EXISTS idx_kb_content_fts
       ON knowledge_base USING GIN (to_tsvector('english', content));""",

    # Filtered listing: category + date for sorted retrieval
    """CREATE INDEX IF NOT EXISTS idx_kb_category_date
       ON knowledge_base (category, updated_at DESC);""",

    # Composite index for agent tool: status + priority + created
    """CREATE INDEX IF NOT EXISTS idx_tickets_status_priority
       ON support_tickets (status, priority DESC, created_at DESC)
       WHERE status = 'open';""",  # Partial index — only indexes open tickets
]

async def apply_indexes(pool: asyncpg.Pool):
    async with pool.acquire() as conn:
        for idx_sql in INDEXES:
            await conn.execute(idx_sql)

Partial indexes (with a WHERE clause) are especially powerful for agent queries. If your agent only searches open tickets, indexing only open tickets makes the index smaller and faster.
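One caveat: the planner uses a partial index only when the query's WHERE clause provably implies the index predicate. A minimal sketch, using the hypothetical support_tickets schema from the script above:

```python
# This query can use idx_tickets_status_priority: the literal
# status = 'open' matches the partial index's WHERE clause exactly.
MATCHES_PARTIAL_INDEX = """
SELECT id, priority, created_at
FROM support_tickets
WHERE status = 'open'
ORDER BY priority DESC, created_at DESC
LIMIT 20;
"""

# This query cannot: closed tickets were never indexed, so the
# planner falls back to a sequential scan or a different index.
MISSES_PARTIAL_INDEX = """
SELECT id FROM support_tickets WHERE status = 'closed' LIMIT 20;
"""
```

Running both through the profile_query helper above should show an Index Scan only for the first.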

Full-Text Search Instead of ILIKE

Agents often need to search knowledge bases by content. The naive approach uses ILIKE, which forces a full table scan on every query.

import asyncpg

# BAD: Full table scan on every search
async def search_knowledge_slow(pool: asyncpg.Pool, query: str) -> list:
    return await pool.fetch(
        "SELECT * FROM knowledge_base WHERE content ILIKE $1 LIMIT 10",
        f"%{query}%",
    )

# GOOD: Full-text search with GIN index
async def search_knowledge_fast(pool: asyncpg.Pool, query: str) -> list:
    return await pool.fetch(
        """SELECT *, ts_rank(
               to_tsvector('english', content),
               plainto_tsquery('english', $1)
           ) AS rank
           FROM knowledge_base
           WHERE to_tsvector('english', content) @@ plainto_tsquery('english', $1)
           ORDER BY rank DESC
           LIMIT 10""",
        query,
    )

On a table with 100,000 rows, the ILIKE query takes 200-500ms. The full-text search query with a GIN index takes 2-10ms.


Eliminating the N+1 Pattern

The N+1 problem is the most common performance killer in agent tools. It happens when you query a list and then loop through it to fetch related data.

import asyncpg

# BAD: N+1 — one query for orders, then one per order for items
async def get_order_details_n_plus_1(pool: asyncpg.Pool, customer_id: str):
    orders = [
        dict(o)  # asyncpg Records are immutable, so copy to dicts first
        for o in await pool.fetch(
            "SELECT * FROM orders WHERE customer_id = $1", customer_id
        )
    ]
    for order in orders:
        # This runs once PER order — 10 orders = 10 extra queries
        order["items"] = await pool.fetch(
            "SELECT * FROM order_items WHERE order_id = $1", order["id"]
        )
    return orders

# GOOD: Single query with JOIN
async def get_order_details_joined(pool: asyncpg.Pool, customer_id: str):
    rows = await pool.fetch(
        """SELECT o.id AS order_id, o.status, o.total,
                  oi.product_name, oi.quantity, oi.price
           FROM orders o
           LEFT JOIN order_items oi ON oi.order_id = o.id
           WHERE o.customer_id = $1
           ORDER BY o.created_at DESC""",
        customer_id,
    )
    # Group items by order
    orders = {}
    for row in rows:
        oid = row["order_id"]
        if oid not in orders:
            orders[oid] = {
                "id": oid, "status": row["status"],
                "total": row["total"], "items": [],
            }
        if row["product_name"]:
            orders[oid]["items"].append({
                "product": row["product_name"],
                "quantity": row["quantity"],
                "price": row["price"],
            })
    return list(orders.values())
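When orders have many items or wide rows, the JOIN duplicates every order column across each item row. An alternative that keeps the query count at two, regardless of how many orders there are, is batching the child fetch with ANY(). A sketch assuming the same orders and order_items tables; the pool argument here is anything with an asyncpg-style async fetch method:

```python
# ALSO GOOD: exactly two queries, however many orders come back
async def get_order_details_batched(pool, customer_id: str) -> list[dict]:
    orders = [dict(o) for o in await pool.fetch(
        "SELECT id, status, total FROM orders WHERE customer_id = $1",
        customer_id,
    )]
    items = await pool.fetch(
        "SELECT order_id, product_name, quantity, price "
        "FROM order_items WHERE order_id = ANY($1)",
        [o["id"] for o in orders],
    )
    # Group the items under their parent orders in one pass
    by_order: dict = {}
    for item in items:
        by_order.setdefault(item["order_id"], []).append(dict(item))
    for order in orders:
        order["items"] = by_order.get(order["id"], [])
    return orders
```

This trades one round trip for a leaner result set; profile both shapes against your data before picking one.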

Materialized Views for Complex Aggregations

If your agent frequently needs aggregated data (e.g., "What are this customer's total purchases by category?"), materialized views pre-compute the result.

# Create a materialized view for customer spending summaries
CREATE_MATVIEW = """
CREATE MATERIALIZED VIEW IF NOT EXISTS customer_spending_summary AS
SELECT
    c.id AS customer_id,
    c.email,
    COUNT(o.id) AS total_orders,
    SUM(o.total) AS lifetime_spend,
    MAX(o.created_at) AS last_order_date,
    AVG(o.total) AS avg_order_value
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.email;

CREATE UNIQUE INDEX ON customer_spending_summary (customer_id);
"""

# Refresh the view periodically (not on every query)
REFRESH_VIEW = "REFRESH MATERIALIZED VIEW CONCURRENTLY customer_spending_summary;"

async def get_spending_summary(pool: asyncpg.Pool, customer_id: str) -> dict | None:
    """Instant lookup instead of expensive aggregation."""
    row = await pool.fetchrow(
        "SELECT * FROM customer_spending_summary WHERE customer_id = $1",
        customer_id,
    )
    return dict(row) if row else None

Refresh the materialized view on a schedule (every 5-15 minutes) rather than on every query. For most agent use cases, slightly stale aggregation data is perfectly acceptable.
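One way to run that schedule is an asyncio background task; the interval and wiring below are illustrative, not a recommendation:

```python
import asyncio

async def refresh_summary_periodically(pool, interval_seconds: float = 600):
    """Refresh the materialized view on a fixed interval until cancelled."""
    while True:
        # CONCURRENTLY requires the unique index on the view created above
        await pool.execute(
            "REFRESH MATERIALIZED VIEW CONCURRENTLY customer_spending_summary;"
        )
        await asyncio.sleep(interval_seconds)

# Usage at application startup:
# refresh_task = asyncio.create_task(refresh_summary_periodically(pool))
```

In production you may prefer pg_cron or an external scheduler so the refresh survives application restarts.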

Query Result Caching

For data that does not change frequently, add an application-level cache between the agent and the database.

import json
import hashlib

class QueryCache:
    def __init__(self, redis_client, default_ttl: int = 300):
        self.redis = redis_client
        self.default_ttl = default_ttl

    def _key(self, query: str, args: tuple) -> str:
        payload = f"{query}:{json.dumps(args, default=str)}"
        return f"qcache:{hashlib.sha256(payload.encode()).hexdigest()}"

    async def cached_fetch(self, pool, query: str, *args, ttl: int | None = None):
        key = self._key(query, args)
        cached = await self.redis.get(key)
        if cached:
            return json.loads(cached)

        rows = await pool.fetch(query, *args)
        result = [dict(r) for r in rows]

        await self.redis.set(
            key, json.dumps(result, default=str),
            ex=ttl or self.default_ttl,
        )
        return result
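Caching also needs an invalidation path for writes. A coarse but simple sketch, assuming a redis.asyncio-style client and the qcache: prefix used by the _key method above (scan_iter is redis-py's async key iterator):

```python
async def invalidate_query_cache(redis_client, prefix: str = "qcache:") -> int:
    """Delete all cached query results; call after writes that change data."""
    deleted = 0
    async for key in redis_client.scan_iter(match=f"{prefix}*"):
        await redis_client.delete(key)
        deleted += 1
    return deleted
```

For finer-grained invalidation, include the table name in the cache key and scan only that table's prefix; a full flush is acceptable when TTLs are short.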

FAQ

How do I know which queries need optimization?

Enable slow query logging in PostgreSQL (log_min_duration_statement = 100 logs queries over 100ms). Then sort by total time (frequency times duration). A query that runs 1,000 times per day at 200ms each is a higher priority than one that runs once at 5 seconds.
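With the pg_stat_statements extension enabled (it must be added to shared_preload_libraries), that prioritization is a single query. A sketch; total_exec_time and mean_exec_time are the column names in PostgreSQL 13 and later:

```python
TOP_QUERIES_BY_TOTAL_TIME = """
SELECT query,
       calls,
       total_exec_time AS total_ms,  -- frequency x duration, aggregated
       mean_exec_time  AS mean_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
"""

# Usage:
# rows = await pool.fetch(TOP_QUERIES_BY_TOTAL_TIME)
```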

Should I use vector search (pgvector) for agent knowledge retrieval?

Use vector search when your agent needs semantic similarity matching — finding content that is conceptually related to the query, not just keyword matches. Use full-text search for exact keyword queries. Many production systems use both: full-text search for precise lookups and vector search for exploratory queries.
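If you do add pgvector, the retrieval query itself stays simple. A hypothetical sketch, assuming an embedding vector column on knowledge_base and a query embedding computed by your model beforehand:

```python
# <=> is pgvector's cosine distance operator; an HNSW or IVFFlat index
# on the embedding column keeps this fast at scale.
SEMANTIC_SEARCH_SQL = """
SELECT slug, title, content,
       embedding <=> $1 AS distance
FROM knowledge_base
ORDER BY embedding <=> $1
LIMIT 10;
"""

# Usage:
# rows = await pool.fetch(SEMANTIC_SEARCH_SQL, query_embedding)
```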

How often should I refresh materialized views?

It depends on how fresh the data needs to be. For agent-facing aggregations like customer spending summaries, refreshing every 5-15 minutes is sufficient; for dashboards, hourly works. Use REFRESH MATERIALIZED VIEW CONCURRENTLY so agents can keep reading the view while it refreshes (CONCURRENTLY requires a unique index on the view, which the script above creates).


#DatabaseOptimization #PostgreSQL #Indexing #QueryCaching #Python #AgenticAI #LearnAI #AIEngineering

CallSphere Team

Expert insights on AI voice agents and customer communication automation.