Learn Agentic AI

Database Query Optimization for Agent Knowledge Retrieval: Indexes, Caching, and Denormalization

Optimize the database layer that powers your AI agent's knowledge retrieval with query profiling, index design, materialized views, and query caching strategies that cut latency from seconds to milliseconds.

Why Database Performance Matters for AI Agents

When an AI agent calls a tool to look up customer data, search a knowledge base, or retrieve transaction history, that tool call usually hits a database. A tool call that takes 50ms feels instant. One that takes 2 seconds makes the entire agent feel broken — and the LLM is waiting idle the entire time.

Most database performance problems in agent systems come from three sources: missing indexes, the N+1 query pattern, and full table scans on large knowledge bases. Fixing these is often the highest-ROI optimization you can make.

Query Profiling: Finding the Slow Queries

Before optimizing, measure. Use EXPLAIN ANALYZE in PostgreSQL to understand exactly how the database executes your queries.

import json

import asyncpg

async def profile_query(pool: asyncpg.Pool, query: str, *args) -> dict:
    """Run EXPLAIN ANALYZE on a query and return the execution plan.

    Note: ANALYZE actually executes the query, so avoid profiling
    writes against production data.
    """
    explain_query = f"EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) {query}"
    result = await pool.fetchval(explain_query, *args)

    # EXPLAIN returns its JSON plan as text — parse it before indexing
    plan = json.loads(result)[0]
    return {
        "total_time_ms": plan["Execution Time"],
        "planning_time_ms": plan["Planning Time"],
        "plan": plan["Plan"],
    }

# Usage
profile = await profile_query(
    pool,
    "SELECT * FROM knowledge_base WHERE content ILIKE $1",
    "%return policy%",
)
print(f"Query took {profile['total_time_ms']:.1f}ms")
# If this shows "Seq Scan" on a large table, you need an index

Index Design for Agent Queries

Agents typically run three types of queries: exact lookup (find by ID), keyword search (find by content), and filtered listing (find by status + date range). Each needs a different index strategy.

# Index creation script for a typical agent knowledge base

INDEXES = [
    # Exact lookup by slug or ID — B-tree (default)
    "CREATE INDEX IF NOT EXISTS idx_kb_slug ON knowledge_base (slug);",

    # Full-text search on content — GIN index with tsvector
    """CREATE INDEX IF NOT EXISTS idx_kb_content_fts
       ON knowledge_base USING GIN (to_tsvector('english', content));""",

    # Filtered listing: category + date for sorted retrieval
    """CREATE INDEX IF NOT EXISTS idx_kb_category_date
       ON knowledge_base (category, updated_at DESC);""",

    # Composite index for agent tool: status + priority + created
    """CREATE INDEX IF NOT EXISTS idx_tickets_status_priority
       ON support_tickets (status, priority DESC, created_at DESC)
       WHERE status = 'open';""",  # Partial index — only indexes open tickets
]

async def apply_indexes(pool: asyncpg.Pool):
    async with pool.acquire() as conn:
        for idx_sql in INDEXES:
            await conn.execute(idx_sql)

Partial indexes (with a WHERE clause) are especially powerful for agent queries. If your agent only searches open tickets, indexing only open tickets makes the index smaller and faster.
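One caveat: the planner uses a partial index only when the query's WHERE clause provably implies the index predicate. A minimal sketch, using the hypothetical support_tickets schema from the script above:

```python
# This query can use idx_tickets_status_priority: the literal
# status = 'open' matches the partial index's WHERE clause exactly.
MATCHES_PARTIAL_INDEX = """
SELECT id, priority, created_at
FROM support_tickets
WHERE status = 'open'
ORDER BY priority DESC, created_at DESC
LIMIT 20;
"""

# This query cannot: closed tickets were never indexed, so the
# planner falls back to a sequential scan or a different index.
MISSES_PARTIAL_INDEX = """
SELECT id FROM support_tickets WHERE status = 'closed' LIMIT 20;
"""
```

Running both through the profile_query helper above should show an Index Scan only for the first.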

Full-Text Search Instead of ILIKE

Agents often need to search knowledge bases by content. The naive approach uses ILIKE, which forces a full table scan on every query.

import asyncpg

# BAD: Full table scan on every search
async def search_knowledge_slow(pool: asyncpg.Pool, query: str) -> list:
    return await pool.fetch(
        "SELECT * FROM knowledge_base WHERE content ILIKE $1 LIMIT 10",
        f"%{query}%",
    )

# GOOD: Full-text search with GIN index
async def search_knowledge_fast(pool: asyncpg.Pool, query: str) -> list:
    return await pool.fetch(
        """SELECT *, ts_rank(
               to_tsvector('english', content),
               plainto_tsquery('english', $1)
           ) AS rank
           FROM knowledge_base
           WHERE to_tsvector('english', content) @@ plainto_tsquery('english', $1)
           ORDER BY rank DESC
           LIMIT 10""",
        query,
    )

On a table with 100,000 rows, the ILIKE query takes 200-500ms. The full-text search query with a GIN index takes 2-10ms.


Eliminating the N+1 Pattern

The N+1 problem is the most common performance killer in agent tools. It happens when you query a list and then loop through it to fetch related data.

import asyncpg

# BAD: N+1 — one query for orders, then one per order for items
async def get_order_details_n_plus_1(pool: asyncpg.Pool, customer_id: str):
    orders = [
        dict(o)  # asyncpg Records are immutable, so copy to dicts first
        for o in await pool.fetch(
            "SELECT * FROM orders WHERE customer_id = $1", customer_id
        )
    ]
    for order in orders:
        # This runs once PER order — 10 orders = 10 extra queries
        order["items"] = await pool.fetch(
            "SELECT * FROM order_items WHERE order_id = $1", order["id"]
        )
    return orders

# GOOD: Single query with JOIN
async def get_order_details_joined(pool: asyncpg.Pool, customer_id: str):
    rows = await pool.fetch(
        """SELECT o.id AS order_id, o.status, o.total,
                  oi.product_name, oi.quantity, oi.price
           FROM orders o
           LEFT JOIN order_items oi ON oi.order_id = o.id
           WHERE o.customer_id = $1
           ORDER BY o.created_at DESC""",
        customer_id,
    )
    # Group items by order
    orders = {}
    for row in rows:
        oid = row["order_id"]
        if oid not in orders:
            orders[oid] = {
                "id": oid, "status": row["status"],
                "total": row["total"], "items": [],
            }
        if row["product_name"]:
            orders[oid]["items"].append({
                "product": row["product_name"],
                "quantity": row["quantity"],
                "price": row["price"],
            })
    return list(orders.values())
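When orders have many items or wide rows, the JOIN duplicates every order column across each item row. An alternative that keeps the query count at two, regardless of how many orders there are, is batching the child fetch with ANY(). A sketch assuming the same orders and order_items tables; the pool argument here is anything with an asyncpg-style async fetch method:

```python
# ALSO GOOD: exactly two queries, however many orders come back
async def get_order_details_batched(pool, customer_id: str) -> list[dict]:
    orders = [dict(o) for o in await pool.fetch(
        "SELECT id, status, total FROM orders WHERE customer_id = $1",
        customer_id,
    )]
    items = await pool.fetch(
        "SELECT order_id, product_name, quantity, price "
        "FROM order_items WHERE order_id = ANY($1)",
        [o["id"] for o in orders],
    )
    # Group the items under their parent orders in one pass
    by_order: dict = {}
    for item in items:
        by_order.setdefault(item["order_id"], []).append(dict(item))
    for order in orders:
        order["items"] = by_order.get(order["id"], [])
    return orders
```

This trades one round trip for a leaner result set; profile both shapes against your data before picking one.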

Materialized Views for Complex Aggregations

If your agent frequently needs aggregated data (e.g., "What are this customer's total purchases by category?"), materialized views pre-compute the result.

# Create a materialized view for customer spending summaries
CREATE_MATVIEW = """
CREATE MATERIALIZED VIEW IF NOT EXISTS customer_spending_summary AS
SELECT
    c.id AS customer_id,
    c.email,
    COUNT(o.id) AS total_orders,
    SUM(o.total) AS lifetime_spend,
    MAX(o.created_at) AS last_order_date,
    AVG(o.total) AS avg_order_value
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.email;

CREATE UNIQUE INDEX ON customer_spending_summary (customer_id);
"""

# Refresh the view periodically (not on every query)
REFRESH_VIEW = "REFRESH MATERIALIZED VIEW CONCURRENTLY customer_spending_summary;"

async def get_spending_summary(pool: asyncpg.Pool, customer_id: str) -> dict | None:
    """Instant lookup instead of expensive aggregation."""
    row = await pool.fetchrow(
        "SELECT * FROM customer_spending_summary WHERE customer_id = $1",
        customer_id,
    )
    return dict(row) if row else None

Refresh the materialized view on a schedule (every 5-15 minutes) rather than on every query. For most agent use cases, slightly stale aggregation data is perfectly acceptable.
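One way to run that schedule is an asyncio background task; the interval and wiring below are illustrative, not a recommendation:

```python
import asyncio

async def refresh_summary_periodically(pool, interval_seconds: float = 600):
    """Refresh the materialized view on a fixed interval until cancelled."""
    while True:
        # CONCURRENTLY requires the unique index on the view created above
        await pool.execute(
            "REFRESH MATERIALIZED VIEW CONCURRENTLY customer_spending_summary;"
        )
        await asyncio.sleep(interval_seconds)

# Usage at application startup:
# refresh_task = asyncio.create_task(refresh_summary_periodically(pool))
```

In production you may prefer pg_cron or an external scheduler so the refresh survives application restarts.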

Query Result Caching

For data that does not change frequently, add an application-level cache between the agent and the database.

import json
import hashlib

class QueryCache:
    def __init__(self, redis_client, default_ttl: int = 300):
        self.redis = redis_client
        self.default_ttl = default_ttl

    def _key(self, query: str, args: tuple) -> str:
        payload = f"{query}:{json.dumps(args, default=str)}"
        return f"qcache:{hashlib.sha256(payload.encode()).hexdigest()}"

    async def cached_fetch(self, pool, query: str, *args, ttl: int | None = None):
        key = self._key(query, args)
        cached = await self.redis.get(key)
        if cached:
            return json.loads(cached)

        rows = await pool.fetch(query, *args)
        result = [dict(r) for r in rows]

        await self.redis.set(
            key, json.dumps(result, default=str),
            ex=ttl or self.default_ttl,
        )
        return result
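Caching also needs an invalidation path for writes. A coarse but simple sketch, assuming a redis.asyncio-style client and the qcache: prefix used by the _key method above (scan_iter is redis-py's async key iterator):

```python
async def invalidate_query_cache(redis_client, prefix: str = "qcache:") -> int:
    """Delete all cached query results; call after writes that change data."""
    deleted = 0
    async for key in redis_client.scan_iter(match=f"{prefix}*"):
        await redis_client.delete(key)
        deleted += 1
    return deleted
```

For finer-grained invalidation, include the table name in the cache key and scan only that table's prefix; a full flush is acceptable when TTLs are short.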

FAQ

How do I know which queries need optimization?

Enable slow query logging in PostgreSQL (log_min_duration_statement = 100 logs queries over 100ms). Then sort by total time (frequency times duration). A query that runs 1,000 times per day at 200ms each is a higher priority than one that runs once at 5 seconds.
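With the pg_stat_statements extension enabled (it must be added to shared_preload_libraries), that prioritization is a single query. A sketch; total_exec_time and mean_exec_time are the column names in PostgreSQL 13 and later:

```python
TOP_QUERIES_BY_TOTAL_TIME = """
SELECT query,
       calls,
       total_exec_time AS total_ms,  -- frequency x duration, aggregated
       mean_exec_time  AS mean_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
"""

# Usage:
# rows = await pool.fetch(TOP_QUERIES_BY_TOTAL_TIME)
```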

Should I use vector search (pgvector) for agent knowledge retrieval?

Use vector search when your agent needs semantic similarity matching — finding content that is conceptually related to the query, not just keyword matches. Use full-text search for exact keyword queries. Many production systems use both: full-text search for precise lookups and vector search for exploratory queries.
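If you do add pgvector, the retrieval query itself stays simple. A hypothetical sketch, assuming an embedding vector column on knowledge_base and a query embedding computed by your model beforehand:

```python
# <=> is pgvector's cosine distance operator; an HNSW or IVFFlat index
# on the embedding column keeps this fast at scale.
SEMANTIC_SEARCH_SQL = """
SELECT slug, title, content,
       embedding <=> $1 AS distance
FROM knowledge_base
ORDER BY embedding <=> $1
LIMIT 10;
"""

# Usage:
# rows = await pool.fetch(SEMANTIC_SEARCH_SQL, query_embedding)
```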

How often should I refresh materialized views?

It depends on how fresh the data needs to be. For agent-facing aggregations like customer spending summaries, refreshing every 5-15 minutes is sufficient; for dashboards, hourly works. Use REFRESH MATERIALIZED VIEW CONCURRENTLY so agents can keep reading the view while it refreshes (CONCURRENTLY requires a unique index on the view, which the script above creates).


#DatabaseOptimization #PostgreSQL #Indexing #QueryCaching #Python #AgenticAI #LearnAI #AIEngineering

CallSphere Team

Expert insights on AI voice agents and customer communication automation.