Database Query Optimization for Agent Knowledge Retrieval: Indexes, Caching, and Denormalization
Optimize the database layer that powers your AI agent's knowledge retrieval with query profiling, index design, materialized views, and query caching strategies that cut latency from seconds to milliseconds.
Why Database Performance Matters for AI Agents
When an AI agent calls a tool to look up customer data, search a knowledge base, or retrieve transaction history, that tool call usually hits a database. A tool call that takes 50ms feels instant. One that takes 2 seconds makes the entire agent feel broken — and the LLM is waiting idle the entire time.
Most database performance problems in agent systems come from three sources: missing indexes, the N+1 query pattern, and full table scans on large knowledge bases. Fixing these is often the highest-ROI optimization you can make.
Query Profiling: Finding the Slow Queries
Before optimizing, measure. Use EXPLAIN ANALYZE in PostgreSQL to understand exactly how the database executes your queries.
```python
import json
import asyncpg

async def profile_query(pool: asyncpg.Pool, query: str, *args) -> dict:
    """Run EXPLAIN ANALYZE on a query and return the execution plan."""
    explain_query = f"EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) {query}"
    result = await pool.fetchval(explain_query, *args)
    # asyncpg returns the json column as a string by default, so decode it
    plan = json.loads(result)[0]
    return {
        "total_time_ms": plan["Execution Time"],
        "planning_time_ms": plan["Planning Time"],
        "plan": plan["Plan"],
    }

# Usage
profile = await profile_query(
    pool,
    "SELECT * FROM knowledge_base WHERE content ILIKE $1",
    "%return policy%",
)
print(f"Query took {profile['total_time_ms']:.1f}ms")
# If the plan shows "Seq Scan" on a large table, you need an index
```
Index Design for Agent Queries
Agents typically run three types of queries: exact lookup (find by ID), keyword search (find by content), and filtered listing (find by status + date range). Each needs a different index strategy.
```python
import asyncpg

# Index creation script for a typical agent knowledge base
INDEXES = [
    # Exact lookup by slug or ID -- B-tree (default)
    "CREATE INDEX IF NOT EXISTS idx_kb_slug ON knowledge_base (slug);",
    # Full-text search on content -- GIN index with tsvector
    """CREATE INDEX IF NOT EXISTS idx_kb_content_fts
       ON knowledge_base USING GIN (to_tsvector('english', content));""",
    # Filtered listing: category + date for sorted retrieval
    """CREATE INDEX IF NOT EXISTS idx_kb_category_date
       ON knowledge_base (category, updated_at DESC);""",
    # Partial index for the agent's ticket tool -- only indexes open tickets.
    # The status column is omitted from the key: the WHERE clause already
    # pins it to 'open', so indexing it would only waste space.
    """CREATE INDEX IF NOT EXISTS idx_tickets_status_priority
       ON support_tickets (priority DESC, created_at DESC)
       WHERE status = 'open';""",
]

async def apply_indexes(pool: asyncpg.Pool):
    async with pool.acquire() as conn:
        for idx_sql in INDEXES:
            await conn.execute(idx_sql)
```
Partial indexes (with a WHERE clause) are especially powerful for agent queries. If your agent only searches open tickets, indexing only open tickets makes the index smaller and faster.
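One caveat worth sketching (using the hypothetical support_tickets schema from above): the planner only considers a partial index when the query's own WHERE clause implies the index predicate, so the agent's tool query has to filter on the same condition explicitly.

```python
# Hypothetical tool query: its WHERE clause matches the partial index
# predicate, so the planner can serve it from the small partial index
# instead of scanning the whole support_tickets table.
OPEN_TICKETS_QUERY = """
SELECT id, priority, created_at
FROM support_tickets
WHERE status = 'open'            -- must imply the index's WHERE clause
ORDER BY priority DESC, created_at DESC
LIMIT 20;
"""
```

A query that filters on `status = 'closed'` (or omits the status filter entirely) cannot use this index and falls back to other access paths.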
Full-Text Search Instead of ILIKE
Agents often need to search knowledge bases by content. The naive approach uses ILIKE, which forces a full table scan on every query.
```python
import asyncpg

# BAD: ILIKE with a leading wildcard forces a full table scan
async def search_knowledge_slow(pool: asyncpg.Pool, query: str) -> list:
    return await pool.fetch(
        "SELECT * FROM knowledge_base WHERE content ILIKE $1 LIMIT 10",
        f"%{query}%",
    )

# GOOD: Full-text search served by the GIN index
async def search_knowledge_fast(pool: asyncpg.Pool, query: str) -> list:
    return await pool.fetch(
        """SELECT *, ts_rank(
                 to_tsvector('english', content),
                 plainto_tsquery('english', $1)
               ) AS rank
           FROM knowledge_base
           WHERE to_tsvector('english', content) @@ plainto_tsquery('english', $1)
           ORDER BY rank DESC
           LIMIT 10""",
        query,
    )
```
On a table with 100,000 rows, the ILIKE query takes 200-500ms. The full-text search query with a GIN index takes 2-10ms.
Eliminating the N+1 Pattern
The N+1 problem is the most common performance killer in agent tools. It happens when you query a list and then loop through it to fetch related data.
```python
import asyncpg

# BAD: N+1 -- one query for orders, then one more per order for its items
async def get_order_details_n_plus_1(pool: asyncpg.Pool, customer_id: str):
    rows = await pool.fetch(
        "SELECT * FROM orders WHERE customer_id = $1", customer_id
    )
    orders = [dict(r) for r in rows]  # asyncpg Records are immutable
    for order in orders:
        # This runs once PER order -- 10 orders = 10 extra round trips
        order["items"] = await pool.fetch(
            "SELECT * FROM order_items WHERE order_id = $1", order["id"]
        )
    return orders

# GOOD: Single query with a JOIN, grouped in Python
async def get_order_details_joined(pool: asyncpg.Pool, customer_id: str):
    rows = await pool.fetch(
        """SELECT o.id AS order_id, o.status, o.total,
                  oi.product_name, oi.quantity, oi.price
           FROM orders o
           LEFT JOIN order_items oi ON oi.order_id = o.id
           WHERE o.customer_id = $1
           ORDER BY o.created_at DESC""",
        customer_id,
    )
    # Group item rows under their parent order
    orders = {}
    for row in rows:
        oid = row["order_id"]
        if oid not in orders:
            orders[oid] = {
                "id": oid, "status": row["status"],
                "total": row["total"], "items": [],
            }
        if row["product_name"] is not None:  # LEFT JOIN yields NULLs for empty orders
            orders[oid]["items"].append({
                "product": row["product_name"],
                "quantity": row["quantity"],
                "price": row["price"],
            })
    return list(orders.values())
```
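As an alternative sketch against the same hypothetical schema, the grouping can be pushed into the database with `json_agg`, so each order arrives as a single row with its items already nested:

```python
# Hypothetical variant: let PostgreSQL build the nested structure.
# FILTER skips the NULL row that LEFT JOIN produces for item-less orders,
# and COALESCE turns that case into an empty JSON array.
JOINED_JSON_QUERY = """
SELECT o.id, o.status, o.total,
       COALESCE(
           json_agg(
               json_build_object(
                   'product', oi.product_name,
                   'quantity', oi.quantity,
                   'price', oi.price
               )
           ) FILTER (WHERE oi.order_id IS NOT NULL),
           '[]'::json
       ) AS items
FROM orders o
LEFT JOIN order_items oi ON oi.order_id = o.id
WHERE o.customer_id = $1
GROUP BY o.id
ORDER BY o.created_at DESC;
"""
```

Grouping by `o.id` (the primary key) lets PostgreSQL reference the other `orders` columns via functional dependency; which variant is faster depends on row counts, so it is worth profiling both with the EXPLAIN helper above.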
Materialized Views for Complex Aggregations
If your agent frequently needs aggregated data (e.g., "What are this customer's total purchases by category?"), materialized views pre-compute the result.
```python
import asyncpg

# Create a materialized view for customer spending summaries
CREATE_MATVIEW = """
CREATE MATERIALIZED VIEW IF NOT EXISTS customer_spending_summary AS
SELECT
    c.id AS customer_id,
    c.email,
    COUNT(o.id) AS total_orders,
    COALESCE(SUM(o.total), 0) AS lifetime_spend,
    MAX(o.created_at) AS last_order_date,
    AVG(o.total) AS avg_order_value
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.email;
"""

# REFRESH ... CONCURRENTLY requires a unique index on the view
CREATE_MATVIEW_INDEX = """
CREATE UNIQUE INDEX IF NOT EXISTS idx_css_customer
ON customer_spending_summary (customer_id);
"""

# Refresh the view periodically (not on every query)
REFRESH_VIEW = "REFRESH MATERIALIZED VIEW CONCURRENTLY customer_spending_summary;"

async def get_spending_summary(pool: asyncpg.Pool, customer_id: str) -> dict | None:
    """Instant lookup instead of an expensive aggregation."""
    row = await pool.fetchrow(
        "SELECT * FROM customer_spending_summary WHERE customer_id = $1",
        customer_id,
    )
    return dict(row) if row else None
```
Refresh the materialized view on a schedule (every 5-15 minutes) rather than on every query. For most agent use cases, slightly stale aggregation data is perfectly acceptable.
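A minimal sketch of that schedule as an asyncio background task, assuming the view name from the example above (the interval and error handling are illustrative):

```python
import asyncio

REFRESH_VIEW = "REFRESH MATERIALIZED VIEW CONCURRENTLY customer_spending_summary;"

async def refresh_matview_forever(pool, interval_seconds: int = 600):
    """Background task: refresh the summary view every interval_seconds."""
    while True:
        try:
            await pool.execute(REFRESH_VIEW)
        except Exception as exc:
            # Log and retry on the next tick rather than crashing the agent
            print(f"matview refresh failed: {exc}")
        await asyncio.sleep(interval_seconds)

# Start it alongside the agent's event loop:
# asyncio.create_task(refresh_matview_forever(pool))
```

Because the loop runs outside any request path, a slow refresh never adds latency to an agent tool call; readers keep hitting the previous snapshot until the refresh commits.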
Query Result Caching
For data that does not change frequently, add an application-level cache between the agent and the database.
```python
import hashlib
import json

class QueryCache:
    """Caches query results in Redis, keyed by a hash of query + args."""

    def __init__(self, redis_client, default_ttl: int = 300):
        self.redis = redis_client
        self.default_ttl = default_ttl

    def _key(self, query: str, args: tuple) -> str:
        payload = f"{query}:{json.dumps(args, default=str)}"
        return f"qcache:{hashlib.sha256(payload.encode()).hexdigest()}"

    async def cached_fetch(self, pool, query: str, *args, ttl: int | None = None):
        key = self._key(query, args)
        cached = await self.redis.get(key)
        if cached:
            return json.loads(cached)
        rows = await pool.fetch(query, *args)
        result = [dict(r) for r in rows]
        await self.redis.set(
            key, json.dumps(result, default=str),
            ex=ttl or self.default_ttl,
        )
        return result
```
FAQ
How do I know which queries need optimization?
Enable slow query logging in PostgreSQL (log_min_duration_statement = 100 logs queries over 100ms). Then sort by total time (frequency times duration). A query that runs 1,000 times per day at 200ms each is a higher priority than one that runs once at 5 seconds.
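If the pg_stat_statements extension is enabled, you can also rank queries by cumulative time directly, which captures exactly that frequency-times-duration priority. A sketch (column names are those used in PostgreSQL 13+):

```python
# Assumes the pg_stat_statements extension is installed and enabled
# (shared_preload_libraries = 'pg_stat_statements').
TOP_QUERIES_SQL = """
SELECT query,
       calls,
       total_exec_time AS total_ms,
       mean_exec_time  AS mean_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
"""
```

The queries at the top of this list are where index work pays off first, regardless of how fast any single execution looks.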
Should I use vector search (pgvector) for agent knowledge retrieval?
Use vector search when your agent needs semantic similarity matching — finding content that is conceptually related to the query, not just keyword matches. Use full-text search for exact keyword queries. Many production systems use both: full-text search for precise lookups and vector search for exploratory queries.
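A sketch of what the combined shape can look like, assuming pgvector is installed and the knowledge_base table has a hypothetical `embedding vector(...)` column; here both signals are computed and the final blending is left to the application:

```python
# Hypothetical hybrid retrieval: keyword rank plus vector distance.
# `<=>` is pgvector's cosine-distance operator; $2 is the query embedding.
HYBRID_SEARCH_SQL = """
SELECT id, content,
       ts_rank(to_tsvector('english', content),
               plainto_tsquery('english', $1)) AS keyword_rank,
       embedding <=> $2 AS vector_distance
FROM knowledge_base
ORDER BY embedding <=> $2
LIMIT 10;
"""
```

A common pattern is to over-fetch candidates this way, then re-rank in the agent layer with a weighted combination of keyword_rank and vector_distance tuned for your domain.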
How often should I refresh materialized views?
It depends on how fresh the data needs to be. For agent-facing aggregations like customer spending summaries, refreshing every 5-15 minutes is sufficient. For dashboards, every hour works. Use REFRESH MATERIALIZED VIEW CONCURRENTLY to avoid locking the view during refresh, which lets agents continue reading during the refresh process.
#DatabaseOptimization #PostgreSQL #Indexing #QueryCaching #Python #AgenticAI #LearnAI #AIEngineering
CallSphere Team