Designing Database Schemas for AI Agent Systems: Conversations, Messages, and Metadata
Learn how to design normalized database schemas for AI agent systems that store conversations, messages, tool calls, and metadata with proper indexing and query patterns for production workloads.
Why Schema Design Matters for AI Agents
Every AI agent system generates structured data: conversations with users, individual messages, tool call invocations, token usage metrics, and metadata about agent behavior. A poorly designed schema leads to slow queries, data integrity issues, and painful migrations as your agent system scales.
The core challenge is that agent data is both relational (a message belongs to a conversation, which belongs to a user) and semi-structured (tool call arguments vary by tool, agent configurations change over time). A good schema embraces both aspects.
The Core Tables
A production agent system typically needs five core tables. Here is the schema in SQL:
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email TEXT UNIQUE NOT NULL,
display_name TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE agents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL,
model TEXT NOT NULL DEFAULT 'gpt-4o',
instructions TEXT NOT NULL,
config JSONB NOT NULL DEFAULT '{}',
is_active BOOLEAN NOT NULL DEFAULT true,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE conversations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
agent_id UUID NOT NULL REFERENCES agents(id),
title TEXT,
status TEXT NOT NULL DEFAULT 'active'
CHECK (status IN ('active', 'archived', 'deleted')),
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE messages (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
conversation_id UUID NOT NULL
REFERENCES conversations(id) ON DELETE CASCADE,
role TEXT NOT NULL CHECK (role IN ('user', 'assistant', 'system', 'tool')),
content TEXT,
token_count_input INTEGER,
token_count_output INTEGER,
model TEXT,
latency_ms INTEGER,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE tool_calls (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
message_id UUID NOT NULL REFERENCES messages(id) ON DELETE CASCADE,
tool_name TEXT NOT NULL,
arguments JSONB NOT NULL DEFAULT '{}',
result JSONB,
status TEXT NOT NULL DEFAULT 'pending'
CHECK (status IN ('pending', 'success', 'error')),
duration_ms INTEGER,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
This design normalizes the data properly. Messages reference conversations, tool calls reference messages, and each table has a single responsibility.
Essential Indexes
Without indexes, queries against these tables degrade rapidly as data grows. Add these indexes at creation time:
-- Conversations: find all conversations for a user, newest first
CREATE INDEX idx_conversations_user_created
ON conversations(user_id, created_at DESC);
-- Messages: load all messages in a conversation in order
CREATE INDEX idx_messages_conversation_created
ON messages(conversation_id, created_at);
-- Tool calls: find all tool calls for a message
CREATE INDEX idx_tool_calls_message
ON tool_calls(message_id);
-- Tool calls: analytics by tool name
CREATE INDEX idx_tool_calls_name_created
ON tool_calls(tool_name, created_at DESC);
The compound index on conversations(user_id, created_at DESC) serves both the filter and the sort in a single index scan, avoiding a separate sort step.
See AI Voice Agents Handle Real Calls
Book a free demo or calculate how much you can save with AI voice automation.
Query Patterns
Loading a conversation with its messages is the most common query. Use a single query with a JOIN rather than two separate round-trips:
SELECT
c.id AS conversation_id,
c.title,
m.id AS message_id,
m.role,
m.content,
m.created_at
FROM conversations c
JOIN messages m ON m.conversation_id = c.id
WHERE c.id = $1
ORDER BY m.created_at;
For conversation list pages, always paginate with cursor-based pagination rather than OFFSET:
SELECT id, title, updated_at
FROM conversations
WHERE user_id = $1
AND created_at < $2 -- cursor from previous page
ORDER BY created_at DESC
LIMIT 20;
Cursor-based pagination maintains consistent performance regardless of how deep into the result set the user navigates.
Metadata Column Strategy
The metadata JSONB column on conversations is intentional. Agent systems accumulate context that does not fit neatly into predefined columns: session tags, A/B test variants, client device information, or custom labels. JSONB handles this without schema changes. However, if you find yourself querying a specific JSONB key frequently, promote it to a dedicated column with a proper index.
FAQ
When should I use UUID versus auto-incrementing integer primary keys?
UUIDs are preferred for agent systems because they allow ID generation at the application layer without a database round-trip, support distributed inserts across replicas, and do not leak information about record counts. The storage overhead (16 bytes versus 4 bytes for integer) is negligible for most agent workloads.
Should messages store the full content or reference an external blob store?
For messages under 100KB, store content directly in the table. PostgreSQL handles text fields efficiently and keeping data co-located simplifies queries and backups. For systems that handle large file attachments or images, store a reference URL in the message and the actual content in object storage like S3.
How do I handle schema changes when the agent config evolves?
Use the JSONB config column on the agents table for frequently changing configuration. For structural changes to core tables, use versioned migrations with tools like Alembic or Prisma Migrate. Never alter production tables directly.
#DatabaseDesign #SchemaDesign #PostgreSQL #AIAgents #DataModeling #AgenticAI #LearnAI #AIEngineering
CallSphere Team
Expert insights on AI voice agents and customer communication automation.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.