
Soft Deletes and Data Retention for AI Agent Conversations: Compliance-Ready Patterns

Implement soft deletes, data retention policies, GDPR-compliant deletion, and conversation archival for AI agent systems with PostgreSQL patterns, automated cleanup, and audit trails.

Why Soft Deletes for Agent Systems

When a user deletes a conversation, the immediate instinct is to DELETE FROM conversations WHERE id = $1. But in AI agent systems, hard deletes create three problems. First, conversation data is often needed for debugging agent failures after the fact. Second, compliance regulations may require retaining records for a defined period. Third, cascading hard deletes across conversations, messages, tool calls, and analytics can cause long-running transactions that lock tables.

Soft deletes mark records as deleted without removing them from the database. The data remains available for auditing and debugging but is hidden from normal application queries.

Implementing Soft Deletes

Add soft delete columns to your agent tables:

ALTER TABLE conversations
    ADD COLUMN deleted_at TIMESTAMPTZ,
    ADD COLUMN deleted_by UUID REFERENCES users(id);

ALTER TABLE messages
    ADD COLUMN deleted_at TIMESTAMPTZ;

-- Partial index: only index non-deleted rows for active queries
CREATE INDEX idx_conversations_active_user
    ON conversations(user_id, created_at DESC)
    WHERE deleted_at IS NULL;

The partial index is critical. Without it, deleted rows stay in the index forever, bloating it as data accumulates. With it, PostgreSQL indexes only active rows, keeping the index small and active-conversation queries fast. Note that queries must include the same WHERE deleted_at IS NULL predicate for the planner to use the partial index.

Application Layer: Default Scoping

Ensure all application queries exclude deleted records by default. In SQLAlchemy, use a custom query class or a mixin:

from datetime import datetime, timezone

from sqlalchemy import select
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import Mapped, mapped_column


class SoftDeleteMixin:
    deleted_at: Mapped[datetime | None] = mapped_column(default=None)
    deleted_by: Mapped[str | None] = mapped_column(
        UUID(as_uuid=False), default=None
    )

    def soft_delete(self, user_id: str):
        self.deleted_at = datetime.now(timezone.utc)
        self.deleted_by = user_id

    @property
    def is_deleted(self) -> bool:
        return self.deleted_at is not None


class Conversation(SoftDeleteMixin, Base):
    __tablename__ = "conversations"
    # ... other columns


# Always filter out deleted rows in queries
async def get_user_conversations(
    session: AsyncSession, user_id: str
) -> list[Conversation]:
    stmt = (
        select(Conversation)
        .where(Conversation.user_id == user_id)
        .where(Conversation.deleted_at.is_(None))  # Soft delete filter
        .order_by(Conversation.created_at.desc())
    )
    result = await session.execute(stmt)
    return list(result.scalars().all())
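The write path is the mirror image of the read path: a "delete" becomes an UPDATE that stamps deleted_at and deleted_by. A minimal sketch with asyncpg, assuming the schema above; the rows_affected helper (which parses asyncpg's status tag) is a hypothetical convenience, not part of the asyncpg API:

```python
def rows_affected(status: str) -> int:
    # asyncpg's execute() returns a status tag such as 'UPDATE 1'
    # or 'DELETE 0' rather than an integer row count.
    return int(status.rsplit(" ", 1)[-1])


async def soft_delete_conversation(pool, conversation_id: str, user_id: str) -> bool:
    """Mark a conversation deleted. Returns False if it was already
    deleted or does not exist (pool is an asyncpg.Pool)."""
    status = await pool.execute("""
        UPDATE conversations
        SET deleted_at = now(), deleted_by = $2
        WHERE id = $1 AND deleted_at IS NULL
    """, conversation_id, user_id)
    return rows_affected(status) == 1
```

The deleted_at IS NULL guard makes the operation idempotent: a double-click or retried request updates zero rows instead of overwriting the original deletion timestamp.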

For Prisma, middleware can automatically add the soft delete filter (newer Prisma versions favor client extensions via $extends over $use, but the shape is similar):

prisma.$use(async (params, next) => {
  if (params.model === "Conversation") {
    if (params.action === "findMany" || params.action === "findFirst") {
      params.args.where = {
        ...params.args.where,
        deletedAt: null,
      };
    }
    if (params.action === "delete") {
      params.action = "update";
      params.args.data = { deletedAt: new Date() };
    }
  }
  return next(params);
});

Data Retention Policies

Define retention policies based on conversation status:


-- Create a retention policy table
CREATE TABLE retention_policies (
    id SERIAL PRIMARY KEY,
    data_type TEXT NOT NULL UNIQUE,
    retention_days INTEGER NOT NULL,
    action TEXT NOT NULL DEFAULT 'hard_delete'
        CHECK (action IN ('hard_delete', 'archive', 'anonymize'))
);

INSERT INTO retention_policies (data_type, retention_days, action) VALUES
    ('soft_deleted_conversations', 30, 'hard_delete'),
    ('archived_conversations', 365, 'archive'),
    ('active_conversations', 730, 'anonymize'),
    ('tool_execution_logs', 90, 'hard_delete');
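The policy rows can drive cleanup generically rather than being hardcoded in the job. A hypothetical sketch of a helper that renders one policy row into SQL; the POLICY_TARGETS mapping from data_type to table and timestamp column is an assumption about your schema:

```python
# Hypothetical mapping: which table and timestamp column each policy governs.
POLICY_TARGETS = {
    "soft_deleted_conversations": ("conversations", "deleted_at"),
    "tool_execution_logs": ("tool_executions", "created_at"),
}


def build_cleanup_sql(data_type: str, retention_days: int, action: str) -> str:
    """Render the SQL statement enforcing one retention_policies row."""
    table, ts_col = POLICY_TARGETS[data_type]
    cutoff = f"now() - INTERVAL '{retention_days} days'"
    if action == "hard_delete":
        # For soft-deleted rows, ts_col is deleted_at; rows with NULL
        # deleted_at never satisfy the comparison, so they are safe.
        return f"DELETE FROM {table} WHERE {ts_col} < {cutoff}"
    if action == "anonymize":
        return (
            f"UPDATE {table} SET metadata = '{{}}'::jsonb "
            f"WHERE {ts_col} < {cutoff}"
        )
    raise ValueError(f"unsupported action: {action}")
```

A job can then SELECT from retention_policies and execute each rendered statement, so adding a policy is an INSERT rather than a code change.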

Implement an automated cleanup job:

import asyncpg


async def enforce_retention_policies(pool: asyncpg.Pool):
    """Run daily to enforce data retention policies."""

    # Hard delete soft-deleted conversations older than 30 days.
    # asyncpg returns a status tag such as 'DELETE 42', not a row count.
    # Assumes messages.conversation_id has ON DELETE CASCADE.
    deleted = await pool.execute("""
        DELETE FROM conversations
        WHERE deleted_at IS NOT NULL
          AND deleted_at < now() - INTERVAL '30 days'
    """)

    # Hard delete old tool execution logs
    deleted_tools = await pool.execute("""
        DELETE FROM tool_executions
        WHERE created_at < now() - INTERVAL '90 days'
    """)

    # Anonymize old active conversations (GDPR). Setting status here is
    # required; otherwise the same rows are rewritten on every run.
    anonymized = await pool.execute("""
        UPDATE conversations
        SET metadata = '{}'::jsonb,
            title = 'Archived Conversation',
            status = 'anonymized'
        WHERE deleted_at IS NULL
          AND created_at < now() - INTERVAL '730 days'
          AND status != 'anonymized'
    """)

    return {
        "conversations_deleted": deleted,
        "tool_logs_deleted": deleted_tools,
        "conversations_anonymized": anonymized,
    }

GDPR Right to Erasure

When a user requests data deletion under GDPR, you must remove all personally identifiable information. This is different from a soft delete — it is a permanent transformation:

async def gdpr_erase_user(pool: asyncpg.Pool, user_id: str):
    """Permanently erase all PII for a user (GDPR Art. 17)."""
    async with pool.acquire() as conn:
        async with conn.transaction():
            # Anonymize messages (keep for agent analytics)
            await conn.execute("""
                UPDATE messages
                SET content = '[REDACTED]'
                WHERE conversation_id IN (
                    SELECT id FROM conversations
                    WHERE user_id = $1
                )
                AND role = 'user'
            """, user_id)

            # Remove user PII from conversations
            await conn.execute("""
                UPDATE conversations
                SET metadata = '{}'::jsonb,
                    title = '[REDACTED]'
                WHERE user_id = $1
            """, user_id)

            # Anonymize the user record
            await conn.execute("""
                UPDATE users
                SET email = 'deleted_' || id || '@redacted.local',
                    display_name = '[DELETED USER]'
                WHERE id = $1
            """, user_id)

            # Log the erasure for compliance audit
            await conn.execute("""
                INSERT INTO audit_log (action, entity_type, entity_id, details)
                VALUES ('gdpr_erasure', 'user', $1,
                    jsonb_build_object('erased_at', now()))
            """, user_id)

This approach retains conversation structure for analytics (message counts, tool usage patterns) while removing all content that could identify the user.
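GDPR requests usually end with confirming to the user that erasure is complete, so it is worth verifying the transformation actually ran. A minimal post-erasure check sketch; the helper is hypothetical and assumes rows are fetched as dicts matching the columns touched above:

```python
def verify_user_erased(user_row: dict, user_messages: list[dict]) -> list[str]:
    """Return a list of problems found after a GDPR erasure (empty = clean)."""
    problems = []
    if not user_row["email"].endswith("@redacted.local"):
        problems.append("email not anonymized")
    if user_row["display_name"] != "[DELETED USER]":
        problems.append("display_name not anonymized")
    for msg in user_messages:
        # Only user-authored messages were redacted above.
        if msg["role"] == "user" and msg["content"] != "[REDACTED]":
            problems.append(f"message {msg['id']} still has content")
    return problems
```

Running this check after the transaction, and recording its result in the same audit_log, gives you evidence for a regulator that the erasure both happened and was verified.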

Archival to Cold Storage

For very old conversations, move them out of the primary database entirely:

import json


async def archive_to_s3(
    pool: asyncpg.Pool, s3_client, bucket: str, cutoff_days: int = 365
):
    """Copy old archived conversations to S3, then remove them.

    Assumes s3_client is an async S3 client (e.g. aioboto3) and that
    messages.conversation_id has ON DELETE CASCADE.
    """
    rows = await pool.fetch("""
        SELECT c.id, c.user_id, c.created_at,
            json_agg(json_build_object(
                'role', m.role,
                'content', m.content,
                'created_at', m.created_at
            ) ORDER BY m.created_at) AS messages
        FROM conversations c
        JOIN messages m ON m.conversation_id = c.id
        WHERE c.created_at < now() - make_interval(days => $1)
          AND c.status = 'archived'
        GROUP BY c.id
        LIMIT 1000
    """, cutoff_days)

    for row in rows:
        key = f"archives/{row['user_id']}/{row['id']}.json"
        await s3_client.put_object(
            Bucket=bucket, Key=key,
            Body=json.dumps(dict(row), default=str),
        )

    archived_ids = [row["id"] for row in rows]
    if archived_ids:
        await pool.execute("""
            DELETE FROM conversations WHERE id = ANY($1)
        """, archived_ids)

FAQ

Do soft deletes impact query performance?

Yes, but partial indexes mitigate this almost entirely. A partial index with WHERE deleted_at IS NULL means active queries never scan deleted rows. The main cost is storage — deleted rows remain on disk until hard-deleted. Run the retention cleanup job regularly to manage this.

Should I soft delete messages individually or cascade from conversations?

Soft delete at the conversation level only. When a user deletes a conversation, mark the conversation as deleted and exclude it from queries. Individual message soft deletes add complexity without clear value — users rarely delete single messages from agent conversations.

How do I handle GDPR deletion requests for data shared with third-party LLM providers?

GDPR erasure applies to data you control. Document in your privacy policy that conversation content is sent to LLM providers for processing. Most providers (OpenAI, Anthropic) offer data deletion APIs or have zero-retention API tiers. For compliance, use the zero-retention tier and maintain records of which providers processed which conversations.


#SoftDeletes #DataRetention #GDPR #Compliance #AIAgents #PostgreSQL #AgenticAI #LearnAI #AIEngineering

CallSphere Team

Expert insights on AI voice agents and customer communication automation.
