Data Loss Prevention for AI Agents: Preventing Sensitive Data Leakage

The Unique DLP Challenge with AI Agents

Traditional DLP systems monitor file transfers, email attachments, and database exports. AI agents open a new exfiltration vector that can sidestep all of these controls. An employee can paste a customer list into an agent prompt, ask the agent to summarize financial data from a confidential document, or instruct it to email internal metrics to an external address.

The risk is bidirectional. Sensitive data can leak into the agent (through prompts and tool inputs) and out of the agent (through responses, tool calls, and downstream API calls). A comprehensive DLP strategy must scan both directions.

Building a DLP Scanner

The scanner inspects text for patterns that match sensitive data categories: personally identifiable information, financial data, health records, credentials, and proprietary business data.

import re
from dataclasses import dataclass
from enum import Enum


class Sensitivity(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


class Action(str, Enum):
    ALLOW = "allow"
    WARN = "warn"
    REDACT = "redact"
    BLOCK = "block"


@dataclass
class DLPRule:
    name: str
    pattern: re.Pattern
    sensitivity: Sensitivity
    action: Action
    description: str


DLP_RULES = [
    DLPRule(
        name="ssn",
        pattern=re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        sensitivity=Sensitivity.CRITICAL,
        action=Action.BLOCK,
        description="US Social Security Number",
    ),
    DLPRule(
        name="credit_card",
        pattern=re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),
        sensitivity=Sensitivity.CRITICAL,
        action=Action.BLOCK,
        description="Credit card number",
    ),
    DLPRule(
        name="email_address",
        pattern=re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
        sensitivity=Sensitivity.MEDIUM,
        action=Action.WARN,
        description="Email address",
    ),
    DLPRule(
        name="api_key",
        pattern=re.compile(r"(?:sk|pk|api)[_-][A-Za-z0-9]{20,}"),
        sensitivity=Sensitivity.CRITICAL,
        action=Action.BLOCK,
        description="API key or secret",
    ),
    DLPRule(
        name="aws_access_key",
        pattern=re.compile(r"AKIA[0-9A-Z]{16}"),
        sensitivity=Sensitivity.CRITICAL,
        action=Action.BLOCK,
        description="AWS access key ID",
    ),
]


@dataclass
class ScanResult:
    rule_name: str
    matched_text: str
    action: Action
    sensitivity: Sensitivity
    position: tuple[int, int]


class DLPScanner:
    def __init__(self, rules: list[DLPRule]):
        self.rules = rules

    def scan(self, text: str) -> list[ScanResult]:
        findings = []
        for rule in self.rules:
            for match in rule.pattern.finditer(text):
                findings.append(ScanResult(
                    rule_name=rule.name,
                    matched_text=match.group(),
                    action=rule.action,
                    sensitivity=rule.sensitivity,
                    position=(match.start(), match.end()),
                ))
        return findings

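    def redact(self, text: str) -> str:
        # Only REDACT and BLOCK findings are masked; WARN findings pass through.
        hits = sorted(
            (f for f in self.scan(text) if f.action in (Action.REDACT, Action.BLOCK)),
            key=lambda f: f.position,
        )
        # Merge overlapping matches (e.g. an AWS key inside a longer API key) so
        # no replacement is applied at offsets that were already rewritten.
        spans: list[list] = []
        for f in hits:
            start, end = f.position
            if spans and start < spans[-1][1]:
                spans[-1][1] = max(spans[-1][1], end)
            else:
                spans.append([start, end, f.rule_name])
        # Replace right to left so earlier offsets remain valid.
        for start, end, name in reversed(spans):
            text = text[:start] + f"[{name.upper()}_REDACTED]" + text[end:]
        return text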

Integrating DLP Into the Agent Pipeline

The scanner runs at two points: when the user submits a prompt (inbound DLP) and when the agent generates a response or invokes a tool (outbound DLP). The gateway from the previous post is the ideal integration point.

from fastapi import HTTPException


class DLPMiddleware:
    def __init__(self, scanner: DLPScanner, audit_logger):
        self.scanner = scanner
        self.audit = audit_logger

    async def check_inbound(self, user_id: str, agent_id: str, text: str) -> str:
        findings = self.scanner.scan(text)
        if not findings:
            return text

        blocked = [f for f in findings if f.action == Action.BLOCK]
        if blocked:
            await self.audit.log_dlp_violation(
                user_id=user_id,
                agent_id=agent_id,
                direction="inbound",
                # Mask the raw value so the audit log does not become a leak.
                findings=[{**vars(f), "matched_text": "[MASKED]"} for f in blocked],
            )
            raise HTTPException(
                status_code=422,
                detail=(
                    "Your message contains sensitive data that cannot "
                    "be processed. Please remove: "
                    + ", ".join(f.rule_name for f in blocked)
                ),
            )

        warnings = [f for f in findings if f.action == Action.WARN]
        if warnings:
            await self.audit.log_dlp_warning(
                user_id=user_id, agent_id=agent_id,
                direction="inbound",
                findings=[{**vars(f), "matched_text": "[MASKED]"} for f in warnings],
            )

        redactable = [f for f in findings if f.action == Action.REDACT]
        if redactable:
            text = self.scanner.redact(text)

        return text

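    async def check_outbound(self, agent_id: str, text: str) -> str:
        findings = self.scanner.scan(text)
        blocked = [f for f in findings if f.action == Action.BLOCK]
        if blocked:
            await self.audit.log_dlp_violation(
                user_id="system", agent_id=agent_id,
                direction="outbound",
                # Mask the raw value so the audit log does not become a leak.
                findings=[{**vars(f), "matched_text": "[MASKED]"} for f in blocked],
            )
        # Outbound text is not rejected outright: both BLOCK and REDACT
        # findings are masked before the response or tool call leaves.
        if any(f.action in (Action.REDACT, Action.BLOCK) for f in findings):
            return self.scanner.redact(text)
        return text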
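check_outbound takes a flat string, but tool invocations usually carry structured JSON arguments. A minimal sketch, assuming tool arguments arrive as nested dicts and lists: walk every string value, scan it, and report the JSON path of each hit so a blocked call can be traced. The helper names `iter_strings` and `scan_tool_args`, and the single SSN pattern standing in for the full rule set, are illustrative assumptions.

```python
import re
from typing import Any, Iterator

# Illustrative stand-in for the full DLP_RULES list.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def iter_strings(value: Any, path: str = "$") -> Iterator[tuple[str, str]]:
    """Yield (json_path, string) for every string nested in dicts/lists."""
    if isinstance(value, str):
        yield path, value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from iter_strings(item, f"{path}.{key}")
    elif isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            yield from iter_strings(item, f"{path}[{index}]")
    # Numbers, booleans, and None carry no free text and are skipped.


def scan_tool_args(args: dict) -> list[str]:
    """Return the JSON paths of argument values that contain an SSN."""
    return [path for path, text in iter_strings(args) if SSN.search(text)]


call_args = {
    "to": "partner@example.com",
    "body": {"lines": ["Q3 summary", "Employee SSN 123-45-6789"]},
}
print(scan_tool_args(call_args))  # ['$.body.lines[1]']
```

This sketch scans values only; dictionary keys could be added to the walk if agents are allowed to build argument keys from user input.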

Named Entity Recognition for Context-Aware DLP

Regex catches formatted patterns like SSNs and credit card numbers. But sensitive data also appears as unstructured text: "John Smith's salary is $185,000" or "the patient was diagnosed with diabetes." Use NER models to detect person names, monetary values, medical terms, and organization names, then apply policies based on the entity type and the agent's data access level.
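The policy half of this step can be sketched independently of any particular NER model. Assume the model returns entities as (label, text) pairs; labels such as PERSON, ORG, and MONEY match spaCy's English models, while MEDICAL_CONDITION is a hypothetical custom label. The `Clearance` levels and the `ENTITY_POLICY` table are illustrative, not a recommended policy.

```python
from dataclasses import dataclass
from enum import IntEnum


class Clearance(IntEnum):
    """Hypothetical data access levels an agent can be granted."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass
class Entity:
    label: str  # entity type reported by the NER model
    text: str


# Minimum clearance an agent needs before it may see each entity type.
# Unknown labels default to PUBLIC (allowed); a stricter setup might invert this.
ENTITY_POLICY = {
    "PERSON": Clearance.INTERNAL,
    "ORG": Clearance.INTERNAL,
    "MONEY": Clearance.CONFIDENTIAL,
    "MEDICAL_CONDITION": Clearance.RESTRICTED,
}


def redact_entities(text: str, entities: list[Entity], agent_level: Clearance) -> str:
    """Replace entities the agent is not cleared for with a typed placeholder."""
    for ent in entities:
        required = ENTITY_POLICY.get(ent.label, Clearance.PUBLIC)
        if agent_level < required:
            # Replaces every occurrence of the entity text, not just one offset.
            text = text.replace(ent.text, f"[{ent.label}_REDACTED]")
    return text


sentence = "John Smith's salary is $185,000"
found = [Entity("PERSON", "John Smith"), Entity("MONEY", "$185,000")]
print(redact_entities(sentence, found, Clearance.INTERNAL))
# John Smith's salary is [MONEY_REDACTED]
```

An INTERNAL agent keeps the name but loses the salary; the same sentence sent to a PUBLIC agent would have both entities replaced.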

Exception Handling and Override Workflows

Not every match is a real violation. An agent discussing credit card processing might legitimately reference card number formats. Build an exception workflow where authorized users can request a DLP bypass for specific use cases. Each exception is logged, time-limited, and requires approval from a data steward.
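A minimal sketch of the exception record and the check that consults it, assuming each exception is scoped to one agent and one rule name. `DLPException`, `is_exempt`, and the field names are hypothetical; storage, the approval UI, and logging are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class DLPException:
    """A time-limited bypass of one DLP rule for one agent (hypothetical schema)."""
    agent_id: str
    rule_name: str
    requested_by: str
    approved_by: Optional[str]  # data steward; None while the request is pending
    expires_at: datetime


def is_exempt(exceptions: list[DLPException], agent_id: str, rule_name: str,
              now: Optional[datetime] = None) -> bool:
    """True only if an approved, unexpired exception covers this agent and rule."""
    now = now or datetime.now(timezone.utc)
    return any(
        e.agent_id == agent_id
        and e.rule_name == rule_name
        and e.approved_by is not None
        and now < e.expires_at
        for e in exceptions
    )


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
grant = DLPException("billing-agent", "credit_card", "alice", "steward-bob",
                     expires_at=now + timedelta(days=7))
print(is_exempt([grant], "billing-agent", "credit_card", now))  # True
print(is_exempt([grant], "billing-agent", "credit_card", now + timedelta(days=8)))  # False
```

In the middleware, findings whose rule is exempt for the calling agent would be filtered out before the BLOCK check, and each use of an exception would itself be written to the audit log.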

FAQ

How do you handle DLP for agents that process documents and images?

For documents, extract text before scanning. For images, use OCR to extract visible text and scan the result. Also scan document metadata, which can contain author names, revision history, and internal file paths. For agents that generate images, implement a separate content moderation pipeline that checks for watermarks, logos, or embedded text containing sensitive data.
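A small sketch of scanning every text surface of a document separately, assuming an upstream parser has already produced the body text, a flat metadata dict, and OCR output for each embedded image. The `ParsedDocument` shape and the metadata keys are hypothetical, and one email pattern stands in for the full rule set. Labeling each segment lets a finding say whether the leak came from the body, a metadata field, or an image.

```python
import re
from dataclasses import dataclass, field

# Illustrative stand-in for the full rule set: flag email addresses.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass
class ParsedDocument:
    """Output of a hypothetical upstream parser (text extraction + OCR)."""
    body: str
    metadata: dict[str, str] = field(default_factory=dict)
    ocr_text: list[str] = field(default_factory=list)  # one entry per image


def segments(doc: ParsedDocument) -> list[tuple[str, str]]:
    """Label every piece of text so a finding can say where it came from."""
    parts = [("body", doc.body)]
    parts += [(f"metadata.{key}", value) for key, value in doc.metadata.items()]
    parts += [(f"image[{i}]", text) for i, text in enumerate(doc.ocr_text)]
    return parts


def scan_document(doc: ParsedDocument) -> list[str]:
    """Return the labels of segments that contain a match."""
    return [label for label, text in segments(doc) if EMAIL.search(text)]


doc = ParsedDocument(
    body="Quarterly summary attached.",
    metadata={"author": "j.doe@corp.example", "path": "/finance/q3.docx"},
    ocr_text=["Slide 1", "Contact: cfo@corp.example"],
)
print(scan_document(doc))  # ['metadata.author', 'image[1]']
```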

Does DLP scanning add noticeable latency to agent responses?

Regex-based scanning adds less than a millisecond for typical prompt sizes. NER-based scanning adds 10 to 50 milliseconds depending on the model and text length. This is negligible compared to LLM inference time. Run DLP scanning concurrently with other pre-processing steps to minimize any impact.
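One way to run DLP concurrently with other pre-processing, assuming an asyncio-based gateway: hand the synchronous scan to a worker thread with `asyncio.to_thread` (Python 3.9+) and `asyncio.gather` it with the other steps. `fetch_user_context` is a hypothetical stand-in for whatever else the gateway does before calling the model.

```python
import asyncio
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def dlp_scan(text: str) -> list[str]:
    """Synchronous regex scan; returns the matched strings."""
    return SSN.findall(text)


async def fetch_user_context(user_id: str) -> dict:
    """Hypothetical stand-in for other (I/O-bound) pre-processing."""
    await asyncio.sleep(0.01)
    return {"user_id": user_id, "tier": "standard"}


async def preprocess(user_id: str, prompt: str) -> tuple[list[str], dict]:
    # The scan and the context lookup are awaited together, so the added
    # latency is roughly that of the slower step rather than their sum.
    findings, context = await asyncio.gather(
        asyncio.to_thread(dlp_scan, prompt),
        fetch_user_context(user_id),
    )
    return findings, context


findings, context = asyncio.run(preprocess("u-42", "My SSN is 123-45-6789"))
print(findings)  # ['123-45-6789']
```

For a sub-millisecond regex scan the thread hop buys little and the scan could simply run inline; the pattern matters more once a slower NER model is in the path.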

How do you keep DLP rules updated as new sensitive data patterns emerge?

Maintain DLP rules in a versioned configuration store, not in application code. Platform security teams update rules through the admin dashboard. New rules take effect immediately without redeploying the gateway. Run new rules in "audit only" mode for a week before enabling blocking, so you can tune false positive rates.
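A sketch of rules loaded from configuration rather than code, assuming a JSON document with a version number and a per-rule `mode` of either `audit` (log only) or `enforce`. The schema, the field names, and the `employee_id` rule are hypothetical; the point is that a new rule ships in audit mode and moves to enforcement through a config change, not a redeploy.

```python
import json
import re
from dataclasses import dataclass

# In practice this would come from the versioned configuration store.
CONFIG = r"""
{
  "version": 7,
  "rules": [
    {"name": "ssn", "pattern": "\\b\\d{3}-\\d{2}-\\d{4}\\b",
     "action": "block", "mode": "enforce"},
    {"name": "employee_id", "pattern": "\\bEMP-\\d{6}\\b",
     "action": "block", "mode": "audit"}
  ]
}
"""


@dataclass(frozen=True)
class ConfiguredRule:
    name: str
    pattern: re.Pattern
    action: str
    mode: str  # "audit" = log only, "enforce" = apply the action


def load_rules(raw: str) -> tuple[int, list[ConfiguredRule]]:
    """Parse the config and compile each pattern once at load time."""
    doc = json.loads(raw)
    rules = [
        ConfiguredRule(item["name"], re.compile(item["pattern"]),
                       item["action"], item["mode"])
        for item in doc["rules"]
    ]
    return doc["version"], rules


def effective_action(rule: ConfiguredRule) -> str:
    # Rules in audit mode are only logged, never blocked or redacted.
    return rule.action if rule.mode == "enforce" else "log_only"


version, rules = load_rules(CONFIG)
text = "Ticket from EMP-004512 about SSN 123-45-6789"
for rule in rules:
    if rule.pattern.search(text):
        print(version, rule.name, effective_action(rule))
# 7 ssn block
# 7 employee_id log_only
```

Logging the config version alongside each finding makes it possible to tie a false positive back to the exact rule set that produced it.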
