Data Loss Prevention for AI Agents: Preventing Sensitive Data Leakage
Implement data loss prevention policies for AI agents that detect and block sensitive data in prompts and responses. Covers DLP policy engines, content scanning with regex and NER, blocking rules, and exception handling workflows.
The Unique DLP Challenge with AI Agents
Traditional DLP systems monitor file transfers, email attachments, and database exports. AI agents create a new exfiltration vector that bypasses all of these controls. An employee can paste a customer list into an agent prompt, ask it to summarize financial data from a confidential document, or instruct it to email internal metrics to an external address.
The risk is bidirectional. Sensitive data can leak into the agent (through prompts and tool inputs) and out of the agent (through responses, tool calls, and downstream API calls). A comprehensive DLP strategy must scan both directions.
Building a DLP Scanner
The scanner inspects text for patterns that match sensitive data categories: personally identifiable information, financial data, health records, credentials, and proprietary business data.
import re
from dataclasses import dataclass
from enum import Enum


class Sensitivity(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


class Action(str, Enum):
    ALLOW = "allow"
    WARN = "warn"
    REDACT = "redact"
    BLOCK = "block"


@dataclass
class DLPRule:
    name: str
    pattern: re.Pattern
    sensitivity: Sensitivity
    action: Action
    description: str


DLP_RULES = [
    DLPRule(
        name="ssn",
        pattern=re.compile(r"\d{3}-\d{2}-\d{4}"),
        sensitivity=Sensitivity.CRITICAL,
        action=Action.BLOCK,
        description="US Social Security Number",
    ),
    DLPRule(
        name="credit_card",
        pattern=re.compile(r"(?:\d{4}[- ]?){3}\d{4}"),
        sensitivity=Sensitivity.CRITICAL,
        action=Action.BLOCK,
        description="Credit card number",
    ),
    DLPRule(
        name="email_address",
        pattern=re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
        sensitivity=Sensitivity.MEDIUM,
        action=Action.WARN,
        description="Email address",
    ),
    DLPRule(
        name="api_key",
        pattern=re.compile(r"(?:sk|pk|api)[_-][A-Za-z0-9]{20,}"),
        sensitivity=Sensitivity.CRITICAL,
        action=Action.BLOCK,
        description="API key or secret",
    ),
    DLPRule(
        name="aws_access_key",
        pattern=re.compile(r"AKIA[0-9A-Z]{16}"),
        sensitivity=Sensitivity.CRITICAL,
        action=Action.BLOCK,
        description="AWS access key ID",
    ),
]


@dataclass
class ScanResult:
    rule_name: str
    matched_text: str
    action: Action
    sensitivity: Sensitivity
    position: tuple[int, int]


class DLPScanner:
    def __init__(self, rules: list[DLPRule]):
        self.rules = rules

    def scan(self, text: str) -> list[ScanResult]:
        findings = []
        for rule in self.rules:
            for match in rule.pattern.finditer(text):
                findings.append(ScanResult(
                    rule_name=rule.name,
                    matched_text=match.group(),
                    action=rule.action,
                    sensitivity=rule.sensitivity,
                    position=(match.start(), match.end()),
                ))
        return findings

    def redact(self, text: str) -> str:
        # Replace from the end of the string backward so that the offsets
        # of earlier findings stay valid after each substitution.
        findings = sorted(self.scan(text), key=lambda f: f.position[0], reverse=True)
        for finding in findings:
            if finding.action in (Action.REDACT, Action.BLOCK):
                start, end = finding.position
                placeholder = f"[{finding.rule_name.upper()}_REDACTED]"
                text = text[:start] + placeholder + text[end:]
        return text
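To make the scan and redact behavior concrete, here is a condensed, standalone sketch of the same idea. It is not the full scanner: the rules are reduced to hypothetical (name, pattern) pairs and the `RULES`, `scan`, and `redact` names exist only for this illustration.

```python
import re

# Two of the patterns from the rule list, reduced to (name, pattern) pairs.
# This flat structure is an assumption made for brevity.
RULES = [
    ("ssn", re.compile(r"\d{3}-\d{2}-\d{4}")),
    ("aws_access_key", re.compile(r"AKIA[0-9A-Z]{16}")),
]


def scan(text: str) -> list[tuple[str, int, int]]:
    """Return (rule_name, start, end) for every match of every rule."""
    return [
        (name, m.start(), m.end())
        for name, pattern in RULES
        for m in pattern.finditer(text)
    ]


def redact(text: str) -> str:
    """Replace matches right to left so earlier offsets stay valid."""
    for name, start, end in sorted(scan(text), key=lambda f: f[1], reverse=True):
        text = text[:start] + f"[{name.upper()}_REDACTED]" + text[end:]
    return text


sample = "SSN 123-45-6789, key AKIAABCDEFGHIJKLMNOP"
findings = scan(sample)
redacted = redact(sample)

print(findings)  # [('ssn', 4, 15), ('aws_access_key', 21, 41)]
print(redacted)  # SSN [SSN_REDACTED], key [AWS_ACCESS_KEY_REDACTED]
```

Working from the last match to the first matters because each placeholder changes the length of the string. Going left to right would shift every later offset.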
Integrating DLP Into the Agent Pipeline
The scanner runs at two points: when the user submits a prompt (inbound DLP) and when the agent generates a response or invokes a tool (outbound DLP). The gateway from the previous post is the ideal integration point.
from fastapi import HTTPException


class DLPMiddleware:
    def __init__(self, scanner: DLPScanner, audit_logger):
        self.scanner = scanner
        self.audit = audit_logger

    async def check_inbound(self, user_id: str, agent_id: str, text: str) -> str:
        findings = self.scanner.scan(text)
        if not findings:
            return text

        # BLOCK findings reject the request outright.
        # Note: f.__dict__ includes matched_text, so the audit log stores
        # the sensitive value itself; restrict access to it accordingly.
        blocked = [f for f in findings if f.action == Action.BLOCK]
        if blocked:
            await self.audit.log_dlp_violation(
                user_id=user_id,
                agent_id=agent_id,
                direction="inbound",
                findings=[f.__dict__ for f in blocked],
            )
            raise HTTPException(
                status_code=422,
                detail=(
                    "Your message contains sensitive data that cannot "
                    "be processed. Please remove: "
                    + ", ".join(f.rule_name for f in blocked)
                ),
            )

        # WARN findings are logged but the text passes through.
        warnings = [f for f in findings if f.action == Action.WARN]
        if warnings:
            await self.audit.log_dlp_warning(
                user_id=user_id, agent_id=agent_id,
                direction="inbound", findings=[f.__dict__ for f in warnings],
            )

        # REDACT findings are replaced with placeholders.
        redactable = [f for f in findings if f.action == Action.REDACT]
        if redactable:
            text = self.scanner.redact(text)
        return text

    async def check_outbound(self, agent_id: str, text: str) -> str:
        findings = self.scanner.scan(text)
        blocked = [f for f in findings if f.action == Action.BLOCK]
        if blocked:
            await self.audit.log_dlp_violation(
                user_id="system", agent_id=agent_id,
                direction="outbound",
                findings=[f.__dict__ for f in blocked],
            )
        # Outbound text is never rejected; BLOCK and REDACT matches are
        # both replaced with placeholders before the response leaves.
        if any(f.action in (Action.BLOCK, Action.REDACT) for f in findings):
            return self.scanner.redact(text)
        return text
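Outbound scanning also has to cover tool calls, whose arguments are usually nested structures rather than a single string. The following is a minimal sketch of one way to handle that, assuming tool arguments arrive as JSON-like dicts and lists. The `PATTERNS` table, `redact_tool_args`, and the `send_email` tool are hypothetical names used only for illustration.

```python
import re
from typing import Any

# Hypothetical pattern table; a real deployment would reuse its DLP rules.
PATTERNS = {
    "ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
}


def redact_string(value: str) -> str:
    """Replace every pattern match in one string with a placeholder."""
    for name, pattern in PATTERNS.items():
        value = pattern.sub(f"[{name.upper()}_REDACTED]", value)
    return value


def redact_tool_args(args: Any) -> Any:
    """Return a copy of args with every nested string value redacted."""
    if isinstance(args, str):
        return redact_string(args)
    if isinstance(args, dict):
        return {key: redact_tool_args(val) for key, val in args.items()}
    if isinstance(args, (list, tuple)):
        return [redact_tool_args(item) for item in args]
    return args  # numbers, booleans, and None pass through unchanged


tool_call = {
    "tool": "send_email",
    "arguments": {
        "to": ["partner@example.com"],
        "body": "Customer SSN is 123-45-6789.",
        "priority": 2,
    },
}
safe_call = redact_tool_args(tool_call)
print(safe_call["arguments"]["body"])  # Customer SSN is [SSN_REDACTED].
```

Because the function builds a new structure instead of mutating the input, the original call remains available for audit logging.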
Named Entity Recognition for Context-Aware DLP
Regex catches formatted patterns like SSNs and credit card numbers. But sensitive data also appears as unstructured text: "John Smith's salary is $185,000" or "the patient was diagnosed with diabetes." Use NER models to detect person names, monetary values, medical terms, and organization names, then apply policies based on the entity type and the agent's data access level.
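The policy side of this can be sketched independently of any particular NER model. The example below assumes an extractor that returns labeled entities. The labels (`PERSON`, `MONEY`, `ORG`), the access levels, and the `stub_extractor` stand-in are all assumptions for illustration. A real system would plug in an actual NER model where the stub is.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Entity:
    label: str  # entity type, e.g. "PERSON" or "MONEY"
    text: str   # the matched span


# Hypothetical policy: entity labels each agent access level may handle.
ALLOWED_LABELS = {
    "public": set(),
    "internal": {"ORG"},
    "hr": {"ORG", "PERSON", "MONEY"},
}


def violations(
    text: str,
    access_level: str,
    extract: Callable[[str], list[Entity]],
) -> list[Entity]:
    """Return entities the given access level is not allowed to handle.

    Unknown access levels get an empty allow-list, so they fail closed.
    """
    allowed = ALLOWED_LABELS.get(access_level, set())
    return [e for e in extract(text) if e.label not in allowed]


def stub_extractor(text: str) -> list[Entity]:
    """Stand-in for a real NER model; it only knows two fixed spans."""
    known = [("PERSON", "John Smith"), ("MONEY", "$185,000")]
    return [Entity(label, span) for label, span in known if span in text]


sentence = "John Smith's salary is $185,000"
internal_hits = violations(sentence, "internal", stub_extractor)
hr_hits = violations(sentence, "hr", stub_extractor)
print([e.label for e in internal_hits])  # ['PERSON', 'MONEY']
print(hr_hits)                           # []
```

Keeping the extractor behind a plain callable makes it possible to swap models without touching the policy table.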
Exception Handling and Override Workflows
Not every match is a real violation. An agent discussing credit card processing might legitimately reference card number formats. Build an exception workflow where authorized users can request a DLP bypass for specific use cases. Each exception is logged, time-limited, and requires approval from a data steward.
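One way to model such exceptions is as small records that carry their own approval and expiry. This is a sketch under assumptions: the `DLPException` fields and the `is_exempt` helper are hypothetical, and persistence and the approval flow itself are out of scope.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DLPException:
    user_id: str
    rule_name: str      # the single rule this exception bypasses
    approved_by: str    # data steward who approved the request
    reason: str
    expires_at: datetime


def is_exempt(
    exceptions: list[DLPException],
    user_id: str,
    rule_name: str,
    now: datetime,
) -> bool:
    """True if the user holds an unexpired exception for this rule."""
    return any(
        e.user_id == user_id and e.rule_name == rule_name and now < e.expires_at
        for e in exceptions
    )


granted_at = datetime(2025, 1, 1, tzinfo=timezone.utc)
exceptions = [
    DLPException(
        user_id="u-42",
        rule_name="credit_card",
        approved_by="steward-7",
        reason="Writing documentation about card number formats",
        expires_at=granted_at + timedelta(days=7),
    )
]

during = is_exempt(exceptions, "u-42", "credit_card", granted_at + timedelta(days=4))
after = is_exempt(exceptions, "u-42", "credit_card", granted_at + timedelta(days=8))
other_rule = is_exempt(exceptions, "u-42", "ssn", granted_at + timedelta(days=4))
print(during, after, other_rule)  # True False False
```

Scoping each exception to one user and one rule keeps a bypass from silently widening, and passing `now` explicitly makes expiry easy to test.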
FAQ
How do you handle DLP for agents that process documents and images?
For documents, extract text before scanning. For images, use OCR to extract visible text and scan the result. Also scan document metadata, which can contain author names, revision history, and internal file paths. For agents that generate images, implement a separate content moderation pipeline that checks for watermarks, logos, or embedded text containing sensitive data.
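As a rough illustration of scanning metadata alongside the body, the sketch below assumes text extraction (and OCR, for images) has already happened upstream and produced an `ExtractedDocument`. That class, the `PATTERNS` table, and `scan_document` are hypothetical names, not part of any library.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ExtractedDocument:
    body: str                                      # text from extraction or OCR
    metadata: dict[str, str] = field(default_factory=dict)


# Hypothetical pattern table for this example.
PATTERNS = {
    "ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "email_address": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}


def scan_document(doc: ExtractedDocument) -> list[tuple[str, str]]:
    """Return (location, rule_name) pairs for the body and each metadata field."""
    sources = {"body": doc.body}
    sources.update({f"metadata.{key}": value for key, value in doc.metadata.items()})
    return [
        (location, name)
        for location, text in sources.items()
        for name, pattern in PATTERNS.items()
        if pattern.search(text)
    ]


doc = ExtractedDocument(
    body="Quarterly summary, no identifiers.",
    metadata={"author": "jane.doe@example.com", "title": "Q3"},
)
hits = scan_document(doc)
print(hits)  # [('metadata.author', 'email_address')]
```

Here the body is clean but the author field is not, which is the kind of leak a body-only scan would miss.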
Does DLP scanning add noticeable latency to agent responses?
Regex-based scanning adds less than a millisecond for typical prompt sizes. NER-based scanning adds 10 to 50 milliseconds depending on the model and text length. This is negligible compared to LLM inference time. Run DLP scanning concurrently with other pre-processing steps to minimize any impact.
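A minimal sketch of running the scan concurrently with another pre-processing step, using `asyncio.gather` and `asyncio.to_thread` (Python 3.9+). The `dlp_scan`, `load_session_context`, and `preprocess` functions are hypothetical stand-ins, and the sleep simulates an I/O call.

```python
import asyncio
import re

SSN = re.compile(r"\d{3}-\d{2}-\d{4}")


def dlp_scan(text: str) -> bool:
    """Synchronous scan; returns True if the text looks sensitive."""
    return SSN.search(text) is not None


async def load_session_context(user_id: str) -> dict:
    """Stand-in for an I/O-bound pre-processing step (e.g. a database read)."""
    await asyncio.sleep(0.01)
    return {"user_id": user_id}


async def preprocess(user_id: str, prompt: str) -> tuple[bool, dict]:
    # Run the scan in a worker thread while the context load awaits I/O.
    flagged, context = await asyncio.gather(
        asyncio.to_thread(dlp_scan, prompt),
        load_session_context(user_id),
    )
    return flagged, context


flagged, context = asyncio.run(preprocess("u-1", "My SSN is 123-45-6789"))
print(flagged, context)  # True {'user_id': 'u-1'}
```

For sub-millisecond regex scans the thread hop is probably not worth it. The pattern is more useful for the slower NER step, which would otherwise stall the event loop.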
How do you keep DLP rules updated as new sensitive data patterns emerge?
Maintain DLP rules in a versioned configuration store, not in application code. Platform security teams update rules through the admin dashboard. New rules take effect immediately without redeploying the gateway. Run new rules in "audit only" mode for a week before enabling blocking, so you can tune false positive rates.
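The "audit only" idea can be sketched as a per-rule mode in the stored configuration. The JSON layout, the `mode` field, and the `log` action below are assumptions for illustration, not a standard format. An inline string stands in for the configuration store.

```python
import json
import re
from dataclasses import dataclass

# Hypothetical versioned rule document, as it might be fetched from a store.
RULES_JSON = """
{
  "version": 7,
  "rules": [
    {"name": "ssn", "pattern": "[0-9]{3}-[0-9]{2}-[0-9]{4}",
     "action": "block", "mode": "enforce"},
    {"name": "aws_access_key", "pattern": "AKIA[0-9A-Z]{16}",
     "action": "block", "mode": "audit"}
  ]
}
"""


@dataclass(frozen=True)
class LoadedRule:
    name: str
    pattern: re.Pattern
    action: str
    audit_only: bool


def load_rules(raw: str) -> tuple[int, list[LoadedRule]]:
    """Parse the rule document; rules without a mode default to enforce."""
    config = json.loads(raw)
    rules = [
        LoadedRule(
            name=r["name"],
            pattern=re.compile(r["pattern"]),
            action=r["action"],
            audit_only=r.get("mode", "enforce") == "audit",
        )
        for r in config["rules"]
    ]
    return config["version"], rules


def evaluate(text: str, rules: list[LoadedRule]) -> list[tuple[str, str]]:
    """Return (rule_name, effective_action); audit-only rules only log."""
    return [
        (rule.name, "log" if rule.audit_only else rule.action)
        for rule in rules
        if rule.pattern.search(text)
    ]


version, rules = load_rules(RULES_JSON)
decisions = evaluate("id 123-45-6789 key AKIAABCDEFGHIJKLMNOP", rules)
print(version, decisions)  # 7 [('ssn', 'block'), ('aws_access_key', 'log')]
```

Promoting a rule from audit to enforcement is then a one-field change in the stored document, with the version number recording when it happened.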
CallSphere Team