AI Agent Safety Levels: Designing Graduated Autonomy for Different Risk Contexts
Implement a tiered safety system for AI agents with graduated autonomy levels, approval workflows, monitoring intensity, and automatic rollback capabilities matched to risk context.
Why One-Size-Fits-All Safety Does Not Work
Not every AI agent action carries the same risk. Answering a factual question about store hours is fundamentally different from approving a $50,000 insurance claim or modifying a patient's medication schedule. Applying the same level of oversight to all actions either over-constrains low-risk operations (killing efficiency) or under-constrains high-risk ones (creating danger).
Graduated autonomy solves this by matching the level of agent freedom to the risk level of each specific action. This is the same principle used in aviation (autopilot handles cruising but pilots handle takeoff and landing) and medicine (nurses handle routine checks but doctors handle diagnoses).
Defining Safety Levels
Design five distinct safety levels that govern how much independence the agent has:
```python
from enum import IntEnum
from dataclasses import dataclass, field

class SafetyLevel(IntEnum):
    L0_FULL_AUTO = 0        # Agent acts without any human involvement
    L1_LOG_AND_ACT = 1      # Agent acts, logs for async review
    L2_NOTIFY_AND_ACT = 2   # Agent acts, notifies a human immediately
    L3_PROPOSE_AND_WAIT = 3 # Agent proposes, waits for human approval
    L4_HUMAN_ONLY = 4       # Agent prepares information; a human decides and acts

@dataclass
class SafetyPolicy:
    level: SafetyLevel
    max_financial_impact: float
    requires_approval_from: list[str]
    monitoring_frequency: str  # "none", "sampled", "all"
    rollback_enabled: bool
    max_actions_per_hour: int
    cooldown_after_error_seconds: int
    escalation_path: list[str]

SAFETY_POLICIES = {
    SafetyLevel.L0_FULL_AUTO: SafetyPolicy(
        level=SafetyLevel.L0_FULL_AUTO,
        max_financial_impact=0,
        requires_approval_from=[],
        monitoring_frequency="sampled",
        rollback_enabled=False,
        max_actions_per_hour=1000,
        cooldown_after_error_seconds=0,
        escalation_path=["system_alert"],
    ),
    SafetyLevel.L1_LOG_AND_ACT: SafetyPolicy(
        level=SafetyLevel.L1_LOG_AND_ACT,
        max_financial_impact=100,
        requires_approval_from=[],
        monitoring_frequency="all",
        rollback_enabled=True,
        max_actions_per_hour=500,
        cooldown_after_error_seconds=60,
        escalation_path=["team_lead", "system_alert"],
    ),
    SafetyLevel.L3_PROPOSE_AND_WAIT: SafetyPolicy(
        level=SafetyLevel.L3_PROPOSE_AND_WAIT,
        max_financial_impact=50000,
        requires_approval_from=["domain_expert", "manager"],
        monitoring_frequency="all",
        rollback_enabled=True,
        max_actions_per_hour=50,
        cooldown_after_error_seconds=3600,
        escalation_path=["manager", "director", "legal"],
    ),
    # Policies for L2_NOTIFY_AND_ACT and L4_HUMAN_ONLY follow the same
    # pattern and are omitted here for brevity.
}
```
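A useful property of basing the levels on IntEnum: they order numerically, so guard code can use plain threshold comparisons instead of enumerating cases. A quick sketch (the helper name is illustrative, and the enum is repeated so the snippet runs standalone):

```python
from enum import IntEnum

class SafetyLevel(IntEnum):  # repeated from above
    L0_FULL_AUTO = 0
    L1_LOG_AND_ACT = 1
    L2_NOTIFY_AND_ACT = 2
    L3_PROPOSE_AND_WAIT = 3
    L4_HUMAN_ONLY = 4

def needs_human_in_loop(level: SafetyLevel) -> bool:
    # IntEnum members compare as plain ints, so one threshold expresses
    # "L3 and above require a human before anything happens."
    return level >= SafetyLevel.L3_PROPOSE_AND_WAIT

print(needs_human_in_loop(SafetyLevel.L1_LOG_AND_ACT))  # False
print(needs_human_in_loop(SafetyLevel.L4_HUMAN_ONLY))   # True
```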
Classifying Actions by Risk
Build an action classifier that assigns the appropriate safety level to each agent action:
```python
@dataclass
class ActionRiskProfile:
    action_type: str
    reversible: bool
    financial_impact: float
    affects_personal_data: bool
    regulatory_implications: bool
    user_impact_scope: str  # "single_user", "team", "organization", "public"

def classify_action_risk(profile: ActionRiskProfile) -> SafetyLevel:
    """Assign a safety level based on the action's risk characteristics."""
    risk_score = 0.0

    # Financial impact scoring
    if profile.financial_impact > 10000:
        risk_score += 4
    elif profile.financial_impact > 1000:
        risk_score += 3
    elif profile.financial_impact > 100:
        risk_score += 2
    elif profile.financial_impact > 0:
        risk_score += 1

    # Reversibility
    if not profile.reversible:
        risk_score += 2

    # Data sensitivity
    if profile.affects_personal_data:
        risk_score += 2

    # Regulatory exposure
    if profile.regulatory_implications:
        risk_score += 3

    # Scope of impact
    scope_scores = {"single_user": 0, "team": 1, "organization": 2, "public": 3}
    risk_score += scope_scores.get(profile.user_impact_scope, 0)

    # Map score to safety level
    if risk_score >= 10:
        return SafetyLevel.L4_HUMAN_ONLY
    elif risk_score >= 7:
        return SafetyLevel.L3_PROPOSE_AND_WAIT
    elif risk_score >= 4:
        return SafetyLevel.L2_NOTIFY_AND_ACT
    elif risk_score >= 2:
        return SafetyLevel.L1_LOG_AND_ACT
    else:
        return SafetyLevel.L0_FULL_AUTO
```
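A worked example helps sanity-check the thresholds. The snippet below repeats the definitions above in condensed form so it runs on its own; the two sample profiles are hypothetical:

```python
from dataclasses import dataclass
from enum import IntEnum

class SafetyLevel(IntEnum):  # condensed from the section above
    L0_FULL_AUTO = 0
    L1_LOG_AND_ACT = 1
    L2_NOTIFY_AND_ACT = 2
    L3_PROPOSE_AND_WAIT = 3
    L4_HUMAN_ONLY = 4

@dataclass
class ActionRiskProfile:
    action_type: str
    reversible: bool
    financial_impact: float
    affects_personal_data: bool
    regulatory_implications: bool
    user_impact_scope: str

def classify_action_risk(profile: ActionRiskProfile) -> SafetyLevel:
    score = 0.0
    if profile.financial_impact > 10000:
        score += 4
    elif profile.financial_impact > 1000:
        score += 3
    elif profile.financial_impact > 100:
        score += 2
    elif profile.financial_impact > 0:
        score += 1
    if not profile.reversible:
        score += 2
    if profile.affects_personal_data:
        score += 2
    if profile.regulatory_implications:
        score += 3
    score += {"single_user": 0, "team": 1,
              "organization": 2, "public": 3}.get(profile.user_impact_scope, 0)
    for threshold, level in [(10, SafetyLevel.L4_HUMAN_ONLY),
                             (7, SafetyLevel.L3_PROPOSE_AND_WAIT),
                             (4, SafetyLevel.L2_NOTIFY_AND_ACT),
                             (2, SafetyLevel.L1_LOG_AND_ACT)]:
        if score >= threshold:
            return level
    return SafetyLevel.L0_FULL_AUTO

# A $500 refund: +2 financial, +2 personal data -> score 4 -> L2
refund_level = classify_action_risk(ActionRiskProfile(
    "issue_refund", reversible=True, financial_impact=500,
    affects_personal_data=True, regulatory_implications=False,
    user_impact_scope="single_user"))
print(refund_level.name)  # L2_NOTIFY_AND_ACT

# Changing a medication schedule: +2 irreversible, +2 personal data,
# +3 regulatory -> score 7 -> L3
med_level = classify_action_risk(ActionRiskProfile(
    "update_medication_schedule", reversible=False, financial_impact=0,
    affects_personal_data=True, regulatory_implications=True,
    user_impact_scope="single_user"))
print(med_level.name)  # L3_PROPOSE_AND_WAIT
```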
Implementing the Approval Workflow
For L3 (propose and wait) actions, the agent must pause and request human approval:
```python
import asyncio
from datetime import datetime, timezone

@dataclass
class ApprovalRequest:
    request_id: str
    agent_id: str
    action_description: str
    proposed_action: dict
    risk_profile: ActionRiskProfile
    safety_level: SafetyLevel
    required_approvers: list[str]
    approvals_received: list[dict] = field(default_factory=list)
    status: str = "pending"  # "pending", "approved", "rejected", "expired"
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    expires_at: datetime | None = None

    def is_fully_approved(self) -> bool:
        approved_by = {a["approver"] for a in self.approvals_received
                       if a["decision"] == "approve"}
        return all(req in approved_by for req in self.required_approvers)

async def execute_with_approval(agent, action: dict,
                                risk_profile: ActionRiskProfile) -> dict:
    # generate_id, submit_approval_request, and notify_stakeholders are
    # platform helpers assumed to exist elsewhere in the codebase.
    safety_level = classify_action_risk(risk_profile)
    policy = SAFETY_POLICIES.get(safety_level)

    if safety_level == SafetyLevel.L4_HUMAN_ONLY:
        return {
            "status": "deferred_to_human",
            "message": "This action requires human execution.",
            "prepared_data": action,
        }

    if safety_level == SafetyLevel.L3_PROPOSE_AND_WAIT:
        request = ApprovalRequest(
            request_id=generate_id(),
            agent_id=agent.id,
            action_description=action.get("description", ""),
            proposed_action=action,
            risk_profile=risk_profile,
            safety_level=safety_level,
            required_approvers=policy.requires_approval_from,
        )
        await submit_approval_request(request)
        return {
            "status": "awaiting_approval",
            "request_id": request.request_id,
            "required_approvers": policy.requires_approval_from,
        }

    # L0, L1, L2: execute with appropriate logging
    result = await agent.execute_action(action)
    if safety_level >= SafetyLevel.L2_NOTIFY_AND_ACT:
        await notify_stakeholders(agent.id, action, result)
    return {"status": "executed", "result": result, "safety_level": safety_level.name}
```
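The approval bookkeeping can be exercised in isolation. A minimal sketch, with ApprovalRequest trimmed to just the fields the demo touches so it runs standalone:

```python
from dataclasses import dataclass, field

@dataclass
class ApprovalRequest:  # trimmed to the fields this demo exercises
    request_id: str
    required_approvers: list[str]
    approvals_received: list[dict] = field(default_factory=list)
    status: str = "pending"

    def is_fully_approved(self) -> bool:
        approved_by = {a["approver"] for a in self.approvals_received
                       if a["decision"] == "approve"}
        return all(req in approved_by for req in self.required_approvers)

req = ApprovalRequest("req-001", required_approvers=["domain_expert", "manager"])
req.approvals_received.append({"approver": "domain_expert", "decision": "approve"})
print(req.is_fully_approved())  # False: the manager has not signed off yet

req.approvals_received.append({"approver": "manager", "decision": "approve"})
if req.is_fully_approved():
    req.status = "approved"
print(req.status)  # approved
```

Note that approvals are a set intersection, not a count: two approvals from the same role do not satisfy a two-role requirement.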
Automatic Rollback
For reversible actions, implement automatic rollback when anomalies are detected:
```python
from datetime import datetime, timedelta, timezone

@dataclass
class RollbackCapability:
    action_id: str
    rollback_function: str
    rollback_params: dict
    created_at: datetime
    expires_at: datetime  # Rollback is only possible within a time window

class RollbackManager:
    def __init__(self):
        self.rollback_registry: dict[str, RollbackCapability] = {}

    def register(self, action_id: str, rollback_fn: str, params: dict,
                 ttl_hours: int = 24) -> None:
        now = datetime.now(timezone.utc)
        self.rollback_registry[action_id] = RollbackCapability(
            action_id=action_id,
            rollback_function=rollback_fn,
            rollback_params=params,
            created_at=now,
            expires_at=now + timedelta(hours=ttl_hours),
        )

    async def rollback(self, action_id: str, reason: str) -> dict:
        capability = self.rollback_registry.get(action_id)
        if not capability:
            return {"success": False, "error": "No rollback registered for this action"}

        now = datetime.now(timezone.utc)
        if now > capability.expires_at:
            return {"success": False, "error": "Rollback window has expired"}

        # execute_rollback is an assumed helper that dispatches to the named
        # compensating function (e.g. re-apply a charge, restore a record).
        result = await execute_rollback(capability.rollback_function,
                                        capability.rollback_params)
        return {"success": True, "rolled_back_action": action_id,
                "reason": reason, "result": result}
```
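Using the manager looks like the following. Here `execute_rollback` is stubbed out (it is assumed to dispatch to a real compensating function), and the classes are condensed so the snippet runs on its own:

```python
import asyncio
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

async def execute_rollback(fn_name: str, params: dict) -> dict:
    # Stub standing in for the real dispatcher described above.
    return {"executed": fn_name, "params": params}

@dataclass
class RollbackCapability:
    action_id: str
    rollback_function: str
    rollback_params: dict
    created_at: datetime
    expires_at: datetime

class RollbackManager:  # condensed from the section above
    def __init__(self):
        self.rollback_registry: dict[str, RollbackCapability] = {}

    def register(self, action_id, rollback_fn, params, ttl_hours=24):
        now = datetime.now(timezone.utc)
        self.rollback_registry[action_id] = RollbackCapability(
            action_id, rollback_fn, params, now, now + timedelta(hours=ttl_hours))

    async def rollback(self, action_id, reason):
        cap = self.rollback_registry.get(action_id)
        if not cap:
            return {"success": False, "error": "No rollback registered for this action"}
        if datetime.now(timezone.utc) > cap.expires_at:
            return {"success": False, "error": "Rollback window has expired"}
        result = await execute_rollback(cap.rollback_function, cap.rollback_params)
        return {"success": True, "rolled_back_action": action_id,
                "reason": reason, "result": result}

async def demo():
    mgr = RollbackManager()
    mgr.register("act-001", "reverse_refund", {"charge_id": "ch_123"})
    hit = await mgr.rollback("act-001", reason="anomaly: duplicate refund")
    miss = await mgr.rollback("act-999", reason="unknown action")
    return hit, miss

hit, miss = asyncio.run(demo())
print(hit["success"], miss["success"])  # True False
```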
Monitoring Intensity by Safety Level
Adjust monitoring granularity based on the safety level of actions:
```python
import random

class SafetyMonitor:
    def __init__(self, sample_rate: float = 0.1):
        self.sample_rate = sample_rate

    async def should_monitor(self, safety_level: SafetyLevel) -> bool:
        policy = SAFETY_POLICIES.get(safety_level)
        if not policy:
            return True  # Monitor unknown safety levels
        if policy.monitoring_frequency == "all":
            return True
        elif policy.monitoring_frequency == "sampled":
            return random.random() < self.sample_rate
        return False
```
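A quick way to sanity-check the sampler, with the policy table condensed to just the monitoring_frequency field and the method made synchronous for the demo:

```python
import random

class SafetyMonitor:
    """Condensed from above: policies reduced to monitoring_frequency,
    and should_monitor made synchronous for the demo."""
    def __init__(self, sample_rate: float = 0.1):
        self.sample_rate = sample_rate
        self.frequencies = {"L0_FULL_AUTO": "sampled",
                            "L1_LOG_AND_ACT": "all",
                            "L3_PROPOSE_AND_WAIT": "all"}

    def should_monitor(self, level_name: str) -> bool:
        freq = self.frequencies.get(level_name)
        if freq is None:
            return True  # monitor unknown safety levels
        if freq == "all":
            return True
        if freq == "sampled":
            return random.random() < self.sample_rate
        return False

monitor = SafetyMonitor(sample_rate=0.1)
print(monitor.should_monitor("L1_LOG_AND_ACT"))  # True: L1 traces every action

random.seed(7)  # deterministic sampling for the demo
hits = sum(monitor.should_monitor("L0_FULL_AUTO") for _ in range(10_000))
print(hits / 10_000)  # roughly 0.1: about one in ten L0 actions gets a trace
```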
FAQ
How do I decide which safety level to assign to a new agent capability?
Start at L3 (propose and wait) for any new capability and only reduce the safety level after collecting sufficient data. Track the human override rate — the percentage of times a human reviewer changes the agent's proposed action. When the override rate drops below 2% over at least 1,000 actions, consider moving to L2. Below 0.5% over 5,000 actions, consider L1. Never move directly from L3 to L0; always go through intermediate levels.
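One way to operationalize that rule is a small override-rate tracker. The class below is a hypothetical sketch of the promotion criteria described above; the names and structure are illustrative:

```python
class OverrideRateTracker:
    """Hypothetical sketch of the L3 -> L2 -> L1 demotion criteria."""
    def __init__(self):
        self.total = 0
        self.overrides = 0

    def record(self, human_changed_action: bool) -> None:
        self.total += 1
        if human_changed_action:
            self.overrides += 1

    def recommended_demotion(self, current_level: int) -> int:
        """Lower the level one step at a time, only when the data supports it."""
        rate = self.overrides / self.total if self.total else 1.0
        if current_level == 3 and self.total >= 1000 and rate < 0.02:
            return 2
        if current_level == 2 and self.total >= 5000 and rate < 0.005:
            return 1
        return current_level  # not enough evidence: stay put

tracker = OverrideRateTracker()
for i in range(2000):
    tracker.record(human_changed_action=(i % 100 == 0))  # a 1% override rate
print(tracker.recommended_demotion(3))  # 2: 1% over 2,000 actions clears the bar
```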
What happens when the approval workflow creates a bottleneck?
Set expiry times on approval requests so they do not queue indefinitely. Implement delegation rules so that if the primary approver is unavailable, a backup can approve. For time-sensitive actions, allow the safety level to temporarily decrease by one level with an automatic escalation notification. Track approval latency as a key metric and adjust staffing or delegation rules when it exceeds your SLA.
Should safety levels be configurable per customer or deployment?
Yes, but only in the direction of increasing safety. A healthcare deployment should be able to raise the default safety levels but never lower them below your minimum thresholds. Implement this as a safety floor that the system enforces regardless of configuration, plus configurable overrides that can only increase safety requirements above that floor.
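Because higher level numbers mean more oversight, the floor can be enforced with a single max() over the numeric levels (the enum is repeated so the snippet runs standalone):

```python
from enum import IntEnum

class SafetyLevel(IntEnum):  # as defined earlier
    L0_FULL_AUTO = 0
    L1_LOG_AND_ACT = 1
    L2_NOTIFY_AND_ACT = 2
    L3_PROPOSE_AND_WAIT = 3
    L4_HUMAN_ONLY = 4

def effective_level(configured: SafetyLevel, floor: SafetyLevel) -> SafetyLevel:
    """Customer config can raise safety above the floor, never lower it below."""
    return max(configured, floor)

floor = SafetyLevel.L2_NOTIFY_AND_ACT  # system-wide minimum for this action type
print(effective_level(SafetyLevel.L4_HUMAN_ONLY, floor).name)  # L4_HUMAN_ONLY
print(effective_level(SafetyLevel.L0_FULL_AUTO, floor).name)   # L2_NOTIFY_AND_ACT
```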
CallSphere Team
Expert insights on AI voice agents and customer communication automation.