AI Agent Safety Levels: Designing Graduated Autonomy for Different Risk Contexts
Implement a tiered safety system for AI agents with graduated autonomy levels, approval workflows, monitoring intensity, and automatic rollback capabilities matched to risk context.
Why One-Size-Fits-All Safety Does Not Work
Not every AI agent action carries the same risk. Answering a factual question about store hours is fundamentally different from approving a $50,000 insurance claim or modifying a patient's medication schedule. Applying the same level of oversight to all actions either over-constrains low-risk operations (killing efficiency) or under-constrains high-risk ones (creating danger).
Graduated autonomy solves this by matching the level of agent freedom to the risk level of each specific action. This is the same principle used in aviation (autopilot handles cruising but pilots handle takeoff and landing) and medicine (nurses handle routine checks but doctors handle diagnoses).
Defining Safety Levels
Design five distinct safety levels that govern how much independence the agent has:
```python
from enum import IntEnum
from dataclasses import dataclass, field

class SafetyLevel(IntEnum):
    L0_FULL_AUTO = 0        # Agent acts without any human involvement
    L1_LOG_AND_ACT = 1      # Agent acts, logs for async review
    L2_NOTIFY_AND_ACT = 2   # Agent acts, notifies a human immediately
    L3_PROPOSE_AND_WAIT = 3 # Agent proposes, waits for human approval
    L4_HUMAN_ONLY = 4       # Agent prepares information; a human decides and acts

@dataclass
class SafetyPolicy:
    level: SafetyLevel
    max_financial_impact: float
    requires_approval_from: list[str]
    monitoring_frequency: str  # "none", "sampled", "all"
    rollback_enabled: bool
    max_actions_per_hour: int
    cooldown_after_error_seconds: int
    escalation_path: list[str]

SAFETY_POLICIES = {
    SafetyLevel.L0_FULL_AUTO: SafetyPolicy(
        level=SafetyLevel.L0_FULL_AUTO,
        max_financial_impact=0,
        requires_approval_from=[],
        monitoring_frequency="sampled",
        rollback_enabled=False,
        max_actions_per_hour=1000,
        cooldown_after_error_seconds=0,
        escalation_path=["system_alert"],
    ),
    SafetyLevel.L1_LOG_AND_ACT: SafetyPolicy(
        level=SafetyLevel.L1_LOG_AND_ACT,
        max_financial_impact=100,
        requires_approval_from=[],
        monitoring_frequency="all",
        rollback_enabled=True,
        max_actions_per_hour=500,
        cooldown_after_error_seconds=60,
        escalation_path=["team_lead", "system_alert"],
    ),
    SafetyLevel.L3_PROPOSE_AND_WAIT: SafetyPolicy(
        level=SafetyLevel.L3_PROPOSE_AND_WAIT,
        max_financial_impact=50000,
        requires_approval_from=["domain_expert", "manager"],
        monitoring_frequency="all",
        rollback_enabled=True,
        max_actions_per_hour=50,
        cooldown_after_error_seconds=3600,
        escalation_path=["manager", "director", "legal"],
    ),
    # Policies for L2_NOTIFY_AND_ACT and L4_HUMAN_ONLY follow the same
    # pattern and are omitted here for brevity.
}
```
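A useful property of basing the levels on IntEnum: they order numerically, so guard code can use plain threshold comparisons instead of enumerating cases. A quick sketch (the helper name is illustrative, and the enum is repeated so the snippet runs standalone):

```python
from enum import IntEnum

class SafetyLevel(IntEnum):  # repeated from above
    L0_FULL_AUTO = 0
    L1_LOG_AND_ACT = 1
    L2_NOTIFY_AND_ACT = 2
    L3_PROPOSE_AND_WAIT = 3
    L4_HUMAN_ONLY = 4

def needs_human_in_loop(level: SafetyLevel) -> bool:
    # IntEnum members compare as plain ints, so one threshold expresses
    # "L3 and above require a human before anything happens."
    return level >= SafetyLevel.L3_PROPOSE_AND_WAIT

print(needs_human_in_loop(SafetyLevel.L1_LOG_AND_ACT))  # False
print(needs_human_in_loop(SafetyLevel.L4_HUMAN_ONLY))   # True
```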
Classifying Actions by Risk
Build an action classifier that assigns the appropriate safety level to each agent action:
```python
@dataclass
class ActionRiskProfile:
    action_type: str
    reversible: bool
    financial_impact: float
    affects_personal_data: bool
    regulatory_implications: bool
    user_impact_scope: str  # "single_user", "team", "organization", "public"

def classify_action_risk(profile: ActionRiskProfile) -> SafetyLevel:
    """Assign a safety level based on the action's risk characteristics."""
    risk_score = 0.0

    # Financial impact scoring
    if profile.financial_impact > 10000:
        risk_score += 4
    elif profile.financial_impact > 1000:
        risk_score += 3
    elif profile.financial_impact > 100:
        risk_score += 2
    elif profile.financial_impact > 0:
        risk_score += 1

    # Reversibility
    if not profile.reversible:
        risk_score += 2

    # Data sensitivity
    if profile.affects_personal_data:
        risk_score += 2

    # Regulatory exposure
    if profile.regulatory_implications:
        risk_score += 3

    # Scope of impact
    scope_scores = {"single_user": 0, "team": 1, "organization": 2, "public": 3}
    risk_score += scope_scores.get(profile.user_impact_scope, 0)

    # Map score to safety level
    if risk_score >= 10:
        return SafetyLevel.L4_HUMAN_ONLY
    elif risk_score >= 7:
        return SafetyLevel.L3_PROPOSE_AND_WAIT
    elif risk_score >= 4:
        return SafetyLevel.L2_NOTIFY_AND_ACT
    elif risk_score >= 2:
        return SafetyLevel.L1_LOG_AND_ACT
    else:
        return SafetyLevel.L0_FULL_AUTO
```
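A worked example helps sanity-check the thresholds. The snippet below repeats the definitions above in condensed form so it runs on its own; the two sample profiles are hypothetical:

```python
from dataclasses import dataclass
from enum import IntEnum

class SafetyLevel(IntEnum):  # condensed from the section above
    L0_FULL_AUTO = 0
    L1_LOG_AND_ACT = 1
    L2_NOTIFY_AND_ACT = 2
    L3_PROPOSE_AND_WAIT = 3
    L4_HUMAN_ONLY = 4

@dataclass
class ActionRiskProfile:
    action_type: str
    reversible: bool
    financial_impact: float
    affects_personal_data: bool
    regulatory_implications: bool
    user_impact_scope: str

def classify_action_risk(profile: ActionRiskProfile) -> SafetyLevel:
    score = 0.0
    if profile.financial_impact > 10000:
        score += 4
    elif profile.financial_impact > 1000:
        score += 3
    elif profile.financial_impact > 100:
        score += 2
    elif profile.financial_impact > 0:
        score += 1
    if not profile.reversible:
        score += 2
    if profile.affects_personal_data:
        score += 2
    if profile.regulatory_implications:
        score += 3
    score += {"single_user": 0, "team": 1,
              "organization": 2, "public": 3}.get(profile.user_impact_scope, 0)
    for threshold, level in [(10, SafetyLevel.L4_HUMAN_ONLY),
                             (7, SafetyLevel.L3_PROPOSE_AND_WAIT),
                             (4, SafetyLevel.L2_NOTIFY_AND_ACT),
                             (2, SafetyLevel.L1_LOG_AND_ACT)]:
        if score >= threshold:
            return level
    return SafetyLevel.L0_FULL_AUTO

# A $500 refund: +2 financial, +2 personal data -> score 4 -> L2
refund_level = classify_action_risk(ActionRiskProfile(
    "issue_refund", reversible=True, financial_impact=500,
    affects_personal_data=True, regulatory_implications=False,
    user_impact_scope="single_user"))
print(refund_level.name)  # L2_NOTIFY_AND_ACT

# Changing a medication schedule: +2 irreversible, +2 personal data,
# +3 regulatory -> score 7 -> L3
med_level = classify_action_risk(ActionRiskProfile(
    "update_medication_schedule", reversible=False, financial_impact=0,
    affects_personal_data=True, regulatory_implications=True,
    user_impact_scope="single_user"))
print(med_level.name)  # L3_PROPOSE_AND_WAIT
```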
Implementing the Approval Workflow
For L3 (propose and wait) actions, the agent must pause and request human approval:
```python
import asyncio
from datetime import datetime, timezone

@dataclass
class ApprovalRequest:
    request_id: str
    agent_id: str
    action_description: str
    proposed_action: dict
    risk_profile: ActionRiskProfile
    safety_level: SafetyLevel
    required_approvers: list[str]
    approvals_received: list[dict] = field(default_factory=list)
    status: str = "pending"  # "pending", "approved", "rejected", "expired"
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    expires_at: datetime | None = None

    def is_fully_approved(self) -> bool:
        approved_by = {a["approver"] for a in self.approvals_received
                       if a["decision"] == "approve"}
        return all(req in approved_by for req in self.required_approvers)

async def execute_with_approval(agent, action: dict,
                                risk_profile: ActionRiskProfile) -> dict:
    # generate_id, submit_approval_request, and notify_stakeholders are
    # platform helpers assumed to exist elsewhere in the codebase.
    safety_level = classify_action_risk(risk_profile)
    policy = SAFETY_POLICIES.get(safety_level)

    if safety_level == SafetyLevel.L4_HUMAN_ONLY:
        return {
            "status": "deferred_to_human",
            "message": "This action requires human execution.",
            "prepared_data": action,
        }

    if safety_level == SafetyLevel.L3_PROPOSE_AND_WAIT:
        request = ApprovalRequest(
            request_id=generate_id(),
            agent_id=agent.id,
            action_description=action.get("description", ""),
            proposed_action=action,
            risk_profile=risk_profile,
            safety_level=safety_level,
            required_approvers=policy.requires_approval_from,
        )
        await submit_approval_request(request)
        return {
            "status": "awaiting_approval",
            "request_id": request.request_id,
            "required_approvers": policy.requires_approval_from,
        }

    # L0, L1, L2: execute with appropriate logging
    result = await agent.execute_action(action)
    if safety_level >= SafetyLevel.L2_NOTIFY_AND_ACT:
        await notify_stakeholders(agent.id, action, result)
    return {"status": "executed", "result": result, "safety_level": safety_level.name}
```
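The approval bookkeeping can be exercised in isolation. A minimal sketch, with ApprovalRequest trimmed to just the fields the demo touches so it runs standalone:

```python
from dataclasses import dataclass, field

@dataclass
class ApprovalRequest:  # trimmed to the fields this demo exercises
    request_id: str
    required_approvers: list[str]
    approvals_received: list[dict] = field(default_factory=list)
    status: str = "pending"

    def is_fully_approved(self) -> bool:
        approved_by = {a["approver"] for a in self.approvals_received
                       if a["decision"] == "approve"}
        return all(req in approved_by for req in self.required_approvers)

req = ApprovalRequest("req-001", required_approvers=["domain_expert", "manager"])
req.approvals_received.append({"approver": "domain_expert", "decision": "approve"})
print(req.is_fully_approved())  # False: the manager has not signed off yet

req.approvals_received.append({"approver": "manager", "decision": "approve"})
if req.is_fully_approved():
    req.status = "approved"
print(req.status)  # approved
```

Note that approvals are a set intersection, not a count: two approvals from the same role do not satisfy a two-role requirement.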
Automatic Rollback
For reversible actions, implement automatic rollback when anomalies are detected:
```python
from datetime import datetime, timedelta, timezone

@dataclass
class RollbackCapability:
    action_id: str
    rollback_function: str
    rollback_params: dict
    created_at: datetime
    expires_at: datetime  # Rollback is only possible within a time window

class RollbackManager:
    def __init__(self):
        self.rollback_registry: dict[str, RollbackCapability] = {}

    def register(self, action_id: str, rollback_fn: str, params: dict,
                 ttl_hours: int = 24) -> None:
        now = datetime.now(timezone.utc)
        self.rollback_registry[action_id] = RollbackCapability(
            action_id=action_id,
            rollback_function=rollback_fn,
            rollback_params=params,
            created_at=now,
            expires_at=now + timedelta(hours=ttl_hours),
        )

    async def rollback(self, action_id: str, reason: str) -> dict:
        capability = self.rollback_registry.get(action_id)
        if not capability:
            return {"success": False, "error": "No rollback registered for this action"}

        now = datetime.now(timezone.utc)
        if now > capability.expires_at:
            return {"success": False, "error": "Rollback window has expired"}

        # execute_rollback is an assumed helper that dispatches to the named
        # compensating function (e.g. re-apply a charge, restore a record).
        result = await execute_rollback(capability.rollback_function,
                                        capability.rollback_params)
        return {"success": True, "rolled_back_action": action_id,
                "reason": reason, "result": result}
```
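Using the manager looks like the following. Here `execute_rollback` is stubbed out (it is assumed to dispatch to a real compensating function), and the classes are condensed so the snippet runs on its own:

```python
import asyncio
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

async def execute_rollback(fn_name: str, params: dict) -> dict:
    # Stub standing in for the real dispatcher described above.
    return {"executed": fn_name, "params": params}

@dataclass
class RollbackCapability:
    action_id: str
    rollback_function: str
    rollback_params: dict
    created_at: datetime
    expires_at: datetime

class RollbackManager:  # condensed from the section above
    def __init__(self):
        self.rollback_registry: dict[str, RollbackCapability] = {}

    def register(self, action_id, rollback_fn, params, ttl_hours=24):
        now = datetime.now(timezone.utc)
        self.rollback_registry[action_id] = RollbackCapability(
            action_id, rollback_fn, params, now, now + timedelta(hours=ttl_hours))

    async def rollback(self, action_id, reason):
        cap = self.rollback_registry.get(action_id)
        if not cap:
            return {"success": False, "error": "No rollback registered for this action"}
        if datetime.now(timezone.utc) > cap.expires_at:
            return {"success": False, "error": "Rollback window has expired"}
        result = await execute_rollback(cap.rollback_function, cap.rollback_params)
        return {"success": True, "rolled_back_action": action_id,
                "reason": reason, "result": result}

async def demo():
    mgr = RollbackManager()
    mgr.register("act-001", "reverse_refund", {"charge_id": "ch_123"})
    hit = await mgr.rollback("act-001", reason="anomaly: duplicate refund")
    miss = await mgr.rollback("act-999", reason="unknown action")
    return hit, miss

hit, miss = asyncio.run(demo())
print(hit["success"], miss["success"])  # True False
```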
Monitoring Intensity by Safety Level
Adjust monitoring granularity based on the safety level of actions:
```python
import random

class SafetyMonitor:
    def __init__(self, sample_rate: float = 0.1):
        self.sample_rate = sample_rate

    async def should_monitor(self, safety_level: SafetyLevel) -> bool:
        policy = SAFETY_POLICIES.get(safety_level)
        if not policy:
            return True  # Monitor unknown safety levels
        if policy.monitoring_frequency == "all":
            return True
        elif policy.monitoring_frequency == "sampled":
            return random.random() < self.sample_rate
        return False
```
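A quick way to sanity-check the sampler, with the policy table condensed to just the monitoring_frequency field and the method made synchronous for the demo:

```python
import random

class SafetyMonitor:
    """Condensed from above: policies reduced to monitoring_frequency,
    and should_monitor made synchronous for the demo."""
    def __init__(self, sample_rate: float = 0.1):
        self.sample_rate = sample_rate
        self.frequencies = {"L0_FULL_AUTO": "sampled",
                            "L1_LOG_AND_ACT": "all",
                            "L3_PROPOSE_AND_WAIT": "all"}

    def should_monitor(self, level_name: str) -> bool:
        freq = self.frequencies.get(level_name)
        if freq is None:
            return True  # monitor unknown safety levels
        if freq == "all":
            return True
        if freq == "sampled":
            return random.random() < self.sample_rate
        return False

monitor = SafetyMonitor(sample_rate=0.1)
print(monitor.should_monitor("L1_LOG_AND_ACT"))  # True: L1 traces every action

random.seed(7)  # deterministic sampling for the demo
hits = sum(monitor.should_monitor("L0_FULL_AUTO") for _ in range(10_000))
print(hits / 10_000)  # roughly 0.1: about one in ten L0 actions gets a trace
```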
FAQ
How do I decide which safety level to assign to a new agent capability?
Start at L3 (propose and wait) for any new capability and only reduce the safety level after collecting sufficient data. Track the human override rate — the percentage of times a human reviewer changes the agent's proposed action. When the override rate drops below 2% over at least 1,000 actions, consider moving to L2. Below 0.5% over 5,000 actions, consider L1. Never move directly from L3 to L0; always go through intermediate levels.
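One way to operationalize that rule is a small override-rate tracker. The class below is a hypothetical sketch of the promotion criteria described above; the names and structure are illustrative:

```python
class OverrideRateTracker:
    """Hypothetical sketch of the L3 -> L2 -> L1 demotion criteria."""
    def __init__(self):
        self.total = 0
        self.overrides = 0

    def record(self, human_changed_action: bool) -> None:
        self.total += 1
        if human_changed_action:
            self.overrides += 1

    def recommended_demotion(self, current_level: int) -> int:
        """Lower the level one step at a time, only when the data supports it."""
        rate = self.overrides / self.total if self.total else 1.0
        if current_level == 3 and self.total >= 1000 and rate < 0.02:
            return 2
        if current_level == 2 and self.total >= 5000 and rate < 0.005:
            return 1
        return current_level  # not enough evidence: stay put

tracker = OverrideRateTracker()
for i in range(2000):
    tracker.record(human_changed_action=(i % 100 == 0))  # a 1% override rate
print(tracker.recommended_demotion(3))  # 2: 1% over 2,000 actions clears the bar
```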
What happens when the approval workflow creates a bottleneck?
Set expiry times on approval requests so they do not queue indefinitely. Implement delegation rules so that if the primary approver is unavailable, a backup can approve. For time-sensitive actions, allow the safety level to temporarily decrease by one level with an automatic escalation notification. Track approval latency as a key metric and adjust staffing or delegation rules when it exceeds your SLA.
Should safety levels be configurable per customer or deployment?
Yes, but only in the direction of increasing safety. A healthcare deployment should be able to raise the default safety levels but never lower them below your minimum thresholds. Implement this as a safety floor that the system enforces regardless of configuration, plus configurable overrides that can only increase safety requirements above that floor.
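Because higher level numbers mean more oversight, the floor can be enforced with a single max() over the numeric levels (the enum is repeated so the snippet runs standalone):

```python
from enum import IntEnum

class SafetyLevel(IntEnum):  # as defined earlier
    L0_FULL_AUTO = 0
    L1_LOG_AND_ACT = 1
    L2_NOTIFY_AND_ACT = 2
    L3_PROPOSE_AND_WAIT = 3
    L4_HUMAN_ONLY = 4

def effective_level(configured: SafetyLevel, floor: SafetyLevel) -> SafetyLevel:
    """Customer config can raise safety above the floor, never lower it below."""
    return max(configured, floor)

floor = SafetyLevel.L2_NOTIFY_AND_ACT  # system-wide minimum for this action type
print(effective_level(SafetyLevel.L4_HUMAN_ONLY, floor).name)  # L4_HUMAN_ONLY
print(effective_level(SafetyLevel.L0_FULL_AUTO, floor).name)   # L2_NOTIFY_AND_ACT
```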
CallSphere Team
Expert insights on AI voice agents and customer communication automation.