
Voice Agent Error Recovery: Handling Network Issues, Transcription Failures, and Timeouts

Build resilient voice AI agents that handle failures gracefully — covering retry strategies, fallback messages, circuit breakers, and graceful degradation patterns for network outages, STT errors, and LLM timeouts.

Why Voice Agents Need Robust Error Handling

Voice agents operate in a uniquely unforgiving environment. When a web page encounters an API error, it can show a loading spinner or an error message and the user waits patiently. When a voice agent goes silent for 3 seconds because of an unhandled error, the user thinks the call dropped. They hang up, and you lose the interaction.

Every component in the voice pipeline can fail: STT services return empty transcripts, LLM APIs time out, TTS services produce garbled audio, and network connections drop mid-conversation. Building a production voice agent means planning for every failure mode and ensuring the agent always has something to say.

The Error Recovery Framework

A comprehensive error recovery system has four layers: detection, classification, recovery, and user communication.

from enum import Enum
from dataclasses import dataclass
import asyncio
import time

class ErrorSeverity(Enum):
    TRANSIENT = "transient"       # Retry likely to succeed
    DEGRADED = "degraded"         # Partial functionality available
    CRITICAL = "critical"         # Cannot continue normally

class ErrorCategory(Enum):
    STT_FAILURE = "stt_failure"
    LLM_TIMEOUT = "llm_timeout"
    LLM_ERROR = "llm_error"
    TTS_FAILURE = "tts_failure"
    NETWORK = "network"
    AUDIO_QUALITY = "audio_quality"

@dataclass
class VoiceError:
    category: ErrorCategory
    severity: ErrorSeverity
    message: str
    timestamp: float
    retryable: bool = True

class ErrorRecoveryManager:
    def __init__(self):
        self.error_history = []
        self.circuit_breakers = {}
        self.fallback_audio = {}  # Pre-synthesized fallback messages

    def classify_error(self, exception: Exception, stage: str) -> VoiceError:
        """Classify an exception into a structured VoiceError."""
        if isinstance(exception, asyncio.TimeoutError):
            if stage == "llm":
                return VoiceError(
                    category=ErrorCategory.LLM_TIMEOUT,
                    severity=ErrorSeverity.TRANSIENT,
                    message="LLM response timed out",
                    timestamp=time.time(),
                )
            return VoiceError(
                category=ErrorCategory.NETWORK,
                severity=ErrorSeverity.TRANSIENT,
                message=f"Timeout in {stage}",
                timestamp=time.time(),
            )

        if isinstance(exception, ConnectionError):
            return VoiceError(
                category=ErrorCategory.NETWORK,
                severity=ErrorSeverity.DEGRADED,
                message=str(exception),
                timestamp=time.time(),
            )

        return VoiceError(
            category=ErrorCategory.LLM_ERROR,
            severity=ErrorSeverity.CRITICAL,
            message=str(exception),
            timestamp=time.time(),
            retryable=False,
        )

Retry Strategies with Exponential Backoff

For transient errors, retries are the first line of defense. But voice agents cannot afford the long backoff delays typical in backend systems — the user is waiting in real time.

class VoiceRetryPolicy:
    """Fast retry policy optimized for real-time voice interactions."""

    def __init__(
        self,
        max_retries: int = 2,
        initial_delay_ms: int = 100,
        max_delay_ms: int = 500,
        backoff_factor: float = 2.0,
    ):
        self.max_retries = max_retries
        self.initial_delay_ms = initial_delay_ms
        self.max_delay_ms = max_delay_ms
        self.backoff_factor = backoff_factor

    async def execute(self, func, *args, **kwargs):
        """Execute with retries, returning result or raising last error."""
        last_error = None
        delay_ms = self.initial_delay_ms

        for attempt in range(self.max_retries + 1):
            try:
                return await asyncio.wait_for(
                    func(*args, **kwargs),
                    timeout=2.0,  # Hard timeout per attempt
                )
            except Exception as e:
                last_error = e
                if attempt < self.max_retries:
                    await asyncio.sleep(delay_ms / 1000)
                    delay_ms = min(
                        delay_ms * self.backoff_factor,
                        self.max_delay_ms,
                    )

        raise last_error

# Usage
retry = VoiceRetryPolicy(max_retries=2, initial_delay_ms=100)
try:
    result = await retry.execute(llm_client.generate, prompt)
except Exception:
    # All retries exhausted — use fallback
    result = get_fallback_response(prompt)

Circuit Breaker Pattern

When a service is consistently failing, retries waste time and degrade the user experience. A circuit breaker stops attempting calls to a failing service and switches to a fallback immediately.

class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout_s: float = 30.0,
        name: str = "default",
    ):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.name = name
        self.failure_count = 0
        self.last_failure_time = 0
        self.state = "closed"  # "closed" = normal, "open" = failing, "half-open" = probing

    def can_execute(self) -> bool:
        if self.state == "closed":
            return True

        # Check if enough time has passed to retry (half-open)
        elapsed = time.time() - self.last_failure_time
        if elapsed >= self.reset_timeout_s:
            self.state = "half-open"
            return True

        return False

    def record_success(self):
        self.failure_count = 0
        self.state = "closed"

    def record_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = "open"
            print(f"Circuit breaker [{self.name}] OPEN — using fallback")

class ResilientLLMClient:
    def __init__(self, primary_client, fallback_client):
        self.primary = primary_client
        self.fallback = fallback_client
        self.breaker = CircuitBreaker(name="llm", failure_threshold=3)

    async def generate(self, messages: list) -> str:
        if self.breaker.can_execute():
            try:
                result = await asyncio.wait_for(
                    self.primary.chat(messages), timeout=3.0
                )
                self.breaker.record_success()
                return result
            except Exception:
                self.breaker.record_failure()

        # Fallback to secondary LLM
        return await self.fallback.chat(messages)

Handling STT Failures

STT failures fall into two categories: empty transcripts (the engine returned nothing) and low-confidence transcripts (the engine returned unreliable text).


class STTErrorHandler:
    def __init__(self):
        self.consecutive_empty = 0
        self.max_empty_before_prompt = 3

    async def handle_transcript(
        self, text: str, confidence: float, is_final: bool
    ) -> dict:
        if not is_final:
            return {"action": "wait", "text": text}

        # Empty transcript
        if not text or not text.strip():
            self.consecutive_empty += 1
            if self.consecutive_empty >= self.max_empty_before_prompt:
                self.consecutive_empty = 0
                return {
                    "action": "prompt_user",
                    "message": "I'm having trouble hearing you. "
                               "Could you speak a bit louder or move "
                               "closer to your microphone?",
                }
            return {"action": "ignore"}

        # Low confidence transcript
        if confidence < 0.6:
            return {
                "action": "confirm",
                "message": f'I think you said "{text}". Is that right?',
                "original_text": text,
            }

        # Good transcript
        self.consecutive_empty = 0
        return {"action": "process", "text": text}

Pre-Synthesized Fallback Audio

The worst thing a voice agent can do is go silent during an error. Pre-synthesize fallback messages at startup so they are always available, even if the TTS service is down.

class FallbackAudioLibrary:
    def __init__(self):
        self.audio_cache = {}

    async def preload(self, tts_client):
        """Pre-synthesize all fallback messages at startup."""
        fallback_messages = {
            "generic_error": "I'm sorry, I'm having a technical "
                             "issue right now. Let me try again.",
            "network_error": "It seems we're having connection "
                             "issues. Please hold on a moment.",
            "cant_hear": "I'm having trouble hearing you. Could "
                         "you try speaking a little louder?",
            "timeout": "I apologize for the delay. Let me look "
                       "into that for you.",
            "repeat": "I'm sorry, could you repeat that?",
            "transfer": "Let me connect you with a human agent "
                        "who can help you better.",
            "goodbye": "Thank you for calling. Goodbye!",
        }

        for key, message in fallback_messages.items():
            try:
                self.audio_cache[key] = await tts_client.synthesize(message)
                print(f"Pre-loaded fallback: {key}")
            except Exception as e:
                print(f"Warning: Could not pre-load {key}: {e}")

    def get(self, key: str) -> bytes | None:
        return self.audio_cache.get(key)
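With the library preloaded, the speak path can fall back to cached audio whenever live synthesis fails. A minimal sketch, where `tts_synthesize` and `audio_cache` are stand-ins for your TTS client's async synthesis call and the dictionary populated by `preload` above:

```python
import asyncio

async def speak_or_fallback(text, tts_synthesize, audio_cache,
                            fallback_key="generic_error", timeout_s=2.0):
    """Try live TTS first; on timeout or error, return pre-synthesized audio."""
    try:
        return await asyncio.wait_for(tts_synthesize(text), timeout=timeout_s)
    except Exception:
        # Never go silent: serve cached audio even when TTS is down.
        return audio_cache.get(fallback_key)
```

The key property is that the failure path does no network I/O at all, so it works even during a full TTS outage.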

Network Disconnection and Reconnection

WebSocket and WebRTC connections can drop at any time. Implement automatic reconnection with state recovery.

class ResilientConnection {
  constructor(url, options = {}) {
    this.url = url;
    this.maxRetries = options.maxRetries || 5;
    this.baseDelay = options.baseDelay || 1000;
    this.retryCount = 0;
    this.ws = null;
    this.messageQueue = [];
    this.onMessage = options.onMessage || (() => {});
    this.onReconnect = options.onReconnect || (() => {});
  }

  connect() {
    this.ws = new WebSocket(this.url);

    this.ws.onopen = () => {
      console.log('Connected');
      this.retryCount = 0;
      // Flush queued messages
      while (this.messageQueue.length > 0) {
        this.ws.send(this.messageQueue.shift());
      }
      this.onReconnect();
    };

    this.ws.onmessage = (event) => this.onMessage(event);

    this.ws.onclose = (event) => {
      if (event.code !== 1000) {
        // Abnormal closure — attempt reconnect
        this.reconnect();
      }
    };

    this.ws.onerror = () => {
      // Error will trigger onclose, which handles reconnection
    };
  }

  reconnect() {
    if (this.retryCount >= this.maxRetries) {
      console.error('Max reconnection attempts reached');
      return;
    }

    const delay = this.baseDelay * Math.pow(2, this.retryCount);
    const jitter = delay * 0.2 * Math.random();
    this.retryCount++;

    console.log(
      'Reconnecting in ' + Math.round(delay + jitter) + 'ms ' +
      '(attempt ' + this.retryCount + '/' + this.maxRetries + ')'
    );

    setTimeout(() => this.connect(), delay + jitter);
  }

  send(data) {
    if (this.ws && this.ws.readyState === WebSocket.OPEN) {
      this.ws.send(data);
    } else {
      // Queue messages during disconnection
      this.messageQueue.push(data);
    }
  }
}

Graceful Degradation Strategy

When multiple components fail, degrade gracefully rather than crashing. Define a degradation hierarchy.

class DegradationManager:
    """Manage graceful degradation when services fail."""

    def __init__(self):
        self.service_status = {
            "stt": True,
            "llm": True,
            "tts": True,
        }

    def get_degradation_level(self) -> str:
        if all(self.service_status.values()):
            return "full"          # All services operational
        if self.service_status["llm"]:
            return "limited"       # Can still reason, but degraded I/O
        return "emergency"         # Cannot reason, transfer to human

    async def handle_request(self, audio_input, pipeline, transfer_fn):
        level = self.get_degradation_level()

        if level == "full":
            return await pipeline.full_process(audio_input)

        elif level == "limited":
            # STT or TTS down — use text fallback
            if not self.service_status["stt"]:
                # Ask user to type instead
                return pipeline.get_fallback_audio("type_instead")
            if not self.service_status["tts"]:
                # Return text response for display
                transcript = await pipeline.stt_process(audio_input)
                return await pipeline.llm_process(transcript)

        else:
            # Emergency — transfer to human
            await transfer_fn()
            return pipeline.get_fallback_audio("transfer")

FAQ

How many retries should a voice agent attempt before falling back?

For real-time voice, limit retries to 1-2 attempts with very short delays (100-200ms). The total retry budget should not exceed 500ms. Users are waiting in silence during retries, and even a half-second of silence feels awkward. It is better to play a brief fallback message ("One moment, please") and retry in the background than to leave the user in silence while retrying.
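The "play a filler now, keep retrying in the background" pattern can be sketched with asyncio; `generate` and `play_audio` are hypothetical stand-ins for the LLM call and the audio output path, and the budgets are illustrative:

```python
import asyncio

async def respond_with_filler(generate, play_audio,
                              filler="One moment, please.",
                              silence_budget_s=0.5, total_budget_s=3.0):
    """Start generation immediately; if it misses the silence budget,
    play a short filler while the same attempt keeps running."""
    task = asyncio.create_task(generate())
    try:
        # shield() keeps the timeout from cancelling the in-flight attempt.
        return await asyncio.wait_for(asyncio.shield(task),
                                      timeout=silence_budget_s)
    except asyncio.TimeoutError:
        await play_audio(filler)  # fill the silence instead of dead air
        return await asyncio.wait_for(task, timeout=total_budget_s)
```

Because the original task is shielded, the filler buys time without throwing away work already done on the slow attempt.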

Should the agent tell the user when an error occurs?

Yes, but frame it conversationally, not technically. Instead of "I experienced a transcription error," say "I didn't quite catch that — could you say that again?" Users do not need to know about your internal architecture. The goal is to keep the conversation flowing naturally even when things go wrong behind the scenes. Only escalate to explicit error messaging ("I'm having technical difficulties") when the problem persists across multiple exchanges.
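One way to keep that framing consistent is a small lookup from internal error categories to conversational phrasings, escalating only after repeated failures. A sketch, where the category strings mirror the `ErrorCategory` values earlier and the escalation threshold of three is an illustrative choice:

```python
# Hypothetical mapping from internal categories to user-facing phrasing.
CONVERSATIONAL_MESSAGES = {
    "stt_failure": "I didn't quite catch that. Could you say that again?",
    "llm_timeout": "Give me just a second on that one.",
    "network": "Sorry, you cut out for a moment. Could you repeat that?",
}

def user_facing_message(category: str, consecutive_errors: int) -> str:
    """Stay conversational; only admit trouble after repeated failures."""
    if consecutive_errors >= 3:
        return ("I'm having some technical difficulties. "
                "Let me connect you with someone who can help.")
    return CONVERSATIONAL_MESSAGES.get(
        category, "I'm sorry, could you repeat that?")
```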

How do I test error recovery in voice agents?

Use chaos engineering principles. Build a test harness that injects failures at each pipeline stage: drop STT connections mid-stream, return empty transcripts, add 5-second LLM delays, and corrupt TTS audio. Run automated conversations through this harness and verify that the agent always responds within your latency budget and never goes silent. Record these test sessions and listen to them to verify the recovery experience sounds natural.
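A minimal failure-injection wrapper for such a harness might look like the following; the `failure_rate` default and the injected `ConnectionError` are illustrative choices, and any async pipeline stage (STT, LLM, TTS) can be wrapped:

```python
import asyncio
import random

def inject_failures(stage, failure_rate=0.3, extra_delay_s=0.0,
                    rng=random.random):
    """Wrap an async pipeline stage for chaos testing."""
    async def wrapped(*args, **kwargs):
        if extra_delay_s:
            await asyncio.sleep(extra_delay_s)  # simulate a slow service
        if rng() < failure_rate:
            raise ConnectionError("injected failure")
        return await stage(*args, **kwargs)
    return wrapped
```

Passing `rng` explicitly makes the harness deterministic in tests: supply a stub that returns fixed values to script exactly which calls fail.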


#ErrorRecovery #VoiceAI #Resilience #RetryStrategies #GracefulDegradation #FaultTolerance #AgenticAI #LearnAI #AIEngineering

CallSphere Team

Expert insights on AI voice agents and customer communication automation.
