DTMF Handling in AI Voice Agents: Processing Keypad Input During Calls
Master DTMF tone detection and processing in AI voice agents. Learn to build hybrid voice-and-keypad interfaces, handle multi-digit input, implement timeouts, and create fallback paths for accessibility.
Why DTMF Still Matters in the Age of Voice AI
Even as voice AI becomes more capable, DTMF (the tones produced by phone keypad presses) remains essential. Callers in noisy environments often cannot get their speech recognized reliably. People with speech impairments rely on keypad input. Some users simply prefer pressing buttons. Some industries may also require a non-voice input option. A robust AI phone agent must handle both voice and keypad input seamlessly.
DTMF stands for Dual-Tone Multi-Frequency — each key press generates two simultaneous tones that uniquely identify the digit. There are 16 possible signals: digits 0-9, symbols * and #, and letters A-D (rarely used).
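The two-tone scheme can be written down as a small lookup table. The sketch below uses the standard DTMF frequency grid (four low-group row tones and four high-group column tones); the names DTMF_ROWS, DTMF_COLS, DTMF_KEYPAD, and tones_for_key are illustrative and do not come from any library.

```python
DTMF_ROWS = (697, 770, 852, 941)        # low-group frequencies in Hz
DTMF_COLS = (1209, 1336, 1477, 1633)    # high-group frequencies in Hz
DTMF_KEYPAD = ("123A", "456B", "789C", "*0#D")  # one string per row


def tones_for_key(key: str) -> tuple[int, int]:
    """Return the (row, column) frequency pair for a single DTMF key."""
    if len(key) != 1:
        raise ValueError(f"Expected a single key, got {key!r}")
    for row_freq, row_keys in zip(DTMF_ROWS, DTMF_KEYPAD):
        col = row_keys.find(key)
        if col != -1:
            return row_freq, DTMF_COLS[col]
    raise ValueError(f"Not a DTMF key: {key!r}")
```

Because every key sits at a unique row and column, the 16 frequency pairs are all distinct, which is what lets a detector identify the key from the two tones alone.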
DTMF Detection Methods
There are three ways DTMF tones reach your application. Understanding the differences is critical for reliable processing:
from enum import Enum


class DTMFMethod(Enum):
    """Three methods of DTMF delivery."""

    # In-band: tones embedded in the audio stream (RTP)
    # Least reliable: affected by audio compression
    INBAND = "inband"

    # RFC 2833: sent as named events in RTP packets
    # Most common and reliable for SIP calls
    RFC2833 = "rfc2833"

    # SIP INFO: sent as SIP messages outside the media stream
    # Used by some PBX systems
    SIP_INFO = "sip_info"
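Configure your system to prefer RFC 2833 whenever the carrier or PBX supports it (RFC 4733 has since superseded it but keeps the same telephone-event mechanism). In-band detection requires audio analysis and is unreliable with compressed codecs like G.729.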
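To make "audio analysis" concrete, here is a minimal sketch of in-band detection using the Goertzel algorithm, which measures signal power at one target frequency at a time. It assumes clean, uncompressed 8 kHz samples and always returns the strongest row and column pair; a production detector would also need power thresholds, twist checks, and minimum tone duration. All function names here are hypothetical.

```python
import math

ROWS = (697, 770, 852, 941)
COLS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq, sample_rate):
    """Return the signal power near `freq` via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_key(samples, sample_rate=8000):
    """Pick the strongest row tone and column tone and map them to a key."""
    row = max(ROWS, key=lambda f: goertzel_power(samples, f, sample_rate))
    col = max(COLS, key=lambda f: goertzel_power(samples, f, sample_rate))
    return KEYPAD[ROWS.index(row)][COLS.index(col)]


def synth_key(key, sample_rate=8000, duration=0.05):
    """Generate a clean two-tone test signal for one key (test helper)."""
    row = next(i for i, keys in enumerate(KEYPAD) if key in keys)
    col = KEYPAD[row].index(key)
    n = int(sample_rate * duration)
    return [
        math.sin(2 * math.pi * ROWS[row] * t / sample_rate)
        + math.sin(2 * math.pi * COLS[col] * t / sample_rate)
        for t in range(n)
    ]
```

On synthetic tones this recovers every key, but codec distortion, speech that happens to contain DTMF-like frequencies, and background noise all degrade it in real calls, which is why out-of-band delivery is preferred.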
Building a DTMF Input Handler
Here is a complete DTMF handler with buffering, timeouts, and validation:
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Awaitable, Callable, Optional


@dataclass
class DTMFSession:
    """Tracks DTMF input state for a single call."""

    call_id: str
    buffer: str = ""
    last_digit_time: Optional[datetime] = None
    expected_length: Optional[int] = None
    terminator: str = "#"
    timeout_seconds: float = 5.0
    max_digits: int = 20


class DTMFHandler:
    """Processes DTMF input with buffering and validation."""

    def __init__(self):
        self.sessions: dict[str, DTMFSession] = {}
        # call_id -> async callback(call_id, digits), assigned by the caller
        self.callbacks: dict[str, Callable[[str, str], Awaitable[None]]] = {}

    def create_session(
        self,
        call_id: str,
        expected_length: Optional[int] = None,
        terminator: str = "#",
        timeout: float = 5.0,
    ) -> DTMFSession:
        """Start collecting DTMF input for a call."""
        session = DTMFSession(
            call_id=call_id,
            expected_length=expected_length,
            terminator=terminator,
            timeout_seconds=timeout,
        )
        self.sessions[call_id] = session
        return session

    async def on_digit(self, call_id: str, digit: str):
        """Process a single DTMF digit."""
        session = self.sessions.get(call_id)
        if not session:
            return
        # Timezone-aware timestamp; datetime.utcnow() is deprecated
        session.last_digit_time = datetime.now(timezone.utc)

        # Check for terminator
        if digit == session.terminator:
            await self.complete_input(session)
            return

        # Append to buffer (respect max length)
        if len(session.buffer) < session.max_digits:
            session.buffer += digit

        # Check if expected length reached
        if (session.expected_length and
                len(session.buffer) >= session.expected_length):
            await self.complete_input(session)

    async def complete_input(self, session: DTMFSession):
        """Input collection is complete: trigger the callback."""
        result = session.buffer
        callback = self.callbacks.get(session.call_id)
        if callback:
            await callback(session.call_id, result)
        # Reset for next input
        session.buffer = ""

    async def check_timeout(self, call_id: str) -> bool:
        """Complete the input if the caller has stopped pressing keys."""
        session = self.sessions.get(call_id)
        if not session or not session.last_digit_time:
            return False
        # total_seconds() keeps fractions and day rollover; .seconds does not
        elapsed = (
            datetime.now(timezone.utc) - session.last_digit_time
        ).total_seconds()
        if elapsed >= session.timeout_seconds and session.buffer:
            await self.complete_input(session)
            return True
        return False
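A timeout check like the one above only helps if something calls it on a schedule. An alternative, sketched here under the assumption that digits arrive on the asyncio event loop, is to restart a timer on every key press so the buffer flushes itself once the caller goes quiet. TimedDigitCollector and its on_complete callback are hypothetical names, and the short timeouts exist only to keep the demo fast.

```python
import asyncio
from typing import Callable, Optional


class TimedDigitCollector:
    """Hypothetical helper: flushes buffered digits after an inter-digit gap."""

    def __init__(self, timeout: float, on_complete: Callable[[str], None]):
        self.timeout = timeout
        self.on_complete = on_complete
        self.buffer = ""
        self._timer: Optional[asyncio.TimerHandle] = None

    def on_digit(self, digit: str) -> None:
        """Buffer the digit and restart the inter-digit timer."""
        self.buffer += digit
        if self._timer is not None:
            self._timer.cancel()
        loop = asyncio.get_running_loop()
        self._timer = loop.call_later(self.timeout, self._flush)

    def _flush(self) -> None:
        """Timer fired: hand over whatever was collected and reset."""
        digits, self.buffer = self.buffer, ""
        self._timer = None
        if digits:
            self.on_complete(digits)


async def demo() -> list[str]:
    results: list[str] = []
    collector = TimedDigitCollector(0.05, results.append)
    for digit in "42":           # two quick presses form one entry
        collector.on_digit(digit)
        await asyncio.sleep(0.01)
    await asyncio.sleep(0.1)     # silence longer than the timeout
    collector.on_digit("7")      # a later press starts a new entry
    await asyncio.sleep(0.1)
    return results
```

Running asyncio.run(demo()) yields two separate entries, "42" and "7", because the gap between them exceeds the timeout.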
Hybrid Voice and Keypad Interface
The most effective approach lets callers switch between voice and keypad at any time:
from twilio.twiml.voice_response import VoiceResponse


class HybridInputHandler:
    """Accepts both voice and DTMF input simultaneously."""

    def build_gather_twiml(
        self,
        prompt: str,
        action_url: str,
        dtmf_digits: int = 1,
        speech_timeout: str = "auto",
    ) -> VoiceResponse:
        """Create TwiML that accepts voice OR keypad input."""
        response = VoiceResponse()
        gather = response.gather(
            input="speech dtmf",  # Accept both simultaneously
            action=action_url,
            method="POST",
            speech_timeout=speech_timeout,
            timeout=10,
            num_digits=dtmf_digits,
            language="en-US",
        )
        gather.say(prompt, voice="Polly.Joanna")
        return response

    def parse_gather_result(self, form_data: dict) -> dict:
        """Parse the result from a Gather, which may be voice or DTMF."""
        speech_result = form_data.get("SpeechResult")
        dtmf_digits = form_data.get("Digits")

        if dtmf_digits:
            return {
                "input_type": "dtmf",
                "value": dtmf_digits,
                "confidence": 1.0,  # DTMF is always exact
            }
        elif speech_result:
            return {
                "input_type": "speech",
                "value": speech_result,
                "confidence": float(
                    form_data.get("Confidence", 0.0)
                ),
            }
        return {"input_type": "none", "value": None, "confidence": 0.0}
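Once both input types are parsed into the same shape, downstream logic can treat "two" and a pressed 2 identically. The following is a minimal sketch of that normalization for single-digit menus. The SPOKEN_DIGITS table, the normalize_menu_choice name, and the 0.6 confidence threshold are all assumptions for illustration; tune the threshold against real call data.

```python
from typing import Optional

SPOKEN_DIGITS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}


def normalize_menu_choice(
    result: dict, min_confidence: float = 0.6
) -> Optional[str]:
    """Map a parsed voice-or-keypad result to a single menu digit."""
    if result["input_type"] == "dtmf":
        return result["value"]  # keypad input is already a digit string
    if (result["input_type"] == "speech"
            and result["confidence"] >= min_confidence):
        for word in result["value"].lower().strip(" .!?").split():
            if word in SPOKEN_DIGITS:
                return SPOKEN_DIGITS[word]
            if len(word) == 1 and word.isdigit():
                return word
    return None  # nothing usable: reprompt the caller
```

Returning None for low-confidence speech gives the agent a natural point to reprompt and suggest the keypad instead.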
Multi-Digit Input Patterns
Different scenarios require different DTMF collection strategies:
class DTMFPatterns:
    """Common DTMF input patterns for phone systems."""

    @staticmethod
    def collect_menu_choice(max_option: int = 9) -> dict:
        """Single digit menu selection (press 1, 2, 3...)."""
        return {
            "num_digits": 1,
            # "0" through str(max_option); drop "0" if the menu does not use it
            "valid_range": [str(i) for i in range(max_option + 1)],
            "timeout": 5,
        }

    @staticmethod
    def collect_account_number(length: int = 8) -> dict:
        """Fixed-length account number entry."""
        return {
            "num_digits": length,
            "timeout": 10,
            "finish_on_key": "#",
        }

    @staticmethod
    def collect_phone_number() -> dict:
        """10-digit phone number; a country code needs a larger num_digits."""
        return {
            "num_digits": 10,
            "timeout": 15,
            "finish_on_key": "#",
        }

    @staticmethod
    def collect_pin() -> dict:
        """4-6 digit PIN for authentication."""
        return {
            "num_digits": 6,
            "min_digits": 4,  # shorter PINs are ended with the # key
            "timeout": 10,
            "finish_on_key": "#",
        }

    @staticmethod
    def yes_no_confirmation() -> dict:
        """1 for yes, 2 for no."""
        return {
            "num_digits": 1,
            "valid_digits": ["1", "2"],
            "timeout": 8,
        }


def validate_dtmf_input(digits: str, pattern: dict) -> tuple[bool, str]:
    """Validate DTMF input against the expected pattern."""
    valid_digits = pattern.get("valid_digits")
    valid_range = pattern.get("valid_range")
    max_length = pattern.get("num_digits")
    # Variable-length patterns (such as the PIN) set "min_digits";
    # fixed-length patterns fall back to an exact-length check.
    min_length = pattern.get("min_digits", max_length)

    if max_length and not (min_length <= len(digits) <= max_length):
        expected = (
            str(max_length) if min_length == max_length
            else f"{min_length}-{max_length}"
        )
        return False, f"Expected {expected} digits, got {len(digits)}"
    if valid_digits and digits not in valid_digits:
        return False, f"Invalid input: {digits}"
    if valid_range and digits not in valid_range:
        return False, f"Input out of range: {digits}"
    return True, "valid"
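Validation is only useful if a failed entry leads somewhere. A common approach is to reprompt a limited number of times and then hand the caller to a fallback path (voice input or a human agent) rather than looping forever. The sketch below shows that control flow in isolation; collect_with_retries, read_input, and is_valid are hypothetical names, and the limit of three attempts is an assumption.

```python
from typing import Callable, Optional


def collect_with_retries(
    read_input: Callable[[int], str],
    is_valid: Callable[[str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask for input up to max_attempts times; None means 'use the fallback'.

    read_input receives the 1-based attempt number so the prompt can
    change on retries (for example, adding "press pound when finished").
    """
    for attempt in range(1, max_attempts + 1):
        digits = read_input(attempt)
        if is_valid(digits):
            return digits
    return None  # caller should route to voice input or a human agent


# Example wiring with canned inputs standing in for real key presses
canned = iter(["12", "12345", "1234"])
pin = collect_with_retries(
    read_input=lambda attempt: next(canned),
    is_valid=lambda d: len(d) == 4 and d.isdigit(),
)
```

In this example the first two entries fail the four-digit check and the third succeeds, so pin ends up as "1234".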
Integrating DTMF with AI Decision Making
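Most keypad input can be interpreted deterministically from what the agent is currently expecting, so reserve the AI model for sequences that remain ambiguous or need mapping to natural language intents. The function below handles the deterministic cases; it accepts an ai_client for the ambiguous ones but does not call it yet: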
from datetime import datetime


async def interpret_dtmf_with_context(
    digits: str,
    call_context: dict,
    ai_client,
) -> str:
    """Interpret DTMF input using what the conversation is expecting."""
    # Most DTMF is straightforward, but edge cases exist.
    # ai_client is unused here; it is reserved for ambiguous input.
    if call_context.get("expecting") == "date":
        # Caller entered 03172026: interpret as MMDDYYYY
        if len(digits) == 8:
            try:
                parsed = datetime.strptime(digits, "%m%d%Y")
            except ValueError:
                return digits  # Not a real calendar date; reprompt upstream
            return parsed.strftime("%Y-%m-%d")

    if call_context.get("expecting") == "amount":
        # Digits alone are ambiguous (15099 could be $150.99 or $15,099),
        # so the caller uses the star key as the decimal point: 150*99
        if "*" in digits:
            parts = digits.split("*")
            return f"${parts[0]}.{parts[1]}"

    return digits
FAQ
How do I handle DTMF on VoIP calls where tones get compressed?
Lossy VoIP codecs like G.729 and Opus can distort in-band DTMF tones. Always negotiate RFC 2833 telephone-event (now specified by RFC 4733) during SIP session setup. In your SDP offer, list a dynamic payload type for it on the m=audio line and map it with a=rtpmap:101 telephone-event/8000; 101 is a common convention, not a fixed value. If your VoIP provider does not support RFC 2833, use SIP INFO as a fallback. Never rely solely on in-band detection for VoIP calls.
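When debugging a call with missing digits, a quick first check is whether the remote side's SDP offered telephone-event at all. This is a small sketch of that check using only the standard library; telephone_event_payload_type is a hypothetical helper name and the sample SDP is invented for illustration.

```python
import re
from typing import Optional


def telephone_event_payload_type(sdp: str) -> Optional[int]:
    """Return the RTP payload type mapped to telephone-event, if any."""
    for line in sdp.splitlines():
        match = re.match(
            r"a=rtpmap:(\d+)\s+telephone-event/\d+",
            line.strip(),
            re.IGNORECASE,
        )
        if match:
            return int(match.group(1))
    return None  # no out-of-band DTMF offered in this SDP


# Invented sample: PCMU audio plus telephone-event on payload type 101
SAMPLE_SDP = (
    "v=0\r\n"
    "m=audio 49170 RTP/AVP 0 101\r\n"
    "a=rtpmap:0 PCMU/8000\r\n"
    "a=rtpmap:101 telephone-event/8000\r\n"
    "a=fmtp:101 0-15\r\n"
)
```

If the function returns None for the remote SDP, digits from that endpoint will arrive in-band or via SIP INFO, if at all.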
What happens when a caller presses keys while the AI is speaking?
This is called "barge-in" and it depends on your configuration. With Twilio's <Gather>, DTMF input during a nested <Say> or <Play> prompt interrupts the audio and begins collecting digits immediately. This is generally the desired behavior: callers who know what they want should not have to wait for the prompt to finish. If you need to prevent barge-in (e.g., during a legal disclaimer), place that message in its own <Say> or <Play> before the <Gather> instead of nesting it inside. Swapping <Say> for <Play> inside the <Gather> does not help, because both nested verbs are interruptible.
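One way to keep a message uninterrupted, assuming the Twilio behavior that only prompts nested inside <Gather> stop on input, is to emit the message as a sibling verb ahead of the <Gather>. The sketch below builds that TwiML with the standard library rather than the Twilio helper so that it runs without extra packages; build_disclaimer_then_menu is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def build_disclaimer_then_menu(
    disclaimer: str, menu_prompt: str, action_url: str
) -> str:
    """Return TwiML: an uninterruptible message, then an interruptible menu."""
    response = ET.Element("Response")

    # Sibling of <Gather>: plays to completion before input collection starts
    ET.SubElement(response, "Say").text = disclaimer

    # Nested in <Gather>: a key press or speech can cut this prompt short
    gather = ET.SubElement(
        response,
        "Gather",
        {
            "input": "speech dtmf",
            "numDigits": "1",
            "action": action_url,
            "method": "POST",
        },
    )
    ET.SubElement(gather, "Say").text = menu_prompt

    return ET.tostring(response, encoding="unicode")
```

Keep the uninterruptible portion as short as possible, since callers cannot skip it.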
How do I handle star (*) and pound (#) keys in DTMF input?
The * key is commonly used as a "go back" or "cancel" command, while # typically signals "I am done entering." Define these conventions early and be consistent. In PIN entry, * might mean "clear and re-enter." In menus, * could mean "return to previous menu." Always announce these conventions to the caller: "Press star to go back, or pound when finished."
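As an illustration of one such convention, the sketch below treats * as "clear and re-enter" and # as "done" during PIN entry. The apply_pin_key name and the exact key meanings are assumptions; the point is that the control keys are handled in one place so the behavior stays consistent.

```python
def apply_pin_key(buffer: str, key: str) -> tuple[str, bool]:
    """Apply one key press; return (new_buffer, entry_finished)."""
    if key == "*":
        return "", False        # clear and let the caller start over
    if key == "#":
        return buffer, True     # caller signals the entry is complete
    return buffer + key, False  # ordinary digit: keep collecting


# Caller mistypes "12", clears with star, then enters 3456 and presses pound
buffer, finished = "", False
for key in "12*3456#":
    buffer, finished = apply_pin_key(buffer, key)
```

After the loop, buffer holds "3456" and finished is True.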
CallSphere Team