
Building an IVR Replacement with AI: Natural Language Phone Menus

Replace rigid IVR phone trees with natural language AI agents. Learn to design conversational call flows, implement DTMF fallbacks, and handle edge cases for a seamless caller experience.

The Problem with Traditional IVR

Interactive Voice Response (IVR) systems have been the front door of business phone systems for decades. You know the experience: "Press 1 for sales, press 2 for support, press 3 for billing..." These rigid menu trees frustrate callers, increase abandonment rates, and often route people to the wrong department. Caller-behavior surveys consistently find that a majority of callers try to bypass IVR menus entirely, often by pressing 0 repeatedly.

An AI-powered replacement lets callers simply state what they need in natural language. Instead of navigating a tree of options, the caller says "I need to change my shipping address" and the system routes them correctly — or handles the request directly.

Designing the Conversational Call Flow

A well-designed AI call flow needs three layers: the greeting, intent resolution, and action execution. Here is the architecture:

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class CallIntent(Enum):
    SALES = "sales"
    SUPPORT = "support"
    BILLING = "billing"
    APPOINTMENTS = "appointments"
    GENERAL = "general"
    UNKNOWN = "unknown"

@dataclass
class CallState:
    call_id: str
    caller_number: str
    intent: CallIntent = CallIntent.UNKNOWN
    confidence: float = 0.0
    attempts: int = 0
    max_attempts: int = 3
    context: dict = field(default_factory=dict)
    fallback_to_dtmf: bool = False

class AICallFlowManager:
    """Manages the conversational flow for incoming calls."""

    def __init__(self, ai_client, tts_engine, stt_engine):
        self.ai_client = ai_client
        self.tts_engine = tts_engine
        self.stt_engine = stt_engine

    async def handle_new_call(self, call_id, caller_number):
        state = CallState(call_id=call_id, caller_number=caller_number)

        # Greeting with open-ended prompt
        greeting = (
            "Thank you for calling Acme Corp. "
            "How can I help you today?"
        )
        await self.speak(call_id, greeting)

        # Listen for caller response
        transcript = await self.listen(call_id, timeout=8.0)

        if not transcript:
            state.attempts += 1
            return await self.handle_silence(state)

        return await self.resolve_intent(state, transcript)
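The flow above leans on `speak` and `listen` helpers that wrap the TTS and STT engines. Here is a minimal sketch of what those could look like, assuming hypothetical `synthesize`/`play`/`transcribe` methods on the engine objects — adapt the interfaces to whatever telephony stack you actually use:

```python
import asyncio

class AICallFlowManagerHelpers:
    """Sketch of the speak/listen helpers used by handle_new_call.
    The tts_engine/stt_engine method names here are assumptions."""

    def __init__(self, tts_engine, stt_engine):
        self.tts_engine = tts_engine
        self.stt_engine = stt_engine

    async def speak(self, call_id, text):
        # Synthesize the text and stream it onto the call's audio channel.
        audio = await self.tts_engine.synthesize(text)
        await self.tts_engine.play(call_id, audio)

    async def listen(self, call_id, timeout=8.0):
        # Wait up to `timeout` seconds for a transcript; return None
        # on silence so callers can branch to the re-prompt path.
        try:
            return await asyncio.wait_for(
                self.stt_engine.transcribe(call_id), timeout=timeout
            )
        except asyncio.TimeoutError:
            return None
```

Returning `None` on timeout (rather than raising) keeps the silence-handling branch in `handle_new_call` simple.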

Intent Resolution with AI

The core of the IVR replacement is an AI model that understands caller intent from natural speech:

from openai import AsyncOpenAI

class IntentResolver:
    """Resolves caller intent using an LLM."""

    def __init__(self):
        self.client = AsyncOpenAI()
        self.system_prompt = """You are a call routing assistant.
Classify the caller's statement into exactly one intent:
- sales: purchasing, pricing, demos, new accounts
- support: technical issues, troubleshooting, bugs
- billing: invoices, payments, refunds, charges
- appointments: scheduling, rescheduling, canceling
- general: anything else

Return JSON: {"intent": "...", "confidence": 0.0-1.0, "summary": "..."}"""

    async def resolve(self, transcript: str) -> dict:
        response = await self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": self.system_prompt},
                {"role": "user", "content": transcript},
            ],
            response_format={"type": "json_object"},
            temperature=0.1,
        )
        import json
        return json.loads(response.choices[0].message.content)

Notice the low temperature setting — for classification tasks you want deterministic, consistent results rather than creative variation.
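Even with `response_format={"type": "json_object"}`, it is worth normalizing the model's output defensively before trusting it on a live call — an unexpected intent label or an out-of-range confidence should degrade gracefully, not crash. A small post-processing helper (the field defaults here are assumptions) might look like:

```python
VALID_INTENTS = {"sales", "support", "billing",
                 "appointments", "general", "unknown"}

def normalize_intent_result(raw: dict) -> dict:
    """Defensively clean the model's JSON: unknown labels fall back to
    'unknown', and confidence is clamped into [0.0, 1.0]."""
    intent = raw.get("intent", "unknown")
    if intent not in VALID_INTENTS:
        intent = "unknown"
    try:
        confidence = max(0.0, min(1.0, float(raw.get("confidence", 0.0))))
    except (TypeError, ValueError):
        confidence = 0.0
    return {
        "intent": intent,
        "confidence": confidence,
        "summary": str(raw.get("summary", "")),
    }
```

Run the resolver's parsed JSON through this before writing it into `CallState`; a malformed response then simply behaves like a low-confidence classification.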


DTMF Fallback for Accessibility

Not every caller can or wants to use voice. Your system must provide a DTMF fallback path. This is also critical for compliance with accessibility requirements:

class DTMFFallbackHandler:
    """Provides traditional keypad navigation as a fallback."""

    MENU_MAP = {
        "1": CallIntent.SALES,
        "2": CallIntent.SUPPORT,
        "3": CallIntent.BILLING,
        "4": CallIntent.APPOINTMENTS,
        "0": "operator",
    }

    async def offer_dtmf_menu(self, call_id, speak_func):
        menu_text = (
            "You can also use your keypad. "
            "Press 1 for sales. Press 2 for support. "
            "Press 3 for billing. Press 4 for appointments. "
            "Press 0 for an operator."
        )
        await speak_func(call_id, menu_text)

    def resolve_dtmf(self, digit: str) -> Optional[CallIntent]:
        return self.MENU_MAP.get(digit)


class AICallFlowManager:
    """Extended flow manager with DTMF fallback logic."""

    async def handle_silence(self, state: CallState):
        if state.attempts >= 2:
            state.fallback_to_dtmf = True
            await self.dtmf_handler.offer_dtmf_menu(
                state.call_id, self.speak
            )
            return state

        prompts = [
            "I did not catch that. Could you tell me what you need help with?",
            "I am still here. You can describe your issue, or I can give you a menu.",
        ]
        await self.speak(state.call_id, prompts[state.attempts - 1])

        # The caller already incremented state.attempts for this silent
        # turn, so listen again here rather than double-counting it.
        transcript = await self.listen(state.call_id, timeout=8.0)
        if not transcript:
            state.attempts += 1
            return await self.handle_silence(state)
        return await self.resolve_intent(state, transcript)
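Once the keypad menu has been offered, the digit itself still has to be collected. How DTMF events arrive depends on your provider, but most expose them as an event stream; here is a sketch assuming a hypothetical `asyncio.Queue` that your telephony layer pushes digits onto:

```python
import asyncio

async def collect_dtmf_digit(digit_queue: "asyncio.Queue[str]",
                             timeout: float = 10.0):
    """Wait for a single keypad press after the DTMF menu is played.
    Returns the digit as a string, or None if the caller pressed
    nothing before the timeout (re-offer the menu or transfer)."""
    try:
        return await asyncio.wait_for(digit_queue.get(), timeout=timeout)
    except asyncio.TimeoutError:
        return None
```

The returned digit then goes straight through `DTMFFallbackHandler.resolve_dtmf` to get an intent.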

Handling Ambiguous Intent

When the AI is not confident about the intent, ask a clarifying question rather than routing blindly:

async def resolve_intent(self, state, transcript):
    result = await self.intent_resolver.resolve(transcript)
    try:
        state.intent = CallIntent(result["intent"])
    except (KeyError, ValueError):
        # Guard against malformed model output on a live call.
        state.intent = CallIntent.UNKNOWN
    state.confidence = float(result.get("confidence", 0.0))

    if state.confidence >= 0.85:
        # High confidence — route directly
        await self.route_call(state)
    elif state.confidence >= 0.5:
        # Medium confidence — confirm with caller
        confirmation = (
            f"It sounds like you need help with "
            f"{state.intent.value}. Is that right?"
        )
        await self.speak(state.call_id, confirmation)
        answer = await self.listen(state.call_id, timeout=5.0)
        if self.is_affirmative(answer):
            await self.route_call(state)
        else:
            await self.speak(
                state.call_id,
                "I apologize. Could you describe what you need "
                "in a different way?"
            )
    else:
        # Low confidence — escalate or ask again
        state.attempts += 1
        if state.attempts >= state.max_attempts:
            await self.transfer_to_operator(state)
        else:
            await self.speak(
                state.call_id,
                "I want to make sure I connect you to the right "
                "person. Can you give me a bit more detail?"
            )
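The confirmation branch above calls `is_affirmative`, which is not shown. For short yes/no confirmation turns, a simple word-list check is usually enough — ambiguous answers fall through to a re-prompt, so false negatives are cheap. A minimal sketch (the word list is illustrative, and a phrase like "no, that's not right" would need negation handling this version lacks):

```python
AFFIRMATIVE = {"yes", "yeah", "yep", "correct", "sure", "exactly"}

def is_affirmative(answer) -> bool:
    """Keyword check for yes/no confirmation turns.
    Returns False for silence (None) or an empty transcript."""
    if not answer:
        return False
    # Strip trailing punctuation from each word before matching.
    tokens = [t.strip(".,!?") for t in answer.lower().split()]
    return any(t in AFFIRMATIVE for t in tokens)
```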

Measuring Success Against the Old IVR

Track these metrics to prove the AI replacement outperforms the traditional IVR: first-call resolution rate, average time to reach the correct department, caller abandonment rate, and misroute percentage. Results vary with call mix and deployment quality, but reductions in misroutes on the order of 30-50% and a meaningful drop in caller abandonment within the first month are realistic targets to benchmark against.
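Those counters are easy to accumulate per call. A minimal sketch of a metrics tracker (field names are illustrative — wire this into whatever analytics store you already use):

```python
from dataclasses import dataclass

@dataclass
class RoutingMetrics:
    """Running counters for comparing the AI flow against the old IVR."""
    total_calls: int = 0
    resolved_first_call: int = 0
    misroutes: int = 0
    abandoned: int = 0

    def record(self, resolved: bool, misrouted: bool, abandoned: bool):
        # Call once when each call ends, with the call's final outcome.
        self.total_calls += 1
        if resolved:
            self.resolved_first_call += 1
        if misrouted:
            self.misroutes += 1
        if abandoned:
            self.abandoned += 1

    @property
    def misroute_rate(self) -> float:
        return self.misroutes / self.total_calls if self.total_calls else 0.0
```

Run the same counters against your legacy IVR's call logs for the baseline, so the comparison is apples to apples.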

FAQ

How do I handle callers who speak languages other than English?

Use language detection on the first few seconds of audio. Services like Google Speech-to-Text and Azure Speech support automatic language identification. Once detected, switch your TTS voice, STT model, and AI system prompts to that language. For high-traffic secondary languages, provide an explicit option: "Para español, presione dos."
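The switch itself amounts to swapping out a per-language pipeline configuration. A sketch, with placeholder voice and model names (not real provider identifiers):

```python
LANGUAGE_PROFILES = {
    # Hypothetical per-language configuration; extend as you add
    # supported languages.
    "en-US": {"tts_voice": "en_female_1", "stt_model": "en-general",
              "prompt_lang": "English"},
    "es-ES": {"tts_voice": "es_female_1", "stt_model": "es-general",
              "prompt_lang": "Spanish"},
}

def select_language_profile(detected: str, default: str = "en-US") -> dict:
    """Pick the pipeline config for a detected language tag, falling
    back to the default when the language is unsupported."""
    return LANGUAGE_PROFILES.get(detected, LANGUAGE_PROFILES[default])
```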

What happens if the AI system goes down during a call?

Always implement a circuit breaker that falls back to a traditional DTMF menu when the AI service is unavailable. Monitor AI response latency and if it exceeds a threshold (e.g., 3 seconds), switch the active call to the DTMF path. The caller should never be left in silence because of a backend failure.
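A minimal sketch of that circuit breaker, assuming you record the latency and success of each AI call (the thresholds and cooldown are illustrative defaults):

```python
import time

class AICircuitBreaker:
    """Trips to the DTMF path after repeated slow or failed AI calls,
    and retries the AI path after a cooldown."""

    def __init__(self, latency_threshold=3.0, failure_limit=3,
                 cooldown=60.0):
        self.latency_threshold = latency_threshold
        self.failure_limit = failure_limit
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def record_call(self, latency: float, ok: bool):
        # A failure or an over-threshold latency counts against the limit;
        # a healthy call resets the streak.
        if not ok or latency > self.latency_threshold:
            self.failures += 1
            if self.failures >= self.failure_limit:
                self.opened_at = time.monotonic()
        else:
            self.failures = 0

    @property
    def use_dtmf_fallback(self) -> bool:
        if self.opened_at is None:
            return False
        if time.monotonic() - self.opened_at > self.cooldown:
            # Cooldown elapsed -- close the breaker and retry the AI path.
            self.opened_at = None
            self.failures = 0
            return False
        return True
```

Check `use_dtmf_fallback` at the start of each turn; when it is true, route the caller straight into the DTMF menu instead of waiting on the AI.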

How much latency is acceptable for a natural phone conversation?

Human conversation tolerates about 300-400 milliseconds of round-trip delay before it feels unnatural. Your total pipeline — STT, AI inference, TTS — must complete within that window. Use streaming STT and TTS (start speaking before the full response is generated), keep AI prompts concise, and deploy inference close to your telephony infrastructure to minimize network hops.
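A quick way to keep that honest in monitoring is to sum per-stage latencies against the budget. A trivial sketch (stage names are illustrative):

```python
def within_latency_budget(stage_latencies_ms: dict,
                          budget_ms: float = 400.0):
    """Sum measured per-stage latencies (in milliseconds) and report
    whether the pipeline fits the round-trip budget.
    Returns (fits, total_ms)."""
    total = sum(stage_latencies_ms.values())
    return total <= budget_ms, total
```

With streaming STT and TTS, measure each stage to its *first* output (first partial transcript, first audio chunk) rather than to completion — that is what the caller actually perceives.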


#IVR #VoiceAI #CallFlow #NaturalLanguage #Telephony #CustomerExperience #AgenticAI #LearnAI #AIEngineering

CallSphere Team

Expert insights on AI voice agents and customer communication automation.
