Building a Voicemail AI Agent: Transcription, Analysis, and Automated Response
Build an intelligent voicemail system that transcribes messages, scores priority, extracts action items, and schedules callbacks automatically. Covers voicemail detection, message processing, and smart notifications.
Rethinking Voicemail with AI
Traditional voicemail is a black hole. Messages pile up, important calls get buried under spam, and by the time someone listens to a message, the moment has passed. An AI-powered voicemail agent transforms this experience: every message is instantly transcribed, analyzed for urgency, scored by priority, and routed to the right person with a recommended action. Critical messages trigger immediate notifications. Routine ones get batched into a daily digest.
This is more than voicemail transcription: it is an intelligent message processing pipeline.
Voicemail Detection and Greeting
The first challenge is knowing when to activate the voicemail system. This happens when a call goes unanswered or when the AI screening agent decides to take a message:
from fastapi import FastAPI, Request
from fastapi.responses import Response
from twilio.twiml.voice_response import VoiceResponse

app = FastAPI()


@app.post("/voicemail-greeting")
async def voicemail_greeting(request: Request):
    """Play a personalized voicemail greeting and record."""
    form = await request.form()
    called_number = form.get("Called")
    caller_number = form.get("From")

    # Look up the mailbox owner for a personalized greeting.
    # get_mailbox_owner is an application helper that returns a dict or None.
    owner = await get_mailbox_owner(called_number)

    response = VoiceResponse()
    if owner and owner.get("custom_greeting_url"):
        response.play(owner["custom_greeting_url"])
    else:
        name = owner.get("name", "the person you are calling") if owner else "us"
        response.say(
            f"You have reached {name}. "
            "Please leave a message after the tone and "
            "I will make sure it gets to the right person.",
            voice="Polly.Joanna",
        )

    response.pause(length=1)
    # Placeholder tone URL: host your own beep file, or drop this line
    # and set play_beep=True below to use the built-in beep.
    response.play("https://api.twilio.com/beep.mp3")

    # Record the voicemail
    response.record(
        action="/voicemail-complete",
        max_length=180,  # 3 minutes max
        timeout=5,  # 5 seconds of silence to stop
        transcribe=False,  # We will use our own transcription
        recording_status_callback="/recording-ready",
        play_beep=False,  # We already played our own
    )

    # Fallback if caller does not leave a message
    response.say("No message was recorded. Goodbye.")
    response.hangup()
    return Response(content=str(response), media_type="application/xml")
Message Transcription Pipeline
When the recording is ready, download and transcribe it with high accuracy:
import os
from datetime import datetime

import httpx
from deepgram import DeepgramClient, PrerecordedOptions

deepgram = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])


async def transcribe_voicemail(recording_url: str) -> dict:
    """Download and transcribe a voicemail recording."""
    # Follow redirects and fail loudly on auth or availability errors,
    # so an error page is never sent to the transcriber as audio.
    async with httpx.AsyncClient(follow_redirects=True) as client:
        resp = await client.get(
            f"{recording_url}.wav",
            auth=(
                os.environ["TWILIO_ACCOUNT_SID"],
                os.environ["TWILIO_AUTH_TOKEN"],
            ),
        )
        resp.raise_for_status()
        audio_bytes = resp.content

    options = PrerecordedOptions(
        model="nova-2",
        smart_format=True,
        punctuate=True,
        paragraphs=True,
        detect_language=True,
        sentiment=True,
    )
    result = await deepgram.listen.asyncrest.v("1").transcribe_file(
        {"buffer": audio_bytes, "mimetype": "audio/wav"},
        options,
    )

    channel = result.results.channels[0]
    transcript = channel.alternatives[0]
    return {
        "text": transcript.transcript,
        "confidence": transcript.confidence,
        "language": channel.detected_language,
        "words": [
            {
                "word": w.word,
                "start": w.start,
                "end": w.end,
                "confidence": w.confidence,
            }
            for w in transcript.words
        ],
        "duration": result.metadata.duration,
    }
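The per-word confidences returned above can be used to flag passages worth a second listen, such as a garbled phone number. The following is a minimal sketch, assuming the `words` list has the shape returned by `transcribe_voicemail`; the helper name `low_confidence_spans` and the 0.6 threshold are illustrative assumptions, not part of any library.

```python
from typing import TypedDict


class Word(TypedDict):
    word: str
    start: float
    end: float
    confidence: float


def _to_span(group: list[Word]) -> dict:
    """Collapse a run of words into one span with its audio time range."""
    return {
        "text": " ".join(w["word"] for w in group),
        "start": group[0]["start"],
        "end": group[-1]["end"],
    }


def low_confidence_spans(words: list[Word], threshold: float = 0.6) -> list[dict]:
    """Group consecutive words below `threshold` into reviewable spans.

    The threshold is an assumed starting point; tune it on real recordings.
    """
    spans: list[dict] = []
    current: list[Word] = []
    for w in words:
        if w["confidence"] < threshold:
            current.append(w)
        elif current:
            # A confident word ends the current low-confidence run.
            spans.append(_to_span(current))
            current = []
    if current:
        spans.append(_to_span(current))
    return spans
```

Each span carries start and end offsets, so a notification could link straight to the uncertain part of the recording instead of asking the recipient to replay the whole message.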
AI-Powered Message Analysis
Analyze the transcribed message to extract structured information:
import json

from openai import AsyncOpenAI

client = AsyncOpenAI()

VOICEMAIL_ANALYSIS_PROMPT = """Analyze this voicemail message and extract:
1. caller_name: if mentioned
2. callback_number: if a different number is provided
3. summary: 1-2 sentence summary
4. intent: the caller's purpose (inquiry, complaint, appointment, urgent, sales, personal, spam)
5. urgency: 1-10 score (10 = emergency, 1 = junk)
6. sentiment: positive, neutral, negative, distressed
7. action_items: specific actions requested
8. entities: names, dates, account numbers, amounts mentioned
9. is_spam: boolean (true for a telemarketer, robocall, or solicitation)
10. suggested_response: recommended reply approach
Return valid JSON."""


async def analyze_voicemail(
    transcript_text: str,
    caller_number: str,
    caller_history: dict,
) -> dict:
    """Run AI analysis on a voicemail transcript."""
    context = ""
    if caller_history:
        context = (
            f"\nCaller history: {caller_history.get('total_calls', 0)} "
            f"previous calls, last contact: "
            f"{caller_history.get('last_contact', 'never')}. "
            f"Known as: {caller_history.get('name', 'unknown')}."
        )

    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": VOICEMAIL_ANALYSIS_PROMPT},
            {
                "role": "user",
                "content": f"Transcript: {transcript_text}{context}",
            },
        ],
        response_format={"type": "json_object"},
        temperature=0.2,
    )
    # content can be None (for example on a refusal); fall back to an empty object
    return json.loads(response.choices[0].message.content or "{}")
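JSON mode guarantees parseable output, not correct types: the model may return urgency as the string "8", a value outside 1-10, or an intent that is not in the list. Since routing depends on these fields, it can help to normalize them before use. The following is a minimal sketch; `normalize_analysis` and its fallback defaults (urgency 5, intent "inquiry", sentiment "neutral") are assumptions chosen for illustration.

```python
ALLOWED_INTENTS = {
    "inquiry", "complaint", "appointment", "urgent", "sales", "personal", "spam",
}
ALLOWED_SENTIMENTS = {"positive", "neutral", "negative", "distressed"}


def normalize_analysis(raw: dict) -> dict:
    """Coerce model output into predictable types with safe defaults."""
    # Urgency: accept ints, floats, or numeric strings; clamp to 1..10.
    try:
        urgency = int(float(raw.get("urgency", 5)))
    except (TypeError, ValueError, OverflowError):
        urgency = 5  # assumed neutral default when the value is unusable
    urgency = max(1, min(10, urgency))

    intent = str(raw.get("intent") or "").lower()
    sentiment = str(raw.get("sentiment") or "").lower()

    # is_spam sometimes arrives as the string "true"/"false".
    is_spam = raw.get("is_spam")
    if isinstance(is_spam, str):
        is_spam = is_spam.strip().lower() in {"true", "yes", "1"}

    # action_items should be a list; wrap a lone string.
    action_items = raw.get("action_items") or []
    if isinstance(action_items, str):
        action_items = [action_items]

    return {
        **raw,
        "urgency": urgency,
        "intent": intent if intent in ALLOWED_INTENTS else "inquiry",
        "sentiment": sentiment if sentiment in ALLOWED_SENTIMENTS else "neutral",
        "is_spam": bool(is_spam),
        "action_items": [str(item) for item in action_items],
        "summary": str(raw.get("summary") or ""),
    }
```

Calling `normalize_analysis(await analyze_voicemail(...))` before routing means the threshold comparisons downstream always see an integer urgency.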
Priority Scoring and Smart Routing
Not all voicemails are equal. Score and route them based on the analysis:
import json
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ProcessedVoicemail:
    id: str
    caller_number: str
    recording_url: str
    transcript: str
    analysis: dict
    priority_score: int
    mailbox_owner: str
    created_at: datetime
    callback_scheduled: Optional[datetime] = None


class VoicemailRouter:
    """Routes processed voicemails based on priority and content."""

    URGENCY_THRESHOLDS = {
        "immediate_notify": 8,  # Phone push + SMS
        "priority_notify": 5,  # Email + app notification
        "batch_digest": 1,  # Daily summary
        "spam_discard": 0,  # Auto-archive
    }

    def __init__(self, db_pool):
        # Async database pool (the $1-style placeholders below suggest asyncpg).
        # archive_spam, send_priority_notification, add_to_digest,
        # push_notification, and send_sms are application-specific and
        # not shown here.
        self.db_pool = db_pool

    async def route_voicemail(
        self, voicemail: ProcessedVoicemail
    ) -> str:
        """Determine notification strategy based on priority."""
        analysis = voicemail.analysis
        score = analysis.get("urgency", 5)

        if analysis.get("is_spam"):
            await self.archive_spam(voicemail)
            return "spam_archived"

        if score >= self.URGENCY_THRESHOLDS["immediate_notify"]:
            await self.send_immediate_notification(voicemail)
            await self.schedule_callback(voicemail, delay_minutes=15)
            return "immediate"

        if score >= self.URGENCY_THRESHOLDS["priority_notify"]:
            await self.send_priority_notification(voicemail)
            await self.schedule_callback(voicemail, delay_minutes=60)
            return "priority"

        await self.add_to_digest(voicemail)
        return "batched"

    async def send_immediate_notification(
        self, voicemail: ProcessedVoicemail
    ):
        """Push notification with transcript and suggested action."""
        analysis = voicemail.analysis
        # Use `or` so a null caller_name falls back to the number
        caller = analysis.get("caller_name") or voicemail.caller_number
        message = (
            f"URGENT VOICEMAIL from {caller}\n"
            f"Summary: {analysis.get('summary', '')}\n"
            f"Action: {analysis.get('suggested_response') or 'Call back ASAP'}"
        )
        await self.push_notification(voicemail.mailbox_owner, message)
        await self.send_sms(voicemail.mailbox_owner, message)

    async def schedule_callback(
        self, voicemail: ProcessedVoicemail, delay_minutes: int
    ):
        """Schedule an automated callback if not handled manually."""
        callback_time = datetime.utcnow() + timedelta(minutes=delay_minutes)
        callback_number = (
            voicemail.analysis.get("callback_number")
            or voicemail.caller_number
        )
        await self.db_pool.execute(
            """
            INSERT INTO scheduled_callbacks
            (voicemail_id, phone_number, scheduled_at, status, context)
            VALUES ($1, $2, $3, 'pending', $4)
            """,
            voicemail.id,
            callback_number,
            callback_time,
            json.dumps(voicemail.analysis),
        )
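The router hands low-urgency messages to `add_to_digest`, which is not shown. One possible shape for the digest itself is sketched below. It takes plain dicts with `caller_number` and `analysis` keys rather than `ProcessedVoicemail` objects so that it runs on its own; the function name `build_digest` and the output layout are illustrative assumptions.

```python
def build_digest(voicemails: list[dict]) -> str:
    """Format batched voicemails as a plain-text digest, most urgent first.

    Each item is assumed to look like:
    {"caller_number": "+1...", "analysis": {"urgency": int, "summary": str, ...}}
    """
    if not voicemails:
        return "No new voicemails."

    ordered = sorted(
        voicemails,
        key=lambda vm: vm["analysis"].get("urgency", 0),
        reverse=True,
    )
    lines = [f"Voicemail digest: {len(ordered)} message(s)"]
    for i, vm in enumerate(ordered, start=1):
        analysis = vm["analysis"]
        # Prefer the name the caller gave; fall back to the number.
        who = analysis.get("caller_name") or vm["caller_number"]
        urgency = analysis.get("urgency", "?")
        summary = analysis.get("summary", "(no summary)")
        lines.append(f"{i}. [{urgency}/10] {who}: {summary}")
        for item in analysis.get("action_items") or []:
            lines.append(f"   - {item}")
    return "\n".join(lines)
```

Sorting by urgency inside the digest keeps the more important routine messages at the top even though none of them crossed a notification threshold.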
Automated Callback System
For voicemails that request a callback, the AI can handle the return call:
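import asyncio
import os


class AutoCallbackEngine:
    """Handles automated callbacks for voicemail follow-up."""

    def __init__(self, twilio_client, ai_client, webhook_base: str):
        self.twilio_client = twilio_client  # twilio.rest.Client instance
        self.ai_client = ai_client  # openai.AsyncOpenAI instance
        self.webhook_base = webhook_base  # public base URL of this service

    async def execute_callback(
        self, callback_id: str, voicemail: ProcessedVoicemail
    ):
        """Place an automated callback based on voicemail context."""
        context = voicemail.analysis

        # Generate a personalized callback script.
        # NOTE: persist `script` keyed by callback_id so the
        # /callback-answer webhook can speak it; storage is
        # application-specific and not shown here.
        script = await self.generate_callback_script(context)

        # Place the call. The Twilio REST client is synchronous, so run it
        # in a worker thread to avoid blocking the event loop.
        call = await asyncio.to_thread(
            self.twilio_client.calls.create,
            # Use `or` so a null callback_number falls back to the caller
            to=context.get("callback_number") or voicemail.caller_number,
            from_=os.environ["TWILIO_NUMBER"],
            url=(
                f"{self.webhook_base}/callback-answer"
                f"?callback_id={callback_id}"
            ),
            machine_detection="DetectMessageEnd",
        )
        return call.sid

    async def generate_callback_script(self, context: dict) -> str:
        """Generate a contextual callback opening."""
        caller = context.get("caller_name") or "the caller"
        action_items = context.get("action_items") or ["a callback"]
        response = await self.ai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {
                    "role": "system",
                    "content": (
                        "Generate a brief, professional callback "
                        "opening based on the voicemail context. "
                        "Reference the caller's original message to "
                        "show you listened. Keep it under 3 sentences."
                    ),
                },
                {
                    "role": "user",
                    "content": (
                        f"Caller: {caller}. "
                        f"Their message: {context.get('summary', '')}. "
                        f"They wanted: {', '.join(action_items)}"
                    ),
                },
            ],
        )
        return response.choices[0].message.content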
The Complete Processing Pipeline
Wire everything together in an async pipeline:
async def process_voicemail_pipeline(
    recording_sid: str,
    recording_url: str,
    call_sid: str,
    caller_number: str,
    called_number: str,
):
    """End-to-end voicemail processing pipeline."""
    # Step 1: Transcribe
    transcript = await transcribe_voicemail(recording_url)
    if transcript["confidence"] < 0.3:
        # Very low confidence: store raw recording, skip analysis
        await store_raw_voicemail(recording_sid, recording_url)
        return

    # Step 2: Get caller history
    caller_history = await get_caller_history(caller_number)

    # Step 3: Analyze
    analysis = await analyze_voicemail(
        transcript["text"], caller_number, caller_history
    )

    # Step 4: Create processed voicemail record.
    # get_mailbox_owner returns a dict (or None), while mailbox_owner is a
    # string, so store the owner's name and fall back to the dialed number.
    owner = await get_mailbox_owner(called_number)
    voicemail = ProcessedVoicemail(
        id=recording_sid,
        caller_number=caller_number,
        recording_url=recording_url,
        transcript=transcript["text"],
        analysis=analysis,
        priority_score=analysis.get("urgency", 5),
        mailbox_owner=(owner or {}).get("name", called_number),
        created_at=datetime.utcnow(),
    )

    # Step 5: Store in database
    await store_processed_voicemail(voicemail)

    # Step 6: Route based on priority.
    # voicemail_router is a module-level VoicemailRouter instance.
    route_result = await voicemail_router.route_voicemail(voicemail)
    print(
        f"Voicemail from {caller_number}: "
        f"urgency={analysis.get('urgency')}, "
        f"intent={analysis.get('intent')}, "
        f"routed={route_result}"
    )
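The pipeline still needs a trigger: the `/recording-ready` callback configured on the recording. Twilio posts form-encoded fields such as `RecordingSid`, `RecordingUrl`, `RecordingStatus`, and `CallSid` to that URL. This sketch assumes the caller and called numbers are not part of that payload and are instead passed in the callback URL's query string under the hypothetical names `caller` and `called`; the helper `parse_recording_callback` is likewise an illustration, not a Twilio API.

```python
from typing import Optional
from urllib.parse import parse_qs


def parse_recording_callback(body: str, query: str) -> Optional[dict]:
    """Return keyword arguments for process_voicemail_pipeline, or None.

    `body` is the raw form-encoded POST body; `query` is the raw query
    string of the callback URL (assumed to carry `caller` and `called`).
    """
    form = {k: v[0] for k, v in parse_qs(body).items()}
    params = {k: v[0] for k, v in parse_qs(query).items()}

    # Only a completed recording has audio worth processing.
    if form.get("RecordingStatus") != "completed":
        return None

    return {
        "recording_sid": form["RecordingSid"],
        "recording_url": form["RecordingUrl"],
        "call_sid": form["CallSid"],
        "caller_number": params.get("caller", ""),
        "called_number": params.get("called", ""),
    }
```

Inside the webhook handler, the result could be passed along with `await process_voicemail_pipeline(**args)`, ideally from a background task so the HTTP response returns quickly. Phone numbers placed in the query string need the leading `+` percent-encoded as `%2B`.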
FAQ
How do I detect if a voicemail system answered instead of a human?
When making outbound calls, set Twilio's machine_detection parameter to DetectMessageEnd. Twilio analyzes the audio at the start of the call to distinguish a live person from a voicemail greeting. If it hears a greeting, it waits for the greeting to finish (typically at the beep) before requesting your webhook, and it reports what it found in the AnsweredBy parameter, so you can leave a message at the right moment. Detection is imperfect (roughly 90% accurate), so design your opening line to work gracefully in both scenarios.
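In the answer webhook, the branching can key off the `AnsweredBy` value. The following is a minimal sketch of that decision, assuming the values Twilio reports for DetectMessageEnd (`human`, `machine_end_beep`, `machine_end_silence`, `machine_end_other`, `fax`, `unknown`); the function name and the returned action labels are hypothetical.

```python
from typing import Optional


def callback_action(answered_by: Optional[str]) -> str:
    """Map Twilio's AnsweredBy value to what the callback should do."""
    value = (answered_by or "unknown").lower()
    if value == "human":
        return "converse"
    if value.startswith("machine_end"):
        # Greeting has finished: safe to start speaking the message.
        return "leave_message"
    if value == "fax":
        return "hang_up"
    # Unknown or unexpected result: behave as if a person answered, with an
    # opening line that also makes sense when recorded on voicemail.
    return "converse"
```

Treating `unknown` as a human is a deliberate choice here: staying silent with a real person on the line is usually worse than leaving a slightly conversational voicemail.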
What is the best way to handle voicemails in languages other than English?
Use a transcription service with automatic language detection (Deepgram and Whisper both support this). Once the language is detected, switch your AI analysis prompt to that language or use a multilingual model. Store the detected language alongside the transcript so notifications can be formatted appropriately. For businesses serving multilingual populations, consider offering the voicemail greeting in multiple languages.
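One way to act on the detected language is to keep a single English prompt and append an instruction describing the transcript's language, relying on a multilingual model. The sketch below assumes the transcriber returns a BCP 47 style code such as `es` or `es-419`; `localized_system_prompt` and the small `LANGUAGE_NAMES` table are illustrative, and the choice to keep free-text fields in English for the mailbox owner is an assumption you may want to reverse.

```python
from typing import Optional

# Illustrative subset; extend with the languages your callers use.
LANGUAGE_NAMES = {"en": "English", "es": "Spanish", "fr": "French", "de": "German"}


def localized_system_prompt(base_prompt: str, detected_language: Optional[str]) -> str:
    """Append a language hint to the analysis prompt for non-English audio."""
    # Reduce "es-419" or "ES" to the bare language code "es".
    code = (detected_language or "en").split("-")[0].lower()
    if code == "en":
        return base_prompt

    name = LANGUAGE_NAMES.get(code, f"the language with code '{code}'")
    return (
        f"{base_prompt}\n"
        f"The transcript is in {name}. Keep JSON keys and enumerated values "
        f"in English, and write summary, action_items, and "
        f"suggested_response in English for the mailbox owner."
    )
```

The result would replace `VOICEMAIL_ANALYSIS_PROMPT` as the system message, with the language code taken from the `language` field of the transcription result.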
How do I handle very long voicemails or callers who ramble?
Set a max_length on the recording (120-180 seconds is typical). For analysis of long messages, the AI naturally handles this — the summary and action items extraction will distill even a rambling 3-minute message into a concise output. If you want to discourage long messages, your greeting can say "Please leave a brief message" and you can use the timeout parameter to stop recording after a few seconds of silence.
CallSphere Team
Expert insights on AI voice agents and customer communication automation.