Call Recording and Transcription for AI Analysis: Building a Call Analytics Pipeline
Build a complete call analytics pipeline that records calls, transcribes them, and extracts actionable insights using AI. Covers recording APIs, speaker diarization, sentiment analysis, and trend detection.
Why Call Analytics Matters
Every phone call your business handles is a goldmine of unstructured data — customer pain points, competitor mentions, product feedback, and sales signals. Without a structured analytics pipeline, these insights vanish the moment the call ends. A call analytics pipeline captures recordings, transcribes them accurately, and uses AI to extract structured insights at scale.
The pipeline has four stages: recording, transcription, analysis, and storage. Each stage feeds the next, and the final output is a structured dataset you can query, visualize, and act on.
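One way to picture the final output is a single structured record per call. A minimal sketch of that record (field names here are illustrative, not a fixed schema):

```python
from dataclasses import dataclass, field

@dataclass
class CallInsights:
    """One structured record per analyzed call (illustrative fields)."""
    call_sid: str                                          # Stage 1: recording metadata
    transcript: list[dict] = field(default_factory=list)   # Stage 2: transcription
    summary: str = ""                                      # Stage 3: AI analysis
    sentiment: str = "neutral"
    intent: str = "unknown"
    satisfaction_score: int = 0
    duration_seconds: int = 0                              # Stage 4: stored for querying

record = CallInsights(call_sid="CA123", intent="support", satisfaction_score=8)
```

Each stage fills in its slice of this record; Stage 4 persists it.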
Stage 1: Recording Calls
Using Twilio as an example, start a dual-channel recording on the live call through the REST API as soon as your answer webhook fires (the TwiML <Record> verb records a message from the caller only, not both legs of a live conversation):

import os

from fastapi import FastAPI, Request
from fastapi.responses import Response
from twilio.rest import Client
from twilio.twiml.voice_response import VoiceResponse

app = FastAPI()
twilio_client = Client(os.environ["TWILIO_ACCOUNT_SID"], os.environ["TWILIO_AUTH_TOKEN"])

@app.post("/incoming-call")
async def handle_call(request: Request):
    form = await request.form()
    # Start a dual-channel recording: caller and agent on separate audio tracks
    twilio_client.calls(form["CallSid"]).recordings.create(
        recording_channels="dual",
        recording_status_callback="https://your-domain.example/recording-status",
        recording_status_callback_event=["completed"],
    )
    response = VoiceResponse()
    response.say("Thank you for calling. How can I help?")
    response.gather(input="speech", action="/handle-speech")
    return Response(content=str(response), media_type="application/xml")

@app.post("/recording-status")
async def recording_complete(request: Request):
    """Webhook called when the recording is finalized."""
    form = await request.form()
    recording_sid = form["RecordingSid"]
    recording_url = form["RecordingUrl"]
    duration = int(form["RecordingDuration"])
    call_sid = form["CallSid"]
    # Trigger the transcription pipeline
    await start_transcription_pipeline(
        recording_sid=recording_sid,
        recording_url=f"{recording_url}.wav",
        duration=duration,
        call_sid=call_sid,
    )
    return {"status": "accepted"}
Dual-channel recording is critical for analytics — it puts each speaker on a separate audio track, which dramatically improves transcription accuracy and makes speaker diarization trivial.
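With dual-channel audio you barely need statistical diarization at all: each channel is one party, so the channel index maps directly to a role. A minimal sketch of that mapping (the channel-to-role assignment is an assumption about how your recording is configured):

```python
# In a dual-channel recording, channel 0 is typically the caller leg
# and channel 1 the agent leg (verify against your provider's docs)
CHANNEL_ROLES = {0: "Caller", 1: "Agent"}

def label_by_channel(utterances: list[dict]) -> list[dict]:
    """Attach a speaker role to utterance dicts that carry a channel index."""
    return [
        {**u, "speaker": CHANNEL_ROLES.get(u["channel"], f"Channel {u['channel']}")}
        for u in utterances
    ]

labeled = label_by_channel([
    {"channel": 0, "text": "I'd like to cancel my order."},
    {"channel": 1, "text": "I can help with that."},
])
```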
Stage 2: Transcription with Speaker Diarization
Download the recording and run it through a speech-to-text engine with speaker separation:
import os

import httpx
from deepgram import DeepgramClient, PrerecordedOptions

deepgram = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

async def transcribe_recording(recording_url: str, auth_token: str):
    """Download a recording and transcribe it with speaker diarization."""
    # Download the recording from Twilio (HTTP basic auth)
    async with httpx.AsyncClient() as client:
        resp = await client.get(
            recording_url,
            auth=(os.environ["TWILIO_ACCOUNT_SID"], auth_token),
        )
        resp.raise_for_status()
        audio_bytes = resp.content

    # Transcribe with Deepgram (diarization + punctuation)
    options = PrerecordedOptions(
        model="nova-2",
        smart_format=True,
        diarize=True,
        punctuate=True,
        utterances=True,
        language="en-US",
    )
    response = await deepgram.listen.asyncrest.v("1").transcribe_file(
        {"buffer": audio_bytes, "mimetype": "audio/wav"},
        options,
    )

    # Structure the transcript by speaker
    structured_transcript = []
    for utterance in response.results.utterances:
        structured_transcript.append({
            "speaker": f"Speaker {utterance.speaker}",
            "text": utterance.transcript,
            "start": utterance.start,
            "end": utterance.end,
            "confidence": utterance.confidence,
        })
    return structured_transcript
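Before analysis, it often helps to merge consecutive utterances from the same speaker into conversational turns, which shortens the prompt and reads more naturally. A small post-processing sketch over the structure returned above:

```python
def merge_turns(transcript: list[dict]) -> list[dict]:
    """Collapse back-to-back utterances by the same speaker into one turn."""
    turns: list[dict] = []
    for utt in transcript:
        if turns and turns[-1]["speaker"] == utt["speaker"]:
            # Same speaker continues: extend the previous turn
            turns[-1]["text"] += " " + utt["text"]
            turns[-1]["end"] = utt["end"]
        else:
            turns.append(dict(utt))
    return turns

turns = merge_turns([
    {"speaker": "Speaker 0", "text": "Hi,", "start": 0.0, "end": 0.5},
    {"speaker": "Speaker 0", "text": "I need help.", "start": 0.6, "end": 1.4},
    {"speaker": "Speaker 1", "text": "Sure.", "start": 1.6, "end": 2.0},
])
```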
Stage 3: AI-Powered Analysis
With a structured transcript in hand, use an LLM to extract insights:
import json

from openai import AsyncOpenAI

client = AsyncOpenAI()

ANALYSIS_PROMPT = """Analyze this call transcript and extract:

1. **Summary**: 2-3 sentence summary of the call
2. **Sentiment**: overall (positive/neutral/negative), and per-speaker
3. **Intent**: caller's primary intent (support, sales, complaint, etc.)
4. **Key Topics**: list of topics discussed
5. **Action Items**: any follow-up actions promised
6. **Satisfaction Score**: 1-10 estimate of caller satisfaction
7. **Escalation Risk**: low/medium/high
8. **Competitor Mentions**: any competitor names mentioned

Return valid JSON with exactly these keys: summary, sentiment, intent,
key_topics, action_items, satisfaction_score, escalation_risk,
competitor_mentions."""

async def analyze_transcript(transcript: list[dict]) -> dict:
    """Run AI analysis on a structured transcript."""
    # Format transcript for the LLM
    formatted = "\n".join(
        f"[{t['speaker']}] ({t['start']:.1f}s): {t['text']}"
        for t in transcript
    )
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": ANALYSIS_PROMPT},
            {"role": "user", "content": formatted},
        ],
        response_format={"type": "json_object"},
        temperature=0.2,  # Low temperature for consistent structured output
    )
    return json.loads(response.choices[0].message.content)
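Since the prompt asks the model to return JSON with specific keys, it pays to validate the response before storing it, because LLMs occasionally drop or mistype fields. A minimal sketch using stdlib checks (a pydantic model would work equally well; the field names mirror the prompt):

```python
REQUIRED_FIELDS = {
    "summary": str,
    "sentiment": (str, dict),      # may be overall string or per-speaker dict
    "intent": str,
    "key_topics": list,
    "action_items": list,
    "satisfaction_score": int,
    "escalation_risk": str,
}

def validate_analysis(analysis: dict) -> dict:
    """Raise ValueError if the LLM output is missing or mistyped fields."""
    for name, expected in REQUIRED_FIELDS.items():
        if name not in analysis:
            raise ValueError(f"missing field: {name}")
        if not isinstance(analysis[name], expected):
            raise ValueError(f"bad type for field: {name}")
    if not 1 <= analysis["satisfaction_score"] <= 10:
        raise ValueError("satisfaction_score out of range")
    # Optional field: default to empty so storage code never KeyErrors
    analysis.setdefault("competitor_mentions", [])
    return analysis
```

On failure you can retry the LLM call or fall back to storing the raw transcript only.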
Stage 4: Storage and Querying
Store the raw transcript and analysis results in a database optimized for querying:
import json
from datetime import datetime, timezone

import asyncpg

async def store_call_analysis(
    pool: asyncpg.Pool,
    call_sid: str,
    transcript: list[dict],
    analysis: dict,
    duration: int,
):
    """Persist call data and analysis to PostgreSQL."""
    # Sentiment may come back as a string or a per-speaker dict; store as text
    sentiment = analysis["sentiment"]
    if not isinstance(sentiment, str):
        sentiment = json.dumps(sentiment)
    await pool.execute(
        """
        INSERT INTO call_analytics (
            call_sid, transcript, summary, sentiment,
            intent, topics, action_items, satisfaction_score,
            escalation_risk, competitor_mentions,
            duration_seconds, analyzed_at
        ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12)
        """,
        call_sid,
        json.dumps(transcript),
        analysis["summary"],
        sentiment,
        analysis["intent"],
        analysis["key_topics"],
        json.dumps(analysis["action_items"]),
        analysis["satisfaction_score"],
        analysis["escalation_risk"],
        analysis.get("competitor_mentions", []),
        duration,
        datetime.now(timezone.utc),
    )

async def get_insights_summary(pool: asyncpg.Pool, days: int = 7):
    """Query aggregate insights over a time period."""
    return await pool.fetch(
        """
        SELECT
            ca.intent,
            COUNT(*) AS call_count,
            AVG(ca.satisfaction_score) AS avg_satisfaction,
            COUNT(*) FILTER (WHERE ca.escalation_risk = 'high') AS escalations,
            -- Aggregate topics in a subquery so unnest's row multiplication
            -- doesn't inflate call_count or skew the averages
            (SELECT array_agg(DISTINCT topic)
               FROM call_analytics c2,
                    LATERAL unnest(c2.topics) AS topic
              WHERE c2.intent = ca.intent
                AND c2.analyzed_at >= NOW() - make_interval(days => $1)
            ) AS all_topics
        FROM call_analytics ca
        WHERE ca.analyzed_at >= NOW() - make_interval(days => $1)
        GROUP BY ca.intent
        ORDER BY call_count DESC
        """,
        days,
    )
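The INSERT above assumes a table roughly like the following sketch (column types are a suggestion: jsonb for nested structures, text[] for topic lists, and indexes on the columns the insights query filters and groups on):

```sql
CREATE TABLE IF NOT EXISTS call_analytics (
    id                  BIGSERIAL PRIMARY KEY,
    call_sid            TEXT NOT NULL UNIQUE,
    transcript          JSONB NOT NULL,
    summary             TEXT,
    sentiment           TEXT,
    intent              TEXT,
    topics              TEXT[] DEFAULT '{}',
    action_items        JSONB DEFAULT '[]',
    satisfaction_score  INT,
    escalation_risk     TEXT,
    competitor_mentions TEXT[] DEFAULT '{}',
    duration_seconds    INT,
    analyzed_at         TIMESTAMPTZ DEFAULT NOW()
);

-- Support the time-window filter and the GROUP BY in the insights query
CREATE INDEX IF NOT EXISTS idx_call_analytics_analyzed_at
    ON call_analytics (analyzed_at);
CREATE INDEX IF NOT EXISTS idx_call_analytics_intent
    ON call_analytics (intent);
```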
The Complete Pipeline
Wire the stages together in one async orchestrator; in production, dispatch it through a task queue or background worker so the recording webhook returns immediately:

import os

async def start_transcription_pipeline(
    recording_sid: str,
    recording_url: str,
    duration: int,
    call_sid: str,
):
    """Orchestrate the full recording-to-insights pipeline."""
    # Stage 2: Transcribe
    transcript = await transcribe_recording(
        recording_url, os.environ["TWILIO_AUTH_TOKEN"]
    )

    # Stage 3: Analyze
    analysis = await analyze_transcript(transcript)

    # Stage 4: Store (db_pool is a module-level asyncpg pool)
    await store_call_analysis(
        db_pool, call_sid, transcript, analysis, duration
    )

    print(f"Pipeline complete for call {call_sid}: "
          f"intent={analysis['intent']}, "
          f"satisfaction={analysis['satisfaction_score']}/10")
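Because the pipeline runs after the call ends, transient failures (rate limits, network blips) should trigger retries rather than lost insights. A minimal backoff wrapper you could put around each stage (the retry counts and delays are illustrative):

```python
import asyncio

async def with_retries(fn, *args, attempts: int = 3, base_delay: float = 1.0):
    """Call an async function, retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            return await fn(*args)
        except Exception:
            if attempt == attempts - 1:
                raise  # Out of retries: surface the error to the caller
            await asyncio.sleep(base_delay * 2 ** attempt)

# Usage inside the orchestrator, e.g.:
# transcript = await with_retries(transcribe_recording, recording_url, token)
```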
FAQ
How long does the pipeline take per call?
Transcription takes roughly 20-30% of the call duration with modern engines like Deepgram Nova-2. AI analysis adds 2-5 seconds. For a 5-minute call, expect the full pipeline to complete in about 90 seconds. Run it asynchronously after the call ends so it never impacts call quality.
What about call recording consent laws?
Recording laws vary by jurisdiction. In "two-party consent" states (like California) and countries (like Germany), you must inform all parties and obtain consent before recording. Add a recording disclosure at the start of every call and implement a mechanism to disable recording if consent is denied. Consult legal counsel for your specific jurisdictions.
How accurate is modern speech-to-text for phone calls?
Modern engines like Deepgram Nova-2 and OpenAI Whisper achieve 90-95% accuracy on clean phone audio. Accuracy drops with heavy accents, background noise, or poor phone connections. Dual-channel recording improves accuracy by 5-10% because each speaker has a clean audio track. Always store the raw recording alongside the transcript so you can re-transcribe as models improve.
#CallAnalytics #Transcription #SentimentAnalysis #SpeechtoText #VoiceAI #DataPipeline #AgenticAI #LearnAI #AIEngineering
CallSphere Team
Expert insights on AI voice agents and customer communication automation.