Building a Meeting Notes Agent: Transcription, Summary, and Action Item Extraction
Build an AI agent that transcribes meeting audio, generates structured summaries with key decisions, extracts action items with assignees, and distributes notes to participants automatically.
Meetings Produce Value Only When Captured
By some estimates, the average professional spends 23 hours per week in meetings, yet most meeting outcomes evaporate within 24 hours. Without structured notes, decisions get revisited, action items fall through the cracks, and absent team members miss critical context. A meeting notes agent solves this by transcribing audio, generating structured summaries, extracting action items with owners, and distributing the results to all participants.
This guide builds a complete meeting notes agent using Whisper for transcription, an LLM for intelligent summarization, and automated distribution via email or Slack.
Transcribing Audio with Whisper
The first step is converting meeting audio to text. OpenAI's Whisper API handles this with high accuracy across languages and accents:
```python
from openai import OpenAI
from pathlib import Path
from dataclasses import dataclass

client = OpenAI()

@dataclass
class TranscriptSegment:
    start: float
    end: float
    text: str
    speaker: str = ""

def transcribe_audio(audio_path: str) -> list[TranscriptSegment]:
    """Transcribe meeting audio using Whisper with timestamps."""
    file_path = Path(audio_path)
    segments = []
    file_size = file_path.stat().st_size
    max_size = 25 * 1024 * 1024  # 25MB Whisper API upload limit
    if file_size <= max_size:
        with open(audio_path, "rb") as f:
            response = client.audio.transcriptions.create(
                model="whisper-1",
                file=f,
                response_format="verbose_json",
                timestamp_granularities=["segment"],
            )
        # verbose_json returns segment objects with attribute access
        for seg in response.segments:
            segments.append(TranscriptSegment(
                start=seg.start,
                end=seg.end,
                text=seg.text.strip(),
            ))
    else:
        # Files over the API limit are split and transcribed chunk by chunk
        segments = _transcribe_chunked(audio_path)
    return segments

def _transcribe_chunked(audio_path: str) -> list[TranscriptSegment]:
    """Handle large audio files by splitting into chunks."""
    from pydub import AudioSegment

    audio = AudioSegment.from_file(audio_path)
    chunk_duration_ms = 10 * 60 * 1000  # 10 minutes per chunk
    segments = []
    offset = 0.0
    for i in range(0, len(audio), chunk_duration_ms):
        chunk = audio[i:i + chunk_duration_ms]
        chunk_path = f"/tmp/chunk_{i}.mp3"
        chunk.export(chunk_path, format="mp3")
        with open(chunk_path, "rb") as f:
            response = client.audio.transcriptions.create(
                model="whisper-1",
                file=f,
                response_format="verbose_json",
                timestamp_granularities=["segment"],
            )
        for seg in response.segments:
            segments.append(TranscriptSegment(
                start=seg.start + offset,
                end=seg.end + offset,
                text=seg.text.strip(),
            ))
        # Shift timestamps by the actual length of this chunk
        # (the final chunk may be shorter than chunk_duration_ms)
        offset += len(chunk) / 1000
    return segments
```
Speaker Diarization
Knowing who said what transforms a transcript from a wall of text into a conversation. Full diarization requires a dedicated model such as pyannote or a diarization API; once each segment carries a speaker label, formatting the transcript into a readable conversation is straightforward:
```python
def format_transcript_with_speakers(
    segments: list[TranscriptSegment],
    speaker_map: dict[str, str] | None = None,
) -> str:
    """Format transcript segments into a readable conversation."""
    if speaker_map is None:
        speaker_map = {}
    lines = []
    current_speaker = ""
    for seg in segments:
        speaker = speaker_map.get(seg.speaker, seg.speaker or "Speaker")
        timestamp = _format_time(seg.start)
        if speaker != current_speaker:
            lines.append(f"\n**{speaker}** [{timestamp}]:")
            current_speaker = speaker
        lines.append(f"  {seg.text}")
    return "\n".join(lines)

def _format_time(seconds: float) -> str:
    minutes = int(seconds // 60)
    secs = int(seconds % 60)
    return f"{minutes:02d}:{secs:02d}"
```
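Before formatting, each segment needs its `speaker` field populated. A minimal sketch of merging diarization output back into the transcript, assuming speaker turns arrive as `(start, end, label)` tuples (roughly what pyannote's `itertracks(yield_label=True)` yields after converting its turn objects):

```python
from dataclasses import dataclass

@dataclass
class TranscriptSegment:  # mirrors the dataclass defined earlier
    start: float
    end: float
    text: str
    speaker: str = ""

def assign_speakers(
    segments: list[TranscriptSegment],
    turns: list[tuple[float, float, str]],
) -> list[TranscriptSegment]:
    """Label each segment with the speaker turn that covers its midpoint."""
    for seg in segments:
        midpoint = (seg.start + seg.end) / 2
        for turn_start, turn_end, label in turns:
            if turn_start <= midpoint <= turn_end:
                seg.speaker = label
                break
    return segments
```

Matching on the segment midpoint is a deliberate simplification: Whisper segments and diarization turns rarely align exactly, and the midpoint is the most robust single point to compare.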
Generating Structured Summaries
The LLM transforms the raw transcript into a structured meeting summary with decisions, topics discussed, and key takeaways:
```python
import json

def generate_meeting_summary(transcript: str, meeting_title: str = "") -> dict:
    """Generate a structured meeting summary from transcript."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.2,
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a meeting notes assistant. Analyze the transcript and "
                    "return JSON with:\n"
                    "- title: meeting title (infer from content if not provided)\n"
                    "- date: meeting date if mentioned\n"
                    "- participants: list of participant names detected\n"
                    "- executive_summary: 2-3 sentence overview\n"
                    "- topics_discussed: list of {topic, key_points, decisions_made}\n"
                    "- action_items: list of {task, assignee, deadline, priority}\n"
                    "- open_questions: list of unresolved questions\n"
                    "- next_steps: list of agreed next steps"
                ),
            },
            {
                "role": "user",
                "content": (
                    f"Meeting: {meeting_title}\n\nTranscript:\n{transcript[:12000]}"
                ),
            },
        ],
    )
    return json.loads(response.choices[0].message.content)
```
Truncating the transcript to 12,000 characters keeps the request within token limits for most models. For longer meetings, split the transcript into chunks and summarize each chunk before generating a final summary.
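The chunking half of that two-stage approach can be sketched as a splitter that breaks on line boundaries so no transcript line is cut mid-sentence; each chunk would then go through `generate_meeting_summary`, and the concatenated chunk summaries through one final pass:

```python
def chunk_transcript(transcript: str, max_chars: int = 12000) -> list[str]:
    """Split a transcript into chunks of at most max_chars, on line boundaries."""
    chunks: list[str] = []
    current: list[str] = []
    length = 0
    for line in transcript.splitlines():
        # +1 accounts for the newline re-added when the chunk is joined
        if current and length + len(line) + 1 > max_chars:
            chunks.append("\n".join(current))
            current, length = [], 0
        current.append(line)
        length += len(line) + 1
    if current:
        chunks.append("\n".join(current))
    return chunks
```

A single line longer than `max_chars` would still produce an oversized chunk; for meeting transcripts, where lines are individual utterances, that case is rare enough to ignore in a sketch.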
Extracting Action Items with Assignees
Action items deserve special attention because they drive follow-up. The agent extracts them with explicit assignees, deadlines, and priority levels:
```python
def extract_action_items(summary: dict) -> list[dict]:
    """Extract and validate action items from the meeting summary."""
    items = summary.get("action_items", [])
    validated = []
    for item in items:
        validated.append({
            "task": item.get("task", ""),
            "assignee": item.get("assignee", "Unassigned"),
            "deadline": item.get("deadline", "Not specified"),
            "priority": item.get("priority", "medium"),
            "status": "pending",
        })
    return validated

def format_action_items_markdown(items: list[dict]) -> str:
    """Format action items as a Markdown checklist."""
    lines = ["## Action Items\n"]
    for item in items:
        priority_emoji = {"high": "[HIGH]", "medium": "[MED]", "low": "[LOW]"}.get(
            item["priority"], ""
        )
        lines.append(
            f"- [ ] {priority_emoji} **{item['task']}** — "
            f"Assigned to: {item['assignee']} | Due: {item['deadline']}"
        )
    return "\n".join(lines)
```
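One small refinement worth considering before rendering the checklist: order items so high-priority tasks appear first. A minimal sketch, treating any unrecognized priority as medium:

```python
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}

def sort_action_items(items: list[dict]) -> list[dict]:
    """Order action items high -> medium -> low."""
    return sorted(
        items,
        key=lambda item: PRIORITY_ORDER.get(item.get("priority"), 1),
    )
```

`sorted` is stable, so items with the same priority keep the order in which they were extracted from the transcript.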
Distributing Meeting Notes
The agent formats the summary and sends it to participants via email or posts it to a Slack channel:
```python
def format_meeting_notes(summary: dict, action_items: list[dict]) -> str:
    """Format complete meeting notes as Markdown."""
    notes = [f"# {summary.get('title', 'Meeting Notes')}\n"]
    notes.append(f"**Date:** {summary.get('date', 'Not specified')}")
    notes.append(f"**Participants:** {', '.join(summary.get('participants', []))}\n")
    notes.append(f"## Summary\n{summary.get('executive_summary', '')}\n")
    for topic in summary.get("topics_discussed", []):
        notes.append(f"### {topic['topic']}")
        for point in topic.get("key_points", []):
            notes.append(f"- {point}")
        if topic.get("decisions_made"):
            for decision in topic["decisions_made"]:
                notes.append(f"- **Decision:** {decision}")
        notes.append("")
    notes.append(format_action_items_markdown(action_items))
    if summary.get("open_questions"):
        notes.append("\n## Open Questions")
        for q in summary["open_questions"]:
            notes.append(f"- {q}")
    return "\n".join(notes)

def send_to_slack(webhook_url: str, notes: str):
    """Post meeting notes to a Slack channel via webhook."""
    import httpx

    httpx.post(webhook_url, json={"text": notes})
```
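For the email path, a minimal sketch using only the standard library; the sender address and SMTP host here are placeholders you would replace with your own:

```python
import smtplib
from email.message import EmailMessage

def build_notes_email(
    sender: str, recipients: list[str], subject: str, notes: str
) -> EmailMessage:
    """Assemble the meeting notes as a plain-text email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(notes)
    return msg

def send_notes_email(msg: EmailMessage, smtp_host: str = "localhost") -> None:
    """Send over a plain SMTP connection; real servers need SMTP_SSL and login()."""
    with smtplib.SMTP(smtp_host) as server:
        server.send_message(msg)
```

Separating message construction from sending keeps the formatting logic testable without a live SMTP server.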
FAQ
How accurate is Whisper for meeting transcription?
Whisper achieves word error rates between 5 and 10 percent for clear English audio. Accuracy drops with heavy accents, background noise, or multiple people speaking simultaneously. For critical meetings, consider using a higher-quality microphone setup and post-processing the transcript with an LLM to fix obvious transcription errors.
How do I handle meetings longer than one hour?
Split the audio into 10-minute chunks for transcription, then concatenate the results. For summarization, generate a summary per chunk first, then feed all chunk summaries into a final summarization pass. This two-stage approach handles meetings of any length while staying within token limits.
Can the agent create tasks in project management tools?
Yes. After extracting action items, use the Jira, Asana, or Linear API to create tasks automatically. Map the assignee field to user IDs in your project management tool, set due dates from the extracted deadlines, and link back to the meeting notes for context.
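As a sketch of that integration for Jira Cloud (issue creation via `POST /rest/api/3/issue`), an extracted action item maps onto Jira's issue fields like this; the project key and issue type name are assumptions you would adapt to your own instance:

```python
def action_item_to_jira_payload(item: dict, project_key: str) -> dict:
    """Map an extracted action item onto Jira's issue-creation fields."""
    fields = {
        "project": {"key": project_key},
        "summary": item["task"],
        "issuetype": {"name": "Task"},
    }
    # Jira expects due dates as YYYY-MM-DD; omit the field when none was extracted
    if item.get("deadline") not in (None, "", "Not specified"):
        fields["duedate"] = item["deadline"]
    return {"fields": fields}

def create_jira_issue(base_url: str, auth: tuple[str, str], payload: dict) -> dict:
    """POST the payload to Jira; auth is (email, api_token) for Jira Cloud."""
    import httpx

    resp = httpx.post(f"{base_url}/rest/api/3/issue", json=payload, auth=auth)
    resp.raise_for_status()
    return resp.json()
```

Because the LLM returns deadlines as free text, you may want a date-parsing step before trusting the `duedate` field.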