Building a Meeting Notes Agent: Transcription, Summary, and Action Item Extraction
Build an AI agent that transcribes meeting audio, generates structured summaries with key decisions, extracts action items with assignees, and distributes notes to participants automatically.
Meetings Produce Value Only When Captured
By some estimates, the average professional spends 23 hours per week in meetings, yet most meeting outcomes evaporate within 24 hours. Without structured notes, decisions get revisited, action items fall through the cracks, and absent team members miss critical context. A meeting notes agent solves this by transcribing audio, generating structured summaries, extracting action items with owners, and distributing the results to all participants.
This guide builds a complete meeting notes agent using Whisper for transcription, an LLM for intelligent summarization, and automated distribution via email or Slack.
Transcribing Audio with Whisper
The first step is converting meeting audio to text. OpenAI's Whisper API handles this with high accuracy across languages and accents:
```python
from openai import OpenAI
from pathlib import Path
from dataclasses import dataclass

client = OpenAI()

@dataclass
class TranscriptSegment:
    start: float
    end: float
    text: str
    speaker: str = ""

def transcribe_audio(audio_path: str) -> list[TranscriptSegment]:
    """Transcribe meeting audio using Whisper with timestamps."""
    file_path = Path(audio_path)
    segments = []
    file_size = file_path.stat().st_size
    max_size = 25 * 1024 * 1024  # 25MB Whisper API upload limit
    if file_size <= max_size:
        with open(audio_path, "rb") as f:
            response = client.audio.transcriptions.create(
                model="whisper-1",
                file=f,
                response_format="verbose_json",
                timestamp_granularities=["segment"],
            )
        # verbose_json returns segment objects with attribute access
        for seg in response.segments:
            segments.append(TranscriptSegment(
                start=seg.start,
                end=seg.end,
                text=seg.text.strip(),
            ))
    else:
        # Files over the API limit are split and transcribed chunk by chunk
        segments = _transcribe_chunked(audio_path)
    return segments

def _transcribe_chunked(audio_path: str) -> list[TranscriptSegment]:
    """Handle large audio files by splitting into chunks."""
    from pydub import AudioSegment

    audio = AudioSegment.from_file(audio_path)
    chunk_duration_ms = 10 * 60 * 1000  # 10 minutes per chunk
    segments = []
    offset = 0.0
    for i in range(0, len(audio), chunk_duration_ms):
        chunk = audio[i:i + chunk_duration_ms]
        chunk_path = f"/tmp/chunk_{i}.mp3"
        chunk.export(chunk_path, format="mp3")
        with open(chunk_path, "rb") as f:
            response = client.audio.transcriptions.create(
                model="whisper-1",
                file=f,
                response_format="verbose_json",
                timestamp_granularities=["segment"],
            )
        for seg in response.segments:
            segments.append(TranscriptSegment(
                start=seg.start + offset,
                end=seg.end + offset,
                text=seg.text.strip(),
            ))
        # Shift timestamps by the actual length of this chunk
        # (the final chunk may be shorter than chunk_duration_ms)
        offset += len(chunk) / 1000
    return segments
```
Speaker Diarization
Knowing who said what transforms a transcript from a wall of text into a conversation. Full diarization requires a dedicated model such as pyannote or a diarization API; once each segment carries a speaker label, formatting the transcript into a readable conversation is straightforward:
```python
def format_transcript_with_speakers(
    segments: list[TranscriptSegment],
    speaker_map: dict[str, str] | None = None,
) -> str:
    """Format transcript segments into a readable conversation."""
    if speaker_map is None:
        speaker_map = {}
    lines = []
    current_speaker = ""
    for seg in segments:
        speaker = speaker_map.get(seg.speaker, seg.speaker or "Speaker")
        timestamp = _format_time(seg.start)
        if speaker != current_speaker:
            lines.append(f"\n**{speaker}** [{timestamp}]:")
            current_speaker = speaker
        lines.append(f"  {seg.text}")
    return "\n".join(lines)

def _format_time(seconds: float) -> str:
    minutes = int(seconds // 60)
    secs = int(seconds % 60)
    return f"{minutes:02d}:{secs:02d}"
```
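Before formatting, each segment needs its `speaker` field populated. A minimal sketch of merging diarization output back into the transcript, assuming speaker turns arrive as `(start, end, label)` tuples (roughly what pyannote's `itertracks(yield_label=True)` yields after converting its turn objects):

```python
from dataclasses import dataclass

@dataclass
class TranscriptSegment:  # mirrors the dataclass defined earlier
    start: float
    end: float
    text: str
    speaker: str = ""

def assign_speakers(
    segments: list[TranscriptSegment],
    turns: list[tuple[float, float, str]],
) -> list[TranscriptSegment]:
    """Label each segment with the speaker turn that covers its midpoint."""
    for seg in segments:
        midpoint = (seg.start + seg.end) / 2
        for turn_start, turn_end, label in turns:
            if turn_start <= midpoint <= turn_end:
                seg.speaker = label
                break
    return segments
```

Matching on the segment midpoint is a deliberate simplification: Whisper segments and diarization turns rarely align exactly, and the midpoint is the most robust single point to compare.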
Generating Structured Summaries
The LLM transforms the raw transcript into a structured meeting summary with decisions, topics discussed, and key takeaways:
```python
import json

def generate_meeting_summary(transcript: str, meeting_title: str = "") -> dict:
    """Generate a structured meeting summary from transcript."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.2,
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a meeting notes assistant. Analyze the transcript and "
                    "return JSON with:\n"
                    "- title: meeting title (infer from content if not provided)\n"
                    "- date: meeting date if mentioned\n"
                    "- participants: list of participant names detected\n"
                    "- executive_summary: 2-3 sentence overview\n"
                    "- topics_discussed: list of {topic, key_points, decisions_made}\n"
                    "- action_items: list of {task, assignee, deadline, priority}\n"
                    "- open_questions: list of unresolved questions\n"
                    "- next_steps: list of agreed next steps"
                ),
            },
            {
                "role": "user",
                "content": (
                    f"Meeting: {meeting_title}\n\nTranscript:\n{transcript[:12000]}"
                ),
            },
        ],
    )
    return json.loads(response.choices[0].message.content)
```
Truncating the transcript to 12,000 characters keeps the request within token limits for most models. For longer meetings, split the transcript into chunks and summarize each chunk before generating a final summary.
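The chunking half of that two-stage approach can be sketched as a splitter that breaks on line boundaries so no transcript line is cut mid-sentence; each chunk would then go through `generate_meeting_summary`, and the concatenated chunk summaries through one final pass:

```python
def chunk_transcript(transcript: str, max_chars: int = 12000) -> list[str]:
    """Split a transcript into chunks of at most max_chars, on line boundaries."""
    chunks: list[str] = []
    current: list[str] = []
    length = 0
    for line in transcript.splitlines():
        # +1 accounts for the newline re-added when the chunk is joined
        if current and length + len(line) + 1 > max_chars:
            chunks.append("\n".join(current))
            current, length = [], 0
        current.append(line)
        length += len(line) + 1
    if current:
        chunks.append("\n".join(current))
    return chunks
```

A single line longer than `max_chars` would still produce an oversized chunk; for meeting transcripts, where lines are individual utterances, that case is rare enough to ignore in a sketch.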
Extracting Action Items with Assignees
Action items deserve special attention because they drive follow-up. The agent extracts them with explicit assignees, deadlines, and priority levels:
```python
def extract_action_items(summary: dict) -> list[dict]:
    """Extract and validate action items from the meeting summary."""
    items = summary.get("action_items", [])
    validated = []
    for item in items:
        validated.append({
            "task": item.get("task", ""),
            "assignee": item.get("assignee", "Unassigned"),
            "deadline": item.get("deadline", "Not specified"),
            "priority": item.get("priority", "medium"),
            "status": "pending",
        })
    return validated

def format_action_items_markdown(items: list[dict]) -> str:
    """Format action items as a Markdown checklist."""
    lines = ["## Action Items\n"]
    for item in items:
        priority_emoji = {"high": "[HIGH]", "medium": "[MED]", "low": "[LOW]"}.get(
            item["priority"], ""
        )
        lines.append(
            f"- [ ] {priority_emoji} **{item['task']}** — "
            f"Assigned to: {item['assignee']} | Due: {item['deadline']}"
        )
    return "\n".join(lines)
```
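One small refinement worth considering before rendering the checklist: order items so high-priority tasks appear first. A minimal sketch, treating any unrecognized priority as medium:

```python
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}

def sort_action_items(items: list[dict]) -> list[dict]:
    """Order action items high -> medium -> low."""
    return sorted(
        items,
        key=lambda item: PRIORITY_ORDER.get(item.get("priority"), 1),
    )
```

`sorted` is stable, so items with the same priority keep the order in which they were extracted from the transcript.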
Distributing Meeting Notes
The agent formats the summary and sends it to participants via email or posts it to a Slack channel:
```python
def format_meeting_notes(summary: dict, action_items: list[dict]) -> str:
    """Format complete meeting notes as Markdown."""
    notes = [f"# {summary.get('title', 'Meeting Notes')}\n"]
    notes.append(f"**Date:** {summary.get('date', 'Not specified')}")
    notes.append(f"**Participants:** {', '.join(summary.get('participants', []))}\n")
    notes.append(f"## Summary\n{summary.get('executive_summary', '')}\n")
    for topic in summary.get("topics_discussed", []):
        notes.append(f"### {topic['topic']}")
        for point in topic.get("key_points", []):
            notes.append(f"- {point}")
        if topic.get("decisions_made"):
            for decision in topic["decisions_made"]:
                notes.append(f"- **Decision:** {decision}")
        notes.append("")
    notes.append(format_action_items_markdown(action_items))
    if summary.get("open_questions"):
        notes.append("\n## Open Questions")
        for q in summary["open_questions"]:
            notes.append(f"- {q}")
    return "\n".join(notes)

def send_to_slack(webhook_url: str, notes: str):
    """Post meeting notes to a Slack channel via webhook."""
    import httpx

    httpx.post(webhook_url, json={"text": notes})
```
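For the email path, a minimal sketch using only the standard library; the sender address and SMTP host here are placeholders you would replace with your own:

```python
import smtplib
from email.message import EmailMessage

def build_notes_email(
    sender: str, recipients: list[str], subject: str, notes: str
) -> EmailMessage:
    """Assemble the meeting notes as a plain-text email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(notes)
    return msg

def send_notes_email(msg: EmailMessage, smtp_host: str = "localhost") -> None:
    """Send over a plain SMTP connection; real servers need SMTP_SSL and login()."""
    with smtplib.SMTP(smtp_host) as server:
        server.send_message(msg)
```

Separating message construction from sending keeps the formatting logic testable without a live SMTP server.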
FAQ
How accurate is Whisper for meeting transcription?
Whisper achieves word error rates between 5 and 10 percent for clear English audio. Accuracy drops with heavy accents, background noise, or multiple people speaking simultaneously. For critical meetings, consider using a higher-quality microphone setup and post-processing the transcript with an LLM to fix obvious transcription errors.
How do I handle meetings longer than one hour?
Split the audio into 10-minute chunks for transcription, then concatenate the results. For summarization, generate a summary per chunk first, then feed all chunk summaries into a final summarization pass. This two-stage approach handles meetings of any length while staying within token limits.
Can the agent create tasks in project management tools?
Yes. After extracting action items, use the Jira, Asana, or Linear API to create tasks automatically. Map the assignee field to user IDs in your project management tool, set due dates from the extracted deadlines, and link back to the meeting notes for context.
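As a sketch of that integration for Jira Cloud (issue creation via `POST /rest/api/3/issue`), an extracted action item maps onto Jira's issue fields like this; the project key and issue type name are assumptions you would adapt to your own instance:

```python
def action_item_to_jira_payload(item: dict, project_key: str) -> dict:
    """Map an extracted action item onto Jira's issue-creation fields."""
    fields = {
        "project": {"key": project_key},
        "summary": item["task"],
        "issuetype": {"name": "Task"},
    }
    # Jira expects due dates as YYYY-MM-DD; omit the field when none was extracted
    if item.get("deadline") not in (None, "", "Not specified"):
        fields["duedate"] = item["deadline"]
    return {"fields": fields}

def create_jira_issue(base_url: str, auth: tuple[str, str], payload: dict) -> dict:
    """POST the payload to Jira; auth is (email, api_token) for Jira Cloud."""
    import httpx

    resp = httpx.post(f"{base_url}/rest/api/3/issue", json=payload, auth=auth)
    resp.raise_for_status()
    return resp.json()
```

Because the LLM returns deadlines as free text, you may want a date-parsing step before trusting the `duedate` field.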