
Building a Whiteboard-to-Code Agent: Converting Hand-Drawn Diagrams to Working Software

Learn how to build an AI agent that recognizes hand-drawn diagrams on whiteboards, classifies shapes and connections, and generates working code including Mermaid diagrams, database schemas, and API stubs.

From Sketch to Code in Seconds

Whiteboards are where software architecture happens. Teams sketch entity-relationship diagrams, flowcharts, system architectures, and UI wireframes during design sessions. But these diagrams typically die on the whiteboard — someone takes a photo, it gets buried in a Slack thread, and the knowledge is effectively lost.

A whiteboard-to-code agent changes this. It takes a photo of a whiteboard, identifies the shapes, arrows, and text, understands the diagram type, and produces working code artifacts: Mermaid diagrams for documentation, SQL schemas for databases, API route stubs, or even class definitions.

Architecture of the Agent

The pipeline has four stages:

  1. Image preprocessing — clean up whiteboard photo artifacts
  2. Element detection — find shapes (boxes, circles, diamonds) and connections (arrows, lines)
  3. Semantic classification — determine diagram type and element meanings
  4. Code generation — produce the appropriate code output
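
Wired together, the four stages form one entry point. The sketch below injects each stage as a callable so individual stages can be swapped or mocked in tests; the names mirror the functions defined in the sections that follow, but the injection style itself is an assumption, not a fixed API:

```python
from typing import Any, Callable


def run_pipeline(
    image_path: str,
    preprocess: Callable[[str], Any],
    detect: Callable[[Any], list],
    label: Callable[[Any, list], list],
    connect: Callable[[list, Any], list],
    generate: Callable[[list, list], str],
) -> str:
    """Run all four stages and return the generated code artifact."""
    img = preprocess(image_path)            # 1. clean up the photo
    elements = detect(img)                  # 2. find shapes
    elements = label(img, elements)         # 3. read labels for classification
    connections = connect(elements, img)    # 2b. find arrows between shapes
    return generate(elements, connections)  # 4. emit the code output
```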

Image Preprocessing for Whiteboards

Whiteboard photos have unique challenges: glare, perspective distortion, marker color variations, and erased-but-visible ghost text:

import cv2
import numpy as np


def preprocess_whiteboard(image_path: str) -> np.ndarray:
    """Clean up a whiteboard photo for element detection."""
    img = cv2.imread(image_path)

    # Perspective correction: find the whiteboard boundary
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blurred, 50, 150)

    contours, _ = cv2.findContours(
        edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )

    if contours:
        largest = max(contours, key=cv2.contourArea)
        epsilon = 0.02 * cv2.arcLength(largest, True)
        approx = cv2.approxPolyDP(largest, epsilon, True)

        if len(approx) == 4:
            pts = approx.reshape(4, 2).astype(np.float32)
            # Order corners consistently (top-left, top-right,
            # bottom-right, bottom-left): approxPolyDP gives no
            # guaranteed order, which would skew the warp
            s = pts.sum(axis=1)
            d = np.diff(pts, axis=1).ravel()
            pts = np.array([
                pts[np.argmin(s)], pts[np.argmin(d)],
                pts[np.argmax(s)], pts[np.argmax(d)]
            ], dtype=np.float32)
            width, height = 1200, 900
            dst = np.array([
                [0, 0], [width, 0],
                [width, height], [0, height]
            ], dtype=np.float32)
            matrix = cv2.getPerspectiveTransform(pts, dst)
            img = cv2.warpPerspective(img, matrix, (width, height))

    # Enhance contrast and remove background
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (0, 30, 0), (180, 255, 255))
    result = cv2.bitwise_and(img, img, mask=mask)

    return result

Shape Detection and Classification

Detect individual shapes by finding contours and classifying them based on geometry:

from dataclasses import dataclass, field
from enum import Enum


class ShapeType(Enum):
    RECTANGLE = "rectangle"
    CIRCLE = "circle"
    DIAMOND = "diamond"
    ARROW = "arrow"
    TEXT = "text"
    UNKNOWN = "unknown"


@dataclass
class DiagramElement:
    shape: ShapeType
    bbox: tuple  # (x, y, w, h)
    center: tuple  # (cx, cy)
    label: str = ""
    connections: list[int] = field(default_factory=list)


def detect_shapes(image: np.ndarray) -> list[DiagramElement]:
    """Detect and classify shapes in the preprocessed image."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)

    contours, _ = cv2.findContours(
        binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )

    elements = []
    for contour in contours:
        area = cv2.contourArea(contour)
        if area < 500:  # Skip noise
            continue

        x, y, w, h = cv2.boundingRect(contour)
        center = (x + w // 2, y + h // 2)

        # Classify based on geometry
        perimeter = cv2.arcLength(contour, True)
        approx = cv2.approxPolyDP(contour, 0.04 * perimeter, True)
        circularity = 4 * np.pi * area / (perimeter ** 2)

        if circularity > 0.8:
            shape = ShapeType.CIRCLE
        elif len(approx) == 4:
            aspect = w / float(h)
            # Note: minAreaRect's angle convention differs across
            # OpenCV versions; a rotated square reads as a diamond
            angle = cv2.minAreaRect(contour)[-1]
            if 0.8 < aspect < 1.2 and abs(angle) > 30:
                shape = ShapeType.DIAMOND
            else:
                shape = ShapeType.RECTANGLE
        else:
            shape = ShapeType.UNKNOWN

        elements.append(DiagramElement(
            shape=shape,
            bbox=(x, y, w, h),
            center=center,
        ))

    return elements

Text Recognition Within Shapes

Extract the text label inside each detected shape:


import pytesseract
from PIL import Image


def extract_shape_labels(
    image: np.ndarray,
    elements: list[DiagramElement]
) -> list[DiagramElement]:
    """Read text inside each detected shape."""
    for i, elem in enumerate(elements):
        x, y, w, h = elem.bbox
        padding = 5
        roi = image[
            max(0, y - padding):y + h + padding,
            max(0, x - padding):x + w + padding
        ]

        # OpenCV images are BGR; convert so PIL/Tesseract see RGB
        roi_pil = Image.fromarray(cv2.cvtColor(roi, cv2.COLOR_BGR2RGB))
        text = pytesseract.image_to_string(
            roi_pil, config="--psm 6"
        ).strip()

        elem.label = text if text else f"Element_{i}"

    return elements
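
Raw OCR output often contains stray punctuation and broken whitespace. A small normalization pass before the labels are used downstream helps; `clean_label` is an illustrative helper, not part of pytesseract:

```python
import re


def clean_label(raw: str, fallback: str) -> str:
    """Normalize a raw OCR string into a usable label."""
    # Strip characters OCR commonly hallucinates, then collapse whitespace
    text = re.sub(r"[^\w\s-]", "", raw)
    text = re.sub(r"\s+", " ", text).strip()
    return text if text else fallback


print(clean_label("  Us3r |Account* \n", "Element_0"))  # → "Us3r Account"
print(clean_label("~~*", "Element_0"))                  # → "Element_0"
```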

Connection Detection

Find arrows and lines that connect shapes:

def detect_connections(
    elements: list[DiagramElement],
    image: np.ndarray
) -> list[tuple[int, int]]:
    """Detect which elements are connected by arrows or lines."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Shape outlines also produce Hough lines; masking out each
    # element's interior before this step cuts false connections
    edges = cv2.Canny(gray, 50, 150)

    lines = cv2.HoughLinesP(
        edges, 1, np.pi / 180,
        threshold=50, minLineLength=30, maxLineGap=10
    )

    connections = []
    if lines is None:
        return connections

    for line in lines:
        x1, y1, x2, y2 = line[0]

        start_elem = find_nearest_element(elements, (x1, y1))
        end_elem = find_nearest_element(elements, (x2, y2))

        if (start_elem is not None and end_elem is not None
                and start_elem != end_elem):
            connections.append((start_elem, end_elem))

    return list(set(connections))


def find_nearest_element(
    elements: list[DiagramElement],
    point: tuple,
    max_dist: float = 50.0
) -> int | None:
    """Find the element closest to a given point."""
    min_dist = float("inf")
    nearest = None

    for i, elem in enumerate(elements):
        dist = np.sqrt(
            (elem.center[0] - point[0]) ** 2 +
            (elem.center[1] - point[1]) ** 2
        )
        if dist < min_dist and dist < max_dist:
            min_dist = dist
            nearest = i

    return nearest
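
The pairs returned by `detect_connections` can also be recorded on each element's `connections` field. A minimal sketch, using a stand-in for `DiagramElement` so it runs on its own:

```python
from dataclasses import dataclass, field


@dataclass
class Elem:  # stand-in for DiagramElement
    label: str
    connections: list[int] = field(default_factory=list)


def attach_connections(elements: list[Elem], pairs: list[tuple[int, int]]) -> None:
    """Record outgoing connections on each source element, skipping duplicates."""
    for start, end in pairs:
        if end not in elements[start].connections:
            elements[start].connections.append(end)


elems = [Elem("A"), Elem("B"), Elem("C")]
attach_connections(elems, [(0, 1), (0, 2), (0, 1)])
print(elems[0].connections)  # → [1, 2]
```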

Generating Mermaid Diagrams

Convert the detected structure into a Mermaid diagram:

def generate_mermaid(
    elements: list[DiagramElement],
    connections: list[tuple[int, int]],
    diagram_type: str = "flowchart"
) -> str:
    """Generate Mermaid diagram syntax from detected elements."""
    lines = [f"{diagram_type} TD"]

    # Define nodes
    for i, elem in enumerate(elements):
        label = elem.label.replace('"', "'")
        if elem.shape == ShapeType.CIRCLE:
            lines.append(f'    N{i}(("{label}"))')
        elif elem.shape == ShapeType.DIAMOND:
            lines.append(f'    N{i}{{"{label}"}}')
        else:
            lines.append(f'    N{i}["{label}"]')

    # Define connections
    for start, end in connections:
        lines.append(f"    N{start} --> N{end}")

    return "\n".join(lines)
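
A quick check of the node syntax on synthetic input. The `Node` tuple here is a simplified stand-in for `DiagramElement`, so the example runs without OpenCV or an image:

```python
from typing import NamedTuple


class Node(NamedTuple):  # hypothetical simplification of DiagramElement
    shape: str  # "rectangle" | "circle" | "diamond"
    label: str


def mermaid_from_nodes(nodes: list[Node], edges: list[tuple[int, int]]) -> str:
    """Emit Mermaid flowchart syntax using the same node shapes as above."""
    lines = ["flowchart TD"]
    for i, n in enumerate(nodes):
        label = n.label.replace('"', "'")
        if n.shape == "circle":
            lines.append(f'    N{i}(("{label}"))')
        elif n.shape == "diamond":
            lines.append(f'    N{i}{{"{label}"}}')
        else:
            lines.append(f'    N{i}["{label}"]')
    for a, b in edges:
        lines.append(f"    N{a} --> N{b}")
    return "\n".join(lines)


print(mermaid_from_nodes(
    [Node("circle", "Start"), Node("diamond", "Valid?"), Node("rectangle", "Save")],
    [(0, 1), (1, 2)],
))
```

This prints a diagram that renders directly in any Markdown viewer with Mermaid support.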

Generating SQL Schema from ER Diagrams

When the diagram is identified as an entity-relationship diagram, generate a SQL schema:

from openai import OpenAI


def diagram_to_sql(
    elements: list[DiagramElement],
    connections: list[tuple[int, int]]
) -> str:
    """Use an LLM to generate SQL from detected ER diagram."""
    diagram_desc = "Entities:\n"
    for i, elem in enumerate(elements):
        diagram_desc += f"- {elem.label} ({elem.shape.value})\n"

    diagram_desc += "\nRelationships:\n"
    for start, end in connections:
        diagram_desc += (
            f"- {elements[start].label} -> "
            f"{elements[end].label}\n"
        )

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "Convert this ER diagram description into a PostgreSQL "
                "schema. Include primary keys, foreign keys, appropriate "
                "data types, and indexes. Only output SQL, no explanation."
            )},
            {"role": "user", "content": diagram_desc},
        ],
    )

    return response.choices[0].message.content
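
Generated SQL should be sanity-checked before it touches a database. One cheap, non-exhaustive check (`missing_tables` is an illustrative helper, not a library function) verifies that every detected entity received a `CREATE TABLE`:

```python
import re


def missing_tables(sql: str, entity_labels: list[str]) -> list[str]:
    """Return entity labels with no matching CREATE TABLE in the generated SQL."""
    created = {
        m.group(1).strip('"').lower()
        for m in re.finditer(
            r'CREATE TABLE(?: IF NOT EXISTS)?\s+("?\w+"?)', sql, re.IGNORECASE
        )
    }
    # Compare case-insensitively; whiteboard labels rarely match table names exactly
    return [label for label in entity_labels if label.lower() not in created]


sql = "CREATE TABLE users (id SERIAL PRIMARY KEY);"
print(missing_tables(sql, ["Users", "Orders"]))  # → ['Orders']
```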

FAQ

How well does this work with messy handwriting?

The accuracy depends heavily on handwriting legibility. Block letters in dark marker on a clean whiteboard work well — expect 85-90% text recognition accuracy. Accuracy drops significantly for cursive or small writing. For critical diagrams, have users write labels in a structured way or add a manual correction step before code generation.
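
One lightweight form of that correction step: when a project has a known vocabulary (table names, service names), OCR mistakes can be snapped to the nearest known term with stdlib `difflib`. The 0.7 cutoff here is an illustrative choice, not a recommendation:

```python
from difflib import get_close_matches


def snap_label(ocr_text: str, vocabulary: list[str], cutoff: float = 0.7) -> str:
    """Replace an OCR'd label with its closest known term, if one is close enough."""
    matches = get_close_matches(ocr_text, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else ocr_text


vocab = ["Customer", "Order", "Invoice"]
print(snap_label("Custoner", vocab))  # → "Customer"
print(snap_label("Widget", vocab))    # → "Widget" (no close match, kept as-is)
```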

Can the agent distinguish between different diagram types automatically?

Yes, with LLM-powered classification. Send the detected shapes, their types, and connection patterns to an LLM and ask it to classify the diagram as a flowchart, ER diagram, sequence diagram, or architecture diagram. The shape distribution is a strong signal: many diamonds suggest a flowchart, all rectangles with labeled connections suggest an ER diagram.
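
That shape-distribution signal can also be sketched as a plain heuristic, with no LLM call; the thresholds and category names below are illustrative assumptions:

```python
from collections import Counter


def guess_diagram_type(shapes: list[str]) -> str:
    """Guess a diagram type from the shape distribution alone."""
    counts = Counter(shapes)
    total = len(shapes) or 1
    if counts["diamond"] / total > 0.15:
        return "flowchart"   # decision diamonds are the flowchart tell
    if counts["rectangle"] == total:
        return "er_diagram"  # all boxes: likely entities
    return "architecture"    # mixed shapes: fall back to architecture


print(guess_diagram_type(["rectangle", "diamond", "rectangle"]))  # → "flowchart"
print(guess_diagram_type(["rectangle", "rectangle"]))             # → "er_diagram"
```

In practice this heuristic works as a fast pre-filter, with the LLM confirming ambiguous cases.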

How do I handle diagrams with multiple colors?

Color carries semantic meaning on whiteboards — red might mean errors, green might mean success paths. Preserve color information during preprocessing and pass it to the LLM as metadata. For example, annotate each element with its dominant color so the code generator can map red paths to error handlers and green paths to success flows.
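
A coarse hue-to-name mapping is enough for that metadata. This sketch uses stdlib `colorsys` on an element's average marker color; the hue buckets are rough assumptions, not calibrated values:

```python
import colorsys


def color_name(r: int, g: int, b: int) -> str:
    """Map an average marker RGB to a coarse color name."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    if v < 0.2:
        return "black"   # too dark to carry a hue
    if s < 0.25:
        return "gray"    # too washed out to carry a hue
    deg = h * 360
    if deg < 20 or deg >= 330:
        return "red"
    if deg < 70:
        return "yellow"
    if deg < 170:
        return "green"
    if deg < 260:
        return "blue"
    return "purple"


print(color_name(200, 30, 30))  # → "red"
print(color_name(30, 160, 60))  # → "green"
```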


#WhiteboardAI #DiagramRecognition #CodeGeneration #MermaidJS #ComputerVision #AgenticAI #Python #SoftwareDesign
