Building a Code Generation Pipeline with 5 Specialized Agents: Planner, Coder, Reviewer, Tester, Deployer
Build an end-to-end code generation pipeline using five specialized AI agents — Planner, Coder, Reviewer, Tester, and Deployer — with complete handoff data structures and orchestration logic in Python.
Why Specialized Agents Beat a Single Code Generator
A single LLM prompted with "write me a function" can produce code, but it cannot reliably plan architecture, enforce coding standards, write meaningful tests, and handle deployment. Each of these tasks demands a different reasoning mode. By splitting the pipeline into five specialized agents — Planner, Coder, Reviewer, Tester, and Deployer — each agent can be optimized for its specific responsibility with tailored prompts, tools, and evaluation criteria.
This is the same principle that makes software engineering teams effective: specialization with clear handoff contracts.
The Pipeline Architecture
User Request
      |
      v
[Planner Agent] --> Implementation Plan
      |
      v
[Coder Agent] <---------------------+
      |                             |
      v                             | (if fail: revision)
[Reviewer Agent] --> Review Feedback
      | (if pass)
      v
[Tester Agent] --> Test Results
      | (if pass, else halt)
      v
[Deployer Agent] --> Deployment Artifact
Handoff Data Structures
The key to a reliable pipeline is well-defined data structures at each handoff point.
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional

class StageStatus(Enum):
    PENDING = "pending"
    PASSED = "passed"
    FAILED = "failed"
    NEEDS_REVISION = "needs_revision"

@dataclass
class ImplementationPlan:
    summary: str
    files_to_create: List[str]
    files_to_modify: List[str]
    dependencies: List[str]
    architecture_notes: str
    acceptance_criteria: List[str]

@dataclass
class GeneratedCode:
    files: Dict[str, str]  # filename -> content
    dependencies_added: List[str]
    explanation: str
    revision_number: int = 1

@dataclass
class ReviewResult:
    status: StageStatus
    issues: List[Dict[str, str]]  # {"severity": ..., "description": ...}
    suggestions: List[str]
    score: float  # 0.0 to 1.0

@dataclass
class TestResult:
    status: StageStatus
    test_files: Dict[str, str]  # filename -> test content
    tests_passed: int
    tests_failed: int
    coverage_percent: float
    failure_details: List[str]

@dataclass
class DeploymentArtifact:
    dockerfile: Optional[str]
    deployment_config: Optional[str]
    environment_variables: List[str]
    deploy_instructions: str
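Because every stage boundary is a dataclass, a handoff can be logged, queued, or persisted as JSON between agents. A small round-trip example (repeating two of the definitions above so it runs standalone; note that `Enum` members need an explicit encoder):

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Dict, List

class StageStatus(Enum):  # repeated from the definitions above so this runs standalone
    PASSED = "passed"
    NEEDS_REVISION = "needs_revision"

@dataclass
class ReviewResult:
    status: StageStatus
    issues: List[Dict[str, str]]
    suggestions: List[str]
    score: float

review = ReviewResult(
    status=StageStatus.NEEDS_REVISION,
    issues=[{"severity": "high", "description": "SQL built via string concatenation"}],
    suggestions=["Use parameterized queries"],
    score=0.55,
)

# asdict() recurses into nested lists/dicts; the lambda turns Enum members into strings
payload = json.dumps(asdict(review), default=lambda o: o.value)

# On the receiving side, restore the Enum before reconstructing the dataclass
data = json.loads(payload)
data["status"] = StageStatus(data["status"])
restored = ReviewResult(**data)
```

Serializing handoffs this way also makes each stage independently replayable when debugging a failed pipeline run.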
Implementing Each Agent
The Planner Agent
The Planner analyzes the user's request and produces a structured implementation plan.
import json

from openai import AsyncOpenAI

client = AsyncOpenAI()

async def planner_agent(user_request: str) -> ImplementationPlan:
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a software architect. Given a feature request, "
                    "produce a structured implementation plan. Output JSON with "
                    "keys: summary, files_to_create, files_to_modify, "
                    "dependencies, architecture_notes, acceptance_criteria."
                ),
            },
            {"role": "user", "content": user_request},
        ],
        response_format={"type": "json_object"},
    )
    data = json.loads(response.choices[0].message.content)
    return ImplementationPlan(**data)
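One caveat: `ImplementationPlan(**data)` raises `TypeError` if the model omits a key or invents an extra one, even with JSON mode enabled. A defensive sketch that drops unknown keys and fills missing fields with empty values (`coerce_plan` is a hypothetical helper, not part of the pipeline above):

```python
from dataclasses import dataclass, fields
from typing import List

@dataclass
class ImplementationPlan:  # repeated from the article so this runs standalone
    summary: str
    files_to_create: List[str]
    files_to_modify: List[str]
    dependencies: List[str]
    architecture_notes: str
    acceptance_criteria: List[str]

def coerce_plan(data: dict) -> ImplementationPlan:
    """Keep only known fields; fill anything the model omitted with an empty value."""
    kwargs = {
        # f.type is the annotation object: str fields default to "", list fields to []
        f.name: data.get(f.name, "" if f.type is str else [])
        for f in fields(ImplementationPlan)
    }
    return ImplementationPlan(**kwargs)
```

Unknown keys are dropped automatically because only declared field names are copied over.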
The Coder Agent
The Coder takes the plan and produces implementation code.
async def coder_agent(
    plan: ImplementationPlan,
    review_feedback: Optional[ReviewResult] = None,
    revision: int = 1,
) -> GeneratedCode:
    context = f"Implementation plan:\n{plan.summary}\n"
    context += f"Files to create: {plan.files_to_create}\n"
    context += f"Architecture: {plan.architecture_notes}\n"
    if review_feedback and review_feedback.status == StageStatus.NEEDS_REVISION:
        context += "\nReview feedback to address:\n"
        for issue in review_feedback.issues:
            context += f"- [{issue['severity']}] {issue['description']}\n"

    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are an expert software engineer. Generate production "
                    "code based on the plan. Return JSON with keys: files "
                    "(object mapping filename to content), dependencies_added "
                    "(list), explanation (string)."
                ),
            },
            {"role": "user", "content": context},
        ],
        response_format={"type": "json_object"},
    )
    data = json.loads(response.choices[0].message.content)
    return GeneratedCode(**data, revision_number=revision)
The Reviewer Agent
async def reviewer_agent(
    plan: ImplementationPlan, code: GeneratedCode
) -> ReviewResult:
    review_prompt = (
        f"Plan: {plan.summary}\n"
        f"Acceptance criteria: {plan.acceptance_criteria}\n"
        f"Code files: {json.dumps(code.files, indent=2)}"
    )
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a senior code reviewer. Evaluate the code against "
                    "the plan. Check for bugs, security issues, missing edge "
                    "cases, and adherence to acceptance criteria. Return JSON: "
                    "status (passed/needs_revision), issues (list of objects "
                    "with severity and description), suggestions, score (0-1)."
                ),
            },
            {"role": "user", "content": review_prompt},
        ],
        response_format={"type": "json_object"},
    )
    data = json.loads(response.choices[0].message.content)
    data["status"] = StageStatus(data["status"])
    return ReviewResult(**data)
The Pipeline Orchestrator
import json

async def run_pipeline(user_request: str, max_revisions: int = 3):
    # Step 1: Plan
    plan = await planner_agent(user_request)
    print(f"Plan: {plan.summary}")

    code = None
    review = None
    for revision in range(1, max_revisions + 1):
        # Step 2: Code (or revise)
        code = await coder_agent(plan, review, revision)
        print(f"Code generated (revision {revision})")

        # Step 3: Review
        review = await reviewer_agent(plan, code)
        print(f"Review score: {review.score}")
        if review.status == StageStatus.PASSED:
            break
    else:
        # for-else: runs only if the loop exhausted without a passing review
        print("Max revisions reached -- proceeding with best effort")

    # Step 4: Test
    test_result = await tester_agent(plan, code)
    if test_result.status == StageStatus.FAILED:
        print(f"Tests failed: {test_result.failure_details}")
        return None

    # Step 5: Deploy
    artifact = await deployer_agent(code)
    print(f"Deployment ready: {artifact.deploy_instructions}")
    return artifact
The review loop is the key quality mechanism — the Coder receives specific feedback from the Reviewer and addresses it in the next revision, mimicking a real code review cycle.
FAQ
Why not use a single agent with all five capabilities?
Specialization allows each agent to have focused system prompts, specific tool access, and tailored output schemas. The Reviewer agent, for example, should never have access to the deployment tools. Separation of concerns also lets you swap individual agents (e.g., upgrade only your Tester agent) without affecting the rest of the pipeline.
How do I handle the Coder agent producing code that never passes review?
Set a maximum revision count (typically 2-3). If the code still fails review after max revisions, log the failure with the review feedback, alert a human, and halt the pipeline. In practice, most code passes review within 2 iterations when the Planner produces clear acceptance criteria.
Can I run some agents in parallel?
The Tester and Deployer agents are sequential — you cannot deploy untested code. However, if you have multiple independent code files, you can parallelize coding and reviewing across files. The Planner output should indicate which files are independent to enable this.
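The fan-out itself is a one-liner with `asyncio.gather`. A sketch of per-group parallel coding, with a stub coroutine standing in for the real `coder_agent` call (`code_one_group` and `code_in_parallel` are illustrative names, not part of the pipeline above):

```python
import asyncio
from typing import Dict, List

async def code_one_group(group: List[str]) -> Dict[str, str]:
    """Stub standing in for one coder_agent call, so the pattern runs standalone."""
    await asyncio.sleep(0)  # placeholder for the LLM round trip
    return {name: f"# implementation of {name}\n" for name in group}

async def code_in_parallel(independent_groups: List[List[str]]) -> Dict[str, str]:
    """Fan out one coder call per independent file group, then merge the results."""
    results = await asyncio.gather(*(code_one_group(g) for g in independent_groups))
    merged: Dict[str, str] = {}
    for group_files in results:
        merged.update(group_files)
    return merged

# Two independent groups run concurrently; files within a group stay together
files = asyncio.run(code_in_parallel([["auth.py"], ["billing.py", "invoices.py"]]))
```

Each group still goes through review sequentially with its own feedback loop; only the initial generation is parallelized here.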