Building a Code Generation Pipeline with 5 Specialized Agents: Planner, Coder, Reviewer, Tester, Deployer
Build an end-to-end code generation pipeline using five specialized AI agents — Planner, Coder, Reviewer, Tester, and Deployer — with complete handoff data structures and orchestration logic in Python.
Why Specialized Agents Beat a Single Code Generator
A single LLM prompted with "write me a function" can produce code, but it cannot reliably plan architecture, enforce coding standards, write meaningful tests, and handle deployment. Each of these tasks demands a different reasoning mode. By splitting the pipeline into five specialized agents — Planner, Coder, Reviewer, Tester, and Deployer — each agent can be optimized for its specific responsibility with tailored prompts, tools, and evaluation criteria.
This is the same principle that makes software engineering teams effective: specialization with clear handoff contracts.
The Pipeline Architecture
User Request
      |
      v
[Planner Agent] --> Implementation Plan
      |
      v
[Coder Agent] <---------------------+
      |                             |
      v                             | (if fail: revision)
[Reviewer Agent] --> Review Feedback
      | (if pass)
      v
[Tester Agent] --> Test Results
      | (if pass, else halt)
      v
[Deployer Agent] --> Deployment Artifact
Handoff Data Structures
The key to a reliable pipeline is well-defined data structures at each handoff point.
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional

class StageStatus(Enum):
    PENDING = "pending"
    PASSED = "passed"
    FAILED = "failed"
    NEEDS_REVISION = "needs_revision"

@dataclass
class ImplementationPlan:
    summary: str
    files_to_create: List[str]
    files_to_modify: List[str]
    dependencies: List[str]
    architecture_notes: str
    acceptance_criteria: List[str]

@dataclass
class GeneratedCode:
    files: Dict[str, str]  # filename -> content
    dependencies_added: List[str]
    explanation: str
    revision_number: int = 1

@dataclass
class ReviewResult:
    status: StageStatus
    issues: List[Dict[str, str]]  # {"severity": ..., "description": ...}
    suggestions: List[str]
    score: float  # 0.0 to 1.0

@dataclass
class TestResult:
    status: StageStatus
    test_files: Dict[str, str]  # filename -> test content
    tests_passed: int
    tests_failed: int
    coverage_percent: float
    failure_details: List[str]

@dataclass
class DeploymentArtifact:
    dockerfile: Optional[str]
    deployment_config: Optional[str]
    environment_variables: List[str]
    deploy_instructions: str
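Because every stage boundary is a dataclass, a handoff can be logged, queued, or persisted as JSON between agents. A small round-trip example (repeating two of the definitions above so it runs standalone; note that `Enum` members need an explicit encoder):

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Dict, List

class StageStatus(Enum):  # repeated from the definitions above so this runs standalone
    PASSED = "passed"
    NEEDS_REVISION = "needs_revision"

@dataclass
class ReviewResult:
    status: StageStatus
    issues: List[Dict[str, str]]
    suggestions: List[str]
    score: float

review = ReviewResult(
    status=StageStatus.NEEDS_REVISION,
    issues=[{"severity": "high", "description": "SQL built via string concatenation"}],
    suggestions=["Use parameterized queries"],
    score=0.55,
)

# asdict() recurses into nested lists/dicts; the lambda turns Enum members into strings
payload = json.dumps(asdict(review), default=lambda o: o.value)

# On the receiving side, restore the Enum before reconstructing the dataclass
data = json.loads(payload)
data["status"] = StageStatus(data["status"])
restored = ReviewResult(**data)
```

Serializing handoffs this way also makes each stage independently replayable when debugging a failed pipeline run.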
Implementing Each Agent
The Planner Agent
The Planner analyzes the user's request and produces a structured implementation plan.
import json

from openai import AsyncOpenAI

client = AsyncOpenAI()

async def planner_agent(user_request: str) -> ImplementationPlan:
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a software architect. Given a feature request, "
                    "produce a structured implementation plan. Output JSON with "
                    "keys: summary, files_to_create, files_to_modify, "
                    "dependencies, architecture_notes, acceptance_criteria."
                ),
            },
            {"role": "user", "content": user_request},
        ],
        response_format={"type": "json_object"},
    )
    data = json.loads(response.choices[0].message.content)
    return ImplementationPlan(**data)
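One caveat: `ImplementationPlan(**data)` raises `TypeError` if the model omits a key or invents an extra one, even with JSON mode enabled. A defensive sketch that drops unknown keys and fills missing fields with empty values (`coerce_plan` is a hypothetical helper, not part of the pipeline above):

```python
from dataclasses import dataclass, fields
from typing import List

@dataclass
class ImplementationPlan:  # repeated from the article so this runs standalone
    summary: str
    files_to_create: List[str]
    files_to_modify: List[str]
    dependencies: List[str]
    architecture_notes: str
    acceptance_criteria: List[str]

def coerce_plan(data: dict) -> ImplementationPlan:
    """Keep only known fields; fill anything the model omitted with an empty value."""
    kwargs = {
        # f.type is the annotation object: str fields default to "", list fields to []
        f.name: data.get(f.name, "" if f.type is str else [])
        for f in fields(ImplementationPlan)
    }
    return ImplementationPlan(**kwargs)
```

Unknown keys are dropped automatically because only declared field names are copied over.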
The Coder Agent
The Coder takes the plan and produces implementation code.
async def coder_agent(
    plan: ImplementationPlan,
    review_feedback: Optional[ReviewResult] = None,
    revision: int = 1,
) -> GeneratedCode:
    context = f"Implementation plan:\n{plan.summary}\n"
    context += f"Files to create: {plan.files_to_create}\n"
    context += f"Architecture: {plan.architecture_notes}\n"
    if review_feedback and review_feedback.status == StageStatus.NEEDS_REVISION:
        context += "\nReview feedback to address:\n"
        for issue in review_feedback.issues:
            context += f"- [{issue['severity']}] {issue['description']}\n"

    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are an expert software engineer. Generate production "
                    "code based on the plan. Return JSON with keys: files "
                    "(object mapping filename to content), dependencies_added "
                    "(list), explanation (string)."
                ),
            },
            {"role": "user", "content": context},
        ],
        response_format={"type": "json_object"},
    )
    data = json.loads(response.choices[0].message.content)
    return GeneratedCode(**data, revision_number=revision)
The Reviewer Agent
async def reviewer_agent(
    plan: ImplementationPlan, code: GeneratedCode
) -> ReviewResult:
    review_prompt = (
        f"Plan: {plan.summary}\n"
        f"Acceptance criteria: {plan.acceptance_criteria}\n"
        f"Code files: {json.dumps(code.files, indent=2)}"
    )
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a senior code reviewer. Evaluate the code against "
                    "the plan. Check for bugs, security issues, missing edge "
                    "cases, and adherence to acceptance criteria. Return JSON: "
                    "status (passed/needs_revision), issues (list of objects "
                    "with severity and description), suggestions, score (0-1)."
                ),
            },
            {"role": "user", "content": review_prompt},
        ],
        response_format={"type": "json_object"},
    )
    data = json.loads(response.choices[0].message.content)
    data["status"] = StageStatus(data["status"])
    return ReviewResult(**data)
The Pipeline Orchestrator
import json

async def run_pipeline(user_request: str, max_revisions: int = 3):
    # Step 1: Plan
    plan = await planner_agent(user_request)
    print(f"Plan: {plan.summary}")

    code = None
    review = None
    for revision in range(1, max_revisions + 1):
        # Step 2: Code (or revise)
        code = await coder_agent(plan, review, revision)
        print(f"Code generated (revision {revision})")

        # Step 3: Review
        review = await reviewer_agent(plan, code)
        print(f"Review score: {review.score}")
        if review.status == StageStatus.PASSED:
            break
    else:
        # for-else: runs only if the loop exhausted without a passing review
        print("Max revisions reached -- proceeding with best effort")

    # Step 4: Test
    test_result = await tester_agent(plan, code)
    if test_result.status == StageStatus.FAILED:
        print(f"Tests failed: {test_result.failure_details}")
        return None

    # Step 5: Deploy
    artifact = await deployer_agent(code)
    print(f"Deployment ready: {artifact.deploy_instructions}")
    return artifact
The review loop is the key quality mechanism — the Coder receives specific feedback from the Reviewer and addresses it in the next revision, mimicking a real code review cycle.
FAQ
Why not use a single agent with all five capabilities?
Specialization allows each agent to have focused system prompts, specific tool access, and tailored output schemas. The Reviewer agent, for example, should never have access to the deployment tools. Separation of concerns also lets you swap individual agents (e.g., upgrade only your Tester agent) without affecting the rest of the pipeline.
How do I handle the Coder agent producing code that never passes review?
Set a maximum revision count (typically 2-3). If the code still fails review after max revisions, log the failure with the review feedback, alert a human, and halt the pipeline. In practice, most code passes review within 2 iterations when the Planner produces clear acceptance criteria.
Can I run some agents in parallel?
The Tester and Deployer agents are sequential — you cannot deploy untested code. However, if you have multiple independent code files, you can parallelize coding and reviewing across files. The Planner output should indicate which files are independent to enable this.
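The fan-out itself is a one-liner with `asyncio.gather`. A sketch of per-group parallel coding, with a stub coroutine standing in for the real `coder_agent` call (`code_one_group` and `code_in_parallel` are illustrative names, not part of the pipeline above):

```python
import asyncio
from typing import Dict, List

async def code_one_group(group: List[str]) -> Dict[str, str]:
    """Stub standing in for one coder_agent call, so the pattern runs standalone."""
    await asyncio.sleep(0)  # placeholder for the LLM round trip
    return {name: f"# implementation of {name}\n" for name in group}

async def code_in_parallel(independent_groups: List[List[str]]) -> Dict[str, str]:
    """Fan out one coder call per independent file group, then merge the results."""
    results = await asyncio.gather(*(code_one_group(g) for g in independent_groups))
    merged: Dict[str, str] = {}
    for group_files in results:
        merged.update(group_files)
    return merged

# Two independent groups run concurrently; files within a group stay together
files = asyncio.run(code_in_parallel([["auth.py"], ["billing.py", "invoices.py"]]))
```

Each group still goes through review sequentially with its own feedback loop; only the initial generation is parallelized here.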