# Building a Code Review Bot with the Claude API

A step-by-step guide to building an automated code review bot with the Claude API. It covers GitHub integration, diff analysis, security scanning, style enforcement, and delivering actionable feedback on pull requests.
## Why Build an AI Code Review Bot?
Manual code review is a bottleneck in every engineering team. Senior engineers spend 5-10 hours per week reviewing pull requests. Reviews are inconsistent -- what one reviewer catches, another misses. And review latency delays merges, slowing the entire development cycle.
An AI code review bot does not replace human reviewers. It augments them by catching the mechanical issues (bugs, security vulnerabilities, style violations, missing tests) so that human reviewers can focus on architecture, design, and business logic.
## Architecture Overview
The system has four components:
- GitHub Webhook Listener: Receives PR events from GitHub
- Diff Analyzer: Extracts and structures the code changes
- Claude Review Engine: Analyzes code and generates feedback
- GitHub Comment Writer: Posts review comments on the PR
```
GitHub PR Event -> Webhook -> Diff Analyzer -> Claude API -> GitHub Comments
```
## Step 1: GitHub Webhook Listener
```python
from fastapi import FastAPI, Request, HTTPException
import hmac
import hashlib
import os

app = FastAPI()

GITHUB_WEBHOOK_SECRET = os.environ["GITHUB_WEBHOOK_SECRET"]

@app.post("/webhook/github")
async def handle_github_webhook(request: Request):
    # Verify the webhook signature before trusting the payload
    signature = request.headers.get("X-Hub-Signature-256", "")
    body = await request.body()
    expected = "sha256=" + hmac.new(
        GITHUB_WEBHOOK_SECRET.encode(),
        body,
        hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(signature, expected):
        raise HTTPException(status_code=403, detail="Invalid signature")

    payload = await request.json()
    event_type = request.headers.get("X-GitHub-Event")

    if event_type == "pull_request" and payload["action"] in ("opened", "synchronize"):
        await review_pull_request(
            repo=payload["repository"]["full_name"],
            pr_number=payload["pull_request"]["number"],
            base_sha=payload["pull_request"]["base"]["sha"],
            head_sha=payload["pull_request"]["head"]["sha"],
        )

    return {"status": "ok"}
```
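Before pointing GitHub at the endpoint, it helps to verify the signature math in isolation. A minimal sketch that computes the same `X-Hub-Signature-256` value GitHub sends (the secret and payload here are made up):

```python
import hmac
import hashlib

def sign_payload(secret: str, body: bytes) -> str:
    """Compute the X-Hub-Signature-256 header value for a payload body."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return "sha256=" + digest

# Sign a fake delivery and verify it the same way the listener does
body = b'{"action": "opened"}'
signature = sign_payload("test-secret", body)
assert signature.startswith("sha256=")
assert hmac.compare_digest(signature, sign_payload("test-secret", body))
```

Using `hmac.compare_digest` (rather than `==`) matters in the listener itself: it avoids leaking timing information about how much of the signature matched.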
## Step 2: Diff Analyzer
```python
import httpx

GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]

async def get_pr_diff(repo: str, pr_number: int) -> list[dict]:
    """Fetch the PR diff and parse it into structured file changes."""
    async with httpx.AsyncClient() as client:
        # Get the list of changed files (up to 100 per page; larger PRs need pagination)
        response = await client.get(
            f"https://api.github.com/repos/{repo}/pulls/{pr_number}/files",
            headers={
                "Authorization": f"token {GITHUB_TOKEN}",
                "Accept": "application/vnd.github.v3+json",
            },
            params={"per_page": 100},
        )
        response.raise_for_status()
        files = response.json()

    changes = []
    for file in files:
        if file["status"] == "removed":
            continue  # Skip deleted files
        changes.append({
            "filename": file["filename"],
            "status": file["status"],  # added, modified, renamed
            "additions": file["additions"],
            "deletions": file["deletions"],
            "patch": file.get("patch", ""),  # The actual diff
            "language": detect_language(file["filename"]),
        })
    return changes

def detect_language(filename: str) -> str:
    ext_map = {
        ".py": "python", ".ts": "typescript", ".tsx": "typescript",
        ".js": "javascript", ".jsx": "javascript", ".go": "go",
        ".rs": "rust", ".java": "java", ".rb": "ruby",
    }
    for ext, lang in ext_map.items():
        if filename.endswith(ext):
            return lang
    return "unknown"
```
## Step 3: Claude Review Engine
This is the core of the system. We send each file's diff to Claude with specialized review instructions.
```python
from anthropic import Anthropic
import json

client = Anthropic()

REVIEW_SYSTEM_PROMPT = """You are an expert code reviewer. For each code diff provided,
analyze the changes and identify:

1. **Bugs**: Logic errors, off-by-one errors, null pointer issues, race conditions
2. **Security**: SQL injection, XSS, auth bypasses, secrets exposure, input validation
3. **Performance**: N+1 queries, unnecessary allocations, missing indexes, O(n^2) algorithms
4. **Style**: Naming conventions, code organization, readability
5. **Missing tests**: New logic paths that lack test coverage

For each issue found, provide:
- severity: "critical", "warning", or "suggestion"
- line: the line number in the diff (from the + side)
- description: clear explanation of the issue
- suggestion: specific code fix when possible

Return your review as a JSON array of issues. If the code looks good, return an empty array.
Do NOT fabricate issues -- only report genuine problems."""
```
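For reference, here is what one finding parsed from Claude's response should look like, matching the schema the prompt asks for (the values are illustrative):

```python
import json

# A sample response body in the shape the system prompt requests
raw = """[
  {
    "severity": "critical",
    "line": 42,
    "description": "User input is interpolated directly into the SQL string.",
    "suggestion": "cursor.execute('SELECT * FROM users WHERE id = %s', (user_id,))"
  }
]"""

issues = json.loads(raw)
assert issues[0]["severity"] in ("critical", "warning", "suggestion")
```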
````python
async def review_file(filename: str, patch: str, language: str) -> list[dict]:
    """Review a single file's changes."""
    if not patch or len(patch) < 10:
        return []

    # Note: the sync client blocks the event loop; use AsyncAnthropic for concurrency
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=4096,
        system=REVIEW_SYSTEM_PROMPT,
        messages=[{
            "role": "user",
            "content": f"""Review this {language} code change in {filename}:

```diff
{patch}
```

Return your findings as a JSON array.""",
        }],
    )

    try:
        # Extract the JSON payload from the response text
        text = response.content[0].text
        # Handle markdown code blocks in the response
        if "```json" in text:
            text = text.split("```json")[1].split("```")[0]
        elif "```" in text:
            text = text.split("```")[1]
        return json.loads(text)
    except (json.JSONDecodeError, IndexError):
        return []
````
```python
async def review_pull_request(repo: str, pr_number: int, base_sha: str, head_sha: str):
    """Review all files in a pull request."""
    changes = await get_pr_diff(repo, pr_number)

    all_issues = []
    for file_change in changes:
        issues = await review_file(
            filename=file_change["filename"],
            patch=file_change["patch"],
            language=file_change["language"],
        )
        for issue in issues:
            issue["filename"] = file_change["filename"]
        all_issues.extend(issues)

    # Post results to GitHub
    await post_review_comments(repo, pr_number, head_sha, all_issues)
```
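The loop above reviews files one at a time. Since each review is an independent API call, per-PR latency can be cut by running them concurrently behind a semaphore. A self-contained sketch with a stand-in review coroutine (the real call would be `review_file`):

```python
import asyncio

async def review_all(changes: list[dict], max_concurrency: int = 5) -> list[dict]:
    """Run one review task per file, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def review_one(change: dict) -> list[dict]:
        async with semaphore:
            # Stand-in for review_file(...); returns one fake issue per file
            await asyncio.sleep(0)
            return [{"filename": change["filename"], "severity": "suggestion"}]

    # gather preserves input order, so results line up with changes
    results = await asyncio.gather(*(review_one(c) for c in changes))
    return [issue for issues in results for issue in issues]

changes = [{"filename": "a.py"}, {"filename": "b.py"}, {"filename": "c.py"}]
issues = asyncio.run(review_all(changes))
assert len(issues) == 3
```

The semaphore cap matters: unbounded `gather` over a large PR can trip API rate limits.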
## Step 4: GitHub Comment Writer
````python
async def post_review_comments(
    repo: str, pr_number: int, commit_sha: str, issues: list[dict]
):
    """Post review comments on the GitHub PR."""
    if not issues:
        # Post a summary comment
        await post_pr_comment(
            repo, pr_number,
            "AI Review: No issues found. The changes look good."
        )
        return

    # Group by severity
    critical = [i for i in issues if i["severity"] == "critical"]
    warnings = [i for i in issues if i["severity"] == "warning"]
    suggestions = [i for i in issues if i["severity"] == "suggestion"]

    # Create review with inline comments
    comments = []
    for issue in issues:
        body = f"**{issue['severity'].upper()}**: {issue['description']}"
        if issue.get("suggestion"):
            body += f"\n\nSuggested fix:\n```\n{issue['suggestion']}\n```"
        comments.append({
            "path": issue["filename"],
            "line": issue.get("line", 1),
            "body": body,
        })

    # Determine review action
    event = "REQUEST_CHANGES" if critical else "COMMENT"

    summary = f"""## AI Code Review Summary

| Severity | Count |
|---|---|
| Critical | {len(critical)} |
| Warning | {len(warnings)} |
| Suggestion | {len(suggestions)} |

{"**Action required**: Critical issues found that should be addressed before merging." if critical else "No blocking issues found."}"""

    async with httpx.AsyncClient() as http_client:
        response = await http_client.post(
            f"https://api.github.com/repos/{repo}/pulls/{pr_number}/reviews",
            headers={
                "Authorization": f"token {GITHUB_TOKEN}",
                "Accept": "application/vnd.github.v3+json",
            },
            json={
                "commit_id": commit_sha,
                "body": summary,
                "event": event,
                "comments": comments,
            },
        )
        response.raise_for_status()
````
## Handling Large PRs
Large PRs can exceed Claude's context window. Split the review into manageable chunks:
```python
async def review_large_pr(changes: list[dict], max_tokens_per_call: int = 50_000):
    """Break large PRs into reviewable chunks."""
    current_batch = []
    current_tokens = 0

    for change in changes:
        patch_tokens = len(change["patch"]) // 4  # Rough estimate: ~4 chars per token
        if current_tokens + patch_tokens > max_tokens_per_call and current_batch:
            # Review the current batch before it outgrows the token budget
            yield await review_batch(current_batch)
            current_batch = []
            current_tokens = 0
        current_batch.append(change)
        current_tokens += patch_tokens

    if current_batch:
        yield await review_batch(current_batch)
```
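The batching rule itself is easy to check in isolation. The same accumulation logic as a pure function (`chunk_changes` is a hypothetical helper, written this way so it runs without any API calls):

```python
def chunk_changes(changes: list[dict], max_tokens_per_call: int) -> list[list[dict]]:
    """Group changes into batches whose estimated token counts fit the budget."""
    batches, current, tokens = [], [], 0
    for change in changes:
        patch_tokens = len(change["patch"]) // 4  # Same rough estimate as above
        if tokens + patch_tokens > max_tokens_per_call and current:
            batches.append(current)
            current, tokens = [], 0
        current.append(change)
        tokens += patch_tokens
    if current:
        batches.append(current)
    return batches

# Five patches of ~100 estimated tokens each, with a 250-token budget
changes = [{"patch": "x" * 400}] * 5
batches = chunk_changes(changes, max_tokens_per_call=250)
assert [len(b) for b in batches] == [2, 2, 1]
```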
## Reducing False Positives
The biggest challenge with AI code review is false positives. Every false positive erodes developer trust in the tool. Strategies to minimize them:
- Include project context: Add a `.ai-review-config.yml` that describes coding standards, acceptable patterns, and known exceptions
- Use file-type-specific prompts: A Python review prompt differs from a TypeScript review prompt
- Filter low-confidence findings: Ask Claude to rate its confidence (1-10) and only surface issues above 7
- Learn from dismissals: Track which comments developers dismiss and adjust the prompt accordingly
- Limit scope: Focus on security and bugs initially. Add style checks only after the bot has earned trust
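For example, the project-context file might look like this (the keys and schema here are illustrative, not a standard):

```yaml
# .ai-review-config.yml -- illustrative schema, not a standard
standards:
  max_function_length: 50
  naming: snake_case
acceptable_patterns:
  - "print() statements are allowed in CLI scripts"
known_exceptions:
  - path: "migrations/"
    reason: "Generated code; skip style checks"
review_scope:
  - security
  - bugs
```

The file's contents can be appended to the system prompt so Claude knows which patterns are intentional before flagging them.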
## Cost Analysis
For an average PR with 10 changed files and 500 lines of diff:
| Component | Tokens | Cost (Sonnet) |
|---|---|---|
| System prompt (cached) | 500 | $0.00015 |
| 10 file diffs | 5,000 | $0.015 |
| 10 review outputs | 3,000 | $0.045 |
| Total per PR | 8,500 | $0.06 |
At 50 PRs per day, the monthly cost is approximately $90 -- less than one hour of a senior engineer's time. The ROI is immediate and substantial.
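The arithmetic behind these figures, assuming Sonnet's published per-token rates at the time of writing ($3/M input, $15/M output, $0.30/M cached reads):

```python
# Per-PR cost estimate using the token counts from the table above
cached_system = 500 * 0.30 / 1_000_000     # cached system prompt reads
diff_input    = 5_000 * 3.00 / 1_000_000   # 10 file diffs as fresh input
review_output = 3_000 * 15.00 / 1_000_000  # 10 review outputs

per_pr = cached_system + diff_input + review_output
monthly = per_pr * 50 * 30                 # 50 PRs/day over a 30-day month

assert round(per_pr, 2) == 0.06
assert round(monthly) == 90
```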