# Building a Code Review Bot with the Claude API

A step-by-step guide to building an automated code review bot with the Claude API. It covers GitHub integration, diff analysis, security scanning, style enforcement, and delivering actionable feedback on pull requests.
## Why Build an AI Code Review Bot?
Manual code review is a bottleneck in every engineering team. Senior engineers spend 5-10 hours per week reviewing pull requests. Reviews are inconsistent -- what one reviewer catches, another misses. And review latency delays merges, slowing the entire development cycle.
An AI code review bot does not replace human reviewers. It augments them by catching the mechanical issues (bugs, security vulnerabilities, style violations, missing tests) so that human reviewers can focus on architecture, design, and business logic.
## Architecture Overview
The system has four components:
- GitHub Webhook Listener: Receives PR events from GitHub
- Diff Analyzer: Extracts and structures the code changes
- Claude Review Engine: Analyzes code and generates feedback
- GitHub Comment Writer: Posts review comments on the PR
```
GitHub PR Event -> Webhook -> Diff Analyzer -> Claude API -> GitHub Comments
```
## Step 1: GitHub Webhook Listener
```python
from fastapi import FastAPI, Request, HTTPException
import hmac
import hashlib
import os

app = FastAPI()

GITHUB_WEBHOOK_SECRET = os.environ["GITHUB_WEBHOOK_SECRET"]

@app.post("/webhook/github")
async def handle_github_webhook(request: Request):
    # Verify the webhook signature before trusting the payload
    signature = request.headers.get("X-Hub-Signature-256", "")
    body = await request.body()
    expected = "sha256=" + hmac.new(
        GITHUB_WEBHOOK_SECRET.encode(),
        body,
        hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(signature, expected):
        raise HTTPException(status_code=403, detail="Invalid signature")

    payload = await request.json()
    event_type = request.headers.get("X-GitHub-Event")

    if event_type == "pull_request" and payload["action"] in ("opened", "synchronize"):
        await review_pull_request(
            repo=payload["repository"]["full_name"],
            pr_number=payload["pull_request"]["number"],
            base_sha=payload["pull_request"]["base"]["sha"],
            head_sha=payload["pull_request"]["head"]["sha"],
        )

    return {"status": "ok"}
```
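Before pointing GitHub at the endpoint, it helps to verify the signature math in isolation. A minimal sketch that computes the same `X-Hub-Signature-256` value GitHub sends (the secret and payload here are made up):

```python
import hmac
import hashlib

def sign_payload(secret: str, body: bytes) -> str:
    """Compute the X-Hub-Signature-256 header value for a payload body."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return "sha256=" + digest

# Sign a fake delivery and verify it the same way the listener does
body = b'{"action": "opened"}'
signature = sign_payload("test-secret", body)
assert signature.startswith("sha256=")
assert hmac.compare_digest(signature, sign_payload("test-secret", body))
```

Using `hmac.compare_digest` (rather than `==`) matters in the listener itself: it avoids leaking timing information about how much of the signature matched.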
## Step 2: Diff Analyzer
```python
import httpx

GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]

async def get_pr_diff(repo: str, pr_number: int) -> list[dict]:
    """Fetch the PR diff and parse it into structured file changes."""
    async with httpx.AsyncClient() as client:
        # Get the list of changed files (up to 100 per page; larger PRs need pagination)
        response = await client.get(
            f"https://api.github.com/repos/{repo}/pulls/{pr_number}/files",
            headers={
                "Authorization": f"token {GITHUB_TOKEN}",
                "Accept": "application/vnd.github.v3+json",
            },
            params={"per_page": 100},
        )
        response.raise_for_status()
        files = response.json()

    changes = []
    for file in files:
        if file["status"] == "removed":
            continue  # Skip deleted files
        changes.append({
            "filename": file["filename"],
            "status": file["status"],  # added, modified, renamed
            "additions": file["additions"],
            "deletions": file["deletions"],
            "patch": file.get("patch", ""),  # The actual diff
            "language": detect_language(file["filename"]),
        })
    return changes

def detect_language(filename: str) -> str:
    ext_map = {
        ".py": "python", ".ts": "typescript", ".tsx": "typescript",
        ".js": "javascript", ".jsx": "javascript", ".go": "go",
        ".rs": "rust", ".java": "java", ".rb": "ruby",
    }
    for ext, lang in ext_map.items():
        if filename.endswith(ext):
            return lang
    return "unknown"
```
## Step 3: Claude Review Engine
This is the core of the system. We send each file's diff to Claude with specialized review instructions.
```python
from anthropic import Anthropic
import json

client = Anthropic()

REVIEW_SYSTEM_PROMPT = """You are an expert code reviewer. For each code diff provided,
analyze the changes and identify:

1. **Bugs**: Logic errors, off-by-one errors, null pointer issues, race conditions
2. **Security**: SQL injection, XSS, auth bypasses, secrets exposure, input validation
3. **Performance**: N+1 queries, unnecessary allocations, missing indexes, O(n^2) algorithms
4. **Style**: Naming conventions, code organization, readability
5. **Missing tests**: New logic paths that lack test coverage

For each issue found, provide:
- severity: "critical", "warning", or "suggestion"
- line: the line number in the diff (from the + side)
- description: clear explanation of the issue
- suggestion: specific code fix when possible

Return your review as a JSON array of issues. If the code looks good, return an empty array.
Do NOT fabricate issues -- only report genuine problems."""
```
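For reference, here is what one finding parsed from Claude's response should look like, matching the schema the prompt asks for (the values are illustrative):

```python
import json

# A sample response body in the shape the system prompt requests
raw = """[
  {
    "severity": "critical",
    "line": 42,
    "description": "User input is interpolated directly into the SQL string.",
    "suggestion": "cursor.execute('SELECT * FROM users WHERE id = %s', (user_id,))"
  }
]"""

issues = json.loads(raw)
assert issues[0]["severity"] in ("critical", "warning", "suggestion")
```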
````python
async def review_file(filename: str, patch: str, language: str) -> list[dict]:
    """Review a single file's changes."""
    if not patch or len(patch) < 10:
        return []

    # Note: the sync client blocks the event loop; use AsyncAnthropic for concurrency
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=4096,
        system=REVIEW_SYSTEM_PROMPT,
        messages=[{
            "role": "user",
            "content": f"""Review this {language} code change in {filename}:

```diff
{patch}
```

Return your findings as a JSON array.""",
        }],
    )

    try:
        # Extract the JSON payload from the response text
        text = response.content[0].text
        # Handle markdown code blocks in the response
        if "```json" in text:
            text = text.split("```json")[1].split("```")[0]
        elif "```" in text:
            text = text.split("```")[1]
        return json.loads(text)
    except (json.JSONDecodeError, IndexError):
        return []
````
```python
async def review_pull_request(repo: str, pr_number: int, base_sha: str, head_sha: str):
    """Review all files in a pull request."""
    changes = await get_pr_diff(repo, pr_number)

    all_issues = []
    for file_change in changes:
        issues = await review_file(
            filename=file_change["filename"],
            patch=file_change["patch"],
            language=file_change["language"],
        )
        for issue in issues:
            issue["filename"] = file_change["filename"]
        all_issues.extend(issues)

    # Post results to GitHub
    await post_review_comments(repo, pr_number, head_sha, all_issues)
```
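The loop above reviews files one at a time. Since each review is an independent API call, per-PR latency can be cut by running them concurrently behind a semaphore. A self-contained sketch with a stand-in review coroutine (the real call would be `review_file`):

```python
import asyncio

async def review_all(changes: list[dict], max_concurrency: int = 5) -> list[dict]:
    """Run one review task per file, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def review_one(change: dict) -> list[dict]:
        async with semaphore:
            # Stand-in for review_file(...); returns one fake issue per file
            await asyncio.sleep(0)
            return [{"filename": change["filename"], "severity": "suggestion"}]

    # gather preserves input order, so results line up with changes
    results = await asyncio.gather(*(review_one(c) for c in changes))
    return [issue for issues in results for issue in issues]

changes = [{"filename": "a.py"}, {"filename": "b.py"}, {"filename": "c.py"}]
issues = asyncio.run(review_all(changes))
assert len(issues) == 3
```

The semaphore cap matters: unbounded `gather` over a large PR can trip API rate limits.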
## Step 4: GitHub Comment Writer
````python
async def post_review_comments(
    repo: str, pr_number: int, commit_sha: str, issues: list[dict]
):
    """Post review comments on the GitHub PR."""
    if not issues:
        # Post a summary comment
        await post_pr_comment(
            repo, pr_number,
            "AI Review: No issues found. The changes look good."
        )
        return

    # Group by severity
    critical = [i for i in issues if i["severity"] == "critical"]
    warnings = [i for i in issues if i["severity"] == "warning"]
    suggestions = [i for i in issues if i["severity"] == "suggestion"]

    # Create review with inline comments
    comments = []
    for issue in issues:
        body = f"**{issue['severity'].upper()}**: {issue['description']}"
        if issue.get("suggestion"):
            body += f"\n\nSuggested fix:\n```\n{issue['suggestion']}\n```"
        comments.append({
            "path": issue["filename"],
            "line": issue.get("line", 1),
            "body": body,
        })

    # Determine review action
    event = "REQUEST_CHANGES" if critical else "COMMENT"

    summary = f"""## AI Code Review Summary

| Severity | Count |
|---|---|
| Critical | {len(critical)} |
| Warning | {len(warnings)} |
| Suggestion | {len(suggestions)} |

{"**Action required**: Critical issues found that should be addressed before merging." if critical else "No blocking issues found."}"""

    async with httpx.AsyncClient() as http_client:
        response = await http_client.post(
            f"https://api.github.com/repos/{repo}/pulls/{pr_number}/reviews",
            headers={
                "Authorization": f"token {GITHUB_TOKEN}",
                "Accept": "application/vnd.github.v3+json",
            },
            json={
                "commit_id": commit_sha,
                "body": summary,
                "event": event,
                "comments": comments,
            },
        )
        response.raise_for_status()
````
## Handling Large PRs
Large PRs can exceed Claude's context window. Split the review into manageable chunks:
```python
async def review_large_pr(changes: list[dict], max_tokens_per_call: int = 50_000):
    """Break large PRs into reviewable chunks."""
    current_batch = []
    current_tokens = 0

    for change in changes:
        patch_tokens = len(change["patch"]) // 4  # Rough estimate: ~4 chars per token
        if current_tokens + patch_tokens > max_tokens_per_call and current_batch:
            # Review the current batch before it outgrows the token budget
            yield await review_batch(current_batch)
            current_batch = []
            current_tokens = 0
        current_batch.append(change)
        current_tokens += patch_tokens

    if current_batch:
        yield await review_batch(current_batch)
```
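The batching rule itself is easy to check in isolation. The same accumulation logic as a pure function (`chunk_changes` is a hypothetical helper, written this way so it runs without any API calls):

```python
def chunk_changes(changes: list[dict], max_tokens_per_call: int) -> list[list[dict]]:
    """Group changes into batches whose estimated token counts fit the budget."""
    batches, current, tokens = [], [], 0
    for change in changes:
        patch_tokens = len(change["patch"]) // 4  # Same rough estimate as above
        if tokens + patch_tokens > max_tokens_per_call and current:
            batches.append(current)
            current, tokens = [], 0
        current.append(change)
        tokens += patch_tokens
    if current:
        batches.append(current)
    return batches

# Five patches of ~100 estimated tokens each, with a 250-token budget
changes = [{"patch": "x" * 400}] * 5
batches = chunk_changes(changes, max_tokens_per_call=250)
assert [len(b) for b in batches] == [2, 2, 1]
```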
## Reducing False Positives
The biggest challenge with AI code review is false positives. Every false positive erodes developer trust in the tool. Strategies to minimize them:
- Include project context: Add a `.ai-review-config.yml` that describes coding standards, acceptable patterns, and known exceptions
- Use file-type-specific prompts: A Python review prompt differs from a TypeScript review prompt
- Filter low-confidence findings: Ask Claude to rate its confidence (1-10) and only surface issues above 7
- Learn from dismissals: Track which comments developers dismiss and adjust the prompt accordingly
- Limit scope: Focus on security and bugs initially. Add style checks only after the bot has earned trust
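For example, the project-context file might look like this (the keys and schema here are illustrative, not a standard):

```yaml
# .ai-review-config.yml -- illustrative schema, not a standard
standards:
  max_function_length: 50
  naming: snake_case
acceptable_patterns:
  - "print() statements are allowed in CLI scripts"
known_exceptions:
  - path: "migrations/"
    reason: "Generated code; skip style checks"
review_scope:
  - security
  - bugs
```

The file's contents can be appended to the system prompt so Claude knows which patterns are intentional before flagging them.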
## Cost Analysis
For an average PR with 10 changed files and 500 lines of diff:
| Component | Tokens | Cost (Sonnet) |
|---|---|---|
| System prompt (cached) | 500 | $0.00015 |
| 10 file diffs | 5,000 | $0.015 |
| 10 review outputs | 3,000 | $0.045 |
| Total per PR | 8,500 | $0.06 |
At 50 PRs per day, the monthly cost is approximately $90 -- less than one hour of a senior engineer's time. The ROI is immediate and substantial.
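The arithmetic behind these figures, assuming Sonnet's published per-token rates at the time of writing ($3/M input, $15/M output, $0.30/M cached reads):

```python
# Per-PR cost estimate using the token counts from the table above
cached_system = 500 * 0.30 / 1_000_000     # cached system prompt reads
diff_input    = 5_000 * 3.00 / 1_000_000   # 10 file diffs as fresh input
review_output = 3_000 * 15.00 / 1_000_000  # 10 review outputs

per_pr = cached_system + diff_input + review_output
monthly = per_pr * 50 * 30                 # 50 PRs/day over a 30-day month

assert round(per_pr, 2) == 0.06
assert round(monthly) == 90
```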