Claude Code and Test-Driven Development: AI-Assisted TDD
How to practice TDD with Claude Code — writing failing tests first, implementing to pass, and refactoring with AI assistance for faster red-green-refactor cycles.
TDD and AI: Better Together
Test-Driven Development follows a simple cycle: write a failing test (red), write the minimum code to pass it (green), then clean up the code while keeping the tests passing (refactor). Claude Code can speed up every phase of this cycle.
The traditional objection to TDD is that it feels slow: writing tests before code adds upfront time. With Claude Code, much of that overhead disappears. You describe the behavior you want, and Claude Code writes the test, implements the code to pass it, and refactors, all in one conversation.
The AI-Assisted TDD Workflow
Phase 1: Red — Define the Behavior
Start by telling Claude Code what you want, framed as test cases:
Write failing tests for a PasswordValidator class with these rules:
1. Minimum 8 characters
2. At least one uppercase letter
3. At least one lowercase letter
4. At least one digit
5. At least one special character (!@#$%^&*)
6. Cannot contain the user's email or username
7. Cannot be in a list of common passwords
Write comprehensive tests covering valid passwords, each individual rule violation,
and edge cases (empty string, very long passwords, unicode characters).
Do NOT write the implementation yet.
Claude Code generates:
# tests/test_password_validator.py
import pytest

from app.validators.password import PasswordValidator, PasswordValidationResult


class TestPasswordValidator:
    def setup_method(self):
        self.validator = PasswordValidator(
            common_passwords=["password123", "qwerty123", "admin123"]
        )

    # --- Valid passwords ---
    def test_valid_password(self):
        result = self.validator.validate("Str0ng!Pass", username="john", email="john@example.com")
        assert result.is_valid
        assert result.errors == []

    def test_valid_complex_password(self):
        result = self.validator.validate("C0mpl3x#P@ss!", username="jane", email="jane@test.com")
        assert result.is_valid

    # --- Length violations ---
    def test_too_short(self):
        result = self.validator.validate("Sh0rt!", username="user", email="u@e.com")
        assert not result.is_valid
        assert "at least 8 characters" in result.errors[0].lower()

    def test_exactly_8_characters_valid(self):
        result = self.validator.validate("Aa1!xxxx", username="user", email="u@e.com")
        assert result.is_valid

    # --- Missing character types ---
    def test_no_uppercase(self):
        result = self.validator.validate("lowercase1!", username="user", email="u@e.com")
        assert not result.is_valid
        assert any("uppercase" in e.lower() for e in result.errors)

    def test_no_lowercase(self):
        result = self.validator.validate("UPPERCASE1!", username="user", email="u@e.com")
        assert not result.is_valid
        assert any("lowercase" in e.lower() for e in result.errors)

    def test_no_digit(self):
        result = self.validator.validate("NoDigits!!", username="user", email="u@e.com")
        assert not result.is_valid
        assert any("digit" in e.lower() for e in result.errors)

    def test_no_special_character(self):
        result = self.validator.validate("NoSpecial1A", username="user", email="u@e.com")
        assert not result.is_valid
        assert any("special" in e.lower() for e in result.errors)

    # --- Personal info ---
    def test_contains_username(self):
        result = self.validator.validate("john!Str0ng", username="john", email="john@e.com")
        assert not result.is_valid
        assert any("username" in e.lower() for e in result.errors)

    def test_contains_email(self):
        result = self.validator.validate("john@example.com1!A", username="john", email="john@example.com")
        assert not result.is_valid
        assert any("email" in e.lower() for e in result.errors)

    def test_username_case_insensitive(self):
        result = self.validator.validate("JOHN!str0ng", username="john", email="john@e.com")
        assert not result.is_valid

    # --- Common passwords ---
    def test_common_password_rejected(self):
        result = self.validator.validate("password123", username="user", email="u@e.com")
        assert not result.is_valid
        assert any("common" in e.lower() for e in result.errors)

    # --- Edge cases ---
    def test_empty_string(self):
        result = self.validator.validate("", username="user", email="u@e.com")
        assert not result.is_valid

    def test_very_long_password(self):
        long_pass = "A1!" + "a" * 997  # 1000 characters in total
        result = self.validator.validate(long_pass, username="user", email="u@e.com")
        assert result.is_valid

    def test_unicode_characters(self):
        # Non-ASCII letters alongside the required ASCII character classes
        result = self.validator.validate("Str0ng!Pässwörd", username="user", email="u@e.com")
        assert result.is_valid

    def test_multiple_violations_returns_all_errors(self):
        result = self.validator.validate("short", username="user", email="u@e.com")
        assert not result.is_valid
        assert len(result.errors) > 1  # Should report multiple issues
Phase 2: Green — Implement to Pass
Now run the tests. They should all fail since PasswordValidator does not exist yet.
Then implement PasswordValidator to make all tests pass. Use the minimum code necessary.
Claude Code runs the tests, confirms they fail, then implements:
# app/validators/password.py
from dataclasses import dataclass, field
import re


@dataclass
class PasswordValidationResult:
    is_valid: bool
    errors: list[str] = field(default_factory=list)


class PasswordValidator:
    def __init__(self, common_passwords: list[str] | None = None):
        # Store lowercased entries so the lookup is case-insensitive
        self.common_passwords = set(
            p.lower() for p in (common_passwords or [])
        )

    def validate(
        self, password: str, username: str = "", email: str = ""
    ) -> PasswordValidationResult:
        errors: list[str] = []

        if len(password) < 8:
            errors.append("Password must be at least 8 characters")
        if not re.search(r"[A-Z]", password):
            errors.append("Password must contain at least one uppercase letter")
        if not re.search(r"[a-z]", password):
            errors.append("Password must contain at least one lowercase letter")
        if not re.search(r"[0-9]", password):
            errors.append("Password must contain at least one digit")
        if not re.search(r"[!@#$%^&*]", password):
            errors.append("Password must contain at least one special character (!@#$%^&*)")
        if username and username.lower() in password.lower():
            errors.append("Password must not contain your username")
        if email and email.lower() in password.lower():
            errors.append("Password must not contain your email")
        if password.lower() in self.common_passwords:
            errors.append("Password is too common; please choose a different one")

        return PasswordValidationResult(
            is_valid=len(errors) == 0,
            errors=errors,
        )
Claude Code then runs the tests and confirms they all pass.
Phase 3: Refactor
All tests pass. Now refactor the PasswordValidator:
1. Extract each rule into a separate method for readability
2. Make rules configurable (min length, required character types)
3. Ensure all tests still pass after refactoring
Claude Code refactors while keeping all tests green:
@dataclass
class PasswordRules:
    min_length: int = 8
    require_uppercase: bool = True
    require_lowercase: bool = True
    require_digit: bool = True
    require_special: bool = True
    special_characters: str = "!@#$%^&*"
    check_personal_info: bool = True
    check_common_passwords: bool = True


class PasswordValidator:
    def __init__(
        self,
        rules: PasswordRules | None = None,
        common_passwords: list[str] | None = None,
    ):
        self.rules = rules or PasswordRules()
        self.common_passwords = set(
            p.lower() for p in (common_passwords or [])
        )

    def validate(
        self, password: str, username: str = "", email: str = ""
    ) -> PasswordValidationResult:
        errors: list[str] = []
        self._check_length(password, errors)
        self._check_uppercase(password, errors)
        self._check_lowercase(password, errors)
        self._check_digit(password, errors)
        self._check_special(password, errors)
        self._check_personal_info(password, username, email, errors)
        self._check_common(password, errors)
        return PasswordValidationResult(is_valid=len(errors) == 0, errors=errors)

    def _check_length(self, password: str, errors: list[str]) -> None:
        if len(password) < self.rules.min_length:
            errors.append(f"Password must be at least {self.rules.min_length} characters")

    def _check_uppercase(self, password: str, errors: list[str]) -> None:
        if self.rules.require_uppercase and not re.search(r"[A-Z]", password):
            errors.append("Password must contain at least one uppercase letter")

    # ... remaining check methods follow the same pattern
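For readers who want to run the refactored class, the elided methods could plausibly be filled in as below. This is a sketch, not Claude Code's actual output: the bodies of `_check_lowercase` through `_check_common` are assumptions modeled on the Phase 2 implementation, and the special-character check uses a plain membership test instead of a regex so that a custom `special_characters` string needs no escaping.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PasswordValidationResult:
    is_valid: bool
    errors: list = field(default_factory=list)


@dataclass
class PasswordRules:
    min_length: int = 8
    require_uppercase: bool = True
    require_lowercase: bool = True
    require_digit: bool = True
    require_special: bool = True
    special_characters: str = "!@#$%^&*"
    check_personal_info: bool = True
    check_common_passwords: bool = True


class PasswordValidator:
    def __init__(self, rules: Optional[PasswordRules] = None,
                 common_passwords: Optional[list] = None):
        self.rules = rules or PasswordRules()
        self.common_passwords = {p.lower() for p in (common_passwords or [])}

    def validate(self, password: str, username: str = "",
                 email: str = "") -> PasswordValidationResult:
        errors: list = []
        self._check_length(password, errors)
        self._check_uppercase(password, errors)
        self._check_lowercase(password, errors)
        self._check_digit(password, errors)
        self._check_special(password, errors)
        self._check_personal_info(password, username, email, errors)
        self._check_common(password, errors)
        return PasswordValidationResult(is_valid=not errors, errors=errors)

    def _check_length(self, password, errors):
        if len(password) < self.rules.min_length:
            errors.append(f"Password must be at least {self.rules.min_length} characters")

    def _check_uppercase(self, password, errors):
        if self.rules.require_uppercase and not re.search(r"[A-Z]", password):
            errors.append("Password must contain at least one uppercase letter")

    # Everything below is an assumed completion of the elided methods.
    def _check_lowercase(self, password, errors):
        if self.rules.require_lowercase and not re.search(r"[a-z]", password):
            errors.append("Password must contain at least one lowercase letter")

    def _check_digit(self, password, errors):
        if self.rules.require_digit and not re.search(r"[0-9]", password):
            errors.append("Password must contain at least one digit")

    def _check_special(self, password, errors):
        allowed = self.rules.special_characters
        if self.rules.require_special and not any(c in allowed for c in password):
            errors.append(f"Password must contain at least one special character ({allowed})")

    def _check_personal_info(self, password, username, email, errors):
        if not self.rules.check_personal_info:
            return
        lowered = password.lower()
        if username and username.lower() in lowered:
            errors.append("Password must not contain your username")
        if email and email.lower() in lowered:
            errors.append("Password must not contain your email")

    def _check_common(self, password, errors):
        if self.rules.check_common_passwords and password.lower() in self.common_passwords:
            errors.append("Password is too common; please choose a different one")
```

With the rules pulled into `PasswordRules`, a relaxed policy becomes a constructor argument, for example `PasswordValidator(rules=PasswordRules(min_length=4, require_special=False))`, while the default construction keeps the behavior the original tests describe.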
TDD with Claude Code for API Endpoints
The same red-green-refactor cycle works for API development:
Write integration tests for a POST /api/v1/orders endpoint that:
1. Creates an order with line items
2. Validates that all product IDs exist
3. Calculates total from line items (price * quantity)
4. Returns 201 with the created order
5. Returns 422 if any product ID is invalid
6. Returns 400 if the cart is empty
7. Returns 409 if any product is out of stock
Write the tests first. Do not implement the endpoint yet.
Then:
Run the tests to confirm they fail. Then implement the endpoint and service
to make all tests pass.
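To make the endpoint requirements concrete without committing to a web framework, the rules can first be pinned down against a plain function. The sketch below is illustrative only: `create_order`, `Product`, the `CATALOG` fixture, the payload shape, and prices in integer cents are all assumptions, as is the order in which the checks run (empty cart, then unknown products, then stock). Only the status codes come from the prompt above.

```python
from dataclasses import dataclass


@dataclass
class Product:
    price_cents: int
    stock: int


def create_order(payload: dict, catalog: dict) -> tuple:
    """Return (status_code, body) for a hypothetical POST /api/v1/orders."""
    items = payload.get("items", [])
    if not items:
        return 400, {"error": "cart is empty"}

    unknown = [i["product_id"] for i in items if i["product_id"] not in catalog]
    if unknown:
        return 422, {"error": "unknown product IDs", "product_ids": unknown}

    short = [i["product_id"] for i in items
             if catalog[i["product_id"]].stock < i["quantity"]]
    if short:
        return 409, {"error": "out of stock", "product_ids": short}

    # Total is the sum of price * quantity over all line items.
    total = sum(catalog[i["product_id"]].price_cents * i["quantity"] for i in items)
    return 201, {"items": items, "total_cents": total}


# Hypothetical fixture: one product in stock, one sold out.
CATALOG = {"sku-1": Product(price_cents=1250, stock=5),
           "sku-2": Product(price_cents=300, stock=0)}


def test_creates_order_with_total():
    status, body = create_order(
        {"items": [{"product_id": "sku-1", "quantity": 2}]}, CATALOG)
    assert status == 201
    assert body["total_cents"] == 2500


def test_empty_cart_returns_400():
    assert create_order({"items": []}, CATALOG)[0] == 400


def test_unknown_product_returns_422():
    status, _ = create_order(
        {"items": [{"product_id": "missing", "quantity": 1}]}, CATALOG)
    assert status == 422


def test_out_of_stock_returns_409():
    status, _ = create_order(
        {"items": [{"product_id": "sku-2", "quantity": 1}]}, CATALOG)
    assert status == 409
```

In a real project the same assertions would go through the framework's test client against the HTTP endpoint; keeping the rules in a function like this simply makes the expected status codes easy to see.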
Benefits of TDD with Claude Code
1. Tests Define the Contract
When you write tests first, you define exactly what the code should do before Claude Code writes it. This eliminates the ambiguity that causes AI-generated code to miss requirements.
2. Tests Catch Regressions Immediately
When you ask it to, Claude Code reruns the test suite after each change. If a refactoring breaks something, the failure surfaces and can be fixed in the same conversation.
3. Tests Serve as Documentation
The test suite Claude Code generates becomes documentation of your system's behavior. Future developers (and future Claude Code sessions) can read the tests to understand expected behavior.
4. Higher Confidence in AI-Generated Code
The biggest concern with AI-generated code is correctness. TDD directly addresses this by defining correctness criteria upfront and verifying them automatically.
Prompt Patterns for TDD
Start with Behavior, Not Implementation
Bad: "Write a function that loops through passwords and checks regex patterns"
Good: "Write tests for a password validator that enforces these rules: [list rules]"
Specify Edge Cases Explicitly
Include tests for these edge cases:
- Empty input
- Maximum length input
- Unicode characters
- Concurrent access
- Null/undefined values
Request Multiple Violation Reporting
The validator should report ALL violations, not just the first one.
Write a test that verifies multiple errors are returned for an input
that violates multiple rules.
Chain the Phases
Phase 1: Write failing tests for [feature]
Phase 2: Implement the minimum code to pass all tests
Phase 3: Refactor for readability and extensibility, keeping all tests green
TDD Anti-Patterns to Avoid with Claude Code
1. Asking for Tests After Implementation
If you ask Claude Code to implement a feature first and then write tests, the tests will be shaped by the implementation rather than the requirements. This leads to tests that pass by definition but miss edge cases.
2. Over-Mocking
Claude Code sometimes generates tests with too many mocks, testing implementation details rather than behavior. Guide it:
Write integration tests that test the actual database interactions.
Only mock external services (email, payment APIs). Do not mock repositories
or the database itself.
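What that guidance can look like in a test is sketched below. The `register_user` service and the `send_welcome` method are hypothetical names, and an in-memory SQLite database stands in for whatever test database the project actually uses. The point is the split: the database interaction is real, and only the external email collaborator is replaced with a `Mock`.

```python
import sqlite3
from unittest.mock import Mock


def register_user(conn, email_sender, email):
    """Hypothetical service: persists a user, then sends a welcome email."""
    conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
    conn.commit()
    email_sender.send_welcome(email)


def test_register_user_persists_and_notifies():
    # Real database engine (in-memory SQLite), not a mocked repository.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY)")
    # The external email service is the only mocked collaborator.
    email_sender = Mock()

    register_user(conn, email_sender, "jane@test.com")

    rows = conn.execute("SELECT email FROM users").fetchall()
    assert rows == [("jane@test.com",)]
    email_sender.send_welcome.assert_called_once_with("jane@test.com")
```

Because the assertion reads the row back through SQL, the test would still pass if the service were rewritten to use a different query or helper, which is the behavior-over-implementation property the prompt asks for.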
3. Testing Implementation Details
Test the public interface only. Do not test private methods or internal
data structures. Tests should verify behavior: given input X, expect output Y.
Measuring TDD Effectiveness
After a TDD session with Claude Code, check how much of the code the tests actually exercise:
Run coverage analysis:
pytest --cov=app --cov-report=term-missing
Report:
1. Overall coverage percentage
2. Files with less than 80% coverage
3. Specific uncovered branches
Claude Code can then write additional tests to cover the gaps.
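The "files with less than 80% coverage" step can also be scripted. The sketch below assumes a JSON coverage report with a top-level `files` mapping and a per-file `summary.percent_covered` field, which is the layout coverage.py's JSON report uses as far as I am aware; check it against your installed version. The helper names `files_below_threshold` and `load_report`, and the default `coverage.json` path, are illustrative.

```python
import json
from pathlib import Path


def files_below_threshold(report: dict, threshold: float = 80.0) -> list:
    """Return (path, percent) pairs under the threshold, lowest coverage first."""
    low = [
        (path, data["summary"]["percent_covered"])
        for path, data in report["files"].items()
        if data["summary"]["percent_covered"] < threshold
    ]
    return sorted(low, key=lambda item: item[1])


def load_report(path: str = "coverage.json") -> dict:
    """Load a JSON coverage report from disk (the path is an assumption)."""
    return json.loads(Path(path).read_text())


# Hand-written sample in the assumed report shape, for illustration only.
sample = {
    "files": {
        "app/validators/password.py": {"summary": {"percent_covered": 100.0}},
        "app/services/orders.py": {"summary": {"percent_covered": 72.5}},
        "app/api/routes.py": {"summary": {"percent_covered": 55.0}},
    }
}

for path, percent in files_below_threshold(sample):
    print(f"{path}: {percent:.1f}%")
```

The resulting list of under-covered files can be pasted back into the conversation as the starting point for the next round of tests.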
Conclusion
TDD with Claude Code combines the discipline of test-first development with the speed of AI-assisted coding. The red-green-refactor cycle becomes a natural conversation: describe the behavior, watch Claude Code write tests, confirm they fail, then ask for the implementation. The tests act as both a specification and a safety net, so AI-generated code is checked against the requirements you wrote down rather than taken on trust. For teams that were previously hesitant about TDD due to the time overhead, Claude Code makes it practical and fast.