
Claude Code and Test-Driven Development: AI-Assisted TDD

How to practice TDD with Claude Code — writing failing tests first, implementing to pass, and refactoring with AI assistance for faster red-green-refactor cycles.

TDD and AI: Better Together

Test-Driven Development follows a simple cycle: write a failing test (red), write the minimum code to pass it (green), then clean up the design without changing behavior (refactor). Claude Code accelerates every phase of this cycle.

The traditional objection to TDD is that it feels slow — writing tests before code adds upfront time. But with Claude Code, the overhead drops dramatically. You describe the behavior you want, Claude Code writes the test, then implements the code to pass it, then refactors — all in one conversation.

The AI-Assisted TDD Workflow

Phase 1: Red — Define the Behavior

Start by telling Claude Code what you want, framed as test cases:

Write failing tests for a PasswordValidator class with these rules:
1. Minimum 8 characters
2. At least one uppercase letter
3. At least one lowercase letter
4. At least one digit
5. At least one special character (!@#$%^&*)
6. Cannot contain the user's email or username
7. Cannot be in a list of common passwords

Write comprehensive tests covering valid passwords, each individual rule violation,
and edge cases (empty string, very long passwords, unicode characters).
Do NOT write the implementation yet.

Claude Code generates:

# tests/test_password_validator.py
import pytest
from app.validators.password import PasswordValidator, PasswordValidationResult

class TestPasswordValidator:
    def setup_method(self):
        self.validator = PasswordValidator(
            common_passwords=["password123", "qwerty123", "admin123"]
        )

    # --- Valid passwords ---
    def test_valid_password(self):
        result = self.validator.validate("Str0ng!Pass", username="john", email="john@example.com")
        assert result.is_valid
        assert result.errors == []

    def test_valid_complex_password(self):
        result = self.validator.validate("C0mpl3x#P@ss!", username="jane", email="jane@test.com")
        assert result.is_valid

    # --- Length violations ---
    def test_too_short(self):
        result = self.validator.validate("Sh0rt!", username="user", email="u@e.com")
        assert not result.is_valid
        assert "at least 8 characters" in result.errors[0].lower()

    def test_exactly_8_characters_valid(self):
        result = self.validator.validate("Aa1!xxxx", username="user", email="u@e.com")
        assert result.is_valid

    # --- Missing character types ---
    def test_no_uppercase(self):
        result = self.validator.validate("lowercase1!", username="user", email="u@e.com")
        assert not result.is_valid
        assert any("uppercase" in e.lower() for e in result.errors)

    def test_no_lowercase(self):
        result = self.validator.validate("UPPERCASE1!", username="user", email="u@e.com")
        assert not result.is_valid
        assert any("lowercase" in e.lower() for e in result.errors)

    def test_no_digit(self):
        result = self.validator.validate("NoDigits!!", username="user", email="u@e.com")
        assert not result.is_valid
        assert any("digit" in e.lower() for e in result.errors)

    def test_no_special_character(self):
        result = self.validator.validate("NoSpecial1A", username="user", email="u@e.com")
        assert not result.is_valid
        assert any("special" in e.lower() for e in result.errors)

    # --- Personal info ---
    def test_contains_username(self):
        result = self.validator.validate("john!Str0ng", username="john", email="john@e.com")
        assert not result.is_valid
        assert any("username" in e.lower() for e in result.errors)

    def test_contains_email(self):
        result = self.validator.validate("john@example.com1!A", username="john", email="john@example.com")
        assert not result.is_valid
        assert any("email" in e.lower() for e in result.errors)

    def test_username_case_insensitive(self):
        result = self.validator.validate("JOHN!str0ng", username="john", email="john@e.com")
        assert not result.is_valid

    # --- Common passwords ---
    def test_common_password_rejected(self):
        result = self.validator.validate("password123", username="user", email="u@e.com")
        assert not result.is_valid
        assert any("common" in e.lower() for e in result.errors)

    # --- Edge cases ---
    def test_empty_string(self):
        result = self.validator.validate("", username="user", email="u@e.com")
        assert not result.is_valid

    def test_very_long_password(self):
        long_pass = "A1!" + "a" * 997
        result = self.validator.validate(long_pass, username="user", email="u@e.com")
        assert result.is_valid

    def test_unicode_characters(self):
        result = self.validator.validate("Str0ng!Pàss", username="user", email="u@e.com")
        assert result.is_valid

    def test_multiple_violations_returns_all_errors(self):
        result = self.validator.validate("short", username="user", email="u@e.com")
        assert not result.is_valid
        assert len(result.errors) > 1  # Should report multiple issues

Phase 2: Green — Implement to Pass

Now run the tests. They should all fail since PasswordValidator does not exist yet.
Then implement PasswordValidator to make all tests pass. Use the minimum code necessary.

Claude Code runs the tests, confirms they fail, then implements:

# app/validators/password.py
from dataclasses import dataclass, field
import re

@dataclass
class PasswordValidationResult:
    is_valid: bool
    errors: list[str] = field(default_factory=list)

class PasswordValidator:
    def __init__(self, common_passwords: list[str] | None = None):
        self.common_passwords = set(
            p.lower() for p in (common_passwords or [])
        )

    def validate(
        self, password: str, username: str = "", email: str = ""
    ) -> PasswordValidationResult:
        errors: list[str] = []

        if len(password) < 8:
            errors.append("Password must be at least 8 characters")

        if not re.search(r"[A-Z]", password):
            errors.append("Password must contain at least one uppercase letter")

        if not re.search(r"[a-z]", password):
            errors.append("Password must contain at least one lowercase letter")

        if not re.search(r"[0-9]", password):
            errors.append("Password must contain at least one digit")

        if not re.search(r"[!@#$%^&*]", password):
            errors.append("Password must contain at least one special character (!@#$%^&*)")

        if username and username.lower() in password.lower():
            errors.append("Password must not contain your username")

        if email and email.lower() in password.lower():
            errors.append("Password must not contain your email")

        if password.lower() in self.common_passwords:
            errors.append("Password is too common — please choose a different one")

        return PasswordValidationResult(
            is_valid=len(errors) == 0,
            errors=errors,
        )

Claude Code then runs the tests and confirms they all pass.

Phase 3: Refactor

All tests pass. Now refactor the PasswordValidator:
1. Extract each rule into a separate method for readability
2. Make rules configurable (min length, required character types)
3. Ensure all tests still pass after refactoring

Claude Code refactors while keeping all tests green:

@dataclass
class PasswordRules:
    min_length: int = 8
    require_uppercase: bool = True
    require_lowercase: bool = True
    require_digit: bool = True
    require_special: bool = True
    special_characters: str = "!@#$%^&*"
    check_personal_info: bool = True
    check_common_passwords: bool = True

class PasswordValidator:
    def __init__(
        self,
        rules: PasswordRules | None = None,
        common_passwords: list[str] | None = None,
    ):
        self.rules = rules or PasswordRules()
        self.common_passwords = set(
            p.lower() for p in (common_passwords or [])
        )

    def validate(
        self, password: str, username: str = "", email: str = ""
    ) -> PasswordValidationResult:
        errors: list[str] = []
        self._check_length(password, errors)
        self._check_uppercase(password, errors)
        self._check_lowercase(password, errors)
        self._check_digit(password, errors)
        self._check_special(password, errors)
        self._check_personal_info(password, username, email, errors)
        self._check_common(password, errors)
        return PasswordValidationResult(is_valid=len(errors) == 0, errors=errors)

    def _check_length(self, password: str, errors: list[str]) -> None:
        if len(password) < self.rules.min_length:
            errors.append(f"Password must be at least {self.rules.min_length} characters")

    def _check_uppercase(self, password: str, errors: list[str]) -> None:
        if self.rules.require_uppercase and not re.search(r"[A-Z]", password):
            errors.append("Password must contain at least one uppercase letter")

    # ... remaining check methods follow the same pattern

TDD with Claude Code for API Endpoints

The same red-green-refactor cycle works for API development:

Write integration tests for a POST /api/v1/orders endpoint that:
1. Creates an order with line items
2. Validates that all product IDs exist
3. Calculates total from line items (price * quantity)
4. Returns 201 with the created order
5. Returns 422 if any product ID is invalid
6. Returns 400 if the cart is empty
7. Returns 409 if any product is out of stock

Write the tests first. Do not implement the endpoint yet.

Then:

Run the tests to confirm they fail. Then implement the endpoint and service
to make all tests pass.
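Before wiring up the HTTP layer, the same rules can be pinned down against a plain service object, with the endpoint later translating exceptions into 422/400/409 responses. A minimal sketch of what the green phase might produce; all names here (OrderService, the exception classes) are hypothetical, not from the article's codebase:

```python
from dataclasses import dataclass

@dataclass
class Product:
    id: int
    price: float
    stock: int

class InvalidProductError(Exception): pass  # endpoint would map to 422
class EmptyCartError(Exception): pass       # endpoint would map to 400
class OutOfStockError(Exception): pass      # endpoint would map to 409

class OrderService:
    def __init__(self, products: list[Product]):
        self.products = {p.id: p for p in products}

    def create_order(self, items: list[tuple[int, int]]) -> dict:
        """items is a list of (product_id, quantity) pairs."""
        if not items:
            raise EmptyCartError("Cart is empty")
        total = 0.0
        for product_id, quantity in items:
            product = self.products.get(product_id)
            if product is None:
                raise InvalidProductError(f"Unknown product {product_id}")
            if product.stock < quantity:
                raise OutOfStockError(f"Product {product_id} is out of stock")
            total += product.price * quantity  # rule 3: price * quantity
        return {"items": items, "total": total}

service = OrderService([Product(id=1, price=10.0, stock=5)])
assert service.create_order([(1, 2)])["total"] == 20.0
```

Keeping the business rules in a service like this also keeps the integration tests fast: only the thin 201/422/400/409 mapping needs to be tested through the actual HTTP stack.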

Benefits of TDD with Claude Code

1. Tests Define the Contract

When you write tests first, you define exactly what the code should do before Claude Code writes it. This eliminates the ambiguity that causes AI-generated code to miss requirements.

2. Tests Catch Regressions Immediately

Claude Code runs tests after every change. If a refactoring breaks something, it is caught and fixed in the same conversation.

3. Tests Serve as Documentation

The test suite Claude Code generates becomes documentation of your system's behavior. Future developers (and future Claude Code sessions) can read the tests to understand expected behavior.

4. Higher Confidence in AI-Generated Code

The biggest concern with AI-generated code is correctness. TDD directly addresses this by defining correctness criteria upfront and verifying them automatically.

Prompt Patterns for TDD

Start with Behavior, Not Implementation

Bad: "Write a function that loops through passwords and checks regex patterns"

Good: "Write tests for a password validator that enforces these rules: [list rules]"

Specify Edge Cases Explicitly

Include tests for these edge cases:
- Empty input
- Maximum length input
- Unicode characters
- Concurrent access
- Null/undefined values
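An explicit edge-case checklist maps naturally onto a table-driven (or pytest-parametrized) test: one row per case, so a missing case is visible at a glance. A stdlib-only sketch using a simplified stand-in for the validator above:

```python
import re

def is_strong(password: str) -> bool:
    """Simplified stand-in for PasswordValidator.validate (assumption:
    8+ chars, uppercase, lowercase, digit, and one of !@#$%^&*)."""
    return (
        len(password) >= 8
        and re.search(r"[A-Z]", password) is not None
        and re.search(r"[a-z]", password) is not None
        and re.search(r"[0-9]", password) is not None
        and re.search(r"[!@#$%^&*]", password) is not None
    )

# Each edge case from the checklist becomes a row: (input, expected)
EDGE_CASES = [
    ("", False),                 # empty input
    ("A1!" + "a" * 997, True),   # maximum-length input (1000 chars)
    ("Str0ng!Pàss", True),       # unicode characters still satisfy the rules
    ("Aa1!xxxx", True),          # boundary: exactly 8 characters
    ("Aa1!xxx", False),          # boundary: one below the minimum
]

def test_edge_cases():
    for password, expected in EDGE_CASES:
        assert is_strong(password) is expected, repr(password)
```

With pytest installed, the same table can be fed to `@pytest.mark.parametrize` so each row reports as its own test.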

Request Multiple Violation Reporting

The validator should report ALL violations, not just the first one.
Write a test that verifies multiple errors are returned for an input
that violates multiple rules.

Chain the Phases

Phase 1: Write failing tests for [feature]
Phase 2: Implement the minimum code to pass all tests
Phase 3: Refactor for readability and extensibility, keeping all tests green

TDD Anti-Patterns to Avoid with Claude Code

1. Asking for Tests After Implementation

If you ask Claude Code to implement a feature first and then write tests, the tests will be shaped by the implementation rather than the requirements. This leads to tests that pass by definition but miss edge cases.

2. Over-Mocking

Claude Code sometimes generates tests with too many mocks, testing implementation details rather than behavior. Guide it:

Write integration tests that test the actual database interactions.
Only mock external services (email, payment APIs). Do not mock repositories
or the database itself.
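The boundary this prompt draws can be sketched in code: mock only the external service, and exercise the repository for real. In this self-contained example (all class names hypothetical) an in-memory repository stands in so the sketch runs anywhere; in a real integration test you would use the actual database-backed implementation:

```python
from unittest.mock import Mock

class OrderService:
    def __init__(self, repository, email_client):
        self.repository = repository
        self.email_client = email_client

    def place_order(self, order: dict) -> dict:
        saved = self.repository.save(order)          # exercised for real
        self.email_client.send_confirmation(saved)   # external boundary: mocked
        return saved

class InMemoryOrderRepository:
    """Stand-in so this sketch is runnable; a real integration test
    would use the database-backed repository instead."""
    def __init__(self):
        self.orders: list[dict] = []

    def save(self, order: dict) -> dict:
        self.orders.append(order)
        return order

def test_place_order_sends_confirmation():
    repo = InMemoryOrderRepository()
    email = Mock()  # only the external service is mocked
    service = OrderService(repo, email)
    service.place_order({"id": 1})
    assert repo.orders == [{"id": 1}]  # behavior verified through the repository
    email.send_confirmation.assert_called_once_with({"id": 1})
```

The test still verifies behavior (the order is persisted, the confirmation goes out) without asserting anything about how `OrderService` talks to its repository.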

3. Testing Implementation Details

Test the public interface only. Do not test private methods or internal
data structures. Tests should verify behavior: given input X, expect output Y.
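The difference is easiest to see side by side. A hypothetical ShoppingCart (not from the article's codebase) with an internal list that may later become a dict:

```python
class ShoppingCart:
    def __init__(self):
        self._items: list[tuple[float, int]] = []  # internal detail, free to change

    def add(self, price: float, quantity: int = 1) -> None:
        self._items.append((price, quantity))

    def total(self) -> float:
        return sum(price * qty for price, qty in self._items)

# Behavior test: given input X, expect output Y. Survives any refactoring
# of the internal storage.
def test_total_after_adding_items():
    cart = ShoppingCart()
    cart.add(10.0, 2)
    cart.add(5.0)
    assert cart.total() == 25.0

# Anti-pattern (do NOT write this): asserts on the private structure, so it
# breaks the moment _items becomes a dict, even though behavior is unchanged.
# def test_items_stored_as_tuples():
#     cart = ShoppingCart()
#     cart.add(10.0, 2)
#     assert cart._items == [(10.0, 2)]
```

If Claude Code generates tests that reach into underscore-prefixed attributes, that is usually the signal to restate the prompt in terms of observable behavior.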

Measuring TDD Effectiveness

After a TDD session with Claude Code, check:

Run coverage analysis:
pytest --cov=app --cov-report=term-missing

Report:
1. Overall coverage percentage
2. Files with less than 80% coverage
3. Specific uncovered branches

Claude Code can then write additional tests to cover the gaps.

Conclusion

TDD with Claude Code combines the discipline of test-first development with the speed of AI-assisted coding. The red-green-refactor cycle becomes a natural conversation: describe the behavior, watch Claude Code write tests, confirm they fail, then ask for the implementation. The tests act as both a specification and a safety net, ensuring that AI-generated code meets your exact requirements. For teams that were previously hesitant about TDD due to the time overhead, Claude Code makes it practical and fast.
