
Claude for Test-Driven Development: The Red-Green-Refactor Loop with AI

Master TDD with Claude Code using the red-green-refactor loop. Step-by-step workflows, code examples, and automation patterns for professional developers.

Most developers use Claude to write code faster. A smaller group uses it to write better code — and the difference usually comes down to one discipline: test-driven development.

If you have tried TDD without AI, you know the friction: writing a failing test before you know what the implementation looks like feels backwards. If you have tried using Claude without TDD, you have probably seen it hallucinate confident-sounding code that subtly breaks edge cases. The two problems cancel each other out. TDD gives Claude a precise target; Claude eliminates the tedium that makes TDD hard to sustain.

This guide walks through the full red-green-refactor loop using Claude Code, with real code examples and the automation hooks that make it stick.

Why TDD Pairs So Well with Claude

The core problem with AI-assisted coding is ambiguity. When you tell Claude "write a function that parses user input," it will write something — but it is optimizing for a plausible output, not necessarily the correct one. Edge cases get dropped. Assumptions get baked in silently.

Tests remove that ambiguity. A failing test is an exact, executable specification. Claude cannot hallucinate around it — either the test passes or it does not.

There is a second benefit: the test suite doubles as a correction mechanism. When Claude refactors code during the green or refactor phase, you run the tests immediately. Regressions surface in seconds, not in production three weeks later.

Here is what the loop looks like in practice:

| Phase | Human writes | Claude writes | Signal |
| --- | --- | --- | --- |
| Red | Failing test(s) | Nothing yet | Test fails as expected |
| Green | Nothing | Minimal implementation to pass | All tests pass |
| Refactor | Notes on what to clean up | Refactored code | Tests still pass |

The human's job in this loop is specification. Claude's job is implementation. That division of labor plays to both strengths.

Setting Up Your TDD Workflow with Claude Code

Before writing a single test, configure your workspace so Claude can run tests automatically. This is the step most tutorials skip — and it is the difference between Claude as a text editor and Claude as an agentic development partner.

1. Add a test runner hook to CLAUDE.md

Create or update your CLAUDE.md with a post-edit hook that runs your test suite whenever Claude modifies a source file:

````markdown
## Development Workflow

After editing any file in `src/`, run the test suite:

```
npm test -- --watchAll=false
```

If tests fail, fix them before moving to the next file. Do not ask me whether to run tests — run them automatically.
````

This single instruction transforms Claude's behavior. Instead of generating code and waiting for you to validate it, it generates, tests, observes the result, and self-corrects in a tight loop.
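If your Claude Code version supports lifecycle hooks, you can enforce the same behavior mechanically rather than by instruction. Below is a sketch of a `PostToolUse` hook in `.claude/settings.json`, assuming the hook schema documented for recent Claude Code releases; verify the field names against your version's hooks reference before relying on it:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npm test -- --watchAll=false" }
        ]
      }
    ]
  }
}
```

The CLAUDE.md instruction and the hook are complementary: the instruction tells Claude to expect test runs, and the hook guarantees they happen even if the instruction falls out of context.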

2. Configure your test runner for fast feedback

Slow tests break the loop. Claude will wait for your test suite to finish before continuing, so optimize for speed:

```typescript
// jest.config.ts
export default {
  testEnvironment: "node",
  testMatch: ["**/*.test.ts"], // Picks up colocated test files
  bail: 1, // Stop after first failure — faster signal
  verbose: false, // Less noise in Claude's context window
  collectCoverage: false, // Turn on only for final review
};
```

For Python projects using pytest:

```ini
# pytest.ini
[pytest]
addopts = -x --tb=short -q
```

The `-x` flag stops at the first failure. `--tb=short` gives Claude a compact traceback that fits in its context window instead of flooding it with noise.

3. Structure your project for TDD

Keep tests colocated with source files. Claude navigates colocated tests more reliably than a separate /tests directory:

```
src/
  lib/
    parser.ts
    parser.test.ts   ← colocated
  services/
    email.ts
    email.test.ts    ← colocated
```

The Red Phase: Writing Tests Claude Can Implement Against

The red phase is entirely yours. This is where you think — and where most TDD tutorials fail by under-explaining what "write a failing test" actually means.

A good failing test has three properties:

  • It tests behavior, not implementation. Test what the function does, not how it does it.
  • It is minimal. One assertion per test case, one test case per behavior.
  • It fails for the right reason. The test should fail because the implementation does not exist yet, not because of a syntax error.

Here is an example. You are building a slug generator for blog posts:

```typescript
// src/lib/slugify.test.ts
import { slugify } from "./slugify";

describe("slugify", () => {
  it("converts spaces to hyphens", () => {
    expect(slugify("Hello World")).toBe("hello-world");
  });

  it("lowercases the string", () => {
    expect(slugify("SCREAMING CAPS")).toBe("screaming-caps");
  });

  it("removes special characters", () => {
    expect(slugify("Hello, World!")).toBe("hello-world");
  });

  it("collapses multiple spaces", () => {
    expect(slugify("Hello   World")).toBe("hello-world");
  });

  it("trims leading and trailing hyphens", () => {
    expect(slugify("  Hello World  ")).toBe("hello-world");
  });
});
```

Run the tests now. They should fail with `Cannot find module './slugify'`. That is the correct red state — you have a precise, executable specification and no implementation.
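A module-resolution error is an acceptable red state, but some teams prefer the failure to come from assertions instead. A minimal sketch of that variant; the stub below is deliberately wrong and exists only to let the suite compile:

```typescript
// src/lib/slugify.ts (temporary stub so the red phase fails on
// assertions rather than on module resolution)
export function slugify(input: string): string {
  return input; // deliberately wrong: every test should fail
}
```

With the stub in place, each test fails on its own `expect` line, which gives Claude five precise error messages instead of a single import error.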

Prompt Claude for the green phase

Now open Claude Code and give it the task with tight framing:

```
I have written failing tests for a slugify function.
The test file is at src/lib/slugify.test.ts.

Write the minimal implementation in src/lib/slugify.ts
that makes all these tests pass. Do not add functionality
that the tests do not cover. Run npm test after writing
to confirm.
```

Claude reads the tests, writes the implementation, runs the test suite, and reports back. The key instruction is "minimal implementation" — without it, Claude will add features you did not ask for.

The Green Phase: Letting Claude Close the Loop

A typical Claude response to the prompt above:

```typescript
// src/lib/slugify.ts
export function slugify(input: string): string {
  return input
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9\s-]/g, "")
    .replace(/\s+/g, "-")
    .replace(/^-+|-+$/g, "");
}
```

Claude will then run `npm test` automatically (because of your CLAUDE.md hook) and report:

```
PASS src/lib/slugify.test.ts
  slugify
    ✓ converts spaces to hyphens (2ms)
    ✓ lowercases the string (1ms)
    ✓ removes special characters (1ms)
    ✓ collapses multiple spaces (1ms)
    ✓ trims leading and trailing hyphens (1ms)

Test Suites: 1 passed, 1 total
Tests:       5 passed, 5 total
```

All green. Now you can refactor with confidence.

When Claude's implementation fails a test

This happens. The correct response is to leave the failing test in place and prompt Claude to fix its implementation — not to change the test to match what Claude wrote:

```
The test "removes special characters" is still failing.
The expected output is "hello-world" but we're getting "hello-world!".
Fix the implementation without changing the tests.
```

Never modify tests to make them pass. The test is the specification. If it seems wrong, have a human conversation about whether the specification needs to change — then update the test intentionally, not as a workaround.

The Refactor Phase: Using Claude to Clean Up Safely

Once tests are green, you can safely refactor. This is where Claude earns its keep beyond just generating code.

Tell Claude what bothers you about the current implementation:

```
The tests are all passing. Now refactor slugify.ts:
1. Extract the regex patterns into named constants with comments
2. Add a TypeScript return type annotation
3. Do not change the function signature or behavior — tests must still pass

Run tests after refactoring.
```

Claude's refactored output:

```typescript
// src/lib/slugify.ts

// Strip everything except letters, numbers, spaces, and hyphens
const ALLOWED_CHARS = /[^a-z0-9\s-]/g;

// Collapse consecutive whitespace
const MULTIPLE_SPACES = /\s+/g;

// Remove leading/trailing hyphens
const EDGE_HYPHENS = /^-+|-+$/g;

export function slugify(input: string): string {
  return input
    .toLowerCase()
    .trim()
    .replace(ALLOWED_CHARS, "")
    .replace(MULTIPLE_SPACES, "-")
    .replace(EDGE_HYPHENS, "");
}
```

Tests pass. The code is more readable. You never had to manually verify that refactoring did not break anything — the test suite did it for you in under a second.
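One optional refactor-phase addition (not part of the original suite, so adopt it only if it fits your specification) is an idempotence check: slugifying an already-slugified string should change nothing. It is shown here as a plain script rather than a Jest test so it runs standalone, with the slugify implementation copied in:

```typescript
// Idempotence check: slugify(slugify(x)) must equal slugify(x).
// The implementation is copied here so the snippet is self-contained.
function slugify(input: string): string {
  return input
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9\s-]/g, "")
    .replace(/\s+/g, "-")
    .replace(/^-+|-+$/g, "");
}

const samples = ["Hello World", "SCREAMING CAPS", "Hello, World!", "  spaced   out  "];
for (const s of samples) {
  const once = slugify(s);
  if (slugify(once) !== once) {
    throw new Error(`slugify is not idempotent for input: ${JSON.stringify(s)}`);
  }
}
```

Properties like this are cheap insurance during aggressive refactoring: they hold for any input, so they keep passing even as you restructure the internals.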

Scaling TDD with Claude: Larger Features

The loop scales to larger features by decomposing them into layers. Write tests for the lowest-level unit first, get them green, then write tests for the layer above that depends on it.

Here is the pattern for a service layer:

```typescript
// 1. Write tests for the data layer first
describe("ArticleRepository", () => {
  it("saves an article and returns an id", async () => { ... });
  it("fetches article by slug", async () => { ... });
});

// 2. Once green, write tests for the service layer
describe("ArticleService", () => {
  it("creates article with auto-generated slug", async () => { ... });
  it("rejects duplicate slugs", async () => { ... });
});

// 3. Once green, write tests for the API layer
describe("POST /api/articles", () => {
  it("returns 201 with the new article id", async () => { ... });
  it("returns 409 on duplicate slug", async () => { ... });
});
```

Each layer builds on green tests from the layer below. Claude implements each layer knowing the contract of the layers it depends on is already validated.

Prompt for each layer:

```
The ArticleRepository tests are passing (see repository.test.ts).
Now write the ArticleService that uses the repository.
Here are the failing tests: [paste service test file]
Make them pass, then run the full test suite.
```
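To make the layered contract concrete, here is a sketch of what the service layer might look like once Claude has made its tests pass. Everything below (the `Article` shape, the repository interface, the in-memory repository) is hypothetical scaffolding for this example, and it is synchronous for brevity where the tests above are async:

```typescript
// Hypothetical contract between the layers in this example.
interface Article {
  id: number;
  title: string;
  slug: string;
}

interface ArticleRepository {
  save(title: string, slug: string): Article;
  findBySlug(slug: string): Article | undefined;
}

// Service layer: depends only on the repository contract,
// which the repository tests have already validated.
class ArticleService {
  constructor(private repo: ArticleRepository) {}

  create(title: string): Article {
    const slug = title
      .toLowerCase()
      .trim()
      .replace(/[^a-z0-9\s-]/g, "")
      .replace(/\s+/g, "-")
      .replace(/^-+|-+$/g, "");
    if (this.repo.findBySlug(slug)) {
      throw new Error(`duplicate slug: ${slug}`);
    }
    return this.repo.save(title, slug);
  }
}

// Minimal in-memory repository, enough to exercise the service.
class InMemoryArticleRepository implements ArticleRepository {
  private articles: Article[] = [];

  save(title: string, slug: string): Article {
    const article = { id: this.articles.length + 1, title, slug };
    this.articles.push(article);
    return article;
  }

  findBySlug(slug: string): Article | undefined {
    return this.articles.find((a) => a.slug === slug);
  }
}
```

The point of the sketch is the dependency direction: `ArticleService` sees only the `ArticleRepository` interface, so the already-green repository tests fully define what the service can rely on.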

Common Mistakes and How to Avoid Them

• Asking Claude to write tests and implementation together. This defeats the purpose of TDD. The whole point is that the test specifies behavior before implementation exists. If Claude writes both, it will write tests that match whatever implementation it already had in mind — not tests that constrain it.
• Writing tests that test implementation details. If your test reaches into private methods or asserts on internal data structures, it becomes fragile. Test the public interface: inputs and outputs only.
• Skipping the refactor phase. Green tests are not the finish line. Code that passes tests but is hard to read becomes technical debt. Use Claude to refactor aggressively — the safety net is there.
• Letting context windows drift. On large features, Claude's context fills up with accumulated test output and file contents. Start fresh Claude sessions at natural layer boundaries (e.g., after all repository tests are green, start a new session for the service layer).

Key Takeaways

• TDD gives Claude an exact specification to implement against — it eliminates hallucinated assumptions
• The CLAUDE.md hook for running tests automatically is the single highest-leverage configuration change you can make
• Write tests in the red phase that test behavior, not implementation, and never modify them to match Claude's output
• Scale to larger features by decomposing bottom-up: data layer → service layer → API layer
• Use the refactor phase actively — green tests are the safety net that makes bold refactoring cheap

Next Steps: Prove Your Claude Skills

If you are using TDD with Claude Code in production, you already understand how Claude handles tool use, context management, and agentic workflows at a deeper level than most developers. That knowledge maps directly to the Claude Certified Architect (CCA-F) exam — Anthropic's professional certification for Claude expertise.

The CCA-F covers:

• Prompt engineering and context window management
• Tool use and function calling patterns
• Multi-agent orchestration and safety
• Claude API architecture and model selection

Take a free CCA-F practice test to see where you stand — many developers pass on their first attempt after six to eight weeks of hands-on Claude Code work.
