# Claude Extended Thinking: When to Use It and How to Build With It
You've probably noticed that Claude sometimes gets things wrong on tricky logic puzzles, multi-step math problems, or architecture decisions with many trade-offs. The issue isn't that Claude lacks knowledge — it's that it rushes to an answer without working through the problem step by step. Extended thinking is Anthropic's solution to exactly that problem.
Extended thinking gives Claude a scratchpad to reason through difficult problems before writing its final response. The result: measurably better accuracy on tasks that require deliberate, sequential reasoning. In this guide, you'll learn what extended thinking is, which models support it, when to use it (and when to skip it), and how to wire it up in your own applications.
## What Is Claude Extended Thinking?
Extended thinking is a mode where Claude generates a chain of internal reasoning — called "thinking blocks" — before producing its final answer. Think of it as Claude talking through a problem out loud before committing to a response.
Unlike a separate reasoning model, extended thinking is the same Claude model given more compute time. When you enable it, Claude produces a sequence of thinking content blocks (visible via the API) followed by the normal text response block. The thinking blocks show how Claude broke down the problem, explored alternatives, checked its work, and arrived at a conclusion.
This is fundamentally different from standard prompt chaining or chain-of-thought prompting. The thinking happens at the model level, is much richer than what you'd get by asking Claude to "think step by step," and is tightly integrated with how Claude plans multi-step tasks.
## Which Claude Models Support Extended Thinking?
As of April 2026, extended thinking is available on:
| Model | Extended Thinking | Notes |
|---|---|---|
| Claude Opus 4.6 | ✅ Full support | Best reasoning depth, highest token budget |
| Claude Sonnet 4.6 | ✅ Full support | Best balance of speed + reasoning quality |
| Claude Haiku 4.5 | ❌ Not supported | Use standard mode; Haiku is optimized for speed |
| Claude 3.7 Sonnet | ✅ Legacy support | First model to ship extended thinking |
| Claude 3.5 and below | ❌ Not supported | — |
## When to Use Extended Thinking (and When to Skip It)
Extended thinking is not universally better. Research shows it can hurt performance by up to 36% on tasks that benefit from fast, intuitive responses — similar to how humans perform worse when they overthink simple decisions.
### Use Extended Thinking For:
- **Multi-step mathematical reasoning** — GCD calculations, optimization problems, probability puzzles, financial modeling with multiple constraints. Claude's standard mode often makes arithmetic errors mid-chain; extended thinking catches and corrects them.
- **Complex code architecture decisions** — "Should I use event sourcing or CQRS for this payments service?" or "Refactor this 1,200-line God class into proper domain services." Extended thinking lets Claude consider trade-offs systematically before prescribing a solution.
- **Long-horizon agentic tasks** — When Claude is running multiple tool calls (reading files, querying APIs, writing code, running tests), extended thinking lets it replan based on intermediate results rather than following a rigid upfront plan.
- **Nuanced analysis with many variables** — Competitive analysis, legal document review, technical feasibility assessments. Tasks where missing one factor could invalidate the whole answer benefit from Claude's ability to enumerate considerations.
- **Debugging hard-to-reproduce errors** — Stack traces with ambiguous root causes, race conditions, environment-specific failures. Extended thinking lets Claude systematically work through hypotheses.
### Skip Extended Thinking For:
- Simple factual questions — "What's the capital of France?" doesn't need a thinking budget.
- Creative writing and summarization — These tasks are fluency-driven, not logic-driven.
- High-frequency, latency-sensitive API calls — Chat responses, autocomplete, classification at scale.
- Tasks where you've already specified the full reasoning path — If your prompt already breaks down every step, extended thinking adds latency without benefit.
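As a back-of-the-envelope sketch, the use/skip guidance above can be encoded as a routing heuristic. The task categories and the function itself are illustrative assumptions for your own pipeline, not part of the Anthropic SDK:

```python
# Hypothetical routing heuristic based on the use/skip lists above.
# The task category names are illustrative assumptions, not an official API.

THINKING_TASKS = {"math", "architecture", "agentic", "analysis", "debugging"}
FAST_TASKS = {"factual_qa", "creative", "summarization", "autocomplete", "classification"}

def thinking_config(task_type: str, budget_tokens: int = 5000):
    """Return a `thinking` parameter for reasoning-heavy tasks, or None to skip it."""
    if task_type in THINKING_TASKS:
        return {"type": "enabled", "budget_tokens": budget_tokens}
    # For fast, latency-sensitive tasks, omit the `thinking` parameter (standard mode)
    return None
```

Pass the result into `client.messages.create(...)` only when it is not `None`; otherwise call the API in standard mode for lower latency and cost.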
## How to Enable Extended Thinking via the API
Enabling extended thinking in the Anthropic Python SDK takes a single extra `thinking` parameter:

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,  # how much Claude can spend on internal reasoning
    },
    messages=[
        {
            "role": "user",
            "content": "Design a rate-limiting strategy for a multi-tenant SaaS API. "
            "We have free (100 req/min), pro (1000 req/min), and enterprise (custom) tiers. "
            "Consider Redis, token bucket vs. sliding window, and burst handling.",
        }
    ],
)

# The response contains both thinking blocks and the final text block
for block in response.content:
    if block.type == "thinking":
        print("=== THINKING ===")
        print(block.thinking)
    elif block.type == "text":
        print("=== RESPONSE ===")
        print(block.text)
```

### Streaming with Extended Thinking
For production use — especially when `max_tokens` exceeds ~21,000 — streaming is required. Here's how to handle both thinking deltas and text deltas in the stream:
```python
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[
        {
            "role": "user",
            "content": "Given this Python stack trace, identify the root cause "
            "and suggest a fix:\n\n[paste your stack trace here]",
        }
    ],
) as stream:
    current_block_type = None
    for event in stream:
        if event.type == "content_block_start":
            current_block_type = event.content_block.type
            label = "Thinking..." if current_block_type == "thinking" else "Response:"
            print(f"\n{label}", flush=True)
        elif event.type == "content_block_delta":
            if event.delta.type == "thinking_delta":
                print(event.delta.thinking, end="", flush=True)
            elif event.delta.type == "text_delta":
                print(event.delta.text, end="", flush=True)
        elif event.type == "content_block_stop":
            print()  # newline after each block
```

### TypeScript / Node.js Example
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 16000,
  thinking: {
    type: "enabled",
    budget_tokens: 10000,
  },
  messages: [
    {
      role: "user",
      content:
        "Analyze the time complexity of this algorithm and suggest an O(n log n) alternative: [paste code]",
    },
  ],
});

for (const block of response.content) {
  if (block.type === "thinking") {
    console.log("Reasoning:", block.thinking);
  } else if (block.type === "text") {
    console.log("Answer:", block.text);
  }
}
```

### Choosing the Right `budget_tokens`
The `budget_tokens` parameter controls how many tokens Claude can spend on internal reasoning. This is distinct from `max_tokens`, which caps the response as a whole (thinking plus final text).
- `budget_tokens` must be less than `max_tokens`
- Minimum: 1,024 tokens
- Maximum: varies by model (Opus 4.6 supports up to 32,000 for thinking)
- Claude won't always use the full budget — it stops when it has a confident answer
| Task Complexity | Recommended `budget_tokens` |
|---|---|
| Moderate (debugging, code review) | 3,000 – 5,000 |
| Hard (architecture decisions, algorithm design) | 8,000 – 12,000 |
| Very hard (multi-domain analysis, theorem proving) | 16,000 – 32,000 |
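Taken together, the constraints and the table above can be folded into a small budget-picking helper. This is a sketch: the tier values and the 32,000-token cap come from this guide's numbers, so treat them as assumptions to revisit against current documentation:

```python
# Budget picker based on the constraints and table above.
# ASSUMPTION: tier values and the 32,000 cap are taken from this guide, not the SDK.

TIER_BUDGETS = {
    "moderate": 4000,    # debugging, code review (3,000-5,000 range)
    "hard": 10000,       # architecture, algorithm design (8,000-12,000 range)
    "very_hard": 24000,  # multi-domain analysis, theorem proving (16,000-32,000 range)
}

MIN_BUDGET = 1024   # documented minimum
MODEL_CAP = 32000   # Opus 4.6 thinking cap quoted above

def pick_budget(tier: str, max_tokens: int) -> int:
    """Pick a thinking budget for the tier, respecting the min/cap and max_tokens rules."""
    budget = max(MIN_BUDGET, min(TIER_BUDGETS[tier], MODEL_CAP))
    if budget >= max_tokens:
        # budget_tokens must be strictly less than max_tokens
        raise ValueError(f"max_tokens={max_tokens} leaves no room for a {budget}-token budget")
    return budget
```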
## Using Extended Thinking in Claude.ai and Claude Code
You don't need the API to access extended thinking — it's available directly in the Claude.ai interface and Claude Code.
**In Claude.ai:** You'll see a collapsible "Thinking" block before each response when the model decides to use it. Claude won't engage extended thinking on every message — it activates automatically when the task seems to warrant it.

**In Claude Code:** Use `/think` or `/ultrathink` prefixes in your message to request deep reasoning mode:

```
/think Refactor this service to use the repository pattern.
Consider backward compatibility and the existing test suite.
```

```
/ultrathink Design a schema migration strategy for renaming the `users`
table to `accounts` across 12 microservices with zero downtime.
```

`/ultrathink` allocates a larger thinking budget than `/think`, making it suitable for the most demanding architectural and debugging tasks.
## Real-World Use Cases
### 1. Algorithm Analysis
Extended thinking excels at algorithm problems that require holding multiple constraints in mind simultaneously:
```python
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8000,
    thinking={"type": "enabled", "budget_tokens": 6000},
    messages=[{
        "role": "user",
        "content": "Find the minimum number of coins to make change for $0.41 "
        "using denominations of 1¢, 20¢, and 25¢. Show why a greedy "
        "approach fails here."
    }]
)
```

### 2. Code Review with Reasoning
Instead of getting a surface-level review, extended thinking lets Claude reason about architectural concerns before surfacing them:
```python
with open("payment_service.py") as f:
    code = f.read()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=12000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{
        "role": "user",
        "content": "Review this payment service for security vulnerabilities, "
        f"race conditions, and PCI-DSS compliance issues:\n\n{code}"
    }]
)
```

### 3. Technical Specification Generation
When building something new, extended thinking helps Claude systematically consider edge cases and produce more complete specs than standard mode.
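A spec-generation request is just a prompt plus a generous thinking budget. Here is a minimal sketch that builds the request payload; the prompt wording, model string, and budget value are illustrative choices, not prescribed by the API:

```python
# Build a request payload for spec generation with a generous thinking budget.
# ASSUMPTION: the prompt wording and budget value are illustrative.

def spec_request(feature_description: str) -> dict:
    """Return kwargs for client.messages.create() that ask for a full technical spec."""
    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": 16000,
        "thinking": {"type": "enabled", "budget_tokens": 12000},
        "messages": [{
            "role": "user",
            "content": (
                "Write a technical specification for this feature, including "
                "data model, API surface, edge cases, and failure modes:\n\n"
                + feature_description
            ),
        }],
    }

# Usage: response = client.messages.create(**spec_request("webhook retry system"))
```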
## Cost and Latency Trade-offs
Extended thinking adds both latency and token cost. Thinking tokens are billed at the same rate as output tokens for the model you're using.
Rough estimates for Sonnet 4.6 at 10,000 `budget_tokens`:

- Additional latency: ~5–15 seconds before first text token
- Additional cost: ~$0.003–$0.015 per call depending on how much of the budget is used
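Because thinking tokens bill at the output rate, you can sanity-check these figures with simple arithmetic. The $15-per-million price below is an assumed Sonnet-class output rate; check Anthropic's current pricing page before relying on it:

```python
# Estimate the extra cost of extended thinking for one call.
# ASSUMPTION: output-token price; verify against Anthropic's current pricing.
OUTPUT_PRICE_PER_MTOK = 15.00  # USD per million output tokens (assumed)

def thinking_cost(thinking_tokens_used: int, price_per_mtok: float = OUTPUT_PRICE_PER_MTOK) -> float:
    """Thinking tokens bill at the output rate, so cost scales linearly with usage."""
    return thinking_tokens_used * price_per_mtok / 1_000_000

# e.g. if Claude used 1,000 of a 10,000-token budget:
# thinking_cost(1000) == 0.015  (the upper end of the range above)
```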
For most production applications, extended thinking is reserved for asynchronous workflows — analysis jobs, batch processing, agent planning steps — rather than real-time chat responses. If you need low latency, use standard mode and reserve extended thinking for the complex sub-tasks within your pipeline.
## Key Takeaways
- Extended thinking is the same Claude model with more compute time — not a separate model
- Enable it with `thinking={"type": "enabled", "budget_tokens": N}` in the API
- Use it for math, code architecture, multi-step analysis, and hard debugging — skip it for factual Q&A and creative tasks
- `/think` and `/ultrathink` work in Claude Code for quick access without the API
- Start with `budget_tokens: 5000` and adjust based on task complexity and actual usage
- Streaming is required when `max_tokens` exceeds ~21,000
## Next Steps
Understanding when and how to use extended thinking is part of mastering the full Claude API. If you're preparing for the Claude Certified Architect (CCA-F) exam, this is exactly the kind of architecture-level knowledge the exam tests — knowing which model features to reach for and why.
Practice your Claude API knowledge with our CCA-F Practice Test Bank — 200+ scenario-based questions covering extended thinking, model selection, prompt engineering, agentic patterns, and cost optimization. The same decisions you'd make on a real AI project, tested in exam format.

Not ready to pay yet? Start with our free 20-question CCA sample quiz — no signup required.