# Claude Extended Thinking: When to Use It and How to Build With It
You've probably noticed that Claude sometimes gets things wrong on tricky logic puzzles, multi-step math problems, or architecture decisions with many trade-offs. The issue isn't that Claude lacks knowledge — it's that it rushes to an answer without working through the problem step by step. Extended thinking is Anthropic's solution to exactly that problem.
Extended thinking gives Claude a scratchpad to reason through difficult problems before writing its final response. The result: measurably better accuracy on tasks that require deliberate, sequential reasoning. In this guide, you'll learn what extended thinking is, which models support it, when to use it (and when to skip it), and how to wire it up in your own applications.
## What Is Claude Extended Thinking?
Extended thinking is a mode where Claude generates a chain of internal reasoning — called "thinking blocks" — before producing its final answer. Think of it as Claude talking through a problem out loud before committing to a response.
Unlike a separate reasoning model, extended thinking is the same Claude model given more compute time. When you enable it, Claude produces a sequence of thinking content blocks (visible via the API) followed by the normal text response block. The thinking blocks show how Claude broke down the problem, explored alternatives, checked its work, and arrived at a conclusion.
This is fundamentally different from standard prompt chaining or chain-of-thought prompting. The thinking happens at the model level, is much richer than what you'd get by asking Claude to "think step by step," and is tightly integrated with how Claude plans multi-step tasks.
## Which Claude Models Support Extended Thinking?
As of April 2026, extended thinking is available on:
| Model | Extended Thinking | Notes |
|---|---|---|
| Claude Opus 4.6 | ✅ Full support | Best reasoning depth, highest token budget |
| Claude Sonnet 4.6 | ✅ Full support | Best balance of speed + reasoning quality |
| Claude Haiku 4.5 | ❌ Not supported | Use standard mode; Haiku is optimized for speed |
| Claude 3.7 Sonnet | ✅ Legacy support | First model to ship extended thinking |
| Claude 3.5 and below | ❌ Not supported | — |
## When to Use Extended Thinking (and When to Skip It)
Extended thinking is not universally better. Research shows it can hurt performance by up to 36% on tasks that benefit from fast, intuitive responses — similar to how humans perform worse when they overthink simple decisions.
### Use Extended Thinking For:
- **Multi-step mathematical reasoning** — GCD calculations, optimization problems, probability puzzles, financial modeling with multiple constraints. Claude's standard mode often makes arithmetic errors mid-chain; extended thinking catches and corrects them.
- **Complex code architecture decisions** — "Should I use event sourcing or CQRS for this payments service?" or "Refactor this 1,200-line God class into proper domain services." Extended thinking lets Claude consider trade-offs systematically before prescribing a solution.
- **Long-horizon agentic tasks** — When Claude is running multiple tool calls (reading files, querying APIs, writing code, running tests), extended thinking lets it replan based on intermediate results rather than following a rigid upfront plan.
- **Nuanced analysis with many variables** — Competitive analysis, legal document review, technical feasibility assessments. Tasks where missing one factor could invalidate the whole answer benefit from Claude's ability to enumerate considerations.
- **Debugging hard-to-reproduce errors** — Stack traces with ambiguous root causes, race conditions, environment-specific failures. Extended thinking lets Claude systematically work through hypotheses.
### Skip Extended Thinking For:
- Simple factual questions — "What's the capital of France?" doesn't need a thinking budget.
- Creative writing and summarization — These tasks are fluency-driven, not logic-driven.
- High-frequency, latency-sensitive API calls — Chat responses, autocomplete, classification at scale.
- Tasks where you've already specified the full reasoning path — If your prompt already breaks down every step, extended thinking adds latency without benefit.
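As a back-of-the-envelope sketch, the use/skip guidance above can be encoded as a routing heuristic. The task categories and the function itself are illustrative assumptions for your own pipeline, not part of the Anthropic SDK:

```python
# Hypothetical routing heuristic based on the use/skip lists above.
# The task category names are illustrative assumptions, not an official API.

THINKING_TASKS = {"math", "architecture", "agentic", "analysis", "debugging"}
FAST_TASKS = {"factual_qa", "creative", "summarization", "autocomplete", "classification"}

def thinking_config(task_type: str, budget_tokens: int = 5000):
    """Return a `thinking` parameter for reasoning-heavy tasks, or None to skip it."""
    if task_type in THINKING_TASKS:
        return {"type": "enabled", "budget_tokens": budget_tokens}
    # For fast, latency-sensitive tasks, omit the `thinking` parameter (standard mode)
    return None
```

Pass the result into `client.messages.create(...)` only when it is not `None`; otherwise call the API in standard mode for lower latency and cost.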
## How to Enable Extended Thinking via the API
Enabling extended thinking in the Anthropic Python SDK takes a single extra `thinking` parameter:

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,  # how much Claude can spend on internal reasoning
    },
    messages=[
        {
            "role": "user",
            "content": "Design a rate-limiting strategy for a multi-tenant SaaS API. "
            "We have free (100 req/min), pro (1000 req/min), and enterprise (custom) tiers. "
            "Consider Redis, token bucket vs. sliding window, and burst handling.",
        }
    ],
)

# The response contains both thinking blocks and the final text block
for block in response.content:
    if block.type == "thinking":
        print("=== THINKING ===")
        print(block.thinking)
    elif block.type == "text":
        print("=== RESPONSE ===")
        print(block.text)
```

### Streaming with Extended Thinking
For production use — especially when `max_tokens` exceeds ~21,000 — streaming is required. Here's how to handle both thinking deltas and text deltas in the stream:
```python
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[
        {
            "role": "user",
            "content": "Given this Python stack trace, identify the root cause "
            "and suggest a fix:\n\n[paste your stack trace here]",
        }
    ],
) as stream:
    current_block_type = None
    for event in stream:
        if event.type == "content_block_start":
            current_block_type = event.content_block.type
            label = "Thinking..." if current_block_type == "thinking" else "Response:"
            print(f"\n{label}", flush=True)
        elif event.type == "content_block_delta":
            if event.delta.type == "thinking_delta":
                print(event.delta.thinking, end="", flush=True)
            elif event.delta.type == "text_delta":
                print(event.delta.text, end="", flush=True)
        elif event.type == "content_block_stop":
            print()  # newline after each block
```

### TypeScript / Node.js Example
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 16000,
  thinking: {
    type: "enabled",
    budget_tokens: 10000,
  },
  messages: [
    {
      role: "user",
      content:
        "Analyze the time complexity of this algorithm and suggest an O(n log n) alternative: [paste code]",
    },
  ],
});

for (const block of response.content) {
  if (block.type === "thinking") {
    console.log("Reasoning:", block.thinking);
  } else if (block.type === "text") {
    console.log("Answer:", block.text);
  }
}
```

### Choosing the Right `budget_tokens`
The `budget_tokens` parameter controls how many tokens Claude can spend on internal reasoning. This is distinct from `max_tokens`, which caps the response as a whole (thinking plus final text).
- `budget_tokens` must be less than `max_tokens`
- Minimum: 1,024 tokens
- Maximum: varies by model (Opus 4.6 supports up to 32,000 for thinking)
- Claude won't always use the full budget — it stops when it has a confident answer
| Task Complexity | Recommended `budget_tokens` |
|---|---|
| Moderate (debugging, code review) | 3,000 – 5,000 |
| Hard (architecture decisions, algorithm design) | 8,000 – 12,000 |
| Very hard (multi-domain analysis, theorem proving) | 16,000 – 32,000 |
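Taken together, the constraints and the table above can be folded into a small budget-picking helper. This is a sketch: the tier values and the 32,000-token cap come from this guide's numbers, so treat them as assumptions to revisit against current documentation:

```python
# Budget picker based on the constraints and table above.
# ASSUMPTION: tier values and the 32,000 cap are taken from this guide, not the SDK.

TIER_BUDGETS = {
    "moderate": 4000,    # debugging, code review (3,000-5,000 range)
    "hard": 10000,       # architecture, algorithm design (8,000-12,000 range)
    "very_hard": 24000,  # multi-domain analysis, theorem proving (16,000-32,000 range)
}

MIN_BUDGET = 1024   # documented minimum
MODEL_CAP = 32000   # Opus 4.6 thinking cap quoted above

def pick_budget(tier: str, max_tokens: int) -> int:
    """Pick a thinking budget for the tier, respecting the min/cap and max_tokens rules."""
    budget = max(MIN_BUDGET, min(TIER_BUDGETS[tier], MODEL_CAP))
    if budget >= max_tokens:
        # budget_tokens must be strictly less than max_tokens
        raise ValueError(f"max_tokens={max_tokens} leaves no room for a {budget}-token budget")
    return budget
```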
## Using Extended Thinking in Claude.ai and Claude Code
You don't need the API to access extended thinking — it's available directly in the Claude.ai interface and Claude Code.
**In Claude.ai:** You'll see a collapsible "Thinking" block before each response when the model decides to use it. Claude won't engage extended thinking on every message — it activates automatically when the task seems to warrant it.

**In Claude Code:** Use `/think` or `/ultrathink` prefixes in your message to request deep reasoning mode:

```
/think Refactor this service to use the repository pattern.
Consider backward compatibility and the existing test suite.
```

```
/ultrathink Design a schema migration strategy for renaming the `users`
table to `accounts` across 12 microservices with zero downtime.
```

`/ultrathink` allocates a larger thinking budget than `/think`, making it suitable for the most demanding architectural and debugging tasks.
## Real-World Use Cases
### 1. Algorithm Analysis
Extended thinking excels at algorithm problems that require holding multiple constraints in mind simultaneously:
```python
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8000,
    thinking={"type": "enabled", "budget_tokens": 6000},
    messages=[{
        "role": "user",
        "content": "Find the minimum number of coins to make change for $0.41 "
        "using denominations of 1¢, 20¢, and 25¢. Show why a greedy "
        "approach fails here."
    }]
)
```

### 2. Code Review with Reasoning
Instead of getting a surface-level review, extended thinking lets Claude reason about architectural concerns before surfacing them:
```python
with open("payment_service.py") as f:
    code = f.read()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=12000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{
        "role": "user",
        "content": "Review this payment service for security vulnerabilities, "
        f"race conditions, and PCI-DSS compliance issues:\n\n{code}"
    }]
)
```

### 3. Technical Specification Generation
When building something new, extended thinking helps Claude systematically consider edge cases and produce more complete specs than standard mode.
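A spec-generation request is just a prompt plus a generous thinking budget. Here is a minimal sketch that builds the request payload; the prompt wording, model string, and budget value are illustrative choices, not prescribed by the API:

```python
# Build a request payload for spec generation with a generous thinking budget.
# ASSUMPTION: the prompt wording and budget value are illustrative.

def spec_request(feature_description: str) -> dict:
    """Return kwargs for client.messages.create() that ask for a full technical spec."""
    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": 16000,
        "thinking": {"type": "enabled", "budget_tokens": 12000},
        "messages": [{
            "role": "user",
            "content": (
                "Write a technical specification for this feature, including "
                "data model, API surface, edge cases, and failure modes:\n\n"
                + feature_description
            ),
        }],
    }

# Usage: response = client.messages.create(**spec_request("webhook retry system"))
```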
## Cost and Latency Trade-offs
Extended thinking adds both latency and token cost. Thinking tokens are billed at the same rate as output tokens for the model you're using.
Rough estimates for Sonnet 4.6 at 10,000 `budget_tokens`:

- Additional latency: ~5–15 seconds before first text token
- Additional cost: ~$0.003–$0.015 per call depending on how much of the budget is used
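Because thinking tokens bill at the output rate, you can sanity-check these figures with simple arithmetic. The $15-per-million price below is an assumed Sonnet-class output rate; check Anthropic's current pricing page before relying on it:

```python
# Estimate the extra cost of extended thinking for one call.
# ASSUMPTION: output-token price; verify against Anthropic's current pricing.
OUTPUT_PRICE_PER_MTOK = 15.00  # USD per million output tokens (assumed)

def thinking_cost(thinking_tokens_used: int, price_per_mtok: float = OUTPUT_PRICE_PER_MTOK) -> float:
    """Thinking tokens bill at the output rate, so cost scales linearly with usage."""
    return thinking_tokens_used * price_per_mtok / 1_000_000

# e.g. if Claude used 1,000 of a 10,000-token budget:
# thinking_cost(1000) == 0.015  (the upper end of the range above)
```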
For most production applications, extended thinking is reserved for asynchronous workflows — analysis jobs, batch processing, agent planning steps — rather than real-time chat responses. If you need low latency, use standard mode and reserve extended thinking for the complex sub-tasks within your pipeline.
## Key Takeaways
- Extended thinking is the same Claude model with more compute time — not a separate model
- Enable it with `thinking={"type": "enabled", "budget_tokens": N}` in the API
- Use it for math, code architecture, multi-step analysis, and hard debugging — skip it for factual Q&A and creative tasks
- `/think` and `/ultrathink` work in Claude Code for quick access without the API
- Start with `budget_tokens: 5000` and adjust based on task complexity and actual usage
- Streaming is required when `max_tokens` exceeds ~21,000
## Next Steps
Understanding when and how to use extended thinking is part of mastering the full Claude API. If you're preparing for the Claude Certified Architect (CCA-F) exam, this is exactly the kind of architecture-level knowledge the exam tests — knowing which model features to reach for and why.
Practice your Claude API knowledge with our CCA-F Practice Test Bank — 200+ scenario-based questions covering extended thinking, model selection, prompt engineering, agentic patterns, and cost optimization. The same decisions you'd make on a real AI project, tested in exam format.

Not ready to pay yet? Start with our free 20-question CCA sample quiz — no signup required.