Claude Model Selection Guide: Haiku vs Sonnet vs Opus — When to Use Each
Learn exactly when to choose Claude Haiku, Sonnet, or Opus for your project. Covers cost, speed, performance trade-offs, and real code examples. 2026 edition.
Claude Model Selection: Haiku vs Sonnet vs Opus — The Decision Framework Every Developer Needs
You've decided to build with Claude. Now the first real decision hits: which model? Pick Haiku and you might sacrifice quality. Pick Opus and you might blow your budget before launch. Pick the wrong tier and your latency tanks or your output quality disappoints.
This guide gives you a concrete decision framework — not generic advice, but specific rules for routing different tasks to the right model tier. By the end, you'll know exactly which model to use for which workload, what the cost difference actually means in practice, and how to switch models in code without rewriting your integration.
The Claude Model Tiers in 2026
The table below lists five models, each with a distinct performance and cost position:
| Model | ID | Best For | Relative Speed |
|---|---|---|---|
| Haiku 4.5 | claude-haiku-4-5-20251001 | High-volume, fast tasks | Fastest |
| Sonnet 4.6 | claude-sonnet-4-6 | Most production workloads | Fast |
| Opus 4.7 | claude-opus-4-7 | Complex reasoning, accuracy-critical | Moderate |
| Opus 4.8 | claude-opus-4-8 | Frontier tasks, agentic workflows | Moderate+ |
| Fable 5 | claude-fable-5 | Creative and narrative tasks | Varies |
For most developers building applications, the practical choice is between three tiers: Haiku (cheap + fast), Sonnet (balanced), and Opus (powerful + expensive). That's the comparison this guide focuses on.
The 70/20/10 Rule for Model Routing
Before diving into specifics, here's the mental model that works for most production systems:
- 70% of your requests can go to Haiku — classification, extraction, summarization, simple Q&A
- 20% of your requests need Sonnet — multi-step reasoning, code generation, nuanced writing
- 10% of your requests need Opus — complex analysis, agentic tasks, accuracy-critical decisions
Most teams that ship to production use a routing layer, not a single model. Start with Sonnet as your default, then optimize by routing simpler tasks down to Haiku and harder tasks up to Opus.
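To see what a 70/20/10 split means for spend, here is a minimal sketch of the blended-cost arithmetic. The relative prices are hypothetical placeholders chosen only for illustration, not Anthropic's actual pricing, and `blended_cost` is an illustrative helper name; substitute current per-token prices and your own traffic mix.

```python
# Hypothetical relative cost per request, in arbitrary units.
# These are placeholders, NOT real Anthropic prices.
RELATIVE_COST = {"haiku": 1.0, "sonnet": 3.0, "opus": 15.0}


def blended_cost(mix: dict, cost: dict = RELATIVE_COST) -> float:
    """Average cost per request for a traffic mix whose shares sum to 1."""
    total_share = sum(mix.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError(f"traffic shares must sum to 1, got {total_share}")
    return sum(share * cost[tier] for tier, share in mix.items())


routed = blended_cost({"haiku": 0.70, "sonnet": 0.20, "opus": 0.10})
sonnet_only = blended_cost({"sonnet": 1.0})
savings = 1 - routed / sonnet_only

print(f"Routed mix:  {routed:.2f} units per request")
print(f"Sonnet only: {sonnet_only:.2f} units per request")
print(f"Savings:     {savings:.1%}")
```

Under these placeholder numbers the 10% Opus share accounts for more than half of the blended cost, which is why the size of the Opus slice usually matters more than the exact Haiku share.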
When to Use Claude Haiku
Claude Haiku 4.5 is Anthropic's fastest and most cost-efficient model. It's built for high-throughput, latency-sensitive workloads where you're processing thousands of requests.
Use Haiku for:
- Classifying or tagging: sentiment analysis, intent detection, category labeling
- Extracting structured data: pulling fields from documents, parsing emails
- Simple summarization: condensing short text (under 2,000 tokens)
- Autocomplete or suggestions: next-word predictions, short content completions
- Guardrails and content moderation: checking outputs for policy violations
- Real-time chat responses: the first pass in a two-stage pipeline

Avoid Haiku when:
- The task requires multi-step reasoning (Haiku doesn't chain logic well across many steps)
- Code generation is non-trivial (Haiku handles snippets but struggles with complex refactors)
- Output quality is user-facing and high-stakes
```python
import anthropic

client = anthropic.Anthropic()  # Reads ANTHROPIC_API_KEY from the environment


def classify_intent(user_message: str) -> str:
    """Return a one-word intent label for a support message."""
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",  # Fast and cheap for classification
        max_tokens=50,
        messages=[
            {
                "role": "user",
                "content": f"Classify the intent of this message in one word (question/complaint/purchase/feedback): {user_message}",
            }
        ],
    )
    return response.content[0].text.strip()


# Processing 10,000 support tickets? Haiku is the right call.
intent = classify_intent("Where is my order? It's been 5 days!")
print(intent)  # e.g. "complaint"
```
When to Use Claude Sonnet
Claude Sonnet 4.6 is the workhorse model — fast enough for real-time applications, smart enough for genuinely complex tasks. It's where most production applications should start.
Sonnet 4.6 includes a 1 million token context window, which makes it exceptional for large-document processing, long conversation history, and codebase-level analysis without chunking.
Use Sonnet for:
- Code generation: writing functions, debugging, explaining code, generating tests
- Nuanced writing: blog posts, emails, reports that need genuine craft
- Multi-document analysis: comparing PDFs, summarizing research, synthesizing sources
- RAG pipelines: generating answers from retrieved context chunks
- Conversational AI: chatbots that need to track context and reason across turns
- Structured output generation: producing complex JSON, filling templates
- Most agentic tasks: Claude Code itself runs on Sonnet by default
```python
import anthropic
import json

client = anthropic.Anthropic()


def generate_code_with_tests(description: str) -> dict:
    """Ask Sonnet for a function plus tests and parse the JSON reply."""
    response = client.messages.create(
        model="claude-sonnet-4-6",  # Balanced: capable for code, fast enough for dev loops
        max_tokens=2048,
        messages=[
            {
                "role": "user",
                "content": f"""Write a Python function that {description}.
Include:
1. Type hints
2. A docstring
3. Three pytest test cases
Return as JSON with keys: function_code, test_code""",
            }
        ],
    )
    # json.loads raises json.JSONDecodeError if the model wraps the JSON
    # in prose or code fences, so handle that case in production code.
    return json.loads(response.content[0].text)


result = generate_code_with_tests("validates an email address using regex")
print(result["function_code"])
```
```python
# Analyzing an entire codebase or long document
# (reuses `client` from the previous example)
with open("large_report.txt", "r") as f:
    document = f.read()  # Could be 500+ pages

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": f"Summarize the key financial risks in this report:\n\n{document}",
        }
    ],
)
```
When to Use Claude Opus
Claude Opus (4.7 and 4.8) is Anthropic's most capable model family. Opus 4.8 specifically introduces dynamic workflow capabilities optimized for long-running agentic tasks, where the model needs to plan, reason, and adapt over many steps.
The trade-off: Opus costs significantly more than Sonnet and runs slower. That's the right trade-off when the quality of the output directly impacts business outcomes.
Use Opus for:
- High-stakes analysis: investment memos, legal document review, medical information synthesis
- Complex multi-step reasoning: problems where Sonnet's output is clearly missing logical steps
- Long-horizon agentic tasks: automations that run for minutes or hours and need consistent judgment
- Research synthesis: pulling insight from contradictory sources and resolving tensions
- CCA exam prep generation: creating high-quality practice questions requires frontier-level understanding of nuance
- Architecture decisions: reviewing system designs for correctness, security, and trade-offs
```python
import anthropic

client = anthropic.Anthropic()


def deep_analysis(context: str, question: str) -> str:
    """Use Opus for reasoning-intensive tasks where accuracy matters."""
    response = client.messages.create(
        model="claude-opus-4-8",  # Worth the cost when the output is high-stakes
        max_tokens=8192,  # Must be larger than the thinking budget below
        thinking={
            "type": "enabled",
            "budget_tokens": 5000,  # Let Opus reason before answering
        },
        messages=[
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}\n\nProvide a detailed, rigorous analysis.",
            }
        ],
    )
    # With extended thinking enabled, the response contains thinking blocks
    # followed by text blocks. Return the first text block.
    for block in response.content:
        if block.type == "text":
            return block.text
    return ""


analysis = deep_analysis(
    context="[Full acquisition target financials...]",
    question="What are the three biggest risks in this acquisition and how material are they?",
)
```
Cost Comparison: The Real Numbers
Understanding relative costs helps you design smarter routing. Here's the framework for thinking about it:
- Haiku is roughly 10-20x cheaper than Opus per token. For classification tasks running at 1,000 requests/day, routing to Haiku instead of Sonnet can save 60-80% of your API bill.
- Sonnet sits in the middle: priced for production use with significant cost savings over Opus.
- Opus commands a premium but delivers measurably better outputs on complex tasks.

The question is always: does the quality improvement justify the cost for this specific task?
Practical routing example:
```python
def route_to_model(task_type: str, complexity: str) -> str:
    """Route tasks to the appropriate model tier."""
    if task_type in ["classify", "extract", "moderate", "tag"]:
        return "claude-haiku-4-5-20251001"
    if task_type in ["generate", "summarize", "chat", "code"] and complexity == "standard":
        return "claude-sonnet-4-6"
    if complexity == "high" or task_type in ["analyze", "reason", "architect"]:
        return "claude-opus-4-8"
    return "claude-sonnet-4-6"  # Default to Sonnet


# Usage
model = route_to_model(task_type="classify", complexity="standard")
# → "claude-haiku-4-5-20251001"
model = route_to_model(task_type="analyze", complexity="high")
# → "claude-opus-4-8"
```
Benchmarking Your Own Workload
Generic benchmarks from Anthropic don't tell you which model is right for your task. Run this yourself before committing to a model tier:
```python
import anthropic
import time

client = anthropic.Anthropic()


def benchmark_model(model: str, prompt: str, runs: int = 5) -> dict:
    """Time `runs` identical requests and keep the last response as a sample."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()  # Monotonic clock, better suited to timing than time.time()
        response = client.messages.create(
            model=model,
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}],
        )
        latencies.append(time.perf_counter() - start)
    return {
        "model": model,
        "avg_latency_s": sum(latencies) / len(latencies),
        "min_latency_s": min(latencies),
        "sample_output": response.content[0].text[:200],  # Output of the final run
    }


test_prompt = "Summarize the key benefits of prompt caching in Claude API in 3 bullet points."
for model in ["claude-haiku-4-5-20251001", "claude-sonnet-4-6", "claude-opus-4-8"]:
    result = benchmark_model(model, test_prompt)
    print(f"\n{result['model']}")
    print(f"  Avg latency: {result['avg_latency_s']:.2f}s")
    print(f"  Sample: {result['sample_output'][:100]}...")
```
Run this with a sample of your real production prompts. You'll quickly see where Haiku's output quality is "good enough" and where it falls short.
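Latency is only half of the comparison. One rough way to put a number on "good enough" for labeling tasks is to measure how often the cheaper tier agrees with a trusted reference, such as human labels or a stronger model's output. The sketch below assumes you have already collected both sets of labels; `agreement_rate` and the sample labels are hypothetical and hand-written, not real model output.

```python
def agreement_rate(candidate: list, reference: list) -> float:
    """Fraction of items where the candidate label matches the reference label.

    Comparison ignores case and surrounding whitespace.
    """
    if len(candidate) != len(reference):
        raise ValueError("candidate and reference must have the same length")
    if not reference:
        raise ValueError("need at least one labeled example")
    matches = sum(
        c.strip().lower() == r.strip().lower()
        for c, r in zip(candidate, reference)
    )
    return matches / len(reference)


# Hypothetical labels for four support tickets (illustrative only).
haiku_labels = ["complaint", "question", "purchase", "Feedback"]
reference_labels = ["complaint", "question", "complaint", "feedback"]

rate = agreement_rate(haiku_labels, reference_labels)
print(f"Agreement with reference: {rate:.0%}")
```

Exact-match agreement only makes sense for short categorical outputs. For free-form text you would need a different comparison, such as human review of a sample.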
The Two-Stage Pipeline Pattern
For many production systems, the right architecture isn't a single model — it's a pipeline:
```python
import anthropic

client = anthropic.Anthropic()


def two_stage_pipeline(document: str, question: str) -> str:
    """
    Stage 1: Haiku extracts relevant sections (fast, cheap)
    Stage 2: Sonnet answers using only the relevant content (accurate)
    """
    # Stage 1: Use Haiku to find relevant sections
    extract_response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Extract the 3-5 most relevant passages for this question: '{question}'\n\nDocument:\n{document}",
        }],
    )
    relevant_passages = extract_response.content[0].text

    # Stage 2: Use Sonnet to reason over the extracted content
    answer_response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": f"Based on these passages:\n{relevant_passages}\n\nAnswer: {question}",
        }],
    )
    return answer_response.content[0].text
```
This pattern is especially powerful for RAG systems: Haiku handles the high-volume retrieval and extraction passes; Sonnet handles the synthesis and generation.
Model Selection for CCA Certification Candidates
If you're preparing for the Claude Certified Architect (CCA) exam, model selection is a tested domain. The exam expects you to know:
- When to use extended thinking (Opus) vs standard mode (Sonnet/Haiku)
- How context window sizes affect architecture decisions
- Cost optimization patterns like model routing and prompt caching
- The trade-offs between latency, cost, and output quality
Understanding these trade-offs at a code level — not just conceptually — is what separates passing scores from high scores on the CCA.
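As one code-level illustration of the prompt caching pattern listed above, here is a minimal sketch that builds a request whose large, stable system prompt is marked as cacheable. It assumes the Messages API `cache_control` field with type `ephemeral`, as described in Anthropic's prompt caching documentation. The helper name `build_cached_request` and the placeholder texts are hypothetical. Caching only applies above a model-specific minimum prompt length, so check the current documentation before relying on it.

```python
from typing import Any, Dict


def build_cached_request(model: str, shared_context: str, question: str) -> Dict[str, Any]:
    """Build keyword arguments for client.messages.create with a cacheable system block."""
    return {
        "model": model,
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": shared_context,
                # Marks the end of the prompt prefix that may be cached and reused
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Only this part changes between requests
        "messages": [{"role": "user", "content": question}],
    }


request = build_cached_request(
    model="claude-sonnet-4-6",
    shared_context="[Long, stable reference document shared by many requests...]",
    question="Which clauses mention termination?",
)

# With an API key configured, the request could be sent like this:
#   import anthropic
#   response = anthropic.Anthropic().messages.create(**request)
#   print(response.usage.cache_read_input_tokens)
```

Keeping the stable content in the cached prefix and the per-request question in `messages` is the design choice that lets repeated calls reuse the same prefix.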
Key Takeaways
- Default to Sonnet for most production workloads — it's the best balance of capability and cost in 2026
- Route to Haiku for classification, extraction, moderation, and any task where "good enough" is truly good enough at 10-20x lower cost
- Escalate to Opus only when reasoning depth, analysis quality, or agentic reliability directly impacts your product outcomes
- Benchmark on your actual prompts — generic benchmarks don't predict your workload's performance
- Two-stage pipelines combine Haiku's speed for extraction with Sonnet's quality for synthesis
- Model IDs are pinned versions — use them explicitly in production so a model upgrade doesn't change your behavior unexpectedly
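One way to keep model IDs explicit while still making a tier swap a one-line change is to resolve them from a single lookup with optional environment overrides. This is a sketch, not an Anthropic SDK feature: `resolve_model` and the `CLAUDE_MODEL_<TIER>` variable names are hypothetical conventions, and the default IDs are the ones used throughout this guide.

```python
import os
from typing import Mapping, Optional

# Default model ID for each tier, as used elsewhere in this guide.
DEFAULT_MODEL_IDS = {
    "haiku": "claude-haiku-4-5-20251001",
    "sonnet": "claude-sonnet-4-6",
    "opus": "claude-opus-4-8",
}


def resolve_model(tier: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the model ID for a tier, honoring an optional environment override.

    The override variable is CLAUDE_MODEL_<TIER>, a hypothetical naming convention.
    """
    if tier not in DEFAULT_MODEL_IDS:
        raise ValueError(f"unknown tier: {tier!r}")
    source = os.environ if env is None else env
    return source.get(f"CLAUDE_MODEL_{tier.upper()}", DEFAULT_MODEL_IDS[tier])


# Passing an explicit mapping keeps the example independent of the real environment.
default_id = resolve_model("haiku", env={})
overridden_id = resolve_model("opus", env={"CLAUDE_MODEL_OPUS": "claude-opus-4-7"})
print(default_id)
print(overridden_id)
```

Call sites then pass `model=resolve_model("sonnet")` instead of a string literal, so changing a tier's model touches one place.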
Next Steps
The fastest way to internalize these trade-offs is to run them yourself. Set up a free Anthropic account, take your 5 most common production prompts, and benchmark all three tiers.
If you're working toward the Claude Certified Architect (CCA-F) certification, AI for Anything has a full practice test bank covering model selection, agentic patterns, and system design — the exact domains tested on the exam.
You can also start with our free CCA Study Guide to understand the full exam structure before diving into timed practice.