
Claude Sonnet 4.6's 1M Token Context Window: Complete Developer Guide (2026)

Claude Sonnet 4.6 now supports a 1M token context window at standard pricing — no premium, no beta header. Here's what changed, real use cases, and how to use it right.

Claude Sonnet 4.6's 1M Token Context Window Is Now GA — What Every Developer Needs to Know

You've heard the number before: 1 million tokens. But until recently, it came with asterisks — beta headers required, pricing premiums, and spotty reliability. That changed in March 2026. Anthropic made the 1M token context window generally available for both Claude Opus 4.6 and Claude Sonnet 4.6 at standard per-token pricing — no multipliers, no special flags.

If you build with Claude, this changes what's possible. This guide breaks down exactly what changed, what you can actually do with a million tokens, and — just as importantly — when you shouldn't use it.


What Changed: 1M Context Goes GA at Standard Pricing

Before this release, long-context Claude access required a beta header (anthropic-beta: output-300k-2026-03-24 or similar) and came with an implicit premium. Requests over 200K tokens were routed differently, and the pricing math was opaque.

Here's the new reality as of April 2026:

| Model | Context window | Input price | Output price |
|---|---|---|---|
| Claude Sonnet 4.6 | 1M tokens (GA) | $3 / MTok | $15 / MTok |
| Claude Opus 4.6 | 1M tokens (GA) | $15 / MTok | $75 / MTok |
Key changes:
  • Requests over 200K tokens now work automatically — no beta header required
  • Pricing is flat: a 900K-token request costs the same per-token rate as a 9K one
  • Media limits jumped from 100 to 600 images or PDF pages per request
  • Sonnet 4.6 also ships with improvements across coding, computer use, agentic planning, and long-context reasoning
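The flat-pricing claim is easy to sanity-check with back-of-the-envelope arithmetic. Here's an illustrative helper (not part of the SDK) using the Sonnet 4.6 rates from the table above:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float = 3.0, output_rate: float = 15.0) -> float:
    """Estimate request cost in USD at flat per-MTok rates (Sonnet 4.6 defaults)."""
    return (input_tokens / 1_000_000) * input_rate \
         + (output_tokens / 1_000_000) * output_rate

# A 900K-token input with a 4K-token response:
print(f"${estimate_cost(900_000, 4_000):.2f}")  # → $2.76
```

The same per-token math applies whether the request is 9K or 900K tokens, which is the whole point of the flat-pricing change.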

Anthropic notes that early users preferred Sonnet 4.6 over Sonnet 4.5 in roughly 70% of tests and even favored it over Claude Opus 4.5 in 59% of side-by-side comparisons — citing better instruction following, fewer hallucinations, and less overengineering.


What 1 Million Tokens Actually Looks Like

Abstract token counts are hard to reason about. Here's what fits inside 1M tokens:

  • ~750,000 words of English text (roughly 10+ average novels)
  • A 100,000-line codebase with full context
  • 600 scanned PDF pages (new media limit)
  • An entire fiscal year of meeting transcripts
  • Every commit message, PR description, and inline comment in a mid-sized repo
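A rough offline estimate helps when deciding whether a payload fits. A common heuristic is about 4 characters per English token; this is an approximation only, and for exact numbers you'd use the API's token-counting endpoint:

```python
def rough_token_count(text: str) -> int:
    """Very rough estimate: ~4 characters per token for English prose.
    Use the API's token-counting endpoint when you need exact figures."""
    return max(1, len(text) // 4)

novel_shelf = "word " * 750_000        # stand-in for ~750K words of text
print(rough_token_count(novel_shelf))  # on the order of 1M tokens
```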

For software developers, this means Claude can see your entire codebase at once — not just the file you're editing, but the API layer, the schema migrations, the test suite, and the frontend consuming it. That's a qualitatively different kind of assistance than current-file autocomplete.
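To give Claude the whole codebase, you first need it as a single text payload. A minimal sketch of one way to build that dump file (the extension list and path-header format are arbitrary choices, not a prescribed format):

```python
import os

def dump_codebase(root: str, out_path: str,
                  exts: tuple = (".py", ".ts", ".sql")) -> None:
    """Concatenate source files into one text file, with path headers
    so Claude can attribute findings to specific files."""
    with open(out_path, "w", encoding="utf-8") as out:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                if name.endswith(exts):
                    path = os.path.join(dirpath, name)
                    out.write(f"\n===== {os.path.relpath(path, root)} =====\n")
                    with open(path, encoding="utf-8", errors="replace") as f:
                        out.write(f.read())

# dump_codebase("./my_repo", "codebase_dump.txt")
```

In practice you'd also exclude vendored dependencies, lockfiles, and build output, since they burn tokens without adding signal.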


4 Real Use Cases Worth Your Attention

1. Full-Codebase Architectural Review

This is the headline use case for engineering teams. Instead of pasting isolated snippets and getting isolated advice, you can now send Claude your entire repository and ask cross-cutting questions:

```python
import anthropic

client = anthropic.Anthropic()

# Load your entire codebase as context
with open("codebase_dump.txt", "r") as f:
    codebase = f.read()

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": f"""Here is our entire codebase:

{codebase}

Please identify:
1. All locations where we make database calls without connection pooling
2. Any N+1 query patterns in our ORM usage
3. Inconsistent error handling patterns across services"""
        }
    ]
)

print(message.content[0].text)
```

Claude can now trace a data flow from the HTTP handler through the service layer to the database and back — without you having to manually stitch the context together.

2. Legal Document Analysis

Legal tech is a core enterprise use case. Law firms and legal teams are using the 1M window to cross-reference depositions, surface connections across case files, and analyze multi-hundred-page contracts in a single pass.

A typical prompt pattern: feed in all discovery documents, all relevant statutes, and prior case precedents — then ask Claude to identify contradictions, flag missing clauses, or summarize obligations by party. What used to require multiple chunked API calls (with all the coherence problems that brings) now fits in one shot.
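The single-shot pattern above amounts to assembling many documents into one prompt. A sketch of one way to do that, wrapping each document in XML-style tags so Claude can cite sources by name (the tag and function names here are illustrative):

```python
def build_review_prompt(documents: dict, question: str) -> str:
    """Assemble many named documents into one prompt. XML-style tags
    let the model refer back to specific documents in its answer."""
    parts = []
    for name, text in documents.items():
        parts.append(f"<document name={name!r}>\n{text}\n</document>")
    parts.append(question)
    return "\n\n".join(parts)

prompt = build_review_prompt(
    {"deposition_smith.txt": "...", "contract_v3.txt": "..."},
    "Identify any contradictions between these documents.",
)
```

Because everything lands in one request, Claude sees every document at once, which is what avoids the coherence problems of chunked calls.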

3. Long-Running Agent Memory

For agent workflows, the 1M window solves a fundamental problem: agents forgetting what they decided two hours ago. When Claude runs a complex multi-step task — automated code review, data pipeline analysis, extended research — a large context window ensures early decisions and constraints remain visible throughout the task.

This integrates naturally with Claude Managed Agents, which launched in public beta earlier this month. Agent tasks that previously required chunked memory systems or external vector stores can now hold more of their working state directly in context.
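The shift in agent design is simple to illustrate: instead of pruning or summarizing old turns to stay under a small window, you can append every turn and rely on the large window to keep early constraints visible. A hypothetical sketch (no real agent framework assumed):

```python
# Keep an agent's full decision history in context instead of pruning it.
history = [
    {"role": "user", "content": "Constraint: never modify files under /migrations."}
]

def record_turn(history: list, role: str, content: str) -> None:
    """Append each turn verbatim; with a 1M window there is far less
    pressure to summarize or drop early messages."""
    history.append({"role": role, "content": content})

record_turn(history, "assistant", "Acknowledged. Step 1: audit ORM usage.")
record_turn(history, "user", "Proceed with step 2.")

# Hours into the task, the original constraint is still the first message:
print(history[0]["content"])
```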

4. Multi-Document Research Synthesis

Researchers and analysts can now load hundreds of papers, annual reports, or regulatory filings in a single request. Claude can synthesize findings across sources, identify conflicting claims, and produce structured summaries — without you managing the chunking logic.

Anthropic highlights scientific discovery as a key use case: reasoning across research literature, mathematical frameworks, databases, and simulation code simultaneously. The expanded media limit (600 images/PDFs per request) makes multimodal research workflows practical for the first time.
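For multimodal research loads, PDFs are sent as base64 document content blocks in the Messages API. A sketch of a helper that builds one block per file (assuming the API's base64 document source format; file paths are placeholders):

```python
import base64

def pdf_block(path: str) -> dict:
    """Build one Messages API document content block from a local PDF."""
    with open(path, "rb") as f:
        data = base64.standard_b64encode(f.read()).decode("ascii")
    return {
        "type": "document",
        "source": {"type": "base64", "media_type": "application/pdf", "data": data},
    }

# Up to 600 PDFs/images per request under the new media limit:
# content = [pdf_block(p) for p in ["paper1.pdf", "paper2.pdf"]]  # hypothetical paths
# content.append({"type": "text", "text": "Synthesize the findings across these papers."})
```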


When NOT to Use 1M Tokens

This is the section that saves you money and frustration.

Don't dump everything in and hope. Loading irrelevant files dilutes the signal Claude uses to prioritize attention. A 1M-token request where only 50K tokens are relevant will produce worse results than a focused 50K-token request. Context quality matters more than context quantity.

Don't use it for real-time user-facing features. Processing 1M tokens takes time. For applications where a user is waiting for a response — a chat interface, a live coding assistant — the latency is noticeable. Long context works best in async or batch workflows: overnight analysis jobs, background indexing tasks, scheduled reviews.

Don't confuse window size with working memory. Claude attends to all tokens in context, but attention isn't uniform. Information in the middle of very long contexts can receive less weight than information at the beginning or end. For critical instructions or constraints, put them at the start and, if needed, repeat them near the end.

Consider prompt caching first. If your large context is relatively static (a system prompt plus a large document base that doesn't change often), Anthropic's prompt caching can reduce both latency and cost significantly. Cache the stable part; only update the dynamic part per request.
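The caching pattern concretely: mark the large, stable prefix with a cache_control annotation so repeated requests reuse it. A sketch of building the request arguments, assuming the Messages API's "ephemeral" cache_control type (no API call is made here):

```python
def cached_request_kwargs(static_docs: str, user_question: str) -> dict:
    """Build Messages API kwargs with the stable document base marked
    for caching; only the user question varies per request."""
    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": f"Reference documents:\n{static_docs}",
                "cache_control": {"type": "ephemeral"},  # cache this block
            }
        ],
        "messages": [{"role": "user", "content": user_question}],
    }

kwargs = cached_request_kwargs("...large document base...", "Summarize section 3.")
# client.messages.create(**kwargs)  # the cached prefix is reused across calls
```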

How This Fits the Claude Certified Architect Exam

If you're preparing for the Claude Certified Architect (CCA-F) certification, context window architecture is an exam topic you'll encounter directly. The CCA exam tests whether you can make appropriate architectural decisions — including:

  • When to use long context vs. RAG (Retrieval-Augmented Generation)
  • How context length affects latency, cost, and model performance
  • Designing agent systems that use context efficiently
  • Prompt caching strategies for large document workloads

Understanding the tradeoffs in this guide isn't just practical — it's exam-relevant. The CCA distinguishes between engineers who know how to call the API and architects who know how to design systems around it.


Key Takeaways

  • Claude Sonnet 4.6 and Opus 4.6 now support 1M-token context windows in GA with flat per-token pricing — no beta header, no premium multiplier
  • Sonnet 4.6 costs $3 input / $15 output per million tokens across the full window
  • The media limit jumped to 600 images or PDF pages per request
  • Best use cases: full-codebase analysis, legal document review, long-running agent workflows, multi-document research
  • Avoid it for real-time user-facing features, irrelevant content dumps, and cases where prompt caching would serve better
  • For CCA exam candidates, this release surfaces real architectural tradeoffs you need to understand cold


Start Practicing These Concepts

The best way to understand context window architecture is to work through exam-style questions that test your decision-making — not just your ability to recall facts.

Our Claude Certified Architect Practice Test Bank includes questions on context window design, RAG vs. long-context tradeoffs, agent memory patterns, and prompt caching strategy — exactly the architectural thinking the CCA-F exam measures.

Browse Free Sample Questions →
Sources: Anthropic — Introducing Claude Sonnet 4.6 · 1M Context Now GA · Claude API Context Windows Docs · Claude Release Notes
