Tutorials · 8 min read

Claude Task Budgets: Control Token Spend in Agentic Loops with Opus 4.7

Learn how Claude Opus 4.7's task budgets let your AI agents self-regulate token spend across long agentic loops—without hard cutoffs that break mid-task.


If you've built autonomous agents with Claude, you've hit the wall: the agent burns through tokens on a single research pass, runs over your cost ceiling, or—worse—gets hard-cut by max_tokens mid-task and returns a half-finished result. Claude Opus 4.7 ships a new answer: task budgets, a soft-token-limit system that lets Claude see its remaining allowance and self-regulate all the way to a graceful finish.

This guide covers everything developers need to know: what task budgets are, how they differ from max_tokens, the new xhigh effort tier they pair with, and practical implementation patterns for production agentic systems.

What Are Claude Task Budgets?

Task budgets are an advisory token ceiling that spans an entire agentic loop—including thinking tokens, tool calls, tool results, and final output. Unlike max_tokens, which is a per-request hard cap invisible to the model, a task budget is surfaced directly to Claude as a running countdown.

As Claude generates reasoning, issues tool calls, and processes results, it watches the counter decrease. It uses that signal to make runtime decisions: how deeply to search, whether to summarize instead of quote, when to stop gathering and start synthesizing. The model finishes its work in a way that matches whatever tokens remain, rather than cutting off abruptly.

Task budgets are currently in public beta on Claude Opus 4.7. You opt in by passing the beta header task-budgets-2026-03-13 in your API requests.

Task Budgets vs. max_tokens: The Key Difference

| Feature | max_tokens | Task Budget |
|---|---|---|
| Scope | Per-request | Full agentic loop |
| Visible to model? | No | Yes (countdown) |
| Behavior at limit | Hard cutoff | Graceful wind-down |
| Type | Hard cap | Advisory ceiling |
| Minimum value | Any positive int | 20,000 tokens |

max_tokens stays relevant—you still set it to give Claude room to think and act within each individual request. The task budget governs the broader work session. The two parameters are complementary, not competing.

The xhigh Effort Tier: Why It Matters Here

Alongside task budgets, Opus 4.7 introduced a new effort level: xhigh (extra-high), sitting between high and max. In Claude Code, the default effort for all plans was raised to xhigh as of the May 2026 update.

Why does this matter for task budgets? At higher effort levels, Claude thinks more carefully per step—which consumes more tokens per turn. Without a task budget, an agent running at xhigh effort on a long research or coding task can balloon costs unexpectedly. Task budgets close that loop: you unlock deeper reasoning per step while capping total loop spend.

Practical rule: if you set effort to xhigh or max, set max_tokens to at least 64,000 per request to give Claude adequate headroom, and use a task budget of 50,000–128,000 tokens to cap the full session.
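That rule can be encoded as a small pre-flight helper your orchestration layer runs before issuing a request. This is a sketch under assumptions: the `effort` tier names mirror the ones above, and the 32,000-token default for lower tiers is illustrative, not an Anthropic recommendation.

```python
def budget_params(effort: str, task_budget: int) -> dict:
    """Build request kwargs that respect the headroom rule for
    high-effort tiers (a sketch; the lower-tier default is ours)."""
    # xhigh/max effort needs at least 64,000 tokens of per-request headroom
    max_tokens = 64000 if effort in ("xhigh", "max") else 32000
    if effort in ("xhigh", "max") and not 50000 <= task_budget <= 128000:
        raise ValueError("use a 50,000-128,000 token budget at this effort tier")
    return {
        "max_tokens": max_tokens,
        "task_budget": {"tokens": task_budget},
    }

params = budget_params("xhigh", 80000)
```

The returned dict can be splatted straight into the request shown in the Basic Setup section below.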

How to Implement Task Budgets

Basic Setup

```python
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-opus-4-7-20260416",
    max_tokens=64000,
    betas=["task-budgets-2026-03-13"],
    task_budget={"tokens": 80000},
    system="You are a research agent. Complete your task within the token budget provided.",
    messages=[
        {
            "role": "user",
            "content": "Research the five most common SQL injection patterns and write a mitigation guide."
        }
    ]
)
```

The task_budget object takes a tokens integer. The minimum is 20,000. There is no published maximum, but Anthropic's documentation recommends staying within 200,000 for most tasks—above that, the model's self-regulation becomes less precise.
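Since the floor is 20,000 tokens and self-regulation degrades past 200,000, a thin guard can normalize whatever budget your orchestration layer computes. A minimal sketch; the constant names and warning policy are ours:

```python
import warnings

MIN_TASK_BUDGET = 20000    # documented minimum for task_budget
RECOMMENDED_MAX = 200000   # self-regulation becomes less precise above this

def normalize_budget(requested: int) -> int:
    """Clamp a requested budget to the valid floor and warn past the
    recommended ceiling (sketch; adjust the policy to your needs)."""
    if requested > RECOMMENDED_MAX:
        warnings.warn(f"budget {requested} exceeds the recommended 200,000")
    return max(requested, MIN_TASK_BUDGET)
```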

Carrying Budgets Across Compaction Cycles

For long-running agents that compress their context mid-task (compaction cycles), you can forward the remaining budget so the agent retains awareness across the full job:

```python
import anthropic

client = anthropic.Anthropic()

def run_agent_with_budget(
    messages: list,
    total_budget: int = 100000,
    tokens_used_so_far: int = 0
):
    # Forward only the unspent portion, but never drop below the
    # 20,000-token API minimum for task_budget
    remaining = max(total_budget - tokens_used_so_far, 20000)

    response = client.beta.messages.create(
        model="claude-opus-4-7-20260416",
        max_tokens=64000,
        betas=["task-budgets-2026-03-13"],
        task_budget={"tokens": remaining},
        messages=messages
    )

    tokens_used_so_far += response.usage.input_tokens + response.usage.output_tokens
    return response, tokens_used_so_far
```

Carrying the budget forward is what makes task budgets genuinely useful for real agent systems rather than single isolated responses. Without forwarding, a compacted agent restarts with a fresh budget and loses cost governance across the full workflow.
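One way to keep that accounting in a single place is a small tracker object the orchestrator consults before launching each cycle. A sketch: the class and method names are ours, not part of the SDK; `record` expects anything shaped like the SDK's `response.usage`.

```python
from dataclasses import dataclass

@dataclass
class BudgetTracker:
    """Tracks one task budget across compaction cycles (illustrative)."""
    total: int
    used: int = 0

    def record(self, usage) -> None:
        # usage mirrors the SDK's response.usage shape
        self.used += usage.input_tokens + usage.output_tokens

    @property
    def remaining(self) -> int:
        return max(self.total - self.used, 0)

    def can_continue(self, floor: int = 20000) -> bool:
        # Stop launching new cycles once less than the API minimum remains
        return self.remaining >= floor
```

The orchestrator calls `record` after each response, passes `tracker.remaining` as the next cycle's `task_budget`, and stops when `can_continue()` goes false.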

TypeScript / Node.js Example

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function runBudgetedAgent(task: string, tokenBudget = 80000) {
  const response = await client.beta.messages.create({
    model: "claude-opus-4-7-20260416",
    max_tokens: 64000,
    betas: ["task-budgets-2026-03-13"],
    task_budget: { tokens: tokenBudget },
    messages: [{ role: "user", content: task }],
  });

  console.log(`Tokens used: ${response.usage.input_tokens + response.usage.output_tokens}`);
  console.log(`Budget remaining: ${tokenBudget - (response.usage.input_tokens + response.usage.output_tokens)}`);

  return response;
}
```

Setting the Right Token Budget

There is no universal answer—task complexity drives budget. Anthropic's guidance breaks down roughly as follows:

Targeted refactors and focused analysis (50,000–75,000 tokens)

Single-file reviews, targeted bug fixes, quick API integration tasks. The agent reads, reasons, and responds without extensive tool chaining.

Multi-file code generation or research tasks (75,000–128,000 tokens)

Full feature implementations, competitive research reports, integration test suites. Expect multiple tool calls, context accumulation, and a longer synthesis phase.

Complex autonomous workflows (128,000–200,000 tokens)

End-to-end feature branches, security audit passes, document generation pipelines. Budget generously; Claude will self-regulate down, but you can't self-regulate up once the task starts.
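These tiers can be encoded as a small lookup your dispatcher consults per task. The tier names and the choice to default to the top of each range are our own shorthand for the guidance above, not an official mapping:

```python
# Our shorthand for the guidance ranges above: (low, high) in tokens
BUDGET_TIERS = {
    "focused": (50000, 75000),       # targeted refactors, focused analysis
    "multi_file": (75000, 128000),   # feature work, research reports
    "autonomous": (128000, 200000),  # end-to-end workflows, audit passes
}

def pick_budget(tier: str) -> int:
    """Default to the top of the range: the model self-regulates down,
    so over-provisioning is the cheaper failure mode (sketch)."""
    low, high = BUDGET_TIERS[tier]
    return high
```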

One practical approach: start with a 100,000-token budget for all tasks, log actual usage per task type for a week, then tighten budgets based on observed p95 usage. Over-provisioning is far cheaper than under-provisioning—hard cutoffs break tasks silently.
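The tightening step can be as simple as computing p95 from your usage logs. A sketch using only the standard library; the log format (a flat list of per-task token totals) and the 20% headroom policy are hypothetical:

```python
import statistics

def p95_budget(token_usage_log: list[int], headroom: float = 1.2) -> int:
    """Suggest a budget from observed usage: p95 plus 20% headroom,
    never below the 20,000-token API minimum (illustrative policy)."""
    p95 = statistics.quantiles(token_usage_log, n=100)[94]
    return max(int(p95 * headroom), 20000)
```

Run this per task type after the observation week and feed the result into your dispatcher's budget table.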

Real-World Use Cases

Code Review Agents

Code review agents are a natural fit. You give the agent access to a PR diff via a tool, set a budget of 60,000 tokens, and let Claude decide how deep to go: quick lint-level pass or full logic trace. The agent paces itself rather than doing an exhaustive analysis that costs ten times more than the review warranted.

Research and Summarization Pipelines

For content pipelines that pull from multiple sources—news feeds, academic papers, competitor sites—task budgets prevent a single expensive source from consuming the entire session budget. Claude reads, decides the source is high-value, and allocates more tokens to it; or flags it as low-value and moves on.
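If you want a harder guarantee than the model's own pacing, you can pre-split the session budget so no single source can claim more than its share. A minimal sketch; the weighting scheme and function name are ours:

```python
def split_budget(session_budget: int, source_weights: dict[str, float],
                 floor: int = 20000) -> dict[str, int]:
    """Allocate a session budget across sources in proportion to an
    assigned value weight, respecting the per-call minimum (sketch)."""
    total_weight = sum(source_weights.values())
    return {
        name: max(int(session_budget * w / total_weight), floor)
        for name, w in source_weights.items()
    }

allocations = split_budget(120000, {"papers": 3, "news": 1, "forums": 1})
```

Note that the per-call floor can push the summed allocations above the session budget when many low-weight sources are present, so cap the source count or raise the session budget accordingly.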

Financial Data Agents

Anthropic's ten finance agent templates (announced May 5, 2026) use task budgets internally to govern pitchbook generation and KYC screening. A pitchbook agent searching 30 company filings needs different governance than a KYC pass over a single document. Task budgets let you set one policy and let the model calibrate.

CCA Exam Preparation Agents

For the Claude Certified Architect exam, understanding task budgets is increasingly important. Exam candidates are tested on agentic architecture design, including how to govern cost and quality in long-running agent loops. Task budgets represent a first-class architectural pattern you should be able to describe and implement.

Common Mistakes to Avoid

  • Setting the budget too close to the minimum (20,000 tokens). Claude won't be able to do meaningful multi-step work. Start at 50,000 minimum for any non-trivial task.
  • Not setting max_tokens high enough. A 40,000-token max_tokens limit at xhigh effort will starve Claude's thinking. Always set max_tokens to at least 64,000 when using task budgets.
  • Treating the task budget as a hard cost guarantee. Token cost includes input tokens (which include tool results). A task budget of 80,000 can still cost more than expected if tool results are large. Budget for total tokens, not just output.
  • Forgetting to carry the budget across compaction. If your agent compacts its context mid-task, forward the remaining token count. Omitting this effectively resets cost governance at each compaction cycle.

Key Takeaways

  • Task budgets are an advisory token ceiling visible to Claude across a full agentic loop—not a hard per-request cap
  • The xhigh effort tier (new in Opus 4.7) pairs with task budgets: deeper reasoning per step, bounded total spend
  • Minimum budget is 20,000 tokens; 50,000–128,000 is the practical range for most production tasks
  • Carry budgets forward across context compaction cycles to maintain governance over long-running agents
  • Task budgets are currently in public beta; opt in with the task-budgets-2026-03-13 beta header
  • This is an active CCA exam topic—understanding agentic cost governance is a tested architecture skill

Next Steps

Task budgets are one piece of the agentic architecture puzzle that the Claude Certified Architect (CCA) exam tests. If you're preparing for the CCA-F certification, our practice test bank includes 150+ questions specifically on agentic patterns, token governance, tool-use design, and Opus 4.7 features—the exact competencies Anthropic validates in the exam.

Start with our free CCA study guide to map your knowledge gaps, then drill the agentic architecture module where task budgets, multi-agent orchestration, and effort-level tradeoffs are all tested.


Sources: Anthropic Task Budgets Docs · Introducing Claude Opus 4.7 · Claude Opus 4.7 — What's New · Engadget: Anthropic doubles Claude Code rate limits
