Claude Code Nested Sub-Agents: The Complete Guide to 5-Level Deep Recursion

Claude Code v2.1.172, shipped June 10, 2026, quietly added one of the most powerful — and dangerous — features in the agentic coding space: nested sub-agents up to 5 levels deep. Previously, nesting was forbidden. Now a sub-agent can spawn its own sub-agents, which can spawn theirs, all the way down a five-frame recursive stack.

If you're a developer building with Claude Code, this changes how you think about agent architecture. It also changes how fast you can hit your monthly token budget. This guide breaks down exactly how nested sub-agents work, what they cost, and the three pitfalls that will burn you if you ignore them.

What Are Nested Sub-Agents and Why Do They Exist?

Before v2.1.172, Claude Code supported parallel sub-agents: multiple independent agents running side by side, each with its own context. Nested sub-agents are different because they are recursive. The parent spawns a child, the child can spawn a grandchild, and so on, down to the fifth level of the stack (depth 4 when the root counts as depth 0).

The motivation is pure context management. A single agent with a 200K token context window fills up fast when doing real work — 15 web searches, 12 log greps, a wide codebase scan across hundreds of files. Before nesting, the only escape hatch was to hand off to a parallel sibling. That works, but it doesn't compose well for tasks that are inherently hierarchical: a root orchestrator that directs a planner that directs specialized workers.

Nested sub-agents solve this by letting each level offload its messy work before its own context window fills. When depth-1 is approaching capacity, it spawns a depth-2 child to handle the next dirty task. The parent reads only the child's compressed summary, and the token-heavy intermediate work is discarded.

Think of it as recursion with a 5-frame call stack — except each frame carries its own system prompt, model selection, and tool allowlist.
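The call-stack analogy can be sketched as a small simulation. This is an illustrative model only, not Claude Code's implementation: `run_agent`, the task dictionary shape, and the `trace` recorder are all hypothetical names introduced here. It shows the one property that matters: each frame keeps its raw work private and hands back only a summary.

```python
MAX_DEPTH = 4  # root is depth 0, so the stack holds five frames


def run_agent(task, depth=0, trace=None):
    """Run one hypothetical agent frame and return only its summary."""
    if depth > MAX_DEPTH:
        raise RuntimeError(f"depth {depth} exceeds the cap of {MAX_DEPTH}")
    # Each frame starts with a fresh, private context.
    context = [f"raw:{task['name']}"]
    for subtask in task.get("subtasks", []):
        # The child's raw work stays in the child; only its summary comes back.
        context.append(run_agent(subtask, depth + 1, trace))
    if trace is not None:
        trace[task["name"]] = context  # record what this frame actually saw
    return f"summary:{task['name']}"


job = {
    "name": "root",
    "subtasks": [{"name": "planner", "subtasks": [{"name": "worker"}]}],
}
trace = {}
result = run_agent(job, trace=trace)
```

After the run, `trace["root"]` holds the root's own raw entry plus the planner's summary, and nothing from the worker's raw output.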

How Nested Sub-Agents Work: The Architecture

Each sub-agent in the stack is an independent Claude Code session with:

Fresh 200K context window — no inherited pollution from the parent
Independent system prompt — configure each depth for its specific role
Independent model selection — you don't have to run Opus all the way down
Independent tool allowlist — lock each level to only the tools it needs
Summary handoff — the parent gets only the final structured output

The stack is capped at 5 levels (root = depth 0, deepest child = depth 4). An attempt to spawn a sixth level fails gracefully: any Agent tool call issued by the sub-agent at depth 4 is rejected.
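To make the per-level independence and the depth cap concrete, here is a minimal sketch. `AgentConfig`, `spawn_child`, and `DepthLimitError` are hypothetical names for illustration, as is the `file-triage` agent type. How Claude Code actually reports a rejected spawn is not specified in this guide, so the exception is an assumption.

```python
from dataclasses import dataclass

MAX_DEPTH = 4  # root = depth 0, deepest child = depth 4


class DepthLimitError(RuntimeError):
    """Raised when a frame at the cap tries to spawn another level."""


@dataclass(frozen=True)
class AgentConfig:
    depth: int
    model: str
    system_prompt: str
    tools: tuple


def spawn_child(parent, model, system_prompt, tools):
    """Create a child config; nothing is copied from the parent except depth + 1."""
    if parent.depth >= MAX_DEPTH:
        raise DepthLimitError(f"cannot spawn below depth {MAX_DEPTH}")
    return AgentConfig(parent.depth + 1, model, system_prompt, tuple(tools))


root = AgentConfig(0, "claude-opus-4-8", "Plan the refactor.", ("Agent(file-triage)",))
agent = root
for _ in range(MAX_DEPTH):
    agent = spawn_child(agent, "claude-haiku-4-5", "Do one step.", ["Read", "Grep"])
```

The loop succeeds four times and leaves `agent` at depth 4. One more `spawn_child(agent, ...)` call raises `DepthLimitError`, which the caller can catch and treat as "do the work at this level instead".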

Here's a concrete architecture pattern for a large codebase refactor:

Depth 0 (Root Orchestrator)   — claude-opus-4-8, plans the full refactor
  └── Depth 1 (File Triage)   — claude-sonnet-4-6, categorizes affected files
        └── Depth 2 (Module Analyzer) — claude-haiku-4-5, reads each module
              └── Depth 3 (Diff Generator) — claude-haiku-4-5, generates patches
                    └── Depth 4 (Validator)  — claude-haiku-4-5, lint + test check

The root never sees the raw file reads or diffs. It sees only a triage summary, then a list of validated patches. Its context stays clean for high-level decision-making.

The Token Math: What 5 Levels Actually Costs

This is where developers get surprised. Nested sub-agents are not free context. They multiply it.

A typical single-agent session on a complex task: ~50K tokens.

A 3-level nested chain doing the same task: approximately 350K tokens — a 7× multiplier. At Fable 5 pricing ($10 input / $50 output per million tokens), a task that costs $0.50 on a flat session can cost $3.50+ in a nested chain.
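The arithmetic behind those dollar figures is worth spelling out. The sketch below uses only the numbers quoted above and prices every token at the input rate, which is a simplifying assumption. Any tokens billed at the $50 output rate push the total higher, hence the "+" on $3.50.

```python
INPUT_PRICE_PER_MTOK = 10.00  # USD per million input tokens, from the text


def input_cost_usd(tokens, price_per_mtok=INPUT_PRICE_PER_MTOK):
    """Cost of `tokens` if all of them are billed at the input rate."""
    return tokens * price_per_mtok / 1_000_000


flat_cost = input_cost_usd(50_000)     # single-agent session
nested_cost = input_cost_usd(350_000)  # 3-level nested chain
multiplier = nested_cost / flat_cost

print(f"flat ${flat_cost:.2f}, nested ${nested_cost:.2f}, {multiplier:.0f}x")
```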

Why? Because:

Every level re-issues its own system prompt (2–8K tokens per level)

The parent serializes its current state to pass context to the child (5–20K tokens)

Autocompact fires at ~187K tokens, submitting the whole context for summarization — and can fire multiple times per level

The rule of thumb: budget roughly 7× the tokens of a flat session for a 3-level chain, and assume deeper chains will likely cost more. Run the math before you architect a deep chain.
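A rough way to see where the tokens go is to add up the fixed overheads from the list above. The sketch assumes one system prompt per level and one state handoff per parent-child link. `fixed_overhead_range` is a hypothetical helper, and the ranges are the ones quoted in this guide.

```python
from typing import Tuple

SYSTEM_PROMPT_RANGE = (2_000, 8_000)  # tokens per level, from the text
HANDOFF_RANGE = (5_000, 20_000)       # tokens per parent-to-child handoff


def fixed_overhead_range(levels: int) -> Tuple[int, int]:
    """Low and high token overhead from system prompts and handoffs only."""
    handoffs = max(levels - 1, 0)  # one handoff per parent-child link
    low = levels * SYSTEM_PROMPT_RANGE[0] + handoffs * HANDOFF_RANGE[0]
    high = levels * SYSTEM_PROMPT_RANGE[1] + handoffs * HANDOFF_RANGE[1]
    return low, high


low, high = fixed_overhead_range(3)
```

For a 3-level chain this gives 16K to 64K tokens, well short of the roughly 300K of extra usage in the example. Under these assumptions the remainder would have to come from compaction passes and repeated work, which is why compaction tuning matters so much at depth.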

Practical cost controls:

Use depth 1–2 for 90% of tasks — you get fresh context without runaway costs
Reserve depth 3–4 for genuinely hierarchical work (large monorepo analysis, multi-stage synthesis pipelines)
Set CLAUDE_CODE_SUBAGENT_MODEL=haiku at the shell level so every unspecified sub-agent defaults to Haiku, not Opus

3 Critical Pitfalls (and How to Avoid Them)

Pitfall 1: Opus-Everywhere Model Routing

By default, nested sub-agents inherit the calling model unless you explicitly override. If your root is running Opus, every child defaults to Opus too. A 5-level Opus chain on a 2-hour task can cost $40–$80 in a single session.

Fix: Set the default sub-agent model at the shell level:

export CLAUDE_CODE_SUBAGENT_MODEL=claude-haiku-4-5-20251001

Then promote only the levels that genuinely need reasoning horsepower:

// In your CLAUDE.md or agent spawning config
// Root: Opus (planning, architecture decisions)
// Depth 1: Sonnet (module-level analysis)
// Depth 2+: Haiku (file reads, search, diff generation)

The performance difference between Opus and Haiku on mechanical tasks — file reads, grep, formatting — is negligible. The cost difference is 20×.
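The "default cheap, promote selectively" pattern can be expressed as a small lookup. This is a sketch of the routing policy described above, not of how Claude Code resolves models internally. `model_for_depth` and the `PROMOTED` table are hypothetical, and the model names are the ones used in the architecture example earlier in this guide.

```python
import os

# Levels that are deliberately promoted above the default.
PROMOTED = {
    0: "claude-opus-4-8",    # root: planning, architecture decisions
    1: "claude-sonnet-4-6",  # depth 1: module-level analysis
}
FALLBACK_MODEL = "claude-haiku-4-5"  # used if the env var is not set


def model_for_depth(depth, env=None):
    """Pick the model for a given depth under the policy sketched above."""
    env = os.environ if env is None else env
    if depth in PROMOTED:
        return PROMOTED[depth]
    # Depth 2+: whatever the shell-level default says, else Haiku.
    return env.get("CLAUDE_CODE_SUBAGENT_MODEL", FALLBACK_MODEL)


shell_env = {"CLAUDE_CODE_SUBAGENT_MODEL": "claude-haiku-4-5-20251001"}
chain = [model_for_depth(d, shell_env) for d in range(5)]
```

Keeping the promotions in one table makes the expensive levels easy to audit: anything not listed falls through to the cheap default.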

Pitfall 2: Silent Allowlist Expansion

Without explicit configuration, the tools allowlist in Claude Code's agent config accepts everything. This means if you add a new sub-agent type (say, a migration-runner), every existing parent in your chain that uses Agent(*) can silently spawn it — including agents that should never touch migrations.

Fix: Anchor your allowlists explicitly at every level:

# In your agent configuration
tools:
  - Agent(repro-runner)
  - Agent(log-summariser)
  - Read
  - Grep
  - Bash

Never use Agent(*) in production nested-agent pipelines. Enumerate exactly which sub-agent types each level is permitted to spawn. This also protects against prompt injection through child agents — a depth-2 agent that gets compromised can't spawn arbitrary new agents if its allowed types are locked.
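A simple lint over your agent configs can catch wildcards before they ship. The sketch below assumes allowlist entries are plain strings in the `Agent(name)` form shown above. `spawnable_agents`, `may_spawn`, and `find_wildcards` are hypothetical helpers, not part of Claude Code.

```python
import re

AGENT_ENTRY = re.compile(r"^Agent\((?P<name>[^)]*)\)$")


def spawnable_agents(tools):
    """Return the sub-agent names (or '*') that a tools list permits."""
    names = set()
    for entry in tools:
        match = AGENT_ENTRY.match(entry)
        if match:
            names.add(match.group("name"))
    return names


def may_spawn(tools, agent_type):
    """True if this allowlist would let the level spawn `agent_type`."""
    allowed = spawnable_agents(tools)
    return "*" in allowed or agent_type in allowed


def find_wildcards(configs):
    """Names of levels whose allowlist contains Agent(*)."""
    return sorted(name for name, tools in configs.items()
                  if "*" in spawnable_agents(tools))


anchored = ["Agent(repro-runner)", "Agent(log-summariser)", "Read", "Grep", "Bash"]
configs = {"triage": anchored, "legacy-orchestrator": ["Agent(*)", "Read"]}
flagged = find_wildcards(configs)
```

With the anchored list, `may_spawn(anchored, "migration-runner")` is false, while the wildcard config would allow it. Running `find_wildcards` in CI is one way to keep `Agent(*)` out of production pipelines.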

Pitfall 3: Autocompact Overhead at Depth

Autocompact — Claude Code's automatic context compression — triggers at approximately 187K tokens. In a nested chain, this can fire multiple times per turn at each level. Each compaction summarizes the bloated context and costs 100–200K tokens of its own.

Fix: Tune compaction thresholds per level:

# Root: compacts later than the deeper levels (lets it accumulate more first)
CLAUDE_CODE_COMPACT_THRESHOLD=150000  # set in the root's environment
# Depth 2+: compacts early
CLAUDE_CODE_COMPACT_THRESHOLD=80000   # set in each depth-2+ agent's environment

At deeper levels, you actually want earlier compaction — the whole point of nesting is to keep each level's context lean. Set depth-2+ agents to compact aggressively (threshold 60–80K) so they summarize and hand off before they balloon.
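Because a single shell variable can hold only one value, per-level thresholds have to be applied per process. Here is a minimal sketch of that idea, using the variable name and values from this section. The helper names are hypothetical, and treating depth 1 like the root is an assumption, since the text only gives values for the root and for depth 2+.

```python
ROOT_THRESHOLD = 150_000  # root (and, by assumption, depth 1)
DEEP_THRESHOLD = 80_000   # depth 2 and below


def compact_threshold(depth):
    """Token count at which a frame at this depth should compact."""
    return DEEP_THRESHOLD if depth >= 2 else ROOT_THRESHOLD


def should_compact(context_tokens, depth):
    """True once the frame's context has reached its threshold."""
    return context_tokens >= compact_threshold(depth)


def env_for_depth(depth, base_env=None):
    """Environment for a child process at `depth`, with its own threshold."""
    env = dict(base_env or {})
    env["CLAUDE_CODE_COMPACT_THRESHOLD"] = str(compact_threshold(depth))
    return env


root_env = env_for_depth(0)
worker_env = env_for_depth(3)
```

A 90K-token context would trigger compaction in a depth-2 worker but not in the root, which matches the intent: deep levels summarize and hand off early, while the root keeps more history for planning.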

Real-World Use Cases That Justify Nested Agents

Nesting adds complexity. Don't use it unless the task is genuinely hierarchical. Here are use cases where nested sub-agents provide real leverage:

Large monorepo refactors: root orchestrates, depth-1 triages by domain, depth-2 workers handle individual packages in isolation.
Multi-stage research synthesis: root defines research questions, depth-1 spawns per-topic searchers, depth-2 cross-references findings across sources.
Automated QA pipelines: root selects test scenarios, depth-1 generates test plans per scenario, depth-2 executes isolated test runs and captures failures.
Documentation generation at scale: root maps the codebase structure, depth-1 generates docs per module, depth-2 writes individual function/class docs with type signatures.

For anything smaller — a single-file edit, a bug fix, a 10-file refactor — stick with flat or parallel agents. The overhead isn't worth it.

Claude Code v2.1.172: What Else Shipped

Along with nested sub-agents, v2.1.172 added:

Smarter model and region handling — Amazon Bedrock now reads the AWS region from ~/.aws config files when AWS_REGION isn't set, matching standard AWS SDK precedence. Eliminates a common misconfiguration for Bedrock-hosted Claude deployments.
New plugin search — Improved discovery for MCP server plugins from within Claude Code.
Better Chrome, VSCode, and terminal workflows — Performance improvements and bug fixes across the IDE integrations.

The Bedrock region fix is small but impactful for teams running Claude Code in multi-region AWS environments — no more environment variable workarounds.
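The lookup order described for Bedrock can be sketched in a few lines. This is a simplified illustration, not Claude Code's or the AWS SDK's actual resolver: it covers only the two sources named above (the `AWS_REGION` variable, then the config file), and `resolve_region` is a hypothetical helper that takes the file contents as a string so it can run without touching `~/.aws`.

```python
import configparser


def resolve_region(env, config_text, profile="default"):
    """Return AWS_REGION if set, else the profile's region from config text."""
    region = env.get("AWS_REGION")
    if region:
        return region
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # In ~/.aws/config, named profiles use a "profile " prefix.
    section = "default" if profile == "default" else f"profile {profile}"
    if parser.has_option(section, "region"):
        return parser.get(section, "region")
    return None


SAMPLE_CONFIG = """
[default]
region = us-west-2

[profile staging]
region = eu-central-1
"""

from_file = resolve_region({}, SAMPLE_CONFIG)
from_env = resolve_region({"AWS_REGION": "us-east-1"}, SAMPLE_CONFIG)
```

The environment variable wins when present, and the file is consulted only as a fallback, which is the behavior change the release note describes.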

Key Takeaways

Nested sub-agents are now live in Claude Code v2.1.172 (June 10, 2026), up to 5 levels deep
Each level gets a fresh 200K context window — the parent only sees the child's summary
Token multiplier is ~7× — price your workflows before committing to deep nesting
Default sub-agent model to Haiku at the shell level; promote only what needs Opus/Sonnet
Anchor your tools allowlist explicitly — never use Agent(*) in nested pipelines
Tune autocompact thresholds per level — depth-2+ should compact early and often
Use nested agents for hierarchical work — flat or parallel agents for everything else

Next Steps

If you're preparing for the Claude Certified Architect (CCA) exam, nested sub-agent architecture is exactly the kind of agentic systems design knowledge that appears in the exam. Understanding how context windows, model routing, and tool allowlists interact across agent depths is core to the Systems Design section.

Explore the CCA practice question bank at AI for Anything — 400+ questions covering agentic systems, the Claude API, MCP, and prompt engineering patterns. Free sample questions available without signing up.

Sources: Anthropic Claude Code release notes via Releasebot · Claude Code Nested Sub-Agents: 5 Levels Deep via ofox.ai · Claude Code Nested Subagents guide via claudefa.st · ADR-147 implementation reference via GitHub
