How to Build Multi-Agent AI Systems with Claude: Orchestrator + Subagent Pattern

You want Claude to do more than answer a single question. You want it to research a topic, write a report, validate the output, and format it for publishing — all in one run, without you babysitting it.

That's multi-agent orchestration, and it's a common architecture behind Claude-powered products that handle multi-step work in production.

This tutorial walks you through the core pattern: one orchestrator agent that breaks down a task and delegates, plus multiple subagents that execute focused subtasks in parallel. You'll get working TypeScript code, cost management strategies, and a real-world example you can adapt immediately.

Why Multi-Agent? The Problem With Single-Prompt AI

A single Claude call has hard limits:

Context window ceiling — Complex tasks need more information than fits in one prompt
No parallelism — Sequential reasoning is slow when subtasks are independent
No specialization — A general prompt can't be simultaneously a researcher, a critic, and a formatter
No checkpointing — If Claude goes off-track in a 10,000-token chain-of-thought, you waste the whole call

Multi-agent systems address all four. The orchestrator handles coordination logic; subagents handle execution. Each subagent gets a tight, focused system prompt that makes it good at one thing. Subagents that don't depend on each other run in parallel, so wall-clock time approaches that of the slowest subagent rather than the sum of all of them.

The pattern mirrors how software teams work: a tech lead breaks work into tickets, engineers execute in parallel, and the lead reviews and integrates. Claude's API lets you implement exactly this.

The Orchestrator-Subagent Pattern Explained

The pattern has three moving parts:

User Request
     │
     ▼
┌─────────────┐
│ Orchestrator │  ← Breaks task into subtasks, manages state
└──────┬──────┘
       │  spawns
  ┌────┴────┐
  ▼         ▼
[Agent A] [Agent B]   ← Subagents: one job each, run in parallel
  │         │
  └────┬────┘
       ▼
  [Integration]       ← Orchestrator merges results
       │
       ▼
  Final Output

Each subagent is just a Claude API call with:

A specialized system prompt defining its role

Focused input (not the full task context)

Structured output format the orchestrator can parse

The orchestrator is also a Claude call — but its job is planning and synthesis, not execution.
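
To make those three ingredients concrete, here is a minimal sketch. The `SubagentSpec` and `AgentRequest` types and the `buildAgentRequest` helper are hypothetical names invented for illustration, not part of the Anthropic SDK; the returned object only mirrors the `system` and `messages` fields that the `runAgent` helper in the next section passes to the API.

```typescript
// Hypothetical description of one subagent (illustrative names, not SDK types).
interface SubagentSpec {
  role: string; // short label, useful for logging
  systemPrompt: string; // specialized instructions for this agent
  outputFormat: "json" | "markdown" | "text"; // what the orchestrator expects back
}

interface AgentRequest {
  system: string;
  messages: { role: "user"; content: string }[];
}

// Combine a spec with its focused input into the body of a single agent call.
function buildAgentRequest(spec: SubagentSpec, focusedInput: string): AgentRequest {
  const formatHint = `Respond in ${spec.outputFormat} only.`;
  return {
    system: `${spec.systemPrompt}\n\n${formatHint}`,
    messages: [{ role: "user", content: focusedInput }],
  };
}

const critic: SubagentSpec = {
  role: "critic",
  systemPrompt: "You are a strict technical reviewer.",
  outputFormat: "json",
};

// The subagent sees only the paragraph it must review, not the whole task.
const request = buildAgentRequest(critic, "Draft paragraph to review: ...");
```

Keeping the spec as plain data makes it easy for an orchestrator to generate specs itself, which is what Step 1 does.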

Setting Up: Dependencies and API Client

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

// Reusable helper for a single agent call
async function runAgent(
  systemPrompt: string,
  userMessage: string,
  model: string = "claude-haiku-4-5-20251001" // use Haiku for subagents to save cost
): Promise<string> {
  const response = await client.messages.create({
    model,
    max_tokens: 2048,
    system: systemPrompt,
    messages: [{ role: "user", content: userMessage }],
  });

  const content = response.content[0];
  if (content.type !== "text") throw new Error("Unexpected response type");
  return content.text;
}

Cost tip: Use claude-haiku-4-5-20251001 for subagents handling focused, structured tasks. Reserve claude-sonnet-4-6 for the orchestrator (complex reasoning) and only reach for claude-opus-4-6 if synthesis quality is critical.

Step 1: The Orchestrator Agent

The orchestrator receives the user's goal and produces a structured task plan — a JSON array of subtask definitions that it will dispatch to subagents.

const ORCHESTRATOR_SYSTEM = `You are a task orchestration agent.
Given a high-level goal, break it into 2-5 independent subtasks that can be executed in parallel.

Return ONLY a JSON array with this structure:
[
  {
    "id": "task_1",
    "role": "one-sentence description of this subagent's job",
    "systemPrompt": "detailed system prompt for this subagent",
    "input": "the specific input this subagent should process"
  }
]

Rules:
- Each subtask must be completable independently (no dependencies on other subtasks)
- Subtask inputs must be self-contained — don't reference 'the other agents'
- system prompts should be specific and role-focused`;

async function orchestrate(goal: string): Promise<SubTask[]> {
  const response = await runAgent(
    ORCHESTRATOR_SYSTEM,
    `Goal: ${goal}`,
    "claude-sonnet-4-6" // orchestrator needs stronger reasoning
  );

  // Parse the JSON plan
  const jsonMatch = response.match(/\[[\s\S]*\]/);
  if (!jsonMatch) throw new Error("Orchestrator did not return valid JSON");
  return JSON.parse(jsonMatch[0]) as SubTask[];
}

interface SubTask {
  id: string;
  role: string;
  systemPrompt: string;
  input: string;
}
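
The regex-and-`JSON.parse` step above trusts whatever the model returns. As a hedged sketch of a stricter alternative, the helper below checks the shape of every subtask before anything is dispatched. `PlannedTask`, `isPlannedTask`, and `parsePlan` are hypothetical names, and the limit of five tasks is an assumption taken from the "2-5 subtasks" instruction in the orchestrator prompt.

```typescript
// Same fields as the SubTask interface in this tutorial, redeclared so the sketch is self-contained.
interface PlannedTask {
  id: string;
  role: string;
  systemPrompt: string;
  input: string;
}

// Type guard: every required field must be a non-empty string.
function isPlannedTask(value: unknown): value is PlannedTask {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return ["id", "role", "systemPrompt", "input"].every((key) => {
    const field = record[key];
    return typeof field === "string" && field.trim() !== "";
  });
}

// Extract the JSON array from the planner's reply and validate it.
function parsePlan(raw: string, maxTasks: number = 5): PlannedTask[] {
  const start = raw.indexOf("[");
  const end = raw.lastIndexOf("]");
  if (start === -1 || end <= start) {
    throw new Error("No JSON array found in planner output");
  }
  const parsed: unknown = JSON.parse(raw.slice(start, end + 1));
  if (!Array.isArray(parsed) || parsed.length === 0) {
    throw new Error("Plan must be a non-empty array");
  }
  if (parsed.length > maxTasks) {
    throw new Error(`Plan has ${parsed.length} subtasks; expected at most ${maxTasks}`);
  }
  const tasks = parsed.filter(isPlannedTask);
  if (tasks.length !== parsed.length) {
    throw new Error("Plan contains malformed subtasks");
  }
  if (new Set(tasks.map((t) => t.id)).size !== tasks.length) {
    throw new Error("Plan contains duplicate subtask ids");
  }
  return tasks;
}
```

Failing loudly at this point is cheaper than discovering a malformed subtask after several subagent calls have already been paid for.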

Step 2: Running Subagents in Parallel

This is where the speed win comes from. Use Promise.all() to fire all subagent calls simultaneously:

interface SubTaskResult {
  id: string;
  role: string;
  output: string;
  error?: string;
}

async function runSubagents(subtasks: SubTask[]): Promise<SubTaskResult[]> {
  const results = await Promise.all(
    subtasks.map(async (task): Promise<SubTaskResult> => {
      try {
        const output = await runAgent(task.systemPrompt, task.input);
        return { id: task.id, role: task.role, output };
      } catch (err) {
        // Don't let one failed subagent kill the whole pipeline
        const error = err instanceof Error ? err.message : "Unknown error";
        return { id: task.id, role: task.role, output: "", error };
      }
    })
  );
  return results;
}

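The try/catch per subagent is critical. Promise.all() rejects as soon as any one of its promises rejects, so without it a single failure (rate limit, timeout, unexpected response type) would discard every other result. Catching inside each task turns a failure into a result object with an error field, and the integrator can then work from partial results.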
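
Timeouts are listed among the failure modes, but `runAgent` as written never gives up on a slow call. One possible approach, sketched here under assumptions, is a small wrapper built on `Promise.race`. `withTimeout` is a hypothetical helper, not an SDK function, and it only stops waiting: it does not cancel the underlying HTTP request. The SDK client also accepts its own `timeout` and `maxRetries` options, which may be a better fit in practice.

```typescript
// Reject if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    );
  });
  // Whichever settles first wins; always clear the timer so it cannot keep the process alive.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

Inside `runSubagents`, a call could then be written as `withTimeout(runAgent(task.systemPrompt, task.input), 60000, task.id)`, where the 60-second budget is an arbitrary example value. A timeout rejection is caught by the existing try/catch like any other error.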

Step 3: The Integration Agent

Once subagents finish, a final Claude call synthesizes all outputs into a coherent result:

const INTEGRATOR_SYSTEM = `You are a synthesis agent. You receive outputs from multiple specialized agents
and integrate them into a single, coherent, well-structured response.

Do not simply concatenate. Find connections, resolve contradictions, and produce a unified whole.
Preserve factual specifics from each agent's output.`;

async function integrate(
  goal: string,
  results: SubTaskResult[]
): Promise<string> {
  const validResults = results.filter((r) => !r.error && r.output);

  if (validResults.length === 0) {
    throw new Error("All subagents failed — cannot integrate");
  }

  const agentOutputs = validResults
    .map((r) => `## ${r.role}\n${r.output}`)
    .join("\n\n");

  const prompt = `Original goal: ${goal}\n\nAgent outputs to integrate:\n\n${agentOutputs}`;

  return runAgent(INTEGRATOR_SYSTEM, prompt, "claude-sonnet-4-6");
}
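
The `integrate` function above drops failed subagents silently, so the integrator has no way to know that part of the picture is missing. A possible refinement, shown as a sketch, is to list the gaps in the prompt. `AgentResult` and `buildIntegrationPrompt` are hypothetical names, and the wording of the "Missing inputs" note is an assumption to tune for your own use case.

```typescript
// Same shape as SubTaskResult in this tutorial, redeclared so the sketch is self-contained.
interface AgentResult {
  id: string;
  role: string;
  output: string;
  error?: string;
}

// Build the integrator prompt from successful outputs and name the roles that failed.
function buildIntegrationPrompt(goal: string, results: AgentResult[]): string {
  const succeeded = results.filter((r) => !r.error && r.output.trim() !== "");
  const failed = results.filter((r) => !succeeded.includes(r));

  if (succeeded.length === 0) {
    throw new Error("All subagents failed; nothing to integrate");
  }

  const sections = succeeded
    .map((r) => `## ${r.role}\n${r.output}`)
    .join("\n\n");

  // Tell the integrator what is missing so it can flag gaps instead of inventing content.
  const gaps =
    failed.length === 0
      ? ""
      : "\n\nMissing inputs (these agents failed; note the gap rather than filling it in):\n" +
        failed.map((r) => `- ${r.role}`).join("\n");

  return `Original goal: ${goal}\n\nAgent outputs to integrate:\n\n${sections}${gaps}`;
}
```

The returned string can be passed to `runAgent(INTEGRATOR_SYSTEM, prompt, ...)` in place of the prompt built inside `integrate`.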

Step 4: Wiring It Together

async function runMultiAgentPipeline(goal: string): Promise<string> {
  console.log("🎯 Orchestrating task:", goal);

  // 1. Plan
  const subtasks = await orchestrate(goal);
  console.log(`📋 Planned ${subtasks.length} subtasks`);
  subtasks.forEach((t) => console.log(`  - ${t.id}: ${t.role}`));

  // 2. Execute in parallel
  console.log("⚡ Running subagents in parallel...");
  const startTime = Date.now();
  const results = await runSubagents(subtasks);
  const elapsed = ((Date.now() - startTime) / 1000).toFixed(1);

  const succeeded = results.filter((r) => !r.error).length;
  console.log(`✅ ${succeeded}/${subtasks.length} subagents completed in ${elapsed}s`);

  // 3. Integrate
  console.log("🔀 Integrating results...");
  const finalOutput = await integrate(goal, results);

  return finalOutput;
}

// Example usage (top-level await requires an ES module context)
const result = await runMultiAgentPipeline(
  "Write a competitive analysis of Claude Code vs GitHub Copilot vs Cursor for professional developers"
);
console.log(result);

Real-World Example: Content Research Pipeline

Here's the pattern applied to a concrete use case — the AiA content pipeline that generates research-backed blog articles:

// Specialized subagent system prompts for content creation

const RESEARCH_AGENT_SYSTEM = `You are a research analyst. Given a topic, identify:
1. The 5 most important facts/statistics (with approximate sources)
2. The 3 main pain points the target audience has with this topic
3. The key terminology and concepts someone needs to understand
Return as structured markdown.`;

const OUTLINE_AGENT_SYSTEM = `You are a content strategist. Given a topic and keywords,
create a detailed article outline with:
- H1 title optimized for the primary keyword
- H2 sections with clear learning objectives
- Key points to cover in each section
- Suggested code examples or tables
Return as structured markdown.`;

const COMPETITOR_AGENT_SYSTEM = `You are an SEO analyst. Given a topic,
describe what angle and depth a strong-performing article on this topic would have.
Consider: search intent, what developers actually want to learn, common misconceptions to address.
Return 3-5 specific recommendations.`;

async function researchPipeline(topic: string, keywords: string[]) {
  // These three can run in parallel because they don't depend on each other
  const [research, outline, seoAngle] = await Promise.all([
    runAgent(RESEARCH_AGENT_SYSTEM, topic),
    runAgent(OUTLINE_AGENT_SYSTEM, `Topic: ${topic}\nKeywords: ${keywords.join(", ")}`),
    runAgent(COMPETITOR_AGENT_SYSTEM, topic),
  ]);

  return { research, outline, seoAngle };
}

Run in parallel, the pipeline takes about as long as its slowest call instead of the sum of all three: roughly 15 seconds rather than 45 on typical content tasks.

Handling Errors and Retries

Production multi-agent systems need retry logic. Claude's API occasionally returns rate limit errors (429) or transient failures:

async function runAgentWithRetry(
  systemPrompt: string,
  userMessage: string,
  maxRetries: number = 3
): Promise<string> {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await runAgent(systemPrompt, userMessage);
    } catch (err) {
      // Retry only rate limits (429), server errors (5xx), and connection failures.
      // Other errors, such as a malformed request, will not succeed on retry.
      const isRetryable =
        err instanceof Anthropic.APIConnectionError ||
        (err instanceof Anthropic.APIError &&
          (err.status === 429 || (err.status ?? 0) >= 500));
      const isLastAttempt = attempt === maxRetries;

      if (!isRetryable || isLastAttempt) throw err;

      // Exponential backoff: 1s after the first failure, 2s after the second
      const waitMs = Math.pow(2, attempt - 1) * 1000;
      console.warn(`Attempt ${attempt} failed. Retrying in ${waitMs}ms...`);
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
  }
  throw new Error("Unreachable");
}
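
When several subagents hit a rate limit at the same moment, a fixed backoff schedule makes them all retry at the same moment too. A common mitigation is to add random jitter. The sketch below is one way to do that; `backoffDelayMs` is a hypothetical helper, and the 1-second base and 30-second cap are assumed defaults rather than recommended values.

```typescript
// Exponential backoff with "full jitter": a random delay between 0 and the
// exponential ceiling for this attempt (attempt numbering starts at 1).
function backoffDelayMs(
  attempt: number,
  baseMs: number = 1000,
  capMs: number = 30000,
  random: () => number = Math.random // injectable so the function can be tested deterministically
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt - 1));
  return Math.floor(random() * ceiling);
}
```

In `runAgentWithRetry`, the `waitMs` calculation could be replaced with `backoffDelayMs(attempt)` so that parallel subagents spread their retries out instead of colliding again.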

Cost Management for Multi-Agent Systems

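Multi-agent systems multiply your API calls. Here's a rough cost model to keep in mind (the per-call figures are estimates; recompute them against current published pricing before budgeting):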

Agent Role	Recommended Model	Typical Tokens	Cost/Call
Orchestrator (planner)	claude-sonnet-4-6	~1,500 in + 500 out	~$0.004
Subagent (execution)	claude-haiku-4-5	~800 in + 600 out	~$0.0003
Integrator (synthesis)	claude-sonnet-4-6	~3,000 in + 1,000 out	~$0.013
Total (4 subagents)	Mixed	—	~$0.019
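
Because per-token prices change, it can help to compute the pipeline cost from your own numbers instead of relying on a static table. The following is a minimal sketch: `Pricing`, `CallProfile`, and `estimateCostUsd` are hypothetical names, and the rates in the example are placeholders to replace with current published pricing, so the total will not necessarily match the table above.

```typescript
interface Pricing {
  inputPerMTok: number; // USD per million input tokens
  outputPerMTok: number; // USD per million output tokens
}

interface CallProfile {
  calls: number; // how many calls of this kind one pipeline run makes
  inputTokens: number; // typical input tokens per call
  outputTokens: number; // typical output tokens per call
  pricing: Pricing;
}

// Sum the cost of every call made in one pipeline run.
function estimateCostUsd(profiles: CallProfile[]): number {
  return profiles.reduce((total, p) => {
    const perCallMicroUsd =
      p.inputTokens * p.pricing.inputPerMTok +
      p.outputTokens * p.pricing.outputPerMTok;
    return total + (p.calls * perCallMicroUsd) / 1e6;
  }, 0);
}

// PLACEHOLDER rates, not quotes of real prices: substitute the current ones.
const largerModel: Pricing = { inputPerMTok: 3, outputPerMTok: 15 };
const smallerModel: Pricing = { inputPerMTok: 1, outputPerMTok: 5 };

const runCost = estimateCostUsd([
  { calls: 1, inputTokens: 1500, outputTokens: 500, pricing: largerModel }, // orchestrator
  { calls: 4, inputTokens: 800, outputTokens: 600, pricing: smallerModel }, // subagents
  { calls: 1, inputTokens: 3000, outputTokens: 1000, pricing: largerModel }, // integrator
]);
```

Swapping a different `Pricing` object into the subagent row is a quick way to see how much the model choice contributes to the total.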

Four strategies to keep costs down:

Haiku for focused tasks — Subagents that extract, transform, or format data rarely need Sonnet's reasoning. Haiku is priced well below Sonnet per token; check the current pricing page for the exact ratio.

Structured outputs reduce tokens — Ask subagents to return JSON or markdown tables instead of prose. Tighter formats mean fewer output tokens.

Gate expensive calls — Use a cheap classification agent to decide whether the full pipeline is needed, or whether a simpler single-call response suffices.

Cache orchestration plans — If the same class of task is requested repeatedly, cache the orchestrator's task plan and only run subagents fresh.
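
The plan-caching strategy can be sketched with a small in-memory cache. `PlanCache` and `cacheKeyFor` are hypothetical names, and the time-to-live expiry and key normalization shown here are assumptions; a production system might prefer a shared store and a more careful definition of what counts as the same class of task.

```typescript
// Minimal in-memory cache with time-based expiry.
class PlanCache<T> {
  private readonly entries = new Map<string, { value: T; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: force a fresh plan
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Normalize a task-class label so trivially different spellings share one plan.
function cacheKeyFor(taskClass: string): string {
  return taskClass.trim().toLowerCase().replace(/\s+/g, " ");
}
```

Before calling the orchestrator, look up `cacheKeyFor(taskClass)`; on a miss, plan as usual and store the result. Note that a cached plan embeds the inputs written for the original request, so it only suits cases where the subtask inputs can be reused or templated.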

Common Mistakes to Avoid

Mistake 1: Hidden dependencies between parallel subagents

If Subagent B needs Subagent A's output, they can't run in parallel. Design subtasks to be truly independent, or chain them sequentially in separate phases.
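
Chaining in phases can be sketched as follows. The `dependsOn` field is a hypothetical extension of the subtask format (the orchestrator prompt in this tutorial does not produce it), and `planPhases` is an illustrative helper: it groups tasks into phases where everything inside a phase can run in parallel, and it throws if the dependencies are circular or point at an unknown task.

```typescript
interface DependentTask {
  id: string;
  dependsOn: string[]; // ids of tasks whose output this task needs
}

// Group task ids into ordered phases; tasks within one phase are mutually independent.
function planPhases(tasks: DependentTask[]): string[][] {
  const remaining = new Map<string, DependentTask>();
  for (const task of tasks) remaining.set(task.id, task);

  const done = new Set<string>();
  const phases: string[][] = [];

  while (remaining.size > 0) {
    // A task is ready once every dependency finished in an earlier phase.
    const ready = Array.from(remaining.values())
      .filter((task) => task.dependsOn.every((dep) => done.has(dep)))
      .map((task) => task.id);

    if (ready.length === 0) {
      const stuck = Array.from(remaining.keys()).join(", ");
      throw new Error(`Circular or unknown dependency among: ${stuck}`);
    }

    for (const id of ready) {
      remaining.delete(id);
      done.add(id);
    }
    phases.push(ready);
  }
  return phases;
}
```

Each phase can then be passed to `Promise.all()` in turn, with the outputs of earlier phases added to the inputs of later ones.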

Mistake 2: Passing the full context to every subagent

Each subagent should only receive the information it needs. Bloated contexts waste tokens and confuse specialized agents.

Mistake 3: No output validation

Subagent outputs are LLM responses — they can be malformed JSON, off-format, or hallucinated. Always validate before passing to the integrator.

function validateSubagentOutput(output: string, expectedFormat: "json" | "markdown" | "text"): boolean {
  if (expectedFormat === "json") {
    try {
      JSON.parse(output);
      return true;
    } catch {
      return false;
    }
  }
  return output.length > 50; // basic sanity check for text/markdown
}

Mistake 4: Unbounded parallelism

Launching 20 subagents at once is likely to hit rate limits. Cap parallel calls to 5-10, or use a concurrency limiter like p-limit.

import pLimit from "p-limit";

const limit = pLimit(5); // max 5 concurrent calls

const results = await Promise.all(
  subtasks.map((task) =>
    limit(() => runAgent(task.systemPrompt, task.input))
  )
);
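
If you would rather not add a dependency, the same idea can be sketched in a few lines. `mapWithConcurrency` is a hypothetical helper, not part of any library: it starts a fixed number of "lanes" that each pull the next item until none are left. Unlike the per-subagent try/catch shown earlier, this sketch lets a worker error reject the whole call, so wrap the worker if you need partial results.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight; results keep input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  if (limit < 1) throw new Error("limit must be at least 1");

  const results: R[] = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item, shared by all lanes

  async function runLane(): Promise<void> {
    while (next < items.length) {
      const index = next++; // claim an item before awaiting, so no two lanes take the same one
      results[index] = await worker(items[index] as T, index);
    }
  }

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => runLane()));
  return results;
}
```

With the tutorial's helpers, a call might look like `mapWithConcurrency(subtasks, 5, (task) => runAgent(task.systemPrompt, task.input))`.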

When to Use Multi-Agent vs. Single-Call

Multi-agent isn't always the answer. Use this decision framework:

Situation	Approach
Simple Q&A, single-step generation	Single call
Task has 2+ independent subtasks	Multi-agent (parallel)
Task requires > 100K tokens of context	Multi-agent (split context)
Need expert specialization (researcher + writer + critic)	Multi-agent (specialized system prompts)
Real-time response required (< 2s)	Single call or pre-computed
Cost-sensitive, simple outputs	Single call with Haiku
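
The table can also be read as a small routing function. This is only a sketch: `TaskProfile` and `chooseApproach` are hypothetical names, the thresholds are copied from the table, and the order in which the rules are checked (latency first, then context size, then parallelism and specialization) is an assumption the table itself does not specify.

```typescript
interface TaskProfile {
  independentSubtasks: number; // how many subtasks could run without each other's output
  contextTokens: number; // rough size of the material the task needs
  needsSpecialists: boolean; // e.g. researcher + writer + critic
  latencyBudgetMs?: number; // omit when there is no real-time requirement
}

type Approach = "single-call" | "multi-agent";

function chooseApproach(task: TaskProfile): Approach {
  // A tight latency budget leaves no room for planning and integration round trips.
  if (task.latencyBudgetMs !== undefined && task.latencyBudgetMs < 2000) {
    return "single-call";
  }
  // Too much context for one prompt: split it across agents.
  if (task.contextTokens > 100000) return "multi-agent";
  // Parallelizable or specialist work benefits from separate agents.
  if (task.independentSubtasks >= 2 || task.needsSpecialists) return "multi-agent";
  return "single-call";
}
```

A check like this is also a natural place to implement the "gate expensive calls" strategy from the cost section.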

Key Takeaways

The orchestrator-subagent pattern is the foundation of production Claude agent systems — one planner, many executors, one integrator
Promise.all() is your parallelism primitive — independent subagents should always run concurrently
Use Haiku for execution, Sonnet for reasoning — this alone cuts multi-agent costs by 60-80%
Fail gracefully — wrap each subagent in try/catch so one failure doesn't collapse the pipeline
Validate subagent outputs before passing them downstream — LLMs can return unexpected formats
Multi-agent orchestration is a core topic on the Claude Certified Architect (CCA) exam — understanding these patterns deeply will serve you both in production and in certification prep

Next Steps

Ready to go deeper?

Practice what you learned:

Implement the pipeline above and test it on a research task in your domain
Experiment with different subagent model choices (Haiku vs Sonnet) and measure cost vs quality tradeoffs
Add streaming to the integrator step for real-time output

Prepare for the Claude Certified Architect (CCA) exam:

The CCA exam tests multi-agent architecture, orchestration patterns, tool use, and cost optimization — exactly what this tutorial covered. Our CCA Practice Test Bank includes 150+ questions with detailed explanations, including a full section on agentic system design.

Explore related tutorials:

How to Build Your First Claude Agent — single-agent foundations
Claude Tool Use: Function Calling Complete Guide — adding tools to your agents
Best MCP Servers for Claude Code 2026 — extend your agents with pre-built tools
