
Claude Multi-Agent Orchestration: Build Parallel AI Pipelines (2026 Tutorial)

Learn to build multi-agent AI systems with Claude using the orchestrator-subagent pattern. Step-by-step tutorial with code, cost tips, and real-world examples.

How to Build Multi-Agent AI Systems with Claude: Orchestrator + Subagent Pattern

You want Claude to do more than answer a single question. You want it to research a topic, write a report, validate the output, and format it for publishing — all in one run, without you babysitting it.

That's multi-agent orchestration, and it's the architecture behind many production Claude-powered products.

This tutorial walks you through the core pattern: one orchestrator agent that breaks down a task and delegates, plus multiple subagents that execute focused subtasks in parallel. You'll get working TypeScript code, cost management strategies, and a real-world example you can adapt immediately.

Why Multi-Agent? The Problem With Single-Prompt AI

A single Claude call has hard limits:

  • Context window ceiling — Complex tasks need more information than fits in one prompt
  • No parallelism — Sequential reasoning is slow when subtasks are independent
  • No specialization — A general prompt can't be simultaneously a researcher, a critic, and a formatter
  • No checkpointing — If Claude goes off-track in a 10,000-token chain-of-thought, you waste the whole call

Multi-agent systems address all four. The orchestrator handles coordination logic; subagents handle execution. Each subagent gets a tight, focused system prompt that makes it good at one thing, and a subagent that fails or drifts can be retried on its own. Subagents that don't depend on each other run in parallel, so wall-clock time approaches that of the slowest subagent rather than the sum of all of them.

The pattern mirrors how software teams work: a tech lead breaks work into tickets, engineers execute in parallel, and the lead reviews and integrates. Claude's API lets you implement exactly this.
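The wall-clock argument is easy to check without an API key. In the sketch below, `fakeAgent` is a hypothetical stand-in for a Claude call that simply waits a fixed number of milliseconds: awaiting three of them one after another takes roughly the sum of their delays, while `Promise.all` takes roughly the longest single delay.

```typescript
// Hypothetical stand-in for a Claude call: resolves after `ms` milliseconds.
function fakeAgent(name: string, ms: number): Promise<string> {
  return new Promise((resolve) => setTimeout(() => resolve(`${name} done`), ms));
}

// Illustrative delays for three independent subtasks.
const delays: Array<[string, number]> = [
  ["research", 120],
  ["outline", 80],
  ["seo", 100],
];

// Sequential: each call waits for the previous one, so time is about the sum of delays.
async function runSequential(): Promise<{ outputs: string[]; elapsedMs: number }> {
  const start = Date.now();
  const outputs: string[] = [];
  for (const [name, ms] of delays) {
    outputs.push(await fakeAgent(name, ms));
  }
  return { outputs, elapsedMs: Date.now() - start };
}

// Parallel: all calls start together, so time is about the slowest single delay.
async function runParallel(): Promise<{ outputs: string[]; elapsedMs: number }> {
  const start = Date.now();
  const outputs = await Promise.all(delays.map(([name, ms]) => fakeAgent(name, ms)));
  return { outputs, elapsedMs: Date.now() - start };
}
```

Swap `fakeAgent` for a real API call and the shape stays the same; the speedup only holds for subtasks that are genuinely independent.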

The Orchestrator-Subagent Pattern Explained

The pattern has three moving parts:

User Request
     │
     ▼
┌──────────────┐
│ Orchestrator │  ← Breaks task into subtasks, manages state
└──────┬───────┘
       │  spawns
  ┌────┴────┐
  ▼         ▼
[Agent A] [Agent B]   ← Subagents: one job each, run in parallel
  │         │
  └────┬────┘
       ▼
  [Integration]       ← Orchestrator merges results
       │
       ▼
  Final Output

Each subagent is just a Claude API call with:

  • A specialized system prompt defining its role
  • Focused input (not the full task context)
  • Structured output format the orchestrator can parse

The orchestrator is also a Claude call, but its job is planning and synthesis, not execution.

    Setting Up: Dependencies and API Client

    ```typescript
    import Anthropic from "@anthropic-ai/sdk";
    
    const client = new Anthropic({
      apiKey: process.env.ANTHROPIC_API_KEY,
    });
    
    // Reusable helper for a single agent call
    async function runAgent(
      systemPrompt: string,
      userMessage: string,
      model: string = "claude-haiku-4-5-20251001" // use Haiku for subagents to save cost
    ): Promise<string> {
      const response = await client.messages.create({
        model,
        max_tokens: 2048,
        system: systemPrompt,
        messages: [{ role: "user", content: userMessage }],
      });
    
      const content = response.content[0];
      if (content.type !== "text") throw new Error("Unexpected response type");
      return content.text;
    }
    ```
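`runAgent` assumes the answer is in `response.content[0]`. That holds for a plain text reply, but when features such as extended thinking or tool use are enabled, the first block may not be text. A more forgiving variant joins every text block. The sketch below declares a hypothetical, simplified `ContentBlock` type so it runs without the SDK; with the real SDK you would adapt it to the SDK's own block types.

```typescript
// Simplified local stand-in for the SDK's content-block union (an assumption:
// only the fields used here). Real responses may also carry tool_use,
// thinking, and other block types.
type TextBlock = { type: "text"; text: string };
type ContentBlock = TextBlock | { type: string; [key: string]: unknown };

// Type guard: true only for blocks that actually carry text.
function isTextBlock(block: ContentBlock): block is TextBlock {
  return block.type === "text" && typeof block.text === "string";
}

// Concatenate the text of every text block, skipping all other block types.
function extractText(content: ContentBlock[]): string {
  const texts = content.filter(isTextBlock).map((block) => block.text);
  if (texts.length === 0) throw new Error("Response contained no text blocks");
  return texts.join("");
}
```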

    Cost tip: Use claude-haiku-4-5-20251001 for subagents handling focused, structured tasks. Reserve claude-sonnet-4-6 for the orchestrator (complex reasoning) and only reach for claude-opus-4-6 if synthesis quality is critical.

    Step 1: The Orchestrator Agent

    The orchestrator receives the user's goal and produces a structured task plan — a JSON array of subtask definitions that it will dispatch to subagents.

    ```typescript
    const ORCHESTRATOR_SYSTEM = `You are a task orchestration agent.
    Given a high-level goal, break it into 2-5 independent subtasks that can be executed in parallel.
    
    Return ONLY a JSON array with this structure:
    [
      {
        "id": "task_1",
        "role": "one-sentence description of this subagent's job",
        "systemPrompt": "detailed system prompt for this subagent",
        "input": "the specific input this subagent should process"
      }
    ]
    
    Rules:
    - Each subtask must be completable independently (no dependencies on other subtasks)
    - Subtask inputs must be self-contained (don't reference 'the other agents')
    - System prompts should be specific and role-focused`;
    
    async function orchestrate(goal: string): Promise<SubTask[]> {
      const response = await runAgent(
        ORCHESTRATOR_SYSTEM,
        `Goal: ${goal}`,
        "claude-sonnet-4-6" // orchestrator needs stronger reasoning
      );
    
      // Grab everything from the first "[" to the last "]", ignoring surrounding prose
      const jsonMatch = response.match(/\[[\s\S]*\]/);
      if (!jsonMatch) throw new Error("Orchestrator did not return a JSON array");
    
      // JSON.parse throws on malformed JSON; also reject non-arrays and empty plans
      const plan: unknown = JSON.parse(jsonMatch[0]);
      if (!Array.isArray(plan) || plan.length === 0) {
        throw new Error("Orchestrator returned an empty or invalid plan");
      }
      return plan as SubTask[];
    }
    
    interface SubTask {
      id: string;
      role: string;
      systemPrompt: string;
      input: string;
    }
    ```
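The regex-plus-`JSON.parse` approach trusts the shape of whatever the model returns. A stricter parser, sketched below as a hypothetical `parsePlan` helper (not part of any SDK), strips an optional Markdown code fence, enforces the 2-5 task ceiling from the prompt, and checks every field before the plan reaches the subagents. `SubTask` repeats the interface above so the block is self-contained.

```typescript
interface SubTask {
  id: string;
  role: string;
  systemPrompt: string;
  input: string;
}

// Type guard: true only if all four SubTask fields are non-empty strings.
function isSubTask(value: unknown): value is SubTask {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return ["id", "role", "systemPrompt", "input"].every(
    (key) => typeof v[key] === "string" && (v[key] as string).length > 0
  );
}

// Parse the orchestrator's raw reply into a validated plan.
function parsePlan(raw: string, maxTasks = 5): SubTask[] {
  // Remove Markdown code fences (three backticks, optional "json" tag) if present.
  const unfenced = raw.replace(/`{3}(?:json)?/g, "");
  const start = unfenced.indexOf("[");
  const end = unfenced.lastIndexOf("]");
  if (start === -1 || end <= start) throw new Error("No JSON array found in plan");

  const parsed: unknown = JSON.parse(unfenced.slice(start, end + 1));
  if (!Array.isArray(parsed) || parsed.length === 0) {
    throw new Error("Plan must be a non-empty array");
  }
  if (parsed.length > maxTasks) throw new Error(`Plan has more than ${maxTasks} subtasks`);
  if (!parsed.every(isSubTask)) throw new Error("Plan contains a malformed subtask");

  // Duplicate ids would make results ambiguous downstream.
  const ids = new Set(parsed.map((t) => t.id));
  if (ids.size !== parsed.length) throw new Error("Duplicate subtask ids");
  return parsed;
}
```

Inside `orchestrate`, `return parsePlan(response);` could replace the regex and cast.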

    Step 2: Running Subagents in Parallel

    This is where the speed win comes from. Use Promise.all() to fire all subagent calls simultaneously:

    ```typescript
    interface SubTaskResult {
      id: string;
      role: string;
      output: string;
      error?: string;
    }
    
    async function runSubagents(subtasks: SubTask[]): Promise<SubTaskResult[]> {
      const results = await Promise.all(
        subtasks.map(async (task): Promise<SubTaskResult> => {
          try {
            const output = await runAgent(task.systemPrompt, task.input);
            return { id: task.id, role: task.role, output };
          } catch (err) {
            // Don't let one failed subagent kill the whole pipeline
            const error = err instanceof Error ? err.message : "Unknown error";
            return { id: task.id, role: task.role, output: "", error };
          }
        })
      );
      return results;
    }
    ```

    The try/catch per subagent is critical. If one subagent fails (rate limit, bad output, timeout), you want the other results to survive. The integrator can handle partial results gracefully.
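An alternative to wrapping each call in try/catch is `Promise.allSettled`, which never rejects and reports each promise as fulfilled or rejected. The sketch below reuses the `SubTask`/`SubTaskResult` shapes from above and takes the agent call as a parameter (`callAgent`, a hypothetical injection point) so it can be exercised without the API.

```typescript
interface SubTask { id: string; role: string; systemPrompt: string; input: string; }
interface SubTaskResult { id: string; role: string; output: string; error?: string; }

type AgentCall = (systemPrompt: string, input: string) => Promise<string>;

// Run every subtask concurrently; failures become error entries instead of rejections.
async function runSubagentsSettled(
  subtasks: SubTask[],
  callAgent: AgentCall
): Promise<SubTaskResult[]> {
  const settled = await Promise.allSettled(
    subtasks.map((task) => callAgent(task.systemPrompt, task.input))
  );
  // allSettled preserves input order, so settled[i] belongs to subtasks[i].
  return settled.map((outcome, i) => {
    const { id, role } = subtasks[i];
    if (outcome.status === "fulfilled") return { id, role, output: outcome.value };
    const reason: unknown = outcome.reason;
    const error = reason instanceof Error ? reason.message : String(reason);
    return { id, role, output: "", error };
  });
}
```

Behavior matches the try/catch version; which one reads better is mostly a matter of taste.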

    Step 3: The Integration Agent

    Once subagents finish, a final Claude call synthesizes all outputs into a coherent result:

    ```typescript
    const INTEGRATOR_SYSTEM = `You are a synthesis agent. You receive outputs from multiple specialized agents
    and integrate them into a single, coherent, well-structured response.
    
    Do not simply concatenate. Find connections, resolve contradictions, and produce a unified whole.
    Preserve factual specifics from each agent's output.`;
    
    async function integrate(
      goal: string,
      results: SubTaskResult[]
    ): Promise<string> {
      const validResults = results.filter((r) => !r.error && r.output);
    
      if (validResults.length === 0) {
        throw new Error("All subagents failed — cannot integrate");
      }
    
      const agentOutputs = validResults
        .map((r) => `## ${r.role}\n${r.output}`)
        .join("\n\n");
    
      const prompt = `Original goal: ${goal}\n\nAgent outputs to integrate:\n\n${agentOutputs}`;
    
      return runAgent(INTEGRATOR_SYSTEM, prompt, "claude-sonnet-4-6");
    }
    ```
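`integrate` silently drops failed subagents, so the synthesis agent cannot tell the reader that a section is missing. One option, sketched here as a hypothetical `buildIntegrationPrompt` helper, is to list the failed roles in the prompt so the integrator can flag the gap instead of papering over it.

```typescript
interface SubTaskResult { id: string; role: string; output: string; error?: string; }

// Build the integrator's user message, listing any subagents that failed.
function buildIntegrationPrompt(goal: string, results: SubTaskResult[]): string {
  const ok = results.filter((r) => !r.error && r.output);
  const failed = results.filter((r) => r.error || !r.output);

  const sections = ok.map((r) => `## ${r.role}\n${r.output}`).join("\n\n");
  let prompt = `Original goal: ${goal}\n\nAgent outputs to integrate:\n\n${sections}`;

  if (failed.length > 0) {
    const missing = failed.map((r) => `- ${r.role}`).join("\n");
    prompt +=
      `\n\nThe following agents failed and produced no output:\n${missing}\n` +
      "Note any resulting gaps explicitly rather than inventing that content.";
  }
  return prompt;
}
```

`integrate` would then pass `buildIntegrationPrompt(goal, results)` as the user message instead of building `prompt` inline.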

    Step 4: Wiring It Together

    ```typescript
    async function runMultiAgentPipeline(goal: string): Promise<string> {
      console.log("🎯 Orchestrating task:", goal);
    
      // 1. Plan
      const subtasks = await orchestrate(goal);
      console.log(`📋 Planned ${subtasks.length} subtasks`);
      subtasks.forEach((t) => console.log(`  - ${t.id}: ${t.role}`));
    
      // 2. Execute in parallel
      console.log("⚡ Running subagents in parallel...");
      const startTime = Date.now();
      const results = await runSubagents(subtasks);
      const elapsed = ((Date.now() - startTime) / 1000).toFixed(1);
    
      const succeeded = results.filter((r) => !r.error).length;
      console.log(`✅ ${succeeded}/${subtasks.length} subagents completed in ${elapsed}s`);
    
      // 3. Integrate
      console.log("🔀 Integrating results...");
      const finalOutput = await integrate(goal, results);
    
      return finalOutput;
    }
    
    // Example usage (top-level await requires an ES module, e.g. "type": "module" in package.json)
    const result = await runMultiAgentPipeline(
      "Write a competitive analysis of Claude Code vs GitHub Copilot vs Cursor for professional developers"
    );
    console.log(result);
    ```
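Because every stage is a function from text to text, the whole pipeline can be exercised without network calls by injecting a stub. The sketch below is a hypothetical restructuring, not the tutorial's code: `callAgent` replaces the direct `runAgent` dependency, the `"ORCHESTRATOR"` and `"INTEGRATOR"` strings are placeholders for `ORCHESTRATOR_SYSTEM` and `INTEGRATOR_SYSTEM`, and the stub's canned replies stand in for real model output.

```typescript
type AgentCall = (systemPrompt: string, userMessage: string, model?: string) => Promise<string>;

interface SubTask { id: string; role: string; systemPrompt: string; input: string; }

// Same three phases as runMultiAgentPipeline, with the agent call injected.
async function runPipelineWith(callAgent: AgentCall, goal: string): Promise<string> {
  // 1. Plan: the orchestrator is expected to reply with a JSON array of subtasks.
  const planText = await callAgent("ORCHESTRATOR", `Goal: ${goal}`, "claude-sonnet-4-6");
  const subtasks = JSON.parse(planText) as SubTask[];

  // 2. Execute: subagents run concurrently; outputs keep plan order.
  const outputs = await Promise.all(subtasks.map((t) => callAgent(t.systemPrompt, t.input)));

  // 3. Integrate: hand every output to the synthesis step.
  const merged = subtasks.map((t, i) => `## ${t.role}\n${outputs[i]}`).join("\n\n");
  return callAgent("INTEGRATOR", `Original goal: ${goal}\n\n${merged}`, "claude-sonnet-4-6");
}

// A stub that answers based on the system prompt, for unit tests.
const stubAgent: AgentCall = async (systemPrompt, userMessage) => {
  if (systemPrompt === "ORCHESTRATOR") {
    return JSON.stringify([
      { id: "task_1", role: "Pros", systemPrompt: "list pros", input: "x" },
      { id: "task_2", role: "Cons", systemPrompt: "list cons", input: "x" },
    ]);
  }
  if (systemPrompt === "INTEGRATOR") return `SUMMARY:\n${userMessage}`;
  return `output of ${systemPrompt}`;
};
```

In production you would pass a thin wrapper around `runAgent`; in tests, `stubAgent` keeps runs fast, free, and deterministic.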

    Real-World Example: Content Research Pipeline

    Here's the pattern applied to a concrete use case — the AiA content pipeline that generates research-backed blog articles:

    ```typescript
    // Specialized subagent system prompts for content creation
    
    const RESEARCH_AGENT_SYSTEM = `You are a research analyst. Given a topic, identify:
    1. The 5 most important facts/statistics (with approximate sources)
    2. The 3 main pain points the target audience has with this topic
    3. The key terminology and concepts someone needs to understand
    Return as structured markdown.`;
    
    const OUTLINE_AGENT_SYSTEM = `You are a content strategist. Given a topic and keywords,
    create a detailed article outline with:
    - H1 title optimized for the primary keyword
    - H2 sections with clear learning objectives
    - Key points to cover in each section
    - Suggested code examples or tables
    Return as structured markdown.`;
    
    const COMPETITOR_AGENT_SYSTEM = `You are an SEO analyst. Given a topic,
    describe what angle and depth a strong-performing article on this topic would have.
    Consider: search intent, what developers actually want to learn, common misconceptions to address.
    Return 3-5 specific recommendations.`;
    
    async function researchPipeline(topic: string, keywords: string[]) {
      // These three can run in parallel — they don't depend on each other,
      // so the plan is fixed here instead of coming from an orchestrator call
      const [research, outline, seoAngle] = await Promise.all([
        runAgent(RESEARCH_AGENT_SYSTEM, topic),
        runAgent(OUTLINE_AGENT_SYSTEM, `Topic: ${topic}\nKeywords: ${keywords.join(", ")}`),
        runAgent(COMPETITOR_AGENT_SYSTEM, topic),
      ]);
    
      return { research, outline, seoAngle };
    }
    ```

    This approach cuts the pipeline from ~45 seconds (sequential) to ~15 seconds (parallel) on typical content tasks.

    Handling Errors and Retries

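    Production multi-agent systems need retry logic. Claude's API can return rate limit errors (429), overloaded errors (529), and other transient failures. The official TypeScript SDK already retries connection errors, 408/409/429 responses, and 5xx errors twice by default (configurable with the client's maxRetries option); the wrapper below adds an application-level layer that only retries errors likely to be transient:

    ```typescript
    // Uses the Anthropic import and runAgent helper from the setup block.
    // Transient errors: connection failures, rate limits (429), and 5xx (including 529 overloaded).
    function isRetryable(err: unknown): boolean {
      if (err instanceof Anthropic.APIConnectionError) return true;
      if (err instanceof Anthropic.APIError) {
        const status = err.status;
        return status === 429 || (status !== undefined && status >= 500);
      }
      return false;
    }
    
    async function runAgentWithRetry(
      systemPrompt: string,
      userMessage: string,
      maxRetries: number = 3,
      model?: string // undefined falls back to runAgent's default model
    ): Promise<string> {
      for (let attempt = 1; attempt <= maxRetries; attempt++) {
        try {
          return await runAgent(systemPrompt, userMessage, model);
        } catch (err) {
          // Fail fast on non-retryable errors (e.g. 400 invalid request) and on the last attempt
          if (!isRetryable(err) || attempt === maxRetries) throw err;
    
          // Exponential backoff: 1s, 2s, 4s
          const waitMs = Math.pow(2, attempt - 1) * 1000;
          console.warn(`Attempt ${attempt} failed. Retrying in ${waitMs}ms...`);
          await new Promise((resolve) => setTimeout(resolve, waitMs));
        }
      }
      throw new Error("Unreachable");
    }
    ```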
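With a fixed 1s/2s/4s schedule, subagents that hit a rate limit at the same moment also retry at the same moment. A common refinement is "full jitter": pick a random delay between zero and the exponential ceiling. The helper below is a hypothetical sketch; `baseMs` and `capMs` are illustrative defaults, and `random` is injectable only so the bounds can be tested.

```typescript
// Full-jitter exponential backoff: a random delay in [0, min(cap, base * 2^(attempt - 1))).
function backoffDelay(
  attempt: number,
  baseMs = 1000,
  capMs = 30_000,
  random: () => number = Math.random
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** (attempt - 1));
  return Math.floor(random() * ceiling);
}
```

In `runAgentWithRetry`, `const waitMs = backoffDelay(attempt);` could replace the fixed calculation.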

    Cost Management for Multi-Agent Systems

    Multi-agent systems multiply your API calls. Here's the cost model to keep in mind:

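    | Agent role             | Recommended model | Typical tokens        | Cost/call |
    |------------------------|-------------------|-----------------------|-----------|
    | Orchestrator (planner) | claude-sonnet-4-6 | ~1,500 in + 500 out   | ~$0.012   |
    | Subagent (execution)   | claude-haiku-4-5  | ~800 in + 600 out     | ~$0.0038  |
    | Integrator (synthesis) | claude-sonnet-4-6 | ~3,000 in + 1,000 out | ~$0.024   |
    | Total (4 subagents)    | Mixed             |                       | ~$0.051   |

    Estimates assume list prices of $3/$15 per million input/output tokens for Sonnet and $1/$5 for Haiku 4.5; verify against Anthropic's current pricing before budgeting.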
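The cost of a call is input tokens times the input price plus output tokens times the output price. The sketch below turns that into a small estimator for a whole run. The `PRICES` values are assumptions (list prices in USD per million tokens for Sonnet 4.x and Haiku 4.5 at the time of writing), and the token counts are the typical figures from the table.

```typescript
// Assumed list prices in USD per million tokens; check current pricing before relying on them.
const PRICES = {
  "claude-sonnet-4-6": { input: 3, output: 15 },
  "claude-haiku-4-5": { input: 1, output: 5 },
} as const;

type Model = keyof typeof PRICES;

interface Call { model: Model; inputTokens: number; outputTokens: number; }

// Cost of one call in USD.
function callCost({ model, inputTokens, outputTokens }: Call): number {
  const p = PRICES[model];
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

// Orchestrator + N subagents + integrator, using the table's typical token counts.
function pipelineCost(subagents: number, subagentModel: Model = "claude-haiku-4-5"): number {
  const calls: Call[] = [
    { model: "claude-sonnet-4-6", inputTokens: 1_500, outputTokens: 500 },
    ...Array.from({ length: subagents }, (): Call => ({
      model: subagentModel,
      inputTokens: 800,
      outputTokens: 600,
    })),
    { model: "claude-sonnet-4-6", inputTokens: 3_000, outputTokens: 1_000 },
  ];
  return calls.reduce((sum, c) => sum + callCost(c), 0);
}
```

With these assumed prices, moving four subagents from Sonnet to Haiku lowers the estimate from about $0.082 to about $0.051 per run: roughly a two-thirds saving on the subagent calls, but closer to 37% for the whole pipeline, because the Sonnet orchestrator and integrator dominate.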
    Four strategies to keep costs down:
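  • Haiku for focused tasks — Subagents that extract, transform, or format data rarely need Sonnet's reasoning. At list prices, Haiku 4.5 costs about a third as much per token as Sonnet ($1 vs. $3 per million input tokens, $5 vs. $15 per million output tokens).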
  • Structured outputs reduce tokens — Ask subagents to return JSON or markdown tables instead of prose. Tighter formats mean fewer output tokens.
  • Gate expensive calls — Use a cheap classification agent to decide whether the full pipeline is needed, or whether a simpler single-call response suffices.
  • Cache orchestration plans — If the same class of task is requested repeatedly, cache the orchestrator's task plan and only run subagents fresh.
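Caching orchestration plans can be as simple as a map keyed by task class. The sketch below is hypothetical: `taskClass` is whatever key your application uses to decide that two requests can share a plan, `planFn` stands in for the `orchestrate` call, and the `{goal}` placeholder in cached inputs is an assumed convention for filling in per-request details. Subagents still run fresh every time; only the planning step is reused.

```typescript
interface SubTask { id: string; role: string; systemPrompt: string; input: string; }

type PlanFn = (taskClass: string) => Promise<SubTask[]>;

// Caches one plan per task class. Storing the promise (not the result) means
// concurrent requests for the same class share a single orchestrator call.
class PlanCache {
  private readonly plans = new Map<string, Promise<SubTask[]>>();
  private readonly planFn: PlanFn;

  constructor(planFn: PlanFn) {
    this.planFn = planFn;
  }

  get(taskClass: string): Promise<SubTask[]> {
    let plan = this.plans.get(taskClass);
    if (plan === undefined) {
      plan = this.planFn(taskClass);
      // Forget failed plans so a later request can retry planning.
      plan.catch(() => this.plans.delete(taskClass));
      this.plans.set(taskClass, plan);
    }
    return plan;
  }
}

// Fill a cached plan template with this request's goal without mutating the cache.
function instantiate(plan: SubTask[], goal: string): SubTask[] {
  return plan.map((t) => ({ ...t, input: t.input.split("{goal}").join(goal) }));
}
```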

    Common Mistakes to Avoid

    Mistake 1: Hidden dependencies between "parallel" subagents

    If Subagent B needs Subagent A's output, they can't run in parallel. Design subtasks to be truly independent, or chain them sequentially in separate phases.
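When a dependency is real, one way to keep the parallelism you can get is to group subtasks into phases: steps within a phase run concurrently, and each phase sees the outputs of all earlier phases. `Step`, `Phase`, and `runPhases` are illustrative names, and each step is modeled as a plain async function of earlier outputs so the sketch runs without the API.

```typescript
// A step receives all outputs produced by earlier phases (keyed by step id).
type Step = { id: string; run: (prior: Record<string, string>) => Promise<string> };
type Phase = Step[];

// Run phases in order; steps inside a phase run concurrently.
async function runPhases(phases: Phase[]): Promise<Record<string, string>> {
  const outputs: Record<string, string> = {};
  for (const phase of phases) {
    // Snapshot, so steps in the same phase cannot see each other's results.
    const prior = { ...outputs };
    const results = await Promise.all(phase.map((step) => step.run(prior)));
    phase.forEach((step, i) => {
      outputs[step.id] = results[i];
    });
  }
  return outputs;
}
```

In the content pipeline from earlier, research, outline, and SEO angle would form phase 1, and a writing agent that consumes all three would be phase 2.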

    Mistake 2: Passing the full context to every subagent

    Each subagent should only receive the information it needs. Bloated contexts waste tokens and confuse specialized agents.

    Mistake 3: No output validation

    Subagent outputs are LLM responses — they can be malformed JSON, off-format, or hallucinated. Always validate before passing to the integrator.

    ```typescript
    function validateSubagentOutput(output: string, expectedFormat: "json" | "markdown" | "text"): boolean {
      if (expectedFormat === "json") {
        try {
          JSON.parse(output);
          return true;
        } catch {
          return false;
        }
      }
      return output.length > 50; // basic sanity check for text/markdown
    }
    ```
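`validateSubagentOutput` rejects JSON that the model wrapped in a Markdown code fence, which models sometimes do even when asked for bare JSON. The hypothetical `parseJsonOutput` sketch below strips such a fence first and returns either the parsed value or the parse error, so the caller can log why an output was rejected.

```typescript
// Discriminated result: either the parsed value or the reason parsing failed.
type ParseResult = { ok: true; value: unknown } | { ok: false; reason: string };

// Parse JSON that may be wrapped in a code fence (three backticks, optional "json" tag).
function parseJsonOutput(output: string): ParseResult {
  const trimmed = output.trim();
  const fenced = trimmed.match(/^`{3}(?:json)?\s*([\s\S]*?)\s*`{3}$/);
  const body = fenced ? fenced[1] : trimmed;
  try {
    return { ok: true, value: JSON.parse(body) };
  } catch (err) {
    return { ok: false, reason: err instanceof Error ? err.message : String(err) };
  }
}
```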

    Mistake 4: Unbounded parallelism

    Launching 20 subagents at once can easily exceed your rate limits. Cap concurrent calls (5-10 is a sensible starting point), or use a concurrency limiter like p-limit.

    ```typescript
    import pLimit from "p-limit";
    
    const limit = pLimit(5); // max 5 concurrent calls
    
    const results = await Promise.all(
      subtasks.map((task) =>
        limit(() => runAgent(task.systemPrompt, task.input))
      )
    );
    ```
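If you would rather not add a dependency, the same cap can be enforced with a small worker pool. `mapWithConcurrency` is a hypothetical helper, not a library API: it starts up to `limit` workers that each claim the next unprocessed index, and it returns results in input order, like `Promise.all`. As with `Promise.all`, the first rejection rejects the whole call.

```typescript
// Run `fn` over `items` with at most `limit` calls in flight; results keep input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  if (limit < 1) throw new RangeError("limit must be at least 1");
  const results = new Array<R>(items.length);
  let next = 0;

  // Each worker claims the next index until none remain. JavaScript runs this
  // on one thread, so `next++` cannot race between workers.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  }

  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

`await mapWithConcurrency(subtasks, 5, (task) => runAgent(task.systemPrompt, task.input))` would be the dependency-free equivalent of the p-limit snippet.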

    When to Use Multi-Agent vs. Single-Call

    Multi-agent isn't always the answer. Use this decision framework:

    | Situation                                                 | Approach                                 |
    |-----------------------------------------------------------|------------------------------------------|
    | Simple Q&A, single-step generation                        | Single call                              |
    | Task has 2+ independent subtasks                          | Multi-agent (parallel)                   |
    | Task requires > 100K tokens of context                    | Multi-agent (split context)              |
    | Need expert specialization (researcher + writer + critic) | Multi-agent (specialized system prompts) |
    | Real-time response required (< 2s)                        | Single call or pre-computed              |
    | Cost-sensitive, simple outputs                            | Single call with Haiku                   |

    Key Takeaways

    • The orchestrator-subagent pattern is the foundation of production Claude agent systems — one planner, many executors, one integrator
    • Promise.all() is your parallelism primitive — independent subagents should run concurrently, within your rate limits
    • Use Haiku for execution, Sonnet for reasoning — Haiku 4.5's per-token list price is about a third of Sonnet's, so this alone cuts subagent costs by roughly two thirds
    • Fail gracefully — wrap each subagent in try/catch so one failure doesn't collapse the pipeline
    • Validate subagent outputs before passing them downstream — LLMs can return unexpected formats
    • Multi-agent orchestration is a core topic on the Claude Certified Architect (CCA) exam — understanding these patterns deeply will serve you both in production and in certification prep

    Next Steps

    Ready to go deeper?

    Practice what you learned:
    • Implement the pipeline above and test it on a research task in your domain
    • Experiment with different subagent model choices (Haiku vs Sonnet) and measure cost vs quality tradeoffs
    • Add streaming to the integrator step for real-time output

    Prepare for the Claude Certified Architect (CCA) exam:

    The CCA exam tests multi-agent architecture, orchestration patterns, tool use, and cost optimization — exactly what this tutorial covered. Our CCA Practice Test Bank includes 150+ questions with detailed explanations, including a full section on agentic system design.
