How to Build Multi-Agent AI Systems with Claude: Orchestrator + Subagent Pattern
You want Claude to do more than answer a single question. You want it to research a topic, write a report, validate the output, and format it for publishing — all in one run, without you babysitting it.
That's multi-agent orchestration, and it's the architecture behind many of the serious Claude-powered products in production today.
This tutorial walks you through the core pattern: one orchestrator agent that breaks down a task and delegates, plus multiple subagents that execute focused subtasks in parallel. You'll get working TypeScript code, cost management strategies, and a real-world example you can adapt immediately.
Why Multi-Agent? The Problem With Single-Prompt AI
A single Claude call has hard limits:
- Context window ceiling — Complex tasks need more information than fits in one prompt
- No parallelism — Sequential reasoning is slow when subtasks are independent
- No specialization — A general prompt can't be simultaneously a researcher, a critic, and a formatter
- No checkpointing — If Claude goes off-track in a 10,000-token chain-of-thought, you waste the whole call
Multi-agent systems solve all four. The orchestrator handles coordination logic; subagents handle execution. Each subagent gets a tight, focused system prompt that makes it excellent at one thing. Subagents that don't depend on each other run in parallel, collapsing wall-clock time dramatically.
The pattern mirrors how software teams work: a tech lead breaks work into tickets, engineers execute in parallel, and the lead reviews and integrates. Claude's API lets you implement exactly this.
The Orchestrator-Subagent Pattern Explained
The pattern has three moving parts:
```
User Request
      │
      ▼
┌──────────────┐
│ Orchestrator │  ← Breaks task into subtasks, manages state
└───────┬──────┘
        │ spawns
   ┌────┴────┐
   ▼         ▼
[Agent A] [Agent B]  ← Subagents: one job each, run in parallel
   │         │
   └────┬────┘
        ▼
 [Integration]  ← Orchestrator merges results
        │
        ▼
  Final Output
```

Each subagent is just a Claude API call with its own system prompt, its own self-contained input, and (optionally) its own model choice.
The orchestrator is also a Claude call — but its job is planning and synthesis, not execution.
Setting Up: Dependencies and API Client
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

// Reusable helper for a single agent call
async function runAgent(
  systemPrompt: string,
  userMessage: string,
  model: string = "claude-haiku-4-5-20251001" // use Haiku for subagents to save cost
): Promise<string> {
  const response = await client.messages.create({
    model,
    max_tokens: 2048,
    system: systemPrompt,
    messages: [{ role: "user", content: userMessage }],
  });

  const content = response.content[0];
  if (content.type !== "text") throw new Error("Unexpected response type");
  return content.text;
}
```

Model choice: use `claude-haiku-4-5-20251001` for subagents handling focused, structured tasks. Reserve `claude-sonnet-4-6` for the orchestrator (complex reasoning) and only reach for `claude-opus-4-6` if synthesis quality is critical.
Step 1: The Orchestrator Agent
The orchestrator receives the user's goal and produces a structured task plan — a JSON array of subtask definitions that it will dispatch to subagents.
```typescript
const ORCHESTRATOR_SYSTEM = `You are a task orchestration agent.
Given a high-level goal, break it into 2-5 independent subtasks that can be executed in parallel.

Return ONLY a JSON array with this structure:
[
  {
    "id": "task_1",
    "role": "one-sentence description of this subagent's job",
    "systemPrompt": "detailed system prompt for this subagent",
    "input": "the specific input this subagent should process"
  }
]

Rules:
- Each subtask must be completable independently (no dependencies on other subtasks)
- Subtask inputs must be self-contained — don't reference 'the other agents'
- System prompts should be specific and role-focused`;

interface SubTask {
  id: string;
  role: string;
  systemPrompt: string;
  input: string;
}

async function orchestrate(goal: string): Promise<SubTask[]> {
  const response = await runAgent(
    ORCHESTRATOR_SYSTEM,
    `Goal: ${goal}`,
    "claude-sonnet-4-6" // orchestrator needs stronger reasoning
  );

  // Parse the JSON plan
  const jsonMatch = response.match(/\[[\s\S]*\]/);
  if (!jsonMatch) throw new Error("Orchestrator did not return valid JSON");
  return JSON.parse(jsonMatch[0]) as SubTask[];
}
```

Step 2: Running Subagents in Parallel
This is where the speed win comes from. Use Promise.all() to fire all subagent calls simultaneously:
```typescript
interface SubTaskResult {
  id: string;
  role: string;
  output: string;
  error?: string;
}

async function runSubagents(subtasks: SubTask[]): Promise<SubTaskResult[]> {
  const results = await Promise.all(
    subtasks.map(async (task): Promise<SubTaskResult> => {
      try {
        const output = await runAgent(task.systemPrompt, task.input);
        return { id: task.id, role: task.role, output };
      } catch (err) {
        // Don't let one failed subagent kill the whole pipeline
        const error = err instanceof Error ? err.message : "Unknown error";
        return { id: task.id, role: task.role, output: "", error };
      }
    })
  );
  return results;
}
```

The try/catch per subagent is critical. If one subagent fails (rate limit, bad output, timeout), you want the other results to survive. The integrator can handle partial results gracefully.
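To make "handle partial results" concrete, here's a small hypothetical helper (it reuses the `SubTaskResult` shape above; the function name is mine, not from any SDK) that splits a batch before integration:

```typescript
// Hypothetical helper: split a batch of subagent results into successes
// and failures so the caller can decide whether enough succeeded to be
// worth sending to the integrator.
interface SubTaskResult {
  id: string;
  role: string;
  output: string;
  error?: string;
}

function partitionResults(results: SubTaskResult[]) {
  const ok = results.filter((r) => !r.error && r.output);
  const failed = results.filter((r) => r.error);
  const successRate = results.length === 0 ? 0 : ok.length / results.length;
  return { ok, failed, successRate };
}
```

You might, for example, only proceed to integration when `successRate` clears some threshold, and surface the failed subtask ids to the user otherwise.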
Step 3: The Integration Agent
Once subagents finish, a final Claude call synthesizes all outputs into a coherent result:
```typescript
const INTEGRATOR_SYSTEM = `You are a synthesis agent. You receive outputs from multiple specialized agents
and integrate them into a single, coherent, well-structured response.
Do not simply concatenate. Find connections, resolve contradictions, and produce a unified whole.
Preserve factual specifics from each agent's output.`;

async function integrate(
  goal: string,
  results: SubTaskResult[]
): Promise<string> {
  const validResults = results.filter((r) => !r.error && r.output);
  if (validResults.length === 0) {
    throw new Error("All subagents failed — cannot integrate");
  }

  const agentOutputs = validResults
    .map((r) => `## ${r.role}\n${r.output}`)
    .join("\n\n");

  const prompt = `Original goal: ${goal}\n\nAgent outputs to integrate:\n\n${agentOutputs}`;
  return runAgent(INTEGRATOR_SYSTEM, prompt, "claude-sonnet-4-6");
}
```

Step 4: Wiring It Together
```typescript
async function runMultiAgentPipeline(goal: string): Promise<string> {
  console.log("🎯 Orchestrating task:", goal);

  // 1. Plan
  const subtasks = await orchestrate(goal);
  console.log(`📋 Planned ${subtasks.length} subtasks`);
  subtasks.forEach((t) => console.log(`  - ${t.id}: ${t.role}`));

  // 2. Execute in parallel
  console.log("⚡ Running subagents in parallel...");
  const startTime = Date.now();
  const results = await runSubagents(subtasks);
  const elapsed = ((Date.now() - startTime) / 1000).toFixed(1);

  const succeeded = results.filter((r) => !r.error).length;
  console.log(`✅ ${succeeded}/${subtasks.length} subagents completed in ${elapsed}s`);

  // 3. Integrate
  console.log("🔀 Integrating results...");
  const finalOutput = await integrate(goal, results);
  return finalOutput;
}

// Example usage
const result = await runMultiAgentPipeline(
  "Write a competitive analysis of Claude Code vs GitHub Copilot vs Cursor for professional developers"
);
console.log(result);
```

Real-World Example: Content Research Pipeline
Here's the pattern applied to a concrete use case — the AiA content pipeline that generates research-backed blog articles:
```typescript
// Specialized subagent system prompts for content creation
const RESEARCH_AGENT_SYSTEM = `You are a research analyst. Given a topic, identify:
1. The 5 most important facts/statistics (with approximate sources)
2. The 3 main pain points the target audience has with this topic
3. The key terminology and concepts someone needs to understand
Return as structured markdown.`;

const OUTLINE_AGENT_SYSTEM = `You are a content strategist. Given a topic and keywords,
create a detailed article outline with:
- H1 title optimized for the primary keyword
- H2 sections with clear learning objectives
- Key points to cover in each section
- Suggested code examples or tables
Return as structured markdown.`;

const COMPETITOR_AGENT_SYSTEM = `You are an SEO analyst. Given a topic,
describe what angle and depth a strong-performing article on this topic would have.
Consider: search intent, what developers actually want to learn, common misconceptions to address.
Return 3-5 specific recommendations.`;

async function researchPipeline(topic: string, keywords: string[]) {
  // These three can run in parallel — they don't depend on each other
  const [research, outline, seoAngle] = await Promise.all([
    runAgent(RESEARCH_AGENT_SYSTEM, topic),
    runAgent(OUTLINE_AGENT_SYSTEM, `Topic: ${topic}\nKeywords: ${keywords.join(", ")}`),
    runAgent(COMPETITOR_AGENT_SYSTEM, topic),
  ]);

  return { research, outline, seoAngle };
}
```

This approach cuts the pipeline from ~45 seconds (sequential) to ~15 seconds (parallel) on typical content tasks.
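The speedup comes purely from running independent awaits concurrently. Here's a self-contained sketch with timers standing in for API calls (`fakeAgent` is a stand-in, no Anthropic calls involved), so you can see the wall-clock difference yourself:

```typescript
// Timers simulate three independent subagent calls. Sequential awaits
// pay for each delay in turn; Promise.all pays for them once.
const fakeAgent = (label: string, ms: number): Promise<string> =>
  new Promise((resolve) => setTimeout(() => resolve(label), ms));

async function sequential(ms: number): Promise<number> {
  const start = Date.now();
  await fakeAgent("research", ms);
  await fakeAgent("outline", ms);
  await fakeAgent("seo", ms);
  return Date.now() - start; // roughly 3 * ms
}

async function parallel(ms: number): Promise<number> {
  const start = Date.now();
  await Promise.all([
    fakeAgent("research", ms),
    fakeAgent("outline", ms),
    fakeAgent("seo", ms),
  ]);
  return Date.now() - start; // roughly ms
}
```

With real API calls the ratio is noisier (calls vary in latency), but the shape is the same: total time collapses from the sum of the calls to the slowest single call.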
Handling Errors and Retries
Production multi-agent systems need retry logic. Claude's API occasionally returns rate limit errors (429) or transient failures:
```typescript
async function runAgentWithRetry(
  systemPrompt: string,
  userMessage: string,
  maxRetries: number = 3
): Promise<string> {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await runAgent(systemPrompt, userMessage);
    } catch (err) {
      const isLastAttempt = attempt === maxRetries;
      if (isLastAttempt) throw err;

      // Exponential backoff: 1s, 2s, 4s
      const waitMs = Math.pow(2, attempt - 1) * 1000;
      console.warn(`Attempt ${attempt} failed. Retrying in ${waitMs}ms...`);
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
  }
  throw new Error("Unreachable");
}
```

Cost Management for Multi-Agent Systems
Multi-agent systems multiply your API calls. Here's the cost model to keep in mind:
| Agent Role | Recommended Model | Typical Tokens | Cost/Call |
|---|---|---|---|
| Orchestrator (planner) | claude-sonnet-4-6 | ~1,500 in + 500 out | ~$0.004 |
| Subagent (execution) | claude-haiku-4-5 | ~800 in + 600 out | ~$0.0003 |
| Integrator (synthesis) | claude-sonnet-4-6 | ~3,000 in + 1,000 out | ~$0.013 |
| Total (4 subagents) | Mixed | — | ~$0.019 |
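The arithmetic behind a table like this is simple enough to script. Below is a back-of-envelope estimator; the per-million-token rates are placeholders I chose for illustration, so check Anthropic's current pricing page before trusting any number it produces:

```typescript
// Back-of-envelope pipeline cost estimator. RATES are PLACEHOLDER
// $/1M-token figures for illustration only — substitute current
// published pricing before relying on the output.
type ModelTier = "haiku" | "sonnet";

interface AgentCall {
  model: ModelTier;
  inputTokens: number;
  outputTokens: number;
}

const RATES: Record<ModelTier, { input: number; output: number }> = {
  haiku: { input: 0.25, output: 1.25 },   // assumed rates
  sonnet: { input: 3.0, output: 15.0 },   // assumed rates
};

function estimatePipelineCost(calls: AgentCall[]): number {
  return calls.reduce((total, c) => {
    const r = RATES[c.model];
    return (
      total +
      (c.inputTokens / 1_000_000) * r.input +
      (c.outputTokens / 1_000_000) * r.output
    );
  }, 0);
}

// One orchestrator, four subagents, one integrator (token counts from the table)
const cost = estimatePipelineCost([
  { model: "sonnet", inputTokens: 1500, outputTokens: 500 },
  ...Array.from({ length: 4 }, () => ({
    model: "haiku" as ModelTier,
    inputTokens: 800,
    outputTokens: 600,
  })),
  { model: "sonnet", inputTokens: 3000, outputTokens: 1000 },
]);
```

With these placeholder rates a full run still lands in the low cents; the useful habit is plugging in your own measured token counts per agent role.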
Common Mistakes to Avoid
Mistake 1: Circular dependencies between subagents

If Subagent B needs Subagent A's output, they can't run in parallel. Design subtasks to be truly independent, or chain them sequentially in separate phases.

Mistake 2: Passing the full context to every subagent

Each subagent should only receive the information it needs. Bloated contexts waste tokens and confuse specialized agents.

Mistake 3: No output validation

Subagent outputs are LLM responses — they can be malformed JSON, off-format, or hallucinated. Always validate before passing to the integrator.
```typescript
function validateSubagentOutput(
  output: string,
  expectedFormat: "json" | "markdown" | "text"
): boolean {
  if (expectedFormat === "json") {
    try {
      JSON.parse(output);
      return true;
    } catch {
      return false;
    }
  }
  return output.length > 50; // basic sanity check for text/markdown
}
```

Mistake 4: Unbounded parallelism

Launching 20 subagents at once will hit rate limits. Cap parallel calls to 5-10, or use a concurrency limiter like p-limit.
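The p-limit snippet below is the easy route. If you'd rather avoid a dependency, a limiter can be hand-rolled in a few lines; this `mapWithLimit` is my own sketch of the idea, not a published package:

```typescript
// Minimal hand-rolled concurrency limiter: start `limit` workers that
// each pull the next unprocessed item until the list is exhausted.
// Results keep their input order because they're written by index.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    async () => {
      while (next < items.length) {
        const i = next++; // synchronous claim — safe in single-threaded JS
        results[i] = await fn(items[i]);
      }
    }
  );

  await Promise.all(workers);
  return results;
}
```

In the pipeline you would call it as `mapWithLimit(subtasks, 5, (task) => runAgent(task.systemPrompt, task.input))`.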
```typescript
import pLimit from "p-limit";

const limit = pLimit(5); // max 5 concurrent calls

const results = await Promise.all(
  subtasks.map((task) =>
    limit(() => runAgent(task.systemPrompt, task.input))
  )
);
```

When to Use Multi-Agent vs. Single-Call
Multi-agent isn't always the answer. Use this decision framework:
| Situation | Approach |
|---|---|
| Simple Q&A, single-step generation | Single call |
| Task has 2+ independent subtasks | Multi-agent (parallel) |
| Task requires > 100K tokens of context | Multi-agent (split context) |
| Need expert specialization (researcher + writer + critic) | Multi-agent (specialized system prompts) |
| Real-time response required (< 2s) | Single call or pre-computed |
| Cost-sensitive, simple outputs | Single call with Haiku |
Key Takeaways
- The orchestrator-subagent pattern is the foundation of production Claude agent systems — one planner, many executors, one integrator
- `Promise.all()` is your parallelism primitive — independent subagents should always run concurrently
- Use Haiku for execution, Sonnet for reasoning — this alone cuts multi-agent costs by 60-80%
- Fail gracefully — wrap each subagent in try/catch so one failure doesn't collapse the pipeline
- Validate subagent outputs before passing them downstream — LLMs can return unexpected formats
- Multi-agent orchestration is a core topic on the Claude Certified Architect (CCA) exam — understanding these patterns deeply will serve you both in production and in certification prep
Next Steps
Ready to go deeper?
Practice what you learned:
- Implement the pipeline above and test it on a research task in your domain
- Experiment with different subagent model choices (Haiku vs Sonnet) and measure cost vs quality tradeoffs
- Add streaming to the integrator step for real-time output
The CCA exam tests multi-agent architecture, orchestration patterns, tool use, and cost optimization — exactly what this tutorial covered. Our CCA Practice Test Bank includes 150+ questions with detailed explanations, including a full section on agentic system design.
Explore related tutorials:
- How to Build Your First Claude Agent — single-agent foundations
- Claude Tool Use: Function Calling Complete Guide — adding tools to your agents
- Best MCP Servers for Claude Code 2026 — extend your agents with pre-built tools