Multi-agent orchestration with Claude parallel subagents
Table of Contents
Coordinating multiple AI agents that work in parallel is one of the most powerful patterns you can apply to complex, multi-step problems. Claude Managed Agents gives you a first-class way to do this: a lead orchestrator delegates tasks to specialist subagents, each with their own tools and model, all running concurrently. This guide walks through the architecture, real implementation patterns, and how to monitor everything in the Claude Console.
If you’re new to the Claude API, start with getting started with the Claude API in Python before diving in here.
How Claude Managed Agents orchestration actually works
In Claude’s multi-agent model (available via the Anthropic API as of 2026), you define an orchestrator agent — typically a more capable model like claude-opus-4 — and one or more subagents that it can invoke. The orchestrator decides which subagents to call, with what inputs, and whether to call them sequentially or in parallel.
Each subagent is itself a Claude model instance with:
- Its own system prompt (defining its role and constraints)
- Its own tool set (file access, web search, code execution, etc.)
- An independently configured model (you can use cheaper or faster models for simpler subtasks)
The key API primitive here is the computer_use and tool_use capabilities combined with agent handoffs — the orchestrator calls a subagent as if it were a tool, passes structured input, and receives structured output. Internally, Anthropic routes this through their managed infrastructure, so you get tracing, retries, and cost attribution per subagent automatically.
For a deeper look at how this fits with MCP tooling, see the guide on Claude Managed Agents and MCP in production.
Step 1: Define your orchestrator and subagent roles
Before writing any code, map out who does what. The orchestrator should own the task decomposition and result aggregation. Subagents should be narrow specialists that do one thing well.
Example: Codebase refactoring system
| Agent | Model | Responsibility |
|---|---|---|
| Orchestrator | claude-opus-4 | Plan tasks, fan out, merge results |
| Analysis Agent | claude-sonnet-4 | Read files, identify refactor targets |
| Refactor Agent | claude-sonnet-4 | Apply changes, write updated files |
| Test Agent | claude-haiku-4 | Run tests, report pass/fail |
Here’s how you define the subagents in Python using the Anthropic SDK:
import anthropic
import asyncio
client = anthropic.Anthropic()
# Subagent definitions — passed to the orchestrator as callable tools
subagent_tools = [
{
"name": "analysis_agent",
"description": "Reads source files and returns a list of refactoring opportunities with file paths and line numbers.",
"input_schema": {
"type": "object",
"properties": {
"directory": {"type": "string", "description": "Absolute path to the codebase root"},
"focus": {"type": "string", "description": "Area of concern, e.g. 'dead code', 'duplication'"}
},
"required": ["directory"]
}
},
{
"name": "refactor_agent",
"description": "Applies a specific refactoring to a file. Returns the rewritten file content.",
"input_schema": {
"type": "object",
"properties": {
"file_path": {"type": "string"},
"instruction": {"type": "string"}
},
"required": ["file_path", "instruction"]
}
},
{
"name": "test_agent",
"description": "Runs the test suite for the given directory and returns a structured result.",
"input_schema": {
"type": "object",
"properties": {
"directory": {"type": "string"},
"test_command": {"type": "string", "description": "e.g. 'pytest tests/'"}
},
"required": ["directory", "test_command"]
}
}
]
Step 2: Implement parallel subagent dispatch
The orchestrator calls multiple subagents by emitting multiple tool_use blocks in a single response. You handle those calls concurrently on your side, then return all results in one tool_result message. This is where the parallelism happens — Claude won’t wait for one subagent before calling the next if they’re independent.
async def run_subagent(tool_name: str, tool_input: dict, shared_fs_path: str) -> dict:
"""Dispatch a single subagent call. Each subagent gets its own Claude API call."""
system_prompts = {
"analysis_agent": (
"You are a code analysis specialist. Read files from the shared filesystem, "
"identify refactoring opportunities, and return structured JSON results. "
f"Shared filesystem root: {shared_fs_path}"
),
"refactor_agent": (
"You are a refactoring specialist. Apply the given instruction to the file and "
"write the updated version back to disk. Return a summary of changes made."
),
"test_agent": (
"You are a test runner. Execute the given test command and return a structured "
"result with pass/fail counts and any error output."
)
}
response = client.messages.create(
model="claude-sonnet-4-5", # cheaper model for subagents
max_tokens=4096,
system=system_prompts[tool_name],
messages=[
{"role": "user", "content": f"Execute your task with these inputs: {tool_input}"}
]
)
return {
"tool_use_id": tool_input.get("_tool_use_id"),
"content": response.content[0].text
}
async def dispatch_parallel_subagents(tool_calls: list, shared_fs_path: str) -> list:
"""Run all subagent calls concurrently using asyncio."""
tasks = [
run_subagent(call["name"], call["input"], shared_fs_path)
for call in tool_calls
]
return await asyncio.gather(*tasks)
The shared filesystem is critical here. All agents read from and write to the same directory on disk, so the refactor agent can pick up exactly what the analysis agent identified — no serialization gymnastics needed.
This pattern is similar in spirit to running parallel coding agents with Cursor and Claude, just moved into a fully programmatic API-driven loop.
Step 3: Orchestrator agent loop with shared state
Here’s a complete orchestrator loop. It sends the initial task to the orchestrator, handles tool calls by dispatching subagents in parallel, and feeds results back until the orchestrator signals it’s done.
async def run_orchestrator(task: str, shared_fs_path: str):
messages = [{"role": "user", "content": task}]
orchestrator_system = """
You are a lead engineering orchestrator. Break complex tasks into parallel subtasks
and delegate to specialist subagents using tools. Call independent subagents in the
same response to maximize parallelism. Aggregate subagent results and produce a
final summary when all work is complete.
"""
while True:
response = client.messages.create(
model="claude-opus-4-5",
max_tokens=8096,
system=orchestrator_system,
tools=subagent_tools,
messages=messages
)
# Append orchestrator response to conversation
messages.append({"role": "assistant", "content": response.content})
if response.stop_reason == "end_turn":
# Orchestrator is done — extract final text
for block in response.content:
if hasattr(block, "text"):
print("Final result:\n", block.text)
break
if response.stop_reason == "tool_use":
tool_calls = [
{"name": block.name, "input": {**block.input, "_tool_use_id": block.id}, "id": block.id}
for block in response.content
if block.type == "tool_use"
]
# Run all subagents in parallel
results = await dispatch_parallel_subagents(tool_calls, shared_fs_path)
# Return all results in a single user message
tool_results = [
{
"type": "tool_result",
"tool_use_id": r["tool_use_id"],
"content": r["content"]
}
for r in results
]
messages.append({"role": "user", "content": tool_results})
Step 4: Customer support pipeline with compliance checking
A second real-world pattern — multi-step customer support with three subagents: one that researches the customer’s account history, one that drafts a response, and one that checks compliance before sending.
support_subagents = [
{
"name": "research_agent",
"description": "Fetches customer account history and recent interactions from the CRM.",
"input_schema": {
"type": "object",
"properties": {
"customer_id": {"type": "string"},
"query_context": {"type": "string"}
},
"required": ["customer_id"]
}
},
{
"name": "draft_agent",
"description": "Drafts a personalized support response given customer context and the original query.",
"input_schema": {
"type": "object",
"properties": {
"customer_context": {"type": "string"},
"original_query": {"type": "string"},
"tone": {"type": "string", "enum": ["formal", "friendly", "technical"]}
},
"required": ["customer_context", "original_query"]
}
},
{
"name": "compliance_agent",
"description": "Reviews a draft response for regulatory compliance and brand policy. Returns APPROVED or a list of required changes.",
"input_schema": {
"type": "object",
"properties": {
"draft_text": {"type": "string"},
"jurisdiction": {"type": "string"}
},
"required": ["draft_text"]
}
}
]
In this pipeline, research_agent runs first (the orchestrator calls it alone), then draft_agent and compliance_agent run in parallel against the research output. The orchestrator merges the compliance feedback into the final draft only if the compliance agent returns changes.
Step 5: Monitor traces in the Claude Console
Every subagent call, its inputs, outputs, token usage, and latency appear as a nested trace in the Claude Console under Workbench > Agent Traces. You can drill into each span to see exactly what each subagent received and returned.
A few things to watch for in production:
- Token attribution: each subagent’s cost is tracked separately — useful for optimizing which tasks can drop to
claude-haiku-4 - Retry behavior: if a subagent call fails, the managed infrastructure retries with exponential backoff by default — check your trace to distinguish a first-attempt success from a retried one
- Parallelism confirmation: the trace view shows subagent calls that started at the same timestamp — if you see sequential calls where you expected parallel, the orchestrator likely created a dependency in its reasoning that you need to break via your system prompt
For automated monitoring in CI pipelines, the same tracing approach applies as in automated AI code review in GitHub pull requests.
If you want to understand the tool_use mechanics that underpin subagent calls, the Claude Tool Use practical guide covers function calling patterns in depth.
Key takeaways
- Model selection per agent matters: use
claude-opus-4for orchestration logic and cheaper models (haiku,sonnet) for narrow subtasks — it cuts costs significantly at scale - Shared filesystems are the simplest state layer: rather than passing large blobs through tool results, write to disk and pass paths — subagents pick up where others left off without bloating your context window
- Parallel dispatch requires explicit independence: the orchestrator will only call multiple tools in one response if its system prompt and task framing make clear which subtasks don’t depend on each other — be explicit about this in your orchestrator prompt