Multi-agent orchestration with Claude parallel subagents

Table of Contents

Coordinating multiple AI agents that work in parallel is one of the most powerful patterns you can apply to complex, multi-step problems. Claude Managed Agents gives you a first-class way to do this: a lead orchestrator delegates tasks to specialist subagents, each with their own tools and model, all running concurrently. This guide walks through the architecture, real implementation patterns, and how to monitor everything in the Claude Console.

If you’re new to the Claude API, start with getting started with the Claude API in Python before diving in here.

How Claude Managed Agents orchestration actually works

In Claude’s multi-agent model (available via the Anthropic API as of 2026), you define an orchestrator agent — typically a more capable model like claude-opus-4 — and one or more subagents that it can invoke. The orchestrator decides which subagents to call, with what inputs, and whether to call them sequentially or in parallel.

Each subagent is itself a Claude model instance with:

Its own system prompt (defining its role and constraints)
Its own tool set (file access, web search, code execution, etc.)
An independently configured model (you can use cheaper or faster models for simpler subtasks)

The key API primitive here is the computer_use and tool_use capabilities combined with agent handoffs — the orchestrator calls a subagent as if it were a tool, passes structured input, and receives structured output. Internally, Anthropic routes this through their managed infrastructure, so you get tracing, retries, and cost attribution per subagent automatically.

For a deeper look at how this fits with MCP tooling, see the guide on Claude Managed Agents and MCP in production.

Step 1: Define your orchestrator and subagent roles

Before writing any code, map out who does what. The orchestrator should own the task decomposition and result aggregation. Subagents should be narrow specialists that do one thing well.

Example: Codebase refactoring system

Agent	Model	Responsibility
Orchestrator	`claude-opus-4`	Plan tasks, fan out, merge results
Analysis Agent	`claude-sonnet-4`	Read files, identify refactor targets
Refactor Agent	`claude-sonnet-4`	Apply changes, write updated files
Test Agent	`claude-haiku-4`	Run tests, report pass/fail

Here’s how you define the subagents in Python using the Anthropic SDK:

import anthropic
import asyncio

client = anthropic.Anthropic()

# Subagent definitions — passed to the orchestrator as callable tools
subagent_tools = [
    {
        "name": "analysis_agent",
        "description": "Reads source files and returns a list of refactoring opportunities with file paths and line numbers.",
        "input_schema": {
            "type": "object",
            "properties": {
                "directory": {"type": "string", "description": "Absolute path to the codebase root"},
                "focus": {"type": "string", "description": "Area of concern, e.g. 'dead code', 'duplication'"}
            },
            "required": ["directory"]
        }
    },
    {
        "name": "refactor_agent",
        "description": "Applies a specific refactoring to a file. Returns the rewritten file content.",
        "input_schema": {
            "type": "object",
            "properties": {
                "file_path": {"type": "string"},
                "instruction": {"type": "string"}
            },
            "required": ["file_path", "instruction"]
        }
    },
    {
        "name": "test_agent",
        "description": "Runs the test suite for the given directory and returns a structured result.",
        "input_schema": {
            "type": "object",
            "properties": {
                "directory": {"type": "string"},
                "test_command": {"type": "string", "description": "e.g. 'pytest tests/'"}
            },
            "required": ["directory", "test_command"]
        }
    }
]

Step 2: Implement parallel subagent dispatch

The orchestrator calls multiple subagents by emitting multiple tool_use blocks in a single response. You handle those calls concurrently on your side, then return all results in one tool_result message. This is where the parallelism happens — Claude won’t wait for one subagent before calling the next if they’re independent.

async def run_subagent(tool_name: str, tool_input: dict, shared_fs_path: str) -> dict:
    """Dispatch a single subagent call. Each subagent gets its own Claude API call."""
    
    system_prompts = {
        "analysis_agent": (
            "You are a code analysis specialist. Read files from the shared filesystem, "
            "identify refactoring opportunities, and return structured JSON results. "
            f"Shared filesystem root: {shared_fs_path}"
        ),
        "refactor_agent": (
            "You are a refactoring specialist. Apply the given instruction to the file and "
            "write the updated version back to disk. Return a summary of changes made."
        ),
        "test_agent": (
            "You are a test runner. Execute the given test command and return a structured "
            "result with pass/fail counts and any error output."
        )
    }

    response = client.messages.create(
        model="claude-sonnet-4-5",  # cheaper model for subagents
        max_tokens=4096,
        system=system_prompts[tool_name],
        messages=[
            {"role": "user", "content": f"Execute your task with these inputs: {tool_input}"}
        ]
    )
    
    return {
        "tool_use_id": tool_input.get("_tool_use_id"),
        "content": response.content[0].text
    }


async def dispatch_parallel_subagents(tool_calls: list, shared_fs_path: str) -> list:
    """Run all subagent calls concurrently using asyncio."""
    tasks = [
        run_subagent(call["name"], call["input"], shared_fs_path)
        for call in tool_calls
    ]
    return await asyncio.gather(*tasks)

The shared filesystem is critical here. All agents read from and write to the same directory on disk, so the refactor agent can pick up exactly what the analysis agent identified — no serialization gymnastics needed.

This pattern is similar in spirit to running parallel coding agents with Cursor and Claude, just moved into a fully programmatic API-driven loop.

Step 3: Orchestrator agent loop with shared state

Here’s a complete orchestrator loop. It sends the initial task to the orchestrator, handles tool calls by dispatching subagents in parallel, and feeds results back until the orchestrator signals it’s done.

async def run_orchestrator(task: str, shared_fs_path: str):
    messages = [{"role": "user", "content": task}]

    orchestrator_system = """
    You are a lead engineering orchestrator. Break complex tasks into parallel subtasks 
    and delegate to specialist subagents using tools. Call independent subagents in the 
    same response to maximize parallelism. Aggregate subagent results and produce a 
    final summary when all work is complete.
    """

    while True:
        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=8096,
            system=orchestrator_system,
            tools=subagent_tools,
            messages=messages
        )

        # Append orchestrator response to conversation
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason == "end_turn":
            # Orchestrator is done — extract final text
            for block in response.content:
                if hasattr(block, "text"):
                    print("Final result:\n", block.text)
            break

        if response.stop_reason == "tool_use":
            tool_calls = [
                {"name": block.name, "input": {**block.input, "_tool_use_id": block.id}, "id": block.id}
                for block in response.content
                if block.type == "tool_use"
            ]

            # Run all subagents in parallel
            results = await dispatch_parallel_subagents(tool_calls, shared_fs_path)

            # Return all results in a single user message
            tool_results = [
                {
                    "type": "tool_result",
                    "tool_use_id": r["tool_use_id"],
                    "content": r["content"]
                }
                for r in results
            ]
            messages.append({"role": "user", "content": tool_results})

Step 4: Customer support pipeline with compliance checking

A second real-world pattern — multi-step customer support with three subagents: one that researches the customer’s account history, one that drafts a response, and one that checks compliance before sending.

support_subagents = [
    {
        "name": "research_agent",
        "description": "Fetches customer account history and recent interactions from the CRM.",
        "input_schema": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string"},
                "query_context": {"type": "string"}
            },
            "required": ["customer_id"]
        }
    },
    {
        "name": "draft_agent",
        "description": "Drafts a personalized support response given customer context and the original query.",
        "input_schema": {
            "type": "object",
            "properties": {
                "customer_context": {"type": "string"},
                "original_query": {"type": "string"},
                "tone": {"type": "string", "enum": ["formal", "friendly", "technical"]}
            },
            "required": ["customer_context", "original_query"]
        }
    },
    {
        "name": "compliance_agent",
        "description": "Reviews a draft response for regulatory compliance and brand policy. Returns APPROVED or a list of required changes.",
        "input_schema": {
            "type": "object",
            "properties": {
                "draft_text": {"type": "string"},
                "jurisdiction": {"type": "string"}
            },
            "required": ["draft_text"]
        }
    }
]

In this pipeline, research_agent runs first (the orchestrator calls it alone), then draft_agent and compliance_agent run in parallel against the research output. The orchestrator merges the compliance feedback into the final draft only if the compliance agent returns changes.

Step 5: Monitor traces in the Claude Console

Every subagent call, its inputs, outputs, token usage, and latency appear as a nested trace in the Claude Console under Workbench > Agent Traces. You can drill into each span to see exactly what each subagent received and returned.

A few things to watch for in production:

Token attribution: each subagent’s cost is tracked separately — useful for optimizing which tasks can drop to claude-haiku-4
Retry behavior: if a subagent call fails, the managed infrastructure retries with exponential backoff by default — check your trace to distinguish a first-attempt success from a retried one
Parallelism confirmation: the trace view shows subagent calls that started at the same timestamp — if you see sequential calls where you expected parallel, the orchestrator likely created a dependency in its reasoning that you need to break via your system prompt

For automated monitoring in CI pipelines, the same tracing approach applies as in automated AI code review in GitHub pull requests.

If you want to understand the tool_use mechanics that underpin subagent calls, the Claude Tool Use practical guide covers function calling patterns in depth.

Key takeaways

Model selection per agent matters: use claude-opus-4 for orchestration logic and cheaper models (haiku, sonnet) for narrow subtasks — it cuts costs significantly at scale
Shared filesystems are the simplest state layer: rather than passing large blobs through tool results, write to disk and pass paths — subagents pick up where others left off without bloating your context window
Parallel dispatch requires explicit independence: the orchestrator will only call multiple tools in one response if its system prompt and task framing make clear which subtasks don’t depend on each other — be explicit about this in your orchestrator prompt