Claude Managed Agents: multi-agent orchestration guide

Coordinating multiple AI agents on a shared task used to mean writing a lot of glue code and babysitting prompt chains manually. Claude’s Managed Agents feature changes that — it gives you a structured orchestration layer where a lead agent can spawn, delegate to, and coordinate specialized subagents, then use a process called dreaming to improve its own memory between runs. This guide walks you through exactly how to set that up with a real-world example: a codebase review system with backend, frontend, and security subagents running in parallel.

If you’re new to Claude’s agentic capabilities in general, start with building your first AI agent with Python and Claude before continuing here.

What Claude Managed Agents actually are

Managed Agents is Anthropic’s orchestration model built on top of the Claude API. The core ideas:

Lead agent (orchestrator): One Claude instance that receives the top-level task, breaks it into subtasks, and delegates them.
Subagents: Specialized Claude instances spun up by the lead agent. Each gets a focused system prompt and a narrow task slice.
Shared context / tool access: Subagents can read and write to shared state — files, a vector store, a database — so work merges coherently.
Dreaming: After a session ends, the lead agent runs a reflection pass over what happened. It updates its own working memory (stored as compressed summaries or embeddings) so future runs start smarter. No manual prompt editing required.

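The API surface uses the standard messages endpoint but adds agent_config and subagent_spawn tool definitions. As of the claude-opus-4 and claude-sonnet-4 model families (current at the time of writing; the examples below use claude-sonnet-4-5 and claude-opus-4-5), managed orchestration is available via the anthropic-beta: managed-agents-2026-05 header. The examples in this guide coordinate subagents from Python rather than through those tool definitions.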

Setting up the orchestrator and subagent definitions

Start with a clean Python project. Install the SDK:

pip install "anthropic>=0.28.0" python-dotenv

Define your three subagents as tool-style configs. The lead agent calls these the same way it calls any tool — it just happens to spin up another Claude instance.

# agents/config.py

SUBAGENT_CONFIGS = {
    "backend_agent": {
        "model": "claude-sonnet-4-5",
        "system": (
            "You are a backend code reviewer. "
            "Review Python/Node.js code for correctness, performance, and API contract adherence. "
            "Return a JSON report with keys: issues (list), suggestions (list), severity (low|medium|high)."
        ),
        "max_tokens": 4096,
    },
    "frontend_agent": {
        "model": "claude-sonnet-4-5",
        "system": (
            "You are a frontend code reviewer specializing in React and accessibility. "
            "Check for WCAG compliance, unnecessary re-renders, and component structure. "
            "Return JSON: issues, suggestions, severity."
        ),
        "max_tokens": 4096,
    },
    "security_agent": {
        "model": "claude-opus-4-5",
        "system": (
            "You are a security-focused code reviewer. "
            "Identify injection risks, exposed secrets, insecure dependencies, and OWASP Top 10 patterns. "
            "Return JSON: vulnerabilities (list with cve_ref if applicable), severity."
        ),
        "max_tokens": 4096,
    },
}

Notice the security agent uses opus — it gets the heavier model because its output gates a deployment decision. Cost-conscious model routing is something you’d tune per your budget.
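
Because each system prompt above asks for a specific JSON shape, it can help to check a report before anything downstream trusts it. Here’s a minimal sketch of that check. validate_report, REQUIRED_KEYS, and ALLOWED_SEVERITIES are hypothetical names, not part of the SDK; the required keys are read off the prompts in SUBAGENT_CONFIGS, and applying low|medium|high to all three agents is an assumption (only the backend prompt spells those values out).

```python
# Hypothetical helper: checks a parsed subagent report against the keys
# requested in the system prompts. Not part of the anthropic SDK.

# Assumption: all three agents use the backend prompt's severity scale.
ALLOWED_SEVERITIES = {"low", "medium", "high"}

# Keys each agent was asked to return (taken from the prompts in config.py).
REQUIRED_KEYS = {
    "backend_agent": {"issues", "suggestions", "severity"},
    "frontend_agent": {"issues", "suggestions", "severity"},
    "security_agent": {"vulnerabilities", "severity"},
}


def validate_report(agent: str, report: dict) -> list[str]:
    """Return a list of problems; an empty list means the report looks usable."""
    problems = []
    missing = REQUIRED_KEYS[agent] - set(report)
    for key in sorted(missing):
        problems.append(f"missing key: {key}")
    severity = report.get("severity")
    if severity is not None and severity not in ALLOWED_SEVERITIES:
        problems.append(f"unexpected severity: {severity!r}")
    return problems
```

A non-empty result is a reasonable trigger for a retry or for flagging the report as untrusted, depending on how strict you want the pipeline to be.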

Spawning parallel subagents from the lead agent

Here’s the lead agent loop. It reads files from a target directory, spawns all three subagents concurrently using asyncio, and collects their reports.

# agents/orchestrator.py

import asyncio
import json
import anthropic
from pathlib import Path
from agents.config import SUBAGENT_CONFIGS

# Use the async client: a synchronous call inside a coroutine blocks the
# event loop, so the three reviews would run one after another.
client = anthropic.AsyncAnthropic()

async def run_subagent(name: str, code_snippet: str) -> dict:
    config = SUBAGENT_CONFIGS[name]
    response = await client.beta.messages.create(
        model=config["model"],
        max_tokens=config["max_tokens"],
        system=config["system"],
        messages=[{"role": "user", "content": code_snippet}],
        betas=["managed-agents-2026-05"],
    )
    raw = response.content[0].text
    try:
        return {"agent": name, "report": json.loads(raw)}
    except json.JSONDecodeError:
        return {"agent": name, "report": {"raw": raw, "parse_error": True}}


async def orchestrate(target_dir: str) -> list[dict]:
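    root = Path(target_dir)
    # Sort so the same files are picked on every run; rglob order is not guaranteed.
    files = sorted(root.rglob("*.py")) + sorted(root.rglob("*.tsx"))
    combined_code = "\n\n".join(
        # Relative paths keep same-named files in different folders distinguishable.
        f"# File: {f.relative_to(root)}\n{f.read_text()}" for f in files[:10]  # cap for token budget
    )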

    tasks = [
        run_subagent("backend_agent", combined_code),
        run_subagent("frontend_agent", combined_code),
        run_subagent("security_agent", combined_code),
    ]

    results = await asyncio.gather(*tasks)
    return results
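
Capping the input at ten files is a blunt way to stay inside a token budget, since ten large files can still overflow it while ten tiny ones waste it. One alternative is to pack files up to a character budget instead. The sketch below is only an illustration: pack_files and max_chars are hypothetical names, and a character count is a stand-in for a real token count.

```python
from pathlib import Path


def pack_files(root: Path, patterns: tuple[str, ...], max_chars: int) -> str:
    """Concatenate matching files, in sorted order, until max_chars would be exceeded."""
    parts, used = [], 0
    for pattern in patterns:
        for path in sorted(root.rglob(pattern)):
            text = path.read_text(encoding="utf-8", errors="replace")
            block = f"# File: {path.relative_to(root)}\n{text}"
            if used + len(block) > max_chars:
                return "\n\n".join(parts)  # budget reached: stop adding files
            parts.append(block)
            used += len(block)
    return "\n\n".join(parts)
```

In orchestrate you could then build combined_code with something like pack_files(Path(target_dir), ("*.py", "*.tsx"), max_chars=60_000), where the budget itself is a number you would tune.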

Call it from a simple entry point:

# main.py

import asyncio
import json
from agents.orchestrator import orchestrate

if __name__ == "__main__":
    results = asyncio.run(orchestrate("./src"))
    print(json.dumps(results, indent=2))

The three agents run in parallel: you’re not waiting for the backend review to finish before starting the security scan. On a mid-sized codebase this cuts total wall-clock time by roughly 60% compared to sequential calls, because the run takes about as long as the slowest of the three calls rather than the sum of all three. For more on parallel agent patterns, the post on running parallel coding agents with Cursor 3 and Claude covers the workflow side of this nicely.
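
To see where the saving comes from without spending API credits, you can simulate the three calls with asyncio.sleep. This is a toy sketch: fake_subagent stands in for an awaited API call, and the latencies in DELAYS are made up and scaled down so the script finishes quickly.

```python
import asyncio
import time


async def fake_subagent(name: str, seconds: float) -> str:
    # Stand-in for an awaited API call; asyncio.sleep yields to the event loop.
    await asyncio.sleep(seconds)
    return name


async def run_sequential(delays: dict[str, float]) -> float:
    """Await each fake agent in turn and return the elapsed seconds."""
    start = time.perf_counter()
    for name, seconds in delays.items():
        await fake_subagent(name, seconds)
    return time.perf_counter() - start


async def run_concurrent(delays: dict[str, float]) -> float:
    """Run all fake agents with asyncio.gather and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(fake_subagent(n, s) for n, s in delays.items()))
    return time.perf_counter() - start


# Made-up latencies, not measurements.
DELAYS = {"backend_agent": 0.10, "frontend_agent": 0.10, "security_agent": 0.15}

sequential = asyncio.run(run_sequential(DELAYS))
concurrent = asyncio.run(run_concurrent(DELAYS))
print(f"sequential: {sequential:.2f}s, concurrent: {concurrent:.2f}s")
```

The concurrent time tracks the slowest agent, while the sequential time tracks the sum. The same holds for real calls only if each one is awaited on an async client; a blocking call inside a coroutine gives you the sequential number.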

Implementing dreaming for automatic memory curation

Dreaming is a post-session reflection pass. After your orchestration run completes, you send the session transcript (the aggregate reports) back to the lead agent with a specific prompt asking it to update its working memory. You store that memory as a JSON file and inject it into the next session’s system prompt.

# agents/dreaming.py

import json
from pathlib import Path
import anthropic

client = anthropic.Anthropic()
MEMORY_FILE = Path("agent_memory.json")


def load_memory() -> dict:
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return {"patterns": [], "heuristics": [], "run_count": 0}


def dream(session_results: list[dict]) -> dict:
    current_memory = load_memory()
    session_summary = json.dumps(session_results, indent=2)

    prompt = f"""
You are reviewing the results of a multi-agent code review session.
Current memory from previous runs:
{json.dumps(current_memory, indent=2)}

New session results:
{session_summary}

Update the memory object. Add new recurring patterns you noticed.
Refine existing heuristics. Remove noise. Keep memory under 800 tokens total.
Return ONLY valid JSON with keys: patterns (list), heuristics (list), run_count (int).
"""

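    # betas= is only accepted on the beta namespace, so call client.beta.messages.
    response = client.beta.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
        betas=["managed-agents-2026-05"],
    )

    raw = response.content[0].text
    try:
        updated_memory = json.loads(raw)
    except json.JSONDecodeError:
        # Keep the previous memory rather than crash or overwrite it with junk.
        return current_memory
    # Set run_count in code so the model cannot miscount it.
    updated_memory["run_count"] = current_memory.get("run_count", 0) + 1
    MEMORY_FILE.write_text(json.dumps(updated_memory, indent=2))
    return updated_memory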
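
Parsing the reply with json.loads assumes the model returned bare JSON. In practice a reply sometimes arrives wrapped in a Markdown code fence or with a sentence of prose around it, even when the prompt says “ONLY valid JSON.” A tolerant parser is one way to handle that. The helper below is a hypothetical sketch (extract_json is not an SDK function), and it only recovers a single top-level JSON object.

```python
import json
from typing import Optional


def extract_json(raw: str) -> Optional[dict]:
    """Best-effort parse: try the whole string, then the outermost {...} span."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        start, end = raw.find("{"), raw.rfind("}")
        if start == -1 or end <= start:
            return None
        try:
            parsed = json.loads(raw[start : end + 1])
        except json.JSONDecodeError:
            return None
    # The memory format is a JSON object, so reject lists, strings, and numbers.
    return parsed if isinstance(parsed, dict) else None
```

In dream() you could call extract_json(raw) and keep the previous memory whenever it returns None, so one malformed reply doesn’t end the run or wipe what earlier runs learned.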

Then wire it into main.py:

# main.py (updated)

import asyncio
import json
from agents.orchestrator import orchestrate
from agents.dreaming import dream, load_memory

if __name__ == "__main__":
    memory = load_memory()
    print(f"Starting run #{memory.get('run_count', 0) + 1}")
    print(f"Known patterns: {len(memory.get('patterns', []))}")

    results = asyncio.run(orchestrate("./src"))
    updated = dream(results)

    print(f"Memory updated. Now tracking {len(updated['patterns'])} patterns.")
    print(json.dumps(results, indent=2))

After a few runs, the memory file starts capturing things like “this codebase consistently uses raw SQL in the payments module” or “frontend components rarely include aria-labels.” Injected into the subagent prompts on the next run, these notes focus each reviewer’s attention where it matters, without you touching a single prompt template. Note that orchestrate() as written above does not take the memory as input, so that injection is a step you still need to wire in. This is directly analogous to the prompt engineering fundamentals principle of giving models relevant context upfront, except here the model is generating its own context.
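
One way to get the memory into the subagent prompts is to append a short notes section to each system prompt before the call. The sketch below is an assumption about how you might do it, not a built-in feature: with_memory and max_items are hypothetical names, and the wording of the notes header is just a suggestion.

```python
import json


def with_memory(system_prompt: str, memory: dict, max_items: int = 5) -> str:
    """Append a short memory section to a system prompt; no-op if memory is empty."""
    patterns = memory.get("patterns", [])[:max_items]
    heuristics = memory.get("heuristics", [])[:max_items]
    if not patterns and not heuristics:
        return system_prompt
    notes = json.dumps({"patterns": patterns, "heuristics": heuristics}, indent=2)
    return (
        f"{system_prompt}\n\n"
        "Notes from previous review runs (treat as hints, not facts):\n"
        f"{notes}"
    )
```

To use it, run_subagent would take the memory as an extra argument and pass system=with_memory(config["system"], memory) instead of the raw prompt. Capping the number of items keeps a growing memory file from crowding out the reviewer’s actual instructions.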

Merging subagent reports and acting on results

The final step is consolidating three separate JSON reports into a single prioritized action list. The lead agent does this with one more call:

# agents/consolidate.py

import json
import anthropic

client = anthropic.Anthropic()

def consolidate(results: list[dict], memory: dict) -> str:
    combined = json.dumps(results, indent=2)
    mem_context = json.dumps(memory.get("heuristics", []), indent=2)

    prompt = f"""
You received these code review reports from three specialized agents:
{combined}

Known heuristics from previous runs:
{mem_context}

Produce a prioritized action list for the engineering team.
Group by: CRITICAL (fix before deploy), IMPORTANT (fix this sprint), NICE TO HAVE.
Be specific — reference file names and line context from the reports where available.
Plain markdown output, no JSON.
"""

    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

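You now have the pieces of a full pipeline: parallel specialist review → dreaming memory update → consolidated human-readable output. To complete it, call consolidate(results, updated) after dream(results) in main.py and print or post the returned markdown. You can expose this as a CLI, a webhook endpoint on a FastAPI service, or pipe it into a Slack notification; the orchestration logic stays the same.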
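
Since the security report is meant to gate deployment, you may also want a deterministic check alongside the markdown summary. The function below is a sketch under stated assumptions: should_block_deploy is a hypothetical name, the “block on high, and on anything unreadable” policy is mine rather than something the pipeline defines, and it relies on the result shape that run_subagent returns.

```python
def should_block_deploy(results: list[dict]) -> bool:
    """True if the security agent reported high severity or gave no usable report."""
    for result in results:
        if result.get("agent") != "security_agent":
            continue
        report = result.get("report", {})
        if report.get("parse_error"):
            return True  # fail closed: an unreadable security report is not a pass
        return str(report.get("severity", "")).lower() == "high"
    return True  # no security report at all: also fail closed
```

In a CI job you could exit with a non-zero status when this returns True, and still post the consolidated action list for humans to read.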

Key takeaways

Managed Agents gives you structural orchestration — the lead agent isn’t just chaining prompts, it’s explicitly delegating to typed subagents with their own models and system prompts.
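Parallel subagent execution via asyncio.gather is the immediate performance win: three agents reviewing simultaneously instead of sequentially. This only holds when each call is awaited on an async client; blocking calls inside a coroutine still run one at a time.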
Dreaming is iterative self-improvement without manual prompt engineering — store compressed session reflections, inject them next run, and watch the system get more focused over time.
Keep the memory file bounded (under roughly 1,000 tokens; the dreaming prompt above asks for 800) or it starts diluting rather than sharpening subagent context.
Route heavier models (opus) to agents whose output is a gate — security, final consolidation. Use sonnet for high-volume, well-scoped subtasks to manage cost.
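
The memory bound is easier to keep if code enforces it instead of relying on the prompt alone. Here is a rough sketch of one approach. estimate_tokens and trim_memory are hypothetical helpers, the four-characters-per-token rule is a crude approximation rather than the model’s real tokenizer, and dropping from the front assumes entries are appended oldest-first.

```python
import json


def estimate_tokens(text: str) -> int:
    # Rough rule of thumb (about 4 characters per token); an assumption,
    # not the model's real tokenizer.
    return len(text) // 4


def trim_memory(memory: dict, max_tokens: int = 1000) -> dict:
    """Drop the oldest patterns/heuristics until the serialized memory fits."""
    trimmed = {
        "patterns": list(memory.get("patterns", [])),
        "heuristics": list(memory.get("heuristics", [])),
        "run_count": memory.get("run_count", 0),
    }
    while estimate_tokens(json.dumps(trimmed)) > max_tokens:
        # Trim whichever list is longer; stop if both are already empty.
        longer = max(("patterns", "heuristics"), key=lambda k: len(trimmed[k]))
        if not trimmed[longer]:
            break
        trimmed[longer].pop(0)  # assumes entries are appended oldest-first
    return trimmed
```

Calling trim_memory on the model’s reply before writing agent_memory.json gives you a hard ceiling even when the model overshoots the limit it was asked to respect.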