
Orchestrator-Workers: Dynamic Task Decomposition for LLM Agents

March 6, 2026 · 6 min read

Routing dispatches an input to one of a fixed set of handlers. Parallelization fans out to a fixed set of parallel branches. Both patterns share a useful property: the author names the handlers or branches at code-write time. That property becomes a limitation as soon as the number or identity of subtasks depends on the input. A research request that spans three topics needs three workers; a request that spans seven needs seven. Writing a router with a Literal over every possible subtopic is not feasible.

The orchestrator-workers pattern lifts the limitation. A central LLM, the orchestrator, plans the decomposition at runtime using structured output, spawns one worker per subtask, collects their outputs, and synthesizes a final result. Anthropic names this "the most powerful workflow pattern" in its agentic design guide and notes that its own multi-agent research system uses it, with a lead researcher on a larger model coordinating subagents on smaller ones (Anthropic, 2024). The rest of this article describes how the three roles compose and where the pattern tends to fail.

Three roles, one pattern

flowchart TD
    GOAL([User goal]) --> O[Orchestrator LLM]
    O -->|Plan: N subtasks| FAN{Dynamic fan-out}
    FAN --> W1[Worker 1]
    FAN --> W2[Worker 2]
    FAN --> WN[Worker N]
    W1 --> SYN[Synthesizer]
    W2 --> SYN
    WN --> SYN
    SYN --> OUT([Final output])

The orchestrator reads the goal and produces a plan. The plan is a list of subtask descriptions, each with enough detail for a worker to execute it independently. The plan is structured output; each element has a description and any metadata the worker needs.

The workers are LLM calls with their own prompts. Each worker sees one subtask description and returns its result. Workers run in parallel whenever their subtasks are independent, which is most of the time. Each worker has its own context window; they do not see each other's work.

The synthesizer merges the worker outputs into a single response. The synthesizer is usually another LLM call; when the merging logic is simple, a deterministic function is cheaper and more reliable.
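
When the merge is mechanical, the synthesizer can be plain code. The following is a minimal sketch, assuming each worker returns a short text result. The name merge_results and the numbered-section layout are illustrative choices, not part of the examples in the repository.

```python
def merge_results(descriptions: list[str], results: list[str]) -> str:
    """Deterministically merge worker outputs into one report (no LLM call)."""
    if len(descriptions) != len(results):
        raise ValueError("each subtask needs exactly one result")
    sections = []
    seen = set()
    for i, (desc, res) in enumerate(zip(descriptions, results), start=1):
        text = res.strip()
        # Drop empty outputs and verbatim duplicates of an earlier worker.
        if not text or text in seen:
            continue
        seen.add(text)
        sections.append(f"{i}. {desc}\n{text}")
    return "\n\n".join(sections)
```

A function like this costs nothing per call and always produces the same output for the same inputs, which is the reliability argument made above. It cannot reconcile contradictory worker outputs; that still needs an LLM synthesizer.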

The number and identity of workers are decided by the orchestrator at runtime. Nothing about the set of workers is fixed in code.

The orchestrator plans in structured output

The orchestrator is the load-bearing component. Its output is a typed plan, not prose. Bad plans produce bad fan-outs, and bad fan-outs produce expensive, incoherent syntheses. Anthropic reports the same from production experience with its research system: when subtask descriptions lacked detail, agents duplicated work, left gaps, or misinterpreted their tasks (Anthropic, 2024).

A good orchestrator prompt names three things: the goal, the constraints on the plan (maximum number of subtasks, required coverage), and the shape of each subtask. A TaskPlan with an analysis field and a list of TaskItem objects is a reasonable default.
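
The three parts of the prompt can be assembled mechanically. This is a sketch under stated assumptions: build_orchestrator_prompt, its parameters, and the exact wording are hypothetical, and a real prompt would likely need tuning against actual plans.

```python
def build_orchestrator_prompt(goal: str, max_subtasks: int = 5,
                              coverage: tuple[str, ...] = ()) -> str:
    """Assemble an orchestrator prompt from goal, constraints, and subtask shape."""
    lines = [
        f"Goal: {goal}",
        f"Constraints: produce at most {max_subtasks} independent subtasks.",
    ]
    if coverage:
        # Required coverage: topics the plan is not allowed to skip.
        lines.append("The plan must cover: " + ", ".join(coverage) + ".")
    lines.append(
        "Shape: each subtask is one self-contained description that states "
        "what to produce and the expected output format."
    )
    return "\n".join(lines)
```

The returned string would be passed as the orchestrator's instructions, with the TaskPlan schema supplied separately as the structured-output format.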

Two versions in code

The excerpt below shows the pattern without a framework. The orchestrator plans. Workers run concurrently under asyncio.gather. The synthesizer is a final LLM call that receives the worker outputs.

import asyncio
from openai import AsyncOpenAI
from pydantic import BaseModel, Field

# A single async client serves all three roles, so no call blocks the event loop.
client = AsyncOpenAI(timeout=60.0, max_retries=2)

class TaskItem(BaseModel):
    description: str = Field(description="Actionable description of the subtask")

class TaskPlan(BaseModel):
    analysis: str
    tasks: list[TaskItem]

async def plan(goal: str) -> TaskPlan:
    # Orchestrator: the response is parsed straight into a TaskPlan.
    r = await client.responses.parse(
        model="gpt-4o-mini",
        instructions="Decompose the goal into 2-5 independent subtasks. Be specific.",
        input=goal, text_format=TaskPlan,
    )
    if r.output_parsed is None:  # e.g. a refusal instead of a plan
        raise ValueError("orchestrator returned no parsable plan")
    return r.output_parsed

async def worker(subtask: TaskItem) -> str:
    # Worker: sees only its own subtask description.
    r = await client.responses.create(
        model="gpt-4o-mini",
        instructions="Complete the subtask. Return a concise result.",
        input=subtask.description,
    )
    return r.output_text

async def orchestrate(goal: str) -> str:
    p = await plan(goal)
    # Dynamic fan-out: one concurrent worker per planned subtask.
    results = await asyncio.gather(*(worker(t) for t in p.tasks))
    bundle = "\n".join(f"- {t.description}\n  {r}" for t, r in zip(p.tasks, results))
    # Synthesizer: a final LLM call over the goal, the plan analysis, and the results.
    final = await client.responses.create(
        model="gpt-4o-mini",
        instructions="Synthesize these subtask results into one coherent response.",
        input=f"Goal: {goal}\nPlan: {p.analysis}\nResults:\n{bundle}",
    )
    return final.output_text

The LangGraph version uses the Send API to emit one worker invocation per subtask. The number of workers is decided at runtime by the plan; each worker runs in isolation.

from typing import TypedDict, Annotated
from operator import add
from langgraph.graph import StateGraph, START, END
from langgraph.types import Send
from langchain.chat_models import init_chat_model

# TaskPlan is the Pydantic model defined in the previous excerpt.
model = init_chat_model("gpt-4o-mini")

class State(TypedDict):
    goal: str
    plan: TaskPlan
    results: Annotated[list[str], add]  # reducer: worker outputs are appended
    final: str

def plan_node(s: State) -> dict:
    # Return only the keys this node updates; LangGraph merges them into the state.
    return {"plan": model.with_structured_output(TaskPlan).invoke(s["goal"])}

def fan_out(s: State):
    # One Send per subtask; each worker receives only its own description.
    return [Send("worker", {"description": t.description}) for t in s["plan"].tasks]

def worker_node(task: dict) -> dict:
    return {"results": [model.invoke(f"Complete: {task['description']}").content]}

def synthesize_node(s: State) -> dict:
    bundle = "\n".join(f"- {r}" for r in s["results"])
    # Echoing the whole state back would run "results" through the add reducer
    # again and duplicate every entry, so only "final" is returned.
    return {"final": model.invoke(f"Synthesize: {bundle}").content}

graph = (StateGraph(State)
         .add_node("plan", plan_node).add_node("worker", worker_node)
         .add_node("synthesize", synthesize_node)
         .add_edge(START, "plan")
         .add_conditional_edges("plan", fan_out, ["worker"])
         .add_edge("worker", "synthesize").add_edge("synthesize", END)
         .compile())

Full runnable versions live at github.com/subodhjena/agentic-patterns under examples/11_orchestrator_workers.py and examples/11_orchestrator_workers_langgraph.py.
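
The plain-asyncio version launches every worker at once and fails as a whole if a single worker raises. One possible hardening, sketched here under assumptions, is to bound concurrency with a semaphore and collect failures instead of propagating them. The names run_workers and demo_worker and the default limit of 4 are hypothetical; demo_worker stands in for a real LLM call.

```python
import asyncio

async def run_workers(subtasks, worker, max_concurrency=4):
    """Run worker(subtask) for every subtask, at most max_concurrency at a time.

    Returns (results, errors): results[i] is None where subtask i failed,
    and errors maps the failed index to the exception it raised.
    """
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(task):
        async with sem:
            return await worker(task)

    # return_exceptions=True keeps one failure from cancelling the rest.
    outcomes = await asyncio.gather(*(guarded(t) for t in subtasks),
                                    return_exceptions=True)
    results = [None if isinstance(o, BaseException) else o for o in outcomes]
    errors = {i: o for i, o in enumerate(outcomes) if isinstance(o, BaseException)}
    return results, errors

async def demo_worker(task: str) -> str:
    # Stand-in for an LLM call; fails on purpose for tasks containing "fail".
    if "fail" in task:
        raise RuntimeError(f"worker failed on {task!r}")
    await asyncio.sleep(0)
    return task.upper()

if __name__ == "__main__":
    print(asyncio.run(run_workers(["a", "fail", "b"], demo_worker, 2)))
```

Whether a partial set of results is acceptable is a product decision. The synthesizer should at least be told which subtasks are missing, so that it does not present an incomplete answer as complete.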

Where the pattern tends to fail

Orchestrator-workers has more moving parts than routing or parallelization. Each part has its own failure mode.

Vague subtasks. The orchestrator produces subtasks like "research the topic" or "analyze the input." Workers interpret them inconsistently, duplicate work, and leave gaps. Tighten the orchestrator prompt and the TaskItem schema. Require a concrete description and, where appropriate, a required output shape per subtask.
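
A tightened schema might look like the following sketch. StrictTaskItem, the output_format field, the length thresholds, and the list of vague openers are all assumptions chosen for illustration; they are heuristics, not a guarantee of a good subtask.

```python
from pydantic import BaseModel, Field, ValidationError, field_validator

# Hypothetical openers that usually signal an under-specified subtask.
VAGUE_OPENERS = ("research the", "analyze the", "look into", "investigate the")

class StrictTaskItem(BaseModel):
    description: str = Field(
        min_length=40,
        description="What to do, on what, and within which scope",
    )
    output_format: str = Field(
        min_length=10,
        description="Shape of the result the worker must return",
    )

    @field_validator("description")
    @classmethod
    def reject_generic_phrasing(cls, value: str) -> str:
        # A short description that opens with a generic verb phrase is rejected.
        if value.lower().startswith(VAGUE_OPENERS) and len(value.split()) < 10:
            raise ValueError("description is too generic; name the scope and the deliverable")
        return value
```

A rejected plan can be sent back to the orchestrator together with the validation message, which gives the model a concrete reason to rewrite the subtask.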

Dependent subtasks treated as independent. The orchestrator emits subtasks that implicitly rely on each other ("search for articles" and "summarize the articles"). Workers run in parallel, but the second worker has no input. Either serialize the chain (prompt chaining) or structure the plan as a DAG the runtime can honor.
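
A plan structured as a DAG can be executed with the standard library. The sketch below assumes a hypothetical plan shape in which each task has an id and a list of the ids it depends on, and a worker that receives the upstream results. It runs tasks level by level, which is simpler than the maximum possible parallelism but honors every dependency.

```python
import asyncio
from graphlib import TopologicalSorter

async def run_dag(tasks, worker):
    """Execute a dependency-aware plan.

    tasks:  {task_id: (description, [ids this task depends on])}
    worker: async callable (description, upstream_results: dict) -> str
    """
    unknown = {d for _, deps in tasks.values() for d in deps} - set(tasks)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")

    sorter = TopologicalSorter({tid: deps for tid, (_, deps) in tasks.items()})
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies

    results = {}
    while sorter.is_active():
        ready = sorter.get_ready()  # every task whose dependencies are done
        outputs = await asyncio.gather(*(
            worker(tasks[tid][0], {d: results[d] for d in tasks[tid][1]})
            for tid in ready
        ))
        for tid, out in zip(ready, outputs):
            results[tid] = out
            sorter.done(tid)
    return results
```

With a plan such as {"search": ("find articles", []), "summarize": ("summarize the articles", ["search"])}, the summarize worker only starts after search has finished and receives its output under the key "search".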

Runaway fan-out. The orchestrator produces ten subtasks when three would suffice. Cost explodes, synthesis becomes harder, and quality drops. Cap the plan size in the schema or in the prompt.
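
Capping the plan in the schema could be sketched as follows, assuming Pydantic v2. CappedTaskPlan, check_plan, and the budget of five subtasks are hypothetical. Whether a model provider enforces the bound during generation varies, so the sketch validates locally after the plan comes back.

```python
from typing import Optional
from pydantic import BaseModel, Field, ValidationError

MAX_SUBTASKS = 5  # assumed budget; tune per application

class TaskItem(BaseModel):
    description: str

class CappedTaskPlan(BaseModel):
    analysis: str
    # In Pydantic v2, min_length/max_length on a list constrain its item count.
    tasks: list[TaskItem] = Field(min_length=1, max_length=MAX_SUBTASKS)

def check_plan(raw: dict) -> Optional[CappedTaskPlan]:
    """Return the validated plan, or None if it is empty or over budget."""
    try:
        return CappedTaskPlan.model_validate(raw)
    except ValidationError:
        return None
```

Rejecting an oversized plan and asking the orchestrator to consolidate is usually safer than truncating it, because dropping the last subtasks silently removes coverage.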

Synthesizer overload. The synthesizer receives too much text and produces a shallow summary. Either trim the worker outputs with structured return types, or run the synthesis as a second-stage decomposition rather than a single call.
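
The second-stage idea can be sketched as a batched reduction. The function name synthesize_in_stages and the batch size are assumptions, and summarize is a placeholder for whatever LLM call merges a handful of texts into one.

```python
def synthesize_in_stages(outputs, summarize, batch_size=4):
    """Reduce worker outputs in batches until a single text remains.

    summarize: callable taking a list of texts and returning one merged text
               (an LLM call in practice).
    """
    if not outputs:
        raise ValueError("nothing to synthesize")
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2 for the reduction to shrink")
    layer = list(outputs)
    while True:
        # Each call sees at most batch_size texts, so no single prompt is overloaded.
        layer = [summarize(layer[i:i + batch_size])
                 for i in range(0, len(layer), batch_size)]
        if len(layer) == 1:
            return layer[0]
```

The trade-off is extra calls and extra latency: nine outputs with a batch size of four take three first-stage calls and one final call.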

Workers that duplicate the orchestrator's job. A worker that replans its own subtask is a signal that the orchestrator is under-specifying. Fix the orchestrator; do not give workers orchestrator latitude.

No fallback on plan failure. A malformed plan should surface as an error, not a silent misroute. Treat the orchestrator output as untrusted input and validate it before spawning workers.
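
Validation before fan-out can be as small as the following stdlib-only sketch. PlanError and parse_plan are hypothetical names, and the expected JSON shape (a "tasks" list of objects with a "description") mirrors the TaskPlan model used in this article. With Pydantic, TaskPlan.model_validate_json would cover the structural checks; the duplicate check is an extra semantic rule.

```python
import json

class PlanError(ValueError):
    """Raised when orchestrator output cannot be trusted as a plan."""

def parse_plan(raw: str) -> list[str]:
    """Return the subtask descriptions, or raise PlanError before any worker starts."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise PlanError(f"plan is not valid JSON: {exc}") from exc

    tasks = data.get("tasks") if isinstance(data, dict) else None
    if not isinstance(tasks, list) or not tasks:
        raise PlanError("plan must contain a non-empty 'tasks' list")

    descriptions = []
    for i, task in enumerate(tasks):
        desc = task.get("description") if isinstance(task, dict) else None
        if not isinstance(desc, str) or not desc.strip():
            raise PlanError(f"task {i} has no usable description")
        descriptions.append(desc.strip())

    # Two workers given the same description would only duplicate cost.
    if len({d.lower() for d in descriptions}) != len(descriptions):
        raise PlanError("plan contains duplicate subtasks")
    return descriptions
```

The caller decides what to do with a PlanError: retry the orchestrator with the error message, fall back to a single-call answer, or surface the failure to the user.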

When to choose this over routing or parallelization

The pattern sits between routing (a single specialist per input) and full agents (an open-ended loop). The table compares it with the two workflow patterns whose structure is fixed in code.

Property	Routing	Parallelization	Orchestrator-workers
Who decides the set of subtasks	Author, at compile time	Author, at compile time	Orchestrator LLM, at runtime
Number of downstream calls	One	Fixed N	Variable N per input
Subtasks are	Mutually exclusive handlers	Fixed aspects or replicas	Input-dependent decomposition
Synthesis	None	Deterministic merge	LLM call or deterministic merge
Cost profile	One extra call (classifier)	N parallel calls	Orchestrator + N workers + synthesizer

Reach for orchestrator-workers when the set of subtasks genuinely depends on the input. If the same independent subtasks would serve every input, the decomposition is fixed: it is a fan-out, and the right pattern is parallelization. If the same sequence of dependent steps would serve every input, it is a chain, and the right pattern is prompt chaining.

Neighbors in the series

Parallelization, the previous article, is the pattern orchestrator-workers generalizes. Prompt chaining is the pattern to reach for when subtasks depend on each other. The Supervisor and Router pattern, covered in the Multi-Agent stage, is orchestrator-workers with agents rather than LLM calls as workers. Plan-and-execute, in the Agents stage, looks similar but differs in two ways: the planner works with persistent state across steps, and the workers are usually tool calls rather than independent LLM subtasks.

References

Anthropic. Building effective agents. December 2024.
Anthropic. How we built our multi-agent research system. 2024.
LangChain. Map-reduce and the Send API in LangGraph. 2024.
Wu, Qingyun, et al. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. 2023.
Chen, Weize, et al. AgentVerse: Facilitating Multi-Agent Collaboration. 2023.
