Plan-and-Execute: Two-Phase Agents That Plan First, Then Act
A ReAct agent plans and acts in the same breath. Each turn produces a brief thought, then a tool call, then an observation, then another thought, and so on. The pattern works and ships widely, but it has a specific cost: the plan is re-derived at every turn, even when nothing has changed. For tasks with clear structure, where the steps are genuinely knowable in advance, that constant replanning is overhead.
Plan-and-execute separates the two phases. A planner LLM produces a complete, step-by-step plan up front. An executor, typically a smaller ReAct-style agent, works through the plan one step at a time. Optionally, a replanner checks after each step whether the remaining plan still makes sense and updates it when it does not. The idea traces to Plan-and-Solve prompting (Wang et al., 2023), which showed that asking a model to devise a plan before solving improves zero-shot reasoning; agent frameworks such as LangChain then adapted it into a planner-executor architecture, in part to reduce token cost on long-horizon tasks while keeping some of the self-correction ReAct is known for. Anthropic's Building effective agents describes a closely related workflow, orchestrator-workers, in which one model breaks a task down and delegates the pieces.
Two phases
The planner's job is to look at the goal and produce a plan that an executor can follow. The plan is structured: a list of concrete steps, each with enough context to stand alone. The planner runs once per task; its cost is higher per call than an executor step, because it uses more context and often a stronger model, but it runs only at the boundaries of execution.
The executor's job is to carry out one step of the plan at a time, using the tool-calling loop covered in the previous article. The executor knows nothing about steps further down the plan; it sees only the current step and the accumulated state. That isolation is deliberate. It keeps each step's context window focused and forces the planner to specify every step well enough to stand alone.
The optional replanner inspects the state after each step. If the observed result matches expectation, the plan continues unchanged. If something surprising happened, the replanner revises the remaining plan. The replanner is usually a copy of the planner with a different prompt.
The shape
```mermaid
flowchart TD
    GOAL([User goal]) --> P[Planner LLM]
    P --> PLAN[Plan: step 1, step 2, step 3]
    PLAN --> E[Executor agent]
    E -->|step result| S{All steps done?}
    S -->|no, plan holds| E
    S -->|no, surprise| RP[Replanner]
    RP --> PLAN
    S -->|yes| OUT([Final output])
```
The inner loop goes through the plan; the outer loop runs only when a surprise demands a replan. A run in which every step succeeds as predicted terminates after one planner call, N executor calls, and one synthesis step. A run with several surprises pays for replans proportionally.
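The call arithmetic above can be written down as a small helper. This is only a sketch under simplifying assumptions: each plan step counts as one executor call (a real executor may make several model calls inside a step), each surprise triggers exactly one replan, and the names `CallCount` and `count_calls` are illustrative rather than part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallCount:
    planner: int    # initial plan plus any replans
    executor: int   # one per executed step (assumption)
    synthesis: int  # final answer assembly

    @property
    def total(self) -> int:
        return self.planner + self.executor + self.synthesis


def count_calls(steps_executed: int, replans: int = 0) -> CallCount:
    """Count calls for one run: 1 plan + replans, N steps, 1 synthesis."""
    return CallCount(planner=1 + replans, executor=steps_executed, synthesis=1)


print(count_calls(5).total)             # 7: no surprises
print(count_calls(6, replans=2).total)  # 10: two surprises, one extra step
```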
Two versions in code
The excerpt below is the pattern without a framework. The planner returns a typed plan, and the executor runs a minimal, single-shot tool call on one step at a time. Replanning is omitted for clarity.
```python
import json

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()


class PlanStep(BaseModel):
    description: str


class Plan(BaseModel):
    steps: list[PlanStep]


def make_plan(goal: str) -> Plan:
    # Planner: one structured-output call that returns a typed Plan.
    r = client.responses.parse(
        model="gpt-4o-mini",
        instructions="Produce a step-by-step plan. Each step standalone and concrete.",
        input=goal,
        text_format=Plan,
    )
    return r.output_parsed


def execute_step(step: str, state: str, tools: list, tools_impl: dict) -> str:
    # Executor: sees only the current step and the accumulated state.
    messages = [
        {"role": "system", "content": "Execute the current step. Use tools when needed."},
        {"role": "user", "content": f"State so far:\n{state}\n\nCurrent step: {step}"},
    ]
    r = client.chat.completions.create(
        model="gpt-4o-mini", messages=messages, tools=tools)
    msg = r.choices[0].message
    if not msg.tool_calls:
        return msg.content or ""
    # Minimal one-shot tool execution; a full executor would loop here.
    call = msg.tool_calls[0]
    args = json.loads(call.function.arguments)
    return str(tools_impl[call.function.name](**args))


def run(goal: str, tools: list, tools_impl: dict) -> str:
    plan = make_plan(goal)
    state = ""
    for step in plan.steps:
        result = execute_step(step.description, state, tools, tools_impl)
        state += f"\nStep: {step.description}\nResult: {result}"
    return state
```
The LangGraph version separates the planner and executor as nodes. The executor is itself a prebuilt ReAct agent; the planner is a model call with structured output. A conditional edge after each step decides whether to run the next step or finish; a replanner would be one more node reachable from that same edge.
```python
from typing import TypedDict
from langchain.chat_models import init_chat_model
from langchain_core.tools import tool
from langgraph.graph import StateGraph, START, END
from langgraph.prebuilt import create_react_agent
from pydantic import BaseModel

class PlanStep(BaseModel):
    description: str

class Plan(BaseModel):  # same models as in the framework-free version
    steps: list[PlanStep]

@tool
def search(query: str) -> str:
    """Placeholder search tool; swap in a real implementation."""
    return f"(no results for {query!r})"

model = init_chat_model("gpt-4o-mini")
executor = create_react_agent(model=model, tools=[search])

class State(TypedDict):
    goal: str
    plan: Plan
    state: str
    step_idx: int

def plan_node(s: State) -> State:
    plan = model.with_structured_output(Plan).invoke(f"Plan step-by-step for: {s['goal']}")
    return {**s, "plan": plan, "step_idx": 0, "state": ""}

def execute_node(s: State) -> State:
    step = s["plan"].steps[s["step_idx"]].description
    out = executor.invoke({"messages": [("user", f"State: {s['state']}\nStep: {step}")]})
    tail = out["messages"][-1].content
    return {**s, "state": s["state"] + f"\n- {step}: {tail}",
            "step_idx": s["step_idx"] + 1}

def cont(s: State) -> str:
    # Route back to the executor until every step has run.
    return "done" if s["step_idx"] >= len(s["plan"].steps) else "execute"

graph = (StateGraph(State)
         .add_node("plan", plan_node).add_node("execute", execute_node)
         .add_edge(START, "plan").add_edge("plan", "execute")
         .add_conditional_edges("execute", cont,
                                {"execute": "execute", "done": END})
         .compile())
```
Full runnable versions live at github.com/subodhjena/agentic-patterns under examples/09_plan_and_execute.py and examples/09_plan_and_execute_langgraph.py.
When the plan must adapt
A plan made before acting is a plan made without knowing what the first action will return. The gap between expected and observed results is unavoidable, and a pattern that cannot close it is brittle. Plan-and-execute handles this in one of three ways, and teams pick the one that matches their latency and cost budget.
Static plans with retry. The plan is made once. If a step fails, the executor retries with a modified prompt but does not change the plan. This is the cheapest variant and works when surprises are rare and local.
Replanning. After each step, a replanner reads the current state and either keeps the remaining plan or rewrites it. Replanning is more robust, but in the worst case it adds one planner-class call per executed step. The variant to adopt depends on how often reality disagrees with the initial plan; measure before defaulting.
A third, middle variant keeps the plan but appends new steps at the end when surprises emerge. It avoids the full-replanner cost but lets the plan grow to absorb new information.
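The three variants differ only in what happens to the remaining plan after each step, so one loop can express all of them. The sketch below assumes the executor and replanner are plain callables (in practice they would wrap model calls); `run_with_replanning`, `fake_execute`, and `fake_replan` are hypothetical names, and the stubs are deterministic stand-ins so the control flow can be followed without an API key.

```python
from typing import Callable

Executor = Callable[[str, str], str]               # (step, state) -> result
Replanner = Callable[[str, list[str]], list[str]]  # (state, remaining) -> remaining


def run_with_replanning(plan: list[str], execute: Executor,
                        replan: Replanner, max_steps: int = 20) -> str:
    """Execute steps in order, letting the replanner revise what is left."""
    remaining = list(plan)
    state = ""
    executed = 0
    # max_steps guards against a replanner that keeps adding work forever.
    while remaining and executed < max_steps:
        step = remaining.pop(0)
        result = execute(step, state)
        state += f"\nStep: {step}\nResult: {result}"
        executed += 1
        remaining = replan(state, remaining)
    return state


def fake_execute(step: str, state: str) -> str:
    # Stub: fetching fails until the id has been asked for.
    if step == "fetch order" and "ask for id" not in state:
        return "error: missing id"
    return "ok"


def fake_replan(state: str, remaining: list[str]) -> list[str]:
    # Stub: on the known surprise, insert a repair step and retry.
    if state.endswith("error: missing id"):
        return ["ask for id", "fetch order"] + remaining
    return remaining


final_state = run_with_replanning(["fetch order", "send summary"],
                                  fake_execute, fake_replan)
print(final_state)
```

Passing `lambda state, remaining: remaining` as the replanner gives the static variant, and a replanner that only ever returns `remaining + [new_step]` gives the append-only middle variant.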
Where the pattern tends to fail
Plan-and-execute is less forgiving than ReAct on some input types. The failure modes below are the ones that matter in practice.
Plans written at the wrong altitude. A plan at the "do everything" level is useless; a plan that enumerates every tool call is brittle. Tune the planner prompt toward five to ten concrete steps; more than that is a signal to use a hierarchical planner.
Executor outgrowing its step. An executor that spawns its own multi-step reasoning inside one plan step defeats the separation. Keep executor budgets tight (three to five tool calls per step). If a step needs more, the plan was wrong.
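One way to enforce such a budget is to count tool calls inside the executor loop and fail loudly when a step asks for more. This is a sketch, not any framework's API: `ToolCall`, `Final`, `StepBudgetExceeded`, and `execute_with_budget` are hypothetical names, and `decide` stands in for the model call that chooses the next action.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class Final:
    text: str


Action = Union[ToolCall, Final]


class StepBudgetExceeded(RuntimeError):
    """Raised when one plan step asks for more tool calls than allowed."""


def execute_with_budget(step: str,
                        decide: Callable[[str, list[str]], Action],
                        tools: dict[str, Callable[..., object]],
                        max_tool_calls: int = 5) -> str:
    observations: list[str] = []
    while True:
        action = decide(step, observations)
        if isinstance(action, Final):
            return action.text
        if len(observations) >= max_tool_calls:
            # Surface the overrun so the caller can replan instead of looping.
            raise StepBudgetExceeded(
                f"step {step!r} wanted more than {max_tool_calls} tool calls")
        observations.append(str(tools[action.name](**action.args)))
```

Raising rather than silently truncating gives the outer loop a clear signal that the step was specified too coarsely.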
State bloat. Each step appends to a running state that the executor sees on the next step. Without pruning, the state eventually outgrows the context window. Summarize older steps or retain only the results the planner flagged as relevant.
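A minimal pruning sketch, assuming the run keeps its history as a list of (step, result) pairs rather than one growing string. Here "summarizing" is plain truncation to keep the example deterministic; a real system might call a model to summarize instead. The function name `render_state` and its parameters are illustrative.

```python
def render_state(history: list[tuple[str, str]],
                 keep_last: int = 2,
                 summary_chars: int = 60) -> str:
    """Render recent steps in full and older steps as one-line summaries."""
    cut = max(len(history) - max(keep_last, 0), 0)
    older, recent = history[:cut], history[cut:]
    lines = []
    for step, result in older:
        # Truncation stands in for a real summarizer (assumption).
        short = result[:summary_chars]
        suffix = "..." if len(result) > summary_chars else ""
        lines.append(f"- {step}: {short}{suffix}")
    for step, result in recent:
        lines.append(f"Step: {step}\nResult: {result}")
    return "\n".join(lines)


history = [("a", "x" * 100), ("b", "short"), ("c", "y" * 100)]
print(render_state(history, keep_last=1, summary_chars=10))
```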
Plans with hidden dependencies. A plan that is really a DAG, where step three depends on the output of step two in ways the executor cannot infer from the text, requires structured state passing rather than free-text accumulation.
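Structured state passing can be sketched by giving each step an id and an explicit list of dependencies, then handing the executor only the outputs it declared. The example uses the standard library's `graphlib.TopologicalSorter` for ordering; `DagStep`, `run_dag`, and the stub executor are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable


@dataclass
class DagStep:
    id: str
    description: str
    depends_on: list[str] = field(default_factory=list)


def run_dag(steps: list[DagStep],
            execute: Callable[[str, dict[str, object]], object]) -> dict[str, object]:
    """Run steps in dependency order, passing each one its declared inputs."""
    by_id = {s.id: s for s in steps}
    # Maps each step id to its predecessors; raises CycleError on a cycle.
    order = TopologicalSorter({s.id: set(s.depends_on) for s in steps}).static_order()
    results: dict[str, object] = {}
    for step_id in order:
        step = by_id[step_id]
        inputs = {dep: results[dep] for dep in step.depends_on}
        results[step_id] = execute(step.description, inputs)
    return results


def stub_execute(description: str, inputs: dict[str, object]) -> object:
    # Deterministic stand-in for a model-backed executor.
    if description == "fetch prices":
        return [2, 3]
    return sum(inputs["prices"])


dag_results = run_dag(
    [DagStep("total", "sum prices", depends_on=["prices"]),
     DagStep("prices", "fetch prices")],
    stub_execute,
)
print(dag_results)
```

Because results are keyed by step id, a later step reads exactly the value it depends on instead of re-parsing it from accumulated free text.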
Fake determinism. Treating the plan as immutable when the task actually requires adaptation produces confident failure. The symptom is an agent that completes all its steps but returns a poor answer. Add replanning.
Planner errors propagate. A bad plan does not self-correct without replanning. A step that should have happened first is not retroactively added. For high-stakes tasks, replanning is not optional.
Trade against ReAct
The agent patterns occupy a spectrum from fully interleaved reasoning and action (ReAct) to fully separated phases (plan-and-execute). The axes below describe the trade.
| Axis | ReAct | Plan-and-execute |
|---|---|---|
| Planning frequency | Every turn | Once, plus replans |
| Token efficiency | Lower; replans implicitly | Higher; plans once |
| Best-fit inputs | Exploratory, unknown-depth | Clear structure, predictable tool use |
| Error recovery | Immediate, every turn | Requires explicit replan |
| Observability | Per-turn thoughts | Per-step plan plus per-step trace |
| Executor complexity | Single loop | Inner loop per step |
For tasks where the steps are clearly enumerable from the goal, plan-and-execute is usually more efficient. For tasks where the path is discovered as the agent acts, ReAct remains the better default.
Neighbors in the series
ReAct, covered earlier in the series, is the pattern that plan-and-execute splits into two phases. The tool-calling loop, covered in the previous article, is what the executor uses for each step. Orchestrator-workers is a non-iterative cousin: the orchestrator plans, workers run in parallel, and the synthesizer produces the final answer. Tree-of-thought, covered later in the Reasoning stage, extends planning to a search over alternative plans rather than a single plan. Hierarchical teams, in the Multi-Agent stage, extends plan-and-execute to nested planners.
References
- Wang, Lei, et al. Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models. ACL 2023.
- Anthropic. Building effective agents. December 2024.
- LangChain. Plan-and-execute agents in LangGraph. 2024.
- Yao, Shunyu, et al. ReAct: Synergizing Reasoning and Acting in Language Models. ICLR 2023.
- Liu, Bo, et al. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. 2023.