
Choosing the Right Agentic Pattern: A Decision Framework

May 15, 2026 · 7 min read

This is the last article in the Agentic Patterns series. The preceding thirty-three articles covered individual patterns: how each works, when it fits, where it fails, what it costs. This article collapses them into the decision a practitioner actually faces. A task arrives. Which pattern should the team reach for first?

The guidance that follows is drawn from the same primary sources this series has referenced throughout: Anthropic's engineering writeups, OpenAI's agent guides, Google's orchestration documentation, the Microsoft MagenticOne paper, the DeepMind scaling study, and the academic literature on specific patterns. The framework is opinionated. The opinion is the same one that has run through the whole series: the default should be the simplest thing that could possibly work, and complexity should be added deliberately, not reflexively.

Four decision trees

The Anthropic agent-building guide and the agentic-patterns reference each organize pattern choice as a sequence of questions. The four trees below reproduce that structure, each covering one family of decisions.

Workflow versus agent

Start here: Can a single LLM call solve it?
  |
  +-- Yes --> Single LLM call. Done.
  |
  +-- No --> Are the steps fixed and known?
              |
              +-- Yes --> Is it sequential?
              |            |
              |            +-- Yes --> Prompt Chaining
              |            |
              |            +-- No  --> Parallelization (sectioning or voting)
              |
              +-- No --> Does input determine the path?
                          |
                          +-- Yes, into categories --> Routing
                          |
                          +-- Yes, into dynamic subtasks --> Orchestrator-Workers
                          |
                          +-- No --> Does it need iterative refinement?
                                      |
                                      +-- Yes --> Evaluator-Optimizer
                                      |
                                      +-- No --> Is the task fully open-ended?
                                                  |
                                                  +-- Yes --> Agent with ReAct loop

The tree terminates at the first pattern that fits. A single LLM call is the assumed default; everything else is an earned upgrade.
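
Read as code, the tree is an ordered series of checks that returns at the first match. The sketch below is one hypothetical encoding: `Task`, its boolean fields, and `choose_pattern` are illustrative names rather than part of any library, and each field stands in for a judgment the team makes by hand.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hand-assessed answers to the tree's questions (hypothetical fields)."""
    single_call_suffices: bool = False
    steps_fixed: bool = False
    sequential: bool = False
    routes_into_categories: bool = False
    spawns_dynamic_subtasks: bool = False
    needs_iterative_refinement: bool = False
    open_ended: bool = False


def choose_pattern(task: Task) -> str:
    """Return the first pattern that fits, checking in the same order as the tree."""
    if task.single_call_suffices:
        return "Single LLM call"
    if task.steps_fixed:
        return "Prompt Chaining" if task.sequential else "Parallelization"
    if task.routes_into_categories:
        return "Routing"
    if task.spawns_dynamic_subtasks:
        return "Orchestrator-Workers"
    if task.needs_iterative_refinement:
        return "Evaluator-Optimizer"
    if task.open_ended:
        return "Agent with ReAct loop"
    # The tree has no branch for this case; revisit the task definition.
    return "Unclassified"
```

Because the checks run in order, a task that could be handled by a single call never reaches the agent branch, even if it is also open-ended.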

Multi-agent coordination

Do you need multiple specialized agents?
  |
  +-- No --> Single agent is fine.
  |
  +-- Yes --> How should they coordinate?
              |
              +-- Central control       --> Supervisor / Router
              +-- Peer-to-peer          --> Handoffs / Swarm
              +-- Deep hierarchy        --> Hierarchical Teams
              +-- Tight collaboration   --> Shared Scratchpad
              +-- Turn-taking dialogue  --> Group Chat Patterns

The first question is not rhetorical. DeepMind's 2024 scaling study found that performance on sequential planning tasks degrades by 39 to 70 percent under multi-agent architectures compared with single-agent baselines. Multi-agent is the right answer for parallelizable tasks with clear specialization; on sequential or loosely structured tasks, a single agent usually wins.

Reasoning pattern

How important is correctness, and at what cost?
  |
  +-- Standard multi-step reasoning              --> Chain-of-Thought
  +-- Need exploration / backtracking            --> Tree-of-Thought
  +-- Need synthesis of partial solutions        --> Graph-of-Thought
  +-- Can retry the whole task on failure        --> Reflexion
  +-- Maximum correctness, cost not an issue     --> LATS

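The cost axis matters. Chain-of-thought is close to a free upgrade: it is still one call, with more output tokens. Self-consistency, which samples several reasoning chains and takes a majority vote, costs N calls. Tree-of-thought costs on the order of k^depth calls for branching factor k. LATS costs 5 to 10x a ReAct baseline. Match the pattern to the correctness the task needs and the budget the team has.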
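
The arithmetic is easy to make concrete. The sketch below turns the rough figures in this section into call counts per task; the function name `estimated_calls` and every default value (five samples, branching factor 3, depth 3, a ten-call ReAct baseline) are assumptions chosen for illustration, not measurements.

```python
def estimated_calls(pattern: str, *, n_samples: int = 5, branching: int = 3,
                    depth: int = 3, react_calls: int = 10) -> tuple[int, int]:
    """Return a rough (low, high) range of LLM calls per task for a pattern."""
    if pattern == "chain-of-thought":
        return (1, 1)  # one call, though with more output tokens
    if pattern == "self-consistency":
        return (n_samples, n_samples)  # N independent samples, then a vote
    if pattern == "tree-of-thought":
        calls = branching ** depth  # rough k^depth figure; pruning lowers it
        return (calls, calls)
    if pattern == "lats":
        return (5 * react_calls, 10 * react_calls)  # 5 to 10x a ReAct baseline
    raise ValueError(f"unknown pattern: {pattern}")
```

With these assumed defaults, tree-of-thought comes to 27 calls and LATS to between 50 and 100, against a single call for chain-of-thought.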

Framework choice

If you need	Use
Maximum control over agent flow	LangGraph
Distributed agents across machines	AutoGen
Quick prototyping with roles	CrewAI
OpenAI-native with guardrails	OpenAI Agents SDK
Claude-native agent building	Claude Agent SDK
Gemini-optimized, code-first	Google ADK
Document-centric or RAG-heavy	LlamaIndex
Algorithmic prompt optimization	DSPy
AWS-managed, zero-ops	AWS Bedrock Agents
No framework (recommended start)	Direct API calls plus `asyncio`

The last row is not a joke. For a team building its first production agent, starting with direct API calls plus the model provider's SDK is a reasonable default. A framework can be added later once the team knows what they actually need from it.
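
As a rough illustration of how little is needed, the sketch below runs the sectioning form of parallelization with nothing but `asyncio`. The `call_model` coroutine is a hypothetical stand-in for a provider SDK call, and `review_sections` is an illustrative name; a real version would swap in the provider's async client.

```python
import asyncio


async def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a provider SDK call; replace with a real async client."""
    await asyncio.sleep(0)  # yields control, as a network call would
    return f"[response to: {prompt}]"


async def review_sections(sections: list[str]) -> list[str]:
    """Sectioning: one independent call per section, run concurrently."""
    calls = [call_model(f"Review this section: {s}") for s in sections]
    # gather preserves input order, so results line up with sections
    return await asyncio.gather(*calls)


if __name__ == "__main__":
    for line in asyncio.run(review_sections(["intro", "methods"])):
        print(line)
```

Everything a framework would add here (retries, tracing, state) can be layered on once the team has seen which of those it actually misses.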

Anti-patterns to avoid

Anthropic's and Google's production retrospectives name the same mistakes. The tables below consolidate the architecture and execution anti-patterns; each has a recurring symptom and a known fix.

Architecture

Anti-pattern	Symptom	Fix
Agent when a workflow suffices	Unpredictable behavior, runaway cost	Start with workflows; upgrade only when needed
Too many agents	Coordination overhead exceeds benefit	Start with one; split only when a single agent's context or tools overflow
Shared everything	Agents drown in irrelevant context	Use independent scratchpads; share only final results
No stopping condition	Infinite loops, cost incidents	Always set `max_iterations`; detect repeated identical calls
Framework worship	Abstraction hides bugs	Understand the underlying API first
More than sixteen tools on one agent	Error amplification (DeepMind finding)	Split tools across specialist agents; add tool search
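
The fix for a missing stopping condition fits in a few lines. The sketch below is a minimal loop with an iteration cap and repeated-call detection; `run_agent`, `next_action`, and the `max_repeats` threshold are hypothetical names and values, and `next_action` stands in for whatever the model call returns.

```python
from collections import Counter
from typing import Callable, Optional

Action = tuple[str, str]  # (tool name, argument), a simplified stand-in


def run_agent(next_action: Callable[[list[Action]], Optional[Action]],
              max_iterations: int = 10, max_repeats: int = 2) -> str:
    """Run until the agent finishes, repeats itself, or hits the iteration cap."""
    seen: Counter = Counter()
    history: list[Action] = []
    for _ in range(max_iterations):
        action = next_action(history)
        if action is None:  # the agent signals it is done
            return "finished"
        seen[action] += 1
        if seen[action] > max_repeats:  # same call issued too many times
            return "stopped: repeated call"
        history.append(action)
    return "stopped: max_iterations"
```

Both limits return a distinct status, so a cost incident shows up in logs as a named stop reason rather than a timeout.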

Execution

Anti-pattern	Symptom	Fix
No evaluation	Cannot tell if the system works	Measure performance; add complexity only when metrics improve
Trusting output blindly	Hallucinations compound	Add verification steps; use the evaluator pattern
Self-evaluation	Agent praises its own mediocre work	Use a separate evaluator tuned to be skeptical
Mixing concerns	Confused prompts, tool overload	One agent, one clear responsibility
Skipping human oversight	Dangerous actions or bad outputs	Add approval gates for irreversible actions
Over-engineering tools	Agent cannot figure out how to use them	Keep tools simple; one tool, one action
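
An approval gate for irreversible actions can be equally small. In the sketch below, the action names in `IRREVERSIBLE`, the `execute` function, and the `approve` callback are all hypothetical; in practice `approve` would block on a human decision through whatever review channel the team already uses.

```python
from typing import Callable

# Hypothetical action names; each team defines its own irreversible set.
IRREVERSIBLE = {"delete_record", "send_payment"}


def execute(action: str, run: Callable[[str], str],
            approve: Callable[[str], bool]) -> str:
    """Run an action, asking for approval first when it cannot be undone."""
    if action in IRREVERSIBLE and not approve(action):
        return f"blocked: {action} requires approval"
    return run(action)
```

Reversible actions never call `approve`, so the gate adds latency only where a mistake would be permanent.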

These are not edge cases. They are the failure modes that teams ship repeatedly across organizations and across frameworks. Knowing the list is most of the battle.

The simplicity test

Anthropic's agent-building guide ends with a short test that this series has repeatedly pointed back to. Before adding any agentic complexity, ask three questions.

First: does a single LLM call with good prompting solve this? If yes, stop. Every pattern in this series can introduce unnecessary complexity; the discipline is to add one only when the task demands it.

Second: does adding this pattern measurably improve outcomes? If the team cannot measure the improvement, they cannot defend the complexity. Build the evaluation harness before the architecture.

Third: can the architecture be explained in one paragraph? If not, it is more complex than the task needs. Simplify until it can be.

The test is not about being minimalist for its own sake. It is about making complexity a deliberate choice rather than an accumulating default. A team that passes the test has an architecture they can defend, operate, and improve. A team that fails it has an architecture that will get worse over time.
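
The second question can be made mechanical. The sketch below is a minimal evaluation harness under stated assumptions: systems are plain functions from input text to a label, `examples` is a small hand-labeled set, and the `min_gain` threshold of 0.05 is an arbitrary illustrative value. The names `accuracy` and `should_adopt` are hypothetical.

```python
from typing import Callable

Example = tuple[str, str]  # (input text, expected label)
System = Callable[[str], str]


def accuracy(system: System, examples: list[Example]) -> float:
    """Fraction of examples where the system's output matches the label."""
    correct = sum(1 for text, expected in examples if system(text) == expected)
    return correct / len(examples)


def should_adopt(baseline: System, candidate: System,
                 examples: list[Example], min_gain: float = 0.05) -> bool:
    """Adopt the more complex candidate only if it beats the baseline by min_gain."""
    gain = accuracy(candidate, examples) - accuracy(baseline, examples)
    return gain >= min_gain
```

Exact-match accuracy suits classification tasks like routing; open-ended outputs need a different metric, but the adopt-only-on-measured-gain rule stays the same.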

A minimal reference example

The excerpt below shows the smallest pattern that could possibly work for a classification-and-response task: a workflow with two calls and a validation gate. No agent. No multi-agent. No reasoning pattern. No framework.

from typing import Literal

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()  # reads OPENAI_API_KEY from the environment

class Route(BaseModel):
    category: Literal["billing", "technical", "general"]
    confidence: float  # the model's self-reported confidence, 0.0 to 1.0

def answer(query: str) -> str:
    # Call 1: classify the query into a structured Route.
    route = client.responses.parse(
        model="gpt-4o-mini",
        instructions=("Classify as billing, technical, or general. "
                      "Report confidence as a number from 0 to 1."),
        input=query,
        text_format=Route,
    ).output_parsed

    # Validation gate: hand off if parsing failed or confidence is low.
    if route is None or route.confidence < 0.5:
        return "Forwarding to a human agent."

    # Call 2: answer in the persona matched to the category.
    persona = {"billing": "billing specialist",
               "technical": "technical support engineer",
               "general": "general assistant"}[route.category]
    return client.responses.create(
        model="gpt-4o-mini",
        instructions=f"Answer as a {persona}.",
        input=query,
    ).output_text

This solves a real problem. It has one gate (confidence threshold). It has no agent. It ships in roughly thirty lines. Most production workloads should start here.

Teams that extend this into a fuller routing workflow, and later into a multi-agent system, follow a measurable path: each upgrade is justified by a specific failure of the previous version. Teams that skip to multi-agent architectures on day one usually end up rebuilding toward something closer to this excerpt.

Where to look next

The thirty-three preceding articles cover each pattern named in the decision trees above. The index at /blog lists all of them. The runnable code examples live at github.com/subodhjena/agentic-patterns; the examples/ directory contains one numbered file per pattern, with both raw API and LangChain or LangGraph implementations from lesson 05 onward.

Three closing recommendations apply across the series.

Invest in evaluation before architecture. The only reliable way to know whether an upgrade helps is to measure it. Without an evaluation harness, every decision in the decision trees above is a guess.

Invest in tool design before prompt tuning. Anthropic's SWE-bench team reported spending more time on tools than on prompts, and Anthropic's multi-agent research writeup reports a 40 percent reduction in task completion time after tool descriptions were iteratively rewritten. Tool design has higher leverage than almost any other intervention.

Revisit harness assumptions as models improve. Every component in an agent harness encodes an assumption about what the model cannot do. When the model can do more, the harness can do less. Check annually, or after each major model upgrade, whether each component still earns its cost.

The series has tried to be specific where specific advice exists and honest where it does not. Choosing the right pattern is a judgment call that benefits from knowing the options. The goal of this framework is not to remove the judgment but to clarify what is being judged.

Neighbors in the series

This is the capstone article; every previous article in the series is a neighbor. The most directly related are the two articles that anchor the first and last stages: "Workflows Versus Agents in LLM Systems" (article 1) argues for the workflow default that this framework inherits; "Scaling and Cost Optimization" (article 32) provides the empirical grounding for the multi-agent warnings.

References

Anthropic. Building effective agents. December 2024.
Anthropic. A practical guide to building agents. March 2025.
OpenAI. Practices for deploying LLM-based agents. 2024.
Han, Joshua, et al. Towards a Science of Scaling Agent Systems. Google DeepMind, 2024.
LangChain. LangGraph concepts. 2024.

