The Tool-Calling Agent Loop: ReAct as It Actually Ships

The ReAct paper described a loop in which a language model emitted "Thought", "Action", and "Observation" blocks as free text, and a harness parsed them back out. The pattern was prescient; the parsing was fragile. Modern agent frameworks implement the same loop on top of native tool-calling, the SDK feature in which the model emits a structured tool call directly and the harness receives a typed object. The control flow is the same; the interface is no longer text.

This article covers the tool-calling agent loop as it actually ships. The shape is straightforward: the model receives tools, decides whether to call one, and if not, returns a final answer. The hardening around that shape is where production differs from demo. OpenAI's Agents SDK documents a six-step loop that includes input guardrails, output guardrails, handoff handling, and a max-turns limit (OpenAI, 2024); LangGraph's create_react_agent and Google's Agent Development Kit implement the same contract with different names. The rest of this article walks the loop and names the defenses each step needs.

One turn, step by step

sequenceDiagram
    participant U as User
    participant H as Harness
    participant G1 as Input guardrails
    participant L as LLM
    participant T as Tool
    participant G2 as Output guardrails

    U->>H: Query
    H->>G1: Check input
    G1-->>H: Pass or abort
    H->>L: Messages + tools
    L-->>H: Tool calls or final answer
    alt tool call
        H->>T: Execute
        T-->>H: Result
        H->>L: Append tool message, loop
    else final answer
        H->>G2: Check output
        G2-->>H: Pass or rewrite
        H->>U: Response
    end

The loop runs until the model returns a final answer or until max_turns is exceeded. Each branch of the diagram corresponds to a decision the model made on the current turn.

The OpenAI Agents SDK phrases the same loop as six steps:

  1. Check input guardrails in parallel. If any tripwire triggers, abort.
  2. Call the LLM with conversation plus tools.
  3. If the response is a final output, run output guardrails and return.
  4. If the response contains tool calls, execute them, append results, and return to step 2.
  5. If the response is a handoff, switch the active agent and return to step 1.
  6. Check max_turns to prevent infinite loops.

The shape is generic. The frameworks named above reduce to the same five actions: guard, invoke, dispatch, execute, check termination. A handoff is one outcome of dispatch, which is why six steps collapse to five actions. Framework choice affects ergonomics, not the underlying pattern.
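
The six steps can be traced without any SDK. The sketch below is a hypothetical harness, not the Agents SDK's API: Agent, Reply, run, and GuardrailTripped are illustrative names, the model is a plain Python callable, and guardrails run sequentially rather than in parallel. Tool error handling is left out to keep the control flow visible.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


class GuardrailTripped(Exception):
    """Raised when an input guardrail rejects the conversation."""


@dataclass
class Reply:
    """One model turn: exactly one of the three fields is expected to be set."""
    final: Optional[str] = None
    tool_calls: list = field(default_factory=list)  # [(tool_name, kwargs), ...]
    handoff: Optional[str] = None                   # name of the agent to switch to


@dataclass
class Agent:
    name: str
    model: Callable[[list], Reply]                  # stand-in for an LLM call
    tools: dict = field(default_factory=dict)
    input_guards: list = field(default_factory=list)
    output_guards: list = field(default_factory=list)


def run(agents: dict, start: str, query: str, max_turns: int = 8) -> str:
    agent = agents[start]
    messages = [("user", query)]
    check_input = True
    for _ in range(max_turns):                      # step 6: hard ceiling
        if check_input:                             # step 1: guard
            for guard in agent.input_guards:
                if not guard(messages):
                    raise GuardrailTripped(agent.name)
            check_input = False
        reply = agent.model(messages)               # step 2: invoke
        if reply.final is not None:                 # step 3: final output
            text = reply.final
            for guard in agent.output_guards:
                text = guard(text)
            return text
        if reply.handoff is not None:               # step 5: handoff
            agent = agents[reply.handoff]
            check_input = True                      # new agent, fresh guard pass
            continue
        for name, kwargs in reply.tool_calls:       # step 4: execute, append
            messages.append(("tool", str(agent.tools[name](**kwargs))))
    return "halted: max turns exceeded"
```

In this sketch a handoff consumes one turn of the budget, which is one reasonable choice among several; a harness could equally count only model calls that produce tool calls.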

Native tool-calling replaces text parsing

In the original ReAct formulation, the model emitted lines like Action: search("France GDP"), and the harness pulled the function name and argument out with a regex. The approach was sensitive to formatting drift: a missing quote, a stray newline, a different action name, and the parse failed.

Native tool-calling, introduced in OpenAI's function-calling API in 2023 and now supported by most major model APIs, replaces this with a structured output contract. The SDK returns a list of tool-call objects, each with a name field, an id field, and a JSON-encoded arguments payload. The harness never parses free text for this purpose again. The upgrade is less visible than the reasoning improvements of the past several years, but it removes a whole class of production failures.
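
The difference is easiest to see side by side. The sketch below is illustrative only: the regex is one plausible text-era parser, not the ReAct paper's code, and the dictionary mirrors the general shape of a Chat Completions tool call with a made-up id.

```python
import json
import re

# Text-era parsing: a regex over free text in the Action: search("...") format.
ACTION_RE = re.compile(r'^Action:\s*(\w+)\("([^"]*)"\)\s*$', re.MULTILINE)


def parse_text_action(text: str):
    """Return (tool_name, argument), or None when the format has drifted."""
    m = ACTION_RE.search(text)
    return (m.group(1), m.group(2)) if m else None


def read_tool_call(call: dict):
    """Return (id, tool_name, arguments) from a structured tool-call object."""
    fn = call["function"]
    return call["id"], fn["name"], json.loads(fn["arguments"])


# A well-formed action parses; single quotes or a missing parenthesis do not.
print(parse_text_action('Thought: need data\nAction: search("France GDP")'))
print(parse_text_action("Action: search('France GDP')"))

# The structured call carries its name and arguments as fields.
call = {"id": "call_1", "type": "function",
        "function": {"name": "search", "arguments": '{"query": "France GDP"}'}}
print(read_tool_call(call))
```

The structured path still decodes a JSON string, so it can fail too, but it fails on one well-defined boundary instead of on every formatting habit the model might pick up.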

Two versions in code

The raw loop below is a tool-calling ReAct agent with a step budget and repeated-call detection. It is short, but every line of defense is intentional.

from openai import OpenAI
import json

client = OpenAI()

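def run_tool(name: str, args: dict, tools_impl: dict) -> str:
    """Dispatch one tool call; failures come back as text the model can read."""
    fn = tools_impl.get(name)
    if fn is None:
        return f"unknown tool: {name}"
    try:
        # Tool messages must carry string content, so coerce the result.
        return str(fn(**args))
    except Exception as exc:  # never let a tool exception escape the loop
        return f"tool error: {type(exc).__name__}: {exc}"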

def agent(query: str, tools_spec: list, tools_impl: dict,
          max_turns: int = 8) -> str:
    messages = [{"role": "user", "content": query}]
    seen_calls: set[tuple] = set()
    for _ in range(max_turns):
        r = client.chat.completions.create(
            model="gpt-4o-mini", messages=messages, tools=tools_spec)
        msg = r.choices[0].message
        if not msg.tool_calls:
            return msg.content
        for call in msg.tool_calls:
            key = (call.function.name, call.function.arguments)
            if key in seen_calls:
                return "halted: repeated tool call detected"
            seen_calls.add(key)
        messages.append(msg)
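        for call in msg.tool_calls:
            try:
                # Arguments arrive as a JSON string; parse defensively.
                args = json.loads(call.function.arguments)
                result = run_tool(call.function.name, args, tools_impl)
            except json.JSONDecodeError as exc:
                result = f"invalid tool arguments: {exc}"
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": result})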
    return "halted: max turns exceeded"

The LangGraph version adds checkpointing and streaming for free. The create_react_agent helper accepts pre_model_hook and post_model_hook callbacks, which can host input and output guardrails; both run around every model call, not only the first and the last. Adding interrupt points for human review is a single option change.

from langchain_core.tools import tool
from langchain.chat_models import init_chat_model
from langgraph.prebuilt import create_react_agent
from langgraph.checkpoint.memory import InMemorySaver

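@tool
def search(query: str) -> str:
    """Search the web for a fact."""
    # do_search is a placeholder for a real search backend; it is not defined here.
    return do_search(query)

def input_guard(state):
    # The hook runs before every model call, so check the latest user turn
    # rather than whatever message happens to be last (it may be a tool result).
    human = [m for m in state["messages"] if m.type == "human"]
    if human and "password" in str(human[-1].content).lower():
        raise ValueError("input guardrail: sensitive input blocked")
    # Pass the history through unchanged as the model's input.
    return {"llm_input_messages": state["messages"]}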

agent = create_react_agent(
    model=init_chat_model("gpt-4o-mini"),
    tools=[search],
    checkpointer=InMemorySaver(),
    pre_model_hook=input_guard,
)

result = agent.invoke(
    {"messages": [("user", "GDP of France?")]},
    config={"configurable": {"thread_id": "t1"}},
)

Full runnable versions are at github.com/subodhjena/agentic-patterns under examples/08_react_agent.py and examples/08_react_agent_langgraph.py. Guardrails have a dedicated article later in this series; this article treats them as fixtures of the loop.

Guardrails at entry and exit

An input guardrail runs before the LLM call. It inspects the user's message for policy violations, prompt-injection patterns, or out-of-scope content. A common optimization runs the guardrail on a lightweight model concurrently with the main model call, so that its latency is hidden; if the tripwire fires, the harness discards the main call's result.

An output guardrail runs on the final answer, before it reaches the user. It checks for disallowed content, leaked secrets, or policy violations that the main model's output might have crossed. When an output guardrail flags a response, the harness either rewrites it, replaces it with a safe default, or escalates. Guardrails are not a substitute for model safety training; they are a second layer that catches the cases training did not cover.
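
A minimal sketch of both guardrails, assuming asyncio and two stand-in coroutines: guard_model and main_model simulate model latency with sleeps, and redact is a toy output guardrail that masks known secret strings. None of these names come from a real SDK.

```python
import asyncio


class InputBlocked(Exception):
    """Raised when the input guardrail's tripwire fires."""


async def guard_model(query: str) -> bool:
    """Stand-in for a lightweight classifier call; True means the input passes."""
    await asyncio.sleep(0.01)
    return "ignore prior instructions" not in query.lower()


async def main_model(query: str) -> str:
    """Stand-in for the slower main model call."""
    await asyncio.sleep(0.05)
    return f"draft answer to: {query}"


def redact(text: str, secrets: list) -> str:
    """Toy output guardrail: rewrite the answer by masking known secret strings."""
    for secret in secrets:
        text = text.replace(secret, "[redacted]")
    return text


async def guarded_call(query: str, secrets: list) -> str:
    # Start both calls at once so the guard's latency overlaps the main call.
    guard = asyncio.create_task(guard_model(query))
    main = asyncio.create_task(main_model(query))
    if not await guard:
        main.cancel()  # tripwire: drop the in-flight main call
        raise InputBlocked(query)
    return redact(await main, secrets)


print(asyncio.run(guarded_call("What is sk-123?", ["sk-123"])))
```

Running the guard concurrently trades a possibly wasted main-model call for lower latency on the common path where the input passes.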

Handoffs, briefly

A handoff is a special return type: the model indicates that a different agent should take over. The harness swaps the active agent, keeps the conversation history, and restarts the loop. The multi-agent stage of this series covers handoffs in detail under the swarm pattern; this article notes only that the tool-calling loop must accommodate them in step five of the OpenAI SDK contract, and that a handoff resets the guardrail pipeline for the new agent.

Where the loop needs defenses

The tool-calling loop is small enough to memorize and large enough to fail in many distinct ways. The defenses below are the ones that keep production systems quiet.

Repeated tool calls. A confused agent calls the same tool with the same arguments over and over. The raw code above tracks a seen_calls set and halts on a repeat; a production harness would more often hash a normalized form of the call and abort when the hash recurs. Without this, the agent burns tokens until the turn budget runs out.
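
Comparing raw argument strings misses repeats that differ only in key order or whitespace. The sketch below is one way to normalize before hashing; call_fingerprint and RepeatDetector are hypothetical helpers, and the limit parameter is an assumption for tools where an identical retry is legitimate.

```python
import hashlib
import json


def call_fingerprint(name: str, arguments: str) -> str:
    """Hash a tool call so equivalent calls collide regardless of key order."""
    try:
        # Re-serialize with sorted keys and fixed separators for a canonical form.
        canonical = json.dumps(json.loads(arguments), sort_keys=True,
                               separators=(",", ":"))
    except json.JSONDecodeError:
        canonical = arguments  # unparseable arguments: fall back to the raw string
    return hashlib.sha256(f"{name}:{canonical}".encode("utf-8")).hexdigest()


class RepeatDetector:
    """Flags a call once it has already been seen `limit` times in this run."""

    def __init__(self, limit: int = 1):
        self.limit = limit
        self.counts: dict = {}

    def is_repeat(self, name: str, arguments: str) -> bool:
        key = call_fingerprint(name, arguments)
        seen = self.counts.get(key, 0)
        self.counts[key] = seen + 1
        return seen >= self.limit
```

With the default limit of 1, the second occurrence of an equivalent call is flagged, which matches the behavior of the seen_calls set in the raw loop.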

Max-turn budget. Every loop must terminate. max_turns is a hard ceiling; set it conservatively (the raw loop above uses eight) and raise it only when a task genuinely needs more depth.

Tool error handling. A tool that throws should not propagate the exception into the model. Wrap tool execution in a try block and return a structured error message that the model can read and reason about.
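
One possible shape for that wrapper is sketched here. The function name safe_run_tool and the error fields (ok, error, detail) are illustrative choices, not a standard envelope.

```python
import json


def safe_run_tool(name: str, args: dict, tools_impl: dict) -> str:
    """Run a tool and always return a JSON string the model can read."""
    fn = tools_impl.get(name)
    if fn is None:
        return json.dumps({"ok": False, "error": "unknown_tool", "tool": name})
    try:
        # default=str keeps non-JSON results (dates, decimals) serializable.
        return json.dumps({"ok": True, "result": fn(**args)}, default=str)
    except TypeError as exc:
        # Usually a missing or unexpected keyword argument from the model.
        return json.dumps({"ok": False, "error": "bad_arguments",
                           "detail": str(exc)})
    except Exception as exc:
        return json.dumps({"ok": False, "error": type(exc).__name__,
                           "detail": str(exc)})
```

Returning the exception type and message gives the model enough to retry with different arguments; a stack trace would mostly spend context.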

Tool argument validation. Native tool-calling delivers arguments as a JSON string shaped by the tool's schema; it does not guarantee that the values are semantically valid, and outside strict or structured-output modes the string may not even parse. Validate required fields and reject early rather than after a downstream service fails.
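
A minimal sketch of early validation, assuming the tool's parameters are described by a small JSON Schema subset (properties, required, and primitive type names). validate_args is a hypothetical helper; a real harness would more likely lean on a full schema validator or typed models.

```python
import json

# Maps JSON Schema type names to Python types (bool is handled separately below).
_JSON_TYPES = {"string": str, "integer": int, "number": (int, float),
               "boolean": bool, "array": list, "object": dict}


def validate_args(raw: str, schema: dict) -> tuple:
    """Return (args, errors); errors is empty when the arguments are usable."""
    try:
        args = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"arguments are not valid JSON: {exc.msg}"]
    if not isinstance(args, dict):
        return None, ["arguments must be a JSON object"]
    errors = []
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            errors.append(f"missing required field: {key}")
    for key, value in args.items():
        if key not in props:
            errors.append(f"unexpected field: {key}")
            continue
        expected = _JSON_TYPES.get(props[key].get("type"))
        # bool is a subclass of int in Python, so reject it for numeric fields.
        wrong_bool = isinstance(value, bool) and props[key].get("type") != "boolean"
        if expected and (wrong_bool or not isinstance(value, expected)):
            errors.append(f"field {key} should be {props[key]['type']}")
    return args, errors
```

The error strings are written to be appended as the tool message, so the model sees what to fix rather than a generic failure.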

Observation size. A tool that returns thousands of tokens floods the window. Truncate or summarize before appending to messages. Context engineering, covered earlier in this series, applies inside the loop.
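
Truncation can be as simple as keeping the head and tail of the result. The sketch below is an assumption-laden example: clip_observation is a hypothetical helper, it counts characters as a rough proxy for tokens, and it expects tail to be smaller than max_chars.

```python
def clip_observation(text: str, max_chars: int = 4000, tail: int = 500) -> str:
    """Keep the head and tail of a long tool result and say what was dropped."""
    if len(text) <= max_chars:
        return text
    head = max_chars - tail
    dropped = len(text) - max_chars
    marker = f"\n[... {dropped} characters omitted ...]\n"
    # Slice from an explicit index so tail=0 yields an empty suffix.
    return text[:head] + marker + text[len(text) - tail:]
```

Keeping the tail matters for tools whose useful signal comes last, such as command output that ends with an error line; the marker tells the model the observation is partial.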

Guardrail bypass. Injected text such as "ignore prior instructions", whether typed by the user or returned inside a tool result, can sometimes steer the model into a tool call it should not make. Guardrails must apply to tool inputs and tool outputs, not only to the initial query and the final answer.
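
One way to put guards on the tool boundary is to wrap each tool. The sketch below is illustrative: guard_tool and flag_injection are hypothetical names, and the phrase list is a deliberately naive stand-in for a real injection classifier.

```python
from typing import Callable

# Naive phrase list, for illustration only; real detection needs more than this.
SUSPICIOUS = ("ignore prior instructions", "ignore previous instructions")


def flag_injection(text: str) -> str:
    """Toy output check: withhold tool results that contain injection phrases."""
    if any(phrase in text.lower() for phrase in SUSPICIOUS):
        return "[tool output withheld: possible prompt injection]"
    return text


def guard_tool(fn: Callable[..., str],
               check_args: Callable[[dict], bool],
               check_output: Callable[[str], str] = flag_injection) -> Callable[..., str]:
    """Wrap a tool so its arguments and its result both pass through a guard."""
    def wrapped(**kwargs) -> str:
        if not check_args(kwargs):
            return "[tool call refused by policy]"
        return check_output(str(fn(**kwargs)))
    return wrapped
```

Because the wrapper returns an ordinary callable, it can be dropped into a tools_impl mapping like the one in the raw loop without changing the loop itself.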

Trade against a one-shot prompt

The tool-calling loop inherits the trade a ReAct agent makes: higher cost and variable latency in exchange for grounded, self-correcting behavior on open-ended tasks. The defenses above make the trade bearable at scale.

| Axis | Single model call | Tool-calling loop |
| --- | --- | --- |
| Control flow | None; author prompts once | Emergent across turns |
| Latency | One round-trip | Variable, bounded by max_turns |
| Cost per task | Predictable | Variable; defenses needed |
| Grounding | Depends on injected context | Native via tool calls |
| Observability | One request-response pair | Full trace: per turn, per tool call |
| Safety surface | Input, output | Input, output, each tool invocation |

Neighbors in the series

ReAct, the previous article, is the academic statement of the pattern. This article is the production implementation. Plan-and-execute, the next article, separates planning from acting; ReAct interleaves them. The Agent-Computer Interface article covers tool design, which is the single highest-leverage investment for improving tool-calling agents. Guardrails, covered in the Safety stage, goes deeper on input and output guards and on safety-specific patterns beyond the loop itself. Human-in-the-loop interrupts, also in Safety, adds a pause-and-approve step between tool calls for high-stakes actions.

References

  1. OpenAI. Practices for deploying LLM-based agents. 2024.
  2. Anthropic. Building effective agents. December 2024.
  3. Yao, Shunyu, et al. ReAct: Synergizing Reasoning and Acting in Language Models. ICLR 2023.
  4. LangChain. create_react_agent reference. 2024.
  5. Google. Agent Development Kit. 2024.