Routing: Classify and Dispatch LLM Requests to Specialists
A system prompt tuned to handle a billing refund well is almost never the same prompt that handles a technical outage well. The two inputs demand different vocabulary, different tools, different escalation paths, and different tolerance for ambiguity. Trying to serve both from a single prompt produces a long, defensive instruction block that performs poorly on the tail of each category while inflating token cost on every request.
Routing separates classification from handling. A small first call classifies the input into a category. A second call, chosen by that classification, handles the request with a focused prompt and a tailored set of tools. Anthropic lists routing as one of the canonical workflow patterns and recommends it for tasks with distinct input categories that benefit from different handling (Anthropic, 2024). Routing is the second pattern to reach for whenever a one-shot prompt starts accumulating conditional instructions. Its primary cost, one extra model call per request, is almost always worth the separation it buys.
The classifier on top
flowchart TD
IN([Input]) --> R[Router: classify]
R -->|billing| H1[Billing handler: prompt + payment tools]
R -->|technical| H2[Technical handler: prompt + docs, logs]
R -->|general| H3[General handler: FAQ lookup]
R -->|low confidence| E[Escalate or default]
H1 --> OUT([Output])
H2 --> OUT
H3 --> OUT
E --> OUT
The classifier is the load-bearing component. It returns a typed category plus, ideally, a confidence value. Routes with low confidence take a different path than high-confidence routes. The classifier is typically the smallest and cheapest model in the system; its job is to pick a label, not to answer a question.
A useful variation routes by difficulty rather than by topic. Simple questions go to a small, cheap model; complex reasoning goes to a larger model. The same pattern, with a classifier that emits "simple" or "complex" labels, cuts aggregate cost without sacrificing quality on hard inputs.
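The difficulty-based variation can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the model names in `MODEL_BY_DIFFICULTY` are placeholders, `classify_difficulty` is a crude keyword heuristic standing in for a small LLM call, and `pick_model` is a hypothetical helper name.

```python
from typing import Callable, Literal

Difficulty = Literal["simple", "complex"]

# Placeholder model names (assumption); substitute your own small/large pair.
MODEL_BY_DIFFICULTY: dict[Difficulty, str] = {
    "simple": "small-cheap-model",
    "complex": "large-reasoning-model",
}


def classify_difficulty(query: str) -> Difficulty:
    """Stand-in classifier: a heuristic in place of a small LLM call."""
    reasoning_cues = ("why", "compare", "prove", "trade-off", "step by step")
    text = query.lower()
    if len(text.split()) > 40 or any(cue in text for cue in reasoning_cues):
        return "complex"
    return "simple"


def pick_model(
    query: str,
    classifier: Callable[[str], Difficulty] = classify_difficulty,
) -> str:
    """Return the name of the model the request should be dispatched to."""
    return MODEL_BY_DIFFICULTY[classifier(query)]
```

Passing the classifier as a parameter keeps the dispatch table testable without a network call and makes it easy to swap the heuristic for a real model later.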
Specialists underneath
Each handler assumes its category and can be shorter and sharper than any general-purpose prompt. A billing handler knows the refund policy, has access to payment APIs, and speaks the language of charges and credits. A technical handler knows diagnostic flows, has access to logs, and speaks the language of errors and reproduction steps. Neither handler is ever asked to identify itself; the classifier did that already.
Handlers do not call each other. Categories do not share handlers. Routing is a one-level tree, not a graph. The moment handlers need to collaborate, the right pattern is not routing but a supervisor or an orchestrator, both covered later in this series.
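One way to keep the tree one level deep is a flat table keyed by category, where each handler takes only the query. The sketch below assumes handlers are plain functions from query to answer; `HANDLERS`, `dispatch`, and the stub handlers are illustrative names, and the stubs return tagged strings instead of calling a model.

```python
from typing import Callable

Handler = Callable[[str], str]


def billing(query: str) -> str:
    return f"[billing] {query}"


def technical(query: str) -> str:
    return f"[technical] {query}"


def general(query: str) -> str:
    return f"[general] {query}"


# One category maps to exactly one handler. Handlers receive only the query,
# so nothing in their signature lets one handler reach another.
HANDLERS: dict[str, Handler] = {
    "billing": billing,
    "technical": technical,
    "general": general,
}


def dispatch(category: str, query: str) -> str:
    """Run the single handler for `category`; unknown labels use the default."""
    handler = HANDLERS.get(category, general)
    return handler(query)
```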
Two versions in code
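The excerpt below shows routing without a framework. The classifier uses structured output to return a Route whose category is a Literal and whose confidence is bounded to the range 0 to 1. The dispatch is a Python match statement (Python 3.10 or later) that picks a handler; requests classified with a confidence below 0.5 are escalated instead.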
from typing import Literal

from openai import OpenAI
from pydantic import BaseModel, Field

client = OpenAI()  # reads OPENAI_API_KEY from the environment


class Route(BaseModel):
    category: Literal["billing", "technical", "general"]
    confidence: float = Field(ge=0.0, le=1.0)


def classify(query: str) -> Route:
    # Structured output: the SDK parses the response into a Route instance.
    return client.responses.parse(
        model="gpt-4o-mini",
        instructions="Classify the query into one of the categories.",
        input=query,
        text_format=Route,
    ).output_parsed


def billing_handler(query: str) -> str:
    return client.responses.create(
        model="gpt-4o-mini",
        instructions="Billing specialist. Use payment policy. Be precise.",
        input=query,
    ).output_text


def technical_handler(query: str) -> str:
    return client.responses.create(
        model="gpt-4o-mini",
        instructions="Technical support. Suggest diagnostics before solutions.",
        input=query,
    ).output_text


def general_handler(query: str) -> str:
    return client.responses.create(
        model="gpt-4o-mini",
        instructions="Answer from the FAQ. Keep short.",
        input=query,
    ).output_text


def answer(query: str) -> str:
    route = classify(query)
    # Low-confidence classifications skip the specialists entirely.
    if route.confidence < 0.5:
        return "Forwarding to a human agent."
    match route.category:
        case "billing":
            return billing_handler(query)
        case "technical":
            return technical_handler(query)
        case "general":
            return general_handler(query)
The LangGraph version expresses the same shape as a graph with conditional edges. The classifier is a node that writes a route decision into the state; the conditional edge function maps that decision to a downstream node. The excerpt reuses the Route model defined in the previous excerpt.
from typing import TypedDict

from langchain.chat_models import init_chat_model
from langgraph.graph import StateGraph, START, END

model = init_chat_model("gpt-4o-mini")


class State(TypedDict):
    query: str
    route: Route  # the Pydantic model from the previous excerpt
    answer: str


def classify_node(s: State) -> State:
    return {**s, "route": model.with_structured_output(Route).invoke(s["query"])}


def route_fn(s: State) -> str:
    # The returned string is looked up in the conditional-edge mapping below.
    return "escalate" if s["route"].confidence < 0.5 else s["route"].category


def billing_node(s: State) -> State:
    return {**s, "answer": model.invoke(f"Billing: {s['query']}").content}


def technical_node(s: State) -> State:
    return {**s, "answer": model.invoke(f"Technical: {s['query']}").content}


def general_node(s: State) -> State:
    return {**s, "answer": model.invoke(f"FAQ: {s['query']}").content}


def escalate_node(s: State) -> State:
    return {**s, "answer": "Forwarding to a human agent."}


graph = (
    StateGraph(State)
    .add_node("classify", classify_node)
    .add_node("billing", billing_node).add_node("technical", technical_node)
    .add_node("general", general_node).add_node("escalate", escalate_node)
    .add_edge(START, "classify")
    .add_conditional_edges("classify", route_fn, {
        "billing": "billing", "technical": "technical",
        "general": "general", "escalate": "escalate"})
    .add_edge("billing", END).add_edge("technical", END)
    .add_edge("general", END).add_edge("escalate", END)
    .compile()
)
Runnable versions are at github.com/subodhjena/agentic-patterns under examples/05_routing.py and examples/05_routing_langgraph.py.
Where the classifier breaks
Routing introduces a new failure mode: misclassification. Several patterns indicate the router is doing more harm than good.
Ambiguous categories. If a classifier cannot reliably distinguish two categories, the problem is the taxonomy, not the classifier. Collapse the categories or refine their definitions.
Confidence theater. A classifier that always returns a confidence above 0.9 is poorly calibrated, and an escalation threshold set below that value never fires. Compare reported confidence against observed accuracy on a held-out set before relying on it for escalation.
Over-specialized handlers. Handlers that share most of their prompt and differ only in a sentence or two are an anti-pattern. The router is paying for separation that does not exist.
Cascading classifiers. A classifier whose output is fed into another classifier is usually a sign that the taxonomy is a tree rather than a flat list. If the branching is genuinely hierarchical, consider a supervisor pattern, covered later in the Multi-Agent stage.
Routing with no fallback. Every classifier occasionally produces a category that does not apply. Without a default handler or a human escalation path, these requests produce incoherent output. A route-to-escalation branch is mandatory in production.
Latency stacked twice. Routing adds a round-trip. If the classifier runs on the same model as the handler and adds no specialization, it is pure overhead. Use a small model for the classifier when possible.
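The held-out check mentioned under confidence theater can be sketched as a reliability table: bucket predictions by reported confidence and compare each bucket's mean confidence with its observed accuracy. This is a sketch under assumptions: the `(confidence, was_correct)` pairs come from a labeled held-out set that you supply, and `reliability_table`, `Bin`, and the default of five bins are illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    lower: float
    upper: float
    count: int
    mean_confidence: float
    accuracy: float


def reliability_table(
    results: list[tuple[float, bool]], n_bins: int = 5
) -> list[Bin]:
    """Group (confidence, was_correct) pairs into equal-width confidence bins."""
    buckets: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for confidence, correct in results:
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        index = min(int(confidence * n_bins), n_bins - 1)
        buckets[index].append((confidence, correct))

    table = []
    for i, bucket in enumerate(buckets):
        if not bucket:
            continue  # skip empty bins rather than report 0/0
        table.append(Bin(
            lower=i / n_bins,
            upper=(i + 1) / n_bins,
            count=len(bucket),
            mean_confidence=sum(c for c, _ in bucket) / len(bucket),
            accuracy=sum(ok for _, ok in bucket) / len(bucket),
        ))
    return table
```

If nearly every prediction lands in the top bin and accuracy there sits well below the mean confidence, the reported confidence carries little signal and should not drive escalation on its own.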
What routing is not
Routing is not decomposition. The input is handled by exactly one specialist; nothing is broken into subtasks. When the task needs to be split into pieces that run in parallel, the right pattern is parallelization or orchestrator-workers, both in this stage of the series.
Routing is not planning. The classifier makes one decision and is done. When the number and identity of steps depend on the input at runtime, the right pattern is plan-and-execute, covered in the Agents stage.
Routing is not a multi-agent system. Each handler is an LLM call with its own prompt. Handlers do not hold state across requests, do not consult each other, and do not negotiate. When collaboration is needed, the right pattern is a supervisor with agent workers.
The trade against a single prompt
Routing is cheap to introduce and easy to get wrong. The axes below help decide when the trade is worth it.
| Axis | Single prompt | Routing |
|---|---|---|
| Latency | One round-trip | Two round-trips, each usually shorter than the single-prompt call |
| Cost | Predictable per call | Depends on classifier and handler choice |
| Quality per category | Averaged | Higher on each category |
| Prompt maintainability | Degrades with rules | Cleaner, each handler has a scope |
| Failure mode | Silent degradation | Misclassification plus handler failure |
| Observability | One opaque response | Classification and answer logged separately |
Routing is almost always the correct response to a prompt that is accumulating conditional rules. The cases where a single prompt wins are narrow: a small domain, a small taxonomy, and low variance across categories.
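The observability row in the table can be made concrete with two log records per request, one for the classification and one for the handler, joined by a shared request id. The sketch below is an assumption-laden illustration: `routed_answer`, the record fields, and the `emit` callback are hypothetical names, and the classifier and handlers are passed in as plain callables so the logging shape is independent of any model SDK.

```python
import json
import time
import uuid
from typing import Callable


def routed_answer(
    query: str,
    classify: Callable[[str], tuple[str, float]],
    handlers: dict[str, Callable[[str], str]],
    emit: Callable[[str], None] = print,
) -> str:
    """Route one query and emit two JSON log lines that share a request id."""
    request_id = str(uuid.uuid4())

    # Record 1: what the router decided, and how long it took.
    started = time.perf_counter()
    category, confidence = classify(query)
    emit(json.dumps({
        "request_id": request_id,
        "stage": "classify",
        "category": category,
        "confidence": confidence,
        "seconds": round(time.perf_counter() - started, 4),
    }))

    # Record 2: what the chosen handler produced.
    started = time.perf_counter()
    answer = handlers[category](query)
    emit(json.dumps({
        "request_id": request_id,
        "stage": "handle",
        "category": category,
        "answer_chars": len(answer),
        "seconds": round(time.perf_counter() - started, 4),
    }))
    return answer
```

Keeping the two records separate makes it possible to tell a misclassification from a handler failure when reviewing a bad response.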
Neighbors in the series
Prompt chaining, the previous article, is the pattern to reach for when the task has sequential steps rather than branching categories. Orchestrator-workers, covered next, generalizes routing to dynamic decomposition: instead of picking one handler from a fixed list, the orchestrator plans a variable-length set of subtasks at runtime. Supervisor and router, covered in the Multi-Agent stage, is routing extended with agents as handlers rather than plain LLM calls. Guardrails, covered in the Safety stage, often live in front of or inside the router and detect out-of-scope inputs before any specialist sees them.
References
- Anthropic. Building effective agents. December 2024.
- OpenAI. Optimizing LLM accuracy. 2024.
- LangChain. Conditional edges in LangGraph. 2024.
- Google Cloud. Orchestration patterns for generative AI. 2024.
- Shen, Yongliang, et al. HuggingGPT: Solving AI Tasks with ChatGPT and its Friends. 2023.