
Routing: Classify and Dispatch LLM Requests to Specialists


A system prompt tuned to handle a billing refund well is almost never the same prompt that handles a technical outage well. The two inputs demand different vocabulary, different tools, different escalation paths, and different tolerance for ambiguity. Trying to serve both from a single prompt produces a long, defensive instruction block that performs poorly on the tail of each category while inflating token cost on every request.

Routing separates the two concerns. A small first call classifies the input into a category. A subsequent call, chosen by that classification, handles the request with a focused prompt and a tailored set of tools. Anthropic lists routing among its workflow patterns and describes it as a good fit for tasks with distinct input categories that are better handled separately (Anthropic, 2024). Routing is the second pattern to reach for whenever a one-shot prompt starts accumulating conditional instructions. Its main cost, one extra model call per request, is almost always worth the separation it buys.

The classifier on top

flowchart TD
    IN([Input]) --> R[Router: classify]
    R -->|billing| H1[Billing handler: prompt + payment tools]
    R -->|technical| H2[Technical handler: prompt + docs, logs]
    R -->|general| H3[General handler: FAQ lookup]
    R -->|low confidence| E[Escalate or default]
    H1 --> OUT([Output])
    H2 --> OUT
    H3 --> OUT
    E --> OUT

The classifier is the load-bearing component. It returns a typed category plus, ideally, a confidence value. Routes with low confidence take a different path than high-confidence routes. The classifier is typically the smallest and cheapest model in the system; its job is to pick a label, not to answer a question.

A useful variation routes by difficulty rather than by topic. Simple questions go to a small, cheap model; complex reasoning goes to a larger model. The same pattern, with a classifier that emits "simple" or "complex" labels, cuts aggregate cost without sacrificing quality on hard inputs.
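The difficulty variant can be sketched without any API calls. The snippet below assumes a classifier has already produced a "simple" or "complex" label. The model names, the per-request prices, and the classifier price are hypothetical placeholders chosen only to make the arithmetic visible. They are not measurements.

```python
from typing import Literal

Difficulty = Literal["simple", "complex"]

# Hypothetical model tiers and per-request prices, for illustration only.
MODEL_FOR = {"simple": "small-model", "complex": "large-model"}
PRICE_PER_REQUEST = {"small-model": 1.0, "large-model": 10.0}  # arbitrary units
CLASSIFIER_PRICE = 0.5  # assumed cost of the extra classification call


def pick_model(label: Difficulty) -> str:
    """Map a difficulty label to the model tier that should answer."""
    return MODEL_FOR[label]


def blended_cost(simple_share: float) -> float:
    """Average cost per request when `simple_share` of traffic is labeled simple."""
    if not 0.0 <= simple_share <= 1.0:
        raise ValueError("simple_share must be between 0 and 1")
    handler_cost = (
        simple_share * PRICE_PER_REQUEST[MODEL_FOR["simple"]]
        + (1.0 - simple_share) * PRICE_PER_REQUEST[MODEL_FOR["complex"]]
    )
    # Every request pays for the classifier, whichever tier answers it.
    return CLASSIFIER_PRICE + handler_cost
```

Under these assumed prices, routing is cheaper than sending everything to the large model only when enough traffic is simple to pay for the classifier call. With no simple traffic at all, the router is strictly more expensive.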

Specialists underneath

Each handler assumes its category and can be shorter and sharper than any general-purpose prompt. A billing handler knows the refund policy, has access to payment APIs, and speaks the language of charges and credits. A technical handler knows diagnostic flows, has access to logs, and speaks the language of errors and reproduction steps. Neither handler has to work out what kind of request it is looking at; the classifier already did that.

Handlers do not call each other. Categories do not share handlers. Routing is a one-level tree, not a graph. The moment handlers need to collaborate, the right pattern is not routing but a supervisor or an orchestrator, both covered later in this series.
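One way to keep the tree one level deep is to make the category-to-handler mapping a flat lookup table. The following is a minimal sketch. The `Handler` type, the instruction strings, and the tool names are hypothetical and stand in for whatever prompts and tools a real system would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handler:
    """A specialist: one focused prompt plus the tools it may use."""
    instructions: str
    tools: tuple[str, ...] = ()


# Hypothetical registry: exactly one handler per category, no cross-references.
HANDLERS: dict[str, Handler] = {
    "billing": Handler(
        "Billing specialist. Apply the refund policy.",
        ("lookup_charge", "issue_refund"),
    ),
    "technical": Handler(
        "Technical support. Diagnose before fixing.",
        ("search_docs", "fetch_logs"),
    ),
    "general": Handler("Answer from the FAQ. Keep it short.", ("faq_lookup",)),
}


def select_handler(category: str) -> Handler:
    """Resolve a category to its single handler; unknown labels raise."""
    try:
        return HANDLERS[category]
    except KeyError:
        raise ValueError(f"no handler registered for category {category!r}") from None
```

Because a `Handler` here holds only data and the registry maps each label to exactly one entry, a handler has no reference through which it could invoke another. If that restriction starts to chafe, it is a hint that the problem has outgrown routing.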

Two versions in code

The excerpt below shows routing without a framework. The classifier uses structured output to return a Literal category. The dispatch is a Python match statement that picks a handler.

from openai import OpenAI
from pydantic import BaseModel, Field
from typing import Literal

client = OpenAI()

class Route(BaseModel):
    category: Literal["billing", "technical", "general"]
    confidence: float = Field(ge=0.0, le=1.0)

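def classify(query: str) -> Route:
    response = client.responses.parse(
        model="gpt-4o-mini",
        instructions="Classify the query into one of the categories.",
        input=query, text_format=Route,
    )
    # output_parsed holds the validated Route, or None if nothing was parsed.
    route = response.output_parsed
    if route is None:
        raise ValueError("classifier returned no structured output")
    return route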

def billing_handler(query: str) -> str:
    return client.responses.create(model="gpt-4o-mini",
        instructions="Billing specialist. Use payment policy. Be precise.",
        input=query).output_text

def technical_handler(query: str) -> str:
    return client.responses.create(model="gpt-4o-mini",
        instructions="Technical support. Suggest diagnostics before solutions.",
        input=query).output_text

def general_handler(query: str) -> str:
    return client.responses.create(model="gpt-4o-mini",
        instructions="Answer from the FAQ. Keep short.",
        input=query).output_text

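def answer(query: str) -> str:
    route = classify(query)
    # Low-confidence classifications skip the specialists entirely.
    if route.confidence < 0.5:
        return "Forwarding to a human agent."
    match route.category:  # match requires Python 3.10+
        case "billing":   return billing_handler(query)
        case "technical": return technical_handler(query)
        case "general":   return general_handler(query)
        case _:           return "Forwarding to a human agent."  # defensive default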

The LangGraph version expresses the same shape as a graph with conditional edges. The classifier is a node that returns a route decision; the conditional edge function maps the decision to a downstream node.

from typing import TypedDict
from langgraph.graph import StateGraph, START, END
from langchain.chat_models import init_chat_model

# Route is the Pydantic model defined in the framework-free excerpt above.
model = init_chat_model("gpt-4o-mini")

class State(TypedDict):
    query: str
    route: Route
    answer: str

def classify_node(s: State) -> State:
    return {**s, "route": model.with_structured_output(Route).invoke(s["query"])}

def route_fn(s: State) -> str:
    return "escalate" if s["route"].confidence < 0.5 else s["route"].category

def billing_node(s): return {**s, "answer": model.invoke(f"Billing: {s['query']}").content}
def technical_node(s): return {**s, "answer": model.invoke(f"Technical: {s['query']}").content}
def general_node(s): return {**s, "answer": model.invoke(f"FAQ: {s['query']}").content}
def escalate_node(s): return {**s, "answer": "Forwarding to a human agent."}

graph = (StateGraph(State)
         .add_node("classify", classify_node)
         .add_node("billing", billing_node).add_node("technical", technical_node)
         .add_node("general", general_node).add_node("escalate", escalate_node)
         .add_edge(START, "classify")
         .add_conditional_edges("classify", route_fn, {
             "billing": "billing", "technical": "technical",
             "general": "general", "escalate": "escalate"})
         .add_edge("billing", END).add_edge("technical", END)
         .add_edge("general", END).add_edge("escalate", END)
         .compile())

Runnable versions are at github.com/subodhjena/agentic-patterns under examples/05_routing.py and examples/05_routing_langgraph.py.

Where the classifier breaks

Routing introduces a new failure mode: misclassification. Several patterns indicate the router is doing more harm than good.

Ambiguous categories. If a classifier cannot reliably distinguish two categories, the problem is the taxonomy, not the classifier. Collapse the categories or refine their definitions.
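A quick way to see whether two categories blur together is to count misroutes per category pair on a labeled sample. The helper below is a sketch, not part of the original examples. The function name and the (expected, predicted) row format are assumptions.

```python
from collections import Counter


def most_confused_pairs(labeled, top=3):
    """Count misroutes per unordered category pair.

    `labeled` is an iterable of (expected, predicted) category strings.
    Returns up to `top` ((category_a, category_b), count) entries, most frequent first.
    """
    confusion = Counter()
    for expected, predicted in labeled:
        if expected != predicted:
            # Sort so billing->general and general->billing count as one pair.
            confusion[tuple(sorted((expected, predicted)))] += 1
    return confusion.most_common(top)
```

A pair that dominates this list is a candidate for merging or for a sharper definition in the classifier prompt.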

Confidence theater. A classifier that returns a confidence above 0.9 on nearly every input, including the ones it gets wrong, is poorly calibrated, and its confidence carries no signal. Measure confidence distributions on a held-out set before relying on them for escalation.
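Checking calibration on a held-out set can be as simple as bucketing predictions by confidence and comparing each bucket's accuracy to its confidence range. The sketch below assumes the held-out results are available as (confidence, was_correct) pairs. The function name and bucket count are illustrative choices.

```python
def reliability_table(results, bins=5):
    """Bucket (confidence, was_correct) rows and report accuracy per bucket.

    Returns (bucket_lower_bound, accuracy, count) for each non-empty bucket.
    """
    buckets = [[] for _ in range(bins)]
    for confidence, correct in results:
        # Clamp so that confidence == 1.0 lands in the last bucket.
        index = min(int(confidence * bins), bins - 1)
        buckets[index].append(bool(correct))
    return [
        (i / bins, sum(bucket) / len(bucket), len(bucket))
        for i, bucket in enumerate(buckets)
        if bucket
    ]
```

If the bucket starting at 0.8 shows an accuracy near 0.5, a threshold such as `confidence < 0.5` will almost never fire on the inputs that most need escalation.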

Over-specialized handlers. Handlers that share most of their prompt and differ only in a sentence or two are an anti-pattern. The router is paying for separation that does not exist.
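A rough overlap check can flag handler prompts that are nearly identical. The sketch below uses character-level similarity from the standard library's `difflib`. That is a crude heuristic chosen for illustration, and the function name is an assumption.

```python
from difflib import SequenceMatcher
from itertools import combinations


def prompt_overlap(prompts):
    """Similarity ratio between 0 and 1 for every pair of handler prompts.

    `prompts` maps a category name to its handler prompt text.
    """
    return {
        (a, b): SequenceMatcher(None, prompts[a], prompts[b]).ratio()
        for a, b in combinations(sorted(prompts), 2)
    }
```

Pairs that score close to 1.0 are worth reading side by side. If the differing sentences could live in one prompt without conflict, the separate route is probably not earning its extra call.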

Cascading classifiers. A classifier whose output is fed into another classifier is usually a sign that the taxonomy is a tree rather than a flat list. If the branching is genuinely hierarchical, consider a supervisor pattern, covered later in the Multi-Agent stage.

Routing with no fallback. Every classifier occasionally produces a category that does not apply. Without a default handler or a human escalation path, these requests produce incoherent output. A route-to-escalation branch is mandatory in production.

Latency stacked twice. Routing adds a round-trip. If the classifier runs on the same model as the handler and adds no specialization, it is pure overhead. Use a small model for the classifier when possible.

What routing is not

Routing is not decomposition. The input is handled by exactly one specialist; nothing is broken into subtasks. When the task needs to be split into pieces that run in parallel, the right pattern is parallelization or orchestrator-workers, both in this stage of the series.

Routing is not planning. The classifier makes one decision and is done. When the number and identity of steps depend on the input at runtime, the right pattern is plan-and-execute, covered in the Agents stage.

Routing is not a multi-agent system. Each handler is an LLM call with its own prompt. Handlers do not hold state across requests, do not consult each other, and do not negotiate. When collaboration is needed, the right pattern is a supervisor with agent workers.

The trade against a single prompt

Routing is cheap to introduce and easy to get wrong. The axes below help decide when the trade is worth it.

| Axis | Single prompt | Routing |
| --- | --- | --- |
| Latency | One round-trip | Two round-trips, usually cheaper per call |
| Cost | Predictable per call | Depends on classifier and handler choice |
| Quality per category | Averaged | Higher on each category |
| Prompt maintainability | Degrades with rules | Cleaner, each handler has a scope |
| Failure mode | Silent degradation | Misclassification plus handler failure |
| Observability | One opaque response | Classification and answer logged separately |
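The observability difference can be made concrete by logging the classification and the answer as separate records. The sketch below injects the classifier and handlers as plain callables so it runs without a model. The function name, the record fields, and the `fallback` parameter are assumptions made for illustration.

```python
import json
import logging
from typing import Callable

logger = logging.getLogger("router")


def answer_with_trace(
    query: str,
    classify: Callable[[str], tuple[str, float]],
    handlers: dict[str, Callable[[str], str]],
    fallback: Callable[[str], str],
) -> dict:
    """Classify, dispatch, and emit one log record per stage."""
    category, confidence = classify(query)
    logger.info(json.dumps(
        {"stage": "classify", "category": category, "confidence": confidence}
    ))

    # Unknown categories go to the fallback instead of raising.
    handler = handlers.get(category, fallback)
    reply = handler(query)
    logger.info(json.dumps(
        {"stage": "handle", "category": category, "answer_chars": len(reply)}
    ))
    return {"category": category, "confidence": confidence, "answer": reply}
```

With two records per request, a bad answer can be traced either to the wrong route or to the right route handled poorly, which a single opaque response does not allow.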

Routing is almost always the correct response to a prompt that is accumulating conditional rules. The cases where a single prompt wins are narrow: a small domain, a small taxonomy, and low variance across categories.

Neighbors in the series

Prompt chaining, the previous article, is the pattern to reach for when the task has sequential steps rather than branching categories. Orchestrator-workers, covered next, generalizes routing to dynamic decomposition: instead of picking one handler from a fixed list, the orchestrator plans a variable-length set of subtasks at runtime. Supervisor and router, covered in the Multi-Agent stage, is routing extended with agents as handlers rather than plain LLM calls. Guardrails, covered in the Safety stage, often live in front of or inside the router and detect out-of-scope inputs before any specialist sees them.

References

  1. Anthropic. Building effective agents. December 2024.
  2. OpenAI. Optimizing LLM accuracy. 2024.
  3. LangChain. Conditional edges in LangGraph. 2024.
  4. Google Cloud. Orchestration patterns for generative AI. 2024.
  5. Shen, Yongliang, et al. HuggingGPT: Solving AI Tasks with ChatGPT and its Friends. 2023.