One Model Is the Wrong Default

The mistake. Pick the smartest model. Send everything to it. Pay premium prices to summarise an email. There is a better way, and it is not complicated.

The Single-Model Trap

When people wire up an AI agent, they usually choose one model — normally the most capable one they can afford — and route every task through it.

It feels right. Smartest model, best results.

It is also how you end up paying flagship prices to do a job a cheap model does just as well: extracting a date from an email, deciding which tool to call, summarising a meeting note.

I run a self-hosted agent that does dozens of tasks a day — morning briefings, code edits, research summaries, market analysis. Routing all of that through one model would be slow and expensive. So it does not. It runs a multi-model architecture: different jobs go to different models, deliberately.

1. The Three Jobs Inside Every Agent

Almost everything an agent does falls into one of three buckets. Each wants a different kind of model.

Job	What it looks like	What it needs
Orchestration	Routing, tool calls, “which step next?”, summarising	Fast, cheap, reliable — runs constantly
Execution	Writing code, scripts, automation, transforms	High throughput, strong at structure
Reasoning	Analysis, hard trade-offs, curated reports	Depth and accuracy, used sparingly

The orchestrator is the dispatcher. It runs on almost every turn, so it must be cheap and fast. The reasoning model is the senior specialist — expensive, slow, and worth every cent on the 10% of tasks that actually need it.

Using your reasoning model as your dispatcher is like having a partner at a law firm answer the front desk phone.

2. How My Agent Routes Work

The actual routing in my agent looks like this:

     TASK                         MODEL TIER
     ────                         ──────────
     Tool calls / routing    →    small, fast orchestrator
     Summarise / extract     →    small, fast orchestrator
     Code / scripting        →    high-throughput coding model
     Complex analysis        →    reasoning model (used rarely)
     Huge context (>64k)     →    large-context model

Multi-Model Routing Architecture

A lightweight model handles orchestration and summarisation — the constant background hum. A high-throughput model handles coding and automation, where volume matters more than brilliance. A dedicated reasoning model is reserved for the genuinely hard calls: a market analysis, a trade-off with real consequences, a report someone will act on.

The orchestrator decides, per task, who gets the work.

3. The Cost Argument Is Brutal

This is not a marginal optimisation. The price gap between model tiers is often 10x to 30x+ per token:

Tier / Model	Input Cost (per 1M tokens)	Output Cost (per 1M tokens)	Relative Price Difference
Reasoning Tier (e.g. OpenAI o1)	$15.00	$60.00	100x (Reasoning vs. Small)
Execution / Coding (e.g. GPT-4o / Claude 3.5 Sonnet)	$2.50–$3.00	$10.00–$15.00	16x–20x (Execution vs. Small)
Orchestration (e.g. GPT-4o-mini / Gemini 1.5 Flash)	$0.15	$0.60	Baseline (1x)

     SINGLE FLAGSHIP MODEL          ROUTED MULTI-MODEL
     ─────────────────────          ──────────────────
     Every task at premium      →   Premium only when needed
     Summary = flagship price   →   Summary = cents
     1 model, 1 failure mode    →   Fallbacks across providers
     Cost scales with activity  →   Cost scales with difficulty

When you route by difficulty instead of sending everything to the top tier, the cheap tasks — which are most of them — cost almost nothing. Your spend tracks how hard the work was, not how much of it there was. For an always-on agent doing hundreds of small tasks a day, that is the difference between a hobby and a runaway bill.

4. Resilience Comes Free

There is a second prize beyond cost.

A single-model agent has a single point of failure. When that provider has an outage, rate-limits you, or quietly changes behaviour, your entire agent degrades at once.

A multi-model agent routes around trouble. If the coding model is down, automation can fall back to another. If one provider rate-limits, the orchestrator shifts load. You have already done the integration work to talk to several models — so a provider going dark becomes an inconvenience, not an outage.

In production, “it still works when one vendor breaks” is not a luxury. It is the baseline.

5. The Discipline Is the Moat

I have written before that in regulated AI, the moat is no longer the model — it is the discipline around the model. Routing is that idea made concrete.

Anyone can do	Few actually do
Call the best model	Know which model each task deserves
Pay the bill	Engineer the bill down 10x
Hope the provider stays up	Build fallbacks before they are needed
Demo on one model	Run a fleet in production

The teams that win the next phase of AI are not the ones with access to the best model. Everyone has that. They are the ones who built the routing layer — the judgement about which model to trust with what, at what cost, with what fallback.

The Big Takeaway

You would not hire one person to be your CFO, your developer, and your receptionist. Stop asking one model to be all three. Route by difficulty, pay by difficulty, and survive a provider outage while you are at it.

One model is the wrong default. The right default is a small, fast orchestrator that knows when to call in someone smarter — and a bill that proves it.

Your AI Agent Needs a Soul File — the memory layer that pairs with this routing layer.
I Gave an AI Agent the Keys to My Life. Here Is the Trust Architecture. — the safety layer on top.
MCP Tool Poisoning: The Attack Vector Nobody Is Talking About — why discipline beats raw capability.
The 10-Star Experience: Why Product and Engineering Need Legendary Test Cases — how routing to the right model enables a seamless, fast 10-star experience.

Written by Haris Habib from Sydney, Australia | May 2026