REVIEW

Sonnet 4.6 vs Haiku 4.5: the routing decisions we make every week

Sonnet 4.6 and Haiku 4.5 sit side by side as the workhorses of our 2026 stack, and routing between them is most of the model-cost optimisation work that matters. Here's the honest cheat sheet for which workload pins to which, with the breakeven points and the gotchas.

8 MIN READ · UPDATED 2026-04-10 · BY PINTOED AI STUDIO

The two models, side by side

At list pricing as of writing, Sonnet runs roughly 3x Haiku's per-token rate.

Both support the same APIs: tool-use, structured output, prompt caching, vision. Both have similar context windows. They differ in capability, not feature surface. The routing question is "for this specific workload, is Sonnet's extra capability worth ~3x the price?"

What we always pin to Haiku

For workloads that are mechanical or classifiable, Haiku at any meaningful volume saves large multiples on the bill with no quality difference we can measure. The mistake is defaulting these to Sonnet because the original prototype used Sonnet.

What we always pin to Sonnet

Customer-facing output and long agentic loops — anywhere a visible quality drop or compounding per-turn errors would show up, as the sections below spell out.

The decisions where it's actually a question

The grey zone — where the right answer changes weekly based on the specific workload and traffic pattern:

Long-document Q&A

For documents under 100K tokens with focused questions, Haiku is surprisingly capable in 2026. We've migrated several Q&A workloads from Sonnet to Haiku with no measured quality regression. The deciding factor is question complexity — fact retrieval is fine on Haiku; reasoning chains across multiple passages still favour Sonnet.

Internal copilots

For internal tools where users are forgiving (your own employees), Haiku saves real money on volume. For external products where users compare you to ChatGPT, Sonnet is worth the markup. Same workload, different routing decision based on audience tolerance.

Multi-turn agents

Agentic loops with 2-4 turns: borderline. We default to Sonnet and demote to Haiku if eval results hold. Loops with 5+ turns: Sonnet, no question. The error compounds; Haiku's slightly worse planning per turn becomes meaningfully worse outcomes across the loop.
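The compounding is easy to see with back-of-envelope arithmetic. The per-turn success rates below are illustrative assumptions, not measured numbers:

```python
# Illustrative only: assumed per-turn success rates, not measurements.
# A small per-turn planning gap compounds multiplicatively across a loop.

def loop_success(per_turn_success: float, turns: int) -> float:
    """Probability the whole agentic loop succeeds, assuming
    independent turns that each succeed at the given rate."""
    return per_turn_success ** turns

# A 2-point per-turn gap is modest at 2 turns but large at 5:
# loop_success(0.99, 5) ≈ 0.951 vs loop_success(0.97, 5) ≈ 0.859
```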

The breakeven heuristic

Quick math we run when deciding: if Haiku's quality is within 5% of Sonnet's on your eval suite, route to Haiku at any meaningful volume. If the gap is more than 5%, route to Sonnet. The 3x saving doesn't justify a noticeable quality drop on customer-facing work, but a measurable-but-small drop on internal work is usually fine.
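As a sketch, the heuristic reduces to a few lines. The model IDs and score convention here are assumptions for illustration, not our production harness:

```python
# Hedged sketch of the breakeven rule above; eval scores are whatever
# your suite emits, normalised so higher is better.

HAIKU = "claude-haiku-4-5"    # assumed model IDs, for illustration
SONNET = "claude-sonnet-4-6"

def pick_model(haiku_score: float, sonnet_score: float,
               gap_tolerance: float = 0.05) -> str:
    """Route to Haiku when its eval score is within `gap_tolerance`
    of Sonnet's (relative); otherwise pay the ~3x for Sonnet.
    Internal-tool traffic can justify a wider tolerance."""
    if sonnet_score <= 0:
        return HAIKU  # degenerate eval: nothing worth paying extra for
    gap = (sonnet_score - haiku_score) / sonnet_score
    return HAIKU if gap <= gap_tolerance else SONNET
```

Widening `gap_tolerance` for internal tools is how the "forgiving audience" exception from the copilots section shows up in the routing code.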

The eval suite has to actually exist for this to be meaningful. Routing decisions made on vibes are how you end up with the "everything is on Sonnet because it works" stack that costs 2-3x what it should.

The hybrid pattern: Haiku-first, Sonnet-on-fail

For workloads in the grey zone, the routing pattern that wins most often is "try Haiku, escalate to Sonnet on a confidence check or schema validation failure."

Concretely: Haiku attempts the task. The output runs through a cheap validator (Haiku itself with a yes/no judge prompt, or a structured-output check). If it passes, ship. If it fails, retry with Sonnet. We typically see 70-85% of traffic stay on Haiku while the trickier 15-30% gets Sonnet quality — at a blended cost much closer to Haiku's than Sonnet's.

The pattern only works when the eval bar is something Haiku can check itself. For tasks where quality is inherently subjective, this approach fails — there's nothing to gate on.
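Where the gate does exist, the escalation logic itself is small. A minimal sketch, with `call_model` and `validate` as stand-in callables for your own client and eval gate — none of these names come from a real SDK:

```python
# Minimal sketch of Haiku-first, Sonnet-on-fail. `call_model` and
# `validate` are stand-ins you'd wire to your own client and gate.

from typing import Callable

def haiku_first(task: str,
                call_model: Callable[[str, str], str],
                validate: Callable[[str], bool],
                cheap: str = "claude-haiku-4-5",
                strong: str = "claude-sonnet-4-6") -> tuple[str, str]:
    """Try the cheap model; escalate only when the gate fails.
    Returns (output, model_used)."""
    draft = call_model(cheap, task)
    if validate(draft):           # schema check or yes/no judge prompt
        return draft, cheap
    return call_model(strong, task), strong
```

Note the failed Haiku attempt is still paid for: at a 75% pass rate, blended cost per request is roughly 0.75 × Haiku + 0.25 × (Haiku + Sonnet), which is why the pattern only pays off while the pass rate stays high.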

The routing mistake we keep correcting

In every audit we run, the same thing: everything is on Sonnet, even traffic Haiku would handle in its sleep. The team built the prototype on Sonnet, never revisited the routing, and the bill quietly grew. Splitting out the Haiku-eligible 30-50% of traffic typically cuts spend by 40-60% with zero quality regression on the routed traffic.

The fix is mechanical. List your model calls by category. For each one, ask "could Haiku do this?" If you can't answer, run a one-week shadow eval — Haiku and Sonnet on the same inputs, diff the outputs, calibrate the bar. Then route.
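The shadow eval can be as simple as the sketch below. The runner and scorer callables are assumptions — plug in your own client and grading function:

```python
# Hedged sketch of the one-week shadow eval: same inputs to both
# models, score each, report the per-category relative gap.

from statistics import mean

def shadow_eval(samples, run_haiku, run_sonnet, score):
    """samples: iterable of (category, prompt) pairs.
    score(prompt, output) -> float, higher is better.
    Returns {category: relative gap of Haiku behind Sonnet}."""
    by_cat: dict[str, list[tuple[float, float]]] = {}
    for cat, prompt in samples:
        h = score(prompt, run_haiku(prompt))
        s = score(prompt, run_sonnet(prompt))
        by_cat.setdefault(cat, []).append((h, s))
    gaps = {}
    for cat, pairs in by_cat.items():
        h_avg = mean(h for h, _ in pairs)
        s_avg = mean(s for _, s in pairs)
        gaps[cat] = (s_avg - h_avg) / s_avg if s_avg else 0.0
    return gaps
```

Categories whose gap comes back under the 5% bar are the ones to route to Haiku.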

The one-line cheat sheet

Haiku for the 30-50% of traffic that's mechanical or classifiable. Sonnet for the rest. Opus for the small slice that needs hard reasoning (covered in our Opus 4.7 writeup). Most teams under-route to Haiku, which is the cheapest leak to fix in any audit we've ever run.

For the cohort cost picture across our client stacks, see what our clients actually pay for Claude; the Haiku-batch profile is the shape we route to most often.

Default everything to Sonnet? Let us audit the routing.

BOOK A ROUTING AUDIT → RUN THE CALCULATOR →