The two models, side by side
List pricing as of writing:
- Sonnet 4.6 — $3/Mtok in, $15/Mtok out. Our default for reasoning, code edits, and conversation.
- Haiku 4.5 — $1/Mtok in, $5/Mtok out. Roughly 3x cheaper across the board, with the lowest first-token latency.
Both support the same APIs: tool-use, structured output, prompt caching, vision. Both have similar context windows. They differ in capability, not feature surface. The routing question is "for this specific workload, is Sonnet's extra capability worth ~3x the price?"
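To put the ~3x in dollars, here's a back-of-envelope monthly cost at the list prices above. A minimal sketch; the traffic shape (1M requests at 2K tokens in, 500 out) is an illustrative assumption, not a client number.

```python
# Back-of-envelope monthly cost at the list prices above.
# Traffic shape is illustrative: 1M requests, 2K tokens in, 500 tokens out.
PRICES = {"haiku": (1.0, 5.0), "sonnet": (3.0, 15.0)}  # $/Mtok (in, out)

def monthly_cost(model: str, requests: int, tok_in: int, tok_out: int) -> float:
    price_in, price_out = PRICES[model]
    return requests * (tok_in * price_in + tok_out * price_out) / 1_000_000

for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 1_000_000, 2_000, 500):,.0f}/mo")
# haiku: $4,500/mo, sonnet: $13,500/mo -- the same 3x gap, now in dollars
```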
What we always pin to Haiku
- Classification. "What category is this?" "Is this spam?" "Which queue does this ticket belong in?" Haiku is fully sufficient; a minimal call is sketched after this list. Sonnet on classification is a budget leak.
- Extraction with a tight schema. "Pull the dates and dollar amounts from this." Haiku handles it cleanly when the schema is well-defined.
- Pre-routing layers. The "should this be answered by docs or escalated to a human" decision is a Haiku call.
- Eval judges. Haiku as the grader on eval suites — see the minimum viable eval. Cheap, reliable, deterministic enough.
- Structured re-formatting. "Rewrite this dump in this format." Mechanical work; Haiku is fine.
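A minimal sketch of the classification case, using the Anthropic Python SDK. The model ID string and the queue labels are assumptions for illustration, not production values.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

QUEUES = ["billing", "technical", "account", "spam"]  # illustrative labels

def classify(ticket: str) -> str:
    # Constrain the reply to a single label; Haiku is fully sufficient here.
    resp = client.messages.create(
        model="claude-haiku-4-5",  # assumed ID for Haiku 4.5
        max_tokens=8,
        messages=[{
            "role": "user",
            "content": f"Classify this ticket into exactly one of {QUEUES}. "
                       f"Reply with the label only.\n\n{ticket}",
        }],
    )
    return resp.content[0].text.strip()
```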
Across these workloads at any meaningful volume, Haiku saves large multiples on the bill with no quality difference we can measure. The mistake is defaulting these to Sonnet because the original prototype used Sonnet.
What we always pin to Sonnet
- Customer-facing conversation. The marginal cost is real, but the quality difference shows up in CSAT data. Sonnet's tone, refusals, and contextual handling are noticeably better.
- Code generation and editing. Haiku can write code; Sonnet reasons about it. We pay the premium for any code that hits a real codebase.
- Drafting that requires voice and judgement. Personalised outbound, customer support replies, anything where the team's voice matters.
- Tool-use orchestration on agents that loop more than 2-3 times. Sonnet picks better tool sequences. Haiku is fine for single-tool deterministic flows; on harder agentic loops Sonnet's planning is materially better.
- Document analysis where the conclusion has to be defensible. Legal, financial, compliance work. Sonnet's reasoning leaves a cleaner audit trail.
The decisions where it's actually a question
The grey zone — where the right answer changes weekly based on the specific workload and traffic pattern:
Long-document Q&A
For documents under 100K tokens with focused questions, Haiku is surprisingly capable in 2026. We've migrated several Q&A workloads from Sonnet to Haiku with no measured quality regression. The deciding factor is question complexity — fact retrieval is fine on Haiku, reasoning chains across multiple passages still favour Sonnet.
Internal copilots
For internal tools where users are forgiving (your own employees), Haiku saves real money on volume. For external products where users compare you to ChatGPT, Sonnet is worth the markup. Same workload, different routing decision based on audience tolerance.
Multi-turn agents
Agentic loops with 2-4 turns: borderline. We default to Sonnet and demote to Haiku if evals show no regression. Loops with 5+ turns: Sonnet, no question. Errors compound: Haiku's slightly worse planning per turn adds up to meaningfully worse outcomes across the loop.
The breakeven heuristic
Quick math we run when deciding: if Haiku's quality is within 5% of Sonnet's on your eval suite, route to Haiku at any meaningful volume. If the gap is larger than 5%, route to Sonnet. The 3x saving doesn't justify a noticeable quality drop on customer-facing work; a measurable but small drop on internal work is usually fine.
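As a sketch, the heuristic reduces to a few lines. It assumes your eval suite emits one quality score per model (higher is better); the function name and threshold default are ours, not a standard API.

```python
def pick_model(haiku_score: float, sonnet_score: float,
               threshold: float = 0.05) -> str:
    """Route to Haiku when its relative quality gap vs Sonnet is within threshold."""
    gap = (sonnet_score - haiku_score) / sonnet_score
    return "haiku" if gap <= threshold else "sonnet"
```

For customer-facing traffic, push `threshold` toward zero, per the caveat above.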
The eval suite has to actually exist for this to be meaningful. Routing decisions made on vibes are how you end up with the "everything is on Sonnet because it works" stack that costs 2-3x what it should.
The hybrid pattern: Haiku-first, Sonnet-on-fail
For workloads in the grey zone, the routing pattern that wins most often is "try Haiku, escalate to Sonnet on a confidence check or schema validation failure."
Concretely: Haiku attempts the task. The output runs through a cheap validator (Haiku itself with a yes/no judge prompt, or a structured-output check). If it passes, ship. If it fails, retry with Sonnet. We see ~70-85% of traffic stay on Haiku and get the quality of Sonnet on the trickier 15-30% — at a blended cost much closer to Haiku than Sonnet.
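A minimal sketch of the pattern, assuming a JSON-extraction task so the gate can be a schema check. The model IDs and required fields are illustrative; a Haiku yes/no judge slots into the same place as `valid`.

```python
import json
import anthropic

client = anthropic.Anthropic()
HAIKU, SONNET = "claude-haiku-4-5", "claude-sonnet-4-6"  # assumed model IDs

def ask(model: str, prompt: str) -> str:
    resp = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text

def valid(output: str) -> bool:
    # Cheap gate: a schema check here; a Haiku yes/no judge works the same way.
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and {"date", "amount"} <= data.keys()

def run(prompt: str) -> str:
    draft = ask(HAIKU, prompt)
    if valid(draft):
        return draft            # ~70-85% of traffic stops here
    return ask(SONNET, prompt)  # escalate the trickier remainder
```

At the list prices above, an 80% pass rate puts blended output cost around 0.8 × $5 + 0.2 × ($5 + $15) = $8/Mtok, much closer to Haiku's $5 than Sonnet's $15.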
The pattern only works when the eval bar is something Haiku can check itself. For tasks where quality is inherently subjective, this approach fails — there's nothing to gate on.
The routing mistake we keep correcting
Every audit we run finds the same thing: everything is on Sonnet, even traffic Haiku would handle in its sleep. The team built the prototype on Sonnet, never revisited the routing, and the bill quietly grew. Splitting out the Haiku-eligible 30-50% of traffic typically cuts spend by 40-60% with zero quality regression on the routed traffic.
The fix is mechanical. List your model calls by category. For each one, ask "could Haiku do this?" If you can't answer, run a one-week shadow eval — Haiku and Sonnet on the same inputs, diff the outputs, calibrate the bar. Then route.
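A sketch of that shadow eval, assuming a week of captured traffic exported one prompt per line; the file names and model IDs are placeholders.

```python
import csv
import anthropic

client = anthropic.Anthropic()
MODELS = ("claude-haiku-4-5", "claude-sonnet-4-6")  # assumed model IDs

def answer(model: str, prompt: str) -> str:
    resp = client.messages.create(
        model=model,
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text

# week_of_traffic.txt holds one captured prompt per line (placeholder name)
with open("week_of_traffic.txt") as f, open("shadow_diff.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["prompt", *MODELS])
    for line in f:
        if prompt := line.strip():
            writer.writerow([prompt, *(answer(m, prompt) for m in MODELS)])
```

Diff the two output columns, calibrate the bar, then route.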
The one-line cheat sheet
Haiku for the 30-50% of traffic that's mechanical or classifiable. Sonnet for the rest. Opus for the small slice that needs hard reasoning (covered in our Opus 4.7 writeup). Most teams under-route to Haiku, which is the cheapest leak to fix in any audit we've ever run.
For the cohort cost picture across our client stacks, see what our clients actually pay for Claude; the Haiku-batch profile is the shape we route to most often.