CASE

What our clients actually pay for Claude

Ten anonymized client stacks. Real monthly bills. Three profile shapes — Opus-heavy, Sonnet-default, Haiku-batch — and what each one costs at a typical workload size. Includes the leaks we keep finding when we audit incoming engagements.

READ · 11 MIN | UPDATED · 2026-04-26 | BY · PINTOED AI STUDIO

How we picked the ten

Ten is a sample, not a survey. We picked the ten clients whose stacks we know cold — meaning we built or audited the whole pipeline and have access to monthly invoices. Spend ranges from ~$900/mo (smallest, an internal tool at a 12-person startup) to ~$92K/mo (largest, a customer-facing agent on a B2B platform).

Names are scrubbed. Workload shapes are not. If your stack resembles one of these, you can use the median spend as a sanity check.

Profile A: Opus-heavy

Three of the ten clients sit here. The pattern: a small number of high-stakes calls per day, each requiring the strongest reasoning. Examples — legal-document analysis pipeline, financial-modeling assistant, deep-research agent for an enterprise sales team.

Where the money goes: Opus output tokens. Across this profile, output is ~40% of the bill despite being only ~20% of token volume — the $25/Mtok output rate dominates.
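The mechanics are plain arithmetic. A minimal sketch, assuming an Opus input rate of $5/Mtok (our assumption; the article only quotes the $25/Mtok output rate) and an illustrative 80/20 input/output token mix:

```python
# Hedged arithmetic: the input rate and the token mix are assumptions,
# not figures from any client invoice above.
IN_RATE, OUT_RATE = 5.0, 25.0   # $/Mtok, assumed Opus list pricing
in_share, out_share = 0.8, 0.2  # share of token volume

in_cost = in_share * IN_RATE     # 4.0 cost units per Mtok of traffic
out_cost = out_share * OUT_RATE  # 5.0 cost units
output_bill_share = out_cost / (in_cost + out_cost)
print(round(output_bill_share, 3))  # 0.556 — a 20% token slice, most of the raw token cost
```

The exact split on a real invoice shifts with cache hit rate (which discounts only the input side) and per-client mix, which is why the observed figure lands nearer ~40%.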

Profile B: Sonnet-default

Four of the ten. The most common shape we see. Customer-facing agents, internal copilots, broad coding-assistance products. Sonnet 4.6 handles the bulk of traffic; Opus is reserved for hard escalations and Haiku for trivial classification.

Where the money goes: input tokens. The bill is overwhelmingly input-driven. The single highest-leverage cost optimisation in this profile is always: trim the system prompt and aggressively cache what remains.
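A minimal sketch of that fix using the Anthropic Messages API's `cache_control` field; the model id, prompt text, and helper name are placeholders, not details from any client stack:

```python
# Hypothetical helper: mark the big static system prompt as cacheable
# so repeat calls bill the prefix at the discounted cache-read rate.
SYSTEM_PROMPT = "You are ExampleCo's support copilot. ..."  # stands in for a ~12K-token prompt

def build_request(user_msg: str) -> dict:
    return {
        "model": "claude-sonnet-4-5",  # placeholder model id
        "max_tokens": 500,             # a sane output cap while we're here
        "system": [{
            "type": "text",
            "text": SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},  # cache the static prefix
        }],
        "messages": [{"role": "user", "content": user_msg}],
    }

# With the official SDK: anthropic.Anthropic().messages.create(**build_request("..."))
```

Trimming the prompt first, then caching what remains, stacks the two savings.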

Profile C: Haiku-batch

Three of the ten. High-volume, low-stakes-per-call workloads. Email classification at an MSP, content moderation at a UGC platform, embedding-replacement classification for a search product.

Where the money goes: pure throughput. Per-call cost is tiny; the bill is input tokens at $1/Mtok plus output tokens at $5/Mtok, multiplied by sheer volume. The optimisations that matter here are not prompt-shaped; they're "do we even need to call the model on this row?" filtering at the application layer.
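That application-layer filter can be as crude as a regex gate. A sketch with invented rules (the patterns and function name are ours, not any client's):

```python
import re

# Toy heuristics standing in for whatever cheap rules fit your rows.
OBVIOUS = re.compile(r"(?i)\bunsubscribe\b|\bout of office\b")

def needs_model(email_body: str) -> bool:
    """Return False when a cheap local rule already decides the row."""
    body = email_body.strip()
    if not body:
        return False   # empty: nothing to classify
    if OBVIOUS.search(body):
        return False   # handled by the rule, skip the API call
    return True
```

Every `False` is a call that never hits the bill.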

The leaks we keep finding

When we audit an incoming engagement (someone bringing us in to fix an existing Claude bill), the same five leaks show up:

  1. Missing prompt cache. A 12K-token system prompt sent uncached on every call. Trivial to fix, ~30–60% bill reduction depending on hit rate. (See our 71% bill-cut case study.)
  2. Wrong model on easy traffic. Opus or Sonnet running on traffic Haiku would handle in its sleep. We see this most on classification and routing layers.
  3. No max-tokens cap. Output drift to 4–8K tokens when the answer needed was 200. Setting a sensible max_tokens usually trims 10–20% off the bill.
  4. Re-sending the conversation history uncached. Every turn re-includes the full transcript with no caching strategy. This compounds fast on long sessions.
  5. No throttle on the trigger upstream. An app feature that fires the model on every keystroke, every page-view, every webhook — when a debounce or batch would do.

Across our audits, fixing the top three of these alone usually cuts bills by 40–70% with zero quality regression. We've never finished an audit without finding at least two.

The model-split cheat sheet

We're asked at every kickoff: "what's the right split between Opus, Sonnet, and Haiku?" There isn't a universal answer, but there is a shape we use as a starting point.
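As a starting point only, the routing logic usually looks like a two-question gate. The field names, tiers, and model ids below are our assumptions, not a universal answer:

```python
def pick_model(task: dict) -> str:
    # Hypothetical router: escalate the rare hard calls, demote the
    # trivial ones, default the rest to Sonnet.
    if task.get("high_stakes"):                    # deep reasoning required
        return "claude-opus-4-5"
    if task.get("kind") in {"classification", "routing"}:
        return "claude-haiku-4-5"                  # easy traffic
    return "claude-sonnet-4-5"                     # the default tier
```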

The mistake we see most: teams default everything to whatever model they tested with first (usually Sonnet) and never split out the ~40% of traffic Haiku could handle at ~5x lower cost.

Plug your own numbers in

Our LLM cost calculator lets you plug in volume, average input/output tokens, model split, and cache hit rate to get a bill estimate. If you run those numbers and your actual bill comes in more than ~25% above the estimate, you've got a leak. If it's below, you're probably under-provisioning something; that's a different conversation.
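If you'd rather run the same arithmetic by hand, the core of the calculator fits in one function. A sketch under stated assumptions: rates in $/Mtok, cache reads billed at 0.1x the base input rate (Anthropic's published cache-read discount), and cache-write surcharges ignored:

```python
def estimate_monthly_bill(calls: int, avg_in: int, avg_out: int,
                          in_rate: float, out_rate: float,
                          cache_hit_rate: float = 0.0) -> float:
    """Rough monthly bill in dollars; ignores cache-write surcharges."""
    in_mtok = calls * avg_in / 1e6
    out_mtok = calls * avg_out / 1e6
    # Cache-hit input tokens bill at 10% of the base input rate.
    effective_in = in_mtok * ((1 - cache_hit_rate) + cache_hit_rate * 0.1)
    return effective_in * in_rate + out_mtok * out_rate

# Example: 1M calls/mo, 2K in / 300 out per call, assumed Sonnet-class
# rates of $3/$15 per Mtok, 80% cache hit rate.
print(estimate_monthly_bill(1_000_000, 2_000, 300, 3.0, 15.0, cache_hit_rate=0.8))  # 6180.0
```

Drop the cache hit rate to zero in that example and the same workload runs $10,500/mo, which is the leak-sized gap the audit looks for.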

And if you want us to do the audit for you, the engagement is typically 1–2 weeks and pays back in the first month. Book a scoping call.

Think your Claude bill is too high? It almost certainly is.

BOOK A BILL AUDIT → RUN THE CALCULATOR →