PLAYBOOK

AI customer support: the 80/20

A decade ago we'd have written "the 80/20 of AI for customer support." Today, after enough engagements, the framing is the reverse: 80% of what teams ask for is the wrong scope, and the 20% that works isn't what they pitched. Here's what we deploy and what we explicitly leave out.

9 MIN READ · UPDATED 2026-04-20 · BY PINTOED AI STUDIO

The brief we usually get

"We want an AI agent that handles all our customer service." Some version of that opens 90% of CS engagement conversations. The buyer has read about deflection rates from a vendor and wants the same number on their dashboard.

The brief is wrong in three ways. First, "all" is a goal that wastes 80% of the build budget on the long tail. Second, "agent" presumes a chatbot UI, when triage and routing deliver bigger wins than chat. Third, "customer service" is three different jobs (deflection, drafting, routing) that a single agent will badly fuse if you let it.

What we deploy: three components

The shape we install on essentially every CS engagement, in this order:

1. Triage classifier (deploys in week 1)

The first thing we ship is not an agent. It's a Haiku-powered classifier that runs over every incoming ticket and assigns: category, urgency, customer tier, and a "this is answerable from our help docs" flag. Output goes into the existing CS tool's custom fields.

That alone gets the human team a ~25–35% speedup. They can sort, filter, and prioritise the queue better. No customer ever sees the AI. No risk of the AI saying something wrong. Easy win, fast ROI, foundation for the next two components.
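As a concrete sketch, the classifier's output contract might look like this. The field names and JSON shape are illustrative, not a production schema; the actual model call happens upstream:

```python
import json
from dataclasses import dataclass

@dataclass
class TriageResult:
    category: str          # e.g. "billing", "shipping", "bug"
    urgency: str           # e.g. "low" | "normal" | "high"
    customer_tier: str     # e.g. "free" | "pro" | "enterprise"
    docs_answerable: bool  # the "answerable from our help docs" flag
    confidence: float      # model's self-reported confidence, 0..1

def parse_triage(raw: str) -> TriageResult:
    """Parse the model's JSON reply into the fields pushed to the CS tool."""
    data = json.loads(raw)
    return TriageResult(
        category=data["category"],
        urgency=data["urgency"],
        customer_tier=data["customer_tier"],
        docs_answerable=bool(data["docs_answerable"]),
        confidence=float(data["confidence"]),
    )
```

Keeping the contract this small is deliberate: the output maps one-to-one onto custom fields the CS tool already has, so no UI work is needed.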

2. Drafting assistant (deploys in weeks 2–3)

For the tickets a human is going to answer, we drop a "suggested reply" into their workflow. The agent has read the ticket, found the relevant docs and recent similar tickets, and drafted a reply in the team's voice. The human reviews, edits, sends.

Lift here is real and consistent: ~30–50% reduction in average handle time per ticket. Crucially, the human is still in the loop, and they're using the AI as a writing partner, not a decision-maker. Quality stays high. Errors are caught before they reach the customer.
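A hedged sketch of how the drafting context could be assembled. The function name and inputs are illustrative; retrieval of docs and similar tickets happens upstream, and the assembled string is what the drafting model would see:

```python
def build_draft_prompt(ticket_body: str, doc_snippets: list[str],
                       similar_replies: list[str], voice_guide: str) -> str:
    """Assemble the context for a suggested reply: style guide first,
    then retrieved docs, then similar resolved tickets, then the ticket."""
    docs = "\n\n".join(f"[doc] {d}" for d in doc_snippets)
    history = "\n\n".join(f"[past reply] {r}" for r in similar_replies)
    return (
        f"Style guide:\n{voice_guide}\n\n"
        f"Relevant help docs:\n{docs}\n\n"
        f"Similar resolved tickets:\n{history}\n\n"
        f"Customer ticket:\n{ticket_body}\n\n"
        "Draft a reply in the team's voice. A human will review before sending."
    )
```

The ordering is the design choice that matters: voice and docs come before the ticket, so the draft is grounded before the model ever sees the question.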

3. Auto-deflection (deploys in weeks 4–6)

Only after the first two are running do we add the auto-reply layer — and only for the narrow set of tickets where (a) the classifier is highly confident, (b) the docs cleanly answer the question, and (c) the resolution doesn't touch a refund, a payment, or a sensitive-account action.
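Those three gates reduce to a single predicate. A minimal sketch, where the sensitive-intent set and the confidence floor are illustrative placeholders, not fixed values:

```python
# Illustrative: the real set is tuned per client.
SENSITIVE_INTENTS = {"refund", "payment", "account_action"}

def eligible_for_auto_reply(confidence: float, docs_answerable: bool,
                            intent: str, confidence_floor: float = 0.9) -> bool:
    """All three gates must pass before a ticket skips the human queue."""
    return (
        confidence >= confidence_floor   # (a) classifier is highly confident
        and docs_answerable              # (b) docs cleanly answer the question
        and intent not in SENSITIVE_INTENTS  # (c) nothing sensitive touched
    )
```

Any ticket that fails a gate falls through to the human queue, where component 2 has already drafted a reply.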

This is the deflection number the buyer was originally asking about. With the right scope it lands at 40–65% of total ticket volume. The remaining 35–60% — including everything spicy or ambiguous — flows to humans, augmented by component 2.

The 20% nobody asks for

The single most valuable thing we deliver, that no one ever puts in the brief: a feedback loop from the human edits back into the AI's draft prompt. Every time a human edits the AI's suggested reply, we capture (a) the original draft, (b) the sent reply, and (c) the diff. Twice a week we feed that to a Sonnet-powered analyser that proposes prompt improvements.
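A minimal sketch of the capture step, using a plain unified diff. The record shape is illustrative; the twice-weekly Sonnet analysis runs separately over these records:

```python
import difflib
import time

def capture_edit(ticket_id: str, draft: str, sent: str) -> dict:
    """Record the (draft, sent, diff) triple whenever a human edits a suggestion."""
    diff = "\n".join(difflib.unified_diff(
        draft.splitlines(), sent.splitlines(),
        fromfile="ai_draft", tofile="human_sent", lineterm=""))
    return {
        "ticket_id": ticket_id,
        "draft": draft,      # (a) the original AI draft
        "sent": sent,        # (b) the reply the human actually sent
        "diff": diff,        # (c) what changed
        "captured_at": time.time(),
    }
```

Storing the diff alongside both full texts keeps the analyser's job easy: it can read the edits directly instead of re-deriving them.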

This is what keeps the system getting better. Without it, the drafting assistant locks in at "decent" and slowly degrades as the product changes. With it, quality compounds. We've measured handle-time reductions accelerating in months 3–6 specifically because of this loop.

What we explicitly don't ship

The thing the original brief asked for: a customer-facing agent that "handles all" of customer service. Anything touching a refund, a payment, or a sensitive-account action stays with a human, and the long tail of rare ticket types that would eat 80% of the build budget for a sliver of volume stays out of scope.

The numbers we see

Across the last six CS engagements we've shipped, blended outcomes after 90 days of running all three components track the per-component ranges above: triage speedups of ~25–35%, handle-time reductions of ~30–50%, and deflection of 40–65% of ticket volume.

The CSAT result is the one buyers always pre-worry about. Our take: with the architecture above, CSAT is roughly flat. With an "AI agent handles everything" architecture, CSAT drops by 5–10 points and stays there. The architecture is the difference. (For the anti-pattern in detail, see our When NOT to build with AI piece — engagement #1 is exactly this trap.)

What this stack costs

Build time: 4–6 weeks for a mid-market team (50K–500K tickets/yr). Ongoing AI cost: ~$0.04–0.12 per ticket processed across all three components, depending on ticket length. For a team handling 30K tickets/mo, that's $1,200–$3,600/mo in model spend, against whatever fraction of FTE time you're freeing up.
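The per-ticket arithmetic is easy to check. A sketch, with the per-ticket cost bounds taken from the ranges above:

```python
def monthly_model_spend(tickets_per_month: int,
                        cost_per_ticket_low: float = 0.04,
                        cost_per_ticket_high: float = 0.12) -> tuple[float, float]:
    """Bracket monthly model spend across all three components."""
    return (tickets_per_month * cost_per_ticket_low,
            tickets_per_month * cost_per_ticket_high)
```

At 30K tickets/mo this reproduces the $1,200–$3,600/mo range quoted above; scale the ticket count to size your own bracket.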

The math has not been close on a single engagement we've shipped; payback is months, not quarters. The reason for this post: the shape we've described is not what most teams pitch themselves first. Picking the right scope is worth more than picking the right vendor.

Want this stack on your CS pipeline? 4–6 week build, payback in months.

BOOK A SCOPING CALL → SEE SERVICES →