BUILD

Built a grounded outbound-personalisation agent: 200K sends/mo at $80 in API spend

B2B SaaS outbound team — 3 SDRs running ~200K total sends per month across rotating mailboxes. We built the personalisation engine that sits between Clay enrichment and Smartlead delivery. Reply rate up from 2.1% to 14%, total LLM API bill of $80 per month, zero hallucination flags in 60 days of production.

SHIPPED · 2026-02-28 SCOPE · 5 WEEKS STACK · CLAUDE SONNET 4.6 · CLAY · SMARTLEAD · POSTGRES

API SPEND / MONTH · $80
REPLY RATE · 2.1% → 14%
HALLUCINATIONS (60d) · 0
HUMAN REVIEW / DRAFT · ~30s

The problem

The client is a B2B SaaS company with a 3-person SDR team running outbound across 30+ rotating mailboxes (~200K total sends per month). Their existing motion was a generic sequencer with mail-merge variables: sender domains were burning out every 4–6 weeks, reply rate was stuck at 2.1%, and the team spent most of its day producing "personalised" first lines that all sounded the same.

They'd evaluated three off-the-shelf "AI SDR" tools. Each produced personalisation that looked plausible but referenced things that didn't exist (made-up product features, invented job changes, the wrong company entirely). Reply rates went up 2× on small samples, then collapsed once a hallucination embarrassed a senior buyer in the prospect's company. They wanted a system they could actually ship at full volume without a 3am incident.

The diagnosis

We sampled 800 historical prospects and audited what counted as a "good" personalised first line in their best-performing campaigns. The patterns all pointed to the same conclusion:

The job wasn't "make the model creative." It was "give the model grounded inputs and constrain its output."
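That principle reduces to a prompt that only permits facts from a pre-built evidence pack. A minimal sketch, assuming a simple evidence record (the field names, prompt wording, and `SKIP` convention here are illustrative, not the client's actual implementation):

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    source: str   # where the fact came from, e.g. "crunchbase" or "clay"
    fact: str     # a verbatim, dated fact about the prospect
    date: str     # ISO date the fact was observed

def build_prompt(evidence: list[Evidence], prospect_name: str) -> str:
    """Compose a grounded drafting prompt: the model may use ONLY the
    listed facts, and must refuse rather than improvise."""
    facts = "\n".join(f"- [{e.source}, {e.date}] {e.fact}" for e in evidence)
    return (
        f"Write a one-sentence opener to {prospect_name}.\n"
        "Use ONLY the facts below. Do not infer or embellish.\n"
        "If no fact supports a specific, relevant opener, reply SKIP.\n"
        f"Facts:\n{facts}"
    )
```

The `SKIP` escape hatch matters as much as the constraint: a draft that never happens can't embarrass a senior buyer.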

What we shipped

Three layers, glued to their existing Clay → Smartlead pipeline:

What "working" looks like in the dashboards

The cost math

200K sends/month × ~$0.0014 per draft (selection + composition with cached system prompt) ≈ $280/month in raw token costs. Prompt caching brings the input portion down to about 30% of the uncached number, landing the actual monthly bill at ~$80.

The third-party data layer (Clay credits + Crunchbase API) is the bigger cost line — about $1,200/month at this volume — but the client was already paying for Clay regardless. Use the LLM cost calculator with cache rate set to 70% to reproduce the math at any volume.
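The arithmetic above can be reproduced in a few lines, assuming (as the numbers imply) that input tokens dominate the per-draft cost, so caching scales the whole bill by roughly the cached-input factor:

```python
def monthly_llm_cost(sends_per_month: int,
                     uncached_cost_per_draft: float = 0.0014,
                     cached_input_factor: float = 0.30) -> float:
    """Raw token cost at volume, then the prompt-caching discount.

    Assumes the bill is input-dominated, so caching the system prompt
    (the "70% cache rate" setting) scales the total to ~30% of uncached.
    """
    uncached = sends_per_month * uncached_cost_per_draft
    return uncached * cached_input_factor

# 200K sends: $280/mo uncached → ~$84/mo cached, in line with the ~$80 bill
```

Plug in any volume to reproduce the calculator's output for this engagement's assumptions.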

What we evaluated continuously

What we'd do differently

  1. Build the evidence-pack inspector first. When the SDRs occasionally rejected a draft, we couldn't easily see which evidence the system had picked from. We built that view in week 4 — should have been week 1. Made the prompt-tuning cycle 5× faster once it existed.
  2. Cap the evidence-pack age tighter. "Last 6 months" let in some stale facts that were technically true but felt outdated. We tightened to 90 days mid-engagement and reply rate ticked up another 1.5 points.
  3. Ship the SDR review UI as a standalone tool. We embedded it inside Smartlead via an iframe initially. The SDRs hated the round-trip. A dedicated lightweight UI took an extra 3 days but cut review time per draft from ~50s to ~30s.
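The age cap in (2) is a one-line filter at evidence-pack build time. A sketch, assuming evidence rows carry an ISO `date` field (hypothetical shape, not the client's schema):

```python
from datetime import date, timedelta

def fresh_evidence(evidence: list[dict], today: date,
                   max_age_days: int = 90) -> list[dict]:
    """Drop facts observed before the cutoff.

    Default tightened from ~180 days ("last 6 months") to 90 days,
    per the mid-engagement change that lifted reply rate ~1.5 points.
    """
    cutoff = today - timedelta(days=max_age_days)
    return [e for e in evidence if date.fromisoformat(e["date"]) >= cutoff]
```

Making the cap a parameter keeps the 6-months-vs-90-days question testable per campaign instead of hard-coded.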

Stack

Related

Tried "AI SDR" tools and got burned? The architecture that actually works.
