The three failure modes
1. Generic outreach at scale
The majority of "AI SDR" tools on the market are sequence runners with an LLM stitched in to vary the wording. The contact gets a mail-merge email that sounds slightly different from yesterday's mail-merge email. Reply rates collapse fast — recipients can smell the volume, and so can spam filters. After three months the sender domain is in the bin.
2. Hallucinated personalisation
The fancier tools attempt per-prospect personalisation by feeding the model a LinkedIn profile or company website and asking for "an observation." Without grounding, the model invents things — it compliments a feature that doesn't exist, references a project the person didn't work on, or fabricates company context. One bad hallucination per 50 sends destroys the campaign and burns the list.
3. No human escape valve
The agent treats every reply as a sales objection to handle. When a prospect asks a substantive question, the model responds with "great question, are you free for a 15-minute call?" — which is what kills the conversation. A promising conversation dies because the agent has no signal that it should hand off.
What we build instead
Architecture
Three layers, plus humans:
- Targeting layer. An ICP-aware enrichment pipeline (we use Clay + a custom Claude waterfall) that produces a tight list of accounts and named contacts. Lower volume than typical outbound — ~200 per SDR-week, not 5,000.
- Research layer. Per-prospect, the system pulls grounded context from real sources: recent press releases, the prospect's LinkedIn posts (if accessible), the company's funding history, public job postings. Outputs a 3-bullet "observation pack" with citations.
- Outreach layer. An LLM (we default to Claude Sonnet) drafts the first email using the observation pack as the only personalisation source. The draft goes to a human SDR for ≤30 seconds of review before sending — they're checking for cringe, not editing.
- Reply handling. Replies route by intent: positive replies hit a human immediately, neutral replies get a sequenced follow-up, negative replies trigger a polite unsubscribe. The model never tries to handle a substantive reply alone.
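The reply-handling rules above can be sketched as a small routing table. This is an illustrative sketch, not our production code: the `Intent` labels and action names are hypothetical, and the classifier that assigns an intent to a raw reply (in practice an LLM call with a constrained label set) is out of scope here.

```python
from enum import Enum, auto

class Intent(Enum):
    POSITIVE = auto()      # wants to talk
    NEUTRAL = auto()       # not now / maybe later
    NEGATIVE = auto()      # not interested
    SUBSTANTIVE = auto()   # a real question the model must not answer alone

def route(intent: Intent) -> str:
    """Map a classified reply intent to the next action in the pipeline."""
    return {
        Intent.POSITIVE: "notify_human_immediately",
        Intent.NEUTRAL: "enqueue_followup_sequence",
        Intent.NEGATIVE: "send_unsubscribe_and_suppress",
        Intent.SUBSTANTIVE: "escalate_to_human",
    }[intent]

print(route(Intent.SUBSTANTIVE))  # escalate_to_human
```

The point of making the table explicit is that there is no default branch where the model "just replies" — every path either ends the thread or puts a human in it.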
What "working" looks like
- 200 sends/SDR/week (down from 1,000+ on a generic stack).
- ~12-18% reply rate (up from 1-3% on generic tools).
- ~25% of replies become a meeting (up from ~10%).
- Net: at least as many meetings as the high-volume motion from ~5x fewer sends, so the sender domain doesn't get burned and the reply quality is dramatically higher.
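A quick funnel check makes the trade explicit. Taking the midpoints of the ranges above (these are assumptions, not measurements from any one deployment):

```python
def weekly_meetings(sends: int, reply_rate: float, meeting_rate: float) -> float:
    """Simple three-stage funnel: sends -> replies -> meetings."""
    return sends * reply_rate * meeting_rate

# High-volume generic stack: 1,000 sends, ~2% reply, ~10% reply-to-meeting.
generic = weekly_meetings(1_000, 0.02, 0.10)

# This system: 200 sends, ~15% reply, ~25% reply-to-meeting.
targeted = weekly_meetings(200, 0.15, 0.25)

print(generic, targeted)  # 2.0 7.5
```

At those midpoints the low-volume motion holds its own on meetings per week while sending a fifth of the email — which is the whole argument for not burning the domain.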
What it costs
To generate one full-time SDR's worth of pipeline:
- Build cost (one-time): $25-40K to scope, build, and ship the system end-to-end. 4-6 weeks of work.
- Run cost (monthly): ~$2,500/mo across LLM API, enrichment data (Clay/Apollo), email infrastructure (Smartlead), CRM seat costs.
- Human cost: Still need a human reviewer — 0.5-1.0 FTE per system. The system multiplies their output rather than replacing them.
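Back-of-envelope year-one spend, using the midpoints above and deliberately excluding the 0.5-1.0 FTE reviewer (salary varies too much by market to generalise):

```python
build = (25_000 + 40_000) / 2   # one-time build, midpoint of the $25-40K range
run_monthly = 2_500             # LLM API + enrichment + email infra + CRM seats

year_one = build + 12 * run_monthly
print(year_one)  # 62500.0
```

So call it roughly $60-65K in year one before people costs, dropping to the ~$30K/yr run rate in year two.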
Common asks we say no to
- "Can it just send without human review?" — No. The 30 seconds of human review is the difference between this working and being spam.
- "Can we do 5,000/week instead of 200?" — Sure, but you'll be back here in two months asking why the reply rate collapsed.
- "Can the agent handle the entire conversation?" — No. The agent owns the open and the disqualify; humans own anything substantive.
Cross-references
- AI SDR vs human cost calculator — model the unit economics of either approach.
- Clay review — the targeting + enrichment layer.
- Smartlead review — the email infrastructure layer.
- Lemlist vs Smartlead — picking the right outbound platform.