TTS API pricing. Every voice provider, verified $/1M characters.
Sixteen text-to-speech models priced live from ElevenLabs, OpenAI, Cartesia, Hume, Inworld, Deepgram, AWS, Azure, Google, PlayAI, and Resemble. Pick the cheapest voice that hits your quality bar — at your real monthly character count.
Compare TTS API costs
Tier breakdown.
PREMIUM = ElevenLabs Multilingual, Cartesia Sonic 3, Hume Octave 2, OpenAI tts-1-hd, Azure HD V2, Polly Generative, GCP Studio.
STANDARD = Polly Neural, GCP WaveNet, Deepgram Aura-2, OpenAI tts-1, Azure Neural, ElevenLabs Flash, PlayAI Dialog.
BUDGET = Inworld Mini / Max, gpt-4o-mini-tts, Polly Standard, GCP Standard.
Volume reference. At a standard 150 WPM and ~6.7 characters per word, speech runs at roughly 1,000 characters per minute, so 1M characters ≈ 1,000 minutes (about 16.6 hours) of synthesized speech. An audiobook of 80k words ≈ 0.54M chars. A voice agent handling 50k calls/month at 200 chars per response (one response per call) ≈ 10M chars.
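As a rough sketch of the volume arithmetic above, the snippet below estimates character counts for the two example workloads. The function names are hypothetical, the 6.7 characters-per-word figure is the one quoted above, and the voice-agent case assumes one synthesized response per call unless told otherwise.

```python
# Assumption: ~6.7 characters per word, the figure quoted in the volume reference.
CHARS_PER_WORD = 6.7


def chars_from_words(words: int, chars_per_word: float = CHARS_PER_WORD) -> int:
    """Estimate how many characters a text of `words` words sends to a TTS API."""
    return round(words * chars_per_word)


def chars_from_calls(calls: int, chars_per_response: int,
                     responses_per_call: int = 1) -> int:
    """Estimate monthly characters for a voice agent.

    `responses_per_call` is a hypothetical knob for multi-turn calls;
    the example in the text corresponds to the default of 1.
    """
    return calls * chars_per_response * responses_per_call


audiobook_chars = chars_from_words(80_000)
agent_chars = chars_from_calls(50_000, 200)

print(f"Audiobook (80k words): {audiobook_chars / 1e6:.2f}M chars")
print(f"Voice agent (50k calls x 200 chars): {agent_chars / 1e6:.1f}M chars/month")
```

For a multi-turn agent, raising `responses_per_call` scales the monthly total linearly, which is usually the number that matters when reading a volume-based price list.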
Methodology. Prices were pulled on 2026-04-26 from elevenlabs.io, platform.openai.com, cartesia.ai, hume.ai, inworld.ai, deepgram.com, aws.amazon.com/polly, azure.microsoft.com, cloud.google.com, play.ht, and resemble.ai. The primary unit is USD per 1 million characters of synthesized text.

Where vendors publish $/1k chars (ElevenLabs, Deepgram, Cartesia), we multiply by 1,000. Where they publish per-second or per-minute audio prices (Resemble, OpenAI gpt-4o-mini-tts), we convert at ~1,500 chars/min; the true figure varies with speaking rate. The OpenAI gpt-4o-realtime line is priced per audio token ($200/M output tokens ≈ $0.24/min), which is a different unit; we approximate it at the equivalent character rate and flag it in the caveat.

Voice cloning and real-time/streaming filters narrow the list separately because not every model supports both.

Every model uses its latest released version: ElevenLabs v3 (not v2), Cartesia Sonic 3 (not Sonic-2 or Sonic Turbo, both deprecated), Hume Octave 2 (not Octave 1), Deepgram Aura-2 (not Aura), Inworld TTS-1.5 (not 1.0), AWS Polly Generative (not Long-Form), Azure HD V2 / Neural (not legacy Standard), Google Chirp 3 HD (not Studio-only).

Cheapest is not best. Polly Standard at $4/M chars and ElevenLabs v3 at $100/M chars are not the same voice; use the quality filter to narrow first, then pick on listening tests. Prices change frequently, so always reverify with the vendor before committing to volume.
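The unit normalization described in the methodology can be sketched as follows. This is a minimal illustration under stated assumptions, not the page's actual calculator: the function names are hypothetical, the 1,500 chars/min default is the conversion rate quoted above, and the only prices used are the two list prices mentioned in the text.

```python
# Assumption: conversion rate quoted in the methodology; varies with speaking rate.
CHARS_PER_MINUTE = 1_500


def per_1k_to_per_million(usd_per_1k_chars: float) -> float:
    """Convert a $/1k-characters price to $/1M characters."""
    return usd_per_1k_chars * 1_000


def per_minute_to_per_million(usd_per_minute: float,
                              chars_per_minute: float = CHARS_PER_MINUTE) -> float:
    """Convert a $/minute-of-audio price to an approximate $/1M characters."""
    return usd_per_minute / chars_per_minute * 1_000_000


def per_second_to_per_million(usd_per_second: float,
                              chars_per_minute: float = CHARS_PER_MINUTE) -> float:
    """Convert a $/second-of-audio price to an approximate $/1M characters."""
    return per_minute_to_per_million(usd_per_second * 60, chars_per_minute)


def monthly_cost(usd_per_million_chars: float, monthly_chars: int) -> float:
    """Cost in USD for `monthly_chars` characters at a flat $/1M rate."""
    return usd_per_million_chars * monthly_chars / 1_000_000


# The two list prices mentioned in the text, in USD per 1M characters.
example_rates = [("Polly Standard", 4.0), ("ElevenLabs v3", 100.0)]

for name, rate in example_rates:
    cost = monthly_cost(rate, 10_000_000)
    print(f"{name}: ${cost:,.2f} per month at 10M chars")
```

A flat rate is itself a simplification: vendors with tiered or subscription pricing would need a per-tier lookup rather than a single multiplier, and the per-minute conversions inherit whatever error the assumed speaking rate carries.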