Transcription API pricing. Every STT provider, verified $/hour.
Thirty-plus speech-to-text models priced live from Deepgram, AssemblyAI, OpenAI, Groq, Speechmatics, Gladia, Rev.ai, and the AWS / Google / Azure direct APIs. Pick the cheapest model that meets your accuracy and latency bar at the hours of audio you actually process each month.
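Picking "the cheapest model that meets your bar" is a filter followed by a minimum. Below is a minimal sketch, assuming you have already measured WER and latency on your own audio. The model names and every number in the usage rows are hypothetical placeholders, not figures from this page. Monthly cost is list price times hours, with no volume discounts.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class SttModel:
    name: str
    usd_per_hour: float  # list price per hour of audio
    wer: float           # word error rate measured on YOUR audio, 0.0-1.0
    latency_s: float     # latency measured on YOUR workload, seconds


def monthly_cost(model: SttModel, hours_per_month: float) -> float:
    """List-price spend for one month of audio (no volume discounts)."""
    return model.usd_per_hour * hours_per_month


def cheapest_qualifying(
    models: Sequence[SttModel],
    hours_per_month: float,
    max_wer: float,
    max_latency_s: float,
) -> Optional[Tuple[SttModel, float]]:
    """Cheapest model meeting both bars, with its monthly cost; None if nothing qualifies."""
    qualifying = [
        m for m in models if m.wer <= max_wer and m.latency_s <= max_latency_s
    ]
    if not qualifying:
        return None
    best = min(qualifying, key=lambda m: m.usd_per_hour)
    return best, monthly_cost(best, hours_per_month)


# Hypothetical placeholder rows; replace with real prices and your own measurements.
models = [
    SttModel("model-a", usd_per_hour=0.012, wer=0.14, latency_s=2.0),
    SttModel("model-b", usd_per_hour=0.36, wer=0.09, latency_s=1.5),
    SttModel("model-c", usd_per_hour=1.04, wer=0.07, latency_s=1.0),
]
pick = cheapest_qualifying(models, hours_per_month=500, max_wer=0.10, max_latency_s=2.0)
if pick:
    model, cost = pick
    print(f"{model.name}: ${cost:,.2f}/month")  # prints "model-b: $180.00/month"
```

model-a is cheapest but misses the WER bar, so model-b wins. Tightening `max_wer` to 0.05 leaves nothing qualifying and the function returns `None`.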
Compare STT API costs
Mode breakdown. BATCH = pre-recorded / async file uploads, cheaper per hour. REALTIME = streaming WebSocket transcription, priced 20–60% higher than batch on most providers. The feature filter narrows the list to models whose base price already includes speaker diarization, word-level timestamps, or support for 30+ languages (no add-on fee).
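The mode switch and feature filter can be sketched as a subset test over each row's included features. This is a minimal sketch, not the page's actual implementation. It assumes each selected feature is required, and it uses hypothetical feature keys and placeholder rows. A model with no streaming offering is dropped in realtime mode.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Tuple

# Illustrative feature keys, not any vendor's API field names.
DIARIZATION = "diarization"
WORD_TIMESTAMPS = "word_timestamps"
LANGS_30_PLUS = "30plus_languages"


@dataclass
class PriceRow:
    name: str
    batch_usd_per_hour: float
    realtime_usd_per_hour: Optional[float]  # None if the model has no streaming mode
    included: FrozenSet[str] = field(default_factory=frozenset)  # features in the base price


def filter_rows(
    rows: List[PriceRow], mode: str, required: FrozenSet[str]
) -> List[Tuple[str, float]]:
    """Rows offering `mode` ("batch" or "realtime") whose base price already
    includes every required feature, sorted cheapest first."""
    if mode not in ("batch", "realtime"):
        raise ValueError(f"unknown mode: {mode}")
    matches = []
    for row in rows:
        price = row.batch_usd_per_hour if mode == "batch" else row.realtime_usd_per_hour
        if price is None:
            continue  # no streaming offering for this model
        if required <= row.included:  # subset test: no feature needs a paid add-on
            matches.append((row.name, price))
    return sorted(matches, key=lambda pair: pair[1])


# Hypothetical placeholder rows.
rows = [
    PriceRow("model-a", 0.10, None, frozenset({WORD_TIMESTAMPS})),
    PriceRow("model-b", 0.25, 0.35, frozenset({DIARIZATION, WORD_TIMESTAMPS})),
    PriceRow("model-c", 0.20, 0.30, frozenset({WORD_TIMESTAMPS, LANGS_30_PLUS})),
]
print(filter_rows(rows, "realtime", frozenset({WORD_TIMESTAMPS})))
```

In realtime mode model-a disappears because it has no streaming price, even though it is the cheapest batch option.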
Accuracy pick. Deepgram Nova-3 and AssemblyAI Universal-2 lead public WER benchmarks for English in 2026. Speechmatics Enhanced edges them on heavy accents and code-switching. Whisper Large v3 (Groq / Fireworks / DeepInfra) is the cheapest serious model but lacks native diarization.
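WER (word error rate) is the count of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference, divided by the number of reference words. Below is a minimal sketch using word-level Levenshtein distance. It only lowercases and splits on whitespace. Published benchmarks apply heavier normalization (punctuation, numbers, contractions), so scores from this function will not match them exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the reference words processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return prev[-1] / len(ref)


print(word_error_rate("The cat sat on the mat", "the cat sat on mat"))
```

One dropped word out of six gives a WER of about 0.167. Because insertions count, WER can exceed 1.0 on a badly hallucinated transcript.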
Methodology. Prices pulled 2026-04-26 from platform.openai.com, deepgram.com, assemblyai.com, speechmatics.com, gladia.io, rev.ai, cloud.google.com, aws.amazon.com, azure.microsoft.com, groq.com, fireworks.ai, and deepinfra.com. Every model uses its latest released version: Nova-3 (not Nova-2), Universal-2 (not Universal-1), gpt-4o-transcribe and gpt-4o-mini-transcribe (not just the legacy whisper-1), and Whisper Large v3 Turbo (not v2).
Per-hour rates are the primary unit. Vendors that publish per-minute ($0.006/min) or per-second ($0.0001/sec) prices are converted by multiplying by 60 or 3,600 respectively; both of those examples work out to $0.36/hr. Diarization, word-level timestamps, language detection, and PII redaction are sometimes priced separately; we note this per row.
Realtime / streaming is priced 20–60% higher than batch on Deepgram, AssemblyAI, Azure, Speechmatics, and Fireworks; Google bundles streaming with batch at the same Standard rate. Volume tiers, annual commitments, and self-host containers can cut these prices by 50–80%; the figures shown are list prices.
Cheapest is not best. A $0.012/hr DeepInfra Whisper job and a $1.04/hr Speechmatics Enhanced job do not produce the same transcript. Apply the feature filter first, then pick on accuracy.
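The unit conversion, add-on, and streaming-premium rules in the methodology reduce to a few lines of arithmetic. This is a minimal sketch under simplifying assumptions: add-ons (diarization, PII redaction, etc.) are billed per hour of audio, and a negotiated discount applies to the base rate and add-ons alike. Check each vendor's terms, since billing models differ. The $0.12/hr add-on and 50% discount in the usage lines are hypothetical values, and `realtime_range` only gives a rough band from the 20–60% premium.

```python
from typing import Iterable, Tuple

SECONDS_PER_UNIT = {"second": 1, "minute": 60, "hour": 3600}


def to_usd_per_hour(price: float, unit: str) -> float:
    """Normalize a published rate to $/hour: x3600 for per-second, x60 for per-minute."""
    return price * (3600 / SECONDS_PER_UNIT[unit])


def effective_usd_per_hour(
    base_per_hour: float,
    addons_per_hour: Iterable[float] = (),
    discount: float = 0.0,
) -> float:
    """Base rate plus separately billed add-ons, minus a negotiated discount.
    Assumes add-ons are billed per audio hour and the discount covers both."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be a fraction in [0, 1)")
    return (base_per_hour + sum(addons_per_hour)) * (1.0 - discount)


def realtime_range(
    batch_per_hour: float, low: float = 0.20, high: float = 0.60
) -> Tuple[float, float]:
    """Rough streaming price band from a 20-60% premium over batch."""
    return batch_per_hour * (1 + low), batch_per_hour * (1 + high)


print(to_usd_per_hour(0.006, "minute"))   # per-minute example from the methodology
print(to_usd_per_hour(0.0001, "second"))  # per-second example from the methodology
print(effective_usd_per_hour(0.36, addons_per_hour=[0.12], discount=0.5))
print(realtime_range(0.36))
```

Both published examples normalize to about $0.36/hr. Floating-point rounding may print a trailing digit, so round for display.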