[ TOOLS / LLM API ]

LLM API pricing. Every current model, verified $/1M tokens, your volume.

Twenty-seven frontier and workhorse LLMs priced live from Anthropic, OpenAI, Google, xAI, DeepSeek, Alibaba, Mistral, Cohere, Meta, and Perplexity. Set your real monthly token volume — input, output, cached share — and see the actual bill, ranked cheapest first.

UPDATED · 2026-04-25 RATES · VERIFIED FROM EACH PROVIDER · FREE · NO SIGN-UP

Compare LLM API costs

INTERACTIVE · 27 MODELS · VOLUME-BASED

Tier breakdown. FRONTIER = top-of-the-line ($5+ input or premium reasoning lane: Opus 4.7, GPT-5.5 Pro, Gemini 3.1 Pro 200K+, Grok 4). WORKHORSE = balanced ($1–$5 input: Sonnet 4.6, GPT-5.5, Gemini 3.1 Pro, Mistral Large 3, Command A, Sonar Pro). FAST-CHEAP = sub-$1 input lanes (Haiku 4.5, GPT-5.5 Mini/Nano, Flash, Flash Lite, Grok 4 Fast, DeepSeek V4 Flash, Qwen Plus, Llama 4, Mistral Small).
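The tier thresholds above reduce to a simple bucketing rule on input price. A minimal sketch (function name and example prices are illustrative, not the tool's actual code):

```python
def tier(input_price_per_m: float) -> str:
    """Bucket a model by its $/1M-input rate, per the breakdown above."""
    if input_price_per_m >= 5.0:   # $5+ input: frontier / premium reasoning lane
        return "FRONTIER"
    if input_price_per_m >= 1.0:   # $1-$5 input: balanced workhorse
        return "WORKHORSE"
    return "FAST-CHEAP"            # sub-$1 input lanes

# Illustrative rates only — check the live table for real numbers.
print(tier(15.0))  # FRONTIER
print(tier(3.0))   # WORKHORSE
print(tier(0.25))  # FAST-CHEAP
```

Note the boundary choice: a model at exactly $5.00 input lands in FRONTIER, and exactly $1.00 lands in WORKHORSE.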

Cached input ratio models the share of input tokens served from prompt cache. Cache hits cost roughly 10% of regular input on most providers (Anthropic 90% off reads within the 5-min TTL, with cache writes billed at a premium; OpenAI 90% off; Google 90% off; DeepSeek prices cache hits separately). Default 0% — turn it up if you reuse system prompts or long context across calls.
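The slider's math is straightforward: cached input bills at a discounted rate, the rest at full rate. A minimal sketch of the monthly-bill formula, assuming a flat ~90% cache-read discount (the rates in the example are hypothetical, not any vendor's real pricing):

```python
def monthly_cost(in_tok_m: float, out_tok_m: float,
                 in_rate: float, out_rate: float,
                 cached_share: float = 0.0,
                 cache_discount: float = 0.90) -> float:
    """Monthly bill in dollars; token volumes in millions, rates in $/1M.

    cached_share   -- fraction of input tokens served from prompt cache
    cache_discount -- fraction knocked off the input rate on a hit (~0.90 typical)
    """
    fresh = in_tok_m * (1 - cached_share) * in_rate
    cached = in_tok_m * cached_share * in_rate * (1 - cache_discount)
    return fresh + cached + out_tok_m * out_rate

# Hypothetical model at $3 in / $15 out, 100M in + 20M out per month:
print(monthly_cost(100, 20, 3.0, 15.0, cached_share=0.0))  # 600.0
print(monthly_cost(100, 20, 3.0, 15.0, cached_share=0.5))  # 465.0
```

Half the input served from cache shaves 22% off this hypothetical bill, which is why the slider matters if you reuse long system prompts.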

CHEAPEST AT YOUR VOLUME
FRONTIER PICK IN FILTER
MODEL · VENDOR · MONTHLY · IN $/M · OUT $/M · CTX · CAVEAT

Methodology. Prices pulled 2026-04-25 from each vendor's public API pricing page — platform.claude.com, openai.com/api/pricing, ai.google.dev, docs.x.ai, api-docs.deepseek.com, alibabacloud.com, mistral.ai/pricing, cohere.com, together.ai, and docs.perplexity.ai.

Every model uses its latest released version as of April 2026: Claude Opus 4.7 (1M ctx at the standard rate, no long-context premium), Sonnet 4.6, and Haiku 4.5; GPT-5.5 and 5.5 Pro (released 2026-04-23, rolling out alongside the still-live GPT-5.4 family); Gemini 3.1 Pro / Flash / Flash-Lite; Grok 4 and Grok 4.1 Fast; DeepSeek V4 Pro and V4 Flash; Qwen3 Max / Qwen3.6 Plus / Qwen3 Turbo; Mistral Large 3 / Medium 3 / Small 3.1; Cohere Command A / R+; Llama 4 Maverick / Scout via Together AI; Perplexity Sonar Pro / Sonar / Sonar Deep Research.

Reasoning models (Opus 4.7, GPT-5.5 Pro, DeepSeek V4 Pro thinking, Grok 4, Gemini 3.1 Pro thinking) charge thinking tokens at the output rate — extended reasoning can multiply real spend 3–10x on the same prompt; budget accordingly. Gemini 3.1 Pro and Grok 4 Fast price differently above 200K context; rates shown are the standard tier and flagged in the caveat column.

These are pay-as-you-go API rates. Context-caching strategy, batch discounts (50% on most providers), and provisioned throughput will move the real bill substantially. Prices change frequently — always re-verify before committing.
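The 3–10x reasoning-spend warning falls directly out of the billing rule that thinking tokens are charged at the output rate. A minimal sketch with hypothetical rates and token counts (not any specific model's real numbers):

```python
def reasoning_cost(in_tok_m: float, visible_out_m: float,
                   thinking_out_m: float,
                   in_rate: float, out_rate: float) -> float:
    """Cost in dollars; volumes in millions of tokens, rates in $/1M.

    Thinking tokens bill at the output rate, so heavy reasoning
    dominates the invoice even when the visible answer is short.
    """
    return in_tok_m * in_rate + (visible_out_m + thinking_out_m) * out_rate

# Same hypothetical prompt at $3 in / $15 out: 1M input, 0.5M visible output.
base = reasoning_cost(1, 0.5, 0.0, 3.0, 15.0)       # no thinking: $10.50
heavy = reasoning_cost(1, 0.5, 4.0, 3.0, 15.0)      # 4M thinking tokens: $70.50
print(f"{heavy / base:.1f}x")                        # ~6.7x on the same prompt
```

That is the mechanism behind the "budget accordingly" note: the multiplier depends almost entirely on how many thinking tokens the model decides to emit.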

Sizing a build? We can scope the whole thing.

BOOK A CALL → SEE TOOL REVIEWS →