The frontier moved again. The models people priced a year ago — GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 — are now legacy, retired, or shut down, and the current generation (OpenAI's GPT-5 line, Anthropic's Claude 4.x line, and Google's Gemini 3 line) has reshuffled the cost picture. This page is the single, current per-million-token reference I keep coming back to. Every number below was checked against the provider's official pricing page in June 2026; standard (non-batch) rates are shown, and for Gemini's tiered models the figure is the rate for prompts up to 200K tokens.
| Model | Input / 1M | Output / 1M | Context |
|---|---|---|---|
| OpenAI | |||
| GPT-5.5 | $5.00 | $30.00 | 400K |
| GPT-5.4 | $2.50 | $15.00 | 400K |
| GPT-5.4 mini | $0.75 | $4.50 | 400K |
| GPT-4o (legacy) | $2.50 | $10.00 | 128K |
| GPT-4o mini (legacy) | $0.15 | $0.60 | 128K |
| Anthropic | |||
| Claude Opus 4.8 | $5.00 | $25.00 | 1M |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K |
| Gemini 3.1 Pro | $2.00 | $12.00 | 1M |
| Gemini 3 Flash | $0.50 | $3.00 | 1M |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | 1M |
| Gemini 2.5 Pro | $1.25 | $10.00 | 1M |
| Gemini 2.5 Flash | $0.30 | $2.50 | 1M |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | 1M |
Sources: OpenAI, Anthropic, and Google pricing pages, June 2026. Always confirm before making production decisions — these rates change often.
Plug in your own token volumes and request rate to see the real monthly cost for any model.
Open the Cost Calculator →Premium reasoning. GPT-5.5 ($5/$30), Claude Opus 4.8 ($5/$25), and Gemini 3.1 Pro ($2/$12) sit at the top. Output is where these get expensive — GPT-5.5's $30/M output is six times its input rate, so an output-heavy workload on a premium model is the fastest way to a surprise bill. Reach for these only when the task genuinely needs frontier reasoning, coding, or analysis.
Workhorse. GPT-5.4 ($2.50/$15) and Claude Sonnet 4.6 ($3/$15) are the models most production apps should default to. They handle the large majority of real coding, writing, and analysis work at a fraction of premium output cost, and both carry large context windows (Sonnet 4.6 now includes the full 1M-token window at standard pricing).
Budget / high-volume. GPT-5.4 mini ($0.75/$4.50), Claude Haiku 4.5 ($1/$5), Gemini 3 Flash ($0.50/$3), and the Flash-Lite tiers are built for classification, routing, extraction, and anything you run at scale. Running a flagship on these tasks wastes 10–50× on cost for no quality gain.
For pure cost-per-token, Google's Flash-Lite tier leads: Gemini 2.5 Flash-Lite at $0.10 input / $0.40 output is the lowest sticker price among current models, with Gemini 3.1 Flash-Lite ($0.25/$1.50) adding newer reasoning. OpenAI's legacy GPT-4o mini ($0.15/$0.60) is still callable and still cheap. But "cheapest per token" rarely equals "cheapest per result" — a smarter model that solves the task in one pass can beat a cheaper one that needs three attempts. Benchmark on your actual task before committing.
The current generation leans heavily on reasoning ("thinking") tokens — internal tokens a model generates before its visible answer. On most current models these are billed at the output rate and can dominate the cost of a single request on hard problems. That shifts the metric that matters from cost per token to cost per correct answer: a pricier reasoning model that gets it right the first time can be cheaper end-to-end than a cheap model you have to re-run. Budget for thinking tokens explicitly on reasoning-heavy workloads.
All three providers now price cached input at roughly 10% of the base input rate (a 90% discount on the cached portion). If your app re-sends a large, stable system prompt, document, or conversation history on every request, caching is the single biggest lever you have — often a 50–70% cut to total input cost with zero quality change. Batch APIs add another ~50% off for work that doesn't need a real-time response.
If you're updating an old cost model, the headline shifts: OpenAI's o-series and GPT-4-class flagships gave way to the GPT-5 line; Anthropic retired the Claude 3.x family in favor of Claude Opus/Sonnet/Haiku 4.x (with Sonnet holding the $3/$15 price point through the generations); and Google discontinued Gemini 1.5 and shut down Gemini 2.0 (June 1, 2026), replacing them with the Gemini 3 line while keeping the 2.5 tier live. Net effect: frontier capability keeps getting cheaper, but reasoning/thinking tokens make worst-case request cost harder to predict.
Compare GPT-5, Claude 4.x and Gemini 3 side by side for your exact usage.
Open the Calculator →