AI API Pricing 2026: GPT-5, Claude 4 & Gemini 3 Cost Per Million Tokens

By Gia Gray · Updated June 2026 · 8 min read

The frontier moved again. The models people priced a year ago — GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 — are now legacy, retired, or shut down, and the current generation (OpenAI's GPT-5 line, Anthropic's Claude 4.x line, and Google's Gemini 3 line) has reshuffled the cost picture. This page is the single, current per-million-token reference I keep coming back to. Every number below was checked against the provider's official pricing page in June 2026; standard (non-batch) rates are shown, and for Gemini's tiered models the figure is the rate for prompts up to 200K tokens.

Current AI API pricing — all major models (per 1M tokens)

Model | Input / 1M | Output / 1M | Context
OpenAI
GPT-5.5 | $5.00 | $30.00 | 400K
GPT-5.4 | $2.50 | $15.00 | 400K
GPT-5.4 mini | $0.75 | $4.50 | 400K
GPT-4o (legacy) | $2.50 | $10.00 | 128K
GPT-4o mini (legacy) | $0.15 | $0.60 | 128K
Anthropic
Claude Opus 4.8 | $5.00 | $25.00 | 1M
Claude Sonnet 4.6 | $3.00 | $15.00 | 1M
Claude Haiku 4.5 | $1.00 | $5.00 | 200K
Google
Gemini 3.1 Pro | $2.00 | $12.00 | 1M
Gemini 3 Flash | $0.50 | $3.00 | 1M
Gemini 3.1 Flash-Lite | $0.25 | $1.50 | 1M
Gemini 2.5 Pro | $1.25 | $10.00 | 1M
Gemini 2.5 Flash | $0.30 | $2.50 | 1M
Gemini 2.5 Flash-Lite | $0.10 | $0.40 | 1M

Sources: OpenAI, Anthropic, and Google pricing pages, June 2026. Always confirm before making production decisions — these rates change often.
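To turn the per-million rates into a per-request or monthly figure, multiply each token count by its rate and divide by 1,000,000. The minimal Python sketch below does that for a subset of the table. The dictionary keys are informal labels (not official API model IDs), the example workload is hypothetical, and the sketch assumes identical requests while ignoring caching, batch discounts, and Gemini's above-200K pricing tier.

```python
# Standard (non-batch, uncached) list prices in USD per 1M tokens, copied from the
# table above; Gemini figures are the up-to-200K-token prompt rates.
# Keys are informal labels, not official API model identifiers.
PRICES_PER_1M = {
    "gpt-5.5": (5.00, 30.00),
    "gpt-5.4": (2.50, 15.00),
    "gpt-5.4-mini": (0.75, 4.50),
    "claude-opus-4.8": (5.00, 25.00),
    "claude-sonnet-4.6": (3.00, 15.00),
    "claude-haiku-4.5": (1.00, 5.00),
    "gemini-3.1-pro": (2.00, 12.00),
    "gemini-3-flash": (0.50, 3.00),
    "gemini-3.1-flash-lite": (0.25, 1.50),
    "gemini-2.5-flash-lite": (0.10, 0.40),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single request at list price."""
    input_rate, output_rate = PRICES_PER_1M[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def monthly_cost(model: str, input_tokens: int, output_tokens: int,
                 requests_per_day: int, days: int = 30) -> float:
    """USD cost of a steady workload of identical requests over `days` days."""
    return request_cost(model, input_tokens, output_tokens) * requests_per_day * days


if __name__ == "__main__":
    # Hypothetical workload: 2,000 input + 500 output tokens, 10,000 requests/day.
    for name in PRICES_PER_1M:
        print(f"{name:24} ${monthly_cost(name, 2_000, 500, 10_000):>10,.2f} / month")
```

For Claude Sonnet 4.6, that workload comes to $0.0135 per request, or $4,050 over a 30-day month.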


The three model tiers, and what they cost

Premium reasoning. GPT-5.5 ($5/$30), Claude Opus 4.8 ($5/$25), and Gemini 3.1 Pro ($2/$12) sit at the top. Output is where these get expensive — GPT-5.5's $30/M output is six times its input rate, so an output-heavy workload on a premium model is the fastest way to a surprise bill. Reach for these only when the task genuinely needs frontier reasoning, coding, or analysis.
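To see how lopsided an output-heavy bill gets, this sketch splits one request's cost into its input and output shares for the three premium models. The request shape (a 1,000-token prompt producing a 4,000-token report) is a hypothetical example, not a measured workload.

```python
# (input, output) USD per 1M tokens for the premium tier, from the table above.
PREMIUM = {
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.8": (5.00, 25.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
}


def cost_breakdown(rates: tuple[float, float], input_tokens: int,
                   output_tokens: int) -> tuple[float, float]:
    """Return (total USD cost, fraction of that cost coming from output tokens)."""
    input_rate, output_rate = rates
    input_cost = input_tokens * input_rate / 1_000_000
    output_cost = output_tokens * output_rate / 1_000_000
    total = input_cost + output_cost
    return total, output_cost / total


# Hypothetical output-heavy request: 1,000 tokens in, 4,000 tokens out.
for name, rates in PREMIUM.items():
    total, output_share = cost_breakdown(rates, input_tokens=1_000, output_tokens=4_000)
    print(f"{name:16} ${total:.4f} per request, {output_share:.0%} from output")
```

Output accounts for 95 to 96% of the cost on all three, so shortening responses moves the total far more than trimming the prompt.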

Workhorse. GPT-5.4 ($2.50/$15) and Claude Sonnet 4.6 ($3/$15) are the models most production apps should default to. They handle the large majority of real coding, writing, and analysis work at 50 to 60% of the premium output rate ($15/M vs. $30/M for GPT-5.5 and $25/M for Opus 4.8), and both carry large context windows (400K for GPT-5.4; Sonnet 4.6 now includes the full 1M-token window at standard pricing).

Budget / high-volume. GPT-5.4 mini ($0.75/$4.50), Claude Haiku 4.5 ($1/$5), Gemini 3 Flash ($0.50/$3), and the Flash-Lite tiers are built for classification, routing, extraction, and anything you run at scale. Running a premium model on these tasks costs roughly 5 to 50× more per input token (GPT-5.5's $5 vs. Haiku 4.5's $1 or Gemini 2.5 Flash-Lite's $0.10) for no quality gain.
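A quick way to check the premium-versus-budget gap on your own traffic is to price one representative call on each model. This sketch assumes a hypothetical classification request (a 500-token prompt and a 10-token label) and compares GPT-5.5 against the budget models in the table at list prices.

```python
# (input, output) USD per 1M tokens, from the table above.
RATES = {
    "GPT-5.5": (5.00, 30.00),
    "GPT-5.4 mini": (0.75, 4.50),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Gemini 3 Flash": (0.50, 3.00),
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
}


def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD list-price cost of one request."""
    input_rate, output_rate = RATES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical classification call: 500-token prompt, 10-token label.
PROMPT_TOKENS, LABEL_TOKENS = 500, 10
flagship = cost("GPT-5.5", PROMPT_TOKENS, LABEL_TOKENS)

for model in RATES:
    if model != "GPT-5.5":
        multiple = flagship / cost(model, PROMPT_TOKENS, LABEL_TOKENS)
        print(f"GPT-5.5 costs {multiple:.1f}x as much as {model}")
```

With these numbers GPT-5.5 costs about 5× as much as Haiku 4.5, about 7× GPT-5.4 mini, 10× Gemini 3 Flash, 20× Gemini 3.1 Flash-Lite, and about 52× Gemini 2.5 Flash-Lite per call.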

The cheapest models right now

For pure cost per token, Google's Flash-Lite tier leads: Gemini 2.5 Flash-Lite at $0.10 input / $0.40 output is the lowest sticker price among current models, while Gemini 3.1 Flash-Lite ($0.25/$1.50) costs 2.5× as much on input and 3.75× as much on output in exchange for the newer generation's reasoning. OpenAI's legacy GPT-4o mini ($0.15/$0.60) is still callable and still cheap. But "cheapest per token" rarely equals "cheapest per result": a smarter model that solves the task in one pass can beat a cheaper one that needs three attempts. Benchmark on your actual task before committing.
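One way to make "cost per result" concrete: if a failed attempt is simply retried and attempts succeed independently with probability p, the expected number of attempts is 1/p, so the expected cost per correct answer is the per-attempt cost divided by p. The sketch applies that to two Gemini budget models. The success rates and token counts are made-up placeholders, and independence plus free, reliable failure detection are simplifying assumptions.

```python
def cost_per_success(input_rate: float, output_rate: float, input_tokens: int,
                     output_tokens: int, success_rate: float) -> float:
    """Expected USD spent per correct answer when failed attempts are retried.

    Assumes each attempt succeeds independently with probability `success_rate`,
    so the expected number of attempts is 1 / success_rate (a geometric
    distribution), and that spotting a failed attempt costs nothing extra.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    cost_per_attempt = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return cost_per_attempt / success_rate


# Hypothetical task: 2,000 input + 1,000 output tokens per attempt.
# The success rates are placeholders; measure them on your own eval set.
flash_lite = cost_per_success(0.25, 1.50, 2_000, 1_000, success_rate=0.30)  # Gemini 3.1 Flash-Lite
flash = cost_per_success(0.50, 3.00, 2_000, 1_000, success_rate=0.90)       # Gemini 3 Flash

print(f"Gemini 3.1 Flash-Lite: ${flash_lite:.5f} per correct answer")
print(f"Gemini 3 Flash:        ${flash:.5f} per correct answer")
```

Here Flash-Lite is half the price per attempt but, at these made-up success rates, 1.5× as expensive per correct answer.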

Watch the new cost driver: thinking tokens

The current generation leans heavily on reasoning ("thinking") tokens — internal tokens a model generates before its visible answer. On most current models these are billed at the output rate and can dominate the cost of a single request on hard problems. That shifts the metric that matters from cost per token to cost per correct answer: a pricier reasoning model that gets it right the first time can be cheaper end-to-end than a cheap model you have to re-run. Budget for thinking tokens explicitly on reasoning-heavy workloads.
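Because thinking tokens are billed like output tokens, a request's cost can be estimated by adding them to the visible output. This sketch assumes output-rate billing for thinking tokens (which the paragraph above says holds on most current models; check your provider) and uses hypothetical token counts for one hard prompt on GPT-5.5.

```python
def reasoning_request_cost(input_rate: float, output_rate: float, input_tokens: int,
                           visible_output_tokens: int, thinking_tokens: int) -> float:
    """USD cost of one request, assuming thinking tokens bill at the output rate."""
    billed_output_tokens = visible_output_tokens + thinking_tokens
    return (input_tokens * input_rate + billed_output_tokens * output_rate) / 1_000_000


# Hypothetical hard prompt on GPT-5.5 ($5 input / $30 output per 1M tokens):
# 1,500 input tokens, an 800-token visible answer, 6,000 thinking tokens.
with_thinking = reasoning_request_cost(5.00, 30.00, 1_500, 800, thinking_tokens=6_000)
without_thinking = reasoning_request_cost(5.00, 30.00, 1_500, 800, thinking_tokens=0)
thinking_share = (with_thinking - without_thinking) / with_thinking

print(f"with thinking:    ${with_thinking:.4f}")
print(f"without thinking: ${without_thinking:.4f}")
print(f"thinking share:   {thinking_share:.0%}")
```

In this example the hidden reasoning is about 85% of the bill, making the request roughly 6.7× as expensive as the visible answer alone suggests.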

Prompt caching still pays for itself

All three providers now price cached input at roughly 10% of the base input rate (a 90% discount on the cached portion). If your app re-sends a large, stable system prompt, document, or conversation history on every request, caching is the single biggest lever you have — often a 50–70% cut to total input cost with zero quality change. Batch APIs add another ~50% off for work that doesn't need a real-time response.
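The caching and batch discounts are easy to model as multipliers. This sketch assumes the roughly 10% cached-input rate and roughly 50% batch discount described above, assumes the two discounts stack (whether they do varies by provider), and ignores cache-write surcharges, cache expiry, and minimum cacheable lengths. The Claude Sonnet 4.6 request shape (a 6,000-token stable prefix plus 3,000 fresh input tokens and 500 output tokens) is hypothetical.

```python
CACHED_INPUT_MULTIPLIER = 0.10  # cached input ~10% of the base input rate
BATCH_MULTIPLIER = 0.50         # batch ~50% off; assumed to stack with caching


def request_cost(input_rate: float, output_rate: float, cached_tokens: int,
                 fresh_tokens: int, output_tokens: int,
                 use_cache: bool = True, use_batch: bool = False) -> float:
    """USD cost of one request with an optional cached prefix and batch discount.

    Ignores cache-write surcharges, cache expiry, and minimum cacheable
    lengths, which vary by provider.
    """
    cached_rate = input_rate * CACHED_INPUT_MULTIPLIER if use_cache else input_rate
    cost = (cached_tokens * cached_rate
            + fresh_tokens * input_rate
            + output_tokens * output_rate) / 1_000_000
    return cost * BATCH_MULTIPLIER if use_batch else cost


# Claude Sonnet 4.6 list rates: $3 input / $15 output per 1M tokens.
baseline = request_cost(3.00, 15.00, 6_000, 3_000, 500, use_cache=False)
cached = request_cost(3.00, 15.00, 6_000, 3_000, 500)
cached_batch = request_cost(3.00, 15.00, 6_000, 3_000, 500, use_batch=True)

# Input-only saving: price the same request with zero output tokens.
input_saving = 1 - (request_cost(3.00, 15.00, 6_000, 3_000, 0)
                    / request_cost(3.00, 15.00, 6_000, 3_000, 0, use_cache=False))

print(f"uncached ${baseline:.5f}, cached ${cached:.5f}, cached+batch ${cached_batch:.5f}")
print(f"input cost cut by caching: {input_saving:.0%}")
```

With two-thirds of the input cached, input cost drops 60% ($0.027 to $0.0108), and the whole request falls from $0.0345 to $0.0183, or $0.00915 through a batch API under the stacking assumption.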

Quick rule of thumb: default to a workhorse model (GPT-5.4 or Claude Sonnet 4.6), route simple/high-volume calls down to a Flash-Lite or mini tier, reserve premium models for genuinely hard reasoning, and turn on prompt caching the moment you have a repeated context.
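The rule of thumb can be encoded as a small routing function. This is a sketch, not a prescribed design: the task categories, how a request gets classified into one, and the model labels (placeholders, not verified API model IDs) are all assumptions you would replace with your own.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Task(Enum):
    CLASSIFY = auto()
    EXTRACT = auto()
    ROUTE = auto()
    GENERAL = auto()          # everyday coding, writing, analysis
    HARD_REASONING = auto()   # genuinely hard reasoning or analysis


# Placeholder model labels, not verified API model IDs.
BUDGET_MODEL = "gemini-3.1-flash-lite"
WORKHORSE_MODEL = "claude-sonnet-4.6"
PREMIUM_MODEL = "claude-opus-4.8"

BUDGET_TASKS = {Task.CLASSIFY, Task.EXTRACT, Task.ROUTE}


@dataclass(frozen=True)
class Plan:
    model: str
    use_prompt_cache: bool


def plan_request(task: Task, has_repeated_context: bool) -> Plan:
    """Budget tier for simple high-volume calls, premium only for hard reasoning,
    the workhorse otherwise; enable prompt caching whenever context repeats."""
    if task in BUDGET_TASKS:
        model = BUDGET_MODEL
    elif task is Task.HARD_REASONING:
        model = PREMIUM_MODEL
    else:
        model = WORKHORSE_MODEL
    return Plan(model=model, use_prompt_cache=has_repeated_context)


print(plan_request(Task.EXTRACT, has_repeated_context=True))
```

Keeping the routing decision in one function makes it easy to swap models as prices change without touching call sites.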

What changed since 2024–2025

If you're updating an old cost model, the headline shifts: OpenAI's o-series and GPT-4-class flagships gave way to the GPT-5 line; Anthropic retired the Claude 3.x family in favor of Claude Opus/Sonnet/Haiku 4.x (with Sonnet holding the $3/$15 price point through the generations); and Google discontinued Gemini 1.5 and shut down Gemini 2.0 (June 1, 2026), replacing them with the Gemini 3 line while keeping the 2.5 tier live. Net effect: frontier capability keeps getting cheaper, but reasoning/thinking tokens make worst-case request cost harder to predict.
