AI API Pricing 2026: GPT-5, Claude 4 & Gemini 3 Cost Per Million Tokens

By Gia Gray · Updated June 2026 · 8 min read

The frontier moved again. The models people priced a year ago — GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 — are now legacy, retired, or shut down, and the current generation (OpenAI's GPT-5 line, Anthropic's Claude 4.x line, and Google's Gemini 3 line) has reshuffled the cost picture. This page is the single, current per-million-token reference I keep coming back to. Every number below was checked against the provider's official pricing page in June 2026; standard (non-batch) rates are shown, and for Gemini's tiered models the figure is the rate for prompts up to 200K tokens.

Current AI API pricing — all major models (per 1M tokens)

Model	Input / 1M	Output / 1M	Context
OpenAI
GPT-5.5	$5.00	$30.00	400K
GPT-5.4	$2.50	$15.00	400K
GPT-5.4 mini	$0.75	$4.50	400K
GPT-4o (legacy)	$2.50	$10.00	128K
GPT-4o mini (legacy)	$0.15	$0.60	128K
Anthropic
Claude Opus 4.8	$5.00	$25.00	1M
Claude Sonnet 4.6	$3.00	$15.00	1M
Claude Haiku 4.5	$1.00	$5.00	200K
Google
Gemini 3.1 Pro	$2.00	$12.00	1M
Gemini 3 Flash	$0.50	$3.00	1M
Gemini 3.1 Flash-Lite	$0.25	$1.50	1M
Gemini 2.5 Pro	$1.25	$10.00	1M
Gemini 2.5 Flash	$0.30	$2.50	1M
Gemini 2.5 Flash-Lite	$0.10	$0.40	1M

Sources: OpenAI, Anthropic, and Google pricing pages, June 2026. Always confirm before making production decisions — these rates change often.
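
To turn the table into a monthly figure, multiply each per-million rate by your own token volumes. The sketch below is a minimal version of that arithmetic, not a billing tool. The prices are copied from the table above; the function name, the workload numbers, and the assumption that every request has the same token counts are mine. It ignores caching, batch discounts, and Gemini's higher rate for prompts above 200K tokens.

```python
# USD per 1M tokens (input, output), copied from the table above (standard rates).
PRICES = {
    "GPT-5.5": (5.00, 30.00),
    "GPT-5.4": (2.50, 15.00),
    "GPT-5.4 mini": (0.75, 4.50),
    "Claude Opus 4.8": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
    "Gemini 3 Flash": (0.50, 3.00),
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
}


def monthly_cost(model, input_tokens, output_tokens, requests_per_month):
    """Estimated monthly USD cost, assuming every request uses the same token counts."""
    input_price, output_price = PRICES[model]
    per_request = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return per_request * requests_per_month


if __name__ == "__main__":
    # Hypothetical workload: 2,000 input + 500 output tokens, 100,000 requests a month.
    for model in PRICES:
        print(f"{model}: ${monthly_cost(model, 2_000, 500, 100_000):,.2f}")
```

For that hypothetical workload the estimate comes out around $1,250 a month on GPT-5.4 and about $40 on Gemini 2.5 Flash-Lite.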

The three pricing tiers, and what they cost

Premium reasoning. GPT-5.5 ($5/$30), Claude Opus 4.8 ($5/$25), and Gemini 3.1 Pro ($2/$12) sit at the top. Output is where these get expensive — GPT-5.5's $30/M output is six times its input rate, so an output-heavy workload on a premium model is the fastest way to a surprise bill. Reach for these only when the task genuinely needs frontier reasoning, coding, or analysis.

Workhorse. GPT-5.4 ($2.50/$15) and Claude Sonnet 4.6 ($3/$15) are the models most production apps should default to. They handle the large majority of real coding, writing, and analysis work at half the output price of GPT-5.5 and 60% of Opus 4.8's, and both carry large context windows (400K for GPT-5.4; Sonnet 4.6 now includes the full 1M-token window at standard pricing).

Budget / high-volume. GPT-5.4 mini ($0.75/$4.50), Claude Haiku 4.5 ($1/$5), Gemini 3 Flash ($0.50/$3), and the Flash-Lite tiers are built for classification, routing, extraction, and anything you run at scale. Running a flagship on these tasks can cost roughly 5–50× more, depending on which pair you compare, for no quality gain.
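
How much a flagship overspends depends heavily on which two models you compare. As a rough sketch, the snippet below prices one hypothetical classification call (1,000 prompt tokens, 20 output tokens) on GPT-5.5 and on two budget models, using the rates from the table. The function names and the token counts are my own assumptions.

```python
def request_cost(input_tokens, output_tokens, input_price, output_price):
    """USD cost of one request; prices are USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cost_multiple(input_tokens, output_tokens, expensive, cheap):
    """How many times more the expensive model costs for the same request."""
    return (request_cost(input_tokens, output_tokens, *expensive)
            / request_cost(input_tokens, output_tokens, *cheap))


# (input, output) rates in USD per 1M tokens, from the pricing table.
GPT_5_5 = (5.00, 30.00)
GPT_5_4_MINI = (0.75, 4.50)
GEMINI_2_5_FLASH_LITE = (0.10, 0.40)

if __name__ == "__main__":
    # Hypothetical classification call: 1,000 prompt tokens in, 20 tokens out.
    print(round(cost_multiple(1_000, 20, GPT_5_5, GEMINI_2_5_FLASH_LITE), 1))  # 51.9
    print(round(cost_multiple(1_000, 20, GPT_5_5, GPT_5_4_MINI), 1))           # 6.7
```

Against the cheapest Flash-Lite tier the gap is about 52×; against GPT-5.4 mini it is under 7×, so it is worth checking the specific pair before quoting a multiple.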

The cheapest models right now

For pure cost-per-token, Google's Flash-Lite tier leads: Gemini 2.5 Flash-Lite at $0.10 input / $0.40 output is the lowest sticker price among current models, with Gemini 3.1 Flash-Lite ($0.25/$1.50) adding newer reasoning. OpenAI's legacy GPT-4o mini ($0.15/$0.60) is still callable and still cheap. But "cheapest per token" rarely equals "cheapest per result" — a smarter model that solves the task in one pass can beat a cheaper one that needs three attempts. Benchmark on your actual task before committing.
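
One way to put a number on "cheapest per result" is to divide the cost of one attempt by the model's success rate. This is a simplification that assumes attempts are independent and that you retry until one succeeds, so the expected number of attempts is 1 / success rate. The per-attempt costs and success rates below are made up for illustration. They are not benchmark results.

```python
def cost_per_result(cost_per_attempt: float, success_rate: float) -> float:
    """Expected USD cost per successful result, assuming independent retries."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    # Expected attempts until the first success is 1 / success_rate.
    return cost_per_attempt / success_rate


if __name__ == "__main__":
    # Hypothetical numbers: the cheap model needs three attempts on average,
    # the smarter model costs twice as much per attempt but succeeds first time.
    cheap = cost_per_result(cost_per_attempt=0.001, success_rate=1 / 3)
    smart = cost_per_result(cost_per_attempt=0.002, success_rate=1.0)
    print(f"cheap model: ${cheap:.4f} per result")
    print(f"smart model: ${smart:.4f} per result")
```

With these invented inputs the cheap model ends up at about $0.003 per result against $0.002 for the smarter one. The sketch leaves out the cost of detecting a failed attempt, which in practice can matter as much as the tokens.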

Watch the new cost driver: thinking tokens

The current generation leans heavily on reasoning ("thinking") tokens — internal tokens a model generates before its visible answer. On most current models these are billed at the output rate and can dominate the cost of a single request on hard problems. That shifts the metric that matters from cost per token to cost per correct answer: a pricier reasoning model that gets it right the first time can be cheaper end-to-end than a cheap model you have to re-run. Budget for thinking tokens explicitly on reasoning-heavy workloads.
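
To see how quickly thinking tokens can take over a bill, here is a small sketch that prices a single request with and without them. It assumes thinking tokens are billed at the output rate, which the paragraph above says holds for most current models but not necessarily all. The token counts are hypothetical and the helper name is mine.

```python
def request_cost_with_thinking(input_tokens, visible_output_tokens, thinking_tokens,
                               input_price, output_price):
    """Return (total USD cost, share of that cost due to thinking tokens).

    Prices are USD per 1M tokens. Assumes thinking tokens bill at the output rate.
    """
    thinking_cost = thinking_tokens * output_price / 1_000_000
    other_cost = (input_tokens * input_price
                  + visible_output_tokens * output_price) / 1_000_000
    total = thinking_cost + other_cost
    share = thinking_cost / total if total else 0.0
    return total, share


if __name__ == "__main__":
    # GPT-5.5 rates from the table ($5 in / $30 out); hypothetical hard problem
    # with 1,000 input, 500 visible output, and 8,000 thinking tokens.
    total, share = request_cost_with_thinking(1_000, 500, 8_000, 5.00, 30.00)
    print(f"total: ${total:.2f}, thinking share: {share:.0%}")
```

In this example the request costs roughly $0.26, of which about 92% is thinking tokens. The same request with no thinking would cost $0.02.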

Prompt caching still pays for itself

All three providers now price cached input at roughly 10% of the base input rate (a 90% discount on the cached portion). If your app re-sends a large, stable system prompt, document, or conversation history on every request, caching is the single biggest lever you have — often a 50–70% cut to total input cost with zero quality change. Batch APIs add another ~50% off for work that doesn't need a real-time response.
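
The saving from caching follows directly from how much of each prompt is a cache hit. The sketch below assumes cached input costs 10% of the base input rate, as stated above. It ignores any cache-write surcharge or storage fee a provider may add, and it does not model batch discounts. The function names and the example prompt are assumptions for illustration.

```python
def input_cost(input_tokens, cached_fraction, input_price, cached_multiplier=0.10):
    """USD input cost of one request when part of the prompt is served from cache.

    input_price is USD per 1M tokens; cached tokens bill at
    input_price * cached_multiplier.
    """
    cached_tokens = input_tokens * cached_fraction
    fresh_tokens = input_tokens - cached_tokens
    return (fresh_tokens * input_price
            + cached_tokens * input_price * cached_multiplier) / 1_000_000


def input_savings(cached_fraction, cached_multiplier=0.10):
    """Fraction of input cost saved compared with no caching."""
    return cached_fraction * (1 - cached_multiplier)


if __name__ == "__main__":
    # Hypothetical 10,000-token prompt on Claude Sonnet 4.6 ($3 / 1M input).
    for fraction in (0.6, 0.8):
        cost = input_cost(10_000, fraction, 3.00)
        print(f"{fraction:.0%} cached: ${cost:.4f} per request, "
              f"{input_savings(fraction):.0%} saved")
```

Under these assumptions, a 50–70% cut in input cost corresponds to roughly 56–78% of your input tokens being cache hits.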

    Quick rule of thumb: default to a workhorse model (GPT-5.4 or Claude Sonnet 4.6), route simple/high-volume calls down to a Flash-Lite or mini tier, reserve premium models for genuinely hard reasoning, and turn on prompt caching the moment you have a repeated context.
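
That rule of thumb can be written down as a tiny router. This is only a sketch: the task labels are hypothetical categories I made up, the model strings are the display names from the table rather than real API model identifiers, and a production router would usually decide from measured quality, not a fixed lookup.

```python
# Hypothetical task labels; adapt them to however your app tags requests.
BUDGET_TASKS = {"classification", "routing", "extraction"}
HARD_TASKS = {"hard_reasoning"}


def choose_model(task_type: str, repeated_context: bool = False) -> dict:
    """Pick a tier for a request and say whether prompt caching should be on."""
    if task_type in BUDGET_TASKS:
        model = "Gemini 3.1 Flash-Lite"   # budget / high-volume tier
    elif task_type in HARD_TASKS:
        model = "GPT-5.5"                 # premium tier, hard reasoning only
    else:
        model = "GPT-5.4"                 # workhorse default
    return {"model": model, "use_prompt_caching": repeated_context}


if __name__ == "__main__":
    print(choose_model("classification"))
    print(choose_model("code_review", repeated_context=True))
    print(choose_model("hard_reasoning"))
```

Anything the router does not recognize falls through to the workhorse tier, which matches the "default to a workhorse model" advice.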

What changed since 2024–2025

If you're updating an old cost model, the headline shifts: OpenAI's o-series and GPT-4-class flagships gave way to the GPT-5 line; Anthropic retired the Claude 3.x family in favor of Claude Opus/Sonnet/Haiku 4.x (with Sonnet holding the $3/$15 price point through the generations); and Google discontinued Gemini 1.5 and shut down Gemini 2.0 (June 1, 2026), replacing them with the Gemini 3 line while keeping the 2.5 tier live. Net effect: frontier capability keeps getting cheaper, but reasoning/thinking tokens make worst-case request cost harder to predict.
