Claude API Pricing Breakdown (2026)

By Gia Gray · Updated June 2026 · 7 min read

Anthropic retired the Claude 3.x family — the current line is Claude Opus 4.8, Sonnet 4.6, and Haiku 4.5. After running Claude across a few production contexts, here's where I've landed: the instruction-following is more reliable than the comparable GPT tier for anything that needs structured output, the context window is now a full 1M tokens on Opus and Sonnet, and the prompt caching discount — 90% off cached tokens — is still the most aggressive in the industry and not talked about enough.

That said, output pricing is where Claude costs add up: Sonnet 4.6's $15/M output matches GPT-5.4, and Opus 4.8's $25/M is real money on output-heavy workloads. Whether Claude ends up cheaper than the GPT-5 tier for your use case depends mostly on one thing: how much repeated context you're sending. Here's the breakdown.

Claude Model Lineup and Current Pricing

Model	Input (per 1M tokens)	Output (per 1M tokens)	Context window
Claude Opus 4.8	$5.00	$25.00	1M
Claude Sonnet 4.6	$3.00	$15.00	1M
Claude Haiku 4.5	$1.00	$5.00	200K

Source: platform.claude.com/docs pricing, June 2026. Claude 3.5 Sonnet, 3.5 Haiku, 3 Haiku and 3 Opus are now retired or deprecated. For the full cross-provider table, see the 2026 AI API pricing guide.

Claude Sonnet 4.6: The Workhorse

Claude Sonnet 4.6 is the model most teams should default to, priced at $3.00/$15.00 per million tokens — marginally above GPT-5.4 on input and level on output. Where it consistently shines in practice: complex instruction-following, long-document reasoning, and careful adherence to structured output formats. Teams building on Sonnet tend to need fewer retries and less prompt engineering to get consistent structured outputs.

The headline upgrade this generation is context: Sonnet 4.6 now includes the full 1M-token window at standard pricing, so entire contracts, technical manuals, or codebases fit in a single request without chunking.

Claude Haiku 4.5: The Value Model Worth Paying Attention To

Claude Haiku 4.5 at $1.00/$5.00 per million tokens is where things get interesting. It's more expensive than the cheapest budget models (Gemini Flash-Lite, legacy GPT-4o mini) but delivers significantly higher quality — closer to Sonnet than the usual gap between a flagship and its mini tier.

If your workload needs better instruction-following or structured-output quality than a bottom-tier model delivers, but you can't justify full Sonnet pricing, Haiku 4.5 is worth evaluating seriously. It's especially strong at coding tasks relative to its price point.

Claude Opus 4.8: Frontier Reasoning, Now Much Cheaper

Opus used to be a luxury — Claude 3 Opus launched at $15/$75 per million tokens. Opus 4.8 lands at $5/$25, a dramatic drop that makes top-tier reasoning far more practical. It's the one to reach for on genuinely hard reasoning, coding, and analysis; for the other 90% of production work, start with Sonnet 4.6 and only escalate when quality demands it.

Anthropic's Prompt Caching: 90% Off Cached Input

This is Anthropic's biggest competitive pricing advantage and it's often overlooked. A cache hit costs just 10% of the standard input token rate — a 90% discount on cached tokens. (OpenAI and Google now price cache reads at roughly the same 10%, so the gap has narrowed, but Claude's caching remains best-in-class in practice.)

For applications with large, consistent system prompts or document contexts that repeat across requests, this is a massive lever. A 3,000-token system prompt on Sonnet 4.6 costs $0.30 per million tokens on cache hits vs $3.00 uncached.

Concrete example: RAG application with a large context

Imagine you're building a RAG system that sends a 4,000-token document context plus a 500-token system prompt with each request. At 30,000 requests/month on Claude Sonnet 4.6 ($3/$15, unchanged from the previous Sonnet generation):

Scenario	Input cost	Output cost (300 tokens avg)	Monthly total
No caching	$405.00	$135.00	$540.00
With caching (system prompt + doc)	$57.00	$135.00	$192.00

That's roughly a 65% reduction just from caching, with no quality change. This is one of the most impactful optimizations available for high-volume Claude applications.

When Claude Is Cheaper Than GPT-5

The headline comparison — Sonnet 4.6 at $3/$15 vs GPT-5.4 at $2.50/$15 — makes Claude look slightly pricier on input and level on output. But with caching, that can reverse:

Applications with long, stable system prompts (caching cuts the cached input cost to 10%)
Document Q&A where the same document context repeats across sessions
High-volume coding assistants where language context and coding guidelines repeat

For workloads with minimal repeated context and short prompts, GPT-5.4 mini or a Gemini Flash tier is usually cheaper. For workloads with large reusable context, Claude's caching can make it the better deal.

Compare Claude 4.x vs GPT-5 vs Gemini 3 for your exact token volumes and request rate.

Open the Calculator →