OpenAI API Pricing Explained (2026)

By Gia Gray · Updated June 2026 · 8 min read

OpenAI's model lineup has changed enough times in the past year that I've seen teams accidentally running on deprecated models, paying for caching they never enabled, and defaulting to o1 for tasks that GPT-4o mini handles identically at 13× lower cost. The pricing structure is genuinely complex and the documentation doesn't make it easy to figure out what actually applies to your situation.

Here's what I actually use to think through which model to pick and what it'll cost — with real numbers instead of abstract pricing tables.

OpenAI's Current Model Lineup and Pricing

Model	Input (per 1M tokens)	Output (per 1M tokens)	Context window
GPT-4o	$2.50	$10.00	128K
GPT-4o mini	$0.15	$0.60	128K
o1	$15.00	$60.00	200K
o3-mini	$1.10	$4.40	200K
GPT-3.5 Turbo	$0.50	$1.50	16K

Source: platform.openai.com/docs/pricing, May 2026. Always verify before production use.

GPT-4o: The Flagship for Most Production Use Cases

GPT-4o is OpenAI's general-purpose workhorse — fast, capable, multimodal (text and vision), and at $2.50/$10.00 per million tokens, significantly cheaper than GPT-4 Turbo was when it launched. For most text-based production workloads, GPT-4o is the default sensible choice when you need high quality.

The output pricing ($10.00/M) is where the cost accumulates. If your application generates long responses — think document drafting, detailed code explanations, multi-step reasoning — GPT-4o gets expensive fast. A workload generating 500 tokens per response at 50,000 requests/month is already $250 in output costs alone.

GPT-4o is worth paying for when: response quality directly affects user experience, when you're doing complex reasoning or instruction-following, or when multimodal input (images, documents) is part of the pipeline.

GPT-4o Mini: The Right Default for Most High-Volume Use Cases

GPT-4o mini is one of the best deals in AI APIs right now. At $0.15/$0.60 per million tokens — roughly 16× cheaper than GPT-4o — it handles the majority of straightforward tasks well: classification, extraction, summarization, basic Q&A, translation, simple code completion.

The common mistake is treating mini as "GPT-4o but worse." It's better framed as a different category — good enough for 80% of tasks, drastically cheaper for the ones where quality differences don't matter to users. Most teams that route aggressively to mini save 60–80% on their AI bill without meaningful user-facing impact.

    Strategy that works: Start with GPT-4o for everything. After a few weeks in production, identify the request types where mini performs acceptably in your evals. Route those to mini. This tiered approach typically cuts costs by 40–70% without a quality regression.
  

The Reasoning Models: o1 and o3-mini

The o-series models are fundamentally different from GPT-4o. They spend additional time "thinking" before responding — running internal chain-of-thought reasoning that isn't visible in the output but consumes significant tokens internally. This makes them much better at hard math, complex coding problems, and multi-step logic.

But that internal reasoning is expensive. o1 at $15/$60 per million tokens is 6× more expensive than GPT-4o on input and output. For tasks where o1 is dramatically better, that's worth it. For tasks where the quality difference is marginal, it's not.

o3-mini is the better entry point into reasoning models — $1.10/$4.40 per million tokens, much more accessible, and competitive with GPT-4o on many reasoning tasks. If you're evaluating whether reasoning models make sense for your use case, start with o3-mini.

Prompt Caching: 50% Off Repeated Input

OpenAI offers prompt caching on GPT-4o and GPT-4o mini. When you send the same prefix (like a long system prompt or document) across multiple requests, OpenAI caches it and charges 50% of the normal input rate for cache hits.

This matters a lot for applications with large, consistent system prompts. A 2,000-token system prompt on 100,000 requests/month is 200 million input tokens. Without caching: $500 on GPT-4o. With caching (assuming high hit rate): $250. Just from the system prompt.

Caching kicks in automatically for prompts that share a common prefix of at least 1,024 tokens. You don't need to do anything special — just structure your prompts so the stable parts come first.

Batch API: 50% Off Non-Realtime Requests

OpenAI's Batch API processes requests asynchronously (within 24 hours) at half the normal price. If you have workloads that don't need immediate responses — bulk document processing, overnight analysis runs, generating content in bulk — the Batch API cuts your cost in half with no quality tradeoff.

Batch pricing on GPT-4o: $1.25 input / $5.00 output per million tokens. On GPT-4o mini: $0.075 / $0.30. Those are extremely low rates for capable models.

What Model Should You Use?

Use case	Recommended model	Why
General chatbot, Q&A	GPT-4o mini	Fast, cheap, capable enough
Complex reasoning, code	GPT-4o or o3-mini	Quality matters here
Hard math, logic	o3-mini or o1	Reasoning models built for this
Classification/extraction	GPT-4o mini	Overkill to use anything larger
Bulk async processing	GPT-4o mini (Batch API)	$0.075/M input, hard to beat
Vision / multimodal	GPT-4o	Mini supports vision too, try it first

Calculate your real monthly OpenAI cost by entering your token estimates and request volume.

Open the Calculator →