OpenAI's model lineup has changed enough times in the past year that I've seen teams accidentally running on deprecated models, paying for caching they never enabled, and defaulting to o1 for tasks that GPT-4o mini handles identically at 13× lower cost. The pricing structure is genuinely complex and the documentation doesn't make it easy to figure out what actually applies to your situation.
Here's what I actually use to think through which model to pick and what it'll cost — with real numbers instead of abstract pricing tables.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context window |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | 128K |
| GPT-4o mini | $0.15 | $0.60 | 128K |
| o1 | $15.00 | $60.00 | 200K |
| o3-mini | $1.10 | $4.40 | 200K |
| GPT-3.5 Turbo | $0.50 | $1.50 | 16K |
Source: platform.openai.com/docs/pricing, May 2026. Always verify before production use.
GPT-4o is OpenAI's general-purpose workhorse — fast, capable, multimodal (text and vision), and at $2.50/$10.00 per million tokens, significantly cheaper than GPT-4 Turbo was when it launched. For most text-based production workloads, GPT-4o is the default sensible choice when you need high quality.
The output pricing ($10.00/M) is where the cost accumulates. If your application generates long responses — think document drafting, detailed code explanations, multi-step reasoning — GPT-4o gets expensive fast. A workload generating 500 tokens per response at 50,000 requests/month is already $250 in output costs alone.
GPT-4o is worth paying for when: response quality directly affects user experience, when you're doing complex reasoning or instruction-following, or when multimodal input (images, documents) is part of the pipeline.
GPT-4o mini is one of the best deals in AI APIs right now. At $0.15/$0.60 per million tokens — roughly 16× cheaper than GPT-4o — it handles the majority of straightforward tasks well: classification, extraction, summarization, basic Q&A, translation, simple code completion.
The common mistake is treating mini as "GPT-4o but worse." It's better framed as a different category — good enough for 80% of tasks, drastically cheaper for the ones where quality differences don't matter to users. Most teams that route aggressively to mini save 60–80% on their AI bill without meaningful user-facing impact.
The o-series models are fundamentally different from GPT-4o. They spend additional time "thinking" before responding — running internal chain-of-thought reasoning that isn't visible in the output but consumes significant tokens internally. This makes them much better at hard math, complex coding problems, and multi-step logic.
But that internal reasoning is expensive. o1 at $15/$60 per million tokens is 6× more expensive than GPT-4o on input and output. For tasks where o1 is dramatically better, that's worth it. For tasks where the quality difference is marginal, it's not.
o3-mini is the better entry point into reasoning models — $1.10/$4.40 per million tokens, much more accessible, and competitive with GPT-4o on many reasoning tasks. If you're evaluating whether reasoning models make sense for your use case, start with o3-mini.
OpenAI offers prompt caching on GPT-4o and GPT-4o mini. When you send the same prefix (like a long system prompt or document) across multiple requests, OpenAI caches it and charges 50% of the normal input rate for cache hits.
This matters a lot for applications with large, consistent system prompts. A 2,000-token system prompt on 100,000 requests/month is 200 million input tokens. Without caching: $500 on GPT-4o. With caching (assuming high hit rate): $250. Just from the system prompt.
Caching kicks in automatically for prompts that share a common prefix of at least 1,024 tokens. You don't need to do anything special — just structure your prompts so the stable parts come first.
OpenAI's Batch API processes requests asynchronously (within 24 hours) at half the normal price. If you have workloads that don't need immediate responses — bulk document processing, overnight analysis runs, generating content in bulk — the Batch API cuts your cost in half with no quality tradeoff.
Batch pricing on GPT-4o: $1.25 input / $5.00 output per million tokens. On GPT-4o mini: $0.075 / $0.30. Those are extremely low rates for capable models.
| Use case | Recommended model | Why |
|---|---|---|
| General chatbot, Q&A | GPT-4o mini | Fast, cheap, capable enough |
| Complex reasoning, code | GPT-4o or o3-mini | Quality matters here |
| Hard math, logic | o3-mini or o1 | Reasoning models built for this |
| Classification/extraction | GPT-4o mini | Overkill to use anything larger |
| Bulk async processing | GPT-4o mini (Batch API) | $0.075/M input, hard to beat |
| Vision / multimodal | GPT-4o | Mini supports vision too, try it first |
Calculate your real monthly OpenAI cost by entering your token estimates and request volume.
Open the Calculator →