Every major AI API ranked by cost per token. Updated June 2026 with the latest pricing from OpenAI, Anthropic, Google, Meta, and Mistral.
Prices are per million tokens (1M tokens ≈ 750,000 words). Sorted by input token cost, lowest first.
| # | Model | Provider | Input /1M | Output /1M | Best For |
|---|---|---|---|---|---|
| 1 | Gemini 2.5 Flash-Lite | $0.10 | $0.40 | Ultra budget | |
| 2 | GPT-4o mini | OpenAI | $0.15 | $0.60 | High volume |
| 2 | Gemini 2.5 Flash | $0.15 | $0.60 | Long context | |
| 3 | Llama 3.1 8B | Meta (hosted) | $0.18 | $0.18 | Simple tasks |
| 4 | Mistral 7B | Mistral | $0.25 | $0.25 | EU data |
| 5 | Claude 3.5 Haiku | Anthropic | $0.80 | $4.00 | Quality budget |
| 6 | Llama 3.1 70B | Meta (hosted) | $0.90 | $0.90 | Open source |
| 7 | Gemini 2.5 Pro | $2.00 | $12.00 | Premium tasks | |
| 8 | GPT-4o | OpenAI | $2.50 | $10.00 | Balanced |
| 9 | Claude 3.5 Sonnet | Anthropic | $3.00 | $15.00 | Complex tasks |
| 10 | Claude 3 Opus | Anthropic | $15.00 | $75.00 | Max quality |
Google's most affordable model in 2026. Flash-Lite is designed for simple, repetitive tasks at extreme scale. It handles classification, extraction, summarization, and basic Q&A well. Not suitable for complex multi-step reasoning or nuanced writing.
OpenAI's budget model punches well above its price point. GPT-4o mini delivers strong performance on structured tasks, follows instructions reliably, and integrates seamlessly with OpenAI's tools ecosystem (function calling, Assistants API, fine-tuning).
Same price as GPT-4o mini but with a massive 1M token context window — making it the best budget choice for long-document processing. Summarizing a 200-page PDF costs less than $0.05 in input tokens.
Assume a customer support chatbot processing 1 million requests/month with 500 input + 200 output tokens each:
| Model | Input Cost | Output Cost | Monthly Total | vs GPT-4o |
|---|---|---|---|---|
| Gemini 2.5 Flash-Lite | $50 | $80 | $130 | -94% |
| GPT-4o mini | $75 | $120 | $195 | -91% |
| Claude Haiku 3.5 | $400 | $800 | $1,200 | -44% |
| GPT-4o | $1,250 | $2,000 | $3,250 | — |
| Claude 3.5 Sonnet | $1,500 | $3,000 | $4,500 | +38% |
Meta's Llama and Mistral's open-weight models can be significantly cheaper when self-hosted, but running your own infrastructure adds complexity and fixed costs. Through hosted providers, the pricing above is competitive with the cheapest closed-source options.
Self-hosting makes sense if you process more than 50M tokens/month and have engineering capacity. Below that threshold, hosted APIs are simpler and often cheaper when you factor in GPU costs and maintenance.
Use our free calculator to estimate exactly what your workload will cost across all major models.
Calculate My AI API Cost →