Cost Per Million Tokens Explained

By Gia Gray · Updated June 2026 · 7 min read

I built this calculator partly because I kept having the same conversation. Someone asks how much their AI project will cost, I say "it depends on your tokens," and they look at me like I said something in another language. Which is fair — tokens are genuinely confusing if you haven't encountered them before, and "per million" pricing makes the numbers feel fake until they show up on your invoice.

I've watched people dramatically underprice what they're building because they estimated in "words" instead of tokens. I've also seen teams pick the wrong model because they couldn't translate the pricing page into what a single request actually costs them. This guide fixes both of those problems.

What Is a Token?

A token is the basic unit of text that language models work with. It's not a word, and it's not a character — it's somewhere in between. As a rough rule of thumb:

1 token ≈ 4 characters of English text
1 token ≈ 0.75 words
100 tokens ≈ 75 words ≈ a short paragraph
1,000 tokens ≈ 750 words ≈ about 1.5 pages of text

Tokenization varies by model. OpenAI uses its own tokenizer (tiktoken), Anthropic uses a different one for Claude, and so on. The differences are small enough that the "~0.75 words per token" rule holds reasonably well across providers for standard English text.

Non-English languages often require more tokens per word. Code can be more or less token-efficient depending on the language. JSON and XML tend to be token-heavy relative to the information they contain.
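For quick budgeting, the rule of thumb above can be turned into a small estimator. This is only a sketch: `estimate_tokens` is a hypothetical helper, not a real tokenizer, and it assumes standard English prose. For non-English text, code, or JSON, expect the real count to differ, and use the provider's own tokenizer when you need exact numbers.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text (heuristic, not a tokenizer)."""
    by_chars = len(text) / 4              # 1 token is roughly 4 characters
    by_words = len(text.split()) / 0.75   # 1 token is roughly 0.75 words
    # Average the two heuristics and round up so we don't underestimate.
    return math.ceil((by_chars + by_words) / 2)


if __name__ == "__main__":
    sample = "How much will my support chatbot cost to run each month?"
    print(estimate_tokens(sample))
```

Averaging the two heuristics is a design choice made here for illustration; either rule alone is fine for a first estimate.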

Why Input and Output Are Priced Separately

Every AI API request has two sides: what you send in (input tokens) and what the model sends back (output tokens). These are priced differently because they involve fundamentally different amounts of computation.

Input tokens are processed in parallel — the model reads your entire prompt at once. It's computationally intensive but relatively fast and efficient.

Output tokens are generated one at a time, sequentially. Each token requires a full forward pass through the model to predict the next one. This is significantly more expensive per token than reading input.

That's why output tokens typically cost 3–5× more than input tokens across providers. If you see a model priced at "$2.50 / $10.00 per million tokens," that means $2.50 per million input and $10.00 per million output.
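If you want to turn that "input / output" notation into the cost of one request, something like the following works. It's a minimal sketch: `parse_price_pair` and `request_cost` are hypothetical helpers, the label format is assumed to match the example above, and the 1,000 / 500 token counts are made up for illustration.

```python
import re


def parse_price_pair(label: str) -> tuple[float, float]:
    """Extract (input, output) dollars per million tokens from a price label."""
    amounts = re.findall(r"\$(\d+(?:\.\d+)?)", label)
    if len(amounts) != 2:
        raise ValueError(f"expected two dollar amounts in {label!r}")
    return float(amounts[0]), float(amounts[1])


def request_cost(input_tokens: int, output_tokens: int, label: str) -> float:
    """Dollar cost of a single request at the given per-million prices."""
    input_per_m, output_per_m = parse_price_pair(label)
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 1,000 input tokens and 500 output tokens.
    cost = request_cost(1000, 500, "$2.50 / $10.00 per million tokens")
    print(f"${cost:.4f} per request")
```

Even with half as many output tokens as input tokens, output accounts for two thirds of the cost in this example, which is the asymmetry described above.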

Current Pricing for Major Models (May 2026)

Model	Input (per 1M tokens)	Output (per 1M tokens)	Output/Input ratio
GPT-4o	$2.50	$10.00	4×
GPT-4o mini	$0.15	$0.60	4×
Claude 3.5 Sonnet	$3.00	$15.00	5×
Claude 3.5 Haiku	$0.80	$4.00	5×
Gemini 2.0 Flash	$0.10	$0.40	4×
o3-mini	$1.10	$4.40	4×

Verify current rates at AIModelCalc or directly with each provider — these change frequently.

Translating Per-Token Pricing to Monthly Costs

Here's the formula you need:

    Monthly cost = (avg input tokens × requests/month × input price per token) + (avg output tokens × requests/month × output price per token)

    Where "price per token" = (price per million tokens) ÷ 1,000,000
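The same formula as a function, in case you'd rather script it than do it by hand. This is a sketch with hypothetical parameter names; prices are passed in dollars per million tokens and converted to per-token prices inside, exactly as in the formula.

```python
def monthly_cost(
    avg_input_tokens: float,
    avg_output_tokens: float,
    requests_per_month: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimated monthly spend in dollars for one workload on one model."""
    # "Price per token" = price per million tokens / 1,000,000
    input_price_per_token = input_price_per_million / 1_000_000
    output_price_per_token = output_price_per_million / 1_000_000

    input_cost = avg_input_tokens * requests_per_month * input_price_per_token
    output_cost = avg_output_tokens * requests_per_month * output_price_per_token
    return input_cost + output_cost


if __name__ == "__main__":
    # Made-up numbers: 500 input and 100 output tokens, 1,000 requests,
    # at $1.00 input and $2.00 output per million tokens.
    print(round(monthly_cost(500, 100, 1_000, 1.00, 2.00), 2))
```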

A real example: customer support chatbot

Let's say you're building a customer support chatbot on GPT-4o. Your average request looks like this:

System prompt: ~400 tokens (product context, instructions)
Conversation history: ~600 tokens average
User message: ~50 tokens
Total input: ~1,050 tokens per request
Response length: ~200 tokens average

At 10,000 requests per month on GPT-4o ($2.50 input / $10.00 output per million):

Component	Tokens	Monthly requests	Cost
Input	1,050	10,000	$26.25
Output	200	10,000	$20.00
Total	—	—	$46.25/month

The same workload on GPT-4o mini

Component	Tokens	Monthly requests	Cost
Input	1,050	10,000	$1.58
Output	200	10,000	$1.20
Total	—	—	$2.78/month

That's a 94% cost reduction. Whether GPT-4o mini gives you acceptable quality for customer support is a separate question — but the cost difference is why model selection matters so much.
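To check the two tables above yourself, here is a short sketch that recomputes them. The prices come from the pricing table in this article; the dictionary name and helper functions are hypothetical. It uses `Decimal` rather than floats so the cents round the way an invoice would.

```python
from decimal import Decimal, ROUND_HALF_UP

# (input, output) dollars per million tokens, copied from the table above.
PRICES_PER_MILLION = {
    "GPT-4o": (Decimal("2.50"), Decimal("10.00")),
    "GPT-4o mini": (Decimal("0.15"), Decimal("0.60")),
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int, requests: int) -> Decimal:
    """Exact monthly cost in dollars for the given per-request token counts."""
    input_price, output_price = PRICES_PER_MILLION[model]
    million = Decimal(1_000_000)
    input_cost = input_tokens * requests * input_price / million
    output_cost = output_tokens * requests * output_price / million
    return input_cost + output_cost


def to_cents(amount: Decimal) -> Decimal:
    """Round to two decimal places, half up."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


gpt4o = monthly_cost("GPT-4o", 1050, 200, 10_000)
mini = monthly_cost("GPT-4o mini", 1050, 200, 10_000)
reduction_pct = (1 - mini / gpt4o) * 100

if __name__ == "__main__":
    print(to_cents(gpt4o), to_cents(mini), reduction_pct)
```

Swap in your own token counts and request volume to see how the gap changes for your workload.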

What Makes Token Costs Spike in Practice

The biggest surprise for teams that haven't modeled costs carefully is usually one of these:

System prompts are charged on every single request. A 1,000-token system prompt on 100,000 requests/month is 100 million input tokens, just from your instructions. With prompt caching (offered by both Anthropic and OpenAI), repeated system prompt tokens can be billed at a fraction of the normal input rate. This is often the single biggest optimization available.

Conversation history compounds fast. In a multi-turn chatbot, you typically send the entire prior conversation as context with each request. A conversation that's 5 turns deep might have 3,000+ input tokens before the user even types their next message. Users who have long sessions cost significantly more than the per-request average.

Output length is harder to control than you think. You can request shorter responses, but models don't always comply precisely. A "keep responses under 150 words" instruction might still average 180 words, which is roughly 240 tokens. At scale, that drift adds up.

Error handling and retries. Failed requests often still consume input tokens before they fail, so every retry can bill the full prompt again. Retry logic without proper backoff can spike your costs unexpectedly.
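The system prompt and conversation history effects are easier to see with numbers. The sketch below borrows the token sizes from the support chatbot example (400-token system prompt, 50-token user messages, 200-token replies) and assumes every turn is the same size and that the full history is resent with no truncation or caching. The function name is hypothetical.

```python
def input_tokens_per_turn(
    system_tokens: int, user_tokens: int, reply_tokens: int, turns: int
) -> list[int]:
    """Input tokens sent on each turn when the whole history is resent."""
    per_turn = []
    history = 0  # tokens from earlier user messages and model replies
    for _ in range(turns):
        # Each request carries the system prompt, all prior turns,
        # and the new user message.
        per_turn.append(system_tokens + history + user_tokens)
        history += user_tokens + reply_tokens
    return per_turn


if __name__ == "__main__":
    turns = input_tokens_per_turn(400, 50, 200, 5)
    print(turns)       # [450, 700, 950, 1200, 1450]
    print(sum(turns))  # 4750
```

Under these assumptions the fifth request sends more than three times the input of the first, and the five-turn session totals 4,750 input tokens even though the user typed only 250 of them.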

The Input-Heavy vs Output-Heavy Distinction

Different use cases have very different input/output ratios, and this significantly affects which model is cheapest for your workload:

Document summarization: High input (the document), short output (the summary). Input-token-heavy. Benefits most from cheaper input pricing or caching.
Code generation: Moderate input, long output. Output-heavy. The output price dominates the bill here, so compare models on their output rate first.
Classification/extraction: High input, tiny output (a label or JSON field). Extremely input-heavy. Output pricing barely matters here; input price and request volume drive the bill.
Open-ended chat: Balanced at first, but input grows with conversation depth because the prior conversation is resent with each request. The longer users talk, the more input-weighted it gets.
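One way to find out which side of the bill your workload sits on is to compute the share of cost that comes from input. A minimal sketch follows; the token counts in `PROFILES` are invented purely for illustration, the helper name is hypothetical, and the prices are the GPT-4o rates from the table earlier in this guide.

```python
def input_cost_share(
    input_tokens: int,
    output_tokens: int,
    input_per_million: float,
    output_per_million: float,
) -> float:
    """Fraction of a request's cost that comes from input tokens (0.0 to 1.0)."""
    input_cost = input_tokens * input_per_million
    output_cost = output_tokens * output_per_million
    total = input_cost + output_cost
    if total == 0:
        return 0.0
    return input_cost / total


# Hypothetical per-request token counts: (input, output).
PROFILES = {
    "Document summarization": (5000, 300),
    "Code generation": (500, 1500),
    "Classification/extraction": (2000, 10),
}

if __name__ == "__main__":
    for name, (tokens_in, tokens_out) in PROFILES.items():
        share = input_cost_share(tokens_in, tokens_out, 2.50, 10.00)
        print(f"{name}: {share:.0%} of cost is input")
```

If the share is high, cheaper input pricing and caching will move your bill the most; if it is low, focus on output price and response length.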

Want to run these numbers for your specific use case? Enter your token estimates and request volume in the calculator.
