Cheapest AI API 2026: Lowest Cost LLMs Ranked by Price

Cheapest AI APIs Ranked by Input Price (May 2026)

Prices are per million tokens (1M tokens ≈ 750,000 words). Sorted by input token cost, lowest first.

#	Model	Provider	Input /1M	Output /1M	Best For
1	Gemini 2.5 Flash-Lite	Google	$0.10	$0.40	Ultra budget
2	GPT-4o mini	OpenAI	$0.15	$0.60	High volume
2	Gemini 2.5 Flash	Google	$0.15	$0.60	Long context
3	Llama 3.1 8B	Meta (hosted)	$0.18	$0.18	Simple tasks
4	Mistral 7B	Mistral	$0.25	$0.25	EU data
5	Claude 3.5 Haiku	Anthropic	$0.80	$4.00	Quality budget
6	Llama 3.1 70B	Meta (hosted)	$0.90	$0.90	Open source
7	Gemini 2.5 Pro	Google	$2.00	$12.00	Premium tasks
8	GPT-4o	OpenAI	$2.50	$10.00	Balanced
9	Claude 3.5 Sonnet	Anthropic	$3.00	$15.00	Complex tasks
10	Claude 3 Opus	Anthropic	$15.00	$75.00	Max quality

Key insight: The price gap between the cheapest and most expensive models in the table is 150x on input tokens ($0.10 vs $15.00). For high-volume applications, choosing the right budget model can cut your API bill by 90% or more.

Top 3 Cheapest AI APIs — Detailed Breakdown

🥇 #1: Gemini 2.5 Flash-Lite — $0.10/M input

Google's most affordable model in 2026. Flash-Lite is designed for simple, repetitive tasks at extreme scale. It handles classification, extraction, summarization, and basic Q&A well. Not suitable for complex multi-step reasoning or nuanced writing.

Best for: Content moderation, data extraction, simple classification, FAQ bots
Avoid for: Creative writing, complex coding, legal/medical analysis
Cost per 1M tokens: $0.05 input + $0.20 output = $0.25 total at a 500K in / 500K out split
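
The blended figure for a given input/output split is a weighted average of the two list prices. A minimal Python sketch, using the Flash-Lite prices from the table; the helper name `blended_price_per_million` is hypothetical and used only for illustration:

```python
def blended_price_per_million(input_price, output_price, input_share):
    """Weighted average price (USD) for 1M tokens.

    input_share is the fraction of tokens that are input (0.0 to 1.0);
    the remainder is billed at the output price.
    """
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_price * input_share + output_price * (1.0 - input_share)


# Gemini 2.5 Flash-Lite prices from the table: $0.10 in, $0.40 out.
even_split = blended_price_per_million(0.10, 0.40, input_share=0.5)
print(f"${even_split:.2f} per 1M tokens at a 50/50 split")
```

Input-heavy workloads (classification, extraction) sit closer to the input price, while generation-heavy workloads sit closer to the output price.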

🥈 #2: GPT-4o mini — $0.15/M input

OpenAI's budget model punches well above its price point. GPT-4o mini delivers strong performance on structured tasks, follows instructions reliably, and integrates seamlessly with OpenAI's tools ecosystem (function calling, Assistants API, fine-tuning).

Best for: High-volume customer support, data processing, content generation at scale
Avoid for: Tasks requiring frontier-level reasoning (use GPT-4o or Claude Sonnet instead)
Sweet spot: Applications where you'd use GPT-4o but cost is a constraint

🥈 #2 (tied): Gemini 2.5 Flash — $0.15/M input

Gemini 2.5 Flash costs the same as GPT-4o mini but comes with a 1M-token context window, which makes it the best budget choice for long-document processing. Summarizing a 200-page PDF costs less than $0.05 in input tokens.

Best for: Long document analysis, RAG pipelines, multi-document summarization
Unique advantage: 1M context window at budget pricing
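
The 200-page estimate can be checked with a rough back-of-the-envelope calculation. The sketch below reuses the article's ratio of about 750,000 words per 1M tokens; the 500 words per page figure and the function name are assumptions, and real token counts depend on the tokenizer and on how the PDF is extracted:

```python
WORDS_PER_TOKEN = 0.75   # from the article: 1M tokens is roughly 750,000 words
WORDS_PER_PAGE = 500     # assumption: a dense, text-only page


def estimate_input_cost(pages, price_per_million, words_per_page=WORDS_PER_PAGE):
    """Return (estimated_tokens, input_cost_usd) for a document of `pages` pages."""
    words = pages * words_per_page
    tokens = words / WORDS_PER_TOKEN
    cost = tokens / 1_000_000 * price_per_million
    return tokens, cost


# 200 pages at the $0.15 per 1M input price listed for Gemini 2.5 Flash.
tokens, cost = estimate_input_cost(pages=200, price_per_million=0.15)
print(f"~{tokens:,.0f} tokens, ~${cost:.3f} input cost")
```

Under these assumptions the document is well inside the 1M-token window and the input cost stays below $0.05; output tokens for the summary are billed separately.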

How Much Can You Save by Switching Models?

Assume a customer support chatbot processing 1 million requests/month with 500 input + 200 output tokens each:

Model	Input Cost	Output Cost	Monthly Total	vs GPT-4o
Gemini 2.5 Flash-Lite	$50	$80	$130	-96%
GPT-4o mini	$75	$120	$195	-94%
Claude 3.5 Haiku	$400	$800	$1,200	-63%
GPT-4o	$1,250	$2,000	$3,250	—
Claude 3.5 Sonnet	$1,500	$3,000	$4,500	+38%
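
The scenario can be recomputed directly from the list prices. This is a minimal sketch that assumes the per-million prices from the ranking table and the 1 million requests, 500 input tokens, and 200 output tokens stated above; the function and constant names are illustrative:

```python
# Prices in USD per 1M tokens (input, output), copied from the ranking table.
PRICES = {
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
    "GPT-4o mini": (0.15, 0.60),
    "Claude 3.5 Haiku": (0.80, 4.00),
    "GPT-4o": (2.50, 10.00),
    "Claude 3.5 Sonnet": (3.00, 15.00),
}

REQUESTS_PER_MONTH = 1_000_000
INPUT_TOKENS_PER_REQUEST = 500
OUTPUT_TOKENS_PER_REQUEST = 200


def monthly_cost(input_price, output_price):
    """Return (input_cost, output_cost, total) in USD for the scenario above."""
    input_millions = REQUESTS_PER_MONTH * INPUT_TOKENS_PER_REQUEST / 1_000_000
    output_millions = REQUESTS_PER_MONTH * OUTPUT_TOKENS_PER_REQUEST / 1_000_000
    input_cost = input_millions * input_price
    output_cost = output_millions * output_price
    return input_cost, output_cost, input_cost + output_cost


baseline_total = monthly_cost(*PRICES["GPT-4o"])[2]

for model, (in_price, out_price) in PRICES.items():
    in_cost, out_cost, total = monthly_cost(in_price, out_price)
    # Percentage change relative to the GPT-4o baseline.
    change = (total - baseline_total) / baseline_total * 100
    print(f"{model:<24} ${in_cost:>8,.0f} ${out_cost:>8,.0f} ${total:>8,.0f} {change:+.0f}%")
```

Changing the three constants at the top lets you rerun the comparison for your own traffic profile.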

Should You Use Open-Source Models (Llama, Mistral)?

Meta's Llama and Mistral's open-weight models can be significantly cheaper when self-hosted, but running your own infrastructure adds complexity and fixed costs. Through hosted providers, the pricing above is competitive with the cheapest closed-source options.

Self-hosting makes sense if you process more than 50M tokens/month and have engineering capacity. Below that threshold, hosted APIs are simpler and often cheaper when you factor in GPU costs and maintenance.

Tips for Minimizing AI API Costs

Use prompt caching — Anthropic and OpenAI both offer 50–90% discounts on repeated input context
Trim system prompts — Every token in your system prompt is charged on every request
Match model to task — Use small models for simple tasks, save premium models for complex work
Batch non-urgent requests — OpenAI's Batch API offers 50% off for async workloads
Monitor output length — Output tokens cost 4–5x more than input; set max_tokens limits
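
To see how the caching and batch tips change a bill, the sketch below applies them to the GPT-4o mini chatbot scenario (500M input and 200M output tokens per month). It is a simplified model, not any provider's billing logic: the function name and the 60% cached share are hypothetical, the 50% discounts are the figures quoted in the tips, and it ignores cache-write surcharges, minimum cacheable prompt lengths, and whether a provider lets the two discounts stack.

```python
def discounted_cost(
    input_tokens,
    output_tokens,
    input_price,
    output_price,
    cached_fraction=0.0,
    cache_discount=0.0,
    batch_discount=0.0,
):
    """Estimate cost in USD with optional caching and batch discounts.

    Prices are USD per 1M tokens. cached_fraction is the share of input
    tokens served from cache; cache_discount and batch_discount are
    fractions (0.5 means 50% off). Simplification: the batch discount is
    applied to the whole bill, on top of any cache discount.
    """
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    input_cost = (uncached + cached * (1 - cache_discount)) / 1e6 * input_price
    output_cost = output_tokens / 1e6 * output_price
    return (input_cost + output_cost) * (1 - batch_discount)


# GPT-4o mini list prices from the table: $0.15 in, $0.60 out.
base = discounted_cost(500e6, 200e6, 0.15, 0.60)
# Hypothetical: 60% of input tokens are a shared system prompt, cached at 50% off.
with_cache = discounted_cost(500e6, 200e6, 0.15, 0.60,
                             cached_fraction=0.6, cache_discount=0.5)
# Same workload sent through a 50%-off batch endpoint, without caching.
with_batch = discounted_cost(500e6, 200e6, 0.15, 0.60, batch_discount=0.5)

print(f"list price: ${base:,.2f}")
print(f"with cache: ${with_cache:,.2f}")
print(f"with batch: ${with_batch:,.2f}")
```

Because caching only touches input tokens, its effect is largest for workloads with long, repeated prompts and short responses.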

Use our free calculator to estimate exactly what your workload will cost across all major models.
