AI model guide · Updated May 2026
Cheapest AI API Models (2026)
Your API bill is probably your biggest variable cost. Frontier prices have dropped 5-10× since 2024, so the cheapest model that passes your evals is almost always the right choice. Below: who's actually cheapest in 2026, and where each one breaks.
Price ranking — input + output per 1M tokens
- Gemini 2.5 Flash — ~$0.10 / $0.40. Fastest, cheapest top-tier. Weak at hard reasoning.
- GPT-5 mini — ~$0.15 / $0.60. Strong instruction following, good for tools.
- DeepSeek R1 — ~$0.55 / $2.19. Best quality-per-dollar for reasoning and code.
- Qwen3 Max — ~$0.80 / $2.40. Strong multilingual, low latency in Asia.
- Claude Haiku 4.5 — ~$0.80 / $4.00. Best small Claude — good tool use.
- Mistral Large — ~$2 / $6. EU-hosted option, decent quality.
- GPT-5 — ~$2.50 / $10. Frontier reasoning baseline.
- Claude Sonnet 4.6 — ~$3 / $15. Premium for agent work.
Prices are list rates without caching or batch discounts. Real spend can be 30-70% lower with optimization.
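To make the table concrete, the list rates above translate directly into per-request cost. A minimal sketch (prices hardcoded from the table above; your negotiated or discounted rates may differ):

```python
# USD per 1M tokens (input, output), taken from the list prices above.
PRICES = {
    "gemini-2.5-flash": (0.10, 0.40),
    "gpt-5-mini": (0.15, 0.60),
    "deepseek-r1": (0.55, 2.19),
    "claude-haiku-4.5": (0.80, 4.00),
    "gpt-5": (2.50, 10.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at list rates (no caching or batch discounts)."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A typical chat turn: ~2k tokens in, ~500 out.
print(f"${request_cost('gemini-2.5-flash', 2_000, 500):.6f}")  # $0.000400
```

At that volume, even 100k requests a day on Gemini 2.5 Flash is about $40/day, which is why the "cheapest model that passes your eval" heuristic dominates.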
Three discount levers most teams forget
- Prompt caching (Anthropic, OpenAI, DeepSeek): cached prefix is 10-50% of normal input cost. If you reuse a system prompt or document, you save 50-90% on input tokens. Single biggest lever for chat products.
- Batch API (OpenAI, Anthropic): submit jobs that complete within 24 hours, get 50% off. Perfect for backfills, scoring, content generation.
- Output token discipline. Output is 4-5× the price of input on most APIs. Asking for structured JSON instead of prose can cut output by 70%. `max_tokens` is your friend.
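To see how the first two levers stack, here's a back-of-envelope sketch (the 10% cache multiplier and 50% batch discount are illustrative figures from the ranges above, not provider-guaranteed rates):

```python
def effective_input_cost(base_rate: float, cached_fraction: float,
                         cache_multiplier: float = 0.1,
                         batch: bool = False) -> float:
    """Effective input cost per 1M tokens after caching and batch discounts.

    cached_fraction:  share of input tokens served from the prompt cache.
    cache_multiplier: cached tokens billed at this fraction of the base rate
                      (10-50% depending on provider; 10% assumed here).
    batch:            apply the 50% batch-API discount on top.
    """
    rate = base_rate * ((1 - cached_fraction) + cached_fraction * cache_multiplier)
    return rate * 0.5 if batch else rate

# 80% of each prompt is a reused system prompt + document prefix:
print(effective_input_cost(2.50, cached_fraction=0.8))              # 0.70  (-72%)
print(effective_input_cost(2.50, cached_fraction=0.8, batch=True))  # 0.35  (-86%)
```

With a heavily cached prefix plus batching, a $2.50/1M input rate drops to $0.35/1M, which is where the "30-70% lower real spend" figure comes from.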
When cheap is too cheap (where quality breaks)
Cheap models fail on: long agentic loops (5+ tool calls), nuanced reasoning, ambiguous instructions, code refactors that span files, and content where tone matters. A "cheap" model that loops 5× costs more than Claude doing it once. Always measure cost-per-resolved-task, not cost-per-token.
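Cost-per-resolved-task is easy to operationalize. A minimal sketch with made-up numbers (the per-attempt costs and resolve rates below are hypothetical, chosen to show how looping flips the comparison):

```python
def cost_per_resolved_task(cost_per_attempt: float, resolve_rate: float) -> float:
    """Expected spend per successfully resolved task, assuming failed
    attempts are retried until one succeeds (geometric expectation)."""
    assert 0 < resolve_rate <= 1
    return cost_per_attempt / resolve_rate

# Hypothetical agentic task: the cheap model burns 5+ tool calls per attempt
# and resolves only 15% of the time; the premium model resolves 90% in one pass.
cheap = cost_per_resolved_task(cost_per_attempt=0.010, resolve_rate=0.15)
premium = cost_per_resolved_task(cost_per_attempt=0.050, resolve_rate=0.90)
print(f"cheap: ${cheap:.3f}/task, premium: ${premium:.3f}/task")
# The model that is 5x cheaper per token ends up more expensive per resolved task.
```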
Cheap models excel at: classification, sentiment, extraction (NER, structured output), translation, summarization at known length, and any task you can grade with a string match.
Recommended cheap stack for indie products
- Hot path (user-facing): Gemini 2.5 Flash or GPT-5 mini. Sub-second latency, typically under $0.001 per request.
- Reasoning fallback: DeepSeek R1 or GPT-5. Only invoke when small model confidence is low.
- Batch jobs: Always use the batch API. 50% off is free money.
- Embeddings: OpenAI `text-embedding-3-small` or open-weight `bge-large`.
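The hot-path/fallback split can be as simple as confidence-gated escalation. A sketch with stubbed model calls (`call_cheap`, `call_reasoning`, and the confidence values are placeholders, not a real SDK):

```python
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    confidence: float  # 0-1, e.g. derived from logprobs or a self-rating prompt

def call_cheap(prompt: str) -> Answer:
    # Placeholder for Gemini 2.5 Flash / GPT-5 mini.
    return Answer(text="quick answer", confidence=0.42)

def call_reasoning(prompt: str) -> Answer:
    # Placeholder for DeepSeek R1 / GPT-5.
    return Answer(text="careful answer", confidence=0.95)

def route(prompt: str, threshold: float = 0.7) -> Answer:
    """Serve the cheap model's answer unless its confidence is low."""
    first = call_cheap(prompt)
    if first.confidence >= threshold:
        return first
    return call_reasoning(prompt)  # escalate only on low confidence

print(route("refactor this module").text)  # 0.42 < 0.7, so it escalates
```

The threshold is a product decision: set it from your eval data, where escalating too eagerly erases the savings and escalating too late costs you quality.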
One API key for all cheap models — OpenRouter
OpenRouter lets you route to DeepSeek, Gemini Flash, GPT-5 mini, Claude Haiku and more with a single OpenAI-compatible endpoint. Useful for A/B testing models without 6 signups.
OpenRouter has no public affiliate program — link is plain attribution.
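Because OpenRouter speaks the OpenAI chat-completions dialect, swapping models is a one-string change. A stdlib-only sketch that builds the request without sending it (the endpoint path and model slugs follow OpenRouter's documented conventions, but verify them against the current docs before relying on this):

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-style chat request; only the model slug changes per provider."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

# Same code path, four providers -- just swap the slug:
for slug in ("deepseek/deepseek-r1", "google/gemini-2.5-flash",
             "openai/gpt-5-mini", "anthropic/claude-haiku-4.5"):
    req = build_request(slug, "Summarize this ticket.", api_key="sk-or-...")
    # urllib.request.urlopen(req) would send it; omitted here.
```

In practice you'd use the `openai` SDK with `base_url` pointed at OpenRouter, but the payload shape is the same either way.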
FAQ
Cheapest API in 2026? Gemini 2.5 Flash for input-heavy, DeepSeek R1 for reasoning quality.
Is DeepSeek really cheaper than GPT-5? Yes — about 5× cheaper input, 5× cheaper output, with comparable quality on most coding and reasoning tasks.
Should I use Claude Haiku? If your task already works on Sonnet, Haiku usually works too at 1/4 the cost. Always test.
Where do I see real-time prices? Check.AI tracks list prices weekly; provider pricing pages are the authoritative source.