LLM API Pricing Calculator

Sticker prices do not tell you what you will actually pay. Input your real task shape -- tokens in, tokens out, cache hit ratio -- and see cost per call across every major provider, including what prompt caching actually saves.

Last updated: 2026-05-25. For a broader model comparison, see AI Models Guide.

Cost Calculator

Enter your task shape to see cost per call across providers. Cache write cost excluded (amortized). Gemini storage billed separately.

~750 words = 1,000 tokens

Typical reply length

0 = no cache; 80 = high reuse

Current API Prices

Prices in USD per 1M tokens. Source links open the vendor pricing page. Rows marked "unverified" could not be confirmed by automated fetch.

ModelInput /1MOutput /1MCache read /1MCache write /1MVerified
Claude Opus 4.7
Anthropic
$5.00$25.00$0.5$6.252026-05-25
Claude Sonnet 4.6
Anthropic
$3.00$15.00$0.3$3.752026-05-25
Claude Haiku 4.5
Anthropic
$1.00$5.00$0.1$1.252026-05-25
GPT-4o
OpenAI
$2.50$10.00$1.25--unverified -- check
GPT-4o mini
OpenAI
$0.15$0.60$0.075--unverified -- check
Gemini 2.5 Pro
Google
$1.25$10.00$0.125--2026-05-25
Gemini 2.5 Flash
Google
$0.30$2.50$0.03--2026-05-25

How Prompt Caching Works

Every provider calls it something different and prices it differently. Here is what each model actually does -- with a worked example using the same scenario: a customer support bot with a 10,000-token system prompt, called 1,000 times.

Anthropic -- explicit cache control

You mark cache breakpoints in your prompt. Anthropic stores everything up to that point. Write cost: 1.25x input price (one-time per TTL window). Read cost: 0.1x input price. Default TTL is 5 minutes; extended caching available.

Sonnet 4.6 -- 10k token system prompt, 1,000 calls

Without cache: 1,000 x (10,000 x $3.00/1M) = $30.00

Cache write (once): 10,000 x $3.75/1M = $0.038

Cache reads (x1,000): 1,000 x (10,000 x $0.30/1M) = $3.00

Total: $3.04 -- 90% savings on the cached portion

OpenAI -- automatic prefix caching

No setup required. For prompts over 1,024 tokens, OpenAI automatically caches the longest cacheable prefix. You pay 50% of the input price for cached tokens. Cache hits are visible in your usage response.

GPT-4o -- 10k token system prompt, 1,000 calls (prices unverified)

Without cache: 1,000 x (10,000 x $2.50/1M) = $25.00

With cache (50% rate): 1,000 x (10,000 x $1.25/1M) = $12.50

50% savings on the input portion

Gemini -- explicit context caching with storage fee

You create a cache object via the API. Cached input tokens cost $0.125/1M (Gemini 2.5 Pro, under 200K context) vs $1.25/1M normal -- a 90% reduction per token. Storage is billed separately at $4.50/1M tokens/hour. For high-frequency workloads the token savings exceed the storage cost quickly.

Gemini 2.5 Pro -- 10k token system prompt, 1,000 calls, 1 hour

Without cache: 1,000 x (10,000 x $1.25/1M) = $12.50

Cache reads: 1,000 x (10,000 x $0.125/1M) = $1.25

Storage (1 hr): 10,000 x $4.50/1M/hr = $0.045

Total: $1.30 -- 90% savings on the input portion

Frequently Asked Questions

What is prompt caching and why does it reduce costs?

Prompt caching lets a provider store a fixed portion of your prompt (like a long system prompt) so it does not have to be re-processed on every call. You pay a lower "cache read" rate instead of the full input rate. Anthropic charges 10% of the input price for cache reads. OpenAI applies 50% of the input price automatically for eligible cached prefixes.

How does Anthropic prompt caching work?

Anthropic uses explicit cache control markers. You write tokens to the cache at 1.25x the input price (one-time), then read them back at 0.1x the input price on subsequent calls within a 5-minute TTL window. For repeated workloads with a stable system prompt, savings of 80-90% on the input portion are typical.

How does OpenAI automatic caching work?

OpenAI automatically caches the prefix of prompts longer than 1024 tokens. There is no explicit setup -- you pay 50% of the normal input price for any tokens served from cache. Cache hits are reflected in your usage data.

How does Gemini context caching work?

Gemini context caching stores content explicitly via the API and charges a reduced per-token rate when that content is served. You also pay a separate hourly storage fee per cached token. For high-frequency use cases with large stable contexts, the storage cost is small relative to the per-token savings.

Which provider is cheapest for high-volume API calls?

Gemini 2.5 Flash is consistently the lowest cost per token for input-heavy workloads. Claude Haiku 4.5 and GPT-4o mini are competitive on cost. However, cost per token is only one dimension -- output quality and latency matter too. Use the calculator above with your actual task shape to compare.

Why are some prices marked as unverified?

OpenAI's pricing page blocked automated fetches on the date this page was last updated. Prices for OpenAI models are based on the most recently available public data but may be out of date. Always confirm at openai.com/api/pricing before making production budget decisions.

Related Resources