Question 1

What is prompt caching and why does it reduce costs?

Accepted Answer

Prompt caching lets a provider store a fixed portion of your prompt (like a long system prompt) so it does not have to be re-processed on every call. You pay a lower "cache read" rate instead of the full input rate. Anthropic charges 10% of the input price for cache reads. OpenAI applies 50% of the input price automatically for eligible cached prefixes.

Question 2

How does Anthropic prompt caching work?

Accepted Answer

Anthropic uses explicit cache control markers. You write tokens to the cache at 1.25x the input price (one-time), then read them back at 0.1x the input price on subsequent calls within a 5-minute TTL window. For repeated workloads with a stable system prompt, savings of 80-90% on the input portion are typical.

Question 3

How does OpenAI automatic caching work?

Accepted Answer

OpenAI automatically caches the prefix of prompts longer than 1024 tokens. There is no explicit setup -- you pay 50% of the normal input price for any tokens served from cache. Cache hits are reflected in your usage data.

Question 4

How does Gemini context caching work?

Accepted Answer

Gemini context caching stores content explicitly via the API and charges a reduced per-token rate when that content is served. You also pay a separate hourly storage fee per cached token. For high-frequency use cases with large stable contexts, the storage cost is small relative to the per-token savings.

Question 5

Which provider is cheapest for high-volume API calls?

Accepted Answer

Gemini 2.5 Flash is consistently the lowest cost per token for input-heavy workloads. Claude Haiku 4.5 and GPT-4o mini are competitive on cost. However, cost per token is only one dimension -- output quality and latency matter too. Use the calculator above with your actual task shape to compare.

Question 6

Why are some prices marked as unverified?

Accepted Answer

OpenAI's pricing page blocked automated fetches on the date this page was last updated. Prices for OpenAI models are based on the most recently available public data but may be out of date. Always confirm at openai.com/api/pricing before making production budget decisions.

Model	Input /1M	Output /1M	Cache read /1M	Cache write /1M	Verified
Claude Opus 4.7 Anthropic	$5.00	$25.00	$0.5	$6.25	2026-05-25
Claude Sonnet 4.6 Anthropic	$3.00	$15.00	$0.3	$3.75	2026-05-25
Claude Haiku 4.5 Anthropic	$1.00	$5.00	$0.1	$1.25	2026-05-25
GPT-4o OpenAI	$2.50	$10.00	$1.25	--	unverified -- check
GPT-4o mini OpenAI	$0.15	$0.60	$0.075	--	unverified -- check
Gemini 2.5 Pro Google	$1.25	$10.00	$0.125	--	2026-05-25
Gemini 2.5 Flash Google	$0.30	$2.50	$0.03	--	2026-05-25

LLM API Pricing Calculator

Cost Calculator

Current API Prices

How Prompt Caching Works

Anthropic -- explicit cache control

OpenAI -- automatic prefix caching

Gemini -- explicit context caching with storage fee

Frequently Asked Questions

What is prompt caching and why does it reduce costs?

How does Anthropic prompt caching work?

How does OpenAI automatic caching work?

How does Gemini context caching work?

Which provider is cheapest for high-volume API calls?

Why are some prices marked as unverified?

Related Resources

AI Models Guide

AI Prompt Generator

AI Glossary