Claude Prompt Cost Calculator
Work out what a Claude API call will cost — one call, a thousand, or a million — across Haiku 4.5, Sonnet 4.6, and Opus 4.7.
Pricing verified April 17, 2026
Example prompt: ~400 input tokens · ~267 output tokens
Claude Sonnet 4.6
Per call
0.520¢
Per 1,000 calls
$5.20
Per month (1,000 calls)
$5.20
Same prompt, all three models
Monthly cost at your volume (1,000 calls)
Claude Opus 4.7
$5/M input · $25/M output
$8.67
0.867¢/call
Claude Sonnet 4.6
$3/M input · $15/M output
$5.20
0.520¢/call
Claude Haiku 4.5
$1/M input · $5/M output
$1.73
0.173¢/call
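The figures above are straightforward arithmetic. A minimal sketch, using the per-million-token rates listed on this page and the example prompt size (~400 input tokens, ~267 output tokens):

```python
# Per-million-token prices (USD) as listed on this page: (input, output).
PRICES = {
    "Claude Opus 4.7": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (1.00, 5.00),
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one API call at the listed rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

for model in PRICES:
    per_call = call_cost(model, 400, 267)
    print(f"{model}: {per_call * 100:.3f}¢/call, ${per_call * 1000:.2f} per 1,000 calls")
```

Multiply the per-call figure by your monthly volume to get the monthly totals shown above.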
Four ways to cut Claude API costs
1. Pick the smallest model that works
Start with Haiku. Escalate only when quality suffers. Dropping from Opus 4.7 to Sonnet 4.6 cuts cost by 40%; dropping from Sonnet to Haiku cuts another 67%.
2. Use prompt caching for repeated context
If you send the same system prompt or document on every call, enable prompt caching. Cached input drops to about 10% of normal rate on subsequent hits within 5 minutes.
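As a sketch of where the cache marker goes, assuming the Anthropic Python SDK's Messages API and this page's model names (the prompt text and order question are invented for illustration):

```python
# Hypothetical request: the system prompt is the large, static part that
# gets cached; only the user message changes from call to call.
LONG_SYSTEM_PROMPT = "You are a support agent for Acme Corp. <policy text, ~5,000 tokens>"

request = {
    "model": "claude-sonnet-4.6",  # model name as listed on this page
    "max_tokens": 400,
    "system": [
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # marks this block for caching; later calls within the cache
            # window read it back at the reduced cached-input rate
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Where is my order?"}],
}

# With the SDK installed and an API key configured, the call would be:
# import anthropic
# response = anthropic.Anthropic().messages.create(**request)
```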
3. Cap output with max_tokens
Claude stops generating the moment it hits your cap, and you pay only for tokens actually generated — so max_tokens is a ceiling on the worst-case cost of a call. Set it to what you actually need, not a big safe number, and runaway-length responses can never blow the budget.
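The effect of the cap is easy to quantify. A rough sketch at Sonnet 4.6's output rate, comparing a generous default cap against a cap sized to the task (the 1,024 and 400 figures are illustrative):

```python
OUT_RATE = 15.00 / 1_000_000  # Sonnet 4.6 output rate, USD per token (from this page)

def worst_case_output_cost(max_tokens: int) -> float:
    """Upper bound on output spend for one call: every token up to the cap."""
    return max_tokens * OUT_RATE

loose = worst_case_output_cost(1024)  # "big safe number"
tight = worst_case_output_cost(400)   # what the task actually needs
print(f"worst-case saving per call: ${loose - tight:.5f}")
```

The saving is realised only on responses that would otherwise have run past the tighter cap, which is why it is a worst-case bound rather than a guaranteed per-call discount.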
4. Trim conversation history
Long chats re-send the whole history every turn. Summarise or truncate old turns when they are no longer relevant to the current question.
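A minimal truncation sketch — a production app might instead summarise the dropped turns into a single note, but keeping the first message (often the task setup) plus the most recent turns is the simplest version:

```python
def trim_history(messages: list[dict], keep_last: int = 6) -> list[dict]:
    """Keep the first message plus the most recent `keep_last` turns.

    Anything in between is dropped; a smarter variant would summarise it.
    """
    if len(messages) <= keep_last + 1:
        return messages
    return [messages[0]] + messages[-keep_last:]
```

On a 20-turn conversation with the default setting, this sends 7 messages instead of 20 — roughly a 65% cut in history tokens on that turn.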
Frequently Asked Questions
How many tokens is my prompt?
Roughly 0.75 English words per token, or about 4 characters per token — so a 500-word prompt is about 670 tokens. The exact count depends on punctuation, non-English characters, and code. For precise counts, use Anthropic's tokenizer in the SDK. For rough planning, the word-count estimator on this page is within about 10%.
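The rule of thumb above as a sketch — both the word-based and character-based estimates, for English prose only:

```python
def estimate_tokens(word_count: int) -> int:
    """Rough token estimate for English prose: ~0.75 words per token."""
    return round(word_count / 0.75)

def estimate_tokens_from_chars(char_count: int) -> int:
    """Alternative estimate: ~4 characters per token."""
    return round(char_count / 4)
```

For example, `estimate_tokens(500)` gives 667, matching the ~670 figure above; use the SDK tokenizer when you need the exact count.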
What counts as 'output tokens'?
Every token Claude generates in its reply, including thinking tokens if extended thinking is enabled. A typical 300-word response is about 400 output tokens. Output is priced 5x higher than input across all three current Claude models, so long replies cost more than long prompts of the same word count.
Does the system prompt count toward input?+
Yes. Every token you send — system prompt, conversation history, user message, attached documents — counts toward input tokens on every call. This is why long chat histories get expensive: each turn re-sends the whole conversation. Use prompt caching (available on the Anthropic API) to cut repeated-input costs by up to 90%.
What is prompt caching and how much does it save?
Prompt caching lets you mark large static parts of a prompt (system prompts, long documents) for caching. On subsequent calls within 5 minutes, cached tokens cost about 10% of normal input rate. For applications that send the same large context repeatedly (agents, document Q&A, code assistants), savings of 50-90% are normal.
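To see where the 50-90% range comes from, a quick sketch at Sonnet 4.6's input rate: a large static context read from cache at ~10% of the normal rate, plus a small fresh question at full rate. (The 10,000- and 100-token figures are illustrative, and this ignores the first call that populates the cache.)

```python
IN_RATE = 3.00 / 1_000_000    # Sonnet 4.6 input rate, USD/token (from this page)
CACHED_RATE = IN_RATE * 0.10  # cache reads at ~10% of the normal input rate

def input_cost(static_tokens: int, fresh_tokens: int, cache_hit: bool) -> float:
    """Input-side cost of one call: static context plus the new user message."""
    static_rate = CACHED_RATE if cache_hit else IN_RATE
    return static_tokens * static_rate + fresh_tokens * IN_RATE

cold = input_cost(10_000, 100, cache_hit=False)
warm = input_cost(10_000, 100, cache_hit=True)
print(f"saving on cache hits: {1 - warm / cold:.0%}")
```

With a 10,000-token document and a 100-token question, each cache hit cuts input cost by roughly 89% — the more the static context dominates the prompt, the closer you get to the 90% ceiling.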
How much cheaper is Haiku than Opus?
Haiku 4.5 is 5x cheaper than Opus 4.7 on both input and output — $1/M in vs $5/M in, and $5/M out vs $25/M out. For high-volume routine tasks (classification, short Q&A, simple transforms), running Haiku instead of Opus is one of the biggest cost optimisations available. The quality difference is often invisible for easy tasks. (Note: older Opus 4 and 4.1 were $15/$75 per MTok — a 15x gap — but current-generation Opus dropped to $5/$25 with the 4.5 release.)
How do I cut my Claude API costs?
Four main levers. First, pick the smallest model that handles the task — start with Haiku, escalate only when quality suffers. Second, use prompt caching for static context. Third, trim the system prompt and conversation history to the minimum needed. Fourth, set max_tokens on the output to the true cap you need — Claude will stop early and you pay only for what is generated.
Next tool
Now that you know the per-call cost, the model selector maps tasks to the right Claude model, and the Claude Code vs Cursor calculator tells you whether API billing or a flat subscription wins for your workflow.