Claude Prompt Cost Calculator
Work out what a Claude API call will cost — one call, a thousand, or a million — across Haiku 4.5, Sonnet 4.6, and Opus 4.7.
Pricing verified April 17, 2026
Example prompt: ~400 input tokens · ~267 output tokens
Claude Sonnet 4.6
Per call
0.520¢
Per 1,000 calls
$5.20
Per month (1,000 calls)
$5.20
Same prompt, all three models
Monthly cost at your volume (1,000 calls)
Claude Opus 4.7
$5/M input · $25/M output
$8.67
0.867¢/call
Claude Sonnet 4.6
$3/M input · $15/M output
$5.20
0.520¢/call
Claude Haiku 4.5
$1/M input · $5/M output
$1.73
0.173¢/call
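The figures above are straightforward arithmetic. A minimal sketch, using the per-million-token rates listed on this page and the example prompt size (~400 input tokens, ~267 output tokens):

```python
# Per-million-token prices (USD) as listed on this page: (input, output).
PRICES = {
    "Claude Opus 4.7": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (1.00, 5.00),
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one API call at the listed rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

for model in PRICES:
    per_call = call_cost(model, 400, 267)
    print(f"{model}: {per_call * 100:.3f}¢/call, ${per_call * 1000:.2f} per 1,000 calls")
```

Multiply the per-call figure by your monthly volume to get the monthly totals shown above.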
Four ways to cut Claude API costs
1. Pick the smallest model that works
Start with Haiku. Escalate only when quality suffers. Dropping from Opus 4.7 to Sonnet 4.6 cuts cost by 40%; dropping from Sonnet to Haiku cuts another 67%.
2. Use prompt caching for repeated context
If you send the same system prompt or document on every call, enable prompt caching. Cached input drops to about 10% of normal rate on subsequent hits within 5 minutes.
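As a sketch of where the cache marker goes, assuming the Anthropic Python SDK's Messages API and this page's model names (the prompt text and order question are invented for illustration):

```python
# Hypothetical request: the system prompt is the large, static part that
# gets cached; only the user message changes from call to call.
LONG_SYSTEM_PROMPT = "You are a support agent for Acme Corp. <policy text, ~5,000 tokens>"

request = {
    "model": "claude-sonnet-4.6",  # model name as listed on this page
    "max_tokens": 400,
    "system": [
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # marks this block for caching; later calls within the cache
            # window read it back at the reduced cached-input rate
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Where is my order?"}],
}

# With the SDK installed and an API key configured, the call would be:
# import anthropic
# response = anthropic.Anthropic().messages.create(**request)
```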
3. Cap output with max_tokens
Claude stops generating the moment it hits your cap, and you pay only for tokens actually generated — so max_tokens is a ceiling on the worst-case cost of a call. Set it to what you actually need, not a big safe number, and runaway-length responses can never blow the budget.
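The effect of the cap is easy to quantify. A rough sketch at Sonnet 4.6's output rate, comparing a generous default cap against a cap sized to the task (the 1,024 and 400 figures are illustrative):

```python
OUT_RATE = 15.00 / 1_000_000  # Sonnet 4.6 output rate, USD per token (from this page)

def worst_case_output_cost(max_tokens: int) -> float:
    """Upper bound on output spend for one call: every token up to the cap."""
    return max_tokens * OUT_RATE

loose = worst_case_output_cost(1024)  # "big safe number"
tight = worst_case_output_cost(400)   # what the task actually needs
print(f"worst-case saving per call: ${loose - tight:.5f}")
```

The saving is realised only on responses that would otherwise have run past the tighter cap, which is why it is a worst-case bound rather than a guaranteed per-call discount.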
4. Trim conversation history
Long chats re-send the whole history every turn. Summarise or truncate old turns when they are no longer relevant to the current question.
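A minimal truncation sketch — a production app might instead summarise the dropped turns into a single note, but keeping the first message (often the task setup) plus the most recent turns is the simplest version:

```python
def trim_history(messages: list[dict], keep_last: int = 6) -> list[dict]:
    """Keep the first message plus the most recent `keep_last` turns.

    Anything in between is dropped; a smarter variant would summarise it.
    """
    if len(messages) <= keep_last + 1:
        return messages
    return [messages[0]] + messages[-keep_last:]
```

On a 20-turn conversation with the default setting, this sends 7 messages instead of 20 — roughly a 65% cut in history tokens on that turn.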
Frequently Asked Questions
How many tokens is my prompt?
Roughly 0.75 English words per token, or about 4 characters per token — so a 500-word prompt is about 670 tokens. The exact count depends on punctuation, non-English characters, and code. For precise counts, use Anthropic's tokenizer in the SDK. For rough planning, the word-count estimator on this page is within about 10%.
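The rule of thumb above as a sketch — both the word-based and character-based estimates, for English prose only:

```python
def estimate_tokens(word_count: int) -> int:
    """Rough token estimate for English prose: ~0.75 words per token."""
    return round(word_count / 0.75)

def estimate_tokens_from_chars(char_count: int) -> int:
    """Alternative estimate: ~4 characters per token."""
    return round(char_count / 4)
```

For example, `estimate_tokens(500)` gives 667, matching the ~670 figure above; use the SDK tokenizer when you need the exact count.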
What counts as 'output tokens'?
Every token Claude generates in its reply, including thinking tokens if extended thinking is enabled. A typical 300-word response is about 400 output tokens. Output is priced 5x higher than input across all three current Claude models, so long replies cost more than long prompts of the same word count.
Does the system prompt count toward input?+
Yes. Every token you send — system prompt, conversation history, user message, attached documents — counts toward input tokens on every call. This is why long chat histories get expensive: each turn re-sends the whole conversation. Use prompt caching (available on the Anthropic API) to cut repeated-input costs by up to 90%.
What is prompt caching and how much does it save?
Prompt caching lets you mark large static parts of a prompt (system prompts, long documents) for caching. On subsequent calls within 5 minutes, cached tokens cost about 10% of normal input rate. For applications that send the same large context repeatedly (agents, document Q&A, code assistants), savings of 50-90% are normal.
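To see where the 50-90% range comes from, a quick sketch at Sonnet 4.6's input rate: a large static context read from cache at ~10% of the normal rate, plus a small fresh question at full rate. (The 10,000- and 100-token figures are illustrative, and this ignores the first call that populates the cache.)

```python
IN_RATE = 3.00 / 1_000_000    # Sonnet 4.6 input rate, USD/token (from this page)
CACHED_RATE = IN_RATE * 0.10  # cache reads at ~10% of the normal input rate

def input_cost(static_tokens: int, fresh_tokens: int, cache_hit: bool) -> float:
    """Input-side cost of one call: static context plus the new user message."""
    static_rate = CACHED_RATE if cache_hit else IN_RATE
    return static_tokens * static_rate + fresh_tokens * IN_RATE

cold = input_cost(10_000, 100, cache_hit=False)
warm = input_cost(10_000, 100, cache_hit=True)
print(f"saving on cache hits: {1 - warm / cold:.0%}")
```

With a 10,000-token document and a 100-token question, each cache hit cuts input cost by roughly 89% — the more the static context dominates the prompt, the closer you get to the 90% ceiling.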
How much cheaper is Haiku than Opus?
Haiku 4.5 is 5x cheaper than Opus 4.7 on both input and output — $1/M in vs $5/M in, and $5/M out vs $25/M out. For high-volume routine tasks (classification, short Q&A, simple transforms), running Haiku instead of Opus is one of the biggest cost optimisations available. The quality difference is often invisible for easy tasks. (Note: older Opus 4 and 4.1 were $15/$75 per MTok — a 15x gap — but current-generation Opus dropped to $5/$25 with the 4.5 release.)
How do I cut my Claude API costs?
Four main levers. First, pick the smallest model that handles the task — start with Haiku, escalate only when quality suffers. Second, use prompt caching for static context. Third, trim the system prompt and conversation history to the minimum needed. Fourth, set max_tokens on the output to the true cap you need — Claude will stop early and you pay only for what is generated.
Next tool
Now that you know the per-call cost, the model selector maps tasks to the right Claude model, and the Claude Code vs Cursor calculator tells you whether API billing or a flat subscription wins for your workflow.