Anthropic API Pricing: Per-Model Token Rates and Real Cost Examples

How Anthropic API pricing actually works — what each model costs per token, how a bill is calculated, worked monthly examples, and the two levers that cut spend the most.

Pricing verified June 9, 2026

The short answer

The Anthropic API is pay-as-you-go by token: no monthly fee, no seats. You pay separately for input tokens (what you send) and output tokens (what Claude generates), priced per million tokens (MTok) and differing by model. Current rates run from $1/$5 per MTok (input/output) on Claude Haiku 4.5 up to $10/$50 on Claude Fable 5. Output costs 5x input on every model. Prompt caching (0.1x on cache reads) and the Batch API (50% off) are the biggest cost levers.

How much does each Claude model cost per token?

Standard API rates, quoted per million tokens (MTok). Input and output are billed separately.

Model | Tier | Input / MTok | Output / MTok | Context window | Best for
Claude Fable 5 | frontier | $10 | $50 | 1M tokens | Most capable widely released model for the most demanding reasoning and long-horizon agentic work
Claude Opus 4.8 | flagship | $5 | $25 | 1M tokens | Most capable Opus-tier model for complex reasoning and agentic coding
Claude Sonnet 4.6 | workhorse | $3 | $15 | 1M tokens | Best combination of speed and intelligence
Claude Haiku 4.5 | fast | $1 | $5 | 200K tokens | Fastest model with near-frontier intelligence

Rates verified against the Anthropic model overview on 2026-06-09. Model lineup and pricing change often — check the source before forecasting a large budget.

How does Anthropic API billing work?

Every API request has two billable parts. Input tokens are everything you send to the model on that call: your prompt, the system instructions, and — in a multi-turn conversation — the entire prior history you replay. Output tokens are everything Claude writes back.

Both are counted in tokens (roughly 3-4 characters of English each) and billed per million. The formula for a single call is simple:

call cost = (input tokens ÷ 1,000,000 × input rate)
          + (output tokens ÷ 1,000,000 × output rate)
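The formula translates directly into a few lines of Python. This is a minimal sketch: the function name `call_cost` and the `RATES` dictionary are illustrative, the dictionary keys are shorthand labels rather than official API model identifiers, and the rates are a snapshot copied from the table above, so check them before relying on the result.

```python
# USD per million tokens (input, output), copied from the table above.
# Keys are shorthand labels for this sketch, not official API model IDs.
RATES = {
    "fable-5": (10.00, 50.00),
    "opus-4.8": (5.00, 25.00),
    "sonnet-4.6": (3.00, 15.00),
    "haiku-4.5": (1.00, 5.00),
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single call under the per-MTok formula."""
    input_rate, output_rate = RATES[model]
    return (input_tokens / 1_000_000 * input_rate
            + output_tokens / 1_000_000 * output_rate)


# 2,000 tokens in and 500 tokens out on Sonnet 4.6
print(f"${call_cost('sonnet-4.6', 2_000, 500):.4f}")  # prints $0.0135
```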

The thing most people miss: output is 5x the price of input on every model. Token for token, generated text is the expensive side: a 5,000-token essay costs as much as 25,000 tokens of input. In a chat app the replayed history also compounds, because each new turn re-sends every previous turn as input.
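To see how replayed history compounds, the sketch below counts the input tokens billed on each turn of a conversation. The token counts (a 500-token system prompt, 100-token user messages, 300-token replies) are made-up illustration values, and the function name is hypothetical.

```python
def conversation_input_tokens(turns: int, system_tokens: int,
                              user_tokens: int, reply_tokens: int) -> list[int]:
    """Input tokens billed on each turn when the full history is replayed."""
    per_turn = []
    history = 0  # tokens from all earlier user messages and replies
    for _ in range(turns):
        # Each call sends the system prompt, the whole history, and the new message.
        per_turn.append(system_tokens + history + user_tokens)
        history += user_tokens + reply_tokens
    return per_turn


billed = conversation_input_tokens(turns=5, system_tokens=500,
                                   user_tokens=100, reply_tokens=300)
print(billed)       # [600, 1000, 1400, 1800, 2200]
print(sum(billed))  # 7000
```

Input per turn grows linearly, so total input over the conversation grows roughly with the square of the number of turns.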

There is no monthly minimum and no per-seat cost. You add credit to a billing account at console.anthropic.com and tokens draw it down. This is the opposite model from the Pro, Max, and Team subscriptions, which are flat monthly prices for personal use of the Claude apps.

What does the API actually cost? Three worked examples

Numbers below are calculated from the rates in the table above.

1. Support chatbot on Claude Sonnet 4.6

Assume 2,000 input tokens (system prompt + question + a little history) and 500 output tokens per reply, at 10,000 replies a month.

  • Per reply: $0.0135 (about 1.4 cents)
  • 10,000 replies/month: $135/mo

Swap to Claude Haiku 4.5 for the same volume and it drops to $45/mo — the biggest single saving on most apps is using the smallest model that holds quality.
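The model swap is easy to check by running the same workload against each rate in the table. A minimal sketch, assuming the chatbot profile from this example (2,000 input tokens, 500 output tokens, 10,000 replies a month); the names are illustrative.

```python
# USD per million tokens (input, output), copied from the pricing table.
RATES = {
    "Claude Fable 5": (10.00, 50.00),
    "Claude Opus 4.8": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (1.00, 5.00),
}

INPUT_TOKENS, OUTPUT_TOKENS, REPLIES = 2_000, 500, 10_000


def monthly_cost(input_rate: float, output_rate: float) -> float:
    """Monthly USD cost of the chatbot workload at the given rates."""
    input_mtok = INPUT_TOKENS * REPLIES / 1_000_000    # 20 MTok per month
    output_mtok = OUTPUT_TOKENS * REPLIES / 1_000_000  # 5 MTok per month
    return input_mtok * input_rate + output_mtok * output_rate


for model, (input_rate, output_rate) in RATES.items():
    print(f"{model}: ${monthly_cost(input_rate, output_rate):,.2f}/mo")
# Claude Fable 5: $450.00/mo
# Claude Opus 4.8: $225.00/mo
# Claude Sonnet 4.6: $135.00/mo
# Claude Haiku 4.5: $45.00/mo
```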

2. Long-document summariser on Claude Opus 4.8

A 50,000-token document in, a 2,000-token summary out, 1,000 documents a month.

  • Per document: $0.30
  • 1,000 documents/month: $300/mo
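Splitting the per-document cost into its two parts shows that input dominates this workload even though output is priced 5x higher per token. A short sketch using the Opus 4.8 rates from the table; the variable names are illustrative.

```python
INPUT_RATE, OUTPUT_RATE = 5.00, 25.00  # Claude Opus 4.8, USD per MTok

input_cost = 50_000 / 1_000_000 * INPUT_RATE   # the document: $0.25
output_cost = 2_000 / 1_000_000 * OUTPUT_RATE  # the summary:  $0.05
per_document = input_cost + output_cost
input_share = input_cost / per_document

print(f"per document: ${per_document:.2f}")          # per document: $0.30
print(f"input share of the cost: {input_share:.0%}")  # input share of the cost: 83%
```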

3. The same chatbot, with prompt caching

Of those 2,000 input tokens, say 1,800 are a fixed system prompt sent on every call. Cache it and cache reads cost 0.1x the input rate.

  • System prompt, uncached, 10k calls: $54/mo
  • System prompt, cached, 10k calls: $5.40/mo
  • Saving on that portion: ~90%
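The same figures, reproduced in code and extended to the whole monthly bill. This sketch assumes every call is a cache hit and ignores the cache-write premium, so real savings will be somewhat lower; the variable names are illustrative.

```python
INPUT_RATE, OUTPUT_RATE = 3.00, 15.00  # Claude Sonnet 4.6, USD per MTok
CACHE_READ_MULTIPLIER = 0.1            # cache reads cost 0.1x the input rate

CALLS = 10_000
SYSTEM_TOKENS, OTHER_INPUT_TOKENS, OUTPUT_TOKENS = 1_800, 200, 500


def mtok(tokens_per_call: int) -> float:
    """Monthly volume in millions of tokens."""
    return tokens_per_call * CALLS / 1_000_000


uncached_prompt = mtok(SYSTEM_TOKENS) * INPUT_RATE        # $54.00
cached_prompt = uncached_prompt * CACHE_READ_MULTIPLIER   # $5.40
# Everything that is not the cached system prompt is billed as before.
rest = mtok(OTHER_INPUT_TOKENS) * INPUT_RATE + mtok(OUTPUT_TOKENS) * OUTPUT_RATE

print(f"system prompt: ${uncached_prompt:.2f} -> ${cached_prompt:.2f}")
print(f"whole bill:    ${uncached_prompt + rest:.2f} -> ${cached_prompt + rest:.2f}")
# system prompt: $54.00 -> $5.40
# whole bill:    $135.00 -> $86.40
```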

Two levers that cut the bill

Prompt caching

Cache a large, reused chunk of context — a system prompt, a knowledge base, a style guide — and every subsequent call reads it at 0.1x the input price.

  • Cache read: 0.1x input rate (a 90% discount)
  • 5-minute cache write: 1.25x input rate
  • 1-hour cache write: 2x input rate

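Caching is worth it whenever the same context is reused before it expires. Going by the multipliers above, the 5-minute cache already comes out ahead on the first reuse, and the 1-hour cache on the second.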
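The break-even point follows from the multipliers listed above. The sketch below compares one cache write plus N cache reads against sending the same context uncached N + 1 times, with costs expressed in multiples of the base input price. It assumes every reuse lands before the cache expires; the function names are hypothetical.

```python
CACHE_READ_MULTIPLIER = 0.1


def cost_with_cache(reuses: int, write_multiplier: float) -> float:
    """One cache write, then `reuses` cache reads (units of base input price)."""
    return write_multiplier + reuses * CACHE_READ_MULTIPLIER


def cost_without_cache(reuses: int) -> float:
    """The same context sent at full price on the first call and every reuse."""
    return 1.0 + reuses


def break_even_reuses(write_multiplier: float) -> int:
    """Smallest number of reuses at which caching is strictly cheaper."""
    reuses = 0
    while cost_with_cache(reuses, write_multiplier) >= cost_without_cache(reuses):
        reuses += 1
    return reuses


print(break_even_reuses(1.25))  # 5-minute cache write: prints 1
print(break_even_reuses(2.0))   # 1-hour cache write:   prints 2
```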

Message Batches API

Submit requests asynchronously (results within 24 hours) for 50% off standard rates on both input and output.

Use it for anything that does not need a live reply: bulk classification, content pipelines, dataset labelling, overnight reports. Batch and caching stack — a cached batch job is cheaper again.
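As a rough illustration of stacking, the sketch below applies both multipliers to an input rate. It assumes the two discounts combine multiplicatively, which is an assumption of this example rather than something stated on this page, so confirm it against Anthropic's documentation before budgeting on it. The function name is hypothetical.

```python
CACHE_READ_MULTIPLIER = 0.1  # cache reads: 0.1x the input rate
BATCH_MULTIPLIER = 0.5       # Batch API: 50% off


def effective_input_rate(base_rate: float, cached: bool = False,
                         batch: bool = False) -> float:
    """Input rate in USD per MTok after the selected discounts.

    Assumes the discounts multiply together (an assumption of this sketch).
    """
    rate = base_rate
    if cached:
        rate *= CACHE_READ_MULTIPLIER
    if batch:
        rate *= BATCH_MULTIPLIER
    return rate


SONNET_INPUT_RATE = 3.00
for cached in (False, True):
    for batch in (False, True):
        rate = effective_input_rate(SONNET_INPUT_RATE, cached, batch)
        print(f"cached={cached!s:5} batch={batch!s:5} ${rate:.2f} / MTok")
# cached=False batch=False $3.00 / MTok
# cached=False batch=True  $1.50 / MTok
# cached=True  batch=False $0.30 / MTok
# cached=True  batch=True  $0.15 / MTok
```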

Caching and batch multipliers per Anthropic's prompt-caching and Batches API docs

Estimate your own bill: a fill-in-the-blank template

Drop your own numbers into the blanks. The arithmetic is the same per-call formula from above, scaled to monthly volume.

Model: ______ (input rate $___ / MTok, output rate $___ / MTok)
Avg input tokens per call: ______
Avg output tokens per call: ______
Calls per month: ______

Per-call cost = (input ÷ 1,000,000 × input rate)
              + (output ÷ 1,000,000 × output rate)
Monthly cost = per-call cost × calls per month

If a fixed chunk of input repeats every call:
  cached portion cost × 0.1 (cache read)
If replies can wait up to 24h: × 0.5 (Batch API)
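The template can also be filled in programmatically. The sketch below is one possible encoding of it: the function name and parameters are hypothetical, cache-write premiums are ignored, and the batch discount is assumed to apply on top of the cache-read discount.

```python
def estimate_monthly_cost(input_rate: float, output_rate: float,
                          input_tokens: int, output_tokens: int,
                          calls: int, cached_input_tokens: int = 0,
                          batch: bool = False) -> float:
    """Estimate a monthly bill in USD from per-call averages.

    cached_input_tokens is the part of input_tokens read from the cache
    on every call; cache writes are ignored in this sketch.
    """
    fresh_input_tokens = input_tokens - cached_input_tokens
    per_call = (
        fresh_input_tokens / 1_000_000 * input_rate
        + cached_input_tokens / 1_000_000 * input_rate * 0.1  # cache read
        + output_tokens / 1_000_000 * output_rate
    )
    if batch:
        per_call *= 0.5  # Batch API discount on input and output
    return per_call * calls


# Chatbot workload on Claude Sonnet 4.6 ($3 / $15 per MTok)
base = estimate_monthly_cost(3.00, 15.00, 2_000, 500, 10_000)
cached = estimate_monthly_cost(3.00, 15.00, 2_000, 500, 10_000,
                               cached_input_tokens=1_800)
batched = estimate_monthly_cost(3.00, 15.00, 2_000, 500, 10_000, batch=True)
print(f"${base:.2f}  ${cached:.2f}  ${batched:.2f}")  # $135.00  $86.40  $67.50
```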

Failure mode to avoid: people forecast on input tokens and forget that output is 5x the price, then get a bill double their estimate. Always size the output side first, and cap max_tokens so a runaway generation can't blow the budget.
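Capping `max_tokens` also gives a hard upper bound to budget against, since a call cannot generate more output than the cap. A minimal sketch of that worst-case arithmetic; the 1,024-token cap and the function name are illustrative choices, not recommendations.

```python
def worst_case_call_cost(input_rate: float, output_rate: float,
                         input_tokens: int, max_tokens: int) -> float:
    """Upper bound on one call's USD cost if the reply uses the full cap."""
    return (input_tokens / 1_000_000 * input_rate
            + max_tokens / 1_000_000 * output_rate)


# Claude Sonnet 4.6 ($3 / $15 per MTok), 2,000 input tokens, output capped at 1,024
ceiling = worst_case_call_cost(3.00, 15.00, input_tokens=2_000, max_tokens=1_024)
print(f"per call, at most:   ${ceiling:.5f}")           # per call, at most:   $0.02136
print(f"10,000 calls, at most: ${ceiling * 10_000:.2f}")  # 10,000 calls, at most: $213.60
```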

API or subscription — which do you actually need?

These are different products for different jobs. The pay-as-you-go API is for software you build: apps, agents, automations, pipelines. You pay per token and control cost per request.

For your own daily Claude use — chatting, Claude Code in the terminal — a flat subscription wins almost every time. Metering your personal use through the API is usually more expensive and far more stressful.

Most builders run both. If you are weighing the consumer tiers, the Claude Pro vs Max vs API comparison breaks down when each subscription wins, and the Claude Code pricing guide covers what running Claude Code costs on each plan.

Token rates differ by model, so the choice drives your bill. See Sonnet vs Opus for the price-versus-capability trade-off, and the Claude context window guide for how prompt size affects input cost.

Frequently asked questions

How does Anthropic API pricing work?

The Anthropic API is pay-as-you-go by token. You are billed separately for input tokens (everything you send: your prompt, system instructions, and conversation history) and output tokens (everything Claude generates). Prices are quoted per million tokens (MTok) and differ by model. There is no monthly fee, no minimum spend, and no seat cost — you pay only for the tokens each request consumes.

How much does the Claude API cost per token?

Per the current model lineup: Claude Fable 5 is $10 per million input tokens and $50 per million output tokens. Claude Opus 4.8 is $5 input / $25 output. Claude Sonnet 4.6 is $3 input / $15 output. Claude Haiku 4.5 is $1 input / $5 output. Output tokens cost 5x input across every model, so terse responses are cheaper than verbose ones.

Is the Anthropic API cheaper than a Claude subscription?

For building applications, yes — the API is the only option, and you pay only for what you ship. For your own personal Claude use (chat, Claude Code), no. A flat $20/mo Pro subscription almost always beats metered API tokens once you use Claude daily. Most developers run both: a subscription for personal work, API billing for the apps they build.

How much can prompt caching save?

Cache reads cost 0.1x (10%) of the base input price, so re-sending a large fixed context (a system prompt, a knowledge base, a long instruction set) drops to a tenth of its normal cost on every cached call. Writing to the cache carries a premium (1.25x for the 5-minute cache, 2x for the 1-hour cache), so caching pays off once the same context is reused before it expires: on the first reuse for the 5-minute cache and the second for the 1-hour cache. In the worked example on this page, caching a 1,800-token system prompt cuts that portion of the bill by about 90%.

What is the Message Batches API discount?

The Message Batches API processes requests asynchronously (results within 24 hours) at 50% off standard token rates — input and output both. Use it for anything that does not need a real-time reply: bulk classification, content generation pipelines, dataset labelling, overnight summarisation. Batch and prompt caching stack, so a cached batch job is cheaper still.

Do input and output tokens cost the same?

No. Output tokens are priced at 5x input tokens on every current model. That is why long input context (documents, history) is relatively cheap to send per token, while long generated answers are the expensive part of a bill. Capping max output tokens and prompting for concise responses is one of the most direct levers on cost.

How do I estimate my monthly Claude API bill?

Estimate average input tokens per call, average output tokens per call, and calls per month. Multiply input tokens by the model input rate (per MTok) and output tokens by the output rate, add them for a per-call cost, then multiply by monthly call volume. The fill-in-the-blank template on this page lays out the arithmetic, and the worked examples show it with real numbers.

The cheapest token is the one you don't waste

Most API bills are inflated by sloppy prompts — bloated system instructions, vague asks that force long outputs, retries from prompts that didn't work the first time. PromptWritingStudio teaches the prompt patterns that get the right answer on the first call, which is the same thing as cutting your token spend. Run the numbers with the calculators, then tighten the prompts.