Claude Context Window: Token Limits by Model and How to Manage It

What the context window actually is, the exact token limit for every current Claude model, and how to work inside it without wasting tokens or losing your thread.

Model limits verified June 9, 2026.

The short answer

The Claude context window is the total text Claude can hold in working memory for one request — your prompt, attached files, the conversation so far, and Claude's reply all count. It is measured in tokens (≈0.75 of a word each). Claude's top models reach 1M tokens (~750K words); the fastest model holds 200K. A bigger window lets you feed in more, but it does not make Claude smarter — focused prompts still win.

What is a context window?

Think of the context window as Claude's short-term memory for a single request. Everything you put in front of it — the instruction you type, any documents or code you paste, the back-and-forth of the conversation, and the answer it writes — has to fit inside one budget.

That budget is counted in tokens, not words or characters. A token is a chunk of text — often a short word or part of a longer one. Anthropic's rough rule of thumb is that one token is about 0.75 of an English word, so 1,000 tokens is roughly 750 words.
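The rule of thumb above is simple enough to write down as arithmetic. The sketch below assumes the 0.75 words-per-token ratio holds; real counts depend on the tokenizer and on the text (code and non-English text often tokenize differently), so treat the results as estimates. The function names are illustrative, not part of any library.

```python
# Rough conversion only: real token counts depend on the tokenizer and the text.
WORDS_PER_TOKEN = 0.75  # rule of thumb quoted in the article


def words_to_tokens(words: int) -> int:
    """Estimate how many tokens a given number of English words uses."""
    return round(words / WORDS_PER_TOKEN)


def tokens_to_words(tokens: int) -> int:
    """Estimate how many English words fit in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


print(tokens_to_words(1_000))    # 750
print(tokens_to_words(200_000))  # 150000
print(words_to_tokens(750_000))  # 1000000
```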

The single most useful thing to understand: input and output share the same window. If a model has a 200,000-token window and your prompt uses 180,000 of them, Claude has only 20,000 tokens left to write its answer. Long inputs squeeze the room available for the reply.
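As a minimal sketch of that shared budget: the room left for a reply is whatever the window has left after the input, and it can never exceed the model's separate max-output cap (the "Max output" column in the model table). The helper name `reply_budget` is hypothetical, and the 64,000 cap used here is the table's figure for the 200K model.

```python
def reply_budget(context_window: int, input_tokens: int, max_output: int) -> int:
    """Tokens available for the reply: the window's remainder, capped by max output."""
    if input_tokens > context_window:
        raise ValueError("input alone exceeds the context window")
    return min(context_window - input_tokens, max_output)


# The example from the text: a 200K window with a 180K prompt.
print(reply_budget(200_000, 180_000, 64_000))  # 20000

# A short prompt does not unlock a longer reply: the max-output cap applies.
print(reply_budget(200_000, 50_000, 64_000))   # 64000
```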

One-line definition: the context window is the maximum amount of text — prompt plus reply — that Claude can process in a single request, measured in tokens.

Context window by Claude model

Current models, with the token limit and the rough word equivalent. Figures pulled live from our model data, verified 2026-06-09.

Model	Tier	Context window	≈ Words	Max output
Claude Fable 5	frontier	1,000,000 tokens	~750K words	128,000 tokens
Claude Opus 4.8	flagship	1,000,000 tokens	~750K words	128,000 tokens
Claude Sonnet 4.6	workhorse	1,000,000 tokens	~750K words	64,000 tokens
Claude Haiku 4.5	fast	200,000 tokens	~150K words	64,000 tokens

Word counts are approximate (0.75 words/token). Older models — Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.5 — mostly held 200K tokens. Limits change with each release; verified against Anthropic's model docs on 2026-06-09.

The 1M-token context window, explained

Claude's frontier, flagship, and workhorse models (Claude Fable 5, Claude Opus 4.8, and Claude Sonnet 4.6) carry a 1,000,000-token context window. That is about 750,000 words: seven to nine full-length novels, or a substantial software codebase, in a single request.

It unlocks genuinely new workflows. You can drop an entire repository in and ask for an architecture review, paste a 300-page contract and ask for the risky clauses, or feed a year of meeting notes and ask for the decisions that were never followed up.

Two caveats keep it honest. First, more context is not free: on the API you pay per input token, so filling a 1M window is expensive, and even on subscriptions a large context counts more heavily against your usage limits. Second, large contexts can dilute focus: burying your real question inside 800,000 tokens of background often produces a worse answer than a tight, relevant prompt.
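To make the cost caveat concrete, here is a rough sketch of input-side cost. The rate used is a placeholder chosen for illustration, not a real price; look up the current per-token rate for your model before relying on any number. What does not depend on the rate is the ratio: a full 1M-token window costs about 333 times as much input as a 3,000-token prompt.

```python
def input_cost_usd(input_tokens: int, usd_per_million_input_tokens: float) -> float:
    """Input-side cost of one request at a given per-million-token rate."""
    return input_tokens / 1_000_000 * usd_per_million_input_tokens


# PLACEHOLDER rate for illustration only; it is not taken from any price list.
EXAMPLE_RATE = 3.00

tight = input_cost_usd(3_000, EXAMPLE_RATE)
full = input_cost_usd(1_000_000, EXAMPLE_RATE)

print(f"tight prompt: ${tight:.4f}")
print(f"full window:  ${full:.2f}")
print(f"ratio: {full / tight:.0f}x")  # the ratio is the same at any rate
```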

Rule of thumb: use the big window to include material Claude genuinely needs to read — not as an excuse to skip deciding what is relevant.

How to manage the context window

Five habits that keep your prompts sharp, cheap, and inside the limit.

1. Send only what matters

Paste the relevant section, not the entire document. A focused 3,000-token prompt almost always outperforms a 300,000-token data dump — and it is cheaper and faster.

2. Start fresh conversations often

In a long chat, old turns quietly drop out of scope. When a thread drifts, open a new conversation and paste a two-line summary of what Claude needs to remember.

3. Summarise instead of re-pasting

Rather than re-attaching a long file every turn, ask Claude to summarise it once, then carry the summary forward. You preserve the signal and reclaim the tokens.
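One way to picture habits 2 and 3 together is a small helper that carries a summary forward and keeps only the newest turns that still fit a token budget. This is a sketch under stated assumptions: `Turn`, `estimate_tokens`, and `compact_history` are hypothetical names, the token estimate is the rough 0.75 words-per-token rule rather than a real tokenizer, and the result is a plain list, not an API payload.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str  # "user" or "assistant"
    text: str


def estimate_tokens(text: str) -> int:
    """Very rough estimate from the 0.75 words-per-token rule of thumb."""
    return round(len(text.split()) / 0.75)


def compact_history(summary: str, turns: list[Turn], budget: int) -> list[Turn]:
    """Return a summary turn plus the newest turns that fit inside `budget` tokens."""
    head = Turn("user", f"Summary of earlier discussion: {summary}")
    used = estimate_tokens(head.text)
    kept: list[Turn] = []
    # Walk backwards so the most recent turns are considered first.
    for turn in reversed(turns):
        cost = estimate_tokens(turn.text)
        if used + cost > budget:
            break  # everything older than this is dropped
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return [head] + kept


turns = [
    Turn("user", "word " * 300),       # about 400 tokens
    Turn("assistant", "word " * 150),  # about 200 tokens
    Turn("user", "word " * 75),        # about 100 tokens
]
history = compact_history("We agreed on plan B.", turns, budget=350)
print(len(history))  # 3: the summary plus the two newest turns
```

The oldest turn is dropped on purpose: its content is expected to live on in the summary, which is why the summary is always kept first.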

4. Put the question last

When you do load a large context, place your actual instruction at the end of the prompt. Models attend strongly to the most recent text, so the ask should be the last thing Claude reads.

5. Match the model to the job

You do not need a 1M-token window to rewrite an email. Reserve the large-context models for whole-codebase or whole-book tasks; use the fast model for short, high-volume work.
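Habit 5 can be sketched as a simple lookup. The limits below are copied from the model table in this article and are a snapshot that will go stale; the ordering (fast tier first, frontier last) is an assumption about which model you would prefer for small jobs, and `smallest_model_that_fits` is a hypothetical helper. It picks by size only and says nothing about reasoning quality, which depends on the model tier.

```python
# Limits copied from the article's model table (verified 2026-06-09).
# They change with each release, so treat them as a snapshot.
MODELS = [
    # (name, context window, max output), ordered fast tier first (assumed preference)
    ("Claude Haiku 4.5", 200_000, 64_000),
    ("Claude Sonnet 4.6", 1_000_000, 64_000),
    ("Claude Opus 4.8", 1_000_000, 128_000),
    ("Claude Fable 5", 1_000_000, 128_000),
]


def smallest_model_that_fits(input_tokens: int, reply_tokens: int) -> str:
    """Return the first listed model whose window and output cap hold the request."""
    for name, window, max_output in MODELS:
        if reply_tokens <= max_output and input_tokens + reply_tokens <= window:
            return name
    raise ValueError("no listed model can hold this request")


print(smallest_model_that_fits(2_000, 1_000))      # Claude Haiku 4.5
print(smallest_model_that_fits(600_000, 20_000))   # Claude Sonnet 4.6
print(smallest_model_that_fits(600_000, 100_000))  # Claude Opus 4.8
```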

A prompt template for large-context tasks

When you do need to load a long document, structure the prompt so Claude knows what to ignore and what to act on. Fill in the slots:

You are reviewing the material below. Do not summarise all of it.
GOAL: [the one thing you want, e.g. "find every clause that limits liability"]
SCOPE: [which part to focus on, e.g. "sections 4 to 9 only"]
FORMAT: [how to answer, e.g. "a bullet list, clause number then plain-English risk"]
--- MATERIAL START ---
[paste your document, code, or notes here]
--- MATERIAL END ---
Now do the GOAL. Quote the exact text you are referring to.

Why it works: the goal, scope, and format sit at the top, and the instruction to act is repeated at the bottom. Those are the two positions models attend to most. The explicit "do not summarise all of it" also stops Claude from spending its output budget restating what you pasted.
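If you reuse the template often, filling it programmatically keeps the slot order fixed. The following is a minimal sketch, not a prescribed implementation: `build_prompt` and its parameter names are hypothetical, and the sample clause text is made up for the example.

```python
TEMPLATE = """You are reviewing the material below. Do not summarise all of it.
GOAL: {goal}
SCOPE: {scope}
FORMAT: {fmt}
--- MATERIAL START ---
{material}
--- MATERIAL END ---
Now do the GOAL. Quote the exact text you are referring to."""


def build_prompt(goal: str, scope: str, fmt: str, material: str) -> str:
    """Fill the large-context template, refusing empty slots."""
    slots = {"goal": goal, "scope": scope, "fmt": fmt, "material": material}
    for name, value in slots.items():
        if not value.strip():
            raise ValueError(f"{name} must not be empty")
    return TEMPLATE.format(goal=goal, scope=scope, fmt=fmt, material=material.strip())


prompt = build_prompt(
    goal="find every clause that limits liability",
    scope="sections 4 to 9 only",
    fmt="a bullet list, clause number then plain-English risk",
    material="4.1 The supplier's liability is capped at the fees paid.",  # made-up sample
)
print(prompt)
```

Rejecting empty slots is a deliberate choice: a missing GOAL or SCOPE is exactly the situation in which a large context tends to produce an unfocused answer.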

Frequently asked questions

What is the Claude context window?

The context window is the total amount of text Claude can hold in working memory for a single request — your prompt, any files or pasted text, the conversation history, and Claude's own reply all count against it. It is measured in tokens, not words. One token is roughly 0.75 of an English word, so a 200,000-token window holds about 150,000 words. When a conversation exceeds the window, the oldest content drops out and Claude can no longer 'see' it.

How big is Claude's context window?

It depends on the model. Claude Fable 5, Claude Opus 4.8, and Claude Sonnet 4.6 each have a 1M-token window (1,000,000 tokens, roughly 750,000 words, or several full-length books), while Claude Haiku 4.5, the fastest model, holds 200K tokens. Limits change as new models ship, so always check the live figure rather than memorising a number.

How many words fit in a 1M token context window?

About 750,000 words, using Anthropic's rough guide of 0.75 words per token. That is the length of roughly seven to nine full novels, or a mid-sized codebase. In practice you rarely want to fill it — large contexts cost more, run slower, and can dilute Claude's focus on the part that matters.

Does a bigger context window make Claude smarter?

No. The context window controls how much Claude can read at once, not how well it reasons. Reasoning quality is a function of the model tier. A larger window lets you feed in more source material, but stuffing it with irrelevant text usually hurts answer quality — a focused 5,000-token prompt beats a bloated 500,000-token one. Use the window for relevant material, not as a dumping ground.

What happens when I exceed Claude's context window?

In the API you get an error if a single request is too large. In the Claude.ai chat, long conversations are truncated — the earliest messages silently fall out of scope, so Claude appears to 'forget' things you said earlier. The fix is to start a fresh conversation, summarise the important points into a new prompt, or use Projects to keep reference material attached.

Does the context window affect cost?

On the API, yes — you pay per input token, so a request that fills a 1M-token window costs far more than a tight prompt. On consumer Pro and Max subscriptions there is no per-token charge, but very large contexts count more heavily against your usage limits. Either way, the cheapest and fastest prompt is the one that includes only what Claude needs.
