400 · Any provider · Model limits

“Maximum context length exceeded” — what it means & how to fix

This model's maximum context length is N tokens

Quick answer

Your prompt plus the requested completion exceeds the model’s context window. Trim the input, summarize history, or use a longer-context model. It’s a hard model limit, not a billing or rate-limit issue.

What causes it

Long conversation history resent in full with every request.
Stuffing whole files, docs, or codebases into the prompt.
max_tokens set so high that input + output exceeds the window.
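
All three causes come down to the same check: input tokens plus max_tokens must not exceed the window. A minimal sketch of that check before sending a request follows. The helper names (estimate_tokens, fits_window) are hypothetical, and the 4-characters-per-token ratio is only a rough assumption for English text; an exact count needs your provider's tokenizer.

```python
import math

# Assumption: roughly 4 characters per token. Real tokenizers vary by
# model and language, so treat this as an estimate, not an exact count.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_window(messages: list[dict], max_tokens: int, context_window: int) -> bool:
    """Return True if estimated input tokens plus max_tokens fit the window."""
    input_tokens = sum(estimate_tokens(m["content"]) for m in messages)
    return input_tokens + max_tokens <= context_window
```

Because the estimate is approximate, it is safer to leave some headroom below the real window size rather than aim for an exact fit.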

How to fix it

Trim the prompt and lower max_tokens so input + output fits the window.
Summarize or truncate old messages instead of resending everything.
Chunk large documents and retrieve only the relevant parts (RAG).
Use a model with a larger context window for genuinely large inputs.
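
The truncation and chunking fixes above can be sketched as follows. This is an illustration under stated assumptions, not a provider API: trim_history and chunk_text are hypothetical helpers, messages are assumed to be dicts with "role" and "content" keys, and token counts use a rough 4-characters-per-token estimate.

```python
import math


def estimate_tokens(text: str) -> int:
    # Assumption: about 4 characters per token; use a real tokenizer for accuracy.
    return math.ceil(len(text) / 4)


def trim_history(messages: list[dict], input_budget: int) -> list[dict]:
    """Keep system messages plus the newest turns that fit in input_budget tokens."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(estimate_tokens(m["content"]) for m in system)
    kept = []
    # Walk from newest to oldest and stop at the first turn that does not fit.
    for m in reversed(rest):
        cost = estimate_tokens(m["content"])
        if used + cost > input_budget:
            break
        kept.append(m)
        used += cost
    kept.reverse()  # restore chronological order
    return system + kept


def chunk_text(text: str, chunk_chars: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into overlapping character chunks for later retrieval."""
    if overlap >= chunk_chars:
        raise ValueError("overlap must be smaller than chunk_chars")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_chars])
        if start + chunk_chars >= len(text):
            break
        start += chunk_chars - overlap
    return chunks
```

Set input_budget to the context window minus max_tokens so the reply still has room. The chunker only splits the text; choosing which chunks are relevant (the retrieval step of RAG) is a separate step not shown here.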

Running an agent?

Coding and research agents hit this error often because they re-send a large context at every step. Compaction and routing to a large-context model both help.
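
As a rough illustration of compaction, the sketch below keeps the most recent agent steps verbatim and truncates the content of older ones. It is a simple stand-in: production compaction often replaces old steps with a model-written summary instead. The function name compact_steps, its parameters, and the step format (dicts with a "content" key) are all assumptions made for this example.

```python
def compact_steps(steps: list[dict], keep_recent: int = 3, max_old_chars: int = 200) -> list[dict]:
    """Return a copy of steps where older entries are truncated.

    The last keep_recent steps are kept verbatim. Earlier steps longer than
    max_old_chars are cut and marked with how many characters were dropped.
    """
    cutoff = max(len(steps) - keep_recent, 0)
    compacted = []
    for i, step in enumerate(steps):
        content = step["content"]
        if i < cutoff and len(content) > max_old_chars:
            omitted = len(content) - max_old_chars
            content = content[:max_old_chars] + f" [truncated {omitted} chars]"
        # Copy the step so the caller's original history is not mutated.
        compacted.append({**step, "content": content})
    return compacted
```

Running this before each agent step keeps the resent context from growing without bound, at the cost of losing detail from older tool outputs.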

The permanent fix

Stop hitting this entirely

Standard Compute routes to large-context frontier models and applies smart prompt compaction, trimming redundant context automatically so you hit the wall less often.


FAQ

Does context length include the response?

Yes. The window covers input tokens plus the tokens the model generates. Every token reserved for output through max_tokens is a token the input cannot use, so a large max_tokens leaves correspondingly less room for your prompt.
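
The arithmetic is a simple subtraction, sketched below. The function names are hypothetical and the 8192-token window in the usage note is only an example figure; check your model's documentation for its real limit.

```python
def max_input_tokens(context_window: int, max_tokens: int) -> int:
    """Tokens left for the prompt once max_tokens is reserved for the reply."""
    return max(context_window - max_tokens, 0)


def clamp_max_tokens(context_window: int, input_tokens: int, requested: int) -> int:
    """Lower the requested max_tokens so input plus output fits the window."""
    remaining = context_window - input_tokens
    if remaining <= 0:
        raise ValueError("input alone fills the context window; trim it first")
    return min(requested, remaining)
```

For example, with an assumed 8192-token window, max_input_tokens(8192, 4096) leaves 4096 tokens for input, and clamp_max_tokens(8192, 7000, 4096) lowers the output limit to 1192.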
