This model's maximum context length is N tokens
Your prompt plus the requested completion exceeds the model’s context window. Trim the input, summarize history, or use a longer-context model. It’s a hard model limit, not a billing or rate issue.
Coding and research agents trip this constantly by re-sending big contexts each step. Compacting the history and routing to a larger-context model both help.
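One simple form of compaction is dropping the oldest turns until the history fits a token budget. The sketch below is illustrative only: `estimate_tokens` and `trim_history` are hypothetical helpers, and the 4-characters-per-token estimate is a rough assumption rather than a real tokenizer. For accurate counts, use the tokenizer that matches your model.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Rough token estimate (assumption: about 4 characters per token)."""
    return max(1, len(text) // 4)


def trim_history(messages: List[str], budget: int) -> List[str]:
    """Keep the most recent messages whose estimated total fits the budget."""
    kept: List[str] = []
    used = 0
    # Walk from newest to oldest so recent context survives.
    for msg in reversed(messages):
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Calling `trim_history(history, budget)` before each agent step keeps the re-sent context bounded. Summarizing the dropped turns instead of discarding them is a common refinement.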
Standard Compute routes to large-context frontier models and applies smart prompt compaction, trimming redundant context automatically so you hit the wall less often.
Yes, max_tokens counts against the limit. The window covers input tokens plus the tokens the model generates, so the input can be at most the window size minus max_tokens. The larger max_tokens is, the less room is left for input.
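The budget arithmetic can be checked before sending a request. This is a minimal sketch: the function names are hypothetical and the 8192-token window in the usage note is an example figure, not the limit of any particular model.

```python
def max_input_tokens(context_window: int, max_tokens: int) -> int:
    """Largest input that still leaves room for max_tokens of output."""
    return max(0, context_window - max_tokens)


def fits(input_tokens: int, max_tokens: int, context_window: int) -> bool:
    """True if input plus requested completion stays within the window."""
    return input_tokens + max_tokens <= context_window
```

With an assumed 8192-token window and max_tokens set to 1024, `max_input_tokens(8192, 1024)` leaves 7168 tokens for input, and a 7169-token prompt would fail the `fits` check.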