429 · OpenAI · Rate limits

OpenAI tokens-per-minute (TPM) rate limit

Rate limit reached … tokens per min (TPM)
Quick answer

You’re pushing more tokens per minute than your tier allows — usually from large prompts or long contexts, not from too many requests. OpenAI returns 429 until the per-minute token window resets.

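What causes it

Your usage tier sets a per-minute token budget. Large prompts, long conversation histories, and agents that resend their full context on every step can use up that budget even when the request count is low.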

How to fix it

  1. Trim context: send only the messages and files the model actually needs for this turn.
  2. Summarize or compact long histories instead of resending the full transcript every request.
  3. Spread large jobs over time, and back off + retry on 429.
  4. Move up a usage tier for a higher TPM ceiling.
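
The "back off + retry on 429" step can be sketched as follows. This is a minimal illustration, not an official client: `RateLimited` is a hypothetical stand-in for whatever exception your HTTP or SDK client raises on a 429, and `send` is any zero-argument function that performs the request. The delay values are assumptions to tune for your workload.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for whatever your client raises on HTTP 429."""


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call send() and retry on RateLimited with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay ceiling doubles each attempt (1s, 2s, 4s, ...), capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: a random wait keeps parallel workers from retrying in lockstep.
            sleep(random.uniform(0, ceiling))
```

Wrapping each request as `call_with_backoff(lambda: client_call(...))`, where `client_call` is your own request function, keeps the retry policy in one place. If your client exposes a server-provided reset hint, waiting for that is likely better than a guessed delay.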
Running an agent?

Coding and research agents hit TPM limits fast because they re-send big contexts on every step. Compacting context helps here, and so does running on a provider with no per-minute token cap.
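
One simple form of compaction is to cap the history an agent resends on each step. The sketch below keeps system messages plus the newest messages that fit a token budget. It assumes messages are dicts with `role` and `content` keys, and `estimate_tokens` uses a rough 4-characters-per-token guess rather than a real tokenizer; both names are illustrative, not part of any library.

```python
def estimate_tokens(text):
    """Rough estimate, assuming about 4 characters per token (not a real tokenizer)."""
    return max(1, len(text) // 4)


def trim_history(messages, budget):
    """Keep system messages plus the newest other messages that fit in `budget` tokens."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(estimate_tokens(m["content"]) for m in system)
    kept = []
    # Walk from newest to oldest and stop at the first message that does not fit.
    for message in reversed(rest):
        cost = estimate_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    # Restore chronological order; system messages are placed first.
    return system + kept[::-1]
```

Dropping old turns loses information, so a fuller approach would replace the dropped messages with a short summary instead of discarding them.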

The permanent fix

Stop hitting this entirely

There’s no tokens-per-minute cap on Standard Compute, and smart prompt compaction trims redundant context automatically — so token-heavy agent workloads don’t hit a wall.


FAQ

What’s the difference between RPM and TPM limits?

RPM caps how many requests per minute you can send; TPM caps how many tokens per minute. Big prompts hit TPM first; many small calls hit RPM first.
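
To see which limit a workload is closer to, compare how much of each budget it uses. The helper below is a hypothetical sketch; the limits in the usage example (500 RPM, 200,000 TPM) are made-up numbers for illustration, not the limits of any real tier.

```python
def tighter_limit(requests_per_min, tokens_per_request, rpm_limit, tpm_limit):
    """Return the limit with the higher utilization and that utilization (1.0 = at the cap)."""
    rpm_use = requests_per_min / rpm_limit
    tpm_use = requests_per_min * tokens_per_request / tpm_limit
    return ("TPM", tpm_use) if tpm_use >= rpm_use else ("RPM", rpm_use)


# Illustrative limits only: 500 requests/min and 200,000 tokens/min.
big_prompts = tighter_limit(20, 15_000, rpm_limit=500, tpm_limit=200_000)
small_calls = tighter_limit(400, 200, rpm_limit=500, tpm_limit=200_000)
```

Under these assumed limits, 20 requests of 15,000 tokens each use 300,000 tokens per minute, which is over the TPM cap while using only 4% of the RPM cap. 400 requests of 200 tokens each use 80% of the RPM cap but only 40% of the TPM cap.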

Related errors

OpenAI “Rate limit reached for requests” (429) · OpenAI · 429
“Maximum context length exceeded”: what it means & how to fix · Any provider · 400