Rate limit reached … tokens per min (TPM)
You’re pushing more tokens per minute than your tier allows — usually from large prompts or long contexts, not from too many requests. OpenAI returns 429 until the per-minute token window resets.
Coding and research agents hit TPM limits fast because they re-send big contexts on every step. Compaction and a per-minute cap that doesn’t exist both help here.
There’s no tokens-per-minute cap on Standard Compute, and smart prompt compaction trims redundant context automatically — so token-heavy agent workloads don’t hit a wall.
RPM caps how many requests per minute you can send; TPM caps how many tokens per minute. Big prompts hit TPM first; many small calls hit RPM first.