429 · OpenAI · Rate limits

OpenAI tokens-per-minute (TPM) rate limit

Rate limit reached … tokens per min (TPM)

Quick answer

You’re pushing more tokens per minute than your tier allows — usually from large prompts or long contexts, not from too many requests. OpenAI returns 429 until the per-minute token window resets.

What causes it

Large prompts or long conversation histories — token volume, not request count, blows the TPM limit.
Streaming many long completions in parallel.
Stuffing entire files or codebases into context on every call (common in coding agents).

How to fix it

Trim context: send only the messages and files the model actually needs for this turn.
Summarize or compact long histories instead of resending the full transcript every request.
Spread large jobs over time, and back off and retry when you get a 429.
Move up a usage tier for a higher TPM ceiling.
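The back-off-and-retry step can be sketched as follows. This is a minimal illustration using only the standard library, not an official client pattern: `RateLimitError` is a hypothetical stand-in for whatever exception your HTTP client or SDK raises on a 429, and `send` is assumed to be any zero-argument function that performs the API call.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the exception your client raises on HTTP 429."""


def call_with_backoff(send, max_retries=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call send() and retry with exponential backoff plus jitter on RateLimitError."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries; surface the error to the caller
            # Double the ceiling on each attempt (1s, 2s, 4s, ...), capped at max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random time in [0, ceiling] so parallel
            # workers do not all retry at the same instant.
            sleep(random.uniform(0, ceiling))
```

Wrap the request in a function and pass it in, for example `call_with_backoff(lambda: client_call(payload))`, where `client_call` is whatever function you already use to send the request. Retrying only helps once the token volume itself fits the limit, so pair it with the context trimming above.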

Running an agent?

Coding and research agents hit TPM limits fast because they re-send big contexts on every step. Two things help here: compacting the context before each call, and running on a provider that has no per-minute token cap.
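As a rough sketch of keeping each agent step under a token budget, the helper below drops the oldest turns until the history fits. It is plain trimming, a simpler stand-in for real compaction or summarization. The 4-characters-per-token estimate is an assumption that only roughly holds for English text; use a real tokenizer for accurate counts. The function names and the `{"role", "content"}` message shape are illustrative.

```python
def estimate_tokens(text):
    """Very rough estimate: ~4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def trim_history(messages, budget_tokens):
    """Keep all system messages plus the newest other messages that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]

    # System messages are always kept, so charge them against the budget first.
    remaining = budget_tokens - sum(estimate_tokens(m["content"]) for m in system)

    kept = []
    for message in reversed(others):  # walk from newest to oldest
        cost = estimate_tokens(message["content"])
        if cost > remaining:
            break  # this turn and everything older is dropped
        kept.append(message)
        remaining -= cost

    # Restore chronological order for the turns that survived.
    return system + list(reversed(kept))
```

Call `trim_history(history, budget_tokens)` before every step, so a long-running agent sends a bounded number of tokens per request instead of the full transcript.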

The permanent fix

Stop hitting this entirely

There’s no tokens-per-minute cap on Standard Compute, and smart prompt compaction trims redundant context automatically — so token-heavy agent workloads don’t hit a wall.


FAQ

What’s the difference between RPM and TPM limits?

RPM caps how many requests per minute you can send; TPM caps how many tokens per minute. Big prompts hit TPM first; many small calls hit RPM first.
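To make the difference concrete, here is a small arithmetic sketch that reports which limit a workload is closer to exceeding. The limit values in the examples (500 RPM, 200,000 TPM) are made-up numbers for illustration, not the limits of any real tier.

```python
def binding_limit(requests_per_min, tokens_per_request, rpm_limit, tpm_limit):
    """Return which limit ("RPM" or "TPM") the workload uses the larger share of."""
    tokens_per_min = requests_per_min * tokens_per_request
    rpm_use = requests_per_min / rpm_limit  # fraction of the request limit used
    tpm_use = tokens_per_min / tpm_limit    # fraction of the token limit used
    return "TPM" if tpm_use > rpm_use else "RPM"


# Few requests, huge prompts: 20 * 30,000 = 600,000 tokens/min against a
# hypothetical 200,000 TPM limit, so the token limit is exceeded first.
big_prompts = binding_limit(20, 30_000, rpm_limit=500, tpm_limit=200_000)

# Many tiny requests: 600 requests/min against a hypothetical 500 RPM limit,
# while 600 * 100 = 60,000 tokens/min stays well under the token limit.
small_calls = binding_limit(600, 100, rpm_limit=500, tpm_limit=200_000)
```

With these assumed numbers, `big_prompts` is `"TPM"` and `small_calls` is `"RPM"`.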
