My OpenClaw agent looked idle overnight and still burned through tokens

Daniel Nguyen · May 12, 2026 · 8 min read
[Figure: “Overnight Token Drain” — the agent UI reads Idle while the poll loop keeps running and token spend ticks up by 9.4k]

The clearest answer from a 14-upvote, 29-comment r/openclaw thread is this: most surprise token burn comes from heartbeats repeatedly resending full context, not from “real work.” The fixes people trust most are 1-hour heartbeats, short sessions with a handoff file, and pushing routine jobs into scripts or cron instead of keeping one giant OpenClaw thread alive.

I came across this thread on r/openclaw while researching why so many OpenClaw users say the same thing in slightly different words: my agent didn’t even do much, so why did my bill explode?

And honestly, the thread is great because nobody is pretending this is a glamorous problem.

It’s not “how do I build AGI.” It’s “why did I go to sleep and wake up poorer?”

That’s a much more useful question.

The original post only had 14 upvotes and 29 comments, but it hit a nerve because the comments converged fast. Not on some exotic prompt trick. Not on a magical memory architecture. On one boring, brutal culprit.

Heartbeats.

One commenter put it better than I could: “heartbeats. that's almost always the culprit when you wake up to a drained budget and nothing useful happened overnight. every heartbeat call passes your entire conversation context through the API”.

That one sentence explains an enormous amount of pain.

Your OpenClaw agent looks idle. But under the hood, it’s not idle at all. It’s checking in. And every check-in can drag the whole conversation history back through GPT-5, Claude, Qwen, or whatever model you wired up. The longer the thread gets, the more expensive each “tiny” heartbeat becomes.

That was the first big reveal in the thread. The second one was even more interesting: the fix most people trust is not smarter memory. It’s less romance, more discipline.

The real villain was the thread itself

A lot of people assume token burn comes from dramatic moments. Big code generation. Long planning chains. Huge browser tasks.

Sometimes, sure.

But in this discussion, the community kept pointing to a more embarrassing truth: long-lived sessions quietly rot your budget.

Every extra turn makes the next turn heavier. Then heartbeats make that heaviness recur on a schedule. What looked like “persistent context” turns into a tax.

And this gets nastier in multi-agent setups. A separate r/openclaw post mentioned a custom dashboard managing 14 agents. That’s a useful gut check. One bloated thread is annoying. Fourteen bloated threads is how people end up babysitting budgets instead of building things.

Why “idle” agents still cost money

If your OpenClaw loop is structured like this:

heartbeat -> send current state + prior conversation -> model responds -> wait -> repeat

then your cost is tied to context size, not just visible activity.

That means an agent can appear asleep while still chewing through tokens because each heartbeat is carrying yesterday’s baggage, last hour’s notes, half-finished plans, and every prior tool result back into the model again.

That’s the part people miss until they see the invoice.
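
To make the math concrete, here is a back-of-the-envelope sketch in Python. Every number in it is an assumption for illustration, not an OpenClaw internal:

# Rough cost model for heartbeats that resend the full thread.
# All numbers below are illustrative assumptions.
context_tokens = 40_000      # accumulated conversation history
heartbeat_tokens = 50        # the actual "anything new?" payload
interval_minutes = 5
hours_asleep = 8

beats = (hours_asleep * 60) // interval_minutes
total_input = beats * (context_tokens + heartbeat_tokens)
print(f"{beats} heartbeats, {total_input:,} input tokens overnight")
# -> 96 heartbeats, 3,844,800 input tokens, for zero visible work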

So should you compact the context harder?

You’d think the obvious answer would be aggressive summarization.

The thread mostly said: not so fast.

One of the strongest comments argued for a completely different shape of work: “Break tasks into short sessions with a brief state handoff file at the end. Compaction while mid-task actually burns more”.

I think that commenter is basically right.

Because compaction sounds elegant until you map the real token flow:

  1. The model reads the giant thread
  2. The model writes a summary
  3. The next model call reads the summary
  4. The model realizes something got lost
  5. The model asks follow-up questions
  6. You spend more tokens repairing the summary

That’s not compression. That’s a second conversation about the first conversation.
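
You can sanity-check that with crude arithmetic. A minimal sketch, with every figure assumed for illustration:

# Token flow of mid-task compaction vs. a small handoff file.
# All figures are assumptions, not measurements.
thread = 60_000                   # giant thread the model must read
summary = 2_500                   # summary it writes back
repairs = 2 * (summary + 1_000)   # two follow-up turns fixing lost detail

compaction_cost = thread + summary + repairs
handoff_cost = 1_500              # next session just reads handoff.md
print(compaction_cost, handoff_cost)  # 69500 vs 1500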

And when you’re mid-task, losing one detail can be fatal. A file path disappears. A constraint gets softened. A previous decision vanishes. Now Claude or GPT-5 has to reconstruct intent from partial notes, which is expensive and weirdly fragile.

The workflow Reddit liked better

Instead of carrying one immortal thread, several commenters preferred this rule:

  • finish a bounded task
  • write a short handoff file
  • start a fresh session
  • only bring forward what actually matters

That is less magical and more annoying. It also works.
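
Nobody in the thread standardized the handoff format, so treat this as a hypothetical sketch of what “write a short handoff file” can look like when a session ends:

# End-of-task handoff: persist only what the next session needs.
# The file name and fields are illustrative, not an OpenClaw convention.
from pathlib import Path

handoff = """\
## Task
Move heartbeat-style checks out of the main agent loop.

## Done
- inventoried the polling jobs
- disk check moved to scripts/check_disk.py

## Next
- add a cron entry for check_disk.py
- confirm the agent only wakes on non-empty output

## Constraints
- daily budget cap: 200k tokens
"""
Path("handoff.md").write_text(handoff)

The next session starts by reading handoff.md instead of inheriting tens of thousands of tokens of history.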

Here’s the comparison the thread was really circling around:

  • Keep one long-lived session: simple to manage, but context bloat grows silently and heartbeat costs compound because prior context keeps getting resent.
  • Short sessions plus a handoff file: lower context per call; requires explicit state management, and Reddit users report it beats mid-task compaction.
  • Supervisor agent plus scripts/cron jobs: best for repetitive workflows; moves deterministic work out of the LLM loop and cuts expensive reasoning on routine tasks.

That last item is where the conversation got much smarter.

What if the agent should stop doing the boring parts?

One pattern showed up in this thread and in a related OpenClaw budget discussion: stop making the model do deterministic work.

This sounds obvious. Almost nobody actually does it soon enough.

In the related discussion about operating within the weekly budget of the Codex Plus package, users described OpenClaw acting more like an executive. Python scripts gathered data. Cron jobs handled routine checks. The agent reviewed outputs, made decisions, and stepped in when judgment was needed.

That architecture matters because LLMs are bad employees for repetitive chores. They’re expensive, distractible, and too willing to “reason” about work that a 12-line Python script could finish before Claude has finished warming up.

One user in the broader OpenClaw community said they moved heartbeat-style work into isolated cron jobs with light context. That’s a real shift in architecture, not a prompt tweak. Instead of one always-on OpenClaw agent dragging a giant thread behind it, you run small background jobs that check specific conditions and only wake the main agent when something actually changed.

That is the difference between an assistant and a nervous intern who keeps tapping your shoulder every five minutes.
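
As a sketch of that shape, here is a small cron-able Python script. The state file, the disk check, and the wake mechanism are all hypothetical stand-ins for whatever your setup actually uses:

# Cron-able check: deterministic work stays in plain Python, and the
# expensive agent is only woken when the observed state changes.
import json
import shutil
from pathlib import Path

STATE = Path("last_seen.json")

def current_state() -> dict:
    # Deterministic check; disk usage here, but it could be CI status,
    # a ticket queue, an inbox count, and so on.
    usage = shutil.disk_usage("/")
    return {"disk_pct": round(100 * usage.used / usage.total)}

def main() -> None:
    now = current_state()
    before = json.loads(STATE.read_text()) if STATE.exists() else None
    if now != before:
        STATE.write_text(json.dumps(now))
        # Stand-in for waking the main agent: hand it only the delta,
        # never the whole conversation history.
        print(f"WAKE agent: {before} -> {now}")

if __name__ == "__main__":
    main()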

Are cheaper models the boring answer nobody wants to hear?

Also yes.

One commenter offered a setup that was refreshingly unromantic: “Qwen 9b default, 1hour heartbeats, sonnet for heavy tasks, opus for mission critical tasks.”

That’s not sexy. It’s correct.

A lot of OpenClaw users wire one premium model into everything because it feels cleaner. Then they discover they’re paying Claude Opus or GPT-5 rates for heartbeat checks, task polling, and low-stakes orchestration.

That’s absurd.

Use the expensive brain for expensive thinking.

Use cheaper models like Qwen 9B or local models for maintenance work, lightweight routing, or compression. Save Claude Sonnet for serious tasks. Save Claude Opus or GPT-5 for moments where being right matters more than being cheap.
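
In code, that tiering can be as dumb as a lookup table. A minimal sketch; the model names mirror the commenter's setup, while the task categories are invented for illustration:

# Route each call to the cheapest model that can handle it.
# Task categories here are illustrative, not OpenClaw terminology.
MODEL_TIERS = {
    "maintenance": "qwen-9b",     # heartbeats, polling, light routing
    "heavy": "claude-sonnet",     # serious tasks
    "critical": "claude-opus",    # when being right beats being cheap
}

def pick_model(task_kind: str) -> str:
    # Unknown work defaults to the cheap tier; escalate explicitly.
    return MODEL_TIERS.get(task_kind, MODEL_TIERS["maintenance"])

assert pick_model("heartbeat_check") == "qwen-9b"
assert pick_model("critical") == "claude-opus"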

One commenter even said they use Granite 3B locally to compress roughly 65k tokens of context so they aren’t repeatedly shipping everything back to a cloud API. They weren’t even claiming perfect savings math. That almost made the comment more believable. It was just a real person saying, basically, “I got tired of paying premium rates for something a local model could do well enough.”
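
They didn't share their exact setup, but a local compression step can be sketched like this, assuming an Ollama server on localhost; the model tag and the prompt are placeholders:

# Compress long context locally before anything goes to a cloud API.
# Assumes Ollama is running locally; the model tag is a placeholder
# for whichever Granite (or other small) build you have pulled.
import requests

def compress(context: str) -> str:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "granite3-dense",  # placeholder tag
            "prompt": (
                "Summarize for a fresh agent session. Keep file paths, "
                "constraints, and open decisions:\n\n" + context
            ),
            "stream": False,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]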

A practical default might look like this:

heartbeat_interval: 1h
orchestrator_model: qwen-9b
heavy_reasoning_model: claude-sonnet
mission_critical_model: claude-opus
local_compression_model: granite-3b
session_rule: "end task -> write handoff.md -> start fresh session"

Will this eliminate token burn? No. And one commenter was blunt about that: “You can't, easy way is use deepseek.”

Crude answer. Not wrong.

Sometimes the fix really is “pick a cheaper model” instead of fantasizing about perfect memory management.

Why does this feel worse than a normal cloud bill?

Because this isn’t just a technical problem. It’s a psychological one.

The original poster said they had “burned so much money just overnight.” That sentence carries a very specific kind of stress. Not the stress of a buggy script. The stress of a meter running while you sleep.

Another commenter said it even more sharply: “Running model api keys with a blank cheque is setting fire to your cash at this point.”

That’s why “token optimization” discussions so often become behavior discussions. People start shortening prompts, disabling retries, reducing monitoring, avoiding experiments, and keeping agents on a leash because they’re scared of invisible spend.

That fear changes what gets built.

And you can see the opposite in the related Codex Plus thread, where one user said 55–60 percent of their weekly package was still left at the end of the week after restructuring their workflow so the AI handled supervision rather than every step. Same ambition. Different architecture. Totally different emotional experience.

My take after reading the whole thing

The best comment in the thread was not “use compaction.” It was not “buy a bigger context window.” It was not “find the perfect prompt.”

It was the people saying, in different ways, stop treating one OpenClaw session like a forever-brain.

That design is what burns you.

If I were setting up OpenClaw today and wanted to stop the dumbest token waste first, I’d do exactly this:

  1. Stretch heartbeat frequency way out — start with 1-hour heartbeats, not constant polling.
  2. End tasks aggressively instead of carrying giant live threads forward.
  3. Write a tiny handoff file at the end of each session.
  4. Move deterministic work into Python, bash, cron, or n8n.
  5. Tier models by value, with Qwen or DeepSeek for cheap maintenance and Claude or GPT-5 for hard reasoning.
  6. Use local summarization if you really need compression — Granite 3B, Llama, whatever runs well on your box.

That won’t make token burn disappear.

But it does something better. It turns the problem back into engineering instead of gambling.

And that, more than any clever prompt trick, was the real lesson buried inside this little r/openclaw thread: if your agent costs surprise you, the problem usually isn’t intelligence.

It’s shape.

Change the shape, and the bill changes with it.

Frequently Asked Questions

Why is OpenClaw burning tokens when my agent looks idle?

The most common reason is heartbeat calls that keep sending the full conversation history back to the model. Even if the agent is not doing visible work, repeated low-value API calls can become expensive once the context gets long.

Do heartbeats really cause most overnight token drain?

In the r/openclaw discussion, multiple commenters said heartbeats were the main culprit behind surprise overnight costs. The issue is not the heartbeat itself, but that each one may reprocess the entire prior thread.

Is context compaction the best way to reduce token usage?

Not always. Several OpenClaw users argued that mid-task compaction can backfire because the model must summarize, reread the summary, and often ask follow-up questions when details are lost.

What is the best workflow to stop burning tokens in OpenClaw?

A practical pattern is short sessions plus a brief handoff file at the end of each task. That keeps context small, avoids giant persistent threads, and makes heartbeat or follow-up calls much cheaper.

Should I use one premium model for everything in OpenClaw?

Usually no. A common recommendation is model tiering: use cheaper models like Qwen or DeepSeek for orchestration and maintenance, then reserve Claude Sonnet, Claude Opus, or GPT-5 for heavy reasoning and mission-critical tasks.
