
My OpenClaw agent looked idle overnight and still burned through tokens

Sarah Mitchell · May 12, 2026 · 9 min read

I found a small r/openclaw thread the other day that explained a problem I keep seeing in way too many OpenClaw setups: the agent looks quiet, nothing dramatic happened, and somehow the bill still gets ugly by morning.

The post itself wasn’t huge. It had 14 upvotes and 29 comments. But it hit on something painfully real: a lot of surprise token burn doesn’t come from your agent doing impressive work. It comes from your agent doing almost nothing, over and over, while quietly resending a bloated context window.

That’s such an annoying answer that I almost trust it more.

Nobody in the thread was pretending this was some grand frontier-AI puzzle. The vibe was much more practical: why did I go to sleep and wake up poorer? Honestly, that’s the kind of question I want from OpenClaw users, because it gets to the part that actually changes whether people keep building agents or stop using them.

The strongest comments converged fast, and they converged on one word: heartbeats.

One commenter said it better than most docs ever do: “heartbeats. that's almost always the culprit when you wake up to a drained budget and nothing useful happened overnight. every heartbeat call passes your entire conversation context through the API.” Once you read that, a lot of weird OpenClaw billing suddenly makes sense.

Your agent may look idle from the outside. But under the hood, it’s checking in, polling, keeping the loop alive, and dragging the whole thread back through GPT-5, Claude, Qwen, or whatever model you connected. If the thread is long, each tiny check-in stops being tiny.

That was the first thing that clicked for me. The second was even more useful: the fix people trusted most was not some elegant memory architecture. It was discipline.

A lot of people assume token burn comes from the flashy parts of agent work. Big code generation, long planning traces, giant browser runs, complex chains of thought. And sure, sometimes it does.

But that wasn’t the pattern this thread kept circling. The pattern was that long-lived OpenClaw sessions quietly rot your budget. Every new turn makes the next turn heavier, and then heartbeats turn that heaviness into a recurring charge.

That’s the part that gets people. What feels like persistent context starts acting like a tax.

And this gets worse fast if you’re running multiple agents. In a related OpenClaw discussion, someone mentioned a custom dashboard managing 14 agents. One bloated thread is annoying. Fourteen bloated threads is how you end up managing spend instead of building the thing you wanted in the first place.

If your loop looks something like "heartbeat, send current state plus the prior conversation, get a response, wait, repeat," then your cost is tied to context size as much as to visible activity. That's why an agent can look asleep while still chewing through tokens.
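To make that concrete, here's a back-of-envelope sketch of how much context a heartbeat loop resends overnight. All the numbers are illustrative assumptions on my part, not OpenClaw defaults:

```python
# Back-of-envelope: input tokens resent by heartbeats over one night.
# Every number here is an illustrative assumption, not an OpenClaw default.

def overnight_heartbeat_tokens(context_tokens: int,
                               interval_minutes: int,
                               hours: float = 8.0) -> int:
    """Total input tokens resent if every heartbeat replays the full thread."""
    heartbeats = int(hours * 60 / interval_minutes)
    return heartbeats * context_tokens

# A 60k-token thread pinged every 5 minutes for 8 hours:
print(overnight_heartbeat_tokens(60_000, 5))   # 5,760,000 input tokens while "idle"

# The same thread with 1-hour heartbeats:
print(overnight_heartbeat_tokens(60_000, 60))  # 480,000 -- a 12x cut from interval alone
```

Nothing useful happened in either case; the only variable is how often the baggage gets reshipped.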

It isn’t sleeping. It’s carrying baggage.

Yesterday’s notes, half-finished plans, old tool outputs, stale decisions, random side quests from six hours ago — all of that keeps getting shoved back into the model. And the longer you let the thread live, the more expensive every “small” call becomes.

At first, you might think the obvious fix is aggressive summarization. Compress the context, keep the thread alive, move on. That sounds clean.

But the thread was surprisingly skeptical of that approach, and I think the commenters were right.

One of the best comments argued for a different workflow entirely: break tasks into short sessions, write a brief handoff file at the end, and start fresh. Their point was that compaction in the middle of a task often burns more than it saves.

That matches what I’ve seen too. Summarization sounds efficient until you look at the real token path: the model reads the giant thread, writes a summary, the next call reads the summary, then something important is missing, and suddenly you’re spending more tokens repairing the summary than you would have spent just structuring the work better in the first place.
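Here's rough arithmetic for why mid-task compaction can lose. The numbers are made up to show the shape of the problem, not measured from any real run:

```python
# Rough arithmetic on mid-task summarization vs. an end-of-task handoff.
# All values are illustrative assumptions, not measurements.

thread = 80_000       # tokens in the live thread the model must read to summarize
summary = 4_000       # tokens the model writes as the summary
repair_calls = 2      # follow-up round-trips to recover details the summary dropped
repair_cost = 10_000  # tokens per repair round-trip (re-reading files, re-deciding)

compaction_cost = thread + summary + repair_calls * repair_cost
handoff_cost = 1_500  # a short, deliberate handoff file written when the task ends

print(compaction_cost)  # 104000
print(handoff_cost)     # 1500
```

The exact figures will vary wildly, but the asymmetry usually survives: the summary has to read everything, and every repair round-trip eats whatever the compression saved.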

That’s not really compression. It’s a second conversation about the first conversation.

And mid-task, that can get fragile fast. A file path disappears. A small constraint gets softened. A previous decision gets flattened into something vague. Then Claude or GPT-5 has to reconstruct intent from incomplete notes, which is both expensive and weirdly error-prone.

The workflow Reddit seemed to trust more was much less magical. Finish a bounded task. Write a short handoff file. Start a fresh session. Only carry forward what actually matters.
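In code, that workflow is almost embarrassingly simple. This is a minimal sketch of the pattern; the file name and fields are my own invention, not an OpenClaw convention:

```python
# Minimal sketch of the "short session + handoff file" pattern.
# The file name and its fields are illustrative, not an OpenClaw convention.
import json
from pathlib import Path

HANDOFF = Path("handoff.json")

def end_session(goal: str, decisions: list[str], next_steps: list[str]) -> None:
    """Persist only what the next session actually needs, then let the thread die."""
    HANDOFF.write_text(json.dumps({
        "goal": goal,
        "decisions": decisions,
        "next_steps": next_steps,
    }, indent=2))

def start_session() -> str:
    """Build a small opening prompt from the handoff instead of replaying history."""
    if not HANDOFF.exists():
        return "Fresh start: no prior state."
    state = json.loads(HANDOFF.read_text())
    return (f"Goal: {state['goal']}\n"
            f"Decisions so far: {'; '.join(state['decisions'])}\n"
            f"Next: {'; '.join(state['next_steps'])}")
```

The point is the discipline, not the format: the next session opens with a few hundred tokens of curated state instead of the entire fossil record.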

It’s not elegant in the sci-fi sense. It is elegant in the engineering sense.

Here’s the real tradeoff the thread was pointing at:

Keep one long-lived session

  • Easy to manage at first
  • Context bloat grows quietly over time
  • Heartbeat costs compound because old context keeps getting resent

Use short sessions plus a handoff file

  • Lower context on each call
  • Requires explicit state management
  • Usually beats mid-task compaction because you avoid repeated repair work

Use a supervisor agent plus scripts or cron jobs

  • Best for repetitive workflows
  • Moves deterministic work out of the LLM loop
  • Cuts expensive reasoning on tasks that never needed a frontier model in the first place

That last pattern is where the discussion got much smarter.

One thing I wish more OpenClaw users admitted sooner is that a lot of agent workloads are full of boring, deterministic steps. Poll a source. Check if a file changed. Gather data. Run a script. Compare output. Trigger a condition.

Those are not good jobs for Claude Opus or GPT-5. Those are jobs for Python, bash, cron, or n8n.

Several people in this thread and related OpenClaw budget discussions were basically saying the same thing in different ways: stop making the model do routine labor. Let the model supervise. Let scripts handle the chores.

That architecture shift matters more than most prompt tweaks. Instead of one always-on OpenClaw agent dragging a massive thread around all day, you run small background jobs with tiny context and only wake the main agent when something actually changed.

That is a much saner design. It’s the difference between an assistant who gets called when needed and an intern who interrupts you every five minutes just to say nothing happened.

The thread also had one of my favorite kinds of advice: boring and correct. One commenter described their setup like this: Qwen 9B as the default, 1-hour heartbeats, Claude Sonnet for heavy tasks, and Claude Opus for mission-critical work.

That is not glamorous advice. It is very good advice.

A lot of OpenClaw users wire one expensive model into every part of the loop because it feels simpler. Then they realize they’re paying premium rates for heartbeat checks, low-stakes orchestration, and maintenance tasks that do not need premium reasoning.

That’s just bad architecture.

Use the expensive brain when the work is actually expensive. Use cheaper models for lightweight routing, maintenance, and routine checks. Save Claude Sonnet, Claude Opus, or GPT-5 for moments where quality really changes the outcome.

One commenter even mentioned using Granite 3B locally to compress around 65k tokens of context so they weren’t repeatedly shipping everything to a cloud API. I liked that comment because it didn’t sound like a guru trying to sell a perfect system. It sounded like someone who got tired of paying premium prices for a task a local model could do well enough.

A practical setup might look something like this in spirit: 1-hour heartbeats, Qwen 9B as the orchestrator, Claude Sonnet for heavier reasoning, Claude Opus for high-stakes tasks, Granite 3B or another local model for compression, and a hard rule that each bounded task ends with a handoff file and a fresh session.
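As a sketch, that tiering can be as dumb as a lookup table. The model names mirror the setup quoted from the thread; the routing keys and the default-to-cheap rule are my own assumptions:

```python
# Illustrative model-tiering router. Model names mirror the setup quoted in
# the thread; the task categories and fallback rule are my own assumptions.
TIERS = {
    "heartbeat":        "qwen-9b",       # cheap orchestrator / default
    "routine":          "qwen-9b",
    "compression":      "granite-3b",    # local, keeps context off the cloud API
    "heavy_reasoning":  "claude-sonnet",
    "mission_critical": "claude-opus",
}

def pick_model(task_kind: str) -> str:
    """Default to the cheap model; escalate only for explicitly heavy tiers."""
    return TIERS.get(task_kind, "qwen-9b")
```

The important design choice is the direction of the default: unknown work falls to the cheap model, and expensive inference has to be opted into by name.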

Will that eliminate token burn? No. But it will stop a lot of the dumbest token burn.

And yes, there was also the blunt version of the answer in the thread: sometimes the fix is just using a cheaper model. One commenter basically said: you can’t fully stop it; the easy way is to use DeepSeek. Crude, but not wrong.

Sometimes people chase perfect memory systems when the simpler answer is that they’re using an expensive model for work that doesn’t justify it. Not every problem needs premium inference.

What makes this issue sting, though, is that it’s not just technical. It’s emotional.

The original poster said they had “burned so much money just overnight,” and that line captures the whole thing. This isn’t like a normal cloud bill where you know a server was running. This feels like a meter spinning while you sleep.

Another commenter put it even harder: “Running model api keys with a blank cheque is setting fire to your cash at this point.” That’s exactly why token-based pricing changes developer behavior in ways people don’t always admit.

Once you’ve been burned a few times, you start building scared. You shorten prompts. You reduce retries. You avoid experiments. You cut monitoring. You stop letting agents run freely because every extra loop feels like financial risk.

That fear shapes the product just as much as the model does.

And that’s the part I keep coming back to. The real damage from surprise token burn isn’t just the invoice. It’s the way it trains people to build smaller, more timid systems than they actually want.

That’s also why Standard Compute feels relevant here. If you’re running OpenClaw for always-on agents, background automations, or workflows that naturally spike and drift in usage, per-token billing creates constant pressure to optimize for cost instead of outcome. Standard Compute takes the opposite approach: flat monthly pricing, OpenAI-compatible API access, and dynamic routing across models like GPT-5.4, Claude Opus 4.6, and Grok 4.20 so you can stop obsessing over every heartbeat and every retry.

That doesn’t remove the need for good architecture. You should still shorten sessions, use handoff files, and move repetitive work into scripts. But it does remove the weird psychological tax of wondering whether your agent is quietly setting money on fire while you sleep.

After reading the whole thread, my main takeaway was simple: stop treating one OpenClaw session like a forever-brain.

That design is what burns people.

If I were setting up OpenClaw today and wanted to cut the dumbest waste first, I’d stretch heartbeat frequency way out and start with 1-hour heartbeats. I’d end tasks aggressively instead of dragging giant live threads forward. I’d write a tiny handoff file at the end of each session, move deterministic work into Python, bash, cron, or n8n, and tier models by value instead of prestige.

And if I knew the workload was going to be persistent, spiky, or hard to predict, I’d seriously consider whether per-token billing was the wrong foundation from the start.

That was the real lesson buried in this little r/openclaw thread. If your OpenClaw bill keeps surprising you, the problem usually isn’t intelligence.

It’s shape.

Change the shape of the work, and the bill changes too. And if you’re tired of shaping everything around token fear, that’s probably your sign to change the billing model as well.

Ready to stop paying per token? Every plan includes a free trial. No credit card required.
Get started free
