
My Telegram bot not replying turned out to be a full disk, not a bad model

Daniel Nguyen · June 9, 2026 · 8 min read

I love how quickly we all reach for the most interesting explanation.

A Telegram bot stops replying, and suddenly the whole investigation turns into model forensics. Maybe GPT-5 is flaky. Maybe Claude Opus 4.6 is having a weird day. Maybe your OpenAI-compatible endpoint is routing badly. Maybe the prompt drifted. Maybe the context window got weird after an update.

And then you check the logs, and the machine is just out of disk.

That was the part that stuck with me while reading through OpenClaw issues this week. I found a thread on r/openclaw where someone upgraded to OpenClaw v2026.6.1 and suddenly their bot stopped replying in both Telegram and the TUI. If you’ve ever run an always-on agent, you already know how this story feels: the visible symptom screams “model problem,” but the real cause is much more embarrassing.

The TUI kept showing "assistant turn failed before producing content." The session status showed openai/gpt-5.5, a 272k context window, and a local websocket endpoint at ws://127.0.0.1:18789. If that’s all you saw, you’d absolutely start swapping models, tweaking prompts, and blaming the provider layer.

But the actual error was this:

openclaw tui - ws://127.0.0.1:18789 - agent main - session main

run error: ENOSPC: no space left on device, write

That is not GPT-5 failing. That is your agent smashing into the storage layer before it can say a word.
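For what it's worth, ENOSPC is an ordinary POSIX error code, and most runtimes let you check for it explicitly instead of guessing from a vague failure message. Here is a minimal Python sketch of that idea. It is not how OpenClaw handles errors; `classify_failure` and its labels are names I made up for illustration.

```python
import errno


def is_disk_full(exc: BaseException) -> bool:
    """Return True if the exception is an OSError carrying ENOSPC."""
    return isinstance(exc, OSError) and exc.errno == errno.ENOSPC


def classify_failure(exc: BaseException) -> str:
    """Rough triage label (hypothetical): check storage before anything else."""
    if is_disk_full(exc):
        return "storage"
    if isinstance(exc, OSError):
        return "os"
    return "unknown"
```

Calling `classify_failure(OSError(errno.ENOSPC, "No space left on device"))` returns the `"storage"` label, which is the point: the failure gets named as a storage problem before anyone starts swapping models.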

I think this is one of the most common debugging mistakes in agent stacks right now. The smarter the stack gets, the easier it is to blame intelligence when the real problem is plumbing.

When a Telegram bot goes silent, you usually don’t get a nice clean message saying “disk usage is at 100%.” You get silence, generic runtime errors, or a TUI that looks half alive. In OpenClaw, it can look like the assistant never got its turn out at all.

So people do the obvious thing. They rotate providers, retry with GPT-5, retry with Claude Opus 4.6, maybe test Grok or a local model, and start treating the outage like a model quality issue. That’s a reasonable instinct, but it’s often the wrong first move.

Always-on agents accumulate mess slowly and very effectively. Sessions grow, logs grow, SQLite files grow, plugin state grows, Telegram history grows, and all of that quietly eats the machine underneath you. If you’re running on a VPS, a tiny cloud instance, or some neglected box you set up months ago, storage becomes a delayed-action failure.
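If you want to see what is actually growing, a rough sketch like this ranks the immediate children of a directory by size. Which directory you point it at is up to you; I'm not assuming anything about where OpenClaw keeps its sessions or logs, and the function names are mine.

```python
import os
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of regular files under root, skipping unreadable entries."""
    total = 0
    # os.walk does not follow symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; ignore it
    return total


def largest_children(root: Path, top: int = 5) -> list[tuple[str, int]]:
    """Return the `top` largest immediate children of root as (name, bytes)."""
    sizes = []
    for child in root.iterdir():
        if child.is_symlink():
            continue
        if child.is_dir():
            sizes.append((child.name, dir_size_bytes(child)))
        elif child.is_file():
            sizes.append((child.name, child.stat().st_size))
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]
```

Run it against whatever directory holds your agent's state and look at the top two or three entries. In my experience the answer is rarely subtle.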

One commenter in that same OpenClaw thread said it plainly: they had the same problem because OpenClaw backed up and ran out of space. The original poster replied that their sessions had basically eaten through storage. That’s the whole story in one sentence.

Not “the model got worse.” Not “Telegram broke.” Not “the router is bad.” The agent kept living until the floor gave out.

What made the thread more interesting is that disk wasn’t even the only issue. There were also warnings pointing to a second class of problem: local state and migration damage after the upgrade.

One warning mentioned conflicting plugin metadata in SQLite and included this line: “Left plugin install index in place because shared SQLite state has conflicting plugin install metadata for: codex.” That’s exactly the kind of message that tells you the update may have exposed old state problems, not just created a new one.
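A metadata conflict like that is a logical problem, and a generic tool won't resolve it for you. What you can do cheaply is rule out file-level damage. The sketch below runs SQLite's built-in `PRAGMA quick_check` against a database file opened read-only. The path you pass in is a placeholder; the thread doesn't say where OpenClaw stores its SQLite state, so treat any filename as an assumption.

```python
import sqlite3
from pathlib import Path


def sqlite_quick_check(db_path: str) -> list[str]:
    """Run PRAGMA quick_check and return the messages SQLite reports.

    A structurally healthy database returns the single message "ok".
    """
    # mode=ro opens the database read-only so the check cannot modify it.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute("PRAGMA quick_check").fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

If this returns anything other than `["ok"]`, you have a damaged file on top of whatever the migration did. If it does return `["ok"]`, the file is structurally fine and the conflict is in the data itself, which is a job for the application's own repair tooling.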

This is where debugging gets annoying. You free up disk, restart the process, and the bot still doesn’t behave normally. That’s the moment people decide they were right all along and go back to blaming GPT-5.5.

But it still might not be the model.

I found another r/openclaw thread about v2026.6.1 that made the pattern even clearer. One user explained that recent versions moved from bundled providers to plugins, so the configuration changed and you now need to install the plugins properly. They also said OpenClaw doctor fixes it.

That one comment is more useful than a lot of dramatic speculation. It shifts the whole conversation out of prompt theater and back into operations, where most of these failures actually live.

Here’s the practical breakdown.

Storage exhaustion (ENOSPC)

What it looks like: assistant fails before producing content, Telegram goes quiet, TUI throws vague runtime errors
Where it shows up: local runtime logs, write failures, swollen session or log directories
What fixes it: free disk space first, then inspect what’s growing and why

Plugin or provider migration issues after an update

What it looks like: things break right after a version bump, provider config suddenly seems wrong, Telegram and TUI both act unstable
Where it shows up: OpenClaw doctor output, migration warnings, missing provider plugins, SQLite metadata conflicts
What fixes it: run OpenClaw doctor, reinstall or reconfigure plugins, verify the new provider architecture

Model or context misconfiguration

What it looks like: “context too large” errors, execution failures, weird behavior that appears model-specific
Where it shows up: provider settings, model config, context registration after upgrades
What fixes it: verify the configured context size and make sure it matches what the provider plugin actually supports

That’s the lesson I wish more teams internalized. Check the machine, the state, and the migration path before you start blaming the model.

The v2026.6.1 discussion had more clues that point in the same direction. People reported channel dropouts for specific agents, compaction running six times in a row on active sessions, and some users reverting to 5.20 because the newer build felt unstable. There was also confusion caused by the move from bundled providers to plugins.

None of that sounds like GPT-5 suddenly forgetting how to answer a Telegram message. It sounds like a local runtime that changed under load, with state and configuration issues layered on top.

And honestly, this matters even more if you’re using an OpenAI API-compatible service behind OpenClaw. The temptation is to blame the upstream router first. Sometimes that’s fair. A lot of the time, the fire started on your own box.

This is one reason I think predictable infrastructure matters more than people admit. Teams spend so much time optimizing prompts and model choice, but the real pain often comes from not knowing whether the failure is billing, throttling, local state, provider setup, or resource exhaustion. If you’re running agents in OpenClaw, n8n, Make, Zapier, or custom workflows, the best setup is the one that removes whole categories of uncertainty.

That’s also why Standard Compute is such a useful fit for agent-heavy workloads. It gives you an OpenAI-compatible endpoint, but with flat monthly pricing instead of per-token billing, so you can stop treating every retry, long session, or always-on automation like a financial event. When you’ve ruled out local disk, state corruption, and migration issues, it’s nice not to have token anxiety mixed into the debugging process.

If your bot goes silent, here’s the order I’d actually use.

First, check disk space immediately. Look for full volumes, oversized logs, huge session stores, and runaway SQLite files. If you see ENOSPC anywhere, stop and fix storage before touching prompts.

Second, run OpenClaw doctor, especially after upgrading to 2026.6.1 or later. The provider architecture changed in that release, and missing plugins can make healthy models look dead.

Third, inspect migration and plugin warnings. If you see SQLite conflicts, legacy migration messages, or plugin install metadata issues around codex or provider plugins, assume local state is suspect.

Fourth, verify provider and model config. Make sure the installed plugins match your actual config and confirm context size settings after the update.

Only after that would I spend time comparing GPT-5, Claude Opus 4.6, Grok, Qwen, or Llama variants. Model testing is useful, but only once the machine underneath the agent is healthy.
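As a rough illustration of the first step in that list, here is a small Python sketch that reports free space on a volume and flags it when it drops below a threshold. The 10% threshold and the `check_disk` name are my own assumptions, not anything from OpenClaw.

```python
import shutil
from typing import NamedTuple


class DiskStatus(NamedTuple):
    path: str
    free_bytes: int
    free_fraction: float
    low: bool


def check_disk(path: str = ".", min_free_fraction: float = 0.10) -> DiskStatus:
    """Report free space for the volume containing `path`.

    `low` is True when the free fraction is below `min_free_fraction`.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return DiskStatus(path, usage.free, free_fraction, free_fraction < min_free_fraction)
```

Something like this is cheap enough to run on a schedule, which is the real fix: find out the disk is filling up before the agent does.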

To be fair, sometimes it really is the model configuration. In the OpenClaw update thread, some users also reported “context too large” errors after updating. One commenter said it came from a bug in how context size was read or registered, and that setting a smaller or correct context size in config could work around it.

But notice what kind of problem that still is. It’s not “Claude got worse” or “GPT-5 became unreliable.” It’s a configuration mismatch between what OpenClaw thinks the model can handle and what the provider setup actually supports.

That distinction matters a lot. Too many teams treat every failure like a model intelligence problem, when the real causes are usually much less glamorous: disk, state, migrations, sockets, plugin installs, stale credentials, bad context registration.
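The context mismatch in particular is easy to picture as a comparison between two numbers. This is a hypothetical sketch: `check_context_window` is not an OpenClaw function, and the supported limit is something you would have to look up for your own provider setup.

```python
from typing import Optional


def check_context_window(configured: int, supported: Optional[int]) -> str:
    """Compare a configured context size with what the provider supports.

    Returns one of: "invalid", "unknown", "mismatch", "ok".
    """
    if configured <= 0:
        return "invalid"   # config holds a nonsensical value
    if supported is None:
        return "unknown"   # no limit registered for this model
    if configured > supported:
        return "mismatch"  # requests may be rejected as too large
    return "ok"
```

With made-up numbers, a configured window of 272,000 tokens against a provider that only accepts 128,000 comes back as `"mismatch"`. That is a config bug, not a verdict on the model.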

The weird thing is that these boring failures become more common as your stack gets more advanced. The more autonomous your agents are, the more humiliating the outages become.

What I liked about those OpenClaw threads is that they show the truth in public. One build identifier, 2e08f0f. One local websocket endpoint, ws://127.0.0.1:18789. One failing session on openai/gpt-5.5. One brutally ordinary runtime error: no space left on device.

That’s not glamorous. It’s better than glamorous, because it’s fixable.

So if your Telegram bot stops replying right after an upgrade, assume operations first and models second. Check the disk. Check plugin state. Check migration warnings. Then look at provider config.

Because if your agent can’t write to disk, GPT-5 never even gets a chance to be wrong.
