The 3 AM alert that started it all
It was 3 AM when the monitoring ping came in. Our document review agent had processed a new batch of contracts, but the output was incomplete and the source files had vanished from the shared drive. I pulled up the logs and saw the telltale signs of an OpenClaw update: gateway freezes and file handling regressions.
We had been trying to automate the repetitive legal work that eats up associate hours: scanning intake forms for key details, reviewing documents for specific clauses, and running background case research. Persistent agents running 24/7 seemed like the obvious next step.
That night I realized the latest OpenClaw version was not our friend.
Locking the version after the deletions
Community reports had already flagged post-4.23 updates for breaking multi-step research workflows and causing CPU spikes. One firm saw files deleted outright after the 4.29 release. We ran the rollback immediately:

```sh
npm install -g openclaw@2026.4.23
```
The gateway stabilized and file operations returned to normal. But the experience left us wary of any future updates. We decided any new deployment would stay pinned to 2026.4.23 until we saw long-term stability data.
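To make sure the pin could not silently drift, we also wanted a check we could run on a schedule. Here is a minimal guard sketch, assuming a standard global npm install; the script is our own addition, not an OpenClaw feature:

```sh
#!/bin/sh
# Re-pin OpenClaw if the global install ever drifts from the release
# we validated. Run from cron or a CI job.
PINNED="2026.4.23"
INSTALLED="$(npm ls -g openclaw --depth=0 2>/dev/null | grep -o 'openclaw@[0-9.]*' | cut -d@ -f2)"
if [ "$INSTALLED" != "$PINNED" ]; then
    echo "openclaw is at ${INSTALLED:-none}, re-pinning to $PINNED"
    npm install -g "openclaw@$PINNED"
fi
```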
That fix bought us time, but it was only the first layer of problems.
Local models looked perfect until they weren't
For client files we could not risk sending anywhere, we started with local Qwen3.5 9B and 32B models through Ollama. Privacy was absolute. Nothing left the office network.
The models handled basic summarization fine. Then we asked them to chain tools for a full intake workflow: extract data, cross-reference against templates, flag inconsistencies, and log results. Tool use became inconsistent and multi-step reasoning collapsed. Context limits made it worse: smaller local setups like Qwen 8B on RTX 2070 GPUs with 8 GB of VRAM capped everything under 16k tokens, triggering constant compaction and session drift.
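The failure mode showed up in chains like the sketch below. This is an illustrative reconstruction against Ollama's standard /api/chat endpoint, not our production code; the model tag and prompts are placeholders:

```python
import requests

# Default Ollama endpoint; matches the base URL we use later in the config.
OLLAMA = "http://127.0.0.1:11434/api/chat"
MODEL = "qwen3.5:9b"  # placeholder tag; use whatever `ollama list` reports

def ask(messages: list[dict]) -> str:
    resp = requests.post(OLLAMA, json={"model": MODEL, "messages": messages, "stream": False})
    resp.raise_for_status()
    return resp.json()["message"]["content"]

# Step 1: extraction worked reliably on its own.
history = [{"role": "user", "content": "Extract party names, dates, and amounts from this intake form: ..."}]
extracted = ask(history)

# Step 2 and beyond: each step depends on the previous output staying faithful.
# Once the context filled past the small models' limits, compaction kicked in
# here and the chain drifted.
history += [
    {"role": "assistant", "content": extracted},
    {"role": "user", "content": "Cross-reference those fields against our standard template and flag inconsistencies."},
]
print(ask(history))
```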
A colleague's firm with an effectively unlimited budget tried 20-30B-parameter local models with built-in verification. They were advised to switch to AWS Bedrock frontier models instead: a fully local hardware build would lock them into obsolete machines within three years while still failing on complex document tasks.
We kept local models for initial screening but added hybrid fallbacks to cloud models with data processing agreements.
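The routing rule itself is simple. A sketch of the policy we converged on, with illustrative model identifiers:

```python
# Anything touching client data stays local; only covered task types may use
# a cloud model, and only one with a data processing agreement in place.
def route(task_type: str, contains_client_data: bool) -> str:
    if contains_client_data:
        return "ollama/qwen3.5:32b"   # never leaves the office network
    if task_type in {"document_review", "research"}:
        return "bedrock/claude"       # DPA in place, stronger multi-step reasoning
    return "ollama/qwen3.5:9b"        # cheap default for initial screening

# route("research", contains_client_data=False)  -> "bedrock/claude"
# route("research", contains_client_data=True)   -> "ollama/qwen3.5:32b"
```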
The $35 daily surprise that changed the budget
Even with the stable OpenClaw version, background heartbeats and plugin loading kept running. Without any active queries, we were seeing $35+ in tokens consumed every day. At that rate, idle agents alone cost more than $1,000 a month.
Before we disabled the heartbeats, one multi-agent research setup was hitting $700 monthly on Anthropic models alone. Per-token pricing made forecasting impossible for always-on legal agents. An audit of our skills showed that 70% had made zero API calls in the prior 30 days yet still carried full outbound permissions: pure maintenance overhead, plus a lingering security risk around privileged documents.
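The audit itself can be scripted. A rough sketch, assuming a JSONL call log with `skill` and `ts` fields and a JSON manifest of installed skills; neither format is an official OpenClaw artifact, so adapt it to whatever your gateway actually logs:

```python
import json
import time

THIRTY_DAYS = 30 * 24 * 3600
cutoff = time.time() - THIRTY_DAYS

# Collect every skill that made at least one call in the window.
active = set()
with open("api_calls.jsonl") as log:
    for line in log:
        entry = json.loads(line)
        if entry["ts"] >= cutoff:
            active.add(entry["skill"])

# Compare against everything installed; the difference is the prune list.
installed = {s["name"] for s in json.load(open("skills_manifest.json"))}
stale = installed - active
print(f"{len(stale)}/{len(installed)} skills made zero calls in 30 days:")
for name in sorted(stale):
    print(" -", name)
```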
We edited the config to turn the background noise off.
Cloud delays that killed timely intake
When we needed reliable reasoning we hit the next wall. OAuth connections to ChatGPT Pro and Claude frequently ran into Cloudflare blocks, adding 20+ second delays on every request. That killed any hope of smooth client intake automation where quick responses matter.
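Our mitigation was a hard timeout with a local fallback, so intake never waits on a stalled cloud call. A sketch, where `ask_cloud` and `ask_local` stand in for whichever client wrappers you already have (we assume the cloud wrapper is requests-based and honors a timeout):

```python
import requests
from typing import Callable

def ask_with_fallback(
    prompt: str,
    ask_cloud: Callable[..., str],    # your cloud client wrapper
    ask_local: Callable[[str], str],  # e.g. the Ollama helper sketched earlier
    timeout_s: float = 8.0,
) -> str:
    """Prefer the cloud model, but never let client intake stall on it."""
    try:
        return ask_cloud(prompt, timeout=timeout_s)
    except (requests.Timeout, requests.ConnectionError):
        # A weaker local answer beats a 20+ second Cloudflare stall.
        return ask_local(prompt)
```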
A test setup on a Raspberry Pi Model B with free-tier Gemini successfully automated Gmail triage and Google Drive uploads for daily digests. Browser automation kept crashing on Chromium during research tasks until we switched to Firefox headless. A separate multi-agent setup on a VPS using n8n for orchestration handled central research well but required separate local clients per team member to keep sensitive intake data off the shared server.
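For reference, here is roughly what the Firefox-headless switch looks like if you drive it with Playwright; the URL and selectors below are made up for illustration, and your automation stack may differ:

```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # Chromium kept crashing mid-research for us; headless Firefox did not.
    browser = p.firefox.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com/case-search")    # illustrative URL
    page.fill("#query", "breach of contract 2025")  # illustrative selectors
    page.click("#search")
    print(page.inner_text("#results"))
    browser.close()
```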
Side-by-side comparison of the real options
We laid out the tradeoffs in a simple table to decide what to run where.
| Criteria | Local Qwen models | AWS Bedrock Claude/GPT |
| --- | --- | --- |
| Data residency and compliance guarantees | Data never leaves the firm | Requires explicit data processing agreements |
| Multi-step agent reliability for document review | Frequently fails on tool chaining and long context | Consistent performance on complex legal reasoning |
| Hardware and operational costs | High upfront GPU spend with VRAM limits | Subscription model but predictable after optimization |
Local models won on privacy but lost on reliability. Cloud options won on reasoning depth but needed careful contract language and monitoring for delays.
The config changes that finally made it stick
After the rollback we ran the setup command and edited openclaw.json:

```sh
openclaw config
```
Under agents.defaults.model we pointed to local Ollama Qwen for privacy-sensitive steps, with an openrouter fallback for complex research. We also added the line that disables the background costs:

```
heartbeat.every: off
```
For Ollama we confirmed the base URL was http://127.0.0.1:11434 (not localhost) and kept ollama serve running in its own terminal before starting the gateway. This combination kept intake forms local while routing document review and research through stable cloud models only when needed.
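Putting it together, our openclaw.json ended up shaped roughly like the sketch below. Only agents.defaults.model and heartbeat.every are keys the steps above actually name; the fallback and provider keys are assumptions about the schema, and the model identifiers are illustrative, so verify against what `openclaw config` generates for you:

```json
{
  "agents": {
    "defaults": {
      "model": "ollama/qwen3.5:32b",
      "fallback": "openrouter/illustrative-research-model"
    }
  },
  "heartbeat": {
    "every": "off"
  },
  "providers": {
    "ollama": {
      "baseUrl": "http://127.0.0.1:11434"
    }
  }
}
```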
The real surprise was how many of the budget and reliability problems came from background processes and unused permissions rather than from the core model choice. Locking the OpenClaw version, pruning unused skills regularly, and accepting hybrid routing for different task types turned the agents into something we could actually trust with client work. We still audit permissions every month and keep the stable version pinned. That combination has kept the workflows running without further regressions or unexpected bills.
