Multi-agent orchestration works better when agents pass structured handoff notes instead of chatting freely. The most useful pattern I found while researching OpenClaw was simple: Agent 1 writes an explicit proposal, Agent 2 reviews it fresh, and many OpenClaw contexts already enforce that separation. Even cron jobs start a new session per run.
I used to think the “advanced” version of multi-agent orchestration was obvious.
More agents. More channels. More live chatter.
If one GPT-5 agent is useful, then surely two GPT-5 agents arguing in Discord is better. Add Claude for review, maybe Qwen for code cleanup, and now you’ve got a little AI company inside your laptop.
That idea sounds great for about ten minutes.
Then one agent starts echoing the other. The channel fills with fluff. Nobody knows which answer is final. You come back two hours later and realize the hardest part is no longer generation. It’s supervision.
While researching OpenClaw setups, I came across a thread on r/openclaw where someone was trying to make two OpenClaw agents collaborate in a Telegram group. The post itself had a score of 8, which felt about right: not a giant viral thread, but exactly the kind of practical question people ask once they stop playing with demos and start trying to make agents do real work.
And the best answer was not “use more chat.”
It was the opposite.
A commenter put it perfectly: “Works better if they don't actually chat in real time. Have Agent 1 write a structured note (its full thinking + proposal), then trigger Agent 2 to review it fresh — no shared conversation history. Each agent reading the other's output without inheriting all the prior context tends to produce better feedback than live back-and-forth, where they start agreeing with each other too quickly.”
That comment only had a score of 5, but honestly it should be the default design doc for half the people trying to build AI agent workflows right now.
The weird part is OpenClaw already nudges you this way
This is what made the Reddit advice click for me.
OpenClaw’s own session model is not built around one giant immortal shared chat. It already pushes toward isolation.
Its docs say direct messages are shared by default, but group chats, rooms and channels, webhooks, and cron jobs are isolated. And cron jobs get a fresh session per run. That matters more than it sounds.
Because the Reddit advice — “trigger Agent 2 fresh” — is not some hacky workaround. It lines up with how OpenClaw already treats many execution contexts.
Here’s the kind of config OpenClaw documents for session isolation:
{ "session": { "dmScope": "per-channel-peer" } }
And there’s another little detail I loved because it’s so concrete: OpenClaw’s daily session reset defaults to creating a new session at 4:00 AM local time on the gateway host.
That is not the design of a framework that wants your agents marinating forever in one soup of accumulated context.
It’s the design of a framework that assumes boundaries are healthy.
Why do live agent conversations feel smart and perform dumb?
Because live chat creates fake progress.
When two agents bounce messages back and forth in Discord or Telegram, it looks like collaboration. But a lot of the time it’s just convergence theater. They start aligning too early, repeating assumptions, and reinforcing the same mistakes.
The Reddit comment above nailed the failure mode: agents with shared conversation history start agreeing with each other too quickly.
That should make every builder pause.
If your goal is critique, you do not want premature agreement. You want distance. You want a second pass that arrives a little skeptical and a little annoying.
That’s why the supervisor agent pattern is so much better than free-form bot banter. One agent produces an artifact. Another agent reviews it from a cleaner starting point. If needed, a third agent checks for policy, tests, or edge cases.
That is orchestration.
Not a room full of bots roleplaying a startup.
The real problem isn’t sociability — it’s drift
The second Reddit thread is where this stopped being a style preference and started looking like an operational rule.
In another discussion on r/openclaw, users were talking about long-running workflows getting harder to supervise over time. Multiple terminals. Unfinished runs. Half-done coding tasks. Research sessions that made sense yesterday and felt cryptic today.
That is the actual pain.
Not “my agents need to socialize better.”
One commenter said it bluntly: “Drift happens very quickly. Markdown files are not enough, you need to have recursive loops that check against known good states.”
Yes. Exactly.
That is a checkpoints problem. A verification problem. A bounded retries problem.
It is not a “put Claude and GPT-5 in a Telegram room and let them vibe” problem.
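Here is one way to read "check against known good states" in code. It is a minimal Python sketch, not anything OpenClaw ships: `Checkpoint`, `advance_with_checks`, `produce`, and `verify` are names I made up for illustration. The idea is that a run only replaces the last verified state if it passes a check, retries are capped, and a run that never passes leaves the known good state untouched.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Checkpoint:
    state: str          # last output that passed verification
    attempts_used: int  # how many tries this run consumed
    advanced: bool      # True only if a new state was accepted


def advance_with_checks(
    known_good: str,
    produce: Callable[[str, int], str],
    verify: Callable[[str, str], bool],
    max_attempts: int = 3,
) -> Checkpoint:
    """Try to move past known_good; keep it if every attempt fails.

    produce(known_good, attempt) stands in for an agent run.
    verify(known_good, candidate) stands in for tests or a reviewer check.
    """
    for attempt in range(1, max_attempts + 1):
        candidate = produce(known_good, attempt)
        if verify(known_good, candidate):
            # Only verified output becomes the new known good state.
            return Checkpoint(candidate, attempt, True)
    # Bounded retries exhausted: fall back instead of drifting further.
    return Checkpoint(known_good, max_attempts, False)
```

The useful property is that the caller always gets back a state that passed verification at some point, plus an explicit flag saying whether this run made progress.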
What drift looks like in real life
If you’ve run a serious OpenClaw workflow, this probably feels familiar:
- one agent is coding
- another is running automation
- a third is summarizing research
- one run failed quietly
- one run is technically still active but no longer useful
- you come back later and can’t tell what state is trustworthy
At that point, shared chat history becomes a liability.
You don’t want more transcript. You want the latest approved artifact, the current task state, and a reviewer who can compare output against something stable.
Discord and Telegram can work — but should they be your default?
This is where people get defensive, so I’ll be precise.
Real-time agent chat is possible.
In the same r/openclaw thread, one commenter said they had Discord channels with multiple OpenClaw agents collaborating in one channel. Another person mentioned building a custom Web2 group chat where agents can alternate replies or even run in “chaos mode.”
That sounds fun. It might even be useful for brainstorming.
But “possible” and “good default” are not the same thing.
Another commenter in that thread said, “telegram doesnt allow bots to speak directly together. Discord does apparently but it wrecked my head trying to set it up. Just gave up and used the method ultrathink suggested below.”
That sentence contains more truth than most architecture diagrams.
External chat apps add friction. They also add token waste.
One Reddit comment with a score of 3 warned that Telegram bots should only consume messages they’re explicitly tagged in, “otherwise you're just burning tokens.” That’s exactly right. Shared channels are costly in two ways: you pay for the tokens, and you encourage agents to ingest irrelevant chatter.
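The "only consume messages you're tagged in" rule is easy to sketch as a pre-filter that runs before any text reaches a model. This is a hedged Python illustration, not Telegram's or OpenClaw's API: `is_addressed_to` and `messages_to_process` are hypothetical helpers, messages are plain strings, and I am assuming mentions look like `@username` and should match case-insensitively.

```python
import re


def is_addressed_to(message_text: str, bot_username: str) -> bool:
    """Return True if the text contains an exact @mention of bot_username."""
    # The lookarounds stop "@reviewer_bot" from matching "@reviewer_bot2"
    # or the tail of something like "mail@reviewer_bot".
    pattern = rf"(?<!\w)@{re.escape(bot_username)}(?!\w)"
    return re.search(pattern, message_text, flags=re.IGNORECASE) is not None


def messages_to_process(messages: list[str], bot_username: str) -> list[str]:
    """Keep only the messages this bot should spend tokens on."""
    return [text for text in messages if is_addressed_to(text, bot_username)]
```

Everything the filter drops never enters the agent's context, which is the whole point: the savings come from not reading, not from reading faster.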
Here’s the practical comparison:
| Approach | What actually happens |
|---|---|
| Real-time agent chat in Discord or Telegram | Shared live conversation history, higher supervision overhead, more token burn, and faster convergence/agreement |
| Structured reviewer handoff | Agent 1 writes an explicit note or proposal, Agent 2 reviews with fresh context, better critique, checkpoints, and auditability |
| OpenClaw internal coordination via session_send() or files | Internal direct communication, less platform friction than Discord or Telegram, fits deterministic routing and workspace-based workflows |
That last row matters most.
What should you do instead?
Use artifacts. Use explicit notes. Use fresh sessions.
OpenClaw is actually pretty good at this if you stop trying to force it into “everyone hangs out in one room” mode. Its routing and storage model already supports artifact-based handoffs better than chat-room improvisation.
Sessions are keyed by channel or thread context. Transcripts are stored as JSON and JSONL. Routing is deterministic instead of guessed by a model. Session data lives in paths like:
~/.openclaw/agents/<agentId>/sessions/sessions.json
That makes file-based and session-based coordination much easier to reason about.
OpenClaw users also pointed out that agents can communicate internally with session_send(), and one Reddit comment mentioned telling agents to populate files in other workspaces. That is much closer to how adults hand off work: here’s the brief, here’s the output, here’s what I’m unsure about, your turn.
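A file-based handoff can be very small. The sketch below is plain Python and makes no claims about OpenClaw's workspace layout or about `session_send()`: the `inbox` directory, the `task_id` naming, and the `write_handoff` / `read_handoff` helpers are all assumptions for illustration. The worker writes one JSON note into the reviewer's inbox, and the reviewer reads only that note.

```python
import json
import os
import tempfile
from pathlib import Path


def write_handoff(inbox: Path, task_id: str, note: dict) -> Path:
    """Write a handoff note as JSON so the reviewer never sees a half-written file."""
    inbox.mkdir(parents=True, exist_ok=True)
    target = inbox / f"{task_id}.json"
    # Write to a temp file in the same directory, then swap it into place.
    fd, tmp_name = tempfile.mkstemp(dir=inbox, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(note, handle, indent=2)
    os.replace(tmp_name, target)
    return target


def read_handoff(inbox: Path, task_id: str) -> dict:
    """Load the note for one task; the reviewer gets this and nothing else."""
    return json.loads((inbox / f"{task_id}.json").read_text(encoding="utf-8"))
```

The temp-file-then-replace step is a design choice, not a requirement: it keeps a reviewer that polls the directory from picking up a partially written note.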
My preferred handoff pattern
If I were setting up a production workflow today, I’d do this:
- Worker agent does the first pass and writes a structured note.
- The note includes goal, assumptions, proposed output, open questions, and failure risks.
- Reviewer agent gets only the note and the artifact, not the full chat history.
- Reviewer approves, rejects, or requests a bounded revision.
- A supervisor agent handles retries and checks each result against a known good state.
The key is that every step leaves a visible artifact.
Not vibes. Not chatter. Evidence.
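The steps above can be sketched in a few lines of Python. Treat this as an illustration under assumptions, not an OpenClaw feature: `HandoffNote`, `Verdict`, and `run_handoff` are names I invented, and `review` and `revise` are placeholders for whatever actually calls your reviewer and worker agents.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Verdict(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    REVISE = "revise"


@dataclass
class HandoffNote:
    goal: str
    assumptions: list[str]
    proposed_output: str
    open_questions: list[str] = field(default_factory=list)
    failure_risks: list[str] = field(default_factory=list)


def run_handoff(
    note: HandoffNote,
    review: Callable[[HandoffNote], Verdict],
    revise: Callable[[HandoffNote], HandoffNote],
    max_revisions: int = 2,
) -> tuple[Verdict, HandoffNote, int]:
    """Review a note with a hard cap on revision rounds.

    The reviewer only ever receives the note itself, never a chat transcript.
    Returns the final verdict, the final note, and the revisions used.
    """
    revisions = 0
    while True:
        verdict = review(note)
        if verdict is not Verdict.REVISE:
            return verdict, note, revisions
        if revisions >= max_revisions:
            # Out of budget: stop and reject instead of looping forever.
            return Verdict.REJECT, note, revisions
        note = revise(note)
        revisions += 1
```

Because `review` takes a `HandoffNote` and nothing else, the fresh-context rule is enforced by the function signature rather than by discipline.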
So when is agent-to-agent chat actually worth it?
A few cases.
Brainstorming and exploration
If you want GPT-5, Claude, and Llama to generate lots of divergent ideas fast, a temporary shared channel can be useful. You’re not looking for auditability there. You’re looking for breadth.
Cross-server or human-adjacent coordination
If agents live on different machines, or need to interact with other people, Telegram or Discord may still help. One commenter specifically recommended Telegram channels for agents on other servers or with other people, while also calling the setup finicky.
That feels right to me: useful at the edges, annoying at the center.
Demo value
A room full of agents talking is impressive. It demos well. It makes people feel like they’re seeing the future.
But if your goal is reliable output instead of a cool screen recording, I would not start there.
The best multi-agent teams act less like a group chat and more like a newsroom
This was the surprising part for me.
The best pattern is not “everyone talk constantly.” It’s closer to an editor workflow.
A reporter files a draft. An editor reviews it fresh. Fact-checking happens against explicit claims. Revisions are bounded. The final version is approved and archived.
That structure is less magical than autonomous bot chatter.
It is also better.
If you’re trying to build AI agent systems that survive contact with real work, that’s the design lesson I’d steal from these OpenClaw threads. Don’t optimize for sociable agents. Optimize for legible handoffs, fresh review, and checkpoints that stop drift before it becomes lore.
Because the failure mode in multi-agent work is rarely silence.
It’s two agents confidently talking each other into the same mistake.
