The supervisor agent pattern works better because it limits worker authority, centralizes memory ownership, and adds review checkpoints. CrewAI, OpenAI Agents SDK, AutoGen, and OpenClaw all point the same way: narrow workers plus one coordinator beat peer-to-peer agent teams, especially once tasks span 2 or more delegated steps.
I used to think most multi-agent loops were prompt failures.
You know the scene. The Research Agent asks the Analyst Agent for clarification. The Analyst Agent asks the Writer Agent to summarize. The Writer Agent asks the Reviewer Agent whether the summary matches the brief. The Reviewer Agent asks the Research Agent for one more source. Forty turns later, everybody is “collaborating” and nobody is finishing.
At first, that looks like a model problem. Maybe GPT-5 got chatty. Maybe Claude Opus 4.6 was too cautious. Maybe Qwen or Llama would have behaved better.
But after reading framework docs, watching real builds, and falling into a couple of very good Reddit threads, I think the bigger problem is uglier and more boring: most agent loops are management failures.
While researching this, I came across a thread on r/openclaw where one user put it better than most docs do: “The pattern that kept reappearing: when agents are designed as peers (researcher talks to analyst, analyst talks to writer, writer hands back to reviewer), nobody clearly owns the outcome.”
That line stuck with me because it explains why so many impressive demos turn into weird little bureaucracies. The agents are not under-prompted. They’re over-empowered.
And once I saw that, a bunch of vendor guidance suddenly lined up.
The weird part is the vendors mostly agree
Anthropic says the quiet part out loud. In its guidance on building agents, it says the most successful implementations it saw used simple, composable patterns instead of giant complex frameworks, and it recommends starting with the simplest solution possible before adding more agentic behavior.
That is not a small point. It means some of what people call “autonomy” is just architecture debt with a better logo.
CrewAI says the same thing, but in code. It has 2 process modes in this context: sequential is the default, and hierarchical only happens if you explicitly enable Process.hierarchical. In hierarchical mode, a manager agent coordinates workflow, delegates tasks, and validates outcomes. Delegation is even disabled by default.
That default matters. Framework authors don’t disable delegation by accident. They disable it because they know what happens when every worker starts freelancing.
Here’s the shape of it in CrewAI:
from crewai import Crew, Process, Agent
crew = Crew(
agents=[researcher],
process=Process.hierarchical,
manager_llm="gpt-4"
)
You have to choose hierarchy on purpose. You have to set a manager LLM on purpose. You have to make authority explicit.
That’s not just implementation detail. That’s org design.
What actually breaks when every agent is a peer?
Three things, almost every time.
- Authority gets blurry
- Memory gets polluted
- Stopping conditions get treated like an afterthought
The first one is the killer.
If the Research Agent, Analyst Agent, and Writer Agent can all re-scope the task, reopen completed work, or delegate sideways, then nobody owns the final answer. You don’t have a team. You have a Slack channel with no manager.
Microsoft AutoGen shows this tension really clearly. Its Group Chat pattern uses a Group Chat Manager to select the next speaker in a shared thread, and the process continues until a termination condition is met. That can work. It’s flexible. It feels smart.
It’s also exactly the kind of setup that drifts if you don’t lock it down.
AutoGen’s Mixture of Agents pattern is the cleaner alternative. It uses 2 agent types: worker agents and a single orchestrator agent. Workers are organized into layers, and the orchestrator aggregates results at the end. That is less magical, but much easier to reason about.
I’ll take easier to reason about every time.
Because once the novelty wears off, what you want from a multi-agent system is not “conversation.” You want ownership.
Who should own memory?
This is where a lot of builds quietly go off the rails.
I found another excellent r/openclaw discussion about memory writing where one user wrote: “Worker agents do not write memory at all. They emit structured memory events with a proposed scope and evidence. A separate Memory Curator agent validates, redacts, deduplicates, and routes the event... or discards it outright.”
That sounds strict. Good. It should.
If every worker can write directly into shared memory, then bad guesses become durable facts. A speculative web-search result turns into a stored preference. A one-off exception becomes policy. Yesterday’s wrong assumption becomes tomorrow’s context.
OpenClaw’s design is a nice real-world clue here. It does not treat agents like one giant shared mind. It isolates them.
Per-agent state lives under:
~/.openclaw/agents/<agentId>/...
Sessions live under:
~/.openclaw/agents/<agentId>/sessions
And auth profiles live under:
~/.openclaw/agents/<agentId>/agent/auth-profiles.json
That’s not cosmetic. It tells you how the system expects you to think: each agent is a scoped unit with its own workspace, state, auth, and session history. More org chart, less hive mind.
You can even see the operational shape from the CLI:
openclaw agents add work
openclaw agents list --bindings
Bindings route work inward. State stays local unless something with authority moves it.
That is the right instinct.
Why the supervisor agent pattern feels slower but wins anyway
The knock against the supervisor agent pattern is always the same: “Isn’t this less autonomous?”
Yes. That’s the point.
OpenAI’s Agents SDK gets this right with handoffs. Delegation is explicit, not free-form peer chatter. A triage agent can hand off to a Billing agent or Refund agent using transfer tools like transfer_to_refund_agent.
That design choice matters more than it looks.
A handoff is not a vibe. It’s a boundary.
And OpenAI’s guardrail model makes the same argument in a more technical way. It splits guardrails into 3 categories:
- input guardrails
- output guardrails
- tool guardrails
Here’s the part I think many people miss: input guardrails run only for the first agent, output guardrails only for the final agent, and tool guardrails run on every custom tool invocation.
That means if you want review checkpoints at every delegated step, you cannot just say “we added guardrails.” You need them at the tool layer, where the work actually crosses boundaries.
OpenAI also supports 2 execution modes for input guardrails: parallel execution by default with run_in_parallel=True, and blocking execution with run_in_parallel=False. In blocking mode, a tripped guardrail can prevent token consumption and tool execution.
That is exactly what a real manager does. Stop bad work before the team burns time on it.
So what should the org chart look like?
Not fancy. That’s the whole point.
Here’s the version I trust now:
1. One supervisor owns the outcome
Only one agent can reopen work, terminate the run, or approve the final result.
That line from the r/openclaw thread nails it: “Manager-only authority to reopen or terminate. Memory lives at the authority layers, specialists get scoped context only.”
That is the pattern.
2. Workers get narrow jobs
A Web Search Agent searches. A Data Analyst Agent analyzes. A Refund Agent handles refunds. A Writer Agent drafts.
They do not redefine the business objective. They do not negotiate with each other forever. They do not get to improvise across the whole task graph just because they technically can.
3. Workers should return artifacts, not opinions about what happens next
Give me a ranked source list. Give me extracted entities. Give me a diff. Give me a draft with confidence notes.
Don’t give me “I think we should ask the Planning Agent another question unless the Reviewer Agent disagrees.” That’s how you end up paying for a synthetic middle-management retreat.
4. Memory needs an owner
Either the supervisor writes memory, or a dedicated curator does.
Workers can propose memory events. They should not casually commit them.
5. Every boundary needs a checkpoint
If a worker uses a tool, writes a file, calls an API, or triggers another agent, that boundary should be inspectable and stoppable.
Not eventually. Right there.
Which pattern is actually better?
Here’s the short version.
| Approach | What it gets right |
|---|---|
| CrewAI Hierarchical Process | Manager agent coordinates and validates; delegation disabled by default; requires explicit Process.hierarchical and manager_llm |
| OpenAI Agents SDK Handoffs | Delegation via explicit transfer tools; guardrails have defined workflow boundaries; sessions provide persistent working context |
| Microsoft AutoGen Group Chat vs Mixture of Agents | Group chat uses shared thread plus manager-selected speaker; Mixture uses single orchestrator plus worker layers; termination and aggregation logic are explicit rather than implicit |
If I’m building something that has to run reliably at 2 a.m. in n8n, Make, Zapier, OpenClaw, or a custom Python worker, I’m picking hierarchical orchestration over peer collaboration almost every time.
Not because it’s more elegant. Because it fails in ways I can debug.
That’s a massively underrated feature.
But aren’t some loops still prompt problems?
Absolutely.
Not every loop is an org-design failure. Some are still caused by weak termination conditions, bad tool schemas, or plain model unreliability. AutoGen’s group chat docs explicitly rely on a termination condition, and if that condition is sloppy, even a good hierarchy can wander.
And yes, there are cases where more autonomous agents are justified. Anthropic makes a useful distinction between workflows and agents. If you need flexibility and model-driven decision-making at scale, broader autonomy can be worth it.
But that does not mean every worker should get broad autonomy by default.
That’s the leap people keep making, and it’s where the trouble starts.
If you want to build AI agent systems that survive contact with production, the boring answer is usually the right one: smaller jobs, clearer ownership, tighter agent routing, stricter memory rules.
The counterintuitive fix is giving agents less freedom
This was the part that surprised me.
The fix for endless loops is usually not a smarter prompt. It’s a smaller org chart.
Less peer-to-peer chatter. More explicit handoffs. Less shared memory. More scoped state. Fewer agents allowed to decide what happens next.
That can feel like you’re making the system less intelligent.
In practice, you’re making it less political.
And once you see multi-agent failures as org-design failures, a lot of decisions get easier. You stop asking, “How do I make my Research Agent more proactive?” and start asking, “Why is my Research Agent allowed to reopen the plan?”
That second question is much better.
It leads to systems that stop when they should, remember only what matters, and don’t turn a three-step task into a committee meeting.
If your agents keep looping, I’d stop tuning personality first.
I’d redraw the org chart.
