I stopped blaming prompts for agent loops when I realized my agents had a management problem

Elena Vasquez · May 24, 2026 · 9 min read

[Figure: Agent Loop Diagnosis. The problem wasn’t prompts. It was management. Work owner: Supervisor. Shared memory: Scoped. Stop condition: Defined.]

The supervisor agent pattern works better because it limits worker authority, centralizes memory ownership, and adds review checkpoints. CrewAI, OpenAI Agents SDK, AutoGen, and OpenClaw all point the same way: narrow workers plus one coordinator beat peer-to-peer agent teams, especially once tasks span 2 or more delegated steps.

I used to think most multi-agent loops were prompt failures.

You know the scene. The Research Agent asks the Analyst Agent for clarification. The Analyst Agent asks the Writer Agent to summarize. The Writer Agent asks the Reviewer Agent whether the summary matches the brief. The Reviewer Agent asks the Research Agent for one more source. Forty turns later, everybody is “collaborating” and nobody is finishing.

At first, that looks like a model problem. Maybe GPT-5 got chatty. Maybe Claude Opus 4.6 was too cautious. Maybe Qwen or Llama would have behaved better.

But after reading framework docs, watching real builds, and falling into a couple of very good Reddit threads, I think the bigger problem is uglier and more boring: most agent loops are management failures.

While researching this, I came across a thread on r/openclaw where one user put it better than most docs do: “The pattern that kept reappearing: when agents are designed as peers (researcher talks to analyst, analyst talks to writer, writer hands back to reviewer), nobody clearly owns the outcome.”

That line stuck with me because it explains why so many impressive demos turn into weird little bureaucracies. The agents are not under-prompted. They’re over-empowered.

And once I saw that, a bunch of vendor guidance suddenly lined up.

The weird part is the vendors mostly agree

Anthropic says the quiet part out loud. In its guidance on building agents, it says the most successful implementations it saw used simple, composable patterns instead of giant complex frameworks, and it recommends starting with the simplest solution possible before adding more agentic behavior.

That is not a small point. It means some of what people call “autonomy” is just architecture debt with a better logo.

CrewAI says the same thing, but in code. It ships 2 process modes: sequential is the default, and hierarchical applies only if you explicitly set Process.hierarchical. In hierarchical mode, a manager agent coordinates the workflow, delegates tasks, and validates outcomes. Delegation is even disabled by default.

That default matters. Framework authors don’t disable delegation by accident. They disable it because they know what happens when every worker starts freelancing.

Here’s the shape of it in CrewAI:

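from crewai import Agent, Crew, Process, Task

# Placeholder worker and task so the snippet is complete; the strings are illustrative.
researcher = Agent(role="Researcher", goal="Find reliable sources", backstory="A careful analyst.")
brief = Task(description="Collect sources for the brief.", expected_output="A ranked source list.")

crew = Crew(
    agents=[researcher],
    tasks=[brief],
    process=Process.hierarchical,  # hierarchy is opt-in; sequential is the default
    manager_llm="gpt-4",  # the manager model must be set explicitly
)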

You have to choose hierarchy on purpose. You have to set a manager LLM on purpose. You have to make authority explicit.

That’s not just implementation detail. That’s org design.

What actually breaks when every agent is a peer?

Three things, almost every time.

Authority gets blurry
Memory gets polluted
Stopping conditions get treated like an afterthought

The first one is the killer.

If the Research Agent, Analyst Agent, and Writer Agent can all re-scope the task, reopen completed work, or delegate sideways, then nobody owns the final answer. You don’t have a team. You have a Slack channel with no manager.

Microsoft AutoGen shows this tension really clearly. Its Group Chat pattern uses a Group Chat Manager to select the next speaker in a shared thread, and the process continues until a termination condition is met. That can work. It’s flexible. It feels smart.

It’s also exactly the kind of setup that drifts if you don’t lock it down.

AutoGen’s Mixture of Agents pattern is the cleaner alternative. It uses 2 agent types: worker agents and a single orchestrator agent. Workers are organized into layers, and the orchestrator aggregates results at the end. That is less magical, but much easier to reason about.

I’ll take easier to reason about every time.

Because once the novelty wears off, what you want from a multi-agent system is not “conversation.” You want ownership.

Who should own memory?

This is where a lot of builds quietly go off the rails.

I found another excellent r/openclaw discussion about memory writing where one user wrote: “Worker agents do not write memory at all. They emit structured memory events with a proposed scope and evidence. A separate Memory Curator agent validates, redacts, deduplicates, and routes the event... or discards it outright.”

That sounds strict. Good. It should.

If every worker can write directly into shared memory, then bad guesses become durable facts. A speculative web-search result turns into a stored preference. A one-off exception becomes policy. Yesterday’s wrong assumption becomes tomorrow’s context.
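Here is a minimal sketch of that curator idea in plain Python. Everything in it is my own illustration, not OpenClaw code: MemoryEvent, MemoryCurator, the allowed scopes, and the 0.8 confidence threshold are all assumptions, and I left redaction out to keep it short.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MemoryEvent:
    """A worker's proposal to remember something (hypothetical shape)."""
    agent_id: str
    scope: str          # proposed scope, e.g. "project" or "user"
    fact: str
    evidence: str       # where the worker got this from
    confidence: float   # the worker's own estimate, 0.0 to 1.0


@dataclass
class MemoryCurator:
    """The only component allowed to write to the store."""
    allowed_scopes: frozenset = frozenset({"project", "user"})
    min_confidence: float = 0.8  # assumed threshold
    store: dict = field(default_factory=dict)

    def review(self, event: MemoryEvent) -> bool:
        """Validate, deduplicate, and commit. Returns True if stored."""
        if event.scope not in self.allowed_scopes:
            return False  # unknown scope: discard
        if not event.evidence.strip():
            return False  # no evidence: discard
        if event.confidence < self.min_confidence:
            return False  # too speculative: discard
        facts = self.store.setdefault(event.scope, [])
        normalized = event.fact.strip().lower()
        if any(existing.lower() == normalized for existing in facts):
            return False  # duplicate: discard
        facts.append(event.fact.strip())
        return True


curator = MemoryCurator()
guess = MemoryEvent("researcher", "user", "Prefers PDF reports", "", 0.4)
solid = MemoryEvent("researcher", "project", "Deadline is Friday", "brief.md", 0.95)
print(curator.review(guess))  # the unsupported guess is discarded
print(curator.review(solid))  # the evidenced fact is stored
print(curator.review(solid))  # the repeat is discarded as a duplicate
```

The worker never touches the store. It can only propose, and the curator decides what becomes durable.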

OpenClaw’s design is a nice real-world clue here. It does not treat agents like one giant shared mind. It isolates them.

Per-agent state lives under:

~/.openclaw/agents/<agentId>/...

Sessions live under:

~/.openclaw/agents/<agentId>/sessions

And auth profiles live under:

~/.openclaw/agents/<agentId>/agent/auth-profiles.json

That’s not cosmetic. It tells you how the system expects you to think: each agent is a scoped unit with its own workspace, state, auth, and session history. More org chart, less hive mind.

You can even see the operational shape from the CLI:

openclaw agents add work
openclaw agents list --bindings

Bindings route work inward. State stays local unless something with authority moves it.

That is the right instinct.

Why the supervisor agent pattern feels slower but wins anyway

The knock against the supervisor agent pattern is always the same: “Isn’t this less autonomous?”

Yes. That’s the point.

OpenAI’s Agents SDK gets this right with handoffs. Delegation is explicit, not free-form peer chatter. A triage agent can hand off to a Billing agent or Refund agent using transfer tools like transfer_to_refund_agent.

That design choice matters more than it looks.

A handoff is not a vibe. It’s a boundary.
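To make the boundary idea concrete, here is a small plain-Python sketch. It is not the Agents SDK: SketchAgent and its methods are hypothetical stand-ins I wrote to mirror the transfer_to_<agent> naming, under the assumption that an agent may only follow handoffs declared up front.

```python
from dataclasses import dataclass, field


@dataclass
class SketchAgent:
    """Hypothetical stand-in for an agent; not the SDK's Agent class."""
    name: str
    handoffs: list = field(default_factory=list)  # agents this one may transfer to

    def transfer_tools(self) -> dict:
        """Map one named transfer tool to each declared handoff target."""
        return {
            "transfer_to_" + target.name.lower().replace(" ", "_"): target
            for target in self.handoffs
        }

    def hand_off(self, tool_name: str) -> "SketchAgent":
        """Follow a handoff only if it was declared up front."""
        tools = self.transfer_tools()
        if tool_name not in tools:
            raise PermissionError(f"{self.name} has no tool {tool_name!r}")
        return tools[tool_name]


billing = SketchAgent("Billing agent")
refund = SketchAgent("Refund agent")
triage = SketchAgent("Triage agent", handoffs=[billing, refund])

print(sorted(triage.transfer_tools()))
print(triage.hand_off("transfer_to_refund_agent").name)
```

The Refund agent in this sketch declares no handoffs, so it cannot bounce work sideways to Billing. Any attempt raises instead of starting a conversation.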

And OpenAI’s guardrail model makes the same argument in a more technical way. It splits guardrails into 3 categories:

input guardrails
output guardrails
tool guardrails

Here’s the part I think many people miss: input guardrails run only for the first agent, output guardrails only for the final agent, and tool guardrails run on every custom tool invocation.

That means if you want review checkpoints at every delegated step, you cannot just say “we added guardrails.” You need them at the tool layer, where the work actually crosses boundaries.

OpenAI also supports 2 execution modes for input guardrails: parallel execution by default with run_in_parallel=True, and blocking execution with run_in_parallel=False. In blocking mode, a tripped guardrail can prevent token consumption and tool execution.

That is exactly what a real manager does. Stop bad work before the team burns time on it.

So what should the org chart look like?

Not fancy. That’s the whole point.

Here’s the version I trust now:

1. One supervisor owns the outcome

Only one agent can reopen work, terminate the run, or approve the final result.

Another line from the r/openclaw thread nails it: “Manager-only authority to reopen or terminate. Memory lives at the authority layers, specialists get scoped context only.”

That is the pattern.
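A rough sketch of manager-only authority, assuming a simple role check is enough for illustration. Role and Run are names I made up; a real system would enforce this wherever agents are dispatched, not inside a toy class.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    SUPERVISOR = "supervisor"
    WORKER = "worker"


@dataclass
class Run:
    """Hypothetical run state; only the supervisor may change its lifecycle."""
    done_tasks: set = field(default_factory=set)
    finished: bool = False

    def _require_supervisor(self, role: Role, action: str) -> None:
        if role is not Role.SUPERVISOR:
            raise PermissionError(f"{role.value} may not {action}")

    def complete(self, task: str) -> None:
        """Any role can report a task as done."""
        self.done_tasks.add(task)

    def reopen(self, role: Role, task: str) -> None:
        """Move a finished task back to pending; supervisor only."""
        self._require_supervisor(role, "reopen work")
        self.done_tasks.discard(task)

    def terminate(self, role: Role) -> None:
        """End the run; supervisor only."""
        self._require_supervisor(role, "terminate the run")
        self.finished = True


run = Run()
run.complete("research")
try:
    run.reopen(Role.WORKER, "research")
except PermissionError as exc:
    print(exc)  # the worker's attempt is refused
run.terminate(Role.SUPERVISOR)
print(run.finished)
```

Workers can still report progress. What they cannot do is undo it or decide the run is over.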

2. Workers get narrow jobs

A Web Search Agent searches. A Data Analyst Agent analyzes. A Refund Agent handles refunds. A Writer Agent drafts.

They do not redefine the business objective. They do not negotiate with each other forever. They do not get to improvise across the whole task graph just because they technically can.

3. Workers should return artifacts, not opinions about what happens next

Give me a ranked source list. Give me extracted entities. Give me a diff. Give me a draft with confidence notes.

Don’t give me “I think we should ask the Planning Agent another question unless the Reviewer Agent disagrees.” That’s how you end up paying for a synthetic middle-management retreat.
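One way to picture the difference, as a sketch with made-up names: the worker returns a typed Artifact, and only the supervisor turns that artifact into a next step. The URLs, scores, and the 0.8 cutoff are placeholder assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """What a worker hands back: a result plus notes, no routing advice."""
    kind: str
    payload: object
    confidence_notes: str = ""


def search_worker(query: str) -> Artifact:
    """Stand-in for a search agent; returns hard-coded placeholder data."""
    ranked = [("https://example.com/a", 0.9), ("https://example.com/b", 0.6)]
    notes = f"Top hits for {query!r}; the second source is weak."
    return Artifact("ranked_sources", ranked, notes)


def supervisor_next_step(artifact: Artifact) -> str:
    """Only the supervisor maps an artifact to the next action."""
    if artifact.kind == "ranked_sources":
        best_score = max(score for _, score in artifact.payload)
        return "draft" if best_score >= 0.8 else "search_again"
    return "stop"


result = search_worker("supervisor agent pattern")
print(result.kind, "->", supervisor_next_step(result))
```

The worker has no field for suggesting who should act next, which is the whole point.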

4. Memory needs an owner

Either the supervisor writes memory, or a dedicated curator does.

Workers can propose memory events. They should not casually commit them.

5. Every boundary needs a checkpoint

If a worker uses a tool, writes a file, calls an API, or triggers another agent, that boundary should be inspectable and stoppable.

Not eventually. Right there.
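A checkpoint at the tool boundary can be as small as a decorator. This is a generic Python sketch, not any framework's guardrail API: checkpoint, Blocked, refund_policy, and the 100 limit are all hypothetical.

```python
import functools

audit_log = []  # every boundary crossing is recorded here


class Blocked(Exception):
    """Raised when a checkpoint refuses a call."""


def checkpoint(policy):
    """Wrap a tool so a policy inspects its arguments before it runs."""
    def decorator(tool):
        @functools.wraps(tool)
        def wrapper(*args, **kwargs):
            allowed, reason = policy(*args, **kwargs)
            audit_log.append((tool.__name__, args, kwargs, allowed, reason))
            if not allowed:
                raise Blocked(f"{tool.__name__}: {reason}")
            return tool(*args, **kwargs)
        return wrapper
    return decorator


def refund_policy(order_id, amount):
    """Hypothetical rule: refunds above 100 need a human."""
    if amount > 100:
        return False, "amount exceeds auto-approval limit"
    return True, "ok"


@checkpoint(refund_policy)
def issue_refund(order_id, amount):
    """Stand-in for a real side effect such as an API call."""
    return f"refunded {amount} on {order_id}"


print(issue_refund("A-1", 40))
try:
    issue_refund("A-2", 500)
except Blocked as exc:
    print(exc)
```

Both calls land in audit_log, so the boundary is inspectable, and the second one never reaches the tool, so it is stoppable.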

Which pattern is actually better?

Here’s the short version.

Approach	What it gets right
CrewAI Hierarchical Process	Manager agent coordinates and validates; delegation disabled by default; requires explicit `Process.hierarchical` and `manager_llm`
OpenAI Agents SDK Handoffs	Delegation via explicit transfer tools; guardrails have defined workflow boundaries; sessions provide persistent working context
Microsoft AutoGen Group Chat vs Mixture of Agents	Group chat uses shared thread plus manager-selected speaker; Mixture uses single orchestrator plus worker layers; termination and aggregation logic are explicit rather than implicit

If I’m building something that has to run reliably at 2 a.m. in n8n, Make, Zapier, OpenClaw, or a custom Python worker, I’m picking hierarchical orchestration over peer collaboration almost every time.

Not because it’s more elegant. Because it fails in ways I can debug.

That’s a massively underrated feature.

But aren’t some loops still prompt problems?

Absolutely.

Not every loop is an org-design failure. Some are still caused by underspecified prompts, weak termination conditions, bad tool schemas, or plain model unreliability. AutoGen’s group chat docs explicitly rely on a termination condition, and if that condition is sloppy, even a good hierarchy can wander.
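For what a non-sloppy stop condition might look like, here is a sketch that combines a turn budget with repeated-state detection. The function names and the default of 8 turns are my assumptions, and it presumes each step returns a hashable state.

```python
def run_until_done(step, state, max_turns=8):
    """Drive `step` until it reports done, repeats a state, or runs out of turns.

    `step` takes a state and returns (new_state, done). States must be hashable.
    """
    seen = {state}
    for turn in range(1, max_turns + 1):
        state, done = step(state)
        if done:
            return state, f"done after {turn} turns"
        if state in seen:
            return state, f"loop detected at turn {turn}"
        seen.add(state)
    return state, f"stopped at max_turns={max_turns}"


def ping_pong(state):
    """Two agents passing the same question back and forth, never finishing."""
    return ("analyst" if state == "researcher" else "researcher"), False


print(run_until_done(ping_pong, "researcher"))
# prints ('researcher', 'loop detected at turn 2')
```

The turn budget is the blunt backstop. The repeated-state check is what catches the forty-turn ping-pong early.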

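And yes, there are cases where more autonomous agents are justified. Anthropic makes a useful distinction between workflows and agents: workflows orchestrate models and tools through predefined code paths, while agents dynamically direct their own process and tool use. If you need flexibility and model-driven decision-making at scale, broader autonomy can be worth it.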

But that does not mean every worker should get broad autonomy by default.

That’s the leap people keep making, and it’s where the trouble starts.

If you want to build AI agent systems that survive contact with production, the boring answer is usually the right one: smaller jobs, clearer ownership, tighter agent routing, stricter memory rules.

The counterintuitive fix is giving agents less freedom

This was the part that surprised me.

The fix for endless loops is usually not a smarter prompt. It’s a smaller org chart.

Less peer-to-peer chatter. More explicit handoffs. Less shared memory. More scoped state. Fewer agents allowed to decide what happens next.

That can feel like you’re making the system less intelligent.

In practice, you’re making it less political.

And once you see multi-agent failures as org-design failures, a lot of decisions get easier. You stop asking, “How do I make my Research Agent more proactive?” and start asking, “Why is my Research Agent allowed to reopen the plan?”

That second question is much better.

It leads to systems that stop when they should, remember only what matters, and don’t turn a three-step task into a committee meeting.

If your agents keep looping, I’d stop tuning personality first.

I’d redraw the org chart.

Frequently Asked Questions

What is the supervisor agent pattern?

The supervisor agent pattern is a multi-agent design where one coordinator agent owns the outcome, delegates narrow tasks to worker agents, reviews results, and decides when to stop. Workers do specialized jobs but do not freely redefine goals, reopen tasks, or write to shared memory without approval.

Why do multi-agent systems get stuck in loops?

Many loops happen because authority is unclear, memory is shared too loosely, and termination conditions are weak. When peer agents can all delegate, revise scope, and keep talking in a shared thread, nobody clearly owns the final answer or the stop decision.

Is the supervisor agent pattern better than peer-to-peer agents?

For most production workflows, yes. Hierarchical designs like CrewAI's manager-led process or OpenAI Agents SDK handoffs are easier to debug, easier to constrain, and less likely to drift than peer-style group chats where agents keep negotiating in a shared conversation.

Should worker agents write memory directly?

Usually no. A safer pattern is to let workers emit structured memory proposals while a supervisor or memory curator validates, deduplicates, redacts, and stores only what should persist. This reduces bad facts, stale assumptions, and cross-task contamination.

How should I build AI agent workflows without runaway delegation?

Start with the simplest structure that works: one supervisor, narrow workers, explicit handoffs, scoped context, and clear termination rules. Add review checkpoints at tool boundaries, not just at the first or last agent, so every delegated step can be inspected or blocked.