I hired a CEO via Paperclip. Not a person. An agent. Paperclip is built for exactly this: a CEO that delegates, follows up, and tracks progress. And early on, it worked. Then I sprawled. I gave it a company to run, a sales and marketing play for a local meeting recording app, with content automation as part of it. That company peaked at eleven agents, each with a role, each with a workstream. Pretty soon I couldn't track any of them, and the CEO orchestrator that should have been handling it just wasn't keeping up.
It handed tasks off but didn't follow up. I gave it a big task. It delegated bits and pieces. An hour later, nothing was done. A couple of agents weren't even running. A health check I'd added had broken and started spiraling, burning through my tokens. I shut that off, asked the CEO to investigate. It couldn't come up with an answer. I was doing the diagnosing. The agent was an expensive middleman burning my API budget while I did its job. I stopped the whole thing right then so I could evaluate.
This CEO reminded me of Larry. You know the type. Delegates everything, has no idea what's actually happening, sends you to follow up with the team yourself. At least human-Larry doesn't charge by the token.
That was the most recent addition to what I've started calling the Museum of Dead Agents.
The Museum of Dead Agents
I have a graveyard. You probably do too, if you've been building with agents for more than a few months. Mine includes experiments with Paperclip, OpenClaw, Hermes agent harnesses, and various custom harnesses. Each one started with ambition and an architecture diagram that looked great on a whiteboard. Each one ended with me tearing it out and muttering something about simplicity.
The Museum of Dead Agents isn't a folder. It's a feeling. It's the moment you realize the orchestrator you spent a week wiring up is now the thing you spend all your time debugging. It's the moment you find a broken watchdog that had accumulated 19GB of failed heartbeat logs before you noticed. Nineteen gigabytes. Of a process telling no one that it was failing. That's not an AI problem. That's an infrastructure problem wearing an AI costume.
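One cheap guard against that kind of runaway log is a hard size cap. Here's a minimal sketch, assuming the watchdog is a Python process that writes through the standard logging module. The logger name, file path, and limits are placeholders I picked for illustration, not what the original watchdog used.

```python
import logging
from logging.handlers import RotatingFileHandler


def make_capped_logger(path: str, max_bytes: int = 10_000_000, backups: int = 3) -> logging.Logger:
    """Return a logger whose files on disk stay under max_bytes * (backups + 1)."""
    logger = logging.getLogger("heartbeat")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep heartbeat noise out of the root logger
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        # Rolls over to path.1, path.2, ... and deletes the oldest file
        # once `backups` rotated files already exist.
        handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

A cap doesn't fix the failing heartbeat. It only means the failure can't quietly eat the disk while nobody is looking.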
I kept thinking the lesson was "simplify." Fewer agents, smaller scope, less ambition. Partly true. But after enough autopsies, I realized the real failures were more specific than that. And more useful.
Pets, Cattle, and the Wrong Axis
There's a framing from the DevOps world that I kept reaching for: pets vs. cattle (popularized by Randy Bias). Cattle are interchangeable, replaceable, numbered, not named. Pets are precious, hand-configured, mourned when they die. The lesson is supposed to be: stop building pets.
I tried applying this to agents. Make them disposable. Stateless. Replaceable. It got me halfway. But it's the wrong axis.
The Paperclip experiment didn't fail for just one reason. When I did the autopsy, the failures were all infrastructure. The CEO wasn't following up on delegated tasks. Agents weren't running and nobody noticed. A health check broke and spiraled tokens. The system was expensive even when it was working. I was in the weeds reading every task, unblocking every agent, catching issues before the agents even realized there were issues. The agents didn't fail because AI wasn't good enough. They failed because I was nursing a fleet of pets instead of operating a small herd of cattle.
But there was an architectural problem I only saw after comparing the Paperclip wreckage to the agents that actually worked. It was about how state and capability flowed between agents.
The CEO agent needed to know what every other agent was doing. Every agent needed to share progress, status, blockers. So I gave them shared state. A common context. A place where everyone could see everything.
Shared state turned infrastructure problems into cascading failures. I was sharing the wrong things and isolating the wrong things.
Each agent in that system had domain-specific context: the CMO knew about marketing strategy and content plans; the CTO knew about the technical build; the engineers knew about implementation details; the research agent knew about market data and competitor analysis. That context belongs to that agent alone. When you share it across the whole system, you pollute every agent's context window with information it doesn't need, can't use, and will occasionally hallucinate connections to.
Meanwhile, there were generic capabilities that every agent needed: how to report status, how to handle errors, how to format output for my review pipeline, how to escalate when stuck. Those capabilities were duplicated, slightly differently, in every single agent's configuration. Isolated where they should have been shared.
The architecture was exactly backwards. I was sharing state that needed isolation. I was isolating capability that needed sharing.
The Actual Architecture Insight
Each agent gets its own sealed context for the work only it does. The CMO doesn't see the CTO's error logs or the engineer's implementation details. The research agent doesn't get marketing strategy context clogging its window. Each one operates in its own domain, with its own state, accountable for its own outputs.
But the shared layer, the stuff every agent needs to do, gets built once and passed in. Status reporting. Error handling. Output formatting. Escalation patterns. You build those as capability, not as duplicated config pasted into eleven different system prompts and slowly drifting apart until you're debugging why Agent 7 formats errors differently than Agent 3.
If the pets-vs-cattle framing gives you one axis (replaceable vs. precious), this gives you the second axis: the Seal/Share Rule. Seal the domain state. Share the generic capability. You need both axes. But the second one is the one nobody talks about, and it's the one that actually killed my orchestrators.
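Here's roughly what the rule looks like in code. This is a toy, in-process sketch, not how Paperclip or NanoClaw implement anything. The `Agent` class, `format_status`, and `escalate` are names I made up for illustration, and Python's leading underscore is only a convention, so real sealing would need a process or container boundary.

```python
from typing import Any, Dict


# Shared capability: written once, used by every agent.
def format_status(agent_name: str, state: str, detail: str) -> str:
    """One status format for the whole herd."""
    return f"[{agent_name}] {state}: {detail}"


def escalate(agent_name: str, problem: str) -> str:
    """One escalation path, built on the same formatter."""
    return format_status(agent_name, "ESCALATED", problem)


class Agent:
    """Hypothetical agent wrapper: sealed state, shared capability."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Sealed domain state: owned by this agent, never passed to another.
        self._context: Dict[str, Any] = {}

    def remember(self, key: str, value: Any) -> None:
        self._context[key] = value

    def knows(self, key: str) -> bool:
        return key in self._context

    def report(self, detail: str) -> str:
        return format_status(self.name, "OK", detail)

    def stuck(self, problem: str) -> str:
        return escalate(self.name, problem)


cmo = Agent("cmo")
cto = Agent("cto")
cmo.remember("content_plan", "three posts a week")

print(cmo.report("plan drafted"))
print(cto.stuck("build failing"))
print(cto.knows("content_plan"))  # the CTO never sees marketing context
```

Both agents report and escalate through the same two functions, so there is one format to debug instead of eleven. Neither one can read the other's context.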
The Thing That Actually Works
I've had the best results using NanoClaw. It has a much smaller codebase than OpenClaw, Paperclip, and the others. I can spin one up in an hour and have it running a task.
That's not a knock on the bigger tools. OpenClaw has real power. Paperclip solved a real problem in orchestration. But the pattern I keep falling into is this: I reach for the ambitious tool, wire up something impressive, watch it work for a day or two, and then spend the next week managing the tool instead of getting value from it.
NanoClaw works because its architecture enforces the insight by default. It's a single Node.js process. Each agent runs in its own isolated container with its own filesystem, its own memory, its own Claude session. Groups cannot access other groups' data. That's domain-state isolation baked into the design, not something I have to remember to enforce.
If I need shared capability, NanoClaw handles that through its skills system: Gmail, Telegram, web access, custom capabilities. Any NanoClaw agent in my herd can use the same skills without me pasting duplicated config into each one. The constraint is the feature.
An orchestrator that costs $15/month and works is infinitely better than one that costs $200/day and makes me anxious.
I learned this the expensive way. Multiple times. Which is why it's a museum and not just a single cautionary tale.
Why the Museum Keeps Growing (for Everyone)
The demo is always compelling. You wire up a multi-agent system and the first run looks magical. Agents talking to each other. Tasks getting routed. Status reports flowing in. You screenshot it and think about tweeting it.
But demos operate in clean state. There's no accumulated drift. No log files growing in the dark. No context windows slowly filling with irrelevant cross-agent chatter. No broken heartbeat check silently eating disk space until you find 19GB of evidence that your monitoring watched nothing.
The failure mode of orchestrators isn't a crash. It's a slow rot. Everything looks like it's running until you check the outputs and realize they've been degrading for days. The health check in my Paperclip experiment spiraled tokens overnight, building a queue of failed heartbeat errors. By the time I sat down the next morning, it was acutely broken, and an hour of trying to get the CEO to diagnose the problems went nowhere. It never threw a single clear error. It just couldn't diagnose its own system, and I didn't notice the rot building because I was busy doing the work it was supposed to be doing.
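What I wish that health check had was a hard stop. Below is a minimal sketch of one, assuming the check is a callable that returns True when things are healthy and that I can estimate what each call costs in tokens. `BudgetedHealthCheck` and its parameters are hypothetical names for illustration, not part of Paperclip or any other tool mentioned here.

```python
from typing import Callable, Optional


class BudgetedHealthCheck:
    """Run a probe, but stop for good once failures or spend cross a limit."""

    def __init__(
        self,
        probe: Callable[[], bool],
        cost_per_call: int,  # caller's estimate of tokens per probe
        max_consecutive_failures: int = 3,
        token_budget: int = 10_000,
    ) -> None:
        self.probe = probe
        self.cost_per_call = cost_per_call
        self.max_consecutive_failures = max_consecutive_failures
        self.token_budget = token_budget
        self.failures = 0
        self.tokens_spent = 0
        self.tripped_reason: Optional[str] = None

    def tick(self) -> bool:
        """Run one check. Returns False once the breaker has tripped."""
        if self.tripped_reason is not None:
            return False  # tripped: spend nothing more
        if self.tokens_spent + self.cost_per_call > self.token_budget:
            self.tripped_reason = "token budget exhausted"
            return False
        self.tokens_spent += self.cost_per_call
        try:
            healthy = self.probe()
        except Exception:
            healthy = False  # a crashing probe counts as a failed check
        self.failures = 0 if healthy else self.failures + 1
        if self.failures >= self.max_consecutive_failures:
            self.tripped_reason = f"{self.failures} consecutive failures"
            return False
        return True
```

Once `tick()` returns False, the scheduler should stop calling the check and tell a human why, using `tripped_reason`. The point is that a broken check fails loudly after three tries instead of retrying all night.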
That's the museum's real exhibit: not broken things, but things that quietly stopped working while looking operational.
A Name for the Fix
The answer isn't "never build multi-agent systems." The answer is to be honest about what needs to be shared and what needs to be sealed, and to resist the urge to wire everything together just because you can.
Isolate domain state. Share generic capability. Start with one NanoClaw doing one job well before you add the second. When you do add the second, make them share capability modules, not context.
And if you catch yourself building a CEO agent to manage the other agents, stop. As much as I love modeling agentic systems around how companies are structured, I've yet to get good results from trying to make agents behave more human. Human frameworks help me understand and talk to my agents better. But I need to stick to the cattle-not-pets analogy and build systems, not companies. The work of coordination is the work. You can't delegate it to another layer of abstraction without understanding what's being coordinated.
I know this because I keep learning it. The museum has a new wing every few months. I'm working on the bleeding edge of what's possible, so that comes with the territory. But the exhibits are getting smaller. The failures are cheaper. The time between "this is going to be amazing" and "wait, I've done this before" keeps shrinking. I'm failing faster. That's progress, even if it doesn't feel like it.
Shipping Beats Architecture, But Only One Kind
The advice "shipping beats architecture" is true, but incomplete. Shipping the wrong architecture just means you're adding to the museum faster.
The version I believe now: shipping beats architecture, but only if you've stopped building the museum and started building the one thing that runs. One agent. One domain. Its own state. Fifteen dollars a month. Boring. Reliable. Actually running while you sleep.
The museum is the pattern I keep falling into, and the one I keep climbing out of. Maybe writing it down means I'll stop. Probably not. But at least now the way out has a name: the Seal/Share Rule.
— Joe