Multi-Agent Auto-Looping Orchestration System
Building a multi-agent, automatically looping orchestration system in OpenCode
(Original source: https://intrepidis.neocities.org/blog/post/mixture-of-agents)
How to combine multiple AI agents, running different models (multi-model, not necessarily multi-modal) and even different providers, into one collaborative, looping coding pipeline.
The problem
Currently, there isn’t a single model that does everything well. Gemini Flash is fast and cheap at code generation but shakier on deep reasoning. Claude Opus nails architecture but is expensive for common chores. DeepSeek V4 Pro sits in a sweet spot, though it varies by task type.
I wanted a code harness where I could issue a high-level prompt (like “design and implement a plugin system”) and have it automatically execute a structured, multi-stage workflow. Instead of a single model cannoning off in an unforeseen direction, I wanted an autonomous, self-correcting handoff chain. A specialized architect starts with the technical constraints, a dedicated builder implements the code, and an independent reviewer catches edge-case mistakes, all managed by a single orchestrator.
OpenCode’s agent system made that surprisingly achievable.
What I built
The multi-agent orchestration pipeline:
| Agent | Model Options | Role |
|---|---|---|
| manage | Qwen 3.7 Plus, DeepSeek V4 Pro, Gemini 3.5 Pro, Claude Sonnet 4.8 | Orchestrator: routes tasks, never does real work |
| plan | DeepSeek V4 Pro, GLM 5.2, Claude Opus 4.8 | Architectural planner |
| build | DeepSeek V4 Flash, MiniMax M3, Gemini 3.5 Pro | Code generator & implementer |
| review | DeepSeek V4 Pro, GLM 5.2, Claude Opus 4.8 | Code reviewer & verifier |
| survey | DeepSeek V4 Flash, MiniMax M3, Gemini 3.5 Pro | Research & discovery: codebase investigation, pattern analysis |
| refactor | DeepSeek V4 Pro, GLM 5.2, Claude Opus 4.8 | Large-scale restructuring: renames, interface extraction |
| test | DeepSeek V4 Flash, MiniMax M3, Gemini 3.5 Flash | Test generation, coverage gap analysis |
| document | DeepSeek V4 Flash, MiniMax M3, Gemini 3.5 Flash | READMEs, API docs, architecture summaries |
Each agent type has hidden _alt fallbacks on a different
provider for resilience. Agents can also use included skills to add
task-specific behaviors loaded on demand. If DeepSeek V4 Pro becomes
unavailable (rate limited, server downtime, etc.) for the “plan” agent,
it shifts to GLM 5.2. The same applies for other agent types,
distributing risk across providers instead of single-sourcing all
fallbacks to one. The manage agent handles all of this transparently.
You never see an error unless both the primary and the fallback
fail.
Key architectural decisions
Pre-delegation vetting (the grill)
Before the manage agent delegates anything, it checks whether the prompt is specific enough to act on. This catches “make it better” prompts before they waste compute. Instead of guessing and shipping the wrong implementation, the manage agent asks the user to nail down scope, constraints, and success criteria upfront.
The plan-implement gate
This is the most important rule in the system: plan first, build second, keep the human in the loop, but limit their involvement as much as possible.
The manage agent enforces this strictly. Ask for a plan, and it
delegates to the plan subagent, gets back a structured
document with context, design decisions, implementation steps, and edge
cases, then presents it to you. It waits. No automatic
handoff to build. You read the plan, you approve, only then does
implementation begin.
This kills the “oh no, it already started writing code I don’t agree with” problem you might get in single-agent setups. Ideally, the plan agent runs on the most capable LLM you have available, one that excels at weighing trade-offs before a single line of code gets written.
Post-execution review
Every plan->build cycle auto-triggers a review pass before results land in front of you. The review agent checks what got built: correctness, security, performance, and adherence to the original plan.
If the reviewer finds issues, manage routes fixes back to the build agent, then re-reviews. The cycle loops until the review passes or you explicitly override.
This catches a lot: missing error handling, insecure patterns, code that drifts from the agreed architecture. It’s like having a senior engineer glance at every PR before it hits your inbox.
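To make the control flow concrete, here is a minimal Python sketch of the plan-implement gate and the review loop described above. In my setup this logic lives in the manage agent's prompt, not in code, so treat everything here as illustrative: `run_gated_cycle`, the `Review` type, and the four callables are hypothetical names, and the `max_rounds` cap is an assumption I added so the sketch always terminates (the real loop runs until the review passes or you override).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Review:
    """Hypothetical review verdict: pass/fail plus a list of findings."""
    passed: bool
    issues: List[str] = field(default_factory=list)


def run_gated_cycle(
    task: str,
    plan: Callable[[str], str],
    approve: Callable[[str], bool],
    build: Callable[[str, List[str]], str],
    review: Callable[[str, str], Review],
    max_rounds: int = 3,  # assumption: a cap so the sketch always terminates
) -> dict:
    """Plan, wait for human approval, then build and review in a loop."""
    proposal = plan(task)
    if not approve(proposal):
        # The gate: without explicit approval, build is never called.
        return {"status": "rejected", "plan": proposal}

    feedback: List[str] = []
    result = ""
    for round_no in range(1, max_rounds + 1):
        result = build(proposal, feedback)
        verdict = review(proposal, result)
        if verdict.passed:
            return {"status": "approved", "rounds": round_no, "result": result}
        # Route the reviewer's findings back into the next build pass.
        feedback = verdict.issues
    # Out of rounds: hand the decision back to the human.
    return {"status": "needs_override", "rounds": max_rounds, "result": result}
```

The point of the shape is that `approve` sits between `plan` and the first `build` call, and that the reviewer's findings are the only thing fed back into the next build round.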
Permission hardening
Each agent has carefully scoped permissions, one of the trickier parts to get right.
Manage has edit: deny and
bash: deny. It orchestrates but never touches files or runs
commands. It’s the only agent with question: allow, since
subagents can’t ask questions directly (more on that later).
Plan gets a broad read-only bash allowlist. It can
inspect the codebase (ls, cat,
find, grep, rg, plus git
commands) but nothing mutating. Review gets a narrower
git-only allowlist (git diff, git log,
git show, git status, git blame).
It only needs to examine diffs and blame history. The catch-all
"*": "deny" at the bottom stops any slippery command from
sneaking through.
Build has a permissive "*": "allow"
with a deny list for destructive commands. It can edit files and run any
command except the explicitly blocked ones:
"*": "allow",
"rm -rf *": "deny",
"rm -fr *": "deny",
"rm -r *": "deny",
"git push --force*": "deny",
"git push -f*": "deny",
"git reset --hard*": "deny",
"sudo *": "deny",
// ...
The "*": "allow" default means build works freely for
any command not in the deny list. The deny list catches the genuinely
dangerous operations: recursive deletes, force pushes, destructive git
operations, and system commands. It’s a simpler model than allowlisting
approved commands: everything is allowed unless explicitly denied.
No task tool for subagents
Every subagent has "tools": { "task": false }. Critical:
it stops subagents from spawning their own sub-subagents. Only
manage can delegate work. Skip this and you risk runaway delegation
chains and confusing behavior.
Fallback model strategy
The manage agent’s instructions mandate automatic fallback on delegation failure:
If the “plan” sub-agent fails, you MUST immediately retry using “plan_alt”. When a primary sub-agent fails but returns partial output, include that partial output in the prompt to the fallback _alt agent so it can continue from where the primary left off. Do not report the failure to the user unless “plan_alt” also fails.
Same goes for build and review, and if desired, you could include
multiple _alt agents. The hidden _alt agents
mirror the primary config but run different models. You get a log line
noting which model ultimately handled it, but no interruption to your
workflow.
A later refinement: when a primary agent fails but returns partial
output, manage feeds that partial output into the prompt for the
_alt fallback agent. The fallback then continues from where
the primary left off instead of starting over. This matters for
long-running plans that hit token limits near the finish line.
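The fallback rule, including the partial-output handoff, can be modeled in a few lines of Python. This is only a sketch of the behavior the manage prompt asks for; `AgentFailure`, `delegate_with_fallback`, and the wording of the continuation prompt are hypothetical, and the agents are stood in for by plain callables.

```python
class AgentFailure(Exception):
    """Hypothetical error raised when a sub-agent call fails."""

    def __init__(self, message: str, partial: str = ""):
        super().__init__(message)
        self.partial = partial  # whatever the agent produced before failing


def delegate_with_fallback(prompt, primary, fallback):
    """Try the primary agent; on failure retry once with the _alt agent.

    Returns (output, which_agent). If the fallback also fails, its
    AgentFailure propagates, which is the only case the user sees.
    """
    try:
        return primary(prompt), "primary"
    except AgentFailure as err:
        if err.partial:
            # Let the fallback continue instead of starting over.
            prompt = (
                f"{prompt}\n\n"
                "Partial output from the previous attempt; continue from here:\n"
                f"{err.partial}"
            )
        return fallback(prompt), "fallback"
```

Returning which agent handled the call mirrors the log line mentioned above: you can see that the fallback ran without the workflow being interrupted.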
Question relay
Subagents can’t call AskUserQuestion directly; only the
primary agent can. Instead, they embed questions in the last paragraph
of their response. Manage catches these, presents them to you, waits for
your answer, and feeds it back to the subagent.
So a plan agent can say “I see two approaches for this API: REST or GraphQL. Which direction should I design for?” and you get a clean question-answer round-trip, no special ceremony needed.
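For illustration, last-paragraph detection could look like the Python sketch below. The manage agent does this by reading the response, not by running code, and the "last paragraph ends with a question mark" heuristic is my assumption about what counts as a relayed question; `extract_relayed_question` is a hypothetical name.

```python
from typing import Optional


def extract_relayed_question(response: str) -> Optional[str]:
    """Return the final paragraph if it looks like a question, else None.

    Assumption: paragraphs are separated by blank lines and a relayed
    question ends with '?'.
    """
    paragraphs = [p.strip() for p in response.strip().split("\n\n") if p.strip()]
    if paragraphs and paragraphs[-1].endswith("?"):
        return paragraphs[-1]
    return None
```

A response that ends in a plain statement yields `None`, so manage can pass it straight through instead of pausing for user input.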
Specialized subagents
The original three subagents (plan, build, review) handled everything, but as tasks grew more complex, the build agent became a bottleneck. It was a generalist, expected to write code, refactor it, test it, document it, and still keep up. The context-switching cost was real: a build session that started with a cross-file rename and ended with test generation would lose coherence by the final step.
So I expanded the roster with four new specialized agents (survey,
refactor, test, document). Each has an _alt fallback.
The routing logic
The manage agent decides which agent to delegate to based on task classification:
- “how does X work?” or “find all usages of Y” → survey
- “rename Z across the codebase” or “extract interface” → refactor
- “write tests for module A” or “check coverage” → test
- “document this API” or “update the README” → document
- “design a system” → plan
- “implement feature X” → build
- “review this PR” → review
This classification isn’t hardcoded. It lives in the manage agent’s prompt as routing rules. The manage agent reads the task description and picks the best fit. If it’s wrong, you can redirect it, and the next delegation learns from the correction within the session.
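If you want to reason about the routing rules outside the prompt, a lookup table is a rough approximation. The Python sketch below is not how manage works (it classifies with the model, not with substring matching); the `ROUTES` table and `route` function are hypothetical, with keywords lifted from the examples in the list above.

```python
from typing import Optional

# Keyword hints per agent, checked in order. Purely illustrative.
ROUTES = [
    (("how does", "find all usages"), "survey"),
    (("rename", "extract interface"), "refactor"),
    (("write tests", "check coverage"), "test"),
    (("document", "update the readme"), "document"),
    (("design",), "plan"),
    (("implement",), "build"),
    (("review",), "review"),
]


def route(task: str) -> Optional[str]:
    """Return the first agent whose keywords appear in the task, else None."""
    text = task.lower()
    for keywords, agent in ROUTES:
        if any(keyword in text for keyword in keywords):
            return agent
    return None  # nothing matched: a prompt that needs clarifying first
```

A `None` result corresponds to the vague prompts the pre-delegation vetting step is meant to catch, such as “make it better”.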
The “never delegate” constraint
With 7 subagents, there was a risk of runaway delegation chains: a build agent deciding to spawn a sub-subagent for a subtask, which spawns another, and so on. Every subagent prompt now includes a hard constraint:
You must NEVER delegate to other sub-agents. Only the manage agent can delegate tasks. If you need help from another agent, include that request in your response output and let the manage agent decide what to delegate next.
Combined with "tools": { "task": false } in every
subagent config, this makes the delegation graph strictly tree-shaped:
manage branches to subagents, subagents report back. No recursion.
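Because one forgotten `"task": false` is enough to break the tree shape, a small lint over the parsed config can help. The Python sketch below assumes the config has already been loaded into a dict and that agents sit under a top-level `"agent"` key with a `"tools"` map each; that layout and the function name are assumptions for illustration, so adjust them to match your actual opencode.json.

```python
def subagents_that_can_delegate(config: dict, orchestrator: str = "manage") -> list:
    """List every non-orchestrator agent that does not set tools.task to false.

    Assumed layout: {"agent": {"<name>": {"tools": {"task": false}}}}
    """
    agents = config.get("agent", {})
    return sorted(
        name
        for name, spec in agents.items()
        if name != orchestrator
        and spec.get("tools", {}).get("task") is not False
    )
```

An empty list means only the orchestrator can delegate. Anything else names the agents to fix, including `_alt` fallbacks that were copied without the `tools` block.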
Permission schema nuances
The question permission only makes sense on the primary
agent. Subagents can’t have it. They need the relay pattern
instead. The docs don’t make this obvious: add
"question": "allow" to a subagent and it simply won’t
behave the way you expect.
The fix: give question: allow only to manage, and use
the question-relay pattern (last-paragraph detection) for subagents.
Bash glob matching semantics
The permission system uses glob patterns for bash rules. Powerful, but with subtle gotchas:
- "rm -rf *" matches rm -rf node_modules and rm -rf with any other arguments. The trailing * matters.
- "git push --force*" catches --force, --force-with-lease, etc., but not the short -f form (which needs its own rule).
- Order matters: rules evaluate top-to-bottom, first match wins. Put deny rules before the "*": "allow" or "*": "ask" catch-all.
I spent a while figuring out why rm -rf kept getting
allowed before realizing my deny rule was ordered wrong relative to the
catch-all.
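A quick way to sanity-check which patterns catch which commands is to try them against a glob matcher. The Python sketch below uses the standard library's `fnmatch` as a stand-in; OpenCode's own matcher may differ in details, so this only illustrates the pattern behavior described above and says nothing about rule precedence. `PATTERNS` and `matching_patterns` are hypothetical names.

```python
from fnmatch import fnmatchcase

# A few patterns from the build agent's rules.
PATTERNS = [
    "*",
    "rm -rf *",
    "git push --force*",
    "git push -f*",
]


def matching_patterns(command: str, patterns=PATTERNS) -> list:
    """Return every pattern that matches the whole command string."""
    return [p for p in patterns if fnmatchcase(command, p)]


if __name__ == "__main__":
    for cmd in ("rm -rf node_modules",
                "git push --force-with-lease",
                "git push -f origin main"):
        print(cmd, "->", matching_patterns(cmd))
```

Every command also matches the `"*"` catch-all, which is exactly why the relative order of the catch-all and the specific rules decides the outcome.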
The todo list evolution
The To Do.md that tracked this build tells its own story:
Initial questions (the “Done” section, reading bottom-up):
- “Review each agent’s permissions”: started permissive, locked down iteratively.
- “Use Sonnet for review”: model selection matters per role.
- “Why did it implement the plan without asking?”: led to the Plan-Implement Gate.
- “Why didn’t it use the build_alt agent without prompting?”: led to mandatory automatic fallback.
Context pollution and model hallucination
One frustrating bug I hit had nothing to do with configuration or permissions. It was a silent context pollution problem.
The symptom: The manage agent kept producing
“unknown tool” errors with empty {} input. Three times in a
row, it tried to call a non-existent tool with no arguments. The task it
was supposed to delegate simply never happened.
The root cause: Every agent session loads a system
prompt file (AGENTS.md, CLAUDE.md,
GEMINI.md, etc) from the project root as injected context.
One project had a 147-line system prompt file, written as a detailed,
framework-specific guide filled with instructions about coding style,
architecture rules, dependency management, and testing conventions.
Every time the manage agent started a session, it ingested all 147 lines
as part of its context window. The model’s reasoning degraded under the
bloat, producing malformed JSON tool calls.
The fix:
- Move the file. AGENTS.md was relocated to Docs/LLM guide.md, outside the root-level auto-injection path.
- Rewrite it. The new version is 34 lines, framework-agnostic, and purely project-domain reference. It documents architectural conventions without duplicating what agent prompts already cover.
The lesson: Project-level instruction files get injected into the context of every agent session, including the orchestrator. If you’re running a multi-agent system, keep those files ruthlessly terse and framework-agnostic. Detailed project conventions belong in project documentation, not in files that feed directly into model context windows.
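To keep this from creeping back in, a small check can flag root-level instruction files that have grown too long. The Python sketch below is a hypothetical helper, not part of OpenCode; the file names come from the list above, and the 50-line threshold is an arbitrary assumption that sits between the 34-line version that worked and the 147-line version that did not.

```python
from pathlib import Path

# Root-level files that get injected into agent context (from the list above).
INSTRUCTION_FILES = ("AGENTS.md", "CLAUDE.md", "GEMINI.md")


def oversized_instruction_files(root, max_lines: int = 50) -> dict:
    """Map each instruction file in `root` that exceeds max_lines to its line count."""
    report = {}
    for name in INSTRUCTION_FILES:
        path = Path(root) / name
        if path.is_file():
            line_count = len(path.read_text(encoding="utf-8").splitlines())
            if line_count > max_lines:
                report[name] = line_count
    return report
```

Line count is a crude proxy for context cost, but it is cheap to run in a pre-commit hook or CI step.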
How to replicate this setup
The step-by-step, exactly as I did it.
1. Install OpenCode
curl -fsSL https://opencode.ai/install.sh | sh
Verify it works:
opencode --version
# Should show 1.x.x or similar
2. Set up provider API keys
You’ll need API keys for every provider your agents use. Create (or
edit) your opencode config:
mkdir -p ~/.config/opencode
Set environment variables or use OpenCode’s provider configuration:
# For OpenRouter (used by all agents: manage, plan, build, review, survey, refactor, test, document)
export OPENROUTER_API_KEY="sk-or-v1-..."
# For Google AI Studio (configured as a provider, available for future agent use)
export GEMINI_API_KEY="AIzaSy..."
# Other providers...
3. Create the agent instruction files
These markdown files define each agent’s behavior. Here’s what I use:
~/.config/opencode/manage.md: Orchestrator prompt. Key contents:
- Subagent delegation rules (plan vs build vs review).
- Plan-Implement Gate logic (present plan, wait for approval).
- Post-Execution Review logic (auto-trigger review after build).
- Automatic fallback rules (retry with _alt on failure).
- Question relay instructions (detect subagent questions, relay to user).
- Forbidden from using bash directly.
~/.config/opencode/plan.md: Architectural planner prompt:
- Analyze requirements, design system architecture.
- Output structured plans (context, design decisions, steps, edge cases).
- No implementation code unless asked.
- Question relay pattern documented.
~/.config/opencode/build.md: Code generator prompt:
- Implement following project conventions.
- Write tests and documentation.
- Run verification (tests, lint, type-check).
- Ask rather than guess when unclear.
- Question relay pattern documented.
~/.config/opencode/review.md: Code reviewer prompt:
- Strictly read-only. Never modifies files.
- Review for correctness, security, performance, maintainability.
- Output structured findings by severity (Critical, High, Medium, Low, Nits).
- Use only read-only git commands.
- Question relay pattern documented.
~/.config/opencode/survey.md: Research & discovery agent prompt:
- Investigate codebases, map dependencies, answer “how does X work?”
- Find patterns, identify anti-patterns, locate relevant code.
- Read-only. Never modifies files.
- Question relay pattern documented.
~/.config/opencode/refactor.md: Large-scale refactoring agent prompt:
- Rename symbols, extract interfaces, restructure across 5+ files.
- Maintain behavioral equivalence. Refactor, don’t rewrite.
- Run tests after refactoring to verify nothing broke.
- Question relay pattern documented.
~/.config/opencode/test.md: Test generation & coverage agent prompt:
- Write tests following project conventions (framework, style, location).
- Analyze coverage gaps and generate fixtures.
- Run tests after writing to verify they pass.
- Question relay pattern documented.
~/.config/opencode/document.md: Documentation agent prompt:
- Generate READMEs, API documentation, architecture summaries.
- Update existing documents to reflect code changes.
- Read source code to verify every claim. Never document from memory.
- Question relay pattern documented.
4. Configure opencode.json
See the attached file for the full configuration
(~/.config/opencode/opencode.json). After making the
change, run opencode debug config to catch validation
warnings or bracket syntax errors.
5. Test the workflow
Start OpenCode by running opencode, then try a prompt such as “I need a plugin system”.
Lessons learned
Start too restrictive, then loosen up. Adding
"rg *": "allow" after an agent reports it can’t search
beats discovering later that a build agent ran rm -rf on
your entire project.
The prompt files matter enormously. Each agent needs to know not just what to do, but how to talk to the other agents. Document the question relay pattern in every subagent’s prompt. They need to know they can ask questions, and how to format them.
Test the fallback paths. Providers have outages and
rate limits you won’t expect. Run through each _alt path
deliberately to confirm it works. You can ask the “manage” agent to
specifically use the “plan_alt” subagent, for example.
Specialization pays off fast. A generalist build agent works fine until you start asking for more complex jobs. The context-switching cost shows in the output. Splitting work across specialized subagents with focused prompts (survey, refactor, test, document) produces better results per domain. Each agent manages a narrower context window, and its prompts are targeted at a single job.
If you build something similar or improve on this design, please share it. These patterns are still new, and it would be good to see what other people come up with.