Multi-Agent Auto-Looping Orchestration System

Building a multi-agent, auto-looping orchestration system in OpenCode

(Original source: https://intrepidis.neocities.org/blog/post/mixture-of-agents)

How to combine multiple AI agents, running multiple models (multi-model, not necessarily multi-modal) even from different providers, into one collaborative, self-looping coding pipeline.


The problem

Currently, there isn’t a single model that does everything well. Gemini Flash is fast and cheap at code generation but shakier on deep reasoning. Claude Opus nails architecture but is expensive for common chores. DeepSeek V4 Pro sits in a sweet spot, though its performance varies by task type.

I wanted a code harness where I could issue a high-level prompt (like “design and implement a plugin system”) and have it automatically execute a structured, multi-stage workflow. Instead of a single model cannoning off in an unforeseen direction, I wanted an autonomous, self-correcting handoff chain. A specialized architect starts with the technical constraints, a dedicated builder implements the code, and an independent reviewer catches edge-case mistakes, all managed by a single orchestrator.

OpenCode’s agent system made that surprisingly achievable.


What I built

The multi-agent orchestration pipeline:

| Agent | Model options | Role |
| --- | --- | --- |
| manage | Qwen 3.7 Plus, DeepSeek V4 Pro, Gemini 3.5 Pro, Claude Sonnet 4.8 | Orchestrator: routes tasks, never does real work |
| plan | DeepSeek V4 Pro, GLM 5.2, Claude Opus 4.8 | Architectural planner |
| build | DeepSeek V4 Flash, MiniMax M3, Gemini 3.5 Pro | Code generator & implementer |
| review | DeepSeek V4 Pro, GLM 5.2, Claude Opus 4.8 | Code reviewer & verifier |
| survey | DeepSeek V4 Flash, MiniMax M3, Gemini 3.5 Pro | Research & discovery: codebase investigation, pattern analysis |
| refactor | DeepSeek V4 Pro, GLM 5.2, Claude Opus 4.8 | Large-scale restructuring: renames, interface extraction |
| test | DeepSeek V4 Flash, MiniMax M3, Gemini 3.5 Flash | Test generation, coverage gap analysis |
| document | DeepSeek V4 Flash, MiniMax M3, Gemini 3.5 Flash | READMEs, API docs, architecture summaries |

Each agent type has hidden _alt fallbacks on a different provider for resilience. Agents can also load bundled skills on demand to add task-specific behaviors. If DeepSeek V4 Pro becomes unavailable for the “plan” agent (rate limits, server downtime, etc.), it shifts to GLM 5.2. The same applies to the other agent types, spreading risk across providers instead of routing every fallback to a single one. The manage agent handles all of this transparently: you never see an error unless both the primary and the fallback fail.
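A quick way to keep the fallback roster honest is to check it mechanically. The sketch below is a minimal Python illustration, not part of OpenCode: it assumes each agent can be summarized as a (provider, model) pair, and apart from plan → GLM 5.2 the primary/_alt pairings are hypothetical. It flags any primary without an _alt, any _alt on the same provider as its primary, and the case where every fallback lands on one provider.

```python
# Hypothetical roster: agent name -> (provider, model). Only plan -> GLM 5.2
# comes from the article; the other pairings are illustrative.
ROSTER = {
    "plan": ("deepseek", "DeepSeek V4 Pro"),
    "plan_alt": ("zhipu", "GLM 5.2"),
    "build": ("deepseek", "DeepSeek V4 Flash"),
    "build_alt": ("minimax", "MiniMax M3"),
    "review": ("deepseek", "DeepSeek V4 Pro"),
    "review_alt": ("anthropic", "Claude Opus 4.8"),
}


def check_fallbacks(roster):
    """Return human-readable problems with the primary/_alt pairings."""
    problems = []
    alt_providers = []
    for name, (provider, _model) in roster.items():
        if name.endswith("_alt"):
            continue  # only inspect primaries
        alt = roster.get(name + "_alt")
        if alt is None:
            problems.append(f"{name}: no {name}_alt fallback")
            continue
        if alt[0] == provider:
            problems.append(f"{name}: fallback shares provider '{provider}'")
        alt_providers.append(alt[0])
    # Every fallback on one provider means a single outage takes out all of them.
    if len(alt_providers) > 1 and len(set(alt_providers)) == 1:
        problems.append(f"all fallbacks are single-sourced to provider '{alt_providers[0]}'")
    return problems
```

Running check_fallbacks(ROSTER) on the roster above returns an empty list; swap plan_alt onto DeepSeek and it reports the shared provider.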


Key architectural decisions

Pre-delegation vetting (the grill)

Before the manage agent delegates anything, it checks whether the prompt is specific enough to act on. This catches “make it better” prompts before they waste compute. Instead of guessing and shipping the wrong implementation, the manage agent asks the user to nail down scope, constraints, and success criteria upfront.
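The vetting itself is the manage agent’s judgment, driven by its prompt. To make the checklist concrete, here is a hypothetical Python sketch: the TaskBrief fields and the question wording are illustrative, not OpenCode features. Any missing scope, constraints, or success criteria turns into a clarifying question instead of a delegation.

```python
from dataclasses import dataclass


@dataclass
class TaskBrief:
    """Hypothetical shape of a request after vetting (field names are illustrative)."""
    goal: str
    scope: str = ""
    constraints: str = ""
    success_criteria: str = ""


# One clarifying question per field the orchestrator needs before delegating.
CLARIFYING_QUESTIONS = {
    "scope": "Which parts of the codebase are in scope?",
    "constraints": "What constraints apply (dependencies, compatibility, performance)?",
    "success_criteria": "How will we know the task is done?",
}


def grill(brief: TaskBrief) -> list[str]:
    """Return a question for each missing field; an empty list means ready to delegate."""
    return [
        question
        for field_name, question in CLARIFYING_QUESTIONS.items()
        if not getattr(brief, field_name).strip()
    ]
```

A bare “make it better” produces all three questions, while a brief with scope, constraints, and success criteria filled in passes straight through.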

The plan-implement gate

This is the most important rule in the system: plan first, build second, keep the human in the loop, but limit their involvement as much as possible.

The manage agent enforces this strictly. Ask for a plan, and it delegates to the plan subagent, gets back a structured document with context, design decisions, implementation steps, and edge cases, then presents it to you. It waits. No automatic handoff to build. You read the plan, you approve, only then does implementation begin.

This kills the “oh no, it already started writing code I don’t agree with” problem you might get in single-agent setups. Ideally, the plan agent runs on your most capable model, which should be best at weighing trade-offs before a single line of code gets written.
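The gate is enforced by prompt instructions rather than code, but its rule is easy to state as a tiny state machine. This Python sketch is a hypothetical model (PlanGate, Stage, and the method names are invented for illustration): asking for a build prompt raises an error until a plan has been presented and explicitly approved.

```python
from enum import Enum, auto


class Stage(Enum):
    IDLE = auto()
    AWAITING_APPROVAL = auto()
    APPROVED = auto()


class PlanGate:
    """Hypothetical model of the plan-implement gate (names are illustrative)."""

    def __init__(self) -> None:
        self.stage = Stage.IDLE
        self.plan = ""

    def receive_plan(self, plan: str) -> str:
        # The plan subagent has reported back: show the plan, then stop and wait.
        self.plan = plan
        self.stage = Stage.AWAITING_APPROVAL
        return f"Proposed plan:\n{plan}\n\nApprove to start implementation."

    def approve(self) -> None:
        # Only a human approval moves the gate forward.
        if self.stage is not Stage.AWAITING_APPROVAL:
            raise RuntimeError("there is no plan waiting for approval")
        self.stage = Stage.APPROVED

    def build_prompt(self) -> str:
        # No automatic handoff: building before approval is refused outright.
        if self.stage is not Stage.APPROVED:
            raise PermissionError("plan not approved; build delegation blocked")
        return f"Implement this approved plan:\n{self.plan}"
```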

Post-execution review

Every plan->build cycle auto-triggers a review pass before results land in front of you. The review agent checks what got built: correctness, security, performance, and adherence to the original plan.

If the reviewer finds issues, manage routes fixes back to the build agent, then re-reviews. The cycle loops until the review passes or you explicitly override.

This catches a lot: missing error handling, insecure patterns, code that drifts from the agreed architecture. It’s like having a senior engineer glance at every PR before it hits your inbox.
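The review cycle can be sketched as a loop. In this hypothetical Python version, build and review are plain callables standing in for delegations to the subagents, and max_rounds is an assumed safety cap so a stubborn disagreement comes back to you instead of looping forever; the exit conditions described above are a passing review or your explicit override.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Review:
    passed: bool
    issues: list[str] = field(default_factory=list)


def build_review_loop(
    build: Callable[[str], str],
    review: Callable[[str], Review],
    task: str,
    max_rounds: int = 5,
) -> tuple[str, Review, int]:
    """Build, review, and route findings back to build until the review passes.

    Returns the final result, the last review, and how many reviews ran.
    A review still failing after max_rounds goes back to the user to fix or override.
    """
    result = build(task)
    verdict = review(result)
    rounds = 1
    while not verdict.passed and rounds < max_rounds:
        findings = "\n".join(f"- {issue}" for issue in verdict.issues)
        result = build(f"{task}\n\nFix these review findings:\n{findings}")
        verdict = review(result)
        rounds += 1
    return result, verdict, rounds
```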

Permission hardening

Each agent has carefully scoped permissions, one of the trickier parts to get right.

Manage has edit: deny and bash: deny. It orchestrates but never touches files or runs commands. It’s the only agent with question: allow, since subagents can’t ask questions directly (more on that later).

Plan gets a broad read-only bash allowlist. It can inspect the codebase (ls, cat, find, grep, rg, plus read-only git commands) but nothing mutating. Review gets a narrower git-only allowlist (git diff, git log, git show, git status, git blame). It only needs to examine diffs and blame history. The catch-all "*": "deny" at the bottom stops any slippery command from sneaking through.

Build has a permissive "*": "allow" with a deny list for destructive commands. It can edit files and run any command except the explicitly blocked ones:

"*": "allow",
"rm -rf *": "deny",
"rm -fr *": "deny",
"rm -r *": "deny",
"git push --force*": "deny",
"git push -f*": "deny",
"git reset --hard*": "deny",
"sudo *": "deny",
// ...

The "*": "allow" default means build works freely for any command not in the deny list. The deny list catches the genuinely dangerous operations: recursive deletes, force pushes, destructive git operations, and system commands. It’s a simpler model than allowlisting approved commands: everything is allowed unless explicitly denied.

No task tool for subagents

Every subagent has "tools": { "task": false }. This is critical: it stops subagents from spawning their own sub-subagents, so only manage can delegate work. Skip it and you risk runaway delegation chains and confusing behavior.
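You can lint for this before it bites. The sketch assumes opencode.json is plain JSON (no comments) with agents under a top-level "agent" key, and it treats every agent except manage as a subagent; the CONFIG_TEXT excerpt is hypothetical and trimmed to the one field being checked.

```python
import json

# Hypothetical, trimmed opencode.json excerpt: only the field the check reads.
CONFIG_TEXT = """
{
  "agent": {
    "manage": {},
    "plan":   {"tools": {"task": false}},
    "build":  {"tools": {"task": false}},
    "review": {}
  }
}
"""


def agents_that_can_delegate(config: dict, orchestrator: str = "manage") -> list[str]:
    """Return non-orchestrator agents that do not explicitly set tools.task to false."""
    offenders = []
    for name, agent in config.get("agent", {}).items():
        if name == orchestrator:
            continue  # manage is the one agent allowed to delegate
        if agent.get("tools", {}).get("task") is not False:
            offenders.append(name)
    return sorted(offenders)


config = json.loads(CONFIG_TEXT)
print(agents_that_can_delegate(config))  # ['review']: it never disables the task tool
```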

Fallback model strategy

The manage agent’s instructions mandate automatic fallback on delegation failure:

If the “plan” sub-agent fails, you MUST immediately retry using “plan_alt”. When a primary sub-agent fails but returns partial output, include that partial output in the prompt to the fallback _alt agent so it can continue from where the primary left off. Do not report the failure to the user unless “plan_alt” also fails.

Same goes for build and review, and if desired, you could include multiple _alt agents. The hidden _alt agents mirror the primary config but run different models. You get a log line noting which model ultimately handled it, but no interruption to your workflow.

A later refinement: when a primary agent fails but returns partial output, manage feeds that partial output into the prompt for the _alt fallback agent. The fallback then continues from where the primary left off instead of starting over. This matters for long-running plans that hit token limits near the finish line.
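Here is the fallback rule, including the partial-output handoff, as a hypothetical Python sketch. delegate(agent, prompt) and DelegationError stand in for however the orchestrator invokes a subagent and learns that it failed; neither is an OpenCode API. A failure of the _alt agent propagates, which is the point where the user should hear about it.

```python
class DelegationError(Exception):
    """Hypothetical failure signal from a subagent, optionally carrying partial output."""

    def __init__(self, message: str, partial_output: str = ""):
        super().__init__(message)
        self.partial_output = partial_output


def delegate_with_fallback(delegate, agent: str, prompt: str, log=print) -> str:
    """Try `agent`; on failure retry `agent + "_alt"`, forwarding any partial output."""
    try:
        return delegate(agent, prompt)
    except DelegationError as err:
        alt = agent + "_alt"
        alt_prompt = prompt
        if err.partial_output:
            # Let the fallback continue where the primary stopped instead of starting over.
            alt_prompt += (
                f"\n\nThe {agent} agent failed partway through. "
                f"Continue from where it left off:\n{err.partial_output}"
            )
        result = delegate(alt, alt_prompt)  # a second failure propagates to the user
        log(f"[fallback] {agent} failed; {alt} handled the task")
        return result
```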

Question relay

Subagents can’t call AskUserQuestion directly; only the primary agent can. Instead, they embed questions in the last paragraph of their response. Manage catches these, presents them to you, waits for your answer, and feeds it back to the subagent.

So a plan agent can say “I see two approaches for this API: REST or GraphQL. Which direction should I design for?” and you get a clean question-answer round-trip, no special ceremony needed.
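Detecting a relayed question can be as simple as inspecting the final paragraph. This Python sketch assumes a question paragraph is the last non-empty paragraph and ends with “?”, which is a heuristic rather than documented OpenCode behavior; ask_user and resume in relay() are hypothetical hooks for prompting you and continuing the subagent.

```python
import re
from typing import Callable, Optional


def extract_relay_question(response: str) -> Optional[str]:
    """Return the final paragraph if it looks like a question for the user."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", response) if p.strip()]
    if paragraphs and paragraphs[-1].endswith("?"):
        return paragraphs[-1]
    return None


def relay(response: str, ask_user: Callable[[str], str], resume: Callable[[str], str]) -> str:
    """If the subagent asked something, get the user's answer and feed it back."""
    question = extract_relay_question(response)
    if question is None:
        return response  # nothing to relay; hand the result straight back
    answer = ask_user(question)
    return resume(f"The user answered your question: {answer}")
```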


Specialized subagents

The original three subagents (plan, build, review) handled everything, but as tasks grew more complex, the build agent became a bottleneck. It was a generalist, expected to write code, refactor it, test it, document it, and still keep up. The context-switching cost was real: a build session that started with a cross-file rename and ended with test generation would lose coherence by the final step.

So I expanded the roster with four new specialized agents (survey, refactor, test, document). Each has an _alt fallback.

The routing logic

The manage agent decides which agent to delegate to based on task classification:

  • “how does X work?” or “find all usages of Y” → survey
  • “rename Z across the codebase” or “extract interface” → refactor
  • “write tests for module A” or “check coverage” → test
  • “document this API” or “update the README” → document
  • “design a system” → plan
  • “implement feature X” → build
  • “review this PR” → review

This classification isn’t hardcoded. It lives in the manage agent’s prompt as routing rules. The manage agent reads the task description and picks the best fit. If it picks wrong, you can redirect it, and later delegations in the same session take the correction into account.
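Since the real routing is an LLM reading prompt rules, the mapping is fuzzy by design. As a deterministic stand-in, here is a keyword router in Python built from the example phrasings above; the trigger lists and the check order (plan before build, so “design and implement” goes to planning first) are assumptions for illustration. Returning None models the case where manage should ask you to clarify.

```python
# (agent, trigger phrases), checked in order; the first hit wins. Hypothetical keywords.
ROUTES = [
    ("survey", ("how does", "find all usages")),
    ("refactor", ("rename", "extract interface")),
    ("test", ("write tests", "check coverage")),
    ("document", ("document", "readme")),
    ("plan", ("design",)),
    ("review", ("review",)),
    ("build", ("implement",)),
]


def route(task: str):
    """Return the subagent name for a task, or None when nothing matches."""
    text = task.lower()
    for agent, triggers in ROUTES:
        if any(trigger in text for trigger in triggers):
            return agent
    return None  # no confident match: ask the user to clarify instead of guessing
```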

The “never delegate” constraint

With 7 subagents, there was a risk of runaway delegation chains: a build agent deciding to spawn a sub-subagent for a subtask, which spawns another, and so on. Every subagent prompt now includes a hard constraint:

You must NEVER delegate to other sub-agents. Only the manage agent can delegate tasks. If you need help from another agent, include that request in your response output and let the manage agent decide what to delegate next.

Combined with "tools": { "task": false } in every subagent config, this makes the delegation graph strictly tree-shaped: manage branches to subagents, subagents report back. No recursion.


Permission schema nuances

The question permission only makes sense on the primary agent. Subagents can’t have it. They need the relay pattern instead. The docs don’t make this obvious: add "question": "allow" to a subagent and it simply won’t behave the way you expect.

The fix: give question: allow only to manage, and use the question-relay pattern (last-paragraph detection) for subagents.

Bash glob matching semantics

The permission system uses glob patterns for bash rules. Powerful, but with subtle gotchas:

  • "rm -rf *" matches rm -rf node_modules and rm -rf with any other arguments. The trailing * matters.
  • "git push --force*" catches --force, --force-with-lease, etc., but not the short -f form (which needs its own rule).
  • Order matters: rules evaluate top-to-bottom, first match wins. Put deny rules before the "*": "allow" or "*": "ask" catch-all.

I spent a while figuring out why rm -rf kept getting allowed before realizing my deny rule was ordered wrong relative to the catch-all.
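To see why ordering bites, here is a small Python model of the evaluation described above, using fnmatch’s shell-style globbing (where * also matches spaces). First-match-wins is taken from this article’s description rather than from OpenCode’s source, so confirm the semantics for your version before relying on rule order. The rules are a subset of build’s deny list.

```python
from fnmatch import fnmatchcase


def decide(rules, command, default="ask"):
    """Return the action of the first rule whose glob matches `command` (assumed first-match-wins)."""
    for pattern, action in rules:
        if fnmatchcase(command, pattern):
            return action
    return default


DENY_FIRST = [
    ("rm -rf *", "deny"),
    ("git push --force*", "deny"),  # catches --force and --force-with-lease
    ("git push -f*", "deny"),       # the short form needs its own rule
    ("*", "allow"),                 # catch-all last
]
# The same rules with the catch-all moved to the top: the ordering bug described above.
ALLOW_FIRST = [DENY_FIRST[-1]] + DENY_FIRST[:-1]
```

One more gotcha worth knowing: globs match the literal command string, not parsed flags, so with these rules decide(DENY_FIRST, "rm -r -f build") still returns "allow".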

The todo list evolution

The To Do.md that tracked this build tells its own story:

Initial questions (the “Done” section, reading bottom-up):

  1. “Review each agent’s permissions”: started permissive, locked down iteratively.

  2. “Use Sonnet for review”: model selection matters per role.

  3. “Why did it implement the plan without asking?”: led to the Plan-Implement Gate.

  4. “Why didn’t it use the build_alt agent without prompting?”: led to mandatory automatic fallback.

Context pollution and model hallucination

One frustrating bug I hit had nothing to do with configuration or permissions. It was a silent context pollution problem.

The symptom: The manage agent kept producing “unknown tool” errors with empty {} input. Three times in a row, it tried to call a non-existent tool with no arguments. The task it was supposed to delegate simply never happened.

The root cause: Every agent session loads a system prompt file (AGENTS.md, CLAUDE.md, GEMINI.md, etc.) from the project root as injected context. One project had a 147-line system prompt file, written as a detailed, framework-specific guide filled with instructions about coding style, architecture rules, dependency management, and testing conventions. Every time the manage agent started a session, it ingested all 147 lines as part of its context window. The model’s reasoning degraded under the bloat, producing malformed JSON tool calls.

The fix:

  1. Move the file. AGENTS.md was relocated to Docs/LLM guide.md, outside the root-level auto-injection path.
  2. Rewrite it. The new version is 34 lines, framework-agnostic, and purely project-domain reference. It documents architectural conventions without duplicating what agent prompts already cover.

The lesson: Project-level instruction files get injected into the context of every agent session, including the orchestrator. If you’re running a multi-agent system, keep those files ruthlessly terse and framework-agnostic. Detailed project conventions belong in project documentation, not in files that feed directly into model context windows.
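A cheap guard is to measure those files before they become a problem. This Python sketch reports root-level instruction files longer than a line budget. The 40-line default is an arbitrary threshold chosen to sit between the 34-line rewrite and the 147-line original, and the file list is taken from the names above, so adjust both to whatever your harness actually injects.

```python
from pathlib import Path

# Instruction files assumed to be auto-injected from the project root.
INSTRUCTION_FILES = ("AGENTS.md", "CLAUDE.md", "GEMINI.md")


def oversized_instruction_files(project_root, max_lines=40):
    """Return (name, line_count) for root-level instruction files over max_lines."""
    root = Path(project_root)
    report = []
    for name in INSTRUCTION_FILES:
        path = root / name
        if path.is_file():
            count = len(path.read_text(encoding="utf-8").splitlines())
            if count > max_lines:
                report.append((name, count))
    return report
```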


How to replicate this setup

Here are the steps, exactly as I followed them.

1. Install OpenCode

curl -fsSL https://opencode.ai/install.sh | sh

Verify it works:

opencode --version
# Should show 1.x.x or similar

2. Set up provider API keys

You’ll need API keys for every provider your agents use. Create the OpenCode config directory if it doesn’t exist yet:

mkdir -p ~/.config/opencode

Set environment variables or use OpenCode’s provider configuration:

# For OpenRouter (used by all agents: manage, plan, build, review, survey, refactor, test, document)
export OPENROUTER_API_KEY="sk-or-v1-..."

# For Google AI Studio (configured as a provider, available for future agent use)
export GEMINI_API_KEY="AIzaSy..."

# Other providers...

3. Create the agent instruction files

These markdown files define each agent’s behavior. Here’s what I use:

~/.config/opencode/manage.md: Orchestrator prompt. Key contents:
  • Subagent delegation rules (plan vs build vs review).
  • Plan-Implement Gate logic (present plan, wait for approval).
  • Post-Execution Review logic (auto-trigger review after build).
  • Automatic fallback rules (retry with _alt on failure).
  • Question relay instructions (detect subagent questions, relay to user).
  • Forbidden from using bash directly.

~/.config/opencode/plan.md: Architectural planner prompt:
  • Analyze requirements, design system architecture.
  • Output structured plans (context, design decisions, steps, edge cases).
  • No implementation code unless asked.
  • Question relay pattern documented.

~/.config/opencode/build.md: Code generator prompt:
  • Implement following project conventions.
  • Write tests and documentation.
  • Run verification (tests, lint, type-check).
  • Ask rather than guess when unclear.
  • Question relay pattern documented.

~/.config/opencode/review.md: Code reviewer prompt:
  • Strictly read-only. Never modifies files.
  • Review for correctness, security, performance, maintainability.
  • Output structured findings by severity (Critical, High, Medium, Low, Nits).
  • Use only read-only git commands.
  • Question relay pattern documented.

~/.config/opencode/survey.md: Research & discovery agent prompt:
  • Investigate codebases, map dependencies, answer “how does X work?”
  • Find patterns, identify anti-patterns, locate relevant code.
  • Read-only. Never modifies files.
  • Question relay pattern documented.

~/.config/opencode/refactor.md: Large-scale refactoring agent prompt:
  • Rename symbols, extract interfaces, restructure across 5+ files.
  • Maintain behavioral equivalence. Refactor, don’t rewrite.
  • Run tests after refactoring to verify nothing broke.
  • Question relay pattern documented.

~/.config/opencode/test.md: Test generation & coverage agent prompt:
  • Write tests following project conventions (framework, style, location).
  • Analyze coverage gaps and generate fixtures.
  • Run tests after writing to verify they pass.
  • Question relay pattern documented.

~/.config/opencode/document.md: Documentation agent prompt:
  • Generate READMEs, API documentation, architecture summaries.
  • Update existing documents to reflect code changes.
  • Read source code to verify every claim. Never document from memory.
  • Question relay pattern documented.

4. Configure opencode.json

See the attached file for the full configuration (~/.config/opencode/opencode.json). After making the change, run opencode debug config to catch validation warnings or bracket syntax errors.

5. Test the workflow

Start OpenCode by running opencode, then try a prompt such as “I need a plugin system”.


Lessons learned

Start too restrictive, then loosen up. Adding "rg *": "allow" after an agent reports it can’t search beats discovering later that a build agent ran rm -rf on your entire project.

The prompt files matter enormously. Each agent needs to know not just what to do, but how to talk to the other agents. Document the question relay pattern in every subagent’s prompt. They need to know they can ask questions, and how to format them.

Test the fallback paths. Providers have outages and rate limits you won’t expect. Run through each _alt path deliberately to confirm it works. You can ask the “manage” agent to specifically use the “plan_alt” subagent, for example.

Specialization pays off fast. A generalist build agent works fine until you start asking for more complex jobs. The context-switching cost shows in the output. Splitting work across specialized subagents with focused prompts (survey, refactor, test, document) produces better results per domain. Each agent manages a narrower context window, and its prompts are targeted at a single job.


If you build something similar or improve on this design, please share it. These patterns are still new, and it would be good to see what other people come up with.
