# Tiered Agent Team System — Design Document

_Started: 2026-03-14. Last updated: 2026-03-16._

---

## Overview

A dynamic, hierarchical multi-agent system for software pipelines. Teams assemble on demand, execute, then disband. Inspired by a blend of Hollywood production (dynamic assembly), consulting firms (structured deliverables, hierarchical synthesis), and two-pizza teams (small autonomous squads, clear domain ownership).

---

## Core Principles

**1. Tiers represent cognitive modes, not org chart levels.**

Each tier thinks differently — strategy, design, coordination, execution, verification. Adding a tier only makes sense if it introduces a genuinely different mode of reasoning.

**2. Depth is proportional to complexity.**

Not every task needs every tier. A config change might only need T3→T4, followed by the mandatory T5 check. A new product needs the full stack. T1 assesses scope and prescribes the path; the path is never pre-configured.

**3. Goal anchoring at every level.**

T1's original intent is embedded in every agent's context — not just passed to T2 and forgotten. Every agent knows the end goal even if it only owns a slice.

**4. Artifacts, not summaries.**

Tiers pass structured specs downward (JSON task briefs), not paraphrased prose. Meaning is preserved; format is compressed.

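
Principles 3 and 4 can be made concrete with a minimal sketch of a task brief as a Python dataclass that serialises to JSON. Only `retry_budget` and `agent_personality` are field names this document actually uses; `brief_id`, `goal_anchor`, `task`, `acceptance_criteria`, and the `child_brief` helper are hypothetical. The point is that T1's intent travels verbatim inside every brief instead of being paraphrased at each hop.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class TaskBrief:
    """Structured hand-off passed down one tier (hypothetical schema)."""

    brief_id: str
    tier: str                     # e.g. "T4"
    goal_anchor: str              # T1's original intent, copied verbatim
    task: str                     # this agent's slice of the work
    acceptance_criteria: list[str] = field(default_factory=list)
    retry_budget: int = 1
    agent_personality: str | None = None

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> TaskBrief:
        return cls(**json.loads(raw))


def child_brief(parent: TaskBrief, brief_id: str, tier: str, task: str) -> TaskBrief:
    """Derive a lower-tier brief; the goal anchor is inherited, never rewritten."""
    return TaskBrief(
        brief_id=brief_id,
        tier=tier,
        goal_anchor=parent.goal_anchor,
        task=task,
        retry_budget=parent.retry_budget,
    )
```

Because `child_brief` copies `goal_anchor` instead of accepting a new one, a lower tier cannot drift from T1's wording by accident.
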
**5. Verification is mandatory.**

T5 always runs. Nothing returns to T1 unverified. T5 is a quality gate, not an optional step: output should work, and work well, before it surfaces upward.

**6. Provider agnostic.**

The system makes no assumptions about which LLM provider or platform is in use. Tiers reference capability levels, not specific models. All external dependencies are swappable adapters.

**7. Specialist talent pool.**

Tiers define structure and responsibility. Agent personalities define domain expertise. The two are separate — the same tier can be filled by different specialists depending on the workstream domain.

---

## Tier Definitions

| Tier | Role | Owns | Capability Level |
|------|------|------|------------------|
| T1 | Visionary | Goal, constraints, dispatch plan, final acceptance | reasoning-heavy |
| T2 | Architect | System design, interface contracts, workstream boundaries | reasoning-heavy / capable |
| T3 | Squad Lead | Workstream delivery, T4 management, quality gate | capable |
| T4 | Implementer | Atomic task execution (one file, one function, one test) | fast-cheap |
| T5 | Verifier | Validation of T4 output — correctness + intent alignment | capable |

T5 runs **within T3's scope**, not above it. T3 commissions T5 verification of its T4 outputs. T5 is a quality gate, not a management layer.

Capability levels map to actual models per provider in config — the core system never references a specific model name.

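
A minimal sketch of how capability levels could resolve to concrete models, assuming a config dict shaped `{provider: {capability: model}}` with optional per-tier overrides. The provider names (`provider-a`, `provider-b`), the model strings, and the `resolve_model` helper are hypothetical placeholders, not real identifiers. Core code only ever asks for a tier; the config alone knows which model that means.

```python
from dataclasses import dataclass

# Tier -> default capability level, following the table above.
DEFAULT_TIER_CAPABILITY = {
    "T1": "reasoning-heavy",
    "T2": "reasoning-heavy",  # the table allows "capable" too; config may override
    "T3": "capable",
    "T4": "fast-cheap",
    "T5": "capable",
}


@dataclass(frozen=True)
class ModelChoice:
    provider: str
    model: str


def resolve_model(tier: str, config: dict) -> ModelChoice:
    """Map a tier to a provider/model pair using nothing but config data."""
    tier_cfg = config.get("tiers", {}).get(tier, {})
    capability = tier_cfg.get("capability", DEFAULT_TIER_CAPABILITY[tier])
    provider = tier_cfg.get("provider", config["default_provider"])
    try:
        model = config["providers"][provider][capability]
    except KeyError as exc:
        raise ValueError(f"no model configured for {provider}/{capability}") from exc
    return ModelChoice(provider, model)


# Hypothetical config: placeholder model names, T2 downgraded to "capable",
# and T4 routed to a second provider to show per-tier provider mixing.
CONFIG = {
    "default_provider": "provider-a",
    "providers": {
        "provider-a": {"reasoning-heavy": "a-large", "capable": "a-medium", "fast-cheap": "a-small"},
        "provider-b": {"fast-cheap": "b-small"},
    },
    "tiers": {"T2": {"capability": "capable"}, "T4": {"provider": "provider-b"}},
}
```
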
---

## Dispatch Model

### T1 Owns the Plan

T1 is not just a decomposer — it is the dispatch planner. Its output declares:

- **Workstreams** — the decomposed units of work
- **Tier path per workstream** — which tiers to engage (e.g. `[T2, T3, T4, T5]` or `[T4, T5]` for trivial tasks)
- **Parallelism** — which workstreams are independent and can run concurrently

T1 does not prescribe how each tier operates internally. That is the tier's own concern.

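
A minimal sketch of a dispatch-plan check, assuming each workstream carries its `tier_path` plus a hypothetical `depends_on` list from which parallelism is read. It enforces one rule taken from this document (every path ends in T5) and one assumption of this sketch (tiers in a path appear once each, in hierarchy order). The `Workstream`, `validate_plan`, and `ready_now` names are illustrative.

```python
from dataclasses import dataclass, field

TIER_ORDER = ["T1", "T2", "T3", "T4", "T5"]


@dataclass
class Workstream:
    name: str
    tier_path: list[str]
    depends_on: list[str] = field(default_factory=list)  # hypothetical field


def validate_plan(workstreams: list[Workstream]) -> list[str]:
    """Return a list of problems; an empty list means the plan is acceptable."""
    problems = []
    names = {ws.name for ws in workstreams}
    for ws in workstreams:
        path = ws.tier_path
        if not path or path[-1] != "T5":
            problems.append(f"{ws.name}: tier path must end with T5")
        if any(t not in TIER_ORDER for t in path):
            problems.append(f"{ws.name}: unknown tier in {path}")
        else:
            ranks = [TIER_ORDER.index(t) for t in path]
            # Strictly increasing ranks mean each tier appears once, top to bottom.
            if ranks != sorted(set(ranks)):
                problems.append(f"{ws.name}: tiers must be unique and ordered T1..T5")
        for dep in ws.depends_on:
            if dep not in names:
                problems.append(f"{ws.name}: unknown dependency {dep!r}")
    return problems


def ready_now(workstreams: list[Workstream]) -> list[str]:
    """Workstreams with no dependencies can start concurrently."""
    return [ws.name for ws in workstreams if not ws.depends_on]
```
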
### Each Tier Owns the Layer Below

Control flow is distributed, not centralised:

- T1 manages its T2s
- T2 manages its T3s
- **T3 manages its T4s** — including dependency graph, parallelism, and T5 commissioning
- The runner is thin: bootstrap T1, monitor the blackboard, handle final result and notifications

This means orchestration logic lives in agent prompts and output schemas — not in Python runner code. Adding a new execution pattern means updating a prompt, not the runner.

**Tradeoff:** Debugging is harder. When something fails mid-chain, you read blackboard logs rather than step through central runner code. This is a tooling problem to solve (good blackboard inspection), not a design flaw to avoid.

### Dynamic Paths

Tiers can propose path amendments mid-run (e.g. T3 discovers scope that warrants a T2 pass it didn't get). Amendments are logged to the blackboard and higher tiers are notified, but no approval is required: the notification is informational. No tier changes the plan silently.

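
A minimal sketch of the amendment rule, assuming an in-memory list stands in for the blackboard and a hypothetical `notify` callback stands in for the notification adapter. The amendment is logged before it is applied, and every tier above the proposer gets a notice with no approval round-trip. `PathAmendment` and `amend_path` are illustrative names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class PathAmendment:
    workstream: str
    proposed_by: str              # tier that amended the path, e.g. "T3"
    old_path: tuple[str, ...]
    new_path: tuple[str, ...]
    reason: str
    at: datetime


def amend_path(
    plans: dict[str, list[str]],
    log: list[PathAmendment],
    notify: Callable[[str, PathAmendment], None],
    workstream: str,
    by_tier: str,
    new_path: list[str],
    reason: str,
) -> PathAmendment:
    """Log an amendment, apply it, and inform every tier above the proposer."""
    amendment = PathAmendment(
        workstream, by_tier, tuple(plans[workstream]), tuple(new_path),
        reason, datetime.now(timezone.utc),
    )
    log.append(amendment)            # never a silent change: the log entry comes first
    plans[workstream] = list(new_path)
    for tier in ("T1", "T2", "T3", "T4"):
        if tier < by_tier:           # single-digit tier names compare in hierarchy order
            notify(tier, amendment)  # informational only; no reply is awaited
    return amendment
```
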
---

## Orchestration Patterns Per Tier

Different tiers suit different internal coordination patterns. These are baked into the runner's tier-handling logic and the tier prompts — not prescribed by T1.

| Tier | Pattern | Rationale |
|------|---------|-----------|
| T1 | Single agent | Must be authoritative; no committee |
| T2 | Group chat / round-table | Specialist architects (security, perf, data, API) debate and reach consensus before committing to a design |
| T3 | Light mesh | Peer coordination to negotiate task boundaries and avoid T4 conflicts before dispatch |
| T4 | Swarm + pipeline hybrid | Independent tasks run as swarm; dependent tasks pipeline (T4-A's output feeds T4-B). T3 declares which is which. |
| T5 | Parallel fan-out + consensus | Each T5 reviews its slice independently, then compares notes for a joint verdict — catches both artifact bugs and integration issues |

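
One way T3 could turn declared T4 dependencies into execution waves is a topological sort with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). Tasks in the same wave have no dependencies on one another and can run as a swarm; an edge such as `endpoint` depending on `serializer` is the pipeline case, where one T4's output feeds the next. The task names and the `execution_waves` helper are illustrative assumptions.

```python
from graphlib import CycleError, TopologicalSorter


def execution_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group T4 tasks into waves; tasks within a wave are mutually independent.

    `deps` maps each task to the set of tasks whose output it consumes.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if T3 declared a circular pipeline
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        waves.append(ready)
        # A real T3 would call done() as each T4 finishes; here the whole wave completes.
        sorter.done(*ready)
    return waves
```

Running the waves one after another gives the pipeline ordering, while dispatching each wave's tasks concurrently gives the swarm behaviour.
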
---

## Horizontal Scaling Within Tiers

```
T1 (1 agent — authoritative)
├── T2: Backend Architect  ─┐
├── T2: Frontend Architect  ├─ round-table consensus
└── T2: Infra Architect    ─┘
    │
    └── T3: Squad Lead (per workstream) ─┐
        │                                ├─ light mesh across T3s
        ├── T4: Worker A ─┐              │
        ├── T4: Worker B ─┼─ swarm / pipeline (T3 decides)
        ├── T4: Worker C ─┘
        │
        └── T5: Verifier(s) — fan-out + consensus
```

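
A minimal sketch of T5 fan-out and consensus, assuming each verifier is a plain function that returns a verdict and that the joint verdict passes only when every verifier passes. The unanimous rule is a choice made for this illustration; the document does not define how consensus is reached. `ThreadPoolExecutor` stands in for whatever runtime actually spawns the verifiers, and `Verdict`, `JointVerdict`, and `verify` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Verdict:
    verifier: str
    passed: bool
    notes: list[str] = field(default_factory=list)


@dataclass
class JointVerdict:
    passed: bool
    verdicts: list[Verdict]

    @property
    def findings(self) -> list[str]:
        """All notes, attributed to the verifier that raised them."""
        return [f"{v.verifier}: {n}" for v in self.verdicts for n in v.notes]


Verifier = Callable[[str], Verdict]


def verify(artifact: str, verifiers: list[Verifier]) -> JointVerdict:
    if not verifiers:
        raise ValueError("T5 always runs: at least one verifier is required")
    # Fan-out: each verifier reviews the artifact independently and concurrently.
    with ThreadPoolExecutor(max_workers=len(verifiers)) as pool:
        verdicts = list(pool.map(lambda check: check(artifact), verifiers))
    # Consensus (unanimous, by assumption): any failing verifier blocks the gate.
    return JointVerdict(passed=all(v.passed for v in verdicts), verdicts=verdicts)
```
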
---

## Shared State

For software pipelines, **the repo is the primary blackboard**:

- T4 workers commit to feature branches
- T3 leads review and merge to workstream branches
- T2 architects own integration branches
- T1 does final integration and acceptance

This is supplemented by a per-run SQLite coordination store that tracks:

- In-flight workstreams and their current execution plans
- Handoff artifacts and tier status
- Retry counts and escalation history
- Path amendments (what was proposed, by whom, and when)

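
A minimal sketch of the per-run coordination store using Python's built-in `sqlite3` module, with one table per tracked concern. All table and column names, and the `open_run_store` and `retry_count` helpers, are assumptions; the in-memory database keeps the example self-contained where a real run would open a per-run file.

```python
import sqlite3

SCHEMA = """
CREATE TABLE workstreams (
    name       TEXT PRIMARY KEY,
    tier_path  TEXT NOT NULL,               -- JSON-encoded current execution plan
    status     TEXT NOT NULL DEFAULT 'pending'
);
CREATE TABLE handoffs (
    id         INTEGER PRIMARY KEY,
    workstream TEXT NOT NULL REFERENCES workstreams(name),
    tier       TEXT NOT NULL,
    artifact   TEXT NOT NULL                -- serialized task brief or result
);
CREATE TABLE attempts (
    id         INTEGER PRIMARY KEY,
    workstream TEXT NOT NULL REFERENCES workstreams(name),
    tier       TEXT NOT NULL,
    kind       TEXT NOT NULL CHECK (kind IN ('retry', 'escalation')),
    reason     TEXT NOT NULL
);
CREATE TABLE path_amendments (
    id          INTEGER PRIMARY KEY,
    workstream  TEXT NOT NULL REFERENCES workstreams(name),
    proposed_by TEXT NOT NULL,
    new_path    TEXT NOT NULL,
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def open_run_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def retry_count(conn: sqlite3.Connection, workstream: str, tier: str) -> int:
    """Retries used so far, derived from the attempt history."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM attempts WHERE workstream = ? AND tier = ? AND kind = 'retry'",
        (workstream, tier),
    ).fetchone()
    return count
```
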
---

## Failure Handling

| Failure | Handler | Action |
|---------|---------|--------|
| T4 bad output | T3 | Retry T4 with a corrected brief (up to `retry_budget`) |
| T4 blocked | T3 | Escalate immediately — no retries |
| T4 partial output | T3 | Salvage good parts, re-task remainder |
| T3 workstream stuck | T2 | Re-scope or split the workstream |
| T2 design wrong | T1 | Re-plan; may discard workstream and restart |
| Repeated escalation | User | Surface to the user; block until a human unblocks |

Retry limits prevent infinite loops. The escalation path is always upward, never sideways.

T1 sets a retry budget multiplier during scope assessment (`1x` simple, `2x` complex). The retry budget is a field on the task brief — not hardcoded in the runner.

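
A minimal sketch of the failure table as data, plus the retry rule. The base budget of 2 retries, the threshold of 3 escalations for the "repeated escalation" row, and the choice to treat an exhausted retry budget as a stuck workstream handed to T2 are all assumptions of this sketch, as are the `Failure`, `route`, and `retry_budget` names. What it illustrates from the text is that routing only moves upward and that the budget is computed once by T1 and carried on the brief.

```python
from dataclasses import dataclass
from enum import Enum


class Failure(Enum):
    BAD_OUTPUT = "T4 bad output"
    BLOCKED = "T4 blocked"
    PARTIAL = "T4 partial output"
    STUCK = "T3 workstream stuck"
    DESIGN_WRONG = "T2 design wrong"


# Failure -> (handler, action), mirroring the table above.
ROUTES = {
    Failure.BAD_OUTPUT: ("T3", "retry with corrected brief"),
    Failure.BLOCKED: ("T3", "escalate immediately"),
    Failure.PARTIAL: ("T3", "salvage and re-task remainder"),
    Failure.STUCK: ("T2", "re-scope or split workstream"),
    Failure.DESIGN_WRONG: ("T1", "re-plan"),
}

BASE_RETRIES = 2      # hypothetical base budget before T1's multiplier
MAX_ESCALATIONS = 3   # hypothetical threshold for "repeated escalation"


def retry_budget(multiplier: int) -> int:
    """T1 picks the multiplier (1 simple, 2 complex); the result goes on the brief."""
    return BASE_RETRIES * multiplier


@dataclass
class Decision:
    handler: str
    action: str


def route(failure: Failure, retries_used: int, budget: int, escalations: int) -> Decision:
    if escalations >= MAX_ESCALATIONS:
        return Decision("user", "block until a human unblocks")
    if failure is Failure.BAD_OUTPUT and retries_used >= budget:
        # Assumption: an exhausted budget is treated as a stuck workstream (one level up).
        return Decision(*ROUTES[Failure.STUCK])
    return Decision(*ROUTES[failure])
```
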
---

## Agent Talent Pool

The system builds on [agency-agents](https://github.com/msitarzewski/agency-agents) — a library of 50+ pre-built specialist personalities, each with deep domain expertise, quality standards, and specific deliverables.

**Division of responsibility:**

- Our system provides: orchestration, tier structure, task briefs, retries, verification gates, shared state
- Agency-agents provides: the specialist knowledge each agent brings to its role

T1 selects the right specialist from the roster when building workstream briefs. The specialist's personality is injected as the system prompt at spawn time.

**Default tier-to-specialist mapping for software pipelines:**

| Tier | Domain | Agent |
|------|--------|-------|
| T1 | Strategy | nexus-strategy |
| T2 | Backend | software-architect |
| T2 | Infra | devops-automator |
| T2 | Data | data-engineer |
| T3 | Backend | senior-developer |
| T3 | Reliability | sre |
| T4 | Frontend | frontend-developer |
| T4 | Backend | backend-architect |
| T4 | Database | database-optimizer |
| T4 | DevOps | devops-automator |
| T4 | Mobile | mobile-app-builder |
| T4 | AI/ML | ai-engineer |
| T4 | Security | security-engineer |
| T4 | Docs | technical-writer |
| T5 | Code review | code-reviewer |
| T5 | Integration | testing-reality-checker |
| T5 | API | testing-api-tester |
| T5 | Performance | testing-performance-benchmarker |
| T5 | Security | security-engineer |

The roster is not fixed — T1 can select any agent from the library based on workstream needs.

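
A minimal sketch of specialist selection and personality injection, assuming `config/role_registry.yaml` has already been parsed into a dict keyed by `(tier, domain)`. The key shape, the `select_agent` and `build_system_prompt` helpers, and the `load_personality` callback are hypothetical; the callback hides how personality files are laid out in the `agents/` submodule, which this sketch does not assume.

```python
from __future__ import annotations

from typing import Callable

# A subset of the default mapping above, keyed by (tier, domain).
DEFAULT_REGISTRY: dict[tuple[str, str], str] = {
    ("T1", "Strategy"): "nexus-strategy",
    ("T2", "Backend"): "software-architect",
    ("T3", "Backend"): "senior-developer",
    ("T4", "Frontend"): "frontend-developer",
    ("T4", "Security"): "security-engineer",
    ("T5", "Code review"): "code-reviewer",
    ("T5", "Security"): "security-engineer",
}


def select_agent(tier: str, domain: str, registry: dict[tuple[str, str], str],
                 override: str | None = None) -> str:
    """T1 may pick any agent from the library; otherwise use the default mapping."""
    if override:
        return override
    try:
        return registry[(tier, domain)]
    except KeyError:
        raise LookupError(f"no default specialist for {tier}/{domain}") from None


def build_system_prompt(agent: str, load_personality: Callable[[str], str],
                        tier_instructions: str) -> str:
    """Personality supplies domain expertise; tier instructions supply structure."""
    return f"{load_personality(agent)}\n\n{tier_instructions}"
```

Keeping the personality and the tier instructions as separate inputs mirrors principle 7: the same tier prompt can be paired with any specialist.
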
---

## Adapter Layers

Everything external is a swappable adapter. Core logic never imports from adapters directly — always through an interface.

```
Core (platform-agnostic)
├── team_runner   — thin bootstrap: spawn T1, monitor blackboard, handle result
├── blackboard    — SQLite coordination state
├── task_brief    — schema + validation
└── escalation    — retry logic, failure routing

Adapters (swappable)
├── llm/          — anthropic (now), openai, ollama, any API
├── notify/       — openclaw (now), slack, email, webhook...
├── vcs/          — github (now), gitlab, gitea, bare git...
└── runtime/
    ├── standard      — openclaw sessions_spawn (T1/T2/T3)
    └── coding_agent  — claude_code (T4/T5 default), codex, aider...
```

Swapping providers means writing a new adapter file — nothing in core changes.

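
A minimal sketch of the adapter boundary using `typing.Protocol`: core code depends only on the interface, and a concrete adapter is looked up by name. The `LLMAdapter.complete` signature, the `ADAPTERS` registry, `load_adapter`, `run_task`, and the `EchoAdapter` test double are hypothetical; a real Anthropic or OpenAI adapter would wrap that vendor's SDK behind the same method.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class LLMAdapter(Protocol):
    """The only LLM surface core code is allowed to see."""

    def complete(self, system: str, prompt: str, model: str) -> str: ...


class EchoAdapter:
    """Test double standing in for a real provider adapter."""

    def complete(self, system: str, prompt: str, model: str) -> str:
        return f"[{model}] {prompt}"


# Adapter registry: adding a provider means adding an entry, not editing core.
ADAPTERS: dict[str, type] = {"echo": EchoAdapter}


def load_adapter(name: str) -> LLMAdapter:
    adapter = ADAPTERS[name]()
    # runtime_checkable only verifies that the method exists, not its signature.
    if not isinstance(adapter, LLMAdapter):
        raise TypeError(f"{name} does not implement LLMAdapter")
    return adapter


def run_task(llm: LLMAdapter, brief_text: str, model: str) -> str:
    # Core logic: knows nothing about which provider sits behind `llm`.
    return llm.complete(system="You are a T4 implementer.", prompt=brief_text, model=model)
```
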
T4 and T5 default to the **coding agent runtime** when it is available. If no coding agent runtime is configured, they fall back gracefully to the standard runtime.

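
A minimal sketch of runtime selection with fallback. Only the `native_teams` flag and its `false` default come from the decisions log; the `coding_agent` and `standard_runtime` config keys, the `RuntimeChoice` type, and the `select_runtime` helper are hypothetical. T1 to T3 always get the standard runtime, and T4 and T5 get the coding agent runtime only when one is configured.

```python
from dataclasses import dataclass

CODING_TIERS = {"T4", "T5"}


@dataclass(frozen=True)
class RuntimeChoice:
    kind: str                  # "standard" or "coding_agent"
    backend: str               # e.g. "openclaw" or "claude_code"
    native_teams: bool = False


def select_runtime(tier: str, config: dict) -> RuntimeChoice:
    standard = RuntimeChoice("standard", config.get("standard_runtime", "openclaw"))
    if tier not in CODING_TIERS:
        return standard
    coding = config.get("coding_agent")
    if not coding:
        # Not configured: fall back quietly instead of failing the run.
        return standard
    return RuntimeChoice(
        "coding_agent",
        coding["backend"],
        native_teams=bool(coding.get("native_teams", False)),  # opt-in, default off
    )
```
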
---

## Decisions Log

**T1 dynamic dispatch** — T1 assesses scope and prescribes tier path and workstream parallelism. It does not prescribe internal tier coordination patterns.

**Distributed ownership** — Each tier owns the layer below it. Runner is thin. Tradeoff: distributed control makes the system extensible but debugging requires good blackboard tooling, not central runner traces.

**T5 always mandatory** — No skipping verification. Things should work and work well before surfacing to T1.

**T3 owns T4 and T5** — T3 manages its T4s (dependency graph, swarm vs pipeline, parallelism) and commissions T5 verification of T4 outputs. Runner does not orchestrate T4/T5 centrally.

**Orchestration patterns** — Baked into tier prompts and runner tier-handling logic, not prescribed by T1. T2: round-table. T3: light mesh. T4: swarm+pipeline. T5: fan-out+consensus.

**Output / review** — Nothing merges to main without Andrew's explicit approval. T1 opens a PR and surfaces it to Andrew. Notification is dual: Hans messages Andrew directly, and the PR is opened on the VCS. Merge is gated on human sign-off.

**Platform agnosticism** — Core is provider and platform agnostic. Capability levels (`reasoning-heavy`, `capable`, `fast-cheap`) map to models in config. Mixing providers across tiers is supported.

**LLM provider** — Anthropic first implementation. Config supports per-tier provider selection.

**Gateway modification** — Decided against. Agent-teams stays standalone Python. OpenClaw used via runtime adapter only.

**Coding agent runtime** — Claude Code is default T4/T5 runtime. Opt-in `native_teams` flag available for internal Claude Code parallelism — faster but less blackboard visibility. Default `false`.

**Agency-agents integration** — Via git submodule at `agents/`. T1 selects specialists via `config/role_registry.yaml`. `agent_personality` field on task brief; runtime injects as system prompt at spawn time.