docs: update design — dynamic dispatch, distributed ownership, orchestration patterns

2026-03-16 16:13:33 -04:00
parent 72bd744664
commit 1ed7023c08


@@ -1,6 +1,6 @@
# Tiered Agent Team System — Design Document # Tiered Agent Team System — Design Document
_Started: 2026-03-14. Status: Pre-build, gathering requirements._ _Started: 2026-03-14. Last updated: 2026-03-16._
--- ---
@@ -16,7 +16,7 @@ A dynamic, hierarchical multi-agent system for software pipelines. Teams assembl
Each tier thinks differently — strategy, design, coordination, execution, verification. Adding a tier only makes sense if it introduces a genuinely different mode of reasoning. Each tier thinks differently — strategy, design, coordination, execution, verification. Adding a tier only makes sense if it introduces a genuinely different mode of reasoning.
**2. Depth is proportional to complexity.** **2. Depth is proportional to complexity.**
Not every task needs every tier. A config change might only need T3→T4. A new product needs the full stack. Not every task needs every tier. A config change might only need T3→T4. A new product needs the full stack. T1 assesses scope and prescribes the path — it is never pre-configured.
**3. Goal anchoring at every level.** **3. Goal anchoring at every level.**
T1's original intent is embedded in every agent's context — not just passed to T2 and forgotten. Every agent knows the end goal even if they only own a slice. T1's original intent is embedded in every agent's context — not just passed to T2 and forgotten. Every agent knows the end goal even if they only own a slice.
@@ -24,8 +24,8 @@ T1's original intent is embedded in every agent's context — not just passed to
**4. Artifacts, not summaries.** **4. Artifacts, not summaries.**
Tiers pass structured specs downward (JSON task briefs), not paraphrased prose. Meaning is preserved; format is compressed. Tiers pass structured specs downward (JSON task briefs), not paraphrased prose. Meaning is preserved; format is compressed.
**5. Verification is bidirectional.** **5. Verification is mandatory.**
Lower tiers verify correctness. Upper tiers verify alignment with original intent. Both directions catch different failure modes. T5 always runs. Nothing returns to T1 unverified. T5 is a quality gate, not optional — things should work and work well before they surface upward.
**6. Provider agnostic.** **6. Provider agnostic.**
The system makes no assumptions about which LLM provider or platform is in use. Tiers reference capability levels, not specific models. All external dependencies are swappable adapters. The system makes no assumptions about which LLM provider or platform is in use. Tiers reference capability levels, not specific models. All external dependencies are swappable adapters.
@@ -39,52 +39,78 @@ Tiers define structure and responsibility. Agent personalities define domain exp
| Tier | Role | Owns | Capability Level | | Tier | Role | Owns | Capability Level |
|------|------|------|-----------------| |------|------|------|-----------------|
| T1 | Visionary | Goal, constraints, final acceptance, architectural bets | reasoning-heavy | | T1 | Visionary | Goal, constraints, dispatch plan, final acceptance | reasoning-heavy |
| T2 | Architect | System design, interface contracts, workstream boundaries | reasoning-heavy / capable | | T2 | Architect | System design, interface contracts, workstream boundaries | reasoning-heavy / capable |
| T3 | Squad Lead | Workstream delivery, worker coordination, quality gate | capable | | T3 | Squad Lead | Workstream delivery, T4 management, quality gate | capable |
| T4 | Implementer | Atomic task execution (one file, one function, one test) | fast-cheap | | T4 | Implementer | Atomic task execution (one file, one function, one test) | fast-cheap |
| T5 | Verifier | Validation of T4 output — correctness + intent alignment | capable | | T5 | Verifier | Validation of T4 output — correctness + intent alignment | capable |
T5 runs **parallel to T4**, not above it. It's a quality gate, not a management layer. T5 runs **within T3's scope**, not above it. T3 commissions T5 verification of its T4 outputs. T5 is a quality gate, not a management layer.
Capability levels map to actual models per provider in config — the core system never references a specific model name. Capability levels map to actual models per provider in config — the core system never references a specific model name.
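As a rough illustration of that mapping, the sketch below resolves a tier to a model through its capability level. The provider name, the model identifiers, and the `resolve_model` helper are hypothetical placeholders, not the project's actual config; T2 is listed as "reasoning-heavy / capable" in the table, and the sketch arbitrarily picks the first.

```python
# Hypothetical config: provider -> capability level -> model identifier.
# The identifiers are placeholders, not real model names.
CAPABILITY_MAP = {
    "provider_a": {
        "reasoning-heavy": "provider-a-large",
        "capable": "provider-a-medium",
        "fast-cheap": "provider-a-small",
    },
}

# Capability level per tier, taken from the tier table.
# T2 may be either "reasoning-heavy" or "capable"; this sketch picks one.
TIER_CAPABILITY = {
    "T1": "reasoning-heavy",
    "T2": "reasoning-heavy",
    "T3": "capable",
    "T4": "fast-cheap",
    "T5": "capable",
}


def resolve_model(tier: str, provider: str) -> str:
    """Return the configured model for a tier without core code naming a model."""
    level = TIER_CAPABILITY[tier]
    try:
        return CAPABILITY_MAP[provider][level]
    except KeyError as exc:
        raise ValueError(
            f"no model configured for provider {provider!r} at level {level!r}"
        ) from exc
```

Core code would only ever call something like `resolve_model("T4", "provider_a")`; swapping providers then means editing the mapping, not the callers.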
--- ---
## Variable Depth ## Dispatch Model
``` ### T1 Owns the Plan
Config change T3 → T4
New feature T2 → T3 → T4
Major refactor T1 → T2 → T3 → T4 → T5
New system / product T1 → T2 → T3s (parallel) → T4s → T5s
```
T3 assesses scope on receipt. If a task is simple enough, it handles it directly without spawning upward or waiting for T2 sign-off. T1 is not just a decomposer — it is the dispatch planner. Its output declares:
- **Workstreams** — the decomposed units of work
- **Tier path per workstream** — which tiers to engage (e.g. `[T2, T3, T4, T5]` or `[T4, T5]` for trivial tasks)
- **Parallelism** — which workstreams are independent and can run concurrently
T1 does not prescribe how each tier operates internally. That is the tier's own concern.
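One possible shape for T1's output, as a minimal sketch: the class names, field names, and validation rules below are assumptions for illustration, not the schema the project uses. Parallelism is modelled here as a `depends_on` list (workstreams with no dependencies can start concurrently), and the check that every path ends in T5 is inferred from the "verification is mandatory" principle.

```python
from dataclasses import dataclass, field

# Tiers T1 may place on a workstream's path, in top-down order.
TIER_ORDER = ("T2", "T3", "T4", "T5")


@dataclass
class Workstream:
    name: str
    tier_path: list[str]  # e.g. ["T2", "T3", "T4", "T5"] or ["T4", "T5"]
    depends_on: list[str] = field(default_factory=list)  # hypothetical field


@dataclass
class DispatchPlan:
    goal: str
    workstreams: list[Workstream]

    def validate(self) -> None:
        """Raise ValueError if any workstream has a malformed path or dependency."""
        names = {w.name for w in self.workstreams}
        for w in self.workstreams:
            unknown = [t for t in w.tier_path if t not in TIER_ORDER]
            if unknown:
                raise ValueError(f"{w.name}: unknown tiers {unknown}")
            ranks = [TIER_ORDER.index(t) for t in w.tier_path]
            if any(a >= b for a, b in zip(ranks, ranks[1:])):
                raise ValueError(f"{w.name}: tiers must be top-down, no repeats")
            # Assumption: mandatory verification means every path ends in T5.
            if not w.tier_path or w.tier_path[-1] != "T5":
                raise ValueError(f"{w.name}: tier path must end with T5")
            missing = set(w.depends_on) - names
            if missing:
                raise ValueError(f"{w.name}: unknown dependencies {sorted(missing)}")

    def independent(self) -> list[str]:
        """Workstreams with no dependencies, which can start concurrently."""
        return [w.name for w in self.workstreams if not w.depends_on]
```

A plan with an `api` workstream on the full path and a `docs` workstream on `["T4", "T5"]` would validate, and `independent()` would report both as free to run in parallel.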
### Each Tier Owns the Layer Below
Control flow is distributed, not centralised:
- T1 manages its T2s
- T2 manages its T3s
- **T3 manages its T4s** — including dependency graph, parallelism, and T5 commissioning
- The runner is thin: bootstrap T1, monitor the blackboard, handle final result and notifications
This means orchestration logic lives in agent prompts and output schemas — not in Python runner code. Adding a new execution pattern means updating a prompt, not the runner.
**Tradeoff:** Debugging is harder. When something fails mid-chain, you read blackboard logs rather than step through central runner code. This is a tooling problem to solve (good blackboard inspection), not a design flaw to avoid.
### Dynamic Paths
Tiers can propose path amendments mid-run (e.g. T3 discovers scope that warrants a T2 pass it didn't get). Amendments are logged to the blackboard. Higher tiers are notified but do not need to approve; the notification is informational. No tier silently changes the plan.
---
## Orchestration Patterns Per Tier
Different tiers suit different internal coordination patterns. These are baked into the runner's tier-handling logic and the tier prompts — not prescribed by T1.
| Tier | Pattern | Rationale |
|------|---------|-----------|
| T1 | Single agent | Must be authoritative; no committee |
| T2 | Group chat / round-table | Specialist architects (security, perf, data, API) debate and reach consensus before committing to a design |
| T3 | Light mesh | Peer coordination to negotiate task boundaries and avoid T4 conflicts before dispatch |
| T4 | Swarm + pipeline hybrid | Independent tasks run as swarm; dependent tasks pipeline (T4-A's output feeds T4-B). T3 declares which is which. |
| T5 | Parallel fan-out + consensus | Each T5 reviews its slice independently, then compares notes for a joint verdict — catches both artifact bugs and integration issues |
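The T4 row's swarm and pipeline hybrid can be read as scheduling over a dependency graph: tasks whose dependencies are all satisfied run together as a swarm, and successive groups form the pipeline. The sketch below shows one way T3's declaration could be turned into execution waves using the standard library's `graphlib`; the function name and the dictionary format are assumptions.

```python
from graphlib import TopologicalSorter


def execution_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves.

    `deps` maps each task to the set of tasks whose output it needs.
    Tasks in the same wave are independent (swarm); waves run in order
    (pipeline). Raises graphlib.CycleError if the dependencies are cyclic.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


# T4-A's output feeds T4-B; T4-C is independent of both.
example = {"T4-A": set(), "T4-B": {"T4-A"}, "T4-C": set()}
print(execution_waves(example))  # [['T4-A', 'T4-C'], ['T4-B']]
```

In this reading T3 only has to declare the dependencies; which tasks swarm and which pipeline falls out of the graph.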
--- ---
## Horizontal Scaling Within Tiers ## Horizontal Scaling Within Tiers
Each tier can have multiple agents running in parallel:
```
T1 (12 agents)
├── T2: Backend Architect
│   ├── T3: API Squad Lead
│   │   ├── T4: Worker — endpoint A
│   │   ├── T4: Worker — endpoint B
│   │   └── T5: Verifier
│   └── T3: DB Squad Lead
│       ├── T4: Worker — migrations
│       └── T5: Verifier
├── T2: Frontend Architect
│   └── T3: UI Squad Lead
│       ├── T4: Worker — component X
│       └── T4: Worker — component Y
└── T2: Infra Architect
    └── T3: Platform Squad Lead
        └── T4: Worker — config / deploy
```

```
T1 (1 agent — authoritative)
├── T2: Backend Architect   ─┐
├── T2: Frontend Architect   ├─ round-table consensus
└── T2: Infra Architect     ─┘
    │
    └── T3: Squad Lead (per workstream)  ─┐
        │                                 ├─ light mesh across T3s
        ├── T4: Worker A  ─┐              │
        ├── T4: Worker B  ─┼─ swarm / pipeline (T3 decides)
        ├── T4: Worker C  ─┘
        │
        └── T5: Verifier(s) — fan-out + consensus
```
--- ---
@@ -97,7 +123,11 @@ For software pipelines, **the repo is the primary blackboard**:
- T2 architects own integration branches - T2 architects own integration branches
- T1 does final integration and acceptance - T1 does final integration and acceptance
Supplemented by a SQLite coordination store per run tracking in-flight workstreams, handoff artifacts, tier status, and retry counts. Supplemented by a SQLite coordination store per run tracking:
- In-flight workstreams and their current execution plans
- Handoff artifacts and tier status
- Retry counts and escalation history
- Path amendments (proposed, by whom, timestamp)
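A minimal sketch of what such a store might look like with Python's built-in `sqlite3` module. The table names, column names, and helper functions are hypothetical; the document only states which kinds of data are tracked. The sketch also assumes an amendment takes effect as soon as it is logged, since higher tiers are notified rather than asked to approve.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema covering workstreams, retry counts, and path amendments.
SCHEMA = """
CREATE TABLE IF NOT EXISTS workstreams (
    name        TEXT PRIMARY KEY,
    tier_path   TEXT NOT NULL,              -- JSON list, e.g. ["T3","T4","T5"]
    status      TEXT NOT NULL DEFAULT 'pending',
    retry_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS path_amendments (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    workstream    TEXT NOT NULL REFERENCES workstreams(name),
    proposed_by   TEXT NOT NULL,            -- tier or agent that proposed it
    new_tier_path TEXT NOT NULL,            -- JSON list
    proposed_at   TEXT NOT NULL             -- ISO 8601 timestamp, UTC
);
"""


def open_blackboard(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the per-run coordination store."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def log_amendment(conn: sqlite3.Connection, workstream: str,
                  proposed_by: str, new_tier_path: list[str]) -> None:
    """Record a path amendment and apply it in one transaction."""
    encoded = json.dumps(new_tier_path)
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO path_amendments "
            "(workstream, proposed_by, new_tier_path, proposed_at) "
            "VALUES (?, ?, ?, ?)",
            (workstream, proposed_by, encoded, now),
        )
        conn.execute(
            "UPDATE workstreams SET tier_path = ? WHERE name = ?",
            (encoded, workstream),
        )
```

Keeping amendments in their own table preserves the history of who changed the plan and when, which is the inspection data the debugging tradeoff depends on.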
--- ---
@@ -114,6 +144,8 @@ Supplemented by a SQLite coordination store per run tracking in-flight workstrea
Retry limits prevent infinite loops. Escalation path is always upward, never sideways. Retry limits prevent infinite loops. Escalation path is always upward, never sideways.
T1 sets a retry budget multiplier during scope assessment (`1x` simple, `2x` complex). Retry budget is a field on the task brief — not hardcoded in the runner.
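To make the budget concrete, here is a small sketch of a task brief that carries the multiplier and a routing function for the three failure types named in the Retry limits decision (bad output, blocked, partial output). The field names, the action strings, and the idea of multiplying a base of 3 retries are assumptions; the base of 3 comes from the earlier draft's stated default for bad output.

```python
from dataclasses import dataclass

# Assumption: base retry count, taken from the earlier draft's default of 3.
BASE_RETRIES = 3


@dataclass
class TaskBrief:
    task_id: str
    retry_multiplier: int = 1  # set by T1: 1 for simple, 2 for complex
    attempts: int = 0          # retries already used

    @property
    def retry_budget(self) -> int:
        return BASE_RETRIES * self.retry_multiplier


def next_action(brief: TaskBrief, failure: str) -> str:
    """Route a T4 failure according to its type."""
    if failure == "blocked":
        return "escalate"          # escalate immediately, no retries
    if failure == "partial":
        return "retask_remainder"  # keep the good parts, re-task the rest
    if failure == "bad_output":
        if brief.attempts < brief.retry_budget:
            brief.attempts += 1
            return "retry"         # retry with a corrected brief
        return "escalate"          # budget exhausted, go upward
    raise ValueError(f"unknown failure type: {failure!r}")
```

Because the budget is computed from the brief, the runner needs no hardcoded limit: a brief with `retry_multiplier=2` simply allows six retries before escalating.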
--- ---
## Agent Talent Pool ## Agent Talent Pool
@@ -150,7 +182,7 @@ T1 selects the right specialist from the roster when building workstream briefs.
| T5 | Performance | testing-performance-benchmarker | | T5 | Performance | testing-performance-benchmarker |
| T5 | Security | security-engineer | | T5 | Security | security-engineer |
The roster is not fixed — T1 can select any agent from the library based on workstream needs. Non-engineering agents (design, marketing, product) extend the system to non-software pipelines. The roster is not fixed — T1 can select any agent from the library based on workstream needs.
--- ---
@@ -160,7 +192,7 @@ Everything external is a swappable adapter. Core logic never imports from adapte
``` ```
Core (platform-agnostic) Core (platform-agnostic)
├── team_runner — run lifecycle, agent spawning, runtime selection ├── team_runner — thin bootstrap: spawn T1, monitor blackboard, handle result
├── blackboard — SQLite coordination state ├── blackboard — SQLite coordination state
├── task_brief — schema + validation ├── task_brief — schema + validation
└── escalation — retry logic, failure routing └── escalation — retry logic, failure routing
@@ -176,33 +208,30 @@ Adapters (swappable)
Swapping providers means writing a new adapter file — nothing in core changes. Swapping providers means writing a new adapter file — nothing in core changes.
T4 and T5 default to the **coding agent runtime** when available. It provides direct file system access, git operations, and test execution — no need to shuttle file contents through message context. Falls back to standard runtime gracefully if not configured. T4 and T5 default to the **coding agent runtime** when available. Falls back to standard runtime gracefully if not configured.
--- ---
## Decisions ## Decisions Log
**Depth decision** — T1 assesses scope on receipt and determines how many tiers to engage. Not pre-configured per task type. **T1 dynamic dispatch** — T1 assesses scope and prescribes tier path and workstream parallelism. It does not prescribe internal tier coordination patterns.
**Trigger mechanism** — User messages Hans → Hans spins up T1 with the goal. T1 takes it from there. **Distributed ownership** — Each tier owns the layer below it. Runner is thin. Tradeoff: distributed control makes the system extensible but debugging requires good blackboard tooling, not central runner traces.
**Output / review** — Nothing merges to main without Andrew's explicit approval. T1 opens a PR and surfaces it to Andrew for review. Merge is gated on human sign-off. Notification is dual: Hans messages Andrew directly, and a PR is opened on the VCS platform so Andrew gets notified natively too. This keeps the review step platform-independent — whichever VCS is in use, Hans always notifies Andrew directly as a fallback. **T5 always mandatory** — No skipping verification. Things should work and work well before surfacing to T1.
**Retry limits** — Three failure types, handled differently: **T3 owns T4 and T5** — T3 manages its T4s (dependency graph, swarm vs pipeline, parallelism) and commissions T5 verification of T4 outputs. Runner does not orchestrate T4/T5 centrally.
- *Bad output* → retry T4 with a corrected brief (default: 3 retries)
- *Blocked* → escalate immediately, no retries
- *Partial output* → salvage good parts, re-task the remainder
T1 sets a retry budget multiplier during scope assessment (`1x` simple, `2x` complex). Retry budget is a field on the task brief — not hardcoded in the runner. **Orchestration patterns** — Baked into tier prompts and runner tier-handling logic, not prescribed by T1. T2: round-table. T3: light mesh. T4: swarm+pipeline. T5: fan-out+consensus.
**Platform agnosticism** — Core logic is provider and platform agnostic. LLMs, VCS, notifications, and agent runtimes are all adapters. Tiers reference capability levels (`reasoning-heavy`, `capable`, `fast-cheap`), not specific model names. Provider-to-model mapping lives in config. **Output / review** — Nothing merges to main without Andrew's explicit approval. T1 opens a PR and surfaces it to Andrew. Notification is dual: Hans messages Andrew directly + PR opened on VCS. Merge is gated on human sign-off.
**LLM provider** — Anthropic first implementation. Config supports per-tier provider selection and mixing providers across tiers (e.g. T1 on OpenAI o3, T4 workers on local Ollama). **Platform agnosticism** — Core is provider and platform agnostic. Capability levels (`reasoning-heavy`, `capable`, `fast-cheap`) map to models in config. Mixing providers across tiers is supported.
**Gateway modification** — Decided against. Agent-teams stays standalone Python. OpenClaw is used as the runtime adapter via existing primitives (sessions_spawn, sessions_send, subagents) — called through a skill layer. No gateway fork. Keeps platform agnosticism intact and avoids Node/Python mismatch and fork maintenance burden. **LLM provider** — Anthropic first implementation. Config supports per-tier provider selection.
**Coding agent runtime** — Claude Code is the default T4/T5 runtime for software pipelines. It is purpose-built for implementation and verification: direct file access, git ops, test execution. Enters as a runtime adapter — swappable for Codex, Aider, or any equivalent. T1/T2/T3 always use the standard runtime (they reason, they don't edit files). **Gateway modification** — Decided against. Agent-teams stays standalone Python. OpenClaw used via runtime adapter only.
**Claude Code native teams** — Claude Code has an experimental agent teams feature that fans out sub-agents internally within a session. Integrated as an opt-in flag (`native_teams: true`) in the coding_agent runtime adapter. When enabled, T3 hands a full workstream to Claude Code and it parallelises internally — faster, but less granular blackboard visibility. Default is `false` — explicit T4 spawning is the baseline; native teams is a speed optimisation to enable deliberately. **Coding agent runtime** — Claude Code is default T4/T5 runtime. Opt-in `native_teams` flag available for internal Claude Code parallelism — faster but less blackboard visibility. Default `false`.
**Agency-agents integration** — Agent personalities sourced from [msitarzewski/agency-agents](https://github.com/msitarzewski/agency-agents) via git submodule. Included as `agents/` in the repo. T1 selects specialists from the roster via `config/role_registry.yaml`. Each task brief carries an `agent_personality` field (path to the agent .md file) which the runtime adapter injects as the system prompt at spawn time. Adding new specialists means adding an entry to the registry — no core changes required. **Agency-agents integration** — Via git submodule at `agents/`. T1 selects specialists via `config/role_registry.yaml`. `agent_personality` field on task brief; runtime injects as system prompt at spawn time.