docs: update design — dynamic dispatch, distributed ownership, orchestration patterns

2026-03-16 16:13:33 -04:00
parent 72bd744664
commit 1ed7023c08
1 changed files with 81 additions and 52 deletions
@@ -1,6 +1,6 @@
 # Tiered Agent Team System — Design Document

-_Started: 2026-03-14. Status: Pre-build, gathering requirements._
+_Started: 2026-03-14. Last updated: 2026-03-16._

 ---

@@ -16,7 +16,7 @@ A dynamic, hierarchical multi-agent system for software pipelines. Teams assembl
 Each tier thinks differently — strategy, design, coordination, execution, verification. Adding a tier only makes sense if it introduces a genuinely different mode of reasoning.

 **2. Depth is proportional to complexity.**
-Not every task needs every tier. A config change might only need T3→T4. A new product needs the full stack.
+Not every task needs every tier. A config change might only need T3→T4 plus T5 verification. A new product needs the full stack. T1 assesses scope and prescribes the path; the path is never pre-configured.

 **3. Goal anchoring at every level.**
 T1's original intent is embedded in every agent's context — not just passed to T2 and forgotten. Every agent knows the end goal even if they only own a slice.
@@ -24,8 +24,8 @@ T1's original intent is embedded in every agent's context — not just passed to
 **4. Artifacts, not summaries.**
 Tiers pass structured specs downward (JSON task briefs), not paraphrased prose. Meaning is preserved; format is compressed.

-**5. Verification is bidirectional.**
-Lower tiers verify correctness. Upper tiers verify alignment with original intent. Both directions catch different failure modes.
+**5. Verification is mandatory.**
+T5 always runs. Nothing returns to T1 unverified. T5 is a mandatory quality gate: work should function, and function well, before it surfaces upward.

 **6. Provider agnostic.**
 The system makes no assumptions about which LLM provider or platform is in use. Tiers reference capability levels, not specific models. All external dependencies are swappable adapters.
@@ -39,52 +39,78 @@ Tiers define structure and responsibility. Agent personalities define domain exp

 | Tier | Role | Owns | Capability Level |
 |------|------|------|-----------------|
-| T1 | Visionary | Goal, constraints, final acceptance, architectural bets | reasoning-heavy |
+| T1 | Visionary | Goal, constraints, dispatch plan, final acceptance | reasoning-heavy |
 | T2 | Architect | System design, interface contracts, workstream boundaries | reasoning-heavy / capable |
-| T3 | Squad Lead | Workstream delivery, worker coordination, quality gate | capable |
+| T3 | Squad Lead | Workstream delivery, T4 management, quality gate | capable |
 | T4 | Implementer | Atomic task execution (one file, one function, one test) | fast-cheap |
 | T5 | Verifier | Validation of T4 output — correctness + intent alignment | capable |

-T5 runs **parallel to T4**, not above it. It's a quality gate, not a management layer.
+T5 runs **within T3's scope**, not above it. T3 commissions T5 verification of its T4 outputs. T5 is a quality gate, not a management layer.

 Capability levels map to actual models per provider in config — the core system never references a specific model name.

 ---

-## Variable Depth
+## Dispatch Model

-```
-Config change          T3 → T4
-New feature            T2 → T3 → T4
-Major refactor         T1 → T2 → T3 → T4 → T5
-New system / product   T1 → T2 → T3s (parallel) → T4s → T5s
-```
+### T1 Owns the Plan

-T3 assesses scope on receipt. If a task is simple enough, it handles it directly without spawning upward or waiting for T2 sign-off.
+T1 is not just a decomposer — it is the dispatch planner. Its output declares:
+
+- **Workstreams** — the decomposed units of work
+- **Tier path per workstream** — which tiers to engage (e.g. `[T2, T3, T4, T5]` or `[T4, T5]` for trivial tasks)
+- **Parallelism** — which workstreams are independent and can run concurrently
+
+T1 does not prescribe how each tier operates internally. That is the tier's own concern.
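
The shape of T1's output can be sketched as a small data structure. This is a minimal illustration, not the project's actual schema: the class names, the `depends_on` field (used here to express which workstreams are independent), and the validation rules are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Tiers T1 may place on a path; T1 itself authors the plan.
VALID_TIERS = ("T2", "T3", "T4", "T5")


@dataclass
class Workstream:
    name: str
    tier_path: list[str]  # e.g. ["T2", "T3", "T4", "T5"] or ["T4", "T5"]
    # Hypothetical field: names of workstreams that must finish first.
    depends_on: list[str] = field(default_factory=list)


@dataclass
class DispatchPlan:
    goal: str
    workstreams: list[Workstream]

    def validate(self) -> None:
        """Reject unknown tiers, paths that skip T5, and dangling dependencies."""
        names = {w.name for w in self.workstreams}
        for w in self.workstreams:
            unknown = [t for t in w.tier_path if t not in VALID_TIERS]
            if unknown:
                raise ValueError(f"{w.name}: unknown tiers {unknown}")
            if w.tier_path[-1:] != ["T5"]:
                raise ValueError(f"{w.name}: path must end with T5")
            missing = set(w.depends_on) - names
            if missing:
                raise ValueError(f"{w.name}: unknown dependencies {sorted(missing)}")

    def independent(self) -> list[str]:
        """Workstreams with no dependencies, which can start concurrently."""
        return [w.name for w in self.workstreams if not w.depends_on]
```

Requiring every path to end in `T5` mirrors principle 5; whether that check belongs in the plan schema or in the runner is left open here.
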
+
+### Each Tier Owns the Layer Below
+
+Control flow is distributed, not centralised:
+
+- T1 manages its T2s
+- T2 manages its T3s
+- **T3 manages its T4s** — including dependency graph, parallelism, and T5 commissioning
+- The runner is thin: bootstrap T1, monitor the blackboard, handle final result and notifications
+
+This means orchestration logic lives in agent prompts and output schemas — not in Python runner code. Adding a new execution pattern means updating a prompt, not the runner.
+
+**Tradeoff:** Debugging is harder. When something fails mid-chain, you read blackboard logs rather than step through central runner code. This is a tooling problem to solve (good blackboard inspection), not a design flaw to avoid.
+
+### Dynamic Paths
+
+Tiers can propose path amendments mid-run (e.g. T3 discovers scope that warrants a T2 pass it didn't get). Amendments are logged to the blackboard. Higher tiers are notified, but the notification is informational and needs no approval. No tier silently changes the plan.
+
+---
+
+## Orchestration Patterns Per Tier
+
+Different tiers suit different internal coordination patterns. These are baked into the runner's tier-handling logic and the tier prompts — not prescribed by T1.
+
+| Tier | Pattern | Rationale |
+|------|---------|-----------|
+| T1 | Single agent | Must be authoritative; no committee |
+| T2 | Group chat / round-table | Specialist architects (security, perf, data, API) debate and reach consensus before committing to a design |
+| T3 | Light mesh | Peer coordination to negotiate task boundaries and avoid T4 conflicts before dispatch |
+| T4 | Swarm + pipeline hybrid | Independent tasks run as swarm; dependent tasks pipeline (T4-A's output feeds T4-B). T3 declares which is which. |
+| T5 | Parallel fan-out + consensus | Each T5 reviews its slice independently, then compares notes for a joint verdict — catches both artifact bugs and integration issues |
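
One way to picture the T4 row: if T3 declares dependencies between its T4 tasks, the swarm and pipeline stages fall out of a topological ordering. The sketch below uses Python's standard `graphlib`; the function name `plan_batches` and the dependency-map format are assumptions for illustration, not part of the design.

```python
from graphlib import TopologicalSorter


def plan_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into batches; each batch can run as a swarm.

    `deps` maps a task name to the set of tasks whose output it needs.
    Tasks in a later batch form the pipeline stage that follows.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches


# T4-A and T4-C are independent; T4-B consumes T4-A's output.
print(plan_batches({"T4-A": set(), "T4-B": {"T4-A"}, "T4-C": set()}))
# [['T4-A', 'T4-C'], ['T4-B']]
```

In the design, this decision lives in T3's prompt and output schema rather than in runner code; the snippet only shows the ordering logic a T3 declaration implies.
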

 ---

 ## Horizontal Scaling Within Tiers

-Each tier can have multiple agents running in parallel:
-
 ```
-T1 (1–2 agents)
-├── T2: Backend Architect
-│   ├── T3: API Squad Lead
-│   │   ├── T4: Worker — endpoint A
-│   │   ├── T4: Worker — endpoint B
-│   │   └── T5: Verifier
-│   └── T3: DB Squad Lead
-│       ├── T4: Worker — migrations
-│       └── T5: Verifier
-├── T2: Frontend Architect
-│   └── T3: UI Squad Lead
-│       ├── T4: Worker — component X
-│       └── T4: Worker — component Y
-└── T2: Infra Architect
-    └── T3: Platform Squad Lead
-        └── T4: Worker — config / deploy
+T1 (1 agent — authoritative)
+├── T2: Backend Architect  ─┐
+├── T2: Frontend Architect  ├─ round-table consensus
+└── T2: Infra Architect    ─┘
+    │
+    └── T3: Squad Lead (per workstream)  ─┐
+            │                             ├─ light mesh across T3s
+            ├── T4: Worker A  ─┐          │
+            ├── T4: Worker B  ─┼─ swarm / pipeline (T3 decides)
+            └── T4: Worker C  ─┘
+                    │
+                    └── T5: Verifier(s) — fan-out + consensus
 ```

 ---
@@ -97,7 +123,11 @@ For software pipelines, **the repo is the primary blackboard**:
 - T2 architects own integration branches
 - T1 does final integration and acceptance

-Supplemented by a SQLite coordination store per run tracking in-flight workstreams, handoff artifacts, tier status, and retry counts.
+Supplemented by a SQLite coordination store per run tracking:
+- In-flight workstreams and their current execution plans
+- Handoff artifacts and tier status
+- Retry counts and escalation history
+- Path amendments (proposed, by whom, timestamp)
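
As a rough sketch of how path amendments might be recorded, the following uses the standard `sqlite3` module. The table name `path_amendments`, its columns, and the helper names are hypothetical; the document only states that amendments are stored with who proposed them and when.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per proposed amendment.
SCHEMA = """
CREATE TABLE IF NOT EXISTS path_amendments (
    id            INTEGER PRIMARY KEY,
    workstream    TEXT NOT NULL,
    proposed_by   TEXT NOT NULL,
    new_tier_path TEXT NOT NULL,  -- JSON-encoded list of tiers
    reason        TEXT NOT NULL,
    proposed_at   TEXT NOT NULL   -- ISO 8601 timestamp, UTC
)
"""


def open_blackboard(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_amendment(conn, workstream, proposed_by, new_tier_path, reason) -> int:
    """Append an amendment; rows are never updated, so history is preserved."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO path_amendments "
            "(workstream, proposed_by, new_tier_path, reason, proposed_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (workstream, proposed_by, json.dumps(new_tier_path), reason,
             datetime.now(timezone.utc).isoformat()),
        )
    return cur.lastrowid


def amendments_for(conn, workstream):
    rows = conn.execute(
        "SELECT proposed_by, new_tier_path, reason FROM path_amendments "
        "WHERE workstream = ? ORDER BY id",
        (workstream,),
    ).fetchall()
    return [(who, json.loads(path), reason) for who, path, reason in rows]
```

An append-only table fits the rule that no tier silently changes the plan: every amendment stays visible to anyone inspecting the run.
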

 ---

@@ -114,6 +144,8 @@ Supplemented by a SQLite coordination store per run tracking in-flight workstrea

 Retry limits prevent infinite loops. Escalation path is always upward, never sideways.

+T1 sets a retry budget multiplier during scope assessment (`1x` simple, `2x` complex). Retry budget is a field on the task brief — not hardcoded in the runner.
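
A minimal sketch of the retry budget as a task-brief field follows. The base of 3 retries is taken from the earlier draft's default and is an assumption here, as are the field and function names.

```python
from dataclasses import dataclass

BASE_RETRIES = 3  # assumption: default retry count from the earlier draft


@dataclass
class TaskBrief:
    task_id: str
    retry_multiplier: int = 1  # set by T1: 1 for simple, 2 for complex
    attempts: int = 0          # retries consumed so far

    @property
    def retry_budget(self) -> int:
        return BASE_RETRIES * self.retry_multiplier


def on_bad_output(brief: TaskBrief) -> str:
    """Return 'retry' while budget remains, otherwise 'escalate' (upward)."""
    if brief.attempts < brief.retry_budget:
        brief.attempts += 1
        return "retry"
    return "escalate"
```

Because the budget travels on the brief, the runner needs no per-task-type retry constants; it only reads the field.
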
+
 ---

 ## Agent Talent Pool
@@ -150,7 +182,7 @@ T1 selects the right specialist from the roster when building workstream briefs.
 | T5 | Performance | testing-performance-benchmarker |
 | T5 | Security | security-engineer |

-The roster is not fixed — T1 can select any agent from the library based on workstream needs. Non-engineering agents (design, marketing, product) extend the system to non-software pipelines.
+The roster is not fixed — T1 can select any agent from the library based on workstream needs.

 ---

@@ -160,7 +192,7 @@ Everything external is a swappable adapter. Core logic never imports from adapte

 ```
 Core (platform-agnostic)
-├── team_runner      — run lifecycle, agent spawning, runtime selection
+├── team_runner      — thin bootstrap: spawn T1, monitor blackboard, handle result
 ├── blackboard       — SQLite coordination state
 ├── task_brief       — schema + validation
 └── escalation       — retry logic, failure routing
@@ -176,33 +208,30 @@ Adapters (swappable)

 Swapping providers means writing a new adapter file — nothing in core changes.

-T4 and T5 default to the **coding agent runtime** when available. It provides direct file system access, git operations, and test execution — no need to shuttle file contents through message context. Falls back to standard runtime gracefully if not configured.
+T4 and T5 default to the **coding agent runtime** when available, and fall back gracefully to the standard runtime when it is not configured.

 ---

-## Decisions
+## Decisions Log

-**Depth decision** — T1 assesses scope on receipt and determines how many tiers to engage. Not pre-configured per task type.
+**T1 dynamic dispatch** — T1 assesses scope and prescribes tier path and workstream parallelism. It does not prescribe internal tier coordination patterns.

-**Trigger mechanism** — User messages Hans → Hans spins up T1 with the goal. T1 takes it from there.
+**Distributed ownership** — Each tier owns the layer below it. Runner is thin. Tradeoff: distributed control makes the system extensible but debugging requires good blackboard tooling, not central runner traces.

-**Output / review** — Nothing merges to main without Andrew's explicit approval. T1 opens a PR and surfaces it to Andrew for review. Merge is gated on human sign-off. Notification is dual: Hans messages Andrew directly, and a PR is opened on the VCS platform so Andrew gets notified natively too. This keeps the review step platform-independent — whichever VCS is in use, Hans always notifies Andrew directly as a fallback.
+**T5 always mandatory** — No skipping verification. Things should work and work well before surfacing to T1.

-**Retry limits** — Three failure types, handled differently:
- *Bad output* → retry T4 with a corrected brief (default: 3 retries)
- *Blocked* → escalate immediately, no retries
- *Partial output* → salvage good parts, re-task the remainder
+**T3 owns T4 and T5** — T3 manages its T4s (dependency graph, swarm vs pipeline, parallelism) and commissions T5 verification of T4 outputs. Runner does not orchestrate T4/T5 centrally.

-T1 sets a retry budget multiplier during scope assessment (`1x` simple, `2x` complex). Retry budget is a field on the task brief — not hardcoded in the runner.
+**Orchestration patterns** — Baked into tier prompts and runner tier-handling logic, not prescribed by T1. T2: round-table. T3: light mesh. T4: swarm+pipeline. T5: fan-out+consensus.

-**Platform agnosticism** — Core logic is provider and platform agnostic. LLMs, VCS, notifications, and agent runtimes are all adapters. Tiers reference capability levels (`reasoning-heavy`, `capable`, `fast-cheap`), not specific model names. Provider-to-model mapping lives in config.
+**Output / review** — Nothing merges to main without Andrew's explicit approval. T1 opens a PR and surfaces it to Andrew. Notification is dual: Hans messages Andrew directly, and a PR is opened on the VCS platform. Merge is gated on human sign-off.

-**LLM provider** — Anthropic first implementation. Config supports per-tier provider selection and mixing providers across tiers (e.g. T1 on OpenAI o3, T4 workers on local Ollama).
+**Platform agnosticism** — Core is provider and platform agnostic. Capability levels (`reasoning-heavy`, `capable`, `fast-cheap`) map to models in config. Mixing providers across tiers is supported.

-**Gateway modification** — Decided against. Agent-teams stays standalone Python. OpenClaw is used as the runtime adapter via existing primitives (sessions_spawn, sessions_send, subagents) — called through a skill layer. No gateway fork. Keeps platform agnosticism intact and avoids Node/Python mismatch and fork maintenance burden.
+**LLM provider** — Anthropic first implementation. Config supports per-tier provider selection.

-**Coding agent runtime** — Claude Code is the default T4/T5 runtime for software pipelines. It is purpose-built for implementation and verification: direct file access, git ops, test execution. Enters as a runtime adapter — swappable for Codex, Aider, or any equivalent. T1/T2/T3 always use the standard runtime (they reason, they don't edit files).
+**Gateway modification** — Decided against. Agent-teams stays standalone Python. OpenClaw is used via the runtime adapter only.

-**Claude Code native teams** — Claude Code has an experimental agent teams feature that fans out sub-agents internally within a session. Integrated as an opt-in flag (`native_teams: true`) in the coding_agent runtime adapter. When enabled, T3 hands a full workstream to Claude Code and it parallelises internally — faster, but less granular blackboard visibility. Default is `false` — explicit T4 spawning is the baseline; native teams is a speed optimisation to enable deliberately.
+**Coding agent runtime** — Claude Code is the default T4/T5 runtime. An opt-in `native_teams` flag enables Claude Code's internal parallelism, which is faster but gives less blackboard visibility. Default: `false`.

-**Agency-agents integration** — Agent personalities sourced from [msitarzewski/agency-agents](https://github.com/msitarzewski/agency-agents) via git submodule. Included as `agents/` in the repo. T1 selects specialists from the roster via `config/role_registry.yaml`. Each task brief carries an `agent_personality` field (path to the agent .md file) which the runtime adapter injects as the system prompt at spawn time. Adding new specialists means adding an entry to the registry — no core changes required.
+**Agency-agents integration** — Via git submodule at `agents/`. T1 selects specialists via `config/role_registry.yaml`. `agent_personality` field on task brief; runtime injects as system prompt at spawn time.