docs: resolve all design questions + visibility layer + portability audit

2026-03-30 15:18:48 -04:00
committed by GitHub
3 changed files with 485 additions and 33 deletions

2
agents

Submodule agents updated: aacfb86196...5f1204a023


@@ -1,7 +1,7 @@
# Tiered Agent Team System — Build Spec
_Started: 2026-03-15. Status: Pre-build._
_See agent-teams-design.md for the design doc and decisions log._
_Started: 2026-03-15. Last updated: 2026-03-30._
_See design.md for the design doc and decisions log._
---
@@ -40,7 +40,7 @@ agent-teams/
│ │ ├── notify.py — abstract notification interface
│ │ └── runtime.py — abstract agent runtime interface
│ ├── llm/
│ │ ├── anthropic.py — Claude via OpenClaw or direct API
│ │ ├── anthropic.py — Claude via direct Anthropic API
│ │ ├── openai.py — GPT / o-series
│ │ └── ollama.py — local models
│ ├── vcs/
@@ -68,6 +68,9 @@ agent-teams/
│ ├── team.yaml — example run configuration
│ └── role_registry.yaml — maps (tier, domain) → agent personality file
├── cli/
│ └── agency.py — run, watch, inspect, approve, reject, pause, resume
├── runs/ — runtime state, one subdir per run_id
│ └── .gitkeep
@@ -131,12 +134,43 @@ CREATE TABLE events (
event_id TEXT PRIMARY KEY,
run_id TEXT NOT NULL,
brief_id TEXT,
kind TEXT NOT NULL, -- spawned | completed | failed | escalated | retried
kind TEXT NOT NULL, -- see event vocabulary below
detail TEXT, -- JSON
created_at TEXT NOT NULL
);
```
**Event kind vocabulary:**
```
-- lifecycle
spawned | completed | failed | escalated | retried
-- visibility / gates
gate_pending -- runner hit an inspection gate, waiting for human
gate_approved -- human approved via CLI or notify
gate_rejected -- human rejected, tier re-invoked
gate_paused -- manual pause via CLI
gate_resumed -- manual resume via CLI
-- amendments / informational
path_amendment -- mid-run tier proposed a tier path change
log -- human-readable log line (detail: {level, message})
```
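A minimal sketch of how a blackboard writer might enforce this vocabulary, assuming the `events` columns are exactly those visible in the hunk above. `write_event` and `EVENT_KINDS` are hypothetical names used for illustration only; they are not taken from `core/blackboard.py`.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# Event kinds copied from the vocabulary listed above.
EVENT_KINDS = {
    "spawned", "completed", "failed", "escalated", "retried",
    "gate_pending", "gate_approved", "gate_rejected",
    "gate_paused", "gate_resumed",
    "path_amendment", "log",
}

# Columns as they appear in the diff; the real table may differ.
EVENTS_DDL = """
CREATE TABLE IF NOT EXISTS events (
    event_id   TEXT PRIMARY KEY,
    run_id     TEXT NOT NULL,
    brief_id   TEXT,
    kind       TEXT NOT NULL,
    detail     TEXT,
    created_at TEXT NOT NULL
)
"""


def write_event(conn, run_id, kind, detail=None, brief_id=None):
    """Hypothetical helper: validate the kind, then append one event row."""
    if kind not in EVENT_KINDS:
        raise ValueError(f"unknown event kind: {kind}")
    event_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)",
        (
            event_id,
            run_id,
            brief_id,
            kind,
            json.dumps(detail) if detail is not None else None,  # detail is stored as JSON text
            datetime.now(timezone.utc).isoformat(),
        ),
    )
    conn.commit()
    return event_id
```

Rejecting unknown kinds at write time keeps the vocabulary in one place, so a typo in a tier prompt surfaces as an error rather than as an event the runner silently ignores.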
**t3_task_lists** *(T3 mesh coordination)*
```sql
CREATE TABLE t3_task_lists (
entry_id TEXT PRIMARY KEY,
run_id TEXT NOT NULL,
workstream_id TEXT NOT NULL,
t3_agent_id TEXT NOT NULL,
status TEXT NOT NULL, -- draft | committed
tasks TEXT NOT NULL, -- JSON array of proposed T4 task descriptors
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL
);
```
---
## Task Brief Schema
@@ -283,6 +317,19 @@ retry_defaults:
bad_output: 3
partial: 2
blocked: 0 # always escalate immediately
visibility:
strict_mode: false # true = all gates on (recommended for first runs)
log_level: normal # normal | verbose (verbose = per-T4 start/done lines)
inspection_gates:
t1_plan: true # always — required by design
t2_lead: false # optional — review boundaries before specialists spawn
t2_synthesis: true # recommended — review architecture before implementation
t3_plan: false # verbose — useful early on, disable once T3 is trusted
t5_verdict: false # review T5 joint verdict before T3 marks workstream done
gate_timeout_minutes: 60 # auto-reject if no human response within this window
t3_mesh_timeout_minutes: 10 # max time for T3s to commit task lists before runner escalates
```
---
@@ -338,7 +385,7 @@ t5:
### 1. Run Kickoff
```
User → Hans → team_runner.start(goal, config)
User → team_runner.start(goal, config) # via CLI or any caller
→ generate run_id
→ init blackboard (create runs/<run_id>/blackboard.db)
→ build T1 brief (goal_anchor = goal, retry_budget from config)
@@ -388,7 +435,29 @@ spawn T4 with brief
→ notify T3
```
### 4. Review Gate
### 4. Inspection Gate Flow
```
runner reaches configured gate (e.g. t2_synthesis)
→ write event(gate_pending, detail={tier, summary, what_happens_next})
→ notify_adapter.send(tier summary + gate context)
→ halt: poll blackboard for gate_approved or gate_rejected
gate_approved:
→ write event(gate_approved)
→ continue run
gate_rejected:
→ write event(gate_rejected, detail={reason})
→ re-invoke tier with rejection reason in brief context
→ loop back to gate_pending when tier completes again
gate_timeout (gate_timeout_minutes elapsed):
→ treat as gate_rejected
→ notify Andrew: "Gate timed out, re-invoking tier"
```
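The halt-and-poll step could be sketched roughly as follows. This is an illustration under assumptions, not the runner's actual code: `wait_for_gate`, the `pending_at` argument (the `created_at` of the matching `gate_pending` event), and the injectable `clock`/`sleep` parameters are all hypothetical.

```python
import sqlite3
import time


def wait_for_gate(conn, run_id, pending_at, timeout_seconds,
                  poll_seconds=1.0, clock=time.monotonic, sleep=time.sleep):
    """Poll the events table until the gate is decided or the timeout elapses.

    Returns "gate_approved" or "gate_rejected". A timeout is treated as a
    rejection, matching the gate_timeout branch of the flow above.
    """
    deadline = clock() + timeout_seconds
    while True:
        row = conn.execute(
            "SELECT kind FROM events "
            "WHERE run_id = ? AND created_at >= ? "
            "AND kind IN ('gate_approved', 'gate_rejected') "
            "ORDER BY created_at DESC LIMIT 1",
            (run_id, pending_at),
        ).fetchone()
        if row is not None:
            return row[0]
        if clock() >= deadline:
            return "gate_rejected"  # timed out: caller re-invokes the tier
        sleep(poll_seconds)
```

Filtering on `created_at >= pending_at` is one way to stop an approval from an earlier gate in the same run from satisfying a later one; it assumes timestamps are stored in a single sortable ISO 8601 format.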
### 5. Review Gate
```
T1 completes integration
@@ -412,19 +481,20 @@ T1 completes integration
1. `git submodule add https://github.com/msitarzewski/agency-agents agents/` — pull the talent pool
2. `config/role_registry.yaml` — map tier+domain → agent personality files
3. `core/task_brief.py` — schema + validation (everything depends on this)
4. `core/blackboard.py` — SQLite store, all table definitions
3. `core/task_brief.py` — schema + validation (everything depends on this); include T1 Plan Output Schema
4. `core/blackboard.py` — SQLite store, all table definitions including `t3_task_lists`; full event kind vocabulary
5. `adapters/base/*` — all four abstract interfaces
6. `adapters/llm/anthropic.py` — first LLM implementation
7. `core/escalation.py` — retry + failure routing logic
7. `core/escalation.py` — retry + failure routing logic (called by tiers, not runner centrally)
8. `adapters/runtime/openclaw.py` — wire up sessions_spawn + personality injection
9. `adapters/runtime/claude_code.py` — coding agent runtime, personality via --system-prompt
10. `core/team_runner.py` — full run lifecycle, runtime + personality selection
11. `prompts/` — fallback tier prompts (used when no agent_personality set)
12. `adapters/vcs/github.py` — PR creation + branch management
13. `adapters/notify/openclaw.py` — Hans notification
14. `config/team.yaml` — example config
15. `README.md` — how to run, how to add adapters, how to extend the roster
10. `core/team_runner.py` — full run lifecycle: spawn loop (monitors briefs table for `status=pending`, calls runtime_adapter.spawn()), gate logic (gate_pending halt, calls notify_adapter.send(), polls for gate_approved/rejected resume), path amendment monitor, T3 mesh timeout → T2 escalation, T1 failure + terminal escalation only
11. `cli/agency.py` — run, watch, inspect, approve, reject, pause, resume; `watch` tails blackboard events and renders live log; `inspect` renders run tree
12. `prompts/` — fallback tier prompts (used when no agent_personality set)
13. `adapters/vcs/github.py` — PR creation + branch management
14. `adapters/notify/openclaw.py` — OpenClaw notification adapter; bridges gate summaries and run events to the operator via OpenClaw; manages its own inbound response state for gate approval routing
15. `config/team.yaml` — example config with full visibility block
16. `README.md` — how to run, how to add adapters, how to extend the roster; include `agency` CLI reference
---



@@ -1,22 +1,28 @@
# Tiered Agent Team System — Design Document
_Started: 2026-03-14. Last updated: 2026-03-16 (evening)._
_Started: 2026-03-14. Last updated: 2026-03-30._
---
## Open Design Questions
## Resolved Design Decisions (formerly Open Questions)
The following areas are identified but not yet resolved. Work through these before implementing `core/team_runner.py`.
All eight open questions resolved 2026-03-30. Details in Decisions Log.
1. **T3 mesh mechanics** — How do T3s within the same T2 domain coordinate? Via blackboard, direct message exchange, or a designated T3 lead? What does "negotiate task boundaries" look like concretely?
1. **T3 mesh mechanics** → Blackboard-based. T3s write draft task lists, read peers', commit merged plan before T4 dispatch. See _T3 Mesh via Blackboard_.
2. **T1 output schema** — What does T1's Plan phase output look like as structured data? Needs a formal schema: workstreams, tier paths, parallelism flags, retry budget, T2 specialist list. This is what the runner parses to bootstrap the pipeline.
2. **T1 output schema** → Formal JSON schema defined. See _T1 Plan Output Schema_.
3. **T5 consensus mechanics** — Individual T5s review their slice and produce results. Who aggregates? What does the joint verdict look like as structured data? What happens on split verdict (some T5s pass, some fail)?
3. **T5 consensus mechanics** → T3 aggregates all T5 results into a joint verdict. Split verdict (`partial`) triggers retry of failed slices only. See _T5 Consensus & Verdict Schema_.
4. **Path amendment mechanism** — When a mid-run tier proposes a path amendment, what's the concrete mechanism? Who writes to the blackboard, in what format, and how does the relevant higher tier get notified?
4. **Path amendment mechanism** → Amending tier writes a `path_amendment` event to blackboard. Runner monitors events table and notifies the relevant higher tier via a system event. No agent callback plumbing required. See _Path Amendment Mechanism_.
5. **Failure handling (distributed model)** — The current failure table assumes centralised runner handling. Needs to be rewritten to reflect distributed ownership: T3 handles T4 failures, T2 handles T3 failures, T1 handles T2 failures. Runner only handles T1 failure and terminal escalation to human.
5. **Failure handling (distributed model)** → Distributed ownership confirmed. Runner only owns T1 failure + terminal human escalation. See updated _Failure Handling_ table.
6. **Who makes spawn calls for T3+ tiers** → Runner monitors briefs table for `status=pending` rows and makes all spawn calls. "Distributed ownership" means the tier's output determines brief content — runner is the mechanical arm. Gates (hold on `gate_pending`) live naturally in the runner's spawn loop.
7. **Gate approval UX** → `agency approve <run_id>` CLI writes `gate_approved` directly to the blackboard — the universal path, works on any platform. Runner only cares that the event exists, not how it got there. Notify adapter implementations handle their own inbound response routing (e.g. bridging a chat reply to a CLI call) as internal adapter state — not a core concern.
8. **T3 mesh timeout** → Escalate to T2 (domain boundary problem, T2 should re-scope). If T2 also exhausts its retry budget, the failure escalates up the normal ladder to T1 → Andrew gate. No force-commit fallback (would hide the problem and cause bad T4 dispatch).
---
@@ -167,6 +173,249 @@ T1 — Phase 1: Plan (self-critique → Andrew approval)
---
## Use Case Flows
T1 assesses complexity and prescribes the tier path per workstream. Three standard depth profiles:
### Full Stack — T1→T2→T3→T4→T5
*Complex feature, new product, cross-domain changes*
```
T1 Plan
→ assess complexity (high)
→ output T1 Plan Schema (workstreams, tier paths [T2,T3,T4,T5], parallelism, retry budgets)
→ self-critique pass
→ GATE: surface to Andrew ← approval required
T2 Lead (spawned by runner after approval)
→ receive: goal + full workplan
→ publish: domain boundaries + shared assumptions doc → blackboard
→ GATE (optional): review boundaries before specialists spawn
T2 Specialists (parallel fan-out, wait on Lead)
→ each receives: their domain boundary + shared assumptions
→ produce: architecture proposal for their slice
→ Lead synthesises, drives conflict resolution if needed
→ Lead writes: canonical architecture → blackboard
→ GATE (recommended): review architecture before implementation
Each T2 Specialist → spawns its own T3s (with canonical architecture slice + interface contracts)
T3s (light mesh within T2 domain)
→ write draft task lists to blackboard
→ read peers' lists, reconcile boundaries
→ commit merged task plan before T4 dispatch
→ GATE (optional): review task breakdown
T4s
→ swarm: independent tasks run in parallel
→ pipeline: T4-A output feeds T4-B (T3 declares dependencies)
→ commit to feature branches
T5s (fan-out per T4 slice)
→ each reviews its slice independently
→ T3 collects results → joint verdict
→ GATE (optional): review T5 verdict before T3 marks done
→ partial: T3 retries only failed slices
→ pass: T3 signals workstream done to T2
T2 specialists → signal T2 Lead
T2 Lead → writes integration summary → blackboard
T1 Accept
→ validate against goal anchor
→ open PR, notify_adapter.send(pr summary + url)
```
### Medium Complexity — T1→T3→T4→T5
*Config change, isolated bug fix — T1 determines no cross-domain design needed*
```
T1 Plan
→ assess: contained scope, single domain, no T2 architecture needed
→ workplan: tier paths [T3, T4, T5]
→ GATE: Andrew approval
T3s spawned directly by runner
→ receives T1 brief with task context (no T2 architecture layer)
→ T3 light mesh → T4 dispatch → T5 verify → signal done
T1 Accept → PR
```
### Simple / Hotfix — T1→T4→T5
*Single file, single function, trivial atomic task*
```
T1 Plan
→ assess: trivial, single workstream
→ tier path: [T4, T5]
→ GATE: Andrew approval
T4 (coding agent)
→ single atomic task, commits
T5 (single verifier, not full fan-out)
→ code review + correctness check
→ pass → T1 Accept → PR
```
---
## Resolved Mechanics
### T3 Mesh via Blackboard
T3s coordinate task boundaries before dispatching T4s. All coordination goes through the blackboard — no direct agent-to-agent messaging.
1. Each T3 writes its **draft task list** to the blackboard (one row per proposed T4 task, status `draft`)
2. Each T3 reads all sibling T3 draft lists in its T2 domain
3. T3s amend their lists to resolve overlaps (claim tasks, release duplicates)
4. Once all T3s in the domain have committed their final task lists (status `committed`), T4 dispatch begins
5. No T3 dispatches T4s until all peers in the domain are committed — this prevents duplicate work
The runner monitors for `all_committed` state and can enforce a timeout (config: `t3_mesh_timeout_minutes`).
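One way the runner's `all_committed` check could look, as a sketch only. It assumes the `t3_task_lists` table from the build spec, that `workstream_id` identifies the mesh, and that the runner already knows which T3 agents it spawned (`expected_t3_ids` is a hypothetical argument, needed so that a T3 which has written nothing yet still counts as uncommitted).

```python
import sqlite3


def all_committed(conn, run_id, workstream_id, expected_t3_ids):
    """True when every expected T3 has rows on the blackboard and all are committed."""
    rows = conn.execute(
        "SELECT t3_agent_id, status FROM t3_task_lists "
        "WHERE run_id = ? AND workstream_id = ?",
        (run_id, workstream_id),
    ).fetchall()

    # Collect the set of statuses seen for each T3 agent.
    statuses = {}
    for agent_id, status in rows:
        statuses.setdefault(agent_id, set()).add(status)

    # An agent with no rows, or with any remaining draft row, blocks T4 dispatch.
    return bool(expected_t3_ids) and all(
        statuses.get(agent_id) == {"committed"} for agent_id in expected_t3_ids
    )
```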
---
### T1 Plan Output Schema
T1's Plan phase produces a structured JSON object written to the blackboard. The runner parses this to bootstrap the pipeline.
```json
{
"run_id": "uuid",
"goal_anchor": "Original goal — immutable, propagated to every downstream brief",
"complexity": "high | medium | low",
"retry_budget_multiplier": 2,
"workstreams": [
{
"id": "ws-backend-api",
"name": "Backend API",
"domain": "backend",
"tier_path": ["t2", "t3", "t4", "t5"],
"parallel_group": "A",
"t2_specialist": "agents/engineering/engineering-software-architect.md",
"notes": "Focus on webhook ingest and retry queue"
}
],
"parallelism": {
"groups": {
"A": ["ws-backend-api", "ws-frontend"],
"B": ["ws-infra"]
},
"sequence": ["A", "B"]
},
"self_critique_summary": "Brief plain-text summary of what T1 identified and amended in its self-critique pass"
}
```
`parallel_group` + `sequence` handles inter-workstream dependencies: group A runs in parallel, then B starts after A completes.
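As an illustration of how the runner might turn `parallelism` into an execution order, the sketch below expands `sequence` into waves of workstream ids. `execution_waves` is a hypothetical helper; the validation against `workstreams` is an added assumption about what the runner would want to check.

```python
def execution_waves(plan):
    """Return workstream ids grouped into waves, ordered by parallelism.sequence.

    Workstreams inside one wave may run in parallel; a wave starts only
    after the previous wave has completed.
    """
    groups = plan["parallelism"]["groups"]
    known = {ws["id"] for ws in plan["workstreams"]}
    waves = []
    for group_name in plan["parallelism"]["sequence"]:
        members = groups[group_name]
        unknown = [ws_id for ws_id in members if ws_id not in known]
        if unknown:
            raise ValueError(f"group {group_name} references unknown workstreams: {unknown}")
        waves.append(list(members))
    return waves
```

With three workstreams declared and the `groups`/`sequence` values from the example schema, this yields one wave containing `ws-backend-api` and `ws-frontend`, followed by a second wave containing `ws-infra`.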
---
### T5 Consensus & Verdict Schema
T3 aggregates all T5 results into a joint verdict after fan-out completes.
**Individual T5 result:**
```json
{
"verifier_id": "uuid",
"scope": "queue-client",
"verdict": "pass | fail",
"issues": ["issue description..."],
"notes": "human-readable summary"
}
```
**T3 joint verdict (written to blackboard):**
```json
{
"t5_results": [...],
"joint_verdict": "pass | partial | fail",
"failed_scopes": ["queue-client"],
"summary": "Human-readable summary for gate surface and logs"
}
```
**Split verdict handling:**
- `pass` → T3 marks workstream done, signals T2
- `partial` → T3 retries only the failed T4 slices (up to retry budget), re-runs T5 on those slices
- `fail` → T3 escalates to T2 (or T1 if shallow path)
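The aggregation step can be sketched as below. The document does not state where `partial` ends and `fail` begins, so this sketch assumes: no failures gives `pass`, every slice failing gives `fail`, and anything in between gives `partial`. `joint_verdict` and the generated `summary` text are illustrative only.

```python
def joint_verdict(t5_results):
    """Combine individual T5 results into the joint verdict structure."""
    if not t5_results:
        raise ValueError("no T5 results to aggregate")

    failed_scopes = [r["scope"] for r in t5_results if r["verdict"] == "fail"]

    if not failed_scopes:
        verdict = "pass"
    elif len(failed_scopes) == len(t5_results):
        verdict = "fail"      # assumption: every slice failed
    else:
        verdict = "partial"   # assumption: mixed results

    return {
        "t5_results": t5_results,
        "joint_verdict": verdict,
        "failed_scopes": failed_scopes,
        "summary": f"{len(t5_results) - len(failed_scopes)}/{len(t5_results)} slices passed",
    }
```

On a `partial` result, `failed_scopes` is the list T3 would use to decide which T4 slices to retry.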
---
### Spawn Call Ownership
The runner is the single point of contact with the runtime adapter. Tiers do not call `sessions_spawn` directly — they write output to the blackboard and the runner acts on it.
**Flow:**
1. A tier completes and writes child briefs to the `briefs` table with `status=pending`
2. Runner's spawn loop detects pending rows
3. If a gate is configured at this tier boundary → runner writes `gate_pending`, notifies Andrew, halts
4. On `gate_approved` → runner calls `runtime_adapter.spawn()` for each pending brief
5. Spawned agent runs, writes its own child briefs as pending when done → loop continues
This keeps gate logic in one place (the runner's spawn loop), makes all spawn calls auditable from a single location, and means agents only need blackboard read/write access — no runtime adapter tool access required.
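A single pass of the spawn loop might look like the sketch below. It works on plain dicts rather than the real `briefs` table, whose columns are not shown here, and the `"spawned"` brief status, the `gate_state` values, and the return strings are assumptions made for illustration. `spawn` and `write_event` stand in for `runtime_adapter.spawn()` and the blackboard writer.

```python
def spawn_tick(briefs, gate_enabled, gate_state, spawn, write_event):
    """Run one iteration of the runner's spawn loop.

    gate_state is None (gate not yet opened), "pending", or "approved".
    Returns "idle", "halted", or "spawned".
    """
    pending = [b for b in briefs if b["status"] == "pending"]
    if not pending:
        return "idle"

    if gate_enabled and gate_state != "approved":
        if gate_state is None:
            write_event("gate_pending")  # written once, when the gate is first hit
        return "halted"

    for brief in pending:
        spawn(brief)                 # the only place the runtime adapter is called
        brief["status"] = "spawned"  # hypothetical status value
    return "spawned"
```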
---
### Gate Approval UX
**Core mechanic (platform-agnostic):**
1. Runner writes `gate_pending` to blackboard
2. Runner calls `notify_adapter.send()` with tier summary + gate context (`run_id`, `gate`, `summary`, `what_happens_next`)
3. Runner polls blackboard for `gate_approved` or `gate_rejected`
4. `agency approve <run_id>` / `agency reject <run_id> --reason "..."` writes the event directly to the blackboard — the universal approval path, works on any platform with filesystem access
Runner never reads from a state file, never talks to a notify adapter for inbound responses. It only polls the blackboard.
**Adapter responsibility:**
Each notify adapter handles its own inbound response routing. How a human's approval gets translated into an `agency approve` CLI call is entirely the adapter's concern — not core. Example: an OpenClaw adapter bridges a chat reply to the CLI. A Slack adapter wires up a slash command. A webhook adapter listens on an endpoint. All produce the same result: `gate_approved` written to blackboard.
Any internal state the adapter needs to resolve ambiguous responses (e.g. which run_id an approval refers to when multiple gates are pending) is managed by the adapter, not the core.
---
### T3 Mesh Timeout
If T3s in a domain fail to commit their task lists within `t3_mesh_timeout_minutes`:
1. **Runner escalates to T2** — writes a `gate_pending` escalation event and notifies the T2 specialist that owns the domain. Context: which T3s timed out, what draft lists (if any) exist on the blackboard. T2 re-scopes or clarifies domain boundaries, spawns fresh T3 briefs.
2. **If T2 also exhausts its retry budget** → normal escalation ladder: T2 failure → T1 handles → T1 failure → Andrew gate.
Force-committing partial draft lists (optimistic fallback) is explicitly not done — it hides the boundary problem and produces conflicting or duplicate T4 tasks that fail later with less context.
---
### Path Amendment Mechanism
When a mid-run tier discovers scope that warrants a different tier path than T1 prescribed:
1. The discovering tier writes a `path_amendment` event to the blackboard:
```json
{
"kind": "path_amendment",
"proposed_by": "t3/ws-backend-api",
"reason": "Discovered auth dependency requires T2 architectural pass",
"amendment": {
"workstream": "ws-backend-api",
"add_tiers": ["t2"],
"insert_before": "t3"
}
}
```
2. The runner monitors the events table, detects `path_amendment`, and sends a system event notification to the relevant higher tier
3. The higher tier is **informed, not blocked** — it acknowledges and adjusts its understanding
4. Amendment is logged on the blackboard for audit; no approval gate required (the next scheduled human gate will surface it)
No agent needs callback plumbing. The runner is the notification bridge.
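For illustration, here is how the `amendment` payload above could be applied to a workstream's `tier_path`. The document does not say which component performs this update, so `apply_amendment` is a hypothetical helper showing only the list manipulation.

```python
def apply_amendment(tier_path, amendment):
    """Return a new tier path with add_tiers inserted before insert_before."""
    anchor = amendment["insert_before"]
    if anchor not in tier_path:
        raise ValueError(f"tier {anchor!r} is not in the current path")
    index = tier_path.index(anchor)
    # Skip tiers that are already on the path so the amendment is idempotent.
    new_tiers = [t for t in amendment["add_tiers"] if t not in tier_path]
    return tier_path[:index] + new_tiers + tier_path[index:]
```

Applied to the medium-complexity path `["t3", "t4", "t5"]` with the example amendment, this returns `["t2", "t3", "t4", "t5"]`.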
---
## Shared State
For software pipelines, **the repo is the primary blackboard**:
@@ -185,14 +434,21 @@ Supplemented by a SQLite coordination store per run tracking:
## Failure Handling
| Failure | Handler | Action |
|---------|---------|--------|
| T4 bad output | T3 | Retry T4 with corrected brief (up to retry_budget) |
| T4 blocked | T3 | Escalate immediately — no retries |
| T4 partial output | T3 | Salvage good parts, re-task remainder |
| T3 workstream stuck | T2 | Re-scope or split the workstream |
| T2 design wrong | T1 | Re-plan; may discard workstream and restart |
| Repeated escalation | Surface to user | Block until human unblocks |
Distributed ownership — each tier handles failures in the tier below it. The runner only handles T1 failure and terminal human escalation.
| Failure | Owner | Handler | Action |
|---------|-------|---------|--------|
| T4 bad output | T3 | `escalation.py` called by T3's context | Retry T4 with corrected brief (up to retry_budget) |
| T4 blocked | T3 | `escalation.py` | Escalate to T3 immediately — no retries |
| T4 partial output | T3 | `escalation.py` | Salvage good parts, re-task remainder |
| T5 partial verdict | T3 | T3 joint verdict logic | Retry failed T4 slices only |
| T5 full fail | T3 | T3 joint verdict logic | Escalate to T2 |
| T3 workstream stuck | T2 | T2 specialist prompt + blackboard | Re-scope or split the workstream |
| T2 design wrong | T1 | T1 Accept phase + blackboard | Re-plan; may discard workstream and restart |
| T1 failure / crash | Runner | `team_runner.py` | Surface to human, halt run |
| Repeated escalation | Runner | `team_runner.py` | Gate: block until human unblocks |
**Key distinction:** `escalation.py` is not called by the runner centrally. It is logic that tier agents execute (or the runner executes on their behalf when it detects a timeout or dead agent). The runner only owns the last two rows.
Retry limits prevent infinite loops. Escalation path is always upward, never sideways.
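The retry-or-escalate decision can be sketched with the `retry_defaults` values from `team.yaml` (`bad_output: 3`, `partial: 2`, `blocked: 0`). `next_action` is a hypothetical function name; how `escalation.py` actually tracks attempts is not specified here.

```python
# Defaults copied from the retry_defaults block in team.yaml.
RETRY_DEFAULTS = {"bad_output": 3, "partial": 2, "blocked": 0}


def next_action(failure_kind, attempts_so_far, budgets=RETRY_DEFAULTS):
    """Return "retry" while budget remains for this failure kind, else "escalate"."""
    if failure_kind not in budgets:
        raise ValueError(f"unknown failure kind: {failure_kind}")
    return "retry" if attempts_so_far < budgets[failure_kind] else "escalate"
```

Because `blocked` has a budget of zero, it escalates on the first occurrence, which matches the "no retries" row in the table.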
@@ -264,6 +520,114 @@ T4 and T5 default to the **coding agent runtime** when available. Falls back to
---
## Run Visibility Layer
Designed for debugging, test runs, and quality evaluation at each tier. Three interlocking components.
### 1. Human-Readable Live Log
Structured events from the blackboard rendered as a timestamped, readable stream. `agency watch <run_id>` tails this live.
```
[abc123] 12:30:01 T1 PLAN_START Assessing scope: "Build webhook ingestion system"
[abc123] 12:30:14 T1 PLAN_DONE 3 workstreams — backend-api, infra, docs (2 parallel)
[abc123] 12:30:14 GATE APPROVAL ⏸ Waiting on approval before T2 spawns
[abc123] 12:31:02 GATE APPROVED ✓ Approved — continuing
[abc123] 12:31:03 T2 LEAD_START Lead Architect spawned
[abc123] 12:31:41 T2 BOUNDS_READY Domain boundaries + shared assumptions published
[abc123] 12:31:42 T2 SPEC_START 3 specialists spawned (parallel): backend, infra, docs
[abc123] 12:32:15 T2 SPEC_DONE backend-api architecture draft ready
[abc123] 12:32:58 T2 SYNTH_DONE Canonical architecture written to blackboard
[abc123] 12:32:58 GATE INSPECTION ⏸ T2 synthesis ready for review
[abc123] 12:33:44 T3 MESH_START backend-api: 2 squad leads negotiating task boundaries
[abc123] 12:34:01 T3 MESH_DONE Task split committed — 7 T4 tasks (5 swarm, 2 pipeline)
[abc123] 12:34:02 T4 SWARM_START 5 workers spawned in parallel
[abc123] 12:35:10 T4 DONE worker-3 auth-middleware ✓
[abc123] 12:35:22 T4 FAIL worker-4 queue-client ✗ (retry 1/3)
[abc123] 12:36:04 T4 DONE worker-4 queue-client ✓ (retry resolved)
[abc123] 12:36:05 T5 VERIFY_START 4 verifiers spawned
[abc123] 12:36:45 T5 VERDICT partial — queue-client needs rework
[abc123] 12:37:12 T5 VERDICT ✓ all pass — workstream backend-api done
```
Log level `verbose` adds per-T4-start/done lines. Default is `normal` (tier-level events only).
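A rough sketch of the rendering step behind `agency watch`, assuming each event carries an ISO 8601 `created_at` and that log events are tagged with a level of `normal` or `verbose`. Both function names are hypothetical, and column alignment is left out.

```python
from datetime import datetime


def format_log_line(run_id, created_at, tier, event, message):
    """Render one blackboard event as a single human-readable log line."""
    timestamp = datetime.fromisoformat(created_at).strftime("%H:%M:%S")
    return f"[{run_id}] {timestamp} {tier} {event} {message}"


def is_visible(event_level, log_level):
    """Verbose-only lines are hidden unless log_level is "verbose"."""
    return log_level == "verbose" or event_level != "verbose"
```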
### 2. Inspection Gates
Configurable pause points. When the runner hits a gate, it:
1. Writes a `gate_pending` event to the blackboard
2. Fires `notify_adapter.send()` with the tier summary + gate context
3. Halts — no next tier spawns until `gate_approved` or `gate_rejected` is written
The tier summary surfaced at each gate includes:
- **What was produced** (the tier artifact in readable form)
- **What happens next** (which agents will spawn, doing what)
- **Any anomalies** flagged by the tier itself
Configurable in `team.yaml` under `visibility.inspection_gates`. A `strict_mode: true` flag enables all gates — recommended for first runs on a new codebase or new goal type.
```yaml
visibility:
strict_mode: false
log_level: normal # normal | verbose
inspection_gates:
t1_plan: true # always — required by design
t2_lead: false # optional — review boundaries before specialists
t2_synthesis: true # recommended — review architecture before implementation
t3_plan: false # verbose — useful early on, disable once T3 is trusted
t5_verdict: false # review T5 joint verdict before T3 marks workstream done
gate_timeout_minutes: 60 # auto-reject if no response within this window
```
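The interaction between `strict_mode` and the per-gate flags could be resolved as in this sketch. `effective_gates` is a hypothetical helper; forcing `t1_plan` on regardless of config is an interpretation of the "always, required by design" comment.

```python
# Gate names as listed under visibility.inspection_gates.
GATE_NAMES = ("t1_plan", "t2_lead", "t2_synthesis", "t3_plan", "t5_verdict")


def effective_gates(visibility):
    """Resolve which inspection gates are active for a run."""
    if visibility.get("strict_mode", False):
        return {name: True for name in GATE_NAMES}  # strict mode turns every gate on

    configured = visibility.get("inspection_gates", {})
    gates = {name: bool(configured.get(name, False)) for name in GATE_NAMES}
    gates["t1_plan"] = True  # assumption: the T1 plan gate cannot be disabled
    return gates
```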
### 3. Inspection CLI — `cli/agency.py`
```
agency run <config.yaml> # start a run, returns run_id
agency watch <run_id> # tail live log (follows blackboard events)
agency inspect <run_id> # interactive tree view of run state
agency inspect <run_id> --tier t2 # jump to T2 artifacts
agency inspect <run_id> --brief <id> # show full brief + result JSON
agency approve <run_id> # approve current gate → continue
agency approve <run_id> --note "..." # approve with a note written to blackboard
agency reject <run_id> --reason "..." # reject → tier re-invoked
agency pause <run_id> # force-pause at next tier boundary
agency resume <run_id> # release a manual pause
```
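The command surface above maps naturally onto `argparse` subcommands. The sketch below covers parsing only, with no handlers; treating `--reason` as required for `reject` is an assumption, since the listing shows it but does not say whether it is optional.

```python
import argparse


def build_parser():
    """Build an argument parser mirroring the agency commands listed above."""
    parser = argparse.ArgumentParser(prog="agency")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("run").add_argument("config")

    # Commands that take only a run_id.
    for name in ("watch", "pause", "resume"):
        sub.add_parser(name).add_argument("run_id")

    inspect = sub.add_parser("inspect")
    inspect.add_argument("run_id")
    inspect.add_argument("--tier")
    inspect.add_argument("--brief")

    approve = sub.add_parser("approve")
    approve.add_argument("run_id")
    approve.add_argument("--note")

    reject = sub.add_parser("reject")
    reject.add_argument("run_id")
    reject.add_argument("--reason", required=True)  # assumption: a reason is mandatory

    return parser
```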
`agency inspect` (no flags) renders a live tree:
```
Run abc123 — "Build webhook ingestion system"
├── T1 Plan ✓
│ └── [view workplan]
├── T2 Architecture ✓ [GATE: pending review]
│ ├── [view domain boundaries]
│ ├── [view shared assumptions]
│ └── [view canonical architecture]
├── T3 backend-api (active)
│ ├── [view task breakdown]
│ └── T4 workers: 3/7 done, 1 retrying, 3 pending
└── T3 infra (pending)
```
### Blackboard Event Vocabulary (extended)
```python
# existing
"spawned" | "completed" | "failed" | "escalated" | "retried"
# new — visibility layer
"gate_pending" # runner hit a gate, waiting for human
"gate_approved" # human approved, run continues
"gate_rejected" # human rejected, tier re-invoked
"gate_paused" # manual pause via CLI
"gate_resumed" # manual resume via CLI
"path_amendment" # mid-run tier proposed path change
"log" # human-readable log line (level + message)
```
---
## Decisions Log
**T1 dynamic dispatch** — T1 assesses scope and prescribes tier path and workstream parallelism. It does not prescribe internal tier coordination patterns.
@@ -286,7 +650,7 @@ T4 and T5 default to the **coding agent runtime** when available. Falls back to
**Orchestration patterns** — Baked into tier prompts and runner tier-handling logic, not prescribed by T1. T2: Lead + parallel specialists. T3: light mesh within T2 domain. T4: swarm+pipeline. T5: fan-out+consensus.
**Output / review** — Nothing merges to main without Andrew's explicit approval. T1 opens a PR and surfaces it to Andrew. Notification is dual: Hans messages Andrew directly + PR opened on VCS. Merge is gated on human sign-off.
**Output / review** — Nothing merges to main without explicit human approval. T1 opens a PR and fires `notify_adapter.send()` with the PR summary. Merge is gated on human sign-off. The notify adapter implementation determines how the notification is delivered.
**Platform agnosticism** — Core is provider and platform agnostic. Capability levels (`reasoning-heavy`, `capable`, `fast-cheap`) map to models in config. Mixing providers across tiers is supported.
@@ -297,3 +661,21 @@ T4 and T5 default to the **coding agent runtime** when available. Falls back to
**Coding agent runtime** — Claude Code is default T4/T5 runtime. Opt-in `native_teams` flag available for internal Claude Code parallelism — faster but less blackboard visibility. Default `false`.
**Agency-agents integration** — Via git submodule at `agents/`. T1 selects specialists via `config/role_registry.yaml`. `agent_personality` field on task brief; runtime injects as system prompt at spawn time.
**Spawn call ownership** — Runner is the single point of contact with the runtime adapter. Tiers write `status=pending` child briefs to the blackboard; runner's spawn loop detects and spawns them. Gate logic (hold on `gate_pending`) lives in the spawn loop — no gate plumbing needed in agents. Agents only need blackboard read/write access.
**Gate approval UX** — `agency approve <run_id>` CLI is the universal approval path — writes `gate_approved` directly to blackboard. Runner only polls blackboard; it does not depend on any specific notification platform. Each notify adapter handles its own inbound response bridge as internal adapter state. Core has no `pending_gates.json` or platform-specific approval logic.
**T3 mesh timeout** — Escalate to T2 (the specialist that owns the domain). Timeout means T3s can't agree on task boundaries — a domain boundary problem T2 should fix by re-scoping. If T2 exhausts its retry budget, normal escalation ladder handles it (T1 → Andrew gate). No force-commit fallback.
**T3 mesh mechanics** — Blackboard-based coordination. T3s write draft task lists, read peers', reconcile overlaps, commit merged plan. No T4 dispatch until all T3s in the domain have committed. Runner enforces timeout (`t3_mesh_timeout_minutes` in config). Chosen over designated T3 lead or direct messaging — fits distributed ownership model, gives full audit trail for free.
**T1 output schema** — Formal JSON schema defined (2026-03-30). Fields: `run_id`, `goal_anchor`, `complexity`, `retry_budget_multiplier`, `workstreams[]` (id, name, domain, tier_path, parallel_group, t2_specialist, notes), `parallelism` (groups + sequence), `self_critique_summary`. `parallel_group` + `sequence` handles inter-workstream dependencies.
**T5 consensus** — T3 aggregates all T5 results into joint verdict: `pass | partial | fail`. Split verdict (`partial`) → T3 retries only failed slices, re-runs T5 on those slices. Full `fail` escalates to T2. T3 writes structured joint verdict to blackboard; this is what the optional T5 gate surfaces to Andrew.
**Path amendment mechanism** — Amending tier writes `path_amendment` event to blackboard (structured JSON: proposed_by, reason, amendment). Runner monitors events table, sends system event notification to relevant higher tier. Higher tier is informed, not blocked. No agent callback plumbing. Amendments surface at next scheduled human gate.
**Failure handling (distributed)** — Confirmed distributed ownership (2026-03-30). `escalation.py` is logic tiers execute (or runner executes on tier's behalf on timeout/crash), not a central runner concern. Runner only owns: T1 failure, terminal human escalation. See updated Failure Handling table.
**Run visibility layer** — Added 2026-03-30. Human-readable live log, configurable inspection gates, and `cli/agency.py` inspection/control commands. Designed for debugging and quality evaluation at each tier during early runs. `strict_mode: true` enables all gates. Gates surface tier artifacts + "what happens next" summary via `notify_adapter.send()` — platform-agnostic. Resolves Q3 (T5 consensus surfaces as gate event with human-readable summary). T5 gate (optional) lets the operator review joint verdict before T3 marks workstream done.