docs: lock in visibility layer, resolve all 5 open design questions

- Resolve T3 mesh mechanics: blackboard-based draft/commit cycle
- Resolve T1 plan output schema: formal JSON structure with workstreams + parallelism groups
- Resolve T5 consensus: T3 aggregates joint verdict (pass/partial/fail), partial retries failed slices only
- Resolve path amendment mechanism: event-based, runner notifies higher tier, no approval gate
- Resolve failure handling: confirmed distributed ownership, runner owns T1 + terminal only

Add run visibility layer:
- Human-readable live log (normal + verbose modes)
- Configurable inspection gates (t1_plan always, t2_synthesis recommended, others optional)
- strict_mode flag for full gating on early runs
- cli/agency.py: run, watch, inspect, approve, reject, pause, resume
- gate_pending halt loop in team_runner, gate_approved/rejected resume
- Expanded blackboard event vocabulary (gate_*, path_amendment, log)
- t3_task_lists table for mesh coordination state
- Inspection gate flow added to buildspec Key Flows

Build order updated: 16 steps (added cli/ step, clarified runner gate responsibilities)
2026-03-30 13:43:19 -04:00
parent 882b769d21
commit a721db63f6
2 changed files with 424 additions and 29 deletions
--- a/docs/buildspec.md
+++ b/docs/buildspec.md
@@ -1,7 +1,7 @@
 # Tiered Agent Team System — Build Spec

-_Started: 2026-03-15. Status: Pre-build._
-_See agent-teams-design.md for the design doc and decisions log._
+_Started: 2026-03-15. Last updated: 2026-03-30._
+_See design.md for the design doc and decisions log._

 ---

@@ -68,6 +68,9 @@ agent-teams/
 │   ├── team.yaml            — example run configuration
 │   └── role_registry.yaml   — maps (tier, domain) → agent personality file
 │
+├── cli/
+│   └── agency.py            — run, watch, inspect, approve, reject, pause, resume
+│
 ├── runs/                    — runtime state, one subdir per run_id
 │   └── .gitkeep
 │
@@ -131,12 +134,43 @@ CREATE TABLE events (
    event_id    TEXT PRIMARY KEY,
    run_id      TEXT NOT NULL,
    brief_id    TEXT,
-    kind        TEXT NOT NULL,  -- spawned | completed | failed | escalated | retried
+    kind        TEXT NOT NULL,  -- see event vocabulary below
    detail      TEXT,           -- JSON
    created_at  TEXT NOT NULL
 );
 ```

+**Event kind vocabulary:**
+```
+-- lifecycle
+spawned | completed | failed | escalated | retried
+
+-- visibility / gates
+gate_pending    -- runner hit an inspection gate, waiting for human
+gate_approved   -- human approved via CLI or notify
+gate_rejected   -- human rejected, tier re-invoked
+gate_paused     -- manual pause via CLI
+gate_resumed    -- manual resume via CLI
+
+-- amendments / informational
+path_amendment  -- mid-run tier proposed a tier path change
+log             -- human-readable log line (detail: {level, message})
+```
+
+**t3_task_lists** *(T3 mesh coordination)*
+```sql
+CREATE TABLE t3_task_lists (
+    entry_id        TEXT PRIMARY KEY,
+    run_id          TEXT NOT NULL,
+    workstream_id   TEXT NOT NULL,
+    t3_agent_id     TEXT NOT NULL,
+    status          TEXT NOT NULL,  -- draft | committed
+    tasks           TEXT NOT NULL,  -- JSON array of proposed T4 task descriptors
+    created_at      TEXT NOT NULL,
+    updated_at      TEXT NOT NULL
+);
+```
+
 ---

 ## Task Brief Schema
@@ -283,6 +317,19 @@ retry_defaults:
  bad_output: 3
  partial: 2
  blocked: 0    # always escalate immediately
+
+visibility:
+  strict_mode: false          # true = all gates on (recommended for first runs)
+  log_level: normal           # normal | verbose (verbose = per-T4 start/done lines)
+  inspection_gates:
+    t1_plan: true             # always — required by design
+    t2_lead: false            # optional — review boundaries before specialists spawn
+    t2_synthesis: true        # recommended — review architecture before implementation
+    t3_plan: false            # verbose — useful early on, disable once T3 is trusted
+    t5_verdict: false         # review T5 joint verdict before T3 marks workstream done
+  gate_timeout_minutes: 60    # auto-reject if no human response within this window
+
+t3_mesh_timeout_minutes: 10   # max time for T3s to commit task lists before runner escalates
 ```

 ---
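The `visibility` block added in the hunk above implies a small resolution step: `strict_mode` turns every gate on, and `t1_plan` is described as always required. A minimal sketch of that resolution follows, assuming the YAML has already been parsed into a plain dict. The helper name `effective_gates` and the choice to force `t1_plan` on even when the config sets it to false are assumptions for illustration, not something the spec states.

```python
# Gate names as listed under visibility.inspection_gates in the spec.
GATE_NAMES = ("t1_plan", "t2_lead", "t2_synthesis", "t3_plan", "t5_verdict")


def effective_gates(visibility):
    """Return {gate_name: enabled} for a parsed `visibility` config dict.

    Hypothetical helper: the spec only defines the config keys, not this function.
    """
    # strict_mode: true means all gates are on, regardless of per-gate settings.
    if visibility.get("strict_mode", False):
        return {name: True for name in GATE_NAMES}

    configured = visibility.get("inspection_gates", {})
    gates = {name: bool(configured.get(name, False)) for name in GATE_NAMES}

    # Assumption: "always, required by design" is enforced here by overriding
    # whatever the config says for t1_plan.
    gates["t1_plan"] = True
    return gates


# The example values from the spec's config block.
example_visibility = {
    "strict_mode": False,
    "log_level": "normal",
    "inspection_gates": {
        "t1_plan": True,
        "t2_lead": False,
        "t2_synthesis": True,
        "t3_plan": False,
        "t5_verdict": False,
    },
    "gate_timeout_minutes": 60,
}
```

With the example config, only `t1_plan` and `t2_synthesis` come back enabled; flipping `strict_mode` to true enables all five.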
@@ -388,7 +435,29 @@ spawn T4 with brief
    → notify T3
 ```

-### 4. Review Gate
+### 4. Inspection Gate Flow
+
+```
+runner reaches configured gate (e.g. t2_synthesis)
+  → write event(gate_pending, detail={tier, summary, what_happens_next})
+  → notify_adapter.send(tier summary to Andrew via Hans)
+  → halt: poll blackboard for gate_approved or gate_rejected
+
+  gate_approved:
+    → write event(gate_approved)
+    → continue run
+
+  gate_rejected:
+    → write event(gate_rejected, detail={reason})
+    → re-invoke tier with rejection reason in brief context
+    → loop back to gate_pending when tier completes again
+
+  gate_timeout (gate_timeout_minutes elapsed):
+    → treat as gate_rejected
+    → notify Andrew: "Gate timed out, re-invoking tier"
+```
+
+### 5. Review Gate

 ```
 T1 completes integration
@@ -412,19 +481,20 @@ T1 completes integration

 1. `git submodule add https://github.com/msitarzewski/agency-agents agents/` — pull the talent pool
 2. `config/role_registry.yaml` — map tier+domain → agent personality files
-3. `core/task_brief.py` — schema + validation (everything depends on this)
-4. `core/blackboard.py` — SQLite store, all table definitions
+3. `core/task_brief.py` — schema + validation (everything depends on this); include T1 Plan Output Schema
+4. `core/blackboard.py` — SQLite store, all table definitions including `t3_task_lists`; full event kind vocabulary
 5. `adapters/base/*` — all four abstract interfaces
 6. `adapters/llm/anthropic.py` — first LLM implementation
-7. `core/escalation.py` — retry + failure routing logic
+7. `core/escalation.py` — retry + failure routing logic (called by tiers, not runner centrally)
 8. `adapters/runtime/openclaw.py` — wire up sessions_spawn + personality injection
 9. `adapters/runtime/claude_code.py` — coding agent runtime, personality via --system-prompt
-10. `core/team_runner.py` — full run lifecycle, runtime + personality selection
-11. `prompts/` — fallback tier prompts (used when no agent_personality set)
-12. `adapters/vcs/github.py` — PR creation + branch management
-13. `adapters/notify/openclaw.py` — Hans notification
-14. `config/team.yaml` — example config
-15. `README.md` — how to run, how to add adapters, how to extend the roster
+10. `core/team_runner.py` — full run lifecycle: gate logic (gate_pending halt loop, gate_approved resume), path amendment monitor, T1 failure + terminal escalation only
+11. `cli/agency.py` — run, watch, inspect, approve, reject, pause, resume; `watch` tails blackboard events and renders live log; `inspect` renders run tree
+12. `prompts/` — fallback tier prompts (used when no agent_personality set)
+13. `adapters/vcs/github.py` — PR creation + branch management
+14. `adapters/notify/openclaw.py` — Hans notification; used for gate surfaces (tier summary to Andrew)
+15. `config/team.yaml` — example config with full visibility block
+16. `README.md` — how to run, how to add adapters, how to extend the roster; include `agency` CLI reference

 ---
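The Inspection Gate Flow added in this commit (write `gate_pending`, halt, poll the blackboard, treat a timeout as a rejection) can be sketched against the `events` table from the spec. This is a rough illustration rather than the actual `team_runner` code: the function names `write_event`, `open_gate`, and `poll_gate` are hypothetical, the notification step is omitted, and it assumes the CLI (not the runner) is what writes the `gate_approved` / `gate_rejected` row that the runner then reads.

```python
import json
import sqlite3
import time
import uuid
from datetime import datetime, timezone

# Same columns as the events table in the build spec.
SCHEMA = """CREATE TABLE IF NOT EXISTS events (
    event_id    TEXT PRIMARY KEY,
    run_id      TEXT NOT NULL,
    brief_id    TEXT,
    kind        TEXT NOT NULL,
    detail      TEXT,
    created_at  TEXT NOT NULL
)"""


def write_event(conn, run_id, kind, detail=None):
    """Insert one event row and return its SQLite rowid (used for ordering)."""
    cur = conn.execute(
        "INSERT INTO events (event_id, run_id, brief_id, kind, detail, created_at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (uuid.uuid4().hex, run_id, None, kind, json.dumps(detail or {}),
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return cur.lastrowid


def open_gate(conn, run_id, tier, summary):
    """Record gate_pending; the returned rowid marks where polling starts."""
    return write_event(conn, run_id, "gate_pending", {"tier": tier, "summary": summary})


def poll_gate(conn, run_id, after_rowid, timeout_seconds, poll_seconds=1.0):
    """Block until a decision newer than the pending event appears, or time out.

    Returns "gate_approved" or "gate_rejected". A timeout is recorded and
    returned as gate_rejected, matching the flow in the spec.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        row = conn.execute(
            "SELECT kind FROM events WHERE run_id = ? AND rowid > ? "
            "AND kind IN ('gate_approved', 'gate_rejected') ORDER BY rowid LIMIT 1",
            (run_id, after_rowid),
        ).fetchone()
        if row is not None:
            return row[0]
        if time.monotonic() >= deadline:
            write_event(conn, run_id, "gate_rejected", {"reason": "gate_timeout"})
            return "gate_rejected"
        time.sleep(poll_seconds)
```

In a real run, `timeout_seconds` would presumably be derived from `gate_timeout_minutes * 60`, and a `gate_rejected` result would lead the runner to re-invoke the tier with the rejection reason before opening the gate again. Ordering by `rowid` rather than `created_at` is a choice made for this sketch, so that events written within the same timestamp resolution still sort deterministically.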