diff --git a/docs/buildspec.md b/docs/buildspec.md
index 2b3cac5..c462bcb 100644
--- a/docs/buildspec.md
+++ b/docs/buildspec.md
@@ -74,6 +74,8 @@ agent-teams/
 ├── runs/ — runtime state, one subdir per run_id
 │   └── .gitkeep
 │
+├── pending_gates.json — live file: gates currently awaiting approval (written by runner, read by Hans)
+│
 └── README.md
 ```
 
@@ -488,7 +490,7 @@ T1 completes integration
 7. `core/escalation.py` — retry + failure routing logic (called by tiers, not runner centrally)
 8. `adapters/runtime/openclaw.py` — wire up sessions_spawn + personality injection
 9. `adapters/runtime/claude_code.py` — coding agent runtime, personality via --system-prompt
-10. `core/team_runner.py` — full run lifecycle: gate logic (gate_pending halt loop, gate_approved resume), path amendment monitor, T1 failure + terminal escalation only
+10. `core/team_runner.py` — full run lifecycle: spawn loop (monitors briefs table for `status=pending`, calls runtime_adapter.spawn()), gate logic (gate_pending halt, writes pending_gates.json, gate_approved/rejected resume), path amendment monitor, T3 mesh timeout → T2 escalation, T1 failure + terminal escalation only
 11. `cli/agency.py` — run, watch, inspect, approve, reject, pause, resume; `watch` tails blackboard events and renders live log; `inspect` renders run tree
 12. `prompts/` — fallback tier prompts (used when no agent_personality set)
 13. `adapters/vcs/github.py` — PR creation + branch management
diff --git a/docs/design.md b/docs/design.md
index 8486f51..9e28b9e 100644
--- a/docs/design.md
+++ b/docs/design.md
@@ -6,7 +6,7 @@ _Started: 2026-03-14. Last updated: 2026-03-30._
 
 ## Resolved Design Decisions (formerly Open Questions)
 
-All five open questions resolved 2026-03-30. Details in Decisions Log.
+All eight open questions resolved 2026-03-30. Details in Decisions Log.
 
 1. **T3 mesh mechanics** → Blackboard-based. T3s write draft task lists, read peers', commit merged plan before T4 dispatch. See _T3 Mesh via Blackboard_.
 
@@ -18,6 +18,12 @@ All five open questions resolved 2026-03-30. Details in Decisions Log.
 
 5. **Failure handling (distributed model)** → Distributed ownership confirmed. Runner only owns T1 failure + terminal human escalation. See updated _Failure Handling_ table.
 
+6. **Who makes spawn calls for T3+ tiers** → Runner monitors the briefs table for `status=pending` rows and makes all spawn calls. "Distributed ownership" means the tier's output determines brief content — runner is the mechanical arm. Gates (hold on `gate_pending`) live naturally in the runner's spawn loop.
+
+7. **Gate approval UX** → Both Signal reply (via Hans) and direct CLI are supported — both write to the same blackboard. Runner only cares that a `gate_approved` event exists, not who wrote it. Runner maintains `pending_gates.json` in workspace; Hans reads it for multi-run disambiguation.
+
+8. **T3 mesh timeout** → Escalate to T2 (domain boundary problem, T2 should re-scope). If T2 also exhausts its retry budget, the failure escalates up the normal ladder to T1 → Andrew gate. No force-commit fallback (would hide the problem and cause bad T4 dispatch).
+
 ---
 
 ---
@@ -340,6 +346,61 @@ T3 aggregates all T5 results into a joint verdict after fan-out completes.
 
 ---
 
+### Spawn Call Ownership
+
+The runner is the single point of contact with the runtime adapter. Tiers do not call `sessions_spawn` directly — they write output to the blackboard and the runner acts on it.
+
+**Flow:**
+1. A tier completes and writes child briefs to the `briefs` table with `status=pending`
+2. Runner's spawn loop detects pending rows
+3. If a gate is configured at this tier boundary → runner writes `gate_pending`, notifies Andrew, halts
+4. On `gate_approved` (or immediately, if no gate is configured) → runner calls `runtime_adapter.spawn()` for each pending brief
+5. Spawned agent runs, writes its own child briefs as pending when done → loop continues
+
+This keeps gate logic in one place (the runner's spawn loop), makes all spawn calls auditable from a single location, and means agents only need blackboard read/write access — no runtime adapter tool access required.
+
+---
+
+### Gate Approval UX
+
+Two paths, both valid, same outcome — runner only cares that a `gate_approved` event exists in the blackboard:
+
+**Signal (via Hans):**
+Andrew receives the tier summary from Hans in Signal and replies "approve" or "reject: reason". Hans resolves which run + gate the reply refers to using `workspace/pending_gates.json` (maintained by runner on each `gate_pending` event), then runs `agency approve ` or `agency reject --reason "..."` on Andrew's behalf. Hans confirms back: "✅ Approved — T3 spawning now."
+
+**Direct CLI:**
+Andrew runs `agency approve ` from his terminal. Zero-friction when already at a machine.
+
+**`pending_gates.json` format:**
+```json
+{
+  "gates": [
+    {
+      "run_id": "abc123",
+      "gate": "t2_synthesis",
+      "pending_since": "2026-03-30T14:00:00Z",
+      "summary": "T2 synthesis ready — canonical architecture written"
+    }
+  ]
+}
+```
+
+If only one gate is pending, Hans can resolve "approve" without an explicit run_id. If multiple are pending, Hans asks Andrew to specify.
+
+---
+
+### T3 Mesh Timeout
+
+If T3s in a domain fail to commit their task lists within `t3_mesh_timeout_minutes`:
+
+1. **Runner escalates to T2** — writes a `gate_pending` escalation event and notifies the T2 specialist that owns the domain. Context passed: which T3s timed out and which draft lists (if any) exist on the blackboard. T2 re-scopes or clarifies domain boundaries and writes fresh T3 briefs (`status=pending`) for the runner to spawn.
+
+2. **If T2 also exhausts its retry budget** → normal escalation ladder: T2 failure → T1 handles → T1 failure → Andrew gate.
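The timeout check and escalation ladder in the steps above can be sketched in Python. This is a minimal sketch under stated assumptions: `T3Status`, `Escalation`, `check_mesh_timeout`, and `LADDER` are hypothetical names, and how the runner actually reads T3 commit state from the blackboard is not specified in this design.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class T3Status:
    """Hypothetical view of one T3's mesh progress (not the runner's real schema)."""
    agent_id: str
    committed: bool
    draft_list: Optional[list[str]] = None  # draft tasks on the blackboard, if any


@dataclass
class Escalation:
    target: str                      # tier that receives the escalation
    timed_out: list[str]             # T3s that never committed
    drafts: dict[str, list[str]] = field(default_factory=dict)  # context only


def check_mesh_timeout(
    t3s: list[T3Status],
    mesh_started: datetime,
    now: datetime,
    t3_mesh_timeout_minutes: int,
) -> Optional[Escalation]:
    """Return a T2 escalation if the window has elapsed with uncommitted T3s."""
    if now - mesh_started < timedelta(minutes=t3_mesh_timeout_minutes):
        return None  # still inside the mesh window
    pending = [t for t in t3s if not t.committed]
    if not pending:
        return None  # every T3 committed; T4 dispatch may proceed
    # No force-commit: partial drafts are handed to T2 as context, never committed.
    drafts = {t.agent_id: t.draft_list for t in pending if t.draft_list}
    return Escalation(target="T2", timed_out=[t.agent_id for t in pending], drafts=drafts)


# Hypothetical encoding of the normal ladder once a tier exhausts its retry budget.
LADDER = {"T2": "T1", "T1": "andrew_gate"}


def next_escalation_target(failed_tier: str) -> str:
    return LADDER[failed_tier]
```

Draft lists travel to T2 only as context for re-scoping, which keeps the sketch consistent with the no-force-commit rule.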
+
+Force-committing partial draft lists (optimistic fallback) is explicitly not done — it hides the boundary problem and produces conflicting or duplicate T4 tasks that fail later with less context.
+
+---
+
 ### Path Amendment Mechanism
 
 When a mid-run tier discovers scope that warrants a different tier path than T1 prescribed:
@@ -611,6 +672,12 @@ Run abc123 — "Build webhook ingestion system"
 
 **Agency-agents integration** — Via git submodule at `agents/`. T1 selects specialists via `config/role_registry.yaml`. `agent_personality` field on task brief; runtime injects as system prompt at spawn time.
 
+**Spawn call ownership** — Runner is the single point of contact with the runtime adapter. Tiers write `status=pending` child briefs to the blackboard; runner's spawn loop detects and spawns them. Gate logic (hold on `gate_pending`) lives in the spawn loop — no gate plumbing needed in agents. Agents only need blackboard read/write access.
+
+**Gate approval UX** — Both Signal reply (Hans as bridge) and direct `agency approve` CLI are supported. Same blackboard write either way; runner doesn't care which path was used. Runner writes `pending_gates.json` in workspace; Hans reads it to resolve ambiguous replies when multiple gates are pending. Single pending gate → "approve" is unambiguous.
+
+**T3 mesh timeout** — Escalate to T2 (the specialist that owns the domain). Timeout means T3s can't agree on task boundaries — a domain boundary problem T2 should fix by re-scoping. If T2 exhausts its retry budget, the normal escalation ladder handles it (T1 → Andrew gate). No force-commit fallback.
+
 **T3 mesh mechanics** — Blackboard-based coordination. T3s write draft task lists, read peers', reconcile overlaps, commit merged plan. No T4 dispatch until all T3s in the domain have committed. Runner enforces timeout (`t3_mesh_timeout_minutes` in config). Chosen over designated T3 lead or direct messaging — fits distributed ownership model, gives full audit trail for free.
 
**T1 output schema** — Formal JSON schema defined (2026-03-30). Fields: `run_id`, `goal_anchor`, `complexity`, `retry_budget_multiplier`, `workstreams[]` (id, name, domain, tier_path, parallel_group, t2_specialist, notes), `parallelism` (groups + sequence), `self_critique_summary`. `parallel_group` + `sequence` handles inter-workstream dependencies.
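The spawn-loop flow under _Spawn Call Ownership_ can be sketched as a single runner pass. This is a minimal Python sketch, assuming an in-memory list in place of the `briefs` table, a list of event tuples in place of the blackboard, and a `spawn` callable standing in for `runtime_adapter.spawn()`. `Brief`, `spawn_pass`, and the boundary names are hypothetical, and the sketch assumes the halt applies only to briefs at the gated boundary.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Brief:
    brief_id: str
    run_id: str
    boundary: str           # hypothetical tier-boundary label, e.g. "t2_to_t3"
    status: str = "pending"


def spawn_pass(
    briefs: list[Brief],
    gated_boundaries: set[str],
    events: list[tuple[str, str, str]],   # (event, run_id, boundary) on a stand-in blackboard
    spawn: Callable[[Brief], None],       # stands in for runtime_adapter.spawn()
) -> list[str]:
    """One iteration of the runner's spawn loop; returns the ids of briefs spawned."""
    spawned = []
    for b in briefs:
        if b.status != "pending":
            continue
        if b.boundary in gated_boundaries:
            key = (b.run_id, b.boundary)
            if ("gate_approved", *key) not in events:
                if ("gate_pending", *key) not in events:
                    # Written once; notifying Andrew would also happen here.
                    events.append(("gate_pending", *key))
                continue  # hold this brief until approval arrives
        spawn(b)
        b.status = "spawned"
        spawned.append(b.brief_id)
    return spawned
```

Because the runner is the only caller of `spawn`, the approval check sits in exactly one place, which is the auditability argument that section makes.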
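The disambiguation rule under _Gate Approval UX_ (a bare "approve" is enough when exactly one gate is pending; otherwise Hans asks Andrew to specify) can be sketched against the `pending_gates.json` format given there. `resolve_reply` and its result dicts are hypothetical. The reply grammar ("approve", "reject: reason") follows the text, and the resolved run would then be passed to the corresponding `agency approve` or `agency reject` command.

```python
import json
from typing import Optional


def resolve_reply(pending_gates_json: str, reply: str, run_id: Optional[str] = None) -> dict:
    """Map a Signal reply to an approve/reject action, or ask Andrew for a run_id."""
    gates = json.loads(pending_gates_json)["gates"]
    text = reply.strip()
    if text.lower() == "approve":
        action, reason = "approve", None
    elif text.lower().startswith("reject:"):
        action, reason = "reject", text.split(":", 1)[1].strip()
    else:
        return {"action": "unrecognized"}
    if run_id is not None:
        # An explicit run_id narrows the candidates even when several gates are pending.
        gates = [g for g in gates if g["run_id"] == run_id]
    if len(gates) == 1:
        g = gates[0]
        return {"action": action, "run_id": g["run_id"], "gate": g["gate"], "reason": reason}
    if not gates:
        return {"action": "no_pending_gate"}
    # Several gates pending and no run_id given: Hans must ask which one.
    return {"action": "ask", "candidates": [g["run_id"] for g in gates]}
```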