diff --git a/docs/buildspec.md b/docs/buildspec.md
index c462bcb..d4152aa 100644
--- a/docs/buildspec.md
+++ b/docs/buildspec.md
@@ -40,7 +40,7 @@ agent-teams/
 │ │ ├── notify.py — abstract notification interface
 │ │ └── runtime.py — abstract agent runtime interface
 │ ├── llm/
-│ │ ├── anthropic.py — Claude via OpenClaw or direct API
+│ │ ├── anthropic.py — Claude via direct Anthropic API
 │ │ ├── openai.py — GPT / o-series
 │ │ └── ollama.py — local models
 │ ├── vcs/
@@ -74,8 +74,6 @@ agent-teams/
 ├── runs/ — runtime state, one subdir per run_id
 │ └── .gitkeep
 │
-├── pending_gates.json — live file: gates currently awaiting approval (written by runner, read by Hans)
-│
 └── README.md
 ```

@@ -387,7 +385,7 @@ t5:
 ### 1. Run Kickoff

 ```
-User → Hans → team_runner.start(goal, config)
+User → team_runner.start(goal, config) # via CLI or any caller
   → generate run_id
   → init blackboard (create runs//blackboard.db)
   → build T1 brief (goal_anchor = goal, retry_budget from config)
@@ -442,7 +440,7 @@ spawn T4 with brief
 ```
 runner reaches configured gate (e.g. t2_synthesis)
   → write event(gate_pending, detail={tier, summary, what_happens_next})
-  → notify_adapter.send(tier summary to Andrew via Hans)
+  → notify_adapter.send(tier summary + gate context)
   → halt: poll blackboard for gate_approved or gate_rejected

 gate_approved:
@@ -490,11 +488,11 @@ T1 completes integration
 7. `core/escalation.py` — retry + failure routing logic (called by tiers, not runner centrally)
 8. `adapters/runtime/openclaw.py` — wire up sessions_spawn + personality injection
 9. `adapters/runtime/claude_code.py` — coding agent runtime, personality via --system-prompt
-10. `core/team_runner.py` — full run lifecycle: spawn loop (monitors briefs table for `status=pending`, calls runtime_adapter.spawn()), gate logic (gate_pending halt, writes pending_gates.json, gate_approved/rejected resume), path amendment monitor, T3 mesh timeout → T2 escalation, T1 failure + terminal escalation only
+10. `core/team_runner.py` — full run lifecycle: spawn loop (monitors briefs table for `status=pending`, calls runtime_adapter.spawn()), gate logic (gate_pending halt, calls notify_adapter.send(), polls for gate_approved/rejected resume), path amendment monitor, T3 mesh timeout → T2 escalation, T1 failure + terminal escalation only
 11. `cli/agency.py` — run, watch, inspect, approve, reject, pause, resume; `watch` tails blackboard events and renders live log; `inspect` renders run tree
 12. `prompts/` — fallback tier prompts (used when no agent_personality set)
 13. `adapters/vcs/github.py` — PR creation + branch management
-14. `adapters/notify/openclaw.py` — Hans notification; used for gate surfaces (tier summary to Andrew)
+14. `adapters/notify/openclaw.py` — OpenClaw notification adapter; bridges gate summaries and run events to the operator via OpenClaw; manages its own inbound response state for gate approval routing
 15. `config/team.yaml` — example config with full visibility block
 16. `README.md` — how to run, how to add adapters, how to extend the roster; include `agency` CLI reference

diff --git a/docs/design.md b/docs/design.md
index 9e28b9e..4b18256 100644
--- a/docs/design.md
+++ b/docs/design.md
@@ -20,7 +20,7 @@ All eight open questions resolved 2026-03-30. Details in Decisions Log.

 6. **Who makes spawn calls for T3+ tiers** → Runner monitors briefs table for `status=pending` rows and makes all spawn calls. "Distributed ownership" means the tier's output determines brief content — runner is the mechanical arm. Gates (hold on `gate_pending`) live naturally in the runner's spawn loop.

-7. **Gate approval UX** → Both Signal reply (via Hans) and direct CLI are supported — both write to the same blackboard. Runner only cares that a `gate_approved` event exists, not who wrote it. Hans maintains `pending_gates.json` in workspace for multi-run disambiguation.
+7. **Gate approval UX** → `agency approve ` CLI writes `gate_approved` directly to the blackboard — the universal path, works on any platform. Runner only cares that the event exists, not how it got there. Notify adapter implementations handle their own inbound response routing (e.g. bridging a chat reply to a CLI call) as internal adapter state — not a core concern.

 8. **T3 mesh timeout** → Escalate to T2 (domain boundary problem, T2 should re-scope). If T2 also exhausts its retry budget, escalates up the normal ladder to T1 → Andrew gate. No force-commit fallback (would hide the problem and cause bad T4 dispatch).

@@ -224,7 +224,7 @@ T2 Lead
   → writes integration summary → blackboard
 T1 Accept
   → validate against goal anchor
-  → open PR, notify Andrew via Hans
+  → open PR, notify_adapter.send(pr summary + url)
 ```

 ### Medium Complexity — T1→T3→T4→T5
@@ -363,29 +363,19 @@ This keeps gate logic in one place (the runner's spawn loop), makes all spawn ca

 ### Gate Approval UX

-Two paths, both valid, same outcome — runner only cares that a `gate_approved` event exists in the blackboard:
+**Core mechanic (platform-agnostic):**

-**Signal (via Hans):**
-Andrew receives the tier summary from Hans in Signal. Replies "approve" or "reject: reason". Hans resolves which run + gate the reply refers to using `workspace/pending_gates.json` (maintained by runner on each `gate_pending` event), then runs `agency approve ` or `agency reject --reason "..."` on Andrew's behalf. Hans confirms back: "✅ Approved — T3 spawning now."
+1. Runner writes `gate_pending` to blackboard
+2. Runner calls `notify_adapter.send()` with tier summary + gate context (`run_id`, `gate`, `summary`, `what_happens_next`)
+3. Runner polls blackboard for `gate_approved` or `gate_rejected`
+4. `agency approve ` / `agency reject --reason "..."` writes the event directly to the blackboard — the universal approval path, works on any platform with filesystem access

-**Direct CLI:**
-Andrew runs `agency approve ` from his terminal. Zero-friction when already at a machine.
+Runner never reads from a state file, never talks to a notify adapter for inbound responses. It only polls the blackboard.

-**`pending_gates.json` format:**
-```json
-{
-  "gates": [
-    {
-      "run_id": "abc123",
-      "gate": "t2_synthesis",
-      "pending_since": "2026-03-30T14:00:00Z",
-      "summary": "T2 synthesis ready — canonical architecture written"
-    }
-  ]
-}
-```
+**Adapter responsibility:**
+Each notify adapter handles its own inbound response routing. How a human's approval gets translated into an `agency approve` CLI call is entirely the adapter's concern — not core. Example: an OpenClaw adapter bridges a chat reply to the CLI. A Slack adapter wires up a slash command. A webhook adapter listens on an endpoint. All produce the same result: `gate_approved` written to blackboard.

-If only one gate is pending, Hans can resolve "approve" without an explicit run_id. If multiple are pending, Hans asks Andrew to specify.
+Any internal state the adapter needs to resolve ambiguous responses (e.g. which run_id an approval refers to when multiple gates are pending) is managed by the adapter, not the core.

 ---

@@ -566,7 +556,7 @@ Log level `verbose` adds per-T4-start/done lines. Default is `normal` (tier-leve
 Configurable pause points. When the runner hits a gate, it:

 1. Writes a `gate_pending` event to the blackboard
-2. Fires `notify_adapter.send()` with a tier summary to Andrew (via Hans)
+2. Fires `notify_adapter.send()` with the tier summary + gate context
 3. Halts — no next tier spawns until `gate_approved` or `gate_rejected` is written

 The tier summary surfaced at each gate includes:
@@ -660,7 +650,7 @@ Run abc123 — "Build webhook ingestion system"

 **Orchestration patterns** — Baked into tier prompts and runner tier-handling logic, not prescribed by T1. T2: Lead + parallel specialists. T3: light mesh within T2 domain. T4: swarm+pipeline. T5: fan-out+consensus.

-**Output / review** — Nothing merges to main without Andrew's explicit approval. T1 opens a PR and surfaces it to Andrew. Notification is dual: Hans messages Andrew directly + PR opened on VCS. Merge is gated on human sign-off.
+**Output / review** — Nothing merges to main without explicit human approval. T1 opens a PR and fires `notify_adapter.send()` with the PR summary. Merge is gated on human sign-off. The notify adapter implementation determines how the notification is delivered.

 **Platform agnosticism** — Core is provider and platform agnostic. Capability levels (`reasoning-heavy`, `capable`, `fast-cheap`) map to models in config. Mixing providers across tiers is supported.

@@ -674,7 +664,7 @@ Run abc123 — "Build webhook ingestion system"

 **Spawn call ownership** — Runner is the single point of contact with the runtime adapter. Tiers write `status=pending` child briefs to the blackboard; runner's spawn loop detects and spawns them. Gate logic (hold on `gate_pending`) lives in the spawn loop — no gate plumbing needed in agents. Agents only need blackboard read/write access.

-**Gate approval UX** — Both Signal reply (Hans as bridge) and direct `agency approve` CLI are supported. Same blackboard write either way; runner doesn't care which path was used. Hans maintains `pending_gates.json` in workspace to resolve ambiguous replies when multiple gates are pending. Single pending gate → "approve" is unambiguous.
+**Gate approval UX** — `agency approve ` CLI is the universal approval path — writes `gate_approved` directly to blackboard. Runner only polls blackboard; it does not depend on any specific notification platform. Each notify adapter handles its own inbound response bridge as internal adapter state. Core has no `pending_gates.json` or platform-specific approval logic.

 **T3 mesh timeout** — Escalate to T2 (the specialist that owns the domain). Timeout means T3s can't agree on task boundaries — a domain boundary problem T2 should fix by re-scoping. If T2 exhausts its retry budget, normal escalation ladder handles it (T1 → Andrew gate). No force-commit fallback.

@@ -688,4 +678,4 @@ Run abc123 — "Build webhook ingestion system"

 **Failure handling (distributed)** — Confirmed distributed ownership (2026-03-30). `escalation.py` is logic tiers execute (or runner executes on tier's behalf on timeout/crash), not a central runner concern. Runner only owns: T1 failure, terminal human escalation. See updated Failure Handling table.

-**Run visibility layer** — Added 2026-03-30. Human-readable live log, configurable inspection gates, and `cli/agency.py` inspection/control commands. Designed for debugging and quality evaluation at each tier during early runs. `strict_mode: true` enables all gates. Gates surface tier artifacts + "what happens next" summary to Andrew via Hans. Resolves Q3 (T5 consensus surfaces as gate event with human-readable summary). T5 gate (optional) lets Andrew review joint verdict before T3 marks workstream done.
+**Run visibility layer** — Added 2026-03-30. Human-readable live log, configurable inspection gates, and `cli/agency.py` inspection/control commands. Designed for debugging and quality evaluation at each tier during early runs. `strict_mode: true` enables all gates. Gates surface tier artifacts + "what happens next" summary via `notify_adapter.send()` — platform-agnostic. Resolves Q3 (T5 consensus surfaces as gate event with human-readable summary). T5 gate (optional) lets the operator review joint verdict before T3 marks workstream done.
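
Reviewer note: the core gate mechanic this diff describes (runner writes `gate_pending`, calls `notify_adapter.send()`, then polls only the blackboard, while `agency approve` writes `gate_approved` directly) could look roughly like the sketch below. It is a minimal illustration, not the project's implementation. The diff only establishes that the blackboard is a per-run `blackboard.db` holding events, so the `events` table layout and the helper names `open_blackboard`, `write_event`, `gate_decision`, `hold_at_gate`, and `approve` are all assumptions, as are the `poll_seconds` and `max_polls` parameters.

```python
import json
import sqlite3
import time

# Hypothetical schema: table and column names are assumed for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    id     INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id TEXT NOT NULL,
    gate   TEXT NOT NULL,
    type   TEXT NOT NULL,
    detail TEXT NOT NULL DEFAULT '{}'
)
"""


def open_blackboard(path=":memory:"):
    """Open (or create) a blackboard database and ensure the events table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def write_event(conn, run_id, gate, event_type, detail=None):
    """Append one event row; detail is stored as a JSON string."""
    conn.execute(
        "INSERT INTO events (run_id, gate, type, detail) VALUES (?, ?, ?, ?)",
        (run_id, gate, event_type, json.dumps(detail or {})),
    )
    conn.commit()


def gate_decision(conn, run_id, gate):
    """Return 'gate_approved', 'gate_rejected', or None while still pending."""
    row = conn.execute(
        "SELECT type FROM events WHERE run_id = ? AND gate = ? "
        "AND type IN ('gate_approved', 'gate_rejected') ORDER BY id LIMIT 1",
        (run_id, gate),
    ).fetchone()
    return row[0] if row else None


def hold_at_gate(conn, notify, run_id, gate, summary, what_happens_next,
                 poll_seconds=1.0, max_polls=None):
    """Runner side: record the gate, notify once, then poll only the blackboard."""
    context = {"run_id": run_id, "gate": gate, "summary": summary,
               "what_happens_next": what_happens_next}
    write_event(conn, run_id, gate, "gate_pending", context)
    notify(context)  # stand-in for notify_adapter.send()
    polls = 0
    while max_polls is None or polls < max_polls:
        decision = gate_decision(conn, run_id, gate)
        if decision is not None:
            return decision
        polls += 1
        time.sleep(poll_seconds)
    return None  # gave up polling; the gate is still pending


def approve(conn, run_id, gate):
    """Assumed effect of `agency approve`: write the event straight to the blackboard."""
    write_event(conn, run_id, gate, "gate_approved")


if __name__ == "__main__":
    bb = open_blackboard()
    # Toy adapter that approves as soon as it is notified, standing in for a
    # chat or webhook bridge that would eventually call the CLI.
    outcome = hold_at_gate(
        bb, lambda ctx: approve(bb, ctx["run_id"], ctx["gate"]),
        "abc123", "t2_synthesis", "T2 synthesis ready", "T3 spawns next",
        poll_seconds=0,
    )
    print(outcome)
```

The point of the sketch is the dependency direction: `hold_at_gate` never asks the adapter for a response, so any adapter (or a human at a terminal) that ends up writing the approval event unblocks the runner in the same way.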