docs: resolve all design questions + visibility layer + portability audit
2 agents
Submodule agents updated: aacfb86196...5f1204a023
@@ -1,7 +1,7 @@
 # Tiered Agent Team System — Build Spec
 
-_Started: 2026-03-15. Status: Pre-build._
-_See agent-teams-design.md for the design doc and decisions log._
+_Started: 2026-03-15. Last updated: 2026-03-30._
+_See design.md for the design doc and decisions log._
 
 ---
 
@@ -40,7 +40,7 @@ agent-teams/
 │ │ ├── notify.py — abstract notification interface
 │ │ └── runtime.py — abstract agent runtime interface
 │ ├── llm/
-│ │ ├── anthropic.py — Claude via OpenClaw or direct API
+│ │ ├── anthropic.py — Claude via direct Anthropic API
 │ │ ├── openai.py — GPT / o-series
 │ │ └── ollama.py — local models
 │ ├── vcs/
@@ -68,6 +68,9 @@ agent-teams/
 │ ├── team.yaml — example run configuration
 │ └── role_registry.yaml — maps (tier, domain) → agent personality file
 │
+├── cli/
+│ └── agency.py — run, watch, inspect, approve, reject, pause, resume
+│
 ├── runs/ — runtime state, one subdir per run_id
 │ └── .gitkeep
 │
@@ -131,12 +134,43 @@ CREATE TABLE events (
   event_id TEXT PRIMARY KEY,
   run_id TEXT NOT NULL,
   brief_id TEXT,
-  kind TEXT NOT NULL, -- spawned | completed | failed | escalated | retried
+  kind TEXT NOT NULL, -- see event vocabulary below
   detail TEXT, -- JSON
   created_at TEXT NOT NULL
 );
 ```
 
+**Event kind vocabulary:**
+```
+-- lifecycle
+spawned | completed | failed | escalated | retried
+
+-- visibility / gates
+gate_pending -- runner hit an inspection gate, waiting for human
+gate_approved -- human approved via CLI or notify
+gate_rejected -- human rejected, tier re-invoked
+gate_paused -- manual pause via CLI
+gate_resumed -- manual resume via CLI
+
+-- amendments / informational
+path_amendment -- mid-run tier proposed a tier path change
+log -- human-readable log line (detail: {level, message})
+```
+
+**t3_task_lists** *(T3 mesh coordination)*
+```sql
+CREATE TABLE t3_task_lists (
+  entry_id TEXT PRIMARY KEY,
+  run_id TEXT NOT NULL,
+  workstream_id TEXT NOT NULL,
+  t3_agent_id TEXT NOT NULL,
+  status TEXT NOT NULL, -- draft | committed
+  tasks TEXT NOT NULL, -- JSON array of proposed T4 task descriptors
+  created_at TEXT NOT NULL,
+  updated_at TEXT NOT NULL
+);
+```
+
 ---
 
 ## Task Brief Schema
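A minimal sketch of how `core/blackboard.py` might create the two tables in this hunk and append events, using Python's standard `sqlite3` module. The helper names `init_blackboard` and `write_event`, the UUID event ids, and the UTC ISO-8601 timestamps are illustrative assumptions, not part of the spec.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# Closed event vocabulary, copied from the list above.
EVENT_KINDS = {
    "spawned", "completed", "failed", "escalated", "retried",
    "gate_pending", "gate_approved", "gate_rejected", "gate_paused", "gate_resumed",
    "path_amendment", "log",
}

SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    event_id   TEXT PRIMARY KEY,
    run_id     TEXT NOT NULL,
    brief_id   TEXT,
    kind       TEXT NOT NULL,
    detail     TEXT,            -- JSON
    created_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS t3_task_lists (
    entry_id      TEXT PRIMARY KEY,
    run_id        TEXT NOT NULL,
    workstream_id TEXT NOT NULL,
    t3_agent_id   TEXT NOT NULL,
    status        TEXT NOT NULL,  -- draft | committed
    tasks         TEXT NOT NULL,  -- JSON array of proposed T4 task descriptors
    created_at    TEXT NOT NULL,
    updated_at    TEXT NOT NULL
);
"""


def init_blackboard(path):
    """Open (or create) a run's blackboard and make sure both tables exist (hypothetical helper)."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def write_event(conn, run_id, kind, detail=None, brief_id=None):
    """Append one event row; kinds outside the vocabulary are rejected (hypothetical helper)."""
    if kind not in EVENT_KINDS:
        raise ValueError(f"unknown event kind: {kind}")
    event_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)",
        (event_id, run_id, brief_id, kind,
         json.dumps(detail) if detail is not None else None,
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return event_id
```

For example, `write_event(init_blackboard(":memory:"), "abc123", "log", {"level": "info", "message": "run started"})` appends one `log` row. Checking `kind` at write time is one way to keep the vocabulary closed; the spec does not say where that check should live.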
@@ -283,6 +317,19 @@ retry_defaults:
   bad_output: 3
   partial: 2
   blocked: 0 # always escalate immediately
+
+visibility:
+  strict_mode: false # true = all gates on (recommended for first runs)
+  log_level: normal # normal | verbose (verbose = per-T4 start/done lines)
+  inspection_gates:
+    t1_plan: true # always — required by design
+    t2_lead: false # optional — review boundaries before specialists spawn
+    t2_synthesis: true # recommended — review architecture before implementation
+    t3_plan: false # verbose — useful early on, disable once T3 is trusted
+    t5_verdict: false # review T5 joint verdict before T3 marks workstream done
+  gate_timeout_minutes: 60 # auto-reject if no human response within this window
+
+t3_mesh_timeout_minutes: 10 # max time for T3s to commit task lists before runner escalates
 ```
 
 ---
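One way the runner could resolve which gates are live from the `visibility` block above, assuming the YAML has already been parsed into a dict (for example with PyYAML). `effective_gates` is a hypothetical helper. Treating an omitted gate as off is an assumption; forcing `t1_plan` on follows the config comment that T1 plan review is required by design.

```python
GATE_NAMES = ("t1_plan", "t2_lead", "t2_synthesis", "t3_plan", "t5_verdict")


def effective_gates(visibility):
    """Map each gate name to True/False for this run (hypothetical helper).

    strict_mode turns every gate on; t1_plan is always on.
    A gate missing from inspection_gates is treated as off (assumption).
    """
    configured = visibility.get("inspection_gates") or {}
    strict = bool(visibility.get("strict_mode", False))
    gates = {name: strict or bool(configured.get(name, False)) for name in GATE_NAMES}
    gates["t1_plan"] = True  # required by design, regardless of config
    return gates
```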
@@ -338,7 +385,7 @@ t5:
 ### 1. Run Kickoff
 
 ```
-User → Hans → team_runner.start(goal, config)
+User → team_runner.start(goal, config) # via CLI or any caller
   → generate run_id
   → init blackboard (create runs/<run_id>/blackboard.db)
   → build T1 brief (goal_anchor = goal, retry_budget from config)
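A sketch of the first three kickoff steps, assuming the runner is plain Python. The 8-character hex `run_id`, the brief field names, and reading the retry budget from `retry_defaults` are hypothetical choices; table creation is left to the blackboard module.

```python
import sqlite3
import uuid
from pathlib import Path


def start(goal, config, runs_dir="runs"):
    """Generate a run_id, create runs/<run_id>/blackboard.db, build the T1 brief (sketch)."""
    run_id = uuid.uuid4().hex[:8]  # short id in the style of "abc123" (assumption)
    run_path = Path(runs_dir) / run_id
    run_path.mkdir(parents=True, exist_ok=False)
    # Connecting creates the SQLite file; schema setup belongs to core/blackboard.py.
    sqlite3.connect(run_path / "blackboard.db").close()
    t1_brief = {  # hypothetical field names
        "tier": "t1",
        "goal_anchor": goal,  # immutable, propagated to every downstream brief
        "retry_budget": config.get("retry_defaults", {}),
    }
    return run_id, t1_brief
```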
@@ -388,7 +435,29 @@ spawn T4 with brief
   → notify T3
 ```
 
-### 4. Review Gate
+### 4. Inspection Gate Flow
 
+```
+runner reaches configured gate (e.g. t2_synthesis)
+  → write event(gate_pending, detail={tier, summary, what_happens_next})
+  → notify_adapter.send(tier summary + gate context)
+  → halt: poll blackboard for gate_approved or gate_rejected
+
+gate_approved:
+  → write event(gate_approved)
+  → continue run
+
+gate_rejected:
+  → write event(gate_rejected, detail={reason})
+  → re-invoke tier with rejection reason in brief context
+  → loop back to gate_pending when tier completes again
+
+gate_timeout (gate_timeout_minutes elapsed):
+  → treat as gate_rejected
+  → notify Andrew: "Gate timed out, re-invoking tier"
+```
+
+### 5. Review Gate
+
 ```
 T1 completes integration
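The halt-and-poll step can be sketched as a loop over the `events` table. This assumes `since` is the timestamp of the matching `gate_pending` event and that all timestamps share one ISO-8601 format, so string comparison orders them. `wait_for_gate` and its injectable `clock`/`sleep` parameters are hypothetical, added so the timeout branch can be exercised without waiting.

```python
import time


def wait_for_gate(conn, run_id, since, timeout_minutes, poll_seconds=5.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Return "approved" or "rejected" for the gate opened at `since`.

    A timeout is reported as "rejected", matching the gate_timeout branch above.
    """
    deadline = clock() + timeout_minutes * 60
    while True:
        row = conn.execute(
            "SELECT kind FROM events WHERE run_id = ? AND created_at > ? "
            "AND kind IN ('gate_approved', 'gate_rejected') "
            "ORDER BY created_at LIMIT 1",
            (run_id, since),
        ).fetchone()
        if row is not None:
            return "approved" if row[0] == "gate_approved" else "rejected"
        if clock() >= deadline:
            return "rejected"  # caller then notifies that the gate timed out
        sleep(poll_seconds)
```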
@@ -412,19 +481,20 @@ T1 completes integration
 
 1. `git submodule add https://github.com/msitarzewski/agency-agents agents/` — pull the talent pool
 2. `config/role_registry.yaml` — map tier+domain → agent personality files
-3. `core/task_brief.py` — schema + validation (everything depends on this)
-4. `core/blackboard.py` — SQLite store, all table definitions
+3. `core/task_brief.py` — schema + validation (everything depends on this); include T1 Plan Output Schema
+4. `core/blackboard.py` — SQLite store, all table definitions including `t3_task_lists`; full event kind vocabulary
 5. `adapters/base/*` — all four abstract interfaces
 6. `adapters/llm/anthropic.py` — first LLM implementation
-7. `core/escalation.py` — retry + failure routing logic
+7. `core/escalation.py` — retry + failure routing logic (called by tiers, not runner centrally)
 8. `adapters/runtime/openclaw.py` — wire up sessions_spawn + personality injection
 9. `adapters/runtime/claude_code.py` — coding agent runtime, personality via --system-prompt
-10. `core/team_runner.py` — full run lifecycle, runtime + personality selection
-11. `prompts/` — fallback tier prompts (used when no agent_personality set)
-12. `adapters/vcs/github.py` — PR creation + branch management
-13. `adapters/notify/openclaw.py` — Hans notification
-14. `config/team.yaml` — example config
-15. `README.md` — how to run, how to add adapters, how to extend the roster
+10. `core/team_runner.py` — full run lifecycle: spawn loop (monitors briefs table for `status=pending`, calls runtime_adapter.spawn()), gate logic (gate_pending halt, calls notify_adapter.send(), polls for gate_approved/rejected resume), path amendment monitor, T3 mesh timeout → T2 escalation, T1 failure + terminal escalation only
+11. `cli/agency.py` — run, watch, inspect, approve, reject, pause, resume; `watch` tails blackboard events and renders live log; `inspect` renders run tree
+12. `prompts/` — fallback tier prompts (used when no agent_personality set)
+13. `adapters/vcs/github.py` — PR creation + branch management
+14. `adapters/notify/openclaw.py` — OpenClaw notification adapter; bridges gate summaries and run events to the operator via OpenClaw; manages its own inbound response state for gate approval routing
+15. `config/team.yaml` — example config with full visibility block
+16. `README.md` — how to run, how to add adapters, how to extend the roster; include `agency` CLI reference
 
 ---
 
416 docs/design.md
@@ -1,22 +1,28 @@
 # Tiered Agent Team System — Design Document
 
-_Started: 2026-03-14. Last updated: 2026-03-16 (evening)._
+_Started: 2026-03-14. Last updated: 2026-03-30._
 
 ---
 
-## Open Design Questions
+## Resolved Design Decisions (formerly Open Questions)
 
-The following areas are identified but not yet resolved. Work through these before implementing `core/team_runner.py`.
+All eight open questions resolved 2026-03-30. Details in Decisions Log.
 
-1. **T3 mesh mechanics** — How do T3s within the same T2 domain coordinate? Via blackboard, direct message exchange, or a designated T3 lead? What does "negotiate task boundaries" look like concretely?
+1. **T3 mesh mechanics** → Blackboard-based. T3s write draft task lists, read peers', commit merged plan before T4 dispatch. See _T3 Mesh via Blackboard_.
 
-2. **T1 output schema** — What does T1's Plan phase output look like as structured data? Needs a formal schema: workstreams, tier paths, parallelism flags, retry budget, T2 specialist list. This is what the runner parses to bootstrap the pipeline.
+2. **T1 output schema** → Formal JSON schema defined. See _T1 Plan Output Schema_.
 
-3. **T5 consensus mechanics** — Individual T5s review their slice and produce results. Who aggregates? What does the joint verdict look like as structured data? What happens on split verdict (some T5s pass, some fail)?
+3. **T5 consensus mechanics** → T3 aggregates all T5 results into a joint verdict. Split verdict (`partial`) triggers retry of failed slices only. See _T5 Consensus & Verdict Schema_.
 
-4. **Path amendment mechanism** — When a mid-run tier proposes a path amendment, what's the concrete mechanism? Who writes to the blackboard, in what format, and how does the relevant higher tier get notified?
+4. **Path amendment mechanism** → Amending tier writes a `path_amendment` event to blackboard. Runner monitors events table and notifies the relevant higher tier via a system event. No agent callback plumbing required. See _Path Amendment Mechanism_.
 
-5. **Failure handling (distributed model)** — The current failure table assumes centralised runner handling. Needs to be rewritten to reflect distributed ownership: T3 handles T4 failures, T2 handles T3 failures, T1 handles T2 failures. Runner only handles T1 failure and terminal escalation to human.
+5. **Failure handling (distributed model)** → Distributed ownership confirmed. Runner only owns T1 failure + terminal human escalation. See updated _Failure Handling_ table.
 
+6. **Who makes spawn calls for T3+ tiers** → Runner monitors briefs table for `status=pending` rows and makes all spawn calls. "Distributed ownership" means the tier's output determines brief content — runner is the mechanical arm. Gates (hold on `gate_pending`) live naturally in the runner's spawn loop.
+
+7. **Gate approval UX** → `agency approve <run_id>` CLI writes `gate_approved` directly to the blackboard — the universal path, works on any platform. Runner only cares that the event exists, not how it got there. Notify adapter implementations handle their own inbound response routing (e.g. bridging a chat reply to a CLI call) as internal adapter state — not a core concern.
+
+8. **T3 mesh timeout** → Escalate to T2 (domain boundary problem, T2 should re-scope). If T2 also exhausts its retry budget, escalates up the normal ladder to T1 → Andrew gate. No force-commit fallback (would hide the problem and cause bad T4 dispatch).
+
 ---
 
@@ -167,6 +173,249 @@ T1 — Phase 1: Plan (self-critique → Andrew approval)
 
 ---
 
+## Use Case Flows
+
+T1 assesses complexity and prescribes the tier path per workstream. Three standard depth profiles:
+
+### Full Stack — T1→T2→T3→T4→T5
+*Complex feature, new product, cross-domain changes*
+
+```
+T1 Plan
+  → assess complexity (high)
+  → output T1 Plan Schema (workstreams, tier paths [T2,T3,T4,T5], parallelism, retry budgets)
+  → self-critique pass
+  → GATE: surface to Andrew ← approval required
+
+T2 Lead (spawned by runner after approval)
+  → receive: goal + full workplan
+  → publish: domain boundaries + shared assumptions doc → blackboard
+  → GATE (optional): review boundaries before specialists spawn
+
+T2 Specialists (parallel fan-out, wait on Lead)
+  → each receives: their domain boundary + shared assumptions
+  → produce: architecture proposal for their slice
+  → Lead synthesises, drives conflict resolution if needed
+  → Lead writes: canonical architecture → blackboard
+  → GATE (recommended): review architecture before implementation
+
+Each T2 Specialist → spawns its own T3s (with canonical architecture slice + interface contracts)
+
+T3s (light mesh within T2 domain)
+  → write draft task lists to blackboard
+  → read peers' lists, reconcile boundaries
+  → commit merged task plan before T4 dispatch
+  → GATE (optional): review task breakdown
+
+T4s
+  → swarm: independent tasks run in parallel
+  → pipeline: T4-A output feeds T4-B (T3 declares dependencies)
+  → commit to feature branches
+
+T5s (fan-out per T4 slice)
+  → each reviews its slice independently
+  → T3 collects results → joint verdict
+  → GATE (optional): review T5 verdict before T3 marks done
+  → partial: T3 retries only failed slices
+  → pass: T3 signals workstream done to T2
+
+T2 specialists → signal T2 Lead
+T2 Lead → writes integration summary → blackboard
+
+T1 Accept
+  → validate against goal anchor
+  → open PR, notify_adapter.send(pr summary + url)
+```
+
+### Medium Complexity — T1→T3→T4→T5
+*Config change, isolated bug fix — T1 determines no cross-domain design needed*
+
+```
+T1 Plan
+  → assess: contained scope, single domain, no T2 architecture needed
+  → workplan: tier paths [T3, T4, T5]
+  → GATE: Andrew approval
+
+T3s spawned directly by runner
+  → receives T1 brief with task context (no T2 architecture layer)
+  → T3 light mesh → T4 dispatch → T5 verify → signal done
+
+T1 Accept → PR
+```
+
+### Simple / Hotfix — T1→T4→T5
+*Single file, single function, trivial atomic task*
+
+```
+T1 Plan
+  → assess: trivial, single workstream
+  → tier path: [T4, T5]
+  → GATE: Andrew approval
+
+T4 (coding agent)
+  → single atomic task, commits
+
+T5 (single verifier, not full fan-out)
+  → code review + correctness check
+  → pass → T1 Accept → PR
+```
+
+---
+
+## Resolved Mechanics
+
+### T3 Mesh via Blackboard
+
+T3s coordinate task boundaries before dispatching T4s. All coordination goes through the blackboard — no direct agent-to-agent messaging.
+
+1. Each T3 writes its **draft task list** to the blackboard (one row per proposed T4 task, status `draft`)
+2. Each T3 reads all sibling T3 draft lists in its T2 domain
+3. T3s amend their lists to resolve overlaps (claim tasks, release duplicates)
+4. Once all T3s in the domain have committed their final task lists (status `committed`), T4 dispatch begins
+5. No T3 dispatches T4s until all peers in the domain are committed — this prevents duplicate work
+
+The runner monitors for `all_committed` state and can enforce a timeout (config: `t3_mesh_timeout_minutes`).
+
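A sketch of the runner's `all_committed` check against the `t3_task_lists` table from the build spec. It treats the workstream as the mesh's scope (the table has no domain column) and assumes the runner knows which T3 agent ids it spawned; `mesh_committed` is a hypothetical name. It works whether a T3 stores one row for its whole list or one row per proposed task, because a T3 only counts as committed when every one of its rows is.

```python
def mesh_committed(conn, run_id, workstream_id, expected_t3_ids):
    """True once every expected T3 has only `committed` rows for this workstream."""
    rows = conn.execute(
        "SELECT t3_agent_id, status FROM t3_task_lists "
        "WHERE run_id = ? AND workstream_id = ?",
        (run_id, workstream_id),
    ).fetchall()
    statuses = {}
    for t3_id, status in rows:
        statuses.setdefault(t3_id, set()).add(status)
    # A T3 with no rows yet, or any draft row, blocks T4 dispatch.
    return all(statuses.get(t3) == {"committed"} for t3 in expected_t3_ids)
```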
+---
+
+### T1 Plan Output Schema
+
+T1's Plan phase produces a structured JSON object written to the blackboard. The runner parses this to bootstrap the pipeline.
+
+```json
+{
+  "run_id": "uuid",
+  "goal_anchor": "Original goal — immutable, propagated to every downstream brief",
+  "complexity": "high | medium | low",
+  "retry_budget_multiplier": 2,
+  "workstreams": [
+    {
+      "id": "ws-backend-api",
+      "name": "Backend API",
+      "domain": "backend",
+      "tier_path": ["t2", "t3", "t4", "t5"],
+      "parallel_group": "A",
+      "t2_specialist": "agents/engineering/engineering-software-architect.md",
+      "notes": "Focus on webhook ingest and retry queue"
+    }
+  ],
+  "parallelism": {
+    "groups": {
+      "A": ["ws-backend-api", "ws-frontend"],
+      "B": ["ws-infra"]
+    },
+    "sequence": ["A", "B"]
+  },
+  "self_critique_summary": "Brief plain-text summary of what T1 identified and amended in its self-critique pass"
+}
+```
+
+`parallel_group` + `sequence` handles inter-workstream dependencies: group A runs in parallel, then B starts after A completes.
+
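A sketch of structural checks the runner could apply before bootstrapping from a T1 plan. `validate_plan` is hypothetical, and its rules (tiers limited to `t2` through `t5` in top-down order, each workstream listed under its `parallel_group`, `sequence` naming every group once) are inferred from the example schema rather than stated in the spec.

```python
VALID_TIERS = ("t2", "t3", "t4", "t5")  # T1 never appears on a workstream path (assumption)
REQUIRED = ("run_id", "goal_anchor", "complexity", "workstreams", "parallelism")


def validate_plan(plan):
    """Return a list of problems in a parsed T1 plan; an empty list means it passed."""
    missing = [k for k in REQUIRED if k not in plan]
    if missing:
        return [f"missing field: {k}" for k in missing]

    errors = []
    if plan["complexity"] not in ("high", "medium", "low"):
        errors.append(f"bad complexity: {plan['complexity']!r}")

    groups = plan["parallelism"].get("groups", {})
    sequence = plan["parallelism"].get("sequence", [])
    ws_ids = {ws["id"] for ws in plan["workstreams"]}

    for ws in plan["workstreams"]:
        path = ws.get("tier_path", [])
        # Tiers must be known, unique, and ordered top-down (t2 before t3, ...).
        if not path or any(t not in VALID_TIERS for t in path) or path != sorted(set(path)):
            errors.append(f"{ws['id']}: invalid tier_path {path}")
        group = ws.get("parallel_group")
        if ws["id"] not in groups.get(group, []):
            errors.append(f"{ws['id']}: not listed in parallelism.groups[{group!r}]")

    for name, members in groups.items():
        errors += [f"group {name}: unknown workstream {m}" for m in members if m not in ws_ids]

    if sorted(sequence) != sorted(groups):
        errors.append("parallelism.sequence must name every group exactly once")
    return errors
```

Under these rules the abbreviated example above would be flagged, because its groups reference `ws-frontend` and `ws-infra` without listing them under `workstreams`.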
+---
+
+### T5 Consensus & Verdict Schema
+
+T3 aggregates all T5 results into a joint verdict after fan-out completes.
+
+**Individual T5 result:**
+```json
+{
+  "verifier_id": "uuid",
+  "scope": "queue-client",
+  "verdict": "pass | fail",
+  "issues": ["issue description..."],
+  "notes": "human-readable summary"
+}
+```
+
+**T3 joint verdict (written to blackboard):**
+```json
+{
+  "t5_results": [...],
+  "joint_verdict": "pass | partial | fail",
+  "failed_scopes": ["queue-client"],
+  "summary": "Human-readable summary for gate surface and logs"
+}
+```
+
+**Split verdict handling:**
+- `pass` → T3 marks workstream done, signals T2
+- `partial` → T3 retries only the failed T4 slices (up to retry budget), re-runs T5 on those slices
+- `fail` → T3 escalates to T2 (or T1 if shallow path)
+
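T3's aggregation step, sketched in Python against the two JSON shapes above. `joint_verdict` is a hypothetical helper. The spec does not say when the joint verdict is `fail` rather than `partial`; this sketch assumes `fail` only when every slice fails. The `summary` wording is illustrative.

```python
def joint_verdict(t5_results):
    """Aggregate individual T5 results into the T3 joint verdict shape."""
    if not t5_results:
        raise ValueError("no T5 results to aggregate")
    failed = [r["scope"] for r in t5_results if r["verdict"] == "fail"]
    if not failed:
        verdict = "pass"
    elif len(failed) == len(t5_results):
        verdict = "fail"      # assumption: every slice failed
    else:
        verdict = "partial"   # split verdict: retry only the failed slices
    passed = len(t5_results) - len(failed)
    return {
        "t5_results": t5_results,
        "joint_verdict": verdict,
        "failed_scopes": failed,
        "summary": f"{passed}/{len(t5_results)} slices passed",
    }
```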
+---
+
+### Spawn Call Ownership
+
+The runner is the single point of contact with the runtime adapter. Tiers do not call `sessions_spawn` directly — they write output to the blackboard and the runner acts on it.
+
+**Flow:**
+1. A tier completes and writes child briefs to the `briefs` table with `status=pending`
+2. Runner's spawn loop detects pending rows
+3. If a gate is configured at this tier boundary → runner writes `gate_pending`, notifies Andrew, halts
+4. On `gate_approved` → runner calls `runtime_adapter.spawn()` for each pending brief
+5. Spawned agent runs, writes its own child briefs as pending when done → loop continues
+
+This keeps gate logic in one place (the runner's spawn loop), makes all spawn calls auditable from a single location, and means agents only need blackboard read/write access — no runtime adapter tool access required.
+
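One pass of the spawn loop described above, as a sketch. The `briefs` columns (`brief_id`, `run_id`, `tier`, `status`, `payload`), the `spawned` status value, and the two gate callbacks are assumptions; only `status=pending` and `runtime_adapter.spawn()` come from the text. Held gates are returned so the caller can write `gate_pending` and notify once per gate instead of once per brief.

```python
import json


def spawn_tick(conn, run_id, runtime_adapter, gate_for_tier, gate_is_approved):
    """Spawn every pending brief whose gate (if any) is approved.

    gate_for_tier(tier) -> gate name or None; gate_is_approved(gate) -> bool.
    Returns (spawned brief ids, set of gate names still holding briefs).
    """
    pending = conn.execute(
        "SELECT brief_id, tier, payload FROM briefs "
        "WHERE run_id = ? AND status = 'pending' ORDER BY rowid",
        (run_id,),
    ).fetchall()
    spawned, held = [], set()
    for brief_id, tier, payload in pending:
        gate = gate_for_tier(tier)
        if gate is not None and not gate_is_approved(gate):
            held.add(gate)  # stays pending until gate_approved appears
            continue
        runtime_adapter.spawn(json.loads(payload))
        conn.execute("UPDATE briefs SET status = 'spawned' WHERE brief_id = ?", (brief_id,))
        spawned.append(brief_id)
    conn.commit()
    return spawned, held
```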
+---
+
+### Gate Approval UX
+
+**Core mechanic (platform-agnostic):**
+
+1. Runner writes `gate_pending` to blackboard
+2. Runner calls `notify_adapter.send()` with tier summary + gate context (`run_id`, `gate`, `summary`, `what_happens_next`)
+3. Runner polls blackboard for `gate_approved` or `gate_rejected`
+4. `agency approve <run_id>` / `agency reject <run_id> --reason "..."` writes the event directly to the blackboard — the universal approval path, works on any platform with filesystem access
+
+Runner never reads from a state file, never talks to a notify adapter for inbound responses. It only polls the blackboard.
+
+**Adapter responsibility:**
+Each notify adapter handles its own inbound response routing. How a human's approval gets translated into an `agency approve` CLI call is entirely the adapter's concern — not core. Example: an OpenClaw adapter bridges a chat reply to the CLI. A Slack adapter wires up a slash command. A webhook adapter listens on an endpoint. All produce the same result: `gate_approved` written to blackboard.
+
+Any internal state the adapter needs to resolve ambiguous responses (e.g. which run_id an approval refers to when multiple gates are pending) is managed by the adapter, not the core.
+
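A sketch of what `agency approve` / `agency reject` might do underneath: open `runs/<run_id>/blackboard.db` and insert the event directly, with no notify adapter involved. `record_gate_decision` is hypothetical, as is storing the note or reason under `detail`; requiring a reason on reject mirrors the `--reason "..."` usage shown in the spec.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone
from pathlib import Path


def record_gate_decision(runs_dir, run_id, approve, text=None):
    """Write gate_approved or gate_rejected into the run's blackboard; returns the kind."""
    if not approve and not text:
        raise ValueError("reject requires a reason")
    db = Path(runs_dir) / run_id / "blackboard.db"
    if not db.exists():
        raise FileNotFoundError(f"no blackboard for run {run_id}: {db}")
    kind = "gate_approved" if approve else "gate_rejected"
    detail = {("note" if approve else "reason"): text} if text else None
    conn = sqlite3.connect(db)
    try:
        conn.execute(
            "INSERT INTO events (event_id, run_id, brief_id, kind, detail, created_at) "
            "VALUES (?, ?, NULL, ?, ?, ?)",
            (str(uuid.uuid4()), run_id, kind,
             json.dumps(detail) if detail else None,
             datetime.now(timezone.utc).isoformat()),
        )
        conn.commit()
    finally:
        conn.close()
    return kind
```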
+---
+
+### T3 Mesh Timeout
+
+If T3s in a domain fail to commit their task lists within `t3_mesh_timeout_minutes`:
+
+1. **Runner escalates to T2** — writes a `gate_pending` escalation event and notifies the T2 specialist that owns the domain. Context: which T3s timed out, what draft lists (if any) exist on the blackboard. T2 re-scopes or clarifies domain boundaries, spawns fresh T3 briefs.
+
+2. **If T2 also exhausts its retry budget** → normal escalation ladder: T2 failure → T1 handles → T1 failure → Andrew gate.
+
+Force-committing partial draft lists (optimistic fallback) is explicitly not done — it hides the boundary problem and produces conflicting or duplicate T4 tasks that fail later with less context.
+
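The timeout decision is small enough to sketch as a pure function. `check_mesh_timeout` and the keys of its escalation payload are hypothetical; the payload carries the context the text asks for (which T3s timed out and which draft lists exist), and nothing is force-committed.

```python
from datetime import datetime, timedelta


def check_mesh_timeout(started_at: datetime, now: datetime, timeout_minutes, t3_statuses):
    """Return an escalation payload for the owning T2, or None if no action is needed.

    t3_statuses maps t3_agent_id -> "draft" | "committed" | None (nothing written yet).
    """
    if all(s == "committed" for s in t3_statuses.values()):
        return None  # mesh finished; T4 dispatch can proceed
    if now - started_at < timedelta(minutes=timeout_minutes):
        return None  # still within the window
    return {
        "escalate_to": "t2",
        "timed_out_t3s": sorted(t for t, s in t3_statuses.items() if s != "committed"),
        "drafts_present": sorted(t for t, s in t3_statuses.items() if s == "draft"),
    }
```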
+---
+
+### Path Amendment Mechanism
+
+When a mid-run tier discovers scope that warrants a different tier path than T1 prescribed:
+
+1. The discovering tier writes a `path_amendment` event to the blackboard:
+```json
+{
+  "kind": "path_amendment",
+  "proposed_by": "t3/ws-backend-api",
+  "reason": "Discovered auth dependency requires T2 architectural pass",
+  "amendment": {
+    "workstream": "ws-backend-api",
+    "add_tiers": ["t2"],
+    "insert_before": "t3"
+  }
+}
+```
+2. The runner monitors the events table, detects `path_amendment`, and sends a system event notification to the relevant higher tier
+3. The higher tier is **informed, not blocked** — it acknowledges and adjusts its understanding
+4. Amendment is logged on the blackboard for audit; no approval gate required (the next scheduled human gate will surface it)
+
+No agent needs callback plumbing. The runner is the notification bridge.
+
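A sketch of the runner side of this mechanism: find `path_amendment` events it has not yet relayed, and compute the path an amendment proposes. Both helpers are hypothetical. The sketch assumes the `detail` column holds the JSON body shown above minus `kind` (which is already a column), and tracks progress by SQLite `rowid`. Acting on the new path stays with the informed tier, since amendments inform rather than block.

```python
import json


def unseen_amendments(conn, run_id, after_rowid=0):
    """Return [(rowid, detail_dict)] for path_amendment events newer than after_rowid."""
    rows = conn.execute(
        "SELECT rowid, detail FROM events "
        "WHERE run_id = ? AND kind = 'path_amendment' AND rowid > ? ORDER BY rowid",
        (run_id, after_rowid),
    ).fetchall()
    return [(rowid, json.loads(detail)) for rowid, detail in rows]


def amended_tier_path(tier_path, amendment):
    """Insert add_tiers immediately before insert_before, skipping tiers already present."""
    anchor = amendment["insert_before"]
    if anchor not in tier_path:
        raise ValueError(f"{anchor!r} is not on the current path {tier_path}")
    new_tiers = [t for t in amendment["add_tiers"] if t not in tier_path]
    i = tier_path.index(anchor)
    return tier_path[:i] + new_tiers + tier_path[i:]
```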
+---
+
 ## Shared State
 
 For software pipelines, **the repo is the primary blackboard**:
@@ -185,14 +434,21 @@ Supplemented by a SQLite coordination store per run tracking:
 
 ## Failure Handling
 
-| Failure | Handler | Action |
-|---------|---------|--------|
-| T4 bad output | T3 | Retry T4 with corrected brief (up to retry_budget) |
-| T4 blocked | T3 | Escalate immediately — no retries |
-| T4 partial output | T3 | Salvage good parts, re-task remainder |
-| T3 workstream stuck | T2 | Re-scope or split the workstream |
-| T2 design wrong | T1 | Re-plan; may discard workstream and restart |
-| Repeated escalation | Surface to user | Block until human unblocks |
+Distributed ownership — each tier handles failures in the tier below it. The runner only handles T1 failure and terminal human escalation.
+
+| Failure | Owner | Handler | Action |
+|---------|-------|---------|--------|
+| T4 bad output | T3 | `escalation.py` called by T3's context | Retry T4 with corrected brief (up to retry_budget) |
+| T4 blocked | T3 | `escalation.py` | Escalate to T3 immediately — no retries |
+| T4 partial output | T3 | `escalation.py` | Salvage good parts, re-task remainder |
+| T5 partial verdict | T3 | T3 joint verdict logic | Retry failed T4 slices only |
+| T5 full fail | T3 | T3 joint verdict logic | Escalate to T2 |
+| T3 workstream stuck | T2 | T2 specialist prompt + blackboard | Re-scope or split the workstream |
+| T2 design wrong | T1 | T1 Accept phase + blackboard | Re-plan; may discard workstream and restart |
+| T1 failure / crash | Runner | `team_runner.py` | Surface to human, halt run |
+| Repeated escalation | Runner | `team_runner.py` | Gate: block until human unblocks |
 
+**Key distinction:** `escalation.py` is not called by the runner centrally. It is logic that tier agents execute (or the runner executes on their behalf when it detects a timeout or dead agent). The runner only owns the last two rows.
+
 Retry limits prevent infinite loops. Escalation path is always upward, never sideways.
 
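The routing in the new failure table can be approximated in a few lines. This sketch assumes the handler is the tier above the failed one on that workstream's `tier_path` (T1 when nothing is above, which covers the shallow paths), with T5 results routed to whoever owns the T4 work. `handler_for`, `next_action`, and the attempt counting are hypothetical; the budgets are the `retry_defaults` values from the build spec.

```python
RETRY_DEFAULTS = {"bad_output": 3, "partial": 2, "blocked": 0}  # from retry_defaults


def handler_for(failed_tier, tier_path):
    """Who handles a failure: always upward, never sideways."""
    if failed_tier == "t1":
        return "runner"  # the runner only owns T1 failure + terminal escalation
    if failed_tier == "t5" and "t4" in tier_path:
        # T5 verdicts are aggregated by whoever dispatched the T4 work.
        return handler_for("t4", tier_path)
    i = tier_path.index(failed_tier)
    return tier_path[i - 1] if i > 0 else "t1"


def next_action(failure_kind, attempts, retry_defaults=RETRY_DEFAULTS):
    """Retry while the budget for this failure kind remains, otherwise escalate."""
    budget = retry_defaults.get(failure_kind, 0)
    return "retry" if attempts < budget else "escalate"
```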
@@ -264,6 +520,114 @@ T4 and T5 default to the **coding agent runtime** when available. Falls back to
 
 ---
 
+## Run Visibility Layer
+
+Designed for debugging, test runs, and quality evaluation at each tier. Three interlocking components.
+
+### 1. Human-Readable Live Log
+
+Structured events from the blackboard rendered as a timestamped, readable stream. `agency watch <run_id>` tails this live.
+
+```
+[abc123] 12:30:01 T1 PLAN_START Assessing scope: "Build webhook ingestion system"
+[abc123] 12:30:14 T1 PLAN_DONE 3 workstreams — backend-api, infra, docs (2 parallel)
+[abc123] 12:30:14 GATE APPROVAL ⏸ Waiting on approval before T2 spawns
+[abc123] 12:31:02 GATE APPROVED ✓ Approved — continuing
+[abc123] 12:31:03 T2 LEAD_START Lead Architect spawned
+[abc123] 12:31:41 T2 BOUNDS_READY Domain boundaries + shared assumptions published
+[abc123] 12:31:42 T2 SPEC_START 3 specialists spawned (parallel): backend, infra, docs
+[abc123] 12:32:15 T2 SPEC_DONE backend-api architecture draft ready
+[abc123] 12:32:58 T2 SYNTH_DONE Canonical architecture written to blackboard
+[abc123] 12:32:58 GATE INSPECTION ⏸ T2 synthesis ready for review
+[abc123] 12:33:44 T3 MESH_START backend-api: 2 squad leads negotiating task boundaries
+[abc123] 12:34:01 T3 MESH_DONE Task split committed — 7 T4 tasks (5 swarm, 2 pipeline)
+[abc123] 12:34:02 T4 SWARM_START 5 workers spawned in parallel
+[abc123] 12:35:10 T4 DONE worker-3 auth-middleware ✓
+[abc123] 12:35:22 T4 FAIL worker-4 queue-client ✗ (retry 1/3)
+[abc123] 12:36:04 T4 DONE worker-4 queue-client ✓ (retry resolved)
+[abc123] 12:36:05 T5 VERIFY_START 4 verifiers spawned
+[abc123] 12:36:45 T5 VERDICT partial — queue-client needs rework
+[abc123] 12:37:12 T5 VERDICT ✓ all pass — workstream backend-api done
+```
+
+Log level `verbose` adds per-T4-start/done lines. Default is `normal` (tier-level events only).
+
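A sketch of how `agency watch` could turn `log` events into lines like the ones above. The spec only fixes `detail: {level, message}`; the `tier` and `label` fields, and using `level: "verbose"` to mark per-T4 lines, are assumptions. `render_line` is a hypothetical name and the column widths are illustrative.

```python
from datetime import datetime


def render_line(run_id, event, log_level="normal"):
    """Format one `log` event for the live log, or return None if it is filtered out."""
    d = event["detail"]
    if d.get("level") == "verbose" and log_level != "verbose":
        return None  # per-T4 lines only show at log_level: verbose
    ts = datetime.fromisoformat(event["created_at"]).strftime("%H:%M:%S")
    return f"[{run_id}] {ts} {d.get('tier', ''):<4} {d.get('label', ''):<13} {d['message']}"
```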
|
### 2. Inspection Gates
|
||||||
|
|
||||||
|
Configurable pause points. When the runner hits a gate, it:
|
||||||
|
1. Writes a `gate_pending` event to the blackboard
|
||||||
|
2. Fires `notify_adapter.send()` with the tier summary + gate context
|
||||||
|
3. Halts — no next tier spawns until `gate_approved` or `gate_rejected` is written
|
||||||
|
|
||||||
|
The tier summary surfaced at each gate includes:
|
||||||
|
- **What was produced** (the tier artifact in readable form)
|
||||||
|
- **What happens next** (which agents will spawn, doing what)
|
||||||
|
- **Any anomalies** flagged by the tier itself
|
||||||
|
|
||||||
|
Configurable in `team.yaml` under `visibility.inspection_gates`. A `strict_mode: true` flag enables all gates — recommended for first runs on a new codebase or new goal type.
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
visibility:
|
||||||
|
strict_mode: false
|
||||||
|
log_level: normal # normal | verbose
|
||||||
|
inspection_gates:
|
||||||
|
t1_plan: true # always — required by design
|
||||||
|
t2_lead: false # optional — review boundaries before specialists
|
||||||
|
t2_synthesis: true # recommended — review architecture before implementation
|
||||||
|
t3_plan: false # verbose — useful early on, disable once T3 is trusted
|
||||||
|
t5_verdict: false # review T5 joint verdict before T3 marks workstream done
|
||||||
|
gate_timeout_minutes: 60 # auto-reject if no response within this window
|
||||||
|
```
|
||||||
|
|
||||||
|
### 3. Inspection CLI — `cli/agency.py`
|
||||||
|
|
||||||
|
```
|
||||||
|
agency run <config.yaml> # start a run, returns run_id
|
||||||
|
agency watch <run_id> # tail live log (follows blackboard events)
|
||||||
|
agency inspect <run_id> # interactive tree view of run state
|
||||||
|
agency inspect <run_id> --tier t2 # jump to T2 artifacts
|
||||||
|
agency inspect <run_id> --brief <id> # show full brief + result JSON
|
||||||
|
|
||||||
|
agency approve <run_id> # approve current gate → continue
|
||||||
|
agency approve <run_id> --note "..." # approve with a note written to blackboard
|
||||||
|
agency reject <run_id> --reason "..." # reject → tier re-invoked
|
||||||
|
agency pause <run_id> # force-pause at next tier boundary
|
||||||
|
agency resume <run_id> # release a manual pause
|
||||||
|
```
|
||||||
|
|
||||||
|
`agency inspect` (no flags) renders a live tree:
|
||||||
|
```
Run abc123 — "Build webhook ingestion system"
├── T1 Plan ✓
│   └── [view workplan]
├── T2 Architecture ✓ [GATE: pending review]
│   ├── [view domain boundaries]
│   ├── [view shared assumptions]
│   └── [view canonical architecture]
├── T3 backend-api (active)
│   ├── [view task breakdown]
│   └── T4 workers: 3/7 done, 1 retrying, 3 pending
└── T3 infra (pending)
```
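Once run state has been collected, producing the tree layout is a short recursive walk. This is a hypothetical sketch: `Node` and `render` are illustrative names, and reading tier status from the blackboard is out of scope. It reproduces the `├──`, `└──` and `│` prefixes shown above. A child of a non-last sibling keeps a vertical rule in its prefix, and a child of the last sibling gets plain spaces.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One line in the inspect tree: a tier, an artifact link, or a worker summary."""
    label: str
    children: list["Node"] = field(default_factory=list)


def render(root: Node) -> str:
    """Render a run tree using the box-drawing layout of `agency inspect`."""
    lines = [root.label]

    def walk(node: Node, prefix: str) -> None:
        for i, child in enumerate(node.children):
            last = i == len(node.children) - 1
            lines.append(prefix + ("└── " if last else "├── ") + child.label)
            # Descendants of a non-last child keep the vertical rule.
            walk(child, prefix + ("    " if last else "│   "))

    walk(root, "")
    return "\n".join(lines)
```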
### Blackboard Event Vocabulary (extended)

```python
from typing import Literal, Union

# existing
CoreEvent = Literal["spawned", "completed", "failed", "escalated", "retried"]

# new — visibility layer
VisibilityEvent = Literal[
    "gate_pending",    # runner hit a gate, waiting for human
    "gate_approved",   # human approved, run continues
    "gate_rejected",   # human rejected, tier re-invoked
    "gate_paused",     # manual pause via CLI
    "gate_resumed",    # manual resume via CLI
    "path_amendment",  # mid-run tier proposed path change
    "log",             # human-readable log line (level + message)
]

# Alias names are illustrative; the string values are the event vocabulary.
EventType = Union[CoreEvent, VisibilityEvent]
```
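Because the vocabulary is a closed set, a writer can reject unknown types before they reach the `events` table. Below is a minimal sketch using the standard-library `sqlite3` module. It assumes a simplified table: only `event_id` and `run_id` come from the spec's schema, while `event_type`, `payload` (JSON text) and `created_at` are assumed columns. `append_event` and `events_since` are hypothetical helpers; the second shows the kind of ordered poll that `agency watch` or the runner's spawn loop could run.

```python
import json
import sqlite3
import time
import uuid

# Event vocabulary from the listing above: existing types plus the visibility layer.
EVENT_TYPES = frozenset({
    "spawned", "completed", "failed", "escalated", "retried",
    "gate_pending", "gate_approved", "gate_rejected",
    "gate_paused", "gate_resumed", "path_amendment", "log",
})

# Simplified, assumed schema: only event_id and run_id come from the spec.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    event_id   TEXT PRIMARY KEY,
    run_id     TEXT NOT NULL,
    event_type TEXT NOT NULL,
    payload    TEXT NOT NULL,
    created_at REAL NOT NULL
)
"""


def append_event(conn: sqlite3.Connection, run_id: str, event_type: str, payload: dict) -> str:
    """Reject unknown event types, then append one row; returns the new event_id."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type!r}")
    event_id = uuid.uuid4().hex
    with conn:  # commits on success, rolls back if the insert raises
        conn.execute(
            "INSERT INTO events (event_id, run_id, event_type, payload, created_at)"
            " VALUES (?, ?, ?, ?, ?)",
            (event_id, run_id, event_type, json.dumps(payload), time.time()),
        )
    return event_id


def events_since(conn: sqlite3.Connection, run_id: str, after: float) -> list[tuple[str, dict]]:
    """(event_type, payload) pairs newer than `after`, oldest first."""
    rows = conn.execute(
        "SELECT event_type, payload FROM events"
        " WHERE run_id = ? AND created_at > ? ORDER BY created_at, rowid",
        (run_id, after),
    ).fetchall()
    return [(etype, json.loads(body)) for etype, body in rows]
```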
---

## Decisions Log
**T1 dynamic dispatch** — T1 assesses scope and prescribes tier path and workstream parallelism. It does not prescribe internal tier coordination patterns.

@@ -286,7 +650,7 @@ T4 and T5 default to the **coding agent runtime** when available. Falls back to

**Orchestration patterns** — Baked into tier prompts and runner tier-handling logic, not prescribed by T1. T2: Lead + parallel specialists. T3: light mesh within T2 domain. T4: swarm+pipeline. T5: fan-out+consensus.

**Output / review** — Nothing merges to main without Andrew's explicit approval. T1 opens a PR and surfaces it to Andrew. Notification is dual: Hans messages Andrew directly + PR opened on VCS. Merge is gated on human sign-off.

**Output / review** — Nothing merges to main without explicit human approval. T1 opens a PR and fires `notify_adapter.send()` with the PR summary. Merge is gated on human sign-off. The notify adapter implementation determines how the notification is delivered.

**Platform agnosticism** — Core is provider and platform agnostic. Capability levels (`reasoning-heavy`, `capable`, `fast-cheap`) map to models in config. Mixing providers across tiers is supported.

@@ -297,3 +661,21 @@ T4 and T5 default to the **coding agent runtime** when available. Falls back to

**Coding agent runtime** — Claude Code is default T4/T5 runtime. Opt-in `native_teams` flag available for internal Claude Code parallelism — faster but less blackboard visibility. Default `false`.

**Agency-agents integration** — Via git submodule at `agents/`. T1 selects specialists via `config/role_registry.yaml`. `agent_personality` field on task brief; runtime injects as system prompt at spawn time.

**Spawn call ownership** — Runner is the single point of contact with the runtime adapter. Tiers write `status=pending` child briefs to the blackboard; runner's spawn loop detects and spawns them. Gate logic (hold on `gate_pending`) lives in the spawn loop — no gate plumbing needed in agents. Agents only need blackboard read/write access.
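A single pass of that spawn loop might look like the hypothetical sketch below. `Brief`, `is_held` and `spawn_tick` are illustrative names. The runtime adapter is reduced to a `spawn(brief)` callable, and briefs live in memory rather than on the blackboard. Two behaviors are assumptions the spec does not spell out: gate holds and manual pauses are tracked as separate holds, and the hold stays in place after `gate_rejected` because the re-invoked tier must reach the gate again.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Brief:
    brief_id: str
    status: str  # only "pending" matters to the spawn loop


def is_held(event_types: Iterable[str]) -> bool:
    """Replay a run's events (oldest first) and report whether spawning is held."""
    gate_hold = pause_hold = False
    for etype in event_types:
        if etype == "gate_pending":
            gate_hold = True
        elif etype == "gate_approved":
            gate_hold = False
        elif etype == "gate_paused":
            pause_hold = True
        elif etype == "gate_resumed":
            pause_hold = False
        # gate_rejected deliberately leaves gate_hold set: the tier is
        # re-invoked and has to reach the gate again (assumption).
    return gate_hold or pause_hold


def spawn_tick(
    briefs: list[Brief],
    event_types: Iterable[str],
    spawn: Callable[[Brief], object],
) -> list[str]:
    """One pass of the runner's spawn loop; returns ids spawned this pass."""
    if is_held(event_types):
        return []
    spawned = []
    for brief in briefs:
        if brief.status == "pending":
            spawn(brief)              # stand-in for the runtime adapter call
            brief.status = "spawned"  # hypothetical status so a brief spawns once
            spawned.append(brief.brief_id)
    return spawned
```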
**Gate approval UX** — `agency approve <run_id>` CLI is the universal approval path — writes `gate_approved` directly to blackboard. Runner only polls blackboard; it does not depend on any specific notification platform. Each notify adapter handles its own inbound response bridge as internal adapter state. Core has no `pending_gates.json` or platform-specific approval logic.

**T3 mesh timeout** — Escalate to T2 (the specialist that owns the domain). Timeout means T3s can't agree on task boundaries — a domain boundary problem T2 should fix by re-scoping. If T2 exhausts its retry budget, normal escalation ladder handles it (T1 → Andrew gate). No force-commit fallback.

**T3 mesh mechanics** — Blackboard-based coordination. T3s write draft task lists, read peers', reconcile overlaps, commit merged plan. No T4 dispatch until all T3s in the domain have committed. Runner enforces timeout (`t3_mesh_timeout_minutes` in config). Chosen over designated T3 lead or direct messaging — fits distributed ownership model, gives full audit trail for free.

**T1 output schema** — Formal JSON schema defined (2026-03-30). Fields: `run_id`, `goal_anchor`, `complexity`, `retry_budget_multiplier`, `workstreams[]` (id, name, domain, tier_path, parallel_group, t2_specialist, notes), `parallelism` (groups + sequence), `self_critique_summary`. `parallel_group` + `sequence` handles inter-workstream dependencies.
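Below is a sketch of how the runner might load that schema and derive execution batches from `parallel_group` + `sequence`. The spec lists field names only, so every type here is an assumption. `parallelism.groups` is not read because its shape isn't given: group membership is taken from each workstream's `parallel_group`. `parse_plan` and `batches` are hypothetical helpers. The sketch also assumes that workstreams in one batch run in parallel and that batches run one after another.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Workstream:
    id: str
    name: str
    domain: str
    tier_path: list[str]      # assumed: ordered tier names, e.g. ["t2", "t3", "t4"]
    parallel_group: str       # assumed: a key that appears in parallelism.sequence
    t2_specialist: str
    notes: str = ""


@dataclass
class T1Plan:
    run_id: str
    goal_anchor: str
    complexity: str           # assumed to be a label such as "high"
    retry_budget_multiplier: float
    workstreams: list[Workstream]
    sequence: list[str]       # parallel groups in execution order
    self_critique_summary: str = ""


def parse_plan(doc: dict[str, Any]) -> T1Plan:
    """Build a T1Plan from decoded JSON, checking every group is sequenced."""
    workstreams = [Workstream(**w) for w in doc["workstreams"]]
    sequence = list(doc["parallelism"]["sequence"])
    unsequenced = {w.parallel_group for w in workstreams} - set(sequence)
    if unsequenced:
        raise ValueError(f"parallel groups missing from sequence: {sorted(unsequenced)}")
    return T1Plan(
        run_id=doc["run_id"],
        goal_anchor=doc["goal_anchor"],
        complexity=doc["complexity"],
        retry_budget_multiplier=float(doc["retry_budget_multiplier"]),
        workstreams=workstreams,
        sequence=sequence,
        self_critique_summary=doc.get("self_critique_summary", ""),
    )


def batches(plan: T1Plan) -> list[list[str]]:
    """Workstream ids per parallel group, in sequence order."""
    return [
        [w.id for w in plan.workstreams if w.parallel_group == group]
        for group in plan.sequence
    ]
```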
**T5 consensus** — T3 aggregates all T5 results into joint verdict: `pass | partial | fail`. Split verdict (`partial`) → T3 retries only failed slices, re-runs T5 on those slices. Full `fail` escalates to T2. T3 writes structured joint verdict to blackboard; this is what the optional T5 gate surfaces to Andrew.
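The aggregation rule reduces to a small pure function. This sketch rests on three assumptions. Each T5 result arrives as `(slice_id, passed)`. A slice with several T5 results fails if any of them failed. `joint_verdict` and `next_action`, along with `next_action`'s action names, are illustrative and not part of the spec.

```python
from typing import Iterable, Literal

Verdict = Literal["pass", "partial", "fail"]


def joint_verdict(results: Iterable[tuple[str, bool]]) -> tuple[Verdict, list[str]]:
    """Fold T5 results into a joint verdict plus the sorted list of failed slices."""
    slice_ok: dict[str, bool] = {}
    for slice_id, passed in results:
        # A slice stays green only while every T5 result for it passes.
        slice_ok[slice_id] = slice_ok.get(slice_id, True) and passed
    if not slice_ok:
        raise ValueError("no T5 results to aggregate")
    failed = sorted(s for s, ok in slice_ok.items() if not ok)
    if not failed:
        return "pass", []
    if len(failed) == len(slice_ok):
        return "fail", failed
    return "partial", failed


def next_action(verdict: Verdict) -> str:
    """Map a verdict to its follow-up: retry failed slices, or escalate on full fail."""
    return {
        "pass": "mark_workstream_done",
        "partial": "retry_failed_slices",
        "fail": "escalate_to_t2",
    }[verdict]
```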
**Path amendment mechanism** — Amending tier writes `path_amendment` event to blackboard (structured JSON: proposed_by, reason, amendment). Runner monitors events table, sends system event notification to relevant higher tier. Higher tier is informed, not blocked. No agent callback plumbing. Amendments surface at next scheduled human gate.

**Failure handling (distributed)** — Confirmed distributed ownership (2026-03-30). `escalation.py` is logic tiers execute (or runner executes on tier's behalf on timeout/crash), not a central runner concern. Runner only owns: T1 failure, terminal human escalation. See updated Failure Handling table.

**Run visibility layer** — Added 2026-03-30. Human-readable live log, configurable inspection gates, and `cli/agency.py` inspection/control commands. Designed for debugging and quality evaluation at each tier during early runs. `strict_mode: true` enables all gates. Gates surface tier artifacts + "what happens next" summary via `notify_adapter.send()` — platform-agnostic. Resolves Q3 (T5 consensus surfaces as gate event with human-readable summary). T5 gate (optional) lets the operator review joint verdict before T3 marks workstream done.