# Tiered Agent Team System — Design Document

_Started: 2026-03-14. Last updated: 2026-03-30._

---

## Resolved Design Decisions (formerly Open Questions)

All five open questions resolved 2026-03-30. Details in Decisions Log.

1. **T3 mesh mechanics** → Blackboard-based. T3s write draft task lists, read peers', commit merged plan before T4 dispatch. See _T3 Mesh via Blackboard_.

2. **T1 output schema** → Formal JSON schema defined. See _T1 Plan Output Schema_.

3. **T5 consensus mechanics** → T3 aggregates all T5 results into a joint verdict. Split verdict (`partial`) triggers retry of failed slices only. See _T5 Consensus & Verdict Schema_.

4. **Path amendment mechanism** → Amending tier writes a `path_amendment` event to blackboard. Runner monitors events table and notifies the relevant higher tier via a system event. No agent callback plumbing required. See _Path Amendment Mechanism_.

5. **Failure handling (distributed model)** → Distributed ownership confirmed. Runner only owns T1 failure + terminal human escalation. See updated _Failure Handling_ table.

---

## Overview

A dynamic, hierarchical multi-agent system for software pipelines. Teams assemble on demand, execute, then disband. Inspired by a blend of Hollywood production (dynamic assembly), consulting firms (structured deliverables, hierarchical synthesis), and two-pizza teams (small autonomous squads, clear domain ownership).

---

## Core Principles

**1. Tiers represent cognitive modes, not org chart levels.**
Each tier thinks differently — strategy, design, coordination, execution, verification. Adding a tier only makes sense if it introduces a genuinely different mode of reasoning.

**2. Depth is proportional to complexity.**
Not every task needs every tier. A config change might only need T3→T4→T5. A new product needs the full stack. T1 assesses scope and prescribes the path — it is never pre-configured.

**3. Goal anchoring at every level.**
T1's original intent is embedded in every agent's context — not just passed to T2 and forgotten. Every agent knows the end goal even if they only own a slice.

**4. Artifacts, not summaries.**
Tiers pass structured specs downward (JSON task briefs), not paraphrased prose. Meaning is preserved; format is compressed.

**5. Verification is mandatory.**
T5 always runs. Nothing returns to T1 unverified. T5 is a quality gate, not optional — things should work and work well before they surface upward.

**6. Provider agnostic.**
The system makes no assumptions about which LLM provider or platform is in use. Tiers reference capability levels, not specific models. All external dependencies are swappable adapters.

**7. Specialist talent pool.**
Tiers define structure and responsibility. Agent personalities define domain expertise. The two are separate — the same tier can be filled by different specialists depending on the workstream domain.

---

## Tier Definitions

| Tier | Role | Owns | Capability Level |
|------|------|------|-----------------|
| T1 | Visionary | Goal, constraints, dispatch plan, final acceptance | reasoning-heavy |
| T2 | Architect | System design, interface contracts, workstream boundaries | reasoning-heavy / capable |
| T3 | Squad Lead | Workstream delivery, T4 management, quality gate | capable |
| T4 | Implementer | Atomic task execution (one file, one function, one test) | fast-cheap |
| T5 | Verifier | Validation of T4 output — correctness + intent alignment | capable |

T5 runs **within T3's scope**, not above it. T3 commissions T5 verification of its T4 outputs. T5 is a quality gate, not a management layer.

Capability levels map to actual models per provider in config — the core system never references a specific model name.

---

## Dispatch Model

### T1 Owns the Plan

T1 is not just a decomposer — it is the dispatch planner. Its output declares:

- **Workstreams** — the decomposed units of work
- **Tier path per workstream** — which tiers to engage (e.g. `[T2, T3, T4, T5]` or `[T4, T5]` for trivial tasks)
- **Parallelism** — which workstreams are independent and can run concurrently

T1 does not prescribe how each tier operates internally. That is the tier's own concern.

### T1 Lifecycle — Two Explicit Phases

T1 is invoked twice per run, each with a distinct prompt and purpose:

**Phase 1 — Plan:**
1. T1 produces initial dispatch plan (workstreams, tier paths, parallelism, retry budget)
2. T1 self-critiques its own plan in a single follow-up pass ("what could go wrong, what did I miss?") and amends
3. Amended plan surfaces to Andrew for approval — no T2s spawn until approval is given

**Phase 2 — Accept:**
After the full T2→T3→T4→T5 pipeline completes, T1 is re-invoked with the final output. It validates against the original goal anchor and either accepts (opens PR) or rejects (escalates back down).

Both phases are named explicitly in the task brief schema and tracked on the blackboard.

### Each Tier Owns the Layer Below

Control flow is distributed, not centralised:

- T1 manages its T2s
- T2 Lead manages T2 specialists and their domain boundaries
- T2 specialists each own their T3s
- **T3 manages its T4s** — including dependency graph, parallelism, and T5 commissioning
- The runner is thin: bootstrap T1, monitor the blackboard, handle final result and notifications

This means orchestration logic lives in agent prompts and output schemas — not in Python runner code. Adding a new execution pattern means updating a prompt, not the runner.

**Tradeoff:** Debugging is harder. When something fails mid-chain, you read blackboard logs rather than step through central runner code. This is a tooling problem to solve (good blackboard inspection), not a design flaw to avoid.

### Dynamic Paths

Tiers can propose path amendments mid-run (e.g. T3 discovers scope that warrants a T2 pass it didn't get). Amendments are logged to the blackboard. Higher tiers are notified but do not need to approve — it is informational. No tier silently changes the plan.

---

## Orchestration Patterns Per Tier

Different tiers suit different internal coordination patterns. These are baked into the runner's tier-handling logic and the tier prompts — not prescribed by T1.

| Tier | Pattern | Rationale |
|------|---------|-----------|
| T1 | Single agent, two phases | Must be authoritative; plan phase + accept phase |
| T2 Lead | Coordinator | Spawned first; defines boundaries + shared assumptions; drives conflict resolution; produces canonical architecture |
| T2 Specialists | Parallel fan-out | Each works independently within its domain; reads Lead's boundaries + shared assumptions doc before starting |
| T3 | Light mesh | Peer coordination within same T2 domain to negotiate task boundaries before T4 dispatch |
| T4 | Swarm + pipeline hybrid | Independent tasks run as swarm; dependent tasks pipeline (T4-A's output feeds T4-B). T3 declares which is which. |
| T5 | Parallel fan-out + consensus | Each T5 reviews its slice independently, then compares notes for a joint verdict — catches both artifact bugs and integration issues |

### T2 Flow in Detail

1. T1 spawns **T2 Lead Architect** with goal + workstream context
2. Lead defines explicit **domain boundaries** (who owns what, hard edges)
3. Lead publishes **shared assumptions doc** — cross-cutting concerns, key conventions, architectural constraints (auth approach, data formats, API patterns, etc.)
4. T1 spawns **T2 specialists** with boundaries + shared assumptions baked into their briefs
5. Specialists work in parallel, each within their defined domain
6. Lead reads all proposals, drives **conflict resolution** with relevant specialists if needed (cycle limit in config — fixed, not per-workstream)
7. Lead produces **canonical architecture** → written to blackboard as distinct artifact
8. T1 (Accept phase) validates canonical architecture against goal anchor
9. Canonical architecture becomes T3 briefs — each T2 specialist hands off to its own T3s

---

## Horizontal Scaling Within Tiers

```
T1 — Phase 1: Plan (self-critique → Andrew approval)
│
├── T2: Lead Architect (boundaries + shared assumptions first)
│   ├── T2: Backend Architect  ─┐
│   ├── T2: Frontend Architect  ├─ parallel, within defined domains
│   └── T2: Infra Architect    ─┘
│       │
│       └── (Lead synthesises → conflict resolution if needed → canonical architecture)
│
├── T2 Backend Architect owns:
│   ├── T3: API Squad Lead ─┐
│   └── T3: DB Squad Lead  ─┴─ light mesh within domain
│       ├── T4: Worker A ─┐
│       ├── T4: Worker B ─┼─ swarm / pipeline (T3 decides)
│       └── T4: Worker C ─┘
│           └── T5: Verifier(s) — fan-out + consensus
│
└── T1 — Phase 2: Accept (validates against goal anchor → PR)
```

---

## Use Case Flows

T1 assesses complexity and prescribes the tier path per workstream. Three standard depth profiles:

### Full Stack — T1→T2→T3→T4→T5
*Complex feature, new product, cross-domain changes*

```
T1 Plan
  → assess complexity (high)
  → output T1 Plan Schema (workstreams, tier paths [T2,T3,T4,T5], parallelism, retry budgets)
  → self-critique pass
  → GATE: surface to Andrew ← approval required

T2 Lead (spawned by runner after approval)
  → receive: goal + full workplan
  → publish: domain boundaries + shared assumptions doc → blackboard
  → GATE (optional): review boundaries before specialists spawn

T2 Specialists (parallel fan-out, wait on Lead)
  → each receives: their domain boundary + shared assumptions
  → produce: architecture proposal for their slice
  → Lead synthesises, drives conflict resolution if needed
  → Lead writes: canonical architecture → blackboard
  → GATE (recommended): review architecture before implementation

Each T2 Specialist → spawns its own T3s (with canonical architecture slice + interface contracts)

T3s (light mesh within T2 domain)
  → write draft task lists to blackboard
  → read peers' lists, reconcile boundaries
  → commit merged task plan before T4 dispatch
  → GATE (optional): review task breakdown

T4s
  → swarm: independent tasks run in parallel
  → pipeline: T4-A output feeds T4-B (T3 declares dependencies)
  → commit to feature branches

T5s (fan-out per T4 slice)
  → each reviews its slice independently
  → T3 collects results → joint verdict
  → GATE (optional): review T5 verdict before T3 marks done
  → partial: T3 retries only failed slices
  → pass: T3 signals workstream done to T2

T2 specialists → signal T2 Lead
T2 Lead → writes integration summary → blackboard

T1 Accept
  → validate against goal anchor
  → open PR, notify Andrew via Hans
```

### Medium Complexity — T1→T3→T4→T5
*Config change, isolated bug fix — T1 determines no cross-domain design needed*

```
T1 Plan
  → assess: contained scope, single domain, no T2 architecture needed
  → workplan: tier paths [T3, T4, T5]
  → GATE: Andrew approval

T3s spawned directly by runner
  → receive T1 brief with task context (no T2 architecture layer)
  → T3 light mesh → T4 dispatch → T5 verify → signal done

T1 Accept → PR
```

### Simple / Hotfix — T1→T4→T5
*Single file, single function, trivial atomic task*

```
T1 Plan
  → assess: trivial, single workstream
  → tier path: [T4, T5]
  → GATE: Andrew approval

T4 (coding agent)
  → single atomic task, commits

T5 (single verifier, not full fan-out)
  → code review + correctness check
  → pass → T1 Accept → PR
```

---

## Resolved Mechanics

### T3 Mesh via Blackboard

T3s coordinate task boundaries before dispatching T4s. All coordination goes through the blackboard — no direct agent-to-agent messaging.

1. Each T3 writes its **draft task list** to the blackboard (one row per proposed T4 task, status `draft`)
2. Each T3 reads all sibling T3 draft lists in its T2 domain
3. T3s amend their lists to resolve overlaps (claim tasks, release duplicates)
4. Once all T3s in the domain have committed their final task lists (status `committed`), T4 dispatch begins
5. No T3 dispatches T4s until all peers in the domain are committed — this prevents duplicate work

The runner monitors for `all_committed` state and can enforce a timeout (config: `t3_mesh_timeout_minutes`).
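
The commit barrier in steps 4 and 5 can be sketched against an in-memory SQLite table. This is a minimal illustration, not the project's schema: the table name `t3_task_lists`, its columns, and the helpers `make_store`, `propose`, `commit`, and `all_committed` are all assumptions made for the example.

```python
import sqlite3

# Hypothetical schema: one row per proposed T4 task, as in step 1.
SCHEMA = """
CREATE TABLE t3_task_lists (
    run_id TEXT NOT NULL,
    domain TEXT NOT NULL,
    t3_id  TEXT NOT NULL,
    task   TEXT NOT NULL,
    status TEXT NOT NULL CHECK (status IN ('draft', 'committed'))
)
"""


def make_store() -> sqlite3.Connection:
    """Create an in-memory coordination store for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def propose(conn, run_id, domain, t3_id, task):
    """A T3 writes one draft row per proposed T4 task."""
    conn.execute(
        "INSERT INTO t3_task_lists VALUES (?, ?, ?, ?, 'draft')",
        (run_id, domain, t3_id, task),
    )


def commit(conn, run_id, t3_id):
    """A T3 marks its whole task list as final."""
    conn.execute(
        "UPDATE t3_task_lists SET status = 'committed' "
        "WHERE run_id = ? AND t3_id = ?",
        (run_id, t3_id),
    )


def all_committed(conn, run_id, domain, expected_t3s):
    """True only when every expected T3 has rows and none are drafts."""
    rows = conn.execute(
        "SELECT t3_id, status FROM t3_task_lists "
        "WHERE run_id = ? AND domain = ?",
        (run_id, domain),
    ).fetchall()
    statuses = {}
    for t3_id, status in rows:
        statuses.setdefault(t3_id, set()).add(status)
    return bool(expected_t3s) and all(
        statuses.get(t3) == {"committed"} for t3 in expected_t3s
    )
```

Passing the expected T3 ids explicitly is a deliberate choice in this sketch: a peer that has not yet written any draft must still block T4 dispatch.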

---

### T1 Plan Output Schema

T1's Plan phase produces a structured JSON object written to the blackboard. The runner parses this to bootstrap the pipeline.

```json
{
  "run_id": "uuid",
  "goal_anchor": "Original goal — immutable, propagated to every downstream brief",
  "complexity": "high | medium | low",
  "retry_budget_multiplier": 2,
  "workstreams": [
    {
      "id": "ws-backend-api",
      "name": "Backend API",
      "domain": "backend",
      "tier_path": ["t2", "t3", "t4", "t5"],
      "parallel_group": "A",
      "t2_specialist": "agents/engineering/engineering-software-architect.md",
      "notes": "Focus on webhook ingest and retry queue"
    }
  ],
  "parallelism": {
    "groups": {
      "A": ["ws-backend-api", "ws-frontend"],
      "B": ["ws-infra"]
    },
    "sequence": ["A", "B"]
  },
  "self_critique_summary": "Brief plain-text summary of what T1 identified and amended in its self-critique pass"
}
```

`parallel_group` + `sequence` handles inter-workstream dependencies: group A runs in parallel, then B starts after A completes.
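
As a rough sketch of how a runner might consume this object, the snippet below checks a plan and turns `parallelism` into ordered waves of workstream ids. The function names `validate_plan` and `execution_waves`, and the choice of which keys count as required, are assumptions for illustration; only the field names come from the schema above.

```python
# Assumed minimum set of top-level keys; the document does not state
# which fields are mandatory.
REQUIRED_KEYS = {"run_id", "goal_anchor", "complexity", "workstreams", "parallelism"}


def validate_plan(plan: dict) -> None:
    """Raise ValueError if the plan breaks a rule stated in this document."""
    missing = REQUIRED_KEYS - plan.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    for ws in plan["workstreams"]:
        # Verification is mandatory, so every tier path must end in T5.
        if not ws["tier_path"] or ws["tier_path"][-1] != "t5":
            raise ValueError(f"{ws['id']}: tier_path must end with t5")
    groups = plan["parallelism"]["groups"]
    for name in plan["parallelism"]["sequence"]:
        if name not in groups:
            raise ValueError(f"sequence references unknown group {name!r}")


def execution_waves(plan: dict) -> list:
    """Return workstream ids wave by wave; each wave can run concurrently."""
    groups = plan["parallelism"]["groups"]
    return [list(groups[name]) for name in plan["parallelism"]["sequence"]]
```

For the example plan above, `execution_waves` yields one wave for group A followed by one for group B, so a runner would start the second wave only after every workstream in the first has finished.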

---

### T5 Consensus & Verdict Schema

T3 aggregates all T5 results into a joint verdict after fan-out completes.

**Individual T5 result:**
```json
{
  "verifier_id": "uuid",
  "scope": "queue-client",
  "verdict": "pass | fail",
  "issues": ["issue description..."],
  "notes": "human-readable summary"
}
```

**T3 joint verdict (written to blackboard):**
```json
{
  "t5_results": [...],
  "joint_verdict": "pass | partial | fail",
  "failed_scopes": ["queue-client"],
  "summary": "Human-readable summary for gate surface and logs"
}
```

**Split verdict handling:**
- `pass` → T3 marks workstream done, signals T2
- `partial` → T3 retries only the failed T4 slices (up to retry budget), re-runs T5 on those slices
- `fail` → T3 escalates to T2 (or T1 if shallow path)
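
The aggregation step could look roughly like the sketch below. The document does not spell out where `partial` ends and `fail` begins, so this example assumes that `fail` means every scope failed and `partial` means a mix; the function name `joint_verdict` and the summary wording are likewise illustrative.

```python
def joint_verdict(t5_results: list) -> dict:
    """Fold individual T5 results into the joint verdict structure."""
    if not t5_results:
        raise ValueError("at least one T5 result is required")
    failed = [r["scope"] for r in t5_results if r["verdict"] == "fail"]
    if not failed:
        verdict = "pass"
    elif len(failed) == len(t5_results):
        verdict = "fail"      # assumption: full fail means all scopes failed
    else:
        verdict = "partial"   # mixed result: retry only the failed scopes
    passed = len(t5_results) - len(failed)
    return {
        "t5_results": t5_results,
        "joint_verdict": verdict,
        "failed_scopes": failed,
        "summary": f"{passed}/{len(t5_results)} scopes passed",
    }
```

With this shape, `failed_scopes` is exactly the list of slices T3 would hand back to T4 on a `partial` verdict.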

---

### Path Amendment Mechanism

When a mid-run tier discovers scope that warrants a different tier path than T1 prescribed:

1. The discovering tier writes a `path_amendment` event to the blackboard:
   ```json
   {
     "kind": "path_amendment",
     "proposed_by": "t3/ws-backend-api",
     "reason": "Discovered auth dependency requires T2 architectural pass",
     "amendment": {
       "workstream": "ws-backend-api",
       "add_tiers": ["t2"],
       "insert_before": "t3"
     }
   }
   ```
2. The runner monitors the events table, detects `path_amendment`, and sends a system event notification to the relevant higher tier
3. The higher tier is **informed, not blocked** — it acknowledges and adjusts its understanding
4. Amendment is logged on the blackboard for audit; no approval gate required (the next scheduled human gate will surface it)

No agent needs callback plumbing. The runner is the notification bridge.
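
To make the `amendment` payload concrete, here is one way a consumer of the event might compute the amended tier path. The document specifies the event shape but not who applies it or how, so `apply_amendment` and its behaviour (skipping tiers already present, rejecting an unknown anchor) are assumptions.

```python
def apply_amendment(tier_path: list, amendment: dict) -> list:
    """Return a new tier path with the proposed tiers inserted.

    The input list is left untouched so the original plan stays auditable.
    """
    anchor = amendment["insert_before"]
    if anchor not in tier_path:
        raise ValueError(f"{anchor!r} is not in the current tier path")
    idx = tier_path.index(anchor)
    # Skip tiers that are already part of the path.
    new_tiers = [t for t in amendment["add_tiers"] if t not in tier_path]
    return tier_path[:idx] + new_tiers + tier_path[idx:]
```

Applied to a `["t3", "t4", "t5"]` path with the payload shown in step 1, this produces `["t2", "t3", "t4", "t5"]`.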

---

## Shared State

For software pipelines, **the repo is the primary blackboard**:
- T4 workers commit to feature branches
- T3 leads review and merge to workstream branches
- T2 architects own integration branches
- T1 does final integration and acceptance

Supplemented by a SQLite coordination store per run tracking:
- In-flight workstreams and their current execution plans
- Handoff artifacts and tier status
- Retry counts and escalation history
- Path amendments (proposed, by whom, timestamp)
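
A minimal sketch of the events side of such a store, using Python's standard `sqlite3` module, is shown below. The `events` table layout and the helper names `open_blackboard`, `write_event`, and `events_since` are assumptions for illustration, not the project's actual schema.

```python
import json
import sqlite3


def open_blackboard() -> sqlite3.Connection:
    """Create an in-memory store with a hypothetical events table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " run_id TEXT NOT NULL,"
        " kind TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def write_event(conn, run_id, kind, payload):
    """Append one event; the payload is stored as a JSON string."""
    cur = conn.execute(
        "INSERT INTO events (run_id, kind, payload) VALUES (?, ?, ?)",
        (run_id, kind, json.dumps(payload)),
    )
    return cur.lastrowid


def events_since(conn, run_id, last_seen_id, kind=None):
    """Return (id, kind, payload) tuples newer than last_seen_id, oldest first."""
    sql = "SELECT id, kind, payload FROM events WHERE run_id = ? AND id > ?"
    args = [run_id, last_seen_id]
    if kind is not None:
        sql += " AND kind = ?"
        args.append(kind)
    sql += " ORDER BY id"
    return [(i, k, json.loads(p)) for i, k, p in conn.execute(sql, args)]
```

A runner polling with `events_since` and remembering the last id it processed gets the monitoring behaviour described in this document without any agent-to-agent callbacks.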

---

## Failure Handling

Distributed ownership — each tier handles failures in the tier below it. The runner only handles T1 failure and terminal human escalation.

| Failure | Owner | Handler | Action |
|---------|-------|---------|--------|
| T4 bad output | T3 | `escalation.py` called by T3's context | Retry T4 with corrected brief (up to retry_budget) |
| T4 blocked | T3 | `escalation.py` | Escalate to T3 immediately — no retries |
| T4 partial output | T3 | `escalation.py` | Salvage good parts, re-task remainder |
| T5 partial verdict | T3 | T3 joint verdict logic | Retry failed T4 slices only |
| T5 full fail | T3 | T3 joint verdict logic | Escalate to T2 |
| T3 workstream stuck | T2 | T2 specialist prompt + blackboard | Re-scope or split the workstream |
| T2 design wrong | T1 | T1 Accept phase + blackboard | Re-plan; may discard workstream and restart |
| T1 failure / crash | Runner | `team_runner.py` | Surface to human, halt run |
| Repeated escalation | Runner | `team_runner.py` | Gate: block until human unblocks |

**Key distinction:** `escalation.py` is not called by the runner centrally. It is logic that tier agents execute (or the runner executes on their behalf when it detects a timeout or dead agent). The runner only owns the last two rows.

Retry limits prevent infinite loops. Escalation path is always upward, never sideways.

T1 sets a retry budget multiplier during scope assessment (`1x` simple, `2x` complex). Retry budget is a field on the task brief — not hardcoded in the runner.
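
The T4 rows of the table can be read as a small routing function. The sketch below is illustrative only: the function names, the failure-kind strings, the returned action labels, and the idea of a configurable base retry count that the multiplier scales are all assumptions, since the document does not show the contents of `escalation.py`.

```python
def retry_budget(base_retries: int, multiplier: int) -> int:
    """Scale an assumed base retry count by T1's multiplier (1x or 2x)."""
    return base_retries * multiplier


def route_t4_failure(kind: str, attempts: int, budget: int) -> str:
    """Map a T4 failure to the action T3 takes, following the table above."""
    if kind == "blocked":
        return "escalate"              # no retries for a blocked task
    if kind == "partial_output":
        return "salvage_and_retask"    # keep good parts, re-task the rest
    if kind == "bad_output":
        # Retry with a corrected brief until the budget is spent,
        # then move upward so the loop cannot run forever.
        return "retry" if attempts < budget else "escalate"
    raise ValueError(f"unknown failure kind: {kind!r}")
```

Because the budget travels on the task brief, a function like this needs no knowledge of the runner; it only needs the brief's budget and the attempt count recorded on the blackboard.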

---

## Agent Talent Pool

The system builds on [agency-agents](https://github.com/msitarzewski/agency-agents) — a library of 50+ pre-built specialist personalities, each with deep domain expertise, quality standards, and specific deliverables.

**Division of responsibility:**
- Our system provides: orchestration, tier structure, task briefs, retries, verification gates, shared state
- Agency-agents provides: the specialist knowledge each agent brings to its role

T1 selects the right specialist from the roster when building workstream briefs. The specialist's personality is injected as the system prompt at spawn time.

**Default tier-to-specialist mapping for software pipelines:**

| Tier | Domain | Agent |
|------|--------|-------|
| T1 | Strategy | nexus-strategy |
| T2 | Backend | software-architect |
| T2 | Infra | devops-automator |
| T2 | Data | data-engineer |
| T3 | Backend | senior-developer |
| T3 | Reliability | sre |
| T4 | Frontend | frontend-developer |
| T4 | Backend | backend-architect |
| T4 | Database | database-optimizer |
| T4 | DevOps | devops-automator |
| T4 | Mobile | mobile-app-builder |
| T4 | AI/ML | ai-engineer |
| T4 | Security | security-engineer |
| T4 | Docs | technical-writer |
| T5 | Code review | code-reviewer |
| T5 | Integration | testing-reality-checker |
| T5 | API | testing-api-tester |
| T5 | Performance | testing-performance-benchmarker |
| T5 | Security | security-engineer |

The roster is not fixed — T1 can select any agent from the library based on workstream needs.

---

## Adapter Layers

Everything external is a swappable adapter. Core logic never imports from adapters directly — always through an interface.

```
Core (platform-agnostic)
├── team_runner     — thin bootstrap: spawn T1, monitor blackboard, handle result
├── blackboard      — SQLite coordination state
├── task_brief      — schema + validation
└── escalation      — retry logic, failure routing

Adapters (swappable)
├── llm/            — anthropic (now), openai, ollama, any API
├── notify/         — openclaw (now), slack, email, webhook...
├── vcs/            — github (now), gitlab, gitea, bare git...
└── runtime/
    ├── standard      — openclaw sessions_spawn (T1/T2/T3)
    └── coding_agent  — claude_code (T4/T5 default), codex, aider...
```

Swapping providers means writing a new adapter file — nothing in core changes.

T4 and T5 default to the **coding agent runtime** when available. Falls back to standard runtime gracefully if not configured.
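
One way to express "core talks to an interface, not an adapter" in Python is a structural `Protocol`. The sketch below is hypothetical: `NotifyAdapter`, `RecordingNotifier`, `notify_gate`, and `pick_runtime` are names invented for the example, and only the `send()` method and the T4/T5 fallback rule are taken from this document.

```python
from typing import Protocol


class NotifyAdapter(Protocol):
    """Interface the core depends on; concrete adapters live elsewhere."""

    def send(self, message: str) -> None: ...


class RecordingNotifier:
    """Stand-in adapter for tests: it records messages instead of sending."""

    def __init__(self) -> None:
        self.sent = []

    def send(self, message: str) -> None:
        self.sent.append(message)


def notify_gate(adapter: NotifyAdapter, gate: str, summary: str) -> None:
    """Core-side helper: knows only the interface, not the provider."""
    adapter.send(f"[{gate}] {summary}")


def pick_runtime(tier: str, available: set) -> str:
    """T4/T5 prefer the coding agent runtime and fall back to standard."""
    if tier in {"t4", "t5"} and "coding_agent" in available:
        return "coding_agent"
    return "standard"
```

Because `Protocol` is structural, `RecordingNotifier` satisfies `NotifyAdapter` without inheriting from it, which keeps adapter files free of imports from core.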

---

## Run Visibility Layer

Designed for debugging, test runs, and quality evaluation at each tier. Three interlocking components.

### 1. Human-Readable Live Log

Structured events from the blackboard rendered as a timestamped, readable stream. `agency watch <run_id>` tails this live.

```
[abc123] 12:30:01 T1   PLAN_START    Assessing scope: "Build webhook ingestion system"
[abc123] 12:30:14 T1   PLAN_DONE     3 workstreams — backend-api, infra, docs (2 parallel)
[abc123] 12:30:14 GATE APPROVAL      ⏸ Waiting on approval before T2 spawns
[abc123] 12:31:02 GATE APPROVED      ✓ Approved — continuing
[abc123] 12:31:03 T2   LEAD_START    Lead Architect spawned
[abc123] 12:31:41 T2   BOUNDS_READY  Domain boundaries + shared assumptions published
[abc123] 12:31:42 T2   SPEC_START    3 specialists spawned (parallel): backend, infra, docs
[abc123] 12:32:15 T2   SPEC_DONE     backend-api architecture draft ready
[abc123] 12:32:58 T2   SYNTH_DONE    Canonical architecture written to blackboard
[abc123] 12:32:58 GATE INSPECTION    ⏸ T2 synthesis ready for review
[abc123] 12:33:44 T3   MESH_START    backend-api: 2 squad leads negotiating task boundaries
[abc123] 12:34:01 T3   MESH_DONE     Task split committed — 7 T4 tasks (5 swarm, 2 pipeline)
[abc123] 12:34:02 T4   SWARM_START   5 workers spawned in parallel
[abc123] 12:35:10 T4   DONE          worker-3 auth-middleware ✓
[abc123] 12:35:22 T4   FAIL          worker-4 queue-client ✗ (retry 1/3)
[abc123] 12:36:04 T4   DONE          worker-4 queue-client ✓ (retry resolved)
[abc123] 12:36:05 T5   VERIFY_START  4 verifiers spawned
[abc123] 12:36:45 T5   VERDICT       partial — queue-client needs rework
[abc123] 12:37:12 T5   VERDICT       ✓ all pass — workstream backend-api done
```

Log level `verbose` adds per-T4-start/done lines. Default is `normal` (tier-level events only).
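
Rendering a blackboard event into one of these lines might look like the sketch below. The event dictionary keys, the `render` function, and the `VERBOSE_ONLY` set of event kinds are assumptions; the document defines the visible format and the two log levels but not the underlying event fields.

```python
from typing import Optional

# Hypothetical (tier, kind) pairs that only appear at the verbose level.
VERBOSE_ONLY = {("T4", "START"), ("T4", "DONE")}


def render(event: dict, log_level: str = "normal") -> Optional[str]:
    """Format one event as a log line, or return None if it is filtered out."""
    if log_level == "normal" and (event["tier"], event["kind"]) in VERBOSE_ONLY:
        return None
    # Left-align tier and kind so messages line up in a column.
    return "[{run_id}] {ts} {tier:<4} {kind:<13} {message}".format(**event)
```

Returning `None` for filtered events lets the caller drop them with a simple check, so the same event stream can back both log levels.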

### 2. Inspection Gates

Configurable pause points. When the runner hits a gate, it:
1. Writes a `gate_pending` event to the blackboard
2. Fires `notify_adapter.send()` with a tier summary to Andrew (via Hans)
3. Halts — no next tier spawns until `gate_approved` or `gate_rejected` is written

The tier summary surfaced at each gate includes:
- **What was produced** (the tier artifact in readable form)
- **What happens next** (which agents will spawn, doing what)
- **Any anomalies** flagged by the tier itself

Configurable in `team.yaml` under `visibility.inspection_gates`. A `strict_mode: true` flag enables all gates — recommended for first runs on a new codebase or new goal type.

```yaml
visibility:
  strict_mode: false
  log_level: normal            # normal | verbose
  inspection_gates:
    t1_plan: true              # always — required by design
    t2_lead: false             # optional — review boundaries before specialists
    t2_synthesis: true         # recommended — review architecture before implementation
    t3_plan: false             # verbose — useful early on, disable once T3 is trusted
    t5_verdict: false          # review T5 joint verdict before T3 marks workstream done
  gate_timeout_minutes: 60     # auto-reject if no response within this window
```
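
The interaction between `strict_mode`, the per-gate flags, and the timeout can be sketched as two small functions. Both `effective_gates` and `gate_outcome` are hypothetical helpers, and the sketch assumes the YAML has already been parsed into a plain dictionary.

```python
def effective_gates(visibility: dict) -> dict:
    """Resolve which gates are active for a run."""
    gates = dict(visibility.get("inspection_gates", {}))
    gates["t1_plan"] = True  # always on: required by design
    if visibility.get("strict_mode", False):
        # strict_mode enables every configured gate.
        gates = {name: True for name in gates}
    return gates


def gate_outcome(event_kinds: list, waited_minutes: float, timeout_minutes: float) -> str:
    """Decide the state of a pending gate from the events seen so far."""
    for kind in event_kinds:
        if kind in ("gate_approved", "gate_rejected"):
            return kind
    if waited_minutes >= timeout_minutes:
        return "gate_rejected"  # auto-reject once the window has passed
    return "gate_pending"
```

A runner's halt loop would then amount to polling the blackboard, calling `gate_outcome`, and spawning the next tier only when the result is `gate_approved`.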

### 3. Inspection CLI — `cli/agency.py`

```
agency run <config.yaml>                 # start a run, returns run_id
agency watch <run_id>                    # tail live log (follows blackboard events)
agency inspect <run_id>                  # interactive tree view of run state
agency inspect <run_id> --tier t2        # jump to T2 artifacts
agency inspect <run_id> --brief <id>     # show full brief + result JSON

agency approve <run_id>                  # approve current gate → continue
agency approve <run_id> --note "..."     # approve with a note written to blackboard
agency reject <run_id> --reason "..."    # reject → tier re-invoked
agency pause <run_id>                    # force-pause at next tier boundary
agency resume <run_id>                   # release a manual pause
```

`agency inspect` (no flags) renders a live tree:
```
Run abc123 — "Build webhook ingestion system"
├── T1 Plan ✓
│   └── [view workplan]
├── T2 Architecture ✓ [GATE: pending review]
│   ├── [view domain boundaries]
│   ├── [view shared assumptions]
│   └── [view canonical architecture]
├── T3 backend-api (active)
│   ├── [view task breakdown]
│   └── T4 workers: 3/7 done, 1 retrying, 3 pending
└── T3 infra (pending)
```

### Blackboard Event Vocabulary (extended)

```python
from typing import Literal

# Type alias names (CoreEvent, VisibilityEvent) are illustrative.

# existing
CoreEvent = Literal["spawned", "completed", "failed", "escalated", "retried"]

# new — visibility layer
VisibilityEvent = Literal[
    "gate_pending",    # runner hit a gate, waiting for human
    "gate_approved",   # human approved, run continues
    "gate_rejected",   # human rejected, tier re-invoked
    "gate_paused",     # manual pause via CLI
    "gate_resumed",    # manual resume via CLI
    "path_amendment",  # mid-run tier proposed path change
    "log",             # human-readable log line (level + message)
]
```

---

## Decisions Log

**T1 dynamic dispatch** — T1 assesses scope and prescribes tier path and workstream parallelism. It does not prescribe internal tier coordination patterns.

**T1 two-phase lifecycle** — T1 has two explicit named phases: Plan and Accept. Plan phase includes self-critique (single pass) then human approval gate before T2s spawn. Accept phase validates final output against goal anchor. Both phases tracked on blackboard with distinct prompts.

**T1 self-critique** — Single pass only. Diminishing returns on multiple self-critique iterations; the human review after is the real safety net. Self-critique catches obvious gaps; Andrew catches strategic ones.

**Distributed ownership** — Each tier owns the layer below it. Runner is thin. Tradeoff: distributed control makes the system extensible but debugging requires good blackboard tooling, not central runner traces.

**T5 always mandatory** — No skipping verification. Things should work and work well before surfacing to T1.

**T3 owns T4 and T5** — T3 manages its T4s (dependency graph, swarm vs pipeline, parallelism) and commissions T5 verification of T4 outputs. Runner does not orchestrate T4/T5 centrally.

**T2 Lead Architect** — Dedicated T2 role, not a new tier. Spawned first by T1. Owns: domain boundary definition, shared assumptions doc, conflict resolution between specialists, canonical architecture synthesis. Specialists spawn after Lead publishes boundaries + assumptions. Each T2 specialist owns its own T3s — no T3 spans T2 domains.

**T2 conflict resolution** — Lead sends targeted briefs back to conflicting specialists. Cycle limit is a fixed config value (not per-workstream). This parallels T1's single self-critique pass: a fixed limit, not a variable one.

**T2 shared assumptions** — Lead publishes cross-cutting concerns (auth, data formats, API conventions, etc.) before specialists start. Specialists design with shared baseline; implicit dependencies pre-empted rather than caught in synthesis.

**Orchestration patterns** — Baked into tier prompts and runner tier-handling logic, not prescribed by T1. T2: Lead + parallel specialists. T3: light mesh within T2 domain. T4: swarm+pipeline. T5: fan-out+consensus.

**Output / review** — Nothing merges to main without Andrew's explicit approval. T1 opens a PR and surfaces it to Andrew. Notification is dual: Hans messages Andrew directly + PR opened on VCS. Merge is gated on human sign-off.

**Platform agnosticism** — Core is provider and platform agnostic. Capability levels (`reasoning-heavy`, `capable`, `fast-cheap`) map to models in config. Mixing providers across tiers is supported.

**LLM provider** — Anthropic first implementation. Config supports per-tier provider selection.

**Gateway modification** — Decided against. Agent-teams stays standalone Python. OpenClaw used via runtime adapter only.

**Coding agent runtime** — Claude Code is default T4/T5 runtime. Opt-in `native_teams` flag available for internal Claude Code parallelism — faster but less blackboard visibility. Default `false`.

**Agency-agents integration** — Via git submodule at `agents/`. T1 selects specialists via `config/role_registry.yaml`. `agent_personality` field on task brief; runtime injects as system prompt at spawn time.

**T3 mesh mechanics** — Blackboard-based coordination. T3s write draft task lists, read peers', reconcile overlaps, commit merged plan. No T4 dispatch until all T3s in the domain have committed. Runner enforces timeout (`t3_mesh_timeout_minutes` in config). Chosen over designated T3 lead or direct messaging — fits distributed ownership model, gives full audit trail for free.

**T1 output schema** — Formal JSON schema defined (2026-03-30). Fields: `run_id`, `goal_anchor`, `complexity`, `retry_budget_multiplier`, `workstreams[]` (id, name, domain, tier_path, parallel_group, t2_specialist, notes), `parallelism` (groups + sequence), `self_critique_summary`. `parallel_group` + `sequence` handles inter-workstream dependencies.

**T5 consensus** — T3 aggregates all T5 results into joint verdict: `pass | partial | fail`. Split verdict (`partial`) → T3 retries only failed slices, re-runs T5 on those slices. Full `fail` escalates to T2. T3 writes structured joint verdict to blackboard; this is what the optional T5 gate surfaces to Andrew.

**Path amendment mechanism** — Amending tier writes `path_amendment` event to blackboard (structured JSON: proposed_by, reason, amendment). Runner monitors events table, sends system event notification to relevant higher tier. Higher tier is informed, not blocked. No agent callback plumbing. Amendments surface at next scheduled human gate.

**Failure handling (distributed)** — Confirmed distributed ownership (2026-03-30). `escalation.py` is logic tiers execute (or runner executes on tier's behalf on timeout/crash), not a central runner concern. Runner only owns: T1 failure, terminal human escalation. See updated Failure Handling table.

**Run visibility layer** — Added 2026-03-30. Human-readable live log, configurable inspection gates, and `cli/agency.py` inspection/control commands. Designed for debugging and quality evaluation at each tier during early runs. `strict_mode: true` enables all gates. Gates surface tier artifacts + "what happens next" summary to Andrew via Hans. Resolves Q3 (T5 consensus surfaces as gate event with human-readable summary). T5 gate (optional) lets Andrew review joint verdict before T3 marks workstream done.