docs: add design doc and buildspec (#5)
# Tiered Agent Team System — Design Document

_Started: 2026-03-14. Status: Pre-build, gathering requirements._

---

## Overview

A dynamic, hierarchical multi-agent system for software pipelines. Teams assemble on demand, execute, then disband. Inspired by a blend of Hollywood production (dynamic assembly), consulting firms (structured deliverables, hierarchical synthesis), and two-pizza teams (small autonomous squads, clear domain ownership).

---

## Core Principles

**1. Tiers represent cognitive modes, not org chart levels.**
Each tier thinks differently — strategy, design, coordination, execution, verification. Adding a tier only makes sense if it introduces a genuinely different mode of reasoning.

**2. Depth is proportional to complexity.**
Not every task needs every tier. A config change might only need T3→T4. A new product needs the full stack.

**3. Goal anchoring at every level.**
T1's original intent is embedded in every agent's context — not just passed to T2 and forgotten. Every agent knows the end goal even if they only own a slice.

**4. Artifacts, not summaries.**
Tiers pass structured specs downward (JSON task briefs), not paraphrased prose. Meaning is preserved; format is compressed.

**5. Verification is bidirectional.**
Lower tiers verify correctness. Upper tiers verify alignment with original intent. Both directions catch different failure modes.

**6. Provider agnostic.**
The system makes no assumptions about which LLM provider or platform is in use. Tiers reference capability levels, not specific models. All external dependencies are swappable adapters.

**7. Specialist talent pool.**
Tiers define structure and responsibility. Agent personalities define domain expertise. The two are separate — the same tier can be filled by different specialists depending on the workstream domain.
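
Principles 3 and 4 can be made concrete with a small sketch of a task brief, assuming a Python dataclass serialized to JSON. The field names (`goal`, `objective`, `tier`, `capability`, `acceptance`) and the `child` helper are hypothetical, not a settled schema. The point of the sketch is that T1's goal is copied verbatim into every child brief instead of being paraphrased at each hop.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class TaskBrief:
    """Hypothetical structured brief passed from one tier to the next."""

    goal: str          # T1's original intent, copied verbatim into every brief
    objective: str     # the slice this agent owns
    tier: str          # e.g. "T3", "T4"
    capability: str    # a capability level, never a model name
    acceptance: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        # Structured artifact, compact encoding: downstream tiers parse it, not reinterpret it.
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))

    def child(self, objective: str, tier: str, capability: str, acceptance: list[str]) -> "TaskBrief":
        # The goal is inherited unchanged, which keeps every level anchored to T1's intent.
        return TaskBrief(self.goal, objective, tier, capability, list(acceptance))


root = TaskBrief(
    goal="Ship a rate-limited public API",
    objective="Design the API workstream",
    tier="T2",
    capability="reasoning-heavy",
)
leaf = root.child("Implement endpoint A", "T4", "fast-cheap", ["unit tests pass"])
print(leaf.to_json())
```

Because `child` never accepts a new `goal`, a T4 brief several levels down still carries the exact wording T1 started with.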

---

## Tier Definitions

| Tier | Role | Owns | Capability Level |
|------|------|------|-----------------|
| T1 | Visionary | Goal, constraints, final acceptance, architectural bets | reasoning-heavy |
| T2 | Architect | System design, interface contracts, workstream boundaries | reasoning-heavy / capable |
| T3 | Squad Lead | Workstream delivery, worker coordination, quality gate | capable |
| T4 | Implementer | Atomic task execution (one file, one function, one test) | fast-cheap |
| T5 | Verifier | Validation of T4 output — correctness + intent alignment | capable |

T5 runs **parallel to T4**, not above it. It's a quality gate, not a management layer.

Capability levels map to actual models per provider in config — the core system never references a specific model name.
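
A minimal sketch of how a tier could resolve to a concrete model through its capability level, assuming config shaped as provider → capability → model. The provider keys and model identifiers are placeholders, not real model names, and T2 is pinned to `reasoning-heavy` here although the table allows either level.

```python
# Hypothetical config; in practice this would be loaded from a config file.
MODEL_MAP: dict[str, dict[str, str]] = {
    "provider-a": {
        "reasoning-heavy": "provider-a-large",
        "capable": "provider-a-medium",
        "fast-cheap": "provider-a-small",
    },
    "provider-b": {
        "reasoning-heavy": "provider-b-xl",
        "capable": "provider-b-base",
        "fast-cheap": "provider-b-mini",
    },
}

# Capability level per tier, taken from the table above.
TIER_CAPABILITY = {
    "T1": "reasoning-heavy",
    "T2": "reasoning-heavy",  # the table allows "reasoning-heavy / capable"
    "T3": "capable",
    "T4": "fast-cheap",
    "T5": "capable",
}


def resolve_model(tier: str, provider: str) -> str:
    """Map a tier to a concrete model name via its capability level."""
    capability = TIER_CAPABILITY[tier]
    try:
        return MODEL_MAP[provider][capability]
    except KeyError as exc:
        raise ValueError(f"no model configured for {provider!r} at {capability!r}") from exc
```

Core code calls only `resolve_model`; swapping providers changes a config key, not the tier table.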

---

## Variable Depth

```
Config change          T3 → T4
New feature            T2 → T3 → T4
Major refactor         T1 → T2 → T3 → T4 → T5
New system / product   T1 → T2 → T3s (parallel) → T4s → T5s
```

T3 assesses scope on receipt. If a task is simple enough, T3 handles it directly, without escalating upward or waiting for T2 sign-off.
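
The depth table can be read as a lookup from scope to tier chain. A minimal sketch, assuming a hypothetical `Scope` enum whose members mirror the four rows above; how scope is actually assessed is left to the receiving agent and is not modeled here.

```python
from enum import Enum


class Scope(Enum):
    """Hypothetical scope labels, one per row of the depth table."""

    CONFIG_CHANGE = "config change"
    NEW_FEATURE = "new feature"
    MAJOR_REFACTOR = "major refactor"
    NEW_SYSTEM = "new system / product"


# Tier chains copied from the table.
TIER_CHAIN: dict[Scope, tuple[str, ...]] = {
    Scope.CONFIG_CHANGE: ("T3", "T4"),
    Scope.NEW_FEATURE: ("T2", "T3", "T4"),
    Scope.MAJOR_REFACTOR: ("T1", "T2", "T3", "T4", "T5"),
    Scope.NEW_SYSTEM: ("T1", "T2", "T3", "T4", "T5"),
}

# Tiers that fan out into several parallel agents for a given scope.
PARALLEL_TIERS: dict[Scope, set[str]] = {Scope.NEW_SYSTEM: {"T3", "T4", "T5"}}


def plan_depth(scope: Scope) -> list[str]:
    """Return the tiers to engage, pluralizing tiers that run as parallel squads."""
    parallel = PARALLEL_TIERS.get(scope, set())
    return [f"{tier}s" if tier in parallel else tier for tier in TIER_CHAIN[scope]]
```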

---

## Horizontal Scaling Within Tiers

Each tier can have multiple agents running in parallel:

```
T1 (1–2 agents)
├── T2: Backend Architect
│   ├── T3: API Squad Lead
│   │   ├── T4: Worker — endpoint A
│   │   ├── T4: Worker — endpoint B
│   │   └── T5: Verifier
│   └── T3: DB Squad Lead
│       ├── T4: Worker — migrations
│       └── T5: Verifier
├── T2: Frontend Architect
│   └── T3: UI Squad Lead
│       ├── T4: Worker — component X
│       └── T4: Worker — component Y
└── T2: Infra Architect
    └── T3: Platform Squad Lead
        └── T4: Worker — config / deploy
```
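
Horizontal scaling within a tier amounts to a concurrent fan-out followed by a join. A minimal sketch using `asyncio.gather`, where `run_worker` and `verify` are hypothetical stand-ins (one sleeps and echoes its task, the other checks statuses) rather than real agent spawns. A squad lead dispatches its workers concurrently, then passes their results to a verifier.

```python
import asyncio


async def run_worker(task: str) -> dict[str, str]:
    """Hypothetical stand-in for spawning a T4 agent on one atomic task."""
    await asyncio.sleep(0.01)  # simulate agent latency
    return {"task": task, "status": "done"}


async def verify(results: list[dict[str, str]]) -> bool:
    """Hypothetical T5 gate: passes only if every worker reported success."""
    return all(r["status"] == "done" for r in results)


async def run_squad(tasks: list[str]) -> tuple[list[dict[str, str]], bool]:
    # Workers in the same tier run concurrently; gather returns results in input order.
    results = list(await asyncio.gather(*(run_worker(t) for t in tasks)))
    return results, await verify(results)


results, ok = asyncio.run(run_squad(["endpoint A", "endpoint B"]))
```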

---

## Shared State

For software pipelines, **the repo is the primary blackboard**:
- T4 workers commit to feature branches
- T3 leads review and merge to workstream branches
- T2 architects own integration branches
- T1 does final integration and acceptance

The repo is supplemented by a per-run SQLite coordination store that tracks in-flight workstreams, handoff artifacts, tier status, and retry counts.
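
A minimal sketch of the per-run coordination store using Python's built-in `sqlite3` module. The table and column names are hypothetical; they cover the four things listed above (in-flight workstreams, handoff artifacts, tier status, retry counts). An in-memory database keeps the example self-contained; a real run would presumably use one database file per run.

```python
import sqlite3

# Hypothetical schema: workstream status and retry counts in one table,
# artifacts handed between tiers (e.g. JSON task briefs) in another.
SCHEMA = """
CREATE TABLE workstreams (
    id          TEXT PRIMARY KEY,
    tier        TEXT NOT NULL,
    status      TEXT NOT NULL DEFAULT 'in_flight',
    retry_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE handoffs (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    workstream_id TEXT NOT NULL REFERENCES workstreams(id),
    from_tier     TEXT NOT NULL,
    to_tier       TEXT NOT NULL,
    artifact      TEXT NOT NULL
);
"""


def open_run_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_retry(conn: sqlite3.Connection, workstream_id: str) -> int:
    """Increment and return the retry count for a workstream."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE workstreams SET retry_count = retry_count + 1 WHERE id = ?",
            (workstream_id,),
        )
    row = conn.execute(
        "SELECT retry_count FROM workstreams WHERE id = ?", (workstream_id,)
    ).fetchone()
    if row is None:
        raise KeyError(workstream_id)
    return row[0]


conn = open_run_store()
with conn:
    conn.execute("INSERT INTO workstreams (id, tier) VALUES (?, ?)", ("api", "T3"))
    conn.execute(
        "INSERT INTO handoffs (workstream_id, from_tier, to_tier, artifact) VALUES (?, ?, ?, ?)",
        ("api", "T2", "T3", '{"objective": "build API"}'),
    )
record_retry(conn, "api")
```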

---

## Failure Handling

| Failure | Handler | Action |
|---------|---------|--------|
| T4 bad output | T3 | Retry T4 with corrected brief (up to `retry_budget`) |
| T4 blocked | T3 | Escalate immediately — no retries |
| T4 partial output | T3 | Salvage good parts, re-task remainder |
| T3 workstream stuck | T2 | Re-scope or split the workstream |
| T2 design wrong | T1 | Re-plan; may discard workstream and restart |
| Repeated escalation | Surface to user | Block until human unblocks |

Retry limits prevent infinite loops. Escalation path is always upward, never sideways.
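
The handler column can be read as a fixed upward chain. A minimal sketch, assuming the chain `T4 → T3 → T2 → T1 → user` and a hypothetical `max_escalations` threshold for the "repeated escalation" row (the design does not give a number). Because the handler is always the next link up, a sideways hand-off cannot be expressed.

```python
# Hypothetical encoding of the failure table. T5 is omitted because the
# table only lists failures raised at T4, T3 and T2.
ESCALATION_CHAIN = ["T4", "T3", "T2", "T1", "user"]

HANDLER_ACTION = {
    "T3": "retry, salvage, or escalate the T4 failure",
    "T2": "re-scope or split the workstream",
    "T1": "re-plan; may discard the workstream and restart",
    "user": "block until a human unblocks",
}


def route(from_tier: str, escalations_so_far: int, max_escalations: int = 2) -> tuple[str, str]:
    """Return (handler, action) for a failure raised at from_tier.

    max_escalations is an assumed threshold for "repeated escalation".
    """
    if escalations_so_far >= max_escalations:
        return "user", HANDLER_ACTION["user"]
    idx = ESCALATION_CHAIN.index(from_tier)
    # Always the next link up, never a sibling; the user is the terminal handler.
    handler = ESCALATION_CHAIN[min(idx + 1, len(ESCALATION_CHAIN) - 1)]
    return handler, HANDLER_ACTION[handler]
```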

---

## Agent Talent Pool

The system builds on [agency-agents](https://github.com/msitarzewski/agency-agents) — a library of 50+ pre-built specialist personalities, each with deep domain expertise, quality standards, and specific deliverables.

**Division of responsibility:**
- Our system provides: orchestration, tier structure, task briefs, retries, verification gates, shared state
- Agency-agents provides: the specialist knowledge each agent brings to its role

T1 selects the right specialist from the roster when building workstream briefs. The specialist's personality is injected as the system prompt at spawn time.

**Default tier-to-specialist mapping for software pipelines:**

| Tier | Domain | Agent |
|------|--------|-------|
| T1 | Strategy | nexus-strategy |
| T2 | Backend | software-architect |
| T2 | Infra | devops-automator |
| T2 | Data | data-engineer |
| T3 | Backend | senior-developer |
| T3 | Reliability | sre |
| T4 | Frontend | frontend-developer |
| T4 | Backend | backend-architect |
| T4 | Database | database-optimizer |
| T4 | DevOps | devops-automator |
| T4 | Mobile | mobile-app-builder |
| T4 | AI/ML | ai-engineer |
| T4 | Security | security-engineer |
| T4 | Docs | technical-writer |
| T5 | Code review | code-reviewer |
| T5 | Integration | testing-reality-checker |
| T5 | API | testing-api-tester |
| T5 | Performance | testing-performance-benchmarker |
| T5 | Security | security-engineer |

The roster is not fixed — T1 can select any agent from the library based on workstream needs. Non-engineering agents (design, marketing, product) extend the system to non-software pipelines.
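
A minimal sketch of the default mapping as a lookup keyed by `(tier, domain)`, using a subset of the rows above. A keyed lookup handles specialists that appear at more than one tier (for example `security-engineer` at T4 and T5) without special cases. The `select_specialist` helper and its `override` parameter are hypothetical, a way to model T1 choosing any agent from the library.

```python
from typing import Optional

# Subset of the default mapping above, keyed by (tier, domain).
DEFAULT_ROSTER: dict[tuple[str, str], str] = {
    ("T1", "Strategy"): "nexus-strategy",
    ("T2", "Backend"): "software-architect",
    ("T2", "Infra"): "devops-automator",
    ("T3", "Backend"): "senior-developer",
    ("T4", "Frontend"): "frontend-developer",
    ("T4", "DevOps"): "devops-automator",
    ("T4", "Security"): "security-engineer",
    ("T5", "Code review"): "code-reviewer",
    ("T5", "Security"): "security-engineer",
}


def select_specialist(tier: str, domain: str, override: Optional[str] = None) -> str:
    """Pick the agent for a slot; a hypothetical override models T1 choosing freely."""
    if override is not None:
        return override
    try:
        return DEFAULT_ROSTER[(tier, domain)]
    except KeyError:
        raise LookupError(f"no default specialist for {tier}/{domain}") from None
```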

---

## Adapter Layers

Everything external is a swappable adapter. Core logic never imports from adapters directly — always through an interface.

```
Core (platform-agnostic)
├── team_runner   — run lifecycle, agent spawning, runtime selection
├── blackboard    — SQLite coordination state
├── task_brief    — schema + validation
└── escalation    — retry logic, failure routing

Adapters (swappable)
├── llm/          — anthropic (now), openai, ollama, any API
├── notify/       — openclaw (now), slack, email, webhook...
├── vcs/          — github (now), gitlab, gitea, bare git...
└── runtime/
    ├── standard      — openclaw sessions_spawn (T1/T2/T3)
    └── coding_agent  — claude_code (T4/T5 default), codex, aider...
```

Swapping providers means writing a new adapter file — nothing in core changes.
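
One way to enforce "core never imports adapters directly" in Python is a `typing.Protocol` that core code depends on, with concrete adapters supplied at startup. A minimal sketch, assuming a hypothetical `LLMAdapter` interface with a single `complete` method; a real adapter would wrap a provider SDK, whereas the `EchoAdapter` here just echoes its input.

```python
from typing import Protocol


class LLMAdapter(Protocol):
    """Hypothetical interface the core depends on; providers live under adapters/llm/."""

    def complete(self, system_prompt: str, user_message: str, capability: str) -> str: ...


class EchoAdapter:
    """Hypothetical test adapter: satisfies the protocol structurally, no subclassing."""

    def complete(self, system_prompt: str, user_message: str, capability: str) -> str:
        return f"[{capability}] {user_message}"


def run_agent(llm: LLMAdapter, personality: str, brief_json: str, capability: str) -> str:
    # Core logic sees only the protocol, never a provider module.
    return llm.complete(personality, brief_json, capability)


reply = run_agent(EchoAdapter(), "You are a reviewer.", '{"objective": "review"}', "capable")
```

Structural typing means a new provider only has to define a matching `complete` method; nothing in core needs to import or register it beyond being passed in.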

T4 and T5 default to the **coding agent runtime** when available. It provides direct file system access, git operations, and test execution, so file contents do not need to be shuttled through message context. If no coding agent is configured, T4 and T5 fall back gracefully to the standard runtime.
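
The fallback rule reduces to a small selection function. A minimal sketch, assuming "configured" simply means the runtime's name appears in a set loaded from config; the runtime names `standard` and `coding_agent` match the adapter tree, while `select_runtime` and `CODING_TIERS` are hypothetical.

```python
# Tiers that edit files and therefore prefer the coding agent runtime.
CODING_TIERS = frozenset({"T4", "T5"})


def select_runtime(tier: str, configured: set[str]) -> str:
    """Return the runtime name for a tier (hypothetical helper)."""
    if tier in CODING_TIERS and "coding_agent" in configured:
        return "coding_agent"  # direct file system, git and test access
    # T1-T3 always reason on the standard runtime; T4/T5 fall back to it.
    return "standard"
```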

---

## Decisions

**Depth decision** — T1 assesses scope on receipt and determines how many tiers to engage. Not pre-configured per task type.

**Trigger mechanism** — User messages Hans → Hans spins up T1 with the goal. T1 takes it from there.

**Output / review** — Nothing merges to main without Andrew's explicit approval. T1 opens a PR and surfaces it to Andrew for review. Merge is gated on human sign-off. Notification is dual: Hans messages Andrew directly, and a PR is opened on the VCS platform so Andrew gets notified natively too. This keeps the review step platform-independent — whichever VCS is in use, Hans always notifies Andrew directly as a fallback.
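
The review rule combines two independent notification paths with a hard merge gate. A minimal sketch, assuming hypothetical `notify_direct` and `open_pull_request` callables standing in for the notify and vcs adapters; the one invariant enforced is that merging without an explicit human approval raises.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ReviewGate:
    """Hypothetical merge gate; the callables stand in for notify/ and vcs/ adapters."""

    notify_direct: Callable[[str], None]     # direct message to the reviewer
    open_pull_request: Callable[[str], str]  # returns a PR URL
    approved: set[str] = field(default_factory=set)
    merged: set[str] = field(default_factory=set)

    def request_review(self, branch: str) -> str:
        pr_url = self.open_pull_request(branch)  # platform-native notification
        # The direct message is always sent, as the VCS-independent fallback.
        self.notify_direct(f"Review requested: {pr_url}")
        return pr_url

    def approve(self, pr_url: str) -> None:
        self.approved.add(pr_url)  # called only on explicit human sign-off

    def merge(self, pr_url: str) -> None:
        if pr_url not in self.approved:
            raise PermissionError(f"{pr_url} has no human approval; refusing to merge")
        self.merged.add(pr_url)  # stand-in for the VCS adapter's merge call


sent: list[str] = []
gate = ReviewGate(notify_direct=sent.append, open_pull_request=lambda b: f"https://vcs.example/pr/{b}")
url = gate.request_review("feature/api")
```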

**Retry limits** — Three failure types, handled differently:
- *Bad output* → retry T4 with a corrected brief (default: 3 retries)
- *Blocked* → escalate immediately, no retries
- *Partial output* → salvage good parts, re-task the remainder

T1 sets a retry budget multiplier during scope assessment (`1x` simple, `2x` complex). Retry budget is a field on the task brief — not hardcoded in the runner.
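
The budget arithmetic and the three outcomes fit in a few lines. A minimal sketch, assuming the default of 3 retries is the base that T1's multiplier scales (so `2x` yields 6) and that the result is stored on a hypothetical `retry_budget` field of the brief; the design does not spell out exactly how the multiplier combines with the default.

```python
from dataclasses import dataclass, replace

BASE_RETRIES = 3                  # documented default for bad-output retries
MULTIPLIERS = {"1x": 1, "2x": 2}  # set by T1 during scope assessment


@dataclass(frozen=True)
class Brief:
    """Hypothetical brief carrying its own retry budget."""

    objective: str
    retry_budget: int = BASE_RETRIES


def apply_multiplier(brief: Brief, multiplier: str) -> Brief:
    """Return a copy of the brief with its budget scaled (assumed: base x multiplier)."""
    return replace(brief, retry_budget=BASE_RETRIES * MULTIPLIERS[multiplier])


def next_step(outcome: str, retries_used: int, brief: Brief) -> str:
    """Decide what the squad lead does with a T4 outcome."""
    if outcome == "blocked":
        return "escalate"  # never retried
    if outcome == "partial":
        return "salvage and re-task remainder"
    if outcome == "bad_output":
        return "retry" if retries_used < brief.retry_budget else "escalate"
    if outcome == "ok":
        return "accept"
    raise ValueError(f"unknown outcome: {outcome!r}")
```

Keeping the budget on the brief rather than in the runner means the same runner code serves both simple and complex workstreams.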

**Platform agnosticism** — Core logic is provider- and platform-agnostic. LLMs, VCS, notifications, and agent runtimes are all adapters. Tiers reference capability levels (`reasoning-heavy`, `capable`, `fast-cheap`), not specific model names. Provider-to-model mapping lives in config.

**LLM provider** — Anthropic is the first implementation. Config supports per-tier provider selection and mixing providers across tiers (e.g. T1 on OpenAI o3, T4 workers on local Ollama).
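
Mixing providers across tiers introduces one failure mode worth catching at startup: a tier routed to a provider that cannot serve its capability level. A minimal sketch, assuming a default provider plus per-tier overrides. The provider keys echo the adapter names, but the capability sets assigned to each provider are illustrative config, not claims about what those providers actually offer.

```python
# Capability level per tier (T2 pinned to reasoning-heavy for this sketch).
TIER_CAPABILITY = {
    "T1": "reasoning-heavy",
    "T2": "reasoning-heavy",
    "T3": "capable",
    "T4": "fast-cheap",
    "T5": "capable",
}

# Illustrative config: which capability levels each provider is set up to serve.
PROVIDER_CAPABILITIES = {
    "anthropic": {"reasoning-heavy", "capable", "fast-cheap"},
    "openai": {"reasoning-heavy", "capable", "fast-cheap"},
    "ollama": {"capable", "fast-cheap"},
}


def resolve_providers(default: str, overrides: dict[str, str]) -> dict[str, str]:
    """Pick a provider per tier and fail fast if it cannot serve that tier's capability."""
    plan = {tier: overrides.get(tier, default) for tier in TIER_CAPABILITY}
    for tier, provider in plan.items():
        needed = TIER_CAPABILITY[tier]
        if needed not in PROVIDER_CAPABILITIES.get(provider, set()):
            raise ValueError(f"{tier} needs {needed!r}, which {provider!r} does not provide")
    return plan


plan = resolve_providers("anthropic", {"T1": "openai", "T4": "ollama"})
```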

**Gateway modification** — Decided against. Agent-teams stays standalone Python. OpenClaw is used as the runtime adapter through its existing primitives (sessions_spawn, sessions_send, subagents), called via a skill layer. No gateway fork. This keeps platform agnosticism intact and avoids both a Node/Python mismatch and the burden of maintaining a fork.

**Coding agent runtime** — Claude Code is the default T4/T5 runtime for software pipelines. It is purpose-built for implementation and verification: direct file access, git ops, test execution. Enters as a runtime adapter — swappable for Codex, Aider, or any equivalent. T1/T2/T3 always use the standard runtime (they reason, they don't edit files).

**Claude Code native teams** — Claude Code has an experimental agent teams feature that fans out sub-agents internally within a session. Integrated as an opt-in flag (`native_teams: true`) in the coding_agent runtime adapter. When enabled, T3 hands a full workstream to Claude Code and it parallelises internally: faster, but with less granular blackboard visibility. Default is `false` — explicit T4 spawning is the baseline; native teams is a speed optimisation to enable deliberately.
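
The `native_teams` flag changes the unit of work the coding_agent adapter receives. A minimal sketch, assuming a hypothetical `dispatch` that returns planned sessions instead of launching them; the session dict shape is illustrative. With the flag off, every T4 task becomes its own session and its own blackboard entry; with it on, the whole workstream is one session.

```python
from dataclasses import dataclass


@dataclass
class CodingAgentConfig:
    native_teams: bool = False  # default: explicit T4 spawning


def dispatch(workstream: str, tasks: list[str], cfg: CodingAgentConfig) -> list[dict[str, object]]:
    """Plan coding-agent sessions for one workstream (hypothetical helper)."""
    if cfg.native_teams:
        # One session; the agent fans out internally, so the blackboard sees a single unit.
        return [{"workstream": workstream, "tasks": list(tasks)}]
    # One session per atomic task, each tracked individually on the blackboard.
    return [{"workstream": workstream, "tasks": [task]} for task in tasks]
```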

**Agency-agents integration** — Agent personalities sourced from [msitarzewski/agency-agents](https://github.com/msitarzewski/agency-agents) via git submodule. Included as `agents/` in the repo. T1 selects specialists from the roster via `config/role_registry.yaml`. Each task brief carries an `agent_personality` field (path to the agent .md file) which the runtime adapter injects as the system prompt at spawn time. Adding new specialists means adding an entry to the registry — no core changes required.
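
A minimal sketch of the injection step: the brief's `agent_personality` path is resolved against the `agents/` submodule directory, and the file's contents become the system prompt. The helper name `load_system_prompt`, the path-escape check, and the Markdown-suffix check are assumptions, not part of the documented design.

```python
from pathlib import Path


def load_system_prompt(agents_root: Path, agent_personality: str) -> str:
    """Read the specialist .md file named in a task brief (hypothetical helper)."""
    root = agents_root.resolve()
    path = (root / agent_personality).resolve()
    # Refuse paths that escape the agents/ directory, e.g. "../secrets.md".
    if root not in path.parents:
        raise ValueError(f"{agent_personality!r} is outside {root}")
    if path.suffix != ".md":
        raise ValueError("agent personalities are Markdown files")
    return path.read_text(encoding="utf-8")
```

The adapter would pass the returned string as the system prompt when spawning the agent, so adding a specialist stays a registry and file change.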