docs: add design doc and buildspec (#5)

2026-03-16 15:51:14 -04:00
parent 084cfb0bb2
commit 72bd744664
2 changed files with 645 additions and 0 deletions
--- a/docs/buildspec.md
+++ b/docs/buildspec.md
@@ -0,0 +1,437 @@
+# Tiered Agent Team System — Build Spec
+
+_Started: 2026-03-15. Status: Pre-build._
+_See agent-teams-design.md for the design doc and decisions log._
+
+---
+
+## Language & Runtime
+
+**Python 3.11+.** Reasons:
+- Agent/AI tooling is Python-first
+- Clean type hints + dataclasses for schemas
+- Agents can read and modify their own orchestration code
+- Runs anywhere — no Node, no OpenClaw dependency
+
+---
+
+## Repository
+
+Standalone repo: `git@github.com:coding-with-hans-heinemann/the-agency.git`
+
+Separate from the OpenClaw workspace. The OpenClaw workspace gets a thin integration layer that calls into this repo; the core stays portable and runs without OpenClaw.
+
+---
+
+## Directory Structure
+
+```
+agent-teams/
+├── core/
+│   ├── team_runner.py       — run lifecycle, agent spawning
+│   ├── blackboard.py        — SQLite coordination state
+│   ├── task_brief.py        — schema + validation
+│   └── escalation.py        — retry logic, failure routing
+│
+├── adapters/
+│   ├── base/
+│   │   ├── llm.py           — abstract LLM interface
+│   │   ├── vcs.py           — abstract VCS interface
+│   │   ├── notify.py        — abstract notification interface
+│   │   └── runtime.py       — abstract agent runtime interface
+│   ├── llm/
+│   │   ├── anthropic.py     — Claude via OpenClaw or direct API
+│   │   ├── openai.py        — GPT / o-series
+│   │   └── ollama.py        — local models
+│   ├── vcs/
+│   │   └── github.py
+│   ├── notify/
+│   │   └── openclaw.py      — messages Hans who notifies Andrew
+│   └── runtime/
+│       ├── openclaw.py      — sessions_spawn (general purpose)
+│       └── claude_code.py   — coding agent runtime (file/git/exec tools)
+│
+├── agents/                  — git submodule: msitarzewski/agency-agents
+│   ├── engineering/
+│   ├── testing/
+│   ├── strategy/
+│   └── ...                  — full agency-agents roster
+│
+├── prompts/
+│   ├── t1_visionary.md      — fallback if no agent_personality set
+│   ├── t2_architect.md
+│   ├── t3_squad_lead.md
+│   ├── t4_implementer.md
+│   └── t5_verifier.md
+│
+├── config/
+│   ├── team.yaml            — example run configuration
+│   └── role_registry.yaml   — maps (tier, domain) → agent personality file
+│
+├── runs/                    — runtime state, one subdir per run_id
+│   └── .gitkeep
+│
+└── README.md
+```
+
+---
+
+## Blackboard
+
+SQLite. One file per run at `runs/<run_id>/blackboard.db`.
+
+### Tables
+
+**runs**
+```sql
+CREATE TABLE runs (
+    run_id      TEXT PRIMARY KEY,
+    goal        TEXT NOT NULL,
+    status      TEXT NOT NULL,  -- pending | active | review | done | failed
+    created_at  TEXT NOT NULL,
+    updated_at  TEXT NOT NULL
+);
+```
+
+**workstreams**
+```sql
+CREATE TABLE workstreams (
+    workstream_id   TEXT PRIMARY KEY,
+    run_id          TEXT NOT NULL,
+    name            TEXT NOT NULL,
+    tier            INTEGER NOT NULL,
+    status          TEXT NOT NULL,  -- pending | active | blocked | done | failed
+    owner_agent_id  TEXT,
+    created_at      TEXT NOT NULL,
+    updated_at      TEXT NOT NULL
+);
+```
+
+**briefs**
+```sql
+CREATE TABLE briefs (
+    brief_id        TEXT PRIMARY KEY,
+    run_id          TEXT NOT NULL,
+    parent_brief_id TEXT,
+    workstream_id   TEXT,
+    tier            INTEGER NOT NULL,
+    role            TEXT NOT NULL,
+    status          TEXT NOT NULL,  -- pending | active | done | failed
+    payload         TEXT NOT NULL,  -- full JSON brief
+    result          TEXT,           -- JSON result when done
+    retry_count     INTEGER DEFAULT 0,
+    created_at      TEXT NOT NULL,
+    updated_at      TEXT NOT NULL
+);
+```
+
+**events**
+```sql
+CREATE TABLE events (
+    event_id    TEXT PRIMARY KEY,
+    run_id      TEXT NOT NULL,
+    brief_id    TEXT,
+    kind        TEXT NOT NULL,  -- spawned | completed | failed | escalated | retried
+    detail      TEXT,           -- JSON
+    created_at  TEXT NOT NULL
+);
+```
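A minimal sketch of how `core/blackboard.py` might create the per-run database, using the `runs` and `events` tables above (the other two tables would follow the same pattern). The function names `init_blackboard` and `log_event`, the UUID event ids, and the UTC ISO-8601 timestamps are illustrative assumptions, not part of the spec.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Two of the four tables, copied from the spec; IF NOT EXISTS is an addition
# so that re-opening an existing run directory does not fail.
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    run_id      TEXT PRIMARY KEY,
    goal        TEXT NOT NULL,
    status      TEXT NOT NULL,
    created_at  TEXT NOT NULL,
    updated_at  TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS events (
    event_id    TEXT PRIMARY KEY,
    run_id      TEXT NOT NULL,
    brief_id    TEXT,
    kind        TEXT NOT NULL,
    detail      TEXT,
    created_at  TEXT NOT NULL
);
"""


def _now() -> str:
    """Current UTC time as an ISO-8601 string (assumed timestamp format)."""
    return datetime.now(timezone.utc).isoformat()


def init_blackboard(runs_root: Path, run_id: str, goal: str) -> sqlite3.Connection:
    """Create <runs_root>/<run_id>/blackboard.db and insert the run as 'pending'."""
    run_dir = runs_root / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(run_dir / "blackboard.db")
    conn.executescript(SCHEMA)
    now = _now()
    conn.execute(
        "INSERT INTO runs (run_id, goal, status, created_at, updated_at) "
        "VALUES (?, ?, 'pending', ?, ?)",
        (run_id, goal, now, now),
    )
    conn.commit()
    return conn


def log_event(conn: sqlite3.Connection, run_id: str, kind: str,
              brief_id: Optional[str] = None, detail: Optional[dict] = None) -> str:
    """Append one row to events; detail is stored as JSON text."""
    event_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO events (event_id, run_id, brief_id, kind, detail, created_at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (event_id, run_id, brief_id, kind,
         json.dumps(detail) if detail is not None else None, _now()),
    )
    conn.commit()
    return event_id
```

Keeping one database file per run means a run can be archived or deleted by removing its directory, with no cross-run cleanup queries.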
+
+---
+
+## Task Brief Schema
+
+Every brief passed between tiers is a validated JSON object. `goal_anchor` is immutable — set by T1, copied verbatim into every downstream brief.
+
+```json
+{
+  "brief_id": "uuid",
+  "run_id": "uuid",
+  "parent_brief_id": "uuid | null",
+  "tier": 4,
+  "role": "implementer",
+  "goal_anchor": "Original T1 intent — always propagated unchanged",
+  "workstream": "backend-api",
+  "task": "Implement POST /webhooks/ingest endpoint",
+  "acceptance_criteria": [
+    "Accepts JSON payload",
+    "Returns 202 on success",
+    "Writes to queue"
+  ],
+  "constraints": [
+    "Use existing queue client in src/queue.py",
+    "No new dependencies"
+  ],
+  "context": {
+    "relevant_files": ["src/routes/webhooks.py", "src/queue.py"],
+    "interface_contract": "..."
+  },
+  "retry_budget": 3,
+  "retry_count": 0,
+  "preferred_runtime": "coding_agent",
+  "agent_personality": "agents/engineering/engineering-backend-architect.md",
+  "created_at": "ISO-8601"
+}
+```
+
+`preferred_runtime` is optional. T3 sets it to `"coding_agent"` when spawning T4/T5 for implementation or verification tasks. The runner falls back to `"standard"` if the coding agent runtime is not configured.
+
+`agent_personality` is optional. When set, the runtime adapter reads the file and injects its contents as the system prompt at spawn time. When it is not set, the adapter falls back to the generic tier prompt in `prompts/`.
+
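One way `core/task_brief.py` could enforce the `goal_anchor` rule is a frozen dataclass whose child briefs are always derived from the parent. This is a sketch under assumptions: the class name `TaskBrief`, the method `derive_child`, the validation rules, and the reduced field set are illustrative and not taken from the spec.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)  # frozen: no field, including goal_anchor, can be reassigned
class TaskBrief:
    run_id: str
    tier: int
    role: str
    goal_anchor: str
    task: str
    parent_brief_id: Optional[str] = None
    retry_budget: int = 3
    retry_count: int = 0
    brief_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self) -> None:
        # Assumed validation rules; the spec only says briefs are validated.
        if not 1 <= self.tier <= 5:
            raise ValueError(f"tier must be between 1 and 5, got {self.tier}")
        if not self.goal_anchor.strip():
            raise ValueError("goal_anchor must not be empty")
        if not 0 <= self.retry_count <= self.retry_budget:
            raise ValueError("retry_count must be within retry_budget")

    def derive_child(self, tier: int, role: str, task: str) -> "TaskBrief":
        """Build a downstream brief; run_id and goal_anchor are copied verbatim."""
        return TaskBrief(
            run_id=self.run_id,
            tier=tier,
            role=role,
            goal_anchor=self.goal_anchor,
            task=task,
            parent_brief_id=self.brief_id,
            retry_budget=self.retry_budget,
        )
```

Because children can only be built through `derive_child` in this sketch, a lower tier has no code path that rewrites the original intent.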
+
+---
+
+## Adapter Interfaces
+
+### LLM (`adapters/base/llm.py`)
+```python
+class LLMAdapter:
+    def complete(self, prompt: str, capability: str, context: dict) -> str: ...
+    def resolve_model(self, capability: str) -> str: ...
+    # capability: "reasoning-heavy" | "capable" | "fast-cheap"
+```
+
+### VCS (`adapters/base/vcs.py`)
+```python
+class VCSAdapter:
+    def create_branch(self, name: str) -> None: ...
+    def commit(self, files: list[str], message: str) -> str: ...       # returns commit sha
+    def create_pr(self, title: str, body: str, head: str, base: str) -> str: ...  # returns pr url
+    def get_pr_status(self, pr_id: str) -> str: ...                    # open | merged | closed
+```
+
+### Notify (`adapters/base/notify.py`)
+```python
+class NotifyAdapter:
+    def send(self, message: str, context: dict) -> None: ...
+```
+
+### Runtime (`adapters/base/runtime.py`)
+```python
+class RuntimeAdapter:
+    def spawn(self, task: str, capability: str, context: dict) -> str: ...  # returns agent_id
+    def get_result(self, agent_id: str, timeout_s: int) -> dict: ...
+    def kill(self, agent_id: str) -> None: ...
+
+# Two implementations:
+#   openclaw.py    — general purpose, uses sessions_spawn, suits T1/T2/T3
+#   claude_code.py — coding-specialized, has file/git/exec tools, suits T4/T5
+#
+# The runner selects runtime based on brief.preferred_runtime:
+#   "standard"      → openclaw.py (default)
+#   "coding_agent"  → claude_code.py (falls back to standard if unavailable)
+#
+# Both implementations inject brief.agent_personality as the system prompt
+# when spawning, if present. Falls back to generic tier prompt otherwise.
+# claude_code.py passes the agent file via --system-prompt flag natively
+# (agency-agents was designed for Claude Code's agents/ directory).
+```
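The two fallback rules described in the comments above (runtime selection and system prompt selection) can be sketched as plain functions. The names `select_runtime` and `resolve_system_prompt`, and the shape of the `configured` mapping, are assumptions for illustration; the prompt file names come from the `prompts/` directory listing.

```python
from pathlib import Path
from typing import Mapping, Optional

# Generic tier prompts, used only when a brief carries no agent_personality.
TIER_PROMPTS = {
    1: "prompts/t1_visionary.md",
    2: "prompts/t2_architect.md",
    3: "prompts/t3_squad_lead.md",
    4: "prompts/t4_implementer.md",
    5: "prompts/t5_verifier.md",
}


def select_runtime(preferred_runtime: Optional[str],
                   configured: Mapping[str, str]) -> str:
    """Map brief.preferred_runtime to an adapter name.

    `configured` maps runtime labels to adapter names, for example
    {"standard": "openclaw", "coding_agent": "claude_code"}. A missing
    "coding_agent" entry means the coding runtime is not configured.
    """
    label = preferred_runtime or "standard"
    if label not in ("standard", "coding_agent"):
        raise ValueError(f"unknown preferred_runtime: {label!r}")
    return configured.get(label, configured["standard"])


def resolve_system_prompt(root: Path, tier: int,
                          agent_personality: Optional[str]) -> str:
    """Return the personality file contents, or the generic tier prompt if unset.

    A personality path that is set but missing raises FileNotFoundError here;
    the spec only defines the fallback for an unset value.
    """
    relative = agent_personality or TIER_PROMPTS[tier]
    return (root / relative).read_text(encoding="utf-8")
```

Keeping both decisions outside the adapters means `openclaw.py` and `claude_code.py` receive an already-resolved prompt and never need to know about the fallback order.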
+
+---
+
+## Run Config (`config/team.yaml`)
+
+```yaml
+run:
+  goal: "Build webhook ingestion system with retry logic and DLQ"
+  repo: "git@github.com:org/repo.git"
+  base_branch: "main"
+
+adapters:
+  llm: anthropic
+  vcs: github
+  notify: openclaw
+  runtime: openclaw
+
+models:
+  provider: anthropic          # default provider
+  capability_map:
+    reasoning-heavy:
+      anthropic: claude-opus-4-6
+      openai: o3
+    capable:
+      anthropic: claude-sonnet-4-6
+      openai: gpt-4o
+      ollama: llama3.1:70b
+    fast-cheap:
+      anthropic: claude-haiku-3-5
+      openai: gpt-4o-mini
+      ollama: llama3.2
+
+  # optional: override provider per tier
+  tier_overrides:
+    t1: { provider: openai, capability: reasoning-heavy }
+    t4: { provider: ollama, capability: fast-cheap }
+
+runtime:
+  default: openclaw
+  coding_agent: claude_code     # used for T4/T5 when available; omit to disable
+  native_teams: false           # Claude Code's experimental agent teams — opt-in only
+                                # when true: T3 hands full workstream to Claude Code,
+                                # which fans out internally. faster but less blackboard
+                                # visibility. default: false (explicit T4 spawning)
+  # tier_runtime_map (optional overrides):
+  #   t1: standard
+  #   t2: standard
+  #   t3: standard
+  #   t4: coding_agent
+  #   t5: coding_agent
+
+retry_defaults:
+  bad_output: 3
+  partial: 2
+  blocked: 0    # always escalate immediately
+```
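A sketch of how `resolve_model` might read the `models` section of this config. The dictionary mirrors the YAML above so the example runs without a YAML parser. The precedence shown (a tier override replaces both the default provider and the requested capability) is an assumed interpretation of `tier_overrides`, and the standalone function signature is hypothetical.

```python
from typing import Tuple

# Mirror of the `models` section in config/team.yaml.
MODELS = {
    "provider": "anthropic",
    "capability_map": {
        "reasoning-heavy": {"anthropic": "claude-opus-4-6", "openai": "o3"},
        "capable": {
            "anthropic": "claude-sonnet-4-6",
            "openai": "gpt-4o",
            "ollama": "llama3.1:70b",
        },
        "fast-cheap": {
            "anthropic": "claude-haiku-3-5",
            "openai": "gpt-4o-mini",
            "ollama": "llama3.2",
        },
    },
    "tier_overrides": {
        "t1": {"provider": "openai", "capability": "reasoning-heavy"},
        "t4": {"provider": "ollama", "capability": "fast-cheap"},
    },
}


def resolve_model(models: dict, tier: int, capability: str) -> Tuple[str, str]:
    """Return (provider, model) for a tier and requested capability."""
    override = models.get("tier_overrides", {}).get(f"t{tier}", {})
    provider = override.get("provider", models["provider"])
    capability = override.get("capability", capability)
    try:
        return provider, models["capability_map"][capability][provider]
    except KeyError:
        # For example: ollama has no "reasoning-heavy" entry in the map above.
        raise LookupError(
            f"no model for provider={provider!r}, capability={capability!r}"
        ) from None
```

With the example config, tier 1 resolves to `("openai", "o3")`, tier 4 to `("ollama", "llama3.2")`, and a tier without an override uses the default provider for whatever capability the caller requests.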
+
+---
+
+## Role Registry (`config/role_registry.yaml`)
+
+Maps `(tier, domain)` → agent personality file. T1 consults this during scope assessment when selecting specialists for each workstream brief. Adding a new specialist means adding one entry here — no core changes.
+
+```yaml
+t1:
+  default: agents/strategy/nexus-strategy.md
+
+t2:
+  backend:  agents/engineering/engineering-software-architect.md
+  frontend: agents/engineering/engineering-software-architect.md
+  infra:    agents/engineering/engineering-devops-automator.md
+  data:     agents/engineering/engineering-data-engineer.md
+  default:  agents/engineering/engineering-software-architect.md
+
+t3:
+  backend:  agents/engineering/engineering-senior-developer.md
+  frontend: agents/engineering/engineering-senior-developer.md
+  infra:    agents/engineering/engineering-sre.md
+  default:  agents/engineering/engineering-senior-developer.md
+
+t4:
+  frontend:  agents/engineering/engineering-frontend-developer.md
+  backend:   agents/engineering/engineering-backend-architect.md
+  database:  agents/engineering/engineering-database-optimizer.md
+  devops:    agents/engineering/engineering-devops-automator.md
+  mobile:    agents/engineering/engineering-mobile-app-builder.md
+  ai:        agents/engineering/engineering-ai-engineer.md
+  security:  agents/engineering/engineering-security-engineer.md
+  docs:      agents/engineering/engineering-technical-writer.md
+  default:   agents/engineering/engineering-senior-developer.md
+
+t5:
+  code:        agents/engineering/engineering-code-reviewer.md
+  integration: agents/testing/testing-reality-checker.md
+  api:         agents/testing/testing-api-tester.md
+  performance: agents/testing/testing-performance-benchmarker.md
+  security:    agents/engineering/engineering-security-engineer.md
+  default:     agents/engineering/engineering-code-reviewer.md
+```
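The `(tier, domain)` lookup with its per-tier `default` can be sketched as follows. The function name `resolve_personality` is hypothetical, and the registry is shown as an already-parsed dictionary holding a subset of the entries above.

```python
from typing import Mapping, Optional

# Subset of config/role_registry.yaml, as it would look after parsing.
REGISTRY = {
    "t1": {"default": "agents/strategy/nexus-strategy.md"},
    "t3": {
        "infra": "agents/engineering/engineering-sre.md",
        "default": "agents/engineering/engineering-senior-developer.md",
    },
    "t5": {
        "api": "agents/testing/testing-api-tester.md",
        "default": "agents/engineering/engineering-code-reviewer.md",
    },
}


def resolve_personality(registry: Mapping[str, Mapping[str, str]],
                        tier: int, domain: Optional[str] = None) -> Optional[str]:
    """Return the personality file for (tier, domain).

    Falls back to the tier's `default` entry. Returns None when the tier has
    no entry at all, in which case the caller would use the generic prompt
    from prompts/.
    """
    tier_map = registry.get(f"t{tier}", {})
    return tier_map.get(domain, tier_map.get("default"))
```

Because the lookup is pure data, adding a specialist stays a one-line change to the YAML file, as the section states.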
+
+
+---
+
+## Key Flows
+
+### 1. Run Kickoff
+
+```
+User → Hans → team_runner.start(goal, config)
+  → generate run_id
+  → init blackboard (create runs/<run_id>/blackboard.db)
+  → build T1 brief (goal_anchor = goal, retry_budget from config)
+  → spawn T1 via runtime adapter
+  → await T1 workplan
+```
+
+### 2. T1 Scope Assessment
+
+```
+T1 receives brief
+  → assess complexity → decide depth
+  → identify workstreams
+  → set retry_budget multiplier per workstream (1x simple, 2x complex)
+  → emit N workstream briefs for T2 (or T3 if shallow)
+  → write workplan to blackboard
+  → team_runner spawns T2s in parallel
+```
+
+### 3. T4 Retry Loop (escalation.py)
+
+```
+spawn T4 with brief
+  → receive result
+  → classify: bad_output | blocked | partial | success
+
+  blocked:
+    → log event(escalated)
+    → pass to T3 immediately
+
+  bad_output, retries_remaining:
+    → amend brief with failure context, increment retry_count
+    → re-spawn T4
+    → log event(retried)
+
+  bad_output, retries_exhausted:
+    → log event(escalated)
+    → pass to T3
+
+  partial:
+    → write salvageable parts to blackboard
+    → re-task remainder with new brief
+
+  success:
+    → write result to blackboard
+    → log event(completed)
+    → notify T3
+```
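The branching in the retry loop above reduces to a small decision function. This is a sketch: the name `next_action` and the action labels it returns are assumptions, while the four outcome classes and the rule that `blocked` always escalates come from the flow above.

```python
def next_action(outcome: str, retry_count: int, retry_budget: int) -> str:
    """Decide what escalation.py does with a classified T4 result.

    Returns one of: "complete", "retry", "retask_remainder", "escalate".
    """
    if outcome == "success":
        return "complete"            # write result, log completed, notify T3
    if outcome == "blocked":
        return "escalate"            # never retried; passed to T3 immediately
    if outcome == "partial":
        return "retask_remainder"    # keep salvageable parts, new brief for the rest
    if outcome == "bad_output":
        # Retry with an amended brief while budget remains, else pass to T3.
        return "retry" if retry_count < retry_budget else "escalate"
    raise ValueError(f"unknown outcome: {outcome!r}")
```

For a brief with `retry_budget` 3, a `bad_output` result yields `"retry"` at `retry_count` 0, 1, and 2, and `"escalate"` once `retry_count` reaches 3.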
+
+### 4. Review Gate
+
+```
+T1 completes integration
+  → vcs_adapter.create_pr(
+      title="[agent-teams] <run_id>: <goal summary>",
+      body="<workplan + workstream summaries>",
+      head="integration/<run_id>",
+      base="<base_branch from config>"
+    )
+  → notify_adapter.send(
+      "Run <run_id> complete. PR ready for review: <pr_url>",
+      context={run_id, goal, workstreams, pr_url}
+    )
+  → blackboard: update run status → "review"
+  → halt — no auto-merge
+```
+
+---
+
+## Build Order
+
+1. `git submodule add https://github.com/msitarzewski/agency-agents agents/` — pull the talent pool
+2. `config/role_registry.yaml` — map tier+domain → agent personality files
+3. `core/task_brief.py` — schema + validation (everything depends on this)
+4. `core/blackboard.py` — SQLite store, all table definitions
+5. `adapters/base/*` — all four abstract interfaces
+6. `adapters/llm/anthropic.py` — first LLM implementation
+7. `core/escalation.py` — retry + failure routing logic
+8. `adapters/runtime/openclaw.py` — wire up sessions_spawn + personality injection
+9. `adapters/runtime/claude_code.py` — coding agent runtime, personality via --system-prompt
+10. `core/team_runner.py` — full run lifecycle, runtime + personality selection
+11. `prompts/` — fallback tier prompts (used when no agent_personality set)
+12. `adapters/vcs/github.py` — PR creation + branch management
+13. `adapters/notify/openclaw.py` — Hans notification
+14. `config/team.yaml` — example config
+15. `README.md` — how to run, how to add adapters, how to extend the roster
+
+---
+
+## Out of Scope (Phase 2)
+
+- Cost accounting per tier + run rollup
+- Parallel workstream progress dashboard
+- Additional adapter implementations (GitLab, Slack, OpenAI, Ollama)
+- Persistent standing teams
+- Web UI for run monitoring