
# Tiered Agent Team System — Build Spec

_Started: 2026-03-15. Last updated: 2026-03-30._
_See design.md for the design doc and decisions log._

---

## Language & Runtime

**Python 3.11+.** Reasons:
- Agent/AI tooling is Python-first
- Clean type hints + dataclasses for schemas
- Agents can read and modify their own orchestration code
- Runs anywhere — no Node, no OpenClaw dependency

---

## Repository

Standalone repo: `git@github.com:coding-with-hans-heinemann/the-agency.git`

Kept separate from the OpenClaw workspace: the workspace gets a thin integration layer that calls into this repo, while the core stays portable and runnable without OpenClaw.

---

## Directory Structure

```
agent-teams/
├── core/
│   ├── team_runner.py       — run lifecycle, agent spawning
│   ├── blackboard.py        — SQLite coordination state
│   ├── task_brief.py        — schema + validation
│   └── escalation.py        — retry logic, failure routing
│
├── adapters/
│   ├── base/
│   │   ├── llm.py           — abstract LLM interface
│   │   ├── vcs.py           — abstract VCS interface
│   │   ├── notify.py        — abstract notification interface
│   │   └── runtime.py       — abstract agent runtime interface
│   ├── llm/
│   │   ├── anthropic.py     — Claude via OpenClaw or direct API
│   │   ├── openai.py        — GPT / o-series
│   │   └── ollama.py        — local models
│   ├── vcs/
│   │   └── github.py
│   ├── notify/
│   │   └── openclaw.py      — messages Hans, who notifies Andrew
│   └── runtime/
│       ├── openclaw.py      — sessions_spawn (general purpose)
│       └── claude_code.py   — coding agent runtime (file/git/exec tools)
│
├── agents/                  — git submodule: msitarzewski/agency-agents
│   ├── engineering/
│   ├── testing/
│   ├── strategy/
│   └── ...                  — full agency-agents roster
│
├── prompts/
│   ├── t1_visionary.md      — fallback if no agent_personality set
│   ├── t2_architect.md
│   ├── t3_squad_lead.md
│   ├── t4_implementer.md
│   └── t5_verifier.md
│
├── config/
│   ├── team.yaml            — example run configuration
│   └── role_registry.yaml   — maps (tier, domain) → agent personality file
│
├── cli/
│   └── agency.py            — run, watch, inspect, approve, reject, pause, resume
│
├── runs/                    — runtime state, one subdir per run_id
│   └── .gitkeep
│
├── pending_gates.json       — live file: gates currently awaiting approval (written by runner, read by Hans)
│
└── README.md
```

---

## Blackboard

SQLite. One file per run at `runs/<run_id>/blackboard.db`.

### Tables

**runs**
```sql
CREATE TABLE runs (
    run_id      TEXT PRIMARY KEY,
    goal        TEXT NOT NULL,
    status      TEXT NOT NULL,  -- pending | active | review | done | failed
    created_at  TEXT NOT NULL,
    updated_at  TEXT NOT NULL
);
```

**workstreams**
```sql
CREATE TABLE workstreams (
    workstream_id   TEXT PRIMARY KEY,
    run_id          TEXT NOT NULL,
    name            TEXT NOT NULL,
    tier            INTEGER NOT NULL,
    status          TEXT NOT NULL,  -- pending | active | blocked | done | failed
    owner_agent_id  TEXT,
    created_at      TEXT NOT NULL,
    updated_at      TEXT NOT NULL
);
```

**briefs**
```sql
CREATE TABLE briefs (
    brief_id        TEXT PRIMARY KEY,
    run_id          TEXT NOT NULL,
    parent_brief_id TEXT,
    workstream_id   TEXT,
    tier            INTEGER NOT NULL,
    role            TEXT NOT NULL,
    status          TEXT NOT NULL,  -- pending | active | done | failed
    payload         TEXT NOT NULL,  -- full JSON brief
    result          TEXT,           -- JSON result when done
    retry_count     INTEGER DEFAULT 0,
    created_at      TEXT NOT NULL,
    updated_at      TEXT NOT NULL
);
```

**events**
```sql
CREATE TABLE events (
    event_id    TEXT PRIMARY KEY,
    run_id      TEXT NOT NULL,
    brief_id    TEXT,
    kind        TEXT NOT NULL,  -- see event vocabulary below
    detail      TEXT,           -- JSON
    created_at  TEXT NOT NULL
);
```

**Event kind vocabulary:**
```
-- lifecycle
spawned | completed | failed | escalated | retried

-- visibility / gates
gate_pending    -- runner hit an inspection gate, waiting for human
gate_approved   -- human approved via CLI or notify
gate_rejected   -- human rejected, tier re-invoked
gate_paused     -- manual pause via CLI
gate_resumed    -- manual resume via CLI

-- amendments / informational
path_amendment  -- mid-run tier proposed a tier path change
log             -- human-readable log line (detail: {level, message})
```
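One way to keep `kind` values honest is to mirror the vocabulary in code and validate before every insert. A minimal sketch, assuming a hypothetical `EventKind` enum and `parse_kind` helper living in `core/blackboard.py`:

```python
from enum import Enum


class EventKind(str, Enum):
    """Event kinds from the vocabulary above (hypothetical mirror in code)."""
    # lifecycle
    SPAWNED = "spawned"
    COMPLETED = "completed"
    FAILED = "failed"
    ESCALATED = "escalated"
    RETRIED = "retried"
    # visibility / gates
    GATE_PENDING = "gate_pending"
    GATE_APPROVED = "gate_approved"
    GATE_REJECTED = "gate_rejected"
    GATE_PAUSED = "gate_paused"
    GATE_RESUMED = "gate_resumed"
    # amendments / informational
    PATH_AMENDMENT = "path_amendment"
    LOG = "log"


def parse_kind(raw: str) -> EventKind:
    """Reject any kind string outside the vocabulary before it reaches the events table."""
    try:
        return EventKind(raw)
    except ValueError:
        raise ValueError(f"unknown event kind: {raw!r}") from None
```

Subclassing `str` means the enum members can be passed straight to `sqlite3` as TEXT values.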

**t3_task_lists** *(T3 mesh coordination)*
```sql
CREATE TABLE t3_task_lists (
    entry_id        TEXT PRIMARY KEY,
    run_id          TEXT NOT NULL,
    workstream_id   TEXT NOT NULL,
    t3_agent_id     TEXT NOT NULL,
    status          TEXT NOT NULL,  -- draft | committed
    tasks           TEXT NOT NULL,  -- JSON array of proposed T4 task descriptors
    created_at      TEXT NOT NULL,
    updated_at      TEXT NOT NULL
);
```

---

## Task Brief Schema

Every brief passed between tiers is a validated JSON object. `goal_anchor` is immutable — set by T1, copied verbatim into every downstream brief.

```json
{
  "brief_id": "uuid",
  "run_id": "uuid",
  "parent_brief_id": "uuid | null",
  "tier": 4,
  "role": "implementer",
  "goal_anchor": "Original T1 intent — always propagated unchanged",
  "workstream": "backend-api",
  "task": "Implement POST /webhooks/ingest endpoint",
  "acceptance_criteria": [
    "Accepts JSON payload",
    "Returns 202 on success",
    "Writes to queue"
  ],
  "constraints": [
    "Use existing queue client in src/queue.py",
    "No new dependencies"
  ],
  "context": {
    "relevant_files": ["src/routes/webhooks.py", "src/queue.py"],
    "interface_contract": "..."
  },
  "retry_budget": 3,
  "retry_count": 0,
  "preferred_runtime": "coding_agent",
  "agent_personality": "agents/engineering/engineering-code-reviewer.md",
  "created_at": "ISO-8601"
}
```

`preferred_runtime` is optional. T3 sets it to `"coding_agent"` when spawning T4/T5 for implementation or verification tasks. If the coding agent runtime is not configured, the runner falls back to `"standard"`.

`agent_personality` is optional. When set, the runtime adapter reads the file and injects its contents as the system prompt at spawn time; when unset, it falls back to the generic tier prompt in `prompts/`.

---

## Adapter Interfaces

### LLM (`adapters/base/llm.py`)
```python
from abc import ABC, abstractmethod

class LLMAdapter(ABC):
    # capability: "reasoning-heavy" | "capable" | "fast-cheap"
    @abstractmethod
    def complete(self, prompt: str, capability: str, context: dict) -> str: ...
    @abstractmethod
    def resolve_model(self, capability: str) -> str: ...
```

### VCS (`adapters/base/vcs.py`)
```python
from abc import ABC, abstractmethod
class VCSAdapter(ABC):
    @abstractmethod
    def create_branch(self, name: str) -> None: ...
    @abstractmethod
    def commit(self, files: list[str], message: str) -> str: ...  # returns commit sha
    @abstractmethod
    def create_pr(self, title: str, body: str, head: str, base: str) -> str: ...  # returns PR URL
    @abstractmethod
    def get_pr_status(self, pr_id: str) -> str: ...  # open | merged | closed
```

### Notify (`adapters/base/notify.py`)
```python
from abc import ABC, abstractmethod
class NotifyAdapter(ABC):
    @abstractmethod
    def send(self, message: str, context: dict) -> None: ...
```

### Runtime (`adapters/base/runtime.py`)
```python
from abc import ABC, abstractmethod


class RuntimeAdapter(ABC):
    @abstractmethod
    def spawn(self, task: str, capability: str, context: dict) -> str: ...  # returns agent_id

    @abstractmethod
    def get_result(self, agent_id: str, timeout_s: int) -> dict: ...

    @abstractmethod
    def kill(self, agent_id: str) -> None: ...

# Two implementations:
#   openclaw.py    — general purpose, uses sessions_spawn, suits T1/T2/T3
#   claude_code.py — coding-specialized, has file/git/exec tools, suits T4/T5
#
# The runner selects runtime based on brief.preferred_runtime:
#   "standard"      → openclaw.py (default)
#   "coding_agent"  → claude_code.py (falls back to standard if unavailable)
#
# Both implementations inject brief.agent_personality as the system prompt
# when spawning, if present. Falls back to generic tier prompt otherwise.
# claude_code.py passes the agent file via --system-prompt flag natively
# (agency-agents was designed for Claude Code's agents/ directory).
```

---

## Run Config (`config/team.yaml`)

```yaml
run:
  goal: "Build webhook ingestion system with retry logic and DLQ"
  repo: "git@github.com:org/repo.git"
  base_branch: "main"

adapters:
  llm: anthropic
  vcs: github
  notify: openclaw
  runtime: openclaw

models:
  provider: anthropic          # default provider
  capability_map:
    reasoning-heavy:
      anthropic: claude-opus-4-6
      openai: o3
    capable:
      anthropic: claude-sonnet-4-6
      openai: gpt-4o
      ollama: llama3.1:70b
    fast-cheap:
      anthropic: claude-haiku-3-5
      openai: gpt-4o-mini
      ollama: llama3.2

  # optional: override provider per tier
  tier_overrides:
    t1: { provider: openai, capability: reasoning-heavy }
    t4: { provider: ollama, capability: fast-cheap }

runtime:
  default: openclaw
  coding_agent: claude_code     # used for T4/T5 when available; omit to disable
  native_teams: false           # Claude Code's experimental agent teams — opt-in only
                                # when true: T3 hands full workstream to Claude Code,
                                # which fans out internally. faster but less blackboard
                                # visibility. default: false (explicit T4 spawning)
  # tier_runtime_map (optional overrides):
  #   t1: standard
  #   t2: standard
  #   t3: standard
  #   t4: coding_agent
  #   t5: coding_agent

retry_defaults:
  bad_output: 3
  partial: 2
  blocked: 0    # always escalate immediately

visibility:
  strict_mode: false          # true = all gates on (recommended for first runs)
  log_level: normal           # normal | verbose (verbose = per-T4 start/done lines)
  inspection_gates:
    t1_plan: true             # always — required by design
    t2_lead: false            # optional — review boundaries before specialists spawn
    t2_synthesis: true        # recommended — review architecture before implementation
    t3_plan: false            # verbose — useful early on, disable once T3 is trusted
    t5_verdict: false         # review T5 joint verdict before T3 marks workstream done
  gate_timeout_minutes: 60    # auto-reject if no human response within this window

t3_mesh_timeout_minutes: 10   # max time for T3s to commit task lists before runner escalates
```
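A sketch of how `LLMAdapter.resolve_model` might consult the `models` block. It assumes the YAML has already been parsed into a dict (loading is omitted), and that a `tier_overrides` entry takes precedence over both the default provider and the capability the caller asked for; the spec does not state that precedence explicitly. `resolve_model` here is a hypothetical free function.

```python
# The models block of team.yaml as an already-parsed dict.
MODELS = {
    "provider": "anthropic",
    "capability_map": {
        "reasoning-heavy": {"anthropic": "claude-opus-4-6", "openai": "o3"},
        "capable": {"anthropic": "claude-sonnet-4-6", "openai": "gpt-4o",
                    "ollama": "llama3.1:70b"},
        "fast-cheap": {"anthropic": "claude-haiku-3-5", "openai": "gpt-4o-mini",
                       "ollama": "llama3.2"},
    },
    "tier_overrides": {
        "t1": {"provider": "openai", "capability": "reasoning-heavy"},
        "t4": {"provider": "ollama", "capability": "fast-cheap"},
    },
}


def resolve_model(models: dict, tier: int, capability: str) -> tuple[str, str]:
    """Return (provider, model) for a tier, applying tier_overrides first."""
    override = models.get("tier_overrides", {}).get(f"t{tier}", {})
    provider = override.get("provider", models["provider"])
    capability = override.get("capability", capability)
    by_provider = models["capability_map"][capability]
    if provider not in by_provider:
        raise KeyError(f"no {capability!r} model configured for provider {provider!r}")
    return provider, by_provider[provider]
```

Failing loudly on a missing (capability, provider) pair, such as `reasoning-heavy` on `ollama`, seems preferable to silently picking a weaker model.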

---

## Role Registry (`config/role_registry.yaml`)

Maps `(tier, domain)` → agent personality file. T1 consults this during scope assessment when selecting specialists for each workstream brief. Adding a new specialist means adding one entry here — no core changes.

```yaml
t1:
  default: agents/strategy/nexus-strategy.md

t2:
  backend:  agents/engineering/engineering-software-architect.md
  frontend: agents/engineering/engineering-software-architect.md
  infra:    agents/engineering/engineering-devops-automator.md
  data:     agents/engineering/engineering-data-engineer.md
  default:  agents/engineering/engineering-software-architect.md

t3:
  backend:  agents/engineering/engineering-senior-developer.md
  frontend: agents/engineering/engineering-senior-developer.md
  infra:    agents/engineering/engineering-sre.md
  default:  agents/engineering/engineering-senior-developer.md

t4:
  frontend:  agents/engineering/engineering-frontend-developer.md
  backend:   agents/engineering/engineering-backend-architect.md
  database:  agents/engineering/engineering-database-optimizer.md
  devops:    agents/engineering/engineering-devops-automator.md
  mobile:    agents/engineering/engineering-mobile-app-builder.md
  ai:        agents/engineering/engineering-ai-engineer.md
  security:  agents/engineering/engineering-security-engineer.md
  docs:      agents/engineering/engineering-technical-writer.md
  default:   agents/engineering/engineering-senior-developer.md

t5:
  code:        agents/engineering/engineering-code-reviewer.md
  integration: agents/testing/testing-reality-checker.md
  api:         agents/testing/testing-api-tester.md
  performance: agents/testing/testing-performance-benchmarker.md
  security:    agents/engineering/engineering-security-engineer.md
  default:     agents/engineering/engineering-code-reviewer.md
```

---

## Key Flows

### 1. Run Kickoff

```
User → Hans → team_runner.start(goal, config)
  → generate run_id
  → init blackboard (create runs/<run_id>/blackboard.db)
  → build T1 brief (goal_anchor = goal, retry_budget from config)
  → spawn T1 via runtime adapter
  → await T1 workplan
```

### 2. T1 Scope Assessment

```
T1 receives brief
  → assess complexity → decide depth
  → identify workstreams
  → set retry_budget multiplier per workstream (1x simple, 2x complex)
  → emit N workstream briefs for T2 (or T3 if shallow)
  → write workplan to blackboard
  → team_runner spawns T2s in parallel
```
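The retry multiplier step can be read as scaling each `retry_defaults` entry from `team.yaml`. A sketch under that assumption, with a hypothetical `workstream_retry_budget` helper; note that `blocked: 0` stays 0 under any multiplier, so blocked work still escalates immediately.

```python
RETRY_DEFAULTS = {"bad_output": 3, "partial": 2, "blocked": 0}
MULTIPLIER = {"simple": 1, "complex": 2}


def workstream_retry_budget(complexity: str,
                            defaults: dict[str, int] = RETRY_DEFAULTS) -> dict[str, int]:
    """Scale every retry_defaults entry by T1's complexity multiplier (1x or 2x)."""
    m = MULTIPLIER[complexity]
    return {kind: n * m for kind, n in defaults.items()}
```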

### 3. T4 Retry Loop (escalation.py)

```
spawn T4 with brief
  → receive result
  → classify: bad_output | blocked | partial | success

  blocked:
    → log event(escalated)
    → pass to T3 immediately

  bad_output, retries_remaining:
    → amend brief with failure context, increment retry_count
    → re-spawn T4
    → log event(retried)

  bad_output, retries_exhausted:
    → log event(escalated)
    → pass to T3

  partial:
    → write salvageable parts to blackboard
    → re-task remainder with new brief

  success:
    → write result to blackboard
    → log event(completed)
    → notify T3
```

### 4. Inspection Gate Flow

```
runner reaches configured gate (e.g. t2_synthesis)
  → write event(gate_pending, detail={tier, summary, what_happens_next})
  → notify_adapter.send(tier summary to Andrew via Hans)
  → halt: poll blackboard for gate_approved or gate_rejected

  gate_approved:
    → write event(gate_approved)
    → continue run

  gate_rejected:
    → write event(gate_rejected, detail={reason})
    → re-invoke tier with rejection reason in brief context
    → loop back to gate_pending when tier completes again

  gate_timeout (gate_timeout_minutes elapsed):
    → treat as gate_rejected
    → notify Andrew: "Gate timed out, re-invoking tier"
```

### 5. Review Gate

```
T1 completes integration
  → vcs_adapter.create_pr(
      title="[agent-teams] <run_id>: <goal summary>",
      body="<workplan + workstream summaries>",
      head="integration/<run_id>",
      base="main"
    )
  → notify_adapter.send(
      "Run <run_id> complete. PR ready for review: <pr_url>",
      context={run_id, goal, workstreams, pr_url}
    )
  → blackboard: update run status → "review"
  → halt — no auto-merge
```
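The review gate can be sketched against duck-typed adapters. This assumes a hypothetical `review_gate` helper, hypothetical run keys `goal_summary` and `summary`, a `set_status` callable standing in for the blackboard update, and a `base_branch` parameter that would come from `run.base_branch` in `team.yaml`.

```python
from typing import Callable


def review_gate(run: dict, vcs, notify, set_status: Callable[[str, str], None],
                base_branch: str = "main") -> str:
    """Open the integration PR, notify, and mark the run 'review'. Never merges."""
    run_id = run["run_id"]
    pr_url = vcs.create_pr(
        title=f"[agent-teams] {run_id}: {run['goal_summary']}",
        body=run["summary"],
        head=f"integration/{run_id}",
        base=base_branch,
    )
    notify.send(
        f"Run {run_id} complete. PR ready for review: {pr_url}",
        context={"run_id": run_id, "goal": run["goal"],
                 "workstreams": run["workstreams"], "pr_url": pr_url},
    )
    set_status(run_id, "review")
    return pr_url
```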

---

## Build Order

1. `git submodule add https://github.com/msitarzewski/agency-agents agents/` — pull the talent pool
2. `config/role_registry.yaml` — map tier+domain → agent personality files
3. `core/task_brief.py` — schema + validation (everything depends on this); include T1 Plan Output Schema
4. `core/blackboard.py` — SQLite store, all table definitions including `t3_task_lists`; full event kind vocabulary
5. `adapters/base/*` — all four abstract interfaces
6. `adapters/llm/anthropic.py` — first LLM implementation
7. `core/escalation.py` — retry + failure routing logic (called by tiers, not runner centrally)
8. `adapters/runtime/openclaw.py` — wire up sessions_spawn + personality injection
9. `adapters/runtime/claude_code.py` — coding agent runtime, personality via --system-prompt
10. `core/team_runner.py` — full run lifecycle:
    - spawn loop: monitors the `briefs` table for `status=pending` and calls `runtime_adapter.spawn()`
    - gate logic: halts on `gate_pending`, writes `pending_gates.json`, resumes on `gate_approved` / `gate_rejected`
    - path amendment monitor
    - T3 mesh timeout → T2 escalation
    - escalation handling limited to T1 failure and terminal escalation (lower-tier retries live in `core/escalation.py`, called by the tiers)
11. `cli/agency.py` — run, watch, inspect, approve, reject, pause, resume; `watch` tails blackboard events and renders live log; `inspect` renders run tree
12. `prompts/` — fallback tier prompts (used when no agent_personality set)
13. `adapters/vcs/github.py` — PR creation + branch management
14. `adapters/notify/openclaw.py` — Hans notification; used for gate surfaces (tier summary to Andrew)
15. `config/team.yaml` — example config with full visibility block
16. `README.md` — how to run, how to add adapters, how to extend the roster; include `agency` CLI reference
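The runner's spawn loop (step 10) can be sketched as a polling tick over the `briefs` table. This assumes a hypothetical `spawn_pending` helper and a `spawn` callable standing in for `runtime_adapter.spawn()`. Claiming a row (pending → active) before spawning, guarded by `AND status = 'pending'`, is one way to avoid double-spawning if two ticks overlap.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable


def spawn_pending(conn: sqlite3.Connection, spawn: Callable[[str], str]) -> list[str]:
    """One polling tick: claim each pending brief, spawn an agent for it, return agent ids."""
    rows = conn.execute(
        "SELECT brief_id, payload FROM briefs WHERE status = 'pending' ORDER BY created_at"
    ).fetchall()
    agent_ids = []
    for brief_id, payload in rows:
        ts = datetime.now(timezone.utc).isoformat()
        with conn:  # commit the claim before spawning
            cur = conn.execute(
                "UPDATE briefs SET status = 'active', updated_at = ? "
                "WHERE brief_id = ? AND status = 'pending'",
                (ts, brief_id),
            )
        if cur.rowcount == 1:  # skip rows another tick already claimed
            agent_ids.append(spawn(payload))
    return agent_ids
```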

---

## Out of Scope (Phase 2)

- Cost accounting per tier + run rollup
- Parallel workstream progress dashboard
- Additional adapter implementations (GitLab, Slack, OpenAI, Ollama)
- Persistent standing teams
- Web UI for run monitoring