# Tiered Agent Team System — Build Spec

_Started: 2026-03-15. Last updated: 2026-03-30._
_See design.md for the design doc and decisions log._

---

## Language & Runtime

**Python 3.11+.** Reasons:
- Agent/AI tooling is Python-first
- Clean type hints + dataclasses for schemas
- Agents can read and modify their own orchestration code
- Runs anywhere — no Node, no OpenClaw dependency

---

## Repository

Standalone repo: `git@github.com:coding-with-hans-heinemann/the-agency.git`

Separate from the OpenClaw workspace. OpenClaw workspace gets a thin integration layer that calls into it. Core is portable and runnable without OpenClaw.

---

## Directory Structure

```
agent-teams/
├── core/
│   ├── team_runner.py       — run lifecycle, agent spawning
│   ├── blackboard.py        — SQLite coordination state
│   ├── task_brief.py        — schema + validation
│   └── escalation.py        — retry logic, failure routing
│
├── adapters/
│   ├── base/
│   │   ├── llm.py           — abstract LLM interface
│   │   ├── vcs.py           — abstract VCS interface
│   │   ├── notify.py        — abstract notification interface
│   │   └── runtime.py       — abstract agent runtime interface
│   ├── llm/
│   │   ├── anthropic.py     — Claude via direct Anthropic API
│   │   ├── openai.py        — GPT / o-series
│   │   └── ollama.py        — local models
│   ├── vcs/
│   │   └── github.py
│   ├── notify/
│   │   └── openclaw.py      — messages Hans who notifies Andrew
│   └── runtime/
│       ├── openclaw.py      — sessions_spawn (general purpose)
│       └── claude_code.py   — coding agent runtime (file/git/exec tools)
│
├── agents/                  — git submodule: msitarzewski/agency-agents
│   ├── engineering/
│   ├── testing/
│   ├── strategy/
│   └── ...                  — full agency-agents roster
│
├── prompts/
│   ├── t1_visionary.md      — fallback if no agent_personality set
│   ├── t2_architect.md
│   ├── t3_squad_lead.md
│   ├── t4_implementer.md
│   └── t5_verifier.md
│
├── config/
│   ├── team.yaml            — example run configuration
│   └── role_registry.yaml   — maps (tier, domain) → agent personality file
│
├── cli/
│   └── agency.py            — run, watch, inspect, approve, reject, pause, resume
│
├── runs/                    — runtime state, one subdir per run_id
│   └── .gitkeep
│
└── README.md
```

---

## Blackboard

SQLite. One file per run at `runs//blackboard.db`.

### Tables

**runs**
```sql
CREATE TABLE runs (
  run_id      TEXT PRIMARY KEY,
  goal        TEXT NOT NULL,
  status      TEXT NOT NULL,  -- pending | active | review | done | failed
  created_at  TEXT NOT NULL,
  updated_at  TEXT NOT NULL
);
```

**workstreams**
```sql
CREATE TABLE workstreams (
  workstream_id   TEXT PRIMARY KEY,
  run_id          TEXT NOT NULL,
  name            TEXT NOT NULL,
  tier            INTEGER NOT NULL,
  status          TEXT NOT NULL,  -- pending | active | blocked | done | failed
  owner_agent_id  TEXT,
  created_at      TEXT NOT NULL,
  updated_at      TEXT NOT NULL
);
```

**briefs**
```sql
CREATE TABLE briefs (
  brief_id         TEXT PRIMARY KEY,
  run_id           TEXT NOT NULL,
  parent_brief_id  TEXT,
  workstream_id    TEXT,
  tier             INTEGER NOT NULL,
  role             TEXT NOT NULL,
  status           TEXT NOT NULL,  -- pending | active | done | failed
  payload          TEXT NOT NULL,  -- full JSON brief
  result           TEXT,           -- JSON result when done
  retry_count      INTEGER DEFAULT 0,
  created_at       TEXT NOT NULL,
  updated_at       TEXT NOT NULL
);
```

**events**
```sql
CREATE TABLE events (
  event_id    TEXT PRIMARY KEY,
  run_id      TEXT NOT NULL,
  brief_id    TEXT,
  kind        TEXT NOT NULL,  -- see event vocabulary below
  detail      TEXT,           -- JSON
  created_at  TEXT NOT NULL
);
```

**Event kind vocabulary:**
```
-- lifecycle
spawned | completed | failed | escalated | retried

-- visibility / gates
gate_pending     -- runner hit an inspection gate, waiting for human
gate_approved    -- human approved via CLI or notify
gate_rejected    -- human rejected, tier re-invoked
gate_paused      -- manual pause via CLI
gate_resumed     -- manual resume via CLI

-- amendments / informational
path_amendment   -- mid-run tier proposed a tier path change
log              -- human-readable log line (detail: {level, message})
```

**t3_task_lists** *(T3 mesh coordination)*
```sql
CREATE TABLE t3_task_lists (
  entry_id       TEXT PRIMARY KEY,
  run_id         TEXT NOT NULL,
  workstream_id  TEXT NOT NULL,
  t3_agent_id    TEXT NOT NULL,
  status         TEXT NOT NULL,  -- draft | committed
  tasks          TEXT NOT NULL,  -- JSON array of proposed T4 task descriptors
  created_at     TEXT NOT NULL,
  updated_at     TEXT NOT NULL
);
```

---

## Task Brief Schema

Every brief passed between tiers is a validated JSON object. `goal_anchor` is immutable: T1 sets it, and it is copied verbatim into every downstream brief.

```json
{
  "brief_id": "uuid",
  "run_id": "uuid",
  "parent_brief_id": "uuid | null",
  "tier": 4,
  "role": "implementer",
  "goal_anchor": "Original T1 intent — always propagated unchanged",
  "workstream": "backend-api",
  "task": "Implement POST /webhooks/ingest endpoint",
  "acceptance_criteria": [
    "Accepts JSON payload",
    "Returns 202 on success",
    "Writes to queue"
  ],
  "constraints": [
    "Use existing queue client in src/queue.py",
    "No new dependencies"
  ],
  "context": {
    "relevant_files": ["src/routes/webhooks.py", "src/queue.py"],
    "interface_contract": "..."
  },
  "retry_budget": 3,
  "retry_count": 0,
  "preferred_runtime": "coding_agent",
  "agent_personality": "agents/engineering/engineering-code-reviewer.md",
  "created_at": "ISO-8601"
}
```

`preferred_runtime` is optional. T3 sets it to `"coding_agent"` when spawning T4/T5 for implementation or verification tasks. The runner falls back to `"standard"` if the coding agent runtime is not configured.

`agent_personality` is optional. When set, the runtime adapter reads the file and injects its contents as the system prompt at spawn time. When it is not set, the adapter falls back to the generic tier prompt in `prompts/`.
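
The validation and `goal_anchor` propagation rules above could be sketched roughly as follows. This is a minimal illustration, not the contents of `core/task_brief.py`: the names `TaskBrief`, `validate_brief`, and `child_brief` are hypothetical, the choice of required fields and the 1 to 5 tier range are assumptions, and only a subset of the brief's fields is modelled.

```python
from __future__ import annotations

from dataclasses import dataclass, fields, replace

# Assumption: which fields are mandatory is not stated in the spec.
REQUIRED_FIELDS = ("brief_id", "run_id", "tier", "role", "goal_anchor", "task")
KNOWN_RUNTIMES = ("standard", "coding_agent")


@dataclass(frozen=True)  # frozen: a brief cannot be mutated after validation
class TaskBrief:
    brief_id: str
    run_id: str
    tier: int
    role: str
    goal_anchor: str
    task: str
    parent_brief_id: str | None = None
    retry_budget: int = 3
    retry_count: int = 0
    preferred_runtime: str = "standard"    # optional in the spec
    agent_personality: str | None = None   # optional in the spec


def validate_brief(raw: dict) -> TaskBrief:
    """Check a decoded JSON brief and return a typed, immutable copy."""
    missing = [key for key in REQUIRED_FIELDS if key not in raw]
    if missing:
        raise ValueError(f"brief is missing required fields: {missing}")
    if not isinstance(raw["tier"], int) or not 1 <= raw["tier"] <= 5:
        raise ValueError(f"tier must be an integer from 1 to 5, got {raw['tier']!r}")
    runtime = raw.get("preferred_runtime", "standard")
    if runtime not in KNOWN_RUNTIMES:
        raise ValueError(f"unknown preferred_runtime: {runtime!r}")
    # This sketch drops fields it does not model (context, constraints, ...).
    modelled = {f.name for f in fields(TaskBrief)}
    return TaskBrief(**{k: v for k, v in raw.items() if k in modelled})


def child_brief(parent: TaskBrief, brief_id: str, tier: int,
                role: str, task: str) -> TaskBrief:
    """Derive a downstream brief; goal_anchor is carried over unchanged."""
    return replace(parent, brief_id=brief_id, parent_brief_id=parent.brief_id,
                   tier=tier, role=role, task=task, retry_count=0)
```

Because `child_brief` never accepts a `goal_anchor` argument, a downstream tier has no way to rewrite the original intent through this path; a real implementation would also need to carry `acceptance_criteria`, `constraints`, and `context`.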

---

## Adapter Interfaces

### LLM (`adapters/base/llm.py`)
```python
class LLMAdapter:
    # capability: "reasoning-heavy" | "capable" | "fast-cheap"
    def complete(self, prompt: str, capability: str, context: dict) -> str: ...
    def resolve_model(self, capability: str) -> str: ...
```

### VCS (`adapters/base/vcs.py`)
```python
class VCSAdapter:
    def create_branch(self, name: str) -> None: ...
    def commit(self, files: list[str], message: str) -> str: ...  # returns commit sha
    def create_pr(self, title: str, body: str, head: str, base: str) -> str: ...  # returns pr url
    def get_pr_status(self, pr_id: str) -> str: ...  # open | merged | closed
```

### Notify (`adapters/base/notify.py`)
```python
class NotifyAdapter:
    def send(self, message: str, context: dict) -> None: ...
```

### Runtime (`adapters/base/runtime.py`)
```python
class RuntimeAdapter:
    def spawn(self, task: str, capability: str, context: dict) -> str: ...  # returns agent_id
    def get_result(self, agent_id: str, timeout_s: int) -> dict: ...
    def kill(self, agent_id: str) -> None: ...

# Two implementations:
#   openclaw.py    — general purpose, uses sessions_spawn, suits T1/T2/T3
#   claude_code.py — coding-specialized, has file/git/exec tools, suits T4/T5
#
# The runner selects runtime based on brief.preferred_runtime:
#   "standard"     → openclaw.py (default)
#   "coding_agent" → claude_code.py (falls back to standard if unavailable)
#
# Both implementations inject brief.agent_personality as the system prompt
# when spawning, if present. Falls back to generic tier prompt otherwise.
# claude_code.py passes the agent file via --system-prompt flag natively
# (agency-agents was designed for Claude Code's agents/ directory).
```

---

## Run Config (`config/team.yaml`)

```yaml
run:
  goal: "Build webhook ingestion system with retry logic and DLQ"
  repo: "git@github.com:org/repo.git"
  base_branch: "main"

adapters:
  llm: anthropic
  vcs: github
  notify: openclaw
  runtime: openclaw

models:
  provider: anthropic            # default provider
  capability_map:
    reasoning-heavy:
      anthropic: claude-opus-4-6
      openai: o3
    capable:
      anthropic: claude-sonnet-4-6
      openai: gpt-4o
      ollama: llama3.1:70b
    fast-cheap:
      anthropic: claude-haiku-3-5
      openai: gpt-4o-mini
      ollama: llama3.2
  # optional: override provider per tier
  tier_overrides:
    t1: { provider: openai, capability: reasoning-heavy }
    t4: { provider: ollama, capability: fast-cheap }

runtime:
  default: openclaw
  coding_agent: claude_code      # used for T4/T5 when available; omit to disable
  native_teams: false            # Claude Code's experimental agent teams — opt-in only
                                 # when true: T3 hands full workstream to Claude Code,
                                 # which fans out internally. faster but less blackboard
                                 # visibility. default: false (explicit T4 spawning)
  # tier_runtime_map (optional overrides):
  #   t1: standard
  #   t2: standard
  #   t3: standard
  #   t4: coding_agent
  #   t5: coding_agent

retry_defaults:
  bad_output: 3
  partial: 2
  blocked: 0                     # always escalate immediately

visibility:
  strict_mode: false             # true = all gates on (recommended for first runs)
  log_level: normal              # normal | verbose (verbose = per-T4 start/done lines)
  inspection_gates:
    t1_plan: true                # always — required by design
    t2_lead: false               # optional — review boundaries before specialists spawn
    t2_synthesis: true           # recommended — review architecture before implementation
    t3_plan: false               # verbose — useful early on, disable once T3 is trusted
    t5_verdict: false            # review T5 joint verdict before T3 marks workstream done
  gate_timeout_minutes: 60       # auto-reject if no human response within this window
  t3_mesh_timeout_minutes: 10    # max time for T3s to commit task lists before runner escalates
```

---

## Role Registry (`config/role_registry.yaml`)

Maps `(tier, domain)` → agent personality
file. T1 consults this during scope assessment when selecting specialists for each workstream brief. Adding a new specialist means adding one entry here — no core changes.

```yaml
t1:
  default: agents/strategy/nexus-strategy.md

t2:
  backend: agents/engineering/engineering-software-architect.md
  frontend: agents/engineering/engineering-software-architect.md
  infra: agents/engineering/engineering-devops-automator.md
  data: agents/engineering/engineering-data-engineer.md
  default: agents/engineering/engineering-software-architect.md

t3:
  backend: agents/engineering/engineering-senior-developer.md
  frontend: agents/engineering/engineering-senior-developer.md
  infra: agents/engineering/engineering-sre.md
  default: agents/engineering/engineering-senior-developer.md

t4:
  frontend: agents/engineering/engineering-frontend-developer.md
  backend: agents/engineering/engineering-backend-architect.md
  database: agents/engineering/engineering-database-optimizer.md
  devops: agents/engineering/engineering-devops-automator.md
  mobile: agents/engineering/engineering-mobile-app-builder.md
  ai: agents/engineering/engineering-ai-engineer.md
  security: agents/engineering/engineering-security-engineer.md
  docs: agents/engineering/engineering-technical-writer.md
  default: agents/engineering/engineering-senior-developer.md

t5:
  code: agents/engineering/engineering-code-reviewer.md
  integration: agents/testing/testing-reality-checker.md
  api: agents/testing/testing-api-tester.md
  performance: agents/testing/testing-performance-benchmarker.md
  security: agents/engineering/engineering-security-engineer.md
  default: agents/engineering/engineering-code-reviewer.md
```

---

## Key Flows

### 1. Run Kickoff

```
User → team_runner.start(goal, config)   # via CLI or any caller
  → generate run_id
  → init blackboard (create runs//blackboard.db)
  → build T1 brief (goal_anchor = goal, retry_budget from config)
  → spawn T1 via runtime adapter
  → await T1 workplan
```

### 2. T1 Scope Assessment

```
T1 receives brief
  → assess complexity → decide depth
  → identify workstreams
  → set retry_budget multiplier per workstream (1x simple, 2x complex)
  → emit N workstream briefs for T2 (or T3 if shallow)
  → write workplan to blackboard
  → team_runner spawns T2s in parallel
```

### 3. T4 Retry Loop (escalation.py)

```
spawn T4 with brief
  → receive result
  → classify: bad_output | blocked | partial | success

  blocked:
    → log event(escalated)
    → pass to T3 immediately

  bad_output, retries_remaining:
    → amend brief with failure context, increment retry_count
    → re-spawn T4
    → log event(retried)

  bad_output, retries_exhausted:
    → log event(escalated)
    → pass to T3

  partial:
    → write salvageable parts to blackboard
    → re-task remainder with new brief

  success:
    → write result to blackboard
    → log event(completed)
    → notify T3
```

### 4. Inspection Gate Flow

```
runner reaches configured gate (e.g. t2_synthesis)
  → write event(gate_pending, detail={tier, summary, what_happens_next})
  → notify_adapter.send(tier summary + gate context)
  → halt: poll blackboard for gate_approved or gate_rejected

  gate_approved:
    → write event(gate_approved)
    → continue run

  gate_rejected:
    → write event(gate_rejected, detail={reason})
    → re-invoke tier with rejection reason in brief context
    → loop back to gate_pending when tier completes again

  gate_timeout (gate_timeout_minutes elapsed):
    → treat as gate_rejected
    → notify Andrew: "Gate timed out, re-invoking tier"
```

### 5. Review Gate

```
T1 completes integration
  → vcs_adapter.create_pr(
      title="[agent-teams] : ",
      body="",
      head="integration/",
      base="main"
    )
  → notify_adapter.send(
      "Run complete. PR ready for review: ",
      context={run_id, goal, workstreams, pr_url}
    )
  → blackboard: update run status → "review"
  → halt — no auto-merge
```

---

## Build Order

1. `git submodule add https://github.com/msitarzewski/agency-agents agents/` — pull the talent pool
2. `config/role_registry.yaml` — map tier+domain → agent personality files
3. `core/task_brief.py` — schema + validation (everything depends on this); include T1 Plan Output Schema
4. `core/blackboard.py` — SQLite store, all table definitions including `t3_task_lists`; full event kind vocabulary
5. `adapters/base/*` — all four abstract interfaces
6. `adapters/llm/anthropic.py` — first LLM implementation
7. `core/escalation.py` — retry + failure routing logic (called by tiers, not runner centrally)
8. `adapters/runtime/openclaw.py` — wire up sessions_spawn + personality injection
9. `adapters/runtime/claude_code.py` — coding agent runtime, personality via --system-prompt
10. `core/team_runner.py` — full run lifecycle: spawn loop (monitors briefs table for `status=pending`, calls runtime_adapter.spawn()), gate logic (gate_pending halt, calls notify_adapter.send(), polls for gate_approved/rejected resume), path amendment monitor, T3 mesh timeout → T2 escalation, T1 failure + terminal escalation only
11. `cli/agency.py` — run, watch, inspect, approve, reject, pause, resume; `watch` tails blackboard events and renders live log; `inspect` renders run tree
12. `prompts/` — fallback tier prompts (used when no agent_personality set)
13. `adapters/vcs/github.py` — PR creation + branch management
14. `adapters/notify/openclaw.py` — OpenClaw notification adapter; bridges gate summaries and run events to the operator via OpenClaw; manages its own inbound response state for gate approval routing
15. `config/team.yaml` — example config with full visibility block
16. `README.md` — how to run, how to add adapters, how to extend the roster; include `agency` CLI reference

---

## Out of Scope (Phase 2)

- Cost accounting per tier + run rollup
- Parallel workstream progress dashboard
- Additional adapter implementations (GitLab, Slack, OpenAI, Ollama)
- Persistent standing teams
- Web UI for run monitoring