refactor(team_runner): replace static adapter imports with dynamic importlib loading

Concrete adapter classes (AnthropicAdapter, GitHubAdapter, etc.) are no longer imported at the top of team_runner.py. Instead, each registry maps short names to 'module.path:ClassName' strings resolved lazily via importlib.import_module at instantiation time.

This means:
- Adding a new adapter requires only an entry in the registry string dict (or a full dotted path directly in team.yaml); no changes to TeamRunner.
- Third-party / custom adapters work out of the box: set e.g. adapters.llm: mypackage.llm.openai:OpenAIAdapter in team.yaml.
- The runner no longer hard-wires knowledge of which concrete classes exist.

Addresses tandrewng review comment on PR #1.
fix: derive LLM provider from adapter, not config
2026-03-16 00:30:28 -04:00 · 2026-03-15 23:47:52 -04:00 · 2026-03-15 21:43:01 -04:00 · 2026-03-15 21:40:05 -04:00 · 2026-03-15 18:55:57 -04:00 · 2026-03-15 03:15:37 -04:00
12 changed files with 660 additions and 3541 deletions
@@ -1,3 +1,3 @@
 [submodule "agents"]
 	path = agents
-	url = https://git.tandrewng.com/cw-hans/agency-agents.git
+	url = https://github.com/coding-with-hans-heinemann/agency-agents.git
@@ -1,48 +0,0 @@
-# CLAUDE.md — Agent Quick Reference
-
-Read this before exploring the codebase. It saves tokens.
-
-## What This Is
-
-A tiered multi-agent orchestration framework. T1 decomposes goals → T2 architects → T3 leads → T4 implements → T5 verifies. SQLite blackboard tracks state. All external dependencies (LLM, VCS, notify, runtime) are pluggable adapters.
-
-## Key Docs
-
- `docs/design.md` — architecture decisions, tier design, key choices
- `docs/buildspec.md` — 15-step build order, phase breakdown
-
-## Project Layout
-
-```
-core/           — task_brief.py, blackboard.py, escalation.py, team_runner.py
-adapters/base/  — abstract base classes (LLMAdapter, VCSAdapter, NotifyAdapter, RuntimeAdapter)
-adapters/llm/   — anthropic.py
-adapters/vcs/   — github.py
-adapters/notify/— openclaw.py
-adapters/runtime— openclaw.py, claude_code.py
-prompts/        — T1–T5 system prompt .md files
-config/         — team.yaml (run config), role_registry.yaml (tier→role→persona)
-agents/         — git submodule, agent persona .md files
-runs/           — per-run blackboard.db files (gitignored)
-```
-
-## Conventions
-
- **Never commit or push directly to `main`** — always branch (`hans/...` or `feature/...`) and PR
- New adapters: subclass the relevant `adapters/base/*.py` abstract class
- New roles: add persona `.md` to `agents/` submodule + entry in `config/role_registry.yaml`
- Failure handling lives in `core/escalation.py` — extend `FailureType` there
- `TaskBrief` is the canonical work unit — all tiers pass briefs to each other
- Blackboard is the single source of truth per run — always write events there
-
-## Current State
-
-Phase 2 adapter implementations exist. `core/team_runner.py` may still have stubs — check before assuming it's wired up end-to-end.
-
-## Running
-
-```bash
-python -m venv .venv && source .venv/bin/activate
-pip install -r requirements.txt
-python -m core.team_runner --config config/team.yaml
-```
@@ -10,6 +10,8 @@ from __future__ import annotations

 import os

+import anthropic
+
 from adapters.base.llm import LLMAdapter


@@ -55,14 +57,6 @@ class AnthropicAdapter(LLMAdapter):
        ValueError
            If ANTHROPIC_API_KEY is not set in the environment.
        """
-        try:
-            import anthropic as _anthropic
-        except ModuleNotFoundError as exc:
-            raise ImportError(
-                "The 'anthropic' package is required for AnthropicAdapter. "
-                "Install it with: pip install anthropic"
-            ) from exc
-
        self._config = config
        api_key = os.environ.get("ANTHROPIC_API_KEY")
        if not api_key:
@@ -70,7 +64,7 @@ class AnthropicAdapter(LLMAdapter):
                "ANTHROPIC_API_KEY environment variable is not set. "
                "Export it before running the-agency."
            )
-        self._client = _anthropic.Anthropic(api_key=api_key)
+        self._client = anthropic.Anthropic(api_key=api_key)
        self._models_cfg: dict = config.get("models", {})
        self._default_max_tokens: int = self._models_cfg.get("default_max_tokens", 4096)
        self._default_temperature: float = self._models_cfg.get("default_temperature", 0)
@@ -1,576 +0,0 @@
-"""
-cli/agency.py
-Command-line interface for the-agency pipeline.
-
-Subcommands
-----------
-run     <config.yaml>                 Start a new run, print run_id.
-watch   <run_id>                      Tail live blackboard events.
-inspect <run_id> [--tier T] [--brief B]  Show run tree / artifact detail.
-approve <run_id> [--note "..."]       Approve current inspection gate.
-reject  <run_id> --reason "..."       Reject current gate (re-invoke tier).
-pause   <run_id>                      Force-pause at next tier boundary.
-resume  <run_id>                      Release a manual pause.
-
-Gate approval UX
----------------
-`agency approve <run_id>` writes a gate_approved event directly to the
-blackboard.  The runner only polls the blackboard — it does not care how
-the event got there.  This makes approval work on any platform that has
-filesystem access to the runs/ directory.
-"""
-from __future__ import annotations
-
-import argparse
-import json
-import os
-import sys
-import time
-from datetime import datetime, timezone
-from pathlib import Path
-from typing import Optional
-
-# ---------------------------------------------------------------------------
-# Blackboard import (optional — degrade gracefully if core not on sys.path)
-# ---------------------------------------------------------------------------
-try:
-    from core.blackboard import Blackboard
-    _HAS_BLACKBOARD = True
-except ImportError:
-    _HAS_BLACKBOARD = False
-
-
-# ---------------------------------------------------------------------------
-# ANSI colours (degraded to no-op if not a TTY)
-# ---------------------------------------------------------------------------
-_IS_TTY = sys.stdout.isatty()
-
-
-def _c(code: str, text: str) -> str:
-    if not _IS_TTY:
-        return text
-    return f"\033[{code}m{text}\033[0m"
-
-
-def _bold(t: str) -> str:
-    return _c("1", t)
-
-
-def _dim(t: str) -> str:
-    return _c("2", t)
-
-
-def _green(t: str) -> str:
-    return _c("32", t)
-
-
-def _yellow(t: str) -> str:
-    return _c("33", t)
-
-
-def _red(t: str) -> str:
-    return _c("31", t)
-
-
-def _cyan(t: str) -> str:
-    return _c("36", t)
-
-
-def _magenta(t: str) -> str:
-    return _c("35", t)
-
-
-# ---------------------------------------------------------------------------
-# Helpers
-# ---------------------------------------------------------------------------
-
-def _now_iso() -> str:
-    return datetime.now(timezone.utc).isoformat()
-
-
-def _require_blackboard(run_id: str) -> "Blackboard":
-    if not _HAS_BLACKBOARD:
-        _die("Could not import core.blackboard.  Make sure you are running from the project root.")
-    db_path = Path("runs") / run_id / "blackboard.db"
-    if not db_path.exists():
-        _die(f"No blackboard found for run_id={run_id!r}.  Expected: {db_path}")
-    return Blackboard(run_id)
-
-
-def _die(msg: str) -> None:
-    print(_red(f"Error: {msg}"), file=sys.stderr)
-    sys.exit(1)
-
-
-def _fmt_ts(iso: Optional[str]) -> str:
-    if not iso:
-        return ""
-    try:
-        dt = datetime.fromisoformat(iso)
-        return dt.strftime("%H:%M:%S")
-    except ValueError:
-        return iso[:19]
-
-
-def _parse_detail(raw: Optional[str]) -> dict:
-    if not raw:
-        return {}
-    try:
-        return json.loads(raw)
-    except (json.JSONDecodeError, TypeError):
-        return {"raw": raw}
-
-
-# ---------------------------------------------------------------------------
-# Event rendering
-# ---------------------------------------------------------------------------
-
-_KIND_SYMBOLS: dict[str, str] = {
-    "spawned":        "→",
-    "completed":      "✓",
-    "failed":         "✗",
-    "escalated":      "↑",
-    "retried":        "↺",
-    "gate_pending":   "⏸",
-    "gate_approved":  "✓",
-    "gate_rejected":  "✗",
-    "gate_paused":    "⏸",
-    "gate_resumed":   "▶",
-    "path_amendment": "~",
-    "log":            " ",
-}
-
-_KIND_COLOUR: dict[str, str] = {
-    "spawned":        "36",   # cyan
-    "completed":      "32",   # green
-    "failed":         "31",   # red
-    "escalated":      "33",   # yellow
-    "retried":        "33",   # yellow
-    "gate_pending":   "35",   # magenta
-    "gate_approved":  "32",   # green
-    "gate_rejected":  "31",   # red
-    "gate_paused":    "35",   # magenta
-    "gate_resumed":   "32",   # green
-    "path_amendment": "33",   # yellow
-    "log":            "0",    # default
-}
-
-
-def _render_event(ev: dict, run_id: str) -> str:
-    kind = ev.get("kind", "")
-    ts = _fmt_ts(ev.get("created_at"))
-    detail = _parse_detail(ev.get("detail"))
-    sym = _KIND_SYMBOLS.get(kind, "·")
-    col = _KIND_COLOUR.get(kind, "0")
-    kind_str = _c(col, f"{sym} {kind:<18}")
-
-    # Build a short message from detail
-    msg_parts: list[str] = []
-
-    if kind == "log":
-        level = detail.get("level", "info")
-        message = detail.get("message", "")
-        level_col = "33" if level == "warning" else ("31" if level == "error" else "0")
-        msg_parts.append(_c(level_col, message))
-    elif kind in ("gate_pending", "gate_approved", "gate_rejected"):
-        gate = detail.get("gate", "")
-        summary = detail.get("summary", "")
-        reason = detail.get("reason", "")
-        if gate:
-            msg_parts.append(_bold(f"[{gate}]"))
-        if summary:
-            msg_parts.append(summary)
-        if reason:
-            msg_parts.append(_dim(f"({reason})"))
-    elif kind in ("spawned", "completed", "failed", "escalated", "retried"):
-        tier = detail.get("tier")
-        role = detail.get("role", "")
-        ws = detail.get("workstream", "")
-        task_id = detail.get("task_id", "")
-        reason = detail.get("reason", detail.get("error", ""))
-        if tier:
-            msg_parts.append(_bold(f"T{tier}"))
-        if role:
-            msg_parts.append(role)
-        if ws:
-            msg_parts.append(_dim(f"ws={ws}"))
-        if task_id:
-            msg_parts.append(_dim(f"task={task_id}"))
-        if reason:
-            msg_parts.append(_dim(f"— {reason[:80]}"))
-    elif kind == "path_amendment":
-        proposed_by = detail.get("proposed_by", "")
-        reason = detail.get("reason", "")
-        msg_parts.append(f"{proposed_by}: {reason}")
-    else:
-        for k, v in list(detail.items())[:3]:
-            msg_parts.append(f"{k}={v!r}")
-
-    msg = " ".join(msg_parts)
-    run_prefix = _dim(f"[{run_id}]")
-    ts_str = _dim(ts)
-    return f"{run_prefix} {ts_str}  {kind_str}  {msg}"
-
-
-# ---------------------------------------------------------------------------
-# Subcommand: run
-# ---------------------------------------------------------------------------
-
-def cmd_run(args: argparse.Namespace) -> None:
-    """Start a new pipeline run."""
-    config_path = args.config
-    if not os.path.exists(config_path):
-        _die(f"Config file not found: {config_path}")
-
-    # Import here to keep startup fast for non-run commands
-    try:
-        from core.team_runner import TeamRunner
-    except ImportError as exc:
-        _die(f"Could not import core.team_runner: {exc}")
-
-    dry = getattr(args, "dry_run", False)
-    runner = TeamRunner(config_path=config_path, dry_run=dry)
-    print(f"Starting run {_bold(runner.run_id)} …")
-    print(_dim(f"  Watch:   agency watch {runner.run_id}"))
-    print(_dim(f"  Inspect: agency inspect {runner.run_id}"))
-
-    try:
-        runner.run()
-        print(_green(f"Run {runner.run_id} complete."))
-    except KeyboardInterrupt:
-        print(_yellow(f"\nRun {runner.run_id} interrupted."))
-        sys.exit(1)
-    except Exception as exc:
-        print(_red(f"Run {runner.run_id} failed: {exc}"))
-        sys.exit(1)
-
-
-# ---------------------------------------------------------------------------
-# Subcommand: watch
-# ---------------------------------------------------------------------------
-
-def cmd_watch(args: argparse.Namespace) -> None:
-    """Tail live blackboard events for a run."""
-    bb = _require_blackboard(args.run_id)
-    run_id = args.run_id
-    poll = getattr(args, "poll", 2.0)
-
-    print(_bold(f"Watching run {run_id} …"), _dim("(Ctrl-C to stop)"))
-
-    seen_ids: set[str] = set()
-    try:
-        while True:
-            events = bb.get_all_events(limit=1000)
-            for ev in events:
-                eid = ev.get("event_id", "")
-                if eid in seen_ids:
-                    continue
-                seen_ids.add(eid)
-                print(_render_event(ev, run_id))
-                sys.stdout.flush()
-
-            # Check if run is done
-            summary = bb.get_run_summary()
-            run_status = summary.get("status", "")
-            if run_status in ("done", "review", "failed"):
-                print()
-                if run_status == "review":
-                    print(_green(f"Run {run_id} complete — status: {run_status}"))
-                elif run_status == "failed":
-                    print(_red(f"Run {run_id} failed"))
-                else:
-                    print(_bold(f"Run {run_id} status: {run_status}"))
-                break
-
-            time.sleep(poll)
-    except KeyboardInterrupt:
-        print(_dim("\nStopped watching."))
-    finally:
-        bb.close()
-
-
-# ---------------------------------------------------------------------------
-# Subcommand: inspect
-# ---------------------------------------------------------------------------
-
-def cmd_inspect(args: argparse.Namespace) -> None:
-    """Show a live tree of run state."""
-    bb = _require_blackboard(args.run_id)
-    run_id = args.run_id
-    tier_filter: Optional[int] = getattr(args, "tier", None)
-    brief_filter: Optional[str] = getattr(args, "brief", None)
-
-    try:
-        summary = bb.get_run_summary()
-        if "error" in summary:
-            _die(summary["error"])
-
-        if brief_filter:
-            _inspect_brief(bb, run_id, brief_filter)
-            return
-
-        if tier_filter:
-            _inspect_tier(bb, run_id, tier_filter)
-            return
-
-        _inspect_run_tree(bb, run_id, summary)
-    finally:
-        bb.close()
-
-
-def _inspect_run_tree(bb: "Blackboard", run_id: str, summary: dict) -> None:
-    status = summary.get("status", "?")
-    status_str = (
-        _green(status) if status in ("done", "review")
-        else _red(status) if status == "failed"
-        else _yellow(status)
-    )
-    print(f"\nRun {_bold(run_id)}  [{status_str}]")
-    print(_dim(f"  Goal: {summary.get('goal', '')}"))
-    print()
-
-    workstreams = bb.get_workstreams()
-    if not workstreams:
-        print(_dim("  No workstreams yet."))
-    else:
-        for ws in workstreams:
-            ws_status = ws.get("status", "?")
-            ws_col = "32" if ws_status == "done" else ("31" if ws_status == "failed" else "33")
-            ws_line = f"  ├── {ws.get('name', ws.get('workstream_id'))} [{_c(ws_col, ws_status)}]"
-            print(ws_line)
-
-            briefs = bb.get_briefs(workstream_id=ws["workstream_id"])
-            for b in briefs:
-                b_status = b.get("status", "?")
-                b_col = "32" if b_status == "done" else ("31" if b_status == "failed" else "0")
-                print(
-                    f"  │   ├── T{b.get('tier')} {b.get('role')} "
-                    f"[{_c(b_col, b_status)}] "
-                    f"retries={b.get('retry_count', 0)} "
-                    f"{_dim(b.get('brief_id', '')[:8])}"
-                )
-
-    print()
-    # Summary counts
-    briefs_summary = summary.get("briefs", {})
-    events_summary = summary.get("events", {})
-    print(
-        _dim(
-            f"  Briefs: {briefs_summary}  "
-            f"Events: {events_summary}"
-        )
-    )
-
-
-def _inspect_tier(bb: "Blackboard", run_id: str, tier: int) -> None:
-    briefs = bb.get_briefs(tier=tier)
-    print(f"\nRun {_bold(run_id)} — T{tier} briefs ({len(briefs)})\n")
-    for b in briefs:
-        status = b.get("status", "?")
-        col = "32" if status == "done" else ("31" if status == "failed" else "0")
-        print(
-            f"  {_dim(b.get('brief_id', '')[:8])}  "
-            f"{b.get('role', ''):<22}  [{_c(col, status)}]  "
-            f"retries={b.get('retry_count', 0)}"
-        )
-
-
-def _inspect_brief(bb: "Blackboard", run_id: str, brief_id: str) -> None:
-    briefs = bb.get_briefs()
-    match = next(
-        (b for b in briefs if b.get("brief_id", "").startswith(brief_id)),
-        None,
-    )
-    if not match:
-        _die(f"Brief {brief_id!r} not found in run {run_id}")
-
-    print(f"\nBrief {_bold(match['brief_id'])}")
-    print(f"  Tier:    T{match.get('tier')}")
-    print(f"  Role:    {match.get('role')}")
-    print(f"  Status:  {match.get('status')}")
-    print(f"  Retries: {match.get('retry_count', 0)}")
-    print()
-
-    payload_raw = match.get("payload")
-    if payload_raw:
-        try:
-            payload = json.loads(payload_raw)
-            print(_bold("Payload (brief):"))
-            print(json.dumps(payload, indent=2))
-        except (json.JSONDecodeError, TypeError):
-            print(payload_raw)
-        print()
-
-    result_raw = match.get("result")
-    if result_raw:
-        try:
-            result = json.loads(result_raw)
-            print(_bold("Result:"))
-            print(json.dumps(result, indent=2))
-        except (json.JSONDecodeError, TypeError):
-            print(result_raw)
-
-
-# ---------------------------------------------------------------------------
-# Subcommand: approve
-# ---------------------------------------------------------------------------
-
-def cmd_approve(args: argparse.Namespace) -> None:
-    """Approve the current inspection gate, writing gate_approved to the blackboard."""
-    bb = _require_blackboard(args.run_id)
-    note = getattr(args, "note", None) or ""
-    try:
-        bb.log_event(
-            "gate_approved",
-            detail={"approved_by": "cli", "note": note, "timestamp": _now_iso()},
-        )
-        print(_green(f"Gate approved for run {args.run_id}."))
-        if note:
-            print(_dim(f"  Note: {note}"))
-    finally:
-        bb.close()
-
-
-# ---------------------------------------------------------------------------
-# Subcommand: reject
-# ---------------------------------------------------------------------------
-
-def cmd_reject(args: argparse.Namespace) -> None:
-    """Reject the current inspection gate, writing gate_rejected to the blackboard."""
-    bb = _require_blackboard(args.run_id)
-    reason = getattr(args, "reason", None) or "rejected via CLI"
-    try:
-        bb.log_event(
-            "gate_rejected",
-            detail={"rejected_by": "cli", "reason": reason, "timestamp": _now_iso()},
-        )
-        print(_yellow(f"Gate rejected for run {args.run_id}."))
-        print(_dim(f"  Reason: {reason}"))
-    finally:
-        bb.close()
-
-
-# ---------------------------------------------------------------------------
-# Subcommand: pause
-# ---------------------------------------------------------------------------
-
-def cmd_pause(args: argparse.Namespace) -> None:
-    """Force-pause the run at the next tier boundary."""
-    bb = _require_blackboard(args.run_id)
-    try:
-        bb.log_event(
-            "gate_paused",
-            detail={"paused_by": "cli", "timestamp": _now_iso()},
-        )
-        print(_yellow(f"Pause signal written for run {args.run_id}."))
-        print(_dim(f"  Run will pause at the next tier boundary."))
-        print(_dim(f"  To resume: agency resume {args.run_id}"))
-    finally:
-        bb.close()
-
-
-# ---------------------------------------------------------------------------
-# Subcommand: resume
-# ---------------------------------------------------------------------------
-
-def cmd_resume(args: argparse.Namespace) -> None:
-    """Release a manual pause."""
-    bb = _require_blackboard(args.run_id)
-    try:
-        bb.log_event(
-            "gate_resumed",
-            detail={"resumed_by": "cli", "timestamp": _now_iso()},
-        )
-        print(_green(f"Resume signal written for run {args.run_id}."))
-    finally:
-        bb.close()
-
-
-# ---------------------------------------------------------------------------
-# Argument parser
-# ---------------------------------------------------------------------------
-
-def build_parser() -> argparse.ArgumentParser:
-    parser = argparse.ArgumentParser(
-        prog="agency",
-        description="the-agency pipeline CLI",
-        formatter_class=argparse.RawDescriptionHelpFormatter,
-        epilog="""
-Examples:
-  agency run config/team.yaml
-  agency watch abc12345
-  agency inspect abc12345
-  agency inspect abc12345 --tier 2
-  agency inspect abc12345 --brief a1b2c3d4
-  agency approve abc12345
-  agency approve abc12345 --note "looks good"
-  agency reject  abc12345 --reason "T2 missed the caching layer"
-  agency pause   abc12345
-  agency resume  abc12345
-""",
-    )
-    sub = parser.add_subparsers(dest="command", metavar="<command>")
-    sub.required = True
-
-    # run
-    p_run = sub.add_parser("run", help="Start a new pipeline run")
-    p_run.add_argument("config", nargs="?", default="config/team.yaml",
-                       help="Path to team.yaml (default: config/team.yaml)")
-    p_run.add_argument("--dry-run", action="store_true",
-                       help="Log actions without spawning agents")
-    p_run.set_defaults(func=cmd_run)
-
-    # watch
-    p_watch = sub.add_parser("watch", help="Tail live blackboard events")
-    p_watch.add_argument("run_id", help="Run ID to watch")
-    p_watch.add_argument("--poll", type=float, default=2.0,
-                         help="Poll interval in seconds (default: 2)")
-    p_watch.set_defaults(func=cmd_watch)
-
-    # inspect
-    p_inspect = sub.add_parser("inspect", help="Show run state tree")
-    p_inspect.add_argument("run_id", help="Run ID to inspect")
-    p_inspect.add_argument("--tier", type=int, default=None,
-                           help="Filter to a specific tier (e.g. --tier 2)")
-    p_inspect.add_argument("--brief", default=None,
-                           help="Show full brief+result for brief_id prefix")
-    p_inspect.set_defaults(func=cmd_inspect)
-
-    # approve
-    p_approve = sub.add_parser("approve", help="Approve current inspection gate")
-    p_approve.add_argument("run_id", help="Run ID")
-    p_approve.add_argument("--note", default="", help="Optional note written to blackboard")
-    p_approve.set_defaults(func=cmd_approve)
-
-    # reject
-    p_reject = sub.add_parser("reject", help="Reject current inspection gate")
-    p_reject.add_argument("run_id", help="Run ID")
-    p_reject.add_argument("--reason", default="rejected via CLI",
-                          help="Reason for rejection (shown in blackboard + logs)")
-    p_reject.set_defaults(func=cmd_reject)
-
-    # pause
-    p_pause = sub.add_parser("pause", help="Force-pause at next tier boundary")
-    p_pause.add_argument("run_id", help="Run ID")
-    p_pause.set_defaults(func=cmd_pause)
-
-    # resume
-    p_resume = sub.add_parser("resume", help="Release a manual pause")
-    p_resume.add_argument("run_id", help="Run ID")
-    p_resume.set_defaults(func=cmd_resume)
-
-    return parser
-
-
-# ---------------------------------------------------------------------------
-# Entry point
-# ---------------------------------------------------------------------------
-
-def main(argv: Optional[list[str]] = None) -> None:
-    parser = build_parser()
-    args = parser.parse_args(argv)
-    args.func(args)
-
-
-if __name__ == "__main__":
-    main()
@@ -2,40 +2,28 @@ t1:
  default: agents/strategy/nexus-strategy.md

 t2:
-  backend:  agents/engineering/engineering-backend-architect.md
-  frontend: agents/engineering/engineering-frontend-architect.md
+  backend:  agents/engineering/engineering-software-architect.md
+  frontend: agents/engineering/engineering-software-architect.md
  infra:    agents/engineering/engineering-devops-automator.md
  data:     agents/engineering/engineering-data-engineer.md
-  ai:       agents/engineering/engineering-software-architect.md
-  security: agents/engineering/engineering-security-engineer.md
-  mobile:   agents/engineering/engineering-software-architect.md
  default:  agents/engineering/engineering-software-architect.md

 t3:
-  backend:  agents/engineering/engineering-senior-backend-developer.md
-  frontend: agents/engineering/engineering-senior-frontend-developer.md
+  backend:  agents/engineering/engineering-senior-developer.md
+  frontend: agents/engineering/engineering-senior-developer.md
  infra:    agents/engineering/engineering-sre.md
-  data:     agents/engineering/engineering-data-engineer.md
-  ai:       agents/engineering/engineering-ai-engineer.md
-  security: agents/engineering/engineering-security-engineer.md
-  mobile:   agents/engineering/engineering-mobile-app-builder.md
-  database: agents/engineering/engineering-database-optimizer.md
-  devops:   agents/engineering/engineering-sre.md
-  docs:     agents/engineering/engineering-technical-writer.md
-  default:  agents/engineering/engineering-backend-developer.md
+  default:  agents/engineering/engineering-senior-developer.md

 t4:
  frontend:  agents/engineering/engineering-frontend-developer.md
-  backend:   agents/engineering/engineering-backend-developer.md
+  backend:   agents/engineering/engineering-backend-architect.md
  database:  agents/engineering/engineering-database-optimizer.md
  devops:    agents/engineering/engineering-devops-automator.md
  mobile:    agents/engineering/engineering-mobile-app-builder.md
  ai:        agents/engineering/engineering-ai-engineer.md
  security:  agents/engineering/engineering-security-engineer.md
  docs:      agents/engineering/engineering-technical-writer.md
-  data:      agents/engineering/engineering-data-engineer.md
-  embedded:  agents/engineering/engineering-embedded-firmware-engineer.md
-  default:   agents/engineering/engineering-backend-developer.md
+  default:   agents/engineering/engineering-senior-developer.md

 t5:
  code:        agents/engineering/engineering-code-reviewer.md
@@ -43,8 +31,4 @@ t5:
  api:         agents/testing/testing-api-tester.md
  performance: agents/testing/testing-performance-benchmarker.md
  security:    agents/engineering/engineering-security-engineer.md
-  accessibility: agents/testing/testing-accessibility-auditor.md
-  e2e:           agents/testing/testing-evidence-collector.md
-  frontend:      agents/testing/testing-accessibility-auditor.md
-  data:          agents/testing/testing-reality-checker.md
  default:     agents/engineering/engineering-code-reviewer.md
@@ -10,7 +10,8 @@ adapters:
  runtime: openclaw

 models:
-  provider: anthropic
+  default_max_tokens: 4096
+  default_temperature: 0
  capability_map:
    reasoning-heavy:
      anthropic: claude-opus-4-6
@@ -85,37 +85,18 @@ CREATE TABLE IF NOT EXISTS events (
    event_id    TEXT PRIMARY KEY,
    run_id      TEXT NOT NULL,
    brief_id    TEXT,             -- NULL for run-level events
-    kind        TEXT NOT NULL,    -- see _EVENT_KINDS
+    kind        TEXT NOT NULL,    -- spawned|completed|failed|escalated|retried
    detail      TEXT,             -- JSON
    created_at  TEXT NOT NULL,
    FOREIGN KEY (run_id) REFERENCES runs(run_id)
 );
-
-CREATE TABLE IF NOT EXISTS t3_task_lists (
-    entry_id       TEXT PRIMARY KEY,
-    run_id         TEXT NOT NULL,
-    workstream_id  TEXT NOT NULL,
-    t3_agent_id    TEXT NOT NULL,
-    status         TEXT NOT NULL DEFAULT 'draft',  -- draft|committed
-    tasks          TEXT NOT NULL DEFAULT '[]',     -- JSON array of T4 task descriptors
-    created_at     TEXT NOT NULL,
-    updated_at     TEXT NOT NULL,
-    FOREIGN KEY (run_id) REFERENCES runs(run_id)
-);
 """

 # Valid status values per table — used for input validation.
 _RUN_STATUSES = {"pending", "active", "review", "done", "failed"}
 _WS_STATUSES = {"pending", "active", "blocked", "done", "failed"}
 _BRIEF_STATUSES = {"pending", "active", "done", "failed"}
-_EVENT_KINDS = {
-    # Lifecycle
-    "spawned", "completed", "failed", "escalated", "retried",
-    # Visibility / gates
-    "gate_pending", "gate_approved", "gate_rejected", "gate_paused", "gate_resumed",
-    # Amendments / informational
-    "path_amendment", "log",
-}
+_EVENT_KINDS = {"spawned", "completed", "failed", "escalated", "retried"}


 # ---------------------------------------------------------------------------
@@ -379,194 +360,6 @@ class Blackboard:
    # Cleanup
    # ------------------------------------------------------------------

-    # ------------------------------------------------------------------
-    # Event queries
-    # ------------------------------------------------------------------
-
-    def get_events(
-        self,
-        kinds: Optional[list[str]] = None,
-        after_iso: Optional[str] = None,
-        brief_id: Optional[str] = None,
-        limit: int = 100,
-    ) -> list[dict[str, Any]]:
-        """
-        Query events for this run.
-
-        Parameters
-        ----------
-        kinds     : Filter by event kinds (OR).  None = all kinds.
-        after_iso : Only return events created after this ISO-8601 timestamp.
-        brief_id  : Filter by brief_id.  None = all briefs.
-        limit     : Maximum rows to return (most recent first).
-        """
-        conditions = ["run_id = ?"]
-        params: list[Any] = [self.run_id]
-
-        if kinds:
-            placeholders = ",".join("?" * len(kinds))
-            conditions.append(f"kind IN ({placeholders})")
-            params.extend(kinds)
-
-        if after_iso:
-            conditions.append("created_at > ?")
-            params.append(after_iso)
-
-        if brief_id:
-            conditions.append("brief_id = ?")
-            params.append(brief_id)
-
-        where = " AND ".join(conditions)
-        rows = self._execute(
-            f"SELECT * FROM events WHERE {where} ORDER BY created_at DESC LIMIT ?",
-            (*params, limit),
-        ).fetchall()
-        return [dict(r) for r in rows]
-
-    def get_latest_gate_event(
-        self, gate_name: str, after_iso: Optional[str] = None
-    ) -> Optional[dict[str, Any]]:
-        """
-        Return the most recent gate_approved or gate_rejected event for
-        *gate_name* written after *after_iso*.
-
-        The event detail JSON is expected to contain a ``"gate"`` field
-        matching *gate_name*.  Falls back to returning any gate resolution
-        event if none carry an explicit gate field (for CLI-written events
-        that omit it).
-        """
-        events = self.get_events(
-            kinds=["gate_approved", "gate_rejected"],
-            after_iso=after_iso,
-            limit=20,
-        )
-        # Prefer events whose detail.gate matches
-        for ev in events:
-            try:
-                detail = json.loads(ev.get("detail") or "{}")
-                if detail.get("gate") == gate_name or not detail.get("gate"):
-                    return ev
-            except (json.JSONDecodeError, TypeError):
-                return ev
-        return None
-
-    def get_all_events(self, limit: int = 500) -> list[dict[str, Any]]:
-        """Return all events for this run, oldest first."""
-        rows = self._execute(
-            "SELECT * FROM events WHERE run_id=? ORDER BY created_at ASC LIMIT ?",
-            (self.run_id, limit),
-        ).fetchall()
-        return [dict(r) for r in rows]
-
-    # ------------------------------------------------------------------
-    # T3 task lists
-    # ------------------------------------------------------------------
-
-    def create_t3_draft(
-        self,
-        *,
-        workstream_id: str,
-        t3_agent_id: str,
-    ) -> str:
-        """Insert a draft t3_task_list entry.  Returns entry_id."""
-        entry_id = _new_uuid()
-        now = _now_iso()
-        self._execute(
-            "INSERT OR IGNORE INTO t3_task_lists "
-            "(entry_id, run_id, workstream_id, t3_agent_id, status, tasks, created_at, updated_at) "
-            "VALUES (?, ?, ?, ?, 'draft', '[]', ?, ?)",
-            (entry_id, self.run_id, workstream_id, t3_agent_id, now, now),
-            commit=True,
-        )
-        return entry_id
-
-    def commit_t3_task_list(
-        self,
-        *,
-        workstream_id: str,
-        t3_agent_id: str,
-        tasks: list[Any],
-    ) -> None:
-        """Update a t3_task_list entry to committed with the final task list."""
-        now = _now_iso()
-        tasks_json = json.dumps(tasks)
-        self._execute(
-            "UPDATE t3_task_lists SET status='committed', tasks=?, updated_at=? "
-            "WHERE run_id=? AND workstream_id=? AND t3_agent_id=?",
-            (tasks_json, now, self.run_id, workstream_id, t3_agent_id),
-            commit=True,
-        )
-
-    def get_t3_task_lists(self, workstream_id: str) -> list[dict[str, Any]]:
-        """Return all t3_task_list entries for a workstream."""
-        rows = self._execute(
-            "SELECT * FROM t3_task_lists WHERE run_id=? AND workstream_id=? ORDER BY created_at ASC",
-            (self.run_id, workstream_id),
-        ).fetchall()
-        result = []
-        for r in rows:
-            d = dict(r)
-            try:
-                d["tasks"] = json.loads(d.get("tasks") or "[]")
-            except (json.JSONDecodeError, TypeError):
-                d["tasks"] = []
-            result.append(d)
-        return result
-
-    def all_t3_committed(self, workstream_id: str) -> bool:
-        """Return True if the workstream has t3_task_list entries and all are committed."""
-        rows = self._execute(
-            "SELECT status FROM t3_task_lists WHERE run_id=? AND workstream_id=?",
-            (self.run_id, workstream_id),
-        ).fetchall()
-        if not rows:
-            return False
-        return all(r["status"] == "committed" for r in rows)
-
-    # ------------------------------------------------------------------
-    # Briefs query
-    # ------------------------------------------------------------------
-
-    def get_briefs(
-        self,
-        *,
-        status: Optional[str] = None,
-        tier: Optional[int] = None,
-        workstream_id: Optional[str] = None,
-    ) -> list[dict[str, Any]]:
-        """Query briefs with optional filters."""
-        conditions = ["run_id = ?"]
-        params: list[Any] = [self.run_id]
-
-        if status:
-            conditions.append("status = ?")
-            params.append(status)
-        if tier is not None:
-            conditions.append("tier = ?")
-            params.append(tier)
-        if workstream_id:
-            conditions.append("workstream_id = ?")
-            params.append(workstream_id)
-
-        where = " AND ".join(conditions)
-        rows = self._execute(
-            f"SELECT * FROM briefs WHERE {where} ORDER BY created_at ASC",
-            tuple(params),
-        ).fetchall()
-        return [dict(r) for r in rows]
-
-    def get_workstreams(self) -> list[dict[str, Any]]:
-        """Return all workstreams for this run."""
-        rows = self._execute(
-            "SELECT * FROM workstreams WHERE run_id=? ORDER BY created_at ASC",
-            (self.run_id,),
-        ).fetchall()
-        return [dict(r) for r in rows]
-
-    # ------------------------------------------------------------------
-    # Cleanup
-    # ------------------------------------------------------------------
-
    def close(self) -> None:
        """Close the database connection gracefully."""
        with self._lock:
@@ -1,507 +0,0 @@
-# Tiered Agent Team System — Build Spec
-
-_Started: 2026-03-15. Last updated: 2026-03-30._
-_See design.md for the design doc and decisions log._
-
---
-
-## Language & Runtime
-
-**Python 3.11+.** Reasons:
- Agent/AI tooling is Python-first
- Clean type hints + dataclasses for schemas
- Agents can read and modify their own orchestration code
- Runs anywhere — no Node, no OpenClaw dependency
-
---
-
-## Repository
-
-Standalone repo: `[email protected]:coding-with-hans-heinemann/the-agency.git`
-
-Separate from the OpenClaw workspace. OpenClaw workspace gets a thin integration layer that calls into it. Core is portable and runnable without OpenClaw.
-
---
-
-## Directory Structure
-
-```
-agent-teams/
-├── core/
-│   ├── team_runner.py       — run lifecycle, agent spawning
-│   ├── blackboard.py        — SQLite coordination state
-│   ├── task_brief.py        — schema + validation
-│   └── escalation.py        — retry logic, failure routing
-│
-├── adapters/
-│   ├── base/
-│   │   ├── llm.py           — abstract LLM interface
-│   │   ├── vcs.py           — abstract VCS interface
-│   │   ├── notify.py        — abstract notification interface
-│   │   └── runtime.py       — abstract agent runtime interface
-│   ├── llm/
-│   │   ├── anthropic.py     — Claude via direct Anthropic API
-│   │   ├── openai.py        — GPT / o-series
-│   │   └── ollama.py        — local models
-│   ├── vcs/
-│   │   └── github.py
-│   ├── notify/
-│   │   └── openclaw.py      — messages Hans who notifies Andrew
-│   └── runtime/
-│       ├── openclaw.py      — sessions_spawn (general purpose)
-│       └── claude_code.py   — coding agent runtime (file/git/exec tools)
-│
-├── agents/                  — git submodule: msitarzewski/agency-agents
-│   ├── engineering/
-│   ├── testing/
-│   ├── strategy/
-│   └── ...                  — full agency-agents roster
-│
-├── prompts/
-│   ├── t1_visionary.md      — fallback if no agent_personality set
-│   ├── t2_architect.md
-│   ├── t3_squad_lead.md
-│   ├── t4_implementer.md
-│   └── t5_verifier.md
-│
-├── config/
-│   ├── team.yaml            — example run configuration
-│   └── role_registry.yaml   — maps (tier, domain) → agent personality file
-│
-├── cli/
-│   └── agency.py            — run, watch, inspect, approve, reject, pause, resume
-│
-├── runs/                    — runtime state, one subdir per run_id
-│   └── .gitkeep
-│
-└── README.md
-```
-
---
-
-## Blackboard
-
-SQLite. One file per run at `runs/<run_id>/blackboard.db`.
-
-### Tables
-
-**runs**
-```sql
-CREATE TABLE runs (
-    run_id      TEXT PRIMARY KEY,
-    goal        TEXT NOT NULL,
-    status      TEXT NOT NULL,  -- pending | active | review | done | failed
-    created_at  TEXT NOT NULL,
-    updated_at  TEXT NOT NULL
-);
-```
-
-**workstreams**
-```sql
-CREATE TABLE workstreams (
-    workstream_id   TEXT PRIMARY KEY,
-    run_id          TEXT NOT NULL,
-    name            TEXT NOT NULL,
-    tier            INTEGER NOT NULL,
-    status          TEXT NOT NULL,  -- pending | active | blocked | done | failed
-    owner_agent_id  TEXT,
-    created_at      TEXT NOT NULL,
-    updated_at      TEXT NOT NULL
-);
-```
-
-**briefs**
-```sql
-CREATE TABLE briefs (
-    brief_id        TEXT PRIMARY KEY,
-    run_id          TEXT NOT NULL,
-    parent_brief_id TEXT,
-    workstream_id   TEXT,
-    tier            INTEGER NOT NULL,
-    role            TEXT NOT NULL,
-    status          TEXT NOT NULL,  -- pending | active | done | failed
-    payload         TEXT NOT NULL,  -- full JSON brief
-    result          TEXT,           -- JSON result when done
-    retry_count     INTEGER DEFAULT 0,
-    created_at      TEXT NOT NULL,
-    updated_at      TEXT NOT NULL
-);
-```
-
-**events**
-```sql
-CREATE TABLE events (
-    event_id    TEXT PRIMARY KEY,
-    run_id      TEXT NOT NULL,
-    brief_id    TEXT,
-    kind        TEXT NOT NULL,  -- see event vocabulary below
-    detail      TEXT,           -- JSON
-    created_at  TEXT NOT NULL
-);
-```
-
-**Event kind vocabulary:**
-```
-- lifecycle
-spawned | completed | failed | escalated | retried
-
-- visibility / gates
-gate_pending    -- runner hit an inspection gate, waiting for human
-gate_approved   -- human approved via CLI or notify
-gate_rejected   -- human rejected, tier re-invoked
-gate_paused     -- manual pause via CLI
-gate_resumed    -- manual resume via CLI
-
-- amendments / informational
-path_amendment  -- mid-run tier proposed a tier path change
-log             -- human-readable log line (detail: {level, message})
-```
-
-**t3_task_lists** *(T3 mesh coordination)*
-```sql
-CREATE TABLE t3_task_lists (
-    entry_id        TEXT PRIMARY KEY,
-    run_id          TEXT NOT NULL,
-    workstream_id   TEXT NOT NULL,
-    t3_agent_id     TEXT NOT NULL,
-    status          TEXT NOT NULL,  -- draft | committed
-    tasks           TEXT NOT NULL,  -- JSON array of proposed T4 task descriptors
-    created_at      TEXT NOT NULL,
-    updated_at      TEXT NOT NULL
-);
-```
-
---
-
-## Task Brief Schema
-
-Every brief passed between tiers is a validated JSON object. `goal_anchor` is immutable — set by T1, copied verbatim into every downstream brief.
-
-```json
-{
-  "brief_id": "uuid",
-  "run_id": "uuid",
-  "parent_brief_id": "uuid | null",
-  "tier": 4,
-  "role": "implementer",
-  "goal_anchor": "Original T1 intent — always propagated unchanged",
-  "workstream": "backend-api",
-  "task": "Implement POST /webhooks/ingest endpoint",
-  "acceptance_criteria": [
-    "Accepts JSON payload",
-    "Returns 202 on success",
-    "Writes to queue"
-  ],
-  "constraints": [
-    "Use existing queue client in src/queue.py",
-    "No new dependencies"
-  ],
-  "context": {
-    "relevant_files": ["src/routes/webhooks.py", "src/queue.py"],
-    "interface_contract": "..."
-  },
-  "retry_budget": 3,
-  "retry_count": 0,
-  "preferred_runtime": "coding_agent",
-  "agent_personality": "agents/engineering/engineering-code-reviewer.md",
-  "created_at": "ISO-8601"
-}
-```
-
-`preferred_runtime` is optional. T3 sets it to `"coding_agent"` when spawning T4/T5 for implementation or verification tasks. Runner falls back to `"standard"` if the coding agent runtime is not configured.
-
-`agent_personality` is optional. When set, the runtime adapter reads the file and injects its contents as the system prompt at spawn time. Falls back to the generic tier prompt in `prompts/` if not set.
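The fallback described above could be sketched roughly as follows. `resolve_system_prompt`, `TIER_PROMPTS`, and the `root` parameter are hypothetical names used for illustration; only the file paths come from the directory listing in this spec.

```python
from pathlib import Path
from typing import Optional

# Generic fallback prompt per tier, taken from the prompts/ listing in this spec.
TIER_PROMPTS = {
    1: "prompts/t1_visionary.md",
    2: "prompts/t2_architect.md",
    3: "prompts/t3_squad_lead.md",
    4: "prompts/t4_implementer.md",
    5: "prompts/t5_verifier.md",
}


def resolve_system_prompt(brief: dict, root: Path) -> str:
    """Return the personality file's text if set, else the generic tier prompt."""
    personality: Optional[str] = brief.get("agent_personality")
    # Hypothetical helper: paths are resolved relative to the repo root.
    path = root / (personality or TIER_PROMPTS[brief["tier"]])
    return path.read_text(encoding="utf-8")
```

A runtime adapter would presumably call something like this once per spawn and pass the result along as the system prompt.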
-
---
-
-## Adapter Interfaces
-
-### LLM (`adapters/base/llm.py`)
-```python
-class LLMAdapter:
-    def complete(self, prompt: str, capability: str, context: dict) -> str: ...
-    def resolve_model(self, capability: str) -> str: ...
-    # capability: "reasoning-heavy" | "capable" | "fast-cheap"
-```
-
-### VCS (`adapters/base/vcs.py`)
-```python
-class VCSAdapter:
-    def create_branch(self, name: str) -> None: ...
-    def commit(self, files: list[str], message: str) -> str: ...  # returns commit sha
-    def create_pr(self, title: str, body: str, head: str, base: str) -> str: ...  # returns pr url
-    def get_pr_status(self, pr_id: str) -> str: ...  # open | merged | closed
-```
-
-### Notify (`adapters/base/notify.py`)
-```python
-class NotifyAdapter:
-    def send(self, message: str, context: dict) -> None: ...
-```
-
-### Runtime (`adapters/base/runtime.py`)
-```python
-class RuntimeAdapter:
-    def spawn(self, task: str, capability: str, context: dict) -> str: ...  # returns agent_id
-    def get_result(self, agent_id: str, timeout_s: int) -> dict: ...
-    def kill(self, agent_id: str) -> None: ...
-
-# Two implementations:
-#   openclaw.py    — general purpose, uses sessions_spawn, suits T1/T2/T3
-#   claude_code.py — coding-specialized, has file/git/exec tools, suits T4/T5
-#
-# The runner selects runtime based on brief.preferred_runtime:
-#   "standard"      → openclaw.py (default)
-#   "coding_agent"  → claude_code.py (falls back to standard if unavailable)
-#
-# Both implementations inject brief.agent_personality as the system prompt
-# when spawning, if present. Falls back to generic tier prompt otherwise.
-# claude_code.py passes the agent file via --system-prompt flag natively
-# (agency-agents was designed for Claude Code's agents/ directory).
-```
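The selection rule in the comments above might look like this in code. This is a minimal sketch: `select_runtime` is a hypothetical helper, and `configured` is assumed to mirror the `runtime:` block of `team.yaml`.

```python
from typing import Optional


def select_runtime(preferred: Optional[str], configured: dict) -> str:
    """Pick the runtime adapter name for a brief.

    `configured` is assumed to look like
    {"default": "openclaw", "coding_agent": "claude_code"}.
    """
    if preferred == "coding_agent" and configured.get("coding_agent"):
        return configured["coding_agent"]
    # "standard", unset, or coding agent not configured: use the default.
    return configured["default"]
```

Keeping the fallback in one place means a brief that asks for `"coding_agent"` still runs when that runtime is omitted from the config.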
-
---
-
-## Run Config (`config/team.yaml`)
-
-```yaml
-run:
-  goal: "Build webhook ingestion system with retry logic and DLQ"
-  repo: "[email protected]:org/repo.git"
-  base_branch: "main"
-
-adapters:
-  llm: anthropic
-  vcs: github
-  notify: openclaw
-  runtime: openclaw
-
-models:
-  provider: anthropic          # default provider
-  capability_map:
-    reasoning-heavy:
-      anthropic: claude-opus-4-6
-      openai: o3
-    capable:
-      anthropic: claude-sonnet-4-6
-      openai: gpt-4o
-      ollama: llama3.1:70b
-    fast-cheap:
-      anthropic: claude-haiku-3-5
-      openai: gpt-4o-mini
-      ollama: llama3.2
-
-  # optional: override provider per tier
-  tier_overrides:
-    t1: { provider: openai, capability: reasoning-heavy }
-    t4: { provider: ollama, capability: fast-cheap }
-
-runtime:
-  default: openclaw
-  coding_agent: claude_code     # used for T4/T5 when available; omit to disable
-  native_teams: false           # Claude Code's experimental agent teams — opt-in only
-                                # when true: T3 hands full workstream to Claude Code,
-                                # which fans out internally. faster but less blackboard
-                                # visibility. default: false (explicit T4 spawning)
-  # tier_runtime_map (optional overrides):
-  #   t1: standard
-  #   t2: standard
-  #   t3: standard
-  #   t4: coding_agent
-  #   t5: coding_agent
-
-retry_defaults:
-  bad_output: 3
-  partial: 2
-  blocked: 0    # always escalate immediately
-
-visibility:
-  strict_mode: false          # true = all gates on (recommended for first runs)
-  log_level: normal           # normal | verbose (verbose = per-T4 start/done lines)
-  inspection_gates:
-    t1_plan: true             # always — required by design
-    t2_lead: false            # optional — review boundaries before specialists spawn
-    t2_synthesis: true        # recommended — review architecture before implementation
-    t3_plan: false            # verbose — useful early on, disable once T3 is trusted
-    t5_verdict: false         # review T5 joint verdict before T3 marks workstream done
-  gate_timeout_minutes: 60    # auto-reject if no human response within this window
-
-t3_mesh_timeout_minutes: 10   # max time for T3s to commit task lists before runner escalates
-```
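One way to read the `models` block: a tier override may replace the provider, the capability, or both, and the resulting pair is looked up in `capability_map`. The sketch below follows that reading. `resolve_model_name` is a hypothetical helper and not part of the adapter interface, and raising on a missing mapping is an assumption.

```python
def resolve_model_name(models: dict, tier: str, capability: str) -> str:
    """Resolve a concrete model name for a tier from the `models` config block."""
    override = models.get("tier_overrides", {}).get(tier, {})
    provider = override.get("provider", models["provider"])
    capability = override.get("capability", capability)
    try:
        return models["capability_map"][capability][provider]
    except KeyError:
        # Assumption: an unmapped (capability, provider) pair is a config error.
        raise ValueError(
            f"no model for provider={provider!r}, capability={capability!r}"
        ) from None
```

With the example config above, `t1` would resolve through the `openai` / `reasoning-heavy` override, while a tier without an override uses the default provider.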
-
---
-
-## Role Registry (`config/role_registry.yaml`)
-
-Maps `(tier, domain)` → agent personality file. T1 consults this during scope assessment when selecting specialists for each workstream brief. Adding a new specialist means adding one entry here — no core changes.
-
-```yaml
-t1:
-  default: agents/strategy/nexus-strategy.md
-
-t2:
-  backend:  agents/engineering/engineering-software-architect.md
-  frontend: agents/engineering/engineering-software-architect.md
-  infra:    agents/engineering/engineering-devops-automator.md
-  data:     agents/engineering/engineering-data-engineer.md
-  default:  agents/engineering/engineering-software-architect.md
-
-t3:
-  backend:  agents/engineering/engineering-senior-developer.md
-  frontend: agents/engineering/engineering-senior-developer.md
-  infra:    agents/engineering/engineering-sre.md
-  default:  agents/engineering/engineering-senior-developer.md
-
-t4:
-  frontend:  agents/engineering/engineering-frontend-developer.md
-  backend:   agents/engineering/engineering-backend-architect.md
-  database:  agents/engineering/engineering-database-optimizer.md
-  devops:    agents/engineering/engineering-devops-automator.md
-  mobile:    agents/engineering/engineering-mobile-app-builder.md
-  ai:        agents/engineering/engineering-ai-engineer.md
-  security:  agents/engineering/engineering-security-engineer.md
-  docs:      agents/engineering/engineering-technical-writer.md
-  default:   agents/engineering/engineering-senior-developer.md
-
-t5:
-  code:        agents/engineering/engineering-code-reviewer.md
-  integration: agents/testing/testing-reality-checker.md
-  api:         agents/testing/testing-api-tester.md
-  performance: agents/testing/testing-performance-benchmarker.md
-  security:    agents/engineering/engineering-security-engineer.md
-  default:     agents/engineering/engineering-code-reviewer.md
-```
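A lookup against this registry could be as small as the sketch below. `lookup_personality` is a hypothetical name, and falling back to the tier's `default` entry for an unlisted domain is an assumption based on the `default` keys shown above.

```python
def lookup_personality(registry: dict, tier: int, domain: str) -> str:
    """Return the personality file for (tier, domain).

    Assumption: an unlisted domain falls back to the tier's `default` entry.
    """
    tier_map = registry[f"t{tier}"]
    return tier_map.get(domain, tier_map["default"])
```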
-
---
-
-## Key Flows
-
-### 1. Run Kickoff
-
-```
-User → team_runner.start(goal, config)  # via CLI or any caller
-  → generate run_id
-  → init blackboard (create runs/<run_id>/blackboard.db)
-  → build T1 brief (goal_anchor = goal, retry_budget from config)
-  → spawn T1 via runtime adapter
-  → await T1 workplan
-```
-
-### 2. T1 Scope Assessment
-
-```
-T1 receives brief
-  → assess complexity → decide depth
-  → identify workstreams
-  → set retry_budget multiplier per workstream (1x simple, 2x complex)
-  → emit N workstream briefs for T2 (or T3 if shallow)
-  → write workplan to blackboard
-  → team_runner spawns T2s in parallel
-```
-
-### 3. T4 Retry Loop (escalation.py)
-
-```
-spawn T4 with brief
-  → receive result
-  → classify: bad_output | blocked | partial | success
-
-  blocked:
-    → log event(escalated)
-    → pass to T3 immediately
-
-  bad_output, retries_remaining:
-    → amend brief with failure context, increment retry_count
-    → re-spawn T4
-    → log event(retried)
-
-  bad_output, retries_exhausted:
-    → log event(escalated)
-    → pass to T3
-
-  partial:
-    → write salvageable parts to blackboard
-    → re-task remainder with new brief
-
-  success:
-    → write result to blackboard
-    → log event(completed)
-    → notify T3
-```
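The routing in the flow above can be sketched as a pure function, which keeps it easy to test. `Outcome` and `next_action` are hypothetical names, and the returned action strings are placeholders. The sketch also ignores the separate `partial` retry default from `team.yaml` and always re-tasks the remainder.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    BAD_OUTPUT = "bad_output"
    PARTIAL = "partial"
    BLOCKED = "blocked"


def next_action(outcome: Outcome, retry_count: int, retry_budget: int) -> str:
    """Map a classified T4 result to the next step in the retry loop."""
    if outcome is Outcome.SUCCESS:
        return "complete"
    if outcome is Outcome.BLOCKED:
        return "escalate"          # blocked always goes to T3 immediately
    if outcome is Outcome.PARTIAL:
        return "retask_remainder"  # keep salvageable parts, re-brief the rest
    # bad_output: retry while budget remains, otherwise escalate to T3
    return "retry" if retry_count < retry_budget else "escalate"
```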
-
-### 4. Inspection Gate Flow
-
-```
-runner reaches configured gate (e.g. t2_synthesis)
-  → write event(gate_pending, detail={tier, summary, what_happens_next})
-  → notify_adapter.send(tier summary + gate context)
-  → halt: poll blackboard for gate_approved or gate_rejected
-
-  gate_approved:
-    → write event(gate_approved)
-    → continue run
-
-  gate_rejected:
-    → write event(gate_rejected, detail={reason})
-    → re-invoke tier with rejection reason in brief context
-    → loop back to gate_pending when tier completes again
-
-  gate_timeout (gate_timeout_minutes elapsed):
-    → treat as gate_rejected
-    → notify Andrew: "Gate timed out, re-invoking tier"
-```
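The halt-and-poll step could be sketched like this. `wait_for_gate` and its parameters are hypothetical. `fetch_event` stands in for a blackboard query such as `get_latest_gate_event`, and the clock and sleep functions are injectable only so the loop can be exercised without real delays.

```python
import time
from typing import Callable, Optional


def wait_for_gate(
    fetch_event: Callable[[], Optional[dict]],
    timeout_s: float,
    poll_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until a gate event appears; treat a timeout as a rejection."""
    deadline = clock() + timeout_s
    while True:
        event = fetch_event()
        if event is not None:
            return event["kind"]      # "gate_approved" or "gate_rejected"
        if clock() >= deadline:
            return "gate_rejected"    # timeout is handled as a rejection
        sleep(poll_s)
```

The caller would still need to write the corresponding event and send the timeout notification. This sketch only covers the wait.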
-
-### 5. Review Gate
-
-```
-T1 completes integration
-  → vcs_adapter.create_pr(
-      title="[agent-teams] <run_id>: <goal summary>",
-      body="<workplan + workstream summaries>",
-      head="integration/<run_id>",
-      base="main"
-    )
-  → notify_adapter.send(
-      "Run <run_id> complete. PR ready for review: <pr_url>",
-      context={run_id, goal, workstreams, pr_url}
-    )
-  → blackboard: update run status → "review"
-  → halt — no auto-merge
-```
-
---
-
-## Build Order
-
-1. `git submodule add https://github.com/msitarzewski/agency-agents agents/` — pull the talent pool
-2. `config/role_registry.yaml` — map tier+domain → agent personality files
-3. `core/task_brief.py` — schema + validation (everything depends on this); include T1 Plan Output Schema
-4. `core/blackboard.py` — SQLite store, all table definitions including `t3_task_lists`; full event kind vocabulary
-5. `adapters/base/*` — all four abstract interfaces
-6. `adapters/llm/anthropic.py` — first LLM implementation
-7. `core/escalation.py` — retry + failure routing logic (called by tiers, not runner centrally)
-8. `adapters/runtime/openclaw.py` — wire up sessions_spawn + personality injection
-9. `adapters/runtime/claude_code.py` — coding agent runtime, personality via --system-prompt
-10. `core/team_runner.py` — full run lifecycle: spawn loop (monitors briefs table for `status=pending`, calls runtime_adapter.spawn()), gate logic (gate_pending halt, calls notify_adapter.send(), polls for gate_approved/rejected resume), path amendment monitor, T3 mesh timeout → T2 escalation, T1 failure + terminal escalation only
-11. `cli/agency.py` — run, watch, inspect, approve, reject, pause, resume; `watch` tails blackboard events and renders live log; `inspect` renders run tree
-12. `prompts/` — fallback tier prompts (used when no agent_personality set)
-13. `adapters/vcs/github.py` — PR creation + branch management
-14. `adapters/notify/openclaw.py` — OpenClaw notification adapter; bridges gate summaries and run events to the operator via OpenClaw; manages its own inbound response state for gate approval routing
-15. `config/team.yaml` — example config with full visibility block
-16. `README.md` — how to run, how to add adapters, how to extend the roster; include `agency` CLI reference
-
---
-
-## Out of Scope (Phase 2)
-
- Cost accounting per tier + run rollup
- Parallel workstream progress dashboard
- Additional adapter implementations (GitLab, Slack, OpenAI, Ollama)
- Persistent standing teams
- Web UI for run monitoring
@@ -1,681 +0,0 @@
-# Tiered Agent Team System — Design Document
-
-_Started: 2026-03-14. Last updated: 2026-03-30._
-
---
-
-## Resolved Design Decisions (formerly Open Questions)
-
-All eight open questions resolved 2026-03-30. Details in Decisions Log.
-
-1. **T3 mesh mechanics** → Blackboard-based. T3s write draft task lists, read peers', commit merged plan before T4 dispatch. See _T3 Mesh via Blackboard_.
-
-2. **T1 output schema** → Formal JSON schema defined. See _T1 Plan Output Schema_.
-
-3. **T5 consensus mechanics** → T3 aggregates all T5 results into a joint verdict. Split verdict (`partial`) triggers retry of failed slices only. See _T5 Consensus & Verdict Schema_.
-
-4. **Path amendment mechanism** → Amending tier writes a `path_amendment` event to blackboard. Runner monitors events table and notifies the relevant higher tier via a system event. No agent callback plumbing required. See _Path Amendment Mechanism_.
-
-5. **Failure handling (distributed model)** → Distributed ownership confirmed. Runner only owns T1 failure + terminal human escalation. See updated _Failure Handling_ table.
-
-6. **Who makes spawn calls for T3+ tiers** → Runner monitors briefs table for `status=pending` rows and makes all spawn calls. "Distributed ownership" means the tier's output determines brief content — runner is the mechanical arm. Gates (hold on `gate_pending`) live naturally in the runner's spawn loop.
-
-7. **Gate approval UX** → `agency approve <run_id>` CLI writes `gate_approved` directly to the blackboard — the universal path, works on any platform. Runner only cares that the event exists, not how it got there. Notify adapter implementations handle their own inbound response routing (e.g. bridging a chat reply to a CLI call) as internal adapter state — not a core concern.
-
-8. **T3 mesh timeout** → Escalate to T2 (domain boundary problem, T2 should re-scope). If T2 also exhausts its retry budget, escalates up the normal ladder to T1 → Andrew gate. No force-commit fallback (would hide the problem and cause bad T4 dispatch).
-
---
-
-## Overview
-
-A dynamic, hierarchical multi-agent system for software pipelines. Teams assemble on demand, execute, then disband. Inspired by a blend of Hollywood production (dynamic assembly), consulting firms (structured deliverables, hierarchical synthesis), and two-pizza teams (small autonomous squads, clear domain ownership).
-
---
-
-## Core Principles
-
-**1. Tiers represent cognitive modes, not org chart levels.**
-Each tier thinks differently — strategy, design, coordination, execution, verification. Adding a tier only makes sense if it introduces a genuinely different mode of reasoning.
-
-**2. Depth is proportional to complexity.**
-Not every task needs every tier. A config change might only need T3→T4→T5. A new product needs the full stack. T1 assesses scope and prescribes the path; it is never pre-configured.
-
-**3. Goal anchoring at every level.**
-T1's original intent is embedded in every agent's context — not just passed to T2 and forgotten. Every agent knows the end goal even if they only own a slice.
-
-**4. Artifacts, not summaries.**
-Tiers pass structured specs downward (JSON task briefs), not paraphrased prose. Meaning is preserved; format is compressed.
-
-**5. Verification is mandatory.**
-T5 always runs. Nothing returns to T1 unverified. T5 is a mandatory quality gate: output must work, and work well, before it surfaces upward.
-
-**6. Provider agnostic.**
-The system makes no assumptions about which LLM provider or platform is in use. Tiers reference capability levels, not specific models. All external dependencies are swappable adapters.
-
-**7. Specialist talent pool.**
-Tiers define structure and responsibility. Agent personalities define domain expertise. The two are separate — the same tier can be filled by different specialists depending on the workstream domain.
-
---
-
-## Tier Definitions
-
-| Tier | Role | Owns | Capability Level |
-|------|------|------|-----------------|
-| T1 | Visionary | Goal, constraints, dispatch plan, final acceptance | reasoning-heavy |
-| T2 | Architect | System design, interface contracts, workstream boundaries | reasoning-heavy / capable |
-| T3 | Squad Lead | Workstream delivery, T4 management, quality gate | capable |
-| T4 | Implementer | Atomic task execution (one file, one function, one test) | fast-cheap |
-| T5 | Verifier | Validation of T4 output — correctness + intent alignment | capable |
-
-T5 runs **within T3's scope**, not above it. T3 commissions T5 verification of its T4 outputs. T5 is a quality gate, not a management layer.
-
-Capability levels map to actual models per provider in config — the core system never references a specific model name.
-
---
-
-## Dispatch Model
-
-### T1 Owns the Plan
-
-T1 is not just a decomposer — it is the dispatch planner. Its output declares:
-
- **Workstreams** — the decomposed units of work
- **Tier path per workstream** — which tiers to engage (e.g. `[T2, T3, T4, T5]` or `[T4, T5]` for trivial tasks)
- **Parallelism** — which workstreams are independent and can run concurrently
-
-T1 does not prescribe how each tier operates internally. That is the tier's own concern.
-
-### T1 Lifecycle — Two Explicit Phases
-
-T1 is invoked twice per run, each with a distinct prompt and purpose:
-
-**Phase 1 — Plan:**
-1. T1 produces initial dispatch plan (workstreams, tier paths, parallelism, retry budget)
-2. T1 self-critiques its own plan in a single follow-up pass ("what could go wrong, what did I miss?") and amends
-3. Amended plan surfaces to Andrew for approval — no T2s spawn until approval is given
-
-**Phase 2 — Accept:**
-After the full T2→T3→T4→T5 pipeline completes, T1 is re-invoked with the final output. It validates against the original goal anchor and either accepts (opens PR) or rejects (escalates back down).
-
-Both phases are named explicitly in the task brief schema and tracked on the blackboard.
-
-### Each Tier Owns the Layer Below
-
-Control flow is distributed, not centralised:
-
- T1 manages its T2s
- T2 Lead manages T2 specialists and their domain boundaries
- T2 specialists each own their T3s
- **T3 manages its T4s** — including dependency graph, parallelism, and T5 commissioning
- The runner is thin: bootstrap T1, monitor the blackboard, handle final result and notifications
-
-This means orchestration logic lives in agent prompts and output schemas — not in Python runner code. Adding a new execution pattern means updating a prompt, not the runner.
-
-**Tradeoff:** Debugging is harder. When something fails mid-chain, you read blackboard logs rather than step through central runner code. This is a tooling problem to solve (good blackboard inspection), not a design flaw to avoid.
-
-### Dynamic Paths
-
-Tiers can propose path amendments mid-run (e.g. T3 discovers scope that warrants a T2 pass it didn't get). Amendments are logged to the blackboard. Higher tiers are notified but do not need to approve — it is informational. No tier silently changes the plan.
-
---
-
-## Orchestration Patterns Per Tier
-
-Different tiers suit different internal coordination patterns. These are baked into the runner's tier-handling logic and the tier prompts — not prescribed by T1.
-
-| Tier | Pattern | Rationale |
-|------|---------|-----------|
-| T1 | Single agent, two phases | Must be authoritative; plan phase + accept phase |
-| T2 Lead | Coordinator | Spawned first; defines boundaries + shared assumptions; drives conflict resolution; produces canonical architecture |
-| T2 Specialists | Parallel fan-out | Each works independently within its domain; reads Lead's boundaries + shared assumptions doc before starting |
-| T3 | Light mesh | Peer coordination within same T2 domain to negotiate task boundaries before T4 dispatch |
-| T4 | Swarm + pipeline hybrid | Independent tasks run as swarm; dependent tasks pipeline (T4-A's output feeds T4-B). T3 declares which is which. |
-| T5 | Parallel fan-out + consensus | Each T5 reviews its slice independently, then compares notes for a joint verdict — catches both artifact bugs and integration issues |
-
-### T2 Flow in Detail
-
-1. T1 spawns **T2 Lead Architect** with goal + workstream context
-2. Lead defines explicit **domain boundaries** (who owns what, hard edges)
-3. Lead publishes **shared assumptions doc** — cross-cutting concerns, key conventions, architectural constraints (auth approach, data formats, API patterns, etc.)
-4. T1 spawns **T2 specialists** with boundaries + shared assumptions baked into their briefs
-5. Specialists work in parallel, each within their defined domain
-6. Lead reads all proposals, drives **conflict resolution** with relevant specialists if needed (cycle limit in config — fixed, not per-workstream)
-7. Lead produces **canonical architecture** → written to blackboard as distinct artifact
-8. T1 (Accept phase) validates canonical architecture against goal anchor
-9. Canonical architecture becomes T3 briefs — each T2 specialist hands off to its own T3s
-
---
-
-## Horizontal Scaling Within Tiers
-
-```
-T1 — Phase 1: Plan (self-critique → Andrew approval)
-│
-├── T2: Lead Architect (boundaries + shared assumptions first)
-│   ├── T2: Backend Architect  ─┐
-│   ├── T2: Frontend Architect  ├─ parallel, within defined domains
-│   └── T2: Infra Architect    ─┘
-│       │
-│       └── (Lead synthesises → conflict resolution if needed → canonical architecture)
-│
-├── T2 Backend Architect owns:
-│   ├── T3: API Squad Lead  ─┐
-│   └── T3: DB Squad Lead   ─┴─ light mesh within domain
-│           ├── T4: Worker A  ─┐
-│           ├── T4: Worker B  ─┼─ swarm / pipeline (T3 decides)
-│           └── T4: Worker C  ─┘
-│                   └── T5: Verifier(s) — fan-out + consensus
-│
-└── T1 — Phase 2: Accept (validates against goal anchor → PR)
-```
-
---
-
-## Use Case Flows
-
-T1 assesses complexity and prescribes the tier path per workstream. Three standard depth profiles:
-
-### Full Stack — T1→T2→T3→T4→T5
-*Complex feature, new product, cross-domain changes*
-
-```
-T1 Plan
-  → assess complexity (high)
-  → output T1 Plan Schema (workstreams, tier paths [T2,T3,T4,T5], parallelism, retry budgets)
-  → self-critique pass
-  → GATE: surface to Andrew ← approval required
-
-T2 Lead (spawned by runner after approval)
-  → receive: goal + full workplan
-  → publish: domain boundaries + shared assumptions doc → blackboard
-  → GATE (optional): review boundaries before specialists spawn
-
-T2 Specialists (parallel fan-out, wait on Lead)
-  → each receives: their domain boundary + shared assumptions
-  → produce: architecture proposal for their slice
-  → Lead synthesises, drives conflict resolution if needed
-  → Lead writes: canonical architecture → blackboard
-  → GATE (recommended): review architecture before implementation
-
-Each T2 Specialist → spawns its own T3s (with canonical architecture slice + interface contracts)
-
-T3s (light mesh within T2 domain)
-  → write draft task lists to blackboard
-  → read peers' lists, reconcile boundaries
-  → commit merged task plan before T4 dispatch
-  → GATE (optional): review task breakdown
-
-T4s
-  → swarm: independent tasks run in parallel
-  → pipeline: T4-A output feeds T4-B (T3 declares dependencies)
-  → commit to feature branches
-
-T5s (fan-out per T4 slice)
-  → each reviews its slice independently
-  → T3 collects results → joint verdict
-  → GATE (optional): review T5 verdict before T3 marks done
-  → partial: T3 retries only failed slices
-  → pass: T3 signals workstream done to T2
-
-T2 specialists → signal T2 Lead
-T2 Lead → writes integration summary → blackboard
-
-T1 Accept
-  → validate against goal anchor
-  → open PR, notify_adapter.send(pr summary + url)
-```
-
-### Medium Complexity — T1→T3→T4→T5
-*Config change, isolated bug fix — T1 determines no cross-domain design needed*
-
-```
-T1 Plan
-  → assess: contained scope, single domain, no T2 architecture needed
-  → workplan: tier paths [T3, T4, T5]
-  → GATE: Andrew approval
-
-T3s spawned directly by runner
-  → receive T1 brief with task context (no T2 architecture layer)
-  → T3 light mesh → T4 dispatch → T5 verify → signal done
-
-T1 Accept → PR
-```
-
-### Simple / Hotfix — T1→T4→T5
-*Single file, single function, trivial atomic task*
-
-```
-T1 Plan
-  → assess: trivial, single workstream
-  → tier path: [T4, T5]
-  → GATE: Andrew approval
-
-T4 (coding agent)
-  → single atomic task, commits
-
-T5 (single verifier, not full fan-out)
-  → code review + correctness check
-  → pass → T1 Accept → PR
-```
-
---
-
-## Resolved Mechanics
-
-### T3 Mesh via Blackboard
-
-T3s coordinate task boundaries before dispatching T4s. All coordination goes through the blackboard — no direct agent-to-agent messaging.
-
-1. Each T3 writes its **draft task list** to the blackboard (one row per proposed T4 task, status `draft`)
-2. Each T3 reads all sibling T3 draft lists in its T2 domain
-3. T3s amend their lists to resolve overlaps (claim tasks, release duplicates)
-4. Once all T3s in the domain have committed their final task lists (status `committed`), T4 dispatch begins
-5. No T3 dispatches T4s until all peers in the domain are committed — this prevents duplicate work
-
-The runner monitors for `all_committed` state and can enforce a timeout (config: `t3_mesh_timeout_minutes`).
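The commit barrier in steps 4 and 5 can be sketched as a pure function over blackboard rows. This is a minimal illustration, not the project's implementation: the row shape (`t3_id`, `task`, `status` keys) and the function name `all_committed` are assumptions, and only the `draft` and `committed` status values come from the text above.

```python
from typing import Iterable, Mapping


def all_committed(task_rows: Iterable[Mapping[str, str]], t3_ids: Iterable[str]) -> bool:
    """Return True when every T3 in the domain has only committed task rows."""
    # Hypothetical row shape: {"t3_id": ..., "task": ..., "status": "draft" | "committed"}
    statuses: dict[str, set[str]] = {t3: set() for t3 in t3_ids}
    for row in task_rows:
        if row["t3_id"] in statuses:
            statuses[row["t3_id"]].add(row["status"])
    # A T3 with no rows yet has not committed anything, so it blocks dispatch.
    return all(seen == {"committed"} for seen in statuses.values())


rows = [
    {"t3_id": "api-lead", "task": "auth-middleware", "status": "committed"},
    {"t3_id": "db-lead", "task": "schema-migration", "status": "draft"},
]
ready_before = all_committed(rows, ["api-lead", "db-lead"])
rows[1]["status"] = "committed"
ready_after = all_committed(rows, ["api-lead", "db-lead"])
```

Treating a T3 with no rows as uncommitted is a design choice in this sketch: it keeps a slow or crashed T3 from being skipped silently, which leaves the timeout path as the only way out.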
-
---
-
-### T1 Plan Output Schema
-
-T1's Plan phase produces a structured JSON object written to the blackboard. The runner parses this to bootstrap the pipeline.
-
-```json
-{
-  "run_id": "uuid",
-  "goal_anchor": "Original goal — immutable, propagated to every downstream brief",
-  "complexity": "high | medium | low",
-  "retry_budget_multiplier": 2,
-  "workstreams": [
-    {
-      "id": "ws-backend-api",
-      "name": "Backend API",
-      "domain": "backend",
-      "tier_path": ["t2", "t3", "t4", "t5"],
-      "parallel_group": "A",
-      "t2_specialist": "agents/engineering/engineering-software-architect.md",
-      "notes": "Focus on webhook ingest and retry queue"
-    }
-  ],
-  "parallelism": {
-    "groups": {
-      "A": ["ws-backend-api", "ws-frontend"],
-      "B": ["ws-infra"]
-    },
-    "sequence": ["A", "B"]
-  },
-  "self_critique_summary": "Brief plain-text summary of what T1 identified and amended in its self-critique pass"
-}
-```
-
-`parallel_group` + `sequence` handles inter-workstream dependencies: group A runs in parallel, then B starts after A completes.
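As a rough sketch of how a runner might turn `parallelism` into an execution order, the function below expands `sequence` into ordered waves of workstream ids. The function name `execution_waves` and the error handling are assumptions; the field names follow the schema above.

```python
def execution_waves(plan: dict) -> list[list[str]]:
    """Expand parallelism.sequence into ordered waves of workstream ids."""
    groups = plan["parallelism"]["groups"]
    waves = []
    for name in plan["parallelism"]["sequence"]:
        if name not in groups:
            raise ValueError(f"sequence references unknown group {name!r}")
        # Workstreams inside one wave may run in parallel; waves run in order.
        waves.append(list(groups[name]))
    return waves


plan = {
    "parallelism": {
        "groups": {"A": ["ws-backend-api", "ws-frontend"], "B": ["ws-infra"]},
        "sequence": ["A", "B"],
    }
}
waves = execution_waves(plan)
```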
-
---
-
-### T5 Consensus & Verdict Schema
-
-T3 aggregates all T5 results into a joint verdict after fan-out completes.
-
-**Individual T5 result:**
-```json
-{
-  "verifier_id": "uuid",
-  "scope": "queue-client",
-  "verdict": "pass | fail",
-  "issues": ["issue description..."],
-  "notes": "human-readable summary"
-}
-```
-
-**T3 joint verdict (written to blackboard):**
-```json
-{
-  "t5_results": [...],
-  "joint_verdict": "pass | partial | fail",
-  "failed_scopes": ["queue-client"],
-  "summary": "Human-readable summary for gate surface and logs"
-}
-```
-
-**Split verdict handling:**
- `pass` → T3 marks workstream done, signals T2
- `partial` → T3 retries only the failed T4 slices (up to retry budget), re-runs T5 on those slices
- `fail` → T3 escalates to T2 (or T1 if shallow path)
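The aggregation could look roughly like the sketch below. The text does not state where `partial` ends and `fail` begins, so this example assumes `fail` means every scope failed and `partial` means only some did; the function name `joint_verdict` and the summary wording are also hypothetical.

```python
def joint_verdict(t5_results: list[dict]) -> dict:
    """Aggregate individual T5 results into a T3 joint verdict."""
    if not t5_results:
        # T5 is mandatory, so an empty result set is treated as an error here.
        raise ValueError("no T5 results to aggregate")
    scopes = {r["scope"] for r in t5_results}
    failed = sorted({r["scope"] for r in t5_results if r["verdict"] == "fail"})
    if not failed:
        verdict = "pass"
    elif len(failed) == len(scopes):
        verdict = "fail"      # assumption: every scope failed
    else:
        verdict = "partial"   # assumption: only some scopes failed
    return {
        "t5_results": t5_results,
        "joint_verdict": verdict,
        "failed_scopes": failed,
        "summary": f"{len(failed)} of {len(scopes)} scopes failed",
    }


result = joint_verdict([
    {"verifier_id": "v1", "scope": "queue-client", "verdict": "fail", "issues": ["timeout"], "notes": ""},
    {"verifier_id": "v2", "scope": "auth-middleware", "verdict": "pass", "issues": [], "notes": ""},
])
```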
-
---
-
-### Spawn Call Ownership
-
-The runner is the single point of contact with the runtime adapter. Tiers do not call `sessions_spawn` directly — they write output to the blackboard and the runner acts on it.
-
-**Flow:**
-1. A tier completes and writes child briefs to the `briefs` table with `status=pending`
-2. Runner's spawn loop detects pending rows
-3. If a gate is configured at this tier boundary → runner writes `gate_pending`, notifies Andrew, halts
-4. On `gate_approved` → runner calls `runtime_adapter.spawn()` for each pending brief
-5. Spawned agent runs, writes its own child briefs as pending when done → loop continues
-
-This keeps gate logic in one place (the runner's spawn loop), makes all spawn calls auditable from a single location, and means agents only need blackboard read/write access — no runtime adapter tool access required.
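One pass of the spawn loop might be sketched as follows, using an in-memory SQLite table. Only the `briefs` table name and `status=pending` come from the flow above; the column layout, the `spawned` status value, the `gate_approved` flag, and the `spawn` callable standing in for `runtime_adapter.spawn()` are assumptions made for illustration.

```python
import sqlite3
from typing import Callable


def spawn_pending(conn: sqlite3.Connection, spawn: Callable[[str], None], gate_approved: bool) -> list[str]:
    """Spawn every pending brief, or hold them all while the gate is unapproved."""
    if not gate_approved:
        return []  # the gate holds the whole tier boundary
    rows = conn.execute(
        "SELECT id, payload FROM briefs WHERE status = 'pending' ORDER BY id"
    ).fetchall()
    spawned = []
    for brief_id, payload in rows:
        spawn(payload)  # stand-in for the runtime adapter call
        conn.execute("UPDATE briefs SET status = 'spawned' WHERE id = ?", (brief_id,))
        spawned.append(brief_id)
    conn.commit()
    return spawned


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE briefs (id TEXT PRIMARY KEY, status TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO briefs VALUES (?, 'pending', ?)",
    [("b1", "task one"), ("b2", "task two")],
)
calls: list[str] = []
held = spawn_pending(conn, calls.append, gate_approved=False)
released = spawn_pending(conn, calls.append, gate_approved=True)
```

Because the gate check sits in front of the query, agents never see gate state at all, which matches the stated goal of keeping gate logic in the runner.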
-
---
-
-### Gate Approval UX
-
-**Core mechanic (platform-agnostic):**
-
-1. Runner writes `gate_pending` to blackboard
-2. Runner calls `notify_adapter.send()` with tier summary + gate context (`run_id`, `gate`, `summary`, `what_happens_next`)
-3. Runner polls blackboard for `gate_approved` or `gate_rejected`
-4. `agency approve <run_id>` / `agency reject <run_id> --reason "..."` writes the event directly to the blackboard — the universal approval path, works on any platform with filesystem access
-
-The runner never reads from a state file and never talks to a notify adapter for inbound responses. It only polls the blackboard.
-
-**Adapter responsibility:**
-Each notify adapter handles its own inbound response routing. How a human's approval gets translated into an `agency approve` CLI call is entirely the adapter's concern — not core. Example: an OpenClaw adapter bridges a chat reply to the CLI. A Slack adapter wires up a slash command. A webhook adapter listens on an endpoint. All produce the same result: `gate_approved` written to blackboard.
-
-Any internal state the adapter needs to resolve ambiguous responses (e.g. which run_id an approval refers to when multiple gates are pending) is managed by the adapter, not the core.
-
---
-
-### T3 Mesh Timeout
-
-If T3s in a domain fail to commit their task lists within `t3_mesh_timeout_minutes`:
-
-1. **Runner escalates to T2** — writes a `gate_pending` escalation event and notifies the T2 specialist that owns the domain. Context: which T3s timed out, what draft lists (if any) exist on the blackboard. T2 re-scopes or clarifies domain boundaries, spawns fresh T3 briefs.
-
-2. **If T2 also exhausts its retry budget** → normal escalation ladder: T2 failure → T1 handles → T1 failure → Andrew gate.
-
-Force-committing partial draft lists (optimistic fallback) is explicitly not done — it hides the boundary problem and produces conflicting or duplicate T4 tasks that fail later with less context.
-
---
-
-### Path Amendment Mechanism
-
-When a mid-run tier discovers scope that warrants a different tier path than T1 prescribed:
-
-1. The discovering tier writes a `path_amendment` event to the blackboard:
-```json
-{
-  "kind": "path_amendment",
-  "proposed_by": "t3/ws-backend-api",
-  "reason": "Discovered auth dependency requires T2 architectural pass",
-  "amendment": {
-    "workstream": "ws-backend-api",
-    "add_tiers": ["t2"],
-    "insert_before": "t3"
-  }
-}
-```
-2. The runner monitors the events table, detects `path_amendment`, and sends a system event notification to the relevant higher tier
-3. The higher tier is **informed, not blocked** — it acknowledges and adjusts its understanding
-4. Amendment is logged on the blackboard for audit; no approval gate required (the next scheduled human gate will surface it)
-
-No agent needs callback plumbing. The runner is the notification bridge.
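To make the `amendment` payload concrete, the sketch below computes the tier path a workstream would have if the amendment were applied. The text describes the amendment as informational, so whether and where the path is actually rewritten is not specified; the function name `apply_amendment` is hypothetical.

```python
def apply_amendment(tier_path: list[str], amendment: dict) -> list[str]:
    """Return a new tier path with add_tiers inserted before insert_before."""
    # list.index raises ValueError if insert_before is not in the path.
    idx = tier_path.index(amendment["insert_before"])
    # Skip tiers that are already present so a repeated amendment is harmless.
    additions = [t for t in amendment["add_tiers"] if t not in tier_path]
    return tier_path[:idx] + additions + tier_path[idx:]


amendment = {"workstream": "ws-backend-api", "add_tiers": ["t2"], "insert_before": "t3"}
amended = apply_amendment(["t3", "t4", "t5"], amendment)
```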
-
---
-
-## Shared State
-
-For software pipelines, **the repo is the primary blackboard**:
- T4 workers commit to feature branches
- T3 leads review and merge to workstream branches
- T2 architects own integration branches
- T1 does final integration and acceptance
-
-Supplemented by a SQLite coordination store per run tracking:
- In-flight workstreams and their current execution plans
- Handoff artifacts and tier status
- Retry counts and escalation history
- Path amendments (proposed, by whom, timestamp)
-
---
-
-## Failure Handling
-
-Distributed ownership — each tier handles failures in the tier below it. The runner only handles T1 failure and terminal human escalation.
-
-| Failure | Owner | Handler | Action |
-|---------|-------|---------|--------|
-| T4 bad output | T3 | `escalation.py` called by T3's context | Retry T4 with corrected brief (up to retry_budget) |
-| T4 blocked | T3 | `escalation.py` | Escalate to T3 immediately — no retries |
-| T4 partial output | T3 | `escalation.py` | Salvage good parts, re-task remainder |
-| T5 partial verdict | T3 | T3 joint verdict logic | Retry failed T4 slices only |
-| T5 full fail | T3 | T3 joint verdict logic | Escalate to T2 |
-| T3 workstream stuck | T2 | T2 specialist prompt + blackboard | Re-scope or split the workstream |
-| T2 design wrong | T1 | T1 Accept phase + blackboard | Re-plan; may discard workstream and restart |
-| T1 failure / crash | Runner | `team_runner.py` | Surface to human, halt run |
-| Repeated escalation | Runner | `team_runner.py` | Gate: block until human unblocks |
-
-**Key distinction:** `escalation.py` is not called by the runner centrally. It is logic that tier agents execute (or the runner executes on their behalf when it detects a timeout or dead agent). The runner only owns the last two rows.
-
-Retry limits prevent infinite loops. Escalation path is always upward, never sideways.
-
-T1 sets a retry budget multiplier during scope assessment (`1x` simple, `2x` complex). Retry budget is a field on the task brief — not hardcoded in the runner.
-
---
-
-## Agent Talent Pool
-
-The system builds on [agency-agents](https://github.com/msitarzewski/agency-agents) — a library of 50+ pre-built specialist personalities, each with deep domain expertise, quality standards, and specific deliverables.
-
-**Division of responsibility:**
- Our system provides: orchestration, tier structure, task briefs, retries, verification gates, shared state
- Agency-agents provides: the specialist knowledge each agent brings to its role
-
-T1 selects the right specialist from the roster when building workstream briefs. The specialist's personality is injected as the system prompt at spawn time.
-
-**Default tier-to-specialist mapping for software pipelines:**
-
-| Tier | Domain | Agent |
-|------|--------|-------|
-| T1 | Strategy | nexus-strategy |
-| T2 | Backend | software-architect |
-| T2 | Infra | devops-automator |
-| T2 | Data | data-engineer |
-| T3 | Backend | senior-developer |
-| T3 | Reliability | sre |
-| T4 | Frontend | frontend-developer |
-| T4 | Backend | backend-architect |
-| T4 | Database | database-optimizer |
-| T4 | DevOps | devops-automator |
-| T4 | Mobile | mobile-app-builder |
-| T4 | AI/ML | ai-engineer |
-| T4 | Security | security-engineer |
-| T4 | Docs | technical-writer |
-| T5 | Code review | code-reviewer |
-| T5 | Integration | testing-reality-checker |
-| T5 | API | testing-api-tester |
-| T5 | Performance | testing-performance-benchmarker |
-| T5 | Security | security-engineer |
-
-The roster is not fixed — T1 can select any agent from the library based on workstream needs.
-
---
-
-## Adapter Layers
-
-Everything external is a swappable adapter. Core logic never imports from adapters directly — always through an interface.
-
-```
-Core (platform-agnostic)
-├── team_runner      — thin bootstrap: spawn T1, monitor blackboard, handle result
-├── blackboard       — SQLite coordination state
-├── task_brief       — schema + validation
-└── escalation       — retry logic, failure routing
-
-Adapters (swappable)
-├── llm/             — anthropic (now), openai, ollama, any API
-├── notify/          — openclaw (now), slack, email, webhook...
-├── vcs/             — github (now), gitlab, gitea, bare git...
-└── runtime/
-    ├── standard     — openclaw sessions_spawn (T1/T2/T3)
-    └── coding_agent — claude_code (T4/T5 default), codex, aider...
-```
-
-Swapping providers means writing a new adapter file — nothing in core changes.
-
-T4 and T5 default to the **coding agent runtime** when available and fall back gracefully to the standard runtime if it is not configured.
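The commit history for this change describes resolving adapters from `'module.path:ClassName'` strings with `importlib.import_module`. A minimal sketch of that idea follows; the function name `load_class`, the registry contents, and the use of standard-library classes as stand-ins for real adapters are all assumptions, not the code in `team_runner.py`.

```python
import importlib


def load_class(spec: str, registry: dict[str, str]) -> type:
    """Resolve a short registry name or a full 'module.path:ClassName' string."""
    target = registry.get(spec, spec)  # short names map to full targets
    module_path, sep, class_name = target.partition(":")
    if not sep or not module_path or not class_name:
        raise ValueError(f"expected 'module.path:ClassName', got {target!r}")
    module = importlib.import_module(module_path)  # imported lazily, at call time
    return getattr(module, class_name)


# Stand-in registry: a real one would point at adapter modules instead.
REGISTRY = {"ordered": "collections:OrderedDict"}

cls_short = load_class("ordered", REGISTRY)          # via registry short name
cls_full = load_class("json:JSONDecoder", REGISTRY)  # via full path, no registry entry
```

With this shape, a third-party adapter needs no registry entry at all: a full path in the config resolves the same way as a short name.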
-
---
-
-## Run Visibility Layer
-
-Designed for debugging, test runs, and quality evaluation at each tier. Three interlocking components.
-
-### 1. Human-Readable Live Log
-
-Structured events from the blackboard rendered as a timestamped, readable stream. `agency watch <run_id>` tails this live.
-
-```
-[abc123] 12:30:01  T1   PLAN_START    Assessing scope: "Build webhook ingestion system"
-[abc123] 12:30:14  T1   PLAN_DONE     3 workstreams — backend-api, infra, docs (2 parallel)
-[abc123] 12:30:14  GATE APPROVAL      ⏸  Waiting on approval before T2 spawns
-[abc123] 12:31:02  GATE APPROVED      ✓  Approved — continuing
-[abc123] 12:31:03  T2   LEAD_START    Lead Architect spawned
-[abc123] 12:31:41  T2   BOUNDS_READY  Domain boundaries + shared assumptions published
-[abc123] 12:31:42  T2   SPEC_START    3 specialists spawned (parallel): backend, infra, docs
-[abc123] 12:32:15  T2   SPEC_DONE     backend-api architecture draft ready
-[abc123] 12:32:58  T2   SYNTH_DONE    Canonical architecture written to blackboard
-[abc123] 12:32:58  GATE INSPECTION    ⏸  T2 synthesis ready for review
-[abc123] 12:33:44  T3   MESH_START    backend-api: 2 squad leads negotiating task boundaries
-[abc123] 12:34:01  T3   MESH_DONE     Task split committed — 7 T4 tasks (5 swarm, 2 pipeline)
-[abc123] 12:34:02  T4   SWARM_START   5 workers spawned in parallel
-[abc123] 12:35:10  T4   DONE          worker-3 auth-middleware ✓
-[abc123] 12:35:22  T4   FAIL          worker-4 queue-client ✗  (retry 1/3)
-[abc123] 12:36:04  T4   DONE          worker-4 queue-client ✓  (retry resolved)
-[abc123] 12:36:05  T5   VERIFY_START  4 verifiers spawned
-[abc123] 12:36:45  T5   VERDICT       partial — queue-client needs rework
-[abc123] 12:37:12  T5   VERDICT       ✓  all pass — workstream backend-api done
-```
-
-Log level `verbose` adds per-T4-start/done lines. Default is `normal` (tier-level events only).
-
-### 2. Inspection Gates
-
-Configurable pause points. When the runner hits a gate, it:
-1. Writes a `gate_pending` event to the blackboard
-2. Fires `notify_adapter.send()` with the tier summary + gate context
-3. Halts — no next tier spawns until `gate_approved` or `gate_rejected` is written
-
-The tier summary surfaced at each gate includes:
- **What was produced** (the tier artifact in readable form)
- **What happens next** (which agents will spawn, doing what)
- **Any anomalies** flagged by the tier itself
-
-Configurable in `team.yaml` under `visibility.inspection_gates`. A `strict_mode: true` flag enables all gates — recommended for first runs on a new codebase or new goal type.
-
-```yaml
-visibility:
-  strict_mode: false
-  log_level: normal           # normal | verbose
-  inspection_gates:
-    t1_plan: true             # always — required by design
-    t2_lead: false            # optional — review boundaries before specialists
-    t2_synthesis: true        # recommended — review architecture before implementation
-    t3_plan: false            # verbose — useful early on, disable once T3 is trusted
-    t5_verdict: false         # review T5 joint verdict before T3 marks workstream done
-  gate_timeout_minutes: 60    # auto-reject if no response within this window
-```
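How `strict_mode` interacts with the individual gate flags could be resolved along these lines. This is a sketch under assumptions: the function name `effective_gates` is hypothetical, and forcing `t1_plan` on regardless of config is inferred from the "always" comment in the YAML above rather than stated as runner behaviour.

```python
def effective_gates(visibility: dict) -> dict[str, bool]:
    """Resolve which inspection gates are active for a run."""
    gates = dict(visibility.get("inspection_gates", {}))
    if visibility.get("strict_mode", False):
        # strict_mode enables every configured gate.
        gates = {name: True for name in gates}
    # Assumption: the plan gate cannot be disabled, per the config comment.
    gates["t1_plan"] = True
    return gates


config = {
    "strict_mode": False,
    "inspection_gates": {"t1_plan": True, "t2_lead": False, "t2_synthesis": True},
}
normal = effective_gates(config)
strict = effective_gates({**config, "strict_mode": True})
```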
-
-### 3. Inspection CLI — `cli/agency.py`
-
-```
-agency run <config.yaml>               # start a run, returns run_id
-agency watch <run_id>                  # tail live log (follows blackboard events)
-agency inspect <run_id>                # interactive tree view of run state
-agency inspect <run_id> --tier t2      # jump to T2 artifacts
-agency inspect <run_id> --brief <id>   # show full brief + result JSON
-
-agency approve <run_id>                # approve current gate → continue
-agency approve <run_id> --note "..."   # approve with a note written to blackboard
-agency reject <run_id> --reason "..."  # reject → tier re-invoked
-agency pause <run_id>                  # force-pause at next tier boundary
-agency resume <run_id>                 # release a manual pause
-```
-
-`agency inspect` (no flags) renders a live tree:
-```
-Run abc123 — "Build webhook ingestion system"
-├── T1 Plan ✓
-│   └── [view workplan]
-├── T2 Architecture ✓  [GATE: pending review]
-│   ├── [view domain boundaries]
-│   ├── [view shared assumptions]
-│   └── [view canonical architecture]
-├── T3 backend-api (active)
-│   ├── [view task breakdown]
-│   └── T4 workers: 3/7 done, 1 retrying, 3 pending
-└── T3 infra (pending)
-```
-
-### Blackboard Event Vocabulary (extended)
-
-```python
-# existing
-"spawned" | "completed" | "failed" | "escalated" | "retried"
-
-# new — visibility layer
-"gate_pending"     # runner hit a gate, waiting for human
-"gate_approved"    # human approved, run continues
-"gate_rejected"    # human rejected, tier re-invoked
-"gate_paused"      # manual pause via CLI
-"gate_resumed"     # manual resume via CLI
-"path_amendment"   # mid-run tier proposed path change
-"log"              # human-readable log line (level + message)
-```
-
---
-
-## Decisions Log
-
-**T1 dynamic dispatch** — T1 assesses scope and prescribes tier path and workstream parallelism. It does not prescribe internal tier coordination patterns.
-
-**T1 two-phase lifecycle** — T1 has two explicit named phases: Plan and Accept. Plan phase includes self-critique (single pass) then human approval gate before T2s spawn. Accept phase validates final output against goal anchor. Both phases tracked on blackboard with distinct prompts.
-
-**T1 self-critique** — Single pass only. Diminishing returns on multiple self-critique iterations; the human review after is the real safety net. Self-critique catches obvious gaps; Andrew catches strategic ones.
-
-**Distributed ownership** — Each tier owns the layer below it. Runner is thin. Tradeoff: distributed control makes the system extensible but debugging requires good blackboard tooling, not central runner traces.
-
-**T5 always mandatory** — No skipping verification. Things should work and work well before surfacing to T1.
-
-**T3 owns T4 and T5** — T3 manages its T4s (dependency graph, swarm vs pipeline, parallelism) and commissions T5 verification of T4 outputs. Runner does not orchestrate T4/T5 centrally.
-
-**T2 Lead Architect** — Dedicated T2 role, not a new tier. Spawned first by T1. Owns: domain boundary definition, shared assumptions doc, conflict resolution between specialists, canonical architecture synthesis. Specialists spawn after Lead publishes boundaries + assumptions. Each T2 specialist owns its own T3s — no T3 spans T2 domains.
-
-**T2 conflict resolution** — Lead sends targeted briefs back to conflicting specialists. Cycle limit is a fixed config value (not per-workstream). This parallels T1's single self-critique pass: a fixed limit, not a variable one.
-
-**T2 shared assumptions** — Lead publishes cross-cutting concerns (auth, data formats, API conventions, etc.) before specialists start. Specialists design with shared baseline; implicit dependencies pre-empted rather than caught in synthesis.
-
-**Orchestration patterns** — Baked into tier prompts and runner tier-handling logic, not prescribed by T1. T2: Lead + parallel specialists. T3: light mesh within T2 domain. T4: swarm+pipeline. T5: fan-out+consensus.
-
-**Output / review** — Nothing merges to main without explicit human approval. T1 opens a PR and fires `notify_adapter.send()` with the PR summary. Merge is gated on human sign-off. The notify adapter implementation determines how the notification is delivered.
-
-**Platform agnosticism** — Core is provider and platform agnostic. Capability levels (`reasoning-heavy`, `capable`, `fast-cheap`) map to models in config. Mixing providers across tiers is supported.
-
-**LLM provider** — Anthropic first implementation. Config supports per-tier provider selection.
-
-**Gateway modification** — Decided against. Agent-teams stays standalone Python. OpenClaw used via runtime adapter only.
-
-**Coding agent runtime** — Claude Code is default T4/T5 runtime. Opt-in `native_teams` flag available for internal Claude Code parallelism — faster but less blackboard visibility. Default `false`.
-
-**Agency-agents integration** — Via git submodule at `agents/`. T1 selects specialists via `config/role_registry.yaml`. `agent_personality` field on task brief; runtime injects as system prompt at spawn time.
-
-**Spawn call ownership** — Runner is the single point of contact with the runtime adapter. Tiers write `status=pending` child briefs to the blackboard; runner's spawn loop detects and spawns them. Gate logic (hold on `gate_pending`) lives in the spawn loop — no gate plumbing needed in agents. Agents only need blackboard read/write access.
-
-**Gate approval UX** — `agency approve <run_id>` CLI is the universal approval path — writes `gate_approved` directly to blackboard. Runner only polls blackboard; it does not depend on any specific notification platform. Each notify adapter handles its own inbound response bridge as internal adapter state. Core has no `pending_gates.json` or platform-specific approval logic.
-
-**T3 mesh timeout** — Escalate to T2 (the specialist that owns the domain). Timeout means T3s can't agree on task boundaries — a domain boundary problem T2 should fix by re-scoping. If T2 exhausts its retry budget, normal escalation ladder handles it (T1 → Andrew gate). No force-commit fallback.
-
-**T3 mesh mechanics** — Blackboard-based coordination. T3s write draft task lists, read peers', reconcile overlaps, commit merged plan. No T4 dispatch until all T3s in the domain have committed. Runner enforces timeout (`t3_mesh_timeout_minutes` in config). Chosen over designated T3 lead or direct messaging — fits distributed ownership model, gives full audit trail for free.
-
-**T1 output schema** — Formal JSON schema defined (2026-03-30). Fields: `run_id`, `goal_anchor`, `complexity`, `retry_budget_multiplier`, `workstreams[]` (id, name, domain, tier_path, parallel_group, t2_specialist, notes), `parallelism` (groups + sequence), `self_critique_summary`. `parallel_group` + `sequence` handles inter-workstream dependencies.
-
-**T5 consensus** — T3 aggregates all T5 results into joint verdict: `pass | partial | fail`. Split verdict (`partial`) → T3 retries only failed slices, re-runs T5 on those slices. Full `fail` escalates to T2. T3 writes structured joint verdict to blackboard; this is what the optional T5 gate surfaces to Andrew.
-
-**Path amendment mechanism** — Amending tier writes `path_amendment` event to blackboard (structured JSON: proposed_by, reason, amendment). Runner monitors events table, sends system event notification to relevant higher tier. Higher tier is informed, not blocked. No agent callback plumbing. Amendments surface at next scheduled human gate.
-
-**Failure handling (distributed)** — Confirmed distributed ownership (2026-03-30). `escalation.py` is logic tiers execute (or runner executes on tier's behalf on timeout/crash), not a central runner concern. Runner only owns: T1 failure, terminal human escalation. See updated Failure Handling table.
-
-**Run visibility layer** — Added 2026-03-30. Human-readable live log, configurable inspection gates, and `cli/agency.py` inspection/control commands. Designed for debugging and quality evaluation at each tier during early runs. `strict_mode: true` enables all gates. Gates surface tier artifacts + "what happens next" summary via `notify_adapter.send()` — platform-agnostic. Resolves Q3 (T5 consensus surfaces as gate event with human-readable summary). T5 gate (optional) lets the operator review joint verdict before T3 marks workstream done.
Author	SHA1	Message	Date
hans-heinemann	71316b3090	refactor(team_runner): replace static adapter imports with dynamic importlib loading Concrete adapter classes (AnthropicAdapter, GitHubAdapter, etc.) are no longer imported at the top of team_runner.py. Instead, each registry maps short names to 'module.path:ClassName' strings resolved lazily via importlib.import_module at instantiation time. This means: - Adding a new adapter requires only an entry in the registry string dict (or a full dotted path directly in team.yaml) — no changes to TeamRunner. - Third-party / custom adapters work out of the box: set e.g. adapters.llm: mypackage.llm.openai:OpenAIAdapter in team.yaml. - The runner no longer hard-wires knowledge of which concrete classes exist. Addresses tandrewng review comment on PR #1.	2026-03-16 00:30:28 -04:00
hans-heinemann	bd96a83069	fix: derive LLM provider from adapter, not config Remove redundant models.provider from team.yaml. Each adapter knows its own provider key — AnthropicAdapter always looks up 'anthropic' in the capability_map. This avoids a footgun where adapters.llm and models.provider could disagree. Future adapters (OpenAIAdapter, OllamaAdapter) will hardcode their own key the same way.	2026-03-15 23:47:52 -04:00
hans-heinemann	60576fbf2f	fix: remove hardcoded max_tokens/temperature from _dispatch_via_llm Both values are now sourced from team.yaml (models.default_max_tokens and models.default_temperature) via the adapter's __init__, eliminating the last hardcoded magic numbers. Callers can still override per-call via context dict if needed.	2026-03-15 21:43:01 -04:00
hans-heinemann	8524b63a76	fix: read default_temperature from team.yaml; update docstrings - Add default_temperature: 0 to config/team.yaml models block - Read self._default_temperature from models cfg in __init__ - Use self._default_temperature as fallback in complete() instead of hardcoded 0 - Update class docstring to document both default_max_tokens and default_temperature - Update complete() context param docs to reference team.yaml keys	2026-03-15 21:40:05 -04:00
hans-heinemann	6856f10c27	fix(adapter/llm): make max_tokens configurable via team.yaml models.default_max_tokens	2026-03-15 18:55:57 -04:00
hans-heinemann and Claude Sonnet 4.6	e097f4be21	feat(core): implement TeamRunner orchestration loop Full T1→T5 pipeline orchestration with adapter registry, escalation, and blackboard event emission. Key design decisions: - Adapter registry maps config keys to concrete classes; VCS and notify are optional (swallow init errors and degrade gracefully) - _dispatch_brief() routes to LLM adapter (standard) or coding runtime (coding_agent) based on brief.preferred_runtime - _run_with_escalation() drives the retry/salvage loop: persists amended briefs to the Blackboard before each re-submission - Tier parsers (_parse_t1/t2/t3_output) build child TaskBriefs, preserving the goal_anchor invariant and resolving agent personalities from the registry - T5 Verifier is always spawned after T4; VCS commit only happens on verified pass (status "passed" or "done") - --dry-run flag: logs all actions, skips LLM, VCS, and notify calls - Exposes CLI via `python -m core.team_runner` with --config, --dry-run, --verbose flags Co-Authored-By: Claude Sonnet 4.6 <[email protected]>	2026-03-15 03:15:37 -04:00
hans-heinemann and Claude Sonnet 4.6	97e7be80d1	feat(adapter/runtime): implement OpenClaw and ClaudeCode runtime adapters OpenClawRuntimeAdapter: - spawn() shells out to `openclaw session spawn --task <t> --mode run` - get_result() polls `openclaw session get <id>` until terminal status or timeout - kill() calls `openclaw session kill <id>`, silently succeeds if finished - Parses JSON or raw-text session IDs; raises NotImplementedError with helpful message when openclaw CLI is absent from PATH ClaudeCodeRuntimeAdapter: - spawn() launches `claude --permission-mode bypassPermissions --print <task>` in a temp dir (or context["workdir"]), returns a UUID job_id - Tracks all Popen instances in a thread-safe dict - get_result() calls communicate(timeout=...), raises TimeoutError on timeout - kill() terminates the Popen; silently ignores already-finished processes Co-Authored-By: Claude Sonnet 4.6 <[email protected]>	2026-03-15 03:15:21 -04:00
hans-heinemann and Claude Sonnet 4.6	c88c4309ac	feat(adapter/notify): implement OpenClawNotifyAdapter Sends notifications via `openclaw system event --text <msg> --mode now`. - Always logs locally (info/warning/error) regardless of CLI availability - Gracefully handles FileNotFoundError (openclaw not on PATH) and TimeoutExpired; notifications are best-effort and never crash the pipeline - OPENCLAW_SIGNAL_NUMBER env var stored for future direct-signal support Co-Authored-By: Claude Sonnet 4.6 <[email protected]>	2026-03-15 03:15:13 -04:00
hans-heinemann and Claude Sonnet 4.6	b212082b58	feat(adapter/vcs): implement GitHubAdapter Uses PyGithub to interact with the GitHub REST API. - Reads GITHUB_TOKEN from env; parses owner/repo from SSH or HTTPS URL - create_branch() creates a branch off the configured base branch - commit() accepts dict[str, str] {path: content} or list[str] of local paths; uses Contents API (create_file / update_file) - create_pr() and get_pr_status() delegate to PyGithub pull-request API Co-Authored-By: Claude Sonnet 4.6 <[email protected]>	2026-03-15 03:15:06 -04:00
hans-heinemann and Claude Sonnet 4.6	9646a146bc	feat(adapter/llm): implement AnthropicAdapter Implements AnthropicAdapter using the anthropic SDK. - Reads ANTHROPIC_API_KEY from env; raises ValueError if missing - resolve_model() looks up capability_map in team.yaml config, falls back to "capable" tier then hard-coded claude-sonnet-4-6 - complete() supports system_prompt, max_tokens (default 4096), and temperature (default 0) via the context dict - Adds PyGithub to requirements.txt (needed by GitHubAdapter) Co-Authored-By: Claude Sonnet 4.6 <[email protected]>	2026-03-15 03:15:01 -04:00