chore: update submodule URL to Gitea

docs: add CLAUDE.md agent quick reference
2026-04-02 10:05:09 -04:00 · 2026-03-30 15:19:07 -04:00 · 2026-03-30 15:18:48 -04:00 · 2026-03-30 15:18:30 -04:00 · 2026-03-30 14:31:55 -04:00 · 2026-03-30 14:22:39 -04:00
12 changed files with 2009 additions and 141 deletions
--- a/.gitmodules
+++ b/.gitmodules
@@ -1,3 +1,3 @@
 [submodule "agents"]
 	path = agents
-	url = https://github.com/coding-with-hans-heinemann/agency-agents.git
+	url = https://git.tandrewng.com/cw-hans/agency-agents.git
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,48 @@
 # CLAUDE.md — Agent Quick Reference
 Read this before exploring the codebase. It saves tokens.
 ## What This Is
 A tiered multi-agent orchestration framework. T1 decomposes goals → T2 architects → T3 leads → T4 implements → T5 verifies. SQLite blackboard tracks state. All external dependencies (LLM, VCS, notify, runtime) are pluggable adapters.
 ## Key Docs
 - `docs/design.md` — architecture decisions, tier design, key choices
 - `docs/buildspec.md` — 15-step build order, phase breakdown
 ## Project Layout
 ```
 core/           — task_brief.py, blackboard.py, escalation.py, team_runner.py
 adapters/base/  — abstract base classes (LLMAdapter, VCSAdapter, NotifyAdapter, RuntimeAdapter)
 adapters/llm/   — anthropic.py
 adapters/vcs/   — github.py
 adapters/notify/ — openclaw.py
 adapters/runtime/ — openclaw.py, claude_code.py
 prompts/        — T1–T5 system prompt .md files
 config/         — team.yaml (run config), role_registry.yaml (tier→role→persona)
 agents/         — git submodule, agent persona .md files
 runs/           — per-run blackboard.db files (gitignored)
 ```
 ## Conventions
 - **Never commit or push directly to `main`** — always branch (`hans/...` or `feature/...`) and PR
 - New adapters: subclass the relevant `adapters/base/*.py` abstract class
 - New roles: add persona `.md` to `agents/` submodule + entry in `config/role_registry.yaml`
 - Failure handling lives in `core/escalation.py` — extend `FailureType` there
 - `TaskBrief` is the canonical work unit — all tiers pass briefs to each other
 - Blackboard is the single source of truth per run — always write events there
 ## Current State
 Phase 2 adapter implementations exist. `core/team_runner.py` may still have stubs — check before assuming it's wired up end-to-end.
 ## Running
 ```bash
 python -m venv .venv && source .venv/bin/activate
 pip install -r requirements.txt
 python -m core.team_runner --config config/team.yaml
 ```
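
Review note: the "New adapters" convention above can be sketched as follows. The `LLMAdapter` class here is a local stand-in whose two methods mirror the signatures visible in `adapters/llm/anthropic.py` in this diff; the real base class in `adapters/base/llm.py` is not shown and may differ. `EchoAdapter` is a hypothetical name used only for illustration.

```python
from abc import ABC, abstractmethod


class LLMAdapter(ABC):
    """Stand-in for the abstract base class (assumed shape, not the real file)."""

    @abstractmethod
    def complete(self, prompt: str, capability: str, context: dict) -> str:
        """Return the model's text response for the prompt."""

    @abstractmethod
    def resolve_model(self, capability: str) -> str:
        """Map a capability string to a provider-specific model id."""


class EchoAdapter(LLMAdapter):
    """Hypothetical adapter that echoes the prompt; handy for offline tests."""

    def __init__(self, config: dict) -> None:
        self._config = config

    def complete(self, prompt: str, capability: str, context: dict) -> str:
        # No network call: tag the prompt with the resolved model name.
        return f"[{self.resolve_model(capability)}] {prompt}"

    def resolve_model(self, capability: str) -> str:
        return f"echo-{capability}"
```

Because both methods are abstract, a subclass that forgets one of them fails at instantiation time rather than at first use.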
--- a/adapters/llm/anthropic.py
+++ b/adapters/llm/anthropic.py
@@ -1,16 +1,15 @@
 """
 adapters/llm/anthropic.py
-Anthropic Claude adapter — Phase 2 stub.
+Anthropic Claude LLM adapter — Phase 2 implementation.
-TODO (Phase 2):
+Uses the ``anthropic`` SDK to call Claude models.  Model selection is driven
-  - Implement complete() using the anthropic SDK (anthropic.Anthropic client).
+by the capability_map in team.yaml so the adapter stays provider-agnostic in
-  - Implement resolve_model() by reading config/team.yaml capability_map.
+configuration.
  - Handle streaming responses, rate-limit retries, and token counting.
  - Support system-prompt injection via context["system_prompt"].
  - Map capability → model using the provider's capability_map config.
 """
 from __future__ import annotations
 import os
 from adapters.base.llm import LLMAdapter
@@ -18,27 +17,123 @@ class AnthropicAdapter(LLMAdapter):
    """
    LLM adapter for Anthropic Claude models.
-    Reads model configuration from config/team.yaml:
+    Reads model configuration from the loaded team.yaml config dict::
-        models.provider: anthropic
+
-        models.capability_map.reasoning-heavy.anthropic: claude-opus-4-6
+        models:
-        models.capability_map.capable.anthropic: claude-sonnet-4-6
+          default_max_tokens: 4096   # fallback max_tokens for all calls
-        models.capability_map.fast-cheap.anthropic: claude-haiku-3-5
+          default_temperature: 0     # fallback temperature for all calls
          capability_map:
            reasoning-heavy:
              anthropic: claude-opus-4-6
            capable:
              anthropic: claude-sonnet-4-6
            fast-cheap:
              anthropic: claude-haiku-3-5
    The provider key used when looking up ``capability_map`` is hardcoded to
    ``"anthropic"`` — the adapter knows its own provider; there is no need for
    a separate ``models.provider`` config field.
    Both ``default_max_tokens`` and ``default_temperature`` can be overridden
    per-call via the ``context`` dict passed to :meth:`complete`.
    Environment variables
    ---------------------
    ANTHROPIC_API_KEY : Required. Authenticates with the Anthropic API.
    """
    def __init__(self, config: dict) -> None:
-        # TODO (Phase 2): Accept loaded team.yaml config dict.
+        """
-        # Extract API key from environment (ANTHROPIC_API_KEY).
+        Initialise the Anthropic adapter.
-        # Initialise the anthropic.Anthropic() client.
+
-        raise NotImplementedError("AnthropicAdapter.__init__ is not yet implemented.")
+        Parameters
        ----------
        config : Loaded team.yaml config dict.
        Raises
        ------
        ValueError
            If ANTHROPIC_API_KEY is not set in the environment.
        """
        try:
            import anthropic as _anthropic
        except ModuleNotFoundError as exc:
            raise ImportError(
                "The 'anthropic' package is required for AnthropicAdapter. "
                "Install it with: pip install anthropic"
            ) from exc
        self._config = config
        api_key = os.environ.get("ANTHROPIC_API_KEY")
        if not api_key:
            raise ValueError(
                "ANTHROPIC_API_KEY environment variable is not set. "
                "Export it before running the-agency."
            )
        self._client = _anthropic.Anthropic(api_key=api_key)
        self._models_cfg: dict = config.get("models", {})
        self._default_max_tokens: int = self._models_cfg.get("default_max_tokens", 4096)
        self._default_temperature: float = self._models_cfg.get("default_temperature", 0)
    def complete(self, prompt: str, capability: str, context: dict) -> str:
-        # TODO (Phase 2): Call anthropic client messages.create().
+        """
-        # Use resolve_model(capability) to pick the model.
+        Send a prompt to a Claude model and return the text response.
-        # Support context keys: system_prompt, max_tokens, temperature.
+
-        # Return response text as a plain string.
+        Parameters
-        raise NotImplementedError("AnthropicAdapter.complete is not yet implemented.")
+        ----------
        prompt      : User-role prompt content.
        capability  : One of "reasoning-heavy" | "capable" | "fast-cheap".
        context     : Optional per-call overrides:
                        system_prompt (str)   — prepended as the system turn.
                        max_tokens    (int)   — defaults to models.default_max_tokens in team.yaml.
                        temperature   (float) — defaults to models.default_temperature in team.yaml.
        Returns
        -------
        The model's text completion as a plain string.
        """
        model = self.resolve_model(capability)
        max_tokens: int = context.get("max_tokens", self._default_max_tokens)
        temperature: float = context.get("temperature", self._default_temperature)
        system_prompt: str = context.get("system_prompt", "")
        create_kwargs: dict = {
            "model": model,
            "max_tokens": max_tokens,
            "messages": [{"role": "user", "content": prompt}],
        }
        if system_prompt:
            create_kwargs["system"] = system_prompt
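        # Always send temperature. Omitting it when the value is 0 would let the
        # API apply its own default instead of the configured one.
        create_kwargs["temperature"] = temperature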
        response = self._client.messages.create(**create_kwargs)
        return response.content[0].text
    def resolve_model(self, capability: str) -> str:
-        # TODO (Phase 2): Look up capability in team.yaml capability_map.
+        """
-        # Fall back to "capable" tier model if capability is unknown.
+        Map a capability string to the Anthropic model identifier.
-        raise NotImplementedError("AnthropicAdapter.resolve_model is not yet implemented.")
+
        Looks up ``config.models.capability_map[capability][provider]``.
        Falls back to the "capable" tier model if the capability is unknown.
        Parameters
        ----------
        capability : One of "reasoning-heavy" | "capable" | "fast-cheap".
        Returns
        -------
        Anthropic model identifier (e.g. "claude-opus-4-6").
        """
        # The adapter knows its own provider — no need to read it from config.
        cap_map: dict = self._models_cfg.get("capability_map", {})
        if capability in cap_map and "anthropic" in cap_map[capability]:
            return cap_map[capability]["anthropic"]
        # Fall back to "capable" tier
        if "capable" in cap_map and "anthropic" in cap_map["capable"]:
            return cap_map["capable"]["anthropic"]
        # Hard-coded last resort
        return "claude-sonnet-4-6"
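
Review note: the three-step fallback in `resolve_model` (requested capability, then the "capable" tier, then a hard-coded default) can be exercised in isolation. The free function below is a sketch of that lookup, not the adapter's code; it takes the `models` section of the config as a plain dict and treats an empty model string as missing, which is a small deviation from the `in` checks in the diff.

```python
def resolve_model(models_cfg: dict, capability: str, provider: str = "anthropic") -> str:
    """Resolve a capability to a model id with a tiered fallback."""
    cap_map = models_cfg.get("capability_map", {})
    # Try the requested capability first, then the "capable" tier.
    for tier in (capability, "capable"):
        model = cap_map.get(tier, {}).get(provider)
        if model:
            return model
    # Last resort, same literal as the adapter in this diff.
    return "claude-sonnet-4-6"


models_cfg = {
    "capability_map": {
        "reasoning-heavy": {"anthropic": "claude-opus-4-6"},
        "capable": {"anthropic": "claude-sonnet-4-6"},
    }
}
print(resolve_model(models_cfg, "reasoning-heavy"))
print(resolve_model(models_cfg, "no-such-tier"))
```

An unknown capability silently resolves to the "capable" model, so a typo in a capability name will not raise. Whether that is desirable is a design choice for the author.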
--- a/adapters/notify/openclaw.py
+++ b/adapters/notify/openclaw.py
@@ -1,35 +1,93 @@
 """
 adapters/notify/openclaw.py
-OpenClaw notification adapter — Phase 2 stub.
+OpenClaw notification adapter — Phase 2 implementation.
-TODO (Phase 2):
+Sends notifications by shelling out to the ``openclaw`` CLI::
-  - Implement send() to dispatch notifications via the OpenClaw API.
+
-  - Support context keys: channel, severity, run_id, brief_id.
+    openclaw system event --text "<message>" --mode now
-  - Read endpoint and credentials from environment (OPENCLAW_API_KEY, OPENCLAW_URL).
+
-  - Handle rate limiting and delivery retries.
+If the binary is not on PATH the method logs a warning and returns without
 raising — notifications are best-effort and should never crash the pipeline.
 """
 from __future__ import annotations
 import logging
 import os
 import subprocess
 from adapters.base.notify import NotifyAdapter
 logger = logging.getLogger(__name__)
 class OpenClawNotifyAdapter(NotifyAdapter):
    """
-    Notification adapter that sends messages via OpenClaw.
+    Notification adapter that dispatches messages via the ``openclaw`` CLI.
-    Expects environment variables:
+    Environment variables
-        OPENCLAW_API_KEY  — authentication token
+    ---------------------
-        OPENCLAW_URL      — base URL for the OpenClaw API (optional, defaults to hosted)
+    OPENCLAW_SIGNAL_NUMBER : Optional. Direct signal target for OpenClaw sends.
    """
    def __init__(self, config: dict) -> None:
-        # TODO (Phase 2): Accept loaded team.yaml config dict.
+        """
-        # Extract OPENCLAW_API_KEY and OPENCLAW_URL from environment.
+        Initialise the OpenClaw notification adapter.
-        # Initialise an HTTP client (e.g. httpx or requests).
+
-        raise NotImplementedError("OpenClawNotifyAdapter.__init__ is not yet implemented.")
+        Parameters
        ----------
        config : Loaded team.yaml config dict (reserved for future options).
        """
        self._config = config
        self._signal_number: str = os.environ.get("OPENCLAW_SIGNAL_NUMBER", "")
    def send(self, message: str, context: dict) -> None:
-        # TODO (Phase 2): POST notification payload to OpenClaw API.
+        """
-        # Include message, context (channel, severity, run_id, brief_id).
+        Send a notification via ``openclaw system event``.
-        # Log delivery confirmation or raise on failure.
+
-        raise NotImplementedError("OpenClawNotifyAdapter.send is not yet implemented.")
+        Parameters
        ----------
        message : Human-readable notification text.
        context : Optional metadata.  Recognised keys:
                    level    (str) — "info" | "warning" | "error"; logged locally.
                    run_id   (str) — included in the local log record.
                    brief_id (str) — included in the local log record.
        Notes
        -----
        If the ``openclaw`` binary is not present on PATH, the method logs a
        warning and returns silently.  Notifications are best-effort.
        """
        level: str = context.get("level", "info")
        run_id: str = context.get("run_id", "")
        brief_id: str = context.get("brief_id", "")
        # Always log locally regardless of CLI availability.
        log_msg = "[notify:%s] %s (run=%s brief=%s)" % (level, message, run_id, brief_id)
        if level == "error":
            logger.error(log_msg)
        elif level == "warning":
            logger.warning(log_msg)
        else:
            logger.info(log_msg)
        cmd = ["openclaw", "system", "event", "--text", message, "--mode", "now"]
        try:
            result = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
                timeout=30,
            )
            if result.returncode != 0:
                logger.warning(
                    "openclaw event returned non-zero exit %d: %s",
                    result.returncode,
                    result.stderr.strip(),
                )
        except FileNotFoundError:
            logger.warning(
                "openclaw CLI not found on PATH; notification not delivered: %s",
                message,
            )
        except subprocess.TimeoutExpired:
            logger.warning("openclaw event timed out for message: %s", message)
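
Review note: the best-effort behaviour of `send()` (log and carry on when the CLI is missing, slow, or failing) can be checked without OpenClaw installed. The helper below is a minimal sketch of the same pattern; `send_best_effort` is a hypothetical name, and it takes an arbitrary command so that any binary can stand in for `openclaw`.

```python
from __future__ import annotations

import logging
import subprocess

logger = logging.getLogger(__name__)


def send_best_effort(cmd: list[str], timeout_s: float = 30.0) -> bool:
    """Run cmd; return True on exit code 0, False on any delivery failure."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except FileNotFoundError:
        # Binary is not installed or not on PATH.
        logger.warning("command not found: %s", cmd[0])
        return False
    except subprocess.TimeoutExpired:
        logger.warning("command timed out after %ss: %s", timeout_s, cmd[0])
        return False
    if result.returncode != 0:
        logger.warning("exit %d: %s", result.returncode, result.stderr.strip())
        return False
    return True
```

Returning a boolean instead of `None` is a deviation from the adapter's `send()` signature. It lets a caller or test observe whether delivery succeeded without parsing log output.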
--- a/adapters/runtime/claude_code.py
+++ b/adapters/runtime/claude_code.py
@@ -1,51 +1,163 @@
 """
 adapters/runtime/claude_code.py
-Claude Code agent runtime adapter — Phase 2 stub.
+Claude Code sub-agent runtime adapter — Phase 2 implementation.
-TODO (Phase 2):
+Spawns the ``claude`` CLI as a non-interactive subprocess for T4/T5
-  - Implement spawn() to launch a Claude Code sub-agent via the Agent SDK.
+implementation tasks::
-  - Implement get_result() to await agent completion and parse the output.
+
-  - Implement kill() to terminate the sub-agent process or session.
+    claude --permission-mode bypassPermissions --print "<task>"
-  - Map task brief context (files, constraints, artifacts) into the agent's
+
-    system prompt and tool context.
+Each spawned process is tracked by a UUID job_id so callers can later poll
-  - Handle Claude Code tool-use responses and extract structured output.
+for the result or terminate the job.  Stdout is captured and returned as the
 agent output; stderr is included for debugging.
 """
 from __future__ import annotations
 import logging
 import subprocess
 import tempfile
 import threading
 import uuid
 from adapters.base.runtime import RuntimeAdapter
 logger = logging.getLogger(__name__)
 class ClaudeCodeRuntimeAdapter(RuntimeAdapter):
    """
-    Runtime adapter that spawns Claude Code sub-agents for coding tasks.
+    Runtime adapter that spawns ``claude`` CLI sub-agents for coding tasks.
-    Used when a TaskBrief has preferred_runtime == "coding_agent".
+    Credentials are inherited from the environment (``ANTHROPIC_API_KEY``).
    The ``claude`` CLI must be installed and reachable on PATH.
-    Expects the Claude Code CLI / Agent SDK to be available in the environment.
+    Used when a TaskBrief has ``preferred_runtime == "coding_agent"``.
    """
    def __init__(self, config: dict) -> None:
-        # TODO (Phase 2): Accept loaded team.yaml config dict.
+        """
-        # Validate that Claude Code CLI or SDK is accessible.
+        Initialise the Claude Code runtime adapter.
-        # Initialise any agent session management state.
+
-        raise NotImplementedError("ClaudeCodeRuntimeAdapter.__init__ is not yet implemented.")
+        Parameters
        ----------
        config : Loaded team.yaml config dict (reserved for future options).
        """
        self._config = config
        # Maps job_id → running Popen instance.
        self._jobs: dict[str, subprocess.Popen] = {}
        self._lock = threading.Lock()
    # ------------------------------------------------------------------
    # RuntimeAdapter interface
    # ------------------------------------------------------------------
    def spawn(self, task: str, capability: str, context: dict) -> str:
-        # TODO (Phase 2): Launch a Claude Code sub-agent.
+        """
-        # Compose a structured system prompt from task + context.
+        Launch ``claude --permission-mode bypassPermissions --print "<task>"``
-        # Inject relevant files and constraints as tool context.
+        as a non-interactive subprocess.
-        # Return an agent_id that maps to a running agent session.
+
-        raise NotImplementedError("ClaudeCodeRuntimeAdapter.spawn is not yet implemented.")
+        Parameters
        ----------
        task       : Full task description (typically a JSON-serialised brief).
        capability : Capability hint (not forwarded; Claude Code resolves its
                     own model from the local environment).
        context    : Optional keys:
                       workdir (str) — cwd for the subprocess.  A fresh
                                       temporary directory is created if omitted.
        Returns
        -------
        A UUID job_id string that uniquely identifies this subprocess.
        """
        workdir: str = context.get("workdir") or tempfile.mkdtemp(
            prefix="agency-claude-"
        )
        job_id = str(uuid.uuid4())
        logger.info("Spawning Claude Code job %s in %s", job_id, workdir)
        proc = subprocess.Popen(
            ["claude", "--permission-mode", "bypassPermissions", "--print", task],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
            cwd=workdir,
        )
        with self._lock:
            self._jobs[job_id] = proc
        return job_id
    def get_result(self, agent_id: str, timeout_s: int) -> dict:
-        # TODO (Phase 2): Await the Claude Code agent session to complete.
+        """
-        # Parse the agent's final message for structured JSON output.
+        Wait for the Claude Code subprocess to complete and return its output.
-        # Return dict with: {"status": ..., "output": ..., "artifacts": [...]}.
+
-        # Raise TimeoutError if timeout_s elapses.
+        Parameters
-        raise NotImplementedError("ClaudeCodeRuntimeAdapter.get_result is not yet implemented.")
+        ----------
        agent_id  : Job id returned by spawn().
        timeout_s : Maximum seconds to wait before raising TimeoutError.
        Returns
        -------
        dict with keys:
            status    ("completed" | "failed")
            output    (str — full stdout)
            artifacts (list — always empty; callers must parse output)
            stderr    (str — full stderr)
        Raises
        ------
        KeyError
            If agent_id does not correspond to a known job.
        TimeoutError
            If the subprocess does not finish within timeout_s seconds.
        """
        with self._lock:
            proc = self._jobs.get(agent_id)
        if proc is None:
            raise KeyError(f"No Claude Code job found for agent_id={agent_id!r}")
        try:
            stdout, stderr = proc.communicate(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            proc.kill()
            stdout, stderr = proc.communicate()
            raise TimeoutError(
                f"Claude Code job {agent_id!r} did not complete within {timeout_s}s."
            )
        status = "completed" if proc.returncode == 0 else "failed"
        logger.info(
            "Claude Code job %s finished: status=%s returncode=%d",
            agent_id,
            status,
            proc.returncode,
        )
        return {
            "status": status,
            "output": stdout,
            "artifacts": [],
            "stderr": stderr,
        }
    def kill(self, agent_id: str) -> None:
-        # TODO (Phase 2): Terminate the Claude Code agent session.
+        """
-        # Clean up any temporary files or session state.
+        Terminate a running Claude Code subprocess.
-        raise NotImplementedError("ClaudeCodeRuntimeAdapter.kill is not yet implemented.")
+
        Silently succeeds if the job has already finished or the id is unknown.
        Parameters
        ----------
        agent_id : Job id returned by spawn().
        """
        with self._lock:
            proc = self._jobs.get(agent_id)
        if proc is not None:
            try:
                proc.terminate()
                logger.info("Terminated Claude Code job %s", agent_id)
            except OSError:
                pass  # Process already gone — that is fine.
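
Review note: the spawn / get_result pattern above can be tried without the `claude` CLI by substituting any command. The sketch below uses a hypothetical `JobRegistry` class and, unlike the adapter in this diff, removes a job from the registry once its result has been collected or it has timed out, so the map does not grow for the lifetime of the process. That removal is an assumption about desired behaviour, not something the diff does.

```python
from __future__ import annotations

import subprocess
import threading
import uuid


class JobRegistry:
    """Track subprocesses by UUID and collect their output later."""

    def __init__(self) -> None:
        self._jobs: dict[str, subprocess.Popen] = {}
        self._lock = threading.Lock()

    def spawn(self, argv: list[str]) -> str:
        job_id = str(uuid.uuid4())
        proc = subprocess.Popen(
            argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
        )
        with self._lock:
            self._jobs[job_id] = proc
        return job_id

    def get_result(self, job_id: str, timeout_s: float) -> dict:
        with self._lock:
            proc = self._jobs.get(job_id)
        if proc is None:
            raise KeyError(f"unknown job_id={job_id!r}")
        try:
            stdout, stderr = proc.communicate(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.communicate()  # reap the killed process
            raise TimeoutError(f"job {job_id!r} exceeded {timeout_s}s") from None
        finally:
            # Deviation from the diff: forget the job either way.
            with self._lock:
                self._jobs.pop(job_id, None)
        status = "completed" if proc.returncode == 0 else "failed"
        return {"status": status, "output": stdout, "artifacts": [], "stderr": stderr}
```

With this variant a second `get_result` call for the same id raises `KeyError`, so callers that need the output more than once would have to keep it themselves.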
--- a/adapters/runtime/openclaw.py
+++ b/adapters/runtime/openclaw.py
@@ -1,48 +1,241 @@
 """
 adapters/runtime/openclaw.py
-OpenClaw agent runtime adapter — Phase 2 stub.
+OpenClaw agent runtime adapter — Phase 2 implementation.
-TODO (Phase 2):
+Spawns sub-agents by shelling out to the ``openclaw`` CLI::
-  - Implement spawn() to submit a task to an OpenClaw worker pool.
+
-  - Implement get_result() to poll or subscribe for agent completion.
+    openclaw session spawn --task "<task>" --mode run
-  - Implement kill() to cancel a running OpenClaw agent job.
+    openclaw session get   <session_id>
-  - Read endpoint and credentials from environment (OPENCLAW_API_KEY, OPENCLAW_URL).
+    openclaw session kill  <session_id>
-  - Map capability hint to an appropriate worker class/queue.
+
 If the ``openclaw`` binary is unavailable, all methods raise
 ``NotImplementedError`` with a helpful message rather than crashing with a
 raw ``FileNotFoundError``.
 """
 from __future__ import annotations
 import json
 import logging
 import re
 import subprocess
 import time
 from adapters.base.runtime import RuntimeAdapter
 logger = logging.getLogger(__name__)
 # Status strings from the openclaw CLI that indicate a session has finished.
 _TERMINAL_STATUSES = frozenset(
    {"done", "completed", "failed", "partial", "blocked", "error"}
 )
 class OpenClawRuntimeAdapter(RuntimeAdapter):
    """
-    Runtime adapter that dispatches agent tasks to OpenClaw workers.
+    Runtime adapter that dispatches agent tasks to OpenClaw worker sessions.
-    Expects environment variables:
+    All interactions use the ``openclaw`` CLI.  No additional credentials are
-        OPENCLAW_API_KEY  — authentication token
+    required beyond what OpenClaw manages in the local environment.
        OPENCLAW_URL      — base URL for the OpenClaw API
    """
    def __init__(self, config: dict) -> None:
-        # TODO (Phase 2): Accept loaded team.yaml config dict.
+        """
-        # Extract OPENCLAW_API_KEY and OPENCLAW_URL from environment.
+        Initialise the OpenClaw runtime adapter.
-        # Initialise HTTP client and any job-tracking state.
+
-        raise NotImplementedError("OpenClawRuntimeAdapter.__init__ is not yet implemented.")
+        Parameters
        ----------
        config : Loaded team.yaml config dict (reserved for future options).
        """
        self._config = config
    # ------------------------------------------------------------------
    # RuntimeAdapter interface
    # ------------------------------------------------------------------
    def spawn(self, task: str, capability: str, context: dict) -> str:
-        # TODO (Phase 2): Submit task to OpenClaw worker pool.
+        """
-        # Map capability ("reasoning-heavy" | "capable" | "fast-cheap") to
+        Spawn an OpenClaw agent session for the given task.
-        # an appropriate worker queue or model hint.
+
-        # Return an agent_id string that can be used to poll for results.
+        Parameters
-        raise NotImplementedError("OpenClawRuntimeAdapter.spawn is not yet implemented.")
+        ----------
        task       : Natural-language task description.
        capability : Capability hint ("reasoning-heavy" | "capable" | "fast-cheap").
                     Passed informally; actual routing is handled by OpenClaw.
        context    : Arbitrary context bag (currently unused by this adapter).
        Returns
        -------
        session_id string parsed from the CLI output.
        Raises
        ------
        NotImplementedError
            If the ``openclaw`` CLI is not available on PATH.
        RuntimeError
            If the session_id cannot be parsed from the CLI output.
        """
        # TODO: map capability to an openclaw worker tier / model hint if the
        # openclaw CLI gains that flag in a future release.
        cmd = ["openclaw", "session", "spawn", "--task", task, "--mode", "run"]
        try:
            result = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
                check=True,
            )
        except FileNotFoundError:
            raise NotImplementedError(
                "openclaw CLI not found on PATH. "
                "Install OpenClaw or configure a different runtime adapter "
                "(e.g. adapters.runtime.claude_code.ClaudeCodeRuntimeAdapter)."
            )
        except subprocess.CalledProcessError as exc:
            raise RuntimeError(
                f"openclaw session spawn failed (exit {exc.returncode}): "
                f"{exc.stderr.strip()}"
            ) from exc
        return self._parse_session_id(result.stdout)
    def get_result(self, agent_id: str, timeout_s: int) -> dict:
-        # TODO (Phase 2): Poll or long-poll the OpenClaw API for job completion.
+        """
-        # Raise TimeoutError if timeout_s elapses before the job finishes.
+        Poll ``openclaw session get`` until the session reaches a terminal
-        # Return a dict with at minimum: {"status": ..., "output": ..., "artifacts": [...]}.
+        state or *timeout_s* seconds elapse.
-        raise NotImplementedError("OpenClawRuntimeAdapter.get_result is not yet implemented.")
+
        Parameters
        ----------
        agent_id  : Session ID returned by spawn().
        timeout_s : Maximum seconds to wait before raising TimeoutError.
        Returns
        -------
        dict with keys: ``status``, ``output``, ``artifacts``.
        Raises
        ------
        TimeoutError
            If the session does not finish within timeout_s seconds.
        NotImplementedError
            If the ``openclaw`` CLI is not available on PATH.
        """
        deadline = time.monotonic() + timeout_s
        poll_interval = 2.0
        while time.monotonic() < deadline:
            try:
                result = subprocess.run(
                    ["openclaw", "session", "get", agent_id],
                    capture_output=True,
                    text=True,
                    timeout=15,
                )
            except FileNotFoundError:
                raise NotImplementedError(
                    "openclaw CLI not found on PATH. "
                    "Install OpenClaw or switch to a different runtime adapter."
                )
            except subprocess.TimeoutExpired:
                logger.debug("openclaw session get timed out; will retry")
                time.sleep(poll_interval)
                continue
            if result.returncode == 0 and result.stdout.strip():
                parsed = self._parse_get_output(result.stdout)
                if parsed.get("status", "").lower() in _TERMINAL_STATUSES:
                    return parsed
            else:
                logger.debug(
                    "openclaw session get returned exit=%d; retrying. stderr=%s",
                    result.returncode,
                    result.stderr.strip(),
                )
            time.sleep(poll_interval)
        raise TimeoutError(
            f"Agent {agent_id!r} did not complete within {timeout_s}s."
        )
    def kill(self, agent_id: str) -> None:
-        # TODO (Phase 2): Send a cancellation request to the OpenClaw API.
+        """
-        # Silently succeed if the agent has already finished.
+        Terminate an OpenClaw session unconditionally.
-        raise NotImplementedError("OpenClawRuntimeAdapter.kill is not yet implemented.")
+
        Silently succeeds if the session has already finished.
        Parameters
        ----------
        agent_id : Session ID returned by spawn().
        Raises
        ------
        NotImplementedError
            If the ``openclaw`` CLI is not available on PATH.
        """
        try:
            subprocess.run(
                ["openclaw", "session", "kill", agent_id],
                capture_output=True,
                text=True,
                timeout=15,
            )
        except FileNotFoundError:
            raise NotImplementedError(
                "openclaw CLI not found on PATH. "
                "Install OpenClaw or switch to a different runtime adapter."
            )
        except subprocess.TimeoutExpired:
            logger.warning("openclaw session kill timed out for agent %s", agent_id)
    # ------------------------------------------------------------------
    # Private helpers
    # ------------------------------------------------------------------
    def _parse_session_id(self, output: str) -> str:
        """Extract a session_id from the raw stdout of ``openclaw session spawn``."""
        output = output.strip()
        # Prefer structured JSON output.
        try:
            data = json.loads(output)
            for key in ("session_id", "sessionId", "id"):
                if key in data:
                    return str(data[key])
        except (json.JSONDecodeError, TypeError):
            pass
        # Regex: look for "session_id: <id>" or similar.
        m = re.search(
            r"(?:session[_\s]?id|sessionId)[:\s]+([a-zA-Z0-9_\-]+)",
            output,
            re.IGNORECASE,
        )
        if m:
            return m.group(1)
        # Last resort: return the first non-empty line.
        lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
        if lines:
            return lines[0]
        raise RuntimeError(
            f"Could not parse session_id from openclaw output: {output!r}"
        )
    def _parse_get_output(self, output: str) -> dict:
        """Parse the stdout of ``openclaw session get`` into a result dict."""
        output = output.strip()
        try:
            data = json.loads(output)
            return {
                "status": data.get("status", "done"),
                "output": data.get("output", output),
                "artifacts": data.get("artifacts", []),
            }
        except (json.JSONDecodeError, TypeError, AttributeError):
            # Non-JSON output, or JSON that is not an object (e.g. a list, which
            # has no .get): treat as completed with raw text output.
            return {
                "status": "done",
                "output": output,
                "artifacts": [],
            }
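
Review note: the parsing rules of `_parse_get_output` are easy to pin down with a few sample inputs. The standalone function below is a sketch of the same logic with an explicit `isinstance` check, so that valid JSON that is not an object (a list or a bare number) falls back to the raw-text branch. The output format of `openclaw session get` is not documented in this diff, so the sample strings are assumptions.

```python
import json


def parse_get_output(output: str) -> dict:
    """Parse CLI stdout into a {status, output, artifacts} result dict."""
    output = output.strip()
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        data = None
    if not isinstance(data, dict):
        # Plain text, or JSON that is not an object: keep the raw text.
        return {"status": "done", "output": output, "artifacts": []}
    return {
        "status": data.get("status", "done"),
        "output": data.get("output", output),
        "artifacts": data.get("artifacts", []),
    }


print(parse_get_output('{"status": "failed", "output": "boom"}'))
print(parse_get_output("finished ok"))
```

Note that any non-JSON stdout is reported as `"done"`, which is a terminal status. If the CLI ever prints a plain-text progress line, polling would stop early.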
--- a/adapters/vcs/github.py
+++ b/adapters/vcs/github.py
@@ -1,16 +1,30 @@
 """
 adapters/vcs/github.py
-GitHub VCS adapter — Phase 2 stub.
+GitHub VCS adapter — Phase 2 implementation.
-TODO (Phase 2):
+Uses PyGithub (``pip install PyGithub``) to interact with the GitHub REST API.
-  - Implement create_branch() using PyGithub or gh CLI subprocess.
+Reads the repository URL and base branch from the team.yaml config dict.
-  - Implement commit() — stage files and push via git subprocess or API.
+
-  - Implement create_pr() using GitHub REST API (POST /repos/{owner}/{repo}/pulls).
+Note on commit() signature
-  - Implement get_pr_status() using GET /repos/{owner}/{repo}/pulls/{pull_number}.
+--------------------------
-  - Read repo and credentials from config/team.yaml and environment (GITHUB_TOKEN).
+The base class declares ``commit(files: list[str], message: str)``, which is
 insufficient for the GitHub Contents API (which requires file *content*, not
 just paths).  This implementation extends the signature to accept either:
 * ``dict[str, str]`` — ``{path: content}`` mapping (preferred; uses the API).
 * ``list[str]``      — local file paths; content is read from disk and pushed.
 The optional ``branch`` keyword argument targets a specific branch; it
 defaults to the configured base branch.
 """
 from __future__ import annotations
 import os
 import re
 from typing import Union
 from github import Github, GithubException
 from adapters.base.vcs import VCSAdapter
@@ -18,34 +32,175 @@ class GitHubAdapter(VCSAdapter):
    """
    VCS adapter for GitHub repositories.
-    Expects environment variable GITHUB_TOKEN and config values:
+    Authenticates via GITHUB_TOKEN and interacts with the GitHub REST API
-        run.repo        — SSH or HTTPS clone URL
+    through PyGithub.
-        run.base_branch — default base branch (e.g. "main")
+
    Environment variables
    ---------------------
    GITHUB_TOKEN : Required. Personal access token or GitHub App installation token.
    Config keys (from team.yaml)
    ----------------------------
    run.repo        : SSH or HTTPS clone URL (e.g. "git@github.com:org/repo.git").
    run.base_branch : Default base branch (e.g. "main").
    """
    def __init__(self, config: dict) -> None:
-        # TODO (Phase 2): Accept loaded team.yaml config dict.
+        """
-        # Extract GITHUB_TOKEN from environment.
+        Initialise the GitHub adapter.
-        # Parse owner/repo from config.run.repo.
+
-        raise NotImplementedError("GitHubAdapter.__init__ is not yet implemented.")
+        Parameters
        ----------
        config : Loaded team.yaml config dict.
        Raises
        ------
        ValueError
            If GITHUB_TOKEN is not set or the repo URL cannot be parsed.
        """
        self._config = config
        token = os.environ.get("GITHUB_TOKEN")
        if not token:
            raise ValueError(
                "GITHUB_TOKEN environment variable is not set. "
                "Create a personal access token and export it before running the-agency."
            )
        self._g = Github(token)
        run_cfg: dict = config.get("run", {})
        repo_url: str = run_cfg.get("repo", "")
        self._base_branch: str = run_cfg.get("base_branch", "main")
        self._owner, self._repo_name = self._parse_repo_url(repo_url)
        self._repo = self._g.get_repo(f"{self._owner}/{self._repo_name}")
    # ------------------------------------------------------------------
    # Helpers
    # ------------------------------------------------------------------
    def _parse_repo_url(self, url: str) -> tuple[str, str]:
        """Parse *owner* and *repo* name from an SSH or HTTPS GitHub URL."""
        # git@github.com:owner/repo.git
        m = re.match(r"git@github\.com:([^/]+)/([^/]+?)(?:\.git)?$", url)
        if m:
            return m.group(1), m.group(2)
        # https://github.com/owner/repo[.git]
        m = re.match(r"https?://github\.com/([^/]+)/([^/]+?)(?:\.git)?/?$", url)
        if m:
            return m.group(1), m.group(2)
        raise ValueError(
            f"Cannot parse GitHub owner/repo from URL: {url!r}. "
            "Expected SSH (git@github.com:owner/repo.git) or "
            "HTTPS (https://github.com/owner/repo.git) format."
        )
    # ------------------------------------------------------------------
    # VCSAdapter interface
    # ------------------------------------------------------------------
    def create_branch(self, name: str) -> None:
-        # TODO (Phase 2): Create branch via GitHub API or local git subprocess.
+        """
-        # Use config.run.base_branch as the branch point.
+        Create a new branch off ``self._base_branch`` on the remote.
-        raise NotImplementedError("GitHubAdapter.create_branch is not yet implemented.")
-    def commit(self, files: list[str], message: str) -> str:
+        Parameters
-        # TODO (Phase 2): Stage files (git add), create commit (git commit), push.
+        ----------
-        # Return the resulting commit SHA.
+        name : New branch name (e.g. "feat/webhook-ingestion").
-        raise NotImplementedError("GitHubAdapter.commit is not yet implemented.")
+        """
        base_ref = self._repo.get_git_ref(f"heads/{self._base_branch}")
        self._repo.create_git_ref(f"refs/heads/{name}", base_ref.object.sha)
    def commit(
        self,
        files: Union[dict[str, str], list[str]],
        message: str,
        branch: str | None = None,
    ) -> str:
        """
        Commit files to the repository via the GitHub Contents API.
        Parameters
        ----------
        files   : Either a ``dict[path, content]`` mapping (preferred), or a
                  ``list[path]`` of local file paths whose content is read from
                  disk.
        message : Commit message.
        branch  : Target branch.  Defaults to ``self._base_branch``.
        Returns
        -------
        SHA of the last created/updated commit, or empty string if no files
        were committed.
        """
        target_branch = branch or self._base_branch
        # Normalise to {path: content}
        if isinstance(files, list):
            files_dict: dict[str, str] = {}
            for path in files:
                with open(path, "r", encoding="utf-8") as fh:
                    files_dict[path] = fh.read()
        else:
            files_dict = files
        last_sha: str = ""
        for path, content in files_dict.items():
            try:
                existing = self._repo.get_contents(path, ref=target_branch)
            except GithubException as exc:
                # Only a 404 means the file does not exist yet; re-raise
                # anything else (auth, rate limit, ...) instead of masking it.
                if exc.status != 404:
                    raise
                result = self._repo.create_file(
                    path=path,
                    message=message,
                    content=content,
                    branch=target_branch,
                )
            else:
                result = self._repo.update_file(
                    path=path,
                    message=message,
                    content=content,
                    sha=existing.sha,  # type: ignore[union-attr]
                    branch=target_branch,
                )
            last_sha = result["commit"].sha
        return last_sha
    def create_pr(self, title: str, body: str, head: str, base: str) -> str:
-        # TODO (Phase 2): POST to GitHub API /repos/{owner}/{repo}/pulls.
+        """
-        # Return the HTML URL of the created PR.
+        Open a pull request on GitHub.
-        raise NotImplementedError("GitHubAdapter.create_pr is not yet implemented.")
+
        Parameters
        ----------
        title : PR title.
        body  : PR description / body markdown.
        head  : Head branch name (the branch with changes).
        base  : Base branch name (e.g. "main").
        Returns
        -------
        HTML URL of the created pull request.
        """
        pr = self._repo.create_pull(
            title=title,
            body=body,
            head=head,
            base=base,
        )
        return pr.html_url
    def get_pr_status(self, pr_id: str) -> str:
-        # TODO (Phase 2): GET /repos/{owner}/{repo}/pulls/{number}.
+        """
-        # Map GitHub PR state ("open", "closed") + merged flag to
+        Fetch the current status of a pull request.
-        # our schema: "open" | "merged" | "closed".
+
-        raise NotImplementedError("GitHubAdapter.get_pr_status is not yet implemented.")
+        Parameters
        ----------
        pr_id : Pull request number as a string (e.g. "42").
        Returns
        -------
        One of: "open" | "merged" | "closed".
        """
        pr = self._repo.get_pull(int(pr_id))
        if pr.merged:
            return "merged"
        return pr.state  # "open" or "closed"
--- a/2
+++ b/2
--- a/config/role_registry.yaml
+++ b/config/role_registry.yaml
@@ -2,28 +2,40 @@ t1:
  default: agents/strategy/nexus-strategy.md
 t2:
-  backend:  agents/engineering/engineering-software-architect.md
+  backend:  agents/engineering/engineering-backend-architect.md
-  frontend: agents/engineering/engineering-software-architect.md
+  frontend: agents/engineering/engineering-frontend-architect.md
  infra:    agents/engineering/engineering-devops-automator.md
  data:     agents/engineering/engineering-data-engineer.md
  ai:       agents/engineering/engineering-software-architect.md
  security: agents/engineering/engineering-security-engineer.md
  mobile:   agents/engineering/engineering-software-architect.md
  default:  agents/engineering/engineering-software-architect.md
 t3:
-  backend:  agents/engineering/engineering-senior-developer.md
+  backend:  agents/engineering/engineering-senior-backend-developer.md
-  frontend: agents/engineering/engineering-senior-developer.md
+  frontend: agents/engineering/engineering-senior-frontend-developer.md
  infra:    agents/engineering/engineering-sre.md
-  default:  agents/engineering/engineering-senior-developer.md
+  data:     agents/engineering/engineering-data-engineer.md
  ai:       agents/engineering/engineering-ai-engineer.md
  security: agents/engineering/engineering-security-engineer.md
  mobile:   agents/engineering/engineering-mobile-app-builder.md
  database: agents/engineering/engineering-database-optimizer.md
  devops:   agents/engineering/engineering-sre.md
  docs:     agents/engineering/engineering-technical-writer.md
  default:  agents/engineering/engineering-backend-developer.md
 t4:
  frontend:  agents/engineering/engineering-frontend-developer.md
-  backend:   agents/engineering/engineering-backend-architect.md
+  backend:   agents/engineering/engineering-backend-developer.md
  database:  agents/engineering/engineering-database-optimizer.md
  devops:    agents/engineering/engineering-devops-automator.md
  mobile:    agents/engineering/engineering-mobile-app-builder.md
  ai:        agents/engineering/engineering-ai-engineer.md
  security:  agents/engineering/engineering-security-engineer.md
  docs:      agents/engineering/engineering-technical-writer.md
-  default:   agents/engineering/engineering-senior-developer.md
+  data:      agents/engineering/engineering-data-engineer.md
  embedded:  agents/engineering/engineering-embedded-firmware-engineer.md
  default:   agents/engineering/engineering-backend-developer.md
 t5:
  code:          agents/engineering/engineering-code-reviewer.md
@@ -31,4 +43,8 @@ t5:
  api:           agents/testing/testing-api-tester.md
  performance:   agents/testing/testing-performance-benchmarker.md
  security:      agents/engineering/engineering-security-engineer.md
  accessibility: agents/testing/testing-accessibility-auditor.md
  e2e:           agents/testing/testing-evidence-collector.md
  frontend:      agents/testing/testing-accessibility-auditor.md
  data:          agents/testing/testing-reality-checker.md
  default:       agents/engineering/engineering-code-reviewer.md
--- a/docs/buildspec.md
+++ b/docs/buildspec.md
@@ -0,0 +1,507 @@
 # Tiered Agent Team System — Build Spec
 _Started: 2026-03-15. Last updated: 2026-03-30._
 _See design.md for the design doc and decisions log._
 ---
 ## Language & Runtime
 **Python 3.11+.** Reasons:
 - Agent/AI tooling is Python-first
 - Clean type hints + dataclasses for schemas
 - Agents can read and modify their own orchestration code
 - Runs anywhere — no Node, no OpenClaw dependency
 ---
 ## Repository
 Standalone repo: `git@github.com:coding-with-hans-heinemann/the-agency.git`
 Separate from the OpenClaw workspace. OpenClaw workspace gets a thin integration layer that calls into it. Core is portable and runnable without OpenClaw.
 ---
 ## Directory Structure
 ```
 agent-teams/
 ├── core/
 │   ├── team_runner.py       — run lifecycle, agent spawning
 │   ├── blackboard.py        — SQLite coordination state
 │   ├── task_brief.py        — schema + validation
 │   └── escalation.py        — retry logic, failure routing
 │
 ├── adapters/
 │   ├── base/
 │   │   ├── llm.py           — abstract LLM interface
 │   │   ├── vcs.py           — abstract VCS interface
 │   │   ├── notify.py        — abstract notification interface
 │   │   └── runtime.py       — abstract agent runtime interface
 │   ├── llm/
 │   │   ├── anthropic.py     — Claude via direct Anthropic API
 │   │   ├── openai.py        — GPT / o-series
 │   │   └── ollama.py        — local models
 │   ├── vcs/
 │   │   └── github.py
 │   ├── notify/
 │   │   └── openclaw.py      — messages Hans who notifies Andrew
 │   └── runtime/
 │       ├── openclaw.py      — sessions_spawn (general purpose)
 │       └── claude_code.py   — coding agent runtime (file/git/exec tools)
 │
 ├── agents/                  — git submodule: msitarzewski/agency-agents
 │   ├── engineering/
 │   ├── testing/
 │   ├── strategy/
 │   └── ...                  — full agency-agents roster
 │
 ├── prompts/
 │   ├── t1_visionary.md      — fallback if no agent_personality set
 │   ├── t2_architect.md
 │   ├── t3_squad_lead.md
 │   ├── t4_implementer.md
 │   └── t5_verifier.md
 │
 ├── config/
 │   ├── team.yaml            — example run configuration
 │   └── role_registry.yaml   — maps (tier, domain) → agent personality file
 │
 ├── cli/
 │   └── agency.py            — run, watch, inspect, approve, reject, pause, resume
 │
 ├── runs/                    — runtime state, one subdir per run_id
 │   └── .gitkeep
 │
 └── README.md
 ```
 ---
 ## Blackboard
 SQLite. One file per run at `runs/<run_id>/blackboard.db`.
 ### Tables
 **runs**
 ```sql
 CREATE TABLE runs (
    run_id      TEXT PRIMARY KEY,
    goal        TEXT NOT NULL,
    status      TEXT NOT NULL,  -- pending | active | review | done | failed
    created_at  TEXT NOT NULL,
    updated_at  TEXT NOT NULL
 );
 ```
 **workstreams**
 ```sql
 CREATE TABLE workstreams (
    workstream_id   TEXT PRIMARY KEY,
    run_id          TEXT NOT NULL,
    name            TEXT NOT NULL,
    tier            INTEGER NOT NULL,
    status          TEXT NOT NULL,  -- pending | active | blocked | done | failed
    owner_agent_id  TEXT,
    created_at      TEXT NOT NULL,
    updated_at      TEXT NOT NULL
 );
 ```
 **briefs**
 ```sql
 CREATE TABLE briefs (
    brief_id        TEXT PRIMARY KEY,
    run_id          TEXT NOT NULL,
    parent_brief_id TEXT,
    workstream_id   TEXT,
    tier            INTEGER NOT NULL,
    role            TEXT NOT NULL,
    status          TEXT NOT NULL,  -- pending | active | done | failed
    payload         TEXT NOT NULL,  -- full JSON brief
    result          TEXT,           -- JSON result when done
    retry_count     INTEGER DEFAULT 0,
    created_at      TEXT NOT NULL,
    updated_at      TEXT NOT NULL
 );
 ```
 **events**
 ```sql
 CREATE TABLE events (
    event_id    TEXT PRIMARY KEY,
    run_id      TEXT NOT NULL,
    brief_id    TEXT,
    kind        TEXT NOT NULL,  -- see event vocabulary below
    detail      TEXT,           -- JSON
    created_at  TEXT NOT NULL
 );
 ```
 **Event kind vocabulary:**
 ```
 -- lifecycle
 spawned | completed | failed | escalated | retried
 -- visibility / gates
 gate_pending    -- runner hit an inspection gate, waiting for human
 gate_approved   -- human approved via CLI or notify
 gate_rejected   -- human rejected, tier re-invoked
 gate_paused     -- manual pause via CLI
 gate_resumed    -- manual resume via CLI
 -- amendments / informational
 path_amendment  -- mid-run tier proposed a tier path change
 log             -- human-readable log line (detail: {level, message})
 ```
 **t3_task_lists** *(T3 mesh coordination)*
 ```sql
 CREATE TABLE t3_task_lists (
    entry_id        TEXT PRIMARY KEY,
    run_id          TEXT NOT NULL,
    workstream_id   TEXT NOT NULL,
    t3_agent_id     TEXT NOT NULL,
    status          TEXT NOT NULL,  -- draft | committed
    tasks           TEXT NOT NULL,  -- JSON array of proposed T4 task descriptors
    created_at      TEXT NOT NULL,
    updated_at      TEXT NOT NULL
 );
 ```
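A minimal sketch of how the `events` table might be written to, assuming Python's standard `sqlite3` module. The helper names `open_blackboard` and `log_event` are hypothetical and are not taken from `core/blackboard.py`; only the column layout comes from the DDL above.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone
from typing import Optional

# DDL copied from the events table above; IF NOT EXISTS is added for the sketch.
EVENTS_DDL = """
CREATE TABLE IF NOT EXISTS events (
    event_id    TEXT PRIMARY KEY,
    run_id      TEXT NOT NULL,
    brief_id    TEXT,
    kind        TEXT NOT NULL,
    detail      TEXT,
    created_at  TEXT NOT NULL
)
"""


def open_blackboard(path: str) -> sqlite3.Connection:
    """Open (or create) a blackboard file and make sure the events table exists."""
    conn = sqlite3.connect(path)
    conn.execute(EVENTS_DDL)
    return conn


def log_event(
    conn: sqlite3.Connection,
    run_id: str,
    kind: str,
    detail: Optional[dict] = None,
    brief_id: Optional[str] = None,
) -> str:
    """Append one event row; detail is stored as a JSON string."""
    event_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO events (event_id, run_id, brief_id, kind, detail, created_at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (
            event_id,
            run_id,
            brief_id,
            kind,
            None if detail is None else json.dumps(detail),
            datetime.now(timezone.utc).isoformat(),
        ),
    )
    conn.commit()
    return event_id
```

Passing `":memory:"` as the path gives a throwaway blackboard, which may be convenient for tests; a real run would presumably use `runs/<run_id>/blackboard.db`.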
 ---
 ## Task Brief Schema
 Every brief passed between tiers is a validated JSON object. `goal_anchor` is immutable — set by T1, copied verbatim into every downstream brief.
 ```json
 {
  "brief_id": "uuid",
  "run_id": "uuid",
  "parent_brief_id": "uuid | null",
  "tier": 4,
  "role": "implementer",
  "goal_anchor": "Original T1 intent — always propagated unchanged",
  "workstream": "backend-api",
  "task": "Implement POST /webhooks/ingest endpoint",
  "acceptance_criteria": [
    "Accepts JSON payload",
    "Returns 202 on success",
    "Writes to queue"
  ],
  "constraints": [
    "Use existing queue client in src/queue.py",
    "No new dependencies"
  ],
  "context": {
    "relevant_files": ["src/routes/webhooks.py", "src/queue.py"],
    "interface_contract": "..."
  },
  "retry_budget": 3,
  "retry_count": 0,
  "preferred_runtime": "coding_agent",
  "agent_personality": "agents/engineering/engineering-code-reviewer.md",
  "created_at": "ISO-8601"
 }
 ```
 `preferred_runtime` is optional. T3 sets it to `"coding_agent"` when spawning T4/T5 for implementation or verification tasks. Runner falls back to `"standard"` if the coding agent runtime is not configured.
 `agent_personality` is optional. When set, the runtime adapter reads the file and injects its contents as the system prompt at spawn time. Falls back to the generic tier prompt in `prompts/` if not set.
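As a rough illustration of the immutability rule for `goal_anchor`, the sketch below models a reduced brief as a frozen dataclass. The class `TaskBrief`, the helper `child_brief`, and the specific validation checks are assumptions made for this example; the real schema in `core/task_brief.py` carries more fields and may validate differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class TaskBrief:
    brief_id: str
    run_id: str
    tier: int
    role: str
    goal_anchor: str
    task: str
    parent_brief_id: Optional[str] = None
    retry_budget: int = 3
    retry_count: int = 0
    preferred_runtime: str = "standard"
    agent_personality: Optional[str] = None

    def __post_init__(self) -> None:
        # Hypothetical checks; the real validator may enforce more or less.
        if not 1 <= self.tier <= 5:
            raise ValueError(f"tier must be 1..5, got {self.tier}")
        if not self.goal_anchor:
            raise ValueError("goal_anchor must not be empty")
        if self.retry_count > self.retry_budget:
            raise ValueError("retry_count exceeds retry_budget")


def child_brief(parent: TaskBrief, brief_id: str, tier: int, role: str, task: str) -> TaskBrief:
    """Build a downstream brief, copying goal_anchor verbatim from the parent."""
    return TaskBrief(
        brief_id=brief_id,
        run_id=parent.run_id,
        tier=tier,
        role=role,
        goal_anchor=parent.goal_anchor,
        task=task,
        parent_brief_id=parent.brief_id,
        retry_budget=parent.retry_budget,
    )
```

Deriving every downstream brief through one helper is one way to keep the anchor from being paraphrased along the way.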
 ---
 ## Adapter Interfaces
 ### LLM (`adapters/base/llm.py`)
 ```python
 class LLMAdapter:
    def complete(self, prompt: str, capability: str, context: dict) -> str: ...
    def resolve_model(self, capability: str) -> str: ...
    # capability: "reasoning-heavy" | "capable" | "fast-cheap"
 ```
 ### VCS (`adapters/base/vcs.py`)
 ```python
 class VCSAdapter:
    def create_branch(self, name: str) -> None: ...
    def commit(self, files: list[str], message: str) -> str: ...      # returns commit sha
    def create_pr(self, title: str, body: str, head: str, base: str) -> str: ...  # returns pr url
    def get_pr_status(self, pr_id: str) -> str: ...                   # open | merged | closed
 ```
 ### Notify (`adapters/base/notify.py`)
 ```python
 class NotifyAdapter:
    def send(self, message: str, context: dict) -> None: ...
 ```
 ### Runtime (`adapters/base/runtime.py`)
 ```python
 class RuntimeAdapter:
    def spawn(self, task: str, capability: str, context: dict) -> str: ...  # returns agent_id
    def get_result(self, agent_id: str, timeout_s: int) -> dict: ...
    def kill(self, agent_id: str) -> None: ...
 # Two implementations:
 #   openclaw.py    — general purpose, uses sessions_spawn, suits T1/T2/T3
 #   claude_code.py — coding-specialized, has file/git/exec tools, suits T4/T5
 #
 # The runner selects runtime based on brief.preferred_runtime:
 #   "standard"      → openclaw.py (default)
 #   "coding_agent"  → claude_code.py (falls back to standard if unavailable)
 #
 # Both implementations inject brief.agent_personality as the system prompt
 # when spawning, if present. Falls back to generic tier prompt otherwise.
 # claude_code.py passes the agent file via --system-prompt flag natively
 # (agency-agents was designed for Claude Code's agents/ directory).
 ```
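The selection and fallback rules described in the comments above could be sketched roughly as follows. The function names `select_runtime` and `system_prompt_path` are hypothetical, and the tier prompt paths are taken from the directory structure earlier in this spec; how the runner actually wires this up is not specified here.

```python
from pathlib import Path
from typing import Optional

# Fallback prompt files, as listed under prompts/ in the directory structure.
TIER_PROMPTS = {
    1: "prompts/t1_visionary.md",
    2: "prompts/t2_architect.md",
    3: "prompts/t3_squad_lead.md",
    4: "prompts/t4_implementer.md",
    5: "prompts/t5_verifier.md",
}


def select_runtime(preferred: Optional[str], configured: set) -> str:
    """Pick "coding_agent" only when requested AND configured; else "standard"."""
    if preferred == "coding_agent" and "coding_agent" in configured:
        return "coding_agent"
    return "standard"


def system_prompt_path(agent_personality: Optional[str], tier: int) -> Path:
    """Use the brief's personality file when set, otherwise the generic tier prompt."""
    return Path(agent_personality or TIER_PROMPTS[tier])
```

For example, `select_runtime("coding_agent", {"standard"})` falls back to `"standard"` because the coding agent runtime is not in the configured set.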
 ---
 ## Run Config (`config/team.yaml`)
 ```yaml
 run:
  goal: "Build webhook ingestion system with retry logic and DLQ"
  repo: "git@github.com:org/repo.git"
  base_branch: "main"
 adapters:
  llm: anthropic
  vcs: github
  notify: openclaw
  runtime: openclaw
 models:
  provider: anthropic          # default provider
  capability_map:
    reasoning-heavy:
      anthropic: claude-opus-4-6
      openai: o3
    capable:
      anthropic: claude-sonnet-4-6
      openai: gpt-4o
      ollama: llama3.1:70b
    fast-cheap:
      anthropic: claude-haiku-3-5
      openai: gpt-4o-mini
      ollama: llama3.2
  # optional: override provider per tier
  tier_overrides:
    t1: { provider: openai, capability: reasoning-heavy }
    t4: { provider: ollama, capability: fast-cheap }
 runtime:
  default: openclaw
  coding_agent: claude_code     # used for T4/T5 when available; omit to disable
  native_teams: false           # Claude Code's experimental agent teams — opt-in only
                                # when true: T3 hands full workstream to Claude Code,
                                # which fans out internally. faster but less blackboard
                                # visibility. default: false (explicit T4 spawning)
  # tier_runtime_map (optional overrides):
  #   t1: standard
  #   t2: standard
  #   t3: standard
  #   t4: coding_agent
  #   t5: coding_agent
 retry_defaults:
  bad_output: 3
  partial: 2
  blocked: 0    # always escalate immediately
 visibility:
  strict_mode: false          # true = all gates on (recommended for first runs)
  log_level: normal           # normal | verbose (verbose = per-T4 start/done lines)
  inspection_gates:
    t1_plan: true             # always — required by design
    t2_lead: false            # optional — review boundaries before specialists spawn
    t2_synthesis: true        # recommended — review architecture before implementation
    t3_plan: false            # verbose — useful early on, disable once T3 is trusted
    t5_verdict: false         # review T5 joint verdict before T3 marks workstream done
  gate_timeout_minutes: 60    # auto-reject if no human response within this window
 t3_mesh_timeout_minutes: 10   # max time for T3s to commit task lists before runner escalates
 ```
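One possible reading of the `models` block is sketched below: a tier override, when present, replaces both the provider and the capability before the `capability_map` lookup. That precedence is an assumption, as is the function name `resolve_model_for`; the config is shown as an already-parsed dict (a subset of the example above) so the sketch needs no YAML library.

```python
from typing import Optional

# Subset of the `models` block above, as it might look after YAML parsing.
MODELS = {
    "provider": "anthropic",
    "capability_map": {
        "reasoning-heavy": {"anthropic": "claude-opus-4-6", "openai": "o3"},
        "capable": {"anthropic": "claude-sonnet-4-6", "openai": "gpt-4o", "ollama": "llama3.1:70b"},
        "fast-cheap": {"anthropic": "claude-haiku-3-5", "openai": "gpt-4o-mini", "ollama": "llama3.2"},
    },
    "tier_overrides": {
        "t1": {"provider": "openai", "capability": "reasoning-heavy"},
        "t4": {"provider": "ollama", "capability": "fast-cheap"},
    },
}


def resolve_model_for(models_cfg: dict, capability: str, tier: Optional[str] = None) -> str:
    """Map (capability, optional tier) to a concrete model name."""
    override = models_cfg.get("tier_overrides", {}).get(tier, {}) if tier else {}
    provider = override.get("provider", models_cfg["provider"])
    capability = override.get("capability", capability)  # assumed precedence
    by_provider = models_cfg["capability_map"].get(capability, {})
    if provider not in by_provider:
        raise KeyError(f"no model configured for {capability!r} on provider {provider!r}")
    return by_provider[provider]
```

Failing loudly on a missing entry (for instance `reasoning-heavy` on `ollama`, which the example config does not define) seems safer than silently switching providers.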
 ---
 ## Role Registry (`config/role_registry.yaml`)
 Maps `(tier, domain)` → agent personality file. T1 consults this during scope assessment when selecting specialists for each workstream brief. Adding a new specialist means adding one entry here — no core changes.
 ```yaml
 t1:
  default: agents/strategy/nexus-strategy.md
 t2:
  backend:  agents/engineering/engineering-software-architect.md
  frontend: agents/engineering/engineering-software-architect.md
  infra:    agents/engineering/engineering-devops-automator.md
  data:     agents/engineering/engineering-data-engineer.md
  default:  agents/engineering/engineering-software-architect.md
 t3:
  backend:  agents/engineering/engineering-senior-developer.md
  frontend: agents/engineering/engineering-senior-developer.md
  infra:    agents/engineering/engineering-sre.md
  default:  agents/engineering/engineering-senior-developer.md
 t4:
  frontend:  agents/engineering/engineering-frontend-developer.md
  backend:   agents/engineering/engineering-backend-architect.md
  database:  agents/engineering/engineering-database-optimizer.md
  devops:    agents/engineering/engineering-devops-automator.md
  mobile:    agents/engineering/engineering-mobile-app-builder.md
  ai:        agents/engineering/engineering-ai-engineer.md
  security:  agents/engineering/engineering-security-engineer.md
  docs:      agents/engineering/engineering-technical-writer.md
  default:   agents/engineering/engineering-senior-developer.md
 t5:
  code:        agents/engineering/engineering-code-reviewer.md
  integration: agents/testing/testing-reality-checker.md
  api:         agents/testing/testing-api-tester.md
  performance: agents/testing/testing-performance-benchmarker.md
  security:    agents/engineering/engineering-security-engineer.md
  default:     agents/engineering/engineering-code-reviewer.md
 ```
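The lookup this registry implies can be sketched as a two-level dictionary access with a per-tier `default` fallback. The function name `resolve_personality` is hypothetical, and the registry is shown as an already-parsed dict (a subset of the YAML above) to keep the example free of third-party dependencies.

```python
# Subset of role_registry.yaml above, as it might look after YAML parsing.
REGISTRY = {
    "t3": {
        "backend": "agents/engineering/engineering-senior-developer.md",
        "infra": "agents/engineering/engineering-sre.md",
        "default": "agents/engineering/engineering-senior-developer.md",
    },
    "t5": {
        "code": "agents/engineering/engineering-code-reviewer.md",
        "api": "agents/testing/testing-api-tester.md",
        "default": "agents/engineering/engineering-code-reviewer.md",
    },
}


def resolve_personality(registry: dict, tier: str, domain: str) -> str:
    """Return the personality file for (tier, domain), or the tier's default."""
    tier_map = registry[tier]  # unknown tier raises KeyError on purpose
    return tier_map.get(domain, tier_map["default"])
```

With this shape, adding a specialist is a one-line registry change, which matches the "no core changes" claim above.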
 ---
 ## Key Flows
 ### 1. Run Kickoff
 ```
 User → team_runner.start(goal, config)  # via CLI or any caller
  → generate run_id
  → init blackboard (create runs/<run_id>/blackboard.db)
  → build T1 brief (goal_anchor = goal, retry_budget from config)
  → spawn T1 via runtime adapter
  → await T1 workplan
 ```
 ### 2. T1 Scope Assessment
 ```
 T1 receives brief
  → assess complexity → decide depth
  → identify workstreams
  → set retry_budget multiplier per workstream (1x simple, 2x complex)
  → emit N workstream briefs for T2 (or T3 if shallow)
  → write workplan to blackboard
  → team_runner spawns T2s in parallel
 ```
 ### 3. T4 Retry Loop (escalation.py)
 ```
 spawn T4 with brief
  → receive result
  → classify: bad_output | blocked | partial | success
  blocked:
    → log event(escalated)
    → pass to T3 immediately
  bad_output, retries_remaining:
    → amend brief with failure context, increment retry_count
    → re-spawn T4
    → log event(retried)
  bad_output, retries_exhausted:
    → log event(escalated)
    → pass to T3
  partial:
    → write salvageable parts to blackboard
    → re-task remainder with new brief
  success:
    → write result to blackboard
    → log event(completed)
    → notify T3
 ```
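The branching in the retry loop above reduces to a small decision function. The sketch below is an assumption about how that routing could be expressed; `next_action` and its return strings are hypothetical names, not the API of `core/escalation.py`.

```python
def next_action(outcome: str, retry_count: int, retry_budget: int) -> str:
    """Decide what happens after a T4 result has been classified.

    outcome is one of: "success", "blocked", "partial", "bad_output".
    """
    if outcome == "success":
        return "complete"          # write result, log completed, notify T3
    if outcome == "blocked":
        return "escalate"          # never retried; goes straight to T3
    if outcome == "partial":
        return "retask_remainder"  # keep salvageable parts, new brief for the rest
    if outcome == "bad_output":
        # Retry while budget remains, otherwise hand the failure to T3.
        return "retry" if retry_count < retry_budget else "escalate"
    raise ValueError(f"unknown outcome: {outcome!r}")
```

Keeping the decision pure (no I/O) would let the blackboard writes and re-spawn calls live in the caller, where they can be tested separately.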
 ### 4. Inspection Gate Flow
 ```
 runner reaches configured gate (e.g. t2_synthesis)
  → write event(gate_pending, detail={tier, summary, what_happens_next})
  → notify_adapter.send(tier summary + gate context)
  → halt: poll blackboard for gate_approved or gate_rejected
  gate_approved:
    → write event(gate_approved)
    → continue run
  gate_rejected:
    → write event(gate_rejected, detail={reason})
    → re-invoke tier with rejection reason in brief context
    → loop back to gate_pending when tier completes again
  gate_timeout (gate_timeout_minutes elapsed):
    → treat as gate_rejected
    → notify Andrew: "Gate timed out, re-invoking tier"
 ```
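The halt-and-poll step might look roughly like the sketch below, which watches the `events` table for a decision and treats a timeout as a rejection, as the flow above describes. The function name `wait_for_gate`, the `since` cutoff, and the injectable `clock`/`sleep` parameters are assumptions for illustration; it also assumes `created_at` values share one ISO-8601 format so that string comparison orders them correctly.

```python
import sqlite3
import time
from typing import Callable


def wait_for_gate(
    conn: sqlite3.Connection,
    run_id: str,
    since: str,
    timeout_s: float,
    poll_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Block until a gate decision event appears; return its kind.

    Returns "gate_approved" or "gate_rejected". A timeout is reported as
    "gate_rejected", mirroring the gate_timeout branch of the flow.
    """
    deadline = clock() + timeout_s
    while True:
        row = conn.execute(
            "SELECT kind FROM events "
            "WHERE run_id = ? AND created_at >= ? "
            "AND kind IN ('gate_approved', 'gate_rejected') "
            "ORDER BY created_at LIMIT 1",
            (run_id, since),
        ).fetchone()
        if row is not None:
            return row[0]
        if clock() >= deadline:
            return "gate_rejected"
        sleep(poll_s)
```

Because the function only reads events, it does not care whether the approval came from the CLI or from a notify adapter bridging a chat reply.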
 ### 5. Review Gate
 ```
 T1 completes integration
  → vcs_adapter.create_pr(
      title="[agent-teams] <run_id>: <goal summary>",
      body="<workplan + workstream summaries>",
      head="integration/<run_id>",
      base="main"
    )
  → notify_adapter.send(
      "Run <run_id> complete. PR ready for review: <pr_url>",
      context={run_id, goal, workstreams, pr_url}
    )
  → blackboard: update run status → "review"
  → halt — no auto-merge
 ```
 ---
 ## Build Order
 1. `git submodule add https://github.com/msitarzewski/agency-agents agents/` — pull the talent pool
 2. `config/role_registry.yaml` — map tier+domain → agent personality files
 3. `core/task_brief.py` — schema + validation (everything depends on this); include T1 Plan Output Schema
 4. `core/blackboard.py` — SQLite store, all table definitions including `t3_task_lists`; full event kind vocabulary
 5. `adapters/base/*` — all four abstract interfaces
 6. `adapters/llm/anthropic.py` — first LLM implementation
 7. `core/escalation.py` — retry + failure routing logic (called by tiers, not runner centrally)
 8. `adapters/runtime/openclaw.py` — wire up sessions_spawn + personality injection
 9. `adapters/runtime/claude_code.py` — coding agent runtime, personality via --system-prompt
 10. `core/team_runner.py` — full run lifecycle: spawn loop (monitors briefs table for `status=pending`, calls runtime_adapter.spawn()), gate logic (gate_pending halt, calls notify_adapter.send(), polls for gate_approved/rejected resume), path amendment monitor, T3 mesh timeout → T2 escalation, T1 failure + terminal escalation only
 11. `cli/agency.py` — run, watch, inspect, approve, reject, pause, resume; `watch` tails blackboard events and renders live log; `inspect` renders run tree
 12. `prompts/` — fallback tier prompts (used when no agent_personality set)
 13. `adapters/vcs/github.py` — PR creation + branch management
 14. `adapters/notify/openclaw.py` — OpenClaw notification adapter; bridges gate summaries and run events to the operator via OpenClaw; manages its own inbound response state for gate approval routing
 15. `config/team.yaml` — example config with full visibility block
 16. `README.md` — how to run, how to add adapters, how to extend the roster; include `agency` CLI reference
 ---
 ## Out of Scope (Phase 2)
 - Cost accounting per tier + run rollup
 - Parallel workstream progress dashboard
 - Additional adapter implementations (GitLab, Slack, OpenAI, Ollama)
 - Persistent standing teams
 - Web UI for run monitoring
--- a/docs/design.md
+++ b/docs/design.md
@@ -0,0 +1,681 @@
 # Tiered Agent Team System — Design Document
 _Started: 2026-03-14. Last updated: 2026-03-30._
 ---
 ## Resolved Design Decisions (formerly Open Questions)
 All eight open questions resolved 2026-03-30. Details in Decisions Log.
 1. **T3 mesh mechanics** → Blackboard-based. T3s write draft task lists, read peers', commit merged plan before T4 dispatch. See _T3 Mesh via Blackboard_.
 2. **T1 output schema** → Formal JSON schema defined. See _T1 Plan Output Schema_.
 3. **T5 consensus mechanics** → T3 aggregates all T5 results into a joint verdict. Split verdict (`partial`) triggers retry of failed slices only. See _T5 Consensus & Verdict Schema_.
 4. **Path amendment mechanism** → Amending tier writes a `path_amendment` event to blackboard. Runner monitors events table and notifies the relevant higher tier via a system event. No agent callback plumbing required. See _Path Amendment Mechanism_.
 5. **Failure handling (distributed model)** → Distributed ownership confirmed. Runner only owns T1 failure + terminal human escalation. See updated _Failure Handling_ table.
 6. **Who makes spawn calls for T3+ tiers** → Runner monitors briefs table for `status=pending` rows and makes all spawn calls. "Distributed ownership" means the tier's output determines brief content — runner is the mechanical arm. Gates (hold on `gate_pending`) live naturally in the runner's spawn loop.
 7. **Gate approval UX** → `agency approve <run_id>` CLI writes `gate_approved` directly to the blackboard — the universal path, works on any platform. Runner only cares that the event exists, not how it got there. Notify adapter implementations handle their own inbound response routing (e.g. bridging a chat reply to a CLI call) as internal adapter state — not a core concern.
 8. **T3 mesh timeout** → Escalate to T2 (domain boundary problem, T2 should re-scope). If T2 also exhausts its retry budget, escalates up the normal ladder to T1 → Andrew gate. No force-commit fallback (would hide the problem and cause bad T4 dispatch).
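
Decision 6 can be illustrated with a sketch of a single pass of the runner's spawn loop, assuming the `briefs` table from the build spec. The function name `spawn_pending` and the `spawn` callable (a stand-in for a wrapper around the runtime adapter) are hypothetical, and gate holds are deliberately left out of this sketch.

```python
import json
import sqlite3
from datetime import datetime, timezone
from typing import Callable


def spawn_pending(
    conn: sqlite3.Connection,
    run_id: str,
    spawn: Callable[[dict], str],
) -> list:
    """Spawn an agent for every pending brief of a run; return the agent ids."""
    rows = conn.execute(
        "SELECT brief_id, payload FROM briefs "
        "WHERE run_id = ? AND status = 'pending' ORDER BY created_at",
        (run_id,),
    ).fetchall()
    agent_ids = []
    for brief_id, payload in rows:
        agent_ids.append(spawn(json.loads(payload)))
        # Mark the brief active so the next pass does not spawn it again.
        conn.execute(
            "UPDATE briefs SET status = 'active', updated_at = ? WHERE brief_id = ?",
            (datetime.now(timezone.utc).isoformat(), brief_id),
        )
    conn.commit()
    return agent_ids
```

The runner stays mechanical here: what gets spawned is decided entirely by the rows the tiers wrote, which is the sense of "distributed ownership" in the decision above.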
 ---
 ## Overview
 A dynamic, hierarchical multi-agent system for software pipelines. Teams assemble on demand, execute, then disband. Inspired by a blend of Hollywood production (dynamic assembly), consulting firms (structured deliverables, hierarchical synthesis), and two-pizza teams (small autonomous squads, clear domain ownership).
 ---
 ## Core Principles
 **1. Tiers represent cognitive modes, not org chart levels.**
 Each tier thinks differently — strategy, design, coordination, execution, verification. Adding a tier only makes sense if it introduces a genuinely different mode of reasoning.
 **2. Depth is proportional to complexity.**
 Not every task needs every tier. A config change might only need T3→T4. A new product needs the full stack. T1 assesses scope and prescribes the path — it is never pre-configured.
 **3. Goal anchoring at every level.**
 T1's original intent is embedded in every agent's context — not just passed to T2 and forgotten. Every agent knows the end goal even if they only own a slice.
 **4. Artifacts, not summaries.**
 Tiers pass structured specs downward (JSON task briefs), not paraphrased prose. Meaning is preserved; format is compressed.
 **5. Verification is mandatory.**
 T5 always runs. Nothing returns to T1 unverified. T5 is a mandatory quality gate, not an optional step: things should work, and work well, before they surface upward.
 **6. Provider agnostic.**
 The system makes no assumptions about which LLM provider or platform is in use. Tiers reference capability levels, not specific models. All external dependencies are swappable adapters.
 **7. Specialist talent pool.**
 Tiers define structure and responsibility. Agent personalities define domain expertise. The two are separate — the same tier can be filled by different specialists depending on the workstream domain.
 ---
 ## Tier Definitions
 | Tier | Role | Owns | Capability Level |
 |------|------|------|-----------------|
 | T1 | Visionary | Goal, constraints, dispatch plan, final acceptance | reasoning-heavy |
 | T2 | Architect | System design, interface contracts, workstream boundaries | reasoning-heavy / capable |
 | T3 | Squad Lead | Workstream delivery, T4 management, quality gate | capable |
 | T4 | Implementer | Atomic task execution (one file, one function, one test) | fast-cheap |
 | T5 | Verifier | Validation of T4 output — correctness + intent alignment | capable |
 T5 runs **within T3's scope**, not above it. T3 commissions T5 verification of its T4 outputs. T5 is a quality gate, not a management layer.
 Capability levels map to actual models per provider in config — the core system never references a specific model name.
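As a rough sketch of that mapping, the lookup below resolves a tier to a model through its capability level. The provider key and model names are placeholders, and pinning T2 to `reasoning-heavy` is an assumption (the table allows either level).

```python
# Placeholder config: provider -> capability level -> model name.
CAPABILITY_MODELS = {
    "provider-a": {
        "reasoning-heavy": "model-large",
        "capable": "model-medium",
        "fast-cheap": "model-small",
    },
}

# Capability level per tier, following the Tier Definitions table.
TIER_CAPABILITY = {
    "t1": "reasoning-heavy",
    "t2": "reasoning-heavy",  # assumption: the table also permits "capable"
    "t3": "capable",
    "t4": "fast-cheap",
    "t5": "capable",
}


def resolve_model(tier, provider, models=CAPABILITY_MODELS):
    """Core code asks for a tier; only config knows the concrete model name."""
    level = TIER_CAPABILITY[tier]
    try:
        return models[provider][level]
    except KeyError as exc:
        raise LookupError(f"no model configured for {provider!r} at {level!r}") from exc


print(resolve_model("t4", "provider-a"))
```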
 ---
 ## Dispatch Model
 ### T1 Owns the Plan
 T1 is not just a decomposer — it is the dispatch planner. Its output declares:
 - **Workstreams** — the decomposed units of work
 - **Tier path per workstream** — which tiers to engage (e.g. `[T2, T3, T4, T5]` or `[T4, T5]` for trivial tasks)
 - **Parallelism** — which workstreams are independent and can run concurrently
 T1 does not prescribe how each tier operates internally. That is the tier's own concern.
 ### T1 Lifecycle — Two Explicit Phases
 T1 is invoked twice per run, each with a distinct prompt and purpose:
 **Phase 1 — Plan:**
 1. T1 produces initial dispatch plan (workstreams, tier paths, parallelism, retry budget)
 2. T1 self-critiques its own plan in a single follow-up pass ("what could go wrong, what did I miss?") and amends
 3. Amended plan surfaces to Andrew for approval — no T2s spawn until approval is given
 **Phase 2 — Accept:**
 After the full T2→T3→T4→T5 pipeline completes, T1 is re-invoked with the final output. It validates against the original goal anchor and either accepts (opens PR) or rejects (sends the work back down for rework).
 Both phases are named explicitly in the task brief schema and tracked on the blackboard.
 ### Each Tier Owns the Layer Below
 Control flow is distributed, not centralised:
 - T1 manages its T2s
 - T2 Lead manages T2 specialists and their domain boundaries
 - T2 specialists each own their T3s
 - **T3 manages its T4s** — including dependency graph, parallelism, and T5 commissioning
 - The runner is thin: bootstrap T1, monitor the blackboard, handle final result and notifications
 This means orchestration logic lives in agent prompts and output schemas — not in Python runner code. Adding a new execution pattern means updating a prompt, not the runner.
 **Tradeoff:** Debugging is harder. When something fails mid-chain, you read blackboard logs rather than step through central runner code. This is a tooling problem to solve (good blackboard inspection), not a design flaw to avoid.
 ### Dynamic Paths
 Tiers can propose path amendments mid-run (e.g. T3 discovers scope that warrants a T2 pass it didn't get). Amendments are logged to the blackboard. Higher tiers are notified but do not need to approve; the notification is informational. No tier silently changes the plan.
 ---
 ## Orchestration Patterns Per Tier
 Different tiers suit different internal coordination patterns. These are baked into the runner's tier-handling logic and the tier prompts — not prescribed by T1.
 | Tier | Pattern | Rationale |
 |------|---------|-----------|
 | T1 | Single agent, two phases | Must be authoritative; plan phase + accept phase |
 | T2 Lead | Coordinator | Spawned first; defines boundaries + shared assumptions; drives conflict resolution; produces canonical architecture |
 | T2 Specialists | Parallel fan-out | Each works independently within its domain; reads Lead's boundaries + shared assumptions doc before starting |
 | T3 | Light mesh | Peer coordination within same T2 domain to negotiate task boundaries before T4 dispatch |
 | T4 | Swarm + pipeline hybrid | Independent tasks run as swarm; dependent tasks pipeline (T4-A's output feeds T4-B). T3 declares which is which. |
 | T5 | Parallel fan-out + consensus | Each T5 reviews its slice independently, then compares notes for a joint verdict — catches both artifact bugs and integration issues |
 ### T2 Flow in Detail
 1. T1 spawns **T2 Lead Architect** with goal + workstream context
 2. Lead defines explicit **domain boundaries** (who owns what, hard edges)
 3. Lead publishes **shared assumptions doc** — cross-cutting concerns, key conventions, architectural constraints (auth approach, data formats, API patterns, etc.)
 4. T1 spawns **T2 specialists** with boundaries + shared assumptions baked into their briefs
 5. Specialists work in parallel, each within their defined domain
 6. Lead reads all proposals, drives **conflict resolution** with relevant specialists if needed (cycle limit in config — fixed, not per-workstream)
 7. Lead produces **canonical architecture** → written to blackboard as distinct artifact
 8. T1 (Accept phase) validates canonical architecture against goal anchor
 9. Canonical architecture becomes T3 briefs — each T2 specialist hands off to its own T3s
 ---
 ## Horizontal Scaling Within Tiers
 ```
 T1 — Phase 1: Plan (self-critique → Andrew approval)
 │
 ├── T2: Lead Architect (boundaries + shared assumptions first)
 │   ├── T2: Backend Architect  ─┐
 │   ├── T2: Frontend Architect  ├─ parallel, within defined domains
 │   └── T2: Infra Architect    ─┘
 │       │
 │       └── (Lead synthesises → conflict resolution if needed → canonical architecture)
 │
 ├── T2 Backend Architect owns:
 │   ├── T3: API Squad Lead  ─┐
 │   └── T3: DB Squad Lead   ─┴─ light mesh within domain
 │           ├── T4: Worker A  ─┐
 │           ├── T4: Worker B  ─┼─ swarm / pipeline (T3 decides)
 │           └── T4: Worker C  ─┘
 │                   └── T5: Verifier(s) — fan-out + consensus
 │
 └── T1 — Phase 2: Accept (validates against goal anchor → PR)
 ```
 ---
 ## Use Case Flows
 T1 assesses complexity and prescribes the tier path per workstream. Three standard depth profiles:
 ### Full Stack — T1→T2→T3→T4→T5
 *Complex feature, new product, cross-domain changes*
 ```
 T1 Plan
  → assess complexity (high)
  → output T1 Plan Schema (workstreams, tier paths [T2,T3,T4,T5], parallelism, retry budgets)
  → self-critique pass
  → GATE: surface to Andrew ← approval required
 T2 Lead (spawned by runner after approval)
  → receive: goal + full workplan
  → publish: domain boundaries + shared assumptions doc → blackboard
  → GATE (optional): review boundaries before specialists spawn
 T2 Specialists (parallel fan-out, wait on Lead)
  → each receives: their domain boundary + shared assumptions
  → produce: architecture proposal for their slice
  → Lead synthesises, drives conflict resolution if needed
  → Lead writes: canonical architecture → blackboard
  → GATE (recommended): review architecture before implementation
 Each T2 Specialist → spawns its own T3s (with canonical architecture slice + interface contracts)
 T3s (light mesh within T2 domain)
  → write draft task lists to blackboard
  → read peers' lists, reconcile boundaries
  → commit merged task plan before T4 dispatch
  → GATE (optional): review task breakdown
 T4s
  → swarm: independent tasks run in parallel
  → pipeline: T4-A output feeds T4-B (T3 declares dependencies)
  → commit to feature branches
 T5s (fan-out per T4 slice)
  → each reviews its slice independently
  → T3 collects results → joint verdict
  → GATE (optional): review T5 verdict before T3 marks done
  → partial: T3 retries only failed slices
  → pass: T3 signals workstream done to T2
 T2 specialists → signal T2 Lead
 T2 Lead → writes integration summary → blackboard
 T1 Accept
  → validate against goal anchor
  → open PR, notify_adapter.send(pr summary + url)
 ```
 ### Medium Complexity — T1→T3→T4→T5
 *Config change, isolated bug fix — T1 determines no cross-domain design needed*
 ```
 T1 Plan
  → assess: contained scope, single domain, no T2 architecture needed
  → workplan: tier paths [T3, T4, T5]
  → GATE: Andrew approval
 T3s spawned directly by runner
  → receive T1 brief with task context (no T2 architecture layer)
  → T3 light mesh → T4 dispatch → T5 verify → signal done
 T1 Accept → PR
 ```
 ### Simple / Hotfix — T1→T4→T5
 *Single file, single function, trivial atomic task*
 ```
 T1 Plan
  → assess: trivial, single workstream
  → tier path: [T4, T5]
  → GATE: Andrew approval
 T4 (coding agent)
  → single atomic task, commits
 T5 (single verifier, not full fan-out)
  → code review + correctness check
  → pass → T1 Accept → PR
 ```
 ---
 ## Resolved Mechanics
 ### T3 Mesh via Blackboard
 T3s coordinate task boundaries before dispatching T4s. All coordination goes through the blackboard — no direct agent-to-agent messaging.
 1. Each T3 writes its **draft task list** to the blackboard (one row per proposed T4 task, status `draft`)
 2. Each T3 reads all sibling T3 draft lists in its T2 domain
 3. T3s amend their lists to resolve overlaps (claim tasks, release duplicates)
 4. Once all T3s in the domain have committed their final task lists (status `committed`), T4 dispatch begins
 5. No T3 dispatches T4s until all peers in the domain are committed — this prevents duplicate work
 The runner monitors for `all_committed` state and can enforce a timeout (config: `t3_mesh_timeout_minutes`).
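The draft/commit cycle above could be sketched against SQLite roughly as follows. The column layout of `t3_task_lists` and the helper names are assumptions for illustration, not the project's schema.

```python
import sqlite3


def open_mesh(conn):
    # Hypothetical columns: one row per proposed T4 task.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS t3_task_lists ("
        "domain TEXT, t3_id TEXT, task TEXT, status TEXT DEFAULT 'draft')"
    )


def write_draft(conn, domain, t3_id, tasks):
    conn.executemany(
        "INSERT INTO t3_task_lists (domain, t3_id, task) VALUES (?, ?, ?)",
        [(domain, t3_id, task) for task in tasks],
    )


def commit_list(conn, domain, t3_id):
    conn.execute(
        "UPDATE t3_task_lists SET status = 'committed' WHERE domain = ? AND t3_id = ?",
        (domain, t3_id),
    )


def all_committed(conn, domain):
    """True once the domain has rows and none of them is still a draft."""
    total, drafts = conn.execute(
        "SELECT COUNT(*), SUM(status = 'draft') FROM t3_task_lists WHERE domain = ?",
        (domain,),
    ).fetchone()
    return total > 0 and drafts == 0
```

This sketch cannot see a T3 that has written nothing yet; a real check would also need the expected list of T3s per domain.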
 ---
 ### T1 Plan Output Schema
 T1's Plan phase produces a structured JSON object written to the blackboard. The runner parses this to bootstrap the pipeline.
 ```json
 {
  "run_id": "uuid",
  "goal_anchor": "Original goal — immutable, propagated to every downstream brief",
  "complexity": "high | medium | low",
  "retry_budget_multiplier": 2,
  "workstreams": [
    {
      "id": "ws-backend-api",
      "name": "Backend API",
      "domain": "backend",
      "tier_path": ["t2", "t3", "t4", "t5"],
      "parallel_group": "A",
      "t2_specialist": "agents/engineering/engineering-software-architect.md",
      "notes": "Focus on webhook ingest and retry queue"
    }
  ],
  "parallelism": {
    "groups": {
      "A": ["ws-backend-api", "ws-frontend"],
      "B": ["ws-infra"]
    },
    "sequence": ["A", "B"]
  },
  "self_critique_summary": "Brief plain-text summary of what T1 identified and amended in its self-critique pass"
 }
 ```
 `parallel_group` + `sequence` handles inter-workstream dependencies: group A runs in parallel, then B starts after A completes.
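A minimal sketch of how a runner might turn those two fields into ordered waves of work. The `dispatch_waves` name is hypothetical, and the sketch only reads the `parallelism` object from the schema above.

```python
def dispatch_waves(plan):
    """Return workstream ids grouped into waves, in the order `sequence` names them."""
    groups = plan["parallelism"]["groups"]
    waves = []
    for name in plan["parallelism"]["sequence"]:
        if name not in groups:
            raise KeyError(f"sequence names unknown group {name!r}")
        waves.append(list(groups[name]))
    return waves


plan = {
    "parallelism": {
        "groups": {"A": ["ws-backend-api", "ws-frontend"], "B": ["ws-infra"]},
        "sequence": ["A", "B"],
    }
}
for number, wave in enumerate(dispatch_waves(plan), start=1):
    print(f"wave {number}: run {wave} concurrently, then wait for all to finish")
```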
 ---
 ### T5 Consensus & Verdict Schema
 T3 aggregates all T5 results into a joint verdict after fan-out completes.
 **Individual T5 result:**
 ```json
 {
  "verifier_id": "uuid",
  "scope": "queue-client",
  "verdict": "pass | fail",
  "issues": ["issue description..."],
  "notes": "human-readable summary"
 }
 ```
 **T3 joint verdict (written to blackboard):**
 ```json
 {
  "t5_results": [...],
  "joint_verdict": "pass | partial | fail",
  "failed_scopes": ["queue-client"],
  "summary": "Human-readable summary for gate surface and logs"
 }
 ```
 **Split verdict handling:**
 - `pass` → T3 marks workstream done, signals T2
 - `partial` → T3 retries only the failed T4 slices (up to retry budget), re-runs T5 on those slices
 - `fail` → T3 escalates to T2 (or T1 if shallow path)
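The aggregation step might look like the sketch below. The text above does not say exactly when a verdict is `partial` rather than `fail`, so the rule here is an assumption: every scope passing gives `pass`, every scope failing gives `fail`, and anything mixed gives `partial`.

```python
def joint_verdict(t5_results):
    """Fold individual T5 results into the joint verdict object T3 writes."""
    if not t5_results:
        raise ValueError("no T5 results to aggregate")
    failed = [r["scope"] for r in t5_results if r["verdict"] == "fail"]
    if not failed:
        verdict = "pass"
    elif len(failed) == len(t5_results):
        verdict = "fail"
    else:
        verdict = "partial"
    return {
        "t5_results": t5_results,
        "joint_verdict": verdict,
        "failed_scopes": failed,
        "summary": f"{len(failed)} of {len(t5_results)} scopes failed",
    }


results = [
    {"scope": "auth-middleware", "verdict": "pass"},
    {"scope": "queue-client", "verdict": "fail"},
]
print(joint_verdict(results)["joint_verdict"])
```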
 ---
 ### Spawn Call Ownership
 The runner is the single point of contact with the runtime adapter. Tiers do not call `sessions_spawn` directly — they write output to the blackboard and the runner acts on it.
 **Flow:**
 1. A tier completes and writes child briefs to the `briefs` table with `status=pending`
 2. Runner's spawn loop detects pending rows
 3. If a gate is configured at this tier boundary → runner writes `gate_pending`, notifies Andrew, halts
 4. On `gate_approved` → runner calls `runtime_adapter.spawn()` for each pending brief
 5. Spawned agent runs, writes its own child briefs as pending when done → loop continues
 This keeps gate logic in one place (the runner's spawn loop), makes all spawn calls auditable from a single location, and means agents only need blackboard read/write access — no runtime adapter tool access required.
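One pass of such a spawn loop could be sketched as below. The `briefs` columns, the `spawned` status value, and the `gate_open` callback are assumptions; `spawn` stands in for `runtime_adapter.spawn()`.

```python
import sqlite3


def spawn_pending(conn, spawn, gate_open):
    """Spawn every pending brief whose tier boundary is not held by a gate."""
    rows = conn.execute(
        "SELECT id, tier FROM briefs WHERE status = 'pending' ORDER BY id"
    ).fetchall()
    spawned = []
    for brief_id, tier in rows:
        if not gate_open(tier):
            continue  # stays pending until the gate is approved
        spawn(brief_id)  # the only place a spawn call is made
        conn.execute("UPDATE briefs SET status = 'spawned' WHERE id = ?", (brief_id,))
        spawned.append(brief_id)
    conn.commit()
    return spawned
```

Because agents never call `spawn` themselves, the list returned here doubles as the audit trail for one loop iteration.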
 ---
 ### Gate Approval UX
 **Core mechanic (platform-agnostic):**
 1. Runner writes `gate_pending` to blackboard
 2. Runner calls `notify_adapter.send()` with tier summary + gate context (`run_id`, `gate`, `summary`, `what_happens_next`)
 3. Runner polls blackboard for `gate_approved` or `gate_rejected`
 4. `agency approve <run_id>` / `agency reject <run_id> --reason "..."` writes the event directly to the blackboard — the universal approval path, works on any platform with filesystem access
 The runner never reads from a state file and never talks to a notify adapter for inbound responses. It only polls the blackboard.
 **Adapter responsibility:**
 Each notify adapter handles its own inbound response routing. How a human's approval gets translated into an `agency approve` CLI call is entirely the adapter's concern — not core. Example: an OpenClaw adapter bridges a chat reply to the CLI. A Slack adapter wires up a slash command. A webhook adapter listens on an endpoint. All produce the same result: `gate_approved` written to blackboard.
 Any internal state the adapter needs to resolve ambiguous responses (e.g. which run_id an approval refers to when multiple gates are pending) is managed by the adapter, not the core.
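The polling side of this mechanic might be sketched as follows. The `events` table layout and the `since_id` parameter (the id of the `gate_pending` event) are assumptions; returning `gate_rejected` on timeout mirrors the auto-reject behaviour of `gate_timeout_minutes`.

```python
import sqlite3
import time


def wait_for_gate(conn, run_id, since_id, timeout_s, poll_s=1.0, sleep=time.sleep):
    """Poll the events table for a decision recorded after event `since_id`."""
    deadline = time.monotonic() + timeout_s
    while True:
        row = conn.execute(
            "SELECT kind FROM events WHERE run_id = ? AND id > ? "
            "AND kind IN ('gate_approved', 'gate_rejected') ORDER BY id LIMIT 1",
            (run_id, since_id),
        ).fetchone()
        if row:
            return row[0]
        if time.monotonic() >= deadline:
            return "gate_rejected"  # no response inside the window
        sleep(poll_s)
```

Nothing in the loop knows how the decision arrived, which is the point: CLI, chat bridge, and webhook all look identical from here.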
 ---
 ### T3 Mesh Timeout
 If T3s in a domain fail to commit their task lists within `t3_mesh_timeout_minutes`:
 1. **Runner escalates to T2** — writes a `gate_pending` escalation event and notifies the T2 specialist that owns the domain. Context: which T3s timed out, what draft lists (if any) exist on the blackboard. T2 re-scopes or clarifies domain boundaries, spawns fresh T3 briefs.
 2. **If T2 also exhausts its retry budget** → normal escalation ladder: T2 failure → T1 handles → T1 failure → Andrew gate.
 Force-committing partial draft lists (optimistic fallback) is explicitly not done — it hides the boundary problem and produces conflicting or duplicate T4 tasks that fail later with less context.
 ---
 ### Path Amendment Mechanism
 When a mid-run tier discovers scope that warrants a different tier path than T1 prescribed:
 1. The discovering tier writes a `path_amendment` event to the blackboard:
 ```json
 {
  "kind": "path_amendment",
  "proposed_by": "t3/ws-backend-api",
  "reason": "Discovered auth dependency requires T2 architectural pass",
  "amendment": {
    "workstream": "ws-backend-api",
    "add_tiers": ["t2"],
    "insert_before": "t3"
  }
 }
 ```
 2. The runner monitors the events table, detects `path_amendment`, and sends a system event notification to the relevant higher tier
 3. The higher tier is **informed, not blocked** — it acknowledges and adjusts its understanding
 4. Amendment is logged on the blackboard for audit; no approval gate required (the next scheduled human gate will surface it)
 No agent needs callback plumbing. The runner is the notification bridge.
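Applying the `amendment` object to a workstream's tier path could look like this sketch. The `apply_amendment` helper is hypothetical and handles only the `add_tiers` / `insert_before` fields shown in the example event.

```python
def apply_amendment(tier_path, amendment):
    """Return a new tier path with the added tiers placed before the anchor tier."""
    anchor = amendment["insert_before"]
    if anchor not in tier_path:
        raise ValueError(f"{anchor!r} is not in the current tier path")
    additions = [t for t in amendment["add_tiers"] if t not in tier_path]
    index = tier_path.index(anchor)
    return tier_path[:index] + additions + tier_path[index:]


amendment = {"workstream": "ws-backend-api", "add_tiers": ["t2"], "insert_before": "t3"}
print(apply_amendment(["t3", "t4", "t5"], amendment))
```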
 ---
 ## Shared State
 For software pipelines, **the repo is the primary blackboard**:
 - T4 workers commit to feature branches
 - T3 leads review and merge to workstream branches
 - T2 architects own integration branches
 - T1 does final integration and acceptance
 Supplemented by a SQLite coordination store per run tracking:
 - In-flight workstreams and their current execution plans
 - Handoff artifacts and tier status
 - Retry counts and escalation history
 - Path amendments (proposed, by whom, timestamp)
 ---
 ## Failure Handling
 Distributed ownership — each tier handles failures in the tier below it. The runner only handles T1 failure and terminal human escalation.
 | Failure | Owner | Handler | Action |
 |---------|-------|---------|--------|
 | T4 bad output | T3 | `escalation.py` called by T3's context | Retry T4 with corrected brief (up to retry_budget) |
 | T4 blocked | T3 | `escalation.py` | Escalate to T3 immediately — no retries |
 | T4 partial output | T3 | `escalation.py` | Salvage good parts, re-task remainder |
 | T5 partial verdict | T3 | T3 joint verdict logic | Retry failed T4 slices only |
 | T5 full fail | T3 | T3 joint verdict logic | Escalate to T2 |
 | T3 workstream stuck | T2 | T2 specialist prompt + blackboard | Re-scope or split the workstream |
 | T2 design wrong | T1 | T1 Accept phase + blackboard | Re-plan; may discard workstream and restart |
 | T1 failure / crash | Runner | `team_runner.py` | Surface to human, halt run |
 | Repeated escalation | Runner | `team_runner.py` | Gate: block until human unblocks |
 **Key distinction:** `escalation.py` is not called by the runner centrally. It is logic that tier agents execute (or the runner executes on their behalf when it detects a timeout or dead agent). The runner only owns the last two rows.
 Retry limits prevent infinite loops. Escalation path is always upward, never sideways.
 T1 sets a retry budget multiplier during scope assessment (`1x` simple, `2x` complex). Retry budget is a field on the task brief — not hardcoded in the runner.
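The T4 rows of the table, combined with the retry budget multiplier, can be sketched as a small routing function. The `FailureType` member names, the `BASE_RETRIES` value, and the returned action strings are all assumptions for illustration; the real definitions belong to `core/escalation.py`.

```python
from enum import Enum


class FailureType(Enum):
    # Hypothetical members mirroring the three T4 rows above.
    BAD_OUTPUT = "bad_output"
    BLOCKED = "blocked"
    PARTIAL_OUTPUT = "partial_output"


BASE_RETRIES = 2  # assumed base budget; T1's multiplier scales it


def next_action(failure, attempts, multiplier):
    """Decide what T3 does next for a failed T4 task."""
    budget = BASE_RETRIES * multiplier
    if failure is FailureType.BLOCKED:
        return "escalate"  # blocked tasks are never retried
    if attempts >= budget:
        return "escalate"  # budget spent: hand the problem upward
    if failure is FailureType.PARTIAL_OUTPUT:
        return "retask_remainder"
    return "retry"


print(next_action(FailureType.BAD_OUTPUT, attempts=1, multiplier=1))
```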
 ---
 ## Agent Talent Pool
 The system builds on [agency-agents](https://github.com/msitarzewski/agency-agents) — a library of 50+ pre-built specialist personalities, each with deep domain expertise, quality standards, and specific deliverables.
 **Division of responsibility:**
 - Our system provides: orchestration, tier structure, task briefs, retries, verification gates, shared state
 - Agency-agents provides: the specialist knowledge each agent brings to its role
 T1 selects the right specialist from the roster when building workstream briefs. The specialist's personality is injected as the system prompt at spawn time.
 **Default tier-to-specialist mapping for software pipelines:**
 | Tier | Domain | Agent |
 |------|--------|-------|
 | T1 | Strategy | nexus-strategy |
 | T2 | Backend | software-architect |
 | T2 | Infra | devops-automator |
 | T2 | Data | data-engineer |
 | T3 | Backend | senior-developer |
 | T3 | Reliability | sre |
 | T4 | Frontend | frontend-developer |
 | T4 | Backend | backend-architect |
 | T4 | Database | database-optimizer |
 | T4 | DevOps | devops-automator |
 | T4 | Mobile | mobile-app-builder |
 | T4 | AI/ML | ai-engineer |
 | T4 | Security | security-engineer |
 | T4 | Docs | technical-writer |
 | T5 | Code review | code-reviewer |
 | T5 | Integration | testing-reality-checker |
 | T5 | API | testing-api-tester |
 | T5 | Performance | testing-performance-benchmarker |
 | T5 | Security | security-engineer |
 The roster is not fixed — T1 can select any agent from the library based on workstream needs.
 ---
 ## Adapter Layers
 Everything external is a swappable adapter. Core logic never imports from adapters directly — always through an interface.
 ```
 Core (platform-agnostic)
 ├── team_runner      — thin bootstrap: spawn T1, monitor blackboard, handle result
 ├── blackboard       — SQLite coordination state
 ├── task_brief       — schema + validation
 └── escalation       — retry logic, failure routing
 Adapters (swappable)
 ├── llm/             — anthropic (now), openai, ollama, any API
 ├── notify/          — openclaw (now), slack, email, webhook...
 ├── vcs/             — github (now), gitlab, gitea, bare git...
 └── runtime/
    ├── standard     — openclaw sessions_spawn (T1/T2/T3)
    └── coding_agent — claude_code (T4/T5 default), codex, aider...
 ```
 Swapping providers means writing a new adapter file — nothing in core changes.
 T4 and T5 default to the **coding agent runtime** when available. If no coding agent runtime is configured, they fall back to the standard runtime.
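The runtime choice described here might reduce to a lookup like the one below. The `pick_runtime` helper is hypothetical; the names `standard` and `coding_agent` follow the adapter tree above.

```python
def pick_runtime(tier, configured):
    """Choose a runtime name for a tier from the set of configured runtimes."""
    preferred = "coding_agent" if tier in {"t4", "t5"} else "standard"
    if preferred in configured:
        return preferred
    if "standard" in configured:
        return "standard"  # fallback when no coding agent runtime is set up
    raise LookupError("no runtime adapter configured")


print(pick_runtime("t4", {"standard"}))
```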
 ---
 ## Run Visibility Layer
 Designed for debugging, test runs, and quality evaluation at each tier. Three interlocking components.
 ### 1. Human-Readable Live Log
 Structured events from the blackboard rendered as a timestamped, readable stream. `agency watch <run_id>` tails this live.
 ```
 [abc123] 12:30:01  T1   PLAN_START    Assessing scope: "Build webhook ingestion system"
 [abc123] 12:30:14  T1   PLAN_DONE     3 workstreams — backend-api, infra, docs (2 parallel)
 [abc123] 12:30:14  GATE APPROVAL      ⏸  Waiting on approval before T2 spawns
 [abc123] 12:31:02  GATE APPROVED      ✓  Approved — continuing
 [abc123] 12:31:03  T2   LEAD_START    Lead Architect spawned
 [abc123] 12:31:41  T2   BOUNDS_READY  Domain boundaries + shared assumptions published
 [abc123] 12:31:42  T2   SPEC_START    3 specialists spawned (parallel): backend, infra, docs
 [abc123] 12:32:15  T2   SPEC_DONE     backend-api architecture draft ready
 [abc123] 12:32:58  T2   SYNTH_DONE    Canonical architecture written to blackboard
 [abc123] 12:32:58  GATE INSPECTION    ⏸  T2 synthesis ready for review
 [abc123] 12:33:44  T3   MESH_START    backend-api: 2 squad leads negotiating task boundaries
 [abc123] 12:34:01  T3   MESH_DONE     Task split committed — 7 T4 tasks (5 swarm, 2 pipeline)
 [abc123] 12:34:02  T4   SWARM_START   5 workers spawned in parallel
 [abc123] 12:35:10  T4   DONE          worker-3 auth-middleware ✓
 [abc123] 12:35:22  T4   FAIL          worker-4 queue-client ✗  (retry 1/3)
 [abc123] 12:36:04  T4   DONE          worker-4 queue-client ✓  (retry resolved)
 [abc123] 12:36:05  T5   VERIFY_START  4 verifiers spawned
 [abc123] 12:36:45  T5   VERDICT       partial — queue-client needs rework
 [abc123] 12:37:12  T5   VERDICT       ✓  all pass — workstream backend-api done
 ```
 Log level `verbose` adds per-T4-start/done lines. Default is `normal` (tier-level events only).
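Rendering blackboard events into that column layout could be sketched as follows. The event dictionary keys and the per-event `level` field are assumptions; the column widths are chosen to match the sample output above.

```python
LEVELS = {"normal": 0, "verbose": 1}


def format_event(run_id, clock, tier, event, message):
    """Lay out one log line: run id, time, tier column, event column, message."""
    return f"[{run_id}] {clock}  {tier:<4} {event:<13} {message}"


def render(events, log_level="normal"):
    """Keep only events at or below the configured log level, then format them."""
    limit = LEVELS[log_level]
    return [
        format_event(e["run_id"], e["clock"], e["tier"], e["event"], e["message"])
        for e in events
        if LEVELS[e.get("level", "normal")] <= limit
    ]


events = [
    {"run_id": "abc123", "clock": "12:30:01", "tier": "T1",
     "event": "PLAN_START", "message": "Assessing scope"},
    {"run_id": "abc123", "clock": "12:34:03", "tier": "T4",
     "event": "START", "message": "worker-3 auth-middleware", "level": "verbose"},
]
print("\n".join(render(events, "verbose")))
```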
 ### 2. Inspection Gates
 Configurable pause points. When the runner hits a gate, it:
 1. Writes a `gate_pending` event to the blackboard
 2. Fires `notify_adapter.send()` with the tier summary + gate context
 3. Halts — no next tier spawns until `gate_approved` or `gate_rejected` is written
 The tier summary surfaced at each gate includes:
 - **What was produced** (the tier artifact in readable form)
 - **What happens next** (which agents will spawn, doing what)
 - **Any anomalies** flagged by the tier itself
 Configurable in `team.yaml` under `visibility.inspection_gates`. A `strict_mode: true` flag enables all gates — recommended for first runs on a new codebase or new goal type.
 ```yaml
 visibility:
  strict_mode: false
  log_level: normal           # normal | verbose
  inspection_gates:
    t1_plan: true             # always — required by design
    t2_lead: false            # optional — review boundaries before specialists
    t2_synthesis: true        # recommended — review architecture before implementation
    t3_plan: false            # verbose — useful early on, disable once T3 is trusted
    t5_verdict: false         # review T5 joint verdict before T3 marks workstream done
  gate_timeout_minutes: 60    # auto-reject if no response within this window
 ```
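How `strict_mode` and the per-gate flags combine might be sketched like this, working on the already-parsed `visibility` mapping. The `effective_gates` helper is hypothetical; the gate names are the five from the config above, and forcing `t1_plan` on reflects its "required by design" comment.

```python
ALL_GATES = ("t1_plan", "t2_lead", "t2_synthesis", "t3_plan", "t5_verdict")


def effective_gates(visibility):
    """Resolve which inspection gates are active for a run."""
    configured = visibility.get("inspection_gates", {})
    strict = visibility.get("strict_mode", False)
    gates = {name: strict or bool(configured.get(name, False)) for name in ALL_GATES}
    gates["t1_plan"] = True  # the plan gate cannot be switched off
    return gates


visibility = {
    "strict_mode": False,
    "inspection_gates": {"t1_plan": False, "t2_synthesis": True},
}
print(effective_gates(visibility))
```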
 ### 3. Inspection CLI — `cli/agency.py`
 ```
 agency run <config.yaml>               # start a run, returns run_id
 agency watch <run_id>                  # tail live log (follows blackboard events)
 agency inspect <run_id>                # interactive tree view of run state
 agency inspect <run_id> --tier t2      # jump to T2 artifacts
 agency inspect <run_id> --brief <id>   # show full brief + result JSON
 agency approve <run_id>                # approve current gate → continue
 agency approve <run_id> --note "..."   # approve with a note written to blackboard
 agency reject <run_id> --reason "..."  # reject → tier re-invoked
 agency pause <run_id>                  # force-pause at next tier boundary
 agency resume <run_id>                 # release a manual pause
 ```
 `agency inspect` (no flags) renders a live tree:
 ```
 Run abc123 — "Build webhook ingestion system"
 ├── T1 Plan ✓
 │   └── [view workplan]
 ├── T2 Architecture ✓  [GATE: pending review]
 │   ├── [view domain boundaries]
 │   ├── [view shared assumptions]
 │   └── [view canonical architecture]
 ├── T3 backend-api (active)
 │   ├── [view task breakdown]
 │   └── T4 workers: 3/7 done, 1 retrying, 3 pending
 └── T3 infra (pending)
 ```
 ### Blackboard Event Vocabulary (extended)
 ```python
 # existing
 "spawned" | "completed" | "failed" | "escalated" | "retried"
 # new — visibility layer
 "gate_pending"     # runner hit a gate, waiting for human
 "gate_approved"    # human approved, run continues
 "gate_rejected"    # human rejected, tier re-invoked
 "gate_paused"      # manual pause via CLI
 "gate_resumed"     # manual resume via CLI
 "path_amendment"   # mid-run tier proposed path change
 "log"              # human-readable log line (level + message)
 ```
 ---
 ## Decisions Log
 **T1 dynamic dispatch** — T1 assesses scope and prescribes tier path and workstream parallelism. It does not prescribe internal tier coordination patterns.
 **T1 two-phase lifecycle** — T1 has two explicit named phases: Plan and Accept. Plan phase includes self-critique (single pass) then human approval gate before T2s spawn. Accept phase validates final output against goal anchor. Both phases tracked on blackboard with distinct prompts.
 **T1 self-critique** — Single pass only. Diminishing returns on multiple self-critique iterations; the human review after is the real safety net. Self-critique catches obvious gaps; Andrew catches strategic ones.
 **Distributed ownership** — Each tier owns the layer below it. Runner is thin. Tradeoff: distributed control makes the system extensible but debugging requires good blackboard tooling, not central runner traces.
 **T5 always mandatory** — No skipping verification. Things should work and work well before surfacing to T1.
 **T3 owns T4 and T5** — T3 manages its T4s (dependency graph, swarm vs pipeline, parallelism) and commissions T5 verification of T4 outputs. Runner does not orchestrate T4/T5 centrally.
 **T2 Lead Architect** — Dedicated T2 role, not a new tier. Spawned first by T1. Owns: domain boundary definition, shared assumptions doc, conflict resolution between specialists, canonical architecture synthesis. Specialists spawn after Lead publishes boundaries + assumptions. Each T2 specialist owns its own T3s — no T3 spans T2 domains.
 **T2 conflict resolution** — Lead sends targeted briefs back to conflicting specialists. Cycle limit is a fixed config value (not per-workstream). This mirrors T1's single-pass self-critique: a fixed limit, not a variable one.
 **T2 shared assumptions** — Lead publishes cross-cutting concerns (auth, data formats, API conventions, etc.) before specialists start. Specialists design with shared baseline; implicit dependencies pre-empted rather than caught in synthesis.
 **Orchestration patterns** — Baked into tier prompts and runner tier-handling logic, not prescribed by T1. T2: Lead + parallel specialists. T3: light mesh within T2 domain. T4: swarm+pipeline. T5: fan-out+consensus.
 **Output / review** — Nothing merges to main without explicit human approval. T1 opens a PR and fires `notify_adapter.send()` with the PR summary. Merge is gated on human sign-off. The notify adapter implementation determines how the notification is delivered.
 **Platform agnosticism** — Core is provider and platform agnostic. Capability levels (`reasoning-heavy`, `capable`, `fast-cheap`) map to models in config. Mixing providers across tiers is supported.
 **LLM provider** — Anthropic first implementation. Config supports per-tier provider selection.
 **Gateway modification** — Decided against. Agent-teams stays standalone Python. OpenClaw used via runtime adapter only.
 **Coding agent runtime** — Claude Code is default T4/T5 runtime. Opt-in `native_teams` flag available for internal Claude Code parallelism — faster but less blackboard visibility. Default `false`.
 **Agency-agents integration** — Via git submodule at `agents/`. T1 selects specialists via `config/role_registry.yaml`. `agent_personality` field on task brief; runtime injects as system prompt at spawn time.
 **Spawn call ownership** — Runner is the single point of contact with the runtime adapter. Tiers write `status=pending` child briefs to the blackboard; runner's spawn loop detects and spawns them. Gate logic (hold on `gate_pending`) lives in the spawn loop — no gate plumbing needed in agents. Agents only need blackboard read/write access.
 **Gate approval UX** — `agency approve <run_id>` CLI is the universal approval path — writes `gate_approved` directly to blackboard. Runner only polls blackboard; it does not depend on any specific notification platform. Each notify adapter handles its own inbound response bridge as internal adapter state. Core has no `pending_gates.json` or platform-specific approval logic.
 **T3 mesh timeout** — Escalate to T2 (the specialist that owns the domain). Timeout means T3s can't agree on task boundaries — a domain boundary problem T2 should fix by re-scoping. If T2 exhausts its retry budget, normal escalation ladder handles it (T1 → Andrew gate). No force-commit fallback.
 **T3 mesh mechanics** — Blackboard-based coordination. T3s write draft task lists, read peers', reconcile overlaps, commit merged plan. No T4 dispatch until all T3s in the domain have committed. Runner enforces timeout (`t3_mesh_timeout_minutes` in config). Chosen over designated T3 lead or direct messaging — fits distributed ownership model, gives full audit trail for free.
 **T1 output schema** — Formal JSON schema defined (2026-03-30). Fields: `run_id`, `goal_anchor`, `complexity`, `retry_budget_multiplier`, `workstreams[]` (id, name, domain, tier_path, parallel_group, t2_specialist, notes), `parallelism` (groups + sequence), `self_critique_summary`. `parallel_group` + `sequence` handles inter-workstream dependencies.
 **T5 consensus** — T3 aggregates all T5 results into joint verdict: `pass | partial | fail`. Split verdict (`partial`) → T3 retries only failed slices, re-runs T5 on those slices. Full `fail` escalates to T2. T3 writes structured joint verdict to blackboard; this is what the optional T5 gate surfaces to Andrew.
 **Path amendment mechanism** — Amending tier writes `path_amendment` event to blackboard (structured JSON: proposed_by, reason, amendment). Runner monitors events table, sends system event notification to relevant higher tier. Higher tier is informed, not blocked. No agent callback plumbing. Amendments surface at next scheduled human gate.
 **Failure handling (distributed)** — Confirmed distributed ownership (2026-03-30). `escalation.py` is logic tiers execute (or runner executes on tier's behalf on timeout/crash), not a central runner concern. Runner only owns: T1 failure, terminal human escalation. See updated Failure Handling table.
 **Run visibility layer** — Added 2026-03-30. Human-readable live log, configurable inspection gates, and `cli/agency.py` inspection/control commands. Designed for debugging and quality evaluation at each tier during early runs. `strict_mode: true` enables all gates. Gates surface tier artifacts + "what happens next" summary via `notify_adapter.send()` — platform-agnostic. Resolves Q3 (T5 consensus surfaces as gate event with human-readable summary). T5 gate (optional) lets the operator review joint verdict before T3 marks workstream done.
--- a/requirements.txt
+++ b/requirements.txt
@@ -10,6 +10,9 @@ pyyaml
 # Environment variable management
 python-dotenv
 # GitHub VCS adapter
 PyGithub
 # --- stdlib-only (no pip install needed) ---
 # sqlite3   — blackboard persistence
 # dataclasses — task_brief schema
Author	SHA1	Message	Date
Hans Heinemann	342832fa5e	chore: update submodule URL to Gitea	2026-04-02 10:05:09 -04:00
hansheinemann	641f122cdb	docs: add CLAUDE.md agent quick reference	2026-03-30 15:19:07 -04:00
hansheinemann	54afa0f53f	docs: resolve all design questions + visibility layer + portability audit	2026-03-30 15:18:48 -04:00
hansheinemann	f228061c4d	docs: update design doc with new architecture decisions	2026-03-30 15:18:30 -04:00
Hans Heinemann	1c99e40f98	docs: purge OpenClaw/Hans specifics from core design Portability audit — all platform-specific concerns moved to adapter layer: - Gate Approval UX (Resolved Mechanics): rewritten as platform-agnostic. Core: runner writes gate_pending, calls notify_adapter.send(), polls blackboard for gate_approved. Universal path: agency CLI writes directly to blackboard. Adapter handles its own inbound response bridge internally. - pending_gates.json removed from core directory structure and runner responsibilities — adapter-internal state, not a core concern. - 'User → Hans → team_runner.start()' → 'User → team_runner.start()' Core has no dependency on a specific caller. - 'notify_adapter.send(...to Andrew via Hans)' → 'notify_adapter.send()' throughout design.md and buildspec.md. - anthropic.py description: 'via OpenClaw or direct API' → 'direct API' (anthropic adapter never goes via OpenClaw) - Output/review decision: 'Hans messages Andrew' → 'notify_adapter.send()' - Run visibility decision: 'Andrew via Hans' → 'via notify_adapter.send()' - Decisions log: gate approval and visibility entries rewritten accordingly Adapter layer correctly unchanged: adapters/notify/openclaw.py — OpenClaw-specific, owns its inbound bridge adapters/runtime/openclaw.py — OpenClaw sessions_spawn, correctly isolated team.yaml example config — adapter selection is config, not core	2026-03-30 14:31:55 -04:00
Hans Heinemann	8f143e779d	docs: resolve remaining 3 design questions (spawn ownership, gate UX, mesh timeout)	2026-03-30 14:22:39 -04:00
    - Spawn calls: runner owns all runtime_adapter.spawn() calls; tiers write status=pending briefs to blackboard, runner's spawn loop acts on them. Gate logic lives in the spawn loop — no gate plumbing needed in agents.
    - Gate approval UX: Signal reply via Hans + direct CLI both supported. Both write gate_approved to blackboard; runner doesn't care which path. Hans uses pending_gates.json for multi-run disambiguation.
    - T3 mesh timeout: escalate to T2 (domain boundary problem). If T2 also exhausts retry budget, normal escalation ladder handles it. No force-commit.
    Add pending_gates.json to directory structure and buildspec. Update runner step in build order with full spawn loop responsibilities.
Hans Heinemann	a721db63f6	docs: lock in visibility layer, resolve all 5 open design questions	2026-03-30 13:43:19 -04:00
    - Resolve T3 mesh mechanics: blackboard-based draft/commit cycle
    - Resolve T1 plan output schema: formal JSON structure with workstreams + parallelism groups
    - Resolve T5 consensus: T3 aggregates joint verdict (pass/partial/fail), partial retries failed slices only
    - Resolve path amendment mechanism: event-based, runner notifies higher tier, no approval gate
    - Resolve failure handling: confirmed distributed ownership, runner owns T1 + terminal only
    Add run visibility layer:
    - Human-readable live log (normal + verbose modes)
    - Configurable inspection gates (t1_plan always, t2_synthesis recommended, others optional)
    - strict_mode flag for full gating on early runs
    - cli/agency.py: run, watch, inspect, approve, reject, pause, resume
    - gate_pending halt loop in team_runner, gate_approved/rejected resume
    - Expanded blackboard event vocabulary (gate_*, path_amendment, log)
    - t3_task_lists table for mesh coordination state
    - Inspection gate flow added to buildspec Key Flows
    Build order updated: 16 steps (added cli/ step, clarified runner gate responsibilities)
Hans Heinemann	882b769d21	chore: sync agency-agents submodule with upstream	2026-03-30 09:00:16 -04:00
Hans Heinemann	ce3c020de2	docs: add open design questions section	2026-03-16 20:45:47 -04:00
Hans Heinemann	b54436f474	docs: T1 two-phase lifecycle, T2 Lead Architect, shared assumptions, conflict resolution	2026-03-16 20:41:13 -04:00
Hans Heinemann	1ed7023c08	docs: update design — dynamic dispatch, distributed ownership, orchestration patterns	2026-03-16 16:13:33 -04:00
Hans Heinemann	9efbb3b010	docs: add CLAUDE.md agent quick reference	2026-03-16 15:52:44 -04:00
hansheinemann	72bd744664	docs: add design doc and buildspec (#5)	2026-03-16 15:51:14 -04:00
hansheinemann	084cfb0bb2	feat: implement all adapter layers (#2)	2026-03-16 11:45:11 -04:00
    Adapters implemented:
    - adapters/llm/anthropic.py — Anthropic Claude SDK, capability-based model selection, max_tokens + temperature configurable via team.yaml, lazy SDK import
    - adapters/vcs/github.py — GitHub PR/branch operations via gh CLI
    - adapters/notify/openclaw.py — OpenClaw system event notifications
    - adapters/runtime/openclaw.py — OpenClaw sessions_spawn for agent execution
    - adapters/runtime/claude_code.py — Claude Code CLI for T4/T5 coding tasks
    All adapters follow the abstract base interfaces from Phase 1. Config-driven model selection via capability_map in team.yaml.
hansheinemann	ce1ce85b87	feat: expand role_registry with specialist roles + update agency-agents submodule (#4)	2026-03-16 11:44:54 -04:00
    Role registry changes:
    - T2 backend: software-architect → backend-architect
    - T2 frontend: software-architect → frontend-architect
    - T3 backend: senior-developer → senior-backend-developer (NEW)
    - T3 frontend: senior-developer → senior-frontend-developer (NEW)
    - T4 backend: backend-architect → backend-developer
    - T4 default: senior-developer → backend-developer
    - Added coverage for: ai, security, mobile, database, devops, docs, data, embedded, e2e, accessibility
    Submodule updated to include: frontend-architect, backend-developer, senior-backend-developer, senior-frontend-developer.
    Clean tier separation:
      T2 = architects (design)
      T3 = senior devs (lead + implement-or-delegate)
      T4 = developers (pure implementation)
      T5 = reviewers/testers (verification)