chore: update submodule URL to Gitea

docs: add CLAUDE.md agent quick reference
2026-04-02 10:05:09 -04:00 · 2026-03-30 15:19:07 -04:00 · 2026-03-30 15:18:48 -04:00 · 2026-03-30 15:18:30 -04:00 · 2026-03-30 14:31:55 -04:00 · 2026-03-30 14:22:39 -04:00
12 changed files with 2009 additions and 141 deletions
--- a/.gitmodules
+++ b/.gitmodules
@@ -1,3 +1,3 @@
 [submodule "agents"]
 	path = agents
-	url = https://github.com/coding-with-hans-heinemann/agency-agents.git
+	url = https://git.tandrewng.com/cw-hans/agency-agents.git
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,48 @@
+# CLAUDE.md — Agent Quick Reference
+
+Read this before exploring the codebase. It saves tokens.
+
+## What This Is
+
+A tiered multi-agent orchestration framework. T1 decomposes goals → T2 architects → T3 leads → T4 implements → T5 verifies. SQLite blackboard tracks state. All external dependencies (LLM, VCS, notify, runtime) are pluggable adapters.
+
+## Key Docs
+
+- `docs/design.md` — architecture decisions, tier design, key choices
+- `docs/buildspec.md` — 15-step build order, phase breakdown
+
+## Project Layout
+
+```
+core/             — task_brief.py, blackboard.py, escalation.py, team_runner.py
+adapters/base/    — abstract base classes (LLMAdapter, VCSAdapter, NotifyAdapter, RuntimeAdapter)
+adapters/llm/     — anthropic.py
+adapters/vcs/     — github.py
+adapters/notify/  — openclaw.py
+adapters/runtime/ — openclaw.py, claude_code.py
+prompts/          — T1–T5 system prompt .md files
+config/           — team.yaml (run config), role_registry.yaml (tier→role→persona)
+agents/           — git submodule, agent persona .md files
+runs/             — per-run blackboard.db files (gitignored)
+```
+
+## Conventions
+
+- **Never commit or push directly to `main`** — always branch (`hans/...` or `feature/...`) and PR
+- New adapters: subclass the relevant `adapters/base/*.py` abstract class
+- New roles: add persona `.md` to `agents/` submodule + entry in `config/role_registry.yaml`
+- Failure handling lives in `core/escalation.py` — extend `FailureType` there
+- `TaskBrief` is the canonical work unit — all tiers pass briefs to each other
+- Blackboard is the single source of truth per run — always write events there
+
+## Current State
+
+Phase 2 adapter implementations exist. `core/team_runner.py` may still have stubs — check before assuming it's wired up end-to-end.
+
+## Running
+
+```bash
+python -m venv .venv && source .venv/bin/activate
+pip install -r requirements.txt
+python -m core.team_runner --config config/team.yaml
+```
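Review note on the "New adapters" convention in CLAUDE.md: a minimal sketch of what subclassing a base adapter might look like. The `NotifyAdapter` class here is a stand-in whose shape is assumed from the `send(message, context)` signature used by `adapters/notify/openclaw.py` in this diff. It is not a copy of `adapters/base/notify.py`. `InMemoryNotifyAdapter` is a hypothetical name chosen for illustration.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class NotifyAdapter(ABC):
    """Stand-in for the real base class (assumed shape, not the repository file)."""

    @abstractmethod
    def send(self, message: str, context: dict) -> None:
        """Deliver one notification; implementations should not raise on failure."""


class InMemoryNotifyAdapter(NotifyAdapter):
    """Hypothetical adapter that records notifications instead of delivering them."""

    def __init__(self, config: dict) -> None:
        self._config = config
        self.sent: list[tuple[str, str]] = []

    def send(self, message: str, context: dict) -> None:
        # Store (level, message) so a test can inspect what would have been sent.
        self.sent.append((context.get("level", "info"), message))


adapter = InMemoryNotifyAdapter(config={})
adapter.send("run started", {"level": "info", "run_id": "demo"})
adapter.send("verification failed", {"level": "error"})
print(adapter.sent)
```

An in-memory adapter like this could be useful when checking `core/team_runner.py` wiring without the `openclaw` binary installed, assuming the runner accepts any `NotifyAdapter` subclass.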
--- a/adapters/llm/anthropic.py
+++ b/adapters/llm/anthropic.py
@@ -1,16 +1,15 @@
 """
 adapters/llm/anthropic.py
-Anthropic Claude adapter — Phase 2 stub.
+Anthropic Claude LLM adapter — Phase 2 implementation.

-TODO (Phase 2):
-  - Implement complete() using the anthropic SDK (anthropic.Anthropic client).
-  - Implement resolve_model() by reading config/team.yaml capability_map.
-  - Handle streaming responses, rate-limit retries, and token counting.
-  - Support system-prompt injection via context["system_prompt"].
-  - Map capability → model using the provider's capability_map config.
+Uses the ``anthropic`` SDK to call Claude models.  Model selection is driven
+by the capability_map in team.yaml so the adapter stays provider-agnostic in
+configuration.
 """
 from __future__ import annotations

+import os
+
 from adapters.base.llm import LLMAdapter


@@ -18,27 +17,123 @@ class AnthropicAdapter(LLMAdapter):
    """
    LLM adapter for Anthropic Claude models.

-    Reads model configuration from config/team.yaml:
-        models.provider: anthropic
-        models.capability_map.reasoning-heavy.anthropic: claude-opus-4-6
-        models.capability_map.capable.anthropic: claude-sonnet-4-6
-        models.capability_map.fast-cheap.anthropic: claude-haiku-3-5
+    Reads model configuration from the loaded team.yaml config dict::
+
+        models:
+          default_max_tokens: 4096   # fallback max_tokens for all calls
+          default_temperature: 0     # fallback temperature for all calls
+          capability_map:
+            reasoning-heavy:
+              anthropic: claude-opus-4-6
+            capable:
+              anthropic: claude-sonnet-4-6
+            fast-cheap:
+              anthropic: claude-haiku-3-5
+
+    The provider key used when looking up ``capability_map`` is hardcoded to
+    ``"anthropic"`` — the adapter knows its own provider; there is no need for
+    a separate ``models.provider`` config field.
+
+    Both ``default_max_tokens`` and ``default_temperature`` can be overridden
+    per-call via the ``context`` dict passed to :meth:`complete`.
+
+    Environment variables
+    ---------------------
+    ANTHROPIC_API_KEY : Required. Authenticates with the Anthropic API.
    """

    def __init__(self, config: dict) -> None:
-        # TODO (Phase 2): Accept loaded team.yaml config dict.
-        # Extract API key from environment (ANTHROPIC_API_KEY).
-        # Initialise the anthropic.Anthropic() client.
-        raise NotImplementedError("AnthropicAdapter.__init__ is not yet implemented.")
+        """
+        Initialise the Anthropic adapter.
+
+        Parameters
+        ----------
+        config : Loaded team.yaml config dict.
+
+        Raises
+        ------
+        ValueError
+            If ANTHROPIC_API_KEY is not set in the environment.
+        """
+        try:
+            import anthropic as _anthropic
+        except ModuleNotFoundError as exc:
+            raise ImportError(
+                "The 'anthropic' package is required for AnthropicAdapter. "
+                "Install it with: pip install anthropic"
+            ) from exc
+
+        self._config = config
+        api_key = os.environ.get("ANTHROPIC_API_KEY")
+        if not api_key:
+            raise ValueError(
+                "ANTHROPIC_API_KEY environment variable is not set. "
+                "Export it before running the-agency."
+            )
+        self._client = _anthropic.Anthropic(api_key=api_key)
+        self._models_cfg: dict = config.get("models", {})
+        self._default_max_tokens: int = self._models_cfg.get("default_max_tokens", 4096)
+        self._default_temperature: float = self._models_cfg.get("default_temperature", 0)

    def complete(self, prompt: str, capability: str, context: dict) -> str:
-        # TODO (Phase 2): Call anthropic client messages.create().
-        # Use resolve_model(capability) to pick the model.
-        # Support context keys: system_prompt, max_tokens, temperature.
-        # Return response text as a plain string.
-        raise NotImplementedError("AnthropicAdapter.complete is not yet implemented.")
+        """
+        Send a prompt to a Claude model and return the text response.
+
+        Parameters
+        ----------
+        prompt      : User-role prompt content.
+        capability  : One of "reasoning-heavy" | "capable" | "fast-cheap".
+        context     : Optional per-call overrides:
+                        system_prompt (str)   — sent as the request's ``system`` parameter.
+                        max_tokens    (int)   — defaults to models.default_max_tokens in team.yaml.
+                        temperature   (float) — defaults to models.default_temperature in team.yaml.
+
+        Returns
+        -------
+        The model's text completion as a plain string.
+        """
+        model = self.resolve_model(capability)
+        max_tokens: int = context.get("max_tokens", self._default_max_tokens)
+        temperature: float = context.get("temperature", self._default_temperature)
+        system_prompt: str = context.get("system_prompt", "")
+
+        create_kwargs: dict = {
+            "model": model,
+            "max_tokens": max_tokens,
+            "messages": [{"role": "user", "content": prompt}],
+        }
+        if system_prompt:
+            create_kwargs["system"] = system_prompt
+        # Always send temperature; omitting it lets the API default override a configured 0.
+        create_kwargs["temperature"] = temperature
+
+        response = self._client.messages.create(**create_kwargs)
+        return "".join(block.text for block in response.content if block.type == "text")

    def resolve_model(self, capability: str) -> str:
-        # TODO (Phase 2): Look up capability in team.yaml capability_map.
-        # Fall back to "capable" tier model if capability is unknown.
-        raise NotImplementedError("AnthropicAdapter.resolve_model is not yet implemented.")
+        """
+        Map a capability string to the Anthropic model identifier.
+
+        Looks up ``config.models.capability_map[capability][provider]``.
+        Falls back to the "capable" tier model if the capability is unknown.
+
+        Parameters
+        ----------
+        capability : One of "reasoning-heavy" | "capable" | "fast-cheap".
+
+        Returns
+        -------
+        Anthropic model identifier (e.g. "claude-opus-4-6").
+        """
+        # The adapter knows its own provider — no need to read it from config.
+        cap_map: dict = self._models_cfg.get("capability_map", {})
+
+        if capability in cap_map and "anthropic" in cap_map[capability]:
+            return cap_map[capability]["anthropic"]
+
+        # Fall back to "capable" tier
+        if "capable" in cap_map and "anthropic" in cap_map["capable"]:
+            return cap_map["capable"]["anthropic"]
+
+        # Hard-coded last resort
+        return "claude-sonnet-4-6"
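Review note on `resolve_model()`: the lookup order (requested capability, then the "capable" tier, then a hard-coded model) can be sketched as a standalone function, which makes the fallback chain easy to test without an API key. This is a sketch under assumptions, not the adapter's code: the free function, its `provider` and `last_resort` parameters, and the rule that an empty entry counts as "not configured" are all introduced here for illustration.

```python
from __future__ import annotations


def resolve_model(
    models_cfg: dict,
    capability: str,
    provider: str = "anthropic",
    last_resort: str = "claude-sonnet-4-6",
) -> str:
    """Return the model for `capability`, else the "capable" tier, else `last_resort`."""
    cap_map = models_cfg.get("capability_map") or {}
    for tier in (capability, "capable"):
        # `or {}` tolerates a tier that is present in YAML but has no value.
        model = (cap_map.get(tier) or {}).get(provider)
        if model:  # Assumption: an empty or missing entry means "not configured".
            return model
    return last_resort


models_cfg = {
    "capability_map": {
        "reasoning-heavy": {"anthropic": "claude-opus-4-6"},
        "capable": {"anthropic": "claude-sonnet-4-6"},
    }
}
print(resolve_model(models_cfg, "reasoning-heavy"))
print(resolve_model(models_cfg, "no-such-tier"))
print(resolve_model({}, "fast-cheap", last_resort="hypothetical-default"))
```

Unlike the method in the diff, this version also falls through when a tier maps to an empty value, which may or may not be the desired behaviour for a misconfigured `team.yaml`.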
--- a/adapters/notify/openclaw.py
+++ b/adapters/notify/openclaw.py
@@ -1,35 +1,93 @@
 """
 adapters/notify/openclaw.py
-OpenClaw notification adapter — Phase 2 stub.
+OpenClaw notification adapter — Phase 2 implementation.

-TODO (Phase 2):
-  - Implement send() to dispatch notifications via the OpenClaw API.
-  - Support context keys: channel, severity, run_id, brief_id.
-  - Read endpoint and credentials from environment (OPENCLAW_API_KEY, OPENCLAW_URL).
-  - Handle rate limiting and delivery retries.
+Sends notifications by shelling out to the ``openclaw`` CLI::
+
+    openclaw system event --text "<message>" --mode now
+
+If the binary is not on PATH the method logs a warning and returns without
+raising — notifications are best-effort and should never crash the pipeline.
 """
 from __future__ import annotations

+import logging
+import os
+import subprocess
+
 from adapters.base.notify import NotifyAdapter

+logger = logging.getLogger(__name__)
+

 class OpenClawNotifyAdapter(NotifyAdapter):
    """
-    Notification adapter that sends messages via OpenClaw.
+    Notification adapter that dispatches messages via the ``openclaw`` CLI.

-    Expects environment variables:
-        OPENCLAW_API_KEY  — authentication token
-        OPENCLAW_URL      — base URL for the OpenClaw API (optional, defaults to hosted)
+    Environment variables
+    ---------------------
+    OPENCLAW_SIGNAL_NUMBER : Optional. Direct signal target for OpenClaw sends.
    """

    def __init__(self, config: dict) -> None:
-        # TODO (Phase 2): Accept loaded team.yaml config dict.
-        # Extract OPENCLAW_API_KEY and OPENCLAW_URL from environment.
-        # Initialise an HTTP client (e.g. httpx or requests).
-        raise NotImplementedError("OpenClawNotifyAdapter.__init__ is not yet implemented.")
+        """
+        Initialise the OpenClaw notification adapter.
+
+        Parameters
+        ----------
+        config : Loaded team.yaml config dict (reserved for future options).
+        """
+        self._config = config
+        self._signal_number: str = os.environ.get("OPENCLAW_SIGNAL_NUMBER", "")

    def send(self, message: str, context: dict) -> None:
-        # TODO (Phase 2): POST notification payload to OpenClaw API.
-        # Include message, context (channel, severity, run_id, brief_id).
-        # Log delivery confirmation or raise on failure.
-        raise NotImplementedError("OpenClawNotifyAdapter.send is not yet implemented.")
+        """
+        Send a notification via ``openclaw system event``.
+
+        Parameters
+        ----------
+        message : Human-readable notification text.
+        context : Optional metadata.  Recognised keys:
+                    level    (str) — "info" | "warning" | "error"; logged locally.
+                    run_id   (str) — included in the local log record.
+                    brief_id (str) — included in the local log record.
+
+        Notes
+        -----
+        If the ``openclaw`` binary is not present on PATH, the method logs a
+        warning and returns silently.  Notifications are best-effort.
+        """
+        level: str = context.get("level", "info")
+        run_id: str = context.get("run_id", "")
+        brief_id: str = context.get("brief_id", "")
+
+        # Always log locally regardless of CLI availability.
+        log_msg = "[notify:%s] %s (run=%s brief=%s)" % (level, message, run_id, brief_id)
+        if level == "error":
+            logger.error(log_msg)
+        elif level == "warning":
+            logger.warning(log_msg)
+        else:
+            logger.info(log_msg)
+
+        cmd = ["openclaw", "system", "event", "--text", message, "--mode", "now"]
+        try:
+            result = subprocess.run(
+                cmd,
+                capture_output=True,
+                text=True,
+                timeout=30,
+            )
+            if result.returncode != 0:
+                logger.warning(
+                    "openclaw event returned non-zero exit %d: %s",
+                    result.returncode,
+                    result.stderr.strip(),
+                )
+        except FileNotFoundError:
+            logger.warning(
+                "openclaw CLI not found on PATH; notification not delivered: %s",
+                message,
+            )
+        except subprocess.TimeoutExpired:
+            logger.warning("openclaw event timed out for message: %s", message)
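Review note on the best-effort contract in `send()`: the pattern of running a CLI and downgrading every failure to a log warning can be isolated in a small helper. The sketch below assumes nothing about the `openclaw` CLI itself. `run_best_effort` and the binary name `no-such-binary-agency-demo` are hypothetical, and the Python interpreter is used as a stand-in command so the example runs anywhere.

```python
from __future__ import annotations

import logging
import subprocess
import sys

logger = logging.getLogger(__name__)


def run_best_effort(cmd: list[str], timeout_s: float = 30.0) -> bool:
    """Run `cmd`; return True on exit code 0 and never raise for delivery failures."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except FileNotFoundError:
        logger.warning("%s not found on PATH; skipping", cmd[0])
        return False
    except subprocess.TimeoutExpired:
        logger.warning("%s timed out after %ss", cmd[0], timeout_s)
        return False
    if result.returncode != 0:
        logger.warning("%s exited with %d: %s", cmd[0], result.returncode, result.stderr.strip())
        return False
    return True


# A missing binary is reported as False rather than crashing the caller.
print(run_best_effort(["no-such-binary-agency-demo", "--text", "hello"]))
# A command that exits cleanly is reported as True.
print(run_best_effort([sys.executable, "-c", "pass"]))
```

Returning a boolean is a design choice of this sketch: it lets callers record delivery status on the blackboard if they want to, while keeping the "never crash the pipeline" behaviour described in the module docstring.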
--- a/adapters/runtime/claude_code.py
+++ b/adapters/runtime/claude_code.py
@@ -1,51 +1,163 @@
 """
 adapters/runtime/claude_code.py
-Claude Code agent runtime adapter — Phase 2 stub.
+Claude Code sub-agent runtime adapter — Phase 2 implementation.

-TODO (Phase 2):
-  - Implement spawn() to launch a Claude Code sub-agent via the Agent SDK.
-  - Implement get_result() to await agent completion and parse the output.
-  - Implement kill() to terminate the sub-agent process or session.
-  - Map task brief context (files, constraints, artifacts) into the agent's
-    system prompt and tool context.
-  - Handle Claude Code tool-use responses and extract structured output.
+Spawns the ``claude`` CLI as a non-interactive subprocess for T4/T5
+implementation tasks::
+
+    claude --permission-mode bypassPermissions --print "<task>"
+
+Each spawned process is tracked by a UUID job_id so callers can later poll
+for the result or terminate the job.  Stdout is captured and returned as the
+agent output; stderr is included for debugging.
 """
 from __future__ import annotations

+import logging
+import subprocess
+import tempfile
+import threading
+import uuid
+
 from adapters.base.runtime import RuntimeAdapter

+logger = logging.getLogger(__name__)
+

 class ClaudeCodeRuntimeAdapter(RuntimeAdapter):
    """
-    Runtime adapter that spawns Claude Code sub-agents for coding tasks.
+    Runtime adapter that spawns ``claude`` CLI sub-agents for coding tasks.

-    Used when a TaskBrief has preferred_runtime == "coding_agent".
+    Credentials are inherited from the environment (``ANTHROPIC_API_KEY``).
+    The ``claude`` CLI must be installed and reachable on PATH.

-    Expects the Claude Code CLI / Agent SDK to be available in the environment.
-    Credentials are inherited from the environment (ANTHROPIC_API_KEY).
+    Used when a TaskBrief has ``preferred_runtime == "coding_agent"``.
    """

    def __init__(self, config: dict) -> None:
-        # TODO (Phase 2): Accept loaded team.yaml config dict.
-        # Validate that Claude Code CLI or SDK is accessible.
-        # Initialise any agent session management state.
-        raise NotImplementedError("ClaudeCodeRuntimeAdapter.__init__ is not yet implemented.")
+        """
+        Initialise the Claude Code runtime adapter.
+
+        Parameters
+        ----------
+        config : Loaded team.yaml config dict (reserved for future options).
+        """
+        self._config = config
+        # Maps job_id → running Popen instance.
+        self._jobs: dict[str, subprocess.Popen] = {}
+        self._lock = threading.Lock()
+
+    # ------------------------------------------------------------------
+    # RuntimeAdapter interface
+    # ------------------------------------------------------------------

    def spawn(self, task: str, capability: str, context: dict) -> str:
-        # TODO (Phase 2): Launch a Claude Code sub-agent.
-        # Compose a structured system prompt from task + context.
-        # Inject relevant files and constraints as tool context.
-        # Return an agent_id that maps to a running agent session.
-        raise NotImplementedError("ClaudeCodeRuntimeAdapter.spawn is not yet implemented.")
+        """
+        Launch ``claude --permission-mode bypassPermissions --print "<task>"``
+        as a non-interactive subprocess.
+
+        Parameters
+        ----------
+        task       : Full task description (typically a JSON-serialised brief).
+        capability : Capability hint (not forwarded; Claude Code resolves its
+                     own model from the local environment).
+        context    : Optional keys:
+                       workdir (str) — cwd for the subprocess.  A fresh
+                                       temporary directory is created if omitted.
+
+        Returns
+        -------
+        A UUID job_id string that uniquely identifies this subprocess.
+        """
+        workdir: str = context.get("workdir") or tempfile.mkdtemp(
+            prefix="agency-claude-"
+        )
+        job_id = str(uuid.uuid4())
+        logger.info("Spawning Claude Code job %s in %s", job_id, workdir)
+
+        proc = subprocess.Popen(
+            ["claude", "--permission-mode", "bypassPermissions", "--print", task],
+            stdout=subprocess.PIPE,
+            stderr=subprocess.PIPE,
+            text=True,
+            cwd=workdir,
+        )
+
+        with self._lock:
+            self._jobs[job_id] = proc
+
+        return job_id

    def get_result(self, agent_id: str, timeout_s: int) -> dict:
-        # TODO (Phase 2): Await the Claude Code agent session to complete.
-        # Parse the agent's final message for structured JSON output.
-        # Return dict with: {"status": ..., "output": ..., "artifacts": [...]}.
-        # Raise TimeoutError if timeout_s elapses.
-        raise NotImplementedError("ClaudeCodeRuntimeAdapter.get_result is not yet implemented.")
+        """
+        Wait for the Claude Code subprocess to complete and return its output.
+
+        Parameters
+        ----------
+        agent_id  : Job id returned by spawn().
+        timeout_s : Maximum seconds to wait before raising TimeoutError.
+
+        Returns
+        -------
+        dict with keys:
+            status    ("completed" | "failed")
+            output    (str — full stdout)
+            artifacts (list — always empty; callers must parse output)
+            stderr    (str — full stderr)
+
+        Raises
+        ------
+        KeyError
+            If agent_id does not correspond to a known job.
+        TimeoutError
+            If the subprocess does not finish within timeout_s seconds.
+        """
+        with self._lock:
+            proc = self._jobs.get(agent_id)
+
+        if proc is None:
+            raise KeyError(f"No Claude Code job found for agent_id={agent_id!r}")
+
+        try:
+            stdout, stderr = proc.communicate(timeout=timeout_s)
+        except subprocess.TimeoutExpired as exc:
+            proc.kill()
+            proc.communicate()  # Reap the killed process and drain its pipes.
+            raise TimeoutError(
+                f"Claude Code job {agent_id!r} did not complete within {timeout_s}s."
+            ) from exc
+
+        status = "completed" if proc.returncode == 0 else "failed"
+        logger.info(
+            "Claude Code job %s finished: status=%s returncode=%d",
+            agent_id,
+            status,
+            proc.returncode,
+        )
+
+        return {
+            "status": status,
+            "output": stdout,
+            "artifacts": [],
+            "stderr": stderr,
+        }

    def kill(self, agent_id: str) -> None:
-        # TODO (Phase 2): Terminate the Claude Code agent session.
-        # Clean up any temporary files or session state.
-        raise NotImplementedError("ClaudeCodeRuntimeAdapter.kill is not yet implemented.")
+        """
+        Terminate a running Claude Code subprocess.
+
+        Silently succeeds if the job has already finished or the id is unknown.
+
+        Parameters
+        ----------
+        agent_id : Job id returned by spawn().
+        """
+        with self._lock:
+            proc = self._jobs.get(agent_id)
+
+        if proc is not None:
+            try:
+                proc.terminate()
+                logger.info("Terminated Claude Code job %s", agent_id)
+            except OSError:
+                pass  # Process already gone — that is fine.
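Review note on job tracking: in the adapter above, `get_result()` looks a job up in `_jobs` but never removes it, so finished `Popen` objects accumulate for the lifetime of the adapter. One possible shape for a registry that drops a job once it has been collected is sketched below. `JobRegistry` is a hypothetical class, and the Python interpreter stands in for the `claude` CLI so the example runs without it.

```python
from __future__ import annotations

import subprocess
import sys
import threading
import uuid


class JobRegistry:
    """Hypothetical tracker: spawn subprocesses by id, forget them once collected."""

    def __init__(self) -> None:
        self._jobs: dict[str, subprocess.Popen] = {}
        self._lock = threading.Lock()

    def spawn(self, argv: list[str]) -> str:
        proc = subprocess.Popen(
            argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
        )
        job_id = str(uuid.uuid4())
        with self._lock:
            self._jobs[job_id] = proc
        return job_id

    def get_result(self, job_id: str, timeout_s: float) -> dict:
        with self._lock:
            proc = self._jobs.get(job_id)
        if proc is None:
            raise KeyError(f"No job found for job_id={job_id!r}")
        try:
            stdout, stderr = proc.communicate(timeout=timeout_s)
        except subprocess.TimeoutExpired as exc:
            proc.kill()
            proc.communicate()  # Reap the killed process and drain its pipes.
            raise TimeoutError(f"Job {job_id!r} exceeded {timeout_s}s") from exc
        finally:
            # Once the process has exited, drop it so the dict does not grow forever.
            if proc.poll() is not None:
                with self._lock:
                    self._jobs.pop(job_id, None)
        status = "completed" if proc.returncode == 0 else "failed"
        return {"status": status, "output": stdout, "artifacts": [], "stderr": stderr}


registry = JobRegistry()
job_id = registry.spawn([sys.executable, "-c", "print('hello from job')"])
result = registry.get_result(job_id, timeout_s=30)
print(result["status"], result["output"].strip())
```

With this shape a second `get_result()` for the same id raises `KeyError`, so a caller that needs the output more than once would have to cache it. Whether that trade-off suits `core/team_runner.py` is a judgment call for the author.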
--- a/adapters/runtime/openclaw.py
+++ b/adapters/runtime/openclaw.py
@@ -1,48 +1,241 @@
 """
 adapters/runtime/openclaw.py
-OpenClaw agent runtime adapter — Phase 2 stub.
+OpenClaw agent runtime adapter — Phase 2 implementation.

-TODO (Phase 2):
-  - Implement spawn() to submit a task to an OpenClaw worker pool.
-  - Implement get_result() to poll or subscribe for agent completion.
-  - Implement kill() to cancel a running OpenClaw agent job.
-  - Read endpoint and credentials from environment (OPENCLAW_API_KEY, OPENCLAW_URL).
-  - Map capability hint to an appropriate worker class/queue.
+Spawns sub-agents by shelling out to the ``openclaw`` CLI::
+
+    openclaw session spawn --task "<task>" --mode run
+    openclaw session get   <session_id>
+    openclaw session kill  <session_id>
+
+If the ``openclaw`` binary is unavailable, all methods raise
+``NotImplementedError`` with a helpful message rather than crashing with a
+raw ``FileNotFoundError``.
 """
 from __future__ import annotations

+import json
+import logging
+import re
+import subprocess
+import time
+
 from adapters.base.runtime import RuntimeAdapter

+logger = logging.getLogger(__name__)
+
+# Status strings from the openclaw CLI that indicate a session has finished.
+_TERMINAL_STATUSES = frozenset(
+    {"done", "completed", "failed", "partial", "blocked", "error"}
+)
+

 class OpenClawRuntimeAdapter(RuntimeAdapter):
    """
-    Runtime adapter that dispatches agent tasks to OpenClaw workers.
+    Runtime adapter that dispatches agent tasks to OpenClaw worker sessions.

-    Expects environment variables:
-        OPENCLAW_API_KEY  — authentication token
-        OPENCLAW_URL      — base URL for the OpenClaw API
+    All interactions use the ``openclaw`` CLI.  No additional credentials are
+    required beyond what OpenClaw manages in the local environment.
    """

    def __init__(self, config: dict) -> None:
-        # TODO (Phase 2): Accept loaded team.yaml config dict.
-        # Extract OPENCLAW_API_KEY and OPENCLAW_URL from environment.
-        # Initialise HTTP client and any job-tracking state.
-        raise NotImplementedError("OpenClawRuntimeAdapter.__init__ is not yet implemented.")
+        """
+        Initialise the OpenClaw runtime adapter.
+
+        Parameters
+        ----------
+        config : Loaded team.yaml config dict (reserved for future options).
+        """
+        self._config = config
+
+    # ------------------------------------------------------------------
+    # RuntimeAdapter interface
+    # ------------------------------------------------------------------

    def spawn(self, task: str, capability: str, context: dict) -> str:
-        # TODO (Phase 2): Submit task to OpenClaw worker pool.
-        # Map capability ("reasoning-heavy" | "capable" | "fast-cheap") to
-        # an appropriate worker queue or model hint.
-        # Return an agent_id string that can be used to poll for results.
-        raise NotImplementedError("OpenClawRuntimeAdapter.spawn is not yet implemented.")
+        """
+        Spawn an OpenClaw agent session for the given task.
+
+        Parameters
+        ----------
+        task       : Natural-language task description.
+        capability : Capability hint ("reasoning-heavy" | "capable" | "fast-cheap").
+                     Passed informally; actual routing is handled by OpenClaw.
+        context    : Arbitrary context bag (currently unused by this adapter).
+
+        Returns
+        -------
+        session_id string parsed from the CLI output.
+
+        Raises
+        ------
+        NotImplementedError
+            If the ``openclaw`` CLI is not available on PATH.
+        RuntimeError
+            If the session_id cannot be parsed from the CLI output.
+        """
+        # TODO: map capability to an openclaw worker tier / model hint if the
+        # openclaw CLI gains that flag in a future release.
+        cmd = ["openclaw", "session", "spawn", "--task", task, "--mode", "run"]
+        try:
+            result = subprocess.run(
+                cmd,
+                capture_output=True,
+                text=True,
+                check=True,
+            )
+        except FileNotFoundError:
+            raise NotImplementedError(
+                "openclaw CLI not found on PATH. "
+                "Install OpenClaw or configure a different runtime adapter "
+                "(e.g. adapters.runtime.claude_code.ClaudeCodeRuntimeAdapter)."
+            )
+        except subprocess.CalledProcessError as exc:
+            raise RuntimeError(
+                f"openclaw session spawn failed (exit {exc.returncode}): "
+                f"{exc.stderr.strip()}"
+            ) from exc
+
+        return self._parse_session_id(result.stdout)

    def get_result(self, agent_id: str, timeout_s: int) -> dict:
-        # TODO (Phase 2): Poll or long-poll the OpenClaw API for job completion.
-        # Raise TimeoutError if timeout_s elapses before the job finishes.
-        # Return a dict with at minimum: {"status": ..., "output": ..., "artifacts": [...]}.
-        raise NotImplementedError("OpenClawRuntimeAdapter.get_result is not yet implemented.")
+        """
+        Poll ``openclaw session get`` until the session reaches a terminal
+        state or *timeout_s* seconds elapse.
+
+        Parameters
+        ----------
+        agent_id  : Session ID returned by spawn().
+        timeout_s : Maximum seconds to wait before raising TimeoutError.
+
+        Returns
+        -------
+        dict with keys: ``status``, ``output``, ``artifacts``.
+
+        Raises
+        ------
+        TimeoutError
+            If the session does not finish within timeout_s seconds.
+        NotImplementedError
+            If the ``openclaw`` CLI is not available on PATH.
+        """
+        deadline = time.monotonic() + timeout_s
+        poll_interval = 2.0
+
+        while time.monotonic() < deadline:
+            try:
+                result = subprocess.run(
+                    ["openclaw", "session", "get", agent_id],
+                    capture_output=True,
+                    text=True,
+                    timeout=15,
+                )
+            except FileNotFoundError:
+                raise NotImplementedError(
+                    "openclaw CLI not found on PATH. "
+                    "Install OpenClaw or switch to a different runtime adapter."
+                )
+            except subprocess.TimeoutExpired:
+                logger.debug("openclaw session get timed out; will retry")
+                time.sleep(poll_interval)
+                continue
+
+            if result.returncode == 0 and result.stdout.strip():
+                parsed = self._parse_get_output(result.stdout)
+                if str(parsed.get("status", "")).lower() in _TERMINAL_STATUSES:
+                    return parsed
+            else:
+                logger.debug(
+                    "openclaw session get returned exit=%d; retrying. stderr=%s",
+                    result.returncode,
+                    result.stderr.strip(),
+                )
+
+            time.sleep(poll_interval)
+
+        raise TimeoutError(
+            f"Agent {agent_id!r} did not complete within {timeout_s}s."
+        )

    def kill(self, agent_id: str) -> None:
-        # TODO (Phase 2): Send a cancellation request to the OpenClaw API.
-        # Silently succeed if the agent has already finished.
-        raise NotImplementedError("OpenClawRuntimeAdapter.kill is not yet implemented.")
+        """
+        Terminate an OpenClaw session unconditionally.
+
+        Silently succeeds if the session has already finished.
+
+        Parameters
+        ----------
+        agent_id : Session ID returned by spawn().
+
+        Raises
+        ------
+        NotImplementedError
+            If the ``openclaw`` CLI is not available on PATH.
+        """
+        try:
+            subprocess.run(
+                ["openclaw", "session", "kill", agent_id],
+                capture_output=True,
+                text=True,
+                timeout=15,
+            )
+        except FileNotFoundError:
+            raise NotImplementedError(
+                "openclaw CLI not found on PATH. "
+                "Install OpenClaw or switch to a different runtime adapter."
+            )
+        except subprocess.TimeoutExpired:
+            logger.warning("openclaw session kill timed out for agent %s", agent_id)
+
+    # ------------------------------------------------------------------
+    # Private helpers
+    # ------------------------------------------------------------------
+
+    def _parse_session_id(self, output: str) -> str:
+        """Extract a session_id from the raw stdout of ``openclaw session spawn``."""
+        output = output.strip()
+
+        # Prefer structured JSON output.
+        try:
+            data = json.loads(output)
+            for key in ("session_id", "sessionId", "id"):
+                if key in data:
+                    return str(data[key])
+        except (json.JSONDecodeError, TypeError):
+            pass
+
+        # Regex: look for "session_id: <id>" or similar.
+        m = re.search(
+            r"(?:session[_\s]?id|sessionId)[:\s]+([a-zA-Z0-9_\-]+)",
+            output,
+            re.IGNORECASE,
+        )
+        if m:
+            return m.group(1)
+
+        # Last resort: return the first non-empty line.
+        lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
+        if lines:
+            return lines[0]
+
+        raise RuntimeError(
+            f"Could not parse session_id from openclaw output: {output!r}"
+        )
+
+    def _parse_get_output(self, output: str) -> dict:
+        """Parse the stdout of ``openclaw session get`` into a result dict."""
+        output = output.strip()
+        try:
+            data = json.loads(output)
+            return {
+                "status": data.get("status", "done"),
+                "output": data.get("output", output),
+                "artifacts": data.get("artifacts", []),
+            }
+        except (json.JSONDecodeError, TypeError, AttributeError):
+            # Non-JSON or non-object output — treat as completed with raw text output.
+            return {
+                "status": "done",
+                "output": output,
+                "artifacts": [],
+            }
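Review note on the polling loop in `get_result()`: the deadline logic is easier to test when the CLI call is injected as a function. The sketch below assumes a `fetch` callable that returns a dict with a `status` key, much like `_parse_get_output()` does. `poll_until_terminal` and its parameters are hypothetical names, and the status set is copied from `_TERMINAL_STATUSES` in this diff.

```python
from __future__ import annotations

import time
from typing import Callable

TERMINAL_STATUSES = frozenset(
    {"done", "completed", "failed", "partial", "blocked", "error"}
)


def poll_until_terminal(
    fetch: Callable[[], dict],
    timeout_s: float,
    poll_interval_s: float = 2.0,
) -> dict:
    """Call `fetch` until it reports a terminal status or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        snapshot = fetch()
        # str() guards against a null or numeric status in the payload.
        if str(snapshot.get("status", "")).lower() in TERMINAL_STATUSES:
            return snapshot
        # Never sleep past the deadline.
        remaining = max(0.0, deadline - time.monotonic())
        time.sleep(min(poll_interval_s, remaining))
    raise TimeoutError(f"No terminal status within {timeout_s}s")


# Simulated session: two "running" snapshots, then a terminal one.
responses = iter(
    [
        {"status": "running"},
        {"status": "running"},
        {"status": "Done", "output": "ok", "artifacts": []},
    ]
)
result = poll_until_terminal(lambda: next(responses), timeout_s=5, poll_interval_s=0.01)
print(result["output"])
```

Capping the sleep at the remaining time keeps the total wait close to `timeout_s`, whereas a fixed two-second sleep can overshoot the deadline by up to one interval.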
--- a/adapters/vcs/github.py
+++ b/adapters/vcs/github.py
@@ -1,16 +1,30 @@
 """
 adapters/vcs/github.py
-GitHub VCS adapter — Phase 2 stub.
+GitHub VCS adapter — Phase 2 implementation.

-TODO (Phase 2):
-  - Implement create_branch() using PyGithub or gh CLI subprocess.
-  - Implement commit() — stage files and push via git subprocess or API.
-  - Implement create_pr() using GitHub REST API (POST /repos/{owner}/{repo}/pulls).
-  - Implement get_pr_status() using GET /repos/{owner}/{repo}/pulls/{pull_number}.
-  - Read repo and credentials from config/team.yaml and environment (GITHUB_TOKEN).
+Uses PyGithub (``pip install PyGithub``) to interact with the GitHub REST API.
+Reads the repository URL and base branch from the team.yaml config dict.
+
+Note on commit() signature
+--------------------------
+The base class declares ``commit(files: list[str], message: str)``, which is
+insufficient for the GitHub Contents API (which requires file *content*, not
+just paths).  This implementation extends the signature to accept either:
+
+* ``dict[str, str]`` — ``{path: content}`` mapping (preferred; uses the API).
+* ``list[str]``      — local file paths; content is read from disk and pushed.
+
+The optional ``branch`` keyword argument targets a specific branch; it
+defaults to the configured base branch.
 """
 from __future__ import annotations

+import os
+import re
+from typing import Union
+
+from github import Github, GithubException
+
 from adapters.base.vcs import VCSAdapter


@@ -18,34 +32,175 @@ class GitHubAdapter(VCSAdapter):
    """
    VCS adapter for GitHub repositories.

-    Expects environment variable GITHUB_TOKEN and config values:
-        run.repo        — SSH or HTTPS clone URL
-        run.base_branch — default base branch (e.g. "main")
+    Authenticates via GITHUB_TOKEN and interacts with the GitHub REST API
+    through PyGithub.
+
+    Environment variables
+    ---------------------
+    GITHUB_TOKEN : Required. Personal access token or GitHub App installation token.
+
+    Config keys (from team.yaml)
+    ----------------------------
+    run.repo        : SSH or HTTPS clone URL (e.g. "git@github.com:org/repo.git").
+    run.base_branch : Default base branch (e.g. "main").
    """

    def __init__(self, config: dict) -> None:
-        # TODO (Phase 2): Accept loaded team.yaml config dict.
-        # Extract GITHUB_TOKEN from environment.
-        # Parse owner/repo from config.run.repo.
-        raise NotImplementedError("GitHubAdapter.__init__ is not yet implemented.")
+        """
+        Initialise the GitHub adapter.
+
+        Parameters
+        ----------
+        config : Loaded team.yaml config dict.
+
+        Raises
+        ------
+        ValueError
+            If GITHUB_TOKEN is not set or the repo URL cannot be parsed.
+        """
+        self._config = config
+        token = os.environ.get("GITHUB_TOKEN")
+        if not token:
+            raise ValueError(
+                "GITHUB_TOKEN environment variable is not set. "
+                "Create a personal access token and export it before running the-agency."
+            )
+        self._g = Github(token)
+
+        run_cfg: dict = config.get("run", {})
+        repo_url: str = run_cfg.get("repo", "")
+        self._base_branch: str = run_cfg.get("base_branch", "main")
+
+        self._owner, self._repo_name = self._parse_repo_url(repo_url)
+        self._repo = self._g.get_repo(f"{self._owner}/{self._repo_name}")
+
+    # ------------------------------------------------------------------
+    # Helpers
+    # ------------------------------------------------------------------
+
+    def _parse_repo_url(self, url: str) -> tuple[str, str]:
+        """Parse *owner* and *repo* name from an SSH or HTTPS GitHub URL."""
+        # git@github.com:owner/repo.git
+        m = re.match(r"git@github\.com:([^/]+)/([^/]+?)(?:\.git)?$", url)
+        if m:
+            return m.group(1), m.group(2)
+        # https://github.com/owner/repo[.git]
+        m = re.match(r"https?://github\.com/([^/]+)/([^/]+?)(?:\.git)?/?$", url)
+        if m:
+            return m.group(1), m.group(2)
+        raise ValueError(
+            f"Cannot parse GitHub owner/repo from URL: {url!r}. "
+            "Expected SSH (git@github.com:owner/repo.git) or "
+            "HTTPS (https://github.com/owner/repo.git) format."
+        )
+
+    # ------------------------------------------------------------------
+    # VCSAdapter interface
+    # ------------------------------------------------------------------

    def create_branch(self, name: str) -> None:
-        # TODO (Phase 2): Create branch via GitHub API or local git subprocess.
-        # Use config.run.base_branch as the branch point.
-        raise NotImplementedError("GitHubAdapter.create_branch is not yet implemented.")
+        """
+        Create a new branch off ``self._base_branch`` on the remote.

-    def commit(self, files: list[str], message: str) -> str:
-        # TODO (Phase 2): Stage files (git add), create commit (git commit), push.
-        # Return the resulting commit SHA.
-        raise NotImplementedError("GitHubAdapter.commit is not yet implemented.")
+        Parameters
+        ----------
+        name : New branch name (e.g. "feat/webhook-ingestion").
+        """
+        base_ref = self._repo.get_git_ref(f"heads/{self._base_branch}")
+        self._repo.create_git_ref(f"refs/heads/{name}", base_ref.object.sha)
+
+    def commit(
+        self,
+        files: Union[dict[str, str], list[str]],
+        message: str,
+        branch: str | None = None,
+    ) -> str:
+        """
+        Commit files to the repository via the GitHub Contents API.
+
+        Parameters
+        ----------
+        files   : Either a ``dict[path, content]`` mapping (preferred), or a
+                  ``list[path]`` of local file paths whose content is read from
+                  disk.
+        message : Commit message.
+        branch  : Target branch.  Defaults to ``self._base_branch``.
+
+        Returns
+        -------
+        SHA of the last created/updated commit, or empty string if no files
+        were committed.
+        """
+        target_branch = branch or self._base_branch
+
+        # Normalise to {path: content}
+        if isinstance(files, list):
+            files_dict: dict[str, str] = {}
+            for path in files:
+                with open(path, "r", encoding="utf-8") as fh:
+                    files_dict[path] = fh.read()
+        else:
+            files_dict = files
+
+        last_sha: str = ""
+        for path, content in files_dict.items():
+            try:
+                existing = self._repo.get_contents(path, ref=target_branch)
+                result = self._repo.update_file(
+                    path=path,
+                    message=message,
+                    content=content,
+                    sha=existing.sha,  # type: ignore[union-attr]
+                    branch=target_branch,
+                )
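+            except GithubException as exc:
+                # Only a 404 means the file is missing; re-raise anything else
+                # (auth failure, rate limit, conflict) instead of masking it.
+                if exc.status != 404:
+                    raise
+                # File does not exist yet, so create it.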
+                result = self._repo.create_file(
+                    path=path,
+                    message=message,
+                    content=content,
+                    branch=target_branch,
+                )
+            last_sha = result["commit"].sha
+
+        return last_sha

    def create_pr(self, title: str, body: str, head: str, base: str) -> str:
-        # TODO (Phase 2): POST to GitHub API /repos/{owner}/{repo}/pulls.
-        # Return the HTML URL of the created PR.
-        raise NotImplementedError("GitHubAdapter.create_pr is not yet implemented.")
+        """
+        Open a pull request on GitHub.
+
+        Parameters
+        ----------
+        title : PR title.
+        body  : PR description / body markdown.
+        head  : Head branch name (the branch with changes).
+        base  : Base branch name (e.g. "main").
+
+        Returns
+        -------
+        HTML URL of the created pull request.
+        """
+        pr = self._repo.create_pull(
+            title=title,
+            body=body,
+            head=head,
+            base=base,
+        )
+        return pr.html_url

    def get_pr_status(self, pr_id: str) -> str:
-        # TODO (Phase 2): GET /repos/{owner}/{repo}/pulls/{number}.
-        # Map GitHub PR state ("open", "closed") + merged flag to
-        # our schema: "open" | "merged" | "closed".
-        raise NotImplementedError("GitHubAdapter.get_pr_status is not yet implemented.")
+        """
+        Fetch the current status of a pull request.
+
+        Parameters
+        ----------
+        pr_id : Pull request number as a string (e.g. "42").
+
+        Returns
+        -------
+        One of: "open" | "merged" | "closed".
+        """
+        pr = self._repo.get_pull(int(pr_id))
+        if pr.merged:
+            return "merged"
+        return pr.state  # "open" or "closed"
--- a/2
+++ b/2
--- a/config/role_registry.yaml
+++ b/config/role_registry.yaml
@@ -2,33 +2,49 @@ t1:
  default: agents/strategy/nexus-strategy.md

 t2:
-  backend:  agents/engineering/engineering-software-architect.md
-  frontend: agents/engineering/engineering-software-architect.md
+  backend:  agents/engineering/engineering-backend-architect.md
+  frontend: agents/engineering/engineering-frontend-architect.md
  infra:    agents/engineering/engineering-devops-automator.md
  data:     agents/engineering/engineering-data-engineer.md
+  ai:       agents/engineering/engineering-software-architect.md
+  security: agents/engineering/engineering-security-engineer.md
+  mobile:   agents/engineering/engineering-software-architect.md
  default:  agents/engineering/engineering-software-architect.md

 t3:
-  backend:  agents/engineering/engineering-senior-developer.md
-  frontend: agents/engineering/engineering-senior-developer.md
+  backend:  agents/engineering/engineering-senior-backend-developer.md
+  frontend: agents/engineering/engineering-senior-frontend-developer.md
  infra:    agents/engineering/engineering-sre.md
-  default:  agents/engineering/engineering-senior-developer.md
+  data:     agents/engineering/engineering-data-engineer.md
+  ai:       agents/engineering/engineering-ai-engineer.md
+  security: agents/engineering/engineering-security-engineer.md
+  mobile:   agents/engineering/engineering-mobile-app-builder.md
+  database: agents/engineering/engineering-database-optimizer.md
+  devops:   agents/engineering/engineering-sre.md
+  docs:     agents/engineering/engineering-technical-writer.md
+  default:  agents/engineering/engineering-backend-developer.md

 t4:
  frontend:  agents/engineering/engineering-frontend-developer.md
-  backend:   agents/engineering/engineering-backend-architect.md
+  backend:   agents/engineering/engineering-backend-developer.md
  database:  agents/engineering/engineering-database-optimizer.md
  devops:    agents/engineering/engineering-devops-automator.md
  mobile:    agents/engineering/engineering-mobile-app-builder.md
  ai:        agents/engineering/engineering-ai-engineer.md
  security:  agents/engineering/engineering-security-engineer.md
  docs:      agents/engineering/engineering-technical-writer.md
-  default:   agents/engineering/engineering-senior-developer.md
+  data:      agents/engineering/engineering-data-engineer.md
+  embedded:  agents/engineering/engineering-embedded-firmware-engineer.md
+  default:   agents/engineering/engineering-backend-developer.md

 t5:
-  code:        agents/engineering/engineering-code-reviewer.md
-  integration: agents/testing/testing-reality-checker.md
-  api:         agents/testing/testing-api-tester.md
-  performance: agents/testing/testing-performance-benchmarker.md
-  security:    agents/engineering/engineering-security-engineer.md
-  default:     agents/engineering/engineering-code-reviewer.md
+  code:          agents/engineering/engineering-code-reviewer.md
+  integration:   agents/testing/testing-reality-checker.md
+  api:           agents/testing/testing-api-tester.md
+  performance:   agents/testing/testing-performance-benchmarker.md
+  security:      agents/engineering/engineering-security-engineer.md
+  accessibility: agents/testing/testing-accessibility-auditor.md
+  e2e:           agents/testing/testing-evidence-collector.md
+  frontend:      agents/testing/testing-accessibility-auditor.md
+  data:          agents/testing/testing-reality-checker.md
+  default:       agents/engineering/engineering-code-reviewer.md
--- a/docs/buildspec.md
+++ b/docs/buildspec.md
@@ -0,0 +1,507 @@
+# Tiered Agent Team System — Build Spec
+
+_Started: 2026-03-15. Last updated: 2026-03-30._
+_See design.md for the design doc and decisions log._
+
+---
+
+## Language & Runtime
+
+**Python 3.11+.** Reasons:
+- Agent/AI tooling is Python-first
+- Clean type hints + dataclasses for schemas
+- Agents can read and modify their own orchestration code
+- Runs anywhere — no Node, no OpenClaw dependency
+
+---
+
+## Repository
+
+Standalone repo: `git@github.com:coding-with-hans-heinemann/the-agency.git`
+
+Separate from the OpenClaw workspace. OpenClaw workspace gets a thin integration layer that calls into it. Core is portable and runnable without OpenClaw.
+
+---
+
+## Directory Structure
+
+```
+agent-teams/
+├── core/
+│   ├── team_runner.py       — run lifecycle, agent spawning
+│   ├── blackboard.py        — SQLite coordination state
+│   ├── task_brief.py        — schema + validation
+│   └── escalation.py        — retry logic, failure routing
+│
+├── adapters/
+│   ├── base/
+│   │   ├── llm.py           — abstract LLM interface
+│   │   ├── vcs.py           — abstract VCS interface
+│   │   ├── notify.py        — abstract notification interface
+│   │   └── runtime.py       — abstract agent runtime interface
+│   ├── llm/
+│   │   ├── anthropic.py     — Claude via direct Anthropic API
+│   │   ├── openai.py        — GPT / o-series
+│   │   └── ollama.py        — local models
+│   ├── vcs/
+│   │   └── github.py
+│   ├── notify/
+│   │   └── openclaw.py      — messages Hans who notifies Andrew
+│   └── runtime/
+│       ├── openclaw.py      — sessions_spawn (general purpose)
+│       └── claude_code.py   — coding agent runtime (file/git/exec tools)
+│
+├── agents/                  — git submodule: msitarzewski/agency-agents
+│   ├── engineering/
+│   ├── testing/
+│   ├── strategy/
+│   └── ...                  — full agency-agents roster
+│
+├── prompts/
+│   ├── t1_visionary.md      — fallback if no agent_personality set
+│   ├── t2_architect.md
+│   ├── t3_squad_lead.md
+│   ├── t4_implementer.md
+│   └── t5_verifier.md
+│
+├── config/
+│   ├── team.yaml            — example run configuration
+│   └── role_registry.yaml   — maps (tier, domain) → agent personality file
+│
+├── cli/
+│   └── agency.py            — run, watch, inspect, approve, reject, pause, resume
+│
+├── runs/                    — runtime state, one subdir per run_id
+│   └── .gitkeep
+│
+└── README.md
+```
+
+---
+
+## Blackboard
+
+SQLite. One file per run at `runs/<run_id>/blackboard.db`.
+
+### Tables
+
+**runs**
+```sql
+CREATE TABLE runs (
+    run_id      TEXT PRIMARY KEY,
+    goal        TEXT NOT NULL,
+    status      TEXT NOT NULL,  -- pending | active | review | done | failed
+    created_at  TEXT NOT NULL,
+    updated_at  TEXT NOT NULL
+);
+```
+
+**workstreams**
+```sql
+CREATE TABLE workstreams (
+    workstream_id   TEXT PRIMARY KEY,
+    run_id          TEXT NOT NULL,
+    name            TEXT NOT NULL,
+    tier            INTEGER NOT NULL,
+    status          TEXT NOT NULL,  -- pending | active | blocked | done | failed
+    owner_agent_id  TEXT,
+    created_at      TEXT NOT NULL,
+    updated_at      TEXT NOT NULL
+);
+```
+
+**briefs**
+```sql
+CREATE TABLE briefs (
+    brief_id        TEXT PRIMARY KEY,
+    run_id          TEXT NOT NULL,
+    parent_brief_id TEXT,
+    workstream_id   TEXT,
+    tier            INTEGER NOT NULL,
+    role            TEXT NOT NULL,
+    status          TEXT NOT NULL,  -- pending | active | done | failed
+    payload         TEXT NOT NULL,  -- full JSON brief
+    result          TEXT,           -- JSON result when done
+    retry_count     INTEGER DEFAULT 0,
+    created_at      TEXT NOT NULL,
+    updated_at      TEXT NOT NULL
+);
+```
+
+**events**
+```sql
+CREATE TABLE events (
+    event_id    TEXT PRIMARY KEY,
+    run_id      TEXT NOT NULL,
+    brief_id    TEXT,
+    kind        TEXT NOT NULL,  -- see event vocabulary below
+    detail      TEXT,           -- JSON
+    created_at  TEXT NOT NULL
+);
+```
+
+**Event kind vocabulary:**
+```
+-- lifecycle
+spawned | completed | failed | escalated | retried
+
+-- visibility / gates
+gate_pending    -- runner hit an inspection gate, waiting for human
+gate_approved   -- human approved via CLI or notify
+gate_rejected   -- human rejected, tier re-invoked
+gate_paused     -- manual pause via CLI
+gate_resumed    -- manual resume via CLI
+
+-- amendments / informational
+path_amendment  -- mid-run tier proposed a tier path change
+log             -- human-readable log line (detail: {level, message})
+```
+
+**t3_task_lists** *(T3 mesh coordination)*
+```sql
+CREATE TABLE t3_task_lists (
+    entry_id        TEXT PRIMARY KEY,
+    run_id          TEXT NOT NULL,
+    workstream_id   TEXT NOT NULL,
+    t3_agent_id     TEXT NOT NULL,
+    status          TEXT NOT NULL,  -- draft | committed
+    tasks           TEXT NOT NULL,  -- JSON array of proposed T4 task descriptors
+    created_at      TEXT NOT NULL,
+    updated_at      TEXT NOT NULL
+);
+```
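The `events` table above is append-only in spirit, so a thin helper pair is enough to write and read it. The following is a minimal sketch using only the standard-library `sqlite3` module; the function names `log_event` and `events_of_kind` are illustrative assumptions and are not taken from `core/blackboard.py`.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# Same columns as the events table in this spec.
EVENTS_DDL = """
CREATE TABLE IF NOT EXISTS events (
    event_id    TEXT PRIMARY KEY,
    run_id      TEXT NOT NULL,
    brief_id    TEXT,
    kind        TEXT NOT NULL,
    detail      TEXT,
    created_at  TEXT NOT NULL
)
"""


def log_event(conn, run_id, kind, detail=None, brief_id=None):
    """Append one event row and return its generated event_id (hypothetical helper)."""
    event_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)",
        (
            event_id,
            run_id,
            brief_id,
            kind,
            json.dumps(detail) if detail is not None else None,
            datetime.now(timezone.utc).isoformat(),
        ),
    )
    conn.commit()
    return event_id


def events_of_kind(conn, run_id, kind):
    """Return the decoded detail payloads for one event kind, oldest first."""
    rows = conn.execute(
        "SELECT detail FROM events WHERE run_id = ? AND kind = ? ORDER BY created_at",
        (run_id, kind),
    ).fetchall()
    return [json.loads(d) if d is not None else None for (d,) in rows]
```

Storing `detail` as a JSON string keeps the schema stable while each event kind carries its own payload shape.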
+
+---
+
+## Task Brief Schema
+
+Every brief passed between tiers is a validated JSON object. `goal_anchor` is immutable — set by T1, copied verbatim into every downstream brief.
+
+```json
+{
+  "brief_id": "uuid",
+  "run_id": "uuid",
+  "parent_brief_id": "uuid | null",
+  "tier": 4,
+  "role": "implementer",
+  "goal_anchor": "Original T1 intent — always propagated unchanged",
+  "workstream": "backend-api",
+  "task": "Implement POST /webhooks/ingest endpoint",
+  "acceptance_criteria": [
+    "Accepts JSON payload",
+    "Returns 202 on success",
+    "Writes to queue"
+  ],
+  "constraints": [
+    "Use existing queue client in src/queue.py",
+    "No new dependencies"
+  ],
+  "context": {
+    "relevant_files": ["src/routes/webhooks.py", "src/queue.py"],
+    "interface_contract": "..."
+  },
+  "retry_budget": 3,
+  "retry_count": 0,
+  "preferred_runtime": "coding_agent",
+  "agent_personality": "agents/engineering/engineering-code-reviewer.md",
+  "created_at": "ISO-8601"
+}
+```
+
+`preferred_runtime` is optional. T3 sets it to `"coding_agent"` when spawning T4/T5 for implementation or verification tasks. Runner falls back to `"standard"` if the coding agent runtime is not configured.
+
+`agent_personality` is optional. When set, the runtime adapter reads the file and injects its contents as the system prompt at spawn time. Falls back to the generic tier prompt in `prompts/` if not set.
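The immutability of `goal_anchor` can be enforced structurally rather than by convention. The sketch below assumes a frozen dataclass named `TaskBrief` with a hypothetical `child()` helper; it covers only a subset of the fields in the schema above and is not the actual contents of `core/task_brief.py`.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class TaskBrief:
    """Subset of the brief schema; frozen so goal_anchor cannot be reassigned."""

    run_id: str
    tier: int
    role: str
    goal_anchor: str
    task: str
    parent_brief_id: str | None = None
    retry_budget: int = 3
    retry_count: int = 0
    brief_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self) -> None:
        if self.tier not in range(1, 6):
            raise ValueError(f"tier must be 1..5, got {self.tier}")
        if not self.goal_anchor.strip():
            raise ValueError("goal_anchor must be non-empty")
        if not 0 <= self.retry_count <= self.retry_budget:
            raise ValueError("retry_count must be within retry_budget")

    def child(self, tier: int, role: str, task: str) -> TaskBrief:
        """Derive a downstream brief; goal_anchor is copied verbatim."""
        return TaskBrief(
            run_id=self.run_id,
            tier=tier,
            role=role,
            goal_anchor=self.goal_anchor,
            task=task,
            parent_brief_id=self.brief_id,
            retry_budget=self.retry_budget,
        )
```

Deriving every downstream brief through one method means there is a single place where the anchor is copied, which makes accidental paraphrasing harder.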
+
+---
+
+## Adapter Interfaces
+
+### LLM (`adapters/base/llm.py`)
+```python
+class LLMAdapter:
+    def complete(self, prompt: str, capability: str, context: dict) -> str: ...
+    def resolve_model(self, capability: str) -> str: ...
+    # capability: "reasoning-heavy" | "capable" | "fast-cheap"
+```
+
+### VCS (`adapters/base/vcs.py`)
+```python
+class VCSAdapter:
+    def create_branch(self, name: str) -> None: ...
+    def commit(self, files: list[str], message: str) -> str: ...       # returns commit sha
+    def create_pr(self, title: str, body: str, head: str, base: str) -> str: ...  # returns pr url
+    def get_pr_status(self, pr_id: str) -> str: ...                    # open | merged | closed
+```
+
+### Notify (`adapters/base/notify.py`)
+```python
+class NotifyAdapter:
+    def send(self, message: str, context: dict) -> None: ...
+```
+
+### Runtime (`adapters/base/runtime.py`)
+```python
+class RuntimeAdapter:
+    def spawn(self, task: str, capability: str, context: dict) -> str: ...  # returns agent_id
+    def get_result(self, agent_id: str, timeout_s: int) -> dict: ...
+    def kill(self, agent_id: str) -> None: ...
+
+# Two implementations:
+#   openclaw.py    — general purpose, uses sessions_spawn, suits T1/T2/T3
+#   claude_code.py — coding-specialized, has file/git/exec tools, suits T4/T5
+#
+# The runner selects runtime based on brief.preferred_runtime:
+#   "standard"      → openclaw.py (default)
+#   "coding_agent"  → claude_code.py (falls back to standard if unavailable)
+#
+# Both implementations inject brief.agent_personality as the system prompt
+# when spawning, if present. Falls back to generic tier prompt otherwise.
+# claude_code.py passes the agent file via --system-prompt flag natively
+# (agency-agents was designed for Claude Code's agents/ directory).
+```
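The selection and fallback rules in the comment above can be sketched as two small pure functions. The names `select_runtime` and `system_prompt_path` are assumptions made for illustration, and the `runtimes` argument is assumed to be a dict with `None` standing in for an unconfigured runtime; the prompt filenames are the ones listed under `prompts/`.

```python
# Fallback tier prompts, as listed in the directory structure.
TIER_PROMPTS = {
    1: "prompts/t1_visionary.md",
    2: "prompts/t2_architect.md",
    3: "prompts/t3_squad_lead.md",
    4: "prompts/t4_implementer.md",
    5: "prompts/t5_verifier.md",
}


def select_runtime(brief: dict, runtimes: dict) -> str:
    """Return the runtime key to use: the preferred one, else "standard"."""
    wanted = brief.get("preferred_runtime") or "standard"
    if runtimes.get(wanted) is not None:
        return wanted
    if runtimes.get("standard") is not None:
        return "standard"
    raise RuntimeError("no runtime adapter is configured")


def system_prompt_path(brief: dict) -> str:
    """Prefer the brief's agent_personality; otherwise use the generic tier prompt."""
    return brief.get("agent_personality") or TIER_PROMPTS[brief["tier"]]
```

Keeping both decisions free of I/O makes them easy to unit-test without spawning an agent.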
+
+---
+
+## Run Config (`config/team.yaml`)
+
+```yaml
+run:
+  goal: "Build webhook ingestion system with retry logic and DLQ"
+  repo: "git@github.com:org/repo.git"
+  base_branch: "main"
+
+adapters:
+  llm: anthropic
+  vcs: github
+  notify: openclaw
+  runtime: openclaw
+
+models:
+  provider: anthropic          # default provider
+  capability_map:
+    reasoning-heavy:
+      anthropic: claude-opus-4-6
+      openai: o3
+    capable:
+      anthropic: claude-sonnet-4-6
+      openai: gpt-4o
+      ollama: llama3.1:70b
+    fast-cheap:
+      anthropic: claude-haiku-3-5
+      openai: gpt-4o-mini
+      ollama: llama3.2
+
+  # optional: override provider per tier
+  tier_overrides:
+    t1: { provider: openai, capability: reasoning-heavy }
+    t4: { provider: ollama, capability: fast-cheap }
+
+runtime:
+  default: openclaw
+  coding_agent: claude_code     # used for T4/T5 when available; omit to disable
+  native_teams: false           # Claude Code's experimental agent teams — opt-in only
+                                # when true: T3 hands full workstream to Claude Code,
+                                # which fans out internally. faster but less blackboard
+                                # visibility. default: false (explicit T4 spawning)
+  # tier_runtime_map (optional overrides):
+  #   t1: standard
+  #   t2: standard
+  #   t3: standard
+  #   t4: coding_agent
+  #   t5: coding_agent
+
+retry_defaults:
+  bad_output: 3
+  partial: 2
+  blocked: 0    # always escalate immediately
+
+visibility:
+  strict_mode: false          # true = all gates on (recommended for first runs)
+  log_level: normal           # normal | verbose (verbose = per-T4 start/done lines)
+  inspection_gates:
+    t1_plan: true             # always — required by design
+    t2_lead: false            # optional — review boundaries before specialists spawn
+    t2_synthesis: true        # recommended — review architecture before implementation
+    t3_plan: false            # verbose — useful early on, disable once T3 is trusted
+    t5_verdict: false         # review T5 joint verdict before T3 marks workstream done
+  gate_timeout_minutes: 60    # auto-reject if no human response within this window
+
+t3_mesh_timeout_minutes: 10   # max time for T3s to commit task lists before runner escalates
+```
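One way to read the `models` block is as a two-step lookup: pick the provider and capability (applying any tier override), then index `capability_map`. The sketch below assumes that a tier override replaces both the provider and the requested capability; the spec does not state the precedence, so treat that as an assumption. `MODELS` mirrors the YAML above as a plain dict, and `resolve_model` here is a free function, not the `LLMAdapter` method.

```python
from __future__ import annotations

# The `models` section of team.yaml, as it would look after YAML loading.
MODELS = {
    "provider": "anthropic",
    "capability_map": {
        "reasoning-heavy": {"anthropic": "claude-opus-4-6", "openai": "o3"},
        "capable": {
            "anthropic": "claude-sonnet-4-6",
            "openai": "gpt-4o",
            "ollama": "llama3.1:70b",
        },
        "fast-cheap": {
            "anthropic": "claude-haiku-3-5",
            "openai": "gpt-4o-mini",
            "ollama": "llama3.2",
        },
    },
    "tier_overrides": {
        "t1": {"provider": "openai", "capability": "reasoning-heavy"},
        "t4": {"provider": "ollama", "capability": "fast-cheap"},
    },
}


def resolve_model(
    models_cfg: dict, capability: str, tier: str | None = None
) -> tuple[str, str]:
    """Return (provider, model) for a capability, honouring tier overrides."""
    override = models_cfg.get("tier_overrides", {}).get(tier, {})
    provider = override.get("provider", models_cfg["provider"])
    capability = override.get("capability", capability)

    providers = models_cfg["capability_map"].get(capability)
    if providers is None:
        raise KeyError(f"unknown capability: {capability!r}")
    if provider not in providers:
        raise KeyError(f"provider {provider!r} has no model for {capability!r}")
    return provider, providers[provider]
```

Raising on a missing provider entry (for example `ollama` under `reasoning-heavy`) surfaces a config gap at resolution time instead of at the first API call.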
+
+---
+
+## Role Registry (`config/role_registry.yaml`)
+
+Maps `(tier, domain)` → agent personality file. T1 consults this during scope assessment when selecting specialists for each workstream brief. Adding a new specialist means adding one entry here — no core changes.
+
+```yaml
+t1:
+  default: agents/strategy/nexus-strategy.md
+
+t2:
+  backend:  agents/engineering/engineering-software-architect.md
+  frontend: agents/engineering/engineering-software-architect.md
+  infra:    agents/engineering/engineering-devops-automator.md
+  data:     agents/engineering/engineering-data-engineer.md
+  default:  agents/engineering/engineering-software-architect.md
+
+t3:
+  backend:  agents/engineering/engineering-senior-developer.md
+  frontend: agents/engineering/engineering-senior-developer.md
+  infra:    agents/engineering/engineering-sre.md
+  default:  agents/engineering/engineering-senior-developer.md
+
+t4:
+  frontend:  agents/engineering/engineering-frontend-developer.md
+  backend:   agents/engineering/engineering-backend-architect.md
+  database:  agents/engineering/engineering-database-optimizer.md
+  devops:    agents/engineering/engineering-devops-automator.md
+  mobile:    agents/engineering/engineering-mobile-app-builder.md
+  ai:        agents/engineering/engineering-ai-engineer.md
+  security:  agents/engineering/engineering-security-engineer.md
+  docs:      agents/engineering/engineering-technical-writer.md
+  default:   agents/engineering/engineering-senior-developer.md
+
+t5:
+  code:        agents/engineering/engineering-code-reviewer.md
+  integration: agents/testing/testing-reality-checker.md
+  api:         agents/testing/testing-api-tester.md
+  performance: agents/testing/testing-performance-benchmarker.md
+  security:    agents/engineering/engineering-security-engineer.md
+  default:     agents/engineering/engineering-code-reviewer.md
+```
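The `(tier, domain)` lookup with a per-tier `default` can be sketched as below. The registry is shown as an already-loaded dict holding an excerpt of the YAML above, and `resolve_personality` is a hypothetical helper name rather than an existing function in the codebase.

```python
from __future__ import annotations

# Excerpt of role_registry.yaml after loading.
REGISTRY = {
    "t3": {
        "backend": "agents/engineering/engineering-senior-developer.md",
        "infra": "agents/engineering/engineering-sre.md",
        "default": "agents/engineering/engineering-senior-developer.md",
    },
    "t5": {
        "code": "agents/engineering/engineering-code-reviewer.md",
        "api": "agents/testing/testing-api-tester.md",
        "default": "agents/engineering/engineering-code-reviewer.md",
    },
}


def resolve_personality(registry: dict, tier: str, domain: str | None = None) -> str:
    """Return the persona file for (tier, domain), falling back to the tier default."""
    tier_map = registry.get(tier)
    if not tier_map:
        raise KeyError(f"tier {tier!r} is not in the role registry")
    if domain in tier_map:
        return tier_map[domain]
    return tier_map["default"]
```

Because unknown domains fall through to `default`, adding a specialist is a one-line registry change and existing callers keep working.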
+
+---
+
+## Key Flows
+
+### 1. Run Kickoff
+
+```
+User → team_runner.start(goal, config)  # via CLI or any caller
+  → generate run_id
+  → init blackboard (create runs/<run_id>/blackboard.db)
+  → build T1 brief (goal_anchor = goal, retry_budget from config)
+  → spawn T1 via runtime adapter
+  → await T1 workplan
+```
+
+### 2. T1 Scope Assessment
+
+```
+T1 receives brief
+  → assess complexity → decide depth
+  → identify workstreams
+  → set retry_budget multiplier per workstream (1x simple, 2x complex)
+  → emit N workstream briefs for T2 (or T3 if shallow)
+  → write workplan to blackboard
+  → team_runner spawns T2s in parallel
+```
+
+### 3. T4 Retry Loop (escalation.py)
+
+```
+spawn T4 with brief
+  → receive result
+  → classify: bad_output | blocked | partial | success
+
+  blocked:
+    → log event(escalated)
+    → pass to T3 immediately
+
+  bad_output, retries_remaining:
+    → amend brief with failure context, increment retry_count
+    → re-spawn T4
+    → log event(retried)
+
+  bad_output, retries_exhausted:
+    → log event(escalated)
+    → pass to T3
+
+  partial:
+    → write salvageable parts to blackboard
+    → re-task remainder with new brief
+
+  success:
+    → write result to blackboard
+    → log event(completed)
+    → notify T3
+```
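The routing in the retry loop above reduces to a small decision function. This is a sketch only: the `Outcome` enum and the action strings are illustrative assumptions and do not reflect the actual `FailureType` definitions in `core/escalation.py`.

```python
from enum import Enum


class Outcome(Enum):
    """Classification of a T4 result (hypothetical names)."""

    SUCCESS = "success"
    BAD_OUTPUT = "bad_output"
    BLOCKED = "blocked"
    PARTIAL = "partial"


def next_action(outcome: Outcome, retry_count: int, retry_budget: int) -> str:
    """Map a classified result to the next step in the T4 retry loop."""
    if outcome is Outcome.SUCCESS:
        return "complete"            # write result, log completed, notify T3
    if outcome is Outcome.BLOCKED:
        return "escalate"            # never retried; goes to T3 immediately
    if outcome is Outcome.BAD_OUTPUT:
        # Retry while budget remains, otherwise hand the failure to T3.
        return "retry" if retry_count < retry_budget else "escalate"
    if outcome is Outcome.PARTIAL:
        return "retask_remainder"    # keep salvageable parts, brief the rest
    raise ValueError(f"unhandled outcome: {outcome!r}")
```

With `retry_budget=3`, a brief is retried at counts 0, 1, and 2, then escalated once `retry_count` reaches 3.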
+
+### 4. Inspection Gate Flow
+
+```
+runner reaches configured gate (e.g. t2_synthesis)
+  → write event(gate_pending, detail={tier, summary, what_happens_next})
+  → notify_adapter.send(tier summary + gate context)
+  → halt: poll blackboard for gate_approved or gate_rejected
+
+  gate_approved:
+    → write event(gate_approved)
+    → continue run
+
+  gate_rejected:
+    → write event(gate_rejected, detail={reason})
+    → re-invoke tier with rejection reason in brief context
+    → loop back to gate_pending when tier completes again
+
+  gate_timeout (gate_timeout_minutes elapsed):
+    → treat as gate_rejected
+    → notify Andrew: "Gate timed out, re-invoking tier"
+```
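The halt-and-poll step of the gate flow, including the timeout-as-rejection rule, might look like the sketch below. `wait_for_gate` and its `fetch_decision` callback are assumptions for illustration; in practice the callback would query the blackboard `events` table for the run. The `clock` and `sleep` parameters exist only so the loop can be tested without real waiting.

```python
import time


def wait_for_gate(
    fetch_decision,
    timeout_s: float,
    poll_s: float = 1.0,
    clock=time.monotonic,
    sleep=time.sleep,
) -> str:
    """Poll until a gate decision appears; a timeout counts as a rejection.

    fetch_decision() is expected to return "gate_approved", "gate_rejected",
    or None while no decision has been written yet.
    """
    deadline = clock() + timeout_s
    while True:
        decision = fetch_decision()
        if decision in ("gate_approved", "gate_rejected"):
            return decision
        if clock() >= deadline:
            return "gate_rejected"
        sleep(poll_s)
```

With `gate_timeout_minutes: 60` from the run config, the caller would pass `timeout_s=60 * 60`.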
+
+### 5. Review Gate
+
+```
+T1 completes integration
+  → vcs_adapter.create_pr(
+      title="[agent-teams] <run_id>: <goal summary>",
+      body="<workplan + workstream summaries>",
+      head="integration/<run_id>",
+      base="main"
+    )
+  → notify_adapter.send(
+      "Run <run_id> complete. PR ready for review: <pr_url>",
+      context={run_id, goal, workstreams, pr_url}
+    )
+  → blackboard: update run status → "review"
+  → halt — no auto-merge
+```
+
+---
+
+## Build Order
+
+1. `git submodule add https://github.com/msitarzewski/agency-agents agents/` — pull the talent pool
+2. `config/role_registry.yaml` — map tier+domain → agent personality files
+3. `core/task_brief.py` — schema + validation (everything depends on this); include T1 Plan Output Schema
+4. `core/blackboard.py` — SQLite store, all table definitions including `t3_task_lists`; full event kind vocabulary
+5. `adapters/base/*` — all four abstract interfaces
+6. `adapters/llm/anthropic.py` — first LLM implementation
+7. `core/escalation.py` — retry + failure routing logic (called by tiers, not runner centrally)
+8. `adapters/runtime/openclaw.py` — wire up sessions_spawn + personality injection
+9. `adapters/runtime/claude_code.py` — coding agent runtime, personality via --system-prompt
+10. `core/team_runner.py` — full run lifecycle: spawn loop (monitors briefs table for `status=pending`, calls runtime_adapter.spawn()), gate logic (gate_pending halt, calls notify_adapter.send(), polls for gate_approved/rejected resume), path amendment monitor, T3 mesh timeout → T2 escalation, T1 failure + terminal escalation only
+11. `cli/agency.py` — run, watch, inspect, approve, reject, pause, resume; `watch` tails blackboard events and renders live log; `inspect` renders run tree
+12. `prompts/` — fallback tier prompts (used when no agent_personality set)
+13. `adapters/vcs/github.py` — PR creation + branch management
+14. `adapters/notify/openclaw.py` — OpenClaw notification adapter; bridges gate summaries and run events to the operator via OpenClaw; manages its own inbound response state for gate approval routing
+15. `config/team.yaml` — example config with full visibility block
+16. `README.md` — how to run, how to add adapters, how to extend the roster; include `agency` CLI reference
+
+---
+
+## Out of Scope (Phase 2)
+
+- Cost accounting per tier + run rollup
+- Parallel workstream progress dashboard
+- Additional adapter implementations (GitLab, Slack, OpenAI, Ollama)
+- Persistent standing teams
+- Web UI for run monitoring
--- a/docs/design.md
+++ b/docs/design.md
@@ -0,0 +1,681 @@
+# Tiered Agent Team System — Design Document
+
+_Started: 2026-03-14. Last updated: 2026-03-30._
+
+---
+
+## Resolved Design Decisions (formerly Open Questions)
+
+All eight open questions resolved 2026-03-30. Details in Decisions Log.
+
+1. **T3 mesh mechanics** → Blackboard-based. T3s write draft task lists, read peers', commit merged plan before T4 dispatch. See _T3 Mesh via Blackboard_.
+
+2. **T1 output schema** → Formal JSON schema defined. See _T1 Plan Output Schema_.
+
+3. **T5 consensus mechanics** → T3 aggregates all T5 results into a joint verdict. Split verdict (`partial`) triggers retry of failed slices only. See _T5 Consensus & Verdict Schema_.
+
+4. **Path amendment mechanism** → Amending tier writes a `path_amendment` event to blackboard. Runner monitors events table and notifies the relevant higher tier via a system event. No agent callback plumbing required. See _Path Amendment Mechanism_.
+
+5. **Failure handling (distributed model)** → Distributed ownership confirmed. Runner only owns T1 failure + terminal human escalation. See updated _Failure Handling_ table.
+
+6. **Who makes spawn calls for T3+ tiers** → Runner monitors briefs table for `status=pending` rows and makes all spawn calls. "Distributed ownership" means the tier's output determines brief content — runner is the mechanical arm. Gates (hold on `gate_pending`) live naturally in the runner's spawn loop.
+
+7. **Gate approval UX** → `agency approve <run_id>` CLI writes `gate_approved` directly to the blackboard — the universal path, works on any platform. Runner only cares that the event exists, not how it got there. Notify adapter implementations handle their own inbound response routing (e.g. bridging a chat reply to a CLI call) as internal adapter state — not a core concern.
+
+8. **T3 mesh timeout** → Escalate to T2 (domain boundary problem, T2 should re-scope). If T2 also exhausts its retry budget, escalates up the normal ladder to T1 → Andrew gate. No force-commit fallback (would hide the problem and cause bad T4 dispatch).
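Decisions 6 and 7 together imply a spawn loop that reads pending briefs and holds while a gate is unresolved. The following is a minimal sketch under two assumptions that the design does not spell out: a gate counts as open when there are more `gate_pending` events than `gate_approved` plus `gate_rejected` events, and an open gate holds every pending brief in the run. The function names are hypothetical.

```python
import sqlite3


def gate_is_open(conn: sqlite3.Connection, run_id: str) -> bool:
    """True while some gate_pending event has no matching decision yet."""
    counts = dict(
        conn.execute(
            "SELECT kind, COUNT(*) FROM events "
            "WHERE run_id = ? "
            "AND kind IN ('gate_pending', 'gate_approved', 'gate_rejected') "
            "GROUP BY kind",
            (run_id,),
        ).fetchall()
    )
    decided = counts.get("gate_approved", 0) + counts.get("gate_rejected", 0)
    return counts.get("gate_pending", 0) > decided


def spawnable_briefs(conn: sqlite3.Connection, run_id: str) -> list[str]:
    """Return brief_ids the runner may spawn now, oldest first."""
    if gate_is_open(conn, run_id):
        return []  # hold the spawn loop until the gate is resolved
    rows = conn.execute(
        "SELECT brief_id FROM briefs "
        "WHERE run_id = ? AND status = 'pending' ORDER BY created_at",
        (run_id,),
    ).fetchall()
    return [brief_id for (brief_id,) in rows]
```

Because the runner only checks that a decision event exists, `agency approve <run_id>` and any notify-adapter bridge unblock the loop in the same way.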
+
+---
+
+## Overview
+
+A dynamic, hierarchical multi-agent system for software pipelines. Teams assemble on demand, execute, then disband. Inspired by a blend of Hollywood production (dynamic assembly), consulting firms (structured deliverables, hierarchical synthesis), and two-pizza teams (small autonomous squads, clear domain ownership).
+
+---
+
+## Core Principles
+
+**1. Tiers represent cognitive modes, not org chart levels.**
+Each tier thinks differently — strategy, design, coordination, execution, verification. Adding a tier only makes sense if it introduces a genuinely different mode of reasoning.
+
+**2. Depth is proportional to complexity.**
+Not every task needs every tier. A config change might only need T3→T4. A new product needs the full stack. T1 assesses scope and prescribes the path — it is never pre-configured.
+
+**3. Goal anchoring at every level.**
+T1's original intent is embedded in every agent's context — not just passed to T2 and forgotten. Every agent knows the end goal even if they only own a slice.
+
+**4. Artifacts, not summaries.**
+Tiers pass structured specs downward (JSON task briefs), not paraphrased prose. Meaning is preserved; format is compressed.
+
+**5. Verification is mandatory.**
+T5 always runs. Nothing returns to T1 unverified. T5 is a mandatory quality gate: output should work, and work well, before it surfaces upward.
+
+**6. Provider agnostic.**
+The system makes no assumptions about which LLM provider or platform is in use. Tiers reference capability levels, not specific models. All external dependencies are swappable adapters.
+
+**7. Specialist talent pool.**
+Tiers define structure and responsibility. Agent personalities define domain expertise. The two are separate — the same tier can be filled by different specialists depending on the workstream domain.
+
+---
+
+## Tier Definitions
+
+| Tier | Role | Owns | Capability Level |
+|------|------|------|-----------------|
+| T1 | Visionary | Goal, constraints, dispatch plan, final acceptance | reasoning-heavy |
+| T2 | Architect | System design, interface contracts, workstream boundaries | reasoning-heavy / capable |
+| T3 | Squad Lead | Workstream delivery, T4 management, quality gate | capable |
+| T4 | Implementer | Atomic task execution (one file, one function, one test) | fast-cheap |
+| T5 | Verifier | Validation of T4 output — correctness + intent alignment | capable |
+
+T5 runs **within T3's scope**, not above it. T3 commissions T5 verification of its T4 outputs. T5 is a quality gate, not a management layer.
+
+Capability levels map to actual models per provider in config — the core system never references a specific model name.
+
+---
+
+## Dispatch Model
+
+### T1 Owns the Plan
+
+T1 is not just a decomposer — it is the dispatch planner. Its output declares:
+
+- **Workstreams** — the decomposed units of work
+- **Tier path per workstream** — which tiers to engage (e.g. `[T2, T3, T4, T5]` or `[T4, T5]` for trivial tasks)
+- **Parallelism** — which workstreams are independent and can run concurrently
+
+T1 does not prescribe how each tier operates internally. That is the tier's own concern.
+
+### T1 Lifecycle — Two Explicit Phases
+
+T1 is invoked twice per run, each with a distinct prompt and purpose:
+
+**Phase 1 — Plan:**
+1. T1 produces initial dispatch plan (workstreams, tier paths, parallelism, retry budget)
+2. T1 self-critiques its own plan in a single follow-up pass ("what could go wrong, what did I miss?") and amends
+3. Amended plan surfaces to Andrew for approval — no T2s spawn until approval is given
+
+**Phase 2 — Accept:**
+After the full T2→T3→T4→T5 pipeline completes, T1 is re-invoked with the final output. It validates against the original goal anchor and either accepts (opens PR) or rejects (escalates back down).
+
+Both phases are named explicitly in the task brief schema and tracked on the blackboard.
+
+### Each Tier Owns the Layer Below
+
+Control flow is distributed, not centralised:
+
+- T1 manages its T2s
+- T2 Lead manages T2 specialists and their domain boundaries
+- T2 specialists each own their T3s
+- **T3 manages its T4s** — including dependency graph, parallelism, and T5 commissioning
+- The runner is thin: bootstrap T1, monitor the blackboard, handle final result and notifications
+
+This means orchestration logic lives in agent prompts and output schemas — not in Python runner code. Adding a new execution pattern means updating a prompt, not the runner.
+
+**Tradeoff:** Debugging is harder. When something fails mid-chain, you read blackboard logs rather than step through central runner code. This is a tooling problem to solve (good blackboard inspection), not a design flaw to avoid.
+
+### Dynamic Paths
+
+Tiers can propose path amendments mid-run (e.g. T3 discovers scope that warrants a T2 pass it didn't get). Amendments are logged to the blackboard. Higher tiers are notified but do not need to approve — it is informational. No tier silently changes the plan.
+
+---
+
+## Orchestration Patterns Per Tier
+
+Different tiers suit different internal coordination patterns. These are baked into the runner's tier-handling logic and the tier prompts — not prescribed by T1.
+
+| Tier | Pattern | Rationale |
+|------|---------|-----------|
+| T1 | Single agent, two phases | Must be authoritative; plan phase + accept phase |
+| T2 Lead | Coordinator | Spawned first; defines boundaries + shared assumptions; drives conflict resolution; produces canonical architecture |
+| T2 Specialists | Parallel fan-out | Each works independently within its domain; reads Lead's boundaries + shared assumptions doc before starting |
+| T3 | Light mesh | Peer coordination within same T2 domain to negotiate task boundaries before T4 dispatch |
+| T4 | Swarm + pipeline hybrid | Independent tasks run as swarm; dependent tasks pipeline (T4-A's output feeds T4-B). T3 declares which is which. |
+| T5 | Parallel fan-out + consensus | Each T5 reviews its slice independently, then compares notes for a joint verdict — catches both artifact bugs and integration issues |
+
+### T2 Flow in Detail
+
+1. T1 spawns **T2 Lead Architect** with goal + workstream context
+2. Lead defines explicit **domain boundaries** (who owns what, hard edges)
+3. Lead publishes **shared assumptions doc** — cross-cutting concerns, key conventions, architectural constraints (auth approach, data formats, API patterns, etc.)
+4. T1 spawns **T2 specialists** with boundaries + shared assumptions baked into their briefs
+5. Specialists work in parallel, each within their defined domain
+6. Lead reads all proposals, drives **conflict resolution** with relevant specialists if needed (cycle limit in config — fixed, not per-workstream)
+7. Lead produces **canonical architecture** → written to blackboard as distinct artifact
+8. T1 (Accept phase) validates canonical architecture against goal anchor
+9. Canonical architecture becomes T3 briefs — each T2 specialist hands off to its own T3s
+
+---
+
+## Horizontal Scaling Within Tiers
+
+```
+T1 — Phase 1: Plan (self-critique → Andrew approval)
+│
+├── T2: Lead Architect (boundaries + shared assumptions first)
+│   ├── T2: Backend Architect  ─┐
+│   ├── T2: Frontend Architect  ├─ parallel, within defined domains
+│   └── T2: Infra Architect    ─┘
+│       │
+│       └── (Lead synthesises → conflict resolution if needed → canonical architecture)
+│
+├── T2 Backend Architect owns:
+│   ├── T3: API Squad Lead  ─┐
+│   └── T3: DB Squad Lead   ─┴─ light mesh within domain
+│           ├── T4: Worker A  ─┐
+│           ├── T4: Worker B  ─┼─ swarm / pipeline (T3 decides)
+│           └── T4: Worker C  ─┘
+│                   └── T5: Verifier(s) — fan-out + consensus
+│
+└── T1 — Phase 2: Accept (validates against goal anchor → PR)
+```
+
+---
+
+## Use Case Flows
+
+T1 assesses complexity and prescribes the tier path per workstream. Three standard depth profiles:
+
+### Full Stack — T1→T2→T3→T4→T5
+*Complex feature, new product, cross-domain changes*
+
+```
+T1 Plan
+  → assess complexity (high)
+  → output T1 Plan Schema (workstreams, tier paths [T2,T3,T4,T5], parallelism, retry budgets)
+  → self-critique pass
+  → GATE: surface to Andrew ← approval required
+
+T2 Lead (spawned by runner after approval)
+  → receive: goal + full workplan
+  → publish: domain boundaries + shared assumptions doc → blackboard
+  → GATE (optional): review boundaries before specialists spawn
+
+T2 Specialists (parallel fan-out, wait on Lead)
+  → each receives: their domain boundary + shared assumptions
+  → produce: architecture proposal for their slice
+  → Lead synthesises, drives conflict resolution if needed
+  → Lead writes: canonical architecture → blackboard
+  → GATE (recommended): review architecture before implementation
+
+Each T2 Specialist → spawns its own T3s (with canonical architecture slice + interface contracts)
+
+T3s (light mesh within T2 domain)
+  → write draft task lists to blackboard
+  → read peers' lists, reconcile boundaries
+  → commit merged task plan before T4 dispatch
+  → GATE (optional): review task breakdown
+
+T4s
+  → swarm: independent tasks run in parallel
+  → pipeline: T4-A output feeds T4-B (T3 declares dependencies)
+  → commit to feature branches
+
+T5s (fan-out per T4 slice)
+  → each reviews its slice independently
+  → T3 collects results → joint verdict
+  → GATE (optional): review T5 verdict before T3 marks done
+  → partial: T3 retries only failed slices
+  → pass: T3 signals workstream done to T2
+
+T2 specialists → signal T2 Lead
+T2 Lead → writes integration summary → blackboard
+
+T1 Accept
+  → validate against goal anchor
+  → open PR, notify_adapter.send(pr summary + url)
+```
+
+### Medium Complexity — T1→T3→T4→T5
+*Config change, isolated bug fix — T1 determines no cross-domain design needed*
+
+```
+T1 Plan
+  → assess: contained scope, single domain, no T2 architecture needed
+  → workplan: tier paths [T3, T4, T5]
+  → GATE: Andrew approval
+
+T3s spawned directly by runner
+  → receives T1 brief with task context (no T2 architecture layer)
+  → T3 light mesh → T4 dispatch → T5 verify → signal done
+
+T1 Accept → PR
+```
+
+### Simple / Hotfix — T1→T4→T5
+*Single file, single function, trivial atomic task*
+
+```
+T1 Plan
+  → assess: trivial, single workstream
+  → tier path: [T4, T5]
+  → GATE: Andrew approval
+
+T4 (coding agent)
+  → single atomic task, commits
+
+T5 (single verifier, not full fan-out)
+  → code review + correctness check
+  → pass → T1 Accept → PR
+```
+
+---
+
+## Resolved Mechanics
+
+### T3 Mesh via Blackboard
+
+T3s coordinate task boundaries before dispatching T4s. All coordination goes through the blackboard — no direct agent-to-agent messaging.
+
+1. Each T3 writes its **draft task list** to the blackboard (one row per proposed T4 task, status `draft`)
+2. Each T3 reads all sibling T3 draft lists in its T2 domain
+3. T3s amend their lists to resolve overlaps (claim tasks, release duplicates)
+4. Once all T3s in the domain have committed their final task lists (status `committed`), T4 dispatch begins
+5. No T3 dispatches T4s until all peers in the domain are committed — this prevents duplicate work
+
+The runner monitors for `all_committed` state and can enforce a timeout (config: `t3_mesh_timeout_minutes`).
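The commit check described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the `t3_tasks` table, its columns, and the `all_committed` helper are assumptions made for the example, and an in-memory SQLite database stands in for the per-run blackboard.

```python
import sqlite3

def all_committed(conn, domain, expected_t3s):
    """Return True once every expected T3 in the domain has only committed rows."""
    rows = conn.execute(
        "SELECT t3_id, status FROM t3_tasks WHERE domain = ?", (domain,)
    ).fetchall()
    statuses = {}
    for t3_id, status in rows:
        statuses.setdefault(t3_id, set()).add(status)
    # A T3 with no rows yet, or with any draft row, blocks T4 dispatch.
    return all(statuses.get(t3) == {"committed"} for t3 in expected_t3s)

# Hypothetical schema: one row per proposed T4 task.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t3_tasks (t3_id TEXT, domain TEXT, task TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO t3_tasks VALUES (?, ?, ?, ?)",
    [
        ("t3-api", "backend", "auth-middleware", "committed"),
        ("t3-db", "backend", "queue-client", "draft"),
    ],
)
peers = ["t3-api", "t3-db"]
before = all_committed(conn, "backend", peers)  # t3-db is still drafting
conn.execute("UPDATE t3_tasks SET status = 'committed' WHERE t3_id = 't3-db'")
after = all_committed(conn, "backend", peers)
print(before, after)
```

Passing the expected peer list explicitly means a T3 that has not written anything yet still blocks dispatch, which matches the rule that no T3 dispatches until all peers are committed.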
+
+---
+
+### T1 Plan Output Schema
+
+T1's Plan phase produces a structured JSON object written to the blackboard. The runner parses this to bootstrap the pipeline.
+
+```json
+{
+  "run_id": "uuid",
+  "goal_anchor": "Original goal — immutable, propagated to every downstream brief",
+  "complexity": "high | medium | low",
+  "retry_budget_multiplier": 2,
+  "workstreams": [
+    {
+      "id": "ws-backend-api",
+      "name": "Backend API",
+      "domain": "backend",
+      "tier_path": ["t2", "t3", "t4", "t5"],
+      "parallel_group": "A",
+      "t2_specialist": "agents/engineering/engineering-software-architect.md",
+      "notes": "Focus on webhook ingest and retry queue"
+    }
+  ],
+  "parallelism": {
+    "groups": {
+      "A": ["ws-backend-api", "ws-frontend"],
+      "B": ["ws-infra"]
+    },
+    "sequence": ["A", "B"]
+  },
+  "self_critique_summary": "Brief plain-text summary of what T1 identified and amended in its self-critique pass"
+}
+```
+
+`parallel_group` + `sequence` handles inter-workstream dependencies: group A runs in parallel, then B starts after A completes.
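A short sketch of how a runner might turn `parallelism` into ordered waves of workstreams. The `execution_waves` helper is hypothetical; only the `groups` and `sequence` fields come from the schema above.

```python
def execution_waves(plan):
    """Expand parallelism.sequence into ordered lists of workstream ids.

    Workstreams inside one wave may run concurrently; the next wave
    starts only after the previous wave has completed.
    """
    groups = plan["parallelism"]["groups"]
    waves = []
    for group_name in plan["parallelism"]["sequence"]:
        if group_name not in groups:
            raise ValueError(f"sequence references unknown group {group_name!r}")
        waves.append(list(groups[group_name]))
    return waves

plan = {
    "parallelism": {
        "groups": {"A": ["ws-backend-api", "ws-frontend"], "B": ["ws-infra"]},
        "sequence": ["A", "B"],
    }
}
waves = execution_waves(plan)
print(waves)
```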
+
+---
+
+### T5 Consensus & Verdict Schema
+
+T3 aggregates all T5 results into a joint verdict after fan-out completes.
+
+**Individual T5 result:**
+```json
+{
+  "verifier_id": "uuid",
+  "scope": "queue-client",
+  "verdict": "pass | fail",
+  "issues": ["issue description..."],
+  "notes": "human-readable summary"
+}
+```
+
+**T3 joint verdict (written to blackboard):**
+```json
+{
+  "t5_results": [...],
+  "joint_verdict": "pass | partial | fail",
+  "failed_scopes": ["queue-client"],
+  "summary": "Human-readable summary for gate surface and logs"
+}
+```
+
+**Split verdict handling:**
+- `pass` → T3 marks workstream done, signals T2
+- `partial` → T3 retries only the failed T4 slices (up to retry budget), re-runs T5 on those slices
+- `fail` → T3 escalates to T2 (or T1 if shallow path)
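The aggregation step could look roughly like this. The doc does not state where `partial` ends and `fail` begins, so this sketch assumes `fail` means every scope failed and `partial` means a mix; the `joint_verdict` function name and the summary wording are also illustrative.

```python
def joint_verdict(t5_results):
    """Fold individual T5 results into the T3 joint verdict structure."""
    if not t5_results:
        # T5 is mandatory, so an empty result set is treated as an error here.
        raise ValueError("no T5 results to aggregate")
    failed = [r["scope"] for r in t5_results if r["verdict"] == "fail"]
    if not failed:
        verdict = "pass"
    elif len(failed) == len(t5_results):
        verdict = "fail"      # assumption: every scope failed
    else:
        verdict = "partial"   # assumption: some scopes passed, some failed
    passed = len(t5_results) - len(failed)
    return {
        "t5_results": t5_results,
        "joint_verdict": verdict,
        "failed_scopes": failed,
        "summary": f"{passed}/{len(t5_results)} scopes passed",
    }

results = [
    {"verifier_id": "v1", "scope": "auth-middleware", "verdict": "pass", "issues": [], "notes": ""},
    {"verifier_id": "v2", "scope": "queue-client", "verdict": "fail", "issues": ["timeout"], "notes": ""},
]
verdict = joint_verdict(results)
print(verdict["joint_verdict"], verdict["failed_scopes"])
```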
+
+---
+
+### Spawn Call Ownership
+
+The runner is the single point of contact with the runtime adapter. Tiers do not call `sessions_spawn` directly — they write output to the blackboard and the runner acts on it.
+
+**Flow:**
+1. A tier completes and writes child briefs to the `briefs` table with `status=pending`
+2. Runner's spawn loop detects pending rows
+3. If a gate is configured at this tier boundary → runner writes `gate_pending`, notifies Andrew, halts
+4. On `gate_approved` → runner calls `runtime_adapter.spawn()` for each pending brief
+5. Spawned agent runs, writes its own child briefs as pending when done → loop continues
+
+This keeps gate logic in one place (the runner's spawn loop), makes all spawn calls auditable from a single location, and means agents only need blackboard read/write access — no runtime adapter tool access required.
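One pass of the spawn loop might be sketched as below. Plain Python lists stand in for the `briefs` and events tables, and `spawn_step`, the `tier` field, and the tuple event shape are assumptions made for the example; `spawn` plays the role of `runtime_adapter.spawn()`.

```python
def spawn_step(briefs, gated_tiers, events, spawn):
    """Run one pass over pending briefs; return the ids that were spawned."""
    spawned = []
    for brief in briefs:
        if brief["status"] != "pending":
            continue
        tier = brief["tier"]
        if tier in gated_tiers and ("gate_approved", tier) not in events:
            # Hold at the gate; record gate_pending once so the human is notified once.
            if ("gate_pending", tier) not in events:
                events.append(("gate_pending", tier))
            continue
        spawn(brief)
        brief["status"] = "spawned"
        spawned.append(brief["id"])
    return spawned

events = []
briefs = [{"id": "b1", "tier": "t2", "status": "pending"}]
launched = []

first = spawn_step(briefs, {"t2"}, events, launched.append)   # held at the gate
events.append(("gate_approved", "t2"))                         # e.g. written by the CLI
second = spawn_step(briefs, {"t2"}, events, launched.append)  # now spawns
print(first, second)
```

Because the gate check sits inside the loop, agents never need to know whether a gate exists at their boundary.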
+
+---
+
+### Gate Approval UX
+
+**Core mechanic (platform-agnostic):**
+
+1. Runner writes `gate_pending` to blackboard
+2. Runner calls `notify_adapter.send()` with tier summary + gate context (`run_id`, `gate`, `summary`, `what_happens_next`)
+3. Runner polls blackboard for `gate_approved` or `gate_rejected`
+4. `agency approve <run_id>` / `agency reject <run_id> --reason "..."` writes the event directly to the blackboard — the universal approval path, works on any platform with filesystem access
+
+Runner never reads from a state file, never talks to a notify adapter for inbound responses. It only polls the blackboard.
+
+**Adapter responsibility:**
+Each notify adapter handles its own inbound response routing. How a human's approval gets translated into an `agency approve` CLI call is entirely the adapter's concern — not core. Example: an OpenClaw adapter bridges a chat reply to the CLI. A Slack adapter wires up a slash command. A webhook adapter listens on an endpoint. All produce the same result: `gate_approved` written to blackboard.
+
+Any internal state the adapter needs to resolve ambiguous responses (e.g. which run_id an approval refers to when multiple gates are pending) is managed by the adapter, not the core.
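A rough sketch of both sides of the gate, assuming a hypothetical `events` table with an autoincrementing `id`. `write_gate_event` is what a command like `agency approve` would do under this assumption, and `gate_decision` is what the runner would poll; neither name comes from the codebase.

```python
import json
import sqlite3
import time

def write_gate_event(conn, run_id, kind, note=None):
    """Append a gate event to the blackboard (the CLI side)."""
    conn.execute(
        "INSERT INTO events (run_id, kind, payload, ts) VALUES (?, ?, ?, ?)",
        (run_id, kind, json.dumps({"note": note}), time.time()),
    )
    conn.commit()

def gate_decision(conn, run_id):
    """Return the decision for the most recent gate_pending, or None (the runner side)."""
    pending_id = conn.execute(
        "SELECT MAX(id) FROM events WHERE run_id = ? AND kind = 'gate_pending'",
        (run_id,),
    ).fetchone()[0]
    if pending_id is None:
        return None
    row = conn.execute(
        "SELECT kind FROM events WHERE run_id = ? AND id > ? "
        "AND kind IN ('gate_approved', 'gate_rejected') ORDER BY id LIMIT 1",
        (run_id, pending_id),
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "run_id TEXT, kind TEXT, payload TEXT, ts REAL)"
)
write_gate_event(conn, "abc123", "gate_pending")
before = gate_decision(conn, "abc123")   # no decision yet
write_gate_event(conn, "abc123", "gate_approved", note="looks good")
after = gate_decision(conn, "abc123")
print(before, after)
```

Only decisions written after the latest `gate_pending` count, so an approval from an earlier gate in the same run is not mistaken for approval of the current one.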
+
+---
+
+### T3 Mesh Timeout
+
+If T3s in a domain fail to commit their task lists within `t3_mesh_timeout_minutes`:
+
+1. **Runner escalates to T2** — writes a `gate_pending` escalation event and notifies the T2 specialist that owns the domain. Context: which T3s timed out, what draft lists (if any) exist on the blackboard. T2 re-scopes or clarifies domain boundaries, spawns fresh T3 briefs.
+
+2. **If T2 also exhausts its retry budget** → normal escalation ladder: T2 failure → T1 handles → T1 failure → Andrew gate.
+
+Force-committing partial draft lists (optimistic fallback) is explicitly not done — it hides the boundary problem and produces conflicting or duplicate T4 tasks that fail later with less context.
+
+---
+
+### Path Amendment Mechanism
+
+When a mid-run tier discovers scope that warrants a different tier path than T1 prescribed:
+
+1. The discovering tier writes a `path_amendment` event to the blackboard:
+```json
+{
+  "kind": "path_amendment",
+  "proposed_by": "t3/ws-backend-api",
+  "reason": "Discovered auth dependency requires T2 architectural pass",
+  "amendment": {
+    "workstream": "ws-backend-api",
+    "add_tiers": ["t2"],
+    "insert_before": "t3"
+  }
+}
+```
+2. The runner monitors the events table, detects `path_amendment`, and sends a system event notification to the relevant higher tier
+3. The higher tier is **informed, not blocked** — it acknowledges and adjusts its understanding
+4. Amendment is logged on the blackboard for audit; no approval gate required (the next scheduled human gate will surface it)
+
+No agent needs callback plumbing. The runner is the notification bridge.
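To make the `amendment` payload concrete, here is what the amended tier path would look like if the amendment were applied to a workstream. The doc only says amendments are logged and surfaced, so treating them as directly applicable is an assumption, and `apply_amendment` is a hypothetical helper.

```python
def apply_amendment(tier_path, amendment):
    """Return a new tier path with add_tiers inserted before insert_before."""
    new_path = list(tier_path)
    # Raises ValueError if the anchor tier is not on the path.
    anchor = new_path.index(amendment["insert_before"])
    for tier in amendment["add_tiers"]:
        if tier not in new_path:
            new_path.insert(anchor, tier)
            anchor += 1  # keep multiple added tiers in their given order
    return new_path

amendment = {
    "workstream": "ws-backend-api",
    "add_tiers": ["t2"],
    "insert_before": "t3",
}
amended = apply_amendment(["t3", "t4", "t5"], amendment)
print(amended)
```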
+
+---
+
+## Shared State
+
+For software pipelines, **the repo is the primary blackboard**:
+- T4 workers commit to feature branches
+- T3 leads review and merge to workstream branches
+- T2 architects own integration branches
+- T1 does final integration and acceptance
+
+Supplemented by a SQLite coordination store per run tracking:
+- In-flight workstreams and their current execution plans
+- Handoff artifacts and tier status
+- Retry counts and escalation history
+- Path amendments (proposed, by whom, timestamp)
+
+---
+
+## Failure Handling
+
+Distributed ownership — each tier handles failures in the tier below it. The runner only handles T1 failure and terminal human escalation.
+
+| Failure | Owner | Handler | Action |
+|---------|-------|---------|--------|
+| T4 bad output | T3 | `escalation.py` called by T3's context | Retry T4 with corrected brief (up to retry_budget) |
+| T4 blocked | T3 | `escalation.py` | Escalate to T3 immediately — no retries |
+| T4 partial output | T3 | `escalation.py` | Salvage good parts, re-task remainder |
+| T5 partial verdict | T3 | T3 joint verdict logic | Retry failed T4 slices only |
+| T5 full fail | T3 | T3 joint verdict logic | Escalate to T2 |
+| T3 workstream stuck | T2 | T2 specialist prompt + blackboard | Re-scope or split the workstream |
+| T2 design wrong | T1 | T1 Accept phase + blackboard | Re-plan; may discard workstream and restart |
+| T1 failure / crash | Runner | `team_runner.py` | Surface to human, halt run |
+| Repeated escalation | Runner | `team_runner.py` | Gate: block until human unblocks |
+
+**Key distinction:** `escalation.py` is not called by the runner centrally. It is logic that tier agents execute (or the runner executes on their behalf when it detects a timeout or dead agent). The runner only owns the last two rows.
+
+Retry limits prevent infinite loops. Escalation path is always upward, never sideways.
+
+T1 sets a retry budget multiplier during scope assessment (`1x` simple, `2x` complex). Retry budget is a field on the task brief — not hardcoded in the runner.
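The T4 rows of the table, combined with the retry budget multiplier, can be sketched as a small routing function. `FailureType` is named in the project conventions, but its members here are guesses, and `t4_action`, the base budget, and the returned action strings are purely illustrative.

```python
from enum import Enum

class FailureType(Enum):
    # Hypothetical members mirroring the T4 rows of the table above.
    BAD_OUTPUT = "bad_output"
    BLOCKED = "blocked"
    PARTIAL_OUTPUT = "partial_output"

def t4_action(failure, attempts, base_budget, multiplier):
    """Decide what T3 does with a failed T4 task."""
    budget = base_budget * multiplier
    if failure is FailureType.BLOCKED:
        return "escalate"             # blocked: no retries
    if failure is FailureType.PARTIAL_OUTPUT:
        return "salvage_and_retask"   # keep good parts, re-task the rest
    # Bad output: retry with a corrected brief until the budget is spent.
    return "retry" if attempts < budget else "escalate"

print(t4_action(FailureType.BAD_OUTPUT, attempts=1, base_budget=2, multiplier=2))
```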
+
+---
+
+## Agent Talent Pool
+
+The system builds on [agency-agents](https://github.com/msitarzewski/agency-agents) — a library of 50+ pre-built specialist personalities, each with deep domain expertise, quality standards, and specific deliverables.
+
+**Division of responsibility:**
+- Our system provides: orchestration, tier structure, task briefs, retries, verification gates, shared state
+- Agency-agents provides: the specialist knowledge each agent brings to its role
+
+T1 selects the right specialist from the roster when building workstream briefs. The specialist's personality is injected as the system prompt at spawn time.
+
+**Default tier-to-specialist mapping for software pipelines:**
+
+| Tier | Domain | Agent |
+|------|--------|-------|
+| T1 | Strategy | nexus-strategy |
+| T2 | Backend | software-architect |
+| T2 | Infra | devops-automator |
+| T2 | Data | data-engineer |
+| T3 | Backend | senior-developer |
+| T3 | Reliability | sre |
+| T4 | Frontend | frontend-developer |
+| T4 | Backend | backend-architect |
+| T4 | Database | database-optimizer |
+| T4 | DevOps | devops-automator |
+| T4 | Mobile | mobile-app-builder |
+| T4 | AI/ML | ai-engineer |
+| T4 | Security | security-engineer |
+| T4 | Docs | technical-writer |
+| T5 | Code review | code-reviewer |
+| T5 | Integration | testing-reality-checker |
+| T5 | API | testing-api-tester |
+| T5 | Performance | testing-performance-benchmarker |
+| T5 | Security | security-engineer |
+
+The roster is not fixed — T1 can select any agent from the library based on workstream needs.
+
+---
+
+## Adapter Layers
+
+Everything external is a swappable adapter. Core logic never imports from adapters directly — always through an interface.
+
+```
+Core (platform-agnostic)
+├── team_runner      — thin bootstrap: spawn T1, monitor blackboard, handle result
+├── blackboard       — SQLite coordination state
+├── task_brief       — schema + validation
+└── escalation       — retry logic, failure routing
+
+Adapters (swappable)
+├── llm/             — anthropic (now), openai, ollama, any API
+├── notify/          — openclaw (now), slack, email, webhook...
+├── vcs/             — github (now), gitlab, gitea, bare git...
+└── runtime/
+    ├── standard     — openclaw sessions_spawn (T1/T2/T3)
+    └── coding_agent — claude_code (T4/T5 default), codex, aider...
+```
+
+Swapping providers means writing a new adapter file — nothing in core changes.
+
+T4 and T5 default to the **coding agent runtime** when one is configured. Otherwise they fall back to the standard runtime.
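A minimal sketch of the adapter boundary and the runtime fallback. `RuntimeAdapter` and `spawn()` are named in the project; the concrete classes, the `pick_runtime` helper, and the returned strings are assumptions for illustration only.

```python
from abc import ABC, abstractmethod

class RuntimeAdapter(ABC):
    """Interface that core code depends on; concrete adapters live outside core."""

    @abstractmethod
    def spawn(self, brief: dict) -> str:
        """Start an agent for the brief and return a session identifier."""

class StandardRuntime(RuntimeAdapter):
    def spawn(self, brief: dict) -> str:
        return f"standard:{brief['id']}"

class CodingAgentRuntime(RuntimeAdapter):
    def spawn(self, brief: dict) -> str:
        return f"coding_agent:{brief['id']}"

def pick_runtime(tier, standard, coding_agent=None):
    """T4/T5 prefer the coding agent runtime; everything else uses standard."""
    if tier in ("t4", "t5") and coding_agent is not None:
        return coding_agent
    return standard

standard = StandardRuntime()
coding = CodingAgentRuntime()
print(pick_runtime("t4", standard, coding).spawn({"id": "b7"}))
print(pick_runtime("t4", standard).spawn({"id": "b7"}))  # fallback path
```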
+
+---
+
+## Run Visibility Layer
+
+Designed for debugging, test runs, and quality evaluation at each tier. Three interlocking components.
+
+### 1. Human-Readable Live Log
+
+Structured events from the blackboard rendered as a timestamped, readable stream. `agency watch <run_id>` tails this live.
+
+```
+[abc123] 12:30:01  T1   PLAN_START    Assessing scope: "Build webhook ingestion system"
+[abc123] 12:30:14  T1   PLAN_DONE     3 workstreams — backend-api, infra, docs (2 parallel)
+[abc123] 12:30:14  GATE APPROVAL      ⏸  Waiting on approval before T2 spawns
+[abc123] 12:31:02  GATE APPROVED      ✓  Approved — continuing
+[abc123] 12:31:03  T2   LEAD_START    Lead Architect spawned
+[abc123] 12:31:41  T2   BOUNDS_READY  Domain boundaries + shared assumptions published
+[abc123] 12:31:42  T2   SPEC_START    3 specialists spawned (parallel): backend, infra, docs
+[abc123] 12:32:15  T2   SPEC_DONE     backend-api architecture draft ready
+[abc123] 12:32:58  T2   SYNTH_DONE    Canonical architecture written to blackboard
+[abc123] 12:32:58  GATE INSPECTION    ⏸  T2 synthesis ready for review
+[abc123] 12:33:44  T3   MESH_START    backend-api: 2 squad leads negotiating task boundaries
+[abc123] 12:34:01  T3   MESH_DONE     Task split committed — 7 T4 tasks (5 swarm, 2 pipeline)
+[abc123] 12:34:02  T4   SWARM_START   5 workers spawned in parallel
+[abc123] 12:35:10  T4   DONE          worker-3 auth-middleware ✓
+[abc123] 12:35:22  T4   FAIL          worker-4 queue-client ✗  (retry 1/3)
+[abc123] 12:36:04  T4   DONE          worker-4 queue-client ✓  (retry resolved)
+[abc123] 12:36:05  T5   VERIFY_START  4 verifiers spawned
+[abc123] 12:36:45  T5   VERDICT       partial — queue-client needs rework
+[abc123] 12:37:12  T5   VERDICT       ✓  all pass — workstream backend-api done
+```
+
+Log level `verbose` adds per-T4-start/done lines. Default is `normal` (tier-level events only).
+
+### 2. Inspection Gates
+
+Configurable pause points. When the runner hits a gate, it:
+1. Writes a `gate_pending` event to the blackboard
+2. Fires `notify_adapter.send()` with the tier summary + gate context
+3. Halts — no next tier spawns until `gate_approved` or `gate_rejected` is written
+
+The tier summary surfaced at each gate includes:
+- **What was produced** (the tier artifact in readable form)
+- **What happens next** (which agents will spawn, doing what)
+- **Any anomalies** flagged by the tier itself
+
+Configurable in `team.yaml` under `visibility.inspection_gates`. A `strict_mode: true` flag enables all gates — recommended for first runs on a new codebase or new goal type.
+
+```yaml
+visibility:
+  strict_mode: false
+  log_level: normal           # normal | verbose
+  inspection_gates:
+    t1_plan: true             # always — required by design
+    t2_lead: false            # optional — review boundaries before specialists
+    t2_synthesis: true        # recommended — review architecture before implementation
+    t3_plan: false            # verbose — useful early on, disable once T3 is trusted
+    t5_verdict: false         # review T5 joint verdict before T3 marks workstream done
+  gate_timeout_minutes: 60    # auto-reject if no response within this window
+```
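How `strict_mode` and the per-gate flags might combine is sketched below. The config is shown as an already-parsed dict to keep the example dependency-free, and `effective_gates` is a hypothetical helper; the gate names are taken from the YAML above.

```python
GATE_NAMES = ("t1_plan", "t2_lead", "t2_synthesis", "t3_plan", "t5_verdict")

def effective_gates(visibility):
    """Return the set of gates the runner should pause at."""
    if visibility.get("strict_mode"):
        return set(GATE_NAMES)          # strict mode enables every gate
    configured = visibility.get("inspection_gates", {})
    enabled = {name for name in GATE_NAMES if configured.get(name)}
    enabled.add("t1_plan")              # t1_plan is required by design
    return enabled

visibility = {
    "strict_mode": False,
    "inspection_gates": {
        "t1_plan": True,
        "t2_lead": False,
        "t2_synthesis": True,
        "t3_plan": False,
        "t5_verdict": False,
    },
}
gates = effective_gates(visibility)
print(sorted(gates))
```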
+
+### 3. Inspection CLI — `cli/agency.py`
+
+```
+agency run <config.yaml>               # start a run, returns run_id
+agency watch <run_id>                  # tail live log (follows blackboard events)
+agency inspect <run_id>                # interactive tree view of run state
+agency inspect <run_id> --tier t2      # jump to T2 artifacts
+agency inspect <run_id> --brief <id>   # show full brief + result JSON
+
+agency approve <run_id>                # approve current gate → continue
+agency approve <run_id> --note "..."   # approve with a note written to blackboard
+agency reject <run_id> --reason "..."  # reject → tier re-invoked
+agency pause <run_id>                  # force-pause at next tier boundary
+agency resume <run_id>                 # release a manual pause
+```
+
+`agency inspect` (no flags) renders a live tree:
+```
+Run abc123 — "Build webhook ingestion system"
+├── T1 Plan ✓
+│   └── [view workplan]
+├── T2 Architecture ✓  [GATE: pending review]
+│   ├── [view domain boundaries]
+│   ├── [view shared assumptions]
+│   └── [view canonical architecture]
+├── T3 backend-api (active)
+│   ├── [view task breakdown]
+│   └── T4 workers: 3/7 done, 1 retrying, 3 pending
+└── T3 infra (pending)
+```
+
+### Blackboard Event Vocabulary (extended)
+
+```python
+# existing
+"spawned" | "completed" | "failed" | "escalated" | "retried"
+
+# new — visibility layer
+"gate_pending"     # runner hit a gate, waiting for human
+"gate_approved"    # human approved, run continues
+"gate_rejected"    # human rejected, tier re-invoked
+"gate_paused"      # manual pause via CLI
+"gate_resumed"     # manual resume via CLI
+"path_amendment"   # mid-run tier proposed path change
+"log"              # human-readable log line (level + message)
+```
+
+---
+
+## Decisions Log
+
+**T1 dynamic dispatch** — T1 assesses scope and prescribes tier path and workstream parallelism. It does not prescribe internal tier coordination patterns.
+
+**T1 two-phase lifecycle** — T1 has two explicit named phases: Plan and Accept. Plan phase includes self-critique (single pass) then human approval gate before T2s spawn. Accept phase validates final output against goal anchor. Both phases tracked on blackboard with distinct prompts.
+
+**T1 self-critique** — Single pass only. Diminishing returns on multiple self-critique iterations; the human review after is the real safety net. Self-critique catches obvious gaps; Andrew catches strategic ones.
+
+**Distributed ownership** — Each tier owns the layer below it. Runner is thin. Tradeoff: distributed control makes the system extensible but debugging requires good blackboard tooling, not central runner traces.
+
+**T5 always mandatory** — No skipping verification. Things should work and work well before surfacing to T1.
+
+**T3 owns T4 and T5** — T3 manages its T4s (dependency graph, swarm vs pipeline, parallelism) and commissions T5 verification of T4 outputs. Runner does not orchestrate T4/T5 centrally.
+
+**T2 Lead Architect** — Dedicated T2 role, not a new tier. Spawned first by T1. Owns: domain boundary definition, shared assumptions doc, conflict resolution between specialists, canonical architecture synthesis. Specialists spawn after Lead publishes boundaries + assumptions. Each T2 specialist owns its own T3s — no T3 spans T2 domains.
+
+**T2 conflict resolution** — Lead sends targeted briefs back to conflicting specialists. Cycle limit is a fixed config value (not per-workstream). This parallels the single-pass T1 self-critique: the limit is fixed, not variable.
+
+**T2 shared assumptions** — Lead publishes cross-cutting concerns (auth, data formats, API conventions, etc.) before specialists start. Specialists design with shared baseline; implicit dependencies pre-empted rather than caught in synthesis.
+
+**Orchestration patterns** — Baked into tier prompts and runner tier-handling logic, not prescribed by T1. T2: Lead + parallel specialists. T3: light mesh within T2 domain. T4: swarm+pipeline. T5: fan-out+consensus.
+
+**Output / review** — Nothing merges to main without explicit human approval. T1 opens a PR and fires `notify_adapter.send()` with the PR summary. Merge is gated on human sign-off. The notify adapter implementation determines how the notification is delivered.
+
+**Platform agnosticism** — Core is provider and platform agnostic. Capability levels (`reasoning-heavy`, `capable`, `fast-cheap`) map to models in config. Mixing providers across tiers is supported.
+
+**LLM provider** — Anthropic first implementation. Config supports per-tier provider selection.
+
+**Gateway modification** — Decided against. Agent-teams stays standalone Python. OpenClaw used via runtime adapter only.
+
+**Coding agent runtime** — Claude Code is default T4/T5 runtime. Opt-in `native_teams` flag available for internal Claude Code parallelism — faster but less blackboard visibility. Default `false`.
+
+**Agency-agents integration** — Via git submodule at `agents/`. T1 selects specialists via `config/role_registry.yaml`. `agent_personality` field on task brief; runtime injects as system prompt at spawn time.
+
+**Spawn call ownership** — Runner is the single point of contact with the runtime adapter. Tiers write `status=pending` child briefs to the blackboard; runner's spawn loop detects and spawns them. Gate logic (hold on `gate_pending`) lives in the spawn loop — no gate plumbing needed in agents. Agents only need blackboard read/write access.
+
+**Gate approval UX** — `agency approve <run_id>` CLI is the universal approval path — writes `gate_approved` directly to blackboard. Runner only polls blackboard; it does not depend on any specific notification platform. Each notify adapter handles its own inbound response bridge as internal adapter state. Core has no `pending_gates.json` or platform-specific approval logic.
+
+**T3 mesh timeout** — Escalate to T2 (the specialist that owns the domain). Timeout means T3s can't agree on task boundaries — a domain boundary problem T2 should fix by re-scoping. If T2 exhausts its retry budget, normal escalation ladder handles it (T1 → Andrew gate). No force-commit fallback.
+
+**T3 mesh mechanics** — Blackboard-based coordination. T3s write draft task lists, read peers', reconcile overlaps, commit merged plan. No T4 dispatch until all T3s in the domain have committed. Runner enforces timeout (`t3_mesh_timeout_minutes` in config). Chosen over designated T3 lead or direct messaging — fits distributed ownership model, gives full audit trail for free.
+
+**T1 output schema** — Formal JSON schema defined (2026-03-30). Fields: `run_id`, `goal_anchor`, `complexity`, `retry_budget_multiplier`, `workstreams[]` (id, name, domain, tier_path, parallel_group, t2_specialist, notes), `parallelism` (groups + sequence), `self_critique_summary`. `parallel_group` + `sequence` handles inter-workstream dependencies.
+
+**T5 consensus** — T3 aggregates all T5 results into joint verdict: `pass | partial | fail`. Split verdict (`partial`) → T3 retries only failed slices, re-runs T5 on those slices. Full `fail` escalates to T2. T3 writes structured joint verdict to blackboard; this is what the optional T5 gate surfaces to Andrew.
+
+**Path amendment mechanism** — Amending tier writes `path_amendment` event to blackboard (structured JSON: proposed_by, reason, amendment). Runner monitors events table, sends system event notification to relevant higher tier. Higher tier is informed, not blocked. No agent callback plumbing. Amendments surface at next scheduled human gate.
+
+**Failure handling (distributed)** — Confirmed distributed ownership (2026-03-30). `escalation.py` is logic tiers execute (or runner executes on tier's behalf on timeout/crash), not a central runner concern. Runner only owns: T1 failure, terminal human escalation. See updated Failure Handling table.
+
+**Run visibility layer** — Added 2026-03-30. Human-readable live log, configurable inspection gates, and `cli/agency.py` inspection/control commands. Designed for debugging and quality evaluation at each tier during early runs. `strict_mode: true` enables all gates. Gates surface tier artifacts + "what happens next" summary via `notify_adapter.send()` — platform-agnostic. Resolves Q3 (T5 consensus surfaces as gate event with human-readable summary). T5 gate (optional) lets the operator review joint verdict before T3 marks workstream done.
--- a/requirements.txt
+++ b/requirements.txt
@@ -10,6 +10,9 @@ pyyaml
 # Environment variable management
 python-dotenv

+# GitHub VCS adapter
+PyGithub
+
 # --- stdlib-only (no pip install needed) ---
 # sqlite3   — blackboard persistence
 # dataclasses — task_brief schema
Author	SHA1	Message	Date
Hans Heinemann	342832fa5e	chore: update submodule URL to Gitea	2026-04-02 10:05:09 -04:00
hansheinemann	641f122cdb	docs: add CLAUDE.md agent quick reference	2026-03-30 15:19:07 -04:00
hansheinemann	54afa0f53f	docs: resolve all design questions + visibility layer + portability audit	2026-03-30 15:18:48 -04:00
hansheinemann	f228061c4d	docs: update design doc with new architecture decisions	2026-03-30 15:18:30 -04:00
Hans Heinemann	1c99e40f98	docs: purge OpenClaw/Hans specifics from core design Portability audit — all platform-specific concerns moved to adapter layer: - Gate Approval UX (Resolved Mechanics): rewritten as platform-agnostic. Core: runner writes gate_pending, calls notify_adapter.send(), polls blackboard for gate_approved. Universal path: agency CLI writes directly to blackboard. Adapter handles its own inbound response bridge internally. - pending_gates.json removed from core directory structure and runner responsibilities — adapter-internal state, not a core concern. - 'User → Hans → team_runner.start()' → 'User → team_runner.start()' Core has no dependency on a specific caller. - 'notify_adapter.send(...to Andrew via Hans)' → 'notify_adapter.send()' throughout design.md and buildspec.md. - anthropic.py description: 'via OpenClaw or direct API' → 'direct API' (anthropic adapter never goes via OpenClaw) - Output/review decision: 'Hans messages Andrew' → 'notify_adapter.send()' - Run visibility decision: 'Andrew via Hans' → 'via notify_adapter.send()' - Decisions log: gate approval and visibility entries rewritten accordingly Adapter layer correctly unchanged: adapters/notify/openclaw.py — OpenClaw-specific, owns its inbound bridge adapters/runtime/openclaw.py — OpenClaw sessions_spawn, correctly isolated team.yaml example config — adapter selection is config, not core	2026-03-30 14:31:55 -04:00
Hans Heinemann	8f143e779d	docs: resolve remaining 3 design questions (spawn ownership, gate UX, mesh timeout) - Spawn calls: runner owns all runtime_adapter.spawn() calls; tiers write status=pending briefs to blackboard, runner's spawn loop acts on them. Gate logic lives in the spawn loop — no gate plumbing needed in agents. - Gate approval UX: Signal reply via Hans + direct CLI both supported. Both write gate_approved to blackboard; runner doesn't care which path. Hans uses pending_gates.json for multi-run disambiguation. - T3 mesh timeout: escalate to T2 (domain boundary problem). If T2 also exhausts retry budget, normal escalation ladder handles it. No force-commit. Add pending_gates.json to directory structure and buildspec. Update runner step in build order with full spawn loop responsibilities.	2026-03-30 14:22:39 -04:00
Hans Heinemann	a721db63f6	docs: lock in visibility layer, resolve all 5 open design questions - Resolve T3 mesh mechanics: blackboard-based draft/commit cycle - Resolve T1 plan output schema: formal JSON structure with workstreams + parallelism groups - Resolve T5 consensus: T3 aggregates joint verdict (pass/partial/fail), partial retries failed slices only - Resolve path amendment mechanism: event-based, runner notifies higher tier, no approval gate - Resolve failure handling: confirmed distributed ownership, runner owns T1 + terminal only Add run visibility layer: - Human-readable live log (normal + verbose modes) - Configurable inspection gates (t1_plan always, t2_synthesis recommended, others optional) - strict_mode flag for full gating on early runs - cli/agency.py: run, watch, inspect, approve, reject, pause, resume - gate_pending halt loop in team_runner, gate_approved/rejected resume - Expanded blackboard event vocabulary (gate_*, path_amendment, log) - t3_task_lists table for mesh coordination state - Inspection gate flow added to buildspec Key Flows Build order updated: 16 steps (added cli/ step, clarified runner gate responsibilities)	2026-03-30 13:43:19 -04:00
Hans Heinemann	882b769d21	chore: sync agency-agents submodule with upstream	2026-03-30 09:00:16 -04:00
Hans Heinemann	ce3c020de2	docs: add open design questions section	2026-03-16 20:45:47 -04:00
Hans Heinemann	b54436f474	docs: T1 two-phase lifecycle, T2 Lead Architect, shared assumptions, conflict resolution	2026-03-16 20:41:13 -04:00
Hans Heinemann	1ed7023c08	docs: update design — dynamic dispatch, distributed ownership, orchestration patterns	2026-03-16 16:13:33 -04:00
Hans Heinemann	9efbb3b010	docs: add CLAUDE.md agent quick reference	2026-03-16 15:52:44 -04:00
hansheinemann	72bd744664	docs: add design doc and buildspec (#5)	2026-03-16 15:51:14 -04:00
hansheinemann	084cfb0bb2	feat: implement all adapter layers (#2) Adapters implemented: - adapters/llm/anthropic.py — Anthropic Claude SDK, capability-based model selection, max_tokens + temperature configurable via team.yaml, lazy SDK import - adapters/vcs/github.py — GitHub PR/branch operations via gh CLI - adapters/notify/openclaw.py — OpenClaw system event notifications - adapters/runtime/openclaw.py — OpenClaw sessions_spawn for agent execution - adapters/runtime/claude_code.py — Claude Code CLI for T4/T5 coding tasks All adapters follow the abstract base interfaces from Phase 1. Config-driven model selection via capability_map in team.yaml.	2026-03-16 11:45:11 -04:00
hansheinemann	ce1ce85b87	feat: expand role_registry with specialist roles + update agency-agents submodule (#4) Role registry changes: - T2 backend: software-architect → backend-architect - T2 frontend: software-architect → frontend-architect - T3 backend: senior-developer → senior-backend-developer (NEW) - T3 frontend: senior-developer → senior-frontend-developer (NEW) - T4 backend: backend-architect → backend-developer - T4 default: senior-developer → backend-developer - Added coverage for: ai, security, mobile, database, devops, docs, data, embedded, e2e, accessibility Submodule updated to include: frontend-architect, backend-developer, senior-backend-developer, senior-frontend-developer. Clean tier separation: T2 = architects (design) T3 = senior devs (lead + implement-or-delegate) T4 = developers (pure implementation) T5 = reviewers/testers (verification)	2026-03-16 11:44:54 -04:00