Compare commits
15 Commits
5b0d00a799
...
main
| Author | SHA1 | Date |
|---|---|---|
| | 342832fa5e | |
| | 641f122cdb | |
| | 54afa0f53f | |
| | f228061c4d | |
| | 1c99e40f98 | |
| | 8f143e779d | |
| | a721db63f6 | |
| | 882b769d21 | |
| | ce3c020de2 | |
| | b54436f474 | |
| | 1ed7023c08 | |
| | 9efbb3b010 | |
| | 72bd744664 | |
| | 084cfb0bb2 | |
| | ce1ce85b87 | |
2
.gitmodules
vendored
@@ -1,3 +1,3 @@
 [submodule "agents"]
     path = agents
-    url = https://github.com/coding-with-hans-heinemann/agency-agents.git
+    url = https://git.tandrewng.com/cw-hans/agency-agents.git
48
CLAUDE.md
Normal file
@@ -0,0 +1,48 @@
+# CLAUDE.md — Agent Quick Reference
+
+Read this before exploring the codebase. It saves tokens.
+
+## What This Is
+
+A tiered multi-agent orchestration framework. T1 decomposes goals → T2 architects → T3 leads → T4 implements → T5 verifies. SQLite blackboard tracks state. All external dependencies (LLM, VCS, notify, runtime) are pluggable adapters.
+
+## Key Docs
+
+- `docs/design.md` — architecture decisions, tier design, key choices
+- `docs/buildspec.md` — 15-step build order, phase breakdown
+
+## Project Layout
+
+```
+core/           — task_brief.py, blackboard.py, escalation.py, team_runner.py
+adapters/base/  — abstract base classes (LLMAdapter, VCSAdapter, NotifyAdapter, RuntimeAdapter)
+adapters/llm/   — anthropic.py
+adapters/vcs/   — github.py
+adapters/notify/— openclaw.py
+adapters/runtime— openclaw.py, claude_code.py
+prompts/        — T1–T5 system prompt .md files
+config/         — team.yaml (run config), role_registry.yaml (tier→role→persona)
+agents/         — git submodule, agent persona .md files
+runs/           — per-run blackboard.db files (gitignored)
+```
+
+## Conventions
+
+- **Never commit or push directly to `main`** — always branch (`hans/...` or `feature/...`) and PR
+- New adapters: subclass the relevant `adapters/base/*.py` abstract class
+- New roles: add persona `.md` to `agents/` submodule + entry in `config/role_registry.yaml`
+- Failure handling lives in `core/escalation.py` — extend `FailureType` there
+- `TaskBrief` is the canonical work unit — all tiers pass briefs to each other
+- Blackboard is the single source of truth per run — always write events there
+
+## Current State
+
+Phase 2 adapter implementations exist. `core/team_runner.py` may still have stubs — check before assuming it's wired up end-to-end.
+
+## Running
+
+```bash
+python -m venv .venv && source .venv/bin/activate
+pip install -r requirements.txt
+python -m core.team_runner --config config/team.yaml
+```
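The "New adapters" convention in CLAUDE.md (subclass the relevant abstract base class) can be sketched as follows. This is an illustration only, not code from the repository: the `LLMAdapter` base below is an assumed stand-in for `adapters/base/llm.py` (only the `complete` and `resolve_model` signatures are taken from the diff), and `EchoAdapter` is a hypothetical adapter.

```python
from abc import ABC, abstractmethod


class LLMAdapter(ABC):
    """Assumed shape of the abstract base; the real one lives in adapters/base/llm.py."""

    @abstractmethod
    def complete(self, prompt: str, capability: str, context: dict) -> str:
        """Return the model's text completion for the prompt."""

    @abstractmethod
    def resolve_model(self, capability: str) -> str:
        """Map a capability tier to a concrete model identifier."""


class EchoAdapter(LLMAdapter):
    """Hypothetical adapter that echoes the prompt; handy as a test double."""

    def __init__(self, config: dict) -> None:
        self._config = config

    def resolve_model(self, capability: str) -> str:
        # No real provider behind this adapter, so the "model" is a label.
        return f"echo-{capability}"

    def complete(self, prompt: str, capability: str, context: dict) -> str:
        model = self.resolve_model(capability)
        return f"[{model}] {prompt}"


adapter = EchoAdapter(config={})
reply = adapter.complete("hello", "capable", {})
```

Because both methods are abstract, a subclass that forgets one of them cannot be instantiated, which is presumably the point of routing every provider through a base class.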
@@ -1,16 +1,15 @@
 """
 adapters/llm/anthropic.py
-Anthropic Claude adapter — Phase 2 stub.
+Anthropic Claude LLM adapter — Phase 2 implementation.
 
-TODO (Phase 2):
-- Implement complete() using the anthropic SDK (anthropic.Anthropic client).
-- Implement resolve_model() by reading config/team.yaml capability_map.
-- Handle streaming responses, rate-limit retries, and token counting.
-- Support system-prompt injection via context["system_prompt"].
-- Map capability → model using the provider's capability_map config.
+Uses the ``anthropic`` SDK to call Claude models. Model selection is driven
+by the capability_map in team.yaml so the adapter stays provider-agnostic in
+configuration.
 """
 from __future__ import annotations
 
+import os
+
 from adapters.base.llm import LLMAdapter
 
 
@@ -18,27 +17,123 @@ class AnthropicAdapter(LLMAdapter):
|
||||
"""
|
||||
LLM adapter for Anthropic Claude models.
|
||||
|
||||
Reads model configuration from config/team.yaml:
|
||||
models.provider: anthropic
|
||||
models.capability_map.reasoning-heavy.anthropic: claude-opus-4-6
|
||||
models.capability_map.capable.anthropic: claude-sonnet-4-6
|
||||
models.capability_map.fast-cheap.anthropic: claude-haiku-3-5
|
||||
Reads model configuration from the loaded team.yaml config dict::
|
||||
|
||||
models:
|
||||
default_max_tokens: 4096 # fallback max_tokens for all calls
|
||||
default_temperature: 0 # fallback temperature for all calls
|
||||
capability_map:
|
||||
reasoning-heavy:
|
||||
anthropic: claude-opus-4-6
|
||||
capable:
|
||||
anthropic: claude-sonnet-4-6
|
||||
fast-cheap:
|
||||
anthropic: claude-haiku-3-5
|
||||
|
||||
The provider key used when looking up ``capability_map`` is hardcoded to
|
||||
``"anthropic"`` — the adapter knows its own provider; there is no need for
|
||||
a separate ``models.provider`` config field.
|
||||
|
||||
Both ``default_max_tokens`` and ``default_temperature`` can be overridden
|
||||
per-call via the ``context`` dict passed to :meth:`complete`.
|
||||
|
||||
Environment variables
|
||||
---------------------
|
||||
ANTHROPIC_API_KEY : Required. Authenticates with the Anthropic API.
|
||||
"""
|
||||
|
||||
def __init__(self, config: dict) -> None:
|
||||
# TODO (Phase 2): Accept loaded team.yaml config dict.
|
||||
# Extract API key from environment (ANTHROPIC_API_KEY).
|
||||
# Initialise the anthropic.Anthropic() client.
|
||||
raise NotImplementedError("AnthropicAdapter.__init__ is not yet implemented.")
|
||||
"""
|
||||
Initialise the Anthropic adapter.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
config : Loaded team.yaml config dict.
|
||||
|
||||
Raises
|
||||
------
|
||||
ValueError
|
||||
If ANTHROPIC_API_KEY is not set in the environment.
|
||||
"""
|
||||
try:
|
||||
import anthropic as _anthropic
|
||||
except ModuleNotFoundError as exc:
|
||||
raise ImportError(
|
||||
"The 'anthropic' package is required for AnthropicAdapter. "
|
||||
"Install it with: pip install anthropic"
|
||||
) from exc
|
||||
|
||||
self._config = config
|
||||
api_key = os.environ.get("ANTHROPIC_API_KEY")
|
||||
if not api_key:
|
||||
raise ValueError(
|
||||
"ANTHROPIC_API_KEY environment variable is not set. "
|
||||
"Export it before running the-agency."
|
||||
)
|
||||
self._client = _anthropic.Anthropic(api_key=api_key)
|
||||
self._models_cfg: dict = config.get("models", {})
|
||||
self._default_max_tokens: int = self._models_cfg.get("default_max_tokens", 4096)
|
||||
self._default_temperature: float = self._models_cfg.get("default_temperature", 0)
|
||||
|
||||
def complete(self, prompt: str, capability: str, context: dict) -> str:
|
||||
# TODO (Phase 2): Call anthropic client messages.create().
|
||||
# Use resolve_model(capability) to pick the model.
|
||||
# Support context keys: system_prompt, max_tokens, temperature.
|
||||
# Return response text as a plain string.
|
||||
raise NotImplementedError("AnthropicAdapter.complete is not yet implemented.")
|
||||
"""
|
||||
Send a prompt to a Claude model and return the text response.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
prompt : User-role prompt content.
|
||||
capability : One of "reasoning-heavy" | "capable" | "fast-cheap".
|
||||
context : Optional per-call overrides:
|
||||
system_prompt (str) — prepended as the system turn.
|
||||
max_tokens (int) — defaults to models.default_max_tokens in team.yaml.
|
||||
temperature (float) — defaults to models.default_temperature in team.yaml.
|
||||
|
||||
Returns
|
||||
-------
|
||||
The model's text completion as a plain string.
|
||||
"""
|
||||
model = self.resolve_model(capability)
|
||||
max_tokens: int = context.get("max_tokens", self._default_max_tokens)
|
||||
temperature: float = context.get("temperature", self._default_temperature)
|
||||
system_prompt: str = context.get("system_prompt", "")
|
||||
|
||||
create_kwargs: dict = {
|
||||
"model": model,
|
||||
"max_tokens": max_tokens,
|
||||
"messages": [{"role": "user", "content": prompt}],
|
||||
}
|
||||
if system_prompt:
|
||||
create_kwargs["system"] = system_prompt
|
||||
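# Always forward temperature. Omitting it lets the API apply its own
# default, so a configured value of 0 would be silently ignored.
create_kwargs["temperature"] = temperature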
|
||||
|
||||
response = self._client.messages.create(**create_kwargs)
|
||||
return response.content[0].text
|
||||
|
||||
def resolve_model(self, capability: str) -> str:
|
||||
# TODO (Phase 2): Look up capability in team.yaml capability_map.
|
||||
# Fall back to "capable" tier model if capability is unknown.
|
||||
raise NotImplementedError("AnthropicAdapter.resolve_model is not yet implemented.")
|
||||
"""
|
||||
Map a capability string to the Anthropic model identifier.
|
||||
|
||||
Looks up ``config.models.capability_map[capability][provider]``.
|
||||
Falls back to the "capable" tier model if the capability is unknown.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
capability : One of "reasoning-heavy" | "capable" | "fast-cheap".
|
||||
|
||||
Returns
|
||||
-------
|
||||
Anthropic model identifier (e.g. "claude-opus-4-6").
|
||||
"""
|
||||
# The adapter knows its own provider — no need to read it from config.
|
||||
cap_map: dict = self._models_cfg.get("capability_map", {})
|
||||
|
||||
if capability in cap_map and "anthropic" in cap_map[capability]:
|
||||
return cap_map[capability]["anthropic"]
|
||||
|
||||
# Fall back to "capable" tier
|
||||
if "capable" in cap_map and "anthropic" in cap_map["capable"]:
|
||||
return cap_map["capable"]["anthropic"]
|
||||
|
||||
# Hard-coded last resort
|
||||
return "claude-sonnet-4-6"
|
||||
|
||||
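The lookup order in `resolve_model` (requested tier, then the "capable" tier, then a hard-coded model) and the per-call overrides in `complete` can be exercised without the SDK. The sketch below is a standalone approximation, not the adapter itself: `resolve_model` and `build_request` are free functions introduced here for illustration, and unlike the diff this version always forwards `temperature` and treats an empty model string as missing.

```python
def resolve_model(cap_map: dict, capability: str,
                  provider: str = "anthropic",
                  last_resort: str = "claude-sonnet-4-6") -> str:
    """Try the requested tier, then "capable", then the last-resort model."""
    for tier in (capability, "capable"):
        model = cap_map.get(tier, {}).get(provider)
        if model:
            return model
    return last_resort


def build_request(model: str, prompt: str, context: dict,
                  default_max_tokens: int = 4096,
                  default_temperature: float = 0.0) -> dict:
    """Assemble keyword arguments for a messages-style request."""
    kwargs = {
        "model": model,
        "max_tokens": context.get("max_tokens", default_max_tokens),
        "temperature": context.get("temperature", default_temperature),
        "messages": [{"role": "user", "content": prompt}],
    }
    # Only include a system prompt when the caller supplied a non-empty one.
    if context.get("system_prompt"):
        kwargs["system"] = context["system_prompt"]
    return kwargs


cap_map = {
    "reasoning-heavy": {"anthropic": "claude-opus-4-6"},
    "capable": {"anthropic": "claude-sonnet-4-6"},
    "fast-cheap": {"anthropic": "claude-haiku-3-5"},
}
request = build_request(resolve_model(cap_map, "fast-cheap"), "ping", {})
```

Keeping these two steps as pure functions makes the fallback behaviour testable without an API key.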
@@ -1,35 +1,93 @@
|
||||
"""
|
||||
adapters/notify/openclaw.py
|
||||
OpenClaw notification adapter — Phase 2 stub.
|
||||
OpenClaw notification adapter — Phase 2 implementation.
|
||||
|
||||
TODO (Phase 2):
|
||||
- Implement send() to dispatch notifications via the OpenClaw API.
|
||||
- Support context keys: channel, severity, run_id, brief_id.
|
||||
- Read endpoint and credentials from environment (OPENCLAW_API_KEY, OPENCLAW_URL).
|
||||
- Handle rate limiting and delivery retries.
|
||||
Sends notifications by shelling out to the ``openclaw`` CLI::
|
||||
|
||||
openclaw system event --text "<message>" --mode now
|
||||
|
||||
If the binary is not on PATH the method logs a warning and returns without
|
||||
raising — notifications are best-effort and should never crash the pipeline.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import os
|
||||
import subprocess
|
||||
|
||||
from adapters.base.notify import NotifyAdapter
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class OpenClawNotifyAdapter(NotifyAdapter):
|
||||
"""
|
||||
Notification adapter that sends messages via OpenClaw.
|
||||
Notification adapter that dispatches messages via the ``openclaw`` CLI.
|
||||
|
||||
Expects environment variables:
|
||||
OPENCLAW_API_KEY — authentication token
|
||||
OPENCLAW_URL — base URL for the OpenClaw API (optional, defaults to hosted)
|
||||
Environment variables
|
||||
---------------------
|
||||
OPENCLAW_SIGNAL_NUMBER : Optional. Direct signal target for OpenClaw sends.
|
||||
"""
|
||||
|
||||
def __init__(self, config: dict) -> None:
|
||||
# TODO (Phase 2): Accept loaded team.yaml config dict.
|
||||
# Extract OPENCLAW_API_KEY and OPENCLAW_URL from environment.
|
||||
# Initialise an HTTP client (e.g. httpx or requests).
|
||||
raise NotImplementedError("OpenClawNotifyAdapter.__init__ is not yet implemented.")
|
||||
"""
|
||||
Initialise the OpenClaw notification adapter.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
config : Loaded team.yaml config dict (reserved for future options).
|
||||
"""
|
||||
self._config = config
|
||||
self._signal_number: str = os.environ.get("OPENCLAW_SIGNAL_NUMBER", "")
|
||||
|
||||
def send(self, message: str, context: dict) -> None:
|
||||
# TODO (Phase 2): POST notification payload to OpenClaw API.
|
||||
# Include message, context (channel, severity, run_id, brief_id).
|
||||
# Log delivery confirmation or raise on failure.
|
||||
raise NotImplementedError("OpenClawNotifyAdapter.send is not yet implemented.")
|
||||
"""
|
||||
Send a notification via ``openclaw system event``.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
message : Human-readable notification text.
|
||||
context : Optional metadata. Recognised keys:
|
||||
level (str) — "info" | "warning" | "error"; logged locally.
|
||||
run_id (str) — included in the local log record.
|
||||
brief_id (str) — included in the local log record.
|
||||
|
||||
Notes
|
||||
-----
|
||||
If the ``openclaw`` binary is not present on PATH, the method logs a
|
||||
warning and returns silently. Notifications are best-effort.
|
||||
"""
|
||||
level: str = context.get("level", "info")
|
||||
run_id: str = context.get("run_id", "")
|
||||
brief_id: str = context.get("brief_id", "")
|
||||
|
||||
# Always log locally regardless of CLI availability.
|
||||
log_msg = "[notify:%s] %s (run=%s brief=%s)" % (level, message, run_id, brief_id)
|
||||
if level == "error":
|
||||
logger.error(log_msg)
|
||||
elif level == "warning":
|
||||
logger.warning(log_msg)
|
||||
else:
|
||||
logger.info(log_msg)
|
||||
|
||||
cmd = ["openclaw", "system", "event", "--text", message, "--mode", "now"]
|
||||
try:
|
||||
result = subprocess.run(
|
||||
cmd,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=30,
|
||||
)
|
||||
if result.returncode != 0:
|
||||
logger.warning(
|
||||
"openclaw event returned non-zero exit %d: %s",
|
||||
result.returncode,
|
||||
result.stderr.strip(),
|
||||
)
|
||||
except FileNotFoundError:
|
||||
logger.warning(
|
||||
"openclaw CLI not found on PATH; notification not delivered: %s",
|
||||
message,
|
||||
)
|
||||
except subprocess.TimeoutExpired:
|
||||
logger.warning("openclaw event timed out for message: %s", message)
|
||||
|
||||
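The best-effort behaviour described for `send()` (log and carry on when the CLI is missing, slow, or exits non-zero) can be isolated in a small helper. This is a sketch under assumptions: `run_best_effort` is a hypothetical name, and the Python interpreter stands in for the `openclaw` binary so the example runs anywhere.

```python
import logging
import subprocess
import sys

logger = logging.getLogger("notify_sketch")


def run_best_effort(cmd: list, timeout_s: float = 30) -> bool:
    """Run cmd; return True on exit code 0, otherwise log a warning and return False."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except FileNotFoundError:
        logger.warning("%s not found on PATH; skipping", cmd[0])
        return False
    except subprocess.TimeoutExpired:
        logger.warning("%s timed out after %ss", cmd[0], timeout_s)
        return False
    if result.returncode != 0:
        logger.warning(
            "%s exited with %d: %s", cmd[0], result.returncode, result.stderr.strip()
        )
        return False
    return True


# The interpreter stands in for the real CLI; the second command does not exist.
delivered = run_best_effort([sys.executable, "-c", "print('ok')"])
missing = run_best_effort(["no-such-binary-for-this-sketch"])
```

Returning a boolean rather than raising keeps the caller's pipeline alive while still letting tests observe whether delivery happened.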
@@ -1,51 +1,163 @@
|
||||
"""
|
||||
adapters/runtime/claude_code.py
|
||||
Claude Code agent runtime adapter — Phase 2 stub.
|
||||
Claude Code sub-agent runtime adapter — Phase 2 implementation.
|
||||
|
||||
TODO (Phase 2):
|
||||
- Implement spawn() to launch a Claude Code sub-agent via the Agent SDK.
|
||||
- Implement get_result() to await agent completion and parse the output.
|
||||
- Implement kill() to terminate the sub-agent process or session.
|
||||
- Map task brief context (files, constraints, artifacts) into the agent's
|
||||
system prompt and tool context.
|
||||
- Handle Claude Code tool-use responses and extract structured output.
|
||||
Spawns the ``claude`` CLI as a non-interactive subprocess for T4/T5
|
||||
implementation tasks::
|
||||
|
||||
claude --permission-mode bypassPermissions --print "<task>"
|
||||
|
||||
Each spawned process is tracked by a UUID job_id so callers can later poll
|
||||
for the result or terminate the job. Stdout is captured and returned as the
|
||||
agent output; stderr is included for debugging.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import subprocess
|
||||
import tempfile
|
||||
import threading
|
||||
import uuid
|
||||
|
||||
from adapters.base.runtime import RuntimeAdapter
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class ClaudeCodeRuntimeAdapter(RuntimeAdapter):
|
||||
"""
|
||||
Runtime adapter that spawns Claude Code sub-agents for coding tasks.
|
||||
Runtime adapter that spawns ``claude`` CLI sub-agents for coding tasks.
|
||||
|
||||
Used when a TaskBrief has preferred_runtime == "coding_agent".
|
||||
Credentials are inherited from the environment (``ANTHROPIC_API_KEY``).
|
||||
The ``claude`` CLI must be installed and reachable on PATH.
|
||||
|
||||
Expects the Claude Code CLI / Agent SDK to be available in the environment.
|
||||
Credentials are inherited from the environment (ANTHROPIC_API_KEY).
|
||||
Used when a TaskBrief has ``preferred_runtime == "coding_agent"``.
|
||||
"""
|
||||
|
||||
def __init__(self, config: dict) -> None:
|
||||
# TODO (Phase 2): Accept loaded team.yaml config dict.
|
||||
# Validate that Claude Code CLI or SDK is accessible.
|
||||
# Initialise any agent session management state.
|
||||
raise NotImplementedError("ClaudeCodeRuntimeAdapter.__init__ is not yet implemented.")
|
||||
"""
|
||||
Initialise the Claude Code runtime adapter.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
config : Loaded team.yaml config dict (reserved for future options).
|
||||
"""
|
||||
self._config = config
|
||||
# Maps job_id → running Popen instance.
|
||||
self._jobs: dict[str, subprocess.Popen] = {}
|
||||
self._lock = threading.Lock()
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# RuntimeAdapter interface
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
def spawn(self, task: str, capability: str, context: dict) -> str:
|
||||
# TODO (Phase 2): Launch a Claude Code sub-agent.
|
||||
# Compose a structured system prompt from task + context.
|
||||
# Inject relevant files and constraints as tool context.
|
||||
# Return an agent_id that maps to a running agent session.
|
||||
raise NotImplementedError("ClaudeCodeRuntimeAdapter.spawn is not yet implemented.")
|
||||
"""
|
||||
Launch ``claude --permission-mode bypassPermissions --print "<task>"``
|
||||
as a non-interactive subprocess.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
task : Full task description (typically a JSON-serialised brief).
|
||||
capability : Capability hint (not forwarded; Claude Code resolves its
|
||||
own model from the local environment).
|
||||
context : Optional keys:
|
||||
workdir (str) — cwd for the subprocess. A fresh
|
||||
temporary directory is created if omitted.
|
||||
|
||||
Returns
|
||||
-------
|
||||
A UUID job_id string that uniquely identifies this subprocess.
|
||||
"""
|
||||
workdir: str = context.get("workdir") or tempfile.mkdtemp(
|
||||
prefix="agency-claude-"
|
||||
)
|
||||
job_id = str(uuid.uuid4())
|
||||
logger.info("Spawning Claude Code job %s in %s", job_id, workdir)
|
||||
|
||||
proc = subprocess.Popen(
|
||||
["claude", "--permission-mode", "bypassPermissions", "--print", task],
|
||||
stdout=subprocess.PIPE,
|
||||
stderr=subprocess.PIPE,
|
||||
text=True,
|
||||
cwd=workdir,
|
||||
)
|
||||
|
||||
with self._lock:
|
||||
self._jobs[job_id] = proc
|
||||
|
||||
return job_id
|
||||
|
||||
def get_result(self, agent_id: str, timeout_s: int) -> dict:
|
||||
# TODO (Phase 2): Await the Claude Code agent session to complete.
|
||||
# Parse the agent's final message for structured JSON output.
|
||||
# Return dict with: {"status": ..., "output": ..., "artifacts": [...]}.
|
||||
# Raise TimeoutError if timeout_s elapses.
|
||||
raise NotImplementedError("ClaudeCodeRuntimeAdapter.get_result is not yet implemented.")
|
||||
"""
|
||||
Wait for the Claude Code subprocess to complete and return its output.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
agent_id : Job id returned by spawn().
|
||||
timeout_s : Maximum seconds to wait before raising TimeoutError.
|
||||
|
||||
Returns
|
||||
-------
|
||||
dict with keys:
|
||||
status ("completed" | "failed")
|
||||
output (str — full stdout)
|
||||
artifacts (list — always empty; callers must parse output)
|
||||
stderr (str — full stderr)
|
||||
|
||||
Raises
|
||||
------
|
||||
KeyError
|
||||
If agent_id does not correspond to a known job.
|
||||
TimeoutError
|
||||
If the subprocess does not finish within timeout_s seconds.
|
||||
"""
|
||||
with self._lock:
|
||||
proc = self._jobs.get(agent_id)
|
||||
|
||||
if proc is None:
|
||||
raise KeyError(f"No Claude Code job found for agent_id={agent_id!r}")
|
||||
|
||||
try:
|
||||
stdout, stderr = proc.communicate(timeout=timeout_s)
|
||||
except subprocess.TimeoutExpired:
|
||||
proc.kill()
|
||||
stdout, stderr = proc.communicate()
|
||||
raise TimeoutError(
|
||||
f"Claude Code job {agent_id!r} did not complete within {timeout_s}s."
|
||||
)
|
||||
|
||||
status = "completed" if proc.returncode == 0 else "failed"
|
||||
logger.info(
|
||||
"Claude Code job %s finished: status=%s returncode=%d",
|
||||
agent_id,
|
||||
status,
|
||||
proc.returncode,
|
||||
)
|
||||
|
||||
return {
|
||||
"status": status,
|
||||
"output": stdout,
|
||||
"artifacts": [],
|
||||
"stderr": stderr,
|
||||
}
|
||||
|
||||
def kill(self, agent_id: str) -> None:
|
||||
# TODO (Phase 2): Terminate the Claude Code agent session.
|
||||
# Clean up any temporary files or session state.
|
||||
raise NotImplementedError("ClaudeCodeRuntimeAdapter.kill is not yet implemented.")
|
||||
"""
|
||||
Terminate a running Claude Code subprocess.
|
||||
|
||||
Silently succeeds if the job has already finished or the id is unknown.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
agent_id : Job id returned by spawn().
|
||||
"""
|
||||
with self._lock:
|
||||
proc = self._jobs.get(agent_id)
|
||||
|
||||
if proc is not None:
|
||||
try:
|
||||
proc.terminate()
|
||||
logger.info("Terminated Claude Code job %s", agent_id)
|
||||
except OSError:
|
||||
pass # Process already gone — that is fine.
|
||||
|
||||
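The job-tracking pattern used by `ClaudeCodeRuntimeAdapter` (a lock-guarded map from job id to `Popen`) can be sketched in isolation. This is not the adapter's code: `JobRegistry` is a hypothetical class, the Python interpreter stands in for the `claude` CLI, and, unlike the diff, finished jobs are removed from the map so it does not grow without bound.

```python
import subprocess
import sys
import threading
import uuid


class JobRegistry:
    """Tracks subprocesses by UUID so results can be collected later."""

    def __init__(self) -> None:
        self._jobs = {}  # job_id -> subprocess.Popen
        self._lock = threading.Lock()

    def spawn(self, argv: list) -> str:
        proc = subprocess.Popen(
            argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
        )
        job_id = str(uuid.uuid4())
        with self._lock:
            self._jobs[job_id] = proc
        return job_id

    def get_result(self, job_id: str, timeout_s: float) -> dict:
        with self._lock:
            proc = self._jobs.get(job_id)
        if proc is None:
            raise KeyError(f"unknown job {job_id!r}")
        try:
            stdout, stderr = proc.communicate(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.communicate()  # reap the killed process
            self._forget(job_id)
            raise TimeoutError(f"job {job_id!r} exceeded {timeout_s}s") from None
        self._forget(job_id)
        status = "completed" if proc.returncode == 0 else "failed"
        return {"status": status, "output": stdout, "artifacts": [], "stderr": stderr}

    def _forget(self, job_id: str) -> None:
        with self._lock:
            self._jobs.pop(job_id, None)


registry = JobRegistry()
job = registry.spawn([sys.executable, "-c", "print('hi')"])
result = registry.get_result(job, timeout_s=30)
```

The lock is held only while touching the dictionary, never while waiting on the subprocess, so one slow job does not block `spawn` or `kill` calls from other threads.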
@@ -1,48 +1,241 @@
|
||||
"""
|
||||
adapters/runtime/openclaw.py
|
||||
OpenClaw agent runtime adapter — Phase 2 stub.
|
||||
OpenClaw agent runtime adapter — Phase 2 implementation.
|
||||
|
||||
TODO (Phase 2):
|
||||
- Implement spawn() to submit a task to an OpenClaw worker pool.
|
||||
- Implement get_result() to poll or subscribe for agent completion.
|
||||
- Implement kill() to cancel a running OpenClaw agent job.
|
||||
- Read endpoint and credentials from environment (OPENCLAW_API_KEY, OPENCLAW_URL).
|
||||
- Map capability hint to an appropriate worker class/queue.
|
||||
Spawns sub-agents by shelling out to the ``openclaw`` CLI::
|
||||
|
||||
openclaw session spawn --task "<task>" --mode run
|
||||
openclaw session get <session_id>
|
||||
openclaw session kill <session_id>
|
||||
|
||||
If the ``openclaw`` binary is unavailable, all methods raise
|
||||
``NotImplementedError`` with a helpful message rather than crashing with a
|
||||
raw ``FileNotFoundError``.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import logging
|
||||
import re
|
||||
import subprocess
|
||||
import time
|
||||
|
||||
from adapters.base.runtime import RuntimeAdapter
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Status strings from the openclaw CLI that indicate a session has finished.
|
||||
_TERMINAL_STATUSES = frozenset(
|
||||
{"done", "completed", "failed", "partial", "blocked", "error"}
|
||||
)
|
||||
|
||||
|
||||
class OpenClawRuntimeAdapter(RuntimeAdapter):
|
||||
"""
|
||||
Runtime adapter that dispatches agent tasks to OpenClaw workers.
|
||||
Runtime adapter that dispatches agent tasks to OpenClaw worker sessions.
|
||||
|
||||
Expects environment variables:
|
||||
OPENCLAW_API_KEY — authentication token
|
||||
OPENCLAW_URL — base URL for the OpenClaw API
|
||||
All interactions use the ``openclaw`` CLI. No additional credentials are
|
||||
required beyond what OpenClaw manages in the local environment.
|
||||
"""
|
||||
|
||||
def __init__(self, config: dict) -> None:
|
||||
# TODO (Phase 2): Accept loaded team.yaml config dict.
|
||||
# Extract OPENCLAW_API_KEY and OPENCLAW_URL from environment.
|
||||
# Initialise HTTP client and any job-tracking state.
|
||||
raise NotImplementedError("OpenClawRuntimeAdapter.__init__ is not yet implemented.")
|
||||
"""
|
||||
Initialise the OpenClaw runtime adapter.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
config : Loaded team.yaml config dict (reserved for future options).
|
||||
"""
|
||||
self._config = config
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# RuntimeAdapter interface
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
def spawn(self, task: str, capability: str, context: dict) -> str:
|
||||
# TODO (Phase 2): Submit task to OpenClaw worker pool.
|
||||
# Map capability ("reasoning-heavy" | "capable" | "fast-cheap") to
|
||||
# an appropriate worker queue or model hint.
|
||||
# Return an agent_id string that can be used to poll for results.
|
||||
raise NotImplementedError("OpenClawRuntimeAdapter.spawn is not yet implemented.")
|
||||
"""
|
||||
Spawn an OpenClaw agent session for the given task.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
task : Natural-language task description.
|
||||
capability : Capability hint ("reasoning-heavy" | "capable" | "fast-cheap").
|
||||
Passed informally; actual routing is handled by OpenClaw.
|
||||
context : Arbitrary context bag (currently unused by this adapter).
|
||||
|
||||
Returns
|
||||
-------
|
||||
session_id string parsed from the CLI output.
|
||||
|
||||
Raises
|
||||
------
|
||||
NotImplementedError
|
||||
If the ``openclaw`` CLI is not available on PATH.
|
||||
RuntimeError
|
||||
If the session_id cannot be parsed from the CLI output.
|
||||
"""
|
||||
# TODO: map capability to an openclaw worker tier / model hint if the
|
||||
# openclaw CLI gains that flag in a future release.
|
||||
cmd = ["openclaw", "session", "spawn", "--task", task, "--mode", "run"]
|
||||
try:
|
||||
result = subprocess.run(
|
||||
cmd,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
check=True,
|
||||
)
|
||||
except FileNotFoundError:
|
||||
raise NotImplementedError(
|
||||
"openclaw CLI not found on PATH. "
|
||||
"Install OpenClaw or configure a different runtime adapter "
|
||||
"(e.g. adapters.runtime.claude_code.ClaudeCodeRuntimeAdapter)."
|
||||
)
|
||||
except subprocess.CalledProcessError as exc:
|
||||
raise RuntimeError(
|
||||
f"openclaw session spawn failed (exit {exc.returncode}): "
|
||||
f"{exc.stderr.strip()}"
|
||||
) from exc
|
||||
|
||||
return self._parse_session_id(result.stdout)
|
||||
|
||||
def get_result(self, agent_id: str, timeout_s: int) -> dict:
|
||||
# TODO (Phase 2): Poll or long-poll the OpenClaw API for job completion.
|
||||
# Raise TimeoutError if timeout_s elapses before the job finishes.
|
||||
# Return a dict with at minimum: {"status": ..., "output": ..., "artifacts": [...]}.
|
||||
raise NotImplementedError("OpenClawRuntimeAdapter.get_result is not yet implemented.")
|
||||
"""
|
||||
Poll ``openclaw session get`` until the session reaches a terminal
|
||||
state or *timeout_s* seconds elapse.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
agent_id : Session ID returned by spawn().
|
||||
timeout_s : Maximum seconds to wait before raising TimeoutError.
|
||||
|
||||
Returns
|
||||
-------
|
||||
dict with keys: ``status``, ``output``, ``artifacts``.
|
||||
|
||||
Raises
|
||||
------
|
||||
TimeoutError
|
||||
If the session does not finish within timeout_s seconds.
|
||||
NotImplementedError
|
||||
If the ``openclaw`` CLI is not available on PATH.
|
||||
"""
|
||||
deadline = time.monotonic() + timeout_s
|
||||
poll_interval = 2.0
|
||||
|
||||
while time.monotonic() < deadline:
|
||||
try:
|
||||
result = subprocess.run(
|
||||
["openclaw", "session", "get", agent_id],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=15,
|
||||
)
|
||||
except FileNotFoundError:
|
||||
raise NotImplementedError(
|
||||
"openclaw CLI not found on PATH. "
|
||||
"Install OpenClaw or switch to a different runtime adapter."
|
||||
)
|
||||
except subprocess.TimeoutExpired:
|
||||
logger.debug("openclaw session get timed out; will retry")
|
||||
time.sleep(poll_interval)
|
||||
continue
|
||||
|
||||
if result.returncode == 0 and result.stdout.strip():
|
||||
parsed = self._parse_get_output(result.stdout)
|
||||
if parsed.get("status", "").lower() in _TERMINAL_STATUSES:
|
||||
return parsed
|
||||
else:
|
||||
logger.debug(
|
||||
"openclaw session get returned exit=%d; retrying. stderr=%s",
|
||||
result.returncode,
|
||||
result.stderr.strip(),
|
||||
)
|
||||
|
||||
time.sleep(poll_interval)
|
||||
|
||||
raise TimeoutError(
|
||||
f"Agent {agent_id!r} did not complete within {timeout_s}s."
|
||||
)
|
||||
|
||||
def kill(self, agent_id: str) -> None:
|
||||
# TODO (Phase 2): Send a cancellation request to the OpenClaw API.
|
||||
# Silently succeed if the agent has already finished.
|
||||
raise NotImplementedError("OpenClawRuntimeAdapter.kill is not yet implemented.")
|
||||
"""
|
||||
Terminate an OpenClaw session unconditionally.
|
||||
|
||||
Silently succeeds if the session has already finished.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
agent_id : Session ID returned by spawn().
|
||||
|
||||
Raises
|
||||
------
|
||||
NotImplementedError
|
||||
If the ``openclaw`` CLI is not available on PATH.
|
||||
"""
|
||||
try:
|
||||
subprocess.run(
|
||||
["openclaw", "session", "kill", agent_id],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=15,
|
||||
)
|
||||
except FileNotFoundError:
|
||||
raise NotImplementedError(
|
||||
"openclaw CLI not found on PATH. "
|
||||
"Install OpenClaw or switch to a different runtime adapter."
|
||||
)
|
||||
except subprocess.TimeoutExpired:
|
||||
logger.warning("openclaw session kill timed out for agent %s", agent_id)
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Private helpers
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
def _parse_session_id(self, output: str) -> str:
|
||||
"""Extract a session_id from the raw stdout of ``openclaw session spawn``."""
|
||||
output = output.strip()
|
||||
|
||||
# Prefer structured JSON output.
|
||||
try:
|
||||
data = json.loads(output)
|
||||
for key in ("session_id", "sessionId", "id"):
|
||||
if key in data:
|
||||
return str(data[key])
|
||||
except (json.JSONDecodeError, TypeError):
|
||||
pass
|
||||
|
||||
# Regex: look for "session_id: <id>" or similar.
|
||||
m = re.search(
|
||||
r"(?:session[_\s]?id|sessionId)[:\s]+([a-zA-Z0-9_\-]+)",
|
||||
output,
|
||||
re.IGNORECASE,
|
||||
)
|
||||
if m:
|
||||
return m.group(1)
|
||||
|
||||
# Last resort: return the first non-empty line.
|
||||
lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
|
||||
if lines:
|
||||
return lines[0]
|
||||
|
||||
raise RuntimeError(
|
||||
f"Could not parse session_id from openclaw output: {output!r}"
|
||||
)
|
||||
|
||||
def _parse_get_output(self, output: str) -> dict:
|
||||
"""Parse the stdout of ``openclaw session get`` into a result dict."""
|
||||
output = output.strip()
|
||||
try:
|
||||
data = json.loads(output)
|
||||
return {
|
||||
"status": data.get("status", "done"),
|
||||
"output": data.get("output", output),
|
||||
"artifacts": data.get("artifacts", []),
|
||||
}
|
||||
except (json.JSONDecodeError, TypeError, AttributeError):
|
||||
# Non-JSON output — treat as completed with raw text output.
|
||||
return {
|
||||
"status": "done",
|
||||
"output": output,
|
||||
"artifacts": [],
|
||||
}
|
||||
|
||||
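The three-stage parsing in `_parse_session_id` (JSON first, then a regex, then the first non-empty line) is easy to check in isolation. The function below is a standalone sketch rather than the adapter method; the sample CLI outputs are invented for illustration, since the diff does not show what `openclaw session spawn` actually prints.

```python
import json
import re

_SESSION_RE = re.compile(
    r"(?:session[_\s]?id|sessionId)[:\s]+([A-Za-z0-9_\-]+)", re.IGNORECASE
)


def parse_session_id(output: str) -> str:
    """Extract a session id from CLI output, trying progressively looser formats."""
    text = output.strip()

    # 1. Structured JSON object with a recognised key.
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        data = None
    if isinstance(data, dict):
        for key in ("session_id", "sessionId", "id"):
            if key in data:
                return str(data[key])

    # 2. Free text such as "Session ID: abc-123".
    match = _SESSION_RE.search(text)
    if match:
        return match.group(1)

    # 3. Fall back to the first non-empty line.
    for line in text.splitlines():
        if line.strip():
            return line.strip()

    raise RuntimeError(f"could not parse session id from {output!r}")


from_json = parse_session_id('{"sessionId": "abc-123"}')
from_text = parse_session_id("Session ID: s_42 started")
from_line = parse_session_id("xyz789\n")
```

Checking `isinstance(data, dict)` up front avoids relying on exceptions when the CLI prints a bare JSON number or string.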
@@ -1,16 +1,30 @@
|
||||
"""
|
||||
adapters/vcs/github.py
|
||||
GitHub VCS adapter — Phase 2 stub.
|
||||
GitHub VCS adapter — Phase 2 implementation.
|
||||
|
||||
TODO (Phase 2):
|
||||
- Implement create_branch() using PyGithub or gh CLI subprocess.
|
||||
- Implement commit() — stage files and push via git subprocess or API.
|
||||
- Implement create_pr() using GitHub REST API (POST /repos/{owner}/{repo}/pulls).
|
||||
- Implement get_pr_status() using GET /repos/{owner}/{repo}/pulls/{pull_number}.
|
||||
- Read repo and credentials from config/team.yaml and environment (GITHUB_TOKEN).
|
||||
Uses PyGithub (``pip install PyGithub``) to interact with the GitHub REST API.
|
||||
Reads the repository URL and base branch from the team.yaml config dict.
|
||||
|
||||
Note on commit() signature
|
||||
--------------------------
|
||||
The base class declares ``commit(files: list[str], message: str)``, which is
|
||||
insufficient for the GitHub Contents API (which requires file *content*, not
|
||||
just paths). This implementation extends the signature to accept either:
|
||||
|
||||
* ``dict[str, str]`` — ``{path: content}`` mapping (preferred; uses the API).
|
||||
* ``list[str]`` — local file paths; content is read from disk and pushed.
|
||||
|
||||
The optional ``branch`` keyword argument targets a specific branch; it
|
||||
defaults to the configured base branch.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import os
|
||||
import re
|
||||
from typing import Union
|
||||
|
||||
from github import Github, GithubException
|
||||
|
||||
from adapters.base.vcs import VCSAdapter
|
||||
|
||||
|
||||
@@ -18,34 +32,175 @@ class GitHubAdapter(VCSAdapter):
|
||||
"""
|
||||
VCS adapter for GitHub repositories.
|
||||
|
||||
Expects environment variable GITHUB_TOKEN and config values:
|
||||
run.repo — SSH or HTTPS clone URL
|
||||
run.base_branch — default base branch (e.g. "main")
|
||||
Authenticates via GITHUB_TOKEN and interacts with the GitHub REST API
|
||||
through PyGithub.
|
||||
|
||||
Environment variables
|
||||
---------------------
|
||||
GITHUB_TOKEN : Required. Personal access token or GitHub App installation token.
|
||||
|
||||
Config keys (from team.yaml)
|
||||
----------------------------
|
||||
run.repo : SSH or HTTPS clone URL (e.g. "git@github.com:org/repo.git").
|
||||
run.base_branch : Default base branch (e.g. "main").
|
||||
"""
|
||||
|
||||
def __init__(self, config: dict) -> None:
|
||||
# TODO (Phase 2): Accept loaded team.yaml config dict.
|
||||
# Extract GITHUB_TOKEN from environment.
|
||||
# Parse owner/repo from config.run.repo.
|
||||
raise NotImplementedError("GitHubAdapter.__init__ is not yet implemented.")
|
||||
"""
|
||||
Initialise the GitHub adapter.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
config : Loaded team.yaml config dict.
|
||||
|
||||
Raises
|
||||
------
|
||||
ValueError
|
||||
If GITHUB_TOKEN is not set or the repo URL cannot be parsed.
|
||||
"""
|
||||
self._config = config
|
||||
token = os.environ.get("GITHUB_TOKEN")
|
||||
if not token:
|
||||
raise ValueError(
|
||||
"GITHUB_TOKEN environment variable is not set. "
|
||||
"Create a personal access token and export it before running the-agency."
|
||||
)
|
||||
self._g = Github(token)
|
||||
|
||||
run_cfg: dict = config.get("run", {})
|
||||
repo_url: str = run_cfg.get("repo", "")
|
||||
self._base_branch: str = run_cfg.get("base_branch", "main")
|
||||
|
||||
self._owner, self._repo_name = self._parse_repo_url(repo_url)
|
||||
self._repo = self._g.get_repo(f"{self._owner}/{self._repo_name}")
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Helpers
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
def _parse_repo_url(self, url: str) -> tuple[str, str]:
|
||||
"""Parse *owner* and *repo* name from an SSH or HTTPS GitHub URL."""
|
||||
# git@github.com:owner/repo.git
|
||||
m = re.match(r"git@github\.com:([^/]+)/([^/]+?)(?:\.git)?$", url)
|
||||
if m:
|
||||
return m.group(1), m.group(2)
|
||||
# https://github.com/owner/repo[.git]
|
||||
m = re.match(r"https?://github\.com/([^/]+)/([^/]+?)(?:\.git)?/?$", url)
|
||||
if m:
|
||||
return m.group(1), m.group(2)
|
||||
raise ValueError(
|
||||
f"Cannot parse GitHub owner/repo from URL: {url!r}. "
|
||||
"Expected SSH (git@github.com:owner/repo.git) or "
|
||||
"HTTPS (https://github.com/owner/repo.git) format."
|
||||
)
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# VCSAdapter interface
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
def create_branch(self, name: str) -> None:
|
||||
# TODO (Phase 2): Create branch via GitHub API or local git subprocess.
|
||||
# Use config.run.base_branch as the branch point.
|
||||
raise NotImplementedError("GitHubAdapter.create_branch is not yet implemented.")
|
||||
"""
|
||||
Create a new branch off ``self._base_branch`` on the remote.
|
||||
|
||||
def commit(self, files: list[str], message: str) -> str:
|
||||
# TODO (Phase 2): Stage files (git add), create commit (git commit), push.
|
||||
# Return the resulting commit SHA.
|
||||
raise NotImplementedError("GitHubAdapter.commit is not yet implemented.")
|
||||
Parameters
|
||||
----------
|
||||
name : New branch name (e.g. "feat/webhook-ingestion").
|
||||
"""
|
||||
base_ref = self._repo.get_git_ref(f"heads/{self._base_branch}")
|
||||
self._repo.create_git_ref(f"refs/heads/{name}", base_ref.object.sha)
|
||||
|
||||
def commit(
|
||||
self,
|
||||
files: Union[dict[str, str], list[str]],
|
||||
message: str,
|
||||
branch: str | None = None,
|
||||
) -> str:
|
||||
"""
|
||||
Commit files to the repository via the GitHub Contents API.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
files : Either a ``dict[path, content]`` mapping (preferred), or a
|
||||
``list[path]`` of local file paths whose content is read from
|
||||
disk.
|
||||
message : Commit message.
|
||||
branch : Target branch. Defaults to ``self._base_branch``.
|
||||
|
||||
Returns
|
||||
-------
|
||||
SHA of the last created/updated commit, or empty string if no files
|
||||
were committed.
|
||||
"""
|
||||
target_branch = branch or self._base_branch
|
||||
|
||||
# Normalise to {path: content}
|
||||
if isinstance(files, list):
|
||||
files_dict: dict[str, str] = {}
|
||||
for path in files:
|
||||
with open(path, "r", encoding="utf-8") as fh:
|
||||
files_dict[path] = fh.read()
|
||||
else:
|
||||
files_dict = files
|
||||
|
||||
last_sha: str = ""
|
||||
for path, content in files_dict.items():
|
||||
try:
|
||||
existing = self._repo.get_contents(path, ref=target_branch)
|
||||
result = self._repo.update_file(
|
||||
path=path,
|
||||
message=message,
|
||||
content=content,
|
||||
sha=existing.sha, # type: ignore[union-attr]
|
||||
branch=target_branch,
|
||||
)
|
||||
except GithubException as exc:
|
||||
# A 404 means the file does not exist yet, so create it; re-raise anything else
|
||||
if exc.status != 404:
|
||||
raise
|
||||
result = self._repo.create_file(
|
||||
path=path,
|
||||
message=message,
|
||||
content=content,
|
||||
branch=target_branch,
|
||||
)
|
||||
last_sha = result["commit"].sha
|
||||
|
||||
return last_sha
|
||||
|
||||
def create_pr(self, title: str, body: str, head: str, base: str) -> str:
|
||||
# TODO (Phase 2): POST to GitHub API /repos/{owner}/{repo}/pulls.
|
||||
# Return the HTML URL of the created PR.
|
||||
raise NotImplementedError("GitHubAdapter.create_pr is not yet implemented.")
|
||||
"""
|
||||
Open a pull request on GitHub.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
title : PR title.
|
||||
body : PR description / body markdown.
|
||||
head : Head branch name (the branch with changes).
|
||||
base : Base branch name (e.g. "main").
|
||||
|
||||
Returns
|
||||
-------
|
||||
HTML URL of the created pull request.
|
||||
"""
|
||||
pr = self._repo.create_pull(
|
||||
title=title,
|
||||
body=body,
|
||||
head=head,
|
||||
base=base,
|
||||
)
|
||||
return pr.html_url
|
||||
|
||||
def get_pr_status(self, pr_id: str) -> str:
|
||||
# TODO (Phase 2): GET /repos/{owner}/{repo}/pulls/{number}.
|
||||
# Map GitHub PR state ("open", "closed") + merged flag to
|
||||
# our schema: "open" | "merged" | "closed".
|
||||
raise NotImplementedError("GitHubAdapter.get_pr_status is not yet implemented.")
|
||||
"""
|
||||
Fetch the current status of a pull request.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
pr_id : Pull request number as a string (e.g. "42").
|
||||
|
||||
Returns
|
||||
-------
|
||||
One of: "open" | "merged" | "closed".
|
||||
"""
|
||||
pr = self._repo.get_pull(int(pr_id))
|
||||
if pr.merged:
|
||||
return "merged"
|
||||
return pr.state # "open" or "closed"
|
||||
|
||||
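As a standalone illustration of the URL parsing in `_parse_repo_url`, the sketch below applies the same two patterns in a free function. The name `parse_repo_url` is hypothetical and not part of the adapter. Only `github.com` hosts match, so a self-hosted remote is rejected with `ValueError`.

```python
import re

# Same two patterns as the adapter: SSH form first, then HTTPS.
_PATTERNS = (
    r"git@github\.com:([^/]+)/([^/]+?)(?:\.git)?$",
    r"https?://github\.com/([^/]+)/([^/]+?)(?:\.git)?/?$",
)


def parse_repo_url(url: str) -> tuple[str, str]:
    """Return (owner, repo) for a GitHub clone URL, or raise ValueError."""
    for pattern in _PATTERNS:
        match = re.match(pattern, url)
        if match:
            return match.group(1), match.group(2)
    raise ValueError(f"Cannot parse GitHub owner/repo from URL: {url!r}")


print(parse_repo_url("git@github.com:org/repo.git"))   # ('org', 'repo')
print(parse_repo_url("https://github.com/org/repo/"))  # ('org', 'repo')
```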
2
agents
Submodule agents updated: 5c669c28e6...5f1204a023
@@ -2,33 +2,49 @@ t1:
|
||||
default: agents/strategy/nexus-strategy.md
|
||||
|
||||
t2:
|
||||
backend: agents/engineering/engineering-software-architect.md
|
||||
frontend: agents/engineering/engineering-software-architect.md
|
||||
backend: agents/engineering/engineering-backend-architect.md
|
||||
frontend: agents/engineering/engineering-frontend-architect.md
|
||||
infra: agents/engineering/engineering-devops-automator.md
|
||||
data: agents/engineering/engineering-data-engineer.md
|
||||
ai: agents/engineering/engineering-software-architect.md
|
||||
security: agents/engineering/engineering-security-engineer.md
|
||||
mobile: agents/engineering/engineering-software-architect.md
|
||||
default: agents/engineering/engineering-software-architect.md
|
||||
|
||||
t3:
|
||||
backend: agents/engineering/engineering-senior-developer.md
|
||||
frontend: agents/engineering/engineering-senior-developer.md
|
||||
backend: agents/engineering/engineering-senior-backend-developer.md
|
||||
frontend: agents/engineering/engineering-senior-frontend-developer.md
|
||||
infra: agents/engineering/engineering-sre.md
|
||||
default: agents/engineering/engineering-senior-developer.md
|
||||
data: agents/engineering/engineering-data-engineer.md
|
||||
ai: agents/engineering/engineering-ai-engineer.md
|
||||
security: agents/engineering/engineering-security-engineer.md
|
||||
mobile: agents/engineering/engineering-mobile-app-builder.md
|
||||
database: agents/engineering/engineering-database-optimizer.md
|
||||
devops: agents/engineering/engineering-sre.md
|
||||
docs: agents/engineering/engineering-technical-writer.md
|
||||
default: agents/engineering/engineering-backend-developer.md
|
||||
|
||||
t4:
|
||||
frontend: agents/engineering/engineering-frontend-developer.md
|
||||
backend: agents/engineering/engineering-backend-architect.md
|
||||
backend: agents/engineering/engineering-backend-developer.md
|
||||
database: agents/engineering/engineering-database-optimizer.md
|
||||
devops: agents/engineering/engineering-devops-automator.md
|
||||
mobile: agents/engineering/engineering-mobile-app-builder.md
|
||||
ai: agents/engineering/engineering-ai-engineer.md
|
||||
security: agents/engineering/engineering-security-engineer.md
|
||||
docs: agents/engineering/engineering-technical-writer.md
|
||||
default: agents/engineering/engineering-senior-developer.md
|
||||
data: agents/engineering/engineering-data-engineer.md
|
||||
embedded: agents/engineering/engineering-embedded-firmware-engineer.md
|
||||
default: agents/engineering/engineering-backend-developer.md
|
||||
|
||||
t5:
|
||||
code: agents/engineering/engineering-code-reviewer.md
|
||||
integration: agents/testing/testing-reality-checker.md
|
||||
api: agents/testing/testing-api-tester.md
|
||||
performance: agents/testing/testing-performance-benchmarker.md
|
||||
security: agents/engineering/engineering-security-engineer.md
|
||||
default: agents/engineering/engineering-code-reviewer.md
|
||||
code: agents/engineering/engineering-code-reviewer.md
|
||||
integration: agents/testing/testing-reality-checker.md
|
||||
api: agents/testing/testing-api-tester.md
|
||||
performance: agents/testing/testing-performance-benchmarker.md
|
||||
security: agents/engineering/engineering-security-engineer.md
|
||||
accessibility: agents/testing/testing-accessibility-auditor.md
|
||||
e2e: agents/testing/testing-evidence-collector.md
|
||||
frontend: agents/testing/testing-accessibility-auditor.md
|
||||
data: agents/testing/testing-reality-checker.md
|
||||
default: agents/engineering/engineering-code-reviewer.md
|
||||
|
||||
507
docs/buildspec.md
Normal file
@@ -0,0 +1,507 @@
|
||||
# Tiered Agent Team System — Build Spec
|
||||
|
||||
_Started: 2026-03-15. Last updated: 2026-03-30._
|
||||
_See design.md for the design doc and decisions log._
|
||||
|
||||
---
|
||||
|
||||
## Language & Runtime
|
||||
|
||||
**Python 3.11+.** Reasons:
|
||||
- Agent/AI tooling is Python-first
|
||||
- Clean type hints + dataclasses for schemas
|
||||
- Agents can read and modify their own orchestration code
|
||||
- Runs anywhere — no Node, no OpenClaw dependency
|
||||
|
||||
---
|
||||
|
||||
## Repository
|
||||
|
||||
Standalone repo: `git@github.com:coding-with-hans-heinemann/the-agency.git`
|
||||
|
||||
Separate from the OpenClaw workspace. OpenClaw workspace gets a thin integration layer that calls into it. Core is portable and runnable without OpenClaw.
|
||||
|
||||
---
|
||||
|
||||
## Directory Structure
|
||||
|
||||
```
|
||||
agent-teams/
|
||||
├── core/
|
||||
│ ├── team_runner.py — run lifecycle, agent spawning
|
||||
│ ├── blackboard.py — SQLite coordination state
|
||||
│ ├── task_brief.py — schema + validation
|
||||
│ └── escalation.py — retry logic, failure routing
|
||||
│
|
||||
├── adapters/
|
||||
│ ├── base/
|
||||
│ │ ├── llm.py — abstract LLM interface
|
||||
│ │ ├── vcs.py — abstract VCS interface
|
||||
│ │ ├── notify.py — abstract notification interface
|
||||
│ │ └── runtime.py — abstract agent runtime interface
|
||||
│ ├── llm/
|
||||
│ │ ├── anthropic.py — Claude via direct Anthropic API
|
||||
│ │ ├── openai.py — GPT / o-series
|
||||
│ │ └── ollama.py — local models
|
||||
│ ├── vcs/
|
||||
│ │ └── github.py
|
||||
│ ├── notify/
|
||||
│ │ └── openclaw.py — messages Hans who notifies Andrew
|
||||
│ └── runtime/
|
||||
│ ├── openclaw.py — sessions_spawn (general purpose)
|
||||
│ └── claude_code.py — coding agent runtime (file/git/exec tools)
|
||||
│
|
||||
├── agents/ — git submodule: msitarzewski/agency-agents
|
||||
│ ├── engineering/
|
||||
│ ├── testing/
|
||||
│ ├── strategy/
|
||||
│ └── ... — full agency-agents roster
|
||||
│
|
||||
├── prompts/
|
||||
│ ├── t1_visionary.md — fallback if no agent_personality set
|
||||
│ ├── t2_architect.md
|
||||
│ ├── t3_squad_lead.md
|
||||
│ ├── t4_implementer.md
|
||||
│ └── t5_verifier.md
|
||||
│
|
||||
├── config/
|
||||
│ ├── team.yaml — example run configuration
|
||||
│ └── role_registry.yaml — maps (tier, domain) → agent personality file
|
||||
│
|
||||
├── cli/
|
||||
│ └── agency.py — run, watch, inspect, approve, reject, pause, resume
|
||||
│
|
||||
├── runs/ — runtime state, one subdir per run_id
|
||||
│ └── .gitkeep
|
||||
│
|
||||
└── README.md
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Blackboard
|
||||
|
||||
SQLite. One file per run at `runs/<run_id>/blackboard.db`.
|
||||
|
||||
### Tables
|
||||
|
||||
**runs**
|
||||
```sql
|
||||
CREATE TABLE runs (
|
||||
run_id TEXT PRIMARY KEY,
|
||||
goal TEXT NOT NULL,
|
||||
status TEXT NOT NULL, -- pending | active | review | done | failed
|
||||
created_at TEXT NOT NULL,
|
||||
updated_at TEXT NOT NULL
|
||||
);
|
||||
```
|
||||
|
||||
**workstreams**
|
||||
```sql
|
||||
CREATE TABLE workstreams (
|
||||
workstream_id TEXT PRIMARY KEY,
|
||||
run_id TEXT NOT NULL,
|
||||
name TEXT NOT NULL,
|
||||
tier INTEGER NOT NULL,
|
||||
status TEXT NOT NULL, -- pending | active | blocked | done | failed
|
||||
owner_agent_id TEXT,
|
||||
created_at TEXT NOT NULL,
|
||||
updated_at TEXT NOT NULL
|
||||
);
|
||||
```
|
||||
|
||||
**briefs**
|
||||
```sql
|
||||
CREATE TABLE briefs (
|
||||
brief_id TEXT PRIMARY KEY,
|
||||
run_id TEXT NOT NULL,
|
||||
parent_brief_id TEXT,
|
||||
workstream_id TEXT,
|
||||
tier INTEGER NOT NULL,
|
||||
role TEXT NOT NULL,
|
||||
status TEXT NOT NULL, -- pending | active | done | failed
|
||||
payload TEXT NOT NULL, -- full JSON brief
|
||||
result TEXT, -- JSON result when done
|
||||
retry_count INTEGER DEFAULT 0,
|
||||
created_at TEXT NOT NULL,
|
||||
updated_at TEXT NOT NULL
|
||||
);
|
||||
```
|
||||
|
||||
**events**
|
||||
```sql
|
||||
CREATE TABLE events (
|
||||
event_id TEXT PRIMARY KEY,
|
||||
run_id TEXT NOT NULL,
|
||||
brief_id TEXT,
|
||||
kind TEXT NOT NULL, -- see event vocabulary below
|
||||
detail TEXT, -- JSON
|
||||
created_at TEXT NOT NULL
|
||||
);
|
||||
```
|
||||
|
||||
**Event kind vocabulary:**
|
||||
```
|
||||
-- lifecycle
|
||||
spawned | completed | failed | escalated | retried
|
||||
|
||||
-- visibility / gates
|
||||
gate_pending -- runner hit an inspection gate, waiting for human
|
||||
gate_approved -- human approved via CLI or notify
|
||||
gate_rejected -- human rejected, tier re-invoked
|
||||
gate_paused -- manual pause via CLI
|
||||
gate_resumed -- manual resume via CLI
|
||||
|
||||
-- amendments / informational
|
||||
path_amendment -- mid-run tier proposed a tier path change
|
||||
log -- human-readable log line (detail: {level, message})
|
||||
```
|
||||
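A minimal sketch of how the `events` table and the kind vocabulary could be enforced together, assuming Python's standard `sqlite3` module. The helper name `log_event` and the in-memory database are illustrative assumptions, not the actual `core/blackboard.py` API.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# Event kinds copied from the vocabulary above.
EVENT_KINDS = {
    "spawned", "completed", "failed", "escalated", "retried",
    "gate_pending", "gate_approved", "gate_rejected", "gate_paused", "gate_resumed",
    "path_amendment", "log",
}

EVENTS_DDL = """
CREATE TABLE IF NOT EXISTS events (
    event_id   TEXT PRIMARY KEY,
    run_id     TEXT NOT NULL,
    brief_id   TEXT,
    kind       TEXT NOT NULL,
    detail     TEXT,
    created_at TEXT NOT NULL
)
"""


def log_event(conn, run_id, kind, detail=None, brief_id=None) -> str:
    """Insert one event row and return its id (hypothetical helper)."""
    if kind not in EVENT_KINDS:
        raise ValueError(f"Unknown event kind: {kind!r}")
    event_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)",
        (
            event_id,
            run_id,
            brief_id,
            kind,
            json.dumps(detail) if detail is not None else None,
            datetime.now(timezone.utc).isoformat(),
        ),
    )
    conn.commit()
    return event_id


conn = sqlite3.connect(":memory:")  # a real run would open runs/<run_id>/blackboard.db
conn.execute(EVENTS_DDL)
log_event(conn, "run-1", "log", {"level": "info", "message": "run started"})
```

Rejecting unknown kinds at write time keeps the vocabulary closed, so readers such as `watch` never have to handle free-form strings.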
|
||||
**t3_task_lists** *(T3 mesh coordination)*
|
||||
```sql
|
||||
CREATE TABLE t3_task_lists (
|
||||
entry_id TEXT PRIMARY KEY,
|
||||
run_id TEXT NOT NULL,
|
||||
workstream_id TEXT NOT NULL,
|
||||
t3_agent_id TEXT NOT NULL,
|
||||
status TEXT NOT NULL, -- draft | committed
|
||||
tasks TEXT NOT NULL, -- JSON array of proposed T4 task descriptors
|
||||
created_at TEXT NOT NULL,
|
||||
updated_at TEXT NOT NULL
|
||||
);
|
||||
```
|
||||
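One way the runner might decide that the T3 mesh has settled is to check that every task list for the run is `committed`. The sketch below assumes the table above and a hypothetical helper name `mesh_ready`; the real coordination logic may differ.

```python
import sqlite3

T3_DDL = """
CREATE TABLE t3_task_lists (
    entry_id      TEXT PRIMARY KEY,
    run_id        TEXT NOT NULL,
    workstream_id TEXT NOT NULL,
    t3_agent_id   TEXT NOT NULL,
    status        TEXT NOT NULL,
    tasks         TEXT NOT NULL,
    created_at    TEXT NOT NULL,
    updated_at    TEXT NOT NULL
)
"""


def mesh_ready(conn: sqlite3.Connection, run_id: str) -> bool:
    """True when the run has at least one task list and none is still a draft."""
    total, drafts = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(status = 'draft'), 0) "
        "FROM t3_task_lists WHERE run_id = ?",
        (run_id,),
    ).fetchone()
    return total > 0 and drafts == 0


conn = sqlite3.connect(":memory:")
conn.execute(T3_DDL)
now = "2026-03-30T00:00:00+00:00"
conn.execute(
    "INSERT INTO t3_task_lists VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ("e1", "run-1", "backend-api", "t3-a", "committed", "[]", now, now),
)
conn.execute(
    "INSERT INTO t3_task_lists VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ("e2", "run-1", "frontend", "t3-b", "draft", "[]", now, now),
)
print(mesh_ready(conn, "run-1"))  # False: one list is still a draft
```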
|
||||
---
|
||||
|
||||
## Task Brief Schema
|
||||
|
||||
Every brief passed between tiers is a validated JSON object. `goal_anchor` is immutable — set by T1, copied verbatim into every downstream brief.
|
||||
|
||||
```json
|
||||
{
|
||||
"brief_id": "uuid",
|
||||
"run_id": "uuid",
|
||||
"parent_brief_id": "uuid | null",
|
||||
"tier": 4,
|
||||
"role": "implementer",
|
||||
"goal_anchor": "Original T1 intent — always propagated unchanged",
|
||||
"workstream": "backend-api",
|
||||
"task": "Implement POST /webhooks/ingest endpoint",
|
||||
"acceptance_criteria": [
|
||||
"Accepts JSON payload",
|
||||
"Returns 202 on success",
|
||||
"Writes to queue"
|
||||
],
|
||||
"constraints": [
|
||||
"Use existing queue client in src/queue.py",
|
||||
"No new dependencies"
|
||||
],
|
||||
"context": {
|
||||
"relevant_files": ["src/routes/webhooks.py", "src/queue.py"],
|
||||
"interface_contract": "..."
|
||||
},
|
||||
"retry_budget": 3,
|
||||
"retry_count": 0,
|
||||
"preferred_runtime": "coding_agent",
|
||||
"agent_personality": "agents/engineering/engineering-code-reviewer.md",
|
||||
"created_at": "ISO-8601"
|
||||
}
|
||||
```
|
||||
|
||||
`preferred_runtime` is optional. T3 sets it to `"coding_agent"` when spawning T4/T5 for implementation or verification tasks. Runner falls back to `"standard"` if the coding agent runtime is not configured.
|
||||
|
||||
`agent_personality` is optional. When set, the runtime adapter reads the file and injects its contents as the system prompt at spawn time. Falls back to the generic tier prompt in `prompts/` if not set.
|
||||
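The schema and the immutability of `goal_anchor` could be modelled roughly as below. This is a sketch under assumptions: the class name `TaskBrief`, the `child()` helper, and the specific validation rules are illustrative, and only a subset of the fields is shown.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: goal_anchor cannot be reassigned after creation
class TaskBrief:
    run_id: str
    tier: int
    role: str
    goal_anchor: str
    task: str
    acceptance_criteria: tuple[str, ...] = ()
    parent_brief_id: str | None = None
    retry_budget: int = 3
    retry_count: int = 0
    preferred_runtime: str = "standard"
    agent_personality: str | None = None
    brief_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self) -> None:
        if not 1 <= self.tier <= 5:
            raise ValueError(f"tier must be 1..5, got {self.tier}")
        if not self.goal_anchor.strip():
            raise ValueError("goal_anchor must not be empty")
        if self.retry_count > self.retry_budget:
            raise ValueError("retry_count exceeds retry_budget")

    def child(self, tier: int, role: str, task: str, **overrides) -> TaskBrief:
        """Build a downstream brief that copies goal_anchor verbatim."""
        return TaskBrief(
            run_id=self.run_id,
            tier=tier,
            role=role,
            goal_anchor=self.goal_anchor,
            task=task,
            parent_brief_id=self.brief_id,
            **overrides,
        )


root = TaskBrief(run_id="run-1", tier=1, role="visionary",
                 goal_anchor="Build webhook ingestion system", task="Plan the run")
t4 = root.child(4, "implementer", "Implement POST /webhooks/ingest endpoint",
                preferred_runtime="coding_agent")
```

Routing every downstream brief through `child()` means no tier has a code path that rewrites the anchor.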
|
||||
|
||||
---
|
||||
|
||||
## Adapter Interfaces
|
||||
|
||||
### LLM (`adapters/base/llm.py`)
|
||||
```python
class LLMAdapter:
    def complete(self, prompt: str, capability: str, context: dict) -> str: ...
    def resolve_model(self, capability: str) -> str: ...
    # capability: "reasoning-heavy" | "capable" | "fast-cheap"
```
|
||||
|
||||
### VCS (`adapters/base/vcs.py`)
|
||||
```python
class VCSAdapter:
    def create_branch(self, name: str) -> None: ...
    def commit(self, files: list[str], message: str) -> str: ...  # returns commit sha
    def create_pr(self, title: str, body: str, head: str, base: str) -> str: ...  # returns pr url
    def get_pr_status(self, pr_id: str) -> str: ...  # open | merged | closed
```
|
||||
|
||||
### Notify (`adapters/base/notify.py`)
|
||||
```python
class NotifyAdapter:
    def send(self, message: str, context: dict) -> None: ...
```
|
||||
|
||||
### Runtime (`adapters/base/runtime.py`)
|
||||
```python
class RuntimeAdapter:
    def spawn(self, task: str, capability: str, context: dict) -> str: ...  # returns agent_id
    def get_result(self, agent_id: str, timeout_s: int) -> dict: ...
    def kill(self, agent_id: str) -> None: ...

# Two implementations:
#   openclaw.py    — general purpose, uses sessions_spawn, suits T1/T2/T3
#   claude_code.py — coding-specialized, has file/git/exec tools, suits T4/T5
#
# The runner selects runtime based on brief.preferred_runtime:
#   "standard"     → openclaw.py (default)
#   "coding_agent" → claude_code.py (falls back to standard if unavailable)
#
# Both implementations inject brief.agent_personality as the system prompt
# when spawning, if present. Falls back to generic tier prompt otherwise.
# claude_code.py passes the agent file via --system-prompt flag natively
# (agency-agents was designed for Claude Code's agents/ directory).
```
|
||||
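The runtime selection and prompt fallback described in the comments above can be sketched as two small pure functions. The names `select_runtime` and `system_prompt_path` are hypothetical; the fallback file names are taken from the `prompts/` directory listing.

```python
from __future__ import annotations

# Generic tier prompts, used when a brief carries no agent_personality.
FALLBACK_PROMPTS = {
    1: "prompts/t1_visionary.md",
    2: "prompts/t2_architect.md",
    3: "prompts/t3_squad_lead.md",
    4: "prompts/t4_implementer.md",
    5: "prompts/t5_verifier.md",
}


def select_runtime(preferred: str | None, available: set[str]) -> str:
    """Pick the runtime key for a brief, falling back to "standard"."""
    if preferred == "coding_agent" and "coding_agent" in available:
        return "coding_agent"
    return "standard"


def system_prompt_path(tier: int, agent_personality: str | None) -> str:
    """Prefer the brief's personality file, else the generic tier prompt."""
    return agent_personality or FALLBACK_PROMPTS[tier]


print(select_runtime("coding_agent", {"standard"}))  # standard
print(system_prompt_path(4, None))                   # prompts/t4_implementer.md
```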
|
||||
---
|
||||
|
||||
## Run Config (`config/team.yaml`)
|
||||
|
||||
```yaml
|
||||
run:
|
||||
goal: "Build webhook ingestion system with retry logic and DLQ"
|
||||
repo: "git@github.com:org/repo.git"
|
||||
base_branch: "main"
|
||||
|
||||
adapters:
|
||||
llm: anthropic
|
||||
vcs: github
|
||||
notify: openclaw
|
||||
runtime: openclaw
|
||||
|
||||
models:
|
||||
provider: anthropic # default provider
|
||||
capability_map:
|
||||
reasoning-heavy:
|
||||
anthropic: claude-opus-4-6
|
||||
openai: o3
|
||||
capable:
|
||||
anthropic: claude-sonnet-4-6
|
||||
openai: gpt-4o
|
||||
ollama: llama3.1:70b
|
||||
fast-cheap:
|
||||
anthropic: claude-haiku-3-5
|
||||
openai: gpt-4o-mini
|
||||
ollama: llama3.2
|
||||
|
||||
# optional: override provider per tier
|
||||
tier_overrides:
|
||||
t1: { provider: openai, capability: reasoning-heavy }
|
||||
t4: { provider: ollama, capability: fast-cheap }
|
||||
|
||||
runtime:
|
||||
default: openclaw
|
||||
coding_agent: claude_code # used for T4/T5 when available; omit to disable
|
||||
native_teams: false # Claude Code's experimental agent teams — opt-in only
|
||||
# when true: T3 hands full workstream to Claude Code,
|
||||
# which fans out internally. faster but less blackboard
|
||||
# visibility. default: false (explicit T4 spawning)
|
||||
# tier_runtime_map (optional overrides):
|
||||
# t1: standard
|
||||
# t2: standard
|
||||
# t3: standard
|
||||
# t4: coding_agent
|
||||
# t5: coding_agent
|
||||
|
||||
retry_defaults:
|
||||
bad_output: 3
|
||||
partial: 2
|
||||
blocked: 0 # always escalate immediately
|
||||
|
||||
visibility:
|
||||
strict_mode: false # true = all gates on (recommended for first runs)
|
||||
log_level: normal # normal | verbose (verbose = per-T4 start/done lines)
|
||||
inspection_gates:
|
||||
t1_plan: true # always — required by design
|
||||
t2_lead: false # optional — review boundaries before specialists spawn
|
||||
t2_synthesis: true # recommended — review architecture before implementation
|
||||
t3_plan: false # verbose — useful early on, disable once T3 is trusted
|
||||
t5_verdict: false # review T5 joint verdict before T3 marks workstream done
|
||||
gate_timeout_minutes: 60 # auto-reject if no human response within this window
|
||||
|
||||
t3_mesh_timeout_minutes: 10 # max time for T3s to commit task lists before runner escalates
|
||||
```
|
||||
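To make the capability lookup concrete, here is a sketch of how a tier's model might be resolved from this config. It assumes `tier_overrides` is nested under `models` (the nesting is an assumption) and uses a hypothetical `resolve_model` function operating on the already-parsed YAML dict.

```python
# Subset of the config above, as it might look after YAML parsing.
CONFIG = {
    "models": {
        "provider": "anthropic",
        "capability_map": {
            "reasoning-heavy": {"anthropic": "claude-opus-4-6", "openai": "o3"},
            "capable": {"anthropic": "claude-sonnet-4-6", "openai": "gpt-4o"},
            "fast-cheap": {"openai": "gpt-4o-mini", "ollama": "llama3.2"},
        },
        # Assumed to live under `models`.
        "tier_overrides": {
            "t1": {"provider": "openai", "capability": "reasoning-heavy"},
            "t4": {"provider": "ollama", "capability": "fast-cheap"},
        },
    }
}


def resolve_model(config: dict, tier: str, capability: str) -> tuple[str, str]:
    """Return (provider, model) for a tier, applying any tier override first."""
    models = config["models"]
    override = models.get("tier_overrides", {}).get(tier, {})
    provider = override.get("provider", models["provider"])
    capability = override.get("capability", capability)
    try:
        return provider, models["capability_map"][capability][provider]
    except KeyError as exc:
        raise KeyError(
            f"No model configured for provider={provider!r}, capability={capability!r}"
        ) from exc


print(resolve_model(CONFIG, "t1", "capable"))  # ('openai', 'o3')
print(resolve_model(CONFIG, "t2", "capable"))  # ('anthropic', 'claude-sonnet-4-6')
```

Failing loudly on a missing provider/capability pair seems safer than silently picking another model, since the map above does not list every provider for every capability.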
|
||||
---
|
||||
|
||||
## Role Registry (`config/role_registry.yaml`)
|
||||
|
||||
Maps `(tier, domain)` → agent personality file. T1 consults this during scope assessment when selecting specialists for each workstream brief. Adding a new specialist means adding one entry here — no core changes.
|
||||
|
||||
```yaml
t1:
  default: agents/strategy/nexus-strategy.md

t2:
  backend: agents/engineering/engineering-software-architect.md
  frontend: agents/engineering/engineering-software-architect.md
  infra: agents/engineering/engineering-devops-automator.md
  data: agents/engineering/engineering-data-engineer.md
  default: agents/engineering/engineering-software-architect.md

t3:
  backend: agents/engineering/engineering-senior-developer.md
  frontend: agents/engineering/engineering-senior-developer.md
  infra: agents/engineering/engineering-sre.md
  default: agents/engineering/engineering-senior-developer.md

t4:
  frontend: agents/engineering/engineering-frontend-developer.md
  backend: agents/engineering/engineering-backend-architect.md
  database: agents/engineering/engineering-database-optimizer.md
  devops: agents/engineering/engineering-devops-automator.md
  mobile: agents/engineering/engineering-mobile-app-builder.md
  ai: agents/engineering/engineering-ai-engineer.md
  security: agents/engineering/engineering-security-engineer.md
  docs: agents/engineering/engineering-technical-writer.md
  default: agents/engineering/engineering-senior-developer.md

t5:
  code: agents/engineering/engineering-code-reviewer.md
  integration: agents/testing/testing-reality-checker.md
  api: agents/testing/testing-api-tester.md
  performance: agents/testing/testing-performance-benchmarker.md
  security: agents/engineering/engineering-security-engineer.md
  default: agents/engineering/engineering-code-reviewer.md
```
|
||||
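A lookup against the registry might look like the sketch below, which assumes the YAML has already been parsed into a nested dict and that an unknown domain falls back to the tier's `default` entry. `resolve_personality` is a hypothetical name, and only two tiers are reproduced.

```python
from __future__ import annotations

# Two tiers from the registry above, as a parsed dict.
REGISTRY = {
    "t3": {
        "backend": "agents/engineering/engineering-senior-developer.md",
        "infra": "agents/engineering/engineering-sre.md",
        "default": "agents/engineering/engineering-senior-developer.md",
    },
    "t5": {
        "code": "agents/engineering/engineering-code-reviewer.md",
        "api": "agents/testing/testing-api-tester.md",
        "default": "agents/engineering/engineering-code-reviewer.md",
    },
}


def resolve_personality(registry: dict, tier: int, domain: str) -> str | None:
    """Map (tier, domain) to a personality file, using the tier default if needed."""
    tier_roles = registry.get(f"t{tier}", {})
    return tier_roles.get(domain) or tier_roles.get("default")


print(resolve_personality(REGISTRY, 3, "infra"))   # agents/engineering/engineering-sre.md
print(resolve_personality(REGISTRY, 5, "mobile"))  # agents/engineering/engineering-code-reviewer.md
```

Returning `None` for an unknown tier lets the caller fall back to the generic prompt in `prompts/` instead of failing.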
|
||||
|
||||
---
|
||||
|
||||
## Key Flows
|
||||
|
||||
### 1. Run Kickoff
|
||||
|
||||
```
|
||||
User → team_runner.start(goal, config) # via CLI or any caller
|
||||
→ generate run_id
|
||||
→ init blackboard (create runs/<run_id>/blackboard.db)
|
||||
→ build T1 brief (goal_anchor = goal, retry_budget from config)
|
||||
→ spawn T1 via runtime adapter
|
||||
→ await T1 workplan
|
||||
```
|
||||
|
||||
### 2. T1 Scope Assessment
|
||||
|
||||
```
|
||||
T1 receives brief
|
||||
→ assess complexity → decide depth
|
||||
→ identify workstreams
|
||||
→ set retry_budget multiplier per workstream (1x simple, 2x complex)
|
||||
→ emit N workstream briefs for T2 (or T3 if shallow)
|
||||
→ write workplan to blackboard
|
||||
→ team_runner spawns T2s in parallel
|
||||
```
|
||||
|
||||
### 3. T4 Retry Loop (escalation.py)
|
||||
|
||||
```
|
||||
spawn T4 with brief
|
||||
→ receive result
|
||||
→ classify: bad_output | blocked | partial | success
|
||||
|
||||
blocked:
|
||||
→ log event(escalated)
|
||||
→ pass to T3 immediately
|
||||
|
||||
bad_output, retries_remaining:
|
||||
→ amend brief with failure context, increment retry_count
|
||||
→ re-spawn T4
|
||||
→ log event(retried)
|
||||
|
||||
bad_output, retries_exhausted:
|
||||
→ log event(escalated)
|
||||
→ pass to T3
|
||||
|
||||
partial:
|
||||
→ write salvageable parts to blackboard
|
||||
→ re-task remainder with new brief
|
||||
|
||||
success:
|
||||
→ write result to blackboard
|
||||
→ log event(completed)
|
||||
→ notify T3
|
||||
```
|
||||
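The routing decisions in the flow above reduce to a small pure function. The sketch below is illustrative only: `route_result` and the action names it returns are assumptions, not the `core/escalation.py` interface.

```python
def route_result(outcome: str, retry_count: int, retry_budget: int) -> str:
    """Map a classified T4 result to the next action in the retry loop."""
    if outcome == "success":
        return "complete"          # write result, log completed, notify T3
    if outcome == "blocked":
        return "escalate"          # never retried: pass to T3 immediately
    if outcome == "partial":
        return "retask_remainder"  # keep salvageable parts, re-brief the rest
    if outcome == "bad_output":
        # Retry while budget remains; otherwise escalate to T3.
        return "retry" if retry_count < retry_budget else "escalate"
    raise ValueError(f"Unknown outcome: {outcome!r}")


print(route_result("bad_output", retry_count=1, retry_budget=3))  # retry
print(route_result("bad_output", retry_count=3, retry_budget=3))  # escalate
```

Keeping the decision separate from the side effects (spawning, blackboard writes) makes the escalation rules easy to unit test.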
|
||||
### 4. Inspection Gate Flow
|
||||
|
||||
```
|
||||
runner reaches configured gate (e.g. t2_synthesis)
|
||||
→ write event(gate_pending, detail={tier, summary, what_happens_next})
|
||||
→ notify_adapter.send(tier summary + gate context)
|
||||
→ halt: poll blackboard for gate_approved or gate_rejected
|
||||
|
||||
gate_approved:
|
||||
→ write event(gate_approved)
|
||||
→ continue run
|
||||
|
||||
gate_rejected:
|
||||
→ write event(gate_rejected, detail={reason})
|
||||
→ re-invoke tier with rejection reason in brief context
|
||||
→ loop back to gate_pending when tier completes again
|
||||
|
||||
gate_timeout (gate_timeout_minutes elapsed):
|
||||
→ treat as gate_rejected
|
||||
→ notify Andrew: "Gate timed out, re-invoking tier"
|
||||
```
|
||||
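The halt-and-poll step could be implemented along these lines. This is a sketch under assumptions: it reads the `events` table directly, uses SQLite's implicit `rowid` to order events, and treats a timeout as `gate_rejected` as the flow describes. `wait_for_gate` is a hypothetical name.

```python
import sqlite3
import time


def wait_for_gate(conn: sqlite3.Connection, run_id: str,
                  timeout_s: float, poll_s: float = 1.0) -> str:
    """Block until the latest gate is decided; a timeout counts as rejection."""
    # Only decisions written after the most recent gate_pending are relevant.
    pending_rowid = conn.execute(
        "SELECT MAX(rowid) FROM events WHERE run_id = ? AND kind = 'gate_pending'",
        (run_id,),
    ).fetchone()[0] or 0
    deadline = time.monotonic() + timeout_s
    while True:
        row = conn.execute(
            "SELECT kind FROM events WHERE run_id = ? AND rowid > ? "
            "AND kind IN ('gate_approved', 'gate_rejected') "
            "ORDER BY rowid LIMIT 1",
            (run_id, pending_rowid),
        ).fetchone()
        if row:
            return row[0]
        if time.monotonic() >= deadline:
            return "gate_rejected"
        time.sleep(poll_s)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (event_id TEXT PRIMARY KEY, run_id TEXT NOT NULL, "
    "brief_id TEXT, kind TEXT NOT NULL, detail TEXT, created_at TEXT NOT NULL)"
)
conn.execute("INSERT INTO events VALUES ('e1', 'run-1', NULL, 'gate_pending', NULL, 't0')")
conn.execute("INSERT INTO events VALUES ('e2', 'run-1', NULL, 'gate_approved', NULL, 't1')")
print(wait_for_gate(conn, "run-1", timeout_s=5))  # gate_approved
```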
|
||||
### 5. Review Gate
|
||||
|
||||
```
|
||||
T1 completes integration
|
||||
→ vcs_adapter.create_pr(
|
||||
title="[agent-teams] <run_id>: <goal summary>",
|
||||
body="<workplan + workstream summaries>",
|
||||
head="integration/<run_id>",
|
||||
base="main"
|
||||
)
|
||||
→ notify_adapter.send(
|
||||
"Run <run_id> complete. PR ready for review: <pr_url>",
|
||||
context={run_id, goal, workstreams, pr_url}
|
||||
)
|
||||
→ blackboard: update run status → "review"
|
||||
→ halt — no auto-merge
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Build Order
|
||||
|
||||
1. `git submodule add https://github.com/msitarzewski/agency-agents agents/` — pull the talent pool
|
||||
2. `config/role_registry.yaml` — map tier+domain → agent personality files
|
||||
3. `core/task_brief.py` — schema + validation (everything depends on this); include T1 Plan Output Schema
|
||||
4. `core/blackboard.py` — SQLite store, all table definitions including `t3_task_lists`; full event kind vocabulary
|
||||
5. `adapters/base/*` — all four abstract interfaces
|
||||
6. `adapters/llm/anthropic.py` — first LLM implementation
|
||||
7. `core/escalation.py` — retry + failure routing logic (called by tiers, not runner centrally)
|
||||
8. `adapters/runtime/openclaw.py` — wire up sessions_spawn + personality injection
|
||||
9. `adapters/runtime/claude_code.py` — coding agent runtime, personality via --system-prompt
|
||||
10. `core/team_runner.py` — full run lifecycle: spawn loop (monitors briefs table for `status=pending`, calls runtime_adapter.spawn()), gate logic (gate_pending halt, calls notify_adapter.send(), polls for gate_approved/rejected resume), path amendment monitor, T3 mesh timeout → T2 escalation, T1 failure + terminal escalation only
|
||||
11. `cli/agency.py` — run, watch, inspect, approve, reject, pause, resume; `watch` tails blackboard events and renders live log; `inspect` renders run tree
|
||||
12. `prompts/` — fallback tier prompts (used when no agent_personality set)
|
||||
13. `adapters/vcs/github.py` — PR creation + branch management
|
||||
14. `adapters/notify/openclaw.py` — OpenClaw notification adapter; bridges gate summaries and run events to the operator via OpenClaw; manages its own inbound response state for gate approval routing
|
||||
15. `config/team.yaml` — example config with full visibility block
|
||||
16. `README.md` — how to run, how to add adapters, how to extend the roster; include `agency` CLI reference
|
||||
|
||||
---
|
||||
|
||||
## Out of Scope (Phase 2)
|
||||
|
||||
- Cost accounting per tier + run rollup
|
||||
- Parallel workstream progress dashboard
|
||||
- Additional adapter implementations (GitLab, Slack, OpenAI, Ollama)
|
||||
- Persistent standing teams
|
||||
- Web UI for run monitoring
|
||||
681
docs/design.md
Normal file
@@ -0,0 +1,681 @@
|
||||
# Tiered Agent Team System — Design Document
|
||||
|
||||
_Started: 2026-03-14. Last updated: 2026-03-30._
|
||||
|
||||
---
|
||||
|
||||
## Resolved Design Decisions (formerly Open Questions)

All eight open questions were resolved on 2026-03-30. Details are in the Decisions Log.

1. **T3 mesh mechanics** → Blackboard-based. T3s write draft task lists, read peers', commit merged plan before T4 dispatch. See _T3 Mesh via Blackboard_.

2. **T1 output schema** → Formal JSON schema defined. See _T1 Plan Output Schema_.

3. **T5 consensus mechanics** → T3 aggregates all T5 results into a joint verdict. Split verdict (`partial`) triggers retry of failed slices only. See _T5 Consensus & Verdict Schema_.

4. **Path amendment mechanism** → Amending tier writes a `path_amendment` event to blackboard. Runner monitors events table and notifies the relevant higher tier via a system event. No agent callback plumbing required. See _Path Amendment Mechanism_.

5. **Failure handling (distributed model)** → Distributed ownership confirmed. Runner only owns T1 failure + terminal human escalation. See updated _Failure Handling_ table.

6. **Who makes spawn calls for T3+ tiers** → Runner monitors briefs table for `status=pending` rows and makes all spawn calls. "Distributed ownership" means the tier's output determines brief content — runner is the mechanical arm. Gates (hold on `gate_pending`) live naturally in the runner's spawn loop.

7. **Gate approval UX** → `agency approve <run_id>` CLI writes `gate_approved` directly to the blackboard — the universal path, works on any platform. Runner only cares that the event exists, not how it got there. Notify adapter implementations handle their own inbound response routing (e.g. bridging a chat reply to a CLI call) as internal adapter state — not a core concern.

8. **T3 mesh timeout** → Escalate to T2 (domain boundary problem, T2 should re-scope). If T2 also exhausts its retry budget, the failure escalates up the normal ladder to T1 → Andrew gate. No force-commit fallback (would hide the problem and cause bad T4 dispatch).

---

## Overview

A dynamic, hierarchical multi-agent system for software pipelines. Teams assemble on demand, execute, then disband. Inspired by a blend of Hollywood production (dynamic assembly), consulting firms (structured deliverables, hierarchical synthesis), and two-pizza teams (small autonomous squads, clear domain ownership).

---

## Core Principles

**1. Tiers represent cognitive modes, not org chart levels.**
Each tier thinks differently — strategy, design, coordination, execution, verification. Adding a tier only makes sense if it introduces a genuinely different mode of reasoning.

**2. Depth is proportional to complexity.**
Not every task needs every tier. A config change might only need T3→T4. A new product needs the full stack. T1 assesses scope and prescribes the path — it is never pre-configured.

**3. Goal anchoring at every level.**
T1's original intent is embedded in every agent's context — not just passed to T2 and forgotten. Every agent knows the end goal even if it only owns a slice.

**4. Artifacts, not summaries.**
Tiers pass structured specs downward (JSON task briefs), not paraphrased prose. Meaning is preserved; format is compressed.

**5. Verification is mandatory.**
T5 always runs. Nothing returns to T1 unverified. T5 is a mandatory quality gate: things should work, and work well, before they surface upward.

**6. Provider agnostic.**
The system makes no assumptions about which LLM provider or platform is in use. Tiers reference capability levels, not specific models. All external dependencies are swappable adapters.

**7. Specialist talent pool.**
Tiers define structure and responsibility. Agent personalities define domain expertise. The two are separate — the same tier can be filled by different specialists depending on the workstream domain.

---

## Tier Definitions

| Tier | Role | Owns | Capability Level |
|------|------|------|-----------------|
| T1 | Visionary | Goal, constraints, dispatch plan, final acceptance | reasoning-heavy |
| T2 | Architect | System design, interface contracts, workstream boundaries | reasoning-heavy / capable |
| T3 | Squad Lead | Workstream delivery, T4 management, quality gate | capable |
| T4 | Implementer | Atomic task execution (one file, one function, one test) | fast-cheap |
| T5 | Verifier | Validation of T4 output — correctness + intent alignment | capable |

T5 runs **within T3's scope**, not above it. T3 commissions T5 verification of its T4 outputs. T5 is a quality gate, not a management layer.

Capability levels map to actual models per provider in config — the core system never references a specific model name.

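
As a rough illustration of that mapping, the sketch below resolves a tier to a model name through its capability level. The `CAPABILITY_MODELS` table, the placeholder model names, and `resolve_model` are assumptions made for this example and are not the actual config schema. T2 is pinned to `reasoning-heavy` here, although the table above allows either level.

```python
# Hypothetical provider config: capability level -> model name.
# The model names are placeholders, not real model identifiers.
CAPABILITY_MODELS = {
    "anthropic": {
        "reasoning-heavy": "model-large",
        "capable": "model-medium",
        "fast-cheap": "model-small",
    },
}

# Capability level per tier, taken from the Tier Definitions table.
TIER_CAPABILITY = {
    "t1": "reasoning-heavy",
    "t2": "reasoning-heavy",  # the table also permits "capable" for T2
    "t3": "capable",
    "t4": "fast-cheap",
    "t5": "capable",
}


def resolve_model(tier, provider, capability_models=CAPABILITY_MODELS):
    """Look up the model for a tier without hardcoding model names in core."""
    capability = TIER_CAPABILITY[tier]
    try:
        return capability_models[provider][capability]
    except KeyError as exc:
        raise ValueError(f"no model configured for {provider}/{capability}") from exc


print(resolve_model("t4", "anthropic"))
```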
---

## Dispatch Model

### T1 Owns the Plan

T1 is not just a decomposer — it is the dispatch planner. Its output declares:

- **Workstreams** — the decomposed units of work
- **Tier path per workstream** — which tiers to engage (e.g. `[T2, T3, T4, T5]` or `[T4, T5]` for trivial tasks)
- **Parallelism** — which workstreams are independent and can run concurrently

T1 does not prescribe how each tier operates internally. That is the tier's own concern.

### T1 Lifecycle — Two Explicit Phases

T1 is invoked twice per run, each with a distinct prompt and purpose:

**Phase 1 — Plan:**
1. T1 produces initial dispatch plan (workstreams, tier paths, parallelism, retry budget)
2. T1 self-critiques its own plan in a single follow-up pass ("what could go wrong, what did I miss?") and amends
3. Amended plan surfaces to Andrew for approval — no T2s spawn until approval is given

**Phase 2 — Accept:**
After the full T2→T3→T4→T5 pipeline completes, T1 is re-invoked with the final output. It validates against the original goal anchor and either accepts (opens PR) or rejects (escalates back down).

Both phases are named explicitly in the task brief schema and tracked on the blackboard.

### Each Tier Owns the Layer Below

Control flow is distributed, not centralised:

- T1 manages its T2s
- T2 Lead manages T2 specialists and their domain boundaries
- T2 specialists each own their T3s
- **T3 manages its T4s** — including dependency graph, parallelism, and T5 commissioning
- The runner is thin: bootstrap T1, monitor the blackboard, handle final result and notifications

This means orchestration logic lives in agent prompts and output schemas — not in Python runner code. Adding a new execution pattern means updating a prompt, not the runner.

**Tradeoff:** Debugging is harder. When something fails mid-chain, you read blackboard logs rather than step through central runner code. This is a tooling problem to solve (good blackboard inspection), not a design flaw to avoid.

### Dynamic Paths

Tiers can propose path amendments mid-run (e.g. T3 discovers scope that warrants a T2 pass it didn't get). Amendments are logged to the blackboard. Higher tiers are notified but do not need to approve — it is informational. No tier silently changes the plan.

---

## Orchestration Patterns Per Tier

Different tiers suit different internal coordination patterns. These are baked into the runner's tier-handling logic and the tier prompts — not prescribed by T1.

| Tier | Pattern | Rationale |
|------|---------|-----------|
| T1 | Single agent, two phases | Must be authoritative; plan phase + accept phase |
| T2 Lead | Coordinator | Spawned first; defines boundaries + shared assumptions; drives conflict resolution; produces canonical architecture |
| T2 Specialists | Parallel fan-out | Each works independently within its domain; reads Lead's boundaries + shared assumptions doc before starting |
| T3 | Light mesh | Peer coordination within same T2 domain to negotiate task boundaries before T4 dispatch |
| T4 | Swarm + pipeline hybrid | Independent tasks run as swarm; dependent tasks pipeline (T4-A's output feeds T4-B). T3 declares which is which. |
| T5 | Parallel fan-out + consensus | Each T5 reviews its slice independently, then compares notes for a joint verdict — catches both artifact bugs and integration issues |

### T2 Flow in Detail

1. T1 spawns **T2 Lead Architect** with goal + workstream context
2. Lead defines explicit **domain boundaries** (who owns what, hard edges)
3. Lead publishes **shared assumptions doc** — cross-cutting concerns, key conventions, architectural constraints (auth approach, data formats, API patterns, etc.)
4. T1 spawns **T2 specialists** with boundaries + shared assumptions baked into their briefs
5. Specialists work in parallel, each within their defined domain
6. Lead reads all proposals, drives **conflict resolution** with relevant specialists if needed (cycle limit in config — fixed, not per-workstream)
7. Lead produces **canonical architecture** → written to blackboard as distinct artifact
8. T1 (Accept phase) validates canonical architecture against goal anchor
9. Canonical architecture becomes T3 briefs — each T2 specialist hands off to its own T3s

---

## Horizontal Scaling Within Tiers

```
T1 — Phase 1: Plan (self-critique → Andrew approval)
│
├── T2: Lead Architect (boundaries + shared assumptions first)
│     ├── T2: Backend Architect   ─┐
│     ├── T2: Frontend Architect   ├─ parallel, within defined domains
│     └── T2: Infra Architect     ─┘
│           │
│           └── (Lead synthesises → conflict resolution if needed → canonical architecture)
│
├── T2 Backend Architect owns:
│     ├── T3: API Squad Lead  ─┐
│     └── T3: DB Squad Lead   ─┴─ light mesh within domain
│           ├── T4: Worker A  ─┐
│           ├── T4: Worker B  ─┼─ swarm / pipeline (T3 decides)
│           └── T4: Worker C  ─┘
│                 └── T5: Verifier(s) — fan-out + consensus
│
└── T1 — Phase 2: Accept (validates against goal anchor → PR)
```

---

## Use Case Flows

T1 assesses complexity and prescribes the tier path per workstream. Three standard depth profiles:

### Full Stack — T1→T2→T3→T4→T5
*Complex feature, new product, cross-domain changes*

```
T1 Plan
  → assess complexity (high)
  → output T1 Plan Schema (workstreams, tier paths [T2,T3,T4,T5], parallelism, retry budgets)
  → self-critique pass
  → GATE: surface to Andrew ← approval required

T2 Lead (spawned by runner after approval)
  → receive: goal + full workplan
  → publish: domain boundaries + shared assumptions doc → blackboard
  → GATE (optional): review boundaries before specialists spawn

T2 Specialists (parallel fan-out, wait on Lead)
  → each receives: their domain boundary + shared assumptions
  → produce: architecture proposal for their slice
  → Lead synthesises, drives conflict resolution if needed
  → Lead writes: canonical architecture → blackboard
  → GATE (recommended): review architecture before implementation

Each T2 Specialist → spawns its own T3s (with canonical architecture slice + interface contracts)

T3s (light mesh within T2 domain)
  → write draft task lists to blackboard
  → read peers' lists, reconcile boundaries
  → commit merged task plan before T4 dispatch
  → GATE (optional): review task breakdown

T4s
  → swarm: independent tasks run in parallel
  → pipeline: T4-A output feeds T4-B (T3 declares dependencies)
  → commit to feature branches

T5s (fan-out per T4 slice)
  → each reviews its slice independently
  → T3 collects results → joint verdict
  → GATE (optional): review T5 verdict before T3 marks done
  → partial: T3 retries only failed slices
  → pass: T3 signals workstream done to T2

T2 specialists → signal T2 Lead
T2 Lead → writes integration summary → blackboard

T1 Accept
  → validate against goal anchor
  → open PR, notify_adapter.send(pr summary + url)
```

### Medium Complexity — T1→T3→T4→T5
*Config change, isolated bug fix — T1 determines no cross-domain design needed*

```
T1 Plan
  → assess: contained scope, single domain, no T2 architecture needed
  → workplan: tier paths [T3, T4, T5]
  → GATE: Andrew approval

T3s spawned directly by runner
  → receive T1 brief with task context (no T2 architecture layer)
  → T3 light mesh → T4 dispatch → T5 verify → signal done

T1 Accept → PR
```

### Simple / Hotfix — T1→T4→T5
*Single file, single function, trivial atomic task*

```
T1 Plan
  → assess: trivial, single workstream
  → tier path: [T4, T5]
  → GATE: Andrew approval

T4 (coding agent)
  → single atomic task, commits

T5 (single verifier, not full fan-out)
  → code review + correctness check
  → pass → T1 Accept → PR
```

---

## Resolved Mechanics

### T3 Mesh via Blackboard

T3s coordinate task boundaries before dispatching T4s. All coordination goes through the blackboard — no direct agent-to-agent messaging.

1. Each T3 writes its **draft task list** to the blackboard (one row per proposed T4 task, status `draft`)
2. Each T3 reads all sibling T3 draft lists in its T2 domain
3. T3s amend their lists to resolve overlaps (claim tasks, release duplicates)
4. Once all T3s in the domain have committed their final task lists (status `committed`), T4 dispatch begins
5. No T3 dispatches T4s until all peers in the domain are committed — this prevents duplicate work

The runner monitors for `all_committed` state and can enforce a timeout (config: `t3_mesh_timeout_minutes`).

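
The commit barrier in steps 4 and 5 could be checked with a query along these lines. This is a minimal sketch: the `draft_tasks` table, its columns, and the `all_committed` helper are hypothetical names chosen for the example, and the real blackboard schema may differ. It assumes the runner knows which T3s belong to the domain, so that a T3 that has not written anything yet still blocks dispatch.

```python
import sqlite3


def all_committed(conn, domain, expected_t3s):
    """Return True once every expected T3 has only `committed` rows."""
    rows = conn.execute(
        "SELECT t3_id, status FROM draft_tasks WHERE domain = ?", (domain,)
    ).fetchall()
    statuses = {}
    for t3_id, status in rows:
        statuses.setdefault(t3_id, set()).add(status)
    # A T3 with no rows yet, or with any row still in `draft`, blocks dispatch.
    return all(statuses.get(t3) == {"committed"} for t3 in expected_t3s)


# Hypothetical in-memory blackboard with one proposed T4 task per T3.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE draft_tasks (task TEXT, t3_id TEXT, domain TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO draft_tasks VALUES (?, ?, ?, ?)",
    [
        ("webhook-handler", "t3-api", "backend", "committed"),
        ("schema-migration", "t3-db", "backend", "draft"),
    ],
)

blocked = all_committed(conn, "backend", ["t3-api", "t3-db"])
conn.execute("UPDATE draft_tasks SET status = 'committed' WHERE t3_id = 't3-db'")
ready = all_committed(conn, "backend", ["t3-api", "t3-db"])
print(blocked, ready)
```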
---

### T1 Plan Output Schema

T1's Plan phase produces a structured JSON object written to the blackboard. The runner parses this to bootstrap the pipeline.

```json
{
  "run_id": "uuid",
  "goal_anchor": "Original goal — immutable, propagated to every downstream brief",
  "complexity": "high | medium | low",
  "retry_budget_multiplier": 2,
  "workstreams": [
    {
      "id": "ws-backend-api",
      "name": "Backend API",
      "domain": "backend",
      "tier_path": ["t2", "t3", "t4", "t5"],
      "parallel_group": "A",
      "t2_specialist": "agents/engineering/engineering-software-architect.md",
      "notes": "Focus on webhook ingest and retry queue"
    }
  ],
  "parallelism": {
    "groups": {
      "A": ["ws-backend-api", "ws-frontend"],
      "B": ["ws-infra"]
    },
    "sequence": ["A", "B"]
  },
  "self_critique_summary": "Brief plain-text summary of what T1 identified and amended in its self-critique pass"
}
```

`parallel_group` + `sequence` handles inter-workstream dependencies: group A runs in parallel, then B starts after A completes.

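
One way the runner might turn the `parallelism` object into an execution order is sketched below. The `dispatch_waves` helper is an illustrative name and not part of the documented runner; it only assumes the `groups` and `sequence` fields shown in the schema.

```python
def dispatch_waves(plan):
    """Return workstream ids grouped into waves, in execution order.

    Workstreams inside one wave are independent and may run concurrently;
    a wave starts only after the previous wave has completed.
    """
    groups = plan["parallelism"]["groups"]
    waves = []
    for name in plan["parallelism"]["sequence"]:
        if name not in groups:
            raise ValueError(f"sequence references unknown group {name!r}")
        waves.append(list(groups[name]))
    return waves


# The parallelism object from the schema example above.
plan = {
    "parallelism": {
        "groups": {"A": ["ws-backend-api", "ws-frontend"], "B": ["ws-infra"]},
        "sequence": ["A", "B"],
    }
}
waves = dispatch_waves(plan)
print(waves)
```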
---

### T5 Consensus & Verdict Schema

T3 aggregates all T5 results into a joint verdict after fan-out completes.

**Individual T5 result:**
```json
{
  "verifier_id": "uuid",
  "scope": "queue-client",
  "verdict": "pass | fail",
  "issues": ["issue description..."],
  "notes": "human-readable summary"
}
```

**T3 joint verdict (written to blackboard):**
```json
{
  "t5_results": [...],
  "joint_verdict": "pass | partial | fail",
  "failed_scopes": ["queue-client"],
  "summary": "Human-readable summary for gate surface and logs"
}
```

**Split verdict handling:**
- `pass` → T3 marks workstream done, signals T2
- `partial` → T3 retries only the failed T4 slices (up to retry budget), re-runs T5 on those slices
- `fail` → T3 escalates to T2 (or T1 if shallow path)

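
The aggregation step can be sketched as follows. The source does not spell out where `partial` ends and `fail` begins, so this example assumes that `fail` means every slice failed and `partial` means some but not all did; the `joint_verdict` function name and the generated summary text are likewise illustrative.

```python
def joint_verdict(t5_results):
    """Fold individual T5 results into the T3 joint verdict structure."""
    if not t5_results:
        raise ValueError("no T5 results to aggregate")
    failed = [r["scope"] for r in t5_results if r["verdict"] == "fail"]
    if not failed:
        verdict = "pass"
    elif len(failed) == len(t5_results):
        verdict = "fail"  # assumption: every slice failed
    else:
        verdict = "partial"  # only the failed scopes get retried
    passed = len(t5_results) - len(failed)
    return {
        "t5_results": t5_results,
        "joint_verdict": verdict,
        "failed_scopes": failed,
        "summary": f"{passed}/{len(t5_results)} slices passed",
    }


results = [
    {"verifier_id": "v1", "scope": "auth-middleware", "verdict": "pass", "issues": []},
    {"verifier_id": "v2", "scope": "queue-client", "verdict": "fail", "issues": ["timeout"]},
]
verdict = joint_verdict(results)
print(verdict["joint_verdict"], verdict["failed_scopes"])
```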
---

### Spawn Call Ownership

The runner is the single point of contact with the runtime adapter. Tiers do not call `sessions_spawn` directly — they write output to the blackboard and the runner acts on it.

**Flow:**
1. A tier completes and writes child briefs to the `briefs` table with `status=pending`
2. Runner's spawn loop detects pending rows
3. If a gate is configured at this tier boundary → runner writes `gate_pending`, notifies Andrew, halts
4. On `gate_approved` → runner calls `runtime_adapter.spawn()` for each pending brief
5. Spawned agent runs, writes its own child briefs as pending when done → loop continues

This keeps gate logic in one place (the runner's spawn loop), makes all spawn calls auditable from a single location, and means agents only need blackboard read/write access — no runtime adapter tool access required.

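
A single pass of such a spawn loop might look like the sketch below. It uses plain in-memory lists in place of the `briefs` and events tables, and the `spawn_pass` function, the brief fields, and the `(kind, tier)` event tuples are assumptions made for illustration. A real runner would also call the notify adapter when it writes `gate_pending`.

```python
def spawn_pass(briefs, events, gated_tiers, spawn):
    """Run one pass over pending briefs; return the ids that were spawned."""
    spawned = []
    for brief in briefs:
        if brief["status"] != "pending":
            continue
        tier = brief["tier"]
        if tier in gated_tiers and ("gate_approved", tier) not in events:
            if ("gate_pending", tier) not in events:
                events.append(("gate_pending", tier))  # notification would fire here
            continue  # hold this brief until the gate is approved
        spawn(brief)  # stands in for runtime_adapter.spawn()
        brief["status"] = "spawned"
        spawned.append(brief["id"])
    return spawned


briefs = [{"id": "b1", "tier": "t2", "status": "pending"}]
events = []
calls = []

first = spawn_pass(briefs, events, {"t2"}, calls.append)   # gate holds the brief
events.append(("gate_approved", "t2"))                     # e.g. via `agency approve`
second = spawn_pass(briefs, events, {"t2"}, calls.append)  # brief is spawned
print(first, second)
```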
---

### Gate Approval UX

**Core mechanic (platform-agnostic):**

1. Runner writes `gate_pending` to blackboard
2. Runner calls `notify_adapter.send()` with tier summary + gate context (`run_id`, `gate`, `summary`, `what_happens_next`)
3. Runner polls blackboard for `gate_approved` or `gate_rejected`
4. `agency approve <run_id>` / `agency reject <run_id> --reason "..."` writes the event directly to the blackboard — the universal approval path, works on any platform with filesystem access

The runner never reads from a state file and never talks to a notify adapter for inbound responses. It only polls the blackboard.

**Adapter responsibility:**
Each notify adapter handles its own inbound response routing. How a human's approval gets translated into an `agency approve` CLI call is entirely the adapter's concern — not core. Example: an OpenClaw adapter bridges a chat reply to the CLI. A Slack adapter wires up a slash command. A webhook adapter listens on an endpoint. All produce the same result: `gate_approved` written to blackboard.

Any internal state the adapter needs to resolve ambiguous responses (e.g. which run_id an approval refers to when multiple gates are pending) is managed by the adapter, not the core.

---

### T3 Mesh Timeout

If T3s in a domain fail to commit their task lists within `t3_mesh_timeout_minutes`:

1. **Runner escalates to T2** — writes a `gate_pending` escalation event and notifies the T2 specialist that owns the domain. Context: which T3s timed out, what draft lists (if any) exist on the blackboard. T2 re-scopes or clarifies domain boundaries, spawns fresh T3 briefs.

2. **If T2 also exhausts its retry budget** → normal escalation ladder: T2 failure → T1 handles → T1 failure → Andrew gate.

Force-committing partial draft lists (optimistic fallback) is explicitly not done — it hides the boundary problem and produces conflicting or duplicate T4 tasks that fail later with less context.

---

### Path Amendment Mechanism

When a mid-run tier discovers scope that warrants a different tier path than T1 prescribed:

1. The discovering tier writes a `path_amendment` event to the blackboard:
   ```json
   {
     "kind": "path_amendment",
     "proposed_by": "t3/ws-backend-api",
     "reason": "Discovered auth dependency requires T2 architectural pass",
     "amendment": {
       "workstream": "ws-backend-api",
       "add_tiers": ["t2"],
       "insert_before": "t3"
     }
   }
   ```
2. The runner monitors the events table, detects `path_amendment`, and sends a system event notification to the relevant higher tier
3. The higher tier is **informed, not blocked** — it acknowledges and adjusts its understanding
4. Amendment is logged on the blackboard for audit; no approval gate required (the next scheduled human gate will surface it)

No agent needs callback plumbing. The runner is the notification bridge.

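
To make the `amendment` payload concrete, the sketch below computes the tier path an amendment proposes. The `apply_path_amendment` helper is hypothetical, and whether the runner ever rewrites the stored path this way is not stated in the design; the example only shows how `add_tiers` and `insert_before` could be interpreted.

```python
def apply_path_amendment(tier_path, amendment):
    """Return the tier path proposed by a `path_amendment` payload.

    The input list is not modified. Raises ValueError if `insert_before`
    names a tier that is not on the current path.
    """
    new_path = list(tier_path)
    index = new_path.index(amendment["insert_before"])
    # Insert in reverse so the added tiers keep their listed order.
    for tier in reversed(amendment["add_tiers"]):
        if tier not in new_path:
            new_path.insert(index, tier)
    return new_path


amendment = {
    "workstream": "ws-backend-api",
    "add_tiers": ["t2"],
    "insert_before": "t3",
}
proposed = apply_path_amendment(["t3", "t4", "t5"], amendment)
print(proposed)
```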
---

## Shared State

For software pipelines, **the repo is the primary blackboard**:
- T4 workers commit to feature branches
- T3 leads review and merge to workstream branches
- T2 architects own integration branches
- T1 does final integration and acceptance

Supplemented by a SQLite coordination store per run tracking:
- In-flight workstreams and their current execution plans
- Handoff artifacts and tier status
- Retry counts and escalation history
- Path amendments (proposed, by whom, timestamp)

---

## Failure Handling

Distributed ownership — each tier handles failures in the tier below it. The runner only handles T1 failure and terminal human escalation.

| Failure | Owner | Handler | Action |
|---------|-------|---------|--------|
| T4 bad output | T3 | `escalation.py` called by T3's context | Retry T4 with corrected brief (up to retry_budget) |
| T4 blocked | T3 | `escalation.py` | Escalate to T3 immediately — no retries |
| T4 partial output | T3 | `escalation.py` | Salvage good parts, re-task remainder |
| T5 partial verdict | T3 | T3 joint verdict logic | Retry failed T4 slices only |
| T5 full fail | T3 | T3 joint verdict logic | Escalate to T2 |
| T3 workstream stuck | T2 | T2 specialist prompt + blackboard | Re-scope or split the workstream |
| T2 design wrong | T1 | T1 Accept phase + blackboard | Re-plan; may discard workstream and restart |
| T1 failure / crash | Runner | `team_runner.py` | Surface to human, halt run |
| Repeated escalation | Runner | `team_runner.py` | Gate: block until human unblocks |

**Key distinction:** `escalation.py` is not called by the runner centrally. It is logic that tier agents execute (or the runner executes on their behalf when it detects a timeout or dead agent). The runner only owns the last two rows.

Retry limits prevent infinite loops. Escalation path is always upward, never sideways.

T1 sets a retry budget multiplier during scope assessment (`1x` simple, `2x` complex). Retry budget is a field on the task brief — not hardcoded in the runner.

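
The retry decision for a T4 failure could be sketched like this. The base value of 3, the field names `retry_budget` and `retries_used`, and the failure labels are assumptions for the example and do not reflect the actual contents of `escalation.py`.

```python
BASE_RETRIES = 3  # hypothetical base value; the real default is set in config


def retry_budget(multiplier, base=BASE_RETRIES):
    """Scale the base retry count by T1's multiplier (1 for simple, 2 for complex)."""
    return base * multiplier


def next_action(brief, failure):
    """Decide whether T3 retries a failed T4 task or escalates upward."""
    if failure == "blocked":
        return "escalate"  # blocked tasks escalate immediately, no retries
    if brief["retries_used"] < brief["retry_budget"]:
        return "retry"
    return "escalate"  # budget exhausted


brief = {"retry_budget": retry_budget(2), "retries_used": 5}
print(brief["retry_budget"], next_action(brief, "bad_output"))
```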
---

## Agent Talent Pool

The system builds on [agency-agents](https://github.com/msitarzewski/agency-agents) — a library of 50+ pre-built specialist personalities, each with deep domain expertise, quality standards, and specific deliverables.

**Division of responsibility:**
- Our system provides: orchestration, tier structure, task briefs, retries, verification gates, shared state
- Agency-agents provides: the specialist knowledge each agent brings to its role

T1 selects the right specialist from the roster when building workstream briefs. The specialist's personality is injected as the system prompt at spawn time.

**Default tier-to-specialist mapping for software pipelines:**

| Tier | Domain | Agent |
|------|--------|-------|
| T1 | Strategy | nexus-strategy |
| T2 | Backend | software-architect |
| T2 | Infra | devops-automator |
| T2 | Data | data-engineer |
| T3 | Backend | senior-developer |
| T3 | Reliability | sre |
| T4 | Frontend | frontend-developer |
| T4 | Backend | backend-architect |
| T4 | Database | database-optimizer |
| T4 | DevOps | devops-automator |
| T4 | Mobile | mobile-app-builder |
| T4 | AI/ML | ai-engineer |
| T4 | Security | security-engineer |
| T4 | Docs | technical-writer |
| T5 | Code review | code-reviewer |
| T5 | Integration | testing-reality-checker |
| T5 | API | testing-api-tester |
| T5 | Performance | testing-performance-benchmarker |
| T5 | Security | security-engineer |

The roster is not fixed — T1 can select any agent from the library based on workstream needs.

---

## Adapter Layers

Everything external is a swappable adapter. Core logic never imports from adapters directly — always through an interface.

```
Core (platform-agnostic)
├── team_runner     — thin bootstrap: spawn T1, monitor blackboard, handle result
├── blackboard      — SQLite coordination state
├── task_brief      — schema + validation
└── escalation      — retry logic, failure routing

Adapters (swappable)
├── llm/            — anthropic (now), openai, ollama, any API
├── notify/         — openclaw (now), slack, email, webhook...
├── vcs/            — github (now), gitlab, gitea, bare git...
└── runtime/
    ├── standard       — openclaw sessions_spawn (T1/T2/T3)
    └── coding_agent   — claude_code (T4/T5 default), codex, aider...
```

Swapping providers means writing a new adapter file — nothing in core changes.

T4 and T5 default to the **coding agent runtime** when available. They fall back to the standard runtime if no coding agent runtime is configured.

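
The interface boundary and the T4/T5 fallback can be illustrated with a small sketch. `RuntimeAdapter` and `spawn()` are named in the design, but the method signature, the `EchoRuntime` stand-in, and the `select_runtime` helper are assumptions made for this example.

```python
from abc import ABC, abstractmethod


class RuntimeAdapter(ABC):
    """Interface the core depends on; concrete runtimes live in adapters/."""

    @abstractmethod
    def spawn(self, brief: dict) -> str:
        """Start an agent for the brief and return a session identifier."""


class EchoRuntime(RuntimeAdapter):
    """Stand-in runtime for the example; it only labels the brief it receives."""

    def __init__(self, name):
        self.name = name

    def spawn(self, brief):
        return f"{self.name}:{brief['id']}"


def select_runtime(tier, standard, coding_agent=None):
    """Prefer the coding agent runtime for T4/T5, else use the standard one."""
    if tier in ("t4", "t5") and coding_agent is not None:
        return coding_agent
    return standard


standard = EchoRuntime("standard")
coding = EchoRuntime("coding_agent")
session = select_runtime("t4", standard, coding).spawn({"id": "b7"})
fallback = select_runtime("t4", standard).spawn({"id": "b7"})
print(session, fallback)
```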
---

## Run Visibility Layer

Designed for debugging, test runs, and quality evaluation at each tier. Three interlocking components.

### 1. Human-Readable Live Log

Structured events from the blackboard rendered as a timestamped, readable stream. `agency watch <run_id>` tails this live.

```
[abc123] 12:30:01  T1    PLAN_START    Assessing scope: "Build webhook ingestion system"
[abc123] 12:30:14  T1    PLAN_DONE     3 workstreams — backend-api, infra, docs (2 parallel)
[abc123] 12:30:14  GATE  APPROVAL      ⏸ Waiting on approval before T2 spawns
[abc123] 12:31:02  GATE  APPROVED      ✓ Approved — continuing
[abc123] 12:31:03  T2    LEAD_START    Lead Architect spawned
[abc123] 12:31:41  T2    BOUNDS_READY  Domain boundaries + shared assumptions published
[abc123] 12:31:42  T2    SPEC_START    3 specialists spawned (parallel): backend, infra, docs
[abc123] 12:32:15  T2    SPEC_DONE     backend-api architecture draft ready
[abc123] 12:32:58  T2    SYNTH_DONE    Canonical architecture written to blackboard
[abc123] 12:32:58  GATE  INSPECTION    ⏸ T2 synthesis ready for review
[abc123] 12:33:44  T3    MESH_START    backend-api: 2 squad leads negotiating task boundaries
[abc123] 12:34:01  T3    MESH_DONE     Task split committed — 7 T4 tasks (5 swarm, 2 pipeline)
[abc123] 12:34:02  T4    SWARM_START   5 workers spawned in parallel
[abc123] 12:35:10  T4    DONE          worker-3 auth-middleware ✓
[abc123] 12:35:22  T4    FAIL          worker-4 queue-client ✗ (retry 1/3)
[abc123] 12:36:04  T4    DONE          worker-4 queue-client ✓ (retry resolved)
[abc123] 12:36:05  T5    VERIFY_START  4 verifiers spawned
[abc123] 12:36:45  T5    VERDICT       partial — queue-client needs rework
[abc123] 12:37:12  T5    VERDICT       ✓ all pass — workstream backend-api done
```

Log level `verbose` adds per-T4-start/done lines. Default is `normal` (tier-level events only).

### 2. Inspection Gates

Configurable pause points. When the runner hits a gate, it:
1. Writes a `gate_pending` event to the blackboard
2. Fires `notify_adapter.send()` with the tier summary + gate context
3. Halts — no next tier spawns until `gate_approved` or `gate_rejected` is written

The tier summary surfaced at each gate includes:
- **What was produced** (the tier artifact in readable form)
- **What happens next** (which agents will spawn, doing what)
- **Any anomalies** flagged by the tier itself

Configurable in `team.yaml` under `visibility.inspection_gates`. A `strict_mode: true` flag enables all gates — recommended for first runs on a new codebase or new goal type.

```yaml
visibility:
  strict_mode: false
  log_level: normal          # normal | verbose
  inspection_gates:
    t1_plan: true            # always — required by design
    t2_lead: false           # optional — review boundaries before specialists
    t2_synthesis: true       # recommended — review architecture before implementation
    t3_plan: false           # verbose — useful early on, disable once T3 is trusted
    t5_verdict: false        # review T5 joint verdict before T3 marks workstream done
  gate_timeout_minutes: 60   # auto-reject if no response within this window
```

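
How `strict_mode` interacts with the per-gate flags can be sketched as below. The `effective_gates` helper is a hypothetical name, and it takes an already-parsed `visibility` mapping rather than reading `team.yaml`, so the example does not depend on any YAML library.

```python
def effective_gates(visibility):
    """Resolve which inspection gates are active for a run."""
    gates = dict(visibility.get("inspection_gates", {}))
    if visibility.get("strict_mode", False):
        gates = {name: True for name in gates}  # strict mode enables every gate
    gates["t1_plan"] = True  # the T1 plan gate is required by design
    return gates


visibility = {
    "strict_mode": False,
    "inspection_gates": {
        "t1_plan": True,
        "t2_lead": False,
        "t2_synthesis": True,
        "t3_plan": False,
        "t5_verdict": False,
    },
}
normal = effective_gates(visibility)
strict = effective_gates({**visibility, "strict_mode": True})
print(sorted(name for name, on in normal.items() if on))
```

With the default values shown in the config, only `t1_plan` and `t2_synthesis` are active; under `strict_mode` every listed gate is.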
### 3. Inspection CLI — `cli/agency.py`

```
agency run <config.yaml>               # start a run, returns run_id
agency watch <run_id>                  # tail live log (follows blackboard events)
agency inspect <run_id>                # interactive tree view of run state
agency inspect <run_id> --tier t2      # jump to T2 artifacts
agency inspect <run_id> --brief <id>   # show full brief + result JSON

agency approve <run_id>                # approve current gate → continue
agency approve <run_id> --note "..."   # approve with a note written to blackboard
agency reject <run_id> --reason "..."  # reject → tier re-invoked
agency pause <run_id>                  # force-pause at next tier boundary
agency resume <run_id>                 # release a manual pause
```

`agency inspect` (no flags) renders a live tree:
```
Run abc123 — "Build webhook ingestion system"
├── T1 Plan ✓
│   └── [view workplan]
├── T2 Architecture ✓  [GATE: pending review]
│   ├── [view domain boundaries]
│   ├── [view shared assumptions]
│   └── [view canonical architecture]
├── T3 backend-api (active)
│   ├── [view task breakdown]
│   └── T4 workers: 3/7 done, 1 retrying, 3 pending
└── T3 infra (pending)
```

### Blackboard Event Vocabulary (extended)

```python
from typing import Literal

# Illustrative type alias: the name `EventKind` is an assumption,
# the string values are the event vocabulary itself.
EventKind = Literal[
    # existing
    "spawned", "completed", "failed", "escalated", "retried",
    # new — visibility layer
    "gate_pending",    # runner hit a gate, waiting for human
    "gate_approved",   # human approved, run continues
    "gate_rejected",   # human rejected, tier re-invoked
    "gate_paused",     # manual pause via CLI
    "gate_resumed",    # manual resume via CLI
    "path_amendment",  # mid-run tier proposed path change
    "log",             # human-readable log line (level + message)
]
```

---

## Decisions Log
|
||||
|
||||
**T1 dynamic dispatch** — T1 assesses scope and prescribes tier path and workstream parallelism. It does not prescribe internal tier coordination patterns.
|
||||
|
||||
**T1 two-phase lifecycle** — T1 has two explicit named phases: Plan and Accept. Plan phase includes self-critique (single pass) then human approval gate before T2s spawn. Accept phase validates final output against goal anchor. Both phases tracked on blackboard with distinct prompts.
|
||||
|
||||
**T1 self-critique** — Single pass only. Diminishing returns on multiple self-critique iterations; the human review after is the real safety net. Self-critique catches obvious gaps; Andrew catches strategic ones.
|
||||
|
||||
**Distributed ownership** — Each tier owns the layer below it. Runner is thin. Tradeoff: distributed control makes the system extensible but debugging requires good blackboard tooling, not central runner traces.
|
||||
|
||||
**T5 always mandatory** — No skipping verification. Things should work and work well before surfacing to T1.
|
||||
|
||||
**T3 owns T4 and T5** — T3 manages its T4s (dependency graph, swarm vs pipeline, parallelism) and commissions T5 verification of T4 outputs. Runner does not orchestrate T4/T5 centrally.
|
||||
|
||||
**T2 Lead Architect** — Dedicated T2 role, not a new tier. Spawned first by T1. Owns: domain boundary definition, shared assumptions doc, conflict resolution between specialists, canonical architecture synthesis. Specialists spawn after Lead publishes boundaries + assumptions. Each T2 specialist owns its own T3s — no T3 spans T2 domains.
|
||||
|
||||
**T2 conflict resolution** — Lead sends targeted briefs back to conflicting specialists. Cycle limit is a fixed config value (not per-workstream). Single T1 self-critique parallel: fixed limit, not variable.
|
||||
|
||||
**T2 shared assumptions** — Lead publishes cross-cutting concerns (auth, data formats, API conventions, etc.) before specialists start. Specialists design with shared baseline; implicit dependencies pre-empted rather than caught in synthesis.
|
||||
|
||||
**Orchestration patterns** — Baked into tier prompts and runner tier-handling logic, not prescribed by T1. T2: Lead + parallel specialists. T3: light mesh within T2 domain. T4: swarm+pipeline. T5: fan-out+consensus.
|
||||
|
||||
**Output / review** — Nothing merges to main without explicit human approval. T1 opens a PR and fires `notify_adapter.send()` with the PR summary. Merge is gated on human sign-off. The notify adapter implementation determines how the notification is delivered.
|
||||
|
||||
**Platform agnosticism** — Core is provider and platform agnostic. Capability levels (`reasoning-heavy`, `capable`, `fast-cheap`) map to models in config. Mixing providers across tiers is supported.
|
||||
|
||||
**LLM provider** — Anthropic first implementation. Config supports per-tier provider selection.
|
||||
|
||||
**Gateway modification** — Decided against. Agent-teams stays standalone Python. OpenClaw used via runtime adapter only.
|
||||
|
||||
**Coding agent runtime** — Claude Code is default T4/T5 runtime. Opt-in `native_teams` flag available for internal Claude Code parallelism — faster but less blackboard visibility. Default `false`.
|
||||
|
||||
**Agency-agents integration** — Via git submodule at `agents/`. T1 selects specialists via `config/role_registry.yaml`. `agent_personality` field on task brief; runtime injects as system prompt at spawn time.
|
||||
|
||||
**Spawn call ownership** — Runner is the single point of contact with the runtime adapter. Tiers write `status=pending` child briefs to the blackboard; runner's spawn loop detects and spawns them. Gate logic (hold on `gate_pending`) lives in the spawn loop — no gate plumbing needed in agents. Agents only need blackboard read/write access.
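One pass of such a spawn loop might look like the sketch below. The `briefs` table, its columns, the `running` status, and the `spawn` callback are assumptions made for illustration; only the `pending` and `gate_pending` statuses come from the decision above.

```python
import sqlite3


def spawn_pending(conn: sqlite3.Connection, spawn) -> list:
    """Spawn every pending brief once; leave gated briefs untouched."""
    rows = conn.execute(
        "SELECT id, status FROM briefs "
        "WHERE status IN ('pending', 'gate_pending') ORDER BY id"
    ).fetchall()
    spawned = []
    for brief_id, status in rows:
        if status == "gate_pending":
            continue  # held here until an approval changes the status
        spawn(brief_id)  # the only place the runtime adapter is called
        conn.execute(
            "UPDATE briefs SET status = 'running' WHERE id = ?", (brief_id,)
        )
        spawned.append(brief_id)
    conn.commit()
    return spawned


# Demo against an in-memory blackboard with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE briefs (id TEXT PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO briefs VALUES (?, ?)",
    [("t4-a", "pending"), ("t4-b", "gate_pending"), ("t4-c", "done")],
)
launched = []
result = spawn_pending(conn, launched.append)
```

Agents in this sketch never see the gate: they write a row and the loop decides when it runs.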
|
||||
|
||||
**Gate approval UX** — `agency approve <run_id>` CLI is the universal approval path — writes `gate_approved` directly to blackboard. Runner only polls blackboard; it does not depend on any specific notification platform. Each notify adapter handles its own inbound response bridge as internal adapter state. Core has no `pending_gates.json` or platform-specific approval logic.
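A rough sketch of the approval path, assuming a `gates` table with `run_id` and `status` columns (both hypothetical). The `agency approve <run_id>` command shape and the `gate_approved` value come from the decision above; the parser wiring is one plausible way to expose it, not the project's actual `cli/agency.py`.

```python
import argparse
import sqlite3


def approve(conn: sqlite3.Connection, run_id: str) -> int:
    """Flip pending gates for a run to gate_approved; return rows changed."""
    cur = conn.execute(
        "UPDATE gates SET status = 'gate_approved' "
        "WHERE run_id = ? AND status = 'gate_pending'",
        (run_id,),
    )
    conn.commit()
    return cur.rowcount


def build_parser() -> argparse.ArgumentParser:
    """Parser for a hypothetical `agency approve <run_id>` subcommand."""
    parser = argparse.ArgumentParser(prog="agency")
    sub = parser.add_subparsers(dest="command", required=True)
    approve_cmd = sub.add_parser("approve", help="approve a pending gate")
    approve_cmd.add_argument("run_id")
    return parser
```

Since the command only writes to the blackboard, a notify adapter that accepts replies from a chat platform could call the same `approve` function, and the runner's polling would not need to know which path was used.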
|
||||
|
||||
**T3 mesh timeout** — Escalate to T2 (the specialist that owns the domain). Timeout means T3s can't agree on task boundaries — a domain boundary problem T2 should fix by re-scoping. If T2 exhausts its retry budget, normal escalation ladder handles it (T1 → Andrew gate). No force-commit fallback.
|
||||
|
||||
**T3 mesh mechanics** — Blackboard-based coordination. T3s write draft task lists, read peers', reconcile overlaps, commit merged plan. No T4 dispatch until all T3s in the domain have committed. Runner enforces timeout (`t3_mesh_timeout_minutes` in config). Chosen over designated T3 lead or direct messaging — fits distributed ownership model, gives full audit trail for free.
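The dispatch rule and the timeout can be read as a small state function. This is a sketch under assumptions: the function name, the return labels, and the way commit flags are passed in are invented here; only the rule itself (no T4 dispatch until every T3 has committed, escalate to T2 on timeout) and the `t3_mesh_timeout_minutes` setting come from the decisions above.

```python
from datetime import datetime, timedelta


def mesh_state(
    committed: dict,
    started_at: datetime,
    now: datetime,
    t3_mesh_timeout_minutes: int,
) -> str:
    """Decide what the runner does with one T2 domain's T3 mesh."""
    # Dispatch only when every T3 in the domain has committed its plan.
    if committed and all(committed.values()):
        return "dispatch_t4"
    # No force-commit fallback: a timeout goes to the owning T2.
    if now - started_at >= timedelta(minutes=t3_mesh_timeout_minutes):
        return "escalate_to_t2"
    return "waiting"


start = datetime(2026, 3, 30, 12, 0)
state_early = mesh_state({"t3-a": True, "t3-b": False}, start, start + timedelta(minutes=5), 30)
state_late = mesh_state({"t3-a": True, "t3-b": False}, start, start + timedelta(minutes=30), 30)
state_done = mesh_state({"t3-a": True, "t3-b": True}, start, start + timedelta(minutes=45), 30)
```

Checking commits before the timeout means a mesh that finishes late but before the runner's next poll still dispatches rather than escalating.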
|
||||
|
||||
**T1 output schema** — Formal JSON schema defined (2026-03-30). Fields: `run_id`, `goal_anchor`, `complexity`, `retry_budget_multiplier`, `workstreams[]` (id, name, domain, tier_path, parallel_group, t2_specialist, notes), `parallelism` (groups + sequence), `self_critique_summary`. `parallel_group` + `sequence` handles inter-workstream dependencies.
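For orientation, a T1 output carrying the listed fields might look like the example below, with a small presence check. The field names are taken from the decision above; every value, the value types, and the inner shape of `parallelism` are illustrative guesses rather than the formal schema.

```python
REQUIRED_TOP_LEVEL = {
    "run_id", "goal_anchor", "complexity", "retry_budget_multiplier",
    "workstreams", "parallelism", "self_critique_summary",
}
WORKSTREAM_FIELDS = {
    "id", "name", "domain", "tier_path", "parallel_group",
    "t2_specialist", "notes",
}

# Illustrative values only; types and the parallelism layout are assumed.
example_plan = {
    "run_id": "run-001",
    "goal_anchor": "Add rate limiting to the public API",
    "complexity": "medium",
    "retry_budget_multiplier": 1.0,
    "workstreams": [
        {
            "id": "ws-1",
            "name": "Limiter middleware",
            "domain": "backend",
            "tier_path": ["T2", "T3", "T4", "T5"],
            "parallel_group": "A",
            "t2_specialist": "backend-architect",
            "notes": "",
        },
    ],
    "parallelism": {"groups": {"A": ["ws-1"]}, "sequence": ["A"]},
    "self_critique_summary": "Single workstream; no cross-domain risk found.",
}


def missing_fields(plan: dict) -> list:
    """List required fields absent from a T1 plan (presence only, no types)."""
    missing = sorted(REQUIRED_TOP_LEVEL - plan.keys())
    for index, workstream in enumerate(plan.get("workstreams", [])):
        for name in sorted(WORKSTREAM_FIELDS - workstream.keys()):
            missing.append(f"workstreams[{index}].{name}")
    return missing
```

In this example, workstreams that share a `parallel_group` would run together and `sequence` would order the groups, which is one way to read how the two fields cover inter-workstream dependencies.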
|
||||
|
||||
**T5 consensus** — T3 aggregates all T5 results into joint verdict: `pass | partial | fail`. Split verdict (`partial`) → T3 retries only failed slices, re-runs T5 on those slices. Full `fail` escalates to T2. T3 writes structured joint verdict to blackboard; this is what the optional T5 gate surfaces to Andrew.
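The aggregation rule can be sketched as a pure function. The verdict values and the retry and escalation behaviour follow the decision above; the input shape (one boolean per slice) and the output keys are assumptions for illustration.

```python
def joint_verdict(slice_results: dict) -> dict:
    """Combine per-slice T5 results (True = passed) into a joint verdict."""
    if not slice_results:
        raise ValueError("no T5 results to aggregate")
    failed = sorted(name for name, ok in slice_results.items() if not ok)
    if not failed:
        verdict = "pass"
    elif len(failed) == len(slice_results):
        verdict = "fail"
    else:
        verdict = "partial"
    return {
        "verdict": verdict,
        # Only a split verdict triggers targeted retries by T3.
        "retry_slices": failed if verdict == "partial" else [],
        # A full failure leaves T3 and goes up to T2.
        "escalate_to": "T2" if verdict == "fail" else None,
    }
```

Returning the failed slice names alongside the verdict gives T3 what it needs to re-run T5 on those slices only.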
|
||||
|
||||
**Path amendment mechanism** — Amending tier writes `path_amendment` event to blackboard (structured JSON: proposed_by, reason, amendment). Runner monitors events table, sends system event notification to relevant higher tier. Higher tier is informed, not blocked. No agent callback plumbing. Amendments surface at next scheduled human gate.
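A sketch of the write side and the runner's monitoring side, assuming an `events` table with `id`, `event_type`, `payload`, and `created_at` columns (a hypothetical layout). The event name and the three payload keys come from the decision above.

```python
import json
import sqlite3
from datetime import datetime, timezone


def create_events_table(conn: sqlite3.Connection) -> None:
    """Create the hypothetical events table used by this sketch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "event_type TEXT NOT NULL, payload TEXT NOT NULL, created_at TEXT NOT NULL)"
    )


def write_path_amendment(conn, proposed_by: str, reason: str, amendment: dict) -> int:
    """Called by the amending tier; returns the new event id."""
    payload = json.dumps(
        {"proposed_by": proposed_by, "reason": reason, "amendment": amendment}
    )
    cur = conn.execute(
        "INSERT INTO events (event_type, payload, created_at) VALUES (?, ?, ?)",
        ("path_amendment", payload, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return cur.lastrowid


def unseen_amendments(conn, last_seen_id: int) -> list:
    """Called by the runner; returns (id, payload) pairs newer than last_seen_id."""
    rows = conn.execute(
        "SELECT id, payload FROM events "
        "WHERE event_type = 'path_amendment' AND id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    return [(row_id, json.loads(payload)) for row_id, payload in rows]
```

The amending tier returns as soon as the row is written, which matches the "informed, not blocked" rule: notification is the runner's job, driven by the id cursor.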
|
||||
|
||||
**Failure handling (distributed)** — Confirmed distributed ownership (2026-03-30). `escalation.py` is logic tiers execute (or runner executes on tier's behalf on timeout/crash), not a central runner concern. Runner only owns: T1 failure, terminal human escalation. See updated Failure Handling table.
|
||||
|
||||
**Run visibility layer** — Added 2026-03-30. Human-readable live log, configurable inspection gates, and `cli/agency.py` inspection/control commands. Designed for debugging and quality evaluation at each tier during early runs. `strict_mode: true` enables all gates. Gates surface tier artifacts + "what happens next" summary via `notify_adapter.send()` — platform-agnostic. Resolves Q3 (T5 consensus surfaces as gate event with human-readable summary). T5 gate (optional) lets the operator review joint verdict before T3 marks workstream done.
|
||||
@@ -10,6 +10,9 @@ pyyaml
|
||||
# Environment variable management
|
||||
python-dotenv
|
||||
|
||||
# GitHub VCS adapter
|
||||
PyGithub
|
||||
|
||||
# --- stdlib-only (no pip install needed) ---
|
||||
# sqlite3 — blackboard persistence
|
||||
# dataclasses — task_brief schema
|
||||
|
||||