Compare commits
17 Commits
5b0d00a799...hans/team-
| Author | SHA1 | Date | |
|---|---|---|---|
| | 86d0b34462 | | |
| | c5dabf41f4 | | |
| | 8994f87a43 | | |
| | 641f122cdb | | |
| | 54afa0f53f | | |
| | f228061c4d | | |
| | 1c99e40f98 | | |
| | 8f143e779d | | |
| | a721db63f6 | | |
| | 882b769d21 | | |
| | ce3c020de2 | | |
| | b54436f474 | | |
| | 1ed7023c08 | | |
| | 9efbb3b010 | | |
| | 72bd744664 | | |
| | 084cfb0bb2 | | |
| | ce1ce85b87 | | |
2
.gitmodules
vendored
@@ -1,3 +1,3 @@
 [submodule "agents"]
 	path = agents
-	url = https://github.com/coding-with-hans-heinemann/agency-agents.git
+	url = https://git.tandrewng.com/cw-hans/agency-agents.git
48
CLAUDE.md
Normal file
@@ -0,0 +1,48 @@
+# CLAUDE.md — Agent Quick Reference
+
+Read this before exploring the codebase. It saves tokens.
+
+## What This Is
+
+A tiered multi-agent orchestration framework. T1 decomposes goals → T2 architects → T3 leads → T4 implements → T5 verifies. SQLite blackboard tracks state. All external dependencies (LLM, VCS, notify, runtime) are pluggable adapters.
+
+## Key Docs
+
+- `docs/design.md` — architecture decisions, tier design, key choices
+- `docs/buildspec.md` — 15-step build order, phase breakdown
+
+## Project Layout
+
+```
+core/             — task_brief.py, blackboard.py, escalation.py, team_runner.py
+adapters/base/    — abstract base classes (LLMAdapter, VCSAdapter, NotifyAdapter, RuntimeAdapter)
+adapters/llm/     — anthropic.py
+adapters/vcs/     — github.py
+adapters/notify/  — openclaw.py
+adapters/runtime/ — openclaw.py, claude_code.py
+prompts/          — T1–T5 system prompt .md files
+config/           — team.yaml (run config), role_registry.yaml (tier→role→persona)
+agents/           — git submodule, agent persona .md files
+runs/             — per-run blackboard.db files (gitignored)
+```
+
+## Conventions
+
+- **Never commit or push directly to `main`** — always branch (`hans/...` or `feature/...`) and PR
+- New adapters: subclass the relevant `adapters/base/*.py` abstract class
+- New roles: add persona `.md` to `agents/` submodule + entry in `config/role_registry.yaml`
+- Failure handling lives in `core/escalation.py` — extend `FailureType` there
+- `TaskBrief` is the canonical work unit — all tiers pass briefs to each other
+- Blackboard is the single source of truth per run — always write events there
+
+## Current State
+
+Phase 2 adapter implementations exist. `core/team_runner.py` may still have stubs — check before assuming it's wired up end-to-end.
+
+## Running
+
+```bash
+python -m venv .venv && source .venv/bin/activate
+pip install -r requirements.txt
+python -m core.team_runner --config config/team.yaml
+```
||||||
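Note on the "New adapters" convention in CLAUDE.md: the sketch below shows roughly what subclassing a base adapter could look like. It assumes the abstract class exposes the `complete()` and `resolve_model()` signatures that `AnthropicAdapter` implements further down in this diff. The `LLMAdapter` stand-in and the `EchoAdapter` class are hypothetical; the real `adapters/base/llm.py` is not part of this compare view.

```python
from abc import ABC, abstractmethod


class LLMAdapter(ABC):
    """Stand-in for adapters/base/llm.py (assumed shape, not the real file)."""

    @abstractmethod
    def complete(self, prompt: str, capability: str, context: dict) -> str:
        """Return the model's text response for the prompt."""

    @abstractmethod
    def resolve_model(self, capability: str) -> str:
        """Map a capability string to a provider-specific model name."""


class EchoAdapter(LLMAdapter):
    """Hypothetical adapter that echoes the prompt; useful for offline tests."""

    def __init__(self, config: dict) -> None:
        self._config = config

    def complete(self, prompt: str, capability: str, context: dict) -> str:
        model = self.resolve_model(capability)
        return f"[{model}] {prompt}"

    def resolve_model(self, capability: str) -> str:
        return f"echo-{capability}"


if __name__ == "__main__":
    adapter = EchoAdapter(config={})
    print(adapter.complete("hello", "capable", {}))
```

Because the base class is abstract, forgetting to implement one of the methods fails at construction time rather than at the first call.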
@@ -1,16 +1,15 @@
|
|||||||
"""
|
"""
|
||||||
adapters/llm/anthropic.py
|
adapters/llm/anthropic.py
|
||||||
Anthropic Claude adapter — Phase 2 stub.
|
Anthropic Claude LLM adapter — Phase 2 implementation.
|
||||||
|
|
||||||
TODO (Phase 2):
|
Uses the ``anthropic`` SDK to call Claude models. Model selection is driven
|
||||||
- Implement complete() using the anthropic SDK (anthropic.Anthropic client).
|
by the capability_map in team.yaml so the adapter stays provider-agnostic in
|
||||||
- Implement resolve_model() by reading config/team.yaml capability_map.
|
configuration.
|
||||||
- Handle streaming responses, rate-limit retries, and token counting.
|
|
||||||
- Support system-prompt injection via context["system_prompt"].
|
|
||||||
- Map capability → model using the provider's capability_map config.
|
|
||||||
"""
|
"""
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import os
|
||||||
|
|
||||||
from adapters.base.llm import LLMAdapter
|
from adapters.base.llm import LLMAdapter
|
||||||
|
|
||||||
|
|
||||||
@@ -18,27 +17,123 @@ class AnthropicAdapter(LLMAdapter):
|
|||||||
"""
|
"""
|
||||||
LLM adapter for Anthropic Claude models.
|
LLM adapter for Anthropic Claude models.
|
||||||
|
|
||||||
Reads model configuration from config/team.yaml:
|
Reads model configuration from the loaded team.yaml config dict::
|
||||||
models.provider: anthropic
|
|
||||||
models.capability_map.reasoning-heavy.anthropic: claude-opus-4-6
|
models:
|
||||||
models.capability_map.capable.anthropic: claude-sonnet-4-6
|
default_max_tokens: 4096 # fallback max_tokens for all calls
|
||||||
models.capability_map.fast-cheap.anthropic: claude-haiku-3-5
|
default_temperature: 0 # fallback temperature for all calls
|
||||||
|
capability_map:
|
||||||
|
reasoning-heavy:
|
||||||
|
anthropic: claude-opus-4-6
|
||||||
|
capable:
|
||||||
|
anthropic: claude-sonnet-4-6
|
||||||
|
fast-cheap:
|
||||||
|
anthropic: claude-haiku-3-5
|
||||||
|
|
||||||
|
The provider key used when looking up ``capability_map`` is hardcoded to
|
||||||
|
``"anthropic"`` — the adapter knows its own provider; there is no need for
|
||||||
|
a separate ``models.provider`` config field.
|
||||||
|
|
||||||
|
Both ``default_max_tokens`` and ``default_temperature`` can be overridden
|
||||||
|
per-call via the ``context`` dict passed to :meth:`complete`.
|
||||||
|
|
||||||
|
Environment variables
|
||||||
|
---------------------
|
||||||
|
ANTHROPIC_API_KEY : Required. Authenticates with the Anthropic API.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
def __init__(self, config: dict) -> None:
|
def __init__(self, config: dict) -> None:
|
||||||
# TODO (Phase 2): Accept loaded team.yaml config dict.
|
"""
|
||||||
# Extract API key from environment (ANTHROPIC_API_KEY).
|
Initialise the Anthropic adapter.
|
||||||
# Initialise the anthropic.Anthropic() client.
|
|
||||||
raise NotImplementedError("AnthropicAdapter.__init__ is not yet implemented.")
|
Parameters
|
||||||
|
----------
|
||||||
|
config : Loaded team.yaml config dict.
|
||||||
|
|
||||||
|
Raises
|
||||||
|
------
|
||||||
|
ValueError
|
||||||
|
If ANTHROPIC_API_KEY is not set in the environment.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
import anthropic as _anthropic
|
||||||
|
except ModuleNotFoundError as exc:
|
||||||
|
raise ImportError(
|
||||||
|
"The 'anthropic' package is required for AnthropicAdapter. "
|
||||||
|
"Install it with: pip install anthropic"
|
||||||
|
) from exc
|
||||||
|
|
||||||
|
self._config = config
|
||||||
|
api_key = os.environ.get("ANTHROPIC_API_KEY")
|
||||||
|
if not api_key:
|
||||||
|
raise ValueError(
|
||||||
|
"ANTHROPIC_API_KEY environment variable is not set. "
|
||||||
|
"Export it before running the-agency."
|
||||||
|
)
|
||||||
|
self._client = _anthropic.Anthropic(api_key=api_key)
|
||||||
|
self._models_cfg: dict = config.get("models", {})
|
||||||
|
self._default_max_tokens: int = self._models_cfg.get("default_max_tokens", 4096)
|
||||||
|
self._default_temperature: float = self._models_cfg.get("default_temperature", 0)
|
||||||
|
|
||||||
def complete(self, prompt: str, capability: str, context: dict) -> str:
|
def complete(self, prompt: str, capability: str, context: dict) -> str:
|
||||||
# TODO (Phase 2): Call anthropic client messages.create().
|
"""
|
||||||
# Use resolve_model(capability) to pick the model.
|
Send a prompt to a Claude model and return the text response.
|
||||||
# Support context keys: system_prompt, max_tokens, temperature.
|
|
||||||
# Return response text as a plain string.
|
Parameters
|
||||||
raise NotImplementedError("AnthropicAdapter.complete is not yet implemented.")
|
----------
|
||||||
|
prompt : User-role prompt content.
|
||||||
|
capability : One of "reasoning-heavy" | "capable" | "fast-cheap".
|
||||||
|
context : Optional per-call overrides:
|
||||||
|
system_prompt (str) — prepended as the system turn.
|
||||||
|
max_tokens (int) — defaults to models.default_max_tokens in team.yaml.
|
||||||
|
temperature (float) — defaults to models.default_temperature in team.yaml.
|
||||||
|
|
||||||
|
Returns
|
||||||
|
-------
|
||||||
|
The model's text completion as a plain string.
|
||||||
|
"""
|
||||||
|
model = self.resolve_model(capability)
|
||||||
|
max_tokens: int = context.get("max_tokens", self._default_max_tokens)
|
||||||
|
temperature: float = context.get("temperature", self._default_temperature)
|
||||||
|
system_prompt: str = context.get("system_prompt", "")
|
||||||
|
|
||||||
|
create_kwargs: dict = {
|
||||||
|
"model": model,
|
||||||
|
"max_tokens": max_tokens,
|
||||||
|
"messages": [{"role": "user", "content": prompt}],
|
||||||
|
}
|
||||||
|
if system_prompt:
|
||||||
|
create_kwargs["system"] = system_prompt
|
||||||
|
# Always pass temperature: omitting it would use the API default instead of the configured value (0 unless overridden).
|
||||||
|
create_kwargs["temperature"] = temperature
|
||||||
|
|
||||||
|
response = self._client.messages.create(**create_kwargs)
|
||||||
|
return response.content[0].text
|
||||||
|
|
||||||
def resolve_model(self, capability: str) -> str:
|
def resolve_model(self, capability: str) -> str:
|
||||||
# TODO (Phase 2): Look up capability in team.yaml capability_map.
|
"""
|
||||||
# Fall back to "capable" tier model if capability is unknown.
|
Map a capability string to the Anthropic model identifier.
|
||||||
raise NotImplementedError("AnthropicAdapter.resolve_model is not yet implemented.")
|
|
||||||
|
Looks up ``config.models.capability_map[capability][provider]``.
|
||||||
|
Falls back to the "capable" tier model if the capability is unknown.
|
||||||
|
|
||||||
|
Parameters
|
||||||
|
----------
|
||||||
|
capability : One of "reasoning-heavy" | "capable" | "fast-cheap".
|
||||||
|
|
||||||
|
Returns
|
||||||
|
-------
|
||||||
|
Anthropic model identifier (e.g. "claude-opus-4-6").
|
||||||
|
"""
|
||||||
|
# The adapter knows its own provider — no need to read it from config.
|
||||||
|
cap_map: dict = self._models_cfg.get("capability_map", {})
|
||||||
|
|
||||||
|
if capability in cap_map and "anthropic" in cap_map[capability]:
|
||||||
|
return cap_map[capability]["anthropic"]
|
||||||
|
|
||||||
|
# Fall back to "capable" tier
|
||||||
|
if "capable" in cap_map and "anthropic" in cap_map["capable"]:
|
||||||
|
return cap_map["capable"]["anthropic"]
|
||||||
|
|
||||||
|
# Hard-coded last resort
|
||||||
|
return "claude-sonnet-4-6"
|
||||||
|
|||||||
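Note on `resolve_model()`: the three-step fallback (requested capability, then the "capable" tier, then a hard-coded model) can be exercised without the SDK. The function below is a standalone sketch, not the adapter's code; the `provider` and `last_resort` parameters are illustrative additions. Unlike the diff, it treats an empty model string as missing.

```python
def resolve_model(
    models_cfg: dict,
    capability: str,
    provider: str = "anthropic",
    last_resort: str = "claude-sonnet-4-6",
) -> str:
    """Resolve a capability to a model name with a two-level fallback."""
    cap_map = models_cfg.get("capability_map") or {}
    # Try the requested capability first, then the "capable" tier.
    for key in (capability, "capable"):
        model = (cap_map.get(key) or {}).get(provider)
        if model:
            return model
    # Nothing usable in the config: fall back to the hard-coded default.
    return last_resort


if __name__ == "__main__":
    cfg = {
        "capability_map": {
            "reasoning-heavy": {"anthropic": "claude-opus-4-6"},
            "capable": {"anthropic": "claude-sonnet-4-6"},
        }
    }
    print(resolve_model(cfg, "reasoning-heavy"))
    print(resolve_model(cfg, "fast-cheap"))
```

In the usage above, the second call falls back to the "capable" entry because `fast-cheap` is absent from the example config.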
@@ -1,35 +1,93 @@
|
|||||||
"""
|
"""
|
||||||
adapters/notify/openclaw.py
|
adapters/notify/openclaw.py
|
||||||
OpenClaw notification adapter — Phase 2 stub.
|
OpenClaw notification adapter — Phase 2 implementation.
|
||||||
|
|
||||||
TODO (Phase 2):
|
Sends notifications by shelling out to the ``openclaw`` CLI::
|
||||||
- Implement send() to dispatch notifications via the OpenClaw API.
|
|
||||||
- Support context keys: channel, severity, run_id, brief_id.
|
openclaw system event --text "<message>" --mode now
|
||||||
- Read endpoint and credentials from environment (OPENCLAW_API_KEY, OPENCLAW_URL).
|
|
||||||
- Handle rate limiting and delivery retries.
|
If the binary is not on PATH the method logs a warning and returns without
|
||||||
|
raising — notifications are best-effort and should never crash the pipeline.
|
||||||
"""
|
"""
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import logging
|
||||||
|
import os
|
||||||
|
import subprocess
|
||||||
|
|
||||||
from adapters.base.notify import NotifyAdapter
|
from adapters.base.notify import NotifyAdapter
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
class OpenClawNotifyAdapter(NotifyAdapter):
|
class OpenClawNotifyAdapter(NotifyAdapter):
|
||||||
"""
|
"""
|
||||||
Notification adapter that sends messages via OpenClaw.
|
Notification adapter that dispatches messages via the ``openclaw`` CLI.
|
||||||
|
|
||||||
Expects environment variables:
|
Environment variables
|
||||||
OPENCLAW_API_KEY — authentication token
|
---------------------
|
||||||
OPENCLAW_URL — base URL for the OpenClaw API (optional, defaults to hosted)
|
OPENCLAW_SIGNAL_NUMBER : Optional. Direct signal target for OpenClaw sends.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
def __init__(self, config: dict) -> None:
|
def __init__(self, config: dict) -> None:
|
||||||
# TODO (Phase 2): Accept loaded team.yaml config dict.
|
"""
|
||||||
# Extract OPENCLAW_API_KEY and OPENCLAW_URL from environment.
|
Initialise the OpenClaw notification adapter.
|
||||||
# Initialise an HTTP client (e.g. httpx or requests).
|
|
||||||
raise NotImplementedError("OpenClawNotifyAdapter.__init__ is not yet implemented.")
|
Parameters
|
||||||
|
----------
|
||||||
|
config : Loaded team.yaml config dict (reserved for future options).
|
||||||
|
"""
|
||||||
|
self._config = config
|
||||||
|
self._signal_number: str = os.environ.get("OPENCLAW_SIGNAL_NUMBER", "")
|
||||||
|
|
||||||
def send(self, message: str, context: dict) -> None:
|
def send(self, message: str, context: dict) -> None:
|
||||||
# TODO (Phase 2): POST notification payload to OpenClaw API.
|
"""
|
||||||
# Include message, context (channel, severity, run_id, brief_id).
|
Send a notification via ``openclaw system event``.
|
||||||
# Log delivery confirmation or raise on failure.
|
|
||||||
raise NotImplementedError("OpenClawNotifyAdapter.send is not yet implemented.")
|
Parameters
|
||||||
|
----------
|
||||||
|
message : Human-readable notification text.
|
||||||
|
context : Optional metadata. Recognised keys:
|
||||||
|
level (str) — "info" | "warning" | "error"; logged locally.
|
||||||
|
run_id (str) — included in the local log record.
|
||||||
|
brief_id (str) — included in the local log record.
|
||||||
|
|
||||||
|
Notes
|
||||||
|
-----
|
||||||
|
If the ``openclaw`` binary is not present on PATH, the method logs a
|
||||||
|
warning and returns silently. Notifications are best-effort.
|
||||||
|
"""
|
||||||
|
level: str = context.get("level", "info")
|
||||||
|
run_id: str = context.get("run_id", "")
|
||||||
|
brief_id: str = context.get("brief_id", "")
|
||||||
|
|
||||||
|
# Always log locally regardless of CLI availability.
|
||||||
|
log_msg = "[notify:%s] %s (run=%s brief=%s)" % (level, message, run_id, brief_id)
|
||||||
|
if level == "error":
|
||||||
|
logger.error(log_msg)
|
||||||
|
elif level == "warning":
|
||||||
|
logger.warning(log_msg)
|
||||||
|
else:
|
||||||
|
logger.info(log_msg)
|
||||||
|
|
||||||
|
cmd = ["openclaw", "system", "event", "--text", message, "--mode", "now"]
|
||||||
|
try:
|
||||||
|
result = subprocess.run(
|
||||||
|
cmd,
|
||||||
|
capture_output=True,
|
||||||
|
text=True,
|
||||||
|
timeout=30,
|
||||||
|
)
|
||||||
|
if result.returncode != 0:
|
||||||
|
logger.warning(
|
||||||
|
"openclaw event returned non-zero exit %d: %s",
|
||||||
|
result.returncode,
|
||||||
|
result.stderr.strip(),
|
||||||
|
)
|
||||||
|
except FileNotFoundError:
|
||||||
|
logger.warning(
|
||||||
|
"openclaw CLI not found on PATH; notification not delivered: %s",
|
||||||
|
message,
|
||||||
|
)
|
||||||
|
except subprocess.TimeoutExpired:
|
||||||
|
logger.warning("openclaw event timed out for message: %s", message)
|
||||||
|
|||||||
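Note on the best-effort behaviour of `send()`: the pattern is to catch only the expected failure modes of `subprocess.run` and log instead of raising. The helper below is a sketch of that pattern in isolation; `run_best_effort` is a hypothetical name, and the usage runs the current Python interpreter rather than the `openclaw` CLI.

```python
import logging
import subprocess
import sys
from typing import Sequence

logger = logging.getLogger(__name__)


def run_best_effort(cmd: Sequence[str], timeout_s: float = 30.0) -> bool:
    """Run cmd and report success; log, but never raise, on expected failures."""
    try:
        result = subprocess.run(
            list(cmd), capture_output=True, text=True, timeout=timeout_s
        )
    except FileNotFoundError:
        logger.warning("%s not found on PATH; skipping", cmd[0])
        return False
    except subprocess.TimeoutExpired:
        logger.warning("%s timed out after %ss", cmd[0], timeout_s)
        return False
    if result.returncode != 0:
        logger.warning(
            "%s exited with %d: %s", cmd[0], result.returncode, result.stderr.strip()
        )
        return False
    return True


if __name__ == "__main__":
    print(run_best_effort([sys.executable, "-c", "pass"]))
    print(run_best_effort(["no-such-binary-agency-example"]))
```

Other `OSError` subclasses, such as `PermissionError` for a non-executable file, are deliberately left uncaught here, which matches the adapter in the diff.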
@@ -1,51 +1,163 @@
|
|||||||
"""
|
"""
|
||||||
adapters/runtime/claude_code.py
|
adapters/runtime/claude_code.py
|
||||||
Claude Code agent runtime adapter — Phase 2 stub.
|
Claude Code sub-agent runtime adapter — Phase 2 implementation.
|
||||||
|
|
||||||
TODO (Phase 2):
|
Spawns the ``claude`` CLI as a non-interactive subprocess for T4/T5
|
||||||
- Implement spawn() to launch a Claude Code sub-agent via the Agent SDK.
|
implementation tasks::
|
||||||
- Implement get_result() to await agent completion and parse the output.
|
|
||||||
- Implement kill() to terminate the sub-agent process or session.
|
claude --permission-mode bypassPermissions --print "<task>"
|
||||||
- Map task brief context (files, constraints, artifacts) into the agent's
|
|
||||||
system prompt and tool context.
|
Each spawned process is tracked by a UUID job_id so callers can later poll
|
||||||
- Handle Claude Code tool-use responses and extract structured output.
|
for the result or terminate the job. Stdout is captured and returned as the
|
||||||
|
agent output; stderr is included for debugging.
|
||||||
"""
|
"""
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import logging
|
||||||
|
import subprocess
|
||||||
|
import tempfile
|
||||||
|
import threading
|
||||||
|
import uuid
|
||||||
|
|
||||||
from adapters.base.runtime import RuntimeAdapter
|
from adapters.base.runtime import RuntimeAdapter
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
class ClaudeCodeRuntimeAdapter(RuntimeAdapter):
|
class ClaudeCodeRuntimeAdapter(RuntimeAdapter):
|
||||||
"""
|
"""
|
||||||
Runtime adapter that spawns Claude Code sub-agents for coding tasks.
|
Runtime adapter that spawns ``claude`` CLI sub-agents for coding tasks.
|
||||||
|
|
||||||
Used when a TaskBrief has preferred_runtime == "coding_agent".
|
Credentials are inherited from the environment (``ANTHROPIC_API_KEY``).
|
||||||
|
The ``claude`` CLI must be installed and reachable on PATH.
|
||||||
|
|
||||||
Expects the Claude Code CLI / Agent SDK to be available in the environment.
|
Used when a TaskBrief has ``preferred_runtime == "coding_agent"``.
|
||||||
Credentials are inherited from the environment (ANTHROPIC_API_KEY).
|
|
||||||
"""
|
"""
|
||||||
|
|
||||||
def __init__(self, config: dict) -> None:
|
def __init__(self, config: dict) -> None:
|
||||||
# TODO (Phase 2): Accept loaded team.yaml config dict.
|
"""
|
||||||
# Validate that Claude Code CLI or SDK is accessible.
|
Initialise the Claude Code runtime adapter.
|
||||||
# Initialise any agent session management state.
|
|
||||||
raise NotImplementedError("ClaudeCodeRuntimeAdapter.__init__ is not yet implemented.")
|
Parameters
|
||||||
|
----------
|
||||||
|
config : Loaded team.yaml config dict (reserved for future options).
|
||||||
|
"""
|
||||||
|
self._config = config
|
||||||
|
# Maps job_id → running Popen instance.
|
||||||
|
self._jobs: dict[str, subprocess.Popen] = {}
|
||||||
|
self._lock = threading.Lock()
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# RuntimeAdapter interface
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
def spawn(self, task: str, capability: str, context: dict) -> str:
|
def spawn(self, task: str, capability: str, context: dict) -> str:
|
||||||
# TODO (Phase 2): Launch a Claude Code sub-agent.
|
"""
|
||||||
# Compose a structured system prompt from task + context.
|
Launch ``claude --permission-mode bypassPermissions --print "<task>"``
|
||||||
# Inject relevant files and constraints as tool context.
|
as a non-interactive subprocess.
|
||||||
# Return an agent_id that maps to a running agent session.
|
|
||||||
raise NotImplementedError("ClaudeCodeRuntimeAdapter.spawn is not yet implemented.")
|
Parameters
|
||||||
|
----------
|
||||||
|
task : Full task description (typically a JSON-serialised brief).
|
||||||
|
capability : Capability hint (not forwarded; Claude Code resolves its
|
||||||
|
own model from the local environment).
|
||||||
|
context : Optional keys:
|
||||||
|
workdir (str) — cwd for the subprocess. A fresh
|
||||||
|
temporary directory is created if omitted.
|
||||||
|
|
||||||
|
Returns
|
||||||
|
-------
|
||||||
|
A UUID job_id string that uniquely identifies this subprocess.
|
||||||
|
"""
|
||||||
|
workdir: str = context.get("workdir") or tempfile.mkdtemp(
|
||||||
|
prefix="agency-claude-"
|
||||||
|
)
|
||||||
|
job_id = str(uuid.uuid4())
|
||||||
|
logger.info("Spawning Claude Code job %s in %s", job_id, workdir)
|
||||||
|
|
||||||
|
proc = subprocess.Popen(
|
||||||
|
["claude", "--permission-mode", "bypassPermissions", "--print", task],
|
||||||
|
stdout=subprocess.PIPE,
|
||||||
|
stderr=subprocess.PIPE,
|
||||||
|
text=True,
|
||||||
|
cwd=workdir,
|
||||||
|
)
|
||||||
|
|
||||||
|
with self._lock:
|
||||||
|
self._jobs[job_id] = proc
|
||||||
|
|
||||||
|
return job_id
|
||||||
|
|
||||||
def get_result(self, agent_id: str, timeout_s: int) -> dict:
|
def get_result(self, agent_id: str, timeout_s: int) -> dict:
|
||||||
# TODO (Phase 2): Await the Claude Code agent session to complete.
|
"""
|
||||||
# Parse the agent's final message for structured JSON output.
|
Wait for the Claude Code subprocess to complete and return its output.
|
||||||
# Return dict with: {"status": ..., "output": ..., "artifacts": [...]}.
|
|
||||||
# Raise TimeoutError if timeout_s elapses.
|
Parameters
|
||||||
raise NotImplementedError("ClaudeCodeRuntimeAdapter.get_result is not yet implemented.")
|
----------
|
||||||
|
agent_id : Job id returned by spawn().
|
||||||
|
timeout_s : Maximum seconds to wait before raising TimeoutError.
|
||||||
|
|
||||||
|
Returns
|
||||||
|
-------
|
||||||
|
dict with keys:
|
||||||
|
status ("completed" | "failed")
|
||||||
|
output (str — full stdout)
|
||||||
|
artifacts (list — always empty; callers must parse output)
|
||||||
|
stderr (str — full stderr)
|
||||||
|
|
||||||
|
Raises
|
||||||
|
------
|
||||||
|
KeyError
|
||||||
|
If agent_id does not correspond to a known job.
|
||||||
|
TimeoutError
|
||||||
|
If the subprocess does not finish within timeout_s seconds.
|
||||||
|
"""
|
||||||
|
with self._lock:
|
||||||
|
proc = self._jobs.get(agent_id)
|
||||||
|
|
||||||
|
if proc is None:
|
||||||
|
raise KeyError(f"No Claude Code job found for agent_id={agent_id!r}")
|
||||||
|
|
||||||
|
try:
|
||||||
|
stdout, stderr = proc.communicate(timeout=timeout_s)
|
||||||
|
except subprocess.TimeoutExpired:
|
||||||
|
proc.kill()
|
||||||
|
stdout, stderr = proc.communicate()
|
||||||
|
raise TimeoutError(
|
||||||
|
f"Claude Code job {agent_id!r} did not complete within {timeout_s}s."
|
||||||
|
)
|
||||||
|
|
||||||
|
status = "completed" if proc.returncode == 0 else "failed"
|
||||||
|
logger.info(
|
||||||
|
"Claude Code job %s finished: status=%s returncode=%d",
|
||||||
|
agent_id,
|
||||||
|
status,
|
||||||
|
proc.returncode,
|
||||||
|
)
|
||||||
|
|
||||||
|
return {
|
||||||
|
"status": status,
|
||||||
|
"output": stdout,
|
||||||
|
"artifacts": [],
|
||||||
|
"stderr": stderr,
|
||||||
|
}
|
||||||
|
|
||||||
def kill(self, agent_id: str) -> None:
|
def kill(self, agent_id: str) -> None:
|
||||||
# TODO (Phase 2): Terminate the Claude Code agent session.
|
"""
|
||||||
# Clean up any temporary files or session state.
|
Terminate a running Claude Code subprocess.
|
||||||
raise NotImplementedError("ClaudeCodeRuntimeAdapter.kill is not yet implemented.")
|
|
||||||
|
Silently succeeds if the job has already finished or the id is unknown.
|
||||||
|
|
||||||
|
Parameters
|
||||||
|
----------
|
||||||
|
agent_id : Job id returned by spawn().
|
||||||
|
"""
|
||||||
|
with self._lock:
|
||||||
|
proc = self._jobs.get(agent_id)
|
||||||
|
|
||||||
|
if proc is not None:
|
||||||
|
try:
|
||||||
|
proc.terminate()
|
||||||
|
logger.info("Terminated Claude Code job %s", agent_id)
|
||||||
|
except OSError:
|
||||||
|
pass # Process already gone — that is fine.
|
||||||
|
|||||||
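Note on job tracking in `ClaudeCodeRuntimeAdapter`: the spawn/get_result flow can be tried without the `claude` CLI. The sketch below uses the current Python interpreter as a stand-in child process. `JobRegistry` is a hypothetical class, and it differs from the diff in one respect: it drops a job from the registry once its result has been collected or it has timed out, so the dictionary does not grow across a long run.

```python
import subprocess
import sys
import threading
import uuid


class JobRegistry:
    """Tracks child processes by UUID job id (illustrative, not the adapter)."""

    def __init__(self) -> None:
        self._jobs: dict = {}
        self._lock = threading.Lock()

    def spawn(self, argv: list) -> str:
        proc = subprocess.Popen(
            argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
        )
        job_id = str(uuid.uuid4())
        with self._lock:
            self._jobs[job_id] = proc
        return job_id

    def get_result(self, job_id: str, timeout_s: float) -> dict:
        with self._lock:
            proc = self._jobs.get(job_id)
        if proc is None:
            raise KeyError(f"No job found for job_id={job_id!r}")
        try:
            stdout, stderr = proc.communicate(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.communicate()  # reap the killed child
            raise TimeoutError(
                f"Job {job_id!r} did not complete within {timeout_s}s."
            ) from None
        finally:
            # Assumption: a job is collected at most once, so forget it here.
            with self._lock:
                self._jobs.pop(job_id, None)
        status = "completed" if proc.returncode == 0 else "failed"
        return {"status": status, "output": stdout, "artifacts": [], "stderr": stderr}


if __name__ == "__main__":
    registry = JobRegistry()
    job = registry.spawn([sys.executable, "-c", "print('ok')"])
    print(registry.get_result(job, timeout_s=30))
```

Whether results should be retrievable more than once is a design choice; if they should, the registry would need to cache the result dict instead of discarding the entry.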
@@ -1,48 +1,241 @@
|
|||||||
"""
|
"""
|
||||||
adapters/runtime/openclaw.py
|
adapters/runtime/openclaw.py
|
||||||
OpenClaw agent runtime adapter — Phase 2 stub.
|
OpenClaw agent runtime adapter — Phase 2 implementation.
|
||||||
|
|
||||||
TODO (Phase 2):
|
Spawns sub-agents by shelling out to the ``openclaw`` CLI::
|
||||||
- Implement spawn() to submit a task to an OpenClaw worker pool.
|
|
||||||
- Implement get_result() to poll or subscribe for agent completion.
|
openclaw session spawn --task "<task>" --mode run
|
||||||
- Implement kill() to cancel a running OpenClaw agent job.
|
openclaw session get <session_id>
|
||||||
- Read endpoint and credentials from environment (OPENCLAW_API_KEY, OPENCLAW_URL).
|
openclaw session kill <session_id>
|
||||||
- Map capability hint to an appropriate worker class/queue.
|
|
||||||
|
If the ``openclaw`` binary is unavailable, all methods raise
|
||||||
|
``NotImplementedError`` with a helpful message rather than crashing with a
|
||||||
|
raw ``FileNotFoundError``.
|
||||||
"""
|
"""
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import re
|
||||||
|
import subprocess
|
||||||
|
import time
|
||||||
|
|
||||||
from adapters.base.runtime import RuntimeAdapter
|
from adapters.base.runtime import RuntimeAdapter
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
# Status strings from the openclaw CLI that indicate a session has finished.
|
||||||
|
_TERMINAL_STATUSES = frozenset(
|
||||||
|
{"done", "completed", "failed", "partial", "blocked", "error"}
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
class OpenClawRuntimeAdapter(RuntimeAdapter):
|
class OpenClawRuntimeAdapter(RuntimeAdapter):
|
||||||
"""
|
"""
|
||||||
Runtime adapter that dispatches agent tasks to OpenClaw workers.
|
Runtime adapter that dispatches agent tasks to OpenClaw worker sessions.
|
||||||
|
|
||||||
Expects environment variables:
|
All interactions use the ``openclaw`` CLI. No additional credentials are
|
||||||
OPENCLAW_API_KEY — authentication token
|
required beyond what OpenClaw manages in the local environment.
|
||||||
OPENCLAW_URL — base URL for the OpenClaw API
|
|
||||||
"""
|
"""
|
||||||
|
|
||||||
def __init__(self, config: dict) -> None:
|
def __init__(self, config: dict) -> None:
|
||||||
# TODO (Phase 2): Accept loaded team.yaml config dict.
|
"""
|
||||||
# Extract OPENCLAW_API_KEY and OPENCLAW_URL from environment.
|
Initialise the OpenClaw runtime adapter.
|
||||||
# Initialise HTTP client and any job-tracking state.
|
|
||||||
raise NotImplementedError("OpenClawRuntimeAdapter.__init__ is not yet implemented.")
|
Parameters
|
||||||
|
----------
|
||||||
|
config : Loaded team.yaml config dict (reserved for future options).
|
||||||
|
"""
|
||||||
|
self._config = config
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# RuntimeAdapter interface
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
def spawn(self, task: str, capability: str, context: dict) -> str:
|
def spawn(self, task: str, capability: str, context: dict) -> str:
|
||||||
# TODO (Phase 2): Submit task to OpenClaw worker pool.
|
"""
|
||||||
# Map capability ("reasoning-heavy" | "capable" | "fast-cheap") to
|
Spawn an OpenClaw agent session for the given task.
|
||||||
# an appropriate worker queue or model hint.
|
|
||||||
# Return an agent_id string that can be used to poll for results.
|
Parameters
|
||||||
raise NotImplementedError("OpenClawRuntimeAdapter.spawn is not yet implemented.")
|
----------
|
||||||
|
task : Natural-language task description.
|
||||||
|
capability : Capability hint ("reasoning-heavy" | "capable" | "fast-cheap").
|
||||||
|
Passed informally; actual routing is handled by OpenClaw.
|
||||||
|
context : Arbitrary context bag (currently unused by this adapter).
|
||||||
|
|
||||||
|
Returns
|
||||||
|
-------
|
||||||
|
session_id string parsed from the CLI output.
|
||||||
|
|
||||||
|
Raises
|
||||||
|
------
|
||||||
|
NotImplementedError
|
||||||
|
If the ``openclaw`` CLI is not available on PATH.
|
||||||
|
RuntimeError
|
||||||
|
If the session_id cannot be parsed from the CLI output.
|
||||||
|
"""
|
||||||
|
# TODO: map capability to an openclaw worker tier / model hint if the
|
||||||
|
# openclaw CLI gains that flag in a future release.
|
||||||
|
cmd = ["openclaw", "session", "spawn", "--task", task, "--mode", "run"]
|
||||||
|
try:
|
||||||
|
result = subprocess.run(
|
||||||
|
cmd,
|
||||||
|
capture_output=True,
|
||||||
|
text=True,
|
||||||
|
check=True,
|
||||||
|
)
|
||||||
|
except FileNotFoundError:
|
||||||
|
raise NotImplementedError(
|
||||||
|
"openclaw CLI not found on PATH. "
|
||||||
|
"Install OpenClaw or configure a different runtime adapter "
|
||||||
|
"(e.g. adapters.runtime.claude_code.ClaudeCodeRuntimeAdapter)."
|
||||||
|
)
|
||||||
|
except subprocess.CalledProcessError as exc:
|
||||||
|
raise RuntimeError(
|
||||||
|
f"openclaw session spawn failed (exit {exc.returncode}): "
|
||||||
|
f"{exc.stderr.strip()}"
|
||||||
|
) from exc
|
||||||
|
|
||||||
|
return self._parse_session_id(result.stdout)
|
||||||
|
|
||||||
def get_result(self, agent_id: str, timeout_s: int) -> dict:
|
def get_result(self, agent_id: str, timeout_s: int) -> dict:
|
||||||
# TODO (Phase 2): Poll or long-poll the OpenClaw API for job completion.
|
"""
|
||||||
# Raise TimeoutError if timeout_s elapses before the job finishes.
|
Poll ``openclaw session get`` until the session reaches a terminal
|
||||||
# Return a dict with at minimum: {"status": ..., "output": ..., "artifacts": [...]}.
|
state or *timeout_s* seconds elapse.
|
||||||
raise NotImplementedError("OpenClawRuntimeAdapter.get_result is not yet implemented.")
|
|
||||||
|
Parameters
|
||||||
|
----------
|
||||||
|
agent_id : Session ID returned by spawn().
|
||||||
|
timeout_s : Maximum seconds to wait before raising TimeoutError.
|
||||||
|
|
||||||
|
Returns
|
||||||
|
-------
|
||||||
|
dict with keys: ``status``, ``output``, ``artifacts``.
|
||||||
|
|
||||||
|
Raises
|
||||||
|
------
|
||||||
|
TimeoutError
|
||||||
|
If the session does not finish within timeout_s seconds.
|
||||||
|
NotImplementedError
|
||||||
|
If the ``openclaw`` CLI is not available on PATH.
|
||||||
|
"""
|
||||||
|
deadline = time.monotonic() + timeout_s
|
||||||
|
poll_interval = 2.0
|
||||||
|
|
||||||
|
while time.monotonic() < deadline:
|
||||||
|
try:
|
||||||
|
result = subprocess.run(
|
||||||
|
["openclaw", "session", "get", agent_id],
|
||||||
|
capture_output=True,
|
||||||
|
text=True,
|
||||||
|
timeout=15,
|
||||||
|
)
|
||||||
|
except FileNotFoundError:
|
||||||
|
raise NotImplementedError(
|
||||||
|
"openclaw CLI not found on PATH. "
|
||||||
|
"Install OpenClaw or switch to a different runtime adapter."
|
||||||
|
)
|
||||||
|
except subprocess.TimeoutExpired:
|
||||||
|
logger.debug("openclaw session get timed out; will retry")
|
||||||
|
time.sleep(poll_interval)
|
||||||
|
continue
|
||||||
|
|
||||||
|
if result.returncode == 0 and result.stdout.strip():
|
||||||
|
parsed = self._parse_get_output(result.stdout)
|
||||||
|
if parsed.get("status", "").lower() in _TERMINAL_STATUSES:
|
||||||
|
return parsed
|
||||||
|
else:
|
||||||
|
logger.debug(
|
||||||
|
"openclaw session get returned exit=%d; retrying. stderr=%s",
|
||||||
|
result.returncode,
|
||||||
|
result.stderr.strip(),
|
||||||
|
)
|
||||||
|
|
||||||
|
time.sleep(poll_interval)
|
||||||
|
|
||||||
|
raise TimeoutError(
|
||||||
|
f"Agent {agent_id!r} did not complete within {timeout_s}s."
|
||||||
|
)
|
||||||
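Note on the polling loop in `get_result()`: the deadline logic can be separated from the CLI call. The sketch below is a generic helper under that assumption; `poll_until`, `fetch`, and `is_done` are hypothetical names. It differs slightly from the loop in the diff: it always polls at least once, and it never sleeps past the deadline.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    timeout_s: float,
    interval_s: float = 2.0,
) -> T:
    """Call fetch() until is_done(value) is true or timeout_s elapses."""
    # time.monotonic() is unaffected by wall-clock adjustments.
    deadline = time.monotonic() + timeout_s
    while True:
        value = fetch()
        if is_done(value):
            return value
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"Condition not met within {timeout_s}s.")
        time.sleep(min(interval_s, remaining))


if __name__ == "__main__":
    states = iter(["running", "running", "done"])
    final = poll_until(
        fetch=lambda: next(states),
        is_done=lambda s: s in {"done", "failed"},
        timeout_s=5,
        interval_s=0.01,
    )
    print(final)
```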
|
|
||||||
def kill(self, agent_id: str) -> None:
|
def kill(self, agent_id: str) -> None:
|
||||||
# TODO (Phase 2): Send a cancellation request to the OpenClaw API.
|
"""
|
||||||
# Silently succeed if the agent has already finished.
|
Terminate an OpenClaw session unconditionally.
|
||||||
raise NotImplementedError("OpenClawRuntimeAdapter.kill is not yet implemented.")
|
|
||||||
|
Silently succeeds if the session has already finished.
|
||||||
|
|
||||||
|
Parameters
|
||||||
|
----------
|
||||||
|
agent_id : Session ID returned by spawn().
|
||||||
|
|
||||||
|
Raises
|
||||||
|
------
|
||||||
|
NotImplementedError
|
||||||
|
If the ``openclaw`` CLI is not available on PATH.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
subprocess.run(
|
||||||
|
["openclaw", "session", "kill", agent_id],
|
||||||
|
capture_output=True,
|
||||||
|
text=True,
|
||||||
|
timeout=15,
|
||||||
|
)
|
||||||
|
except FileNotFoundError:
|
||||||
|
raise NotImplementedError(
|
||||||
|
"openclaw CLI not found on PATH. "
|
||||||
|
"Install OpenClaw or switch to a different runtime adapter."
|
||||||
|
)
|
||||||
|
except subprocess.TimeoutExpired:
|
||||||
|
logger.warning("openclaw session kill timed out for agent %s", agent_id)
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# Private helpers
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
def _parse_session_id(self, output: str) -> str:
|
||||||
|
"""Extract a session_id from the raw stdout of ``openclaw session spawn``."""
|
||||||
|
output = output.strip()
|
||||||
|
|
||||||
|
# Prefer structured JSON output.
|
||||||
|
try:
|
||||||
|
data = json.loads(output)
|
||||||
|
for key in ("session_id", "sessionId", "id"):
|
||||||
|
if key in data:
|
||||||
|
return str(data[key])
|
||||||
|
except (json.JSONDecodeError, TypeError):
|
||||||
|
pass
|
||||||
|
|
||||||
|
# Regex: look for "session_id: <id>" or similar.
|
||||||
|
m = re.search(
|
||||||
|
r"(?:session[_\s]?id|sessionId)[:\s]+([a-zA-Z0-9_\-]+)",
|
||||||
|
output,
|
||||||
|
re.IGNORECASE,
|
||||||
|
)
|
||||||
|
if m:
|
||||||
|
return m.group(1)
|
||||||
|
|
||||||
|
# Last resort: return the first non-empty line.
|
||||||
|
lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
|
||||||
|
if lines:
|
||||||
|
return lines[0]
|
||||||
|
|
||||||
|
raise RuntimeError(
|
||||||
|
f"Could not parse session_id from openclaw output: {output!r}"
|
||||||
|
)
|
||||||
|
|
||||||
|
def _parse_get_output(self, output: str) -> dict:
|
||||||
|
"""Parse the stdout of ``openclaw session get`` into a result dict."""
|
||||||
|
output = output.strip()
|
||||||
|
try:
|
||||||
|
data = json.loads(output)
|
||||||
|
return {
|
||||||
|
"status": data.get("status", "done"),
|
||||||
|
"output": data.get("output", output),
|
||||||
|
"artifacts": data.get("artifacts", []),
|
||||||
|
}
|
||||||
|
except (json.JSONDecodeError, TypeError, AttributeError):
|
||||||
|
# Non-JSON output — treat as completed with raw text output.
|
||||||
|
return {
|
||||||
|
"status": "done",
|
||||||
|
"output": output,
|
||||||
|
"artifacts": [],
|
||||||
|
}
|
||||||
|
|||||||
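The wait loop in the runtime adapter above polls a status source until it reports a terminal status or a monotonic deadline passes. A minimal standalone sketch of that pattern follows. The helper name `poll_until`, the `fetch` callable, and the contents of `TERMINAL_STATUSES` are assumptions made for illustration; the diff references `_TERMINAL_STATUSES` without showing its value.

```python
import time
from typing import Callable, Optional

# Assumed values; the real _TERMINAL_STATUSES set is not shown in the diff.
TERMINAL_STATUSES = frozenset({"done", "failed"})


def poll_until(
    fetch: Callable[[], Optional[dict]],
    timeout_s: float,
    poll_interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until it returns a dict with a terminal status.

    fetch() returns None when no usable result is available yet
    (for example, the CLI call failed or timed out).
    """
    # time.monotonic() is unaffected by wall-clock adjustments.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = fetch()
        if result is not None:
            status = str(result.get("status", "")).lower()
            if status in TERMINAL_STATUSES:
                return result
        sleep(poll_interval)
    raise TimeoutError(f"No terminal status within {timeout_s}s.")
```

Passing `sleep` as a parameter is a design choice for this sketch only: it lets a test drive the loop without real delays.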
@@ -1,16 +1,30 @@
|
|||||||
"""
|
"""
|
||||||
adapters/vcs/github.py
|
adapters/vcs/github.py
|
||||||
GitHub VCS adapter — Phase 2 stub.
|
GitHub VCS adapter — Phase 2 implementation.
|
||||||
|
|
||||||
TODO (Phase 2):
|
Uses PyGithub (``pip install PyGithub``) to interact with the GitHub REST API.
|
||||||
- Implement create_branch() using PyGithub or gh CLI subprocess.
|
Reads the repository URL and base branch from the team.yaml config dict.
|
||||||
- Implement commit() — stage files and push via git subprocess or API.
|
|
||||||
- Implement create_pr() using GitHub REST API (POST /repos/{owner}/{repo}/pulls).
|
Note on commit() signature
|
||||||
- Implement get_pr_status() using GET /repos/{owner}/{repo}/pulls/{pull_number}.
|
--------------------------
|
||||||
- Read repo and credentials from config/team.yaml and environment (GITHUB_TOKEN).
|
The base class declares ``commit(files: list[str], message: str)``, which is
|
||||||
|
insufficient for the GitHub Contents API (which requires file *content*, not
|
||||||
|
just paths). This implementation extends the signature to accept either:
|
||||||
|
|
||||||
|
* ``dict[str, str]`` — ``{path: content}`` mapping (preferred; uses the API).
|
||||||
|
* ``list[str]`` — local file paths; content is read from disk and pushed.
|
||||||
|
|
||||||
|
The optional ``branch`` keyword argument targets a specific branch; it
|
||||||
|
defaults to the configured base branch.
|
||||||
"""
|
"""
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import os
|
||||||
|
import re
|
||||||
|
from typing import Union
|
||||||
|
|
||||||
|
from github import Github, GithubException
|
||||||
|
|
||||||
from adapters.base.vcs import VCSAdapter
|
from adapters.base.vcs import VCSAdapter
|
||||||
|
|
||||||
|
|
||||||
@@ -18,34 +32,175 @@ class GitHubAdapter(VCSAdapter):
|
|||||||
"""
|
"""
|
||||||
VCS adapter for GitHub repositories.
|
VCS adapter for GitHub repositories.
|
||||||
|
|
||||||
Expects environment variable GITHUB_TOKEN and config values:
|
Authenticates via GITHUB_TOKEN and interacts with the GitHub REST API
|
||||||
run.repo — SSH or HTTPS clone URL
|
through PyGithub.
|
||||||
run.base_branch — default base branch (e.g. "main")
|
|
||||||
|
Environment variables
|
||||||
|
---------------------
|
||||||
|
GITHUB_TOKEN : Required. Personal access token or GitHub App installation token.
|
||||||
|
|
||||||
|
Config keys (from team.yaml)
|
||||||
|
----------------------------
|
||||||
|
run.repo : SSH or HTTPS clone URL (e.g. "git@github.com:org/repo.git").
|
||||||
|
run.base_branch : Default base branch (e.g. "main").
|
||||||
"""
|
"""
|
||||||
|
|
||||||
def __init__(self, config: dict) -> None:
|
def __init__(self, config: dict) -> None:
|
||||||
# TODO (Phase 2): Accept loaded team.yaml config dict.
|
"""
|
||||||
# Extract GITHUB_TOKEN from environment.
|
Initialise the GitHub adapter.
|
||||||
# Parse owner/repo from config.run.repo.
|
|
||||||
raise NotImplementedError("GitHubAdapter.__init__ is not yet implemented.")
|
Parameters
|
||||||
|
----------
|
||||||
|
config : Loaded team.yaml config dict.
|
||||||
|
|
||||||
|
Raises
|
||||||
|
------
|
||||||
|
ValueError
|
||||||
|
If GITHUB_TOKEN is not set or the repo URL cannot be parsed.
|
||||||
|
"""
|
||||||
|
self._config = config
|
||||||
|
token = os.environ.get("GITHUB_TOKEN")
|
||||||
|
if not token:
|
||||||
|
raise ValueError(
|
||||||
|
"GITHUB_TOKEN environment variable is not set. "
|
||||||
|
"Create a personal access token and export it before running the-agency."
|
||||||
|
)
|
||||||
|
self._g = Github(token)
|
||||||
|
|
||||||
|
run_cfg: dict = config.get("run", {})
|
||||||
|
repo_url: str = run_cfg.get("repo", "")
|
||||||
|
self._base_branch: str = run_cfg.get("base_branch", "main")
|
||||||
|
|
||||||
|
self._owner, self._repo_name = self._parse_repo_url(repo_url)
|
||||||
|
self._repo = self._g.get_repo(f"{self._owner}/{self._repo_name}")
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# Helpers
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
def _parse_repo_url(self, url: str) -> tuple[str, str]:
|
||||||
|
"""Parse *owner* and *repo* name from an SSH or HTTPS GitHub URL."""
|
||||||
|
# git@github.com:owner/repo.git
|
||||||
|
m = re.match(r"git@github\.com:([^/]+)/([^/]+?)(?:\.git)?$", url)
|
||||||
|
if m:
|
||||||
|
return m.group(1), m.group(2)
|
||||||
|
# https://github.com/owner/repo[.git]
|
||||||
|
m = re.match(r"https?://github\.com/([^/]+)/([^/]+?)(?:\.git)?/?$", url)
|
||||||
|
if m:
|
||||||
|
return m.group(1), m.group(2)
|
||||||
|
raise ValueError(
|
||||||
|
f"Cannot parse GitHub owner/repo from URL: {url!r}. "
|
||||||
|
"Expected SSH (git@github.com:owner/repo.git) or "
|
||||||
|
"HTTPS (https://github.com/owner/repo.git) format."
|
||||||
|
)
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# VCSAdapter interface
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
def create_branch(self, name: str) -> None:
|
def create_branch(self, name: str) -> None:
|
||||||
# TODO (Phase 2): Create branch via GitHub API or local git subprocess.
|
"""
|
||||||
# Use config.run.base_branch as the branch point.
|
Create a new branch off ``self._base_branch`` on the remote.
|
||||||
raise NotImplementedError("GitHubAdapter.create_branch is not yet implemented.")
|
|
||||||
|
|
||||||
def commit(self, files: list[str], message: str) -> str:
|
Parameters
|
||||||
# TODO (Phase 2): Stage files (git add), create commit (git commit), push.
|
----------
|
||||||
# Return the resulting commit SHA.
|
name : New branch name (e.g. "feat/webhook-ingestion").
|
||||||
raise NotImplementedError("GitHubAdapter.commit is not yet implemented.")
|
"""
|
||||||
|
base_ref = self._repo.get_git_ref(f"heads/{self._base_branch}")
|
||||||
|
self._repo.create_git_ref(f"refs/heads/{name}", base_ref.object.sha)
|
||||||
|
|
||||||
|
def commit(
|
||||||
|
self,
|
||||||
|
files: Union[dict[str, str], list[str]],
|
||||||
|
message: str,
|
||||||
|
branch: str | None = None,
|
||||||
|
) -> str:
|
||||||
|
"""
|
||||||
|
Commit files to the repository via the GitHub Contents API.
|
||||||
|
|
||||||
|
Parameters
|
||||||
|
----------
|
||||||
|
files : Either a ``dict[path, content]`` mapping (preferred), or a
|
||||||
|
``list[path]`` of local file paths whose content is read from
|
||||||
|
disk.
|
||||||
|
message : Commit message.
|
||||||
|
branch : Target branch. Defaults to ``self._base_branch``.
|
||||||
|
|
||||||
|
Returns
|
||||||
|
-------
|
||||||
|
SHA of the last created/updated commit, or empty string if no files
|
||||||
|
were committed.
|
||||||
|
"""
|
||||||
|
target_branch = branch or self._base_branch
|
||||||
|
|
||||||
|
# Normalise to {path: content}
|
||||||
|
if isinstance(files, list):
|
||||||
|
files_dict: dict[str, str] = {}
|
||||||
|
for path in files:
|
||||||
|
with open(path, "r", encoding="utf-8") as fh:
|
||||||
|
files_dict[path] = fh.read()
|
||||||
|
else:
|
||||||
|
files_dict = files
|
||||||
|
|
||||||
|
last_sha: str = ""
|
||||||
|
for path, content in files_dict.items():
|
||||||
|
try:
|
||||||
|
existing = self._repo.get_contents(path, ref=target_branch)
|
||||||
|
result = self._repo.update_file(
|
||||||
|
path=path,
|
||||||
|
message=message,
|
||||||
|
content=content,
|
||||||
|
sha=existing.sha, # type: ignore[union-attr]
|
||||||
|
branch=target_branch,
|
||||||
|
)
|
||||||
|
except GithubException as exc:
|
||||||
|
if exc.status != 404:
|
||||||
|
raise
|
||||||
|
# A 404 means the file does not exist yet, so create it.
|
||||||
|
result = self._repo.create_file(
|
||||||
|
path=path,
|
||||||
|
message=message,
|
||||||
|
content=content,
|
||||||
|
branch=target_branch,
|
||||||
|
)
|
||||||
|
last_sha = result["commit"].sha
|
||||||
|
|
||||||
|
return last_sha
|
||||||
|
|
||||||
def create_pr(self, title: str, body: str, head: str, base: str) -> str:
|
def create_pr(self, title: str, body: str, head: str, base: str) -> str:
|
||||||
# TODO (Phase 2): POST to GitHub API /repos/{owner}/{repo}/pulls.
|
"""
|
||||||
# Return the HTML URL of the created PR.
|
Open a pull request on GitHub.
|
||||||
raise NotImplementedError("GitHubAdapter.create_pr is not yet implemented.")
|
|
||||||
|
Parameters
|
||||||
|
----------
|
||||||
|
title : PR title.
|
||||||
|
body : PR description / body markdown.
|
||||||
|
head : Head branch name (the branch with changes).
|
||||||
|
base : Base branch name (e.g. "main").
|
||||||
|
|
||||||
|
Returns
|
||||||
|
-------
|
||||||
|
HTML URL of the created pull request.
|
||||||
|
"""
|
||||||
|
pr = self._repo.create_pull(
|
||||||
|
title=title,
|
||||||
|
body=body,
|
||||||
|
head=head,
|
||||||
|
base=base,
|
||||||
|
)
|
||||||
|
return pr.html_url
|
||||||
|
|
||||||
def get_pr_status(self, pr_id: str) -> str:
|
def get_pr_status(self, pr_id: str) -> str:
|
||||||
# TODO (Phase 2): GET /repos/{owner}/{repo}/pulls/{number}.
|
"""
|
||||||
# Map GitHub PR state ("open", "closed") + merged flag to
|
Fetch the current status of a pull request.
|
||||||
# our schema: "open" | "merged" | "closed".
|
|
||||||
raise NotImplementedError("GitHubAdapter.get_pr_status is not yet implemented.")
|
Parameters
|
||||||
|
----------
|
||||||
|
pr_id : Pull request number as a string (e.g. "42").
|
||||||
|
|
||||||
|
Returns
|
||||||
|
-------
|
||||||
|
One of: "open" | "merged" | "closed".
|
||||||
|
"""
|
||||||
|
pr = self._repo.get_pull(int(pr_id))
|
||||||
|
if pr.merged:
|
||||||
|
return "merged"
|
||||||
|
return pr.state # "open" or "closed"
|
||||||
|
|||||||
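The adapter above derives the owner and repository name from `run.repo`, which may be an SSH or an HTTPS clone URL. The sketch below reuses the two regular expressions from the diff in a standalone function so the accepted formats can be checked in isolation. The function name `parse_repo_url` and the sample URLs are illustrative assumptions, not part of the commit.

```python
import re

# The same two patterns used by GitHubAdapter._parse_repo_url in the diff.
_SSH_RE = re.compile(r"git@github\.com:([^/]+)/([^/]+?)(?:\.git)?$")
_HTTPS_RE = re.compile(r"https?://github\.com/([^/]+)/([^/]+?)(?:\.git)?/?$")


def parse_repo_url(url: str) -> tuple[str, str]:
    """Return (owner, repo) for an SSH or HTTPS GitHub clone URL."""
    for pattern in (_SSH_RE, _HTTPS_RE):
        m = pattern.match(url)
        if m:
            return m.group(1), m.group(2)
    raise ValueError(f"Cannot parse GitHub owner/repo from URL: {url!r}")


if __name__ == "__main__":
    # Hypothetical sample URLs.
    for sample in (
        "git@github.com:org/repo.git",
        "https://github.com/org/repo",
        "https://github.com/org/repo.git/",
    ):
        print(sample, "->", parse_repo_url(sample))
```

Both patterns anchor on `github.com`, so URLs for other hosts (for example a self-hosted Git server) raise `ValueError` rather than being parsed.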
2
agents
Submodule agents updated: 5c669c28e6...e7cef08365
0
cli/__init__.py
Normal file
576
cli/agency.py
Normal file
@@ -0,0 +1,576 @@
|
|||||||
|
"""
|
||||||
|
cli/agency.py
|
||||||
|
Command-line interface for the-agency pipeline.
|
||||||
|
|
||||||
|
Subcommands
|
||||||
|
-----------
|
||||||
|
run <config.yaml> Start a new run, print run_id.
|
||||||
|
watch <run_id> Tail live blackboard events.
|
||||||
|
inspect <run_id> [--tier T] [--brief B] Show run tree / artifact detail.
|
||||||
|
approve <run_id> [--note "..."] Approve current inspection gate.
|
||||||
|
reject <run_id> --reason "..." Reject current gate (re-invoke tier).
|
||||||
|
pause <run_id> Force-pause at next tier boundary.
|
||||||
|
resume <run_id> Release a manual pause.
|
||||||
|
|
||||||
|
Gate approval UX
|
||||||
|
----------------
|
||||||
|
`agency approve <run_id>` writes a gate_approved event directly to the
|
||||||
|
blackboard. The runner only polls the blackboard — it does not care how
|
||||||
|
the event got there. This makes approval work on any platform that has
|
||||||
|
filesystem access to the runs/ directory.
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
import time
|
||||||
|
from datetime import datetime, timezone
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Blackboard import (optional — degrade gracefully if core not on sys.path)
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
try:
|
||||||
|
from core.blackboard import Blackboard
|
||||||
|
_HAS_BLACKBOARD = True
|
||||||
|
except ImportError:
|
||||||
|
_HAS_BLACKBOARD = False
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# ANSI colours (degraded to no-op if not a TTY)
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
_IS_TTY = sys.stdout.isatty()
|
||||||
|
|
||||||
|
|
||||||
|
def _c(code: str, text: str) -> str:
|
||||||
|
if not _IS_TTY:
|
||||||
|
return text
|
||||||
|
return f"\033[{code}m{text}\033[0m"
|
||||||
|
|
||||||
|
|
||||||
|
def _bold(t: str) -> str:
|
||||||
|
return _c("1", t)
|
||||||
|
|
||||||
|
|
||||||
|
def _dim(t: str) -> str:
|
||||||
|
return _c("2", t)
|
||||||
|
|
||||||
|
|
||||||
|
def _green(t: str) -> str:
|
||||||
|
return _c("32", t)
|
||||||
|
|
||||||
|
|
||||||
|
def _yellow(t: str) -> str:
|
||||||
|
return _c("33", t)
|
||||||
|
|
||||||
|
|
||||||
|
def _red(t: str) -> str:
|
||||||
|
return _c("31", t)
|
||||||
|
|
||||||
|
|
||||||
|
def _cyan(t: str) -> str:
|
||||||
|
return _c("36", t)
|
||||||
|
|
||||||
|
|
||||||
|
def _magenta(t: str) -> str:
|
||||||
|
return _c("35", t)
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Helpers
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
def _now_iso() -> str:
|
||||||
|
return datetime.now(timezone.utc).isoformat()
|
||||||
|
|
||||||
|
|
||||||
|
def _require_blackboard(run_id: str) -> "Blackboard":
|
||||||
|
if not _HAS_BLACKBOARD:
|
||||||
|
_die("Could not import core.blackboard. Make sure you are running from the project root.")
|
||||||
|
db_path = Path("runs") / run_id / "blackboard.db"
|
||||||
|
if not db_path.exists():
|
||||||
|
_die(f"No blackboard found for run_id={run_id!r}. Expected: {db_path}")
|
||||||
|
return Blackboard(run_id)
|
||||||
|
|
||||||
|
|
||||||
|
def _die(msg: str) -> None:
|
||||||
|
print(_red(f"Error: {msg}"), file=sys.stderr)
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
|
||||||
|
def _fmt_ts(iso: Optional[str]) -> str:
|
||||||
|
if not iso:
|
||||||
|
return ""
|
||||||
|
try:
|
||||||
|
dt = datetime.fromisoformat(iso)
|
||||||
|
return dt.strftime("%H:%M:%S")
|
||||||
|
except ValueError:
|
||||||
|
return iso[:19]
|
||||||
|
|
||||||
|
|
||||||
|
def _parse_detail(raw: Optional[str]) -> dict:
|
||||||
|
if not raw:
|
||||||
|
return {}
|
||||||
|
try:
|
||||||
|
return json.loads(raw)
|
||||||
|
except (json.JSONDecodeError, TypeError):
|
||||||
|
return {"raw": raw}
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Event rendering
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
_KIND_SYMBOLS: dict[str, str] = {
|
||||||
|
"spawned": "→",
|
||||||
|
"completed": "✓",
|
||||||
|
"failed": "✗",
|
||||||
|
"escalated": "↑",
|
||||||
|
"retried": "↺",
|
||||||
|
"gate_pending": "⏸",
|
||||||
|
"gate_approved": "✓",
|
||||||
|
"gate_rejected": "✗",
|
||||||
|
"gate_paused": "⏸",
|
||||||
|
"gate_resumed": "▶",
|
||||||
|
"path_amendment": "~",
|
||||||
|
"log": " ",
|
||||||
|
}
|
||||||
|
|
||||||
|
_KIND_COLOUR: dict[str, str] = {
|
||||||
|
"spawned": "36", # cyan
|
||||||
|
"completed": "32", # green
|
||||||
|
"failed": "31", # red
|
||||||
|
"escalated": "33", # yellow
|
||||||
|
"retried": "33", # yellow
|
||||||
|
"gate_pending": "35", # magenta
|
||||||
|
"gate_approved": "32", # green
|
||||||
|
"gate_rejected": "31", # red
|
||||||
|
"gate_paused": "35", # magenta
|
||||||
|
"gate_resumed": "32", # green
|
||||||
|
"path_amendment": "33", # yellow
|
||||||
|
"log": "0", # default
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def _render_event(ev: dict, run_id: str) -> str:
|
||||||
|
kind = ev.get("kind", "")
|
||||||
|
ts = _fmt_ts(ev.get("created_at"))
|
||||||
|
detail = _parse_detail(ev.get("detail"))
|
||||||
|
sym = _KIND_SYMBOLS.get(kind, "·")
|
||||||
|
col = _KIND_COLOUR.get(kind, "0")
|
||||||
|
kind_str = _c(col, f"{sym} {kind:<18}")
|
||||||
|
|
||||||
|
# Build a short message from detail
|
||||||
|
msg_parts: list[str] = []
|
||||||
|
|
||||||
|
if kind == "log":
|
||||||
|
level = detail.get("level", "info")
|
||||||
|
message = detail.get("message", "")
|
||||||
|
level_col = "33" if level == "warning" else ("31" if level == "error" else "0")
|
||||||
|
msg_parts.append(_c(level_col, message))
|
||||||
|
elif kind in ("gate_pending", "gate_approved", "gate_rejected"):
|
||||||
|
gate = detail.get("gate", "")
|
||||||
|
summary = detail.get("summary", "")
|
||||||
|
reason = detail.get("reason", "")
|
||||||
|
if gate:
|
||||||
|
msg_parts.append(_bold(f"[{gate}]"))
|
||||||
|
if summary:
|
||||||
|
msg_parts.append(summary)
|
||||||
|
if reason:
|
||||||
|
msg_parts.append(_dim(f"({reason})"))
|
||||||
|
elif kind in ("spawned", "completed", "failed", "escalated", "retried"):
|
||||||
|
tier = detail.get("tier")
|
||||||
|
role = detail.get("role", "")
|
||||||
|
ws = detail.get("workstream", "")
|
||||||
|
task_id = detail.get("task_id", "")
|
||||||
|
reason = detail.get("reason", detail.get("error", ""))
|
||||||
|
if tier:
|
||||||
|
msg_parts.append(_bold(f"T{tier}"))
|
||||||
|
if role:
|
||||||
|
msg_parts.append(role)
|
||||||
|
if ws:
|
||||||
|
msg_parts.append(_dim(f"ws={ws}"))
|
||||||
|
if task_id:
|
||||||
|
msg_parts.append(_dim(f"task={task_id}"))
|
||||||
|
if reason:
|
||||||
|
msg_parts.append(_dim(f"— {reason[:80]}"))
|
||||||
|
elif kind == "path_amendment":
|
||||||
|
proposed_by = detail.get("proposed_by", "")
|
||||||
|
reason = detail.get("reason", "")
|
||||||
|
msg_parts.append(f"{proposed_by}: {reason}")
|
||||||
|
else:
|
||||||
|
for k, v in list(detail.items())[:3]:
|
||||||
|
msg_parts.append(f"{k}={v!r}")
|
||||||
|
|
||||||
|
msg = " ".join(msg_parts)
|
||||||
|
run_prefix = _dim(f"[{run_id}]")
|
||||||
|
ts_str = _dim(ts)
|
||||||
|
return f"{run_prefix} {ts_str} {kind_str} {msg}"
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Subcommand: run
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
def cmd_run(args: argparse.Namespace) -> None:
|
||||||
|
"""Start a new pipeline run."""
|
||||||
|
config_path = args.config
|
||||||
|
if not os.path.exists(config_path):
|
||||||
|
_die(f"Config file not found: {config_path}")
|
||||||
|
|
||||||
|
# Import here to keep startup fast for non-run commands
|
||||||
|
try:
|
||||||
|
from core.team_runner import TeamRunner
|
||||||
|
except ImportError as exc:
|
||||||
|
_die(f"Could not import core.team_runner: {exc}")
|
||||||
|
|
||||||
|
dry = getattr(args, "dry_run", False)
|
||||||
|
runner = TeamRunner(config_path=config_path, dry_run=dry)
|
||||||
|
print(f"Starting run {_bold(runner.run_id)} …")
|
||||||
|
print(_dim(f" Watch: agency watch {runner.run_id}"))
|
||||||
|
print(_dim(f" Inspect: agency inspect {runner.run_id}"))
|
||||||
|
|
||||||
|
try:
|
||||||
|
runner.run()
|
||||||
|
print(_green(f"Run {runner.run_id} complete."))
|
||||||
|
except KeyboardInterrupt:
|
||||||
|
print(_yellow(f"\nRun {runner.run_id} interrupted."))
|
||||||
|
sys.exit(1)
|
||||||
|
except Exception as exc:
|
||||||
|
print(_red(f"Run {runner.run_id} failed: {exc}"))
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Subcommand: watch
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
def cmd_watch(args: argparse.Namespace) -> None:
|
||||||
|
"""Tail live blackboard events for a run."""
|
||||||
|
bb = _require_blackboard(args.run_id)
|
||||||
|
run_id = args.run_id
|
||||||
|
poll = getattr(args, "poll", 2.0)
|
||||||
|
|
||||||
|
print(_bold(f"Watching run {run_id} …"), _dim("(Ctrl-C to stop)"))
|
||||||
|
|
||||||
|
seen_ids: set[str] = set()
|
||||||
|
try:
|
||||||
|
while True:
|
||||||
|
events = bb.get_all_events(limit=1000)
|
||||||
|
for ev in events:
|
||||||
|
eid = ev.get("event_id", "")
|
||||||
|
if eid in seen_ids:
|
||||||
|
continue
|
||||||
|
seen_ids.add(eid)
|
||||||
|
print(_render_event(ev, run_id))
|
||||||
|
sys.stdout.flush()
|
||||||
|
|
||||||
|
# Check if run is done
|
||||||
|
summary = bb.get_run_summary()
|
||||||
|
run_status = summary.get("status", "")
|
||||||
|
if run_status in ("done", "review", "failed"):
|
||||||
|
print()
|
||||||
|
if run_status in ("done", "review"):
|
||||||
|
print(_green(f"Run {run_id} complete — status: {run_status}"))
|
||||||
|
elif run_status == "failed":
|
||||||
|
print(_red(f"Run {run_id} failed"))
|
||||||
|
else:
|
||||||
|
print(_bold(f"Run {run_id} status: {run_status}"))
|
||||||
|
break
|
||||||
|
|
||||||
|
time.sleep(poll)
|
||||||
|
except KeyboardInterrupt:
|
||||||
|
print(_dim("\nStopped watching."))
|
||||||
|
finally:
|
||||||
|
bb.close()
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Subcommand: inspect
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
def cmd_inspect(args: argparse.Namespace) -> None:
|
||||||
|
"""Show a live tree of run state."""
|
||||||
|
bb = _require_blackboard(args.run_id)
|
||||||
|
run_id = args.run_id
|
||||||
|
tier_filter: Optional[int] = getattr(args, "tier", None)
|
||||||
|
brief_filter: Optional[str] = getattr(args, "brief", None)
|
||||||
|
|
||||||
|
try:
|
||||||
|
summary = bb.get_run_summary()
|
||||||
|
if "error" in summary:
|
||||||
|
_die(summary["error"])
|
||||||
|
|
||||||
|
if brief_filter:
|
||||||
|
_inspect_brief(bb, run_id, brief_filter)
|
||||||
|
return
|
||||||
|
|
||||||
|
if tier_filter is not None:
|
||||||
|
_inspect_tier(bb, run_id, tier_filter)
|
||||||
|
return
|
||||||
|
|
||||||
|
_inspect_run_tree(bb, run_id, summary)
|
||||||
|
finally:
|
||||||
|
bb.close()
|
||||||
|
|
||||||
|
|
||||||
|
def _inspect_run_tree(bb: "Blackboard", run_id: str, summary: dict) -> None:
|
||||||
|
status = summary.get("status", "?")
|
||||||
|
status_str = (
|
||||||
|
_green(status) if status in ("done", "review")
|
||||||
|
else _red(status) if status == "failed"
|
||||||
|
else _yellow(status)
|
||||||
|
)
|
||||||
|
print(f"\nRun {_bold(run_id)} [{status_str}]")
|
||||||
|
print(_dim(f" Goal: {summary.get('goal', '')}"))
|
||||||
|
print()
|
||||||
|
|
||||||
|
workstreams = bb.get_workstreams()
|
||||||
|
if not workstreams:
|
||||||
|
print(_dim(" No workstreams yet."))
|
||||||
|
else:
|
||||||
|
for ws in workstreams:
|
||||||
|
ws_status = ws.get("status", "?")
|
||||||
|
ws_col = "32" if ws_status == "done" else ("31" if ws_status == "failed" else "33")
|
||||||
|
ws_line = f" ├── {ws.get('name', ws.get('workstream_id'))} [{_c(ws_col, ws_status)}]"
|
||||||
|
print(ws_line)
|
||||||
|
|
||||||
|
briefs = bb.get_briefs(workstream_id=ws["workstream_id"])
|
||||||
|
for b in briefs:
|
||||||
|
b_status = b.get("status", "?")
|
||||||
|
b_col = "32" if b_status == "done" else ("31" if b_status == "failed" else "0")
|
||||||
|
print(
|
||||||
|
f" │ ├── T{b.get('tier')} {b.get('role')} "
|
||||||
|
f"[{_c(b_col, b_status)}] "
|
||||||
|
f"retries={b.get('retry_count', 0)} "
|
||||||
|
f"{_dim(b.get('brief_id', '')[:8])}"
|
||||||
|
)
|
||||||
|
|
||||||
|
print()
|
||||||
|
# Summary counts
|
||||||
|
briefs_summary = summary.get("briefs", {})
|
||||||
|
events_summary = summary.get("events", {})
|
||||||
|
print(
|
||||||
|
_dim(
|
||||||
|
f" Briefs: {briefs_summary} "
|
||||||
|
f"Events: {events_summary}"
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _inspect_tier(bb: "Blackboard", run_id: str, tier: int) -> None:
|
||||||
|
briefs = bb.get_briefs(tier=tier)
|
||||||
|
print(f"\nRun {_bold(run_id)} — T{tier} briefs ({len(briefs)})\n")
|
||||||
|
for b in briefs:
|
||||||
|
status = b.get("status", "?")
|
||||||
|
col = "32" if status == "done" else ("31" if status == "failed" else "0")
|
||||||
|
print(
|
||||||
|
f" {_dim(b.get('brief_id', '')[:8])} "
|
||||||
|
f"{b.get('role', ''):<22} [{_c(col, status)}] "
|
||||||
|
f"retries={b.get('retry_count', 0)}"
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _inspect_brief(bb: "Blackboard", run_id: str, brief_id: str) -> None:
|
||||||
|
briefs = bb.get_briefs()
|
||||||
|
match = next(
|
||||||
|
(b for b in briefs if b.get("brief_id", "").startswith(brief_id)),
|
||||||
|
None,
|
||||||
|
)
|
||||||
|
if not match:
|
||||||
|
_die(f"Brief {brief_id!r} not found in run {run_id}")
|
||||||
|
|
||||||
|
print(f"\nBrief {_bold(match['brief_id'])}")
|
||||||
|
print(f" Tier: T{match.get('tier')}")
|
||||||
|
print(f" Role: {match.get('role')}")
|
||||||
|
print(f" Status: {match.get('status')}")
|
||||||
|
print(f" Retries: {match.get('retry_count', 0)}")
|
||||||
|
print()
|
||||||
|
|
||||||
|
payload_raw = match.get("payload")
|
||||||
|
if payload_raw:
|
||||||
|
try:
|
||||||
|
payload = json.loads(payload_raw)
|
||||||
|
print(_bold("Payload (brief):"))
|
||||||
|
print(json.dumps(payload, indent=2))
|
||||||
|
except (json.JSONDecodeError, TypeError):
|
||||||
|
print(payload_raw)
|
||||||
|
print()
|
||||||
|
|
||||||
|
result_raw = match.get("result")
|
||||||
|
if result_raw:
|
||||||
|
try:
|
||||||
|
result = json.loads(result_raw)
|
||||||
|
print(_bold("Result:"))
|
||||||
|
print(json.dumps(result, indent=2))
|
||||||
|
except (json.JSONDecodeError, TypeError):
|
||||||
|
print(result_raw)
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Subcommand: approve
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
def cmd_approve(args: argparse.Namespace) -> None:
|
||||||
|
"""Approve the current inspection gate, writing gate_approved to the blackboard."""
|
||||||
|
bb = _require_blackboard(args.run_id)
|
||||||
|
note = getattr(args, "note", None) or ""
|
||||||
|
try:
|
||||||
|
bb.log_event(
|
||||||
|
"gate_approved",
|
||||||
|
detail={"approved_by": "cli", "note": note, "timestamp": _now_iso()},
|
||||||
|
)
|
||||||
|
print(_green(f"Gate approved for run {args.run_id}."))
|
||||||
|
if note:
|
||||||
|
print(_dim(f" Note: {note}"))
|
||||||
|
finally:
|
||||||
|
bb.close()
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Subcommand: reject
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
def cmd_reject(args: argparse.Namespace) -> None:
|
||||||
|
"""Reject the current inspection gate, writing gate_rejected to the blackboard."""
|
||||||
|
bb = _require_blackboard(args.run_id)
|
||||||
|
reason = getattr(args, "reason", None) or "rejected via CLI"
|
||||||
|
try:
|
||||||
|
bb.log_event(
|
||||||
|
"gate_rejected",
|
||||||
|
detail={"rejected_by": "cli", "reason": reason, "timestamp": _now_iso()},
|
||||||
|
)
|
||||||
|
print(_yellow(f"Gate rejected for run {args.run_id}."))
|
||||||
|
print(_dim(f" Reason: {reason}"))
|
||||||
|
finally:
|
||||||
|
bb.close()
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Subcommand: pause
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
def cmd_pause(args: argparse.Namespace) -> None:
|
||||||
|
"""Force-pause the run at the next tier boundary."""
|
||||||
|
bb = _require_blackboard(args.run_id)
|
||||||
|
try:
|
||||||
|
bb.log_event(
|
||||||
|
"gate_paused",
|
||||||
|
detail={"paused_by": "cli", "timestamp": _now_iso()},
|
||||||
|
)
|
||||||
|
print(_yellow(f"Pause signal written for run {args.run_id}."))
|
||||||
|
print(_dim(" Run will pause at the next tier boundary."))
|
||||||
|
print(_dim(f" To resume: agency resume {args.run_id}"))
|
||||||
|
finally:
|
||||||
|
bb.close()
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Subcommand: resume
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
def cmd_resume(args: argparse.Namespace) -> None:
|
||||||
|
"""Release a manual pause."""
|
||||||
|
bb = _require_blackboard(args.run_id)
|
||||||
|
try:
|
||||||
|
bb.log_event(
|
||||||
|
"gate_resumed",
|
||||||
|
detail={"resumed_by": "cli", "timestamp": _now_iso()},
|
||||||
|
)
|
||||||
|
print(_green(f"Resume signal written for run {args.run_id}."))
|
||||||
|
finally:
|
||||||
|
bb.close()
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Argument parser
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
def build_parser() -> argparse.ArgumentParser:
|
||||||
|
parser = argparse.ArgumentParser(
|
||||||
|
prog="agency",
|
||||||
|
description="the-agency pipeline CLI",
|
||||||
|
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||||
|
epilog="""
|
||||||
|
Examples:
|
||||||
|
agency run config/team.yaml
|
||||||
|
agency watch abc12345
|
||||||
|
agency inspect abc12345
|
||||||
|
agency inspect abc12345 --tier 2
|
||||||
|
agency inspect abc12345 --brief a1b2c3d4
|
||||||
|
agency approve abc12345
|
||||||
|
agency approve abc12345 --note "looks good"
|
||||||
|
agency reject abc12345 --reason "T2 missed the caching layer"
|
||||||
|
agency pause abc12345
|
||||||
|
agency resume abc12345
|
||||||
|
""",
|
||||||
|
)
|
||||||
|
sub = parser.add_subparsers(dest="command", metavar="<command>")
|
||||||
|
sub.required = True
|
||||||
|
|
||||||
|
# run
|
||||||
|
p_run = sub.add_parser("run", help="Start a new pipeline run")
|
||||||
|
p_run.add_argument("config", nargs="?", default="config/team.yaml",
|
||||||
|
help="Path to team.yaml (default: config/team.yaml)")
|
||||||
|
p_run.add_argument("--dry-run", action="store_true",
|
||||||
|
help="Log actions without spawning agents")
|
||||||
|
p_run.set_defaults(func=cmd_run)
|
||||||
|
|
||||||
|
# watch
|
||||||
|
p_watch = sub.add_parser("watch", help="Tail live blackboard events")
|
||||||
|
p_watch.add_argument("run_id", help="Run ID to watch")
|
||||||
|
p_watch.add_argument("--poll", type=float, default=2.0,
|
||||||
|
help="Poll interval in seconds (default: 2)")
|
||||||
|
p_watch.set_defaults(func=cmd_watch)
|
||||||
|
|
||||||
|
# inspect
|
||||||
|
p_inspect = sub.add_parser("inspect", help="Show run state tree")
|
||||||
|
p_inspect.add_argument("run_id", help="Run ID to inspect")
|
||||||
|
p_inspect.add_argument("--tier", type=int, default=None,
|
||||||
|
help="Filter to a specific tier (e.g. --tier 2)")
|
||||||
|
p_inspect.add_argument("--brief", default=None,
|
||||||
|
help="Show full brief+result for brief_id prefix")
|
||||||
|
p_inspect.set_defaults(func=cmd_inspect)
|
||||||
|
|
||||||
|
# approve
|
||||||
|
p_approve = sub.add_parser("approve", help="Approve current inspection gate")
|
||||||
|
p_approve.add_argument("run_id", help="Run ID")
|
||||||
|
p_approve.add_argument("--note", default="", help="Optional note written to blackboard")
|
||||||
|
p_approve.set_defaults(func=cmd_approve)
|
||||||
|
|
||||||
|
# reject
|
||||||
|
p_reject = sub.add_parser("reject", help="Reject current inspection gate")
|
||||||
|
p_reject.add_argument("run_id", help="Run ID")
|
||||||
|
p_reject.add_argument("--reason", default="rejected via CLI",
|
||||||
|
help="Reason for rejection (shown in blackboard + logs)")
|
||||||
|
p_reject.set_defaults(func=cmd_reject)
|
||||||
|
|
||||||
|
# pause
|
||||||
|
p_pause = sub.add_parser("pause", help="Force-pause at next tier boundary")
|
||||||
|
p_pause.add_argument("run_id", help="Run ID")
|
||||||
|
p_pause.set_defaults(func=cmd_pause)
|
||||||
|
|
||||||
|
# resume
|
||||||
|
p_resume = sub.add_parser("resume", help="Release a manual pause")
|
||||||
|
p_resume.add_argument("run_id", help="Run ID")
|
||||||
|
p_resume.set_defaults(func=cmd_resume)
|
||||||
|
|
||||||
|
return parser
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Entry point
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
def main(argv: Optional[list[str]] = None) -> None:
|
||||||
|
parser = build_parser()
|
||||||
|
args = parser.parse_args(argv)
|
||||||
|
args.func(args)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
@@ -2,33 +2,49 @@ t1:
|
|||||||
default: agents/strategy/nexus-strategy.md
|
default: agents/strategy/nexus-strategy.md
|
||||||
|
|
||||||
t2:
|
t2:
|
||||||
backend: agents/engineering/engineering-software-architect.md
|
backend: agents/engineering/engineering-backend-architect.md
|
||||||
frontend: agents/engineering/engineering-software-architect.md
|
frontend: agents/engineering/engineering-frontend-architect.md
|
||||||
infra: agents/engineering/engineering-devops-automator.md
|
infra: agents/engineering/engineering-devops-automator.md
|
||||||
data: agents/engineering/engineering-data-engineer.md
|
data: agents/engineering/engineering-data-engineer.md
|
||||||
|
ai: agents/engineering/engineering-software-architect.md
|
||||||
|
security: agents/engineering/engineering-security-engineer.md
|
||||||
|
mobile: agents/engineering/engineering-software-architect.md
|
||||||
default: agents/engineering/engineering-software-architect.md
|
default: agents/engineering/engineering-software-architect.md
|
||||||
|
|
||||||
t3:
|
t3:
|
||||||
backend: agents/engineering/engineering-senior-developer.md
|
backend: agents/engineering/engineering-senior-backend-developer.md
|
||||||
frontend: agents/engineering/engineering-senior-developer.md
|
frontend: agents/engineering/engineering-senior-frontend-developer.md
|
||||||
infra: agents/engineering/engineering-sre.md
|
infra: agents/engineering/engineering-sre.md
|
||||||
default: agents/engineering/engineering-senior-developer.md
|
data: agents/engineering/engineering-data-engineer.md
|
||||||
|
ai: agents/engineering/engineering-ai-engineer.md
|
||||||
|
security: agents/engineering/engineering-security-engineer.md
|
||||||
|
mobile: agents/engineering/engineering-mobile-app-builder.md
|
||||||
|
database: agents/engineering/engineering-database-optimizer.md
|
||||||
|
devops: agents/engineering/engineering-sre.md
|
||||||
|
docs: agents/engineering/engineering-technical-writer.md
|
||||||
|
default: agents/engineering/engineering-backend-developer.md
|
||||||
|
|
||||||
t4:
|
t4:
|
||||||
frontend: agents/engineering/engineering-frontend-developer.md
|
frontend: agents/engineering/engineering-frontend-developer.md
|
||||||
backend: agents/engineering/engineering-backend-architect.md
|
backend: agents/engineering/engineering-backend-developer.md
|
||||||
database: agents/engineering/engineering-database-optimizer.md
|
database: agents/engineering/engineering-database-optimizer.md
|
||||||
devops: agents/engineering/engineering-devops-automator.md
|
devops: agents/engineering/engineering-devops-automator.md
|
||||||
mobile: agents/engineering/engineering-mobile-app-builder.md
|
mobile: agents/engineering/engineering-mobile-app-builder.md
|
||||||
ai: agents/engineering/engineering-ai-engineer.md
|
ai: agents/engineering/engineering-ai-engineer.md
|
||||||
security: agents/engineering/engineering-security-engineer.md
|
security: agents/engineering/engineering-security-engineer.md
|
||||||
docs: agents/engineering/engineering-technical-writer.md
|
docs: agents/engineering/engineering-technical-writer.md
|
||||||
default: agents/engineering/engineering-senior-developer.md
|
data: agents/engineering/engineering-data-engineer.md
|
||||||
|
embedded: agents/engineering/engineering-embedded-firmware-engineer.md
|
||||||
|
default: agents/engineering/engineering-backend-developer.md
|
||||||
|
|
||||||
t5:
|
t5:
|
||||||
code: agents/engineering/engineering-code-reviewer.md
|
code: agents/engineering/engineering-code-reviewer.md
|
||||||
integration: agents/testing/testing-reality-checker.md
|
integration: agents/testing/testing-reality-checker.md
|
||||||
api: agents/testing/testing-api-tester.md
|
api: agents/testing/testing-api-tester.md
|
||||||
performance: agents/testing/testing-performance-benchmarker.md
|
performance: agents/testing/testing-performance-benchmarker.md
|
||||||
security: agents/engineering/engineering-security-engineer.md
|
security: agents/engineering/engineering-security-engineer.md
|
||||||
default: agents/engineering/engineering-code-reviewer.md
|
accessibility: agents/testing/testing-accessibility-auditor.md
|
||||||
|
e2e: agents/testing/testing-evidence-collector.md
|
||||||
|
frontend: agents/testing/testing-accessibility-auditor.md
|
||||||
|
data: agents/testing/testing-reality-checker.md
|
||||||
|
default: agents/engineering/engineering-code-reviewer.md
|
||||||
|
|||||||
@@ -85,18 +85,37 @@ CREATE TABLE IF NOT EXISTS events (
|
|||||||
event_id TEXT PRIMARY KEY,
|
event_id TEXT PRIMARY KEY,
|
||||||
run_id TEXT NOT NULL,
|
run_id TEXT NOT NULL,
|
||||||
brief_id TEXT, -- NULL for run-level events
|
brief_id TEXT, -- NULL for run-level events
|
||||||
kind TEXT NOT NULL, -- spawned|completed|failed|escalated|retried
|
kind TEXT NOT NULL, -- see _EVENT_KINDS
|
||||||
detail TEXT, -- JSON
|
detail TEXT, -- JSON
|
||||||
created_at TEXT NOT NULL,
|
created_at TEXT NOT NULL,
|
||||||
FOREIGN KEY (run_id) REFERENCES runs(run_id)
|
FOREIGN KEY (run_id) REFERENCES runs(run_id)
|
||||||
);
|
);
|
||||||
|
|
||||||
|
CREATE TABLE IF NOT EXISTS t3_task_lists (
|
||||||
|
entry_id TEXT PRIMARY KEY,
|
||||||
|
run_id TEXT NOT NULL,
|
||||||
|
workstream_id TEXT NOT NULL,
|
||||||
|
t3_agent_id TEXT NOT NULL,
|
||||||
|
status TEXT NOT NULL DEFAULT 'draft', -- draft|committed
|
||||||
|
tasks TEXT NOT NULL DEFAULT '[]', -- JSON array of T4 task descriptors
|
||||||
|
created_at TEXT NOT NULL,
|
||||||
|
updated_at TEXT NOT NULL,
|
||||||
|
FOREIGN KEY (run_id) REFERENCES runs(run_id)
|
||||||
|
);
|
||||||
"""
|
"""
|
||||||
|
|
||||||
# Valid status values per table — used for input validation.
|
# Valid status values per table — used for input validation.
|
||||||
_RUN_STATUSES = {"pending", "active", "review", "done", "failed"}
|
_RUN_STATUSES = {"pending", "active", "review", "done", "failed"}
|
||||||
_WS_STATUSES = {"pending", "active", "blocked", "done", "failed"}
|
_WS_STATUSES = {"pending", "active", "blocked", "done", "failed"}
|
||||||
_BRIEF_STATUSES = {"pending", "active", "done", "failed"}
|
_BRIEF_STATUSES = {"pending", "active", "done", "failed"}
|
||||||
_EVENT_KINDS = {"spawned", "completed", "failed", "escalated", "retried"}
|
_EVENT_KINDS = {
|
||||||
|
# Lifecycle
|
||||||
|
"spawned", "completed", "failed", "escalated", "retried",
|
||||||
|
# Visibility / gates
|
||||||
|
"gate_pending", "gate_approved", "gate_rejected", "gate_paused", "gate_resumed",
|
||||||
|
# Amendments / informational
|
||||||
|
"path_amendment", "log",
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
@@ -360,6 +379,194 @@ class Blackboard:
|
|||||||
# Cleanup
|
# Cleanup
|
||||||
# ------------------------------------------------------------------
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# Event queries
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
def get_events(
|
||||||
|
self,
|
||||||
|
kinds: Optional[list[str]] = None,
|
||||||
|
after_iso: Optional[str] = None,
|
||||||
|
brief_id: Optional[str] = None,
|
||||||
|
limit: int = 100,
|
||||||
|
) -> list[dict[str, Any]]:
|
||||||
|
"""
|
||||||
|
Query events for this run.
|
||||||
|
|
||||||
|
Parameters
|
||||||
|
----------
|
||||||
|
kinds : Filter by event kinds (OR). None = all kinds.
|
||||||
|
after_iso : Only return events created after this ISO-8601 timestamp.
|
||||||
|
brief_id : Filter by brief_id. None = all briefs.
|
||||||
|
limit : Maximum rows to return (most recent first).
|
||||||
|
"""
|
||||||
|
conditions = ["run_id = ?"]
|
||||||
|
params: list[Any] = [self.run_id]
|
||||||
|
|
||||||
|
if kinds:
|
||||||
|
placeholders = ",".join("?" * len(kinds))
|
||||||
|
conditions.append(f"kind IN ({placeholders})")
|
||||||
|
params.extend(kinds)
|
||||||
|
|
||||||
|
if after_iso:
|
||||||
|
conditions.append("created_at > ?")
|
||||||
|
params.append(after_iso)
|
||||||
|
|
||||||
|
if brief_id:
|
||||||
|
conditions.append("brief_id = ?")
|
||||||
|
params.append(brief_id)
|
||||||
|
|
||||||
|
where = " AND ".join(conditions)
|
||||||
|
rows = self._execute(
|
||||||
|
f"SELECT * FROM events WHERE {where} ORDER BY created_at DESC LIMIT ?",
|
||||||
|
(*params, limit),
|
||||||
|
).fetchall()
|
||||||
|
return [dict(r) for r in rows]
|
||||||
|
|
||||||
|
def get_latest_gate_event(
|
||||||
|
self, gate_name: str, after_iso: Optional[str] = None
|
||||||
|
) -> Optional[dict[str, Any]]:
|
||||||
|
"""
|
||||||
|
Return the most recent gate_approved or gate_rejected event for
|
||||||
|
*gate_name* written after *after_iso*.
|
||||||
|
|
||||||
|
The event detail JSON is expected to contain a ``"gate"`` field
|
||||||
|
matching *gate_name*. Falls back to returning any gate resolution
|
||||||
|
event if none carry an explicit gate field (for CLI-written events
|
||||||
|
that omit it).
|
||||||
|
"""
|
||||||
|
events = self.get_events(
|
||||||
|
kinds=["gate_approved", "gate_rejected"],
|
||||||
|
after_iso=after_iso,
|
||||||
|
limit=20,
|
||||||
|
)
|
||||||
|
# Events arrive newest-first: return the first one whose detail.gate matches gate_name or is absent
|
||||||
|
for ev in events:
|
||||||
|
try:
|
||||||
|
detail = json.loads(ev.get("detail") or "{}")
|
||||||
|
if detail.get("gate") == gate_name or not detail.get("gate"):
|
||||||
|
return ev
|
||||||
|
except (json.JSONDecodeError, TypeError):
|
||||||
|
return ev
|
||||||
|
return None
|
||||||
|
|
||||||
|
def get_all_events(self, limit: int = 500) -> list[dict[str, Any]]:
|
||||||
|
"""Return all events for this run, oldest first."""
|
||||||
|
rows = self._execute(
|
||||||
|
"SELECT * FROM events WHERE run_id=? ORDER BY created_at ASC LIMIT ?",
|
||||||
|
(self.run_id, limit),
|
||||||
|
).fetchall()
|
||||||
|
return [dict(r) for r in rows]
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# T3 task lists
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
def create_t3_draft(
|
||||||
|
self,
|
||||||
|
*,
|
||||||
|
workstream_id: str,
|
||||||
|
t3_agent_id: str,
|
||||||
|
) -> str:
|
||||||
|
"""Insert a draft t3_task_list entry. Returns entry_id."""
|
||||||
|
entry_id = _new_uuid()
|
||||||
|
now = _now_iso()
|
||||||
|
self._execute(
|
||||||
|
"INSERT OR IGNORE INTO t3_task_lists "
|
||||||
|
"(entry_id, run_id, workstream_id, t3_agent_id, status, tasks, created_at, updated_at) "
|
||||||
|
"VALUES (?, ?, ?, ?, 'draft', '[]', ?, ?)",
|
||||||
|
(entry_id, self.run_id, workstream_id, t3_agent_id, now, now),
|
||||||
|
commit=True,
|
||||||
|
)
|
||||||
|
return entry_id
|
||||||
|
|
||||||
|
def commit_t3_task_list(
|
||||||
|
self,
|
||||||
|
*,
|
||||||
|
workstream_id: str,
|
||||||
|
t3_agent_id: str,
|
||||||
|
tasks: list[Any],
|
||||||
|
) -> None:
|
||||||
|
"""Update a t3_task_list entry to committed with the final task list."""
|
||||||
|
now = _now_iso()
|
||||||
|
tasks_json = json.dumps(tasks)
|
||||||
|
self._execute(
|
||||||
|
"UPDATE t3_task_lists SET status='committed', tasks=?, updated_at=? "
|
||||||
|
"WHERE run_id=? AND workstream_id=? AND t3_agent_id=?",
|
||||||
|
(tasks_json, now, self.run_id, workstream_id, t3_agent_id),
|
||||||
|
commit=True,
|
||||||
|
)
|
||||||
|
|
||||||
|
def get_t3_task_lists(self, workstream_id: str) -> list[dict[str, Any]]:
|
||||||
|
"""Return all t3_task_list entries for a workstream."""
|
||||||
|
rows = self._execute(
|
||||||
|
"SELECT * FROM t3_task_lists WHERE run_id=? AND workstream_id=? ORDER BY created_at ASC",
|
||||||
|
(self.run_id, workstream_id),
|
||||||
|
).fetchall()
|
||||||
|
result = []
|
||||||
|
for r in rows:
|
||||||
|
d = dict(r)
|
||||||
|
try:
|
||||||
|
d["tasks"] = json.loads(d.get("tasks") or "[]")
|
||||||
|
except (json.JSONDecodeError, TypeError):
|
||||||
|
d["tasks"] = []
|
||||||
|
result.append(d)
|
||||||
|
return result
|
||||||
|
|
||||||
|
def all_t3_committed(self, workstream_id: str) -> bool:
|
||||||
|
"""Return True if all t3_task_list entries for the workstream are committed."""
|
||||||
|
rows = self._execute(
|
||||||
|
"SELECT status FROM t3_task_lists WHERE run_id=? AND workstream_id=?",
|
||||||
|
(self.run_id, workstream_id),
|
||||||
|
).fetchall()
|
||||||
|
if not rows:
|
||||||
|
return False
|
||||||
|
return all(r["status"] == "committed" for r in rows)
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# Briefs query
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
def get_briefs(
|
||||||
|
self,
|
||||||
|
*,
|
||||||
|
status: Optional[str] = None,
|
||||||
|
tier: Optional[int] = None,
|
||||||
|
workstream_id: Optional[str] = None,
|
||||||
|
) -> list[dict[str, Any]]:
|
||||||
|
"""Query briefs with optional filters."""
|
||||||
|
conditions = ["run_id = ?"]
|
||||||
|
params: list[Any] = [self.run_id]
|
||||||
|
|
||||||
|
if status:
|
||||||
|
conditions.append("status = ?")
|
||||||
|
params.append(status)
|
||||||
|
if tier is not None:
|
||||||
|
conditions.append("tier = ?")
|
||||||
|
params.append(tier)
|
||||||
|
if workstream_id:
|
||||||
|
conditions.append("workstream_id = ?")
|
||||||
|
params.append(workstream_id)
|
||||||
|
|
||||||
|
where = " AND ".join(conditions)
|
||||||
|
rows = self._execute(
|
||||||
|
f"SELECT * FROM briefs WHERE {where} ORDER BY created_at ASC",
|
||||||
|
tuple(params),
|
||||||
|
).fetchall()
|
||||||
|
return [dict(r) for r in rows]
|
||||||
|
|
||||||
|
def get_workstreams(self) -> list[dict[str, Any]]:
|
||||||
|
"""Return all workstreams for this run."""
|
||||||
|
rows = self._execute(
|
||||||
|
"SELECT * FROM workstreams WHERE run_id=? ORDER BY created_at ASC",
|
||||||
|
(self.run_id,),
|
||||||
|
).fetchall()
|
||||||
|
return [dict(r) for r in rows]
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# Cleanup
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
def close(self) -> None:
|
def close(self) -> None:
|
||||||
"""Close the database connection gracefully."""
|
"""Close the database connection gracefully."""
|
||||||
with self._lock:
|
with self._lock:
|
||||||
|
|||||||
1668
core/team_runner.py
File diff suppressed because it is too large
507
docs/buildspec.md
Normal file
@@ -0,0 +1,507 @@
|
|||||||
|
# Tiered Agent Team System — Build Spec
|
||||||
|
|
||||||
|
_Started: 2026-03-15. Last updated: 2026-03-30._
|
||||||
|
_See design.md for the design doc and decisions log._
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Language & Runtime
|
||||||
|
|
||||||
|
**Python 3.11+.** Reasons:
|
||||||
|
- Agent/AI tooling is Python-first
|
||||||
|
- Clean type hints + dataclasses for schemas
|
||||||
|
- Agents can read and modify their own orchestration code
|
||||||
|
- Runs anywhere — no Node, no OpenClaw dependency
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Repository
|
||||||
|
|
||||||
|
Standalone repo: `git@github.com:coding-with-hans-heinemann/the-agency.git`
|
||||||
|
|
||||||
|
Separate from the OpenClaw workspace. The OpenClaw workspace gets a thin integration layer that calls into this repo. The core is portable and runs without OpenClaw.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Directory Structure
|
||||||
|
|
||||||
|
```
|
||||||
|
agent-teams/
|
||||||
|
├── core/
|
||||||
|
│ ├── team_runner.py — run lifecycle, agent spawning
|
||||||
|
│ ├── blackboard.py — SQLite coordination state
|
||||||
|
│ ├── task_brief.py — schema + validation
|
||||||
|
│ └── escalation.py — retry logic, failure routing
|
||||||
|
│
|
||||||
|
├── adapters/
|
||||||
|
│ ├── base/
|
||||||
|
│ │ ├── llm.py — abstract LLM interface
|
||||||
|
│ │ ├── vcs.py — abstract VCS interface
|
||||||
|
│ │ ├── notify.py — abstract notification interface
|
||||||
|
│ │ └── runtime.py — abstract agent runtime interface
|
||||||
|
│ ├── llm/
|
||||||
|
│ │ ├── anthropic.py — Claude via direct Anthropic API
|
||||||
|
│ │ ├── openai.py — GPT / o-series
|
||||||
|
│ │ └── ollama.py — local models
|
||||||
|
│ ├── vcs/
|
||||||
|
│ │ └── github.py
|
||||||
|
│ ├── notify/
|
||||||
|
│ │ └── openclaw.py — messages Hans who notifies Andrew
|
||||||
|
│ └── runtime/
|
||||||
|
│ ├── openclaw.py — sessions_spawn (general purpose)
|
||||||
|
│ └── claude_code.py — coding agent runtime (file/git/exec tools)
|
||||||
|
│
|
||||||
|
├── agents/ — git submodule: msitarzewski/agency-agents
|
||||||
|
│ ├── engineering/
|
||||||
|
│ ├── testing/
|
||||||
|
│ ├── strategy/
|
||||||
|
│ └── ... — full agency-agents roster
|
||||||
|
│
|
||||||
|
├── prompts/
|
||||||
|
│ ├── t1_visionary.md — fallback if no agent_personality set
|
||||||
|
│ ├── t2_architect.md
|
||||||
|
│ ├── t3_squad_lead.md
|
||||||
|
│ ├── t4_implementer.md
|
||||||
|
│ └── t5_verifier.md
|
||||||
|
│
|
||||||
|
├── config/
|
||||||
|
│ ├── team.yaml — example run configuration
|
||||||
|
│ └── role_registry.yaml — maps (tier, domain) → agent personality file
|
||||||
|
│
|
||||||
|
├── cli/
|
||||||
|
│ └── agency.py — run, watch, inspect, approve, reject, pause, resume
|
||||||
|
│
|
||||||
|
├── runs/ — runtime state, one subdir per run_id
|
||||||
|
│ └── .gitkeep
|
||||||
|
│
|
||||||
|
└── README.md
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Blackboard
|
||||||
|
|
||||||
|
SQLite. One file per run at `runs/<run_id>/blackboard.db`.
|
||||||
|
|
||||||
|
### Tables
|
||||||
|
|
||||||
|
**runs**
|
||||||
|
```sql
|
||||||
|
CREATE TABLE runs (
|
||||||
|
run_id TEXT PRIMARY KEY,
|
||||||
|
goal TEXT NOT NULL,
|
||||||
|
status TEXT NOT NULL, -- pending | active | review | done | failed
|
||||||
|
created_at TEXT NOT NULL,
|
||||||
|
updated_at TEXT NOT NULL
|
||||||
|
);
|
||||||
|
```
|
||||||
|
|
||||||
|
**workstreams**
|
||||||
|
```sql
|
||||||
|
CREATE TABLE workstreams (
|
||||||
|
workstream_id TEXT PRIMARY KEY,
|
||||||
|
run_id TEXT NOT NULL,
|
||||||
|
name TEXT NOT NULL,
|
||||||
|
tier INTEGER NOT NULL,
|
||||||
|
status TEXT NOT NULL, -- pending | active | blocked | done | failed
|
||||||
|
owner_agent_id TEXT,
|
||||||
|
created_at TEXT NOT NULL,
|
||||||
|
updated_at TEXT NOT NULL
|
||||||
|
);
|
||||||
|
```
|
||||||
|
|
||||||
|
**briefs**
|
||||||
|
```sql
|
||||||
|
CREATE TABLE briefs (
|
||||||
|
brief_id TEXT PRIMARY KEY,
|
||||||
|
run_id TEXT NOT NULL,
|
||||||
|
parent_brief_id TEXT,
|
||||||
|
workstream_id TEXT,
|
||||||
|
tier INTEGER NOT NULL,
|
||||||
|
role TEXT NOT NULL,
|
||||||
|
status TEXT NOT NULL, -- pending | active | done | failed
|
||||||
|
payload TEXT NOT NULL, -- full JSON brief
|
||||||
|
result TEXT, -- JSON result when done
|
||||||
|
retry_count INTEGER DEFAULT 0,
|
||||||
|
created_at TEXT NOT NULL,
|
||||||
|
updated_at TEXT NOT NULL
|
||||||
|
);
|
||||||
|
```
|
||||||
|
|
||||||
|
**events**
|
||||||
|
```sql
|
||||||
|
CREATE TABLE events (
|
||||||
|
event_id TEXT PRIMARY KEY,
|
||||||
|
run_id TEXT NOT NULL,
|
||||||
|
brief_id TEXT,
|
||||||
|
kind TEXT NOT NULL, -- see event vocabulary below
|
||||||
|
detail TEXT, -- JSON
|
||||||
|
created_at TEXT NOT NULL
|
||||||
|
);
|
||||||
|
```
|
||||||
|
|
||||||
|
**Event kind vocabulary:**
|
||||||
|
```
|
||||||
|
-- lifecycle
|
||||||
|
spawned | completed | failed | escalated | retried
|
||||||
|
|
||||||
|
-- visibility / gates
|
||||||
|
gate_pending -- runner hit an inspection gate, waiting for human
|
||||||
|
gate_approved -- human approved via CLI or notify
|
||||||
|
gate_rejected -- human rejected, tier re-invoked
|
||||||
|
gate_paused -- manual pause via CLI
|
||||||
|
gate_resumed -- manual resume via CLI
|
||||||
|
|
||||||
|
-- amendments / informational
|
||||||
|
path_amendment -- mid-run tier proposed a tier path change
|
||||||
|
log -- human-readable log line (detail: {level, message})
|
||||||
|
```
|
||||||
|
|
||||||
|
**t3_task_lists** *(T3 mesh coordination)*
|
||||||
|
```sql
|
||||||
|
CREATE TABLE t3_task_lists (
|
||||||
|
entry_id TEXT PRIMARY KEY,
|
||||||
|
run_id TEXT NOT NULL,
|
||||||
|
workstream_id TEXT NOT NULL,
|
||||||
|
t3_agent_id TEXT NOT NULL,
|
||||||
|
status TEXT NOT NULL, -- draft | committed
|
||||||
|
tasks TEXT NOT NULL, -- JSON array of proposed T4 task descriptors
|
||||||
|
created_at TEXT NOT NULL,
|
||||||
|
updated_at TEXT NOT NULL
|
||||||
|
);
|
||||||
|
```
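
As a rough illustration of how the `events` table and its kind vocabulary fit together, the sketch below writes two gate events to an in-memory SQLite database and reads them back. The helper name `log_event` mirrors the call seen in the CLI, but its signature here is an assumption, as is the `paused_by` detail field. It is not the actual `Blackboard` implementation.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# Kind vocabulary copied from the list in this section.
EVENT_KINDS = {
    "spawned", "completed", "failed", "escalated", "retried",
    "gate_pending", "gate_approved", "gate_rejected", "gate_paused", "gate_resumed",
    "path_amendment", "log",
}

EVENTS_DDL = """
CREATE TABLE IF NOT EXISTS events (
    event_id   TEXT PRIMARY KEY,
    run_id     TEXT NOT NULL,
    brief_id   TEXT,
    kind       TEXT NOT NULL,
    detail     TEXT,
    created_at TEXT NOT NULL
);
"""


def log_event(conn, run_id, kind, detail=None, brief_id=None):
    """Insert one event row after validating its kind; returns the event_id.

    Hypothetical helper: the real Blackboard.log_event may differ.
    """
    if kind not in EVENT_KINDS:
        raise ValueError(f"unknown event kind: {kind!r}")
    event_id = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO events (event_id, run_id, brief_id, kind, detail, created_at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (event_id, run_id, brief_id, kind, json.dumps(detail or {}),
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return event_id


# A real run would open runs/<run_id>/blackboard.db instead of ":memory:".
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript(EVENTS_DDL)

log_event(conn, "abc12345", "gate_paused", {"paused_by": "cli"})
log_event(conn, "abc12345", "gate_resumed", {"resumed_by": "cli"})

# rowid reflects insertion order, so this lists events oldest first.
kinds = [row["kind"] for row in conn.execute(
    "SELECT kind FROM events WHERE run_id = ? ORDER BY rowid", ("abc12345",))]
print(kinds)
```

Validating `kind` in Python rather than with a SQL `CHECK` constraint keeps the vocabulary extensible without a schema migration, which appears to be the approach the `_EVENT_KINDS` set takes.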
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Task Brief Schema
|
||||||
|
|
||||||
|
Every brief passed between tiers is a validated JSON object. `goal_anchor` is immutable — set by T1, copied verbatim into every downstream brief.
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"brief_id": "uuid",
|
||||||
|
"run_id": "uuid",
|
||||||
|
"parent_brief_id": "uuid | null",
|
||||||
|
"tier": 4,
|
||||||
|
"role": "implementer",
|
||||||
|
"goal_anchor": "Original T1 intent — always propagated unchanged",
|
||||||
|
"workstream": "backend-api",
|
||||||
|
"task": "Implement POST /webhooks/ingest endpoint",
|
||||||
|
"acceptance_criteria": [
|
||||||
|
"Accepts JSON payload",
|
||||||
|
"Returns 202 on success",
|
||||||
|
"Writes to queue"
|
||||||
|
],
|
||||||
|
"constraints": [
|
||||||
|
"Use existing queue client in src/queue.py",
|
||||||
|
"No new dependencies"
|
||||||
|
],
|
||||||
|
"context": {
|
||||||
|
"relevant_files": ["src/routes/webhooks.py", "src/queue.py"],
|
||||||
|
"interface_contract": "..."
|
||||||
|
},
|
||||||
|
"retry_budget": 3,
|
||||||
|
"retry_count": 0,
|
||||||
|
"preferred_runtime": "coding_agent",
|
||||||
|
"agent_personality": "agents/engineering/engineering-code-reviewer.md",
|
||||||
|
"created_at": "ISO-8601"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
`preferred_runtime` is optional. T3 sets it to `"coding_agent"` when spawning T4/T5 for implementation or verification tasks. The runner falls back to `"standard"` if the coding agent runtime is not configured.
|
||||||
|
|
||||||
|
`agent_personality` is optional. When set, the runtime adapter reads the file and injects its contents as the system prompt at spawn time. When it is not set, the adapter falls back to the generic tier prompt in `prompts/`.
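
One way to keep `goal_anchor` immutable is to model the brief as a frozen dataclass and derive child briefs through a helper that always copies the parent's anchor. The sketch below is a minimal illustration under that assumption. `TaskBrief` covers only a subset of the schema fields, and `child_brief` is a hypothetical helper, not necessarily what `core/task_brief.py` does.

```python
import uuid
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Optional


@dataclass(frozen=True)
class TaskBrief:
    """Subset of the brief schema; frozen so fields cannot be reassigned."""
    run_id: str
    tier: int
    role: str
    goal_anchor: str
    task: str
    parent_brief_id: Optional[str] = None
    retry_budget: int = 3
    retry_count: int = 0
    preferred_runtime: Optional[str] = None
    agent_personality: Optional[str] = None
    brief_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def child_brief(parent: TaskBrief, *, tier: int, role: str, task: str,
                **optional) -> TaskBrief:
    """Build a downstream brief; goal_anchor is always copied from the parent.

    Passing goal_anchor in **optional raises TypeError (duplicate keyword),
    so callers cannot override the anchor by accident.
    """
    return TaskBrief(
        run_id=parent.run_id,
        tier=tier,
        role=role,
        goal_anchor=parent.goal_anchor,
        task=task,
        parent_brief_id=parent.brief_id,
        **optional,
    )


t1 = TaskBrief(
    run_id="abc12345", tier=1, role="visionary",
    goal_anchor="Build webhook ingestion system with retry logic and DLQ",
    task="Decompose the goal into workstreams",
)
t4 = child_brief(
    t1, tier=4, role="implementer",
    task="Implement POST /webhooks/ingest endpoint",
    preferred_runtime="coding_agent",
)
```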
|
||||||
|
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Adapter Interfaces
|
||||||
|
|
||||||
|
### LLM (`adapters/base/llm.py`)
|
||||||
|
```python
|
||||||
|
class LLMAdapter:
|
||||||
|
def complete(self, prompt: str, capability: str, context: dict) -> str
|
||||||
|
def resolve_model(self, capability: str) -> str
|
||||||
|
# capability: "reasoning-heavy" | "capable" | "fast-cheap"
|
||||||
|
```
|
||||||
|
|
||||||
|
### VCS (`adapters/base/vcs.py`)
|
||||||
|
```python
|
||||||
|
class VCSAdapter:
|
||||||
|
def create_branch(self, name: str) -> None
|
||||||
|
def commit(self, files: list[str], message: str) -> str # returns commit sha
|
||||||
|
def create_pr(self, title: str, body: str, head: str, base: str) -> str # returns pr url
|
||||||
|
def get_pr_status(self, pr_id: str) -> str # open | merged | closed
|
||||||
|
```
|
||||||
|
|
||||||
|
### Notify (`adapters/base/notify.py`)
|
||||||
|
```python
|
||||||
|
class NotifyAdapter:
|
||||||
|
def send(self, message: str, context: dict) -> None
|
||||||
|
```
|
||||||
|
|
||||||
|
### Runtime (`adapters/base/runtime.py`)
|
||||||
|
```python
|
||||||
|
class RuntimeAdapter:
|
||||||
|
def spawn(self, task: str, capability: str, context: dict) -> str # returns agent_id
|
||||||
|
def get_result(self, agent_id: str, timeout_s: int) -> dict
|
||||||
|
def kill(self, agent_id: str) -> None
|
||||||
|
|
||||||
|
# Two implementations:
|
||||||
|
# openclaw.py — general purpose, uses sessions_spawn, suits T1/T2/T3
|
||||||
|
# claude_code.py — coding-specialized, has file/git/exec tools, suits T4/T5
|
||||||
|
#
|
||||||
|
# The runner selects runtime based on brief.preferred_runtime:
|
||||||
|
# "standard" → openclaw.py (default)
|
||||||
|
# "coding_agent" → claude_code.py (falls back to standard if unavailable)
|
||||||
|
#
|
||||||
|
# Both implementations inject brief.agent_personality as the system prompt
|
||||||
|
# when spawning, if present. Falls back to generic tier prompt otherwise.
|
||||||
|
# claude_code.py passes the agent file via --system-prompt flag natively
|
||||||
|
# (agency-agents was designed for Claude Code's agents/ directory).
|
||||||
|
```
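
The selection and fallback rules described in the comments above could be sketched roughly as follows. The function names `select_runtime` and `system_prompt_path` and the `available` set are assumptions made for illustration. The tier prompt paths come from the `prompts/` directory listing.

```python
from typing import Mapping, Set

# Generic tier prompts, as listed under prompts/ in the directory structure.
TIER_PROMPTS = {
    1: "prompts/t1_visionary.md",
    2: "prompts/t2_architect.md",
    3: "prompts/t3_squad_lead.md",
    4: "prompts/t4_implementer.md",
    5: "prompts/t5_verifier.md",
}


def select_runtime(brief: Mapping, available: Set[str]) -> str:
    """Return the runtime name to use for a brief.

    "coding_agent" is honored only when that runtime is configured;
    otherwise the runner falls back to "standard".
    """
    preferred = brief.get("preferred_runtime") or "standard"
    if preferred == "coding_agent" and "coding_agent" not in available:
        return "standard"
    return preferred


def system_prompt_path(brief: Mapping) -> str:
    """Prefer the brief's agent_personality file, else the generic tier prompt."""
    return brief.get("agent_personality") or TIER_PROMPTS[brief["tier"]]


brief = {
    "tier": 4,
    "preferred_runtime": "coding_agent",
    "agent_personality": "agents/engineering/engineering-code-reviewer.md",
}
print(select_runtime(brief, {"standard", "coding_agent"}))
print(system_prompt_path(brief))
```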
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Run Config (`config/team.yaml`)
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
run:
|
||||||
|
goal: "Build webhook ingestion system with retry logic and DLQ"
|
||||||
|
repo: "git@github.com:org/repo.git"
|
||||||
|
base_branch: "main"
|
||||||
|
|
||||||
|
adapters:
|
||||||
|
llm: anthropic
|
||||||
|
vcs: github
|
||||||
|
notify: openclaw
|
||||||
|
runtime: openclaw
|
||||||
|
|
||||||
|
models:
|
||||||
|
provider: anthropic # default provider
|
||||||
|
capability_map:
|
||||||
|
reasoning-heavy:
|
||||||
|
anthropic: claude-opus-4-6
|
||||||
|
openai: o3
|
||||||
|
capable:
|
||||||
|
anthropic: claude-sonnet-4-6
|
||||||
|
openai: gpt-4o
|
||||||
|
ollama: llama3.1:70b
|
||||||
|
fast-cheap:
|
||||||
|
anthropic: claude-haiku-3-5
|
||||||
|
openai: gpt-4o-mini
|
||||||
|
ollama: llama3.2
|
||||||
|
|
||||||
|
# optional: override provider per tier
|
||||||
|
tier_overrides:
|
||||||
|
t1: { provider: openai, capability: reasoning-heavy }
|
||||||
|
t4: { provider: ollama, capability: fast-cheap }
|
||||||
|
|
||||||
|
runtime:
|
||||||
|
default: openclaw
|
||||||
|
coding_agent: claude_code # used for T4/T5 when available; omit to disable
|
||||||
|
native_teams: false # Claude Code's experimental agent teams — opt-in only
|
||||||
|
# when true: T3 hands full workstream to Claude Code,
|
||||||
|
# which fans out internally. faster but less blackboard
|
||||||
|
# visibility. default: false (explicit T4 spawning)
|
||||||
|
# tier_runtime_map (optional overrides):
|
||||||
|
# t1: standard
|
||||||
|
# t2: standard
|
||||||
|
# t3: standard
|
||||||
|
# t4: coding_agent
|
||||||
|
# t5: coding_agent
|
||||||
|
|
||||||
|
retry_defaults:
|
||||||
|
bad_output: 3
|
||||||
|
partial: 2
|
||||||
|
blocked: 0 # always escalate immediately
|
||||||
|
|
||||||
|
visibility:
|
||||||
|
strict_mode: false # true = all gates on (recommended for first runs)
|
||||||
|
log_level: normal # normal | verbose (verbose = per-T4 start/done lines)
|
||||||
|
inspection_gates:
|
||||||
|
t1_plan: true # always — required by design
|
||||||
|
t2_lead: false # optional — review boundaries before specialists spawn
|
||||||
|
t2_synthesis: true # recommended — review architecture before implementation
|
||||||
|
t3_plan: false # verbose — useful early on, disable once T3 is trusted
|
||||||
|
t5_verdict: false # review T5 joint verdict before T3 marks workstream done
|
||||||
|
gate_timeout_minutes: 60 # auto-reject if no human response within this window
|
||||||
|
|
||||||
|
t3_mesh_timeout_minutes: 10 # max time for T3s to commit task lists before runner escalates
|
||||||
|
```
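
To make the interaction between `capability_map` and `tier_overrides` concrete, here is a minimal sketch of a resolver over the `models` block. It assumes that a tier override replaces both the default provider and the requested capability. That precedence is a guess, and the real `resolve_model` on the LLM adapter may behave differently.

```python
# The models block from team.yaml, written out as a plain dict.
MODELS = {
    "provider": "anthropic",
    "capability_map": {
        "reasoning-heavy": {"anthropic": "claude-opus-4-6", "openai": "o3"},
        "capable": {
            "anthropic": "claude-sonnet-4-6",
            "openai": "gpt-4o",
            "ollama": "llama3.1:70b",
        },
        "fast-cheap": {
            "anthropic": "claude-haiku-3-5",
            "openai": "gpt-4o-mini",
            "ollama": "llama3.2",
        },
    },
    "tier_overrides": {
        "t1": {"provider": "openai", "capability": "reasoning-heavy"},
        "t4": {"provider": "ollama", "capability": "fast-cheap"},
    },
}


def resolve_model(models: dict, tier: str, capability: str) -> tuple:
    """Return (provider, model) for a tier, applying tier_overrides first.

    Assumption: an override wins over both the default provider and the
    capability the caller asked for.
    """
    override = models.get("tier_overrides", {}).get(tier, {})
    provider = override.get("provider", models["provider"])
    capability = override.get("capability", capability)
    by_provider = models["capability_map"].get(capability, {})
    if provider not in by_provider:
        raise LookupError(
            f"no {capability!r} model configured for provider {provider!r}"
        )
    return provider, by_provider[provider]


print(resolve_model(MODELS, "t1", "capable"))
print(resolve_model(MODELS, "t3", "capable"))
print(resolve_model(MODELS, "t4", "capable"))
```

Failing loudly on a missing provider entry (for example `ollama` under `reasoning-heavy`) surfaces config gaps at startup rather than mid-run.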
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Role Registry (`config/role_registry.yaml`)
|
||||||
|
|
||||||
|
Maps `(tier, domain)` → agent personality file. T1 consults this during scope assessment when selecting specialists for each workstream brief. Adding a new specialist means adding one entry here — no core changes.
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
t1:
|
||||||
|
default: agents/strategy/nexus-strategy.md
|
||||||
|
|
||||||
|
t2:
|
||||||
|
backend: agents/engineering/engineering-software-architect.md
|
||||||
|
frontend: agents/engineering/engineering-software-architect.md
|
||||||
|
infra: agents/engineering/engineering-devops-automator.md
|
||||||
|
data: agents/engineering/engineering-data-engineer.md
|
||||||
|
default: agents/engineering/engineering-software-architect.md
|
||||||
|
|
||||||
|
t3:
|
||||||
|
backend: agents/engineering/engineering-senior-developer.md
|
||||||
|
frontend: agents/engineering/engineering-senior-developer.md
|
||||||
|
infra: agents/engineering/engineering-sre.md
|
||||||
|
default: agents/engineering/engineering-senior-developer.md
|
||||||
|
|
||||||
|
t4:
|
||||||
|
frontend: agents/engineering/engineering-frontend-developer.md
|
||||||
|
backend: agents/engineering/engineering-backend-architect.md
|
||||||
|
database: agents/engineering/engineering-database-optimizer.md
|
||||||
|
devops: agents/engineering/engineering-devops-automator.md
|
||||||
|
mobile: agents/engineering/engineering-mobile-app-builder.md
|
||||||
|
ai: agents/engineering/engineering-ai-engineer.md
|
||||||
|
security: agents/engineering/engineering-security-engineer.md
|
||||||
|
docs: agents/engineering/engineering-technical-writer.md
|
||||||
|
default: agents/engineering/engineering-senior-developer.md
|
||||||
|
|
||||||
|
t5:
|
||||||
|
code: agents/engineering/engineering-code-reviewer.md
|
||||||
|
integration: agents/testing/testing-reality-checker.md
|
||||||
|
api: agents/testing/testing-api-tester.md
|
||||||
|
performance: agents/testing/testing-performance-benchmarker.md
|
||||||
|
security: agents/engineering/engineering-security-engineer.md
|
||||||
|
default: agents/engineering/engineering-code-reviewer.md
|
||||||
|
```
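The lookup this registry implies can be sketched as below. This is a minimal illustration, not the project's loader: `resolve_persona` is a hypothetical helper, the registry is a trimmed in-memory copy of the YAML above, and falling back to the tier's `default` entry for an unknown domain is an assumption about the intended behaviour.

```python
from typing import Dict

# Trimmed copy of the registry above; a real loader would parse the YAML file.
ROLE_REGISTRY: Dict[str, Dict[str, str]] = {
    "t2": {
        "backend": "agents/engineering/engineering-software-architect.md",
        "infra": "agents/engineering/engineering-devops-automator.md",
        "default": "agents/engineering/engineering-software-architect.md",
    },
    "t5": {
        "code": "agents/engineering/engineering-code-reviewer.md",
        "api": "agents/testing/testing-api-tester.md",
        "default": "agents/engineering/engineering-code-reviewer.md",
    },
}


def resolve_persona(registry: Dict[str, Dict[str, str]], tier: str, domain: str) -> str:
    """Return the persona file for (tier, domain), using the tier default if needed."""
    tier_map = registry.get(tier.lower())
    if tier_map is None:
        raise KeyError(f"unknown tier: {tier}")
    # Assumption: an unlisted domain falls back to the tier's `default` entry.
    return tier_map.get(domain, tier_map["default"])
```

With this shape, adding a specialist stays a one-line registry change, and callers never need to know which domains exist.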
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Key Flows
|
||||||
|
|
||||||
|
### 1. Run Kickoff
|
||||||
|
|
||||||
|
```
|
||||||
|
User → team_runner.start(goal, config) # via CLI or any caller
|
||||||
|
→ generate run_id
|
||||||
|
→ init blackboard (create runs/<run_id>/blackboard.db)
|
||||||
|
→ build T1 brief (goal_anchor = goal, retry_budget from config)
|
||||||
|
→ spawn T1 via runtime adapter
|
||||||
|
→ await T1 workplan
|
||||||
|
```
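A rough sketch of the kickoff steps above, stopping just before the runtime adapter spawns T1. The `briefs` table layout, the `phase` field, and the `start_run` name are assumptions made for illustration; only the `runs/<run_id>/blackboard.db` location and the `goal_anchor` and `retry_budget` fields come from the flow.

```python
import json
import sqlite3
import uuid
from pathlib import Path


def start_run(goal: str, retry_budget: int, runs_dir: Path) -> dict:
    """Create the run directory and blackboard, then queue the T1 brief."""
    run_id = uuid.uuid4().hex
    run_dir = runs_dir / run_id
    run_dir.mkdir(parents=True)

    brief = {
        "run_id": run_id,
        "tier": "t1",
        "phase": "plan",          # hypothetical field name
        "goal_anchor": goal,      # propagated unchanged to every downstream brief
        "retry_budget": retry_budget,
    }

    db = sqlite3.connect(str(run_dir / "blackboard.db"))
    try:
        # Hypothetical minimal schema; the real blackboard defines more tables.
        db.execute(
            "CREATE TABLE IF NOT EXISTS briefs ("
            "id INTEGER PRIMARY KEY, run_id TEXT NOT NULL, "
            "status TEXT NOT NULL, body TEXT NOT NULL)"
        )
        db.execute(
            "INSERT INTO briefs (run_id, status, body) VALUES (?, 'pending', ?)",
            (run_id, json.dumps(brief)),
        )
        db.commit()
    finally:
        db.close()
    return brief
```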
|
||||||
|
|
||||||
|
### 2. T1 Scope Assessment
|
||||||
|
|
||||||
|
```
|
||||||
|
T1 receives brief
|
||||||
|
→ assess complexity → decide depth
|
||||||
|
→ identify workstreams
|
||||||
|
→ set retry_budget multiplier per workstream (1x simple, 2x complex)
|
||||||
|
→ emit N workstream briefs for T2 (or T3 if shallow)
|
||||||
|
→ write workplan to blackboard
|
||||||
|
→ team_runner spawns T2s in parallel
|
||||||
|
```
|
||||||
|
|
||||||
|
### 3. T4 Retry Loop (escalation.py)
|
||||||
|
|
||||||
|
```
|
||||||
|
spawn T4 with brief
|
||||||
|
→ receive result
|
||||||
|
→ classify: bad_output | blocked | partial | success
|
||||||
|
|
||||||
|
blocked:
|
||||||
|
→ log event(escalated)
|
||||||
|
→ pass to T3 immediately
|
||||||
|
|
||||||
|
bad_output, retries_remaining:
|
||||||
|
→ amend brief with failure context, increment retry_count
|
||||||
|
→ re-spawn T4
|
||||||
|
→ log event(retried)
|
||||||
|
|
||||||
|
bad_output, retries_exhausted:
|
||||||
|
→ log event(escalated)
|
||||||
|
→ pass to T3
|
||||||
|
|
||||||
|
partial:
|
||||||
|
→ write salvageable parts to blackboard
|
||||||
|
→ re-task remainder with new brief
|
||||||
|
|
||||||
|
success:
|
||||||
|
→ write result to blackboard
|
||||||
|
→ log event(completed)
|
||||||
|
→ notify T3
|
||||||
|
```
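The retry loop above could look roughly like this in Python. It is a sketch under stated assumptions: `Brief`, `run_t4_with_retries`, and the `spawn_t4` and `log_event` callables are hypothetical names, and the `partial` branch only reports the outcome instead of salvaging and re-tasking.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Brief:
    task: str
    retry_budget: int
    retry_count: int = 0
    failure_context: List[str] = field(default_factory=list)


def run_t4_with_retries(
    brief: Brief,
    spawn_t4: Callable[[Brief], Tuple[str, str]],  # returns (classification, detail)
    log_event: Callable[[str], None],
) -> str:
    """Drive one T4 task until it succeeds, is partial, or must go to T3."""
    while True:
        kind, detail = spawn_t4(brief)
        if kind == "success":
            log_event("completed")
            return "success"
        if kind == "blocked":
            # Blocked tasks are never retried; T3 takes over immediately.
            log_event("escalated")
            return "escalated"
        if kind == "partial":
            # Salvage and re-tasking are left to the caller in this sketch.
            return "partial"
        if kind == "bad_output":
            if brief.retry_count >= brief.retry_budget:
                log_event("escalated")
                return "escalated"
            # Amend the brief with failure context before re-spawning.
            brief.failure_context.append(detail)
            brief.retry_count += 1
            log_event("retried")
            continue
        raise ValueError(f"unknown classification: {kind}")
```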
|
||||||
|
|
||||||
|
### 4. Inspection Gate Flow
|
||||||
|
|
||||||
|
```
|
||||||
|
runner reaches configured gate (e.g. t2_synthesis)
|
||||||
|
→ write event(gate_pending, detail={tier, summary, what_happens_next})
|
||||||
|
→ notify_adapter.send(tier summary + gate context)
|
||||||
|
→ halt: poll blackboard for gate_approved or gate_rejected
|
||||||
|
|
||||||
|
gate_approved:
|
||||||
|
→ write event(gate_approved)
|
||||||
|
→ continue run
|
||||||
|
|
||||||
|
gate_rejected:
|
||||||
|
→ write event(gate_rejected, detail={reason})
|
||||||
|
→ re-invoke tier with rejection reason in brief context
|
||||||
|
→ loop back to gate_pending when tier completes again
|
||||||
|
|
||||||
|
gate_timeout (gate_timeout_minutes elapsed):
|
||||||
|
→ treat as gate_rejected
|
||||||
|
→ notify Andrew: "Gate timed out, re-invoking tier"
|
||||||
|
```
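The halt-and-poll step above might be sketched as follows. The `events` table layout and the `wait_for_gate` helper are assumptions for illustration. Only decisions recorded after the most recent `gate_pending` event are considered, so an approval from an earlier pass of the loop does not leak into the current gate.

```python
import sqlite3
import time
from typing import Callable

# Hypothetical minimal events table; the real blackboard schema may differ.
EVENTS_SCHEMA = (
    "CREATE TABLE IF NOT EXISTS events ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, run_id TEXT, kind TEXT, detail TEXT)"
)


def wait_for_gate(
    db: sqlite3.Connection,
    run_id: str,
    timeout_s: float,
    poll_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until a gate decision appears; a timeout counts as gate_rejected."""
    deadline = clock() + timeout_s
    while True:
        row = db.execute(
            "SELECT kind FROM events "
            "WHERE run_id = ? AND kind IN ('gate_approved', 'gate_rejected') "
            "AND id > (SELECT COALESCE(MAX(id), 0) FROM events "
            "          WHERE run_id = ? AND kind = 'gate_pending') "
            "ORDER BY id LIMIT 1",
            (run_id, run_id),
        ).fetchone()
        if row is not None:
            return row[0]
        if clock() >= deadline:
            return "gate_rejected"  # gate_timeout is treated as a rejection
        sleep(poll_s)
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting.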
|
||||||
|
|
||||||
|
### 5. Review Gate
|
||||||
|
|
||||||
|
```
|
||||||
|
T1 completes integration
|
||||||
|
→ vcs_adapter.create_pr(
|
||||||
|
title="[agent-teams] <run_id>: <goal summary>",
|
||||||
|
body="<workplan + workstream summaries>",
|
||||||
|
head="integration/<run_id>",
|
||||||
|
base="main"
|
||||||
|
)
|
||||||
|
→ notify_adapter.send(
|
||||||
|
"Run <run_id> complete. PR ready for review: <pr_url>",
|
||||||
|
context={run_id, goal, workstreams, pr_url}
|
||||||
|
)
|
||||||
|
→ blackboard: update run status → "review"
|
||||||
|
→ halt — no auto-merge
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Build Order
|
||||||
|
|
||||||
|
1. `git submodule add https://github.com/msitarzewski/agency-agents agents/` — pull the talent pool
|
||||||
|
2. `config/role_registry.yaml` — map tier+domain → agent personality files
|
||||||
|
3. `core/task_brief.py` — schema + validation (everything depends on this); include T1 Plan Output Schema
|
||||||
|
4. `core/blackboard.py` — SQLite store, all table definitions including `t3_task_lists`; full event kind vocabulary
|
||||||
|
5. `adapters/base/*` — all four abstract interfaces
|
||||||
|
6. `adapters/llm/anthropic.py` — first LLM implementation
|
||||||
|
7. `core/escalation.py` — retry + failure routing logic (called by tiers, not runner centrally)
|
||||||
|
8. `adapters/runtime/openclaw.py` — wire up sessions_spawn + personality injection
|
||||||
|
9. `adapters/runtime/claude_code.py` — coding agent runtime, personality via --system-prompt
|
||||||
|
10. `core/team_runner.py` — full run lifecycle: spawn loop (monitors briefs table for `status=pending`, calls runtime_adapter.spawn()), gate logic (gate_pending halt, calls notify_adapter.send(), polls for gate_approved/rejected resume), path amendment monitor, T3 mesh timeout → T2 escalation, T1 failure + terminal escalation only
|
||||||
|
11. `cli/agency.py` — run, watch, inspect, approve, reject, pause, resume; `watch` tails blackboard events and renders live log; `inspect` renders run tree
|
||||||
|
12. `prompts/` — fallback tier prompts (used when no agent_personality set)
|
||||||
|
13. `adapters/vcs/github.py` — PR creation + branch management
|
||||||
|
14. `adapters/notify/openclaw.py` — OpenClaw notification adapter; bridges gate summaries and run events to the operator via OpenClaw; manages its own inbound response state for gate approval routing
|
||||||
|
15. `config/team.yaml` — example config with full visibility block
|
||||||
|
16. `README.md` — how to run, how to add adapters, how to extend the roster; include `agency` CLI reference
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Out of Scope (Phase 2)
|
||||||
|
|
||||||
|
- Cost accounting per tier + run rollup
|
||||||
|
- Parallel workstream progress dashboard
|
||||||
|
- Additional adapter implementations (GitLab, Slack, OpenAI, Ollama)
|
||||||
|
- Persistent standing teams
|
||||||
|
- Web UI for run monitoring
|
||||||
681
docs/design.md
Normal file
@@ -0,0 +1,681 @@
|
|||||||
|
# Tiered Agent Team System — Design Document
|
||||||
|
|
||||||
|
_Started: 2026-03-14. Last updated: 2026-03-30._
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Resolved Design Decisions (formerly Open Questions)
|
||||||
|
|
||||||
|
All eight open questions resolved 2026-03-30. Details in Decisions Log.
|
||||||
|
|
||||||
|
1. **T3 mesh mechanics** → Blackboard-based. T3s write draft task lists, read peers', commit merged plan before T4 dispatch. See _T3 Mesh via Blackboard_.
|
||||||
|
|
||||||
|
2. **T1 output schema** → Formal JSON schema defined. See _T1 Plan Output Schema_.
|
||||||
|
|
||||||
|
3. **T5 consensus mechanics** → T3 aggregates all T5 results into a joint verdict. Split verdict (`partial`) triggers retry of failed slices only. See _T5 Consensus & Verdict Schema_.
|
||||||
|
|
||||||
|
4. **Path amendment mechanism** → Amending tier writes a `path_amendment` event to blackboard. Runner monitors events table and notifies the relevant higher tier via a system event. No agent callback plumbing required. See _Path Amendment Mechanism_.
|
||||||
|
|
||||||
|
5. **Failure handling (distributed model)** → Distributed ownership confirmed. Runner only owns T1 failure + terminal human escalation. See updated _Failure Handling_ table.
|
||||||
|
|
||||||
|
6. **Who makes spawn calls for T3+ tiers** → Runner monitors briefs table for `status=pending` rows and makes all spawn calls. "Distributed ownership" means the tier's output determines brief content — runner is the mechanical arm. Gates (hold on `gate_pending`) live naturally in the runner's spawn loop.
|
||||||
|
|
||||||
|
7. **Gate approval UX** → `agency approve <run_id>` CLI writes `gate_approved` directly to the blackboard — the universal path, works on any platform. Runner only cares that the event exists, not how it got there. Notify adapter implementations handle their own inbound response routing (e.g. bridging a chat reply to a CLI call) as internal adapter state — not a core concern.
|
||||||
|
|
||||||
|
8. **T3 mesh timeout** → Escalate to T2 (domain boundary problem, T2 should re-scope). If T2 also exhausts its retry budget, the failure escalates up the normal ladder to T1 → Andrew gate. No force-commit fallback (it would hide the problem and cause bad T4 dispatch).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
A dynamic, hierarchical multi-agent system for software pipelines. Teams assemble on demand, execute, then disband. Inspired by a blend of Hollywood production (dynamic assembly), consulting firms (structured deliverables, hierarchical synthesis), and two-pizza teams (small autonomous squads, clear domain ownership).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Core Principles
|
||||||
|
|
||||||
|
**1. Tiers represent cognitive modes, not org chart levels.**
|
||||||
|
Each tier thinks differently — strategy, design, coordination, execution, verification. Adding a tier only makes sense if it introduces a genuinely different mode of reasoning.
|
||||||
|
|
||||||
|
**2. Depth is proportional to complexity.**
|
||||||
|
Not every task needs every tier. A config change might only need T3→T4. A new product needs the full stack. T1 assesses scope and prescribes the path — it is never pre-configured.
|
||||||
|
|
||||||
|
**3. Goal anchoring at every level.**
|
||||||
|
T1's original intent is embedded in every agent's context — not just passed to T2 and forgotten. Every agent knows the end goal even if they only own a slice.
|
||||||
|
|
||||||
|
**4. Artifacts, not summaries.**
|
||||||
|
Tiers pass structured specs downward (JSON task briefs), not paraphrased prose. Meaning is preserved; format is compressed.
|
||||||
|
|
||||||
|
**5. Verification is mandatory.**
|
||||||
|
T5 always runs. Nothing returns to T1 unverified. T5 is a quality gate, not optional — things should work and work well before they surface upward.
|
||||||
|
|
||||||
|
**6. Provider agnostic.**
|
||||||
|
The system makes no assumptions about which LLM provider or platform is in use. Tiers reference capability levels, not specific models. All external dependencies are swappable adapters.
|
||||||
|
|
||||||
|
**7. Specialist talent pool.**
|
||||||
|
Tiers define structure and responsibility. Agent personalities define domain expertise. The two are separate — the same tier can be filled by different specialists depending on the workstream domain.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Tier Definitions
|
||||||
|
|
||||||
|
| Tier | Role | Owns | Capability Level |
|
||||||
|
|------|------|------|-----------------|
|
||||||
|
| T1 | Visionary | Goal, constraints, dispatch plan, final acceptance | reasoning-heavy |
|
||||||
|
| T2 | Architect | System design, interface contracts, workstream boundaries | reasoning-heavy / capable |
|
||||||
|
| T3 | Squad Lead | Workstream delivery, T4 management, quality gate | capable |
|
||||||
|
| T4 | Implementer | Atomic task execution (one file, one function, one test) | fast-cheap |
|
||||||
|
| T5 | Verifier | Validation of T4 output — correctness + intent alignment | capable |
|
||||||
|
|
||||||
|
T5 runs **within T3's scope**, not above it. T3 commissions T5 verification of its T4 outputs. T5 is a quality gate, not a management layer.
|
||||||
|
|
||||||
|
Capability levels map to actual models per provider in config — the core system never references a specific model name.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Dispatch Model
|
||||||
|
|
||||||
|
### T1 Owns the Plan
|
||||||
|
|
||||||
|
T1 is not just a decomposer — it is the dispatch planner. Its output declares:
|
||||||
|
|
||||||
|
- **Workstreams** — the decomposed units of work
|
||||||
|
- **Tier path per workstream** — which tiers to engage (e.g. `[T2, T3, T4, T5]` or `[T4, T5]` for trivial tasks)
|
||||||
|
- **Parallelism** — which workstreams are independent and can run concurrently
|
||||||
|
|
||||||
|
T1 does not prescribe how each tier operates internally. That is the tier's own concern.
|
||||||
|
|
||||||
|
### T1 Lifecycle — Two Explicit Phases
|
||||||
|
|
||||||
|
T1 is invoked twice per run, each with a distinct prompt and purpose:
|
||||||
|
|
||||||
|
**Phase 1 — Plan:**
|
||||||
|
1. T1 produces initial dispatch plan (workstreams, tier paths, parallelism, retry budget)
|
||||||
|
2. T1 self-critiques its own plan in a single follow-up pass ("what could go wrong, what did I miss?") and amends
|
||||||
|
3. Amended plan surfaces to Andrew for approval — no T2s spawn until approval is given
|
||||||
|
|
||||||
|
**Phase 2 — Accept:**
|
||||||
|
After the full T2→T3→T4→T5 pipeline completes, T1 is re-invoked with the final output. It validates against the original goal anchor and either accepts (opens PR) or rejects (escalates back down).
|
||||||
|
|
||||||
|
Both phases are named explicitly in the task brief schema and tracked on the blackboard.
|
||||||
|
|
||||||
|
### Each Tier Owns the Layer Below
|
||||||
|
|
||||||
|
Control flow is distributed, not centralised:
|
||||||
|
|
||||||
|
- T1 manages its T2s
|
||||||
|
- T2 Lead manages T2 specialists and their domain boundaries
|
||||||
|
- T2 specialists each own their T3s
|
||||||
|
- **T3 manages its T4s** — including dependency graph, parallelism, and T5 commissioning
|
||||||
|
- The runner is thin: bootstrap T1, monitor the blackboard, handle final result and notifications
|
||||||
|
|
||||||
|
This means orchestration logic lives in agent prompts and output schemas — not in Python runner code. Adding a new execution pattern means updating a prompt, not the runner.
|
||||||
|
|
||||||
|
**Tradeoff:** Debugging is harder. When something fails mid-chain, you read blackboard logs rather than step through central runner code. This is a tooling problem to solve (good blackboard inspection), not a design flaw to avoid.
|
||||||
|
|
||||||
|
### Dynamic Paths
|
||||||
|
|
||||||
|
Tiers can propose path amendments mid-run (e.g. T3 discovers scope that warrants a T2 pass it didn't get). Amendments are logged to the blackboard. Higher tiers are notified but do not need to approve — it is informational. No tier silently changes the plan.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Orchestration Patterns Per Tier
|
||||||
|
|
||||||
|
Different tiers suit different internal coordination patterns. These are baked into the runner's tier-handling logic and the tier prompts — not prescribed by T1.
|
||||||
|
|
||||||
|
| Tier | Pattern | Rationale |
|
||||||
|
|------|---------|-----------|
|
||||||
|
| T1 | Single agent, two phases | Must be authoritative; plan phase + accept phase |
|
||||||
|
| T2 Lead | Coordinator | Spawned first; defines boundaries + shared assumptions; drives conflict resolution; produces canonical architecture |
|
||||||
|
| T2 Specialists | Parallel fan-out | Each works independently within its domain; reads Lead's boundaries + shared assumptions doc before starting |
|
||||||
|
| T3 | Light mesh | Peer coordination within same T2 domain to negotiate task boundaries before T4 dispatch |
|
||||||
|
| T4 | Swarm + pipeline hybrid | Independent tasks run as swarm; dependent tasks pipeline (T4-A's output feeds T4-B). T3 declares which is which. |
|
||||||
|
| T5 | Parallel fan-out + consensus | Each T5 reviews its slice independently, then compares notes for a joint verdict — catches both artifact bugs and integration issues |
|
||||||
|
|
||||||
|
### T2 Flow in Detail
|
||||||
|
|
||||||
|
1. The runner spawns the **T2 Lead Architect** from T1's approved plan, with goal + workstream context
|
||||||
|
2. Lead defines explicit **domain boundaries** (who owns what, hard edges)
|
||||||
|
3. Lead publishes **shared assumptions doc** — cross-cutting concerns, key conventions, architectural constraints (auth approach, data formats, API patterns, etc.)
|
||||||
|
4. The runner spawns the **T2 specialists** with boundaries + shared assumptions baked into their briefs
|
||||||
|
5. Specialists work in parallel, each within their defined domain
|
||||||
|
6. Lead reads all proposals, drives **conflict resolution** with relevant specialists if needed (cycle limit in config — fixed, not per-workstream)
|
||||||
|
7. Lead produces **canonical architecture** → written to blackboard as distinct artifact
|
||||||
|
8. T1 (Accept phase) validates canonical architecture against goal anchor
|
||||||
|
9. Canonical architecture becomes T3 briefs — each T2 specialist hands off to its own T3s
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Horizontal Scaling Within Tiers
|
||||||
|
|
||||||
|
```
|
||||||
|
T1 — Phase 1: Plan (self-critique → Andrew approval)
|
||||||
|
│
|
||||||
|
├── T2: Lead Architect (boundaries + shared assumptions first)
|
||||||
|
│ ├── T2: Backend Architect ─┐
|
||||||
|
│ ├── T2: Frontend Architect ├─ parallel, within defined domains
|
||||||
|
│ └── T2: Infra Architect ─┘
|
||||||
|
│ │
|
||||||
|
│ └── (Lead synthesises → conflict resolution if needed → canonical architecture)
|
||||||
|
│
|
||||||
|
├── T2 Backend Architect owns:
|
||||||
|
│ ├── T3: API Squad Lead ─┐
|
||||||
|
│ └── T3: DB Squad Lead ─┴─ light mesh within domain
|
||||||
|
│ ├── T4: Worker A ─┐
|
||||||
|
│ ├── T4: Worker B ─┼─ swarm / pipeline (T3 decides)
|
||||||
|
│ └── T4: Worker C ─┘
|
||||||
|
│ └── T5: Verifier(s) — fan-out + consensus
|
||||||
|
│
|
||||||
|
└── T1 — Phase 2: Accept (validates against goal anchor → PR)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Use Case Flows
|
||||||
|
|
||||||
|
T1 assesses complexity and prescribes the tier path per workstream. Three standard depth profiles:
|
||||||
|
|
||||||
|
### Full Stack — T1→T2→T3→T4→T5
|
||||||
|
*Complex feature, new product, cross-domain changes*
|
||||||
|
|
||||||
|
```
|
||||||
|
T1 Plan
|
||||||
|
→ assess complexity (high)
|
||||||
|
→ output T1 Plan Schema (workstreams, tier paths [T2,T3,T4,T5], parallelism, retry budgets)
|
||||||
|
→ self-critique pass
|
||||||
|
→ GATE: surface to Andrew ← approval required
|
||||||
|
|
||||||
|
T2 Lead (spawned by runner after approval)
|
||||||
|
→ receive: goal + full workplan
|
||||||
|
→ publish: domain boundaries + shared assumptions doc → blackboard
|
||||||
|
→ GATE (optional): review boundaries before specialists spawn
|
||||||
|
|
||||||
|
T2 Specialists (parallel fan-out, wait on Lead)
|
||||||
|
→ each receives: their domain boundary + shared assumptions
|
||||||
|
→ produce: architecture proposal for their slice
|
||||||
|
→ Lead synthesises, drives conflict resolution if needed
|
||||||
|
→ Lead writes: canonical architecture → blackboard
|
||||||
|
→ GATE (recommended): review architecture before implementation
|
||||||
|
|
||||||
|
Each T2 Specialist → writes briefs for its own T3s (with canonical architecture slice + interface contracts); runner spawns them
|
||||||
|
|
||||||
|
T3s (light mesh within T2 domain)
|
||||||
|
→ write draft task lists to blackboard
|
||||||
|
→ read peers' lists, reconcile boundaries
|
||||||
|
→ commit merged task plan before T4 dispatch
|
||||||
|
→ GATE (optional): review task breakdown
|
||||||
|
|
||||||
|
T4s
|
||||||
|
→ swarm: independent tasks run in parallel
|
||||||
|
→ pipeline: T4-A output feeds T4-B (T3 declares dependencies)
|
||||||
|
→ commit to feature branches
|
||||||
|
|
||||||
|
T5s (fan-out per T4 slice)
|
||||||
|
→ each reviews its slice independently
|
||||||
|
→ T3 collects results → joint verdict
|
||||||
|
→ GATE (optional): review T5 verdict before T3 marks done
|
||||||
|
→ partial: T3 retries only failed slices
|
||||||
|
→ pass: T3 signals workstream done to T2
|
||||||
|
|
||||||
|
T2 specialists → signal T2 Lead
|
||||||
|
T2 Lead → writes integration summary → blackboard
|
||||||
|
|
||||||
|
T1 Accept
|
||||||
|
→ validate against goal anchor
|
||||||
|
→ open PR, notify_adapter.send(pr summary + url)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Medium Complexity — T1→T3→T4→T5
|
||||||
|
*Config change, isolated bug fix — T1 determines no cross-domain design needed*
|
||||||
|
|
||||||
|
```
|
||||||
|
T1 Plan
|
||||||
|
→ assess: contained scope, single domain, no T2 architecture needed
|
||||||
|
→ workplan: tier paths [T3, T4, T5]
|
||||||
|
→ GATE: Andrew approval
|
||||||
|
|
||||||
|
T3s spawned directly by runner
|
||||||
|
→ receive T1 brief with task context (no T2 architecture layer)
|
||||||
|
→ T3 light mesh → T4 dispatch → T5 verify → signal done
|
||||||
|
|
||||||
|
T1 Accept → PR
|
||||||
|
```
|
||||||
|
|
||||||
|
### Simple / Hotfix — T1→T4→T5
|
||||||
|
*Single file, single function, trivial atomic task*
|
||||||
|
|
||||||
|
```
|
||||||
|
T1 Plan
|
||||||
|
→ assess: trivial, single workstream
|
||||||
|
→ tier path: [T4, T5]
|
||||||
|
→ GATE: Andrew approval
|
||||||
|
|
||||||
|
T4 (coding agent)
|
||||||
|
→ single atomic task, commits
|
||||||
|
|
||||||
|
T5 (single verifier, not full fan-out)
|
||||||
|
→ code review + correctness check
|
||||||
|
→ pass → T1 Accept → PR
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Resolved Mechanics
|
||||||
|
|
||||||
|
### T3 Mesh via Blackboard
|
||||||
|
|
||||||
|
T3s coordinate task boundaries before dispatching T4s. All coordination goes through the blackboard — no direct agent-to-agent messaging.
|
||||||
|
|
||||||
|
1. Each T3 writes its **draft task list** to the blackboard (one row per proposed T4 task, status `draft`)
|
||||||
|
2. Each T3 reads all sibling T3 draft lists in its T2 domain
|
||||||
|
3. T3s amend their lists to resolve overlaps (claim tasks, release duplicates)
|
||||||
|
4. Once all T3s in the domain have committed their final task lists (status `committed`), T4 dispatch begins
|
||||||
|
5. No T3 dispatches T4s until all peers in the domain are committed — this prevents duplicate work
|
||||||
|
|
||||||
|
The runner monitors for `all_committed` state and can enforce a timeout (config: `t3_mesh_timeout_minutes`).
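As an illustration of the commit barrier described above, a runner-side check could look like this. `mesh_decision` and its return labels are hypothetical, and treating an empty domain as "wait" is an assumption.

```python
from typing import Dict


def mesh_decision(
    statuses: Dict[str, str],   # T3 id -> "draft" or "committed"
    elapsed_minutes: float,
    timeout_minutes: float,
) -> str:
    """Decide whether T4 dispatch may begin for one T2 domain."""
    if statuses and all(s == "committed" for s in statuses.values()):
        return "dispatch"
    if elapsed_minutes >= timeout_minutes:
        # No force-commit fallback: an unresolved mesh goes back to T2.
        return "escalate_to_t2"
    return "wait"
```

For example, with `t3_mesh_timeout_minutes: 10`, a domain where one T3 is still in `draft` after 10 minutes yields `escalate_to_t2` and never a partial dispatch.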
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### T1 Plan Output Schema
|
||||||
|
|
||||||
|
T1's Plan phase produces a structured JSON object written to the blackboard. The runner parses this to bootstrap the pipeline.
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"run_id": "uuid",
|
||||||
|
"goal_anchor": "Original goal — immutable, propagated to every downstream brief",
|
||||||
|
"complexity": "high | medium | low",
|
||||||
|
"retry_budget_multiplier": 2,
|
||||||
|
"workstreams": [
|
||||||
|
{
|
||||||
|
"id": "ws-backend-api",
|
||||||
|
"name": "Backend API",
|
||||||
|
"domain": "backend",
|
||||||
|
"tier_path": ["t2", "t3", "t4", "t5"],
|
||||||
|
"parallel_group": "A",
|
||||||
|
"t2_specialist": "agents/engineering/engineering-software-architect.md",
|
||||||
|
"notes": "Focus on webhook ingest and retry queue"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"parallelism": {
|
||||||
|
"groups": {
|
||||||
|
"A": ["ws-backend-api", "ws-frontend"],
|
||||||
|
"B": ["ws-infra"]
|
||||||
|
},
|
||||||
|
"sequence": ["A", "B"]
|
||||||
|
},
|
||||||
|
"self_critique_summary": "Brief plain-text summary of what T1 identified and amended in its self-critique pass"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
`parallel_group` and `sequence` together handle inter-workstream dependencies: the workstreams in group A run in parallel, and group B starts only after A completes.
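One way a runner might turn the `parallelism` block into an execution order is sketched below. `execution_waves` is a hypothetical helper, and rejecting groups or workstream ids that the plan does not define is an assumption about the validation that would be wanted.

```python
from typing import Any, Dict, List


def execution_waves(plan: Dict[str, Any]) -> List[List[str]]:
    """Return workstream ids grouped into waves that run one after another."""
    known = {ws["id"] for ws in plan["workstreams"]}
    groups = plan["parallelism"]["groups"]
    waves: List[List[str]] = []
    for name in plan["parallelism"]["sequence"]:
        if name not in groups:
            raise ValueError(f"sequence references unknown group: {name}")
        unknown = [ws for ws in groups[name] if ws not in known]
        if unknown:
            raise ValueError(f"group {name} references unknown workstreams: {unknown}")
        # Every workstream in one wave may be spawned concurrently.
        waves.append(list(groups[name]))
    return waves
```

Each inner list is a set of workstreams that can start together; the next list starts only when the previous one has finished.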
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### T5 Consensus & Verdict Schema
|
||||||
|
|
||||||
|
T3 aggregates all T5 results into a joint verdict after fan-out completes.
|
||||||
|
|
||||||
|
**Individual T5 result:**
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"verifier_id": "uuid",
|
||||||
|
"scope": "queue-client",
|
||||||
|
"verdict": "pass | fail",
|
||||||
|
"issues": ["issue description..."],
|
||||||
|
"notes": "human-readable summary"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**T3 joint verdict (written to blackboard):**
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"t5_results": [...],
|
||||||
|
"joint_verdict": "pass | partial | fail",
|
||||||
|
"failed_scopes": ["queue-client"],
|
||||||
|
"summary": "Human-readable summary for gate surface and logs"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Split verdict handling:**
|
||||||
|
- `pass` → T3 marks workstream done, signals T2
|
||||||
|
- `partial` → T3 retries only the failed T4 slices (up to retry budget), re-runs T5 on those slices
|
||||||
|
- `fail` → T3 escalates to T2 (or T1 if shallow path)
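The aggregation step can be sketched as below. The exact thresholds are an assumption, since the text only names the three outcomes: here every slice passing gives `pass`, every slice failing gives `fail`, and any mix gives `partial`. `joint_verdict` is a hypothetical function name.

```python
from typing import Any, Dict, List


def joint_verdict(t5_results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Fold individual T5 results into the joint verdict object T3 writes."""
    if not t5_results:
        raise ValueError("at least one T5 result is required")
    failed = [r["scope"] for r in t5_results if r["verdict"] == "fail"]
    if not failed:
        verdict = "pass"
    elif len(failed) == len(t5_results):
        verdict = "fail"
    else:
        verdict = "partial"  # only the failed scopes are retried
    passed = len(t5_results) - len(failed)
    return {
        "t5_results": t5_results,
        "joint_verdict": verdict,
        "failed_scopes": failed,
        "summary": f"{passed}/{len(t5_results)} slices passed",
    }
```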
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Spawn Call Ownership
|
||||||
|
|
||||||
|
The runner is the single point of contact with the runtime adapter. Tiers do not call `sessions_spawn` directly — they write output to the blackboard and the runner acts on it.
|
||||||
|
|
||||||
|
**Flow:**
|
||||||
|
1. A tier completes and writes child briefs to the `briefs` table with `status=pending`
|
||||||
|
2. Runner's spawn loop detects pending rows
|
||||||
|
3. If a gate is configured at this tier boundary → runner writes `gate_pending`, notifies Andrew, halts
|
||||||
|
4. On `gate_approved` → runner calls `runtime_adapter.spawn()` for each pending brief
|
||||||
|
5. Spawned agent runs, writes its own child briefs as pending when done → loop continues
|
||||||
|
|
||||||
|
This keeps gate logic in one place (the runner's spawn loop), makes all spawn calls auditable from a single location, and means agents only need blackboard read/write access — no runtime adapter tool access required.
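A single pass of such a spawn loop might look like the sketch below. The `briefs` columns, the `running` status, and the `spawn` and `gate_open` callables are assumptions; in the real system `spawn` would wrap `runtime_adapter.spawn()` and `gate_open` would consult the blackboard for `gate_approved`.

```python
import sqlite3
from typing import Callable


def spawn_pending(
    db: sqlite3.Connection,
    spawn: Callable[[int], None],
    gate_open: Callable[[str], bool],
) -> int:
    """Spawn every pending brief whose tier boundary is not held by a gate."""
    rows = db.execute(
        "SELECT id, tier FROM briefs WHERE status = 'pending' ORDER BY id"
    ).fetchall()
    spawned = 0
    for brief_id, tier in rows:
        if not gate_open(tier):
            continue  # held until the gate is approved; picked up on a later pass
        spawn(brief_id)
        db.execute("UPDATE briefs SET status = 'running' WHERE id = ?", (brief_id,))
        spawned += 1
    db.commit()
    return spawned
```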
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Gate Approval UX
|
||||||
|
|
||||||
|
**Core mechanic (platform-agnostic):**
|
||||||
|
|
||||||
|
1. Runner writes `gate_pending` to blackboard
|
||||||
|
2. Runner calls `notify_adapter.send()` with tier summary + gate context (`run_id`, `gate`, `summary`, `what_happens_next`)
|
||||||
|
3. Runner polls blackboard for `gate_approved` or `gate_rejected`
|
||||||
|
4. `agency approve <run_id>` / `agency reject <run_id> --reason "..."` writes the event directly to the blackboard — the universal approval path, works on any platform with filesystem access
|
||||||
|
|
||||||
|
The runner never reads from a state file and never asks a notify adapter for inbound responses. It only polls the blackboard.
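What `agency approve` and `agency reject` do on the blackboard side could be as small as the sketch below. The `events` table layout and the `record_gate_decision` helper are hypothetical; the only behaviour taken from the text is that the CLI writes a `gate_approved` or `gate_rejected` event directly.

```python
import json
import sqlite3
from typing import Optional

# Hypothetical minimal events table; the real blackboard schema may differ.
EVENTS_SCHEMA = (
    "CREATE TABLE IF NOT EXISTS events ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, run_id TEXT, kind TEXT, detail TEXT)"
)


def record_gate_decision(
    db: sqlite3.Connection,
    run_id: str,
    approved: bool,
    reason: Optional[str] = None,
) -> str:
    """Write the human's gate decision as a blackboard event and return its kind."""
    kind = "gate_approved" if approved else "gate_rejected"
    detail = json.dumps({"reason": reason} if reason else {})
    db.execute(
        "INSERT INTO events (run_id, kind, detail) VALUES (?, ?, ?)",
        (run_id, kind, detail),
    )
    db.commit()
    return kind
```

Because the runner only looks for the event, any adapter that ends up calling this path behaves identically from the core's point of view.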
|
||||||
|
|
||||||
|
**Adapter responsibility:**
|
||||||
|
Each notify adapter handles its own inbound response routing. How a human's approval gets translated into an `agency approve` CLI call is entirely the adapter's concern — not core. Example: an OpenClaw adapter bridges a chat reply to the CLI. A Slack adapter wires up a slash command. A webhook adapter listens on an endpoint. All produce the same result: `gate_approved` written to blackboard.
|
||||||
|
|
||||||
|
Any internal state the adapter needs to resolve ambiguous responses (e.g. which run_id an approval refers to when multiple gates are pending) is managed by the adapter, not the core.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### T3 Mesh Timeout
|
||||||
|
|
||||||
|
If T3s in a domain fail to commit their task lists within `t3_mesh_timeout_minutes`:
|
||||||
|
|
||||||
|
1. **Runner escalates to T2** — writes a `gate_pending` escalation event and notifies the T2 specialist that owns the domain. Context: which T3s timed out, what draft lists (if any) exist on the blackboard. T2 re-scopes or clarifies domain boundaries, spawns fresh T3 briefs.
|
||||||
|
|
||||||
|
2. **If T2 also exhausts its retry budget** → normal escalation ladder: T2 failure → T1 handles → T1 failure → Andrew gate.
|
||||||
|
|
||||||
|
Force-committing partial draft lists (optimistic fallback) is explicitly not done — it hides the boundary problem and produces conflicting or duplicate T4 tasks that fail later with less context.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Path Amendment Mechanism
|
||||||
|
|
||||||
|
When a mid-run tier discovers scope that warrants a different tier path than T1 prescribed:
|
||||||
|
|
||||||
|
1. The discovering tier writes a `path_amendment` event to the blackboard:
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"kind": "path_amendment",
|
||||||
|
"proposed_by": "t3/ws-backend-api",
|
||||||
|
"reason": "Discovered auth dependency requires T2 architectural pass",
|
||||||
|
"amendment": {
|
||||||
|
"workstream": "ws-backend-api",
|
||||||
|
"add_tiers": ["t2"],
|
||||||
|
"insert_before": "t3"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
2. The runner monitors the events table, detects `path_amendment`, and sends a system event notification to the relevant higher tier
|
||||||
|
3. The higher tier is **informed, not blocked** — it acknowledges and adjusts its understanding
|
||||||
|
4. Amendment is logged on the blackboard for audit; no approval gate required (the next scheduled human gate will surface it)
|
||||||
|
|
||||||
|
No agent needs callback plumbing. The runner is the notification bridge.
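For readers wondering what the `amendment` payload means for a tier path, one possible interpretation is sketched here. `apply_amendment` is a hypothetical helper, and skipping tiers already on the path is an assumption.

```python
from typing import Any, Dict, List


def apply_amendment(tier_path: List[str], amendment: Dict[str, Any]) -> List[str]:
    """Return a new tier path with `add_tiers` inserted before `insert_before`."""
    anchor = amendment["insert_before"]
    if anchor not in tier_path:
        raise ValueError(f"tier {anchor} is not on the current path")
    # Assumption: tiers already on the path are not added a second time.
    new_tiers = [t for t in amendment["add_tiers"] if t not in tier_path]
    i = tier_path.index(anchor)
    return tier_path[:i] + new_tiers + tier_path[i:]
```

The function returns a new list and leaves the original path untouched, which keeps the prescribed path available for audit next to the amended one.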
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Shared State
|
||||||
|
|
||||||
|
For software pipelines, **the repo is the primary blackboard**:
|
||||||
|
- T4 workers commit to feature branches
|
||||||
|
- T3 leads review and merge to workstream branches
|
||||||
|
- T2 architects own integration branches
|
||||||
|
- T1 does final integration and acceptance
|
||||||
|
|
||||||
|
Supplemented by a SQLite coordination store per run tracking:
|
||||||
|
- In-flight workstreams and their current execution plans
|
||||||
|
- Handoff artifacts and tier status
|
||||||
|
- Retry counts and escalation history
|
||||||
|
- Path amendments (proposed, by whom, timestamp)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Failure Handling
|
||||||
|
|
||||||
|
Distributed ownership — each tier handles failures in the tier below it. The runner only handles T1 failure and terminal human escalation.
|
||||||
|
|
||||||
|
| Failure | Owner | Handler | Action |
|
||||||
|
|---------|-------|---------|--------|
|
||||||
|
| T4 bad output | T3 | `escalation.py` called by T3's context | Retry T4 with corrected brief (up to retry_budget) |
|
||||||
|
| T4 blocked | T3 | `escalation.py` | Escalate to T3 immediately — no retries |
|
||||||
|
| T4 partial output | T3 | `escalation.py` | Salvage good parts, re-task remainder |
|
||||||
|
| T5 partial verdict | T3 | T3 joint verdict logic | Retry failed T4 slices only |
|
||||||
|
| T5 full fail | T3 | T3 joint verdict logic | Escalate to T2 |
|
||||||
|
| T3 workstream stuck | T2 | T2 specialist prompt + blackboard | Re-scope or split the workstream |
|
||||||
|
| T2 design wrong | T1 | T1 Accept phase + blackboard | Re-plan; may discard workstream and restart |
|
||||||
|
| T1 failure / crash | Runner | `team_runner.py` | Surface to human, halt run |
|
||||||
|
| Repeated escalation | Runner | `team_runner.py` | Gate: block until human unblocks |
|
||||||
|
|
||||||
|
**Key distinction:** `escalation.py` is not called by the runner centrally. It is logic that tier agents execute (or the runner executes on their behalf when it detects a timeout or dead agent). The runner only owns the last two rows.
|
||||||
|
|
||||||
|
Retry limits prevent infinite loops. Escalation path is always upward, never sideways.
|
||||||
|
|
||||||
|
T1 sets a retry budget multiplier during scope assessment (`1x` simple, `2x` complex). Retry budget is a field on the task brief — not hardcoded in the runner.
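As a worked example of the arithmetic, assuming the multiplier scales a base budget taken from config (the base value and the `effective_retry_budget` name are hypothetical):

```python
def effective_retry_budget(base_budget: int, multiplier: int) -> int:
    """Scale the configured base retry budget by T1's multiplier."""
    if base_budget < 0:
        raise ValueError("base_budget must be non-negative")
    if multiplier < 1:
        raise ValueError("multiplier must be at least 1")
    return base_budget * multiplier


# A base budget of 3 retries stays 3 for a simple workstream (1x)
# and becomes 6 for a complex one (2x).
simple_budget = effective_retry_budget(3, 1)
complex_budget = effective_retry_budget(3, 2)
```

The result would be written into each task brief, so the runner never needs its own hardcoded limit.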

---

## Agent Talent Pool

The system builds on [agency-agents](https://github.com/msitarzewski/agency-agents) — a library of 50+ pre-built specialist personalities, each with deep domain expertise, quality standards, and specific deliverables.

**Division of responsibility:**
- Our system provides: orchestration, tier structure, task briefs, retries, verification gates, shared state
- Agency-agents provides: the specialist knowledge each agent brings to its role

T1 selects the right specialist from the roster when building workstream briefs. The specialist's personality is injected as the system prompt at spawn time.

**Default tier-to-specialist mapping for software pipelines:**

| Tier | Domain | Agent |
|------|--------|-------|
| T1 | Strategy | nexus-strategy |
| T2 | Backend | software-architect |
| T2 | Infra | devops-automator |
| T2 | Data | data-engineer |
| T3 | Backend | senior-developer |
| T3 | Reliability | sre |
| T4 | Frontend | frontend-developer |
| T4 | Backend | backend-architect |
| T4 | Database | database-optimizer |
| T4 | DevOps | devops-automator |
| T4 | Mobile | mobile-app-builder |
| T4 | AI/ML | ai-engineer |
| T4 | Security | security-engineer |
| T4 | Docs | technical-writer |
| T5 | Code review | code-reviewer |
| T5 | Integration | testing-reality-checker |
| T5 | API | testing-api-tester |
| T5 | Performance | testing-performance-benchmarker |
| T5 | Security | security-engineer |

The roster is not fixed — T1 can select any agent from the library based on workstream needs.
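
The lookup described above can be sketched roughly as follows. The `DEFAULT_ROSTER` dict is only an excerpt of the table, and the helper names plus the flat `<agents_dir>/<agent>.md` file layout are assumptions made for illustration; the real mapping lives in `config/role_registry.yaml`.

```python
from pathlib import Path

# Excerpt of the default mapping, keyed by (tier, domain).
DEFAULT_ROSTER = {
    ("T1", "Strategy"): "nexus-strategy",
    ("T2", "Backend"): "software-architect",
    ("T3", "Backend"): "senior-developer",
    ("T4", "Backend"): "backend-architect",
    ("T5", "Code review"): "code-reviewer",
}


def select_agent(tier, domain, override=None):
    """Pick the agent for a brief; an explicit override wins over the default."""
    if override:
        return override
    try:
        return DEFAULT_ROSTER[(tier, domain)]
    except KeyError:
        raise LookupError(f"no default agent for {tier}/{domain}") from None


def load_personality(agent, agents_dir):
    """Read the persona file whose text would be injected as the system prompt."""
    # Assumption: one Markdown file per agent directly under agents_dir.
    return (Path(agents_dir) / f"{agent}.md").read_text(encoding="utf-8")
```

The `override` parameter reflects the note that the roster is not fixed: T1 can name any agent from the library instead of the default.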

---

## Adapter Layers

Everything external is a swappable adapter. Core logic never imports from adapters directly — always through an interface.

```
Core (platform-agnostic)
├── team_runner   — thin bootstrap: spawn T1, monitor blackboard, handle result
├── blackboard    — SQLite coordination state
├── task_brief    — schema + validation
└── escalation    — retry logic, failure routing

Adapters (swappable)
├── llm/          — anthropic (now), openai, ollama, any API
├── notify/       — openclaw (now), slack, email, webhook...
├── vcs/          — github (now), gitlab, gitea, bare git...
└── runtime/
    ├── standard      — openclaw sessions_spawn (T1/T2/T3)
    └── coding_agent  — claude_code (T4/T5 default), codex, aider...
```

Swapping providers means writing a new adapter file — nothing in core changes.
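
As a rough illustration of the interface boundary, the sketch below shows core-side code that depends only on an abstract `NotifyAdapter`. The `send()` signature, the `InMemoryNotify` adapter, and the `announce_gate` helper are assumptions for the example; the actual base classes live under `adapters/base/`.

```python
from abc import ABC, abstractmethod


class NotifyAdapter(ABC):
    """Interface that core depends on. The send() signature is an assumption."""

    @abstractmethod
    def send(self, summary: str, context: dict) -> None:
        ...


class InMemoryNotify(NotifyAdapter):
    """Hypothetical adapter that only records messages, e.g. for test runs."""

    def __init__(self):
        self.sent = []

    def send(self, summary, context):
        self.sent.append((summary, context))


def announce_gate(notify: NotifyAdapter, gate: str, summary: str) -> None:
    """Core-side code: it knows the interface, never a concrete adapter."""
    notify.send(summary, {"gate": gate})
```

Replacing `InMemoryNotify` with a Slack or email adapter would leave `announce_gate` untouched, which is the property the adapter layer is meant to guarantee.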

T4 and T5 default to the **coding agent runtime** when it is available and fall back gracefully to the standard runtime when it is not configured.

---

## Run Visibility Layer

Designed for debugging, test runs, and quality evaluation at each tier. Three interlocking components.

### 1. Human-Readable Live Log

Structured events from the blackboard rendered as a timestamped, readable stream. `agency watch <run_id>` tails this live.

```
[abc123] 12:30:01 T1   PLAN_START   Assessing scope: "Build webhook ingestion system"
[abc123] 12:30:14 T1   PLAN_DONE    3 workstreams — backend-api, infra, docs (2 parallel)
[abc123] 12:30:14 GATE APPROVAL     ⏸ Waiting on approval before T2 spawns
[abc123] 12:31:02 GATE APPROVED     ✓ Approved — continuing
[abc123] 12:31:03 T2   LEAD_START   Lead Architect spawned
[abc123] 12:31:41 T2   BOUNDS_READY Domain boundaries + shared assumptions published
[abc123] 12:31:42 T2   SPEC_START   3 specialists spawned (parallel): backend, infra, docs
[abc123] 12:32:15 T2   SPEC_DONE    backend-api architecture draft ready
[abc123] 12:32:58 T2   SYNTH_DONE   Canonical architecture written to blackboard
[abc123] 12:32:58 GATE INSPECTION   ⏸ T2 synthesis ready for review
[abc123] 12:33:44 T3   MESH_START   backend-api: 2 squad leads negotiating task boundaries
[abc123] 12:34:01 T3   MESH_DONE    Task split committed — 7 T4 tasks (5 swarm, 2 pipeline)
[abc123] 12:34:02 T4   SWARM_START  5 workers spawned in parallel
[abc123] 12:35:10 T4   DONE         worker-3 auth-middleware ✓
[abc123] 12:35:22 T4   FAIL         worker-4 queue-client ✗ (retry 1/3)
[abc123] 12:36:04 T4   DONE         worker-4 queue-client ✓ (retry resolved)
[abc123] 12:36:05 T5   VERIFY_START 4 verifiers spawned
[abc123] 12:36:45 T5   VERDICT      partial — queue-client needs rework
[abc123] 12:37:12 T5   VERDICT      ✓ all pass — workstream backend-api done
```

Log level `verbose` adds per-T4-start/done lines. Default is `normal` (tier-level events only).
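
One way such a stream could be rendered from structured events is sketched below. The event dict keys, the per-event `level` field, and the column widths are assumptions chosen for the example, not the blackboard's actual schema.

```python
from datetime import datetime

# Assumption: each event carries a level; "verbose" events are hidden by default.
LEVELS = {"normal": 0, "verbose": 1}


def format_event(run_id, ts, tier, event, message):
    """Render one blackboard event as a fixed-width log line."""
    return f"[{run_id}] {ts:%H:%M:%S} {tier:<4} {event:<12} {message}"


def render(events, log_level="normal"):
    """Keep only the events at or below the configured verbosity."""
    limit = LEVELS[log_level]
    return [
        format_event(e["run_id"], e["ts"], e["tier"], e["event"], e["message"])
        for e in events
        if LEVELS[e.get("level", "normal")] <= limit
    ]
```

With `log_level="normal"` the per-worker lines are filtered out; passing `"verbose"` lets them through.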

### 2. Inspection Gates

Configurable pause points. When the runner hits a gate, it:
1. Writes a `gate_pending` event to the blackboard
2. Fires `notify_adapter.send()` with the tier summary + gate context
3. Halts — no next tier spawns until `gate_approved` or `gate_rejected` is written

The tier summary surfaced at each gate includes:
- **What was produced** (the tier artifact in readable form)
- **What happens next** (which agents will spawn, doing what)
- **Any anomalies** flagged by the tier itself
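
The three gate steps can be sketched against an in-memory SQLite blackboard. The `events` table schema, the helper names, and the shape of the summary dict are assumptions for illustration only.

```python
import json
import sqlite3

GATE_EVENTS = ("gate_pending", "gate_approved", "gate_rejected")


def open_blackboard(path=":memory:"):
    """Create a minimal events table. The schema here is an assumption."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "id INTEGER PRIMARY KEY, run_id TEXT, event TEXT, payload TEXT)"
    )
    return db


def hit_gate(db, notify_send, run_id, gate, summary):
    """Steps 1 and 2: record gate_pending, then notify with the tier summary."""
    payload = {"gate": gate, **summary}
    db.execute(
        "INSERT INTO events (run_id, event, payload) VALUES (?, ?, ?)",
        (run_id, "gate_pending", json.dumps(payload)),
    )
    db.commit()
    notify_send(payload)


def gate_state(db, run_id):
    """Step 3: the latest gate event decides whether the next tier may spawn."""
    row = db.execute(
        "SELECT event FROM events WHERE run_id = ? AND event IN (?, ?, ?) "
        "ORDER BY id DESC LIMIT 1",
        (run_id, *GATE_EVENTS),
    ).fetchone()
    return row[0] if row else None
```

A runner built this way would keep polling `gate_state` and hold all spawning while it returns `gate_pending`.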

Configurable in `team.yaml` under `visibility.inspection_gates`. A `strict_mode: true` flag enables all gates — recommended for first runs on a new codebase or new goal type.

```yaml
visibility:
  strict_mode: false
  log_level: normal              # normal | verbose
  inspection_gates:
    t1_plan: true                # always — required by design
    t2_lead: false               # optional — review boundaries before specialists
    t2_synthesis: true           # recommended — review architecture before implementation
    t3_plan: false               # verbose — useful early on, disable once T3 is trusted
    t5_verdict: false            # review T5 joint verdict before T3 marks workstream done
  gate_timeout_minutes: 60       # auto-reject if no response within this window
```
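
A sketch of how the runner might interpret this section once it has been parsed into a plain dict (for example by PyYAML's `yaml.safe_load`). The helper names are hypothetical, and reading `gate_timeout_minutes` directly under `visibility` is an assumption about the nesting.

```python
def gate_enabled(visibility: dict, gate: str) -> bool:
    """strict_mode turns every gate on; t1_plan is treated as always on."""
    if gate == "t1_plan" or visibility.get("strict_mode", False):
        return True
    return bool(visibility.get("inspection_gates", {}).get(gate, False))


def gate_timed_out(waited_minutes: float, visibility: dict) -> bool:
    """True once a pending gate has waited past the configured window."""
    # Assumption: 60 minutes is used when the key is absent.
    return waited_minutes > visibility.get("gate_timeout_minutes", 60)
```

A gate for which `gate_timed_out` returns true would be auto-rejected, matching the comment on `gate_timeout_minutes`.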

### 3. Inspection CLI — `cli/agency.py`

```
agency run <config.yaml>                 # start a run, returns run_id
agency watch <run_id>                    # tail live log (follows blackboard events)
agency inspect <run_id>                  # interactive tree view of run state
agency inspect <run_id> --tier t2        # jump to T2 artifacts
agency inspect <run_id> --brief <id>     # show full brief + result JSON

agency approve <run_id>                  # approve current gate → continue
agency approve <run_id> --note "..."     # approve with a note written to blackboard
agency reject <run_id> --reason "..."    # reject → tier re-invoked
agency pause <run_id>                    # force-pause at next tier boundary
agency resume <run_id>                   # release a manual pause
```
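
The command surface above maps naturally onto `argparse` subcommands. This is only a sketch of the argument parsing, not the contents of `cli/agency.py`; in particular, treating `--reason` as required is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser covering the subcommands listed above."""
    parser = argparse.ArgumentParser(prog="agency")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("run").add_argument("config")

    # Commands that take only a run id.
    for name in ("watch", "pause", "resume"):
        sub.add_parser(name).add_argument("run_id")

    inspect_cmd = sub.add_parser("inspect")
    inspect_cmd.add_argument("run_id")
    inspect_cmd.add_argument("--tier")
    inspect_cmd.add_argument("--brief")

    approve = sub.add_parser("approve")
    approve.add_argument("run_id")
    approve.add_argument("--note")

    reject = sub.add_parser("reject")
    reject.add_argument("run_id")
    # Assumption: a rejection must always explain itself.
    reject.add_argument("--reason", required=True)
    return parser
```

For example, `build_parser().parse_args(["approve", "abc123", "--note", "ok"])` yields a namespace with `command="approve"`, `run_id="abc123"`, and `note="ok"`.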

`agency inspect` (no flags) renders a live tree:
```
Run abc123 — "Build webhook ingestion system"
├── T1 Plan ✓
│   └── [view workplan]
├── T2 Architecture ✓ [GATE: pending review]
│   ├── [view domain boundaries]
│   ├── [view shared assumptions]
│   └── [view canonical architecture]
├── T3 backend-api (active)
│   ├── [view task breakdown]
│   └── T4 workers: 3/7 done, 1 retrying, 3 pending
└── T3 infra (pending)
```

### Blackboard Event Vocabulary (extended)

```python
# existing
"spawned" | "completed" | "failed" | "escalated" | "retried"

# new — visibility layer
"gate_pending"     # runner hit a gate, waiting for human
"gate_approved"    # human approved, run continues
"gate_rejected"    # human rejected, tier re-invoked
"gate_paused"      # manual pause via CLI
"gate_resumed"     # manual resume via CLI
"path_amendment"   # mid-run tier proposed path change
"log"              # human-readable log line (level + message)
```
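
The vocabulary above is written as notation rather than runnable Python. One possible runnable encoding is a string-valued `Enum`; the `Event` class and `parse_event` helper are names invented for this sketch.

```python
from enum import Enum


class Event(str, Enum):
    """Blackboard event names; values match the strings listed above."""
    SPAWNED = "spawned"
    COMPLETED = "completed"
    FAILED = "failed"
    ESCALATED = "escalated"
    RETRIED = "retried"
    # Visibility layer additions.
    GATE_PENDING = "gate_pending"
    GATE_APPROVED = "gate_approved"
    GATE_REJECTED = "gate_rejected"
    GATE_PAUSED = "gate_paused"
    GATE_RESUMED = "gate_resumed"
    PATH_AMENDMENT = "path_amendment"
    LOG = "log"


def parse_event(raw: str) -> Event:
    """Reject names outside the vocabulary (raises ValueError)."""
    return Event(raw)
```

Because `Event` subclasses `str`, members compare equal to the raw strings stored in SQLite, so existing rows need no migration.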

---

## Decisions Log

**T1 dynamic dispatch** — T1 assesses scope and prescribes tier path and workstream parallelism. It does not prescribe internal tier coordination patterns.

**T1 two-phase lifecycle** — T1 has two explicit named phases: Plan and Accept. Plan phase includes self-critique (single pass) then human approval gate before T2s spawn. Accept phase validates final output against goal anchor. Both phases tracked on blackboard with distinct prompts.

**T1 self-critique** — Single pass only. Diminishing returns on multiple self-critique iterations; the human review after is the real safety net. Self-critique catches obvious gaps; Andrew catches strategic ones.

**Distributed ownership** — Each tier owns the layer below it. Runner is thin. Tradeoff: distributed control makes the system extensible but debugging requires good blackboard tooling, not central runner traces.

**T5 always mandatory** — No skipping verification. Things should work and work well before surfacing to T1.

**T3 owns T4 and T5** — T3 manages its T4s (dependency graph, swarm vs pipeline, parallelism) and commissions T5 verification of T4 outputs. Runner does not orchestrate T4/T5 centrally.

**T2 Lead Architect** — Dedicated T2 role, not a new tier. Spawned first by T1. Owns: domain boundary definition, shared assumptions doc, conflict resolution between specialists, canonical architecture synthesis. Specialists spawn after Lead publishes boundaries + assumptions. Each T2 specialist owns its own T3s — no T3 spans T2 domains.

**T2 conflict resolution** — Lead sends targeted briefs back to conflicting specialists. Cycle limit is a fixed config value (not per-workstream). This parallels the single-pass T1 self-critique: the limit is fixed, not variable.

**T2 shared assumptions** — Lead publishes cross-cutting concerns (auth, data formats, API conventions, etc.) before specialists start. Specialists design with shared baseline; implicit dependencies pre-empted rather than caught in synthesis.

**Orchestration patterns** — Baked into tier prompts and runner tier-handling logic, not prescribed by T1. T2: Lead + parallel specialists. T3: light mesh within T2 domain. T4: swarm+pipeline. T5: fan-out+consensus.

**Output / review** — Nothing merges to main without explicit human approval. T1 opens a PR and fires `notify_adapter.send()` with the PR summary. Merge is gated on human sign-off. The notify adapter implementation determines how the notification is delivered.

**Platform agnosticism** — Core is provider and platform agnostic. Capability levels (`reasoning-heavy`, `capable`, `fast-cheap`) map to models in config. Mixing providers across tiers is supported.

**LLM provider** — Anthropic first implementation. Config supports per-tier provider selection.

**Gateway modification** — Decided against. Agent-teams stays standalone Python. OpenClaw used via runtime adapter only.

**Coding agent runtime** — Claude Code is default T4/T5 runtime. Opt-in `native_teams` flag available for internal Claude Code parallelism — faster but less blackboard visibility. Default `false`.

**Agency-agents integration** — Via git submodule at `agents/`. T1 selects specialists via `config/role_registry.yaml`. `agent_personality` field on task brief; runtime injects as system prompt at spawn time.

**Spawn call ownership** — Runner is the single point of contact with the runtime adapter. Tiers write `status=pending` child briefs to the blackboard; runner's spawn loop detects and spawns them. Gate logic (hold on `gate_pending`) lives in the spawn loop — no gate plumbing needed in agents. Agents only need blackboard read/write access.
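
A single pass of such a spawn loop might look like the sketch below. The `briefs` and `events` tables, the `spawned` status value, and the `spawn` callback standing in for the runtime adapter are all assumptions; the sketch also ignores multiple concurrent runs.

```python
import sqlite3


def make_blackboard():
    """In-memory blackboard with a hypothetical two-table schema."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE briefs (brief_id TEXT PRIMARY KEY, status TEXT)")
    db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, event TEXT)")
    return db


def spawn_pending(db, spawn):
    """Spawn every pending brief, unless the latest gate event is gate_pending."""
    gate = db.execute(
        "SELECT event FROM events "
        "WHERE event IN ('gate_pending', 'gate_approved', 'gate_rejected') "
        "ORDER BY id DESC LIMIT 1"
    ).fetchone()
    if gate and gate[0] == "gate_pending":
        return []  # hold: a human has not answered the gate yet

    rows = db.execute(
        "SELECT brief_id FROM briefs WHERE status = 'pending' ORDER BY brief_id"
    ).fetchall()
    spawned = []
    for (brief_id,) in rows:
        spawn(brief_id)  # the only place the runtime adapter is touched
        db.execute("UPDATE briefs SET status = 'spawned' WHERE brief_id = ?", (brief_id,))
        spawned.append(brief_id)
    db.commit()
    return spawned
```

Agents never call `spawn` themselves; they only insert rows with `status='pending'`, which keeps the gate check in one place.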

**Gate approval UX** — `agency approve <run_id>` CLI is the universal approval path — writes `gate_approved` directly to blackboard. Runner only polls blackboard; it does not depend on any specific notification platform. Each notify adapter handles its own inbound response bridge as internal adapter state. Core has no `pending_gates.json` or platform-specific approval logic.

**T3 mesh timeout** — Escalate to T2 (the specialist that owns the domain). Timeout means T3s can't agree on task boundaries — a domain boundary problem T2 should fix by re-scoping. If T2 exhausts its retry budget, normal escalation ladder handles it (T1 → Andrew gate). No force-commit fallback.

**T3 mesh mechanics** — Blackboard-based coordination. T3s write draft task lists, read peers', reconcile overlaps, commit merged plan. No T4 dispatch until all T3s in the domain have committed. Runner enforces timeout (`t3_mesh_timeout_minutes` in config). Chosen over designated T3 lead or direct messaging — fits distributed ownership model, gives full audit trail for free.

**T1 output schema** — Formal JSON schema defined (2026-03-30). Fields: `run_id`, `goal_anchor`, `complexity`, `retry_budget_multiplier`, `workstreams[]` (id, name, domain, tier_path, parallel_group, t2_specialist, notes), `parallelism` (groups + sequence), `self_critique_summary`. `parallel_group` + `sequence` handles inter-workstream dependencies.
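
To make the field list concrete, here is a hypothetical plan instance with a small shape check. The field names come from the list above; every value, the inner structure of `parallelism`, and the `validate_plan` helper are illustrative assumptions rather than the formal schema.

```python
REQUIRED_FIELDS = {
    "run_id", "goal_anchor", "complexity", "retry_budget_multiplier",
    "workstreams", "parallelism", "self_critique_summary",
}
WORKSTREAM_FIELDS = {
    "id", "name", "domain", "tier_path", "parallel_group", "t2_specialist", "notes",
}

# Hypothetical example values, not taken from a real run.
EXAMPLE_PLAN = {
    "run_id": "abc123",
    "goal_anchor": "Build webhook ingestion system",
    "complexity": "simple",
    "retry_budget_multiplier": 1,
    "workstreams": [
        {
            "id": "ws-1", "name": "backend-api", "domain": "Backend",
            "tier_path": ["T2", "T3", "T4", "T5"], "parallel_group": "g1",
            "t2_specialist": "software-architect", "notes": "",
        },
    ],
    "parallelism": {"groups": {"g1": ["ws-1"]}, "sequence": ["g1"]},
    "self_critique_summary": "No gaps found.",
}


def validate_plan(plan: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - set(plan))]
    for ws in plan.get("workstreams", []):
        for f in sorted(WORKSTREAM_FIELDS - set(ws)):
            problems.append(f"workstream {ws.get('id', '?')}: missing {f}")
    return problems
```

In this reading, `sequence` orders the parallel groups, and workstreams sharing a `parallel_group` run together.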

**T5 consensus** — T3 aggregates all T5 results into joint verdict: `pass | partial | fail`. Split verdict (`partial`) → T3 retries only failed slices, re-runs T5 on those slices. Full `fail` escalates to T2. T3 writes structured joint verdict to blackboard; this is what the optional T5 gate surfaces to Andrew.
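
The aggregation rule can be sketched as a small pure function. It assumes each slice has already been reduced to a single pass/fail boolean; how individual verifier opinions are combined per slice is not specified here, and the returned dict layout is hypothetical.

```python
def joint_verdict(results: dict[str, bool]) -> dict:
    """Aggregate per-slice T5 results (slice name -> passed) into a joint verdict."""
    if not results:
        raise ValueError("no T5 results to aggregate")
    failed = sorted(name for name, ok in results.items() if not ok)
    if not failed:
        verdict = "pass"
    elif len(failed) == len(results):
        verdict = "fail"
    else:
        verdict = "partial"
    return {
        "verdict": verdict,
        # Only a partial verdict triggers targeted retries.
        "retry_slices": failed if verdict == "partial" else [],
        # A full fail goes up to the owning T2 specialist.
        "escalate_to": "T2" if verdict == "fail" else None,
    }
```

For instance, one failing slice out of two produces a `partial` verdict whose `retry_slices` names only the failing slice.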

**Path amendment mechanism** — Amending tier writes `path_amendment` event to blackboard (structured JSON: proposed_by, reason, amendment). Runner monitors events table, sends system event notification to relevant higher tier. Higher tier is informed, not blocked. No agent callback plumbing. Amendments surface at next scheduled human gate.

**Failure handling (distributed)** — Confirmed distributed ownership (2026-03-30). `escalation.py` is logic tiers execute (or runner executes on tier's behalf on timeout/crash), not a central runner concern. Runner only owns: T1 failure, terminal human escalation. See updated Failure Handling table.

**Run visibility layer** — Added 2026-03-30. Human-readable live log, configurable inspection gates, and `cli/agency.py` inspection/control commands. Designed for debugging and quality evaluation at each tier during early runs. `strict_mode: true` enables all gates. Gates surface tier artifacts + "what happens next" summary via `notify_adapter.send()` — platform-agnostic. Resolves Q3 (T5 consensus surfaces as gate event with human-readable summary). T5 gate (optional) lets the operator review joint verdict before T3 marks workstream done.
@@ -10,6 +10,9 @@ pyyaml
|
|||||||
# Environment variable management
|
# Environment variable management
|
||||||
python-dotenv
|
python-dotenv
|
||||||
|
|
||||||
|
# GitHub VCS adapter
|
||||||
|
PyGithub
|
||||||
|
|
||||||
# --- stdlib-only (no pip install needed) ---
|
# --- stdlib-only (no pip install needed) ---
|
||||||
# sqlite3 — blackboard persistence
|
# sqlite3 — blackboard persistence
|
||||||
# dataclasses — task_brief schema
|
# dataclasses — task_brief schema
|
||||||
|
|||||||