Compare commits: 71316b3090...main (15 Commits)
| Author | SHA1 | Date |
|---|---|---|
| | 342832fa5e | |
| | 641f122cdb | |
| | 54afa0f53f | |
| | f228061c4d | |
| | 1c99e40f98 | |
| | 8f143e779d | |
| | a721db63f6 | |
| | 882b769d21 | |
| | ce3c020de2 | |
| | b54436f474 | |
| | 1ed7023c08 | |
| | 9efbb3b010 | |
| | 72bd744664 | |
| | 084cfb0bb2 | |
| | ce1ce85b87 | |
2
.gitmodules
vendored
@@ -1,3 +1,3 @@
 [submodule "agents"]
 path = agents
-url = https://github.com/coding-with-hans-heinemann/agency-agents.git
+url = https://git.tandrewng.com/cw-hans/agency-agents.git
48
CLAUDE.md
Normal file
@@ -0,0 +1,48 @@
# CLAUDE.md — Agent Quick Reference

Read this before exploring the codebase. It saves tokens.

## What This Is

A tiered multi-agent orchestration framework. T1 decomposes goals → T2 architects → T3 leads → T4 implements → T5 verifies. SQLite blackboard tracks state. All external dependencies (LLM, VCS, notify, runtime) are pluggable adapters.
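
To make the blackboard idea concrete, here is a minimal sketch of a per-run SQLite event log. The table layout and the helpers `open_blackboard`, `record_event`, and `events_for` are hypothetical; the real schema lives in `core/blackboard.py`, which this diff does not show.

```python
from __future__ import annotations

import json
import sqlite3
import time


def open_blackboard(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a run-scoped event log. The schema is illustrative only."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS events (
            id       INTEGER PRIMARY KEY AUTOINCREMENT,
            ts       REAL NOT NULL,
            tier     TEXT NOT NULL,
            brief_id TEXT NOT NULL,
            kind     TEXT NOT NULL,
            payload  TEXT NOT NULL
        )
        """
    )
    return conn


def record_event(
    conn: sqlite3.Connection, tier: str, brief_id: str, kind: str, payload: dict
) -> int:
    """Append one event row and return its row id."""
    cur = conn.execute(
        "INSERT INTO events (ts, tier, brief_id, kind, payload) VALUES (?, ?, ?, ?, ?)",
        (time.time(), tier, brief_id, kind, json.dumps(payload)),
    )
    conn.commit()
    return cur.lastrowid


def events_for(conn: sqlite3.Connection, brief_id: str) -> list[tuple[str, str]]:
    """Return (tier, kind) pairs for one brief, in insertion order."""
    rows = conn.execute(
        "SELECT tier, kind FROM events WHERE brief_id = ? ORDER BY id", (brief_id,)
    )
    return list(rows)
```

An append-only table keeps every tier's writes ordered by `id`, which is one simple way to treat the blackboard as the single source of truth for a run.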

## Key Docs

- `docs/design.md` — architecture decisions, tier design, key choices
- `docs/buildspec.md` — 15-step build order, phase breakdown

## Project Layout

```
core/           — task_brief.py, blackboard.py, escalation.py, team_runner.py
adapters/base/  — abstract base classes (LLMAdapter, VCSAdapter, NotifyAdapter, RuntimeAdapter)
adapters/llm/   — anthropic.py
adapters/vcs/   — github.py
adapters/notify/— openclaw.py
adapters/runtime— openclaw.py, claude_code.py
prompts/        — T1–T5 system prompt .md files
config/         — team.yaml (run config), role_registry.yaml (tier→role→persona)
agents/         — git submodule, agent persona .md files
runs/           — per-run blackboard.db files (gitignored)
```

## Conventions

- **Never commit or push directly to `main`** — always branch (`hans/...` or `feature/...`) and PR
- New adapters: subclass the relevant `adapters/base/*.py` abstract class
- New roles: add persona `.md` to `agents/` submodule + entry in `config/role_registry.yaml`
- Failure handling lives in `core/escalation.py` — extend `FailureType` there
- `TaskBrief` is the canonical work unit — all tiers pass briefs to each other
- Blackboard is the single source of truth per run — always write events there
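
As an illustration of the adapter convention in the list above, the sketch below subclasses a stand-in `LLMAdapter`. The abstract class here is an assumption reconstructed from the method signatures visible in this diff (`complete` and `resolve_model`); the real base class in `adapters/base/llm.py` may differ. `EchoAdapter` is hypothetical and only useful for offline tests.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class LLMAdapter(ABC):
    """Stand-in for the class in adapters/base/llm.py (assumed interface)."""

    @abstractmethod
    def complete(self, prompt: str, capability: str, context: dict) -> str:
        """Return the model's text response for the prompt."""

    @abstractmethod
    def resolve_model(self, capability: str) -> str:
        """Map a capability string to a provider-specific model name."""


class EchoAdapter(LLMAdapter):
    """Hypothetical adapter that echoes the prompt instead of calling a model."""

    def __init__(self, config: dict) -> None:
        self._config = config

    def resolve_model(self, capability: str) -> str:
        # No real provider behind this adapter, so the "model" is synthetic.
        return f"echo-{capability}"

    def complete(self, prompt: str, capability: str, context: dict) -> str:
        model = self.resolve_model(capability)
        return f"[{model}] {prompt}"
```

Because the base class is abstract, forgetting to implement either method fails at instantiation time rather than in the middle of a run.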

## Current State

Phase 2 adapter implementations exist. `core/team_runner.py` may still have stubs — check before assuming it's wired up end-to-end.

## Running

```bash
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
python -m core.team_runner --config config/team.yaml
```
@@ -1,16 +1,15 @@
|
|||||||
"""
|
"""
|
||||||
adapters/llm/anthropic.py
|
adapters/llm/anthropic.py
|
||||||
Anthropic Claude adapter — Phase 2 stub.
|
Anthropic Claude LLM adapter — Phase 2 implementation.
|
||||||
|
|
||||||
TODO (Phase 2):
|
Uses the ``anthropic`` SDK to call Claude models. Model selection is driven
|
||||||
- Implement complete() using the anthropic SDK (anthropic.Anthropic client).
|
by the capability_map in team.yaml so the adapter stays provider-agnostic in
|
||||||
- Implement resolve_model() by reading config/team.yaml capability_map.
|
configuration.
|
||||||
- Handle streaming responses, rate-limit retries, and token counting.
|
|
||||||
- Support system-prompt injection via context["system_prompt"].
|
|
||||||
- Map capability → model using the provider's capability_map config.
|
|
||||||
"""
|
"""
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import os
|
||||||
|
|
||||||
from adapters.base.llm import LLMAdapter
|
from adapters.base.llm import LLMAdapter
|
||||||
|
|
||||||
|
|
||||||
@@ -18,27 +17,123 @@ class AnthropicAdapter(LLMAdapter):
|
|||||||
"""
|
"""
|
||||||
LLM adapter for Anthropic Claude models.
|
LLM adapter for Anthropic Claude models.
|
||||||
|
|
||||||
Reads model configuration from config/team.yaml:
|
Reads model configuration from the loaded team.yaml config dict::
|
||||||
models.provider: anthropic
|
|
||||||
models.capability_map.reasoning-heavy.anthropic: claude-opus-4-6
|
models:
|
||||||
models.capability_map.capable.anthropic: claude-sonnet-4-6
|
default_max_tokens: 4096 # fallback max_tokens for all calls
|
||||||
models.capability_map.fast-cheap.anthropic: claude-haiku-3-5
|
default_temperature: 0 # fallback temperature for all calls
|
||||||
|
capability_map:
|
||||||
|
reasoning-heavy:
|
||||||
|
anthropic: claude-opus-4-6
|
||||||
|
capable:
|
||||||
|
anthropic: claude-sonnet-4-6
|
||||||
|
fast-cheap:
|
||||||
|
anthropic: claude-haiku-3-5
|
||||||
|
|
||||||
|
The provider key used when looking up ``capability_map`` is hardcoded to
|
||||||
|
``"anthropic"`` — the adapter knows its own provider; there is no need for
|
||||||
|
a separate ``models.provider`` config field.
|
||||||
|
|
||||||
|
Both ``default_max_tokens`` and ``default_temperature`` can be overridden
|
||||||
|
per-call via the ``context`` dict passed to :meth:`complete`.
|
||||||
|
|
||||||
|
Environment variables
|
||||||
|
---------------------
|
||||||
|
ANTHROPIC_API_KEY : Required. Authenticates with the Anthropic API.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
def __init__(self, config: dict) -> None:
|
def __init__(self, config: dict) -> None:
|
||||||
# TODO (Phase 2): Accept loaded team.yaml config dict.
|
"""
|
||||||
# Extract API key from environment (ANTHROPIC_API_KEY).
|
Initialise the Anthropic adapter.
|
||||||
# Initialise the anthropic.Anthropic() client.
|
|
||||||
raise NotImplementedError("AnthropicAdapter.__init__ is not yet implemented.")
|
Parameters
|
||||||
|
----------
|
||||||
|
config : Loaded team.yaml config dict.
|
||||||
|
|
||||||
|
Raises
|
||||||
|
------
|
||||||
|
ValueError
|
||||||
|
If ANTHROPIC_API_KEY is not set in the environment.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
import anthropic as _anthropic
|
||||||
|
except ModuleNotFoundError as exc:
|
||||||
|
raise ImportError(
|
||||||
|
"The 'anthropic' package is required for AnthropicAdapter. "
|
||||||
|
"Install it with: pip install anthropic"
|
||||||
|
) from exc
|
||||||
|
|
||||||
|
self._config = config
|
||||||
|
api_key = os.environ.get("ANTHROPIC_API_KEY")
|
||||||
|
if not api_key:
|
||||||
|
raise ValueError(
|
||||||
|
"ANTHROPIC_API_KEY environment variable is not set. "
|
||||||
|
"Export it before running the-agency."
|
||||||
|
)
|
||||||
|
self._client = _anthropic.Anthropic(api_key=api_key)
|
||||||
|
self._models_cfg: dict = config.get("models", {})
|
||||||
|
self._default_max_tokens: int = self._models_cfg.get("default_max_tokens", 4096)
|
||||||
|
self._default_temperature: float = self._models_cfg.get("default_temperature", 0)
|
||||||
|
|
||||||
def complete(self, prompt: str, capability: str, context: dict) -> str:
|
def complete(self, prompt: str, capability: str, context: dict) -> str:
|
||||||
# TODO (Phase 2): Call anthropic client messages.create().
|
"""
|
||||||
# Use resolve_model(capability) to pick the model.
|
Send a prompt to a Claude model and return the text response.
|
||||||
# Support context keys: system_prompt, max_tokens, temperature.
|
|
||||||
# Return response text as a plain string.
|
Parameters
|
||||||
raise NotImplementedError("AnthropicAdapter.complete is not yet implemented.")
|
----------
|
||||||
|
prompt : User-role prompt content.
|
||||||
|
capability : One of "reasoning-heavy" | "capable" | "fast-cheap".
|
||||||
|
context : Optional per-call overrides:
|
||||||
|
system_prompt (str) — prepended as the system turn.
|
||||||
|
max_tokens (int) — defaults to models.default_max_tokens in team.yaml.
|
||||||
|
temperature (float) — defaults to models.default_temperature in team.yaml.
|
||||||
|
|
||||||
|
Returns
|
||||||
|
-------
|
||||||
|
The model's text completion as a plain string.
|
||||||
|
"""
|
||||||
|
model = self.resolve_model(capability)
|
||||||
|
max_tokens: int = context.get("max_tokens", self._default_max_tokens)
|
||||||
|
temperature: float = context.get("temperature", self._default_temperature)
|
||||||
|
system_prompt: str = context.get("system_prompt", "")
|
||||||
|
|
||||||
|
create_kwargs: dict = {
|
||||||
|
"model": model,
|
||||||
|
"max_tokens": max_tokens,
|
||||||
|
"messages": [{"role": "user", "content": prompt}],
|
||||||
|
}
|
||||||
|
if system_prompt:
|
||||||
|
create_kwargs["system"] = system_prompt
|
||||||
|
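# Always send the configured temperature. Skipping it when the value is 0
# would leave the request on the API's own default, so the documented
# default of 0 would never take effect.
create_kwargs["temperature"] = temperature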
|
||||||
|
|
||||||
|
response = self._client.messages.create(**create_kwargs)
|
||||||
|
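# Concatenate every text block; the first content block is not guaranteed to be text.
return "".join(block.text for block in response.content if block.type == "text")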
|
||||||
|
|
||||||
def resolve_model(self, capability: str) -> str:
|
def resolve_model(self, capability: str) -> str:
|
||||||
# TODO (Phase 2): Look up capability in team.yaml capability_map.
|
"""
|
||||||
# Fall back to "capable" tier model if capability is unknown.
|
Map a capability string to the Anthropic model identifier.
|
||||||
raise NotImplementedError("AnthropicAdapter.resolve_model is not yet implemented.")
|
|
||||||
|
Looks up ``config["models"]["capability_map"][capability]["anthropic"]``.
|
||||||
|
Falls back to the "capable" tier model if the capability is unknown.
|
||||||
|
|
||||||
|
Parameters
|
||||||
|
----------
|
||||||
|
capability : One of "reasoning-heavy" | "capable" | "fast-cheap".
|
||||||
|
|
||||||
|
Returns
|
||||||
|
-------
|
||||||
|
Anthropic model identifier (e.g. "claude-opus-4-6").
|
||||||
|
"""
|
||||||
|
# The adapter knows its own provider — no need to read it from config.
|
||||||
|
cap_map: dict = self._models_cfg.get("capability_map", {})
|
||||||
|
|
||||||
|
if capability in cap_map and "anthropic" in cap_map[capability]:
|
||||||
|
return cap_map[capability]["anthropic"]
|
||||||
|
|
||||||
|
# Fall back to "capable" tier
|
||||||
|
if "capable" in cap_map and "anthropic" in cap_map["capable"]:
|
||||||
|
return cap_map["capable"]["anthropic"]
|
||||||
|
|
||||||
|
# Hard-coded last resort
|
||||||
|
return "claude-sonnet-4-6"
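
The lookup order in `resolve_model` can be exercised without the SDK. The sketch below restates the same three-step fallback as a standalone function. The name `resolve_anthropic_model` is illustrative and not part of the adapter, and unlike the adapter it treats an empty model string as missing.

```python
from __future__ import annotations

FALLBACK_MODEL = "claude-sonnet-4-6"  # same last-resort value as the adapter


def resolve_anthropic_model(config: dict, capability: str) -> str:
    """Resolve a capability to a model name using the fallback chain from the diff."""
    cap_map = config.get("models", {}).get("capability_map", {})
    # 1. the requested capability, 2. the "capable" tier, 3. the hard-coded constant.
    for key in (capability, "capable"):
        model = cap_map.get(key, {}).get("anthropic")
        if model:
            return model
    return FALLBACK_MODEL
```

Keeping the fallback in one loop makes the precedence visible at a glance and avoids repeating the membership checks for each tier.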
|
||||||
|
|||||||
@@ -1,35 +1,93 @@
|
|||||||
"""
|
"""
|
||||||
adapters/notify/openclaw.py
|
adapters/notify/openclaw.py
|
||||||
OpenClaw notification adapter — Phase 2 stub.
|
OpenClaw notification adapter — Phase 2 implementation.
|
||||||
|
|
||||||
TODO (Phase 2):
|
Sends notifications by shelling out to the ``openclaw`` CLI::
|
||||||
- Implement send() to dispatch notifications via the OpenClaw API.
|
|
||||||
- Support context keys: channel, severity, run_id, brief_id.
|
openclaw system event --text "<message>" --mode now
|
||||||
- Read endpoint and credentials from environment (OPENCLAW_API_KEY, OPENCLAW_URL).
|
|
||||||
- Handle rate limiting and delivery retries.
|
If the binary is not on PATH the method logs a warning and returns without
|
||||||
|
raising — notifications are best-effort and should never crash the pipeline.
|
||||||
"""
|
"""
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import logging
|
||||||
|
import os
|
||||||
|
import subprocess
|
||||||
|
|
||||||
from adapters.base.notify import NotifyAdapter
|
from adapters.base.notify import NotifyAdapter
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
class OpenClawNotifyAdapter(NotifyAdapter):
|
class OpenClawNotifyAdapter(NotifyAdapter):
|
||||||
"""
|
"""
|
||||||
Notification adapter that sends messages via OpenClaw.
|
Notification adapter that dispatches messages via the ``openclaw`` CLI.
|
||||||
|
|
||||||
Expects environment variables:
|
Environment variables
|
||||||
OPENCLAW_API_KEY — authentication token
|
---------------------
|
||||||
OPENCLAW_URL — base URL for the OpenClaw API (optional, defaults to hosted)
|
OPENCLAW_SIGNAL_NUMBER : Optional. Intended as a direct signal target for OpenClaw sends; currently read and stored but not passed to the CLI.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
def __init__(self, config: dict) -> None:
|
def __init__(self, config: dict) -> None:
|
||||||
# TODO (Phase 2): Accept loaded team.yaml config dict.
|
"""
|
||||||
# Extract OPENCLAW_API_KEY and OPENCLAW_URL from environment.
|
Initialise the OpenClaw notification adapter.
|
||||||
# Initialise an HTTP client (e.g. httpx or requests).
|
|
||||||
raise NotImplementedError("OpenClawNotifyAdapter.__init__ is not yet implemented.")
|
Parameters
|
||||||
|
----------
|
||||||
|
config : Loaded team.yaml config dict (reserved for future options).
|
||||||
|
"""
|
||||||
|
self._config = config
|
||||||
|
self._signal_number: str = os.environ.get("OPENCLAW_SIGNAL_NUMBER", "")
|
||||||
|
|
||||||
def send(self, message: str, context: dict) -> None:
|
def send(self, message: str, context: dict) -> None:
|
||||||
# TODO (Phase 2): POST notification payload to OpenClaw API.
|
"""
|
||||||
# Include message, context (channel, severity, run_id, brief_id).
|
Send a notification via ``openclaw system event``.
|
||||||
# Log delivery confirmation or raise on failure.
|
|
||||||
raise NotImplementedError("OpenClawNotifyAdapter.send is not yet implemented.")
|
Parameters
|
||||||
|
----------
|
||||||
|
message : Human-readable notification text.
|
||||||
|
context : Optional metadata. Recognised keys:
|
||||||
|
level (str) — "info" | "warning" | "error"; logged locally.
|
||||||
|
run_id (str) — included in the local log record.
|
||||||
|
brief_id (str) — included in the local log record.
|
||||||
|
|
||||||
|
Notes
|
||||||
|
-----
|
||||||
|
If the ``openclaw`` binary is not present on PATH, the method logs a
|
||||||
|
warning and returns silently. Notifications are best-effort.
|
||||||
|
"""
|
||||||
|
level: str = context.get("level", "info")
|
||||||
|
run_id: str = context.get("run_id", "")
|
||||||
|
brief_id: str = context.get("brief_id", "")
|
||||||
|
|
||||||
|
# Always log locally regardless of CLI availability.
|
||||||
|
log_msg = "[notify:%s] %s (run=%s brief=%s)" % (level, message, run_id, brief_id)
|
||||||
|
if level == "error":
|
||||||
|
logger.error(log_msg)
|
||||||
|
elif level == "warning":
|
||||||
|
logger.warning(log_msg)
|
||||||
|
else:
|
||||||
|
logger.info(log_msg)
|
||||||
|
|
||||||
|
cmd = ["openclaw", "system", "event", "--text", message, "--mode", "now"]
|
||||||
|
try:
|
||||||
|
result = subprocess.run(
|
||||||
|
cmd,
|
||||||
|
capture_output=True,
|
||||||
|
text=True,
|
||||||
|
timeout=30,
|
||||||
|
)
|
||||||
|
if result.returncode != 0:
|
||||||
|
logger.warning(
|
||||||
|
"openclaw event returned non-zero exit %d: %s",
|
||||||
|
result.returncode,
|
||||||
|
result.stderr.strip(),
|
||||||
|
)
|
||||||
|
except FileNotFoundError:
|
||||||
|
logger.warning(
|
||||||
|
"openclaw CLI not found on PATH; notification not delivered: %s",
|
||||||
|
message,
|
||||||
|
)
|
||||||
|
except subprocess.TimeoutExpired:
|
||||||
|
logger.warning("openclaw event timed out for message: %s", message)
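
The best-effort pattern used by `send()` can be isolated into a small helper. This is a sketch under the assumption that "best-effort" means logging and returning instead of raising; the helper name `run_best_effort` and the logger name are hypothetical.

```python
from __future__ import annotations

import logging
import subprocess
import sys

logger = logging.getLogger("notify-sketch")


def run_best_effort(cmd: list[str], timeout_s: float = 30.0) -> bool:
    """Run cmd without raising on the common failure modes.

    Returns True only when the command exits with status 0.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except FileNotFoundError:
        # The binary is not installed or not on PATH.
        logger.warning("binary %r not found on PATH; skipping", cmd[0])
        return False
    except subprocess.TimeoutExpired:
        logger.warning("command timed out after %.0fs", timeout_s)
        return False
    if result.returncode != 0:
        logger.warning("exit %d: %s", result.returncode, result.stderr.strip())
        return False
    return True
```

With the invocation from this diff, the call would be `run_best_effort(["openclaw", "system", "event", "--text", message, "--mode", "now"])`. Returning a boolean lets a caller record delivery status without wrapping the call in its own `try` block.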
|
||||||
|
|||||||
@@ -1,51 +1,163 @@
|
|||||||
"""
|
"""
|
||||||
adapters/runtime/claude_code.py
|
adapters/runtime/claude_code.py
|
||||||
Claude Code agent runtime adapter — Phase 2 stub.
|
Claude Code sub-agent runtime adapter — Phase 2 implementation.
|
||||||
|
|
||||||
TODO (Phase 2):
|
Spawns the ``claude`` CLI as a non-interactive subprocess for T4/T5
|
||||||
- Implement spawn() to launch a Claude Code sub-agent via the Agent SDK.
|
implementation tasks::
|
||||||
- Implement get_result() to await agent completion and parse the output.
|
|
||||||
- Implement kill() to terminate the sub-agent process or session.
|
claude --permission-mode bypassPermissions --print "<task>"
|
||||||
- Map task brief context (files, constraints, artifacts) into the agent's
|
|
||||||
system prompt and tool context.
|
Each spawned process is tracked by a UUID job_id so callers can later poll
|
||||||
- Handle Claude Code tool-use responses and extract structured output.
|
for the result or terminate the job. Stdout is captured and returned as the
|
||||||
|
agent output; stderr is included for debugging.
|
||||||
"""
|
"""
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import logging
|
||||||
|
import subprocess
|
||||||
|
import tempfile
|
||||||
|
import threading
|
||||||
|
import uuid
|
||||||
|
|
||||||
from adapters.base.runtime import RuntimeAdapter
|
from adapters.base.runtime import RuntimeAdapter
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
class ClaudeCodeRuntimeAdapter(RuntimeAdapter):
|
class ClaudeCodeRuntimeAdapter(RuntimeAdapter):
|
||||||
"""
|
"""
|
||||||
Runtime adapter that spawns Claude Code sub-agents for coding tasks.
|
Runtime adapter that spawns ``claude`` CLI sub-agents for coding tasks.
|
||||||
|
|
||||||
Used when a TaskBrief has preferred_runtime == "coding_agent".
|
Credentials are inherited from the environment (``ANTHROPIC_API_KEY``).
|
||||||
|
The ``claude`` CLI must be installed and reachable on PATH.
|
||||||
|
|
||||||
Expects the Claude Code CLI / Agent SDK to be available in the environment.
|
Used when a TaskBrief has ``preferred_runtime == "coding_agent"``.
|
||||||
Credentials are inherited from the environment (ANTHROPIC_API_KEY).
|
|
||||||
"""
|
"""
|
||||||
|
|
||||||
def __init__(self, config: dict) -> None:
|
def __init__(self, config: dict) -> None:
|
||||||
# TODO (Phase 2): Accept loaded team.yaml config dict.
|
"""
|
||||||
# Validate that Claude Code CLI or SDK is accessible.
|
Initialise the Claude Code runtime adapter.
|
||||||
# Initialise any agent session management state.
|
|
||||||
raise NotImplementedError("ClaudeCodeRuntimeAdapter.__init__ is not yet implemented.")
|
Parameters
|
||||||
|
----------
|
||||||
|
config : Loaded team.yaml config dict (reserved for future options).
|
||||||
|
"""
|
||||||
|
self._config = config
|
||||||
|
# Maps job_id → running Popen instance.
|
||||||
|
self._jobs: dict[str, subprocess.Popen] = {}
|
||||||
|
self._lock = threading.Lock()
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# RuntimeAdapter interface
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
def spawn(self, task: str, capability: str, context: dict) -> str:
|
def spawn(self, task: str, capability: str, context: dict) -> str:
|
||||||
# TODO (Phase 2): Launch a Claude Code sub-agent.
|
"""
|
||||||
# Compose a structured system prompt from task + context.
|
Launch ``claude --permission-mode bypassPermissions --print "<task>"``
|
||||||
# Inject relevant files and constraints as tool context.
|
as a non-interactive subprocess.
|
||||||
# Return an agent_id that maps to a running agent session.
|
|
||||||
raise NotImplementedError("ClaudeCodeRuntimeAdapter.spawn is not yet implemented.")
|
Parameters
|
||||||
|
----------
|
||||||
|
task : Full task description (typically a JSON-serialised brief).
|
||||||
|
capability : Capability hint (not forwarded; Claude Code resolves its
|
||||||
|
own model from the local environment).
|
||||||
|
context : Optional keys:
|
||||||
|
workdir (str) — cwd for the subprocess. A fresh
|
||||||
|
temporary directory is created if omitted.
|
||||||
|
|
||||||
|
Returns
|
||||||
|
-------
|
||||||
|
A UUID job_id string that uniquely identifies this subprocess.
|
||||||
|
"""
|
||||||
|
workdir: str = context.get("workdir") or tempfile.mkdtemp(
|
||||||
|
prefix="agency-claude-"
|
||||||
|
)
|
||||||
|
job_id = str(uuid.uuid4())
|
||||||
|
logger.info("Spawning Claude Code job %s in %s", job_id, workdir)
|
||||||
|
|
||||||
|
proc = subprocess.Popen(
|
||||||
|
["claude", "--permission-mode", "bypassPermissions", "--print", task],
|
||||||
|
stdout=subprocess.PIPE,
|
||||||
|
stderr=subprocess.PIPE,
|
||||||
|
text=True,
|
||||||
|
cwd=workdir,
|
||||||
|
)
|
||||||
|
|
||||||
|
with self._lock:
|
||||||
|
self._jobs[job_id] = proc
|
||||||
|
|
||||||
|
return job_id
|
||||||
|
|
||||||
def get_result(self, agent_id: str, timeout_s: int) -> dict:
|
def get_result(self, agent_id: str, timeout_s: int) -> dict:
|
||||||
# TODO (Phase 2): Await the Claude Code agent session to complete.
|
"""
|
||||||
# Parse the agent's final message for structured JSON output.
|
Wait for the Claude Code subprocess to complete and return its output.
|
||||||
# Return dict with: {"status": ..., "output": ..., "artifacts": [...]}.
|
|
||||||
# Raise TimeoutError if timeout_s elapses.
|
Parameters
|
||||||
raise NotImplementedError("ClaudeCodeRuntimeAdapter.get_result is not yet implemented.")
|
----------
|
||||||
|
agent_id : Job id returned by spawn().
|
||||||
|
timeout_s : Maximum seconds to wait before raising TimeoutError.
|
||||||
|
|
||||||
|
Returns
|
||||||
|
-------
|
||||||
|
dict with keys:
|
||||||
|
status ("completed" | "failed")
|
||||||
|
output (str — full stdout)
|
||||||
|
artifacts (list — always empty; callers must parse output)
|
||||||
|
stderr (str — full stderr)
|
||||||
|
|
||||||
|
Raises
|
||||||
|
------
|
||||||
|
KeyError
|
||||||
|
If agent_id does not correspond to a known job.
|
||||||
|
TimeoutError
|
||||||
|
If the subprocess does not finish within timeout_s seconds.
|
||||||
|
"""
|
||||||
|
with self._lock:
|
||||||
|
proc = self._jobs.get(agent_id)
|
||||||
|
|
||||||
|
if proc is None:
|
||||||
|
raise KeyError(f"No Claude Code job found for agent_id={agent_id!r}")
|
||||||
|
|
||||||
|
try:
|
||||||
|
stdout, stderr = proc.communicate(timeout=timeout_s)
|
||||||
|
except subprocess.TimeoutExpired:
|
||||||
|
proc.kill()
|
||||||
|
stdout, stderr = proc.communicate()
|
||||||
|
raise TimeoutError(
|
||||||
|
f"Claude Code job {agent_id!r} did not complete within {timeout_s}s."
|
||||||
|
)
|
||||||
|
|
||||||
|
status = "completed" if proc.returncode == 0 else "failed"
|
||||||
|
logger.info(
|
||||||
|
"Claude Code job %s finished: status=%s returncode=%d",
|
||||||
|
agent_id,
|
||||||
|
status,
|
||||||
|
proc.returncode,
|
||||||
|
)
|
||||||
|
|
||||||
|
return {
|
||||||
|
"status": status,
|
||||||
|
"output": stdout,
|
||||||
|
"artifacts": [],
|
||||||
|
"stderr": stderr,
|
||||||
|
}
|
||||||
|
|
||||||
def kill(self, agent_id: str) -> None:
|
def kill(self, agent_id: str) -> None:
|
||||||
# TODO (Phase 2): Terminate the Claude Code agent session.
|
"""
|
||||||
# Clean up any temporary files or session state.
|
Terminate a running Claude Code subprocess.
|
||||||
raise NotImplementedError("ClaudeCodeRuntimeAdapter.kill is not yet implemented.")
|
|
||||||
|
Silently succeeds if the job has already finished or the id is unknown.
|
||||||
|
|
||||||
|
Parameters
|
||||||
|
----------
|
||||||
|
agent_id : Job id returned by spawn().
|
||||||
|
"""
|
||||||
|
with self._lock:
|
||||||
|
proc = self._jobs.get(agent_id)
|
||||||
|
|
||||||
|
if proc is not None:
|
||||||
|
try:
|
||||||
|
proc.terminate()
|
||||||
|
logger.info("Terminated Claude Code job %s", agent_id)
|
||||||
|
except OSError:
|
||||||
|
pass # Process already gone — that is fine.
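
The job-tracking approach above (a UUID per `Popen`, guarded by a lock) can be tried without the `claude` CLI. The sketch below uses the Python interpreter as a stand-in child process. `JobRegistry` is a hypothetical name, and dropping finished jobs from the dictionary is a variation that the adapter in this diff does not implement.

```python
from __future__ import annotations

import subprocess
import sys
import threading
import uuid


class JobRegistry:
    """Tracks subprocesses by UUID, mirroring the spawn/get_result/kill shape."""

    def __init__(self) -> None:
        self._jobs: dict[str, subprocess.Popen] = {}
        self._lock = threading.Lock()

    def spawn(self, argv: list[str]) -> str:
        proc = subprocess.Popen(
            argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
        )
        job_id = str(uuid.uuid4())
        with self._lock:
            self._jobs[job_id] = proc
        return job_id

    def _forget(self, job_id: str) -> None:
        with self._lock:
            self._jobs.pop(job_id, None)

    def get_result(self, job_id: str, timeout_s: float) -> dict:
        with self._lock:
            proc = self._jobs.get(job_id)
        if proc is None:
            raise KeyError(job_id)
        try:
            stdout, stderr = proc.communicate(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.communicate()  # reap the killed process and close its pipes
            self._forget(job_id)
            raise TimeoutError(f"job {job_id!r} exceeded {timeout_s}s") from None
        self._forget(job_id)
        status = "completed" if proc.returncode == 0 else "failed"
        return {"status": status, "output": stdout, "stderr": stderr}

    def kill(self, job_id: str) -> None:
        with self._lock:
            proc = self._jobs.get(job_id)
        # poll() returns None while the child is still running.
        if proc is not None and proc.poll() is None:
            proc.terminate()
```

Removing a job once its result has been collected keeps the registry from growing over a long run, at the cost of making a second `get_result` call for the same id raise `KeyError`.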
|
||||||
|
|||||||
@@ -1,48 +1,241 @@
|
|||||||
"""
|
"""
|
||||||
adapters/runtime/openclaw.py
|
adapters/runtime/openclaw.py
|
||||||
OpenClaw agent runtime adapter — Phase 2 stub.
|
OpenClaw agent runtime adapter — Phase 2 implementation.
|
||||||
|
|
||||||
TODO (Phase 2):
|
Spawns sub-agents by shelling out to the ``openclaw`` CLI::
|
||||||
- Implement spawn() to submit a task to an OpenClaw worker pool.
|
|
||||||
- Implement get_result() to poll or subscribe for agent completion.
|
openclaw session spawn --task "<task>" --mode run
|
||||||
- Implement kill() to cancel a running OpenClaw agent job.
|
openclaw session get <session_id>
|
||||||
- Read endpoint and credentials from environment (OPENCLAW_API_KEY, OPENCLAW_URL).
|
openclaw session kill <session_id>
|
||||||
- Map capability hint to an appropriate worker class/queue.
|
|
||||||
|
If the ``openclaw`` binary is unavailable, all methods raise
|
||||||
|
``NotImplementedError`` with a helpful message rather than crashing with a
|
||||||
|
raw ``FileNotFoundError``.
|
||||||
"""
|
"""
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import re
|
||||||
|
import subprocess
|
||||||
|
import time
|
||||||
|
|
||||||
from adapters.base.runtime import RuntimeAdapter
|
from adapters.base.runtime import RuntimeAdapter
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
# Status strings from the openclaw CLI that indicate a session has finished.
|
||||||
|
_TERMINAL_STATUSES = frozenset(
|
||||||
|
{"done", "completed", "failed", "partial", "blocked", "error"}
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
class OpenClawRuntimeAdapter(RuntimeAdapter):
|
class OpenClawRuntimeAdapter(RuntimeAdapter):
|
||||||
"""
|
"""
|
||||||
Runtime adapter that dispatches agent tasks to OpenClaw workers.
|
Runtime adapter that dispatches agent tasks to OpenClaw worker sessions.
|
||||||
|
|
||||||
Expects environment variables:
|
All interactions use the ``openclaw`` CLI. No additional credentials are
|
||||||
OPENCLAW_API_KEY — authentication token
|
required beyond what OpenClaw manages in the local environment.
|
||||||
OPENCLAW_URL — base URL for the OpenClaw API
|
|
||||||
"""
|
"""
|
||||||
|
|
||||||
def __init__(self, config: dict) -> None:
|
def __init__(self, config: dict) -> None:
|
||||||
# TODO (Phase 2): Accept loaded team.yaml config dict.
|
"""
|
||||||
# Extract OPENCLAW_API_KEY and OPENCLAW_URL from environment.
|
Initialise the OpenClaw runtime adapter.
|
||||||
# Initialise HTTP client and any job-tracking state.
|
|
||||||
raise NotImplementedError("OpenClawRuntimeAdapter.__init__ is not yet implemented.")
|
Parameters
|
||||||
|
----------
|
||||||
|
config : Loaded team.yaml config dict (reserved for future options).
|
||||||
|
"""
|
||||||
|
self._config = config
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# RuntimeAdapter interface
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
def spawn(self, task: str, capability: str, context: dict) -> str:
|
def spawn(self, task: str, capability: str, context: dict) -> str:
|
||||||
# TODO (Phase 2): Submit task to OpenClaw worker pool.
|
"""
|
||||||
# Map capability ("reasoning-heavy" | "capable" | "fast-cheap") to
|
Spawn an OpenClaw agent session for the given task.
|
||||||
# an appropriate worker queue or model hint.
|
|
||||||
# Return an agent_id string that can be used to poll for results.
|
Parameters
|
||||||
raise NotImplementedError("OpenClawRuntimeAdapter.spawn is not yet implemented.")
|
----------
|
||||||
|
task : Natural-language task description.
|
||||||
|
capability : Capability hint ("reasoning-heavy" | "capable" | "fast-cheap").
|
||||||
|
Passed informally; actual routing is handled by OpenClaw.
|
||||||
|
context : Arbitrary context bag (currently unused by this adapter).
|
||||||
|
|
||||||
|
Returns
|
||||||
|
-------
|
||||||
|
session_id string parsed from the CLI output.
|
||||||
|
|
||||||
|
Raises
|
||||||
|
------
|
||||||
|
NotImplementedError
|
||||||
|
If the ``openclaw`` CLI is not available on PATH.
|
||||||
|
RuntimeError
|
||||||
|
If the session_id cannot be parsed from the CLI output.
|
||||||
|
"""
|
||||||
|
# TODO: map capability to an openclaw worker tier / model hint if the
|
||||||
|
# openclaw CLI gains that flag in a future release.
|
||||||
|
cmd = ["openclaw", "session", "spawn", "--task", task, "--mode", "run"]
|
||||||
|
try:
|
||||||
|
result = subprocess.run(
|
||||||
|
cmd,
|
||||||
|
capture_output=True,
|
||||||
|
text=True,
|
||||||
|
check=True,
|
||||||
|
)
|
||||||
|
except FileNotFoundError:
|
||||||
|
raise NotImplementedError(
|
||||||
|
"openclaw CLI not found on PATH. "
|
||||||
|
"Install OpenClaw or configure a different runtime adapter "
|
||||||
|
"(e.g. adapters.runtime.claude_code.ClaudeCodeRuntimeAdapter)."
|
||||||
|
)
|
||||||
|
except subprocess.CalledProcessError as exc:
|
||||||
|
raise RuntimeError(
|
||||||
|
f"openclaw session spawn failed (exit {exc.returncode}): "
|
||||||
|
f"{exc.stderr.strip()}"
|
||||||
|
) from exc
|
||||||
|
|
||||||
|
return self._parse_session_id(result.stdout)
|
||||||
|
|
||||||
def get_result(self, agent_id: str, timeout_s: int) -> dict:
|
def get_result(self, agent_id: str, timeout_s: int) -> dict:
|
||||||
# TODO (Phase 2): Poll or long-poll the OpenClaw API for job completion.
|
"""
|
||||||
# Raise TimeoutError if timeout_s elapses before the job finishes.
|
Poll ``openclaw session get`` until the session reaches a terminal
|
||||||
# Return a dict with at minimum: {"status": ..., "output": ..., "artifacts": [...]}.
|
state or *timeout_s* seconds elapse.
|
||||||
raise NotImplementedError("OpenClawRuntimeAdapter.get_result is not yet implemented.")
|
|
||||||
|
Parameters
|
||||||
|
----------
|
||||||
|
agent_id : Session ID returned by spawn().
|
||||||
|
timeout_s : Maximum seconds to wait before raising TimeoutError.
|
||||||
|
|
||||||
|
Returns
|
||||||
|
-------
|
||||||
|
dict with keys: ``status``, ``output``, ``artifacts``.
|
||||||
|
|
||||||
|
Raises
|
||||||
|
------
|
||||||
|
TimeoutError
|
||||||
|
If the session does not finish within timeout_s seconds.
|
||||||
|
NotImplementedError
|
||||||
|
If the ``openclaw`` CLI is not available on PATH.
|
||||||
|
"""
|
||||||
|
deadline = time.monotonic() + timeout_s
|
||||||
|
poll_interval = 2.0
|
||||||
|
|
||||||
|
while time.monotonic() < deadline:
|
||||||
|
try:
|
||||||
|
result = subprocess.run(
|
||||||
|
["openclaw", "session", "get", agent_id],
|
||||||
|
capture_output=True,
|
||||||
|
text=True,
|
||||||
|
timeout=15,
|
||||||
|
)
|
||||||
|
except FileNotFoundError:
|
||||||
|
raise NotImplementedError(
|
||||||
|
"openclaw CLI not found on PATH. "
|
||||||
|
"Install OpenClaw or switch to a different runtime adapter."
|
||||||
|
)
|
||||||
|
except subprocess.TimeoutExpired:
|
||||||
|
logger.debug("openclaw session get timed out; will retry")
|
||||||
|
time.sleep(poll_interval)
|
||||||
|
continue
|
||||||
|
|
||||||
|
if result.returncode == 0 and result.stdout.strip():
|
||||||
|
parsed = self._parse_get_output(result.stdout)
|
||||||
|
if parsed.get("status", "").lower() in _TERMINAL_STATUSES:
|
||||||
|
return parsed
|
||||||
|
else:
|
||||||
|
logger.debug(
|
||||||
|
"openclaw session get returned exit=%d; retrying. stderr=%s",
|
||||||
|
result.returncode,
|
||||||
|
result.stderr.strip(),
|
||||||
|
)
|
||||||
|
|
||||||
|
time.sleep(poll_interval)
|
||||||
|
|
||||||
|
raise TimeoutError(
|
||||||
|
f"Agent {agent_id!r} did not complete within {timeout_s}s."
|
||||||
|
)
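
The polling loop in `get_result` follows a common deadline pattern. The sketch below separates it from the CLI so it can be tested with a fake status source. `poll_until_terminal` and its injectable `sleep` and `clock` parameters are hypothetical; the terminal status set is copied from `_TERMINAL_STATUSES` in this diff.

```python
from __future__ import annotations

import time
from typing import Callable

TERMINAL = frozenset({"done", "completed", "failed", "partial", "blocked", "error"})


def poll_until_terminal(
    fetch: Callable[[], dict],
    timeout_s: float,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch() until it reports a terminal status or the deadline passes."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        state = fetch()
        if str(state.get("status", "")).lower() in TERMINAL:
            return state
        # Never sleep past the deadline.
        sleep(max(0.0, min(interval_s, deadline - clock())))
    raise TimeoutError(f"no terminal status within {timeout_s}s")
```

Using a monotonic clock, as the adapter does, keeps the deadline stable even if the wall clock is adjusted while the loop is running.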
|
||||||
|
|
||||||
def kill(self, agent_id: str) -> None:
|
def kill(self, agent_id: str) -> None:
|
||||||
# TODO (Phase 2): Send a cancellation request to the OpenClaw API.
|
"""
|
||||||
# Silently succeed if the agent has already finished.
|
Terminate an OpenClaw session unconditionally.
|
||||||
raise NotImplementedError("OpenClawRuntimeAdapter.kill is not yet implemented.")
|
|
||||||
|
Silently succeeds if the session has already finished.
|
||||||
|
|
||||||
|
Parameters
|
||||||
|
----------
|
||||||
|
agent_id : Session ID returned by spawn().
|
||||||
|
|
||||||
|
Raises
|
||||||
|
------
|
||||||
|
NotImplementedError
|
||||||
|
If the ``openclaw`` CLI is not available on PATH.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
subprocess.run(
|
||||||
|
["openclaw", "session", "kill", agent_id],
|
||||||
|
capture_output=True,
|
||||||
|
text=True,
|
||||||
|
timeout=15,
|
||||||
|
)
|
||||||
|
except FileNotFoundError:
|
||||||
|
raise NotImplementedError(
|
||||||
|
"openclaw CLI not found on PATH. "
|
||||||
|
"Install OpenClaw or switch to a different runtime adapter."
|
||||||
|
)
|
||||||
|
except subprocess.TimeoutExpired:
|
||||||
|
logger.warning("openclaw session kill timed out for agent %s", agent_id)
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# Private helpers
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
def _parse_session_id(self, output: str) -> str:
|
||||||
|
"""Extract a session_id from the raw stdout of ``openclaw session spawn``."""
|
||||||
|
output = output.strip()
|
||||||
|
|
||||||
|
# Prefer structured JSON output.
|
||||||
|
try:
|
||||||
|
data = json.loads(output)
|
||||||
|
for key in ("session_id", "sessionId", "id"):
|
||||||
|
if key in data:
|
||||||
|
return str(data[key])
|
||||||
|
except (json.JSONDecodeError, TypeError):
|
||||||
|
pass
|
||||||
|
|
||||||
|
# Regex: look for "session_id: <id>" or similar.
|
||||||
|
m = re.search(
|
||||||
|
r"(?:session[_\s]?id|sessionId)[:\s]+([a-zA-Z0-9_\-]+)",
|
||||||
|
output,
|
||||||
|
re.IGNORECASE,
|
||||||
|
)
|
||||||
|
if m:
|
||||||
|
return m.group(1)
|
||||||
|
|
||||||
|
# Last resort: return the first non-empty line.
|
||||||
|
lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
|
||||||
|
if lines:
|
||||||
|
return lines[0]
|
||||||
|
|
||||||
|
raise RuntimeError(
|
||||||
|
f"Could not parse session_id from openclaw output: {output!r}"
|
||||||
|
)
|
||||||
|
|
||||||
|
def _parse_get_output(self, output: str) -> dict:
|
||||||
|
"""Parse the stdout of ``openclaw session get`` into a result dict."""
|
||||||
|
output = output.strip()
|
||||||
|
try:
|
||||||
|
data = json.loads(output)
|
||||||
|
return {
|
||||||
|
"status": data.get("status", "done"),
|
||||||
|
"output": data.get("output", output),
|
||||||
|
"artifacts": data.get("artifacts", []),
|
||||||
|
}
|
||||||
|
except (json.JSONDecodeError, TypeError):
|
||||||
|
# Non-JSON output — treat as completed with raw text output.
|
||||||
|
return {
|
||||||
|
"status": "done",
|
||||||
|
"output": output,
|
||||||
|
"artifacts": [],
|
||||||
|
}
|
||||||
|
|||||||
@@ -1,16 +1,30 @@
|
|||||||
"""
|
"""
|
||||||
adapters/vcs/github.py
|
adapters/vcs/github.py
|
||||||
GitHub VCS adapter — Phase 2 stub.
|
GitHub VCS adapter — Phase 2 implementation.
|
||||||
|
|
||||||
TODO (Phase 2):
|
Uses PyGithub (``pip install PyGithub``) to interact with the GitHub REST API.
|
||||||
- Implement create_branch() using PyGithub or gh CLI subprocess.
|
Reads the repository URL and base branch from the team.yaml config dict.
|
||||||
- Implement commit() — stage files and push via git subprocess or API.
|
|
||||||
- Implement create_pr() using GitHub REST API (POST /repos/{owner}/{repo}/pulls).
|
Note on commit() signature
|
||||||
- Implement get_pr_status() using GET /repos/{owner}/{repo}/pulls/{pull_number}.
|
--------------------------
|
||||||
- Read repo and credentials from config/team.yaml and environment (GITHUB_TOKEN).
|
The base class declares ``commit(files: list[str], message: str)``, which is
|
||||||
|
insufficient for the GitHub Contents API (which requires file *content*, not
|
||||||
|
just paths). This implementation extends the signature to accept either:
|
||||||
|
|
||||||
|
* ``dict[str, str]`` — ``{path: content}`` mapping (preferred; uses the API).
|
||||||
|
* ``list[str]`` — local file paths; content is read from disk and pushed.
|
||||||
|
|
||||||
|
The optional ``branch`` keyword argument targets a specific branch; it
|
||||||
|
defaults to the configured base branch.
|
||||||
"""
|
"""
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import os
|
||||||
|
import re
|
||||||
|
from typing import Union
|
||||||
|
|
||||||
|
from github import Github, GithubException
|
||||||
|
|
||||||
from adapters.base.vcs import VCSAdapter
|
from adapters.base.vcs import VCSAdapter
|
||||||
|
|
||||||
|
|
||||||
@@ -18,34 +32,175 @@ class GitHubAdapter(VCSAdapter):
|
|||||||
"""
|
"""
|
||||||
VCS adapter for GitHub repositories.
|
VCS adapter for GitHub repositories.
|
||||||
|
|
||||||
Expects environment variable GITHUB_TOKEN and config values:
|
Authenticates via GITHUB_TOKEN and interacts with the GitHub REST API
|
||||||
run.repo — SSH or HTTPS clone URL
|
through PyGithub.
|
||||||
run.base_branch — default base branch (e.g. "main")
|
|
||||||
|
Environment variables
|
||||||
|
---------------------
|
||||||
|
GITHUB_TOKEN : Required. Personal access token or GitHub App installation token.
|
||||||
|
|
||||||
|
Config keys (from team.yaml)
|
||||||
|
----------------------------
|
||||||
|
run.repo : SSH or HTTPS clone URL (e.g. "git@github.com:org/repo.git").
|
||||||
|
run.base_branch : Default base branch (e.g. "main").
|
||||||
"""
|
"""
|
||||||
|
|
||||||
def __init__(self, config: dict) -> None:
|
def __init__(self, config: dict) -> None:
|
||||||
# TODO (Phase 2): Accept loaded team.yaml config dict.
|
"""
|
||||||
# Extract GITHUB_TOKEN from environment.
|
Initialise the GitHub adapter.
|
||||||
# Parse owner/repo from config.run.repo.
|
|
||||||
raise NotImplementedError("GitHubAdapter.__init__ is not yet implemented.")
|
Parameters
|
||||||
|
----------
|
||||||
|
config : Loaded team.yaml config dict.
|
||||||
|
|
||||||
|
Raises
|
||||||
|
------
|
||||||
|
ValueError
|
||||||
|
If GITHUB_TOKEN is not set or the repo URL cannot be parsed.
|
||||||
|
"""
|
||||||
|
self._config = config
|
||||||
|
token = os.environ.get("GITHUB_TOKEN")
|
||||||
|
if not token:
|
||||||
|
raise ValueError(
|
||||||
|
"GITHUB_TOKEN environment variable is not set. "
|
||||||
|
"Create a personal access token and export it before running the-agency."
|
||||||
|
)
|
||||||
|
self._g = Github(token)
|
||||||
|
|
||||||
|
run_cfg: dict = config.get("run", {})
|
||||||
|
repo_url: str = run_cfg.get("repo", "")
|
||||||
|
self._base_branch: str = run_cfg.get("base_branch", "main")
|
||||||
|
|
||||||
|
self._owner, self._repo_name = self._parse_repo_url(repo_url)
|
||||||
|
self._repo = self._g.get_repo(f"{self._owner}/{self._repo_name}")
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# Helpers
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
def _parse_repo_url(self, url: str) -> tuple[str, str]:
|
||||||
|
"""Parse *owner* and *repo* name from an SSH or HTTPS GitHub URL."""
|
||||||
|
# git@github.com:owner/repo.git
|
||||||
|
m = re.match(r"git@github\.com:([^/]+)/([^/]+?)(?:\.git)?$", url)
|
||||||
|
if m:
|
||||||
|
return m.group(1), m.group(2)
|
||||||
|
# https://github.com/owner/repo[.git]
|
||||||
|
m = re.match(r"https?://github\.com/([^/]+)/([^/]+?)(?:\.git)?/?$", url)
|
||||||
|
if m:
|
||||||
|
return m.group(1), m.group(2)
|
||||||
|
raise ValueError(
|
||||||
|
f"Cannot parse GitHub owner/repo from URL: {url!r}. "
|
||||||
|
"Expected SSH (git@github.com:owner/repo.git) or "
|
||||||
|
"HTTPS (https://github.com/owner/repo.git) format."
|
||||||
|
)
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# VCSAdapter interface
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
def create_branch(self, name: str) -> None:
|
def create_branch(self, name: str) -> None:
|
||||||
# TODO (Phase 2): Create branch via GitHub API or local git subprocess.
|
"""
|
||||||
# Use config.run.base_branch as the branch point.
|
Create a new branch off ``self._base_branch`` on the remote.
|
||||||
raise NotImplementedError("GitHubAdapter.create_branch is not yet implemented.")
|
|
||||||
|
|
||||||
def commit(self, files: list[str], message: str) -> str:
|
Parameters
|
||||||
# TODO (Phase 2): Stage files (git add), create commit (git commit), push.
|
----------
|
||||||
# Return the resulting commit SHA.
|
name : New branch name (e.g. "feat/webhook-ingestion").
|
||||||
raise NotImplementedError("GitHubAdapter.commit is not yet implemented.")
|
"""
|
||||||
|
base_ref = self._repo.get_git_ref(f"heads/{self._base_branch}")
|
||||||
|
self._repo.create_git_ref(f"refs/heads/{name}", base_ref.object.sha)
|
||||||
|
|
||||||
|
def commit(
|
||||||
|
self,
|
||||||
|
files: Union[dict[str, str], list[str]],
|
||||||
|
message: str,
|
||||||
|
branch: str | None = None,
|
||||||
|
) -> str:
|
||||||
|
"""
|
||||||
|
Commit files to the repository via the GitHub Contents API.
|
||||||
|
|
||||||
|
Parameters
|
||||||
|
----------
|
||||||
|
files : Either a ``dict[path, content]`` mapping (preferred), or a
|
||||||
|
``list[path]`` of local file paths whose content is read from
|
||||||
|
disk.
|
||||||
|
message : Commit message.
|
||||||
|
branch : Target branch. Defaults to ``self._base_branch``.
|
||||||
|
|
||||||
|
Returns
|
||||||
|
-------
|
||||||
|
SHA of the last created/updated commit, or empty string if no files
|
||||||
|
were committed.
|
||||||
|
"""
|
||||||
|
target_branch = branch or self._base_branch
|
||||||
|
|
||||||
|
# Normalise to {path: content}
|
||||||
|
if isinstance(files, list):
|
||||||
|
files_dict: dict[str, str] = {}
|
||||||
|
for path in files:
|
||||||
|
with open(path, "r", encoding="utf-8") as fh:
|
||||||
|
files_dict[path] = fh.read()
|
||||||
|
else:
|
||||||
|
files_dict = files
|
||||||
|
|
||||||
|
last_sha: str = ""
|
||||||
|
for path, content in files_dict.items():
|
||||||
|
try:
|
||||||
|
existing = self._repo.get_contents(path, ref=target_branch)
|
||||||
|
result = self._repo.update_file(
|
||||||
|
path=path,
|
||||||
|
message=message,
|
||||||
|
content=content,
|
||||||
|
sha=existing.sha, # type: ignore[union-attr]
|
||||||
|
branch=target_branch,
|
||||||
|
)
|
||||||
|
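except GithubException as exc:
|
||||||
|
# Only a 404 means the file does not exist yet; re-raise other API errors.
|
||||||
|
if exc.status != 404:
|
||||||
|
raise
|
||||||
|
# File does not exist yet, so create it.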
|
||||||
|
result = self._repo.create_file(
|
||||||
|
path=path,
|
||||||
|
message=message,
|
||||||
|
content=content,
|
||||||
|
branch=target_branch,
|
||||||
|
)
|
||||||
|
last_sha = result["commit"].sha
|
||||||
|
|
||||||
|
return last_sha
|
||||||
|
|
||||||
def create_pr(self, title: str, body: str, head: str, base: str) -> str:
|
def create_pr(self, title: str, body: str, head: str, base: str) -> str:
|
||||||
# TODO (Phase 2): POST to GitHub API /repos/{owner}/{repo}/pulls.
|
"""
|
||||||
# Return the HTML URL of the created PR.
|
Open a pull request on GitHub.
|
||||||
raise NotImplementedError("GitHubAdapter.create_pr is not yet implemented.")
|
|
||||||
|
Parameters
|
||||||
|
----------
|
||||||
|
title : PR title.
|
||||||
|
body : PR description / body markdown.
|
||||||
|
head : Head branch name (the branch with changes).
|
||||||
|
base : Base branch name (e.g. "main").
|
||||||
|
|
||||||
|
Returns
|
||||||
|
-------
|
||||||
|
HTML URL of the created pull request.
|
||||||
|
"""
|
||||||
|
pr = self._repo.create_pull(
|
||||||
|
title=title,
|
||||||
|
body=body,
|
||||||
|
head=head,
|
||||||
|
base=base,
|
||||||
|
)
|
||||||
|
return pr.html_url
|
||||||
|
|
||||||
def get_pr_status(self, pr_id: str) -> str:
|
def get_pr_status(self, pr_id: str) -> str:
|
||||||
# TODO (Phase 2): GET /repos/{owner}/{repo}/pulls/{number}.
|
"""
|
||||||
# Map GitHub PR state ("open", "closed") + merged flag to
|
Fetch the current status of a pull request.
|
||||||
# our schema: "open" | "merged" | "closed".
|
|
||||||
raise NotImplementedError("GitHubAdapter.get_pr_status is not yet implemented.")
|
Parameters
|
||||||
|
----------
|
||||||
|
pr_id : Pull request number as a string (e.g. "42").
|
||||||
|
|
||||||
|
Returns
|
||||||
|
-------
|
||||||
|
One of: "open" | "merged" | "closed".
|
||||||
|
"""
|
||||||
|
pr = self._repo.get_pull(int(pr_id))
|
||||||
|
if pr.merged:
|
||||||
|
return "merged"
|
||||||
|
return pr.state # "open" or "closed"
|
||||||
|
|||||||
2
agents
Submodule agents updated: 5c669c28e6...5f1204a023
@@ -2,28 +2,40 @@ t1:
|
|||||||
default: agents/strategy/nexus-strategy.md
|
default: agents/strategy/nexus-strategy.md
|
||||||
|
|
||||||
t2:
|
t2:
|
||||||
backend: agents/engineering/engineering-software-architect.md
|
backend: agents/engineering/engineering-backend-architect.md
|
||||||
frontend: agents/engineering/engineering-software-architect.md
|
frontend: agents/engineering/engineering-frontend-architect.md
|
||||||
infra: agents/engineering/engineering-devops-automator.md
|
infra: agents/engineering/engineering-devops-automator.md
|
||||||
data: agents/engineering/engineering-data-engineer.md
|
data: agents/engineering/engineering-data-engineer.md
|
||||||
|
ai: agents/engineering/engineering-software-architect.md
|
||||||
|
security: agents/engineering/engineering-security-engineer.md
|
||||||
|
mobile: agents/engineering/engineering-software-architect.md
|
||||||
default: agents/engineering/engineering-software-architect.md
|
default: agents/engineering/engineering-software-architect.md
|
||||||
|
|
||||||
t3:
|
t3:
|
||||||
backend: agents/engineering/engineering-senior-developer.md
|
backend: agents/engineering/engineering-senior-backend-developer.md
|
||||||
frontend: agents/engineering/engineering-senior-developer.md
|
frontend: agents/engineering/engineering-senior-frontend-developer.md
|
||||||
infra: agents/engineering/engineering-sre.md
|
infra: agents/engineering/engineering-sre.md
|
||||||
default: agents/engineering/engineering-senior-developer.md
|
data: agents/engineering/engineering-data-engineer.md
|
||||||
|
ai: agents/engineering/engineering-ai-engineer.md
|
||||||
|
security: agents/engineering/engineering-security-engineer.md
|
||||||
|
mobile: agents/engineering/engineering-mobile-app-builder.md
|
||||||
|
database: agents/engineering/engineering-database-optimizer.md
|
||||||
|
devops: agents/engineering/engineering-sre.md
|
||||||
|
docs: agents/engineering/engineering-technical-writer.md
|
||||||
|
default: agents/engineering/engineering-backend-developer.md
|
||||||
|
|
||||||
t4:
|
t4:
|
||||||
frontend: agents/engineering/engineering-frontend-developer.md
|
frontend: agents/engineering/engineering-frontend-developer.md
|
||||||
backend: agents/engineering/engineering-backend-architect.md
|
backend: agents/engineering/engineering-backend-developer.md
|
||||||
database: agents/engineering/engineering-database-optimizer.md
|
database: agents/engineering/engineering-database-optimizer.md
|
||||||
devops: agents/engineering/engineering-devops-automator.md
|
devops: agents/engineering/engineering-devops-automator.md
|
||||||
mobile: agents/engineering/engineering-mobile-app-builder.md
|
mobile: agents/engineering/engineering-mobile-app-builder.md
|
||||||
ai: agents/engineering/engineering-ai-engineer.md
|
ai: agents/engineering/engineering-ai-engineer.md
|
||||||
security: agents/engineering/engineering-security-engineer.md
|
security: agents/engineering/engineering-security-engineer.md
|
||||||
docs: agents/engineering/engineering-technical-writer.md
|
docs: agents/engineering/engineering-technical-writer.md
|
||||||
default: agents/engineering/engineering-senior-developer.md
|
data: agents/engineering/engineering-data-engineer.md
|
||||||
|
embedded: agents/engineering/engineering-embedded-firmware-engineer.md
|
||||||
|
default: agents/engineering/engineering-backend-developer.md
|
||||||
|
|
||||||
t5:
|
t5:
|
||||||
code: agents/engineering/engineering-code-reviewer.md
|
code: agents/engineering/engineering-code-reviewer.md
|
||||||
@@ -31,4 +43,8 @@ t5:
|
|||||||
api: agents/testing/testing-api-tester.md
|
api: agents/testing/testing-api-tester.md
|
||||||
performance: agents/testing/testing-performance-benchmarker.md
|
performance: agents/testing/testing-performance-benchmarker.md
|
||||||
security: agents/engineering/engineering-security-engineer.md
|
security: agents/engineering/engineering-security-engineer.md
|
||||||
|
accessibility: agents/testing/testing-accessibility-auditor.md
|
||||||
|
e2e: agents/testing/testing-evidence-collector.md
|
||||||
|
frontend: agents/testing/testing-accessibility-auditor.md
|
||||||
|
data: agents/testing/testing-reality-checker.md
|
||||||
default: agents/engineering/engineering-code-reviewer.md
|
default: agents/engineering/engineering-code-reviewer.md
|
||||||
|
|||||||
507
docs/buildspec.md
Normal file
@@ -0,0 +1,507 @@
|
|||||||
|
# Tiered Agent Team System — Build Spec
|
||||||
|
|
||||||
|
_Started: 2026-03-15. Last updated: 2026-03-30._
|
||||||
|
_See design.md for the design doc and decisions log._
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Language & Runtime
|
||||||
|
|
||||||
|
**Python 3.11+.** Reasons:
|
||||||
|
- Agent/AI tooling is Python-first
|
||||||
|
- Clean type hints + dataclasses for schemas
|
||||||
|
- Agents can read and modify their own orchestration code
|
||||||
|
- Runs anywhere — no Node, no OpenClaw dependency
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Repository
|
||||||
|
|
||||||
|
Standalone repo: `git@github.com:coding-with-hans-heinemann/the-agency.git`
|
||||||
|
|
||||||
|
Separate from the OpenClaw workspace. OpenClaw workspace gets a thin integration layer that calls into it. Core is portable and runnable without OpenClaw.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Directory Structure
|
||||||
|
|
||||||
|
```
|
||||||
|
agent-teams/
|
||||||
|
├── core/
|
||||||
|
│ ├── team_runner.py — run lifecycle, agent spawning
|
||||||
|
│ ├── blackboard.py — SQLite coordination state
|
||||||
|
│ ├── task_brief.py — schema + validation
|
||||||
|
│ └── escalation.py — retry logic, failure routing
|
||||||
|
│
|
||||||
|
├── adapters/
|
||||||
|
│ ├── base/
|
||||||
|
│ │ ├── llm.py — abstract LLM interface
|
||||||
|
│ │ ├── vcs.py — abstract VCS interface
|
||||||
|
│ │ ├── notify.py — abstract notification interface
|
||||||
|
│ │ └── runtime.py — abstract agent runtime interface
|
||||||
|
│ ├── llm/
|
||||||
|
│ │ ├── anthropic.py — Claude via direct Anthropic API
|
||||||
|
│ │ ├── openai.py — GPT / o-series
|
||||||
|
│ │ └── ollama.py — local models
|
||||||
|
│ ├── vcs/
|
||||||
|
│ │ └── github.py
|
||||||
|
│ ├── notify/
|
||||||
|
│ │ └── openclaw.py — messages Hans who notifies Andrew
|
||||||
|
│ └── runtime/
|
||||||
|
│ ├── openclaw.py — sessions_spawn (general purpose)
|
||||||
|
│ └── claude_code.py — coding agent runtime (file/git/exec tools)
|
||||||
|
│
|
||||||
|
├── agents/ — git submodule: msitarzewski/agency-agents
|
||||||
|
│ ├── engineering/
|
||||||
|
│ ├── testing/
|
||||||
|
│ ├── strategy/
|
||||||
|
│ └── ... — full agency-agents roster
|
||||||
|
│
|
||||||
|
├── prompts/
|
||||||
|
│ ├── t1_visionary.md — fallback if no agent_personality set
|
||||||
|
│ ├── t2_architect.md
|
||||||
|
│ ├── t3_squad_lead.md
|
||||||
|
│ ├── t4_implementer.md
|
||||||
|
│ └── t5_verifier.md
|
||||||
|
│
|
||||||
|
├── config/
|
||||||
|
│ ├── team.yaml — example run configuration
|
||||||
|
│ └── role_registry.yaml — maps (tier, domain) → agent personality file
|
||||||
|
│
|
||||||
|
├── cli/
|
||||||
|
│ └── agency.py — run, watch, inspect, approve, reject, pause, resume
|
||||||
|
│
|
||||||
|
├── runs/ — runtime state, one subdir per run_id
|
||||||
|
│ └── .gitkeep
|
||||||
|
│
|
||||||
|
└── README.md
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Blackboard
|
||||||
|
|
||||||
|
SQLite. One file per run at `runs/<run_id>/blackboard.db`.
|
||||||
|
|
||||||
|
### Tables
|
||||||
|
|
||||||
|
**runs**
|
||||||
|
```sql
|
||||||
|
CREATE TABLE runs (
|
||||||
|
run_id TEXT PRIMARY KEY,
|
||||||
|
goal TEXT NOT NULL,
|
||||||
|
status TEXT NOT NULL, -- pending | active | review | done | failed
|
||||||
|
created_at TEXT NOT NULL,
|
||||||
|
updated_at TEXT NOT NULL
|
||||||
|
);
|
||||||
|
```
|
||||||
|
|
||||||
|
**workstreams**
|
||||||
|
```sql
|
||||||
|
CREATE TABLE workstreams (
|
||||||
|
workstream_id TEXT PRIMARY KEY,
|
||||||
|
run_id TEXT NOT NULL,
|
||||||
|
name TEXT NOT NULL,
|
||||||
|
tier INTEGER NOT NULL,
|
||||||
|
status TEXT NOT NULL, -- pending | active | blocked | done | failed
|
||||||
|
owner_agent_id TEXT,
|
||||||
|
created_at TEXT NOT NULL,
|
||||||
|
updated_at TEXT NOT NULL
|
||||||
|
);
|
||||||
|
```
|
||||||
|
|
||||||
|
**briefs**
|
||||||
|
```sql
|
||||||
|
CREATE TABLE briefs (
|
||||||
|
brief_id TEXT PRIMARY KEY,
|
||||||
|
run_id TEXT NOT NULL,
|
||||||
|
parent_brief_id TEXT,
|
||||||
|
workstream_id TEXT,
|
||||||
|
tier INTEGER NOT NULL,
|
||||||
|
role TEXT NOT NULL,
|
||||||
|
status TEXT NOT NULL, -- pending | active | done | failed
|
||||||
|
payload TEXT NOT NULL, -- full JSON brief
|
||||||
|
result TEXT, -- JSON result when done
|
||||||
|
retry_count INTEGER DEFAULT 0,
|
||||||
|
created_at TEXT NOT NULL,
|
||||||
|
updated_at TEXT NOT NULL
|
||||||
|
);
|
||||||
|
```
|
||||||
|
|
||||||
|
**events**
|
||||||
|
```sql
|
||||||
|
CREATE TABLE events (
|
||||||
|
event_id TEXT PRIMARY KEY,
|
||||||
|
run_id TEXT NOT NULL,
|
||||||
|
brief_id TEXT,
|
||||||
|
kind TEXT NOT NULL, -- see event vocabulary below
|
||||||
|
detail TEXT, -- JSON
|
||||||
|
created_at TEXT NOT NULL
|
||||||
|
);
|
||||||
|
```
|
||||||
|
|
||||||
|
**Event kind vocabulary:**
|
||||||
|
```
|
||||||
|
-- lifecycle
|
||||||
|
spawned | completed | failed | escalated | retried
|
||||||
|
|
||||||
|
-- visibility / gates
|
||||||
|
gate_pending -- runner hit an inspection gate, waiting for human
|
||||||
|
gate_approved -- human approved via CLI or notify
|
||||||
|
gate_rejected -- human rejected, tier re-invoked
|
||||||
|
gate_paused -- manual pause via CLI
|
||||||
|
gate_resumed -- manual resume via CLI
|
||||||
|
|
||||||
|
-- amendments / informational
|
||||||
|
path_amendment -- mid-run tier proposed a tier path change
|
||||||
|
log -- human-readable log line (detail: {level, message})
|
||||||
|
```
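A minimal sketch of how a caller might append a row to the `events` table, assuming the schema and event vocabulary above. The `log_event` helper and the `EVENT_KINDS` and `EVENTS_DDL` names are hypothetical and not taken from `core/blackboard.py`.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# Vocabulary copied from the event kind list above.
EVENT_KINDS = {
    "spawned", "completed", "failed", "escalated", "retried",
    "gate_pending", "gate_approved", "gate_rejected",
    "gate_paused", "gate_resumed",
    "path_amendment", "log",
}

EVENTS_DDL = """
CREATE TABLE IF NOT EXISTS events (
    event_id   TEXT PRIMARY KEY,
    run_id     TEXT NOT NULL,
    brief_id   TEXT,
    kind       TEXT NOT NULL,
    detail     TEXT,
    created_at TEXT NOT NULL
)
"""


def log_event(conn, run_id, kind, brief_id=None, detail=None):
    """Insert one event row and return its event_id (hypothetical helper)."""
    if kind not in EVENT_KINDS:
        raise ValueError(f"unknown event kind: {kind!r}")
    event_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO events (event_id, run_id, brief_id, kind, detail, created_at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (
            event_id,
            run_id,
            brief_id,
            kind,
            json.dumps(detail) if detail is not None else None,  # detail is stored as JSON text
            datetime.now(timezone.utc).isoformat(),
        ),
    )
    conn.commit()
    return event_id


# Usage: an in-memory database stands in for runs/<run_id>/blackboard.db.
conn = sqlite3.connect(":memory:")
conn.execute(EVENTS_DDL)
log_event(conn, "run-1", "log", detail={"level": "info", "message": "T1 spawned"})
```

Rejecting unknown kinds at write time is one way to keep the vocabulary closed; the real blackboard may enforce this differently or not at all.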
|
||||||
|
|
||||||
|
**t3_task_lists** *(T3 mesh coordination)*
|
||||||
|
```sql
|
||||||
|
CREATE TABLE t3_task_lists (
|
||||||
|
entry_id TEXT PRIMARY KEY,
|
||||||
|
run_id TEXT NOT NULL,
|
||||||
|
workstream_id TEXT NOT NULL,
|
||||||
|
t3_agent_id TEXT NOT NULL,
|
||||||
|
status TEXT NOT NULL, -- draft | committed
|
||||||
|
tasks TEXT NOT NULL, -- JSON array of proposed T4 task descriptors
|
||||||
|
created_at TEXT NOT NULL,
|
||||||
|
updated_at TEXT NOT NULL
|
||||||
|
);
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Task Brief Schema
|
||||||
|
|
||||||
|
Every brief passed between tiers is a validated JSON object. `goal_anchor` is immutable — set by T1, copied verbatim into every downstream brief.
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"brief_id": "uuid",
|
||||||
|
"run_id": "uuid",
|
||||||
|
"parent_brief_id": "uuid | null",
|
||||||
|
"tier": 4,
|
||||||
|
"role": "implementer",
|
||||||
|
"goal_anchor": "Original T1 intent — always propagated unchanged",
|
||||||
|
"workstream": "backend-api",
|
||||||
|
"task": "Implement POST /webhooks/ingest endpoint",
|
||||||
|
"acceptance_criteria": [
|
||||||
|
"Accepts JSON payload",
|
||||||
|
"Returns 202 on success",
|
||||||
|
"Writes to queue"
|
||||||
|
],
|
||||||
|
"constraints": [
|
||||||
|
"Use existing queue client in src/queue.py",
|
||||||
|
"No new dependencies"
|
||||||
|
],
|
||||||
|
"context": {
|
||||||
|
"relevant_files": ["src/routes/webhooks.py", "src/queue.py"],
|
||||||
|
"interface_contract": "..."
|
||||||
|
},
|
||||||
|
"retry_budget": 3,
|
||||||
|
"retry_count": 0,
|
||||||
|
"preferred_runtime": "coding_agent",
|
||||||
|
"agent_personality": "agents/engineering/engineering-backend-architect.md",
|
||||||
|
"created_at": "ISO-8601"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
`preferred_runtime` is optional. T3 sets it to `"coding_agent"` when spawning T4/T5 for implementation or verification tasks. The runner falls back to `"standard"` if the coding agent runtime is not configured.
|
||||||
|
|
||||||
|
`agent_personality` is optional. When set, the runtime adapter reads the file and injects its contents as the system prompt at spawn time. If it is not set, the adapter falls back to the generic tier prompt in `prompts/`.
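The personality fallback can be sketched as a small lookup. This is an illustration only: `resolve_system_prompt_path` and `TIER_PROMPTS` are hypothetical names, and the prompt file names are taken from the `prompts/` entries in the directory structure.

```python
# Generic tier prompts, as listed under prompts/ in the directory structure.
TIER_PROMPTS = {
    1: "prompts/t1_visionary.md",
    2: "prompts/t2_architect.md",
    3: "prompts/t3_squad_lead.md",
    4: "prompts/t4_implementer.md",
    5: "prompts/t5_verifier.md",
}


def resolve_system_prompt_path(brief: dict) -> str:
    """Return the personality file if the brief sets one, else the tier prompt."""
    personality = brief.get("agent_personality")
    if personality:
        return personality
    return TIER_PROMPTS[brief["tier"]]


# Usage: a T4 brief without a personality gets the generic implementer prompt.
path = resolve_system_prompt_path({"tier": 4})
```

The sketch returns a path rather than file contents so that reading the file stays with the runtime adapter.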
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Adapter Interfaces
|
||||||
|
|
||||||
|
### LLM (`adapters/base/llm.py`)
|
||||||
|
```python
|
||||||
|
class LLMAdapter:
|
||||||
|
def complete(self, prompt: str, capability: str, context: dict) -> str
|
||||||
|
def resolve_model(self, capability: str) -> str
|
||||||
|
# capability: "reasoning-heavy" | "capable" | "fast-cheap"
|
||||||
|
```
|
||||||
|
|
||||||
|
### VCS (`adapters/base/vcs.py`)
|
||||||
|
```python
|
||||||
|
class VCSAdapter:
|
||||||
|
def create_branch(self, name: str) -> None
|
||||||
|
def commit(self, files: list[str], message: str) -> str # returns commit sha
|
||||||
|
def create_pr(self, title: str, body: str, head: str, base: str) -> str # returns pr url
|
||||||
|
def get_pr_status(self, pr_id: str) -> str # open | merged | closed
|
||||||
|
```
|
||||||
|
|
||||||
|
### Notify (`adapters/base/notify.py`)
|
||||||
|
```python
|
||||||
|
class NotifyAdapter:
|
||||||
|
def send(self, message: str, context: dict) -> None
|
||||||
|
```
|
||||||
|
|
||||||
|
### Runtime (`adapters/base/runtime.py`)
|
||||||
|
```python
|
||||||
|
class RuntimeAdapter:
|
||||||
|
def spawn(self, task: str, capability: str, context: dict) -> str # returns agent_id
|
||||||
|
def get_result(self, agent_id: str, timeout_s: int) -> dict
|
||||||
|
def kill(self, agent_id: str) -> None
|
||||||
|
|
||||||
|
# Two implementations:
|
||||||
|
# openclaw.py — general purpose, uses sessions_spawn, suits T1/T2/T3
|
||||||
|
# claude_code.py — coding-specialized, has file/git/exec tools, suits T4/T5
|
||||||
|
#
|
||||||
|
# The runner selects runtime based on brief.preferred_runtime:
|
||||||
|
# "standard" → openclaw.py (default)
|
||||||
|
# "coding_agent" → claude_code.py (falls back to standard if unavailable)
|
||||||
|
#
|
||||||
|
# Both implementations inject brief.agent_personality as the system prompt
|
||||||
|
# when spawning, if present. Falls back to generic tier prompt otherwise.
|
||||||
|
# claude_code.py passes the agent file via --system-prompt flag natively
|
||||||
|
# (agency-agents was designed for Claude Code's agents/ directory).
|
||||||
|
```
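The runtime selection rule described in the comments above might look like the following. This is a sketch under stated assumptions: `select_runtime` is a hypothetical function name, and `runtime_cfg` is assumed to mirror the `runtime:` section of `config/team.yaml`.

```python
def select_runtime(brief: dict, runtime_cfg: dict) -> str:
    """Pick the runtime adapter name for a brief.

    runtime_cfg is assumed to look like
    {"default": "openclaw", "coding_agent": "claude_code"}.
    """
    preferred = brief.get("preferred_runtime", "standard")
    if preferred == "coding_agent" and runtime_cfg.get("coding_agent"):
        return runtime_cfg["coding_agent"]
    # "standard", an unknown value, or a coding agent that is not configured.
    return runtime_cfg.get("default", "openclaw")


# Usage: a T4 brief asking for the coding agent runtime.
cfg = {"default": "openclaw", "coding_agent": "claude_code"}
adapter_name = select_runtime({"preferred_runtime": "coding_agent"}, cfg)
```

Omitting `coding_agent` from the config disables it, so the same brief then resolves to the default runtime.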
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Run Config (`config/team.yaml`)
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
run:
|
||||||
|
goal: "Build webhook ingestion system with retry logic and DLQ"
|
||||||
|
repo: "git@github.com:org/repo.git"
|
||||||
|
base_branch: "main"
|
||||||
|
|
||||||
|
adapters:
|
||||||
|
llm: anthropic
|
||||||
|
vcs: github
|
||||||
|
notify: openclaw
|
||||||
|
runtime: openclaw
|
||||||
|
|
||||||
|
models:
|
||||||
|
provider: anthropic # default provider
|
||||||
|
capability_map:
|
||||||
|
reasoning-heavy:
|
||||||
|
anthropic: claude-opus-4-6
|
||||||
|
openai: o3
|
||||||
|
capable:
|
||||||
|
anthropic: claude-sonnet-4-6
|
||||||
|
openai: gpt-4o
|
||||||
|
ollama: llama3.1:70b
|
||||||
|
fast-cheap:
|
||||||
|
anthropic: claude-haiku-3-5
|
||||||
|
openai: gpt-4o-mini
|
||||||
|
ollama: llama3.2
|
||||||
|
|
||||||
|
# optional: override provider per tier
|
||||||
|
tier_overrides:
|
||||||
|
t1: { provider: openai, capability: reasoning-heavy }
|
||||||
|
t4: { provider: ollama, capability: fast-cheap }
|
||||||
|
|
||||||
|
runtime:
|
||||||
|
default: openclaw
|
||||||
|
coding_agent: claude_code # used for T4/T5 when available; omit to disable
|
||||||
|
native_teams: false # Claude Code's experimental agent teams — opt-in only
|
||||||
|
# when true: T3 hands full workstream to Claude Code,
|
||||||
|
# which fans out internally. faster but less blackboard
|
||||||
|
# visibility. default: false (explicit T4 spawning)
|
||||||
|
# tier_runtime_map (optional overrides):
|
||||||
|
# t1: standard
|
||||||
|
# t2: standard
|
||||||
|
# t3: standard
|
||||||
|
# t4: coding_agent
|
||||||
|
# t5: coding_agent
|
||||||
|
|
||||||
|
retry_defaults:
|
||||||
|
bad_output: 3
|
||||||
|
partial: 2
|
||||||
|
blocked: 0 # always escalate immediately
|
||||||
|
|
||||||
|
visibility:
|
||||||
|
strict_mode: false # true = all gates on (recommended for first runs)
|
||||||
|
log_level: normal # normal | verbose (verbose = per-T4 start/done lines)
|
||||||
|
inspection_gates:
|
||||||
|
t1_plan: true # always — required by design
|
||||||
|
t2_lead: false # optional — review boundaries before specialists spawn
|
||||||
|
t2_synthesis: true # recommended — review architecture before implementation
|
||||||
|
t3_plan: false # verbose — useful early on, disable once T3 is trusted
|
||||||
|
t5_verdict: false # review T5 joint verdict before T3 marks workstream done
|
||||||
|
gate_timeout_minutes: 60 # auto-reject if no human response within this window
|
||||||
|
|
||||||
|
t3_mesh_timeout_minutes: 10 # max time for T3s to commit task lists before runner escalates
|
||||||
|
```
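A sketch of how `capability_map` and `tier_overrides` could combine into a concrete model name. The `resolve_model` function here is hypothetical and takes more arguments than `LLMAdapter.resolve_model`. It assumes that a tier override replaces both the provider and the requested capability; the spec does not state this precedence. The config is inlined as a dict to keep the example dependency-free.

```python
# The models: section from the YAML above, as a plain dict.
MODELS = {
    "provider": "anthropic",
    "capability_map": {
        "reasoning-heavy": {"anthropic": "claude-opus-4-6", "openai": "o3"},
        "capable": {
            "anthropic": "claude-sonnet-4-6",
            "openai": "gpt-4o",
            "ollama": "llama3.1:70b",
        },
        "fast-cheap": {
            "anthropic": "claude-haiku-3-5",
            "openai": "gpt-4o-mini",
            "ollama": "llama3.2",
        },
    },
    "tier_overrides": {
        "t1": {"provider": "openai", "capability": "reasoning-heavy"},
        "t4": {"provider": "ollama", "capability": "fast-cheap"},
    },
}


def resolve_model(models_cfg: dict, tier: str, capability: str) -> str:
    """Map (tier, capability) to a model name, honouring tier_overrides."""
    override = models_cfg.get("tier_overrides", {}).get(tier, {})
    provider = override.get("provider", models_cfg["provider"])
    capability = override.get("capability", capability)  # assumed precedence
    by_provider = models_cfg["capability_map"][capability]
    if provider not in by_provider:
        raise KeyError(f"no {capability!r} model configured for provider {provider!r}")
    return by_provider[provider]


# Usage: t2 has no override, so it uses the default provider.
model = resolve_model(MODELS, "t2", "capable")
```

Note that `reasoning-heavy` has no `ollama` entry in the example config, so an override that combines the two would raise rather than silently pick another provider.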
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Role Registry (`config/role_registry.yaml`)
|
||||||
|
|
||||||
|
Maps `(tier, domain)` → agent personality file. T1 consults this during scope assessment when selecting specialists for each workstream brief. Adding a new specialist means adding one entry here — no core changes.
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
t1:
|
||||||
|
default: agents/strategy/nexus-strategy.md
|
||||||
|
|
||||||
|
t2:
|
||||||
|
backend: agents/engineering/engineering-software-architect.md
|
||||||
|
frontend: agents/engineering/engineering-software-architect.md
|
||||||
|
infra: agents/engineering/engineering-devops-automator.md
|
||||||
|
data: agents/engineering/engineering-data-engineer.md
|
||||||
|
default: agents/engineering/engineering-software-architect.md
|
||||||
|
|
||||||
|
t3:
|
||||||
|
backend: agents/engineering/engineering-senior-developer.md
|
||||||
|
frontend: agents/engineering/engineering-senior-developer.md
|
||||||
|
infra: agents/engineering/engineering-sre.md
|
||||||
|
default: agents/engineering/engineering-senior-developer.md
|
||||||
|
|
||||||
|
t4:
|
||||||
|
frontend: agents/engineering/engineering-frontend-developer.md
|
||||||
|
backend: agents/engineering/engineering-backend-architect.md
|
||||||
|
database: agents/engineering/engineering-database-optimizer.md
|
||||||
|
devops: agents/engineering/engineering-devops-automator.md
|
||||||
|
mobile: agents/engineering/engineering-mobile-app-builder.md
|
||||||
|
ai: agents/engineering/engineering-ai-engineer.md
|
||||||
|
security: agents/engineering/engineering-security-engineer.md
|
||||||
|
docs: agents/engineering/engineering-technical-writer.md
|
||||||
|
default: agents/engineering/engineering-senior-developer.md
|
||||||
|
|
||||||
|
t5:
|
||||||
|
code: agents/engineering/engineering-code-reviewer.md
|
||||||
|
integration: agents/testing/testing-reality-checker.md
|
||||||
|
api: agents/testing/testing-api-tester.md
|
||||||
|
performance: agents/testing/testing-performance-benchmarker.md
|
||||||
|
security: agents/engineering/engineering-security-engineer.md
|
||||||
|
default: agents/engineering/engineering-code-reviewer.md
|
||||||
|
```
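The `(tier, domain)` lookup with its `default` fallback can be sketched as below. `resolve_personality` is a hypothetical name, and the registry is inlined as a dict holding only a few of the entries above.

```python
from typing import Optional

# A subset of role_registry.yaml, as it would look after YAML loading.
REGISTRY = {
    "t3": {
        "backend": "agents/engineering/engineering-senior-developer.md",
        "infra": "agents/engineering/engineering-sre.md",
        "default": "agents/engineering/engineering-senior-developer.md",
    },
    "t5": {
        "api": "agents/testing/testing-api-tester.md",
        "default": "agents/engineering/engineering-code-reviewer.md",
    },
}


def resolve_personality(registry: dict, tier: int, domain: Optional[str]) -> str:
    """Return the personality file for (tier, domain), or the tier default."""
    tier_map = registry[f"t{tier}"]
    return tier_map.get(domain, tier_map["default"])


# Usage: an unmapped domain falls back to the tier default.
persona = resolve_personality(REGISTRY, 5, "accessibility")
```

Because unknown domains resolve to `default`, adding a specialist only requires a new registry entry.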
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Key Flows
|
||||||
|
|
||||||
|
### 1. Run Kickoff
|
||||||
|
|
||||||
|
```
|
||||||
|
User → team_runner.start(goal, config) # via CLI or any caller
|
||||||
|
→ generate run_id
|
||||||
|
→ init blackboard (create runs/<run_id>/blackboard.db)
|
||||||
|
→ build T1 brief (goal_anchor = goal, retry_budget from config)
|
||||||
|
→ spawn T1 via runtime adapter
|
||||||
|
→ await T1 workplan
|
||||||
|
```
|
||||||
|
|
||||||
|
### 2. T1 Scope Assessment
|
||||||
|
|
||||||
|
```
|
||||||
|
T1 receives brief
|
||||||
|
→ assess complexity → decide depth
|
||||||
|
→ identify workstreams
|
||||||
|
→ set retry_budget multiplier per workstream (1x simple, 2x complex)
|
||||||
|
→ emit N workstream briefs for T2 (or T3 if shallow)
|
||||||
|
→ write workplan to blackboard
|
||||||
|
→ team_runner spawns T2s in parallel
|
||||||
|
```
|
||||||
|
|
||||||
|
### 3. T4 Retry Loop (escalation.py)
|
||||||
|
|
||||||
|
```
|
||||||
|
spawn T4 with brief
|
||||||
|
→ receive result
|
||||||
|
→ classify: bad_output | blocked | partial | success
|
||||||
|
|
||||||
|
blocked:
|
||||||
|
→ log event(escalated)
|
||||||
|
→ pass to T3 immediately
|
||||||
|
|
||||||
|
bad_output, retries_remaining:
|
||||||
|
→ amend brief with failure context, increment retry_count
|
||||||
|
→ re-spawn T4
|
||||||
|
→ log event(retried)
|
||||||
|
|
||||||
|
bad_output, retries_exhausted:
|
||||||
|
→ log event(escalated)
|
||||||
|
→ pass to T3
|
||||||
|
|
||||||
|
partial:
|
||||||
|
→ write salvageable parts to blackboard
|
||||||
|
→ re-task remainder with new brief
|
||||||
|
|
||||||
|
success:
|
||||||
|
→ write result to blackboard
|
||||||
|
→ log event(completed)
|
||||||
|
→ notify T3
|
||||||
|
```
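The routing in this flow could be sketched roughly as follows. This is an illustration under assumptions, not the contents of `escalation.py`: the `Brief` dataclass, the `route_t4_result` function and the `retasked` label are hypothetical names. Only `retried`, `escalated` and `completed` appear as event names in the flow above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Brief:
    """Hypothetical minimal task brief: just the fields the retry loop needs."""
    task: str
    retry_budget: int
    retry_count: int = 0
    failure_context: List[str] = field(default_factory=list)


def route_t4_result(brief: Brief, classification: str, detail: str = "") -> str:
    """Return the next action for a classified T4 result."""
    if classification == "success":
        return "completed"
    if classification == "blocked":
        # Blocked tasks go to T3 immediately, without consuming retries.
        return "escalated"
    if classification == "partial":
        # Salvage the good parts and re-task the remainder ("retasked" is a hypothetical label).
        return "retasked"
    if classification == "bad_output":
        if brief.retry_count < brief.retry_budget:
            # Amend the brief with failure context before re-spawning T4.
            brief.failure_context.append(detail)
            brief.retry_count += 1
            return "retried"
        return "escalated"
    raise ValueError(f"unknown classification: {classification}")
```

With `retry_budget=2`, two consecutive `bad_output` results are retried and the third is escalated to T3.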
|
||||||
|
|
||||||
|
### 4. Inspection Gate Flow
|
||||||
|
|
||||||
|
```
|
||||||
|
runner reaches configured gate (e.g. t2_synthesis)
|
||||||
|
→ write event(gate_pending, detail={tier, summary, what_happens_next})
|
||||||
|
→ notify_adapter.send(tier summary + gate context)
|
||||||
|
→ halt: poll blackboard for gate_approved or gate_rejected
|
||||||
|
|
||||||
|
gate_approved:
|
||||||
|
→ write event(gate_approved)
|
||||||
|
→ continue run
|
||||||
|
|
||||||
|
gate_rejected:
|
||||||
|
→ write event(gate_rejected, detail={reason})
|
||||||
|
→ re-invoke tier with rejection reason in brief context
|
||||||
|
→ loop back to gate_pending when tier completes again
|
||||||
|
|
||||||
|
gate_timeout (gate_timeout_minutes elapsed):
|
||||||
|
→ treat as gate_rejected
|
||||||
|
→ notify Andrew: "Gate timed out, re-invoking tier"
|
||||||
|
```
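A minimal sketch of the halt-and-poll step, assuming the blackboard exposes an `events` table with `id`, `run_id` and `kind` columns (the column names and the `wait_for_gate` helper are assumptions, not the actual schema). A timeout is reported as `gate_rejected`, matching the flow above.

```python
import sqlite3
import time


def wait_for_gate(conn, run_id, after_id, timeout_s, poll_s=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll the events table until the gate is resolved or the timeout elapses.

    `after_id` is the id of the gate_pending event, so decisions recorded for
    earlier gates in the same run are ignored.
    """
    deadline = clock() + timeout_s
    while True:
        row = conn.execute(
            "SELECT kind FROM events "
            "WHERE run_id = ? AND id > ? "
            "AND kind IN ('gate_approved', 'gate_rejected') "
            "ORDER BY id LIMIT 1",
            (run_id, after_id),
        ).fetchone()
        if row is not None:
            return row[0]
        if clock() >= deadline:
            # A timed-out gate is treated as a rejection.
            return "gate_rejected"
        sleep(poll_s)
```

The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting; a runner would pass `timeout_s=gate_timeout_minutes * 60`.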
|
||||||
|
|
||||||
|
### 5. Review Gate
|
||||||
|
|
||||||
|
```
|
||||||
|
T1 completes integration
|
||||||
|
→ vcs_adapter.create_pr(
|
||||||
|
title="[agent-teams] <run_id>: <goal summary>",
|
||||||
|
body="<workplan + workstream summaries>",
|
||||||
|
head="integration/<run_id>",
|
||||||
|
base="main"
|
||||||
|
)
|
||||||
|
→ notify_adapter.send(
|
||||||
|
"Run <run_id> complete. PR ready for review: <pr_url>",
|
||||||
|
context={run_id, goal, workstreams, pr_url}
|
||||||
|
)
|
||||||
|
→ blackboard: update run status → "review"
|
||||||
|
→ halt — no auto-merge
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Build Order
|
||||||
|
|
||||||
|
1. `git submodule add https://github.com/msitarzewski/agency-agents agents/` — pull the talent pool
|
||||||
|
2. `config/role_registry.yaml` — map tier+domain → agent personality files
|
||||||
|
3. `core/task_brief.py` — schema + validation (everything depends on this); include T1 Plan Output Schema
|
||||||
|
4. `core/blackboard.py` — SQLite store, all table definitions including `t3_task_lists`; full event kind vocabulary
|
||||||
|
5. `adapters/base/*` — all four abstract interfaces
|
||||||
|
6. `adapters/llm/anthropic.py` — first LLM implementation
|
||||||
|
7. `core/escalation.py` — retry + failure routing logic (called by tiers, not runner centrally)
|
||||||
|
8. `adapters/runtime/openclaw.py` — wire up sessions_spawn + personality injection
|
||||||
|
9. `adapters/runtime/claude_code.py` — coding agent runtime, personality via --system-prompt
|
||||||
|
10. `core/team_runner.py` — full run lifecycle: spawn loop (monitors briefs table for `status=pending`, calls runtime_adapter.spawn()), gate logic (gate_pending halt, calls notify_adapter.send(), polls for gate_approved/rejected resume), path amendment monitor, T3 mesh timeout → T2 escalation, T1 failure + terminal escalation only
|
||||||
|
11. `cli/agency.py` — run, watch, inspect, approve, reject, pause, resume; `watch` tails blackboard events and renders live log; `inspect` renders run tree
|
||||||
|
12. `prompts/` — fallback tier prompts (used when no agent_personality is set)
|
||||||
|
13. `adapters/vcs/github.py` — PR creation + branch management
|
||||||
|
14. `adapters/notify/openclaw.py` — OpenClaw notification adapter; bridges gate summaries and run events to the operator via OpenClaw; manages its own inbound response state for gate approval routing
|
||||||
|
15. `config/team.yaml` — example config with full visibility block
|
||||||
|
16. `README.md` — how to run, how to add adapters, how to extend the roster; include `agency` CLI reference
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Out of Scope (Phase 2)
|
||||||
|
|
||||||
|
- Cost accounting per tier + run rollup
|
||||||
|
- Parallel workstream progress dashboard
|
||||||
|
- Additional adapter implementations (GitLab, Slack, OpenAI, Ollama)
|
||||||
|
- Persistent standing teams
|
||||||
|
- Web UI for run monitoring
|
||||||
681
docs/design.md
Normal file
@@ -0,0 +1,681 @@
|
|||||||
|
# Tiered Agent Team System — Design Document
|
||||||
|
|
||||||
|
_Started: 2026-03-14. Last updated: 2026-03-30._
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Resolved Design Decisions (formerly Open Questions)
|
||||||
|
|
||||||
|
All eight open questions resolved 2026-03-30. Details in Decisions Log.
|
||||||
|
|
||||||
|
1. **T3 mesh mechanics** → Blackboard-based. T3s write draft task lists, read peers', commit merged plan before T4 dispatch. See _T3 Mesh via Blackboard_.
|
||||||
|
|
||||||
|
2. **T1 output schema** → Formal JSON schema defined. See _T1 Plan Output Schema_.
|
||||||
|
|
||||||
|
3. **T5 consensus mechanics** → T3 aggregates all T5 results into a joint verdict. Split verdict (`partial`) triggers retry of failed slices only. See _T5 Consensus & Verdict Schema_.
|
||||||
|
|
||||||
|
4. **Path amendment mechanism** → Amending tier writes a `path_amendment` event to blackboard. Runner monitors events table and notifies the relevant higher tier via a system event. No agent callback plumbing required. See _Path Amendment Mechanism_.
|
||||||
|
|
||||||
|
5. **Failure handling (distributed model)** → Distributed ownership confirmed. Runner only owns T1 failure + terminal human escalation. See updated _Failure Handling_ table.
|
||||||
|
|
||||||
|
6. **Who makes spawn calls for T3+ tiers** → Runner monitors briefs table for `status=pending` rows and makes all spawn calls. "Distributed ownership" means the tier's output determines brief content — runner is the mechanical arm. Gates (hold on `gate_pending`) live naturally in the runner's spawn loop.
|
||||||
|
|
||||||
|
7. **Gate approval UX** → `agency approve <run_id>` CLI writes `gate_approved` directly to the blackboard — the universal path, works on any platform. Runner only cares that the event exists, not how it got there. Notify adapter implementations handle their own inbound response routing (e.g. bridging a chat reply to a CLI call) as internal adapter state — not a core concern.
|
||||||
|
|
||||||
|
8. **T3 mesh timeout** → Escalate to T2 (domain boundary problem, T2 should re-scope). If T2 also exhausts its retry budget, escalates up the normal ladder to T1 → Andrew gate. No force-commit fallback (would hide the problem and cause bad T4 dispatch).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
A dynamic, hierarchical multi-agent system for software pipelines. Teams assemble on demand, execute, then disband. Inspired by a blend of Hollywood production (dynamic assembly), consulting firms (structured deliverables, hierarchical synthesis), and two-pizza teams (small autonomous squads, clear domain ownership).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Core Principles
|
||||||
|
|
||||||
|
**1. Tiers represent cognitive modes, not org chart levels.**
|
||||||
|
Each tier thinks differently — strategy, design, coordination, execution, verification. Adding a tier only makes sense if it introduces a genuinely different mode of reasoning.
|
||||||
|
|
||||||
|
**2. Depth is proportional to complexity.**
|
||||||
|
Not every task needs every tier. A config change might only need T3→T4. A new product needs the full stack. T1 assesses scope and prescribes the path for each run; the path is never pre-configured.
|
||||||
|
|
||||||
|
**3. Goal anchoring at every level.**
|
||||||
|
T1's original intent is embedded in every agent's context — not just passed to T2 and forgotten. Every agent knows the end goal even if they only own a slice.
|
||||||
|
|
||||||
|
**4. Artifacts, not summaries.**
|
||||||
|
Tiers pass structured specs downward (JSON task briefs), not paraphrased prose. Meaning is preserved; format is compressed.
|
||||||
|
|
||||||
|
**5. Verification is mandatory.**
|
||||||
|
T5 always runs. Nothing returns to T1 unverified. T5 is a mandatory quality gate: output should work, and work well, before it surfaces upward.
|
||||||
|
|
||||||
|
**6. Provider agnostic.**
|
||||||
|
The system makes no assumptions about which LLM provider or platform is in use. Tiers reference capability levels, not specific models. All external dependencies are swappable adapters.
|
||||||
|
|
||||||
|
**7. Specialist talent pool.**
|
||||||
|
Tiers define structure and responsibility. Agent personalities define domain expertise. The two are separate — the same tier can be filled by different specialists depending on the workstream domain.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Tier Definitions
|
||||||
|
|
||||||
|
| Tier | Role | Owns | Capability Level |
|
||||||
|
|------|------|------|-----------------|
|
||||||
|
| T1 | Visionary | Goal, constraints, dispatch plan, final acceptance | reasoning-heavy |
|
||||||
|
| T2 | Architect | System design, interface contracts, workstream boundaries | reasoning-heavy / capable |
|
||||||
|
| T3 | Squad Lead | Workstream delivery, T4 management, quality gate | capable |
|
||||||
|
| T4 | Implementer | Atomic task execution (one file, one function, one test) | fast-cheap |
|
||||||
|
| T5 | Verifier | Validation of T4 output — correctness + intent alignment | capable |
|
||||||
|
|
||||||
|
T5 runs **within T3's scope**, not above it. T3 commissions T5 verification of its T4 outputs. T5 is a quality gate, not a management layer.
|
||||||
|
|
||||||
|
Capability levels map to actual models per provider in config — the core system never references a specific model name.
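As an illustration of that indirection, the sketch below resolves a tier to a model through its capability level. Everything provider-specific here is a placeholder: `example-provider` and the `example-*` model names are invented for the sketch, and mapping T2 to `reasoning-heavy` (rather than `capable`) is an assumption, since the table allows either.

```python
# Capability level per tier, taken from the table above.
# Assumption: T2 is pinned to "reasoning-heavy" for this sketch.
CAPABILITY_BY_TIER = {
    "t1": "reasoning-heavy",
    "t2": "reasoning-heavy",
    "t3": "capable",
    "t4": "fast-cheap",
    "t5": "capable",
}

# Hypothetical config section: provider -> capability level -> model name.
# The names are placeholders, not real model identifiers.
MODELS = {
    "example-provider": {
        "reasoning-heavy": "example-large",
        "capable": "example-medium",
        "fast-cheap": "example-small",
    },
}


def model_for(tier: str, provider: str, models=MODELS) -> str:
    """Resolve tier -> capability level -> provider-specific model name."""
    capability = CAPABILITY_BY_TIER[tier.lower()]
    return models[provider][capability]
```

Core code would only ever see the capability level; the model name stays in config.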
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Dispatch Model
|
||||||
|
|
||||||
|
### T1 Owns the Plan
|
||||||
|
|
||||||
|
T1 is not just a decomposer — it is the dispatch planner. Its output declares:
|
||||||
|
|
||||||
|
- **Workstreams** — the decomposed units of work
|
||||||
|
- **Tier path per workstream** — which tiers to engage (e.g. `[T2, T3, T4, T5]` or `[T4, T5]` for trivial tasks)
|
||||||
|
- **Parallelism** — which workstreams are independent and can run concurrently
|
||||||
|
|
||||||
|
T1 does not prescribe how each tier operates internally. That is the tier's own concern.
|
||||||
|
|
||||||
|
### T1 Lifecycle — Two Explicit Phases
|
||||||
|
|
||||||
|
T1 is invoked twice per run, each with a distinct prompt and purpose:
|
||||||
|
|
||||||
|
**Phase 1 — Plan:**
|
||||||
|
1. T1 produces initial dispatch plan (workstreams, tier paths, parallelism, retry budget)
|
||||||
|
2. T1 self-critiques its own plan in a single follow-up pass ("what could go wrong, what did I miss?") and amends
|
||||||
|
3. Amended plan surfaces to Andrew for approval — no T2s spawn until approval is given
|
||||||
|
|
||||||
|
**Phase 2 — Accept:**
|
||||||
|
After the full T2→T3→T4→T5 pipeline completes, T1 is re-invoked with the final output. It validates against the original goal anchor and either accepts (opens PR) or rejects (escalates back down).
|
||||||
|
|
||||||
|
Both phases are named explicitly in the task brief schema and tracked on the blackboard.
|
||||||
|
|
||||||
|
### Each Tier Owns the Layer Below
|
||||||
|
|
||||||
|
Control flow is distributed, not centralised:
|
||||||
|
|
||||||
|
- T1 manages its T2s
|
||||||
|
- T2 Lead manages T2 specialists and their domain boundaries
|
||||||
|
- T2 specialists each own their T3s
|
||||||
|
- **T3 manages its T4s** — including dependency graph, parallelism, and T5 commissioning
|
||||||
|
- The runner is thin: bootstrap T1, monitor the blackboard, handle final result and notifications
|
||||||
|
|
||||||
|
This means orchestration logic lives in agent prompts and output schemas — not in Python runner code. Adding a new execution pattern means updating a prompt, not the runner.
|
||||||
|
|
||||||
|
**Tradeoff:** Debugging is harder. When something fails mid-chain, you read blackboard logs rather than step through central runner code. This is a tooling problem to solve (good blackboard inspection), not a design flaw to avoid.
|
||||||
|
|
||||||
|
### Dynamic Paths
|
||||||
|
|
||||||
|
Tiers can propose path amendments mid-run (e.g. T3 discovers scope that warrants a T2 pass it didn't get). Amendments are logged to the blackboard. Higher tiers are notified but do not need to approve — it is informational. No tier silently changes the plan.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Orchestration Patterns Per Tier
|
||||||
|
|
||||||
|
Different tiers suit different internal coordination patterns. These are baked into the runner's tier-handling logic and the tier prompts — not prescribed by T1.
|
||||||
|
|
||||||
|
| Tier | Pattern | Rationale |
|
||||||
|
|------|---------|-----------|
|
||||||
|
| T1 | Single agent, two phases | Must be authoritative; plan phase + accept phase |
|
||||||
|
| T2 Lead | Coordinator | Spawned first; defines boundaries + shared assumptions; drives conflict resolution; produces canonical architecture |
|
||||||
|
| T2 Specialists | Parallel fan-out | Each works independently within its domain; reads Lead's boundaries + shared assumptions doc before starting |
|
||||||
|
| T3 | Light mesh | Peer coordination within same T2 domain to negotiate task boundaries before T4 dispatch |
|
||||||
|
| T4 | Swarm + pipeline hybrid | Independent tasks run as swarm; dependent tasks pipeline (T4-A's output feeds T4-B). T3 declares which is which. |
|
||||||
|
| T5 | Parallel fan-out + consensus | Each T5 reviews its slice independently, then compares notes for a joint verdict — catches both artifact bugs and integration issues |
|
||||||
|
|
||||||
|
### T2 Flow in Detail
|
||||||
|
|
||||||
|
1. T1 spawns **T2 Lead Architect** with goal + workstream context
|
||||||
|
2. Lead defines explicit **domain boundaries** (who owns what, hard edges)
|
||||||
|
3. Lead publishes **shared assumptions doc** — cross-cutting concerns, key conventions, architectural constraints (auth approach, data formats, API patterns, etc.)
|
||||||
|
4. T1 spawns **T2 specialists** with boundaries + shared assumptions baked into their briefs
|
||||||
|
5. Specialists work in parallel, each within their defined domain
|
||||||
|
6. Lead reads all proposals, drives **conflict resolution** with relevant specialists if needed (cycle limit in config — fixed, not per-workstream)
|
||||||
|
7. Lead produces **canonical architecture** → written to blackboard as distinct artifact
|
||||||
|
8. T1 (Accept phase) validates canonical architecture against goal anchor
|
||||||
|
9. Canonical architecture becomes T3 briefs — each T2 specialist hands off to its own T3s
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Horizontal Scaling Within Tiers
|
||||||
|
|
||||||
|
```
|
||||||
|
T1 — Phase 1: Plan (self-critique → Andrew approval)
|
||||||
|
│
|
||||||
|
├── T2: Lead Architect (boundaries + shared assumptions first)
|
||||||
|
│ ├── T2: Backend Architect ─┐
|
||||||
|
│ ├── T2: Frontend Architect ├─ parallel, within defined domains
|
||||||
|
│ └── T2: Infra Architect ─┘
|
||||||
|
│ │
|
||||||
|
│ └── (Lead synthesises → conflict resolution if needed → canonical architecture)
|
||||||
|
│
|
||||||
|
├── T2 Backend Architect owns:
|
||||||
|
│ ├── T3: API Squad Lead ─┐
|
||||||
|
│ └── T3: DB Squad Lead ─┴─ light mesh within domain
|
||||||
|
│ ├── T4: Worker A ─┐
|
||||||
|
│ ├── T4: Worker B ─┼─ swarm / pipeline (T3 decides)
|
||||||
|
│ └── T4: Worker C ─┘
|
||||||
|
│ └── T5: Verifier(s) — fan-out + consensus
|
||||||
|
│
|
||||||
|
└── T1 — Phase 2: Accept (validates against goal anchor → PR)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Use Case Flows
|
||||||
|
|
||||||
|
T1 assesses complexity and prescribes the tier path per workstream. Three standard depth profiles:
|
||||||
|
|
||||||
|
### Full Stack — T1→T2→T3→T4→T5
|
||||||
|
*Complex feature, new product, cross-domain changes*
|
||||||
|
|
||||||
|
```
|
||||||
|
T1 Plan
|
||||||
|
→ assess complexity (high)
|
||||||
|
→ output T1 Plan Schema (workstreams, tier paths [T2,T3,T4,T5], parallelism, retry budgets)
|
||||||
|
→ self-critique pass
|
||||||
|
→ GATE: surface to Andrew ← approval required
|
||||||
|
|
||||||
|
T2 Lead (spawned by runner after approval)
|
||||||
|
→ receive: goal + full workplan
|
||||||
|
→ publish: domain boundaries + shared assumptions doc → blackboard
|
||||||
|
→ GATE (optional): review boundaries before specialists spawn
|
||||||
|
|
||||||
|
T2 Specialists (parallel fan-out, wait on Lead)
|
||||||
|
→ each receives: their domain boundary + shared assumptions
|
||||||
|
→ produce: architecture proposal for their slice
|
||||||
|
→ Lead synthesises, drives conflict resolution if needed
|
||||||
|
→ Lead writes: canonical architecture → blackboard
|
||||||
|
→ GATE (recommended): review architecture before implementation
|
||||||
|
|
||||||
|
Each T2 Specialist → spawns its own T3s (with canonical architecture slice + interface contracts)
|
||||||
|
|
||||||
|
T3s (light mesh within T2 domain)
|
||||||
|
→ write draft task lists to blackboard
|
||||||
|
→ read peers' lists, reconcile boundaries
|
||||||
|
→ commit merged task plan before T4 dispatch
|
||||||
|
→ GATE (optional): review task breakdown
|
||||||
|
|
||||||
|
T4s
|
||||||
|
→ swarm: independent tasks run in parallel
|
||||||
|
→ pipeline: T4-A output feeds T4-B (T3 declares dependencies)
|
||||||
|
→ commit to feature branches
|
||||||
|
|
||||||
|
T5s (fan-out per T4 slice)
|
||||||
|
→ each reviews its slice independently
|
||||||
|
→ T3 collects results → joint verdict
|
||||||
|
→ GATE (optional): review T5 verdict before T3 marks done
|
||||||
|
→ partial: T3 retries only failed slices
|
||||||
|
→ pass: T3 signals workstream done to T2
|
||||||
|
|
||||||
|
T2 specialists → signal T2 Lead
|
||||||
|
T2 Lead → writes integration summary → blackboard
|
||||||
|
|
||||||
|
T1 Accept
|
||||||
|
→ validate against goal anchor
|
||||||
|
→ open PR, notify_adapter.send(pr summary + url)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Medium Complexity — T1→T3→T4→T5
|
||||||
|
*Config change, isolated bug fix — T1 determines no cross-domain design needed*
|
||||||
|
|
||||||
|
```
|
||||||
|
T1 Plan
|
||||||
|
→ assess: contained scope, single domain, no T2 architecture needed
|
||||||
|
→ workplan: tier paths [T3, T4, T5]
|
||||||
|
→ GATE: Andrew approval
|
||||||
|
|
||||||
|
T3s spawned directly by runner
|
||||||
|
→ receives T1 brief with task context (no T2 architecture layer)
|
||||||
|
→ T3 light mesh → T4 dispatch → T5 verify → signal done
|
||||||
|
|
||||||
|
T1 Accept → PR
|
||||||
|
```
|
||||||
|
|
||||||
|
### Simple / Hotfix — T1→T4→T5
|
||||||
|
*Single file, single function, trivial atomic task*
|
||||||
|
|
||||||
|
```
|
||||||
|
T1 Plan
|
||||||
|
→ assess: trivial, single workstream
|
||||||
|
→ tier path: [T4, T5]
|
||||||
|
→ GATE: Andrew approval
|
||||||
|
|
||||||
|
T4 (coding agent)
|
||||||
|
→ single atomic task, commits
|
||||||
|
|
||||||
|
T5 (single verifier, not full fan-out)
|
||||||
|
→ code review + correctness check
|
||||||
|
→ pass → T1 Accept → PR
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Resolved Mechanics
|
||||||
|
|
||||||
|
### T3 Mesh via Blackboard
|
||||||
|
|
||||||
|
T3s coordinate task boundaries before dispatching T4s. All coordination goes through the blackboard — no direct agent-to-agent messaging.
|
||||||
|
|
||||||
|
1. Each T3 writes its **draft task list** to the blackboard (one row per proposed T4 task, status `draft`)
|
||||||
|
2. Each T3 reads all sibling T3 draft lists in its T2 domain
|
||||||
|
3. T3s amend their lists to resolve overlaps (claim tasks, release duplicates)
|
||||||
|
4. Once all T3s in the domain have committed their final task lists (status `committed`), T4 dispatch begins
|
||||||
|
5. No T3 dispatches T4s until all peers in the domain are committed — this prevents duplicate work
|
||||||
|
|
||||||
|
The runner monitors for `all_committed` state and can enforce a timeout (config: `t3_mesh_timeout_minutes`).
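The `all_committed` check could be as small as the sketch below. It assumes the runner has already read each T3's task statuses from the blackboard into a dict; the `mesh_ready` name and the input shape are hypothetical, and treating a T3 with no rows as "not ready" is an assumption.

```python
from typing import Dict, List


def mesh_ready(statuses_by_t3: Dict[str, List[str]]) -> bool:
    """True when every T3 in the domain has tasks and all of them are 'committed'.

    Input shape (hypothetical): {t3_id: [status of each proposed T4 task, ...]}.
    """
    if not statuses_by_t3:
        return False
    for statuses in statuses_by_t3.values():
        # A T3 that has written nothing yet has not committed.
        if not statuses:
            return False
        if any(status != "committed" for status in statuses):
            return False
    return True
```

T4 dispatch for the domain would start only once this returns `True`, or the mesh timeout would fire first.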
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### T1 Plan Output Schema
|
||||||
|
|
||||||
|
T1's Plan phase produces a structured JSON object written to the blackboard. The runner parses this to bootstrap the pipeline.
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"run_id": "uuid",
|
||||||
|
"goal_anchor": "Original goal — immutable, propagated to every downstream brief",
|
||||||
|
"complexity": "high | medium | low",
|
||||||
|
"retry_budget_multiplier": 2,
|
||||||
|
"workstreams": [
|
||||||
|
{
|
||||||
|
"id": "ws-backend-api",
|
||||||
|
"name": "Backend API",
|
||||||
|
"domain": "backend",
|
||||||
|
"tier_path": ["t2", "t3", "t4", "t5"],
|
||||||
|
"parallel_group": "A",
|
||||||
|
"t2_specialist": "agents/engineering/engineering-software-architect.md",
|
||||||
|
"notes": "Focus on webhook ingest and retry queue"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"parallelism": {
|
||||||
|
"groups": {
|
||||||
|
"A": ["ws-backend-api", "ws-frontend"],
|
||||||
|
"B": ["ws-infra"]
|
||||||
|
},
|
||||||
|
"sequence": ["A", "B"]
|
||||||
|
},
|
||||||
|
"self_critique_summary": "Brief plain-text summary of what T1 identified and amended in its self-critique pass"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
`parallel_group` and `sequence` together handle inter-workstream dependencies: the workstreams in group A run in parallel, and group B starts only after every workstream in A completes.
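One way a runner might turn the `parallelism` block into ordered waves of work is sketched below. The `execution_waves` helper is hypothetical and does no validation beyond dictionary lookups.

```python
from typing import Dict, List


def execution_waves(plan: Dict) -> List[List[str]]:
    """Return workstream ids grouped into waves, in the order given by `sequence`.

    Workstreams inside one wave may run concurrently; wave N+1 starts only
    after wave N has finished.
    """
    groups = plan["parallelism"]["groups"]
    return [list(groups[name]) for name in plan["parallelism"]["sequence"]]


# Excerpt of the schema example above.
EXAMPLE_PLAN = {
    "parallelism": {
        "groups": {"A": ["ws-backend-api", "ws-frontend"], "B": ["ws-infra"]},
        "sequence": ["A", "B"],
    }
}
```

For `EXAMPLE_PLAN` this gives two waves: the backend and frontend workstreams first, then infra.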
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### T5 Consensus & Verdict Schema
|
||||||
|
|
||||||
|
T3 aggregates all T5 results into a joint verdict after fan-out completes.
|
||||||
|
|
||||||
|
**Individual T5 result:**
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"verifier_id": "uuid",
|
||||||
|
"scope": "queue-client",
|
||||||
|
"verdict": "pass | fail",
|
||||||
|
"issues": ["issue description..."],
|
||||||
|
"notes": "human-readable summary"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**T3 joint verdict (written to blackboard):**
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"t5_results": [...],
|
||||||
|
"joint_verdict": "pass | partial | fail",
|
||||||
|
"failed_scopes": ["queue-client"],
|
||||||
|
"summary": "Human-readable summary for gate surface and logs"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Split verdict handling:**
|
||||||
|
- `pass` → T3 marks workstream done, signals T2
|
||||||
|
- `partial` → T3 retries only the failed T4 slices (up to retry budget), re-runs T5 on those slices
|
||||||
|
- `fail` → T3 escalates to T2 (or T1 if shallow path)
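A sketch of the aggregation step, under one stated assumption: the document does not define the exact boundary between `partial` and `fail`, so this version treats "every slice failed" as `fail` and "some slices failed" as `partial`. The `joint_verdict` function name is hypothetical.

```python
from typing import Dict, List


def joint_verdict(t5_results: List[Dict]) -> Dict:
    """Combine individual T5 results into a joint verdict.

    Assumption: all slices failing means 'fail'; some failing means 'partial'.
    """
    if not t5_results:
        # Verification is mandatory, so an empty result set is an error.
        raise ValueError("no T5 results to aggregate")
    failed = [r["scope"] for r in t5_results if r["verdict"] == "fail"]
    if not failed:
        verdict = "pass"
    elif len(failed) == len(t5_results):
        verdict = "fail"
    else:
        verdict = "partial"
    return {
        "t5_results": t5_results,
        "joint_verdict": verdict,
        "failed_scopes": failed,
    }
```

On `partial`, `failed_scopes` tells T3 which T4 slices to retry and re-verify.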
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Spawn Call Ownership
|
||||||
|
|
||||||
|
The runner is the single point of contact with the runtime adapter. Tiers do not call `sessions_spawn` directly — they write output to the blackboard and the runner acts on it.
|
||||||
|
|
||||||
|
**Flow:**
|
||||||
|
1. A tier completes and writes child briefs to the `briefs` table with `status=pending`
|
||||||
|
2. Runner's spawn loop detects pending rows
|
||||||
|
3. If a gate is configured at this tier boundary → runner writes `gate_pending`, notifies Andrew, halts
|
||||||
|
4. On `gate_approved` → runner calls `runtime_adapter.spawn()` for each pending brief
|
||||||
|
5. Spawned agent runs, writes its own child briefs as pending when done → loop continues
|
||||||
|
|
||||||
|
This keeps gate logic in one place (the runner's spawn loop), makes all spawn calls auditable from a single location, and means agents only need blackboard read/write access — no runtime adapter tool access required.
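One pass of such a spawn loop might look like the sketch below. The `briefs` table layout, the `spawned` status value and the `spawn_pending` helper are assumptions for illustration; `spawn` stands in for `runtime_adapter.spawn()`.

```python
import sqlite3
from typing import Callable, List


def spawn_pending(conn: sqlite3.Connection, spawn: Callable[[int], None],
                  gate_open: bool = True) -> List[int]:
    """Spawn every pending brief once, unless a gate is holding the loop.

    Returns the ids of the briefs that were spawned in this pass.
    """
    if not gate_open:
        # A pending gate halts spawning; briefs stay 'pending' on the blackboard.
        return []
    rows = conn.execute(
        "SELECT id FROM briefs WHERE status = 'pending' ORDER BY id"
    ).fetchall()
    spawned = []
    for (brief_id,) in rows:
        spawn(brief_id)
        # Mark the brief so the next pass does not spawn it again
        # ('spawned' is a hypothetical status value).
        conn.execute("UPDATE briefs SET status = 'spawned' WHERE id = ?", (brief_id,))
        spawned.append(brief_id)
    conn.commit()
    return spawned
```

Because the agents only write rows and the runner alone calls `spawn`, every spawn is visible at this single call site.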
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Gate Approval UX
|
||||||
|
|
||||||
|
**Core mechanic (platform-agnostic):**
|
||||||
|
|
||||||
|
1. Runner writes `gate_pending` to blackboard
|
||||||
|
2. Runner calls `notify_adapter.send()` with tier summary + gate context (`run_id`, `gate`, `summary`, `what_happens_next`)
|
||||||
|
3. Runner polls blackboard for `gate_approved` or `gate_rejected`
|
||||||
|
4. `agency approve <run_id>` / `agency reject <run_id> --reason "..."` writes the event directly to the blackboard — the universal approval path, works on any platform with filesystem access
|
||||||
|
|
||||||
|
The runner never reads from a state file and never talks to a notify adapter for inbound responses. It only polls the blackboard.
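The write side of that contract is small. The sketch below shows what `agency approve` / `agency reject` might do internally, assuming an `events` table with `run_id`, `kind` and a JSON `detail` column; the schema and the `write_gate_decision` helper are hypothetical, not the actual CLI code.

```python
import json
import sqlite3
from typing import Optional


def write_gate_decision(conn: sqlite3.Connection, run_id: str,
                        approved: bool, reason: Optional[str] = None) -> str:
    """Record a gate decision as a blackboard event and return the event kind."""
    kind = "gate_approved" if approved else "gate_rejected"
    # Rejections carry the reason so it can be placed in the re-invoked tier's brief.
    detail = json.dumps({"reason": reason} if reason else {})
    conn.execute(
        "INSERT INTO events (run_id, kind, detail) VALUES (?, ?, ?)",
        (run_id, kind, detail),
    )
    conn.commit()
    return kind
```

Any notify adapter that can reach this code path, directly or by shelling out to the CLI, produces the same event the runner is polling for.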
|
||||||
|
|
||||||
|
**Adapter responsibility:**
|
||||||
|
Each notify adapter handles its own inbound response routing. How a human's approval gets translated into an `agency approve` CLI call is entirely the adapter's concern — not core. Example: an OpenClaw adapter bridges a chat reply to the CLI. A Slack adapter wires up a slash command. A webhook adapter listens on an endpoint. All produce the same result: `gate_approved` written to blackboard.
|
||||||
|
|
||||||
|
Any internal state the adapter needs to resolve ambiguous responses (e.g. which run_id an approval refers to when multiple gates are pending) is managed by the adapter, not the core.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### T3 Mesh Timeout
|
||||||
|
|
||||||
|
If T3s in a domain fail to commit their task lists within `t3_mesh_timeout_minutes`:
|
||||||
|
|
||||||
|
1. **Runner escalates to T2** — writes a `gate_pending` escalation event and notifies the T2 specialist that owns the domain. Context: which T3s timed out, what draft lists (if any) exist on the blackboard. T2 re-scopes or clarifies domain boundaries, spawns fresh T3 briefs.
|
||||||
|
|
||||||
|
2. **If T2 also exhausts its retry budget** → normal escalation ladder: T2 failure → T1 handles → T1 failure → Andrew gate.
|
||||||
|
|
||||||
|
Force-committing partial draft lists (optimistic fallback) is explicitly not done — it hides the boundary problem and produces conflicting or duplicate T4 tasks that fail later with less context.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Path Amendment Mechanism
|
||||||
|
|
||||||
|
When a mid-run tier discovers scope that warrants a different tier path than T1 prescribed:
|
||||||
|
|
||||||
|
1. The discovering tier writes a `path_amendment` event to the blackboard:
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"kind": "path_amendment",
|
||||||
|
"proposed_by": "t3/ws-backend-api",
|
||||||
|
"reason": "Discovered auth dependency requires T2 architectural pass",
|
||||||
|
"amendment": {
|
||||||
|
"workstream": "ws-backend-api",
|
||||||
|
"add_tiers": ["t2"],
|
||||||
|
"insert_before": "t3"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
2. The runner monitors the events table, detects `path_amendment`, and sends a system event notification to the relevant higher tier
|
||||||
|
3. The higher tier is **informed, not blocked** — it acknowledges and adjusts its understanding
|
||||||
|
4. Amendment is logged on the blackboard for audit; no approval gate required (the next scheduled human gate will surface it)
|
||||||
|
|
||||||
|
No agent needs callback plumbing. The runner is the notification bridge.
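To make the `amendment` payload concrete, the sketch below applies it to a workstream's `tier_path`. Whether and where the runner actually rewrites the path is not specified above, so the `apply_amendment` helper is only an illustration of how the fields could be interpreted.

```python
from typing import Dict, List


def apply_amendment(tier_path: List[str], amendment: Dict) -> List[str]:
    """Return a new tier path with `add_tiers` inserted before `insert_before`.

    Tiers already present in the path are not added a second time.
    """
    index = tier_path.index(amendment["insert_before"])
    new_tiers = [t for t in amendment["add_tiers"] if t not in tier_path]
    return tier_path[:index] + new_tiers + tier_path[index:]
```

Applied to the example event, a path of `["t3", "t4", "t5"]` becomes `["t2", "t3", "t4", "t5"]`; the original list is left unchanged so the prescribed path stays available for audit.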
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Shared State
|
||||||
|
|
||||||
|
For software pipelines, **the repo is the primary blackboard**:
|
||||||
|
- T4 workers commit to feature branches
|
||||||
|
- T3 leads review and merge to workstream branches
|
||||||
|
- T2 architects own integration branches
|
||||||
|
- T1 does final integration and acceptance
|
||||||
|
|
||||||
|
Supplemented by a SQLite coordination store per run tracking:
|
||||||
|
- In-flight workstreams and their current execution plans
|
||||||
|
- Handoff artifacts and tier status
|
||||||
|
- Retry counts and escalation history
|
||||||
|
- Path amendments (proposed, by whom, timestamp)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Failure Handling
|
||||||
|
|
||||||
|
Distributed ownership — each tier handles failures in the tier below it. The runner only handles T1 failure and terminal human escalation.
|
||||||
|
|
||||||
|
| Failure | Owner | Handler | Action |
|
||||||
|
|---------|-------|---------|--------|
|
||||||
|
| T4 bad output | T3 | `escalation.py` called by T3's context | Retry T4 with corrected brief (up to retry_budget) |
|
||||||
|
| T4 blocked | T3 | `escalation.py` | Escalate to T3 immediately — no retries |
|
||||||
|
| T4 partial output | T3 | `escalation.py` | Salvage good parts, re-task remainder |
|
||||||
|
| T5 partial verdict | T3 | T3 joint verdict logic | Retry failed T4 slices only |
|
||||||
|
| T5 full fail | T3 | T3 joint verdict logic | Escalate to T2 |
|
||||||
|
| T3 workstream stuck | T2 | T2 specialist prompt + blackboard | Re-scope or split the workstream |
|
||||||
|
| T2 design wrong | T1 | T1 Accept phase + blackboard | Re-plan; may discard workstream and restart |
|
||||||
|
| T1 failure / crash | Runner | `team_runner.py` | Surface to human, halt run |
|
||||||
|
| Repeated escalation | Runner | `team_runner.py` | Gate: block until human unblocks |
|
||||||
|
|
||||||
|
**Key distinction:** `escalation.py` is not called by the runner centrally. It is logic that tier agents execute (or the runner executes on their behalf when it detects a timeout or dead agent). The runner only owns the last two rows.
|
||||||
|
|
||||||
|
Retry limits prevent infinite loops. Escalation path is always upward, never sideways.
|
||||||
|
|
||||||
|
T1 sets a retry budget multiplier during scope assessment (`1x` simple, `2x` complex). Retry budget is a field on the task brief — not hardcoded in the runner.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Agent Talent Pool
|
||||||
|
|
||||||
|
The system builds on [agency-agents](https://github.com/msitarzewski/agency-agents) — a library of 50+ pre-built specialist personalities, each with deep domain expertise, quality standards, and specific deliverables.
|
||||||
|
|
||||||
|
**Division of responsibility:**
|
||||||
|
- Our system provides: orchestration, tier structure, task briefs, retries, verification gates, shared state
|
||||||
|
- Agency-agents provides: the specialist knowledge each agent brings to its role
|
||||||
|
|
||||||
|
T1 selects the right specialist from the roster when building workstream briefs. The specialist's personality is injected as the system prompt at spawn time.

**Default tier-to-specialist mapping for software pipelines:**

| Tier | Domain | Agent |
|------|--------|-------|
| T1 | Strategy | nexus-strategy |
| T2 | Backend | software-architect |
| T2 | Infra | devops-automator |
| T2 | Data | data-engineer |
| T3 | Backend | senior-developer |
| T3 | Reliability | sre |
| T4 | Frontend | frontend-developer |
| T4 | Backend | backend-architect |
| T4 | Database | database-optimizer |
| T4 | DevOps | devops-automator |
| T4 | Mobile | mobile-app-builder |
| T4 | AI/ML | ai-engineer |
| T4 | Security | security-engineer |
| T4 | Docs | technical-writer |
| T5 | Code review | code-reviewer |
| T5 | Integration | testing-reality-checker |
| T5 | API | testing-api-tester |
| T5 | Performance | testing-performance-benchmarker |
| T5 | Security | security-engineer |

The roster is not fixed — T1 can select any agent from the library based on workstream needs.

---

## Adapter Layers

Everything external is a swappable adapter. Core logic never imports from adapters directly — always through an interface.

```
Core (platform-agnostic)
├── team_runner   — thin bootstrap: spawn T1, monitor blackboard, handle result
├── blackboard    — SQLite coordination state
├── task_brief    — schema + validation
└── escalation    — retry logic, failure routing

Adapters (swappable)
├── llm/          — anthropic (now), openai, ollama, any API
├── notify/       — openclaw (now), slack, email, webhook...
├── vcs/          — github (now), gitlab, gitea, bare git...
└── runtime/
    ├── standard      — openclaw sessions_spawn (T1/T2/T3)
    └── coding_agent  — claude_code (T4/T5 default), codex, aider...
```

Swapping providers means writing a new adapter file — nothing in core changes.

T4 and T5 default to the **coding agent runtime** when available, and fall back gracefully to the standard runtime if it is not configured.
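
The adapter boundary and the T4/T5 fallback could look something like the sketch below. `RuntimeAdapter` is the base class name listed under `adapters/base/`; the `spawn` method, its signature, the two concrete classes, and `runtime_for_tier` are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Optional


class RuntimeAdapter(ABC):
    """Interface that core depends on; the `spawn` signature is an assumption."""

    @abstractmethod
    def spawn(self, brief_id: str, system_prompt: str) -> str:
        """Start an agent session and return a session id."""


class StandardRuntime(RuntimeAdapter):
    def spawn(self, brief_id: str, system_prompt: str) -> str:
        return f"standard:{brief_id}"


class CodingAgentRuntime(RuntimeAdapter):
    def spawn(self, brief_id: str, system_prompt: str) -> str:
        return f"coding_agent:{brief_id}"


def runtime_for_tier(tier: str, standard: RuntimeAdapter,
                     coding_agent: Optional[RuntimeAdapter] = None) -> RuntimeAdapter:
    """T4/T5 prefer the coding agent runtime; every other case uses standard."""
    if tier in {"T4", "T5"} and coding_agent is not None:
        return coding_agent
    return standard


standard = StandardRuntime()
print(runtime_for_tier("T4", standard, CodingAgentRuntime()).spawn("b1", "..."))
print(runtime_for_tier("T4", standard).spawn("b1", "..."))  # falls back
```

Core code only ever sees the abstract type, so adding another runtime means adding a subclass and leaving the selection function untouched.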

---

## Run Visibility Layer

Designed for debugging, test runs, and quality evaluation at each tier. Three interlocking components.

### 1. Human-Readable Live Log

Structured events from the blackboard rendered as a timestamped, readable stream. `agency watch <run_id>` tails this live.

```
[abc123] 12:30:01  T1    PLAN_START    Assessing scope: "Build webhook ingestion system"
[abc123] 12:30:14  T1    PLAN_DONE     3 workstreams — backend-api, infra, docs (2 parallel)
[abc123] 12:30:14  GATE  APPROVAL      ⏸ Waiting on approval before T2 spawns
[abc123] 12:31:02  GATE  APPROVED      ✓ Approved — continuing
[abc123] 12:31:03  T2    LEAD_START    Lead Architect spawned
[abc123] 12:31:41  T2    BOUNDS_READY  Domain boundaries + shared assumptions published
[abc123] 12:31:42  T2    SPEC_START    3 specialists spawned (parallel): backend, infra, docs
[abc123] 12:32:15  T2    SPEC_DONE     backend-api architecture draft ready
[abc123] 12:32:58  T2    SYNTH_DONE    Canonical architecture written to blackboard
[abc123] 12:32:58  GATE  INSPECTION    ⏸ T2 synthesis ready for review
[abc123] 12:33:44  T3    MESH_START    backend-api: 2 squad leads negotiating task boundaries
[abc123] 12:34:01  T3    MESH_DONE     Task split committed — 7 T4 tasks (5 swarm, 2 pipeline)
[abc123] 12:34:02  T4    SWARM_START   5 workers spawned in parallel
[abc123] 12:35:10  T4    DONE          worker-3 auth-middleware ✓
[abc123] 12:35:22  T4    FAIL          worker-4 queue-client ✗ (retry 1/3)
[abc123] 12:36:04  T4    DONE          worker-4 queue-client ✓ (retry resolved)
[abc123] 12:36:05  T5    VERIFY_START  4 verifiers spawned
[abc123] 12:36:45  T5    VERDICT       partial — queue-client needs rework
[abc123] 12:37:12  T5    VERDICT       ✓ all pass — workstream backend-api done
```

Log level `verbose` adds per-T4-start/done lines. Default is `normal` (tier-level events only).
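
One way the log lines and the two log levels could be produced is sketched below. The `LogEvent` fields and the idea of tagging each event with a level are assumptions; the source only states that `log` events carry a level and a message.

```python
from dataclasses import dataclass


@dataclass
class LogEvent:
    """Hypothetical shape of a `log` event row read from the blackboard."""
    run_id: str
    time: str
    tier: str
    event: str
    message: str
    level: str = "normal"  # "normal" or "verbose"


def render(event: LogEvent) -> str:
    """Format one event as a fixed-width, human-readable line."""
    return (f"[{event.run_id}] {event.time}  {event.tier:<4}  "
            f"{event.event:<12}  {event.message}")


def visible(events: list[LogEvent], log_level: str = "normal") -> list[LogEvent]:
    """At `normal`, drop verbose-only events; at `verbose`, keep everything."""
    if log_level == "verbose":
        return list(events)
    return [e for e in events if e.level == "normal"]


events = [
    LogEvent("abc123", "12:34:02", "T4", "SWARM_START", "5 workers spawned in parallel"),
    LogEvent("abc123", "12:34:03", "T4", "START", "worker-1 auth-middleware",
             level="verbose"),
]
for line in map(render, visible(events)):
    print(line)
```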

### 2. Inspection Gates

Configurable pause points. When the runner hits a gate, it:
1. Writes a `gate_pending` event to the blackboard
2. Fires `notify_adapter.send()` with the tier summary + gate context
3. Halts — no next tier spawns until `gate_approved` or `gate_rejected` is written
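
The three steps above can be sketched against an in-memory SQLite database. The `events` table layout and both function names are assumptions, and the `notify` callable stands in for `notify_adapter.send()`.

```python
import sqlite3
from typing import Callable, Optional

# Assumed minimal events table; the real blackboard schema may differ.
SCHEMA = ("CREATE TABLE events "
          "(id INTEGER PRIMARY KEY, run_id TEXT, gate TEXT, type TEXT)")


def hit_gate(db: sqlite3.Connection, run_id: str, gate: str,
             notify: Callable[[str], None], summary: str) -> None:
    """Steps 1 and 2: record the pending gate, then notify a human."""
    db.execute(
        "INSERT INTO events (run_id, gate, type) VALUES (?, ?, 'gate_pending')",
        (run_id, gate))
    db.commit()
    notify(f"[{run_id}] gate {gate}: {summary}")


def gate_decision(db: sqlite3.Connection, run_id: str, gate: str) -> Optional[str]:
    """Step 3: return the decision event type, or None while still halted."""
    row = db.execute(
        "SELECT type FROM events WHERE run_id = ? AND gate = ? "
        "AND type IN ('gate_approved', 'gate_rejected') "
        "ORDER BY id DESC LIMIT 1",
        (run_id, gate)).fetchone()
    return row[0] if row else None


db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
sent: list[str] = []
hit_gate(db, "abc123", "t2_synthesis", sent.append, "Canonical architecture ready")
before = gate_decision(db, "abc123", "t2_synthesis")  # None: still halted

# Stand-in for `agency approve abc123` writing to the blackboard.
db.execute("INSERT INTO events (run_id, gate, type) "
           "VALUES ('abc123', 't2_synthesis', 'gate_approved')")
after = gate_decision(db, "abc123", "t2_synthesis")
print(before, after)
```

The runner would call `gate_decision` on each poll and spawn the next tier only once it returns a value.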

The tier summary surfaced at each gate includes:
- **What was produced** (the tier artifact in readable form)
- **What happens next** (which agents will spawn, doing what)
- **Any anomalies** flagged by the tier itself

Configurable in `team.yaml` under `visibility.inspection_gates`. A `strict_mode: true` flag enables all gates — recommended for first runs on a new codebase or new goal type.

```yaml
visibility:
  strict_mode: false
  log_level: normal           # normal | verbose
  inspection_gates:
    t1_plan: true             # always — required by design
    t2_lead: false            # optional — review boundaries before specialists
    t2_synthesis: true        # recommended — review architecture before implementation
    t3_plan: false            # verbose — useful early on, disable once T3 is trusted
    t5_verdict: false         # review T5 joint verdict before T3 marks workstream done
  gate_timeout_minutes: 60    # auto-reject if no response within this window
```
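
How the runner might turn this configuration into a set of active gates is sketched below. The dict mirrors the YAML as it would look after loading (for example with `yaml.safe_load`), assuming the keys nest under `visibility` as the prose describes. The `enabled_gates` helper and the rule that `t1_plan` can never be switched off are assumptions drawn from the "always — required by design" comment.

```python
# The visibility section as a plain dict, as it might look after YAML loading.
visibility = {
    "strict_mode": False,
    "inspection_gates": {
        "t1_plan": True,
        "t2_lead": False,
        "t2_synthesis": True,
        "t3_plan": False,
        "t5_verdict": False,
    },
}


def enabled_gates(cfg: dict) -> set[str]:
    """Return the gate names the runner should pause at."""
    gates = cfg.get("inspection_gates", {})
    if cfg.get("strict_mode", False):
        enabled = set(gates)  # strict mode turns every configured gate on
    else:
        enabled = {name for name, on in gates.items() if on}
    enabled.add("t1_plan")    # assumed: the T1 plan gate cannot be disabled
    return enabled


print(sorted(enabled_gates(visibility)))
print(sorted(enabled_gates({**visibility, "strict_mode": True})))
```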

### 3. Inspection CLI — `cli/agency.py`

```
agency run <config.yaml>                   # start a run, returns run_id
agency watch <run_id>                      # tail live log (follows blackboard events)
agency inspect <run_id>                    # interactive tree view of run state
agency inspect <run_id> --tier t2          # jump to T2 artifacts
agency inspect <run_id> --brief <id>       # show full brief + result JSON

agency approve <run_id>                    # approve current gate → continue
agency approve <run_id> --note "..."       # approve with a note written to blackboard
agency reject <run_id> --reason "..."      # reject → tier re-invoked
agency pause <run_id>                      # force-pause at next tier boundary
agency resume <run_id>                     # release a manual pause
```

`agency inspect` (no flags) renders a live tree:
```
Run abc123 — "Build webhook ingestion system"
├── T1 Plan ✓
│   └── [view workplan]
├── T2 Architecture ✓  [GATE: pending review]
│   ├── [view domain boundaries]
│   ├── [view shared assumptions]
│   └── [view canonical architecture]
├── T3 backend-api (active)
│   ├── [view task breakdown]
│   └── T4 workers: 3/7 done, 1 retrying, 3 pending
└── T3 infra (pending)
```

### Blackboard Event Vocabulary (extended)

```python
# existing
"spawned" | "completed" | "failed" | "escalated" | "retried"

# new — visibility layer
"gate_pending"    # runner hit a gate, waiting for human
"gate_approved"   # human approved, run continues
"gate_rejected"   # human rejected, tier re-invoked
"gate_paused"     # manual pause via CLI
"gate_resumed"    # manual resume via CLI
"path_amendment"  # mid-run tier proposed path change
"log"             # human-readable log line (level + message)
```

---

## Decisions Log

**T1 dynamic dispatch** — T1 assesses scope and prescribes tier path and workstream parallelism. It does not prescribe internal tier coordination patterns.

**T1 two-phase lifecycle** — T1 has two explicit named phases: Plan and Accept. Plan phase includes self-critique (single pass) then human approval gate before T2s spawn. Accept phase validates final output against goal anchor. Both phases tracked on blackboard with distinct prompts.

**T1 self-critique** — Single pass only. Diminishing returns on multiple self-critique iterations; the human review after is the real safety net. Self-critique catches obvious gaps; Andrew catches strategic ones.

**Distributed ownership** — Each tier owns the layer below it. Runner is thin. Tradeoff: distributed control makes the system extensible but debugging requires good blackboard tooling, not central runner traces.

**T5 always mandatory** — No skipping verification. Things should work and work well before surfacing to T1.

**T3 owns T4 and T5** — T3 manages its T4s (dependency graph, swarm vs pipeline, parallelism) and commissions T5 verification of T4 outputs. Runner does not orchestrate T4/T5 centrally.

**T2 Lead Architect** — Dedicated T2 role, not a new tier. Spawned first by T1. Owns: domain boundary definition, shared assumptions doc, conflict resolution between specialists, canonical architecture synthesis. Specialists spawn after Lead publishes boundaries + assumptions. Each T2 specialist owns its own T3s — no T3 spans T2 domains.

**T2 conflict resolution** — Lead sends targeted briefs back to conflicting specialists. Cycle limit is a fixed config value (not per-workstream). This parallels T1's single-pass self-critique: the limit is fixed, not variable.

**T2 shared assumptions** — Lead publishes cross-cutting concerns (auth, data formats, API conventions, etc.) before specialists start. Specialists design with shared baseline; implicit dependencies pre-empted rather than caught in synthesis.

**Orchestration patterns** — Baked into tier prompts and runner tier-handling logic, not prescribed by T1. T2: Lead + parallel specialists. T3: light mesh within T2 domain. T4: swarm+pipeline. T5: fan-out+consensus.

**Output / review** — Nothing merges to main without explicit human approval. T1 opens a PR and fires `notify_adapter.send()` with the PR summary. Merge is gated on human sign-off. The notify adapter implementation determines how the notification is delivered.

**Platform agnosticism** — Core is provider and platform agnostic. Capability levels (`reasoning-heavy`, `capable`, `fast-cheap`) map to models in config. Mixing providers across tiers is supported.

**LLM provider** — Anthropic first implementation. Config supports per-tier provider selection.

**Gateway modification** — Decided against. Agent-teams stays standalone Python. OpenClaw used via runtime adapter only.

**Coding agent runtime** — Claude Code is default T4/T5 runtime. Opt-in `native_teams` flag available for internal Claude Code parallelism — faster but less blackboard visibility. Default `false`.

**Agency-agents integration** — Via git submodule at `agents/`. T1 selects specialists via `config/role_registry.yaml`. `agent_personality` field on task brief; runtime injects as system prompt at spawn time.

**Spawn call ownership** — Runner is the single point of contact with the runtime adapter. Tiers write `status=pending` child briefs to the blackboard; runner's spawn loop detects and spawns them. Gate logic (hold on `gate_pending`) lives in the spawn loop — no gate plumbing needed in agents. Agents only need blackboard read/write access.
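
A single pass of such a spawn loop might look like the sketch below. The table layouts, the `spawned` status value, and the function names are assumptions. The `spawn` callable stands in for the runtime adapter call, and pause, resume, and rejection handling are left out for brevity.

```python
import sqlite3
from typing import Callable

# Assumed minimal tables; the real blackboard schema may differ.
SCHEMA = """
CREATE TABLE briefs (id TEXT PRIMARY KEY, run_id TEXT, status TEXT);
CREATE TABLE events (id INTEGER PRIMARY KEY, run_id TEXT, type TEXT);
"""


def gate_open(db: sqlite3.Connection, run_id: str) -> bool:
    """A run is held while its most recent gate event is `gate_pending`."""
    row = db.execute(
        "SELECT type FROM events WHERE run_id = ? "
        "AND type IN ('gate_pending', 'gate_approved', 'gate_rejected') "
        "ORDER BY id DESC LIMIT 1", (run_id,)).fetchone()
    return row is None or row[0] != "gate_pending"


def spawn_pending(db: sqlite3.Connection, run_id: str,
                  spawn: Callable[[str], None]) -> list[str]:
    """One pass: spawn every pending brief unless a gate holds the run."""
    if not gate_open(db, run_id):
        return []
    ids = [r[0] for r in db.execute(
        "SELECT id FROM briefs WHERE run_id = ? AND status = 'pending' "
        "ORDER BY id", (run_id,))]
    for brief_id in ids:
        spawn(brief_id)
        db.execute("UPDATE briefs SET status = 'spawned' WHERE id = ?", (brief_id,))
    db.commit()
    return ids


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute("INSERT INTO briefs VALUES ('t2-lead', 'abc123', 'pending')")
db.execute("INSERT INTO events (run_id, type) VALUES ('abc123', 'gate_pending')")
held = spawn_pending(db, "abc123", print)      # gate holds the run
db.execute("INSERT INTO events (run_id, type) VALUES ('abc123', 'gate_approved')")
spawned = spawn_pending(db, "abc123", print)   # gate released
print(held, spawned)
```

Agents in this arrangement never call the runtime themselves; writing a pending row is the whole of their spawn interface.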

**Gate approval UX** — `agency approve <run_id>` CLI is the universal approval path — writes `gate_approved` directly to blackboard. Runner only polls blackboard; it does not depend on any specific notification platform. Each notify adapter handles its own inbound response bridge as internal adapter state. Core has no `pending_gates.json` or platform-specific approval logic.

**T3 mesh timeout** — Escalate to T2 (the specialist that owns the domain). Timeout means T3s can't agree on task boundaries — a domain boundary problem T2 should fix by re-scoping. If T2 exhausts its retry budget, normal escalation ladder handles it (T1 → Andrew gate). No force-commit fallback.

**T3 mesh mechanics** — Blackboard-based coordination. T3s write draft task lists, read peers', reconcile overlaps, commit merged plan. No T4 dispatch until all T3s in the domain have committed. Runner enforces timeout (`t3_mesh_timeout_minutes` in config). Chosen over designated T3 lead or direct messaging — fits distributed ownership model, gives full audit trail for free.

**T1 output schema** — Formal JSON schema defined (2026-03-30). Fields: `run_id`, `goal_anchor`, `complexity`, `retry_budget_multiplier`, `workstreams[]` (id, name, domain, tier_path, parallel_group, t2_specialist, notes), `parallelism` (groups + sequence), `self_critique_summary`. `parallel_group` + `sequence` handles inter-workstream dependencies.
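
To make the field list concrete, here is an example plan together with a shape check. The field names come from the entry above; every value, the value types, and the `missing_fields` helper are invented for illustration and do not reproduce the formal schema.

```python
REQUIRED_FIELDS = {
    "run_id", "goal_anchor", "complexity", "retry_budget_multiplier",
    "workstreams", "parallelism", "self_critique_summary",
}
WORKSTREAM_FIELDS = {
    "id", "name", "domain", "tier_path", "parallel_group", "t2_specialist", "notes",
}


def missing_fields(plan: dict) -> list[str]:
    """Return names of missing fields; an empty list means the shape is complete."""
    missing = sorted(REQUIRED_FIELDS - set(plan))
    for i, ws in enumerate(plan.get("workstreams", [])):
        missing += [f"workstreams[{i}].{name}"
                    for name in sorted(WORKSTREAM_FIELDS - set(ws))]
    return missing


# All values below are invented for illustration.
plan = {
    "run_id": "abc123",
    "goal_anchor": "Build webhook ingestion system",
    "complexity": "complex",
    "retry_budget_multiplier": 2,
    "workstreams": [
        {"id": "ws1", "name": "backend-api", "domain": "backend",
         "tier_path": ["T2", "T3", "T4", "T5"], "parallel_group": "g1",
         "t2_specialist": "software-architect", "notes": ""},
    ],
    "parallelism": {"groups": {"g1": ["ws1"]}, "sequence": ["g1"]},
    "self_critique_summary": "No gaps found.",
}
print(missing_fields(plan))
```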

**T5 consensus** — T3 aggregates all T5 results into joint verdict: `pass | partial | fail`. Split verdict (`partial`) → T3 retries only failed slices, re-runs T5 on those slices. Full `fail` escalates to T2. T3 writes structured joint verdict to blackboard; this is what the optional T5 gate surfaces to Andrew.
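
The aggregation could be as small as the function below. Treating "every slice failed" as `fail` and any mix as `partial` is an assumption, since the source does not state where that boundary lies. The input shape (slice name mapped to pass or fail) is likewise hypothetical.

```python
def joint_verdict(results: dict[str, bool]) -> tuple[str, list[str]]:
    """Aggregate per-slice T5 results into (verdict, slices_to_retry)."""
    failed = sorted(name for name, passed in results.items() if not passed)
    if not failed:
        return "pass", []
    if len(failed) == len(results):
        return "fail", failed   # full fail: escalate to T2 rather than retry
    return "partial", failed    # retry and re-verify only these slices


print(joint_verdict({"auth-middleware": True, "queue-client": False}))
```

Returning the failed slice names alongside the verdict gives T3 exactly the list it needs to re-run on a `partial` result.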

**Path amendment mechanism** — Amending tier writes `path_amendment` event to blackboard (structured JSON: proposed_by, reason, amendment). Runner monitors events table, sends system event notification to relevant higher tier. Higher tier is informed, not blocked. No agent callback plumbing. Amendments surface at next scheduled human gate.
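
Writing such an event might look like the sketch below. The three JSON keys come from the entry above; the `events` table layout, the function name, and the contents of the `amendment` object are assumptions.

```python
import json
import sqlite3


def write_path_amendment(db: sqlite3.Connection, run_id: str, proposed_by: str,
                         reason: str, amendment: dict) -> int:
    """Append a `path_amendment` event; the writer does not wait for a reply."""
    payload = json.dumps(
        {"proposed_by": proposed_by, "reason": reason, "amendment": amendment})
    cur = db.execute(
        "INSERT INTO events (run_id, type, payload) VALUES (?, 'path_amendment', ?)",
        (run_id, payload))
    db.commit()
    return cur.lastrowid


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events "
           "(id INTEGER PRIMARY KEY, run_id TEXT, type TEXT, payload TEXT)")
event_id = write_path_amendment(
    db, "abc123",
    proposed_by="T3:backend-api",
    reason="Queue client needs its own workstream",
    amendment={"add_workstream": "queue"},  # amendment body shape is invented
)
stored = json.loads(
    db.execute("SELECT payload FROM events WHERE id = ?", (event_id,)).fetchone()[0])
print(stored["proposed_by"])
```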

**Failure handling (distributed)** — Confirmed distributed ownership (2026-03-30). `escalation.py` is logic tiers execute (or runner executes on tier's behalf on timeout/crash), not a central runner concern. Runner only owns: T1 failure, terminal human escalation. See updated Failure Handling table.

**Run visibility layer** — Added 2026-03-30. Human-readable live log, configurable inspection gates, and `cli/agency.py` inspection/control commands. Designed for debugging and quality evaluation at each tier during early runs. `strict_mode: true` enables all gates. Gates surface tier artifacts + "what happens next" summary via `notify_adapter.send()` — platform-agnostic. Resolves Q3 (T5 consensus surfaces as gate event with human-readable summary). T5 gate (optional) lets the operator review joint verdict before T3 marks workstream done.

@@ -10,6 +10,9 @@ pyyaml
# Environment variable management
python-dotenv

# GitHub VCS adapter
PyGithub

# --- stdlib-only (no pip install needed) ---
# sqlite3 — blackboard persistence
# dataclasses — task_brief schema