Compare commits: `71316b3090...main` (15 commits)

| SHA1 |
|---|
| 342832fa5e |
| 641f122cdb |
| 54afa0f53f |
| f228061c4d |
| 1c99e40f98 |
| 8f143e779d |
| a721db63f6 |
| 882b769d21 |
| ce3c020de2 |
| b54436f474 |
| 1ed7023c08 |
| 9efbb3b010 |
| 72bd744664 |
| 084cfb0bb2 |
| ce1ce85b87 |
.gitmodules (vendored, 2 lines changed)

```diff
@@ -1,3 +1,3 @@
 [submodule "agents"]
 	path = agents
-	url = https://github.com/coding-with-hans-heinemann/agency-agents.git
+	url = https://git.tandrewng.com/cw-hans/agency-agents.git
```
CLAUDE.md (normal file, 48 lines changed)

````diff
@@ -0,0 +1,48 @@
+# CLAUDE.md — Agent Quick Reference
+
+Read this before exploring the codebase. It saves tokens.
+
+## What This Is
+
+A tiered multi-agent orchestration framework. T1 decomposes goals → T2 architects → T3 leads → T4 implements → T5 verifies. SQLite blackboard tracks state. All external dependencies (LLM, VCS, notify, runtime) are pluggable adapters.
+
+## Key Docs
+
+- `docs/design.md` — architecture decisions, tier design, key choices
+- `docs/buildspec.md` — 15-step build order, phase breakdown
+
+## Project Layout
+
+```
+core/           — task_brief.py, blackboard.py, escalation.py, team_runner.py
+adapters/base/  — abstract base classes (LLMAdapter, VCSAdapter, NotifyAdapter, RuntimeAdapter)
+adapters/llm/   — anthropic.py
+adapters/vcs/   — github.py
+adapters/notify/— openclaw.py
+adapters/runtime— openclaw.py, claude_code.py
+prompts/        — T1–T5 system prompt .md files
+config/         — team.yaml (run config), role_registry.yaml (tier→role→persona)
+agents/         — git submodule, agent persona .md files
+runs/           — per-run blackboard.db files (gitignored)
+```
+
+## Conventions
+
+- **Never commit or push directly to `main`** — always branch (`hans/...` or `feature/...`) and PR
+- New adapters: subclass the relevant `adapters/base/*.py` abstract class
+- New roles: add persona `.md` to `agents/` submodule + entry in `config/role_registry.yaml`
+- Failure handling lives in `core/escalation.py` — extend `FailureType` there
+- `TaskBrief` is the canonical work unit — all tiers pass briefs to each other
+- Blackboard is the single source of truth per run — always write events there
+
+## Current State
+
+Phase 2 adapter implementations exist. `core/team_runner.py` may still have stubs — check before assuming it's wired up end-to-end.
+
+## Running
+
+```bash
+python -m venv .venv && source .venv/bin/activate
+pip install -r requirements.txt
+python -m core.team_runner --config config/team.yaml
+```
````
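CLAUDE.md's conventions say that a new adapter subclasses the matching abstract class in `adapters/base/`. Those base classes are not part of this compare. The sketch below therefore defines a hypothetical `LLMAdapter` stand-in. Its single abstract method, `complete(prompt, capability, context)`, is modeled on how the old `team_runner.py` calls `self._llm.complete(...)`, and its constructor on how that file builds adapters as `cls(self._config)`. The real base class may declare different or additional methods.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class LLMAdapter(ABC):
    """Hypothetical stand-in for the abstract class in adapters/base/llm.py."""

    def __init__(self, config: dict) -> None:
        self._config = config

    @abstractmethod
    def complete(
        self,
        prompt: str,
        capability: str,
        context: Optional[dict[str, Any]] = None,
    ) -> str:
        """Return the model's text reply for *prompt*."""


class EchoLLMAdapter(LLMAdapter):
    """Hypothetical offline adapter that echoes its input (handy for tests)."""

    def complete(
        self,
        prompt: str,
        capability: str,
        context: Optional[dict[str, Any]] = None,
    ) -> str:
        # Prefix the capability hint and any system prompt so callers can inspect routing.
        system = (context or {}).get("system_prompt", "")
        return f"[{capability}] {system}|{prompt}"


adapter = EchoLLMAdapter(config={})
print(adapter.complete("hello", capability="fast-cheap", context={"system_prompt": "sys"}))
```

Because `complete` is declared abstract, the base class itself cannot be instantiated. Any subclass that forgets to implement it fails at construction time rather than at first call.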
```diff
@@ -10,8 +10,6 @@ from __future__ import annotations
 
 import os
 
-import anthropic
-
 from adapters.base.llm import LLMAdapter
 
 
@@ -57,6 +55,14 @@ class AnthropicAdapter(LLMAdapter):
         ValueError
             If ANTHROPIC_API_KEY is not set in the environment.
         """
+        try:
+            import anthropic as _anthropic
+        except ModuleNotFoundError as exc:
+            raise ImportError(
+                "The 'anthropic' package is required for AnthropicAdapter. "
+                "Install it with: pip install anthropic"
+            ) from exc
+
         self._config = config
         api_key = os.environ.get("ANTHROPIC_API_KEY")
         if not api_key:
@@ -64,7 +70,7 @@ class AnthropicAdapter(LLMAdapter):
                 "ANTHROPIC_API_KEY environment variable is not set. "
                 "Export it before running the-agency."
             )
-        self._client = anthropic.Anthropic(api_key=api_key)
+        self._client = _anthropic.Anthropic(api_key=api_key)
         self._models_cfg: dict = config.get("models", {})
         self._default_max_tokens: int = self._models_cfg.get("default_max_tokens", 4096)
         self._default_temperature: float = self._models_cfg.get("default_temperature", 0)
```
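The `AnthropicAdapter` change moves `import anthropic` from module level into `__init__`. As a result, importing the adapter module no longer fails when the SDK is absent. The error surfaces only when the adapter is constructed, and it carries an install hint.

Below is a minimal sketch of the same deferred-import pattern. `require` and `OptionalDepAdapter` are hypothetical names. The sketch uses `importlib.import_module` instead of an `import` statement so that the module name can be passed in. A deliberately nonexistent module stands in for a missing SDK.

```python
import importlib
from types import ModuleType


def require(module_name: str, pip_name: str, owner: str) -> ModuleType:
    """Import *module_name* on demand; re-raise with an install hint if absent."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        raise ImportError(
            f"The {pip_name!r} package is required for {owner}. "
            f"Install it with: pip install {pip_name}"
        ) from exc


class OptionalDepAdapter:
    """Hypothetical adapter whose dependency is imported only at construction."""

    def __init__(self, module_name: str = "json") -> None:
        self._sdk = require(module_name, module_name, type(self).__name__)


ok = OptionalDepAdapter("json")  # stdlib module: the import succeeds
try:
    OptionalDepAdapter("not_a_real_sdk_xyz")  # missing module: ImportError with a hint
except ImportError as err:
    print(err)
```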
agents (2 lines changed)

Submodule agents updated: 5c669c28e6...5f1204a023
```diff
@@ -2,28 +2,40 @@ t1:
   default: agents/strategy/nexus-strategy.md
 
 t2:
-  backend: agents/engineering/engineering-software-architect.md
-  frontend: agents/engineering/engineering-software-architect.md
+  backend: agents/engineering/engineering-backend-architect.md
+  frontend: agents/engineering/engineering-frontend-architect.md
   infra: agents/engineering/engineering-devops-automator.md
   data: agents/engineering/engineering-data-engineer.md
+  ai: agents/engineering/engineering-software-architect.md
+  security: agents/engineering/engineering-security-engineer.md
+  mobile: agents/engineering/engineering-software-architect.md
   default: agents/engineering/engineering-software-architect.md
 
 t3:
-  backend: agents/engineering/engineering-senior-developer.md
-  frontend: agents/engineering/engineering-senior-developer.md
+  backend: agents/engineering/engineering-senior-backend-developer.md
+  frontend: agents/engineering/engineering-senior-frontend-developer.md
   infra: agents/engineering/engineering-sre.md
-  default: agents/engineering/engineering-senior-developer.md
+  data: agents/engineering/engineering-data-engineer.md
+  ai: agents/engineering/engineering-ai-engineer.md
+  security: agents/engineering/engineering-security-engineer.md
+  mobile: agents/engineering/engineering-mobile-app-builder.md
+  database: agents/engineering/engineering-database-optimizer.md
+  devops: agents/engineering/engineering-sre.md
+  docs: agents/engineering/engineering-technical-writer.md
+  default: agents/engineering/engineering-backend-developer.md
 
 t4:
   frontend: agents/engineering/engineering-frontend-developer.md
-  backend: agents/engineering/engineering-backend-architect.md
+  backend: agents/engineering/engineering-backend-developer.md
   database: agents/engineering/engineering-database-optimizer.md
   devops: agents/engineering/engineering-devops-automator.md
   mobile: agents/engineering/engineering-mobile-app-builder.md
   ai: agents/engineering/engineering-ai-engineer.md
   security: agents/engineering/engineering-security-engineer.md
   docs: agents/engineering/engineering-technical-writer.md
-  default: agents/engineering/engineering-senior-developer.md
+  data: agents/engineering/engineering-data-engineer.md
+  embedded: agents/engineering/engineering-embedded-firmware-engineer.md
+  default: agents/engineering/engineering-backend-developer.md
 
 t5:
   code: agents/engineering/engineering-code-reviewer.md
@@ -31,4 +43,8 @@ t5:
   api: agents/testing/testing-api-tester.md
   performance: agents/testing/testing-performance-benchmarker.md
   security: agents/engineering/engineering-security-engineer.md
+  accessibility: agents/testing/testing-accessibility-auditor.md
+  e2e: agents/testing/testing-evidence-collector.md
+  frontend: agents/testing/testing-accessibility-auditor.md
+  data: agents/testing/testing-reality-checker.md
   default: agents/engineering/engineering-code-reviewer.md
```
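The old `TeamRunner._resolve_personality` in this compare looks up the `t{tier}` map, then the role, and falls back to the tier's `default` entry. Below is a minimal sketch of that lookup against an excerpt of the new `t3` mapping, hand-built as the dict `yaml.safe_load` would produce. Unlike the original, it skips the `os.path.isfile` check so that it runs without the `agents/` submodule checked out.

```python
from typing import Optional

# Excerpt of the new t3 mapping above, as a parsed dict.
ROLE_REGISTRY: dict[str, dict[str, str]] = {
    "t3": {
        "backend": "agents/engineering/engineering-senior-backend-developer.md",
        "frontend": "agents/engineering/engineering-senior-frontend-developer.md",
        "devops": "agents/engineering/engineering-sre.md",
        "default": "agents/engineering/engineering-backend-developer.md",
    },
}


def resolve_personality(registry: dict, tier: int, role: str) -> Optional[str]:
    """Return the persona path for (tier, role), falling back to the tier default."""
    tier_map: dict = registry.get(f"t{tier}", {})
    return tier_map.get(role) or tier_map.get("default")


print(resolve_personality(ROLE_REGISTRY, 3, "backend"))
print(resolve_personality(ROLE_REGISTRY, 3, "quantum"))  # unknown role -> tier default
print(resolve_personality(ROLE_REGISTRY, 9, "backend"))  # unknown tier -> None
```

Unlisted roles fall through to `default`. The new `t3`/`t4` `default` switching to `engineering-backend-developer.md` therefore changes the persona for every role the registry does not name explicitly.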
```diff
@@ -10,8 +10,7 @@ adapters:
   runtime: openclaw
 
 models:
-  default_max_tokens: 4096
-  default_temperature: 0
+  provider: anthropic
   capability_map:
     reasoning-heavy:
       anthropic: claude-opus-4-6
```
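The new `models.provider` key pairs with `capability_map`, which maps a capability hint to a model per provider. This compare does not show how the adapter consumes these keys. The sketch below assumes the lookup is `capability_map[capability][provider]`, and `resolve_model` is a hypothetical helper. Only the `reasoning-heavy` entry visible in the hunk is used, hand-built as a dict so the example needs no YAML parser.

```python
from typing import Any


def resolve_model(models_cfg: dict[str, Any], capability: str) -> str:
    """Hypothetical helper: pick the model for *capability* under models.provider."""
    provider = models_cfg.get("provider", "anthropic")
    cap_map: dict = models_cfg.get("capability_map", {})
    try:
        return cap_map[capability][provider]
    except KeyError:
        raise KeyError(
            f"No model configured for capability={capability!r} provider={provider!r}"
        ) from None


# The `models:` section from the new team.yaml hunk, as yaml.safe_load would return it.
MODELS_CFG = {
    "provider": "anthropic",
    "capability_map": {
        "reasoning-heavy": {"anthropic": "claude-opus-4-6"},
    },
}

print(resolve_model(MODELS_CFG, "reasoning-heavy"))
```

Failing loudly on a missing `(capability, provider)` pair surfaces config gaps at dispatch time, instead of silently falling back to an arbitrary model.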
```diff
@@ -1,784 +1,99 @@
 """
 core/team_runner.py
-Top-level orchestration entry point for the-agency pipeline.
+Top-level orchestration entry point — Phase 2 stub.
 
-The TeamRunner loads team.yaml, builds the adapter registry, and drives the
-full T1 → T2 → T3 → T4 → T5 dispatch loop with escalation handling.
+The TeamRunner is responsible for:
+    1. Loading config/team.yaml and config/role_registry.yaml.
+    2. Instantiating the correct adapter implementations (LLM, VCS, notify, runtime).
+    3. Creating a Blackboard for the run.
+    4. Constructing the root T1 TaskBrief and dispatching it to the T1 Visionary.
+    5. Recursively spawning T2→T5 briefs based on tier outputs.
+    6. Using EscalationHandler to manage retries, salvage, and escalation.
+    7. Writing final run status and summary to the Blackboard.
 
-CLI usage::
-
-    python -m core.team_runner --config config/team.yaml [--dry-run] [--verbose]
+TODO (Phase 2):
+    - Load and validate team.yaml configuration.
+    - Build adapter registry (map adapter keys → concrete adapter classes).
+    - Implement tier dispatch loop: T1 → T2 (per workstream) → T3 → T4 → T5.
+    - Parse tier JSON outputs into child TaskBrief objects via make_child_brief().
+    - Integrate EscalationHandler into the dispatch loop.
+    - Support --dry-run flag (log actions without executing).
+    - Emit blackboard events at each stage (spawned, completed, failed, etc.).
+    - Expose a CLI entry point (argparse or click).
 """
 from __future__ import annotations
 
-import argparse
-import json
-import logging
-import os
-import re
-import uuid
-from typing import Optional
+# TODO (Phase 2): Uncomment and implement imports as adapters are built.
+# import argparse
+# import yaml
+# from core.task_brief import TaskBrief
+# from core.blackboard import Blackboard
+# from core.escalation import EscalationHandler
+# from adapters.llm.anthropic import AnthropicAdapter
+# from adapters.vcs.github import GitHubAdapter
+# from adapters.notify.openclaw import OpenClawNotifyAdapter
+# from adapters.runtime.openclaw import OpenClawRuntimeAdapter
+# from adapters.runtime.claude_code import ClaudeCodeRuntimeAdapter
 
-import yaml
-
-from core.blackboard import Blackboard
-from core.escalation import EscalationHandler
-from core.task_brief import TaskBrief
-
-import importlib
-
-from adapters.base.llm import LLMAdapter
-from adapters.base.notify import NotifyAdapter
-from adapters.base.runtime import RuntimeAdapter
-from adapters.base.vcs import VCSAdapter
-
-logger = logging.getLogger(__name__)
-
-# ---------------------------------------------------------------------------
-# Constants
-# ---------------------------------------------------------------------------
-
-# Maps tier number → prompt file path (relative to project root).
-_TIER_PROMPTS: dict[int, str] = {
-    1: "prompts/t1_visionary.md",
-    2: "prompts/t2_architect.md",
-    3: "prompts/t3_squad_lead.md",
-    4: "prompts/t4_implementer.md",
-    5: "prompts/t5_verifier.md",
-}
-
-# Maps tier number → LLM capability hint.
-_TIER_CAPABILITIES: dict[int, str] = {
-    1: "reasoning-heavy",
-    2: "reasoning-heavy",
-    3: "capable",
-    4: "capable",
-    5: "fast-cheap",
-}
-
-# ---------------------------------------------------------------------------
-# Adapter registries
-#
-# Values are "module.path:ClassName" strings resolved lazily via importlib.
-# To add a new adapter, append an entry here — no changes to TeamRunner needed.
-# team.yaml may also supply a full "module.path:ClassName" value directly,
-# enabling third-party adapters without touching this file.
-# ---------------------------------------------------------------------------
-
-_LLM_ADAPTERS: dict[str, str] = {
-    "anthropic": "adapters.llm.anthropic:AnthropicAdapter",
-}
-_VCS_ADAPTERS: dict[str, str] = {
-    "github": "adapters.vcs.github:GitHubAdapter",
-}
-_NOTIFY_ADAPTERS: dict[str, str] = {
-    "openclaw": "adapters.notify.openclaw:OpenClawNotifyAdapter",
-}
-_RUNTIME_ADAPTERS: dict[str, str] = {
-    "openclaw": "adapters.runtime.openclaw:OpenClawRuntimeAdapter",
-    "claude_code": "adapters.runtime.claude_code:ClaudeCodeRuntimeAdapter",
-}
-
-
-def _load_adapter_class(key: str, registry: dict[str, str], label: str) -> type:
-    """
-    Resolve a short name or dotted "module:ClassName" path to an adapter class.
-
-    Resolution order:
-      1. If *key* is in *registry*, use the mapped dotted path.
-      2. Otherwise, treat *key* itself as a dotted path (custom / third-party).
-    """
-    dotted = registry.get(key, key)
-    if ":" not in dotted:
-        raise ValueError(
-            f"Unknown {label} adapter {key!r}. "
-            f"Built-in choices: {list(registry)}. "
-            f"Or supply a full 'module.path:ClassName' value in team.yaml."
-        )
-    module_path, class_name = dotted.rsplit(":", 1)
-    try:
-        module = importlib.import_module(module_path)
-    except ModuleNotFoundError as exc:
-        raise ImportError(
-            f"Cannot import {label} adapter module {module_path!r}: {exc}"
-        ) from exc
-    try:
-        return getattr(module, class_name)
-    except AttributeError:
-        raise ImportError(
-            f"Module {module_path!r} has no class {class_name!r}"
-        )
-
-
-# ---------------------------------------------------------------------------
-# Exceptions
-# ---------------------------------------------------------------------------
-
-class EscalationError(RuntimeError):
-    """Raised when a brief escalates past its retry budget with no recovery."""
-
-
-# ---------------------------------------------------------------------------
-# TeamRunner
-# ---------------------------------------------------------------------------
 
```
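The removed `_load_adapter_class` resolves either a registry short name or a literal `"module.path:ClassName"` string through `importlib`. The self-contained sketch below follows the same two-step resolution order. It is exercised against standard-library classes (`collections:OrderedDict`, `json:JSONDecoder`) rather than the project's adapters so that it runs anywhere. The `REGISTRY` contents are illustrative only.

```python
import importlib


def load_class(key: str, registry: dict[str, str], label: str) -> type:
    """Resolve a registry short name, or a literal "module:ClassName" path, to a class."""
    dotted = registry.get(key, key)  # 1. registry hit, else 2. treat key itself as a path
    if ":" not in dotted:
        raise ValueError(f"Unknown {label} adapter {key!r}; built-ins: {list(registry)}")
    module_path, class_name = dotted.rsplit(":", 1)
    try:
        module = importlib.import_module(module_path)
    except ModuleNotFoundError as exc:
        raise ImportError(f"Cannot import {label} module {module_path!r}") from exc
    try:
        return getattr(module, class_name)
    except AttributeError:
        raise ImportError(f"Module {module_path!r} has no class {class_name!r}") from None


# Illustrative registry mapping a short name to a stdlib class.
REGISTRY = {"ordered": "collections:OrderedDict"}

print(load_class("ordered", REGISTRY, "demo"))           # short name via the registry
print(load_class("json:JSONDecoder", REGISTRY, "demo"))  # full path, no registry entry
```

The original raises `ImportError` from the `AttributeError` branch without `from`. This sketch adds `from None` to hide the chained traceback, which is purely a stylistic choice.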
````diff
 class TeamRunner:
     """
     Orchestrates a full T1→T5 agent pipeline run.
 
-    Usage::
+    Usage (Phase 2)::
 
         runner = TeamRunner(config_path="config/team.yaml")
         runner.run()
-
-    Dry-run mode logs all planned actions but skips LLM calls, VCS commits,
-    and notifications::
-
-        runner = TeamRunner(config_path="config/team.yaml", dry_run=True)
-        runner.run()
     """
 
-    def __init__(
-        self,
-        config_path: str = "config/team.yaml",
-        dry_run: bool = False,
-    ) -> None:
+    def __init__(self, config_path: str = "config/team.yaml") -> None:
+        # TODO (Phase 2): Load YAML config.
+        # Instantiate adapters based on config.adapters keys.
+        # Create a Blackboard for this run.
+        raise NotImplementedError("TeamRunner.__init__ is not yet implemented.")
-        """
-        Load configuration and instantiate adapters.
-
-        Parameters
-        ----------
-        config_path : Path to team.yaml.
-        dry_run : When True, skip LLM calls, VCS commits, and notifications.
-                  All planned actions are logged at INFO level.
-        """
-        self._dry_run = dry_run
-
-        self._config = self._load_yaml(config_path)
-        self._role_registry = self._load_yaml("config/role_registry.yaml")
-        self._escalation = EscalationHandler()
-
-        run_id = str(uuid.uuid4())
-        self._bb = Blackboard(run_id=run_id)
-
-        # Build adapters — VCS and notify are optional and swallow init errors.
-        adapter_cfg: dict = self._config.get("adapters", {})
-        runtime_cfg: dict = self._config.get("runtime", {})
-
-        self._llm: LLMAdapter = self._build_llm(adapter_cfg.get("llm", "anthropic"))
-        self._vcs: Optional[VCSAdapter] = self._build_optional(  # type: ignore[assignment]
-            _VCS_ADAPTERS, adapter_cfg.get("vcs"), "VCS"
-        )
-        self._notify: Optional[NotifyAdapter] = self._build_optional(  # type: ignore[assignment]
-            _NOTIFY_ADAPTERS, adapter_cfg.get("notify"), "notify"
-        )
-        self._default_runtime: RuntimeAdapter = self._build_runtime(
-            runtime_cfg.get("default", "openclaw")
-        )
-        self._coding_runtime: RuntimeAdapter = self._build_runtime(
-            runtime_cfg.get("coding_agent", "claude_code")
-        )
-
-        logger.info(
-            "TeamRunner initialised: run_id=%s dry_run=%s", run_id, dry_run
-        )
-
-    # ------------------------------------------------------------------
-    # Configuration helpers
-    # ------------------------------------------------------------------
-
-    @staticmethod
-    def _load_yaml(path: str) -> dict:
-        with open(path, "r", encoding="utf-8") as fh:
-            return yaml.safe_load(fh) or {}
-
-    @staticmethod
-    def _load_text(path: str) -> str:
-        with open(path, "r", encoding="utf-8") as fh:
-            return fh.read()
-
-    def _build_llm(self, key: str) -> LLMAdapter:
-        cls = _load_adapter_class(key, _LLM_ADAPTERS, "LLM")
-        return cls(self._config)
-
-    def _build_optional(
-        self,
-        registry: dict[str, str],
-        key: Optional[str],
-        label: str,
-    ) -> Optional[object]:
-        """Build an optional adapter, returning None on any init error."""
-        if not key:
-            return None
-        try:
-            cls = _load_adapter_class(key, registry, label)
-            return cls(self._config)
-        except (ImportError, ValueError) as exc:
-            logger.warning("Unknown %s adapter %r — skipping. (%s)", label, key, exc)
-            return None
-        except Exception as exc:
-            logger.warning(
-                "%s adapter %r could not be initialised (%s) — skipping.",
-                label,
-                key,
-                exc,
-            )
-            return None
-
-    def _build_runtime(self, key: str) -> RuntimeAdapter:
-        cls = _load_adapter_class(key, _RUNTIME_ADAPTERS, "runtime")
-        return cls(self._config)
-
-    # ------------------------------------------------------------------
-    # Role registry
-    # ------------------------------------------------------------------
-
-    def _resolve_personality(self, tier: int, role: str) -> Optional[str]:
-        """Return the path to the agent persona .md file, or None."""
-        tier_key = f"t{tier}"
-        tier_map: dict = self._role_registry.get(tier_key, {})
-        path = tier_map.get(role) or tier_map.get("default")
-        if path and os.path.isfile(path):
-            return path
-        return None
-
-    # ------------------------------------------------------------------
-    # Prompt helpers
-    # ------------------------------------------------------------------
-
-    def _load_tier_prompt(self, tier: int) -> str:
-        """Load the system prompt for a tier from the prompts/ directory."""
-        path = _TIER_PROMPTS.get(tier, "")
-        if path and os.path.isfile(path):
-            return self._load_text(path)
-        logger.warning("Tier %d prompt not found at %r", tier, path)
-        return ""
-
-    def _load_personality(self, path: Optional[str]) -> str:
-        if path and os.path.isfile(path):
-            return self._load_text(path)
-        return ""
-
-    @staticmethod
-    def _extract_json(text: str) -> dict:
-        """
-        Extract a JSON object from a potentially markdown-wrapped LLM response.
-
-        Strips leading/trailing markdown fences (```json ... ```) then parses.
-        Falls back to a regex scan for the first ``{...}`` block if plain
-        parsing fails.
-        """
-        text = text.strip()
-        # Strip markdown fences.
-        if text.startswith("```"):
-            text = re.sub(r"^```[a-z]*\n?", "", text)
-            text = re.sub(r"\n?```\s*$", "", text.strip())
-
-        try:
-            return json.loads(text)
-        except json.JSONDecodeError:
-            m = re.search(r"\{.*\}", text, re.DOTALL)
-            if m:
-                try:
-                    return json.loads(m.group(0))
-                except json.JSONDecodeError:
-                    pass
-        raise ValueError(
-            "Could not parse JSON from LLM response.\n"
-            f"Response (first 500 chars): {text[:500]}"
-        )
-
````
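The removed `_extract_json` helper tolerates LLM replies that are wrapped in markdown fences or padded with prose. Below is a standalone copy of that logic with sample inputs, so that its two fallbacks can be checked in isolation: fence stripping first, then a greedy `{...}` regex. The triple backtick is built programmatically (`FENCE`) and written as `` `{3} `` in the regexes, purely so the example can sit inside a markdown code fence. Otherwise the patterns match the original.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built so this example can live inside a markdown fence


def extract_json(text: str) -> dict:
    """Parse a JSON object from a possibly fenced or prose-wrapped LLM reply."""
    text = text.strip()
    if text.startswith(FENCE):
        text = re.sub(r"^`{3}[a-z]*\n?", "", text)       # drop opening fence + language tag
        text = re.sub(r"\n?`{3}\s*$", "", text.strip())  # drop closing fence
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Greedy and DOTALL: spans from the first "{" to the last "}".
        m = re.search(r"\{.*\}", text, re.DOTALL)
        if m:
            try:
                return json.loads(m.group(0))
            except json.JSONDecodeError:
                pass
    raise ValueError(f"Could not parse JSON from LLM response: {text[:80]!r}")


fenced = FENCE + 'json\n{"status": "done"}\n' + FENCE
chatty = 'Sure! Here you go: {"status": "retry", "n": 2} Hope that helps.'
print(extract_json(fenced))
print(extract_json(chatty))
```

Because the fallback regex is greedy, a reply containing two separate objects produces a span that is not valid JSON, and the helper raises `ValueError` rather than returning the first object.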
```diff
-    # ------------------------------------------------------------------
-    # Brief dispatch
-    # ------------------------------------------------------------------
-
-    def _dispatch_brief(self, brief: TaskBrief) -> dict:
-        """
-        Send a TaskBrief to the appropriate agent and return the raw result dict.
-
-        Routing
-        -------
-        preferred_runtime == "coding_agent" → coding runtime adapter
-        preferred_runtime == "standard" → LLM adapter directly
-
-        Blackboard events emitted: spawned → completed | failed.
-        """
-        if self._dry_run:
-            logger.info(
-                "[DRY-RUN] dispatch tier=%d role=%s task=%.80s",
-                brief.tier,
-                brief.role,
-                brief.task,
-            )
-            return {"status": "done", "output": "{}", "artifacts": []}
-
-        self._bb.update_brief_status(brief.brief_id, "active")
-        self._bb.log_event(
-            "spawned",
-            brief_id=brief.brief_id,
-            detail={"tier": brief.tier, "role": brief.role},
-        )
-
-        try:
-            if brief.preferred_runtime == "coding_agent":
-                result = self._dispatch_via_runtime(brief)
-            else:
-                result = self._dispatch_via_llm(brief)
-
-            self._bb.update_brief_result(brief.brief_id, result)
-            self._bb.log_event(
-                "completed",
-                brief_id=brief.brief_id,
-                detail={"status": result.get("status")},
-            )
-            return result
-
-        except Exception as exc:
-            self._bb.update_brief_status(brief.brief_id, "failed")
-            self._bb.log_event(
-                "failed",
-                brief_id=brief.brief_id,
-                detail={"error": str(exc)},
-            )
-            raise
-
-    def _dispatch_via_llm(self, brief: TaskBrief) -> dict:
-        """Call the LLM adapter with the tier system prompt + brief JSON."""
-        tier_prompt = self._load_tier_prompt(brief.tier)
-        personality = self._load_personality(brief.agent_personality)
-        system_prompt = "\n\n".join(filter(None, [tier_prompt, personality]))
-        capability = _TIER_CAPABILITIES.get(brief.tier, "capable")
-        user_message = json.dumps(brief.to_dict(), indent=2)
-
-        raw = self._llm.complete(
-            prompt=user_message,
-            capability=capability,
-            context={
-                "system_prompt": system_prompt,
-            },
-        )
-        return self._extract_json(raw)
-
-    def _dispatch_via_runtime(self, brief: TaskBrief) -> dict:
-        """Spawn a coding agent via the runtime adapter and collect its result."""
-        task_str = json.dumps(brief.to_dict(), indent=2)
-        capability = _TIER_CAPABILITIES.get(brief.tier, "capable")
-        timeout_s: int = brief.context.get("timeout_s", 300)
-
-        agent_id = self._coding_runtime.spawn(
-            task=task_str,
-            capability=capability,
-            context=brief.context,
-        )
-        logger.info(
-            "Spawned coding agent %s for brief %s", agent_id, brief.brief_id
-        )
-
-        result = self._coding_runtime.get_result(agent_id, timeout_s=timeout_s)
-
-        # Attempt to parse JSON from the agent's text output.
-        if isinstance(result.get("output"), str) and result["output"].strip():
-            try:
-                parsed = self._extract_json(result["output"])
-                result.update(parsed)
-            except ValueError:
-                pass  # Keep raw string output as-is.
-
-        return result
-
-    # ------------------------------------------------------------------
-    # Escalation loop
-    # ------------------------------------------------------------------
-
-    def _run_with_escalation(
-        self,
-        brief: TaskBrief,
-        workstream_id: Optional[str] = None,
-    ) -> dict:
-        """
-        Dispatch a brief and apply the escalation policy until done or exhausted.
-
-        On retry the amended brief is persisted to the Blackboard before
-        being re-submitted.
-        """
-        while True:
-            result = self._dispatch_brief(brief)
-            decision = self._escalation.handle(brief, result)
-
-            if decision.action == "complete":
-                return result
-
-            if decision.action == "escalate":
-                self._bb.log_event(
-                    "escalated",
-                    brief_id=brief.brief_id,
-                    detail={"reason": decision.reason},
-                )
-                raise EscalationError(
-                    f"Brief {brief.brief_id} (tier={brief.tier} role={brief.role}) "
-                    f"escalated: {decision.reason}"
-                )
-
-            # "retry" or "salvage_and_retry"
-            self._bb.log_event(
-                "retried",
-                brief_id=brief.brief_id,
-                detail={"reason": decision.reason, "action": decision.action},
-            )
-            amended = decision.amended_brief
-            if amended is None:
-                raise EscalationError(
-                    f"Escalation returned action={decision.action!r} "
-                    "but no amended_brief was provided."
-                )
-            # Persist the new brief and loop.
-            self._bb.create_brief(amended, workstream_id=workstream_id)
-            brief = amended
-
```
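The removed `_run_with_escalation` loop re-dispatches an amended brief until the escalation handler answers `complete`, and raises once it answers `escalate`. Below is a dependency-free sketch of that control flow. `Brief`, `Decision`, the `dispatch` callable and the retry-budget policy in `handle` are all hypothetical stand-ins. The real `EscalationHandler.handle` lives in `core/escalation.py` and is not shown in this compare, and the Blackboard logging is omitted.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


class EscalationError(RuntimeError):
    """Raised when a brief escalates with no recovery."""


@dataclass(frozen=True)
class Brief:  # hypothetical minimal stand-in for TaskBrief
    brief_id: str
    retry_budget: int


@dataclass
class Decision:
    action: str  # "complete" | "retry" | "escalate"
    reason: str = ""
    amended_brief: Optional[Brief] = None


def handle(brief: Brief, result: dict) -> Decision:
    """Hypothetical policy: complete on status=done, retry while budget remains."""
    if result.get("status") == "done":
        return Decision("complete")
    if brief.retry_budget > 0:
        amended = replace(brief, retry_budget=brief.retry_budget - 1)
        return Decision("retry", "bad_output", amended)
    return Decision("escalate", "retry budget exhausted")


def run_with_escalation(brief: Brief, dispatch: Callable[[Brief], dict]) -> dict:
    while True:
        result = dispatch(brief)
        decision = handle(brief, result)
        if decision.action == "complete":
            return result
        if decision.action == "escalate":
            raise EscalationError(f"Brief {brief.brief_id} escalated: {decision.reason}")
        if decision.amended_brief is None:
            raise EscalationError(f"action={decision.action!r} but no amended_brief")
        brief = decision.amended_brief  # loop again with the amended brief


calls: list[int] = []


def flaky(b: Brief) -> dict:
    """Fails twice, then succeeds; records the budget seen on each attempt."""
    calls.append(b.retry_budget)
    return {"status": "done"} if len(calls) == 3 else {"status": "failed"}


print(run_with_escalation(Brief("b1", retry_budget=3), flaky), calls)
```

Carrying the decrementing budget on an immutable brief, rather than a counter inside the loop, mirrors the original's choice to persist each amended brief as a new record.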
# ------------------------------------------------------------------
|
|
||||||
# Tier output parsers
|
|
||||||
# ------------------------------------------------------------------
|
|
||||||
|
|
||||||
def _parse_t1_output(
|
|
||||||
self, result: dict, root_brief: TaskBrief
|
|
||||||
) -> list[TaskBrief]:
|
|
||||||
"""Build T2 TaskBriefs from T1 (Visionary) JSON output."""
|
|
||||||
retry_bad: int = self._config.get("retry_defaults", {}).get("bad_output", 3)
|
|
||||||
workstreams: list[dict] = result.get("workstreams", [])
|
|
||||||
|
|
||||||
# T1 sets the canonical goal_anchor; propagate it back to root.
|
|
||||||
goal_anchor: str = result.get("goal_anchor") or root_brief.goal_anchor
|
|
||||||
root_brief.goal_anchor = goal_anchor
|
|
||||||
|
|
||||||
briefs: list[TaskBrief] = []
|
|
||||||
for ws in workstreams:
|
|
||||||
role = ws.get("role", "default")
|
|
||||||
brief = root_brief.make_child_brief(
|
|
||||||
tier=2,
|
|
||||||
role=role,
|
|
||||||
task=ws.get("task", ""),
|
|
||||||
workstream=ws.get("name", ""),
|
|
||||||
acceptance_criteria=ws.get("acceptance_criteria", []),
|
|
||||||
preferred_runtime="standard",
|
|
||||||
agent_personality=self._resolve_personality(2, role),
|
|
||||||
retry_budget=retry_bad,
|
|
||||||
)
|
|
||||||
briefs.append(brief)
|
|
||||||
return briefs
|
|
||||||
|
|
||||||
def _parse_t2_output(
|
|
||||||
self, result: dict, parent: TaskBrief
|
|
||||||
) -> list[TaskBrief]:
|
|
||||||
"""Build T3 TaskBriefs from T2 (Architect) JSON output."""
|
|
||||||
retry_bad: int = self._config.get("retry_defaults", {}).get("bad_output", 3)
|
|
||||||
subtasks: list[dict] = result.get("subtasks", [])
|
|
||||||
arch_summary: str = result.get("architecture_summary", "")
|
|
||||||
|
|
||||||
briefs: list[TaskBrief] = []
|
|
||||||
for st in subtasks:
|
|
||||||
role = st.get("role", "default")
|
|
||||||
brief = parent.make_child_brief(
|
|
||||||
tier=3,
|
|
||||||
role=role,
|
|
||||||
task=st.get("task", ""),
|
|
||||||
workstream=parent.workstream,
|
|
||||||
acceptance_criteria=st.get("acceptance_criteria", []),
|
|
||||||
preferred_runtime=st.get("preferred_runtime", "standard"),
|
|
||||||
agent_personality=self._resolve_personality(3, role),
|
|
||||||
retry_budget=retry_bad,
|
|
||||||
context={"architecture_summary": arch_summary},
|
|
||||||
)
|
|
||||||
briefs.append(brief)
|
|
||||||
return briefs
|
|
||||||
|
|
||||||
    def _parse_t3_output(
        self, result: dict, parent: TaskBrief
    ) -> list[TaskBrief]:
        """Build T4 TaskBriefs from T3 (Squad Lead) JSON output."""
        retry_bad: int = self._config.get("retry_defaults", {}).get("bad_output", 3)
        tasks: list[dict] = result.get("tasks", [])
        plan_summary: str = result.get("plan_summary", "")

        briefs: list[TaskBrief] = []
        for task in tasks:
            role = task.get("role", "default")
            pref_runtime = task.get("preferred_runtime", "standard")
            brief = parent.make_child_brief(
                tier=4,
                role=role,
                task=task.get("task", ""),
                workstream=parent.workstream,
                acceptance_criteria=task.get("acceptance_criteria", []),
                preferred_runtime=pref_runtime,
                agent_personality=self._resolve_personality(4, role),
                retry_budget=retry_bad,
                context={
                    "plan_summary": plan_summary,
                    "depends_on": task.get("depends_on", []),
                },
            )
            briefs.append(brief)
        return briefs

    # ------------------------------------------------------------------
    # VCS helpers
    # ------------------------------------------------------------------

    def _commit_artifacts(
        self,
        artifacts: list[dict],
        brief: TaskBrief,
    ) -> None:
        """Commit T4 *file* artifacts to the configured VCS adapter."""
        if not self._vcs or self._dry_run:
            if self._dry_run:
                logger.info(
                    "[DRY-RUN] Would commit %d artifact(s) for brief %s",
                    len(artifacts),
                    brief.brief_id,
                )
            return

        file_map: dict[str, str] = {
            a["path"]: a["content"]
            for a in artifacts
            if a.get("type") == "file"
            and a.get("path")
            and a.get("content") is not None
        }
        if not file_map:
            return

        branch: str = self._config.get("run", {}).get("base_branch", "main")
        message = (
            f"feat({brief.workstream}): artifacts from {brief.role} "
            f"[brief {brief.brief_id[:8]}]"
        )
        try:
            # GitHubAdapter.commit accepts dict[str, str] as files.
            sha = self._vcs.commit(file_map, message)  # type: ignore[call-arg]
            logger.info(
                "Committed %d artifact(s) → SHA %s", len(file_map), sha
            )
        except Exception as exc:
            logger.warning("VCS commit failed: %s", exc)

    # ------------------------------------------------------------------
    # Notification
    # ------------------------------------------------------------------

    def _notify_run(self, outcome: str, goal: str, detail: dict) -> None:
        if not self._notify or self._dry_run:
            if self._dry_run:
                logger.info(
                    "[DRY-RUN] Would notify outcome=%s goal=%.80s", outcome, goal
                )
            return

        level = "info" if outcome == "complete" else "error"
        if outcome == "complete":
            message = f"Pipeline complete: {goal[:80]}"
        else:
            message = f"Pipeline failed: {detail.get('error', 'unknown error')[:120]}"

        self._notify.send(
            message,
            context={
                "level": level,
                "run_id": self._bb.run_id,
                "outcome": outcome,
                **{k: str(v) for k, v in detail.items()},
            },
        )

    # ------------------------------------------------------------------
    # Public API
    # ------------------------------------------------------------------

     def run(self) -> None:
         """
-        Execute the full T1→T5 pipeline.
+        Execute the full pipeline from T1 decomposition through T5 verification.

-        Steps
-        -----
-        1. Dispatch T1 Visionary to decompose the goal into workstreams.
-        2. For each workstream: T2 Architect → T3 Squad Lead →
-           T4 Implementer → T5 Verifier.
-        3. Commit passing T4 artifacts via VCS adapter (if configured).
-        4. Notify on completion or terminal failure via notify adapter.
+        TODO (Phase 2):
+        - Build root T1 brief from config.run.goal.
+        - Dispatch to T1 Visionary via LLM adapter.
+        - Parse workstreams from T1 output.
+        - For each workstream: dispatch T2 Architect.
+        - For each T2 subtask: dispatch T3 Squad Lead.
+        - For each T3 task: dispatch T4 Implementer.
+        - For each T4 artifact set: dispatch T5 Verifier.
+        - Run escalation handler at each tier on failure.
+        - Commit passing artifacts via VCS adapter.
+        - Notify on completion or terminal failure via notify adapter.
         """
-        goal: str = self._config["run"]["goal"]
-        self._bb.create_run(goal=goal)
-        self._bb.update_run_status("active")
-        logger.info("Pipeline started — goal: %s", goal)
-
-        try:
-            self._orchestrate(goal)
-            self._bb.update_run_status("done")
-            summary = self._bb.get_run_summary()
-            logger.info("Pipeline complete. Summary: %s", summary)
-            self._notify_run("complete", goal, summary)
-        except Exception as exc:
-            self._bb.update_run_status("failed")
-            logger.error("Pipeline failed: %s", exc, exc_info=True)
-            self._notify_run("failed", goal, {"error": str(exc)})
-            raise
-        finally:
-            self._bb.close()
-
-    # ------------------------------------------------------------------
-    # Internal orchestration
-    # ------------------------------------------------------------------
-
-    def _orchestrate(self, goal: str) -> None:
-        """Build the root T1 brief, dispatch it, and fan out per workstream."""
-        retry_bad: int = self._config.get("retry_defaults", {}).get("bad_output", 3)
-
-        # ---- T1: Visionary ----
+        raise NotImplementedError("TeamRunner.run is not yet implemented.")
+
+    def _dispatch_brief(self, brief) -> dict:
+        """
+        Send a single TaskBrief to the appropriate agent and return the result.
+
+        TODO (Phase 2):
+        - Select runtime based on brief.preferred_runtime.
+        - Load agent personality from brief.agent_personality (if set).
+        - Compose prompt from tier system prompt + brief payload.
+        - Spawn agent via runtime adapter.
+        - Await result via runtime.get_result().
+        - Log spawned/completed/failed events to Blackboard.
+        """
+        raise NotImplementedError("TeamRunner._dispatch_brief is not yet implemented.")
        t1_brief = TaskBrief(
            run_id=self._bb.run_id,
            tier=1,
            role="default",
            goal_anchor=goal,
            task=(
                "You are the T1 Visionary. "
                "Decompose the following goal into parallel workstreams "
                f"for the engineering team: {goal}"
            ),
            workstream="root",
            retry_budget=retry_bad,
            preferred_runtime="standard",
            agent_personality=self._resolve_personality(1, "default"),
        )
        self._bb.create_brief(t1_brief)

        t1_result = self._run_with_escalation(t1_brief)
        t2_briefs = self._parse_t1_output(t1_result, t1_brief)
        logger.info("T1 produced %d workstream(s)", len(t2_briefs))

        # ---- T2..T5: per workstream ----
        for t2_brief in t2_briefs:
            ws_id = self._bb.create_workstream(
                name=t2_brief.workstream, tier=2
            )
            self._bb.create_brief(t2_brief, workstream_id=ws_id)
            self._bb.update_workstream_status(ws_id, "active")

            try:
                self._run_workstream(t2_brief, ws_id)
                self._bb.update_workstream_status(ws_id, "done")
            except EscalationError as exc:
                self._bb.update_workstream_status(ws_id, "failed")
                self._bb.log_event(
                    "failed",
                    detail={"error": str(exc), "workstream": t2_brief.workstream},
                )
                logger.error(
                    "Workstream %r failed: %s", t2_brief.workstream, exc
                )

    def _run_workstream(self, t2_brief: TaskBrief, ws_id: str) -> None:
        """Drive T2 → T3 → T4 → T5 for a single workstream."""
        # T2: Architect
        t2_result = self._run_with_escalation(t2_brief, workstream_id=ws_id)
        t3_briefs = self._parse_t2_output(t2_result, t2_brief)
        logger.info(
            "T2 (%s) produced %d subtask(s)", t2_brief.workstream, len(t3_briefs)
        )

        for t3_brief in t3_briefs:
            self._bb.create_brief(t3_brief, workstream_id=ws_id)
            try:
                # T3: Squad Lead
                t3_result = self._run_with_escalation(t3_brief, workstream_id=ws_id)
                t4_briefs = self._parse_t3_output(t3_result, t3_brief)
                logger.info(
                    "T3 (%s) produced %d task(s)", t3_brief.role, len(t4_briefs)
                )

                for t4_brief in t4_briefs:
                    self._bb.create_brief(t4_brief, workstream_id=ws_id)
                    try:
                        # T4: Implementer
                        t4_result = self._run_with_escalation(
                            t4_brief, workstream_id=ws_id
                        )
                        artifacts: list[dict] = t4_result.get("artifacts", [])

                        # T5: Verifier
                        t5_brief = t4_brief.make_child_brief(
                            tier=5,
                            role="code",
                            task=(
                                "Verify the following T4 implementation artifacts "
                                "against all acceptance criteria. "
                                f"T4 output: {json.dumps(t4_result)[:2000]}"
                            ),
                            workstream=t4_brief.workstream,
                            acceptance_criteria=t4_brief.acceptance_criteria,
                            preferred_runtime="standard",
                            agent_personality=self._resolve_personality(5, "code"),
                            retry_budget=self._config.get(
                                "retry_defaults", {}
                            ).get("bad_output", 3),
                            context={"t4_result": t4_result},
                        )
                        self._bb.create_brief(t5_brief, workstream_id=ws_id)
                        t5_result = self._run_with_escalation(
                            t5_brief, workstream_id=ws_id
                        )

                        # Commit on verified pass.
                        if t5_result.get("status") in ("passed", "done"):
                            self._commit_artifacts(artifacts, t4_brief)

                    except EscalationError as exc:
                        logger.error(
                            "T4/T5 escalation in %s: %s", t4_brief.role, exc
                        )

            except EscalationError as exc:
                logger.error("T3 escalation in %s: %s", t3_brief.role, exc)


 # ---------------------------------------------------------------------------
-# CLI entry point
+# CLI entry point (Phase 2)
 # ---------------------------------------------------------------------------

-def _configure_logging(verbose: bool = False) -> None:
-    level = logging.DEBUG if verbose else logging.INFO
-    logging.basicConfig(
-        level=level,
-        format="%(asctime)s %(levelname)-8s %(name)s — %(message)s",
-        datefmt="%Y-%m-%dT%H:%M:%S",
-    )
+# TODO (Phase 2): Implement argparse CLI.
+# if __name__ == "__main__":
+#     parser = argparse.ArgumentParser(description="Run the-agency pipeline.")
+#     parser.add_argument("--config", default="config/team.yaml", help="Path to team.yaml")
+#     parser.add_argument("--dry-run", action="store_true", help="Log actions without executing")
+#     args = parser.parse_args()
+#     runner = TeamRunner(config_path=args.config)
+#     runner.run()

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Run the-agency T1→T5 pipeline.",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument(
        "--config",
        default="config/team.yaml",
        help="Path to team.yaml configuration file.",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help=(
            "Log all planned actions without executing LLM calls, "
            "VCS commits, or notifications."
        ),
    )
    parser.add_argument(
        "--verbose",
        action="store_true",
        help="Enable DEBUG-level logging.",
    )
    args = parser.parse_args()
    _configure_logging(args.verbose)

    runner = TeamRunner(config_path=args.config, dry_run=args.dry_run)
    runner.run()

507
docs/buildspec.md
Normal file
@@ -0,0 +1,507 @@

# Tiered Agent Team System — Build Spec

_Started: 2026-03-15. Last updated: 2026-03-30._
_See design.md for the design doc and decisions log._

---

## Language & Runtime

**Python 3.11+.** Reasons:
- Agent/AI tooling is Python-first
- Clean type hints + dataclasses for schemas
- Agents can read and modify their own orchestration code
- Runs anywhere — no Node, no OpenClaw dependency

---

## Repository

Standalone repo: `git@github.com:coding-with-hans-heinemann/the-agency.git`

Separate from the OpenClaw workspace; the OpenClaw workspace gets a thin integration layer that calls into it. The core is portable and runs without OpenClaw.

---
|
## Directory Structure

```
agent-teams/
├── core/
│   ├── team_runner.py       — run lifecycle, agent spawning
│   ├── blackboard.py        — SQLite coordination state
│   ├── task_brief.py        — schema + validation
│   └── escalation.py        — retry logic, failure routing
│
├── adapters/
│   ├── base/
│   │   ├── llm.py           — abstract LLM interface
│   │   ├── vcs.py           — abstract VCS interface
│   │   ├── notify.py        — abstract notification interface
│   │   └── runtime.py       — abstract agent runtime interface
│   ├── llm/
│   │   ├── anthropic.py     — Claude via direct Anthropic API
│   │   ├── openai.py        — GPT / o-series
│   │   └── ollama.py        — local models
│   ├── vcs/
│   │   └── github.py
│   ├── notify/
│   │   └── openclaw.py      — messages Hans who notifies Andrew
│   └── runtime/
│       ├── openclaw.py      — sessions_spawn (general purpose)
│       └── claude_code.py   — coding agent runtime (file/git/exec tools)
│
├── agents/                  — git submodule: msitarzewski/agency-agents
│   ├── engineering/
│   ├── testing/
│   ├── strategy/
│   └── ...                  — full agency-agents roster
│
├── prompts/
│   ├── t1_visionary.md      — fallback if no agent_personality set
│   ├── t2_architect.md
│   ├── t3_squad_lead.md
│   ├── t4_implementer.md
│   └── t5_verifier.md
│
├── config/
│   ├── team.yaml            — example run configuration
│   └── role_registry.yaml   — maps (tier, domain) → agent personality file
│
├── cli/
│   └── agency.py            — run, watch, inspect, approve, reject, pause, resume
│
├── runs/                    — runtime state, one subdir per run_id
│   └── .gitkeep
│
└── README.md
```

---
|
## Blackboard

SQLite. One file per run at `runs/<run_id>/blackboard.db`.

### Tables

**runs**
```sql
CREATE TABLE runs (
    run_id      TEXT PRIMARY KEY,
    goal        TEXT NOT NULL,
    status      TEXT NOT NULL,  -- pending | active | review | done | failed
    created_at  TEXT NOT NULL,
    updated_at  TEXT NOT NULL
);
```

**workstreams**
```sql
CREATE TABLE workstreams (
    workstream_id   TEXT PRIMARY KEY,
    run_id          TEXT NOT NULL,
    name            TEXT NOT NULL,
    tier            INTEGER NOT NULL,
    status          TEXT NOT NULL,  -- pending | active | blocked | done | failed
    owner_agent_id  TEXT,
    created_at      TEXT NOT NULL,
    updated_at      TEXT NOT NULL
);
```

**briefs**
```sql
CREATE TABLE briefs (
    brief_id         TEXT PRIMARY KEY,
    run_id           TEXT NOT NULL,
    parent_brief_id  TEXT,
    workstream_id    TEXT,
    tier             INTEGER NOT NULL,
    role             TEXT NOT NULL,
    status           TEXT NOT NULL,  -- pending | active | done | failed
    payload          TEXT NOT NULL,  -- full JSON brief
    result           TEXT,           -- JSON result when done
    retry_count      INTEGER DEFAULT 0,
    created_at       TEXT NOT NULL,
    updated_at       TEXT NOT NULL
);
```

**events**
```sql
CREATE TABLE events (
    event_id    TEXT PRIMARY KEY,
    run_id      TEXT NOT NULL,
    brief_id    TEXT,
    kind        TEXT NOT NULL,  -- see event vocabulary below
    detail      TEXT,           -- JSON
    created_at  TEXT NOT NULL
);
```

**Event kind vocabulary:**
```
-- lifecycle
spawned | completed | failed | escalated | retried

-- visibility / gates
gate_pending     -- runner hit an inspection gate, waiting for human
gate_approved    -- human approved via CLI or notify
gate_rejected    -- human rejected, tier re-invoked
gate_paused      -- manual pause via CLI
gate_resumed     -- manual resume via CLI

-- amendments / informational
path_amendment   -- mid-run tier proposed a tier path change
log              -- human-readable log line (detail: {level, message})
```

**t3_task_lists** *(T3 mesh coordination)*
```sql
CREATE TABLE t3_task_lists (
    entry_id       TEXT PRIMARY KEY,
    run_id         TEXT NOT NULL,
    workstream_id  TEXT NOT NULL,
    t3_agent_id    TEXT NOT NULL,
    status         TEXT NOT NULL,  -- draft | committed
    tasks          TEXT NOT NULL,  -- JSON array of proposed T4 task descriptors
    created_at     TEXT NOT NULL,
    updated_at     TEXT NOT NULL
);
```

|
---

## Task Brief Schema

Every brief passed between tiers is a validated JSON object. `goal_anchor` is immutable — set by T1, copied verbatim into every downstream brief.

```json
{
  "brief_id": "uuid",
  "run_id": "uuid",
  "parent_brief_id": "uuid | null",
  "tier": 4,
  "role": "implementer",
  "goal_anchor": "Original T1 intent — always propagated unchanged",
  "workstream": "backend-api",
  "task": "Implement POST /webhooks/ingest endpoint",
  "acceptance_criteria": [
    "Accepts JSON payload",
    "Returns 202 on success",
    "Writes to queue"
  ],
  "constraints": [
    "Use existing queue client in src/queue.py",
    "No new dependencies"
  ],
  "context": {
    "relevant_files": ["src/routes/webhooks.py", "src/queue.py"],
    "interface_contract": "..."
  },
  "retry_budget": 3,
  "retry_count": 0,
  "preferred_runtime": "coding_agent",
  "agent_personality": "agents/engineering/engineering-code-reviewer.md",
  "created_at": "ISO-8601"
}
```

`preferred_runtime` is optional. T3 sets it to `"coding_agent"` when spawning T4/T5 for implementation or verification tasks. The runner falls back to `"standard"` if the coding agent runtime is not configured.

`agent_personality` is optional. When set, the runtime adapter reads the file and injects its contents as the system prompt at spawn time. When it is not set, the adapter falls back to the generic tier prompt in `prompts/`.
|
|
---

## Adapter Interfaces

### LLM (`adapters/base/llm.py`)
```python
class LLMAdapter:
    def complete(self, prompt: str, capability: str, context: dict) -> str: ...
    def resolve_model(self, capability: str) -> str: ...
    # capability: "reasoning-heavy" | "capable" | "fast-cheap"
```

### VCS (`adapters/base/vcs.py`)
```python
class VCSAdapter:
    def create_branch(self, name: str) -> None: ...
    def commit(self, files: list[str], message: str) -> str: ...  # returns commit sha
    def create_pr(self, title: str, body: str, head: str, base: str) -> str: ...  # returns pr url
    def get_pr_status(self, pr_id: str) -> str: ...  # open | merged | closed
```

### Notify (`adapters/base/notify.py`)
```python
class NotifyAdapter:
    def send(self, message: str, context: dict) -> None: ...
```

### Runtime (`adapters/base/runtime.py`)
```python
class RuntimeAdapter:
    def spawn(self, task: str, capability: str, context: dict) -> str: ...  # returns agent_id
    def get_result(self, agent_id: str, timeout_s: int) -> dict: ...
    def kill(self, agent_id: str) -> None: ...

    # Two implementations:
    #   openclaw.py    — general purpose, uses sessions_spawn, suits T1/T2/T3
    #   claude_code.py — coding-specialized, has file/git/exec tools, suits T4/T5
    #
    # The runner selects runtime based on brief.preferred_runtime:
    #   "standard"     → openclaw.py (default)
    #   "coding_agent" → claude_code.py (falls back to standard if unavailable)
    #
    # Both implementations inject brief.agent_personality as the system prompt
    # when spawning, if present. Falls back to generic tier prompt otherwise.
    # claude_code.py passes the agent file via --system-prompt flag natively
    # (agency-agents was designed for Claude Code's agents/ directory).
```
|
---

## Run Config (`config/team.yaml`)

```yaml
run:
  goal: "Build webhook ingestion system with retry logic and DLQ"
  repo: "git@github.com:org/repo.git"
  base_branch: "main"

adapters:
  llm: anthropic
  vcs: github
  notify: openclaw
  runtime: openclaw

models:
  provider: anthropic  # default provider
  capability_map:
    reasoning-heavy:
      anthropic: claude-opus-4-6
      openai: o3
    capable:
      anthropic: claude-sonnet-4-6
      openai: gpt-4o
      ollama: llama3.1:70b
    fast-cheap:
      anthropic: claude-haiku-3-5
      openai: gpt-4o-mini
      ollama: llama3.2

  # optional: override provider per tier
  tier_overrides:
    t1: { provider: openai, capability: reasoning-heavy }
    t4: { provider: ollama, capability: fast-cheap }

runtime:
  default: openclaw
  coding_agent: claude_code  # used for T4/T5 when available; omit to disable
  native_teams: false        # Claude Code's experimental agent teams — opt-in only
                             # when true: T3 hands full workstream to Claude Code,
                             # which fans out internally. faster but less blackboard
                             # visibility. default: false (explicit T4 spawning)
  # tier_runtime_map (optional overrides):
  #   t1: standard
  #   t2: standard
  #   t3: standard
  #   t4: coding_agent
  #   t5: coding_agent

retry_defaults:
  bad_output: 3
  partial: 2
  blocked: 0  # always escalate immediately

visibility:
  strict_mode: false  # true = all gates on (recommended for first runs)
  log_level: normal   # normal | verbose (verbose = per-T4 start/done lines)
  inspection_gates:
    t1_plan: true       # always — required by design
    t2_lead: false      # optional — review boundaries before specialists spawn
    t2_synthesis: true  # recommended — review architecture before implementation
    t3_plan: false      # verbose — useful early on, disable once T3 is trusted
    t5_verdict: false   # review T5 joint verdict before T3 marks workstream done
  gate_timeout_minutes: 60  # auto-reject if no human response within this window

t3_mesh_timeout_minutes: 10  # max time for T3s to commit task lists before runner escalates
```
|
---

## Role Registry (`config/role_registry.yaml`)

Maps `(tier, domain)` → agent personality file. T1 consults this during scope assessment when selecting specialists for each workstream brief. Adding a new specialist means adding one entry here — no core changes.

```yaml
t1:
  default: agents/strategy/nexus-strategy.md

t2:
  backend:  agents/engineering/engineering-software-architect.md
  frontend: agents/engineering/engineering-software-architect.md
  infra:    agents/engineering/engineering-devops-automator.md
  data:     agents/engineering/engineering-data-engineer.md
  default:  agents/engineering/engineering-software-architect.md

t3:
  backend:  agents/engineering/engineering-senior-developer.md
  frontend: agents/engineering/engineering-senior-developer.md
  infra:    agents/engineering/engineering-sre.md
  default:  agents/engineering/engineering-senior-developer.md

t4:
  frontend: agents/engineering/engineering-frontend-developer.md
  backend:  agents/engineering/engineering-backend-architect.md
  database: agents/engineering/engineering-database-optimizer.md
  devops:   agents/engineering/engineering-devops-automator.md
  mobile:   agents/engineering/engineering-mobile-app-builder.md
  ai:       agents/engineering/engineering-ai-engineer.md
  security: agents/engineering/engineering-security-engineer.md
  docs:     agents/engineering/engineering-technical-writer.md
  default:  agents/engineering/engineering-senior-developer.md

t5:
  code:        agents/engineering/engineering-code-reviewer.md
  integration: agents/testing/testing-reality-checker.md
  api:         agents/testing/testing-api-tester.md
  performance: agents/testing/testing-performance-benchmarker.md
  security:    agents/engineering/engineering-security-engineer.md
  default:     agents/engineering/engineering-code-reviewer.md
```
|
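Resolution against this registry is a two-level dictionary lookup with a per-tier `default`; per the Task Brief Schema section, the generic tier prompt in `prompts/` is the last resort. A sketch assuming the YAML has already been parsed into a dict; `resolve_prompt_file` and `TIER_PROMPTS` are hypothetical stand-ins for the runner's `_resolve_personality(tier, role)`, whose body is not shown in this diff.

```python
# Generic fallbacks from prompts/, used when the registry has nothing for the tier.
TIER_PROMPTS = {
    1: "prompts/t1_visionary.md",
    2: "prompts/t2_architect.md",
    3: "prompts/t3_squad_lead.md",
    4: "prompts/t4_implementer.md",
    5: "prompts/t5_verifier.md",
}


def resolve_prompt_file(registry: dict, tier: int, domain: str) -> str:
    """Return the file for (tier, domain), else the tier default, else the tier prompt."""
    tier_map = registry.get(f"t{tier}") or {}
    return tier_map.get(domain) or tier_map.get("default") or TIER_PROMPTS[tier]
```

Because an unknown domain silently lands on the tier default, adding a specialist really is a one-line registry change: nothing breaks for domains that have no entry yet.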
|
---

## Key Flows

### 1. Run Kickoff

```
User → team_runner.start(goal, config)   # via CLI or any caller
  → generate run_id
  → init blackboard (create runs/<run_id>/blackboard.db)
  → build T1 brief (goal_anchor = goal, retry_budget from config)
  → spawn T1 via runtime adapter
  → await T1 workplan
```

### 2. T1 Scope Assessment

```
T1 receives brief
  → assess complexity → decide depth
  → identify workstreams
  → set retry_budget multiplier per workstream (1x simple, 2x complex)
  → emit N workstream briefs for T2 (or T3 if shallow)
  → write workplan to blackboard
  → team_runner spawns T2s in parallel
```
|
### 3. T4 Retry Loop (escalation.py)

```
spawn T4 with brief
  → receive result
  → classify: bad_output | blocked | partial | success

  blocked:
    → log event(escalated)
    → pass to T3 immediately

  bad_output, retries_remaining:
    → amend brief with failure context, increment retry_count
    → re-spawn T4
    → log event(retried)

  bad_output, retries_exhausted:
    → log event(escalated)
    → pass to T3

  partial:
    → write salvageable parts to blackboard
    → re-task remainder with new brief

  success:
    → write result to blackboard
    → log event(completed)
    → notify T3
```
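The branches above reduce to a pure routing function that can be unit-tested without spawning anything. A sketch under assumptions: the outcome strings mirror the diagram, the `Action` names are hypothetical, and the separate `partial: 2` budget from `retry_defaults` is deliberately not modelled because the diagram does not say what happens when it runs out.

```python
from enum import Enum


class Action(Enum):
    COMPLETE = "complete"  # write result, log event(completed), notify T3
    RETRY = "retry"        # amend brief, increment retry_count, re-spawn, log event(retried)
    RETASK = "retask"      # write salvageable parts, re-task remainder with new brief
    ESCALATE = "escalate"  # log event(escalated), pass to T3


def route(outcome: str, retry_count: int, retry_budget: int) -> Action:
    """Map a classified T4 result to the next step of the retry loop."""
    if outcome == "success":
        return Action.COMPLETE
    if outcome == "blocked":
        return Action.ESCALATE  # never retried (retry_defaults.blocked is 0)
    if outcome == "partial":
        return Action.RETASK
    if outcome == "bad_output":
        # retries_remaining -> retry; retries_exhausted -> escalate
        return Action.RETRY if retry_count < retry_budget else Action.ESCALATE
    raise ValueError(f"unknown outcome: {outcome!r}")
```

Keeping the decision separate from the side effects (logging, re-spawning) lets each tier reuse the same routing while owning its own escalation target.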
|
### 4. Inspection Gate Flow

```
runner reaches configured gate (e.g. t2_synthesis)
  → write event(gate_pending, detail={tier, summary, what_happens_next})
  → notify_adapter.send(tier summary + gate context)
  → halt: poll blackboard for gate_approved or gate_rejected

  gate_approved:
    → write event(gate_approved)
    → continue run

  gate_rejected:
    → write event(gate_rejected, detail={reason})
    → re-invoke tier with rejection reason in brief context
    → loop back to gate_pending when tier completes again

  gate_timeout (gate_timeout_minutes elapsed):
    → treat as gate_rejected
    → notify Andrew: "Gate timed out, re-invoking tier"
```
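The halt-and-poll step, including the rule that a timeout counts as a rejection, can be sketched with the blackboard read abstracted as a callable. All names here (`wait_for_gate`, `read_latest_gate_event`, `poll_seconds`) are hypothetical; a real runner would read the newest `gate_approved` or `gate_rejected` row from the `events` table. The clock and sleep functions are injectable so the timeout path can be tested without waiting.

```python
import time
from typing import Callable, Optional


def wait_for_gate(
    read_latest_gate_event: Callable[[], Optional[str]],
    timeout_minutes: float,
    poll_seconds: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Block until a human decision arrives.

    Returns "approved", "rejected", or "timed_out"; the caller treats
    "timed_out" exactly like a rejection after sending its notification.
    """
    deadline = clock() + timeout_minutes * 60
    while True:
        kind = read_latest_gate_event()
        if kind == "gate_approved":
            return "approved"
        if kind == "gate_rejected":
            return "rejected"
        if clock() >= deadline:
            return "timed_out"
        sleep(poll_seconds)
```

Returning `"timed_out"` separately from `"rejected"` lets the caller send the "Gate timed out, re-invoking tier" notification before re-invoking the tier as it would for a rejection.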
|
### 5. Review Gate

```
T1 completes integration
  → vcs_adapter.create_pr(
        title="[agent-teams] <run_id>: <goal summary>",
        body="<workplan + workstream summaries>",
        head="integration/<run_id>",
        base="main"
    )
  → notify_adapter.send(
        "Run <run_id> complete. PR ready for review: <pr_url>",
        context={run_id, goal, workstreams, pr_url}
    )
  → blackboard: update run status → "review"
  → halt — no auto-merge
```

---
|
## Build Order
|
||||||
|
|
||||||
|
1. `git submodule add https://github.com/msitarzewski/agency-agents agents/` — pull the talent pool
|
||||||
|
2. `config/role_registry.yaml` — map tier+domain → agent personality files
|
||||||
|
3. `core/task_brief.py` — schema + validation (everything depends on this); include T1 Plan Output Schema
|
||||||
|
4. `core/blackboard.py` — SQLite store, all table definitions including `t3_task_lists`; full event kind vocabulary
|
||||||
|
5. `adapters/base/*` — all four abstract interfaces
|
||||||
|
6. `adapters/llm/anthropic.py` — first LLM implementation
|
||||||
|
7. `core/escalation.py` — retry + failure routing logic (called by tiers, not runner centrally)
|
||||||
|
8. `adapters/runtime/openclaw.py` — wire up sessions_spawn + personality injection
|
||||||
|
9. `adapters/runtime/claude_code.py` — coding agent runtime, personality via --system-prompt
|
||||||
|
10. `core/team_runner.py` — full run lifecycle: spawn loop (monitors briefs table for `status=pending`, calls runtime_adapter.spawn()), gate logic (gate_pending halt, calls notify_adapter.send(), polls for gate_approved/rejected resume), path amendment monitor, T3 mesh timeout → T2 escalation, T1 failure + terminal escalation only
|
||||||
|
11. `cli/agency.py` — run, watch, inspect, approve, reject, pause, resume; `watch` tails blackboard events and renders live log; `inspect` renders run tree
|
||||||
|
12. `prompts/` — fallback tier prompts (used when no agent_personality set)
|
||||||
|
13. `adapters/vcs/github.py` — PR creation + branch management
|
||||||
|
14. `adapters/notify/openclaw.py` — OpenClaw notification adapter; bridges gate summaries and run events to the operator via OpenClaw; manages its own inbound response state for gate approval routing
|
||||||
|
15. `config/team.yaml` — example config with full visibility block
|
||||||
|
16. `README.md` — how to run, how to add adapters, how to extend the roster; include `agency` CLI reference
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Out of Scope (Phase 2)
|
||||||
|
|
||||||
|
- Cost accounting per tier + run rollup
|
||||||
|
- Parallel workstream progress dashboard
|
||||||
|
- Additional adapter implementations (GitLab, Slack, OpenAI, Ollama)
|
||||||
|
- Persistent standing teams
|
||||||
|
- Web UI for run monitoring

681
docs/design.md
Normal file
@@ -0,0 +1,681 @@
# Tiered Agent Team System — Design Document

_Started: 2026-03-14. Last updated: 2026-03-30._

---

## Resolved Design Decisions (formerly Open Questions)

All eight open questions resolved 2026-03-30. Details in Decisions Log.

1. **T3 mesh mechanics** → Blackboard-based. T3s write draft task lists, read peers', commit merged plan before T4 dispatch. See _T3 Mesh via Blackboard_.

2. **T1 output schema** → Formal JSON schema defined. See _T1 Plan Output Schema_.

3. **T5 consensus mechanics** → T3 aggregates all T5 results into a joint verdict. Split verdict (`partial`) triggers retry of failed slices only. See _T5 Consensus & Verdict Schema_.

4. **Path amendment mechanism** → Amending tier writes a `path_amendment` event to blackboard. Runner monitors events table and notifies the relevant higher tier via a system event. No agent callback plumbing required. See _Path Amendment Mechanism_.

5. **Failure handling (distributed model)** → Distributed ownership confirmed. Runner only owns T1 failure + terminal human escalation. See updated _Failure Handling_ table.

6. **Who makes spawn calls for T3+ tiers** → Runner monitors briefs table for `status=pending` rows and makes all spawn calls. "Distributed ownership" means the tier's output determines brief content — runner is the mechanical arm. Gates (hold on `gate_pending`) live naturally in the runner's spawn loop.

7. **Gate approval UX** → `agency approve <run_id>` CLI writes `gate_approved` directly to the blackboard — the universal path, works on any platform. Runner only cares that the event exists, not how it got there. Notify adapter implementations handle their own inbound response routing (e.g. bridging a chat reply to a CLI call) as internal adapter state — not a core concern.

8. **T3 mesh timeout** → Escalate to T2 (domain boundary problem, T2 should re-scope). If T2 also exhausts its retry budget, escalates up the normal ladder to T1 → Andrew gate. No force-commit fallback (would hide the problem and cause bad T4 dispatch).

---

## Overview

A dynamic, hierarchical multi-agent system for software pipelines. Teams assemble on demand, execute, then disband. Inspired by a blend of Hollywood production (dynamic assembly), consulting firms (structured deliverables, hierarchical synthesis), and two-pizza teams (small autonomous squads, clear domain ownership).

---

## Core Principles

**1. Tiers represent cognitive modes, not org chart levels.**
Each tier thinks differently — strategy, design, coordination, execution, verification. Adding a tier only makes sense if it introduces a genuinely different mode of reasoning.

**2. Depth is proportional to complexity.**
Not every task needs every tier. A config change might only need T3→T4→T5. A new product needs the full stack. T1 assesses scope and prescribes the path — it is never pre-configured.

**3. Goal anchoring at every level.**
T1's original intent is embedded in every agent's context — not just passed to T2 and forgotten. Every agent knows the end goal even if they only own a slice.

**4. Artifacts, not summaries.**
Tiers pass structured specs downward (JSON task briefs), not paraphrased prose. Meaning is preserved; format is compressed.

**5. Verification is mandatory.**
T5 always runs. Nothing returns to T1 unverified. T5 is a quality gate, not an optional step: output should work, and work well, before it surfaces upward.

**6. Provider agnostic.**
The system makes no assumptions about which LLM provider or platform is in use. Tiers reference capability levels, not specific models. All external dependencies are swappable adapters.

**7. Specialist talent pool.**
Tiers define structure and responsibility. Agent personalities define domain expertise. The two are separate — the same tier can be filled by different specialists depending on the workstream domain.

---

## Tier Definitions

| Tier | Role | Owns | Capability Level |
|------|------|------|-----------------|
| T1 | Visionary | Goal, constraints, dispatch plan, final acceptance | reasoning-heavy |
| T2 | Architect | System design, interface contracts, workstream boundaries | reasoning-heavy / capable |
| T3 | Squad Lead | Workstream delivery, T4 management, quality gate | capable |
| T4 | Implementer | Atomic task execution (one file, one function, one test) | fast-cheap |
| T5 | Verifier | Validation of T4 output — correctness + intent alignment | capable |

T5 runs **within T3's scope**, not above it. T3 commissions T5 verification of its T4 outputs. T5 is a quality gate, not a management layer.

Capability levels map to actual models per provider in config — the core system never references a specific model name.

---

## Dispatch Model

### T1 Owns the Plan

T1 is not just a decomposer — it is the dispatch planner. Its output declares:

- **Workstreams** — the decomposed units of work
- **Tier path per workstream** — which tiers to engage (e.g. `[T2, T3, T4, T5]` or `[T4, T5]` for trivial tasks)
- **Parallelism** — which workstreams are independent and can run concurrently

T1 does not prescribe how each tier operates internally. That is the tier's own concern.

### T1 Lifecycle — Two Explicit Phases

T1 is invoked twice per run, each with a distinct prompt and purpose:

**Phase 1 — Plan:**
1. T1 produces an initial dispatch plan (workstreams, tier paths, parallelism, retry budget)
2. T1 self-critiques its own plan in a single follow-up pass ("what could go wrong, what did I miss?") and amends
3. Amended plan surfaces to Andrew for approval — no T2s spawn until approval is given

**Phase 2 — Accept:**
After the full T2→T3→T4→T5 pipeline completes, T1 is re-invoked with the final output. It validates against the original goal anchor and either accepts (opens PR) or rejects (escalates back down).

Both phases are named explicitly in the task brief schema and tracked on the blackboard.

### Each Tier Owns the Layer Below

Control flow is distributed, not centralised:

- T1 manages its T2s
- T2 Lead manages T2 specialists and their domain boundaries
- T2 specialists each own their T3s
- **T3 manages its T4s** — including dependency graph, parallelism, and T5 commissioning
- The runner is thin: bootstrap T1, run the spawn loop for pending briefs, monitor the blackboard, handle the final result and notifications

This means orchestration logic lives in agent prompts and output schemas — not in Python runner code. Adding a new execution pattern means updating a prompt, not the runner.

**Tradeoff:** Debugging is harder. When something fails mid-chain, you read blackboard logs rather than step through central runner code. This is a tooling problem to solve (good blackboard inspection), not a design flaw to avoid.

### Dynamic Paths

Tiers can propose path amendments mid-run (e.g. T3 discovers scope that warrants a T2 pass it didn't get). Amendments are logged to the blackboard. Higher tiers are notified but do not need to approve; the notification is informational. No tier silently changes the plan.

---

## Orchestration Patterns Per Tier

Different tiers suit different internal coordination patterns. These are baked into the runner's tier-handling logic and the tier prompts — not prescribed by T1.

| Tier | Pattern | Rationale |
|------|---------|-----------|
| T1 | Single agent, two phases | Must be authoritative; plan phase + accept phase |
| T2 Lead | Coordinator | Spawned first; defines boundaries + shared assumptions; drives conflict resolution; produces canonical architecture |
| T2 Specialists | Parallel fan-out | Each works independently within its domain; reads Lead's boundaries + shared assumptions doc before starting |
| T3 | Light mesh | Peer coordination within same T2 domain to negotiate task boundaries before T4 dispatch |
| T4 | Swarm + pipeline hybrid | Independent tasks run as swarm; dependent tasks pipeline (T4-A's output feeds T4-B). T3 declares which is which. |
| T5 | Parallel fan-out + consensus | Each T5 reviews its slice independently, then compares notes for a joint verdict — catches both artifact bugs and integration issues |
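
For the T4 row, one way T3 could turn its declared dependencies into swarm and pipeline stages is a layered topological sort with Python's `graphlib` (3.9+). This is a sketch: the `t4_waves` name, the dependency dict, and the task names are hypothetical. Tasks in the same wave have no dependencies on each other and can run as a swarm; successive waves form the pipeline.

```python
from graphlib import CycleError, TopologicalSorter


def t4_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group T4 task ids into waves.

    depends_on maps each task to the tasks whose output it consumes.
    Tasks within a wave are independent (swarm); waves run in order (pipeline).
    """
    sorter = TopologicalSorter(depends_on)
    try:
        sorter.prepare()
    except CycleError as err:
        # args[1] holds the nodes that form the cycle.
        raise ValueError(f"T4 dependency cycle: {err.args[1]}") from err
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose inputs are done
        waves.append(ready)
        sorter.done(*ready)
    return waves
```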

### T2 Flow in Detail

1. T1 spawns **T2 Lead Architect** with goal + workstream context
2. Lead defines explicit **domain boundaries** (who owns what, hard edges)
3. Lead publishes **shared assumptions doc** — cross-cutting concerns, key conventions, architectural constraints (auth approach, data formats, API patterns, etc.)
4. T1 spawns **T2 specialists** with boundaries + shared assumptions baked into their briefs
5. Specialists work in parallel, each within their defined domain
6. Lead reads all proposals, drives **conflict resolution** with relevant specialists if needed (cycle limit in config — fixed, not per-workstream)
7. Lead produces **canonical architecture** → written to blackboard as distinct artifact
8. T1 (Accept phase) validates canonical architecture against goal anchor
9. Canonical architecture becomes T3 briefs — each T2 specialist hands off to its own T3s

---

## Horizontal Scaling Within Tiers

```
T1 — Phase 1: Plan (self-critique → Andrew approval)
│
├── T2: Lead Architect (boundaries + shared assumptions first)
│     ├── T2: Backend Architect  ─┐
│     ├── T2: Frontend Architect  ├─ parallel, within defined domains
│     ├── T2: Infra Architect    ─┘
│     │
│     └── (Lead synthesises → conflict resolution if needed → canonical architecture)
│
├── T2 Backend Architect owns:
│     ├── T3: API Squad Lead ─┐
│     └── T3: DB Squad Lead  ─┴─ light mesh within domain
│           ├── T4: Worker A ─┐
│           ├── T4: Worker B ─┼─ swarm / pipeline (T3 decides)
│           ├── T4: Worker C ─┘
│           └── T5: Verifier(s) — fan-out + consensus
│
└── T1 — Phase 2: Accept (validates against goal anchor → PR)
```

---

## Use Case Flows

T1 assesses complexity and prescribes the tier path per workstream. Three standard depth profiles:

### Full Stack — T1→T2→T3→T4→T5
*Complex feature, new product, cross-domain changes*

```
T1 Plan
  → assess complexity (high)
  → output T1 Plan Schema (workstreams, tier paths [T2,T3,T4,T5], parallelism, retry budgets)
  → self-critique pass
  → GATE: surface to Andrew  ← approval required

T2 Lead (spawned by runner after approval)
  → receive: goal + full workplan
  → publish: domain boundaries + shared assumptions doc → blackboard
  → GATE (optional): review boundaries before specialists spawn

T2 Specialists (parallel fan-out, wait on Lead)
  → each receives: their domain boundary + shared assumptions
  → produce: architecture proposal for their slice
  → Lead synthesises, drives conflict resolution if needed
  → Lead writes: canonical architecture → blackboard
  → GATE (recommended): review architecture before implementation

Each T2 Specialist → spawns its own T3s (with canonical architecture slice + interface contracts)

T3s (light mesh within T2 domain)
  → write draft task lists to blackboard
  → read peers' lists, reconcile boundaries
  → commit merged task plan before T4 dispatch
  → GATE (optional): review task breakdown

T4s
  → swarm: independent tasks run in parallel
  → pipeline: T4-A output feeds T4-B (T3 declares dependencies)
  → commit to feature branches

T5s (fan-out per T4 slice)
  → each reviews its slice independently
  → T3 collects results → joint verdict
  → GATE (optional): review T5 verdict before T3 marks done
  → partial: T3 retries only failed slices
  → pass: T3 signals workstream done to T2

T2 specialists → signal T2 Lead
T2 Lead → writes integration summary → blackboard

T1 Accept
  → validate against goal anchor
  → open PR, notify_adapter.send(pr summary + url)
```

### Medium Complexity — T1→T3→T4→T5
*Config change, isolated bug fix — T1 determines no cross-domain design needed*

```
T1 Plan
  → assess: contained scope, single domain, no T2 architecture needed
  → workplan: tier paths [T3, T4, T5]
  → GATE: Andrew approval

T3s spawned directly by runner
  → receives T1 brief with task context (no T2 architecture layer)
  → T3 light mesh → T4 dispatch → T5 verify → signal done

T1 Accept → PR
```

### Simple / Hotfix — T1→T4→T5
*Single file, single function, trivial atomic task*

```
T1 Plan
  → assess: trivial, single workstream
  → tier path: [T4, T5]
  → GATE: Andrew approval

T4 (coding agent)
  → single atomic task, commits

T5 (single verifier, not full fan-out)
  → code review + correctness check
  → pass → T1 Accept → PR
```

---

## Resolved Mechanics

### T3 Mesh via Blackboard

T3s coordinate task boundaries before dispatching T4s. All coordination goes through the blackboard — no direct agent-to-agent messaging.

1. Each T3 writes its **draft task list** to the blackboard (one row per proposed T4 task, status `draft`)
2. Each T3 reads all sibling T3 draft lists in its T2 domain
3. T3s amend their lists to resolve overlaps (claim tasks, release duplicates)
4. Once all T3s in the domain have committed their final task lists (status `committed`), T4 dispatch begins
5. No T3 dispatches T4s until all peers in the domain are committed — this prevents duplicate work

The runner monitors for `all_committed` state and can enforce a timeout (config: `t3_mesh_timeout_minutes`).
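
A sketch of the `all_committed` check against SQLite, using Python's standard `sqlite3` module. The `t3_task_lists` table name comes from the build spec, but its columns, and the idea of passing in the set of T3s expected in the domain, are assumptions. A T3 that has not written any rows yet counts as not committed.

```python
import sqlite3

# Hypothetical column layout for the t3_task_lists table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS t3_task_lists (
    domain  TEXT NOT NULL,
    t3_id   TEXT NOT NULL,
    task_id TEXT NOT NULL,
    status  TEXT NOT NULL CHECK (status IN ('draft', 'committed')),
    PRIMARY KEY (domain, t3_id, task_id)
)
"""


def all_committed(conn: sqlite3.Connection, domain: str, expected_t3s: set[str]) -> bool:
    """True once every expected T3 in the domain has rows and none are still draft."""
    rows = conn.execute(
        "SELECT t3_id, SUM(status = 'draft') FROM t3_task_lists "
        "WHERE domain = ? GROUP BY t3_id",
        (domain,),
    ).fetchall()
    drafts_by_t3 = {t3_id: drafts for t3_id, drafts in rows}
    # Missing T3s map to None, which is never equal to 0.
    return all(drafts_by_t3.get(t3) == 0 for t3 in expected_t3s)
```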

---

### T1 Plan Output Schema

T1's Plan phase produces a structured JSON object written to the blackboard. The runner parses this to bootstrap the pipeline.

```json
{
  "run_id": "uuid",
  "goal_anchor": "Original goal — immutable, propagated to every downstream brief",
  "complexity": "high | medium | low",
  "retry_budget_multiplier": 2,
  "workstreams": [
    {
      "id": "ws-backend-api",
      "name": "Backend API",
      "domain": "backend",
      "tier_path": ["t2", "t3", "t4", "t5"],
      "parallel_group": "A",
      "t2_specialist": "agents/engineering/engineering-software-architect.md",
      "notes": "Focus on webhook ingest and retry queue"
    }
  ],
  "parallelism": {
    "groups": {
      "A": ["ws-backend-api", "ws-frontend"],
      "B": ["ws-infra"]
    },
    "sequence": ["A", "B"]
  },
  "self_critique_summary": "Brief plain-text summary of what T1 identified and amended in its self-critique pass"
}
```

`parallel_group` and `sequence` express inter-workstream dependencies: the workstreams in group A run in parallel, and group B starts only after every workstream in A completes.
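
One way the runner could turn the `parallelism` block into an ordered list of batches, sketched in Python under the assumption that the plan has already been parsed into a dict (for example with `json.loads`). The function name and its validation rules (every group in `sequence` exists, every listed workstream is declared, every declared workstream is scheduled) are illustrative assumptions, not part of the documented schema.

```python
def execution_batches(plan: dict) -> list[list[str]]:
    """Turn a T1 plan's parallelism block into ordered batches of workstream ids.

    Batches run one after another; workstreams inside a batch may run concurrently.
    """
    declared = {ws["id"] for ws in plan["workstreams"]}
    groups = plan["parallelism"]["groups"]
    sequence = plan["parallelism"]["sequence"]

    batches: list[list[str]] = []
    for name in sequence:
        if name not in groups:
            raise ValueError(f"sequence references unknown group {name!r}")
        undeclared = [ws_id for ws_id in groups[name] if ws_id not in declared]
        if undeclared:
            raise ValueError(f"group {name!r} lists undeclared workstreams {undeclared}")
        batches.append(list(groups[name]))

    # A declared workstream that no batch contains would silently never run.
    scheduled = {ws_id for batch in batches for ws_id in batch}
    unscheduled = declared - scheduled
    if unscheduled:
        raise ValueError(f"workstreams never scheduled: {sorted(unscheduled)}")
    return batches
```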

---

### T5 Consensus & Verdict Schema

T3 aggregates all T5 results into a joint verdict after fan-out completes.

**Individual T5 result:**
```json
{
  "verifier_id": "uuid",
  "scope": "queue-client",
  "verdict": "pass | fail",
  "issues": ["issue description..."],
  "notes": "human-readable summary"
}
```

**T3 joint verdict (written to blackboard):**
```json
{
  "t5_results": [...],
  "joint_verdict": "pass | partial | fail",
  "failed_scopes": ["queue-client"],
  "summary": "Human-readable summary for gate surface and logs"
}
```

**Split verdict handling:**
- `pass` → T3 marks workstream done, signals T2
- `partial` → T3 retries only the failed T4 slices (up to retry budget), re-runs T5 on those slices
- `fail` → T3 escalates to T2 (or T1 if shallow path)
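
A sketch of T3's aggregation step in Python. It assumes that `fail` means every slice failed (the Failure Handling table calls this a "full fail") and that any mix of passes and failures is `partial`; the `summary` wording is invented for illustration.

```python
def joint_verdict(t5_results: list[dict]) -> dict:
    """Fold individual T5 results into T3's joint verdict."""
    if not t5_results:
        raise ValueError("no T5 results to aggregate")
    failed = [r["scope"] for r in t5_results if r["verdict"] == "fail"]
    if not failed:
        verdict = "pass"
    elif len(failed) == len(t5_results):
        verdict = "fail"  # assumption: every slice failed
    else:
        verdict = "partial"  # split verdict: retry only the failed slices
    passed = len(t5_results) - len(failed)
    return {
        "t5_results": t5_results,
        "joint_verdict": verdict,
        "failed_scopes": failed,
        "summary": f"{passed}/{len(t5_results)} scopes passed",
    }
```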

---

### Spawn Call Ownership

The runner is the single point of contact with the runtime adapter. Tiers do not call `sessions_spawn` directly — they write output to the blackboard and the runner acts on it.

**Flow:**
1. A tier completes and writes child briefs to the `briefs` table with `status=pending`
2. Runner's spawn loop detects pending rows
3. If a gate is configured at this tier boundary → runner writes `gate_pending`, notifies Andrew, halts
4. On `gate_approved` → runner calls `runtime_adapter.spawn()` for each pending brief
5. Spawned agent runs, writes its own child briefs as pending when done → loop continues

This keeps gate logic in one place (the runner's spawn loop), makes all spawn calls auditable from a single location, and means agents only need blackboard read/write access — no runtime adapter tool access required.
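
A single pass of the runner's spawn loop, sketched in Python over in-memory lists rather than the real `briefs` and events tables. The `spawn_tick` name, the `gate_for` callback, the `spawned` status value, and the event dict shape are all hypothetical; handling of `gate_rejected` (re-invoking the tier) is left out.

```python
from typing import Callable, Optional


def spawn_tick(
    briefs: list[dict],
    events: list[dict],
    gate_for: Callable[[dict], Optional[str]],
    spawn: Callable[[dict], None],
    notify: Callable[[str], None],
) -> list[str]:
    """Run one pass of the spawn loop and return the ids of briefs spawned."""
    approved = {e["gate"] for e in events if e["kind"] == "gate_approved"}
    waiting = {e["gate"] for e in events if e["kind"] == "gate_pending"}
    spawned: list[str] = []
    for brief in briefs:
        if brief["status"] != "pending":
            continue
        gate = gate_for(brief)  # None means no gate guards this tier boundary
        if gate is not None and gate not in approved:
            if gate not in waiting:
                # First time this gate is hit: record it and tell the operator once.
                events.append({"kind": "gate_pending", "gate": gate})
                waiting.add(gate)
                notify(f"Gate {gate} is waiting for approval")
            continue  # hold the brief until the gate is approved
        spawn(brief)  # stands in for runtime_adapter.spawn()
        brief["status"] = "spawned"
        spawned.append(brief["id"])
    return spawned
```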

---

### Gate Approval UX

**Core mechanic (platform-agnostic):**

1. Runner writes `gate_pending` to blackboard
2. Runner calls `notify_adapter.send()` with tier summary + gate context (`run_id`, `gate`, `summary`, `what_happens_next`)
3. Runner polls blackboard for `gate_approved` or `gate_rejected`
4. `agency approve <run_id>` / `agency reject <run_id> --reason "..."` writes the event directly to the blackboard — the universal approval path, works on any platform with filesystem access

The runner never reads from a state file and never asks a notify adapter for inbound responses; it only polls the blackboard.

**Adapter responsibility:**
Each notify adapter handles its own inbound response routing. How a human's approval gets translated into an `agency approve` CLI call is entirely the adapter's concern — not core. Example: an OpenClaw adapter bridges a chat reply to the CLI. A Slack adapter wires up a slash command. A webhook adapter listens on an endpoint. All produce the same result: `gate_approved` written to blackboard.

Any internal state the adapter needs to resolve ambiguous responses (e.g. which run_id an approval refers to when multiple gates are pending) is managed by the adapter, not the core.
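
The polling step could look like the Python sketch below. `read_events` stands in for a blackboard query and is hypothetical, as is the event shape; treating a timeout as a rejection follows the `gate_timeout_minutes` setting in the `team.yaml` example under Inspection Gates. The clock and sleep functions are injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable, Iterable


def wait_for_gate(
    run_id: str,
    gate: str,
    read_events: Callable[[str], Iterable[dict]],
    timeout_s: float,
    poll_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Block until `gate` is approved or rejected; a timeout counts as a rejection."""
    deadline = clock() + timeout_s
    while True:
        for event in read_events(run_id):
            # Only decisions for this gate count; older gates' approvals are ignored.
            if event.get("gate") == gate and event["kind"] in ("gate_approved", "gate_rejected"):
                return event
        if clock() >= deadline:
            return {"kind": "gate_rejected", "gate": gate, "reason": "gate timed out"}
        sleep(poll_s)
```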

---

### T3 Mesh Timeout

If T3s in a domain fail to commit their task lists within `t3_mesh_timeout_minutes`:

1. **Runner escalates to T2** — writes a `gate_pending` escalation event and notifies the T2 specialist that owns the domain. Context: which T3s timed out, what draft lists (if any) exist on the blackboard. T2 re-scopes or clarifies domain boundaries, spawns fresh T3 briefs.

2. **If T2 also exhausts its retry budget** → normal escalation ladder: T2 failure → T1 handles → T1 failure → Andrew gate.

Force-committing partial draft lists (optimistic fallback) is explicitly not done — it hides the boundary problem and produces conflicting or duplicate T4 tasks that fail later with less context.

---

### Path Amendment Mechanism

When a mid-run tier discovers scope that warrants a different tier path than T1 prescribed:

1. The discovering tier writes a `path_amendment` event to the blackboard:
   ```json
   {
     "kind": "path_amendment",
     "proposed_by": "t3/ws-backend-api",
     "reason": "Discovered auth dependency requires T2 architectural pass",
     "amendment": {
       "workstream": "ws-backend-api",
       "add_tiers": ["t2"],
       "insert_before": "t3"
     }
   }
   ```
2. The runner monitors the events table, detects `path_amendment`, and sends a system event notification to the relevant higher tier
3. The higher tier is **informed, not blocked** — it acknowledges and adjusts its understanding
4. Amendment is logged on the blackboard for audit; no approval gate required (the next scheduled human gate will surface it)

No agent needs callback plumbing. The runner is the notification bridge.
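
How the runner might compute the amended tier path from the event's `amendment` payload, as a Python sketch. The function name and the choice to skip tiers already on the path are assumptions; the design itself only specifies that the amendment is logged and higher tiers are informed.

```python
def apply_path_amendment(tier_path: list[str], amendment: dict) -> list[str]:
    """Return a new tier path with `add_tiers` inserted before `insert_before`."""
    anchor = amendment["insert_before"]
    if anchor not in tier_path:
        raise ValueError(f"{anchor!r} is not on tier path {tier_path}")
    # Skip tiers the workstream already has, so replaying an event is harmless.
    new = [t for t in amendment["add_tiers"] if t not in tier_path]
    i = tier_path.index(anchor)
    return tier_path[:i] + new + tier_path[i:]
```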

---

## Shared State

For software pipelines, **the repo is the primary blackboard**:
- T4 workers commit to feature branches
- T3 leads review and merge to workstream branches
- T2 architects own integration branches
- T1 does final integration and acceptance

This is supplemented by a per-run SQLite coordination store that tracks:
- In-flight workstreams and their current execution plans
- Handoff artifacts and tier status
- Retry counts and escalation history
- Path amendments (proposed, by whom, timestamp)

---

## Failure Handling

Distributed ownership — each tier handles failures in the tier below it. The runner only handles T1 failure and terminal human escalation.

| Failure | Owner | Handler | Action |
|---------|-------|---------|--------|
| T4 bad output | T3 | `escalation.py` called by T3's context | Retry T4 with corrected brief (up to retry_budget) |
| T4 blocked | T3 | `escalation.py` | Escalate to T3 immediately — no retries |
| T4 partial output | T3 | `escalation.py` | Salvage good parts, re-task remainder |
| T5 partial verdict | T3 | T3 joint verdict logic | Retry failed T4 slices only |
| T5 full fail | T3 | T3 joint verdict logic | Escalate to T2 |
| T3 workstream stuck | T2 | T2 specialist prompt + blackboard | Re-scope or split the workstream |
| T2 design wrong | T1 | T1 Accept phase + blackboard | Re-plan; may discard workstream and restart |
| T1 failure / crash | Runner | `team_runner.py` | Surface to human, halt run |
| Repeated escalation | Runner | `team_runner.py` | Gate: block until human unblocks |

**Key distinction:** `escalation.py` is not called by the runner centrally. It is logic that tier agents execute (or the runner executes on their behalf when it detects a timeout or dead agent). The runner only owns the last two rows.

Retry limits prevent infinite loops. Escalation path is always upward, never sideways.

T1 sets a retry budget multiplier during scope assessment (`1x` simple, `2x` complex). Retry budget is a field on the task brief — not hardcoded in the runner.
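
A Python sketch of the T4 rows of the table above, as T3-side routing. The base retry count of 3 is an assumption (the document fixes only the multiplier), as are the failure-kind strings, the `Decision` type, and the function names. Once the budget is spent, the decision moves upward to T3, which may in turn escalate to T2.

```python
from dataclasses import dataclass

BASE_RETRIES = 3  # hypothetical base count; the design only fixes the multiplier


@dataclass(frozen=True)
class Decision:
    action: str  # "retry" or "escalate"
    target: str  # tier that handles the next step


def retry_budget(multiplier: int, base: int = BASE_RETRIES) -> int:
    """Retries allowed per task: base count scaled by T1's multiplier."""
    return base * multiplier


def route_t4_failure(kind: str, retries_used: int, budget: int) -> Decision:
    """T3-side routing for a failed T4 task."""
    if kind == "blocked":
        return Decision("escalate", "t3")  # blocked work is never retried
    if kind in ("bad_output", "partial_output") and retries_used < budget:
        return Decision("retry", "t4")  # corrected brief, or re-task the remainder
    return Decision("escalate", "t3")  # budget spent or unknown kind: move upward
```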

---

## Agent Talent Pool

The system builds on [agency-agents](https://github.com/msitarzewski/agency-agents) — a library of 50+ pre-built specialist personalities, each with deep domain expertise, quality standards, and specific deliverables.

**Division of responsibility:**
- Our system provides: orchestration, tier structure, task briefs, retries, verification gates, shared state
- Agency-agents provides: the specialist knowledge each agent brings to its role

T1 selects the right specialist from the roster when building workstream briefs. The specialist's personality is injected as the system prompt at spawn time.

**Default tier-to-specialist mapping for software pipelines:**

| Tier | Domain | Agent |
|------|--------|-------|
| T1 | Strategy | nexus-strategy |
| T2 | Backend | software-architect |
| T2 | Infra | devops-automator |
| T2 | Data | data-engineer |
| T3 | Backend | senior-developer |
| T3 | Reliability | sre |
| T4 | Frontend | frontend-developer |
| T4 | Backend | backend-architect |
| T4 | Database | database-optimizer |
| T4 | DevOps | devops-automator |
| T4 | Mobile | mobile-app-builder |
| T4 | AI/ML | ai-engineer |
| T4 | Security | security-engineer |
| T4 | Docs | technical-writer |
| T5 | Code review | code-reviewer |
| T5 | Integration | testing-reality-checker |
| T5 | API | testing-api-tester |
| T5 | Performance | testing-performance-benchmarker |
| T5 | Security | security-engineer |

The roster is not fixed — T1 can select any agent from the library based on workstream needs.
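
A sketch of persona resolution in Python, assuming the role registry has been loaded into a dict keyed by `(tier, domain)`. The single registry entry reuses the path from the T1 plan example; the `prompts/<tier>.md` fallback naming is hypothetical and stands in for the fallback tier prompts listed under `prompts/` in the build spec.

```python
from typing import Optional

# Hypothetical registry contents, as if loaded from config/role_registry.yaml.
ROLE_REGISTRY: dict[tuple[str, str], str] = {
    ("t2", "backend"): "agents/engineering/engineering-software-architect.md",
}


def resolve_personality(tier: str, domain: str, override: Optional[str] = None) -> str:
    """Return the persona file injected as the system prompt at spawn time."""
    if override:
        # T1 can pick any agent from the library for a workstream.
        return override
    # No registry entry: fall back to the generic tier prompt (naming is hypothetical).
    return ROLE_REGISTRY.get((tier, domain), f"prompts/{tier}.md")
```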

---

## Adapter Layers

Everything external is a swappable adapter. Core logic never imports from adapters directly — always through an interface.

```
Core (platform-agnostic)
├── team_runner   — thin bootstrap: spawn T1, monitor blackboard, handle result
├── blackboard    — SQLite coordination state
├── task_brief    — schema + validation
└── escalation    — retry logic, failure routing

Adapters (swappable)
├── llm/          — anthropic (now), openai, ollama, any API
├── notify/       — openclaw (now), slack, email, webhook...
├── vcs/          — github (now), gitlab, gitea, bare git...
└── runtime/
      ├── standard      — openclaw sessions_spawn (T1/T2/T3)
      └── coding_agent  — claude_code (T4/T5 default), codex, aider...
```

Swapping providers means writing a new adapter file — nothing in core changes.

T4 and T5 default to the **coding agent runtime** when it is available, and fall back to the standard runtime if it is not configured.
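
A sketch of the runtime adapter boundary and the T4/T5 fallback rule, using Python's `abc` module. The `spawn(brief, personality)` signature, the stub classes, and `select_runtime` are assumptions; real adapters would start OpenClaw or Claude Code sessions instead of returning placeholder ids.

```python
from abc import ABC, abstractmethod
from typing import Optional


class RuntimeAdapter(ABC):
    """Interface the runner calls to start an agent session."""

    @abstractmethod
    def spawn(self, brief: dict, personality: str) -> str:
        """Start an agent for `brief` and return a session id."""


class StandardRuntime(RuntimeAdapter):
    def spawn(self, brief: dict, personality: str) -> str:
        return f"standard:{brief['id']}"  # placeholder for an OpenClaw session


class CodingAgentRuntime(RuntimeAdapter):
    def spawn(self, brief: dict, personality: str) -> str:
        return f"coding:{brief['id']}"  # placeholder for a Claude Code session


def select_runtime(
    tier: str, standard: RuntimeAdapter, coding: Optional[RuntimeAdapter]
) -> RuntimeAdapter:
    """T4/T5 prefer the coding-agent runtime; other tiers, or a missing one, use standard."""
    if tier in ("t4", "t5") and coding is not None:
        return coding
    return standard
```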

---

## Run Visibility Layer

Designed for debugging, test runs, and quality evaluation at each tier. Three interlocking components.

### 1. Human-Readable Live Log

Structured events from the blackboard rendered as a timestamped, readable stream. `agency watch <run_id>` tails this live.

```
[abc123] 12:30:01  T1    PLAN_START    Assessing scope: "Build webhook ingestion system"
[abc123] 12:30:14  T1    PLAN_DONE     3 workstreams — backend-api, infra, docs (2 parallel)
[abc123] 12:30:14  GATE  APPROVAL      ⏸ Waiting on approval before T2 spawns
[abc123] 12:31:02  GATE  APPROVED      ✓ Approved — continuing
[abc123] 12:31:03  T2    LEAD_START    Lead Architect spawned
[abc123] 12:31:41  T2    BOUNDS_READY  Domain boundaries + shared assumptions published
[abc123] 12:31:42  T2    SPEC_START    3 specialists spawned (parallel): backend, infra, docs
[abc123] 12:32:15  T2    SPEC_DONE     backend-api architecture draft ready
[abc123] 12:32:58  T2    SYNTH_DONE    Canonical architecture written to blackboard
[abc123] 12:32:58  GATE  INSPECTION    ⏸ T2 synthesis ready for review
[abc123] 12:33:44  T3    MESH_START    backend-api: 2 squad leads negotiating task boundaries
[abc123] 12:34:01  T3    MESH_DONE     Task split committed — 7 T4 tasks (5 swarm, 2 pipeline)
[abc123] 12:34:02  T4    SWARM_START   5 workers spawned in parallel
[abc123] 12:35:10  T4    DONE          worker-3 auth-middleware ✓
[abc123] 12:35:22  T4    FAIL          worker-4 queue-client ✗ (retry 1/3)
[abc123] 12:36:04  T4    DONE          worker-4 queue-client ✓ (retry resolved)
[abc123] 12:36:05  T5    VERIFY_START  4 verifiers spawned
[abc123] 12:36:45  T5    VERDICT       partial — queue-client needs rework
[abc123] 12:37:12  T5    VERDICT       ✓ all pass — workstream backend-api done
```

Log level `verbose` adds a start and done line for each individual T4 task. The default is `normal` (tier-level events only).
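
Rendering blackboard events into a fixed-width layout like the log above might look like this Python sketch. The column widths, the `per_task` flag, and both function names are assumptions made for illustration.

```python
from datetime import datetime


def render_event(run_id: str, ts: datetime, tier: str, kind: str, message: str) -> str:
    """Format one event with fixed-width tier and kind columns."""
    return f"[{run_id}] {ts:%H:%M:%S}  {tier:<6}{kind:<14}{message}"


def visible(per_task: bool, log_level: str) -> bool:
    """'normal' shows tier-level events only; 'verbose' also shows per-task T4 lines."""
    return log_level == "verbose" or not per_task
```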
|
||||||
|
|
||||||
|
### 2. Inspection Gates
|
||||||
|
|
||||||
|
Configurable pause points. When the runner hits a gate, it:
|
||||||
|
1. Writes a `gate_pending` event to the blackboard
|
||||||
|
2. Fires `notify_adapter.send()` with the tier summary + gate context
|
||||||
|
3. Halts — no next tier spawns until `gate_approved` or `gate_rejected` is written

The tier summary surfaced at each gate includes:

- **What was produced** (the tier artifact in readable form)
- **What happens next** (which agents will spawn, and what they will do)
- **Any anomalies** flagged by the tier itself
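
One way to package those three parts into the text handed to `notify_adapter.send()` is sketched below. The `GateSummary` class and its plain-text layout are hypothetical; the real runner may structure the message differently or let each notify adapter format it.

```python
from dataclasses import dataclass, field


# Hypothetical container for the three parts of a gate summary.
@dataclass
class GateSummary:
    tier: str
    produced: str                                       # the tier artifact in readable form
    next_steps: list[str]                               # which agents spawn next, doing what
    anomalies: list[str] = field(default_factory=list)  # flagged by the tier itself

    def render(self) -> str:
        lines = [f"{self.tier} gate", "", "What was produced:", self.produced, "", "What happens next:"]
        lines += [f"- {step}" for step in self.next_steps]
        lines += ["", "Anomalies:"]
        lines += [f"- {a}" for a in self.anomalies] or ["- none flagged"]
        return "\n".join(lines)


summary = GateSummary(
    tier="T2 synthesis",
    produced="Canonical architecture for backend-api, infra, docs",
    next_steps=["T3 backend-api: 2 squad leads negotiate task boundaries"],
)
print(summary.render())
```

Always rendering the anomalies section, even when it is empty, lets the reviewer tell "nothing flagged" apart from "the tier did not report".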

Gates are configured in `team.yaml` under `visibility.inspection_gates`. Setting `strict_mode: true` enables all gates; this is recommended for first runs on a new codebase or a new goal type.

```yaml
visibility:
  strict_mode: false
  log_level: normal           # normal | verbose
  inspection_gates:
    t1_plan: true             # always — required by design
    t2_lead: false            # optional — review boundaries before specialists
    t2_synthesis: true        # recommended — review architecture before implementation
    t3_plan: false            # verbose — useful early on, disable once T3 is trusted
    t5_verdict: false         # review T5 joint verdict before T3 marks workstream done
  gate_timeout_minutes: 60    # auto-reject if no response within this window
```
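
Once the YAML is loaded (for example with PyYAML's `yaml.safe_load`), resolving the active gates takes a few lines. This is a minimal sketch: `enabled_gates` is a hypothetical helper, and treating `t1_plan` as impossible to switch off is an assumption drawn from its "required by design" comment.

```python
# Gate names as listed under visibility.inspection_gates.
ALL_GATES = ("t1_plan", "t2_lead", "t2_synthesis", "t3_plan", "t5_verdict")


def enabled_gates(visibility: dict) -> set[str]:
    """Resolve which gates halt the run from an already-parsed `visibility` mapping."""
    if visibility.get("strict_mode", False):
        return set(ALL_GATES)  # strict_mode enables every gate
    configured = visibility.get("inspection_gates", {})
    gates = {name for name in ALL_GATES if configured.get(name, False)}
    gates.add("t1_plan")  # assumption: the "required by design" gate cannot be switched off
    return gates


visibility = {
    "strict_mode": False,
    "log_level": "normal",
    "inspection_gates": {"t1_plan": True, "t2_lead": False, "t2_synthesis": True,
                         "t3_plan": False, "t5_verdict": False},
    "gate_timeout_minutes": 60,
}
print(sorted(enabled_gates(visibility)))  # ['t1_plan', 't2_synthesis']
```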

### 3. Inspection CLI — `cli/agency.py`

```
agency run <config.yaml>               # start a run, returns run_id
agency watch <run_id>                  # tail live log (follows blackboard events)
agency inspect <run_id>                # interactive tree view of run state
agency inspect <run_id> --tier t2      # jump to T2 artifacts
agency inspect <run_id> --brief <id>   # show full brief + result JSON

agency approve <run_id>                # approve current gate → continue
agency approve <run_id> --note "..."   # approve with a note written to blackboard
agency reject <run_id> --reason "..."  # reject → tier re-invoked
agency pause <run_id>                  # force-pause at next tier boundary
agency resume <run_id>                 # release a manual pause
```
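
One way to wire these commands is the standard library's `argparse`, with one subparser per command. This sketch covers the argument surface only: `build_parser` and the `GATE_EVENTS` mapping are hypothetical, and making `--reason` mandatory for `reject` is an assumption. The mapping pairs each control command with the blackboard event it would write.

```python
import argparse

# Blackboard event each control command writes (names from the extended event vocabulary).
GATE_EVENTS = {
    "approve": "gate_approved",
    "reject": "gate_rejected",
    "pause": "gate_paused",
    "resume": "gate_resumed",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="agency")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("run", help="start a run").add_argument("config")
    for name in ("watch", "pause", "resume"):
        sub.add_parser(name).add_argument("run_id")

    inspect_p = sub.add_parser("inspect", help="tree view of run state")
    inspect_p.add_argument("run_id")
    inspect_p.add_argument("--tier", choices=["t1", "t2", "t3", "t4", "t5"])
    inspect_p.add_argument("--brief", metavar="ID")

    approve_p = sub.add_parser("approve", help="approve current gate")
    approve_p.add_argument("run_id")
    approve_p.add_argument("--note", default="")

    reject_p = sub.add_parser("reject", help="reject current gate; tier is re-invoked")
    reject_p.add_argument("run_id")
    reject_p.add_argument("--reason", required=True)  # assumption: a reason is mandatory
    return parser


args = build_parser().parse_args(["approve", "abc123", "--note", "boundaries look right"])
print(args.command, args.run_id, GATE_EVENTS[args.command])  # approve abc123 gate_approved
```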

`agency inspect` (no flags) renders a live tree:

```
Run abc123 — "Build webhook ingestion system"
├── T1 Plan ✓
│   └── [view workplan]
├── T2 Architecture ✓ [GATE: pending review]
│   ├── [view domain boundaries]
│   ├── [view shared assumptions]
│   └── [view canonical architecture]
├── T3 backend-api (active)
│   ├── [view task breakdown]
│   └── T4 workers: 3/7 done, 1 retrying, 3 pending
└── T3 infra (pending)
```
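
Drawing such a tree is a short recursive walk. A minimal sketch, assuming the run state has already been collapsed into nested `(label, children)` tuples; `render_tree` is hypothetical, and the live, interactive parts of `agency inspect` are out of scope.

```python
def render_tree(node: tuple, prefix: str = "", is_last: bool = True, is_root: bool = True) -> list[str]:
    """Render a (label, children) tree with box-drawing connectors, one string per line."""
    label, children = node
    if is_root:
        lines, child_prefix = [label], ""
    else:
        lines = [prefix + ("└── " if is_last else "├── ") + label]
        # Children of a last sibling get blank padding; others continue the vertical rule.
        child_prefix = prefix + ("    " if is_last else "│   ")
    for i, child in enumerate(children):
        lines += render_tree(child, child_prefix, i == len(children) - 1, is_root=False)
    return lines


run = ("Run abc123", [
    ("T1 Plan ✓", [("[view workplan]", [])]),
    ("T2 Architecture ✓ [GATE: pending review]", [
        ("[view domain boundaries]", []),
        ("[view shared assumptions]", []),
        ("[view canonical architecture]", []),
    ]),
    ("T3 backend-api (active)", [
        ("[view task breakdown]", []),
        ("T4 workers: 3/7 done, 1 retrying, 3 pending", []),
    ]),
    ("T3 infra (pending)", []),
])
print("\n".join(render_tree(run)))
```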

### Blackboard Event Vocabulary (extended)

```python
from typing import Literal, Union

# existing
CoreEvent = Literal["spawned", "completed", "failed", "escalated", "retried"]

# new — visibility layer
VisibilityEvent = Literal[
    "gate_pending",    # runner hit a gate, waiting for human
    "gate_approved",   # human approved, run continues
    "gate_rejected",   # human rejected, tier re-invoked
    "gate_paused",     # manual pause via CLI
    "gate_resumed",    # manual resume via CLI
    "path_amendment",  # mid-run tier proposed path change
    "log",             # human-readable log line (level + message)
]

EventType = Union[CoreEvent, VisibilityEvent]
```

---

## Decisions Log

**T1 dynamic dispatch** — T1 assesses scope and prescribes the tier path and workstream parallelism. It does not prescribe how each tier coordinates internally.

**T1 two-phase lifecycle** — T1 has two explicitly named phases: Plan and Accept. The Plan phase runs a single self-critique pass, then a human approval gate, before any T2 spawns. The Accept phase validates the final output against the goal anchor. Both phases are tracked on the blackboard and use distinct prompts.

**T1 self-critique** — Single pass only. Further self-critique iterations give diminishing returns; the human review that follows is the real safety net. Self-critique catches obvious gaps; Andrew catches strategic ones.

**Distributed ownership** — Each tier owns the layer below it, and the runner is thin. Tradeoff: distributed control makes the system extensible, but debugging relies on good blackboard tooling rather than central runner traces.

**T5 always mandatory** — Verification is never skipped. Output should work, and work well, before it surfaces to T1.

**T3 owns T4 and T5** — T3 manages its T4s (dependency graph, swarm vs pipeline, parallelism) and commissions T5 verification of T4 outputs. The runner does not orchestrate T4/T5 centrally.

**T2 Lead Architect** — A dedicated T2 role, not a new tier, spawned first by T1. It owns domain boundary definition, the shared assumptions doc, conflict resolution between specialists, and canonical architecture synthesis. Specialists spawn after the Lead publishes boundaries and assumptions. Each T2 specialist owns its own T3s; no T3 spans T2 domains.

**T2 conflict resolution** — The Lead sends targeted briefs back to conflicting specialists. The cycle limit is a fixed config value (not per-workstream), mirroring the single-pass T1 self-critique: a fixed limit, not a variable one.

**T2 shared assumptions** — The Lead publishes cross-cutting concerns (auth, data formats, API conventions, etc.) before specialists start. Specialists design against a shared baseline, so implicit dependencies are pre-empted rather than caught in synthesis.

**Orchestration patterns** — Baked into tier prompts and runner tier-handling logic, not prescribed by T1. T2: Lead + parallel specialists. T3: light mesh within the T2 domain. T4: swarm + pipeline. T5: fan-out + consensus.

**Output / review** — Nothing merges to main without explicit human approval. T1 opens a PR and fires `notify_adapter.send()` with the PR summary; the merge is gated on human sign-off. The notify adapter implementation determines how the notification is delivered.

**Platform agnosticism** — The core is provider- and platform-agnostic. Capability levels (`reasoning-heavy`, `capable`, `fast-cheap`) map to models in config, and mixing providers across tiers is supported.
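
Capability-level indirection can be sketched as a two-step lookup: tier to capability level, then capability level to a provider/model pair, with an optional per-tier override. The config keys (`capabilities`, `tiers`, `tier_overrides`), the provider and model strings, the tier-to-capability assignment, and `resolve_model` are all hypothetical placeholders, not the actual `team.yaml` schema.

```python
# Hypothetical config shape. Model names are placeholders, and the tier-to-capability
# assignment is illustrative only.
CONFIG = {
    "capabilities": {
        "reasoning-heavy": {"provider": "anthropic", "model": "<large-model>"},
        "capable": {"provider": "anthropic", "model": "<mid-model>"},
        "fast-cheap": {"provider": "anthropic", "model": "<small-model>"},
    },
    "tiers": {"t1": "reasoning-heavy", "t2": "reasoning-heavy", "t3": "capable",
              "t4": "capable", "t5": "fast-cheap"},
    # Per-tier provider selection: override the capability default for one tier.
    "tier_overrides": {"t5": {"provider": "other-provider", "model": "<other-model>"}},
}


def resolve_model(config: dict, tier: str) -> tuple[str, str]:
    """Return (provider, model) for a tier: capability default first, then any per-tier override."""
    target = dict(config["capabilities"][config["tiers"][tier]])  # copy so overrides never mutate defaults
    target.update(config.get("tier_overrides", {}).get(tier, {}))
    return target["provider"], target["model"]


print(resolve_model(CONFIG, "t3"))  # ('anthropic', '<mid-model>')
```

Keeping tiers pointed at capability levels rather than model names means swapping a model touches one entry instead of every tier.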

**LLM provider** — Anthropic is the first implementation. Config supports per-tier provider selection.

**Gateway modification** — Decided against. Agent-teams stays standalone Python; OpenClaw is used only via the runtime adapter.

**Coding agent runtime** — Claude Code is the default T4/T5 runtime. An opt-in `native_teams` flag enables Claude Code's internal parallelism, which is faster but gives less blackboard visibility. Default `false`.

**Agency-agents integration** — Via a git submodule at `agents/`. T1 selects specialists via `config/role_registry.yaml`. The runtime injects the task brief's `agent_personality` field as the system prompt at spawn time.

**Spawn call ownership** — The runner is the single point of contact with the runtime adapter. Tiers write `status=pending` child briefs to the blackboard; the runner's spawn loop detects and spawns them. Gate logic (holding on `gate_pending`) lives in the spawn loop, so agents need no gate plumbing; they only need blackboard read/write access.
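
A single pass of that spawn loop might look like the sketch below. The `briefs` and `events` tables, the `running` status, and the `spawn` callable (standing in for the runtime adapter) are assumptions; only the `status=pending` convention and the hold on `gate_pending` come from the design. Also holding on a manual `gate_paused` is an assumption based on the CLI's pause command.

```python
import sqlite3
from typing import Callable

# Hypothetical blackboard tables; the real blackboard.py schema may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS briefs (id TEXT PRIMARY KEY, run_id TEXT, tier TEXT, status TEXT);
CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY AUTOINCREMENT, run_id TEXT, event_type TEXT);
"""
LATEST_GATE = (
    "SELECT event_type FROM events WHERE run_id = ? AND event_type IN "
    "('gate_pending', 'gate_approved', 'gate_rejected', 'gate_paused', 'gate_resumed') "
    "ORDER BY id DESC LIMIT 1"
)


def run_is_held(conn: sqlite3.Connection, run_id: str) -> bool:
    """Held while the latest gate event is gate_pending (or a manual gate_paused)."""
    row = conn.execute(LATEST_GATE, (run_id,)).fetchone()
    return row is not None and row[0] in ("gate_pending", "gate_paused")


def spawn_pending(conn: sqlite3.Connection, spawn: Callable[[str, str], None]) -> list[str]:
    """One loop pass: hand each pending brief to the runtime unless its run is held at a gate."""
    spawned = []
    pending = conn.execute("SELECT id, run_id, tier FROM briefs WHERE status = 'pending' ORDER BY id").fetchall()
    for brief_id, run_id, tier in pending:
        if run_is_held(conn, run_id):
            continue  # gate logic lives here, so agents need no gate plumbing
        spawn(brief_id, tier)  # stands in for the runtime adapter's spawn call
        conn.execute("UPDATE briefs SET status = 'running' WHERE id = ?", (brief_id,))
        spawned.append(brief_id)
    conn.commit()
    return spawned


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO briefs VALUES ('t4-auth', 'abc123', 't4', 'pending')")
conn.execute("INSERT INTO events (run_id, event_type) VALUES ('abc123', 'gate_pending')")
print(spawn_pending(conn, lambda brief_id, tier: None))  # [] (held at the gate)
conn.execute("INSERT INTO events (run_id, event_type) VALUES ('abc123', 'gate_approved')")
print(spawn_pending(conn, lambda brief_id, tier: None))  # ['t4-auth']
```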

**Gate approval UX** — The `agency approve <run_id>` CLI command is the universal approval path: it writes `gate_approved` directly to the blackboard. The runner only polls the blackboard and does not depend on any specific notification platform. Each notify adapter handles its own inbound response bridge as internal adapter state. Core has no `pending_gates.json` or platform-specific approval logic.

**T3 mesh timeout** — Escalate to the T2 specialist that owns the domain. A timeout means the T3s can't agree on task boundaries, which is a domain boundary problem T2 should fix by re-scoping. If T2 exhausts its retry budget, the normal escalation ladder handles it (T1 → Andrew gate). There is no force-commit fallback.

**T3 mesh mechanics** — Blackboard-based coordination. T3s write draft task lists, read their peers' drafts, reconcile overlaps, and commit a merged plan. No T4 dispatch happens until every T3 in the domain has committed. The runner enforces a timeout (`t3_mesh_timeout_minutes` in config). Chosen over a designated T3 lead or direct messaging because it fits the distributed ownership model and provides a full audit trail for free.
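
The mesh's two mechanical checks, finding overlapping claims between drafts and deciding whether T4 dispatch may begin, can be sketched as pure functions. Drafts and commits would really be blackboard rows; here they are in-memory sets, and `find_overlaps`/`mesh_status` are hypothetical names. The sketch only surfaces overlaps; reconciling them is left to the T3 agents. `timeout_s` would be derived from `t3_mesh_timeout_minutes`.

```python
def find_overlaps(drafts: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each task claimed by more than one T3 lead to its sorted list of claimants."""
    claims: dict[str, list[str]] = {}
    for lead, tasks in drafts.items():
        for task in tasks:
            claims.setdefault(task, []).append(lead)
    return {task: sorted(leads) for task, leads in claims.items() if len(leads) > 1}


def mesh_status(leads: set[str], committed: set[str], elapsed_s: float, timeout_s: float) -> str:
    """'dispatch' only once every T3 in the domain has committed; time out to the owning T2."""
    if leads <= committed:
        return "dispatch"
    if elapsed_s > timeout_s:
        return "escalate_to_t2"  # no force-commit fallback
    return "waiting"


drafts = {
    "lead-a": {"auth-middleware", "queue-client"},
    "lead-b": {"queue-client", "webhook-router"},
}
print(find_overlaps(drafts))  # {'queue-client': ['lead-a', 'lead-b']}
```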

**T1 output schema** — Formal JSON schema defined (2026-03-30). Fields: `run_id`, `goal_anchor`, `complexity`, `retry_budget_multiplier`, `workstreams[]` (`id`, `name`, `domain`, `tier_path`, `parallel_group`, `t2_specialist`, `notes`), `parallelism` (groups + sequence), `self_critique_summary`. Together, `parallel_group` and `sequence` handle inter-workstream dependencies.
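
The T1 output can be mirrored in Python dataclasses, with a cross-check between each workstream's `parallel_group` and the `parallelism` block. The field names come from the schema; the field types and the shape of `parallelism` (a `groups` mapping plus an ordered `sequence` of group ids) are assumptions, and `T1Plan`/`check` are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class Workstream:
    id: str
    name: str
    domain: str
    tier_path: list[str]   # assumed: ordered tier ids, e.g. ["t2", "t3", "t4", "t5"]
    parallel_group: str
    t2_specialist: str
    notes: str = ""


@dataclass
class T1Plan:
    run_id: str
    goal_anchor: str
    complexity: str                 # assumed: a free-form label
    retry_budget_multiplier: float
    workstreams: list[Workstream]
    parallelism: dict               # assumed: {"groups": {group: [ws ids]}, "sequence": [group ids]}
    self_critique_summary: str

    @classmethod
    def from_json(cls, text: str) -> "T1Plan":
        data = json.loads(text)
        # Unexpected keys or missing required keys raise TypeError, a basic shape check.
        data["workstreams"] = [Workstream(**ws) for ws in data["workstreams"]]
        return cls(**data)

    def check(self) -> list[str]:
        """Return inconsistencies between workstreams and the parallelism block."""
        groups = self.parallelism.get("groups", {})
        problems = []
        for ws in self.workstreams:
            if ws.parallel_group not in groups:
                problems.append(f"{ws.id}: unknown parallel_group {ws.parallel_group!r}")
            elif ws.id not in groups[ws.parallel_group]:
                problems.append(f"{ws.id}: missing from group {ws.parallel_group!r}")
        for group in self.parallelism.get("sequence", []):
            if group not in groups:
                problems.append(f"sequence references unknown group {group!r}")
        return problems
```

For a consistent plan, `T1Plan.from_json(text).check()` returns an empty list, which could serve as a cheap pre-gate sanity check before the T1 plan reaches human review.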

**T5 consensus** — T3 aggregates all T5 results into a joint verdict: `pass | partial | fail`. On a split verdict (`partial`), T3 retries only the failed slices and re-runs T5 on those slices. A full `fail` escalates to T2. T3 writes the structured joint verdict to the blackboard; this is what the optional T5 gate surfaces to Andrew.
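
The aggregation rule can be sketched as follows. `T5Result`, `joint_verdict`, and `NEXT_ACTION` are hypothetical, and treating a slice as failed if any verifier on it fails is an assumption; the design only fixes the three outcomes and what T3 does with each.

```python
from dataclasses import dataclass


@dataclass
class T5Result:
    slice_id: str   # the T4 output slice a verifier checked (assumed field)
    passed: bool


# What T3 does next for each joint verdict, per the design.
NEXT_ACTION = {
    "pass": "mark workstream done",
    "partial": "retry failed slices, re-run T5 on them",
    "fail": "escalate to T2",
}


def joint_verdict(results: list[T5Result]) -> tuple[str, list[str]]:
    """Aggregate T5 results into pass | partial | fail, plus the slices that failed."""
    if not results:
        raise ValueError("no T5 results to aggregate")
    all_slices = {r.slice_id for r in results}
    # Assumption: a slice fails if any verifier on it fails.
    failed = sorted({r.slice_id for r in results if not r.passed})
    if not failed:
        return "pass", []
    if len(failed) < len(all_slices):
        return "partial", failed
    return "fail", failed


verdict, retry = joint_verdict([
    T5Result("auth-middleware", True),
    T5Result("queue-client", False),
])
print(verdict, retry, NEXT_ACTION[verdict])  # partial ['queue-client'] retry failed slices, re-run T5 on them
```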

**Path amendment mechanism** — The amending tier writes a `path_amendment` event to the blackboard (structured JSON: `proposed_by`, `reason`, `amendment`). The runner monitors the events table and sends a system event notification to the relevant higher tier. The higher tier is informed, not blocked, and no agent callback plumbing is required. Amendments surface at the next scheduled human gate.
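
Both halves of the mechanism can be sketched briefly: the amending tier writes the event, and the runner's monitor relays anything newer than its cursor without blocking anyone. The table layout, the helper names, the `proposed_by` value format, and the amendment body are hypothetical; only the three payload keys come from the design. Surfacing amendments at the next human gate is not shown.

```python
import json
import sqlite3
from typing import Callable

# Hypothetical events table; the real blackboard.py schema may differ.
SCHEMA = """CREATE TABLE IF NOT EXISTS events (
    id INTEGER PRIMARY KEY AUTOINCREMENT, run_id TEXT, event_type TEXT, payload TEXT)"""


def propose_amendment(conn: sqlite3.Connection, run_id: str, proposed_by: str,
                      reason: str, amendment: dict) -> int:
    """Amending tier: write a structured path_amendment event and carry on (no callback)."""
    payload = json.dumps({"proposed_by": proposed_by, "reason": reason, "amendment": amendment})
    cur = conn.execute(
        "INSERT INTO events (run_id, event_type, payload) VALUES (?, 'path_amendment', ?)",
        (run_id, payload),
    )
    conn.commit()
    return cur.lastrowid


def relay_amendments(conn: sqlite3.Connection, run_id: str, after_id: int,
                     inform: Callable[[dict], None]) -> int:
    """Runner: inform the higher tier of each new amendment; never blocks. Returns the new cursor."""
    rows = conn.execute(
        "SELECT id, payload FROM events WHERE run_id = ? AND event_type = 'path_amendment' "
        "AND id > ? ORDER BY id",
        (run_id, after_id),
    ).fetchall()
    for row_id, payload in rows:
        inform(json.loads(payload))  # stands in for the system event notification
        after_id = row_id
    return after_id


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
propose_amendment(conn, "abc123", "t3:backend-api", "queue-client needs its own workstream",
                  {"split_workstream": "queue-client"})
seen = []
cursor = relay_amendments(conn, "abc123", 0, seen.append)
print(cursor, seen[0]["proposed_by"])  # 1 t3:backend-api
```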

**Failure handling (distributed)** — Distributed ownership confirmed (2026-03-30). `escalation.py` is logic that tiers execute (or that the runner executes on a tier's behalf after a timeout or crash), not a central runner concern. The runner owns only T1 failure and terminal human escalation. See the updated Failure Handling table.

**Run visibility layer** — Added 2026-03-30: a human-readable live log, configurable inspection gates, and `cli/agency.py` inspection/control commands. Designed for debugging and quality evaluation at each tier during early runs. `strict_mode: true` enables all gates. Gates surface tier artifacts plus a "what happens next" summary via `notify_adapter.send()`, which keeps them platform-agnostic. Resolves Q3 (T5 consensus surfaces as a gate event with a human-readable summary). The optional T5 gate lets the operator review the joint verdict before T3 marks the workstream done.