13 KiB
Tiered Agent Team System — Build Spec
Started: 2026-03-15. Status: Pre-build. See agent-teams-design.md for the design doc and decisions log.
Language & Runtime
Python 3.11+. Reasons:
- Agent/AI tooling is Python-first
- Clean type hints + dataclasses for schemas
- Agents can read and modify their own orchestration code
- Runs anywhere — no Node, no OpenClaw dependency
Repository
Standalone repo: git@github.com:coding-with-hans-heinemann/the-agency.git
Separate from the OpenClaw workspace. OpenClaw workspace gets a thin integration layer that calls into it. Core is portable and runnable without OpenClaw.
Directory Structure
agent-teams/
├── core/
│ ├── team_runner.py — run lifecycle, agent spawning
│ ├── blackboard.py — SQLite coordination state
│ ├── task_brief.py — schema + validation
│ └── escalation.py — retry logic, failure routing
│
├── adapters/
│ ├── base/
│ │ ├── llm.py — abstract LLM interface
│ │ ├── vcs.py — abstract VCS interface
│ │ ├── notify.py — abstract notification interface
│ │ └── runtime.py — abstract agent runtime interface
│ ├── llm/
│ │ ├── anthropic.py — Claude via OpenClaw or direct API
│ │ ├── openai.py — GPT / o-series
│ │ └── ollama.py — local models
│ ├── vcs/
│ │ └── github.py
│ ├── notify/
│ │ └── openclaw.py — messages Hans who notifies Andrew
│ └── runtime/
│ ├── openclaw.py — sessions_spawn (general purpose)
│ └── claude_code.py — coding agent runtime (file/git/exec tools)
│
├── agents/ — git submodule: msitarzewski/agency-agents
│ ├── engineering/
│ ├── testing/
│ ├── strategy/
│ └── ... — full agency-agents roster
│
├── prompts/
│ ├── t1_visionary.md — fallback if no agent_personality set
│ ├── t2_architect.md
│ ├── t3_squad_lead.md
│ ├── t4_implementer.md
│ └── t5_verifier.md
│
├── config/
│ ├── team.yaml — example run configuration
│ └── role_registry.yaml — maps (tier, domain) → agent personality file
│
├── runs/ — runtime state, one subdir per run_id
│ └── .gitkeep
│
└── README.md
Blackboard
SQLite. One file per run at runs/<run_id>/blackboard.db.
Tables
runs
CREATE TABLE runs (
run_id TEXT PRIMARY KEY,
goal TEXT NOT NULL,
status TEXT NOT NULL, -- pending | active | review | done | failed
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL
);
workstreams
CREATE TABLE workstreams (
workstream_id TEXT PRIMARY KEY,
run_id TEXT NOT NULL,
name TEXT NOT NULL,
tier INTEGER NOT NULL,
status TEXT NOT NULL, -- pending | active | blocked | done | failed
owner_agent_id TEXT,
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL
);
briefs
CREATE TABLE briefs (
brief_id TEXT PRIMARY KEY,
run_id TEXT NOT NULL,
parent_brief_id TEXT,
workstream_id TEXT,
tier INTEGER NOT NULL,
role TEXT NOT NULL,
status TEXT NOT NULL, -- pending | active | done | failed
payload TEXT NOT NULL, -- full JSON brief
result TEXT, -- JSON result when done
retry_count INTEGER DEFAULT 0,
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL
);
events
CREATE TABLE events (
event_id TEXT PRIMARY KEY,
run_id TEXT NOT NULL,
brief_id TEXT,
kind TEXT NOT NULL, -- spawned | completed | failed | escalated | retried
detail TEXT, -- JSON
created_at TEXT NOT NULL
);
Task Brief Schema
Every brief passed between tiers is a validated JSON object. goal_anchor is immutable — set by T1, copied verbatim into every downstream brief.
{
"brief_id": "uuid",
"run_id": "uuid",
"parent_brief_id": "uuid | null",
"tier": 4,
"role": "implementer",
"goal_anchor": "Original T1 intent — always propagated unchanged",
"workstream": "backend-api",
"task": "Implement POST /webhooks/ingest endpoint",
"acceptance_criteria": [
"Accepts JSON payload",
"Returns 202 on success",
"Writes to queue"
],
"constraints": [
"Use existing queue client in src/queue.py",
"No new dependencies"
],
"context": {
"relevant_files": ["src/routes/webhooks.py", "src/queue.py"],
"interface_contract": "..."
},
"retry_budget": 3,
"retry_count": 0,
"preferred_runtime": "coding_agent",
"agent_personality": "agents/engineering/engineering-code-reviewer.md",
"created_at": "ISO-8601"
}
preferred_runtime is optional. T3 sets it to "coding_agent" when spawning T4/T5 for implementation or verification tasks. Runner falls back to "standard" if the coding agent runtime is not configured.
agent_personality is optional. When set, the runtime adapter reads the file and injects its contents as the system prompt at spawn time. Falls back to the generic tier prompt in prompts/ if not set.
Adapter Interfaces
LLM (adapters/base/llm.py)
class LLMAdapter:
def complete(self, prompt: str, capability: str, context: dict) -> str
def resolve_model(self, capability: str) -> str
# capability: "reasoning-heavy" | "capable" | "fast-cheap"
VCS (adapters/base/vcs.py)
class VCSAdapter:
def create_branch(self, name: str) -> None
def commit(self, files: list[str], message: str) -> str # returns commit sha
def create_pr(self, title: str, body: str, head: str, base: str) -> str # returns pr url
def get_pr_status(self, pr_id: str) -> str # open | merged | closed
Notify (adapters/base/notify.py)
class NotifyAdapter:
def send(self, message: str, context: dict) -> None
Runtime (adapters/base/runtime.py)
class RuntimeAdapter:
def spawn(self, task: str, capability: str, context: dict) -> str # returns agent_id
def get_result(self, agent_id: str, timeout_s: int) -> dict
def kill(self, agent_id: str) -> None
# Two implementations:
# openclaw.py — general purpose, uses sessions_spawn, suits T1/T2/T3
# claude_code.py — coding-specialized, has file/git/exec tools, suits T4/T5
#
# The runner selects runtime based on brief.preferred_runtime:
# "standard" → openclaw.py (default)
# "coding_agent" → claude_code.py (falls back to standard if unavailable)
#
# Both implementations inject brief.agent_personality as the system prompt
# when spawning, if present. Falls back to generic tier prompt otherwise.
# claude_code.py passes the agent file via --system-prompt flag natively
# (agency-agents was designed for Claude Code's agents/ directory).
Run Config (config/team.yaml)
run:
goal: "Build webhook ingestion system with retry logic and DLQ"
repo: "git@github.com:org/repo.git"
base_branch: "main"
adapters:
llm: anthropic
vcs: github
notify: openclaw
runtime: openclaw
models:
provider: anthropic # default provider
capability_map:
reasoning-heavy:
anthropic: claude-opus-4-6
openai: o3
capable:
anthropic: claude-sonnet-4-6
openai: gpt-4o
ollama: llama3.1:70b
fast-cheap:
anthropic: claude-haiku-3-5
openai: gpt-4o-mini
ollama: llama3.2
# optional: override provider per tier
tier_overrides:
t1: { provider: openai, capability: reasoning-heavy }
t4: { provider: ollama, capability: fast-cheap }
runtime:
default: openclaw
coding_agent: claude_code # used for T4/T5 when available; omit to disable
native_teams: false # Claude Code's experimental agent teams — opt-in only
# when true: T3 hands full workstream to Claude Code,
# which fans out internally. faster but less blackboard
# visibility. default: false (explicit T4 spawning)
# tier_runtime_map (optional overrides):
# t1: standard
# t2: standard
# t3: standard
# t4: coding_agent
# t5: coding_agent
retry_defaults:
bad_output: 3
partial: 2
blocked: 0 # always escalate immediately
Role Registry (config/role_registry.yaml)
Maps (tier, domain) → agent personality file. T1 consults this during scope assessment when selecting specialists for each workstream brief. Adding a new specialist means adding one entry here — no core changes.
t1:
default: agents/strategy/nexus-strategy.md
t2:
backend: agents/engineering/engineering-software-architect.md
frontend: agents/engineering/engineering-software-architect.md
infra: agents/engineering/engineering-devops-automator.md
data: agents/engineering/engineering-data-engineer.md
default: agents/engineering/engineering-software-architect.md
t3:
backend: agents/engineering/engineering-senior-developer.md
frontend: agents/engineering/engineering-senior-developer.md
infra: agents/engineering/engineering-sre.md
default: agents/engineering/engineering-senior-developer.md
t4:
frontend: agents/engineering/engineering-frontend-developer.md
backend: agents/engineering/engineering-backend-architect.md
database: agents/engineering/engineering-database-optimizer.md
devops: agents/engineering/engineering-devops-automator.md
mobile: agents/engineering/engineering-mobile-app-builder.md
ai: agents/engineering/engineering-ai-engineer.md
security: agents/engineering/engineering-security-engineer.md
docs: agents/engineering/engineering-technical-writer.md
default: agents/engineering/engineering-senior-developer.md
t5:
code: agents/engineering/engineering-code-reviewer.md
integration: agents/testing/testing-reality-checker.md
api: agents/testing/testing-api-tester.md
performance: agents/testing/testing-performance-benchmarker.md
security: agents/engineering/engineering-security-engineer.md
default: agents/engineering/engineering-code-reviewer.md
Key Flows
1. Run Kickoff
User → Hans → team_runner.start(goal, config)
→ generate run_id
→ init blackboard (create runs/<run_id>/blackboard.db)
→ build T1 brief (goal_anchor = goal, retry_budget from config)
→ spawn T1 via runtime adapter
→ await T1 workplan
2. T1 Scope Assessment
T1 receives brief
→ assess complexity → decide depth
→ identify workstreams
→ set retry_budget multiplier per workstream (1x simple, 2x complex)
→ emit N workstream briefs for T2 (or T3 if shallow)
→ write workplan to blackboard
→ team_runner spawns T2s in parallel
3. T4 Retry Loop (escalation.py)
spawn T4 with brief
→ receive result
→ classify: bad_output | blocked | partial | success
blocked:
→ log event(escalated)
→ pass to T3 immediately
bad_output, retries_remaining:
→ amend brief with failure context, increment retry_count
→ re-spawn T4
→ log event(retried)
bad_output, retries_exhausted:
→ log event(escalated)
→ pass to T3
partial:
→ write salvageable parts to blackboard
→ re-task remainder with new brief
success:
→ write result to blackboard
→ log event(completed)
→ notify T3
4. Review Gate
T1 completes integration
→ vcs_adapter.create_pr(
title="[agent-teams] <run_id>: <goal summary>",
body="<workplan + workstream summaries>",
head="integration/<run_id>",
base="main"
)
→ notify_adapter.send(
"Run <run_id> complete. PR ready for review: <pr_url>",
context={run_id, goal, workstreams, pr_url}
)
→ blackboard: update run status → "review"
→ halt — no auto-merge
Build Order
git submodule add https://github.com/msitarzewski/agency-agents agents/— pull the talent poolconfig/role_registry.yaml— map tier+domain → agent personality filescore/task_brief.py— schema + validation (everything depends on this)core/blackboard.py— SQLite store, all table definitionsadapters/base/*— all four abstract interfacesadapters/llm/anthropic.py— first LLM implementationcore/escalation.py— retry + failure routing logicadapters/runtime/openclaw.py— wire up sessions_spawn + personality injectionadapters/runtime/claude_code.py— coding agent runtime, personality via --system-promptcore/team_runner.py— full run lifecycle, runtime + personality selectionprompts/— fallback tier prompts (used when no agent_personality set)adapters/vcs/github.py— PR creation + branch managementadapters/notify/openclaw.py— Hans notificationconfig/team.yaml— example configREADME.md— how to run, how to add adapters, how to extend the roster
Out of Scope (Phase 2)
- Cost accounting per tier + run rollup
- Parallel workstream progress dashboard
- Additional adapter implementations (GitLab, Slack, OpenAI, Ollama)
- Persistent standing teams
- Web UI for run monitoring