cw-hans/the-agency

Hans Heinemann 1ed7023c08 docs: update design — dynamic dispatch, distributed ownership, orchestration patterns

2026-03-16 16:13:33 -04:00

11 KiB

Tiered Agent Team System — Design Document

Started: 2026-03-14. Last updated: 2026-03-16.

Overview

A dynamic, hierarchical multi-agent system for software pipelines. Teams assemble on demand, execute, then disband. Inspired by a blend of Hollywood production (dynamic assembly), consulting firms (structured deliverables, hierarchical synthesis), and two-pizza teams (small autonomous squads, clear domain ownership).

Core Principles

1. Tiers represent cognitive modes, not org chart levels. Each tier thinks differently — strategy, design, coordination, execution, verification. Adding a tier only makes sense if it introduces a genuinely different mode of reasoning.

2. Depth is proportional to complexity. Not every task needs every tier. A config change might only need T3→T4→T5 (verification always runs). A new product needs the full stack. T1 assesses scope and prescribes the path; the path is never pre-configured.

3. Goal anchoring at every level. T1's original intent is embedded in every agent's context — not just passed to T2 and forgotten. Every agent knows the end goal even if they only own a slice.

4. Artifacts, not summaries. Tiers pass structured specs downward (JSON task briefs), not paraphrased prose. Meaning is preserved; format is compressed.

5. Verification is mandatory. T5 always runs, and nothing returns to T1 unverified. T5 is a required quality gate: work must function, and function well, before it surfaces upward.

6. Provider agnostic. The system makes no assumptions about which LLM provider or platform is in use. Tiers reference capability levels, not specific models. All external dependencies are swappable adapters.

7. Specialist talent pool. Tiers define structure and responsibility. Agent personalities define domain expertise. The two are separate — the same tier can be filled by different specialists depending on the workstream domain.
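Principles 3 and 4 can be illustrated with a minimal sketch of a task brief. Only retry_budget and agent_personality are field names taken from this document; the other fields, the class name, and the child() helper are assumptions made for illustration, not the actual schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TaskBrief:
    goal: str                    # T1's original intent, carried verbatim to every tier
    tier: str                    # tier receiving this brief, e.g. "T4"
    task: str                    # the slice of work this agent owns (hypothetical field)
    retry_budget: int = 1        # named in this document
    agent_personality: str = ""  # named in this document; injected at spawn time

    def child(self, tier: str, task: str, agent_personality: str = "") -> "TaskBrief":
        """Derive the brief for a lower tier; the goal is copied, never paraphrased."""
        return TaskBrief(
            goal=self.goal,
            tier=tier,
            task=task,
            retry_budget=self.retry_budget,
            agent_personality=agent_personality,
        )

    def to_json(self) -> str:
        """Serialize to the structured form that is passed downward."""
        return json.dumps(asdict(self), sort_keys=True)
```

Because child() copies goal unchanged, a T4 brief derived through several tiers still carries T1's exact wording, which is the point of goal anchoring.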

Tier Definitions

Tier	Role	Owns	Capability Level
T1	Visionary	Goal, constraints, dispatch plan, final acceptance	reasoning-heavy
T2	Architect	System design, interface contracts, workstream boundaries	reasoning-heavy / capable
T3	Squad Lead	Workstream delivery, T4 management, quality gate	capable
T4	Implementer	Atomic task execution (one file, one function, one test)	fast-cheap
T5	Verifier	Validation of T4 output — correctness + intent alignment	capable

T5 runs within T3's scope, not above it. T3 commissions T5 verification of its T4 outputs. T5 is a quality gate, not a management layer.

Capability levels map to actual models per provider in config — the core system never references a specific model name.
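A minimal sketch of how that mapping might be resolved. The provider names and model ids below are placeholders, not real identifiers, and the config shape is an assumption. T2 is listed as "reasoning-heavy / capable"; this sketch arbitrarily picks the first.

```python
# Tier -> capability level, following the Tier Definitions table.
TIER_CAPABILITY = {
    "T1": "reasoning-heavy",
    "T2": "reasoning-heavy",  # assumption: the table allows "capable" as well
    "T3": "capable",
    "T4": "fast-cheap",
    "T5": "capable",
}

# Provider -> capability level -> model id. Every id here is a placeholder.
MODEL_CONFIG = {
    "provider-a": {
        "reasoning-heavy": "provider-a-large",
        "capable": "provider-a-medium",
        "fast-cheap": "provider-a-small",
    },
    "provider-b": {
        "reasoning-heavy": "provider-b-large",
        "capable": "provider-b-medium",
        "fast-cheap": "provider-b-small",
    },
}


def resolve_model(tier, provider, config=MODEL_CONFIG):
    """Look up the configured model for a tier; core code only sees capability levels."""
    capability = TIER_CAPABILITY[tier]
    try:
        return config[provider][capability]
    except KeyError as exc:
        raise LookupError(
            f"no model configured for provider {provider!r}, capability {capability!r}"
        ) from exc
```

Mixing providers across tiers then amounts to calling resolve_model with a different provider per tier.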

Dispatch Model

T1 Owns the Plan

T1 is not just a decomposer — it is the dispatch planner. Its output declares:

Workstreams — the decomposed units of work
Tier path per workstream — which tiers to engage (e.g. [T2, T3, T4, T5] or [T4, T5] for trivial tasks)
Parallelism — which workstreams are independent and can run concurrently

T1 does not prescribe how each tier operates internally. That is the tier's own concern.
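The three parts of T1's output could be represented roughly as follows. This is a sketch under assumptions: the Workstream class, the depends_on field (used here to express parallelism), and validate_plan are hypothetical names, not the project's actual output schema.

```python
from dataclasses import dataclass, field

TIER_ORDER = ("T1", "T2", "T3", "T4", "T5")


@dataclass
class Workstream:
    name: str
    tier_path: list                                  # e.g. ["T2", "T3", "T4", "T5"]
    depends_on: list = field(default_factory=list)   # empty means it can run concurrently


def validate_plan(workstreams):
    """Return a list of problems; an empty list means the plan is acceptable."""
    problems = []
    names = {w.name for w in workstreams}
    for w in workstreams:
        unknown = [t for t in w.tier_path if t not in TIER_ORDER]
        if unknown:
            problems.append(f"{w.name}: unknown tiers {unknown}")
            continue
        # Verification is mandatory, so every path must end at T5.
        if not w.tier_path or w.tier_path[-1] != "T5":
            problems.append(f"{w.name}: path must end with T5")
        # Tiers must appear once each, in top-down order.
        order = [TIER_ORDER.index(t) for t in w.tier_path]
        if order != sorted(set(order)):
            problems.append(f"{w.name}: tiers must be in top-down order without repeats")
        for dep in w.depends_on:
            if dep not in names:
                problems.append(f"{w.name}: depends on unknown workstream {dep!r}")
    return problems


def independent(workstreams):
    """Workstreams with no dependencies, which may start in parallel."""
    return [w.name for w in workstreams if not w.depends_on]
```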

Each Tier Owns the Layer Below

Control flow is distributed, not centralised:

T1 manages its T2s
T2 manages its T3s
T3 manages its T4s — including dependency graph, parallelism, and T5 commissioning
The runner is thin: bootstrap T1, monitor the blackboard, handle final result and notifications

This means orchestration logic lives in agent prompts and output schemas — not in Python runner code. Adding a new execution pattern means updating a prompt, not the runner.

Tradeoff: Debugging is harder. When something fails mid-chain, you read blackboard logs rather than step through central runner code. This is a tooling problem to solve (good blackboard inspection), not a design flaw to avoid.

Dynamic Paths

Tiers can propose path amendments mid-run (e.g. T3 discovers scope that warrants a T2 pass it didn't get). Every amendment is logged to the blackboard, and higher tiers are notified. The notification is informational: no approval is required, but no tier changes the plan silently.

Orchestration Patterns Per Tier

Different tiers suit different internal coordination patterns. These are baked into the runner's tier-handling logic and the tier prompts — not prescribed by T1.

Tier	Pattern	Rationale
T1	Single agent	Must be authoritative; no committee
T2	Group chat / round-table	Specialist architects (security, perf, data, API) debate and reach consensus before committing to a design
T3	Light mesh	Peer coordination to negotiate task boundaries and avoid T4 conflicts before dispatch
T4	Swarm + pipeline hybrid	Independent tasks run as swarm; dependent tasks pipeline (T4-A's output feeds T4-B). T3 declares which is which.
T5	Parallel fan-out + consensus	Each T5 reviews its slice independently, then compares notes for a joint verdict — catches both artifact bugs and integration issues
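The T4 row (swarm for independent tasks, pipeline for dependent ones) can be sketched as a topological batching of T3's dependency graph. The function name and the input shape are assumptions; graphlib is part of the Python standard library from 3.9 onward.

```python
from graphlib import TopologicalSorter


def plan_batches(deps):
    """Group T4 tasks into ordered batches.

    deps maps each task name to the set of tasks it depends on.
    Tasks inside one batch are independent (swarm); batches run
    in sequence (pipeline). Raises graphlib.CycleError on a cycle.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

For example, with deps = {"A": set(), "B": {"A"}, "C": set(), "D": {"B", "C"}}, tasks A and C form the first swarm batch, B follows once A is done, and D runs last.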

Horizontal Scaling Within Tiers

T1 (1 agent — authoritative)
├── T2: Backend Architect  ─┐
├── T2: Frontend Architect  ├─ round-table consensus
└── T2: Infra Architect    ─┘
    │
    └── T3: Squad Lead (per workstream)  ─┐
            │                             ├─ light mesh across T3s
            ├── T4: Worker A  ─┐          │
            ├── T4: Worker B  ─┼─ swarm / pipeline (T3 decides)
            └── T4: Worker C  ─┘
                    │
                    └── T5: Verifier(s) — fan-out + consensus

Shared State

For software pipelines, the repo is the primary blackboard:

T4 workers commit to feature branches
T3 leads review and merge to workstream branches
T2 architects own integration branches
T1 does final integration and acceptance

Supplemented by a SQLite coordination store per run tracking:

In-flight workstreams and their current execution plans
Handoff artifacts and tier status
Retry counts and escalation history
Path amendments (proposed, by whom, timestamp)
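A minimal sketch of what the per-run coordination store might look like, using the standard sqlite3 module. The table and column names are assumptions chosen to mirror the list above; handoff artifacts and escalation history are omitted for brevity.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS workstreams (
    name       TEXT PRIMARY KEY,
    tier_path  TEXT NOT NULL,                 -- e.g. 'T3,T4,T5'
    status     TEXT NOT NULL DEFAULT 'pending',
    retries    INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS amendments (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    workstream  TEXT NOT NULL REFERENCES workstreams(name),
    proposed_by TEXT NOT NULL,                -- tier that proposed the change
    detail      TEXT NOT NULL,
    proposed_at TEXT NOT NULL                 -- ISO 8601 UTC timestamp
);
"""


def open_store(path=":memory:"):
    """Open (or create) the coordination store for one run."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def log_amendment(conn, workstream, proposed_by, detail):
    """Record a path amendment so that no plan change is silent."""
    stamp = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO amendments (workstream, proposed_by, detail, proposed_at) "
            "VALUES (?, ?, ?, ?)",
            (workstream, proposed_by, detail, stamp),
        )
```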

Failure Handling

Failure	Handler	Action
T4 bad output	T3	Retry T4 with corrected brief (up to retry_budget)
T4 blocked	T3	Escalate immediately — no retries
T4 partial output	T3	Salvage good parts, re-task remainder
T3 workstream stuck	T2	Re-scope or split the workstream
T2 design wrong	T1	Re-plan; may discard workstream and restart
Repeated escalation	Surface to user	Block until human unblocks

Retry limits prevent infinite loops. Escalation path is always upward, never sideways.

T1 sets a retry budget multiplier during scope assessment (1x simple, 2x complex). Retry budget is a field on the task brief — not hardcoded in the runner.
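The T4 rows of the failure table, together with the retry budget, might translate into a decision function like the one below. The enum, the function names, and the returned action strings are hypothetical. The sketch also assumes that partial output counts against the same budget as bad output, which the table does not state.

```python
from enum import Enum


class T4Failure(Enum):
    BAD_OUTPUT = "bad_output"
    BLOCKED = "blocked"
    PARTIAL = "partial_output"


def effective_budget(base_retries, multiplier):
    """Apply T1's multiplier (1x simple, 2x complex) to a base retry count."""
    return base_retries * multiplier


def t3_decide(failure, attempts_so_far, retry_budget):
    """Return the action T3 takes for a failed T4 task."""
    if failure is T4Failure.BLOCKED:
        return "escalate_to_t2"            # blocked tasks are never retried
    if attempts_so_far >= retry_budget:
        return "escalate_to_t2"            # budget exhausted: go upward, never sideways
    if failure is T4Failure.PARTIAL:
        return "salvage_and_retask"
    return "retry_with_corrected_brief"
```

Keeping retry_budget as an argument rather than a constant matches the decision that the budget travels on the task brief instead of living in the runner.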

Agent Talent Pool

The system builds on agency-agents — a library of 50+ pre-built specialist personalities, each with deep domain expertise, quality standards, and specific deliverables.

Division of responsibility:

Our system provides: orchestration, tier structure, task briefs, retries, verification gates, shared state
Agency-agents provides: the specialist knowledge each agent brings to its role

T1 selects the right specialist from the roster when building workstream briefs. The specialist's personality is injected as the system prompt at spawn time.

Default tier-to-specialist mapping for software pipelines:

Tier	Domain	Agent
T1	Strategy	nexus-strategy
T2	Backend	software-architect
T2	Infra	devops-automator
T2	Data	data-engineer
T3	Backend	senior-developer
T3	Reliability	sre
T4	Frontend	frontend-developer
T4	Backend	backend-architect
T4	Database	database-optimizer
T4	DevOps	devops-automator
T4	Mobile	mobile-app-builder
T4	AI/ML	ai-engineer
T4	Security	security-engineer
T4	Docs	technical-writer
T5	Code review	code-reviewer
T5	Integration	testing-reality-checker
T5	API	testing-api-tester
T5	Performance	testing-performance-benchmarker
T5	Security	security-engineer

The roster is not fixed — T1 can select any agent from the library based on workstream needs.
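As a rough illustration, specialist selection could be a lookup keyed on tier and domain, with overrides for non-default choices. The dictionary below copies a few rows from the table above; the function name and the overrides parameter are assumptions, and the real selection goes through config/role_registry.yaml.

```python
# A subset of the default tier-to-specialist mapping from the table above.
DEFAULT_ROSTER = {
    ("T1", "Strategy"): "nexus-strategy",
    ("T2", "Backend"): "software-architect",
    ("T3", "Backend"): "senior-developer",
    ("T4", "Backend"): "backend-architect",
    ("T4", "Docs"): "technical-writer",
    ("T5", "API"): "testing-api-tester",
    ("T5", "Security"): "security-engineer",
}


def select_specialist(tier, domain, overrides=None):
    """Return the agent name for a tier and domain; overrides win over defaults."""
    roster = {**DEFAULT_ROSTER, **(overrides or {})}
    try:
        return roster[(tier, domain)]
    except KeyError:
        raise LookupError(f"no specialist registered for {tier} / {domain}") from None
```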

Adapter Layers

Everything external is a swappable adapter. Core logic never imports from adapters directly — always through an interface.

Core (platform-agnostic)
├── team_runner      — thin bootstrap: spawn T1, monitor blackboard, handle result
├── blackboard       — SQLite coordination state
├── task_brief       — schema + validation
└── escalation       — retry logic, failure routing

Adapters (swappable)
├── llm/             — anthropic (now), openai, ollama, any API
├── notify/          — openclaw (now), slack, email, webhook...
├── vcs/             — github (now), gitlab, gitea, bare git...
└── runtime/
    ├── standard     — openclaw sessions_spawn (T1/T2/T3)
    └── coding_agent — claude_code (T4/T5 default), codex, aider...

Swapping providers means writing a new adapter file — nothing in core changes.
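One way to keep core code independent of any adapter is a structural interface. The sketch below uses typing.Protocol; the LLMAdapter interface, its complete() method, the EchoAdapter stand-in, and the registry are all hypothetical and do not reflect the project's real adapter API.

```python
from typing import Protocol


class LLMAdapter(Protocol):
    """Interface that core code depends on; adapters satisfy it structurally."""

    def complete(self, system_prompt: str, user_prompt: str) -> str: ...


class EchoAdapter:
    """Stand-in adapter for illustration; a real one would call a provider API."""

    def complete(self, system_prompt: str, user_prompt: str) -> str:
        return f"[{system_prompt}] {user_prompt}"


# Adapter name -> factory. Adding a provider means adding an entry here.
ADAPTERS = {"echo": EchoAdapter}


def get_llm(name: str) -> LLMAdapter:
    """Build the adapter selected in config."""
    try:
        return ADAPTERS[name]()
    except KeyError:
        raise LookupError(f"unknown llm adapter {name!r}") from None


def run_agent(llm: LLMAdapter, personality: str, brief: str) -> str:
    """Core-side call: the personality is the system prompt, the brief is the input."""
    return llm.complete(personality, brief)
```

run_agent never imports a concrete adapter, so swapping providers only touches the registry and the new adapter file.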

T4 and T5 default to the coding agent runtime when it is available. If it is not configured, they fall back gracefully to the standard runtime.
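The runtime fallback could be as small as the following sketch. The function name and the idea of passing the configured runtimes as a set are assumptions; the runtime names come from the adapter tree above.

```python
CODING_AGENT_TIERS = {"T4", "T5"}


def pick_runtime(tier, configured_runtimes):
    """Prefer the coding agent runtime for T4/T5, else use the standard runtime."""
    if tier in CODING_AGENT_TIERS and "coding_agent" in configured_runtimes:
        return "coding_agent"
    return "standard"
```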

Decisions Log

T1 dynamic dispatch — T1 assesses scope and prescribes tier path and workstream parallelism. It does not prescribe internal tier coordination patterns.

Distributed ownership — Each tier owns the layer below it. Runner is thin. Tradeoff: distributed control makes the system extensible but debugging requires good blackboard tooling, not central runner traces.

T5 always mandatory — No skipping verification. Things should work and work well before surfacing to T1.

T3 owns T4 and T5 — T3 manages its T4s (dependency graph, swarm vs pipeline, parallelism) and commissions T5 verification of T4 outputs. Runner does not orchestrate T4/T5 centrally.

Orchestration patterns — Baked into tier prompts and runner tier-handling logic, not prescribed by T1. T2: round-table. T3: light mesh. T4: swarm+pipeline. T5: fan-out+consensus.

Output / review — Nothing merges to main without Andrew's explicit approval. T1 opens a PR and surfaces it to Andrew. Notification is dual: Hans messages Andrew directly + PR opened on VCS. Merge is gated on human sign-off.

Platform agnosticism — Core is provider and platform agnostic. Capability levels (reasoning-heavy, capable, fast-cheap) map to models in config. Mixing providers across tiers is supported.

LLM provider — Anthropic first implementation. Config supports per-tier provider selection.

Gateway modification — Decided against. Agent-teams stays standalone Python. OpenClaw used via runtime adapter only.

Coding agent runtime — Claude Code is default T4/T5 runtime. Opt-in native_teams flag available for internal Claude Code parallelism — faster but less blackboard visibility. Default false.

Agency-agents integration — Via git submodule at agents/. T1 selects specialists via config/role_registry.yaml. agent_personality field on task brief; runtime injects as system prompt at spawn time.
