feat(gh-monitor): design doc and build spec for GitHub PR polling monitor

2026-03-15 17:12:24 -04:00
parent d35ea90e9f
commit 9f23deb309
3 changed files with 366 additions and 0 deletions

26
README.md Normal file

@@ -0,0 +1,26 @@
# hans-tools
Personal tooling for Hans — GitHub PR monitoring, automation hooks, and utilities.
## Tools
| Tool | Description | Status |
|---|---|---|
| `gh-monitor` | Polls GitHub for PR activity and notifies Hans via OpenClaw | Spec review |
## Structure
```
hans-tools/
├── gh-monitor/          # GitHub PR polling monitor
│   ├── design.md        # Design doc
│   ├── buildspec.md     # Build spec
│   └── ...              # Implementation (pending review)
└── README.md
```
## Principles
- Every tool has a design doc and build spec reviewed before implementation
- No secrets in the repo — all credentials via environment variables
- Tools run as cron jobs or daemons managed by OpenClaw

182
gh-monitor/buildspec.md Normal file

@@ -0,0 +1,182 @@
# gh-monitor — Build Spec
**Status:** Pending Andrew's review
**Depends on:** design.md approved
---
## Directory Layout
```
gh-monitor/
├── design.md
├── buildspec.md
├── poll.py              # Main entry point
├── config/
│   └── watched.yaml     # Repos and filter rules
├── state/
│   ├── last_seen.json   # Event cursor (gitignored)
│   └── errors.log       # Error log (gitignored)
├── requirements.txt
└── .gitignore
```
---
## Build Order
### STEP 1 — .gitignore + requirements.txt
`.gitignore`:
```
state/last_seen.json
state/errors.log
__pycache__/
*.pyc
.env
```
`requirements.txt`:
```
PyYAML>=6.0
```
(All other deps: stdlib + gh CLI)
### STEP 2 — config/watched.yaml
Starter config watching the-agency repo:
```yaml
repos:
  - owner: coding-with-hans-heinemann
    repo: the-agency
    notify_on:
      - review_submitted
      - review_comment
      - issue_comment
      - pr_closed
```
### STEP 3 — poll.py: config + state loader
Functions:
- `load_config(path) -> dict`
Reads watched.yaml. Raises on missing file.
- `load_state(path) -> dict`
Reads last_seen.json. Returns `{}` if file doesn't exist (first run).
- `save_state(state, path)`
Atomically writes last_seen.json (write to .tmp, rename).
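A minimal sketch of the two state helpers, assuming the state file is plain JSON as shown in design.md. The temp-file naming via `tempfile.mkstemp` and the directory creation are illustrative choices, not part of the spec; `load_config` is left out because it depends on PyYAML.

```python
import json
import os
import tempfile


def load_state(path):
    """Return the saved state dict, or {} when no state file exists (first run)."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}


def save_state(state, path):
    """Write to a temp file in the same directory, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)  # assumption: create state/ if missing
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh, indent=2, sort_keys=True)
        # os.replace is atomic when source and target are on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Keeping the temp file in the same directory as the target is what makes the rename atomic; a temp file on another filesystem would turn the rename into a copy.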
### STEP 4 — poll.py: GitHub API client
Functions:
- `gh_api(endpoint) -> list | dict`
Runs `gh api --paginate <endpoint>` as subprocess.
Returns parsed JSON. Raises `GHAPIError` on non-zero exit.
- `get_open_prs(owner, repo) -> list[dict]`
Calls `/repos/{owner}/{repo}/pulls?state=open`.
Returns list of PR dicts (number, title, html_url).
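One way the client could look, as a sketch rather than the final implementation. The `run` parameter is a hypothetical addition so the functions can be exercised without `gh` installed; the spec's signatures do not include it. The sketch also assumes `gh api --paginate` yields a single JSON document for the endpoints used here.

```python
import json
import subprocess


class GHAPIError(RuntimeError):
    """Raised when `gh api` exits with a non-zero status."""


def gh_api(endpoint, run=subprocess.run):
    # Pass an argv list (no shell) so the endpoint is never shell-interpreted.
    proc = run(
        ["gh", "api", "--paginate", endpoint],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        raise GHAPIError(
            f"gh api {endpoint} exited {proc.returncode}: {proc.stderr.strip()}"
        )
    return json.loads(proc.stdout)


def get_open_prs(owner, repo, run=subprocess.run):
    prs = gh_api(f"/repos/{owner}/{repo}/pulls?state=open", run=run)
    # Keep only the fields the spec names.
    return [
        {"number": p["number"], "title": p["title"], "html_url": p["html_url"]}
        for p in prs
    ]
```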
### STEP 5 — poll.py: event fetchers
Functions (each returns list of event dicts with `event_type`, `created_at`,
`actor`, `body`, `url`):
- `get_reviews(owner, repo, pr_number) -> list[dict]`
`/repos/{owner}/{repo}/pulls/{pr_number}/reviews`
- `get_review_comments(owner, repo, pr_number) -> list[dict]`
`/repos/{owner}/{repo}/pulls/{pr_number}/comments`
- `get_issue_comments(owner, repo, pr_number) -> list[dict]`
`/repos/{owner}/{repo}/issues/{pr_number}/comments`
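The three fetchers can share one normalizer. The sketch below shows it together with `get_reviews`; the other two fetchers would follow the same pattern. `normalize_event` and the extra `fetch` parameter (a stand-in for `gh_api` from STEP 4) are hypothetical names introduced for illustration. Note that review objects in the GitHub REST API carry `submitted_at` rather than `created_at`, so the normalizer falls back to it.

```python
def normalize_event(raw, event_type):
    """Map one raw API item to the common event shape used by later steps."""
    user = raw.get("user") or {}
    return {
        "event_type": event_type,
        # Comments carry `created_at`; reviews carry `submitted_at` instead.
        "created_at": raw.get("created_at") or raw.get("submitted_at"),
        "actor": user.get("login", "unknown"),
        "body": raw.get("body") or "",
        "url": raw.get("html_url", ""),
    }


def get_reviews(owner, repo, pr_number, fetch):
    # `fetch` stands in for gh_api; it is passed in to keep this sketch standalone.
    items = fetch(f"/repos/{owner}/{repo}/pulls/{pr_number}/reviews")
    return [normalize_event(item, "review_submitted") for item in items]
```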
### STEP 6 — poll.py: event diffing
Function:
- `new_events_since(events, cursor_ts) -> list[dict]`
Filters events to those with `created_at > cursor_ts`.
Returns sorted by `created_at` ascending.
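A possible implementation of the filter, assuming timestamps arrive as ISO 8601 strings with a trailing `Z`, as in the state example in design.md. `parse_ts` is a hypothetical helper, and skipping events without a timestamp is an assumption of this sketch.

```python
from datetime import datetime


def parse_ts(ts):
    # Map the trailing Z to an explicit UTC offset so that
    # datetime.fromisoformat accepts it on older Python versions too.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def new_events_since(events, cursor_ts):
    cutoff = parse_ts(cursor_ts)
    fresh = [
        e for e in events
        if e.get("created_at") and parse_ts(e["created_at"]) > cutoff
    ]
    return sorted(fresh, key=lambda e: parse_ts(e["created_at"]))
```

Because the comparison is strict, an event whose timestamp equals the cursor is treated as already seen.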
### STEP 7 — poll.py: notification sender
Functions:
- `notify(text)`
Runs `openclaw system event --text "<text>" --mode now` as subprocess.
Logs warning and continues on non-zero exit (best-effort).
- `format_notification(repo_slug, pr, event) -> str`
Builds the notification string:
`[gh-monitor] PR #N "title" — <actor> <action>:\n"<body[:200]>"\n<url>`
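A sketch of both functions under a few assumptions: the `ACTIONS` phrases are invented for illustration (the spec only shows `<action>` as a placeholder), the `run` parameter exists only for testing, and the boolean return value of `notify` is an addition. The `openclaw` arguments are copied from the command line above.

```python
import subprocess
import sys

# Hypothetical mapping from event_type to the phrase used in the message.
ACTIONS = {
    "review_submitted": "submitted a review",
    "review_comment": "left a review comment",
    "issue_comment": "commented",
    "pr_closed": "closed the PR",
}


def format_notification(repo_slug, pr, event):
    # repo_slug matches the spec's signature; the template itself does not use it.
    action = ACTIONS.get(event["event_type"], event["event_type"])
    body = (event.get("body") or "")[:200]
    return (
        f'[gh-monitor] PR #{pr["number"]} "{pr["title"]}" \u2014 '
        f'{event["actor"]} {action}:\n"{body}"\n{event["url"]}'
    )


def notify(text, run=subprocess.run):
    # An argv list avoids shell quoting problems with quotes in comment bodies.
    proc = run(
        ["openclaw", "system", "event", "--text", text, "--mode", "now"],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        # Best-effort: warn and carry on rather than aborting the poll.
        print(f"warning: openclaw exited {proc.returncode}", file=sys.stderr)
    return proc.returncode == 0
```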
### STEP 8 — poll.py: error tracking
Module-level logic:
- `log_error(repo_slug, error, state)`
Appends the error to `state/errors.log`.
Increments the `state["<repo_slug>"]["consecutive_errors"]` counter.
If the counter is >= 3 and no alert has been sent for this error streak, fires one `notify()` alert.
- The counter resets to 0 after a successful poll of that repo.
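The error counter could be tracked as sketched below. The `alerted` flag, the `mark_success` helper, and the `notify` and `log_path` parameters are assumptions made so the sketch is self-contained; the spec only requires that the alert fires once per streak.

```python
from datetime import datetime, timezone

ALERT_THRESHOLD = 3


def log_error(repo_slug, error, state, notify, log_path="state/errors.log"):
    # Append a timestamped line to the error log.
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(f"{stamp} {repo_slug} {error}\n")

    slot = state.setdefault(repo_slug, {})
    slot["consecutive_errors"] = slot.get("consecutive_errors", 0) + 1

    # Fire at most one alert per streak of failures.
    if slot["consecutive_errors"] >= ALERT_THRESHOLD and not slot.get("alerted"):
        count = slot["consecutive_errors"]
        notify(f"[gh-monitor] {repo_slug}: {count} consecutive poll failures")
        slot["alerted"] = True


def mark_success(repo_slug, state):
    # Hypothetical helper: reset the streak after a successful poll.
    slot = state.setdefault(repo_slug, {})
    slot["consecutive_errors"] = 0
    slot["alerted"] = False
```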
### STEP 9 — poll.py: main poll loop
Functions:
- `poll_repo(repo_cfg, state) -> dict`
1. Get cursor from state (or now if first run).
2. Fetch open PRs.
3. For each PR: fetch reviews, review_comments, issue_comments.
4. Filter to new events since cursor.
5. Fire notify() for each new event.
6. Update cursor to max(created_at) of processed events (or now if none).
7. Return updated state slice.
- `main()`
Loads config + state.
Calls poll_repo() for each repo in watched.yaml.
Saves state.
Exits 0.
Entry point: `if __name__ == "__main__": main()`
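The seven steps of `poll_repo` can be sketched as follows. The `list_prs`, `fetch_events`, `notify`, and `now` parameters are injected stand-ins for the functions from STEPs 4, 5, and 7, so the signature differs from the spec. Two further assumptions: all timestamps share the `YYYY-MM-DDTHH:MM:SSZ` format, so plain string comparison orders them correctly, and "now" is captured once at the start of the poll.

```python
from datetime import datetime, timezone


def utc_now():
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def poll_repo(repo_cfg, state, list_prs, fetch_events, notify, now=utc_now):
    owner, repo = repo_cfg["owner"], repo_cfg["repo"]
    slug = f"{owner}/{repo}"
    started = now()
    # Step 1: cursor from state, or "now" on first run (no backfill).
    cursor = state.get(slug, {}).get("last_event_at") or started

    # Steps 2-4: collect events newer than the cursor across all open PRs.
    fresh = []
    for pr in list_prs(owner, repo):
        for ev in fetch_events(owner, repo, pr["number"]):
            if ev["created_at"] > cursor:
                fresh.append((pr, ev))
    fresh.sort(key=lambda pair: pair[1]["created_at"])

    # Step 5: one notification per new event, oldest first.
    for pr, ev in fresh:
        notify(slug, pr, ev)

    # Steps 6-7: advance the cursor and return the updated state slice.
    new_cursor = fresh[-1][1]["created_at"] if fresh else started
    return {slug: {**state.get(slug, {}), "last_event_at": new_cursor}}
```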
### STEP 10 — OpenClaw cron job
Register via OpenClaw cron API:
```json
{
  "name": "gh-monitor",
  "schedule": { "kind": "every", "everyMs": 300000 },
  "payload": {
    "kind": "systemEvent",
    "text": "Run GitHub PR monitor: cd ~/Projects/hans-tools/gh-monitor && python3 poll.py"
  },
  "sessionTarget": "main"
}
```
Note: this is a systemEvent (not an agentTurn), so it injects into the main session
and Hans handles it inline. If this proves noisy, switch to an agentTurn in an
isolated session.
---
## Testing Plan
Manual test steps (no automated tests for v1):
1. `python3 poll.py` with no state file → creates state, no notifications (first-run cursor set to now)
2. Post a comment on PR #1 → run poll.py → notification fires
3. Run poll.py again immediately → no duplicate notification (cursor advanced)
4. Break `gh` binary path temporarily → error logged, no crash
5. After 3 failed cycles → single alert fires
---
## What Is NOT in This Build
- Automated test suite
- Filtering by comment author
- Digest/batching mode
- Any write operations to GitHub
- Anything touching main branch

158
gh-monitor/design.md Normal file

@@ -0,0 +1,158 @@
# gh-monitor — Design Doc
**Status:** Pending Andrew's review
**Repo:** coding-with-hans-heinemann/hans-tools
**Author:** Hans Heinemann
---
## What It Does
Polls the GitHub API for activity on watched repositories and fires OpenClaw
system events to wake Hans when action is needed. Hans can then read the PR,
respond to comments, push fixes, or request changes — all without a public
webhook endpoint.
---
## Scope
Initial scope: PRs only. Issues, CI, and deployments out of scope for v1.
Events monitored:
- New review submitted (approved, changes requested, commented)
- New PR review comment posted
- New PR issue comment posted
- PR merged or closed
Events NOT monitored in v1:
- CI/check status
- Issue activity
- Dependabot alerts
---
## Architecture
```
cron (every 5 min)
└── gh-monitor/poll.py
    ├── reads config/watched.yaml (repos + filter rules)
    ├── reads state/last_seen.json (per-repo event cursor)
    ├── calls GitHub API via gh CLI (no extra credentials)
    ├── diffs against last_seen
    ├── for each new event:
    │   └── fires openclaw system event (text summary)
    └── writes updated last_seen.json
```
Hans receives OpenClaw system event → session wakes → Hans reads + acts.
---
## Config — watched.yaml
```yaml
repos:
  - owner: coding-with-hans-heinemann
    repo: the-agency
    notify_on:
      - review_submitted
      - review_comment
      - issue_comment
      - pr_closed
```
Multiple repos supported. Per-repo filter rules.
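To illustrate how the per-repo filter rules might be applied, here is a sketch that validates one parsed `repos` entry and filters events by `notify_on`. It works on the already-parsed dict, so it needs no YAML library. `validate_repo_cfg` and `wanted` are hypothetical names, and treating `notify_on` as required is an assumption.

```python
KNOWN_EVENTS = {"review_submitted", "review_comment", "issue_comment", "pr_closed"}


def validate_repo_cfg(repo_cfg):
    """Raise ValueError if a watched.yaml entry is incomplete or misspelled."""
    for key in ("owner", "repo", "notify_on"):
        if key not in repo_cfg:
            raise ValueError(f"watched.yaml entry is missing {key!r}")
    unknown = set(repo_cfg["notify_on"]) - KNOWN_EVENTS
    if unknown:
        raise ValueError(f"unknown notify_on values: {sorted(unknown)}")


def wanted(events, repo_cfg):
    """Keep only events whose type the repo has opted into."""
    allowed = set(repo_cfg["notify_on"])
    return [e for e in events if e["event_type"] in allowed]
```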
---
## State — last_seen.json
Tracks the timestamp of the last processed event per repo. On each poll,
only events newer than this cursor are processed. Prevents duplicate alerts.
```json
{
  "coding-with-hans-heinemann/the-agency": {
    "last_event_at": "2026-03-15T17:00:00Z"
  }
}
```
On first run (no state file), cursor is set to now — no backfill of old events.
---
## Notification Format
OpenClaw system event text:
```
[gh-monitor] PR #1 "feat: Phase 2" — Andrew left a review comment:
"The escalation retry logic looks good but can you add a test for the blocked case?"
https://github.com/coding-with-hans-heinemann/the-agency/pull/1#discussion_r12345
```
One event per notification. If multiple events arrive in one poll cycle, they
fire as separate system events in sequence.
---
## GitHub API Access
Uses `gh` CLI (already installed, already authenticated as hansheinemann).
No new credentials needed. All API calls go through `gh api`.
Endpoints used:
- `GET /repos/{owner}/{repo}/pulls/{pull_number}/reviews`
- `GET /repos/{owner}/{repo}/pulls/{pull_number}/comments`
- `GET /repos/{owner}/{repo}/issues/{pull_number}/comments`
- `GET /repos/{owner}/{repo}/pulls` (list open PRs)
Rate limit: 5,000 requests/hour for authenticated requests. Each poll costs one
request to list a repo's open PRs plus three per open PR, so at 5-min poll
intervals (12 polls/hour) across a handful of repos, usage stays far below the limit.
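To put a rough number on the estimate: with the four endpoints listed above, one poll of a repo costs one request for the PR list plus three per open PR. The sketch below turns that into requests per hour. The example workload (three repos with five open PRs each) is invented, and extra requests from pagination are ignored.

```python
def requests_per_hour(open_prs_per_repo, poll_interval_s=300):
    """Estimate hourly API requests; one list entry per watched repo."""
    polls_per_hour = 3600 // poll_interval_s
    # 1 request to list open PRs + 3 event endpoints per open PR.
    per_poll = sum(1 + 3 * n for n in open_prs_per_repo)
    return polls_per_hour * per_poll


print(requests_per_hour([5, 5, 5]))  # 576, well under the 5,000/hour limit
```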
---
## Cron Schedule
Every 5 minutes via OpenClaw cron:
```
{ "kind": "every", "everyMs": 300000 }
```
Payload: systemEvent → injects wake text into main session.
Can be paused/resumed via OpenClaw cron management without touching the code.
---
## Error Handling
- GitHub API errors: log to `state/errors.log`, skip that repo for this cycle
- Malformed API response: log and skip
- Missing state file: create fresh with cursor = now
- `gh` CLI not found: exit with error message
Errors do NOT fire system events (avoid alert fatigue from transient API blips).
If errors persist for 3 or more consecutive cycles, fire one alert to Hans.
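The "`gh` CLI not found" case can be checked up front, before any polling starts. This is a sketch only: `require_gh` is a hypothetical helper, the message wording is invented, and the `which` parameter exists only to make the check testable.

```python
import shutil
import sys


def require_gh(which=shutil.which):
    """Return the path to gh, or exit with an error message if it is missing."""
    path = which("gh")
    if path is None:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("gh-monitor: `gh` CLI not found on PATH")
    return path
```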
---
## Security
- No webhook endpoint — nothing exposed to the internet
- No secrets stored in the repo — `gh` CLI handles auth via its own keychain
- State files excluded from git via .gitignore
- Read-only GitHub API access needed (no write scopes required for polling)
---
## Out of Scope (v1)
- Filtering by PR author
- Filtering by comment author
- Digest mode (batch multiple events into one notification)
- Slack/email delivery (OpenClaw system event only)
- CI/check status monitoring