# token-monitor

Modular LLM API quota and usage visibility tool. Extracts rate-limit and usage data from all configured LLM providers before failures happen.

**Why it exists:** team-vigilio hit its 7-day rate limit (9+ days of 429s). api-ateam ran out of credit mid-session. We kept flying blind. This tool surfaces quota health before the failure.

## Quick start

```bash
node monitor.js           # run now — human-readable output + log
node analyze.js           # analyze accumulated logs — burn rates, rotation
~/os/token-status.sh      # what Vigilio's wake prompt sees (automated path)
```

## Usage

```bash
node monitor.js                        # human-readable summary + log to file
node monitor.js --json                 # JSON output + log to file
node monitor.js --summary              # human-readable only (no log)
node monitor.js --provider team-nadja  # single provider
node monitor.js --no-log               # suppress log file write
```

## Analysis

```bash
node analyze.js                        # full report
node analyze.js --burn-rate            # burn rate per account
node analyze.js --weekly               # weekly budget reconstruction
node analyze.js --stagger              # reset schedule (next 48h)
node analyze.js --rotation             # rotation recommendation
node analyze.js --json                 # JSON output (all sections)
node analyze.js --provider team-nadja  # filter to one provider
node analyze.js --prune [--dry-run]    # archive and prune logs > 30 days
```

**Burn Rate** — delta analysis of 7d utilization over time, projected exhaustion at current rate.

**Reset Schedule** — providers resetting within the next 48 hours, sorted ascending by time to reset.

**Weekly Reconstruction** — peak and average 7d utilization per provider per ISO week. Shows exhaustion events.

**Rotation Recommendation** — ranked provider list by headroom, deprioritizing maxed/rejected/invalid-key accounts.

**Underspend Alerts** — active accounts with ≥ 40% of 5h window unused and < 2h until reset.

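
The burn-rate projection and the underspend rule can be illustrated with a minimal sketch. This is not the code in `analyze.js`: the function names `projectExhaustion` and `isUnderspent` and the sample shape (`ts` in milliseconds, plus the utilization fields) are assumptions made for illustration, and the projection is a simple two-point linear estimate.

```javascript
// Hypothetical sketch, not taken from analyze.js.
// A sample is { ts: <ms since epoch>, utilization_7d: <0.0 to 1.0> }.

// Estimate hours until 7d utilization reaches 1.0, using the first and last sample.
function projectExhaustion(samples) {
  if (samples.length < 2) return null;
  const first = samples[0];
  const last = samples[samples.length - 1];
  const hours = (last.ts - first.ts) / 3_600_000;
  if (hours <= 0) return null;
  const ratePerHour = (last.utilization_7d - first.utilization_7d) / hours;
  // A flat or falling rate (for example after a window reset) never exhausts.
  if (ratePerHour <= 0) return { ratePerHour, hoursToExhaustion: Infinity };
  return {
    ratePerHour,
    hoursToExhaustion: (1 - last.utilization_7d) / ratePerHour,
  };
}

// Underspend rule from the text: >= 40% of the 5h window unused and < 2h to reset.
function isUnderspent(provider) {
  const unused = 1 - provider.utilization_5h;
  return unused >= 0.4 && provider.reset_in_seconds < 2 * 3600;
}

const samples = [
  { ts: Date.parse("2026-04-04T10:00:00Z"), utilization_7d: 0.30 },
  { ts: Date.parse("2026-04-04T16:00:00Z"), utilization_7d: 0.39 },
];
const projection = projectExhaustion(samples);
console.log(projection);
console.log(isUnderspent({ utilization_5h: 0.52, reset_in_seconds: 5400 }));
```

With these made-up samples the 7d figure climbs about 1.5 percentage points per hour, so the remaining 61% would last roughly 40 hours at that pace.
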
## Example output

```
Token Monitor — 2026-04-04 16:59 UTC
════════════════════════════════════════════════════════════
  team-vigilio   [CRITICAL] MAXED — 7d: 100% | resets in 23h 0m
  team-ludo      [UNKNOWN]  Invalid API key (401)
  team-molto     [OK]       5h: 32% | 7d: 45% | resets in 4h 0m
  team-nadja     [OK]       5h: 52% | 7d: 39% | resets in 3h 0m
  team-buio      [UNKNOWN]  Invalid API key (401)
  shelley-proxy  [OK]       tokens: 4,800,000/4,800,000 | cost: $0.000013/call
  api-ateam      [UNKNOWN]  Anthropic API does not expose billing/quota via REST
────────────────────────────────────────────────────────────
Overall: 1 CRITICAL, 3 OK, 3 UNKNOWN
```

## JSON output schema

```json
{
  "timestamp": "2026-04-04T16:59:50.000Z",
  "providers": {
    "team-vigilio": {
      "type": "teams-direct",
      "status": "rejected",
      "utilization_5h": 0.94,
      "utilization_7d": 1.0,
      "representative_claim": "seven_day",
      "reset_timestamp": 1743800000,
      "reset_in_seconds": 82800,
      "organization_id": "1d7653ad-...",
      "severity": "critical",
      "probe_latency_ms": 210
    },
    "shelley-proxy": {
      "type": "shelley-proxy",
      "status": "ok",
      "tokens_remaining": 4800000,
      "tokens_limit": 4800000,
      "requests_remaining": 20000,
      "requests_limit": 20000,
      "tokens_reset": "2026-04-04T17:59:50Z",
      "cost_per_call_usd": 0.000013,
      "severity": "ok",
      "probe_latency_ms": 180
    },
    "api-ateam": {
      "type": "api-direct",
      "status": "no_billing_data",
      "message": "Anthropic API does not expose billing/quota via REST.",
      "severity": "unknown",
      "probe_latency_ms": 0
    }
  }
}
```

## Provider support

| Provider | Type | Data available |
|----------|------|---------------|
| team-vigilio, team-ludo, team-molto, team-nadja, team-buio | Anthropic Teams (direct) | 5h/7d utilization (0–100%), status, reset countdown, severity |
| shelley-proxy | Shelley/exe.dev proxy | Token headroom, request headroom, per-call USD cost |
| api-ateam | Anthropic API (pay-per-use) | Key validity only — no billing API exists |
| google-gemini | Gemini API (free tier) | Quota violation detail, retry delay, key validity |
| xai-face, xai-amy, xai-murdock, xai-ba, xai-vigilio | xAI/Grok | Request/token remaining counts, rate limit status |

## Severity levels

| Level | Condition |
|-------|-----------|
| `critical` | Teams provider rejected (rate limited / budget exhausted) |
| `warning` | Teams 7d utilization > 85%, or 5h utilization > 70% |
| `ok` | Healthy |
| `unknown` | Invalid key (401), no billing data, or probe error |

## Rate-limit header schemas

**Teams direct** (oat01 keys — all `team-*` providers):

Anthropic Teams uses the **unified** schema. Headers are present on every response (200 and 429).

- `anthropic-ratelimit-unified-5h-utilization` — 5-hour window utilization (0.0–1.0)
- `anthropic-ratelimit-unified-7d-utilization` — 7-day budget utilization (0.0–1.0)
- `anthropic-ratelimit-unified-status` — `allowed` | `rejected`
- `anthropic-ratelimit-unified-representative-claim` — which window is binding (`five_hour` | `seven_day`)
- `anthropic-ratelimit-unified-reset` — Unix timestamp when binding window resets

**Shelley proxy** (exe.dev gateway):

Uses classic absolute-count headers plus an exe.dev-injected cost header:

- `Anthropic-Ratelimit-Tokens-Remaining` / `Limit` — absolute token counts
- `Anthropic-Ratelimit-Requests-Remaining` / `Limit` — request counts
- `Exedev-Gateway-Cost` — USD cost per call (unique to exe.dev gateway)

## Log files

Each run appends to `~/.logs/token-monitor/YYYY-MM-DD.jsonl`:

```bash
# View today's log
cat ~/.logs/token-monitor/$(date +%Y-%m-%d).jsonl | \
  python3 -c "import sys,json; [print(json.loads(l)['ts'], json.loads(l)['providers']['team-vigilio']['severity']) for l in sys.stdin]"

# Check when team-vigilio was last healthy
# (-H forces the "file:" prefix even when only one log file matches the glob)
grep -H '"team-vigilio"' ~/.logs/token-monitor/*.jsonl | \
  python3 -c "import sys,json; [print(l[:40]) for l in sys.stdin if json.loads(l.split(':',1)[1])['providers']['team-vigilio']['severity']=='ok']"
```

## Architecture

```
monitor.js              — CLI entrypoint, orchestrates probes
analyze.js              — analysis CLI (burn rates, weekly, stagger, rotation)
providers/
  index.js              — reads ~/.pi/agent/models.json, returns typed provider list
  anthropic-teams.js    — unified schema parser (oat01 keys, all team-* providers)
  anthropic-api.js      — pay-per-use (api03 keys) — reports "no billing data"
  shelley-proxy.js      — classic schema + Exedev-Gateway-Cost header
  gemini.js             — Gemini API (free tier, quota via response body)
  xai.js                — x.ai/Grok (rate-limit headers)
logger.js               — JSONL log to ~/.logs/token-monitor/
report.js               — human-readable summary + severity logic
test.js                 — test suite (run: node test.js)
docs/
  analyze.md            — analysis CLI full reference
```

## How to add a new provider

1. Add provider to `~/.pi/agent/models.json` with `"api": "anthropic-messages"`
2. If it uses Teams unified schema and its name starts with `team-`: picked up automatically
3. For a custom schema: create `providers/yourprovider.js`, implement `probeYourProvider()`, classify it in `providers/index.js`, add probe dispatch in `monitor.js`

## Probe behavior

The tool makes one minimal API call per provider to extract headers:

- Model: `claude-haiku-4-5-20251001` (cheapest available)
- Max tokens: 1 (minimizes cost)
- Rate-limit headers are returned on **every** response (200 and 429) for Teams providers
- A 429 from a maxed provider is expected and treated as valid quota data

**Stealth:** Read-only, one call per run, no hammering. Run at most once per session.

## Related

- `~/os/token-status.sh` — wake-prompt integration (calls monitor.js, formats for beat.sh)
- `~/projects/provider-check/` — predecessor (liveness only, no quota depth)
- `~/.pi/agent/models.json` — provider configuration source
- Forgejo issue: trentuna/token-monitor#1

## Wake integration

`~/os/token-status.sh` is the automated interface. It runs `monitor.js --json` and formats the output into a compact summary block for injection into Vigilio's wake prompt via `beat.sh`.

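
The formatting step might look like the following sketch. `token-status.sh` itself is not reproduced in this README, so the function name `formatSummary`, the column padding, the `allowed` status value on healthy seats, and the ranking rule (prefer the seat whose tighter window has the most headroom) are all assumptions. Only the field names come from the JSON output schema.

```javascript
// Hypothetical sketch of the formatting step; token-status.sh may differ.
// Input follows the documented `monitor.js --json` schema.
function formatSummary(report) {
  const teams = Object.entries(report.providers)
    .filter(([, p]) => p.type === "teams-direct");
  const hasData = (p) => typeof p.utilization_7d === "number";
  const pct = (u) => `${Math.round(u * 100)}%`;
  const hours = (s) => `${Math.floor(s / 3600)}h`;

  const lines = teams.map(([name, p]) => {
    const label = name.padEnd(14);
    if (!hasData(p)) return `  ${label}? no quota data`;
    const tail = `7d:${pct(p.utilization_7d)} resets ${hours(p.reset_in_seconds)}`;
    return p.status === "rejected"
      ? `  ${label}✗ MAXED ${tail}`
      : `  ${label}✓ 5h:${pct(p.utilization_5h)} ${tail}`;
  });

  // Assumption: rank healthy seats by their tighter window (lower is better).
  const healthy = teams
    .filter(([, p]) => hasData(p) && p.status !== "rejected")
    .sort(([, a], [, b]) =>
      Math.max(a.utilization_5h, a.utilization_7d) -
      Math.max(b.utilization_5h, b.utilization_7d));
  const maxed = teams
    .filter(([, p]) => p.status === "rejected")
    .map(([name]) => name);

  const use = healthy.length ? healthy[0][0] : "none";
  const avoid = maxed.length ? ` | avoid ${maxed.join(", ")}` : "";
  return [
    "## Token Economics",
    `Anthropic Teams (${teams.length} seats):`,
    ...lines,
    `→ Current recommendation: use ${use}${avoid}`,
  ].join("\n");
}

// Illustrative input; "allowed" as the healthy status value is an assumption.
const report = {
  providers: {
    "team-vigilio": { type: "teams-direct", status: "rejected",
      utilization_5h: 0.94, utilization_7d: 1.0, reset_in_seconds: 82800 },
    "team-molto": { type: "teams-direct", status: "allowed",
      utilization_5h: 0.32, utilization_7d: 0.45, reset_in_seconds: 14400 },
    "shelley-proxy": { type: "shelley-proxy", status: "ok" },
  },
};
const summary = formatSummary(report);
console.log(summary);
```
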
```bash
# Manual invocation (same as what the wake prompt sees)
~/os/token-status.sh

# Output format:
## Token Economics
Anthropic Teams (5 seats):
  team-vigilio  ✗ MAXED 7d:100% resets 23h
  team-molto    ✓ 5h:32% 7d:45% resets 4h
  ...
→ Current recommendation: use team-molto | avoid team-vigilio
```

- File location: `~/os/token-status.sh`
- Called by: `~/os/beat.sh` (Vigilio wake script)
- Uses: `monitor.js --json` with 20-minute cache guard (won't double-probe within a session)
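
The 20-minute cache guard could work along these lines. The real guard lives in `token-status.sh` and is not shown here, so the cache path, the helper names `readFreshCache` and `getReport`, and the mtime-based freshness check are all assumptions.

```javascript
// Hypothetical sketch of a 20-minute cache guard; not the actual token-status.sh logic.
const fs = require("fs");
const os = require("os");
const path = require("path");

const MAX_AGE_MS = 20 * 60 * 1000;

// Return the cached JSON text if the file exists and is younger than maxAgeMs, else null.
function readFreshCache(cachePath, maxAgeMs = MAX_AGE_MS, now = Date.now()) {
  try {
    const { mtimeMs } = fs.statSync(cachePath);
    if (now - mtimeMs < maxAgeMs) return fs.readFileSync(cachePath, "utf8");
  } catch (err) {
    if (err.code !== "ENOENT") throw err; // a missing cache just means "probe again"
  }
  return null;
}

// Probe only when the cache is stale or missing, then refresh the cache.
function getReport(cachePath, probe) {
  const cached = readFreshCache(cachePath);
  if (cached !== null) return JSON.parse(cached);
  const report = probe(); // e.g. run `node monitor.js --json` and parse its stdout
  fs.writeFileSync(cachePath, JSON.stringify(report));
  return report;
}

// Demo with a stand-in probe and a throwaway cache file in the temp directory.
const cachePath = path.join(os.tmpdir(), `token-monitor-cache-${process.pid}.json`);
fs.rmSync(cachePath, { force: true });
let probes = 0;
const fakeProbe = () => ({ probeNumber: ++probes });
const firstReport = getReport(cachePath, fakeProbe);  // cache miss: probes
const secondReport = getReport(cachePath, fakeProbe); // cache hit: no second probe
console.log(probes);
fs.unlinkSync(cachePath);
```

Keying freshness on the file's modification time keeps the guard stateless: any process that can see the cache file can decide whether to probe, without a separate timestamp store.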