# token-monitor Modular LLM API quota and usage visibility tool. Extracts rate-limit and usage data from all configured LLM providers before failures happen. **Why it exists:** team-vigilio hit its 7-day rate limit (9+ days of 429s). api-ateam ran out of credit mid-session. We kept flying blind. This tool surfaces quota health before the failure. ## Usage ```bash node monitor.js # human-readable summary + log to file node monitor.js --json # JSON output + log to file node monitor.js --summary # human-readable only (no log) node monitor.js --provider team-nadja # single provider node monitor.js --no-log # suppress log file write ``` ## Example output ``` Token Monitor — 2026-04-04 16:59 UTC ════════════════════════════════════════════════════════════ team-vigilio [CRITICAL] MAXED — 7d: 100% | resets in 23h 0m team-ludo [UNKNOWN] Invalid API key (401) team-molto [OK] 5h: 32% | 7d: 45% | resets in 4h 0m team-nadja [OK] 5h: 52% | 7d: 39% | resets in 3h 0m team-buio [UNKNOWN] Invalid API key (401) shelley-proxy [OK] tokens: 4,800,000/4,800,000 | cost: $0.000013/call api-ateam [UNKNOWN] Anthropic API does not expose billing/quota via REST ──────────────────────────────────────────────────────────── Overall: 1 CRITICAL, 3 OK, 3 UNKNOWN ``` ## JSON output schema ```json { "timestamp": "2026-04-04T16:59:50.000Z", "providers": { "team-vigilio": { "type": "teams-direct", "status": "rejected", "utilization_5h": 0.94, "utilization_7d": 1.0, "representative_claim": "seven_day", "reset_timestamp": 1743800000, "reset_in_seconds": 82800, "organization_id": "1d7653ad-...", "severity": "critical", "probe_latency_ms": 210 }, "shelley-proxy": { "type": "shelley-proxy", "status": "ok", "tokens_remaining": 4800000, "tokens_limit": 4800000, "requests_remaining": 20000, "requests_limit": 20000, "tokens_reset": "2026-04-04T17:59:50Z", "cost_per_call_usd": 0.000013, "severity": "ok", "probe_latency_ms": 180 }, "api-ateam": { "type": "api-direct", "status": "no_billing_data", "message": "Anthropic API does not expose billing/quota via REST.", "severity": "unknown", "probe_latency_ms": 0 } } } ``` ## Provider support | Provider | Type | Data available | |----------|------|---------------| | team-vigilio, team-ludo, team-molto, team-nadja, team-buio | Anthropic Teams (direct) | 5h/7d utilization (0–100%), status, reset countdown, severity | | shelley-proxy | Shelley/exe.dev proxy | Token headroom, request headroom, per-call USD cost | | api-ateam | Anthropic API (pay-per-use) | Key validity only — no billing API exists | ## Severity levels | Level | Condition | |-------|-----------| | `critical` | Teams provider rejected (rate limited / budget exhausted) | | `warning` | Teams 7d utilization > 85%, or 5h utilization > 70% | | `ok` | Healthy | | `unknown` | Invalid key (401), no billing data, or probe error | ## Rate-limit header schemas **Teams direct** (oat01 keys — all `team-*` providers): Anthropic Teams uses the **unified** schema. Headers are present on every response (200 and 429). - `anthropic-ratelimit-unified-5h-utilization` — 5-hour window utilization (0.0–1.0) - `anthropic-ratelimit-unified-7d-utilization` — 7-day budget utilization (0.0–1.0) - `anthropic-ratelimit-unified-status` — `allowed` | `rejected` - `anthropic-ratelimit-unified-representative-claim` — which window is binding (`five_hour` | `seven_day`) - `anthropic-ratelimit-unified-reset` — Unix timestamp when binding window resets **Shelley proxy** (exe.dev gateway): Uses classic absolute-count headers plus an exe.dev-injected cost header: - `Anthropic-Ratelimit-Tokens-Remaining` / `Limit` — absolute token counts - `Anthropic-Ratelimit-Requests-Remaining` / `Limit` — request counts - `Exedev-Gateway-Cost` — USD cost per call (unique to exe.dev gateway) ## Log files Each run appends to `~/.logs/token-monitor/YYYY-MM-DD.jsonl`: ```bash # View today's log cat ~/.logs/token-monitor/$(date +%Y-%m-%d).jsonl | \ python3 -c "import sys,json; [print(json.loads(l)['ts'], json.loads(l)['providers']['team-vigilio']['severity']) for l in sys.stdin]" # Check when team-vigilio was last healthy grep '"team-vigilio"' ~/.logs/token-monitor/*.jsonl | \ python3 -c "import sys,json; [print(l[:40]) for l in sys.stdin if json.loads(l.split(':',1)[1])['providers']['team-vigilio']['severity']=='ok']" ``` ## Architecture ``` monitor.js — CLI entrypoint, orchestrates probes providers/ index.js — reads ~/.pi/agent/models.json, returns typed provider list anthropic-teams.js — unified schema parser (oat01 keys, all team-* providers) anthropic-api.js — pay-per-use (api03 keys) — reports "no billing data" shelley-proxy.js — classic schema + Exedev-Gateway-Cost header logger.js — JSONL log to ~/.logs/token-monitor/ report.js — human-readable summary + severity logic test.js — test suite (run: node test.js) ``` ## How to add a new provider 1. Add provider to `~/.pi/agent/models.json` with `"api": "anthropic-messages"` 2. If it uses Teams unified schema and its name starts with `team-`: picked up automatically 3. For a custom schema: create `providers/yourprovider.js`, implement `probeYourProvider()`, classify it in `providers/index.js`, add probe dispatch in `monitor.js` ## Probe behavior The tool makes one minimal API call per provider to extract headers: - Model: `claude-haiku-4-5-20251001` (cheapest available) - Max tokens: 1 (minimizes cost) - Rate-limit headers are returned on **every** response (200 and 429) for Teams providers - A 429 from a maxed provider is expected and treated as valid quota data **Stealth:** Read-only, one call per run, no hammering. Run at most once per session. ## Related - `~/projects/provider-check/` — predecessor (liveness only, no quota depth) - `~/.pi/agent/models.json` — provider configuration source - Forgejo issue: trentuna/a-team#91