token-monitor
Modular LLM API quota and usage visibility tool. Extracts rate-limit and usage data from all configured LLM providers before failures happen.
Why it exists: team-vigilio hit its 7-day rate limit (9+ days of 429s). api-ateam ran out of credit mid-session. We kept flying blind. This tool surfaces quota health before the failure.
Quick start
node monitor.js # run now — human-readable output + log
node analyze.js # analyze accumulated logs — burn rates, rotation
~/os/token-status.sh # what Vigilio's wake prompt sees (automated path)
Usage
node monitor.js # human-readable summary + log to file
node monitor.js --json # JSON output + log to file
node monitor.js --summary # human-readable only (no log)
node monitor.js --provider team-nadja # single provider
node monitor.js --no-log # suppress log file write
Analysis
node analyze.js # full report
node analyze.js --burn-rate # burn rate per account
node analyze.js --weekly # weekly budget reconstruction
node analyze.js --stagger # reset schedule (next 48h)
node analyze.js --rotation # rotation recommendation
node analyze.js --json # JSON output (all sections)
node analyze.js --provider team-nadja # filter to one provider
node analyze.js --prune [--dry-run] # archive and prune logs > 30 days
Burn Rate — delta analysis of 7d utilization over time, projected exhaustion at current rate.
Reset Schedule — providers resetting within the next 48 hours, sorted ascending by time to reset.
Weekly Reconstruction — peak and average 7d utilization per provider per ISO week. Shows exhaustion events.
Rotation Recommendation — ranked provider list by headroom, deprioritizing maxed/rejected/invalid-key accounts.
Underspend Alerts — active accounts with ≥ 40% of 5h window unused and < 2h until reset.
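The burn-rate and underspend rules above can be sketched roughly as follows. This is an illustration only, not the code in analyze.js: the function names, the sample shape ({ ts, utilization_7d } with ts in epoch milliseconds), and the straight-line projection between first and last sample are all assumptions.

```javascript
// Hypothetical helpers; analyze.js may compute these differently.

const MS_PER_HOUR = 3600 * 1000;

// Linear burn rate between the first and last sample, in utilization per hour.
// Assumed sample shape: { ts: <epoch ms>, utilization_7d: <0.0-1.0> }.
function burnRatePerHour(samples) {
  if (samples.length < 2) return null;
  const first = samples[0];
  const last = samples[samples.length - 1];
  const hours = (last.ts - first.ts) / MS_PER_HOUR;
  if (hours <= 0) return null;
  return (last.utilization_7d - first.utilization_7d) / hours;
}

// Hours until utilization reaches 1.0 at the current rate, or null if not burning.
function hoursToExhaustion(samples) {
  const rate = burnRatePerHour(samples);
  if (rate === null || rate <= 0) return null;
  const current = samples[samples.length - 1].utilization_7d;
  return Math.max(0, (1 - current) / rate);
}

// Underspend rule from the text: >= 40% of the 5h window unused and < 2h to reset.
function isUnderspent(utilization5h, resetInSeconds) {
  return 1 - utilization5h >= 0.4 && resetInSeconds < 2 * 3600;
}

const samples = [
  { ts: 0, utilization_7d: 0.5 },
  { ts: 2 * MS_PER_HOUR, utilization_7d: 0.75 },
];
console.log(burnRatePerHour(samples));   // 0.125
console.log(hoursToExhaustion(samples)); // 2
console.log(isUnderspent(0.5, 3600));    // true
```

A two-point slope is the simplest possible estimate; a real implementation would probably need to handle the utilization dropping when a window resets.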
Example output
Token Monitor — 2026-04-04 16:59 UTC
════════════════════════════════════════════════════════════
team-vigilio [CRITICAL] MAXED — 7d: 100% | resets in 23h 0m
team-ludo [UNKNOWN] Invalid API key (401)
team-molto [OK] 5h: 32% | 7d: 45% | resets in 4h 0m
team-nadja [OK] 5h: 52% | 7d: 39% | resets in 3h 0m
team-buio [UNKNOWN] Invalid API key (401)
shelley-proxy [OK] tokens: 4,800,000/4,800,000 | cost: $0.000013/call
api-ateam [UNKNOWN] Anthropic API does not expose billing/quota via REST
────────────────────────────────────────────────────────────
Overall: 1 CRITICAL, 3 OK, 3 UNKNOWN
JSON output schema
{
"timestamp": "2026-04-04T16:59:50.000Z",
"providers": {
"team-vigilio": {
"type": "teams-direct",
"status": "rejected",
"utilization_5h": 0.94,
"utilization_7d": 1.0,
"representative_claim": "seven_day",
"reset_timestamp": 1743800000,
"reset_in_seconds": 82800,
"organization_id": "1d7653ad-...",
"severity": "critical",
"probe_latency_ms": 210
},
"shelley-proxy": {
"type": "shelley-proxy",
"status": "ok",
"tokens_remaining": 4800000,
"tokens_limit": 4800000,
"requests_remaining": 20000,
"requests_limit": 20000,
"tokens_reset": "2026-04-04T17:59:50Z",
"cost_per_call_usd": 0.000013,
"severity": "ok",
"probe_latency_ms": 180
},
"api-ateam": {
"type": "api-direct",
"status": "no_billing_data",
"message": "Anthropic API does not expose billing/quota via REST.",
"severity": "unknown",
"probe_latency_ms": 0
}
}
}
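To show how this schema might be consumed, here is a sketch that ranks providers by remaining 7d headroom, in the spirit of the rotation recommendation. The function name, the ranking rule, and the 'allowed' status value for healthy Teams providers are assumptions; recommend.js and analyze.js may weigh things differently.

```javascript
// Hypothetical consumer of `node monitor.js --json` output.
// Ranks providers that report utilization_7d by remaining headroom,
// pushing rejected providers to the end of the list.
function rankByHeadroom(report) {
  return Object.entries(report.providers)
    .filter(([, p]) => typeof p.utilization_7d === 'number')
    .map(([name, p]) => ({
      name,
      rejected: p.status === 'rejected',
      headroom: 1 - p.utilization_7d,
    }))
    .sort((a, b) => (a.rejected - b.rejected) || (b.headroom - a.headroom))
    .map((p) => p.name);
}

const report = {
  providers: {
    'team-vigilio': { status: 'rejected', utilization_7d: 1.0 },
    'team-molto': { status: 'allowed', utilization_7d: 0.45 }, // status value assumed
    'team-nadja': { status: 'allowed', utilization_7d: 0.39 }, // status value assumed
    'api-ateam': { status: 'no_billing_data' },                // no utilization: skipped
  },
};
console.log(rankByHeadroom(report));
// [ 'team-nadja', 'team-molto', 'team-vigilio' ]
```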
Provider support
| Provider | Type | Data available |
|---|---|---|
| team-vigilio, team-ludo, team-molto, team-nadja, team-buio | Anthropic Teams (direct) | 5h/7d utilization (0–100%), status, reset countdown, severity |
| shelley-proxy | Shelley/exe.dev proxy | Token headroom, request headroom, per-call USD cost |
| api-ateam | Anthropic API (pay-per-use) | Key validity only — no billing API exists |
| google-gemini | Gemini API (free tier) | Quota violation detail, retry delay, key validity |
| xai-face, xai-amy, xai-murdock, xai-ba, xai-vigilio | xAI/Grok | Request/token remaining counts, rate limit status |
Severity levels
| Level | Condition |
|---|---|
| critical | Teams provider rejected (rate limited / budget exhausted) |
| warning | Teams 7d utilization > 85%, or 5h utilization > 70% |
| ok | Healthy |
| unknown | Invalid key (401), no billing data, or probe error |
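The severity table could be expressed as a small classifier along these lines. This is a sketch, not the logic in report.js: the function name and the 'invalid_key' and 'error' status strings are assumptions (only 'rejected', 'ok', and 'no_billing_data' appear in the JSON schema above).

```javascript
// Hypothetical classifier mirroring the severity table; report.js holds the real logic.
// `p` is one provider entry using field names from the JSON output schema.
function classifySeverity(p) {
  // Invalid key, missing billing data, or probe error: nothing to judge.
  if (p.status === 'invalid_key' || p.status === 'no_billing_data' || p.status === 'error') {
    return 'unknown';
  }
  // Rate limited or budget exhausted.
  if (p.status === 'rejected') return 'critical';
  // Thresholds from the table; missing fields compare as false.
  if (p.utilization_7d > 0.85 || p.utilization_5h > 0.70) return 'warning';
  return 'ok';
}

console.log(classifySeverity({ status: 'rejected', utilization_7d: 1.0 }));                      // critical
console.log(classifySeverity({ status: 'allowed', utilization_5h: 0.32, utilization_7d: 0.45 })); // ok
```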
Rate-limit header schemas
Teams direct (oat01 keys — all team-* providers):
Anthropic Teams uses the unified schema. Headers are present on every response (200 and 429).
- anthropic-ratelimit-unified-5h-utilization: 5-hour window utilization (0.0–1.0)
- anthropic-ratelimit-unified-7d-utilization: 7-day budget utilization (0.0–1.0)
- anthropic-ratelimit-unified-status: allowed | rejected
- anthropic-ratelimit-unified-representative-claim: which window is binding (five_hour | seven_day)
- anthropic-ratelimit-unified-reset: Unix timestamp when the binding window resets
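As a rough sketch, the unified headers could be mapped onto the JSON output fields like this. The function name is hypothetical and providers/anthropic-teams.js may parse them differently; the sketch assumes a Fetch API Headers object (global in Node 18+), which matches header names case-insensitively.

```javascript
// Hypothetical parser for the unified headers; not the actual providers/anthropic-teams.js.
// `headers` is a Fetch API Headers object. `nowSeconds` is injectable for testing.
function parseUnifiedHeaders(headers, nowSeconds = Math.floor(Date.now() / 1000)) {
  const prefix = 'anthropic-ratelimit-unified-';
  // Read a numeric header, returning null when it is absent.
  const num = (name) => {
    const raw = headers.get(prefix + name);
    return raw === null ? null : Number(raw);
  };
  const reset = num('reset');
  return {
    utilization_5h: num('5h-utilization'),
    utilization_7d: num('7d-utilization'),
    status: headers.get(prefix + 'status'),
    representative_claim: headers.get(prefix + 'representative-claim'),
    reset_timestamp: reset,
    reset_in_seconds: reset === null ? null : Math.max(0, reset - nowSeconds),
  };
}

const example = parseUnifiedHeaders(
  new Headers({
    'anthropic-ratelimit-unified-5h-utilization': '0.94',
    'anthropic-ratelimit-unified-7d-utilization': '1.0',
    'anthropic-ratelimit-unified-status': 'rejected',
    'anthropic-ratelimit-unified-representative-claim': 'seven_day',
    'anthropic-ratelimit-unified-reset': '1743800000',
  }),
  1743717200 // fixed "now" so the countdown is reproducible
);
console.log(example.reset_in_seconds); // 82800
```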
Shelley proxy (exe.dev gateway): Uses classic absolute-count headers plus an exe.dev-injected cost header:
- Anthropic-Ratelimit-Tokens-Remaining/Limit: absolute token counts
- Anthropic-Ratelimit-Requests-Remaining/Limit: request counts
- Exedev-Gateway-Cost: USD cost per call (unique to exe.dev gateway)
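A comparable sketch for the classic schema is below. The function name is hypothetical, and it assumes the cost header carries a plain decimal string; providers/shelley-proxy.js is the authority on the actual format.

```javascript
// Hypothetical parser for the classic count headers plus the exe.dev cost header.
// `headers` is a Fetch API Headers object (global in Node 18+).
function parseClassicHeaders(headers) {
  // Read a numeric header, returning null when it is absent.
  const num = (name) => {
    const raw = headers.get(name);
    return raw === null ? null : Number(raw);
  };
  return {
    tokens_remaining: num('anthropic-ratelimit-tokens-remaining'),
    tokens_limit: num('anthropic-ratelimit-tokens-limit'),
    requests_remaining: num('anthropic-ratelimit-requests-remaining'),
    requests_limit: num('anthropic-ratelimit-requests-limit'),
    cost_per_call_usd: num('exedev-gateway-cost'), // assumed to be a decimal string
  };
}

const shelley = parseClassicHeaders(
  new Headers({
    'Anthropic-Ratelimit-Tokens-Remaining': '4800000',
    'Anthropic-Ratelimit-Tokens-Limit': '4800000',
    'Anthropic-Ratelimit-Requests-Remaining': '20000',
    'Anthropic-Ratelimit-Requests-Limit': '20000',
    'Exedev-Gateway-Cost': '0.000013',
  })
);
console.log(shelley.tokens_remaining, shelley.cost_per_call_usd);
```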
Log files
Each run appends to ~/.logs/token-monitor/YYYY-MM-DD.jsonl:
# View today's log
cat ~/.logs/token-monitor/$(date +%Y-%m-%d).jsonl | \
python3 -c "import sys,json; [print(json.loads(l)['ts'], json.loads(l)['providers']['team-vigilio']['severity']) for l in sys.stdin]"
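# List log entries where team-vigilio was healthy (the last line is the most recent)
# -H forces the "filename:" prefix even when only one log file matches the glob
grep -H '"team-vigilio"' ~/.logs/token-monitor/*.jsonl | \
python3 -c "import sys,json; [print(l[:40]) for l in sys.stdin if json.loads(l.split(':',1)[1])['providers']['team-vigilio']['severity']=='ok']"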
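The same queries can be sketched in Node, which avoids parsing grep output. The helper names are hypothetical, and the entry shape (a `ts` field plus a `providers` map) is taken from the one-liners above rather than from logger.js itself.

```javascript
// Hypothetical helpers operating on the text of one JSONL log file.

// Severity history for one provider; skips blank lines and entries without it.
function severityHistory(jsonlText, provider) {
  return jsonlText
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line))
    .filter((entry) => entry.providers && entry.providers[provider])
    .map((entry) => ({ ts: entry.ts, severity: entry.providers[provider].severity }));
}

// Timestamp of the most recent entry where the provider was healthy, or null.
function lastHealthy(jsonlText, provider) {
  const ok = severityHistory(jsonlText, provider).filter((e) => e.severity === 'ok');
  return ok.length ? ok[ok.length - 1].ts : null;
}

// Made-up sample lines for illustration only.
const sampleLog = [
  '{"ts":"2026-04-03T10:00:00Z","providers":{"team-vigilio":{"severity":"ok"}}}',
  '{"ts":"2026-04-04T16:59:50Z","providers":{"team-vigilio":{"severity":"critical"}}}',
].join('\n');
console.log(lastHealthy(sampleLog, 'team-vigilio')); // 2026-04-03T10:00:00Z
```

For a real file, the text would come from something like `require('fs').readFileSync(path, 'utf8')`.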
Architecture
monitor.js — CLI entrypoint, orchestrates probes
analyze.js — analysis CLI (burn rates, weekly, stagger, rotation)
providers/
index.js — reads ~/.pi/agent/models.json, returns typed provider list
anthropic-teams.js — unified schema parser (oat01 keys, all team-* providers)
anthropic-api.js — pay-per-use (api03 keys) — reports "no billing data"
shelley-proxy.js — classic schema + Exedev-Gateway-Cost header
gemini.js — Gemini API (free tier, quota via response body)
xai.js — x.ai/Grok (rate-limit headers)
logger.js — JSONL log to ~/.logs/token-monitor/
report.js — human-readable summary + severity logic
test.js — test suite (run: node test.js)
docs/
analyze.md — analysis CLI full reference
How to add a new provider
- Add provider to ~/.pi/agent/models.json with "api": "anthropic-messages"
- If it uses Teams unified schema and its name starts with team-: picked up automatically
- For a custom schema: create providers/yourprovider.js, implement probeYourProvider(), classify it in providers/index.js, add probe dispatch in monitor.js
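For the custom-schema path, a skeleton for providers/yourprovider.js might look something like the following. Everything here is a guess at the shape: the probe signature, the `provider.baseUrl` and `provider.apiKey` field names, the bearer-token auth, and the 'invalid_key' and 'error' status strings are assumptions, so check an existing provider module before copying it. The result fields reuse names from the JSON output schema.

```javascript
// Hypothetical skeleton for providers/yourprovider.js.
// The real probe signature and result shape are defined by the existing providers.

// Shape a probe outcome using field names from the JSON output schema.
function toResult(httpStatus, latencyMs) {
  if (httpStatus === 401) {
    return { type: 'yourprovider', status: 'invalid_key', severity: 'unknown', probe_latency_ms: latencyMs };
  }
  const ok = httpStatus === 200;
  return {
    type: 'yourprovider',
    status: ok ? 'ok' : 'error',
    severity: ok ? 'ok' : 'unknown',
    probe_latency_ms: latencyMs,
  };
}

// `fetchImpl` is injectable so the probe can be exercised without network access.
// Defaults to the global fetch (Node 18+).
async function probeYourProvider(provider, fetchImpl = fetch) {
  const started = Date.now();
  try {
    const res = await fetchImpl(provider.baseUrl, {
      headers: { Authorization: `Bearer ${provider.apiKey}` }, // auth scheme assumed
    });
    return toResult(res.status, Date.now() - started);
  } catch (err) {
    // Network failure or similar: report as a probe error, never throw.
    return {
      type: 'yourprovider',
      status: 'error',
      message: String(err),
      severity: 'unknown',
      probe_latency_ms: Date.now() - started,
    };
  }
}
```

Returning an error result instead of throwing keeps one broken provider from aborting the whole run, which seems to be how the example output treats invalid keys.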
Probe behavior
The tool makes one minimal API call per provider to extract headers:
- Model:
claude-haiku-4-5-20251001(cheapest available) - Max tokens: 1 (minimizes cost)
- Rate-limit headers are returned on every response (200 and 429) for Teams providers
- A 429 from a maxed provider is expected and treated as valid quota data
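The probe described above can be sketched as a request body plus a status interpretation. Both helper names, the 'hi' prompt, and the returned labels are assumptions made for illustration; the actual request is built inside the provider modules.

```javascript
// Hypothetical helpers illustrating the probe behavior.

// Smallest useful Messages API body: cheapest model, one output token.
function buildProbeBody(model = 'claude-haiku-4-5-20251001') {
  return {
    model,
    max_tokens: 1,
    messages: [{ role: 'user', content: 'hi' }], // prompt text is arbitrary
  };
}

// For Teams providers, 200 and 429 both carry rate-limit headers,
// so both count as valid quota data. 401 means the key is invalid.
function interpretProbeStatus(httpStatus) {
  if (httpStatus === 200 || httpStatus === 429) return 'quota_data';
  if (httpStatus === 401) return 'invalid_key';
  return 'probe_error';
}

console.log(JSON.stringify(buildProbeBody()));
console.log(interpretProbeStatus(429)); // quota_data
```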
Stealth: Read-only, one call per provider per run, no hammering. Run at most once per session.
Related
- ~/os/token-status.sh: wake-prompt integration (calls monitor.js, formats for beat.sh)
- ~/projects/provider-check/: predecessor (liveness only, no quota depth)
- ~/.pi/agent/models.json: provider configuration source
- Forgejo issue: trentuna/token-monitor#1
Wake integration
~/os/token-status.sh is the automated interface. It runs monitor.js --json
and formats the output into a compact summary block for injection into
Vigilio's wake prompt via beat.sh.
# Manual invocation (same as what the wake prompt sees)
~/os/token-status.sh
# Output format:
## Token Economics
Anthropic Teams (5 seats):
team-vigilio ✗ MAXED 7d:100% resets 23h
team-molto ✓ 5h:32% 7d:45% resets 4h
...
→ Current recommendation: use team-molto | avoid team-vigilio
File location: ~/os/token-status.sh
Called by: ~/os/beat.sh (Vigilio wake script)
Uses: monitor.js --json with 20-minute cache guard (won't double-probe within a session)
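The 20-minute cache guard amounts to a freshness check like the one below. token-status.sh is a shell script and implements this its own way; the JavaScript here, including the function and constant names, is only an illustration of the rule.

```javascript
// Hypothetical cache guard; token-status.sh implements its own check in shell.
const CACHE_MAX_AGE_MS = 20 * 60 * 1000; // 20 minutes, per the text above

// True when a cached result is recent enough to reuse instead of re-probing.
// Timestamps are epoch milliseconds; a cache dated in the future is treated as stale.
function isCacheFresh(cachedAtMs, nowMs = Date.now(), maxAgeMs = CACHE_MAX_AGE_MS) {
  const age = nowMs - cachedAtMs;
  return age >= 0 && age < maxAgeMs;
}

console.log(isCacheFresh(0, 19 * 60 * 1000)); // true
console.log(isCacheFresh(0, 20 * 60 * 1000)); // false
```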