B.A. Baracus 8ced108f74
docs: README overhaul — add analyze.js, wake integration, Quick start, fix provider table and architecture
Six changes:
- Add ## Quick start block (monitor.js, analyze.js, token-status.sh)
- Add ## Analysis section with all 8 analyze.js subcommands and output descriptions
- Add ## Wake integration section — token-status.sh docs, output format, cache guard note
- Provider support table: add google-gemini and xai-* rows
- Architecture block: add analyze.js, gemini.js, xai.js, docs/analyze.md
- Related: add token-status.sh as first item, fix issue link to trentuna/token-monitor#1

164/164 tests pass.
2026-04-06 02:26:51 +00:00


# token-monitor
Modular LLM API quota and usage visibility tool. Extracts rate-limit and usage data from all configured LLM providers before failures happen.
**Why it exists:** team-vigilio hit its 7-day rate limit (9+ days of 429s). api-ateam ran out of credit mid-session. We kept flying blind. This tool surfaces quota health before the failure.
## Quick start
```bash
node monitor.js # run now — human-readable output + log
node analyze.js # analyze accumulated logs — burn rates, rotation
~/os/token-status.sh # what Vigilio's wake prompt sees (automated path)
```
## Usage
```bash
node monitor.js # human-readable summary + log to file
node monitor.js --json # JSON output + log to file
node monitor.js --summary # human-readable only (no log)
node monitor.js --provider team-nadja # single provider
node monitor.js --no-log # suppress log file write
```
## Analysis
```bash
node analyze.js # full report
node analyze.js --burn-rate # burn rate per account
node analyze.js --weekly # weekly budget reconstruction
node analyze.js --stagger # reset schedule (next 48h)
node analyze.js --rotation # rotation recommendation
node analyze.js --json # JSON output (all sections)
node analyze.js --provider team-nadja # filter to one provider
node analyze.js --prune [--dry-run] # archive and prune logs older than 30 days
```
**Burn Rate** — change in 7d utilization across logged samples, with projected time to exhaustion at the current rate.
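A minimal sketch of the burn-rate idea, assuming each log sample is reduced to `{ ts, utilization_7d }` (milliseconds since epoch, utilization from 0.0 to 1.0). `burnRate` and `projectExhaustion` are hypothetical names, not analyze.js's actual functions, and the real analysis may window or smooth the samples differently.

```javascript
// Hypothetical sketch, not analyze.js's actual code.
// Each sample: { ts: <ms since epoch>, utilization_7d: <0.0 to 1.0> }.
const HOUR_MS = 60 * 60 * 1000;

// Utilization consumed per hour between the earliest and latest sample.
function burnRate(samples) {
  if (samples.length < 2) return null;
  const sorted = [...samples].sort((a, b) => a.ts - b.ts);
  const first = sorted[0];
  const last = sorted[sorted.length - 1];
  const hours = (last.ts - first.ts) / HOUR_MS;
  if (hours <= 0) return null;
  return (last.utilization_7d - first.utilization_7d) / hours;
}

// Hours until 7d utilization reaches 1.0 at the current rate;
// null when usage is flat or falling (e.g. a reset inside the sample span).
function projectExhaustion(samples) {
  const rate = burnRate(samples);
  if (rate === null || rate <= 0) return null;
  const latest = samples.reduce((a, b) => (b.ts > a.ts ? b : a));
  return Math.max(0, 1 - latest.utilization_7d) / rate;
}
```

Two samples eight hours apart at 0.25 and 0.50 give a rate of 0.03125 per hour and 16 hours to exhaustion. A window reset between samples makes the delta negative, and this sketch returns `null` rather than guessing.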
**Reset Schedule** — providers resetting within the next 48 hours, sorted ascending by time to reset.
**Weekly Reconstruction** — peak and average 7d utilization per provider per ISO week. Shows exhaustion events.
**Rotation Recommendation** — ranked provider list by headroom, deprioritizing maxed/rejected/invalid-key accounts.
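One way to express "ranked by headroom" over the `--json` `providers` map, as a hypothetical sketch: it assumes headroom is 1 minus the fuller of the 5h and 7d windows, and that rejected, maxed, or severity-`unknown` (e.g. invalid key) Teams accounts sort after every usable one. `rankByHeadroom` is an illustrative name, not analyze.js's API.

```javascript
// Hypothetical sketch: rank Teams accounts by remaining headroom.
// Assumption: the binding window is whichever of 5h/7d is fuller.
function rankByHeadroom(providers) {
  const rows = Object.entries(providers)
    .filter(([, p]) => p.type === "teams-direct")
    .map(([name, p]) => {
      const used = Math.max(p.utilization_5h ?? 0, p.utilization_7d ?? 0);
      // Rejected, maxed, or unprobeable accounts are never recommended.
      const usable = p.status !== "rejected" && p.severity !== "unknown" && used < 1;
      return { name, headroom: usable ? 1 - used : 0, usable };
    });
  // Usable first, then most headroom, then name for a stable order.
  rows.sort(
    (a, b) =>
      Number(b.usable) - Number(a.usable) ||
      b.headroom - a.headroom ||
      a.name.localeCompare(b.name)
  );
  return rows;
}
```

With the utilizations from the example output, team-molto (7d 45%) ranks ahead of team-nadja (5h 52%), and team-vigilio falls to the bottom.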
**Underspend Alerts** — active accounts with ≥ 40% of 5h window unused and < 2h until reset.
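The underspend rule reduces to a small predicate. This sketch assumes `reset_in_seconds` refers to the 5h window; in the schema it tracks whichever window is binding (`representative_claim`), so the real check may read a different field. `isUnderspent` is a hypothetical name.

```javascript
// Hypothetical sketch of the underspend rule: an active (not rejected)
// account with at least 40% of its 5h window unused and < 2h until reset.
const UNUSED_THRESHOLD = 0.4;
const RESET_WINDOW_S = 2 * 60 * 60;

function isUnderspent(p) {
  if (p.status === "rejected" || typeof p.utilization_5h !== "number") return false;
  const unused = 1 - p.utilization_5h;
  // Assumption: reset_in_seconds is the 5h window's countdown.
  return unused >= UNUSED_THRESHOLD && p.reset_in_seconds < RESET_WINDOW_S;
}
```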
## Example output
```
Token Monitor — 2026-04-04 16:59 UTC
════════════════════════════════════════════════════════════
team-vigilio [CRITICAL] MAXED — 7d: 100% | resets in 23h 0m
team-ludo [UNKNOWN] Invalid API key (401)
team-molto [OK] 5h: 32% | 7d: 45% | resets in 4h 0m
team-nadja [OK] 5h: 52% | 7d: 39% | resets in 3h 0m
team-buio [UNKNOWN] Invalid API key (401)
shelley-proxy [OK] tokens: 4,800,000/4,800,000 | cost: $0.000013/call
api-ateam [UNKNOWN] Anthropic API does not expose billing/quota via REST
────────────────────────────────────────────────────────────
Overall: 1 CRITICAL, 3 OK, 3 UNKNOWN
```
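The footer and countdown text in the output above can be reproduced from the JSON fields. This is a hypothetical sketch, not report.js's verified logic: the severity order (critical, warning, ok, unknown) and omitting zero counts are assumptions chosen to match the example.

```javascript
// Hypothetical sketch of the summary footer and countdown formatting.
const SEVERITY_ORDER = ["critical", "warning", "ok", "unknown"];

// Builds "Overall: 1 CRITICAL, 3 OK, 3 UNKNOWN" from a { name: { severity } } map.
function overallLine(providers) {
  const counts = {};
  for (const p of Object.values(providers)) {
    counts[p.severity] = (counts[p.severity] ?? 0) + 1;
  }
  const parts = SEVERITY_ORDER.filter((s) => counts[s] > 0).map(
    (s) => `${counts[s]} ${s.toUpperCase()}`
  );
  return `Overall: ${parts.join(", ")}`;
}

// Turns reset_in_seconds into the "23h 0m" style used after "resets in".
function formatCountdown(seconds) {
  const h = Math.floor(seconds / 3600);
  const m = Math.floor((seconds % 3600) / 60);
  return `${h}h ${m}m`;
}
```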
## JSON output schema
```json
{
"timestamp": "2026-04-04T16:59:50.000Z",
"providers": {
"team-vigilio": {
"type": "teams-direct",
"status": "rejected",
"utilization_5h": 0.94,
"utilization_7d": 1.0,
"representative_claim": "seven_day",
"reset_timestamp": 1775404790,
"reset_in_seconds": 82800,
"organization_id": "1d7653ad-...",
"severity": "critical",
"probe_latency_ms": 210
},
"shelley-proxy": {
"type": "shelley-proxy",
"status": "ok",
"tokens_remaining": 4800000,
"tokens_limit": 4800000,
"requests_remaining": 20000,
"requests_limit": 20000,
"tokens_reset": "2026-04-04T17:59:50Z",
"cost_per_call_usd": 0.000013,
"severity": "ok",
"probe_latency_ms": 180
},
"api-ateam": {
"type": "api-direct",
"status": "no_billing_data",
"message": "Anthropic API does not expose billing/quota via REST.",
"severity": "unknown",
"probe_latency_ms": 0
}
}
}
```
## Provider support
| Provider | Type | Data available |
|----------|------|---------------|
| team-vigilio, team-ludo, team-molto, team-nadja, team-buio | Anthropic Teams (direct) | 5h/7d utilization (0–100%), status, reset countdown, severity |
| shelley-proxy | Shelley/exe.dev proxy | Token headroom, request headroom, per-call USD cost |
| api-ateam | Anthropic API (pay-per-use) | Key validity only (no billing API exists) |
| google-gemini | Gemini API (free tier) | Quota violation detail, retry delay, key validity |
| xai-face, xai-amy, xai-murdock, xai-ba, xai-vigilio | xAI/Grok | Request/token remaining counts, rate limit status |
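For the Gemini row, "quota violation detail, retry delay" suggests parsing the error body of a 429. A hypothetical sketch, assuming the body follows the standard Google API error model: `error.details[]` entries tagged with `@type`, where `google.rpc.RetryInfo` carries `retryDelay` as a duration string such as `"38s"` and `google.rpc.QuotaFailure` carries a `violations` array. gemini.js may read different fields, and the output field names here are illustrative.

```javascript
// Hypothetical parser for a Gemini quota error body (Google API error model).
function parseGeminiQuotaError(body) {
  const details = body?.error?.details ?? [];
  // Find the first detail whose "@type" ends with the given message name.
  const byType = (suffix) => details.find((d) => d["@type"]?.endsWith(suffix));
  const retry = byType("google.rpc.RetryInfo");
  const quota = byType("google.rpc.QuotaFailure");
  // Duration strings look like "38s"; parseFloat drops the unit suffix.
  const delay = retry?.retryDelay ? parseFloat(retry.retryDelay) : null;
  return {
    code: body?.error?.code ?? null,
    retry_delay_seconds: Number.isNaN(delay) ? null : delay,
    violations: quota?.violations ?? [],
  };
}
```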
## Severity levels
| Level | Condition |
|-------|-----------|
| `critical` | Teams provider rejected (rate limited / budget exhausted) |
| `warning` | Teams 7d utilization > 85%, or 5h utilization > 70% |
| `ok` | Healthy |
| `unknown` | Invalid key (401), no billing data, or probe error |
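The table maps onto a short function. This is a hypothetical sketch, not report.js's actual code: `invalid_key` and `error` are illustrative status labels (only `rejected`, `ok` and `no_billing_data` appear in the JSON schema), and the precedence order is an assumption.

```javascript
// Hypothetical sketch of the severity table.
const UNKNOWN_STATUSES = new Set(["invalid_key", "no_billing_data", "error"]);

function classifySeverity(p) {
  // Nothing trustworthy to judge: bad key, no billing API, or failed probe.
  if (UNKNOWN_STATUSES.has(p.status)) return "unknown";
  if (p.type === "teams-direct") {
    if (p.status === "rejected") return "critical";
    if ((p.utilization_7d ?? 0) > 0.85 || (p.utilization_5h ?? 0) > 0.7) {
      return "warning";
    }
  }
  return "ok";
}
```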
## Rate-limit header schemas
**Teams direct** (oat01 keys — all `team-*` providers):
Anthropic Teams uses the **unified** schema. Headers are present on every response (200 and 429).
- `anthropic-ratelimit-unified-5h-utilization` — 5-hour window utilization (0.0–1.0)
- `anthropic-ratelimit-unified-7d-utilization` — 7-day budget utilization (0.0–1.0)
- `anthropic-ratelimit-unified-status` — `allowed` | `rejected`
- `anthropic-ratelimit-unified-representative-claim` — which window is binding (`five_hour` | `seven_day`)
- `anthropic-ratelimit-unified-reset` — Unix timestamp when binding window resets
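Reading the unified headers into the schema's fields could look like this hypothetical sketch. It takes a WHATWG `Headers` object (global in Node 18+, with case-insensitive `get()`), derives `reset_in_seconds` from the Unix reset timestamp, and returns `null` for any missing header. `parseUnifiedHeaders` is an illustrative name, not the actual export of anthropic-teams.js.

```javascript
// Hypothetical parser for the Teams unified rate-limit headers.
function parseUnifiedHeaders(headers, nowMs = Date.now()) {
  const PREFIX = "anthropic-ratelimit-unified-";
  const num = (name) => {
    const v = headers.get(PREFIX + name);
    return v === null ? null : Number(v);
  };
  const reset = num("reset"); // Unix seconds when the binding window resets
  return {
    status: headers.get(PREFIX + "status"), // "allowed" | "rejected"
    utilization_5h: num("5h-utilization"),
    utilization_7d: num("7d-utilization"),
    representative_claim: headers.get(PREFIX + "representative-claim"),
    reset_timestamp: reset,
    reset_in_seconds: reset === null ? null : Math.max(0, reset - Math.floor(nowMs / 1000)),
  };
}
```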
**Shelley proxy** (exe.dev gateway):
Uses classic absolute-count headers plus an exe.dev-injected cost header:
- `Anthropic-Ratelimit-Tokens-Remaining` / `Limit` — absolute token counts
- `Anthropic-Ratelimit-Requests-Remaining` / `Limit` — request counts
- `Exedev-Gateway-Cost` — USD cost per call (unique to exe.dev gateway)
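The Shelley headers map onto the `shelley-proxy` JSON fields in the same way. A hypothetical sketch, assuming `Exedev-Gateway-Cost` is a plain decimal USD amount; `parseShelleyHeaders` is an illustrative name.

```javascript
// Hypothetical parser for the Shelley proxy's classic headers + cost header.
// Headers.get() is case-insensitive, so capitalization does not matter.
function parseShelleyHeaders(headers) {
  const num = (name) => {
    const v = headers.get(name);
    return v === null ? null : Number(v);
  };
  return {
    tokens_remaining: num("anthropic-ratelimit-tokens-remaining"),
    tokens_limit: num("anthropic-ratelimit-tokens-limit"),
    requests_remaining: num("anthropic-ratelimit-requests-remaining"),
    requests_limit: num("anthropic-ratelimit-requests-limit"),
    // Assumption: the gateway reports cost as a bare decimal, e.g. "0.000013".
    cost_per_call_usd: num("exedev-gateway-cost"),
  };
}
```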
## Log files
Each run appends to `~/.logs/token-monitor/YYYY-MM-DD.jsonl`:
```bash
# View today's log (timestamp + team-vigilio severity per run)
python3 -c "import sys,json; [print(d.get('ts', d.get('timestamp')), d['providers'].get('team-vigilio', {}).get('severity')) for d in map(json.loads, sys.stdin)]" \
  < ~/.logs/token-monitor/$(date +%Y-%m-%d).jsonl
# Check when team-vigilio was last healthy (-H keeps the filename prefix even when only one file matches)
grep -H '"team-vigilio"' ~/.logs/token-monitor/*.jsonl | python3 -c "
import sys, json
for line in sys.stdin:
    path, raw = line.split(':', 1)
    d = json.loads(raw)
    if d['providers'].get('team-vigilio', {}).get('severity') == 'ok':
        print(path.rsplit('/', 1)[-1], d.get('ts', d.get('timestamp')))
"
```
## Architecture
```
monitor.js — CLI entrypoint, orchestrates probes
analyze.js — analysis CLI (burn rates, weekly, stagger, rotation)
providers/
  index.js — reads ~/.pi/agent/models.json, returns typed provider list
  anthropic-teams.js — unified schema parser (oat01 keys, all team-* providers)
  anthropic-api.js — pay-per-use (api03 keys) — reports "no billing data"
  shelley-proxy.js — classic schema + Exedev-Gateway-Cost header
  gemini.js — Gemini API (free tier, quota via response body)
  xai.js — x.ai/Grok (rate-limit headers)
logger.js — JSONL log to ~/.logs/token-monitor/
report.js — human-readable summary + severity logic
test.js — test suite (run: node test.js)
docs/
  analyze.md — analysis CLI full reference
```
## How to add a new provider
1. Add provider to `~/.pi/agent/models.json` with `"api": "anthropic-messages"`
2. If it uses Teams unified schema and its name starts with `team-`: picked up automatically
3. For a custom schema: create `providers/yourprovider.js`, implement `probeYourProvider()`, classify it in `providers/index.js`, add probe dispatch in `monitor.js`
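The classification in step 2 and step 3 can be pictured as a name/key check. This is a hypothetical sketch in the spirit of providers/index.js, covering only `"anthropic-messages"` entries; it assumes each models.json entry exposes `api` and `apiKey` fields, which may not match the real file layout. The `"custom"` label is illustrative; the other labels follow the JSON schema.

```javascript
// Hypothetical provider classifier (Anthropic-style entries only).
function classifyProvider(name, cfg) {
  if (cfg.api !== "anthropic-messages") return null; // handled elsewhere, or not probed
  const key = cfg.apiKey ?? "";
  if (name.startsWith("team-")) return "teams-direct"; // oat01 keys, unified schema
  if (name === "shelley-proxy") return "shelley-proxy"; // classic schema + cost header
  if (key.includes("api03")) return "api-direct"; // pay-per-use, no billing data
  return "custom"; // needs its own providers/<name>.js probe (step 3)
}
```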
## Probe behavior
The tool makes one minimal API call per provider to extract headers:
- Model: `claude-haiku-4-5-20251001` (cheapest available)
- Max tokens: 1 (minimizes cost)
- Rate-limit headers are returned on **every** response (200 and 429) for Teams providers
- A 429 from a maxed provider is expected and treated as valid quota data
**Stealth:** Read-only, one call per provider per run, no hammering. Run at most once per session.
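The probe itself is a single 1-token Messages API request. The hypothetical sketch below builds that request without sending it. The `x-api-key` and `anthropic-version` headers are the standard Messages API ones for api03 keys; whether oat01 Teams tokens authenticate the same way is not stated here, so treat that part as an assumption. `buildProbeRequest` and `isQuotaResponse` are illustrative names.

```javascript
// Hypothetical probe builder: one minimal request per provider.
const PROBE_MODEL = "claude-haiku-4-5-20251001";

function buildProbeRequest(baseUrl, apiKey) {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/v1/messages`,
    init: {
      method: "POST",
      headers: {
        "x-api-key": apiKey, // assumption for oat01 keys; standard for api03
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
      },
      body: JSON.stringify({
        model: PROBE_MODEL,
        max_tokens: 1, // minimizes cost
        messages: [{ role: "user", content: "." }],
      }),
    },
  };
}

// 200 and 429 both carry the Teams rate-limit headers, so both are quota data.
function isQuotaResponse(status) {
  return status === 200 || status === 429;
}

// Usage (network call, not executed here):
//   const { url, init } = buildProbeRequest("https://api.anthropic.com", key);
//   const res = await fetch(url, init);
//   if (isQuotaResponse(res.status)) { /* read res.headers */ }
```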
## Related
- `~/os/token-status.sh` — wake-prompt integration (calls monitor.js, formats for beat.sh)
- `~/projects/provider-check/` — predecessor (liveness only, no quota depth)
- `~/.pi/agent/models.json` — provider configuration source
- Forgejo issue: trentuna/token-monitor#1
## Wake integration
`~/os/token-status.sh` is the automated interface. It runs `monitor.js --json`
and formats the output into a compact summary block for injection into
Vigilio's wake prompt via `beat.sh`.
```bash
# Manual invocation (same output the wake prompt sees)
~/os/token-status.sh
```
Output format:
```
## Token Economics
Anthropic Teams (5 seats):
team-vigilio ✗ MAXED 7d:100% resets 23h
team-molto ✓ 5h:32% 7d:45% resets 4h
...
→ Current recommendation: use team-molto | avoid team-vigilio
```
- File location: `~/os/token-status.sh`
- Called by: `~/os/beat.sh` (Vigilio's wake script)
- Uses: `monitor.js --json` behind a 20-minute cache guard (won't double-probe within a session)
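The README does not say how the 20-minute guard is implemented; this hypothetical Node sketch only illustrates the idea: reuse a cached `--json` result if its file is younger than 20 minutes, otherwise run `monitor.js --json` once and refresh the cache. The cache path and the helper names `isCacheFresh` and `readStatus` are illustrative.

```javascript
// Hypothetical illustration of a 20-minute cache guard around monitor.js --json.
const fs = require("node:fs");
const { execFileSync } = require("node:child_process");

const CACHE_MAX_AGE_MS = 20 * 60 * 1000;

function isCacheFresh(mtimeMs, nowMs, maxAgeMs = CACHE_MAX_AGE_MS) {
  return nowMs - mtimeMs < maxAgeMs;
}

function readStatus(cachePath, monitorPath) {
  try {
    const { mtimeMs } = fs.statSync(cachePath);
    if (isCacheFresh(mtimeMs, Date.now())) {
      return JSON.parse(fs.readFileSync(cachePath, "utf8"));
    }
  } catch {
    // Missing or unparseable cache: fall through to a fresh probe.
  }
  const out = execFileSync("node", [monitorPath, "--json"], { encoding: "utf8" });
  fs.writeFileSync(cachePath, out);
  return JSON.parse(out);
}
```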