token-monitor/README.md
Hannibal Smith 07a544c50d
build: token-monitor v0.1.0 — modular LLM API quota visibility
Implements modular provider probing with two distinct header schemas:
- Teams direct (unified schema): 5h/7d utilization floats, status, reset countdown
- Shelley proxy (classic schema): token/request counts + Exedev-Gateway-Cost (USD/call)
- api-ateam: reports no billing data (confirmed non-existent by recon)

Key: uses claude-haiku-4-5-20251001 for minimal probe calls (1 token).
Rate-limit headers present on ALL responses (200 and 429).

113/113 tests passing.

Built from Face recon (trentuna/a-team#91) — live header capture confirmed
unified schema with utilization floats replaces old per-count schema.
2026-04-04 17:01:05 +00:00

157 lines
6.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# token-monitor
Modular LLM API quota and usage visibility tool. Extracts rate-limit and usage data from all configured LLM providers before failures happen.
**Why it exists:** team-vigilio hit its 7-day rate limit (9+ days of 429s). api-ateam ran out of credit mid-session. We kept flying blind. This tool surfaces quota health before the failure.
## Usage
```bash
node monitor.js # human-readable summary + log to file
node monitor.js --json # JSON output + log to file
node monitor.js --summary # human-readable only (no log)
node monitor.js --provider team-nadja # single provider
node monitor.js --no-log # suppress log file write
```
## Example output
```
Token Monitor — 2026-04-04 16:59 UTC
════════════════════════════════════════════════════════════
team-vigilio [CRITICAL] MAXED — 7d: 100% | resets in 23h 0m
team-ludo [UNKNOWN] Invalid API key (401)
team-molto [OK] 5h: 32% | 7d: 45% | resets in 4h 0m
team-nadja [OK] 5h: 52% | 7d: 39% | resets in 3h 0m
team-buio [UNKNOWN] Invalid API key (401)
shelley-proxy [OK] tokens: 4,800,000/4,800,000 | cost: $0.000013/call
api-ateam [UNKNOWN] Anthropic API does not expose billing/quota via REST
────────────────────────────────────────────────────────────
Overall: 1 CRITICAL, 3 OK, 3 UNKNOWN
```
## JSON output schema
```json
{
"timestamp": "2026-04-04T16:59:50.000Z",
"providers": {
"team-vigilio": {
"type": "teams-direct",
"status": "rejected",
"utilization_5h": 0.94,
"utilization_7d": 1.0,
"representative_claim": "seven_day",
"reset_timestamp": 1743800000,
"reset_in_seconds": 82800,
"organization_id": "1d7653ad-...",
"severity": "critical",
"probe_latency_ms": 210
},
"shelley-proxy": {
"type": "shelley-proxy",
"status": "ok",
"tokens_remaining": 4800000,
"tokens_limit": 4800000,
"requests_remaining": 20000,
"requests_limit": 20000,
"tokens_reset": "2026-04-04T17:59:50Z",
"cost_per_call_usd": 0.000013,
"severity": "ok",
"probe_latency_ms": 180
},
"api-ateam": {
"type": "api-direct",
"status": "no_billing_data",
"message": "Anthropic API does not expose billing/quota via REST.",
"severity": "unknown",
"probe_latency_ms": 0
}
}
}
```
## Provider support
| Provider | Type | Data available |
|----------|------|---------------|
| team-vigilio, team-ludo, team-molto, team-nadja, team-buio | Anthropic Teams (direct) | 5h/7d utilization (0100%), status, reset countdown, severity |
| shelley-proxy | Shelley/exe.dev proxy | Token headroom, request headroom, per-call USD cost |
| api-ateam | Anthropic API (pay-per-use) | Key validity only — no billing API exists |
## Severity levels
| Level | Condition |
|-------|-----------|
| `critical` | Teams provider rejected (rate limited / budget exhausted) |
| `warning` | Teams 7d utilization > 85%, or 5h utilization > 70% |
| `ok` | Healthy |
| `unknown` | Invalid key (401), no billing data, or probe error |
## Rate-limit header schemas
**Teams direct** (oat01 keys — all `team-*` providers):
Anthropic Teams uses the **unified** schema. Headers are present on every response (200 and 429).
- `anthropic-ratelimit-unified-5h-utilization` — 5-hour window utilization (0.01.0)
- `anthropic-ratelimit-unified-7d-utilization` — 7-day budget utilization (0.01.0)
- `anthropic-ratelimit-unified-status``allowed` | `rejected`
- `anthropic-ratelimit-unified-representative-claim` — which window is binding (`five_hour` | `seven_day`)
- `anthropic-ratelimit-unified-reset` — Unix timestamp when binding window resets
**Shelley proxy** (exe.dev gateway):
Uses classic absolute-count headers plus an exe.dev-injected cost header:
- `Anthropic-Ratelimit-Tokens-Remaining` / `Limit` — absolute token counts
- `Anthropic-Ratelimit-Requests-Remaining` / `Limit` — request counts
- `Exedev-Gateway-Cost` — USD cost per call (unique to exe.dev gateway)
## Log files
Each run appends to `~/.logs/token-monitor/YYYY-MM-DD.jsonl`:
```bash
# View today's log
cat ~/.logs/token-monitor/$(date +%Y-%m-%d).jsonl | \
python3 -c "import sys,json; [print(json.loads(l)['ts'], json.loads(l)['providers']['team-vigilio']['severity']) for l in sys.stdin]"
# Check when team-vigilio was last healthy
grep '"team-vigilio"' ~/.logs/token-monitor/*.jsonl | \
python3 -c "import sys,json; [print(l[:40]) for l in sys.stdin if json.loads(l.split(':',1)[1])['providers']['team-vigilio']['severity']=='ok']"
```
## Architecture
```
monitor.js — CLI entrypoint, orchestrates probes
providers/
index.js — reads ~/.pi/agent/models.json, returns typed provider list
anthropic-teams.js — unified schema parser (oat01 keys, all team-* providers)
anthropic-api.js — pay-per-use (api03 keys) — reports "no billing data"
shelley-proxy.js — classic schema + Exedev-Gateway-Cost header
logger.js — JSONL log to ~/.logs/token-monitor/
report.js — human-readable summary + severity logic
test.js — test suite (run: node test.js)
```
## How to add a new provider
1. Add provider to `~/.pi/agent/models.json` with `"api": "anthropic-messages"`
2. If it uses Teams unified schema and its name starts with `team-`: picked up automatically
3. For a custom schema: create `providers/yourprovider.js`, implement `probeYourProvider()`, classify it in `providers/index.js`, add probe dispatch in `monitor.js`
## Probe behavior
The tool makes one minimal API call per provider to extract headers:
- Model: `claude-haiku-4-5-20251001` (cheapest available)
- Max tokens: 1 (minimizes cost)
- Rate-limit headers are returned on **every** response (200 and 429) for Teams providers
- A 429 from a maxed provider is expected and treated as valid quota data
**Stealth:** Read-only, one call per run, no hammering. Run at most once per session.
## Related
- `~/projects/provider-check/` — predecessor (liveness only, no quota depth)
- `~/.pi/agent/models.json` — provider configuration source
- Forgejo issue: trentuna/a-team#91