build: token-monitor v0.1.0 — modular LLM API quota visibility

Implements modular provider probing with two distinct header schemas:
- Teams direct (unified schema): 5h/7d utilization floats, status, reset countdown
- Shelley proxy (classic schema): token/request counts + Exedev-Gateway-Cost (USD/call)
- api-ateam: reports no billing data (confirmed non-existent by recon)

Key: uses claude-haiku-4-5-20251001 for minimal probe calls (1 token).
Rate-limit headers present on ALL responses (200 and 429).
113/113 tests passing.

Built from Face recon (trentuna/a-team#91): live header capture confirmed that the unified schema with utilization floats replaces the old per-count schema.

parent 760049a25e
commit 07a544c50d
10 changed files with 1093 additions and 1 deletions
156 README.md
@@ -1,3 +1,157 @@
# token-monitor

Modular LLM API quota and usage visibility tool. Extracts rate-limit and usage data from all configured LLM providers before failures happen.

**Why it exists:** team-vigilio hit its 7-day rate limit (9+ days of 429s). api-ateam ran out of credit mid-session. We kept flying blind. This tool surfaces quota health before the failure.

## Usage

```bash
node monitor.js                        # human-readable summary + log to file
node monitor.js --json                 # JSON output + log to file
node monitor.js --summary              # human-readable only (no log)
node monitor.js --provider team-nadja  # single provider
node monitor.js --no-log               # suppress log file write
```

## Example output

```
Token Monitor — 2026-04-04 16:59 UTC
════════════════════════════════════════════════════════════

team-vigilio   [CRITICAL] MAXED — 7d: 100% | resets in 23h 0m
team-ludo      [UNKNOWN]  Invalid API key (401)
team-molto     [OK]       5h: 32% | 7d: 45% | resets in 4h 0m
team-nadja     [OK]       5h: 52% | 7d: 39% | resets in 3h 0m
team-buio      [UNKNOWN]  Invalid API key (401)
shelley-proxy  [OK]       tokens: 4,800,000/4,800,000 | cost: $0.000013/call
api-ateam      [UNKNOWN]  Anthropic API does not expose billing/quota via REST

────────────────────────────────────────────────────────────
Overall: 1 CRITICAL, 3 OK, 3 UNKNOWN
```

## JSON output schema

```json
{
  "timestamp": "2026-04-04T16:59:50.000Z",
  "providers": {
    "team-vigilio": {
      "type": "teams-direct",
      "status": "rejected",
      "utilization_5h": 0.94,
      "utilization_7d": 1.0,
      "representative_claim": "seven_day",
      "reset_timestamp": 1743800000,
      "reset_in_seconds": 82800,
      "organization_id": "1d7653ad-...",
      "severity": "critical",
      "probe_latency_ms": 210
    },
    "shelley-proxy": {
      "type": "shelley-proxy",
      "status": "ok",
      "tokens_remaining": 4800000,
      "tokens_limit": 4800000,
      "requests_remaining": 20000,
      "requests_limit": 20000,
      "tokens_reset": "2026-04-04T17:59:50Z",
      "cost_per_call_usd": 0.000013,
      "severity": "ok",
      "probe_latency_ms": 180
    },
    "api-ateam": {
      "type": "api-direct",
      "status": "no_billing_data",
      "message": "Anthropic API does not expose billing/quota via REST.",
      "severity": "unknown",
      "probe_latency_ms": 0
    }
  }
}
```

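A consumer of the `--json` output might tally severities and render the reset countdown roughly as follows. This is a minimal sketch, not the code in `report.js`: the helper names `formatCountdown` and `tallySeverities` are hypothetical, and only the field names (`providers`, `severity`, `reset_in_seconds`) are taken from the schema above.

```javascript
// Hypothetical helpers for consuming the JSON report; not part of token-monitor itself.

// Render a number of seconds as "Xh Ym", the style used in the example output.
function formatCountdown(seconds) {
  const hours = Math.floor(seconds / 3600);
  const minutes = Math.floor((seconds % 3600) / 60);
  return `${hours}h ${minutes}m`;
}

// Count how many providers fall into each severity level.
function tallySeverities(report) {
  const counts = {};
  for (const provider of Object.values(report.providers)) {
    counts[provider.severity] = (counts[provider.severity] || 0) + 1;
  }
  return counts;
}

// Trimmed-down report using values from the schema example.
const report = {
  providers: {
    'team-vigilio': { severity: 'critical', reset_in_seconds: 82800 },
    'shelley-proxy': { severity: 'ok' },
    'api-ateam': { severity: 'unknown' },
  },
};

console.log(formatCountdown(report.providers['team-vigilio'].reset_in_seconds)); // 23h 0m
console.log(tallySeverities(report));
```
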
## Provider support

| Provider | Type | Data available |
|----------|------|---------------|
| team-vigilio, team-ludo, team-molto, team-nadja, team-buio | Anthropic Teams (direct) | 5h/7d utilization (0–100%), status, reset countdown, severity |
| shelley-proxy | Shelley/exe.dev proxy | Token headroom, request headroom, per-call USD cost |
| api-ateam | Anthropic API (pay-per-use) | Key validity only — no billing API exists |

## Severity levels

| Level | Condition |
|-------|-----------|
| `critical` | Teams provider rejected (rate limited / budget exhausted) |
| `warning` | Teams 7d utilization > 85%, or 5h utilization > 70% |
| `ok` | Healthy |
| `unknown` | Invalid key (401), no billing data, or probe error |

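For Teams providers, the table above could be expressed as a small function. This is a sketch under assumptions, not the logic in `report.js`: the name `classifySeverity` is hypothetical, utilization values are taken to be the 0.0–1.0 floats from the JSON schema, and a result without both floats is treated as `unknown`.

```javascript
// Hypothetical sketch of the severity table for Teams providers only.
function classifySeverity({ status, utilization_5h, utilization_7d }) {
  // A rejected probe means the provider is rate limited or out of budget.
  if (status === 'rejected') return 'critical';

  // Without both utilization floats there is nothing to compare (assumption).
  if (typeof utilization_5h !== 'number' || typeof utilization_7d !== 'number') {
    return 'unknown';
  }

  // Thresholds from the table: 7d above 85% or 5h above 70%.
  if (utilization_7d > 0.85 || utilization_5h > 0.70) return 'warning';

  return 'ok';
}

console.log(classifySeverity({ status: 'rejected', utilization_5h: 0.94, utilization_7d: 1.0 })); // critical
console.log(classifySeverity({ status: 'allowed', utilization_5h: 0.32, utilization_7d: 0.45 })); // ok
```

The comparisons are strict, so a provider sitting exactly at 85% of its 7-day budget still reports `ok` in this sketch.
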
## Rate-limit header schemas

**Teams direct** (oat01 keys — all `team-*` providers):
Anthropic Teams uses the **unified** schema. Headers are present on every response (200 and 429).
- `anthropic-ratelimit-unified-5h-utilization` — 5-hour window utilization (0.0–1.0)
- `anthropic-ratelimit-unified-7d-utilization` — 7-day budget utilization (0.0–1.0)
- `anthropic-ratelimit-unified-status` — `allowed` | `rejected`
- `anthropic-ratelimit-unified-representative-claim` — which window is binding (`five_hour` | `seven_day`)
- `anthropic-ratelimit-unified-reset` — Unix timestamp when binding window resets

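Mapping the unified headers listed above onto the JSON output fields might look like the sketch below. It is illustrative only and not the parser in `providers/anthropic-teams.js`: the function name `parseUnifiedHeaders` is hypothetical, it expects a standard `Headers` object (global in Node 18+), and the "now" value in the sample is made up so the countdown matches the schema example.

```javascript
// Hypothetical parser sketch for the unified rate-limit headers.
const PREFIX = 'anthropic-ratelimit-unified-';

function parseUnifiedHeaders(headers, nowSeconds = Math.floor(Date.now() / 1000)) {
  // Read a numeric header; null when the header is absent.
  const num = (name) => {
    const raw = headers.get(PREFIX + name);
    return raw === null ? null : Number(raw);
  };
  const reset = num('reset');
  return {
    status: headers.get(PREFIX + 'status'),
    utilization_5h: num('5h-utilization'),
    utilization_7d: num('7d-utilization'),
    representative_claim: headers.get(PREFIX + 'representative-claim'),
    reset_timestamp: reset,
    // Seconds until the binding window resets, never negative.
    reset_in_seconds: reset === null ? null : Math.max(0, reset - nowSeconds),
  };
}

// Sample headers built from the values in the JSON schema example.
const sample = new Headers({
  'anthropic-ratelimit-unified-5h-utilization': '0.94',
  'anthropic-ratelimit-unified-7d-utilization': '1.0',
  'anthropic-ratelimit-unified-status': 'rejected',
  'anthropic-ratelimit-unified-representative-claim': 'seven_day',
  'anthropic-ratelimit-unified-reset': '1743800000',
});

const parsed = parseUnifiedHeaders(sample, 1743717200); // hypothetical "now"
console.log(parsed.status, parsed.utilization_7d, parsed.reset_in_seconds);
```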
**Shelley proxy** (exe.dev gateway):
Uses classic absolute-count headers plus an exe.dev-injected cost header:
- `Anthropic-Ratelimit-Tokens-Remaining` / `Limit` — absolute token counts
- `Anthropic-Ratelimit-Requests-Remaining` / `Limit` — request counts
- `Exedev-Gateway-Cost` — USD cost per call (unique to exe.dev gateway)

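The classic headers above could be read into the `shelley-proxy` JSON fields along these lines. This is a sketch, not the code in `providers/shelley-proxy.js`: `parseShelleyHeaders` is a hypothetical name, and it assumes `Exedev-Gateway-Cost` carries a plain decimal string such as `0.000013`, which the source does not specify.

```javascript
// Hypothetical parser sketch for the classic count headers plus the gateway cost header.
function parseShelleyHeaders(headers) {
  // Read an integer header; null when the header is absent.
  const int = (name) => {
    const raw = headers.get(name);
    return raw === null ? null : Number.parseInt(raw, 10);
  };
  const cost = headers.get('Exedev-Gateway-Cost');
  return {
    tokens_remaining: int('Anthropic-Ratelimit-Tokens-Remaining'),
    tokens_limit: int('Anthropic-Ratelimit-Tokens-Limit'),
    requests_remaining: int('Anthropic-Ratelimit-Requests-Remaining'),
    requests_limit: int('Anthropic-Ratelimit-Requests-Limit'),
    // Assumption: the cost header is a bare decimal number of USD.
    cost_per_call_usd: cost === null ? null : Number.parseFloat(cost),
  };
}

// Sample headers built from the values in the JSON schema example.
const shelleySample = new Headers({
  'Anthropic-Ratelimit-Tokens-Remaining': '4800000',
  'Anthropic-Ratelimit-Tokens-Limit': '4800000',
  'Anthropic-Ratelimit-Requests-Remaining': '20000',
  'Anthropic-Ratelimit-Requests-Limit': '20000',
  'Exedev-Gateway-Cost': '0.000013',
});

const shelley = parseShelleyHeaders(shelleySample);
console.log(shelley);
```

`Headers.get()` matches names case-insensitively, so the mixed-case names from the list work as written.
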
## Log files

Each run appends to `~/.logs/token-monitor/YYYY-MM-DD.jsonl`:

```bash
# View today's log
cat ~/.logs/token-monitor/$(date +%Y-%m-%d).jsonl | \
  python3 -c "import sys,json; [print(json.loads(l)['ts'], json.loads(l)['providers']['team-vigilio']['severity']) for l in sys.stdin]"

# Check when team-vigilio was last healthy
# (-H keeps the "file:" prefix even when only one log file matches; the split below relies on it)
grep -H '"team-vigilio"' ~/.logs/token-monitor/*.jsonl | \
  python3 -c "import sys,json; [print(l[:40]) for l in sys.stdin if json.loads(l.split(':',1)[1])['providers']['team-vigilio']['severity']=='ok']"
```

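The append-per-run behavior described above can be sketched in a few lines of Node. This is not the code in `logger.js`: the names `logPathFor` and `appendRun` are hypothetical, and deriving the file name from the UTC date is an assumption. The demo writes to a temporary directory rather than `~/.logs`.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical sketch: one JSONL file per day, one line per run.
function logPathFor(date, dir = path.join(os.homedir(), '.logs', 'token-monitor')) {
  // toISOString() is always UTC, so the file rolls over at 00:00 UTC (assumption).
  return path.join(dir, `${date.toISOString().slice(0, 10)}.jsonl`);
}

function appendRun(entry, date = new Date(), dir) {
  const file = logPathFor(date, dir);
  fs.mkdirSync(path.dirname(file), { recursive: true });
  // One JSON document per line keeps the file greppable and append-only.
  fs.appendFileSync(file, JSON.stringify(entry) + '\n');
  return file;
}

// Demo against a throwaway directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'token-monitor-'));
const file = appendRun(
  { ts: '2026-04-04T16:59:50.000Z', providers: {} },
  new Date('2026-04-04T16:59:50Z'),
  demoDir
);
console.log(path.basename(file)); // 2026-04-04.jsonl
```
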
## Architecture

```
monitor.js              — CLI entrypoint, orchestrates probes
providers/
  index.js              — reads ~/.pi/agent/models.json, returns typed provider list
  anthropic-teams.js    — unified schema parser (oat01 keys, all team-* providers)
  anthropic-api.js      — pay-per-use (api03 keys) — reports "no billing data"
  shelley-proxy.js      — classic schema + Exedev-Gateway-Cost header
logger.js               — JSONL log to ~/.logs/token-monitor/
report.js               — human-readable summary + severity logic
test.js                 — test suite (run: node test.js)
```

## How to add a new provider

1. Add the provider to `~/.pi/agent/models.json` with `"api": "anthropic-messages"`.
2. If it uses the Teams unified schema and its name starts with `team-`, it is picked up automatically.
3. For a custom schema: create `providers/yourprovider.js`, implement `probeYourProvider()`, classify it in `providers/index.js`, and add the probe dispatch in `monitor.js`.

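As a starting point for step 3, a custom provider module might look like the sketch below. Everything about its contract is an assumption: the `(name, config, fetchImpl)` signature, the `config.url` and `config.requestInit` fields, and the `invalid_key` and `error` status strings are hypothetical, while the returned field names follow the JSON output schema. Check `providers/index.js` and `monitor.js` for what the dispatcher actually expects, and export the function the same way the existing provider modules do.

```javascript
// providers/yourprovider.js (hypothetical skeleton)
// fetchImpl is injectable so the probe can be exercised without network access.
async function probeYourProvider(name, config, fetchImpl = fetch) {
  const started = Date.now();
  try {
    const response = await fetchImpl(config.url, config.requestInit);
    // A 401 means the key is invalid, which the severity table maps to "unknown".
    const invalidKey = response.status === 401;
    return {
      type: 'your-provider',
      status: invalidKey ? 'invalid_key' : 'ok',
      severity: invalidKey ? 'unknown' : 'ok',
      probe_latency_ms: Date.now() - started,
    };
  } catch (err) {
    // Network failures and other probe errors are also reported as "unknown".
    return {
      type: 'your-provider',
      status: 'error',
      message: `${name}: ${err.message}`,
      severity: 'unknown',
      probe_latency_ms: Date.now() - started,
    };
  }
}

// Stubbed fetch returning a 401, standing in for a real HTTP call.
probeYourProvider('demo', {}, async () => ({ status: 401 }))
  .then((result) => console.log(result.severity)); // unknown
```
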
## Probe behavior

The tool makes one minimal API call per provider to extract headers:
- Model: `claude-haiku-4-5-20251001` (cheapest available)
- Max tokens: 1 (minimizes cost)
- Rate-limit headers are returned on **every** response (200 and 429) for Teams providers
- A 429 from a maxed provider is expected and treated as valid quota data

**Stealth:** Read-only, one call per provider per run, no hammering. Run at most once per session.

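The probe described above could be assembled roughly as follows. This is a sketch rather than the tool's actual request code: the names `buildProbeRequest` and `carriesQuotaData` are hypothetical, the `/v1/messages` path and `anthropic-version` value are those of the public Anthropic Messages API, and sending the key as `x-api-key` is an assumption that may not hold for oat01 keys or for the Shelley proxy.

```javascript
// Hypothetical sketch of the minimal probe request.
function buildProbeRequest(baseUrl, apiKey) {
  return {
    // Strip trailing slashes so the path is joined exactly once.
    url: `${baseUrl.replace(/\/+$/, '')}/v1/messages`,
    init: {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        'anthropic-version': '2023-06-01',
        'x-api-key': apiKey, // assumption: oat01 keys may need a different auth header
      },
      body: JSON.stringify({
        model: 'claude-haiku-4-5-20251001',
        max_tokens: 1, // one output token keeps the probe as cheap as possible
        messages: [{ role: 'user', content: 'hi' }],
      }),
    },
  };
}

// Both 200 and 429 carry rate-limit headers, so both count as usable quota data.
function carriesQuotaData(status) {
  return status === 200 || status === 429;
}

const { url, init } = buildProbeRequest('https://api.anthropic.com/', 'sk-placeholder');
console.log(url); // https://api.anthropic.com/v1/messages
console.log(JSON.parse(init.body).max_tokens); // 1
```

A caller would pass `url` and `init` to `fetch`, then hand `response.headers` to a schema parser whenever `carriesQuotaData(response.status)` is true.
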
## Related

- `~/projects/provider-check/` — predecessor (liveness only, no quota depth)
- `~/.pi/agent/models.json` — provider configuration source
- Forgejo issue: trentuna/a-team#91