Modular LLM API quota and usage visibility tool


Vigilio Desto aefa35a5ca Merge pull request 'Budget Intelligence & TUI — programmatic spend data, live dashboard, mission cost tracking' (#5 ) from budget-intel into main		2026-04-08 09:12:08 +00:00
docs	docs: add phase3-piggyback.md — piggyback header capture + repo location recommendation	2026-04-06 02:27:23 +00:00
providers	Handle policy_rejected status (Anthropic April 4 billing change)	2026-04-08 05:38:46 +00:00
analyze.js	feat: --budget-json and --mission cost tracking (Face + Murdock objectives)	2026-04-08 08:28:18 +00:00
configure-key-limits.js	add configure-key-limits.js — per-key QPS/QPM rate limit script	2026-04-06 11:00:09 +00:00
logger.js	Phase 2: analysis layer (analyze.js), cache guard, log hygiene	2026-04-05 04:49:05 +00:00
monitor.js	Phase 2: analysis layer (analyze.js), cache guard, log hygiene	2026-04-05 04:49:05 +00:00
package-lock.json	Handle policy_rejected status (Anthropic April 4 billing change)	2026-04-08 05:38:46 +00:00
package.json	build: token-monitor v0.1.0 — modular LLM API quota visibility	2026-04-04 17:01:05 +00:00
README.md	docs: README overhaul — add analyze.js, wake integration, Quick start, fix provider table and architecture	2026-04-06 02:26:51 +00:00
recommend.js	Fix recommend.js: include allowed_warning in provider selection	2026-04-07 23:17:39 +00:00
report.js	Handle policy_rejected status (Anthropic April 4 billing change)	2026-04-08 05:38:46 +00:00
test.js	Phase 2: analysis layer (analyze.js), cache guard, log hygiene	2026-04-05 04:49:05 +00:00
tui.js	feat: tui.js — live ANSI terminal dashboard (B.A. objective)	2026-04-08 08:29:40 +00:00

README.md

token-monitor

Modular LLM API quota and usage visibility tool. Extracts rate-limit and usage data from all configured LLM providers before failures happen.

Why it exists: team-vigilio hit its 7-day rate limit (9+ days of 429s). api-ateam ran out of credit mid-session. We kept flying blind. This tool surfaces quota health before the failure.

Quick start

node monitor.js          # run now — human-readable output + log
node analyze.js          # analyze accumulated logs — burn rates, rotation
~/os/token-status.sh     # what Vigilio's wake prompt sees (automated path)

Usage

node monitor.js                          # human-readable summary + log to file
node monitor.js --json                   # JSON output + log to file
node monitor.js --summary                # human-readable only (no log)
node monitor.js --provider team-nadja    # single provider
node monitor.js --no-log                 # suppress log file write

Analysis

node analyze.js                         # full report
node analyze.js --burn-rate             # burn rate per account
node analyze.js --weekly                # weekly budget reconstruction
node analyze.js --stagger               # reset schedule (next 48h)
node analyze.js --rotation              # rotation recommendation
node analyze.js --json                  # JSON output (all sections)
node analyze.js --provider team-nadja   # filter to one provider
node analyze.js --prune [--dry-run]     # archive and prune logs > 30 days

Burn Rate — delta analysis of 7d utilization over time, projected exhaustion at current rate.

Reset Schedule — providers resetting within the next 48 hours, sorted ascending by time to reset.

Weekly Reconstruction — peak and average 7d utilization per provider per ISO week. Shows exhaustion events.

Rotation Recommendation — ranked provider list by headroom, deprioritizing maxed/rejected/invalid-key accounts.

Underspend Alerts — active accounts with ≥ 40% of 5h window unused and < 2h until reset.
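As a rough illustration of the Burn Rate idea above, the sketch below projects time to exhaustion from two utilization samples. It is not the code in analyze.js; the function name `projectExhaustion` and the sample shape (`ts` in milliseconds, `utilization7d` as a 0.0–1.0 fraction) are assumptions made for this example.

```javascript
// Hypothetical helper: linear projection of when 7d utilization reaches 100%.
// `earlier` and `later` are { ts: <ms since epoch>, utilization7d: <0.0-1.0> }.
function projectExhaustion(earlier, later) {
  const hours = (later.ts - earlier.ts) / 3_600_000;
  if (hours <= 0) return null; // samples identical or out of order

  const ratePerHour = (later.utilization7d - earlier.utilization7d) / hours;
  if (ratePerHour <= 0) {
    // Flat or falling utilization (e.g. after a reset): no exhaustion projected.
    return { ratePerHour, hoursToExhaustion: null };
  }
  const hoursToExhaustion = (1 - later.utilization7d) / ratePerHour;
  return { ratePerHour, hoursToExhaustion };
}

const first = { ts: Date.UTC(2026, 3, 4, 12), utilization7d: 0.25 };
const second = { ts: Date.UTC(2026, 3, 4, 16), utilization7d: 0.5 };
console.log(projectExhaustion(first, second));
// → { ratePerHour: 0.0625, hoursToExhaustion: 8 }
```

A two-point delta is the simplest possible estimate; a real analysis would likely smooth over more samples and account for window resets.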

Example output

Token Monitor — 2026-04-04 16:59 UTC
════════════════════════════════════════════════════════════

team-vigilio    [CRITICAL]  MAXED — 7d: 100% | resets in 23h 0m
team-ludo       [UNKNOWN]   Invalid API key (401)
team-molto      [OK]        5h: 32% | 7d: 45% | resets in 4h 0m
team-nadja      [OK]        5h: 52% | 7d: 39% | resets in 3h 0m
team-buio       [UNKNOWN]   Invalid API key (401)
shelley-proxy   [OK]        tokens: 4,800,000/4,800,000 | cost: $0.000013/call
api-ateam       [UNKNOWN]   Anthropic API does not expose billing/quota via REST

────────────────────────────────────────────────────────────
Overall: 1 CRITICAL, 3 OK, 3 UNKNOWN

JSON output schema

{
  "timestamp": "2026-04-04T16:59:50.000Z",
  "providers": {
    "team-vigilio": {
      "type": "teams-direct",
      "status": "rejected",
      "utilization_5h": 0.94,
      "utilization_7d": 1.0,
      "representative_claim": "seven_day",
      "reset_timestamp": 1743800000,
      "reset_in_seconds": 82800,
      "organization_id": "1d7653ad-...",
      "severity": "critical",
      "probe_latency_ms": 210
    },
    "shelley-proxy": {
      "type": "shelley-proxy",
      "status": "ok",
      "tokens_remaining": 4800000,
      "tokens_limit": 4800000,
      "requests_remaining": 20000,
      "requests_limit": 20000,
      "tokens_reset": "2026-04-04T17:59:50Z",
      "cost_per_call_usd": 0.000013,
      "severity": "ok",
      "probe_latency_ms": 180
    },
    "api-ateam": {
      "type": "api-direct",
      "status": "no_billing_data",
      "message": "Anthropic API does not expose billing/quota via REST.",
      "severity": "unknown",
      "probe_latency_ms": 0
    }
  }
}
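A small sketch of how a script might consume a report in this shape: it tallies providers by `severity`. The function name `countSeverities` is hypothetical, and the sample object is a trimmed copy of the schema above rather than real output.

```javascript
// Counts providers per severity in a report shaped like the schema above.
function countSeverities(report) {
  const counts = {};
  for (const provider of Object.values(report.providers)) {
    const key = provider.severity ?? 'unknown'; // treat a missing field as unknown
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}

// Trimmed sample using only fields that appear in the schema above.
const sampleReport = {
  timestamp: '2026-04-04T16:59:50.000Z',
  providers: {
    'team-vigilio': { status: 'rejected', severity: 'critical' },
    'shelley-proxy': { status: 'ok', severity: 'ok' },
    'api-ateam': { status: 'no_billing_data', severity: 'unknown' },
  },
};
console.log(countSeverities(sampleReport));
// → { critical: 1, ok: 1, unknown: 1 }
```

With live data, the report would presumably come from `JSON.parse` applied to the stdout of `node monitor.js --json`, assuming that command prints a single JSON document.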

Provider support

Provider	Type	Data available
team-vigilio, team-ludo, team-molto, team-nadja, team-buio	Anthropic Teams (direct)	5h/7d utilization (0–100%), status, reset countdown, severity
shelley-proxy	Shelley/exe.dev proxy	Token headroom, request headroom, per-call USD cost
api-ateam	Anthropic API (pay-per-use)	Key validity only — no billing API exists
google-gemini	Gemini API (free tier)	Quota violation detail, retry delay, key validity
xai-face, xai-amy, xai-murdock, xai-ba, xai-vigilio	xAI/Grok	Request/token remaining counts, rate limit status

Severity levels

Level	Condition
`critical`	Teams provider rejected (rate limited / budget exhausted)
`warning`	Teams 7d utilization > 85%, or 5h utilization > 70%
`ok`	Healthy
`unknown`	Invalid key (401), no billing data, or probe error
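The table above can be read as a small decision function. The sketch below is one possible reading, not the logic in report.js: the thresholds come from the table, the field names from the JSON output schema, and the function name `classifySeverity` plus the treatment of `allowed` and `ok` as healthy statuses are assumptions.

```javascript
// One possible reading of the severity table (not the actual report.js logic).
// Utilization fields are 0.0-1.0 fractions, as in the JSON output schema.
function classifySeverity(p) {
  if (p.status === 'rejected') return 'critical';
  // Comparisons against undefined are false, so providers without
  // utilization data simply skip this rule.
  if (p.utilization_7d > 0.85 || p.utilization_5h > 0.70) return 'warning';
  if (p.status === 'allowed' || p.status === 'ok') return 'ok';
  return 'unknown'; // invalid key, no billing data, probe error, anything else
}

console.log(classifySeverity({ status: 'rejected', utilization_7d: 1.0 }));                     // → critical
console.log(classifySeverity({ status: 'allowed', utilization_5h: 0.75, utilization_7d: 0.2 })); // → warning
console.log(classifySeverity({ status: 'no_billing_data' }));                                   // → unknown
```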

Rate-limit header schemas

Teams direct (oat01 keys — all team-* providers): Anthropic Teams uses the unified schema. Headers are present on every response (200 and 429).

anthropic-ratelimit-unified-5h-utilization — 5-hour window utilization (0.0–1.0)
anthropic-ratelimit-unified-7d-utilization — 7-day budget utilization (0.0–1.0)
anthropic-ratelimit-unified-status — allowed | rejected
anthropic-ratelimit-unified-representative-claim — which window is binding (five_hour | seven_day)
anthropic-ratelimit-unified-reset — Unix timestamp when binding window resets
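To make the unified schema concrete, here is a minimal sketch that maps the five headers listed above onto the field names used in the JSON output. It assumes a Fetch-style `Headers` object (global in Node 18+); the function name `parseUnifiedHeaders` is hypothetical and this is not the parser in providers/anthropic-teams.js.

```javascript
// Maps the unified rate-limit headers onto the JSON output field names.
// Missing headers become null rather than 0, so absence stays visible.
function parseUnifiedHeaders(headers) {
  const prefix = 'anthropic-ratelimit-unified-';
  const num = (name) => {
    const raw = headers.get(prefix + name);
    return raw === null ? null : Number(raw);
  };
  return {
    utilization_5h: num('5h-utilization'),
    utilization_7d: num('7d-utilization'),
    status: headers.get(prefix + 'status'),
    representative_claim: headers.get(prefix + 'representative-claim'),
    reset_timestamp: num('reset'),
  };
}

// Sample values taken from the JSON output schema in this README.
const sampleHeaders = new Headers({
  'anthropic-ratelimit-unified-5h-utilization': '0.94',
  'anthropic-ratelimit-unified-7d-utilization': '1.0',
  'anthropic-ratelimit-unified-status': 'rejected',
  'anthropic-ratelimit-unified-representative-claim': 'seven_day',
  'anthropic-ratelimit-unified-reset': '1743800000',
});
console.log(parseUnifiedHeaders(sampleHeaders));
```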

Shelley proxy (exe.dev gateway): Uses classic absolute-count headers plus an exe.dev-injected cost header:

Anthropic-Ratelimit-Tokens-Remaining / Limit — absolute token counts
Anthropic-Ratelimit-Requests-Remaining / Limit — request counts
Exedev-Gateway-Cost — USD cost per call (unique to exe.dev gateway)

Log files

Each run appends to ~/.logs/token-monitor/YYYY-MM-DD.jsonl:

# View today's log
cat ~/.logs/token-monitor/$(date +%Y-%m-%d).jsonl | \
  python3 -c "import sys,json; [print(json.loads(l)['ts'], json.loads(l)['providers']['team-vigilio']['severity']) for l in sys.stdin]"

# Check when team-vigilio was last healthy
# (-H forces the "filename:" prefix that the split below relies on, even when only one log file exists)
grep -H '"team-vigilio"' ~/.logs/token-monitor/*.jsonl | \
  python3 -c "import sys,json; [print(l[:40]) for l in sys.stdin if json.loads(l.split(':',1)[1])['providers']['team-vigilio']['severity']=='ok']"
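The same kind of query can be sketched in Node. The example below filters JSONL text for entries where a given provider has a given severity. The function name `filterBySeverity` is hypothetical, and the entry shape (`ts`, `providers[name].severity`) is inferred from the one-liners above rather than from logger.js.

```javascript
// Returns parsed JSONL entries where `providerName` has the given severity.
function filterBySeverity(jsonlText, providerName, severity) {
  return jsonlText
    .split('\n')
    .filter((line) => line.trim() !== '') // skip blank lines, e.g. the trailing newline
    .map((line) => JSON.parse(line))
    .filter((entry) => entry.providers?.[providerName]?.severity === severity);
}

// Inline sample so the sketch runs without a real log file.
const sampleLog = [
  '{"ts":"2026-04-04T12:00:00Z","providers":{"team-vigilio":{"severity":"ok"}}}',
  '{"ts":"2026-04-04T16:59:50Z","providers":{"team-vigilio":{"severity":"critical"}}}',
  '',
].join('\n');

const healthy = filterBySeverity(sampleLog, 'team-vigilio', 'ok');
console.log(healthy.map((entry) => entry.ts));
// → [ '2026-04-04T12:00:00Z' ]
```

Against a real log, `jsonlText` would come from something like `fs.readFileSync(path, 'utf8')` on one of the daily files.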

Architecture

monitor.js              — CLI entrypoint, orchestrates probes
analyze.js              — analysis CLI (burn rates, weekly, stagger, rotation)
providers/
  index.js              — reads ~/.pi/agent/models.json, returns typed provider list
  anthropic-teams.js    — unified schema parser (oat01 keys, all team-* providers)
  anthropic-api.js      — pay-per-use (api03 keys) — reports "no billing data"
  shelley-proxy.js      — classic schema + Exedev-Gateway-Cost header
  gemini.js             — Gemini API (free tier, quota via response body)
  xai.js                — x.ai/Grok (rate-limit headers)
logger.js               — JSONL log to ~/.logs/token-monitor/
report.js               — human-readable summary + severity logic
test.js                 — test suite (run: node test.js)
docs/
  analyze.md            — analysis CLI full reference

How to add a new provider

Add the provider to ~/.pi/agent/models.json with "api": "anthropic-messages".
If it uses the Teams unified schema and its name starts with team-, it is picked up automatically.
For a custom schema, create providers/yourprovider.js, implement probeYourProvider(), classify it in providers/index.js, and add probe dispatch in monitor.js.
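For the custom-schema case, a probe might look roughly like the sketch below. Only the name `probeYourProvider()` comes from this README. The argument shape (`baseUrl`, `apiKey`), the `x-ratelimit-*` header names, the `invalid_key` and `error` status strings, and the injectable `fetchImpl` parameter are all assumptions for illustration; the real contract is whatever monitor.js and providers/index.js expect.

```javascript
// Hypothetical probe for a custom-schema provider (all names are illustrative).
async function probeYourProvider(provider, fetchImpl = fetch) {
  const started = Date.now();
  const base = { type: 'yourprovider' };
  const numOrNull = (raw) => (raw === null ? null : Number(raw));
  try {
    const res = await fetchImpl(provider.baseUrl, {
      method: 'GET',
      headers: { authorization: `Bearer ${provider.apiKey}` },
    });
    const probe_latency_ms = Date.now() - started;
    if (res.status === 401) {
      return { ...base, status: 'invalid_key', severity: 'unknown', probe_latency_ms };
    }
    const limited = res.status === 429; // a 429 still carries usable quota headers
    return {
      ...base,
      status: limited ? 'rejected' : 'ok',
      requests_remaining: numOrNull(res.headers.get('x-ratelimit-remaining-requests')),
      requests_limit: numOrNull(res.headers.get('x-ratelimit-limit-requests')),
      severity: limited ? 'critical' : 'ok',
      probe_latency_ms,
    };
  } catch (err) {
    // Network or parsing failure: report it instead of throwing.
    return {
      ...base,
      status: 'error',
      message: err.message,
      severity: 'unknown',
      probe_latency_ms: Date.now() - started,
    };
  }
}
```

Passing `fetchImpl` as a parameter is a design choice for this sketch only: it lets the probe be exercised with a stub instead of a live API call.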

Probe behavior

The tool makes one minimal API call per provider to extract headers:

Model: claude-haiku-4-5-20251001 (cheapest available)
Max tokens: 1 (minimizes cost)
Rate-limit headers are returned on every response (200 and 429) for Teams providers
A 429 from a maxed provider is expected and treated as valid quota data
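The probe described above could be assembled along these lines. This is a sketch, not the request monitor.js actually sends: the model and `max_tokens` come from the list above, while the endpoint URL, the `x-api-key` auth header, the prompt text, and both function names are assumptions (Teams oat01 keys in particular may authenticate differently).

```javascript
// Builds (but does not send) a minimal Messages API request for probing.
function buildProbeRequest(apiKey) {
  return {
    url: 'https://api.anthropic.com/v1/messages', // assumption: default endpoint
    init: {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        'anthropic-version': '2023-06-01',
        'x-api-key': apiKey, // assumption: auth style may differ per key type
      },
      body: JSON.stringify({
        model: 'claude-haiku-4-5-20251001',
        max_tokens: 1, // minimizes cost; only the response headers matter
        messages: [{ role: 'user', content: 'hi' }], // placeholder prompt
      }),
    },
  };
}

// Both 200 and 429 carry rate-limit headers, so both count as valid quota data.
function isQuotaResponse(status) {
  return status === 200 || status === 429;
}

const probe = buildProbeRequest('sk-example');
console.log(JSON.parse(probe.init.body).max_tokens); // → 1
console.log(isQuotaResponse(429));                   // → true
```

Sending it would then be `fetch(probe.url, probe.init)`, followed by reading the response headers regardless of whether the body is a completion or a 429 error.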

Stealth: Read-only, one call per provider per run, no hammering. Run at most once per session.

Related

~/os/token-status.sh — wake-prompt integration (calls monitor.js, formats for beat.sh)
~/projects/provider-check/ — predecessor (liveness only, no quota depth)
~/.pi/agent/models.json — provider configuration source
Forgejo issue: trentuna/token-monitor#1

Wake integration

~/os/token-status.sh is the automated interface. It runs monitor.js --json and formats the output into a compact summary block for injection into Vigilio's wake prompt via beat.sh.

# Manual invocation (same as what the wake prompt sees)
~/os/token-status.sh

# Output format:
## Token Economics
Anthropic Teams (5 seats):
  team-vigilio     ✗ MAXED 7d:100% resets 23h
  team-molto       ✓ 5h:32% 7d:45% resets 4h
  ...
→ Current recommendation: use team-molto | avoid team-vigilio

File location: ~/os/token-status.sh
Called by: ~/os/beat.sh (Vigilio wake script)
Uses: monitor.js --json with 20-minute cache guard (won't double-probe within a session)
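The 20-minute cache guard boils down to an age check on the last probe result. The sketch below shows that check in Node for illustration only; the actual guard lives in the token-status.sh shell script, and the function name `isCacheFresh` and the idea of keying on a cache file's modification time are assumptions.

```javascript
const CACHE_TTL_MS = 20 * 60 * 1000; // 20 minutes, per the description above

// True when a result cached at `cachedAtMs` is still inside the guard window.
function isCacheFresh(cachedAtMs, nowMs = Date.now(), ttlMs = CACHE_TTL_MS) {
  const age = nowMs - cachedAtMs;
  return age >= 0 && age < ttlMs; // a negative age means clock skew: treat as stale
}

const now = Date.UTC(2026, 3, 4, 17, 0, 0);
console.log(isCacheFresh(now - 5 * 60 * 1000, now));  // → true  (5 minutes old)
console.log(isCacheFresh(now - 25 * 60 * 1000, now)); // → false (25 minutes old)
```

In practice `cachedAtMs` could be something like `fs.statSync(cachePath).mtimeMs` for whichever file holds the last probe output.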
