With extra-usage credit (after April 4, 2026), 7d per-seat limits no
longer block sessions; broken OAuth tokens do. provider-check.json
(written hourly by health-pulse) records whether an actual pi session
starts.

Changes:
- Load /tmp/provider-check.json (if fresh, < 2h old) before selection
- Filter candidates to pi-usable providers only
- If filter would empty the pool, fall through to budget-only logic
- Reason string includes 'pi-check' when filter was applied
- Handles a stale file, a missing file, and parse errors gracefully

This fixes the monitoring gap where budget API probes and pi session
usability diverge (e.g. team-buio: budget OK, pi ETIMEDOUT at 12:01).
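The filter described above could look roughly like the sketch below. The JSON schema (a "providers" map with an "ok" flag), the use of file mtime for freshness, and the helper names load_pi_usable and apply_pi_filter are all assumptions made for illustration; the actual file layout is not shown in this message.

```python
import json
import os
import time

MAX_AGE_SECONDS = 2 * 60 * 60  # "fresh" means under 2 hours old


def load_pi_usable(path="/tmp/provider-check.json", now=None):
    """Return the set of pi-usable provider names, or None when the file
    is missing, stale, or unparseable (the caller then skips the filter)."""
    now = time.time() if now is None else now
    try:
        if now - os.path.getmtime(path) >= MAX_AGE_SECONDS:
            return None  # stale file
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
        # ASSUMED schema: {"providers": {"<name>": {"ok": true|false}}}
        return {name for name, info in data["providers"].items() if info.get("ok")}
    except (OSError, ValueError, KeyError, TypeError, AttributeError):
        return None  # missing file, parse error, or unexpected shape


def apply_pi_filter(candidates, usable):
    """Return (candidates to use, whether the pi-check filter was applied)."""
    if usable is None:
        return candidates, False
    filtered = [c for c in candidates if c in usable]
    if not filtered:
        # The filter would empty the pool: fall through to budget-only logic.
        return candidates, False
    return filtered, True
```

The boolean in the return value is what a caller would use to decide whether to add 'pi-check' to the reason string.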
Refs: trentuna/token-monitor#4

allowed_warning providers can still serve requests; only their budget
is approaching its limit. Previously they were excluded from both
Phase 1 and Phase 2 selection, which caused unnecessary escalation to
the shelley-proxy emergency fallback when team-vigilio was at 79% 7d
(allowed_warning) and team-ludo was showing invalid_key in the
health-pulse cache.

Now:
- Phase 1: first provider under threshold with status allowed or allowed_warning
- Phase 2: lowest-utilization provider with either status; reason notes warning
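As a rough illustration of the Phase 2 change, the sketch below treats both statuses as selectable and notes the warning in the reason. The tuple layout, the phase2_pick name, and the reason wording are hypothetical, not taken from the actual code.

```python
SELECTABLE = {"allowed", "allowed_warning"}


def phase2_pick(providers):
    """providers: list of (name, status, util_7d_percent) tuples (assumed shape).

    Returns (name, reason) for the lowest-utilization selectable provider,
    or None when no provider has a selectable status."""
    eligible = [p for p in providers if p[1] in SELECTABLE]
    if not eligible:
        return None
    name, status, util = min(eligible, key=lambda p: p[2])
    reason = f"lowest 7d utilization ({util:.0f}%)"
    if status == "allowed_warning":
        reason += ", budget warning"  # the reason notes the warning
    return name, reason
```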
Effect: the next wake picks team-vigilio (79% 7d, warning) instead of
shelley-proxy, which is once again a true last resort.

invalid_key (HTTP 401) can be transient during key rotation or
temporary API issues, unlike rejected/exhausted, which are stable
budget states. When the cache shows every chain provider as
invalid_key, bypass the cache and probe fresh so that recovery is
immediate rather than waiting for the 20-minute TTL to expire.
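A minimal sketch of that bypass decision follows. The cache shape (a "written_at" timestamp plus a "statuses" map) and the should_probe_fresh name are assumptions for illustration only.

```python
import time

CACHE_TTL_SECONDS = 20 * 60  # the 20-minute TTL mentioned above


def should_probe_fresh(cache, chain, now=None):
    """Decide whether to skip the cache and probe providers directly.

    cache: {"written_at": <epoch seconds>, "statuses": {name: status}} or None
           (ASSUMED shape).
    chain: ordered list of provider names."""
    now = time.time() if now is None else now
    if cache is None or now - cache["written_at"] >= CACHE_TTL_SECONDS:
        return True  # no cache, or the TTL has expired
    statuses = cache["statuses"]
    # invalid_key may be transient, so a cache in which every chain
    # provider is invalid_key is not trusted until the TTL runs out.
    return bool(chain) and all(statuses.get(n) == "invalid_key" for n in chain)
```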
Selects the optimal Teams provider from the chain based on real 7d
utilization. Uses cached monitor data (no extra API calls if a fresh
cache exists).
- Phase 1: first provider in chain with 7d util < SWITCH_THRESHOLD (default 75%)
- Phase 2: all over threshold → pick the allowed provider with the lowest 7d util
- Phase 3: all rejected → emergency=true, signals shelley-proxy needed
- Always fails safe: returns team-vigilio on any error
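The three phases and the fail-safe can be sketched as follows. The dict keys, the result shape, and the choice to return no provider name in Phase 3 are assumptions for illustration; only the threshold default, the phase rules, and the team-vigilio fallback come from the description above.

```python
SWITCH_THRESHOLD = 75.0            # percent, default from the description
FAILSAFE_PROVIDER = "team-vigilio"


def select_provider(chain, threshold=SWITCH_THRESHOLD):
    """chain: ordered list of dicts with ASSUMED keys name, status, util_7d."""
    try:
        allowed = [p for p in chain if p["status"] == "allowed"]
        # Phase 1: first provider in chain order under the threshold.
        for p in allowed:
            if p["util_7d"] < threshold:
                return {"provider": p["name"], "emergency": False, "phase": 1}
        # Phase 2: all over threshold; take the lowest 7d utilization.
        if allowed:
            best = min(allowed, key=lambda p: p["util_7d"])
            return {"provider": best["name"], "emergency": False, "phase": 2}
        # Phase 3: nothing is allowed; signal that shelley-proxy is needed.
        return {"provider": None, "emergency": True, "phase": 3}
    except Exception:
        # Fail safe: any error (bad data, missing keys) returns the default.
        return {"provider": FAILSAFE_PROVIDER, "emergency": False, "phase": None}
```

Catching every exception is deliberate here: a selector that raises would leave the caller with no provider at all, which is worse than a possibly suboptimal default.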