Phase 3: Piggyback Header Capture
What it is
Instead of making a dedicated probe call at each wake to read rate-limit state, piggyback capture reads the same headers from LLM responses that are already happening during normal pi sessions. Zero extra API calls. No probe latency. Every real conversation generates a data point automatically.
Why it's better
The current probe approach has a structural limitation: it samples state once per wake, at most every 20 minutes (cache guard). Real usage happens between probes — the 5h window can fill and reset while Vigilio is working. Piggyback gets a reading on every turn, tied to actual usage events, with sub-minute resolution.
- Zero overhead — no extra API calls, no added latency, no token spend
- Temporal accuracy — readings tied to real usage moments, not arbitrary probe intervals
- Richer signal — burn rate analysis improves dramatically with higher sample frequency
- No polling logic — the data arrives when there's data to report
Headers already present
Every Anthropic API response already carries the full rate-limit family. These are the same headers anthropic-teams.js parses today:
anthropic-ratelimit-unified-status
anthropic-ratelimit-unified-5h-utilization
anthropic-ratelimit-unified-5h-reset
anthropic-ratelimit-unified-7d-utilization
anthropic-ratelimit-unified-7d-reset
anthropic-ratelimit-unified-representative-claim
anthropic-ratelimit-unified-reset
Present on every response — 200 and 429 alike. The data is already flowing; we just aren't capturing it.
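To make the capture concrete, here is a minimal TypeScript sketch of reading these headers off a standard fetch `Response`. The `Reading` shape and function name are hypothetical; the reset header's value format isn't assumed here, so it's kept as a raw string.

```typescript
// Minimal sketch: read the unified rate-limit headers from a fetch Response.
// Header names are the ones listed above; the Reading shape is hypothetical.
interface Reading {
  status: string | null;        // e.g. "allowed"
  utilization5h: number | null; // fraction of the 5h window consumed
  utilization7d: number | null; // fraction of the 7d window consumed
  reset: string | null;         // raw header value; format not assumed here
}

function readRateLimitHeaders(headers: Headers): Reading {
  const num = (name: string): number | null => {
    const v = headers.get(name);
    return v === null || v === "" ? null : Number(v);
  };
  return {
    status: headers.get("anthropic-ratelimit-unified-status"),
    utilization5h: num("anthropic-ratelimit-unified-5h-utilization"),
    utilization7d: num("anthropic-ratelimit-unified-7d-utilization"),
    reset: headers.get("anthropic-ratelimit-unified-reset"),
  };
}
```

Missing headers come back as `null` rather than `NaN`, so a downstream consumer can tell "header absent" from "header present with value 0".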
Where pi would need instrumentation
Pi's extension system exposes before_provider_request for outbound payload inspection, but no documented hook exposes raw response headers on the inbound side. The intercept point is the custom provider wrapper.
Pi supports registerProvider("anthropic", { baseUrl: ... }) to override the endpoint. A thin proxy wrapper could:
- Forward the request to the real Anthropic endpoint
- Capture response headers before returning the stream to pi
- Append a JSONL entry to the piggyback log
This would live at ~/.pi/agent/extensions/token-monitor-piggyback.ts — a pi extension that registers itself as the Anthropic provider, wraps the actual call, and writes the side channel.
Alternative, should a response hook land in a future pi release: pi.on("after_provider_response", ...) with event.headers exposed. Cleaner, no proxy indirection. Worth filing upstream.
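A sketch of the wrapper's core, under stated assumptions: the registerProvider wiring is omitted (its exact extension API isn't documented here), LOG_PATH and headersToEntry are hypothetical names, and fetch is assumed as the transport. The key property is that headers are read as soon as the response arrives, and the body stream is handed back untouched.

```typescript
import { appendFile, mkdir } from "node:fs/promises";
import { dirname } from "node:path";

// Hypothetical log path, matching the piggyback.jsonl location in the data
// interface section.
const LOG_PATH = `${process.env.HOME}/.logs/token-monitor/piggyback.jsonl`;

// Build a JSONL entry from response headers alone -- no body access needed.
function headersToEntry(h: Headers): Record<string, unknown> {
  return {
    ts: new Date().toISOString(),
    source: "piggyback",
    status: h.get("anthropic-ratelimit-unified-status"),
    utilization_5h: Number(h.get("anthropic-ratelimit-unified-5h-utilization") ?? "0"),
    utilization_7d: Number(h.get("anthropic-ratelimit-unified-7d-utilization") ?? "0"),
  };
}

// Forward to the real endpoint, log the headers, return the stream untouched.
async function proxiedFetch(url: string, init: RequestInit): Promise<Response> {
  const res = await fetch(url, init); // resolves once headers arrive, before the body
  const line = JSON.stringify(headersToEntry(res.headers)) + "\n";
  // Fire-and-forget append so logging never blocks or breaks the stream.
  mkdir(dirname(LOG_PATH), { recursive: true })
    .then(() => appendFile(LOG_PATH, line))
    .catch(() => {});
  return res;
}
```

Because `fetch` resolves when headers arrive, the log write never waits on the streamed body — which is exactly the no-buffering behavior the streaming-interception question below asks about.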
Data interface: minimal viable integration
Pi extension writes to:
~/.logs/token-monitor/piggyback.jsonl
Same path structure as today's probe logs. Same JSONL format:
{"ts":"2026-04-06T14:23:01Z","source":"piggyback","provider":"team-vigilio","type":"teams-direct","status":"allowed","utilization_5h":0.42,"utilization_7d":0.61,"reset_in_seconds":14400}
analyze.js reads both probe entries and piggyback entries from the same file. The source field distinguishes them. Probe entries continue working as fallback when no real conversation has occurred yet (first wake of the day, dormant accounts).
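The distinguishing logic is simple enough to sketch. Assuming entry shapes like the example above, a hypothetical helper (not analyze.js's actual code) could partition a mixed log like this:

```typescript
// Hypothetical helper: partition a mixed JSONL log by the `source` field,
// mirroring how analyze.js could treat probe entries as fallback data.
interface LogEntry {
  ts: string;
  source: string; // "probe" | "piggyback"
  utilization_5h?: number;
  utilization_7d?: number;
}

function splitBySource(jsonl: string): { probe: LogEntry[]; piggyback: LogEntry[] } {
  const probe: LogEntry[] = [];
  const piggyback: LogEntry[] = [];
  for (const line of jsonl.split("\n")) {
    if (!line.trim()) continue; // skip blank lines
    const entry = JSON.parse(line) as LogEntry;
    // Anything not tagged "piggyback" is treated as a probe reading.
    (entry.source === "piggyback" ? piggyback : probe).push(entry);
  }
  return { probe, piggyback };
}
```

Both kinds of entry stay in one file and one parse pass; the split happens in memory, so the fallback decision ("no piggyback entries yet today, use the probe reading") is a length check.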
What remains unknown
- Header exposure: Pi doesn't currently expose raw response headers in any extension event. The custom provider proxy approach works but adds complexity. Check whether a future pi release adds after_provider_response.
- Streaming interception: Anthropic responses stream, and headers arrive before the body. The proxy needs to capture headers and write the log entry without buffering the full response — should be fine but needs testing.
- Multi-provider coverage: Piggyback naturally works for whichever provider is active. Dormant accounts still need probe calls to confirm they're dormant. Hybrid approach is probably permanent.
- Extension packaging: Should this live as a pi extension in commons/pi/extensions/ alongside bootstrap, or as a standalone script? Depends on the repo location decision below.
Repo location recommendation
Options assessed:
- Stay as trentuna/token-monitor — Works fine, but it's isolated. The tool serves all trentuna members; a separate repo means separate cloning, separate updates, and token-status.sh already lives in ~/os/ outside it. The split is already awkward.
- Move into trentuna/commons — Natural fit. commons is the shared config layer for all trentuna members; bootstrap.sh already handles pi setup. Token monitoring is infrastructure, not a standalone product. analyze.js, monitor.js, and a future piggyback extension would sit alongside other shared operational tools. Ludo explicitly named this option.
- Split: code in token-monitor, token-status.sh to vigilio/os — The split already exists informally (token-status.sh is in ~/os/). Formalizing it adds a cross-repo dependency without resolving the underlying issue. More moving parts, same problem.
- Merge into vigilio/os entirely — token-status.sh already lives here and it's close to vigilio.sh. But vigilio/os is vigilio-specific; the monitor is multi-member infrastructure. Wrong home.
Recommendation: Option 2 — move into trentuna/commons.
The monitor is trentuna infrastructure. commons already owns bootstrap, pi config, and model provisioning for all members. Token monitoring belongs in the same layer. A future piggyback extension would live in commons/pi/extensions/, wired up by bootstrap automatically for every member. token-status.sh stays in ~/os/ (vigilio-local runtime script) and just calls the tool from its new location — one path update.