token-monitor/docs/phase3-piggyback.md

Phase 3: Piggyback Header Capture

What it is

Instead of making a dedicated probe call at each wake to read rate-limit state, piggyback capture reads the same headers off the LLM responses already flowing during normal pi sessions. Zero extra API calls. No probe latency. Every real conversation generates a data point automatically.

Why it's better

The current probe approach has a structural limitation: it samples state once per wake, at most every 20 minutes (cache guard). Real usage happens between probes — the 5h window can fill and reset while Vigilio is working. Piggyback gets a reading on every turn, tied to actual usage events, with sub-minute resolution.

  • Zero overhead — no extra API calls, no added latency, no token spend
  • Temporal accuracy — readings tied to real usage moments, not arbitrary probe intervals
  • Richer signal — burn rate analysis improves dramatically with higher sample frequency
  • No polling logic — the data arrives when there's data to report

Headers already present

Every Anthropic API response already carries the full rate-limit family. These are the same headers anthropic-teams.js parses today:

anthropic-ratelimit-unified-status
anthropic-ratelimit-unified-5h-utilization
anthropic-ratelimit-unified-5h-reset
anthropic-ratelimit-unified-7d-utilization
anthropic-ratelimit-unified-7d-reset
anthropic-ratelimit-unified-representative-claim
anthropic-ratelimit-unified-reset

Present on every response — 200 and 429 alike. The data is already flowing; we just aren't capturing it.
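
Parsing that family is mechanical. A minimal sketch, assuming the utilizations are decimal fractions and using the standard `Headers` interface (the exact value formats should be checked against what anthropic-teams.js parses today; `parseRateLimitHeaders` and `RateLimitSnapshot` are illustrative names, not existing code):

```typescript
// Sketch: lift the unified rate-limit family off a fetch Response's headers.
// Header names come from the list above; treating utilization values as
// decimal fractions is an assumption to verify against anthropic-teams.js.

interface RateLimitSnapshot {
  status: string | null;
  utilization5h: number | null;
  utilization7d: number | null;
  reset5h: string | null;
  reset7d: string | null;
  representativeClaim: string | null;
  unifiedReset: string | null;
}

function parseRateLimitHeaders(headers: Headers): RateLimitSnapshot {
  // Missing headers become null rather than NaN, so dormant or partial
  // responses don't poison downstream math.
  const num = (name: string): number | null => {
    const raw = headers.get(name);
    return raw === null ? null : Number(raw);
  };
  return {
    status: headers.get("anthropic-ratelimit-unified-status"),
    utilization5h: num("anthropic-ratelimit-unified-5h-utilization"),
    utilization7d: num("anthropic-ratelimit-unified-7d-utilization"),
    reset5h: headers.get("anthropic-ratelimit-unified-5h-reset"),
    reset7d: headers.get("anthropic-ratelimit-unified-7d-reset"),
    representativeClaim: headers.get("anthropic-ratelimit-unified-representative-claim"),
    unifiedReset: headers.get("anthropic-ratelimit-unified-reset"),
  };
}
```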

Where pi would need instrumentation

Pi's extension system exposes before_provider_request for outbound payload inspection, but no documented hook exposes raw response headers on the inbound side. The intercept point is the custom provider wrapper.

Pi supports registerProvider("anthropic", { baseUrl: ... }) to override the endpoint. A thin proxy wrapper could:

  1. Forward the request to the real Anthropic endpoint
  2. Capture response headers before returning the stream to pi
  3. Append a JSONL entry to the piggyback log

Concretely, this is ~/.pi/agent/extensions/token-monitor-piggyback.ts — a pi extension that registers itself as the Anthropic provider, wraps the actual call, and writes the side channel.
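
The three steps above can be sketched as a fetch-shaped handler. This assumes the wrapper ultimately boils down to forwarding a fetch; `proxyFetch`, `snapshotToJsonl`, `ANTHROPIC_URL`, and `LOG_PATH` are illustrative names, and the real hook-up to `registerProvider` depends on pi's actual extension API:

```typescript
import { appendFileSync } from "node:fs";

// Illustrative constants — not pi or token-monitor configuration.
const ANTHROPIC_URL = "https://api.anthropic.com";
const LOG_PATH = `${process.env.HOME}/.logs/token-monitor/piggyback.jsonl`;

// Build one piggyback log entry from response headers. Field names follow
// the JSONL example in this doc (trimmed here for brevity).
function snapshotToJsonl(headers: Headers, now: Date = new Date()): string {
  return JSON.stringify({
    ts: now.toISOString(),
    source: "piggyback",
    status: headers.get("anthropic-ratelimit-unified-status"),
    utilization_5h: Number(headers.get("anthropic-ratelimit-unified-5h-utilization")),
    utilization_7d: Number(headers.get("anthropic-ratelimit-unified-7d-utilization")),
  });
}

// 1. forward, 2. capture headers, 3. append JSONL — then hand the response
// back to pi with its streaming body untouched. Headers are available as
// soon as the response starts, so nothing here buffers the body.
async function proxyFetch(path: string, init?: RequestInit): Promise<Response> {
  const res = await fetch(`${ANTHROPIC_URL}${path}`, init);
  try {
    appendFileSync(LOG_PATH, snapshotToJsonl(res.headers) + "\n");
  } catch {
    // Logging must never break the live conversation.
  }
  return res;
}
```

The `try/catch` around the log write matters: a missing directory or full disk should degrade to "no data point", never to a failed conversation turn.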

Alternative, if pi adds a response hook in the future: pi.on("after_provider_response", ...) with event.headers exposed. Cleaner, no proxy indirection. Worth filing upstream.

Data interface: minimal viable integration

Pi extension writes to:

~/.logs/token-monitor/piggyback.jsonl

Same path structure as today's probe logs. Same JSONL format:

{"ts":"2026-04-06T14:23:01Z","source":"piggyback","provider":"team-vigilio","type":"teams-direct","status":"allowed","utilization_5h":0.42,"utilization_7d":0.61,"reset_in_seconds":14400}

analyze.js reads both probe entries and piggyback entries from the same file. The source field distinguishes them. Probe entries continue working as fallback when no real conversation has occurred yet (first wake of the day, dormant accounts).
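
Merging the two sources is then a matter of reading the file and letting the newest timestamp win. A sketch of that logic, assuming the entry shape in the example above (`latestReading` is an illustrative helper, not an existing analyze.js function):

```typescript
// Sketch: pick the freshest reading from a mixed probe + piggyback log.
// Entry shape follows the JSONL example above, trimmed to the fields used.

interface LogEntry {
  ts: string;
  source: "probe" | "piggyback";
  utilization_5h: number;
}

function latestReading(jsonl: string): LogEntry | null {
  const entries = jsonl
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as LogEntry);
  if (entries.length === 0) return null;
  // ISO-8601 timestamps compare correctly as strings. Newest entry wins
  // regardless of source, so probe entries remain the natural fallback
  // whenever no piggyback data has arrived yet.
  return entries.reduce((a, b) => (a.ts >= b.ts ? a : b));
}
```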

What remains unknown

  • Header exposure: Pi doesn't currently expose raw response headers in any extension event. The custom provider proxy approach works but adds complexity. Check whether a future pi release adds after_provider_response.
  • Streaming interception: Anthropic responses stream. Headers arrive before the body. The proxy needs to capture headers and write the log entry without buffering the full response — should be fine but needs testing.
  • Multi-provider coverage: Piggyback naturally works for whichever provider is active. Dormant accounts still need probe calls to confirm they're dormant. Hybrid approach is probably permanent.
  • Extension packaging: Should this live as a pi extension in commons/pi/extensions/ alongside bootstrap, or as a standalone script? Depends on repo location decision below.

Repo location recommendation

Options assessed:

  1. Stay as trentuna/token-monitor — Works fine, but it's isolated. The tool serves all trentuna members; a separate repo means separate cloning, separate updates, and token-status.sh already lives in ~/os/ outside it. The split is already awkward.

  2. Move into trentuna/commons — Natural fit. commons is the shared config layer for all trentuna members; bootstrap.sh already handles pi setup. Token monitoring is infrastructure, not a standalone product. analyze.js, monitor.js, and a future piggyback extension would sit alongside other shared operational tools. Ludo explicitly named this option.

  3. Split: code in token-monitor, token-status.sh to vigilio/os — The split already exists informally (token-status.sh is in ~/os/). Formalizing it adds a cross-repo dependency without resolving the underlying issue. More moving parts, same problem.

  4. Merge into vigilio/os entirely — token-status.sh already lives here and it's close to vigilio.sh. But vigilio/os is vigilio-specific; the monitor is multi-member infrastructure. Wrong home.

Recommendation: Option 2 — move into trentuna/commons.

The monitor is trentuna infrastructure. commons already owns bootstrap, pi config, and model provisioning for all members. Token monitoring belongs in the same layer. A future piggyback extension would live in commons/pi/extensions/, wired up by bootstrap automatically for every member. token-status.sh stays in ~/os/ (vigilio-local runtime script) and just calls the tool from its new location — one path update.