From b504977853d929dd762948277c8f6e8f65b0347b Mon Sep 17 00:00:00 2001
From: "H.M. Murdock"
Date: Mon, 6 Apr 2026 02:27:23 +0000
Subject: [PATCH] =?UTF-8?q?docs:=20add=20phase3-piggyback.md=20=E2=80=94?=
 =?UTF-8?q?=20piggyback=20header=20capture=20+=20repo=20location=20recomme?=
 =?UTF-8?q?ndation?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 docs/phase3-piggyback.md | 85 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)
 create mode 100644 docs/phase3-piggyback.md

diff --git a/docs/phase3-piggyback.md b/docs/phase3-piggyback.md
new file mode 100644
index 0000000..9e16c94
--- /dev/null
+++ b/docs/phase3-piggyback.md
@@ -0,0 +1,85 @@
+# Phase 3: Piggyback Header Capture
+
+## What it is
+
+Instead of making a dedicated probe call at each wake to read rate-limit state, **piggyback capture** reads the same headers from LLM responses that are already happening during normal pi sessions. Zero extra API calls. No probe latency. Every real conversation generates a data point automatically.
+
+## Why it's better
+
+The current probe approach has a structural limitation: it samples state once per wake, at most every 20 minutes (cache guard). Real usage happens between probes — the 5h window can fill and reset while Vigilio is working. Piggyback gets a reading on every turn, tied to actual usage events, with sub-minute resolution.
+
+- **Zero overhead** — no extra API calls, no added latency, no token spend
+- **Temporal accuracy** — readings tied to real usage moments, not arbitrary probe intervals
+- **Richer signal** — burn rate analysis improves dramatically with higher sample frequency
+- **No polling logic** — the data arrives when there's data to report
+
+## Headers already present
+
+Every Anthropic API response already carries the full rate-limit family. These are the same headers `anthropic-teams.js` parses today:
+
+```
+anthropic-ratelimit-unified-status
+anthropic-ratelimit-unified-5h-utilization
+anthropic-ratelimit-unified-5h-reset
+anthropic-ratelimit-unified-7d-utilization
+anthropic-ratelimit-unified-7d-reset
+anthropic-ratelimit-unified-representative-claim
+anthropic-ratelimit-unified-reset
+```
+
+Present on every response — 200 and 429 alike. The data is already flowing; we just aren't capturing it.
+
+## Where pi would need instrumentation
+
+Pi's extension system exposes `before_provider_request` for outbound payload inspection, but **no documented hook exposes raw response headers** on the inbound side. The intercept point is the **custom provider wrapper**.
+
+Pi supports `registerProvider("anthropic", { baseUrl: ... })` to override the endpoint. A thin proxy wrapper could:
+
+1. Forward the request to the real Anthropic endpoint
+2. Capture response headers before returning the stream to pi
+3. Append a JSONL entry to the piggyback log
+
+This is a `~/.pi/agent/extensions/token-monitor-piggyback.ts` — a pi extension that registers itself as the Anthropic provider, wraps the actual call, and writes the side channel.
+
+Alternative approach if a response hook is added to pi in future: `pi.on("after_provider_response", ...)` with `event.headers` exposed. Cleaner, no proxy indirection. Worth filing upstream.
+
+## Data interface: minimal viable integration
+
+Pi extension writes to:
+
+```
+~/.logs/token-monitor/piggyback.jsonl
+```
+
+Same path structure as today's probe logs. Same JSONL format:
+
+```jsonl
+{"ts":"2026-04-06T14:23:01Z","source":"piggyback","provider":"team-vigilio","type":"teams-direct","status":"allowed","utilization_5h":0.42,"utilization_7d":0.61,"reset_in_seconds":14400}
+```
+
+`analyze.js` reads both probe entries and piggyback entries from the same file. The `source` field distinguishes them. Probe entries continue working as fallback when no real conversation has occurred yet (first wake of the day, dormant accounts).
+
+## What remains unknown
+
+- **Header exposure**: Pi doesn't currently expose raw response headers in any extension event. The custom provider proxy approach works but adds complexity. Check whether a future pi release adds `after_provider_response`.
+- **Streaming interception**: Anthropic responses stream. Headers arrive before the body. The proxy needs to capture headers and write the log entry without buffering the full response — should be fine but needs testing.
+- **Multi-provider coverage**: Piggyback naturally works for whichever provider is active. Dormant accounts still need probe calls to confirm they're dormant. Hybrid approach is probably permanent.
+- **Extension packaging**: Should this live as a pi extension in `commons/pi/extensions/` alongside bootstrap, or as a standalone script? Depends on repo location decision below.
+
+---
+
+## Repo location recommendation
+
+**Options assessed:**
+
+1. **Stay as `trentuna/token-monitor`** — Works fine, but it's isolated. The tool serves all trentuna members; a separate repo means separate cloning, separate updates, and `token-status.sh` already lives in `~/os/` outside it. The split is already awkward.
+
+2. **Move into `trentuna/commons`** — Natural fit. `commons` is the shared config layer for all trentuna members; `bootstrap.sh` already handles pi setup. Token monitoring is infrastructure, not a standalone product. `analyze.js`, `monitor.js`, and a future piggyback extension would sit alongside other shared operational tools. Ludo explicitly named this option.
+
+3. **Split: code in token-monitor, `token-status.sh` to `vigilio/os`** — The split already exists informally (`token-status.sh` is in `~/os/`). Formalizing it adds a cross-repo dependency without resolving the underlying issue. More moving parts, same problem.
+
+4. **Merge into `vigilio/os` entirely** — `token-status.sh` already lives here and it's close to `vigilio.sh`. But `vigilio/os` is vigilio-specific; the monitor is multi-member infrastructure. Wrong home.
+
+**Recommendation: Option 2 — move into `trentuna/commons`.**
+
+The monitor is trentuna infrastructure. `commons` already owns bootstrap, pi config, and model provisioning for all members. Token monitoring belongs in the same layer. A future piggyback extension would live in `commons/pi/extensions/`, wired up by bootstrap automatically for every member. `token-status.sh` stays in `~/os/` (vigilio-local runtime script) and just calls the tool from its new location — one path update.
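
Reviewer sketch (not part of the patch): the capture step under "Where pi would need instrumentation" and the entry shape under "Data interface" could be tied together roughly as below. The names `HeaderReader`, `PiggybackEntry`, `buildPiggybackEntry`, and `toJsonlLine` are hypothetical. The sketch also assumes, without confirmation from the doc, that the utilization headers carry plain decimal fractions and that `anthropic-ratelimit-unified-reset` carries a Unix timestamp in seconds. It does no I/O and does not touch pi's extension API.

```typescript
// Sketch only: field names mirror the JSONL example in the doc; the header
// value formats are assumptions, not confirmed Anthropic behavior.

// Minimal structural type so the function accepts a fetch `Headers` object
// or any test double with a compatible `get`.
interface HeaderReader {
  get(name: string): string | null;
}

interface PiggybackEntry {
  ts: string;
  source: "piggyback";
  provider: string;
  type: string;
  status: string | null;
  utilization_5h: number | null;
  utilization_7d: number | null;
  reset_in_seconds: number | null;
}

const PREFIX = "anthropic-ratelimit-unified-";

// Parse a numeric header; returns null when missing or not a finite number.
function numericHeader(headers: HeaderReader, suffix: string): number | null {
  const raw = headers.get(PREFIX + suffix);
  if (raw === null || raw.trim() === "") return null;
  const value = Number(raw);
  return Number.isFinite(value) ? value : null;
}

// Build one log entry from response headers.
// ASSUMPTION: the `-reset` header is a Unix timestamp in seconds.
function buildPiggybackEntry(
  headers: HeaderReader,
  provider: string,
  type: string,
  now: Date = new Date(),
): PiggybackEntry {
  const resetAt = numericHeader(headers, "reset");
  const nowSeconds = Math.floor(now.getTime() / 1000);
  return {
    // Drop milliseconds to match the timestamp style in the doc's example.
    ts: now.toISOString().replace(/\.\d{3}Z$/, "Z"),
    source: "piggyback",
    provider,
    type,
    status: headers.get(PREFIX + "status"),
    utilization_5h: numericHeader(headers, "5h-utilization"),
    utilization_7d: numericHeader(headers, "7d-utilization"),
    reset_in_seconds:
      resetAt === null ? null : Math.max(0, resetAt - nowSeconds),
  };
}

// Serialize as a single newline-terminated JSONL line.
function toJsonlLine(entry: PiggybackEntry): string {
  return JSON.stringify(entry) + "\n";
}
```

Because headers arrive before the streamed body, a wrapper could call `buildPiggybackEntry` as soon as the response object exists and then hand the stream to pi untouched. Appending the line to `~/.logs/token-monitor/piggyback.jsonl` (for example with Node's `fs.promises.appendFile`) is left out so the sketch stays free of file access.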