After removing the previous step’s model output from this step’s newly appended tokens, how much external context growth (new user/tool content + framing) is actually added per step?
The previous assistant response is normally replayed into the next prompt, so it shows up inside the
next step’s `newly_append_tokens` (the cache-write / freshly-charged slice). Subtracting it isolates
the genuinely new content. The metric is an adjacent-step policy applied within a session:
the steps of each session are walked in order, and each step is compared to the step immediately
before it (`round_index` differing by exactly 1).
For each such (`previous`, `current`) pair the experiment derives, on `current`:
- adjusted append = `max(0, newly_append_tokens − prior-step output proxy)`, plus the intermediate `signed_adjusted_append` (pre-clip) and a `clipped_after_subtract` flag for pairs the subtraction drove negative.
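
The pairing and subtraction described above can be sketched as follows. This is a minimal illustration, not the experiment's actual code: the `Step` and `AdjustedPair` classes and the `adjusted_pairs` function are hypothetical names, the steps are assumed to arrive already in session order, and the previous step's full `output_tokens` stands in for the prior-step output proxy.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Step:
    """One step of a session (hypothetical container for illustration)."""
    round_index: int
    newly_append_tokens: int
    output_tokens: int  # assumed here to be the prior-step output proxy


@dataclass
class AdjustedPair:
    """Values derived on the `current` step of an adjacent pair."""
    round_index: int
    signed_adjusted_append: int   # pre-clip difference, may be negative
    adjusted_append: int          # clipped at 0
    clipped_after_subtract: bool  # True when the subtraction went negative


def adjusted_pairs(steps: Iterable[Step]) -> Iterator[AdjustedPair]:
    """Yield one record per adjacent (previous, current) pair of one session."""
    previous: Optional[Step] = None
    for current in steps:  # assumed to be in session order already
        # Only steps whose round_index differs by exactly 1 form a pair.
        if previous is not None and current.round_index - previous.round_index == 1:
            signed = current.newly_append_tokens - previous.output_tokens
            yield AdjustedPair(
                round_index=current.round_index,
                signed_adjusted_append=signed,
                adjusted_append=max(0, signed),
                clipped_after_subtract=signed < 0,
            )
        previous = current


# Made-up numbers, chosen only to exercise each branch.
session = [
    Step(round_index=0, newly_append_tokens=1200, output_tokens=300),
    Step(round_index=1, newly_append_tokens=450, output_tokens=500),
    Step(round_index=2, newly_append_tokens=420, output_tokens=80),
    Step(round_index=4, newly_append_tokens=900, output_tokens=60),  # gap, no pair
]
pairs = list(adjusted_pairs(session))
for pair in pairs:
    print(pair)
```

With these made-up numbers, round 1 keeps 450 − 300 = 150 tokens of external growth, round 2 has a signed value of −80 that is clipped to 0 and flagged, and round 4 yields no pair because round 3 is missing.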
Method and assumptions:
- `--subtract-policy` (default `claude-and-gpt55`) decides which pairs get the subtraction. Only Claude and Codex gpt-5.5 carry their prior output, including reasoning, forward into the next step’s append, so only those `previous` rows are subtracted. gpt-5.4 and earlier Codex models do not carry reasoning forward, so subtracting it would over-count and spuriously clip the append to 0; those rows are left raw. `all` subtracts every provider/model and is kept only for comparison.
- `--subtract-output` (default `total`) decides what quantity is subtracted from the selected rows: `total` subtracts the full `output_tokens` (visible + reasoning), correct for Claude/gpt-5.5 since they replay reasoning; `visible-for-codex` instead subtracts `output_tokens − reasoning_output_tokens` for Codex. See `../../../docs/prompt_cache_accounting.md` for the cache-accounting evidence.
- This is an approximation, not an identity: output can include hidden/thinking tokens, and raw tool outputs may be clipped/compacted before being sent.
- Adjacency ordering is file order within a session. This is the same order the pre-migration JSONL loader used: rows are grouped per `session_id` in first-appearance (file) order, then stably sorted by `round_index`, so ties keep file order. The shared DuckDB surrogate key `ingest_seq` (= `round_pk`) is that file order, so pulling `ORDER BY ingest_seq` and grouping in Python reproduces both the per-session row order and the session-visitation order byte-for-byte.
- Summary CSV is exact (full data). The per-group quantiles, median, min, and max are computed over every pair, not a sample. The percentile method is the legacy linear-interpolation helper (`(n−1)·q`), matching `np.percentile`’s default.
- The scatter is a reservoir subsample (`--pair-sample-size`, default 80k). Because the sampler is preserved and fed in file order (`ingest_seq`), it retains exactly the same points as the pre-migration loader, so the scatter figure is byte-for-byte unchanged on a fixed trace.
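
The last two points can be illustrated with a short sketch. Both functions are hypothetical stand-ins rather than the experiment's own helpers: `percentile_linear` takes `q` as a fraction in [0, 1] and interpolates at rank `(n−1)·q`, and `reservoir_sample` assumes a classic single-pass reservoir (Algorithm R) with a fixed seed. The source does not state which reservoir variant or seed is used; the point is only that a seeded sampler fed the same items in the same order returns the same points.

```python
import math
import random
from typing import Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def percentile_linear(sorted_values: Sequence[float], q: float) -> float:
    """Percentile at fraction q in [0, 1], interpolated at rank (n - 1) * q.

    The input must already be sorted in ascending order.
    """
    if not sorted_values:
        raise ValueError("need at least one value")
    rank = (len(sorted_values) - 1) * q
    lo = math.floor(rank)
    hi = math.ceil(rank)
    frac = rank - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def reservoir_sample(items: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Uniform sample of up to k items from a stream in one pass (Algorithm R).

    The seed is an assumption made for this sketch; with a fixed seed the
    result depends only on the items and the order in which they arrive.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)  # fill the reservoir first
        else:
            j = rng.randint(0, i)   # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


values = [0, 10, 20, 30]
print(percentile_linear(values, 0.5))   # rank 1.5, halfway between 10 and 20
print(percentile_linear(values, 0.25))  # rank 0.75, three quarters of the way to 10

first = reservoir_sample(range(1000), k=10, seed=1)
second = reservoir_sample(range(1000), k=10, seed=1)
print(first == second)  # same seed and same input order give the same sample
```

Note that `np.percentile` takes `q` on a 0 to 100 scale, so `percentile_linear(values, 0.5)` corresponds to `np.percentile(values, 50)` under NumPy's default linear method.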