For each agent step, how large is the input prompt and how is it split between the cached prefix it reuses and the newly appended tokens it must pay for?
Every agent step in the trace carries an input-token accounting:

`input_tokens_total = prefix_tokens + newly_append_tokens`

- `prefix_tokens` = the cached prefix reused from the previous request (Claude `cache_read_input_tokens`; Codex `cached_input_tokens`).
- `newly_append_tokens` = tokens charged as new in this step (Claude `input_tokens + cache_creation_input_tokens`; Codex `input_tokens_total − cached_input_tokens`).

See `../../../docs/prompt_cache_accounting.md` for the full cache-accounting derivation.
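As a rough illustration of the accounting above, the split can be sketched in Python. The function names and the dict-shaped usage records are hypothetical and only the field names come from the definitions above; the token counts in the example are made up.

```python
from typing import Mapping, Tuple


def split_claude(usage: Mapping[str, int]) -> Tuple[int, int]:
    """Return (prefix_tokens, newly_append_tokens) for a Claude-style usage record.

    Assumes the record is a plain mapping holding the three Claude fields.
    """
    prefix = usage["cache_read_input_tokens"]
    append = usage["input_tokens"] + usage["cache_creation_input_tokens"]
    return prefix, append


def split_codex(usage: Mapping[str, int]) -> Tuple[int, int]:
    """Return (prefix_tokens, newly_append_tokens) for a Codex-style usage record.

    Assumes the record already carries `input_tokens_total`.
    """
    prefix = usage["cached_input_tokens"]
    append = usage["input_tokens_total"] - usage["cached_input_tokens"]
    return prefix, append


# Illustrative numbers only: a 16,000-token prompt with 15,000 tokens served from cache.
claude_step = {
    "input_tokens": 120,
    "cache_creation_input_tokens": 880,
    "cache_read_input_tokens": 15_000,
}
codex_step = {"input_tokens_total": 16_000, "cached_input_tokens": 15_000}

claude_prefix, claude_append = split_claude(claude_step)
codex_prefix, codex_append = split_codex(codex_step)
```

In both cases the two parts add back up to `input_tokens_total`, which is the identity the rest of the experiment relies on.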
This experiment renders the prefix/append distributions (histogram + CDF), a prefix-vs-append scatter, and a token-mass-weighted view of append lengths.
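To show why the token-mass-weighted view differs from a plain histogram, here is a minimal sketch that buckets append lengths by powers of two and tracks both step counts and total tokens per bucket. The bucket edges and the helper names are assumptions for illustration; the experiment's actual bin edges may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable


def log2_bucket(tokens: int) -> int:
    """Lower edge of the power-of-two bucket holding `tokens` (values <= 0 map to 0)."""
    return 0 if tokens <= 0 else 1 << (tokens.bit_length() - 1)


def weighted_bins(appends: Iterable[int]) -> Dict[int, Dict[str, int]]:
    """Per bucket, count the steps and sum the tokens they contribute."""
    bins: Dict[int, Dict[str, int]] = defaultdict(lambda: {"steps": 0, "tokens": 0})
    for append in appends:
        bucket = bins[log2_bucket(append)]
        bucket["steps"] += 1       # what an unweighted histogram would plot
        bucket["tokens"] += append  # what the mass-weighted view plots
    return dict(bins)


# Made-up data: nine short appends and one long one.
bins = weighted_bins([100] * 9 + [9000])
```

With this toy input, nine of the ten steps fall in the bucket starting at 64, yet the single step in the bucket starting at 8192 carries 9,000 of the 9,900 appended tokens.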
Method and assumptions:
- Exact, not sampled. Histograms, CDFs, percentiles, means, and the append-weighted bins are computed over every step via the shared trace DuckDB. (The old per-script loader reservoir-sampled at 200k/group to bound memory; that cap is gone, so the `sampled` column is always `False` and `sample_count` equals the full `count`.) Percentiles use `np.percentile` (linear interpolation); the mean reproduces the old running float sum exactly by summing the per-group values in ingest (`round_pk`) order.
- Validity gate. A token value counts when it is non-null and `>= 0` (the old `NumericTracker`’s `allow_zero` rule); nulls feed `missing`, negatives feed `invalid`. The append-weighted bins and the scatter use rows where both prefix and append are `>= 0` (the old loader’s pair gate).
- Binary token axis. Distributions are plotted on a base-2 log token axis.
- Grouping follows `--group-by` (default `provider`; also `model` / `provider_model`), with `<unknown-provider>` / `<unknown-model>` COALESCE fallbacks mirroring the old `group_key`.
- Append-weighted bins weight each append-token bucket by total tokens, so the bars show where the token mass lives, not just step counts.
- The scatter is a deterministic visual subsample. A prefix-vs-append scatter cannot draw 350k+ points, so it keeps a fixed-size subsample (`--pair-sample-size`, default 80k). Instead of the old reservoir, the subsample is chosen in SQL by a Knuth-multiplicative hash of the surrogate key: `ORDER BY (round_pk * 2654435761) % 1000000, round_pk LIMIT <pair-sample-size>` over rows with `prefix_tokens >= 0 AND newly_append_tokens >= 0`. This is reproducible across DB builds and engines but is not the old reservoir, so the scatter figure is not byte-compatible with the pre-migration run (the CSVs are).
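The validity gate and the hash-ordered subsample described above can be mirrored in plain Python to see why the selection does not depend on input order. This is a sketch under stated assumptions: the `Row` tuple layout and the function names are hypothetical, and the real selection runs in SQL against the trace DuckDB.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical row layout: (round_pk, prefix_tokens, newly_append_tokens).
Row = Tuple[int, Optional[int], Optional[int]]

KNUTH_MULTIPLIER = 2654435761  # constant from the ORDER BY expression
HASH_MODULUS = 1_000_000


def classify(value: Optional[int]) -> str:
    """Validity gate: nulls are 'missing', negatives 'invalid', zero and up 'valid'."""
    if value is None:
        return "missing"
    return "invalid" if value < 0 else "valid"


def scatter_subsample(rows: Iterable[Row], sample_size: int) -> List[Row]:
    """Keep pair-valid rows, order by the multiplicative hash, take the first N."""
    gated = [
        row for row in rows
        if classify(row[1]) == "valid" and classify(row[2]) == "valid"
    ]
    # Same ordering as: ORDER BY (round_pk * 2654435761) % 1000000, round_pk
    gated.sort(key=lambda row: ((row[0] * KNUTH_MULTIPLIER) % HASH_MODULUS, row[0]))
    return gated[:sample_size]


# Made-up rows: pk 3 has a negative prefix, pk 5 has a null append.
rows: List[Row] = [(1, 10, 5), (2, 20, 5), (3, -1, 5), (4, 40, 5), (5, 50, None)]
sample = scatter_subsample(rows, 2)
```

Because the sort key depends only on `round_pk`, feeding the same rows in any order yields the same subsample, which is the property that makes the scatter reproducible.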