What fraction of each agent step’s input is served from the cached prefix, and how does that prefix hit ratio distribute across user-initiated and tool-triggered steps?
Every agent step in the trace carries an input-token accounting
(input_tokens_total = prefix_tokens + newly_append_tokens). This experiment treats the cached
prefix as a cache hit and the freshly-appended tokens as the miss, and reports the per-step
hit ratio, split by what started the step.
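As a minimal sketch of that accounting, the per-step ratio can be computed as below. The function name `prefix_hit_ratio` is hypothetical and only the two token fields come from the trace; the null and zero checks follow the validity rule stated in the method notes.

```python
from typing import Optional


def prefix_hit_ratio(
    prefix_tokens: Optional[int],
    newly_append_tokens: Optional[int],
) -> Optional[float]:
    """Share of a step's input tokens that came from the cached prefix.

    Returns None when either field is missing or the step has no input
    tokens, so such a step is skipped instead of dividing by zero.
    """
    if prefix_tokens is None or newly_append_tokens is None:
        return None
    total = prefix_tokens + newly_append_tokens
    if total <= 0:
        return None
    return prefix_tokens / total


# Illustrative numbers only: 9,000 cached tokens plus 1,000 new tokens.
example = prefix_hit_ratio(9000, 1000)
```

With these illustrative numbers, nine tenths of the step's input counts as a cache hit; a step whose prefix is empty scores 0.0, and a step with no tokens at all is skipped.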
Method and assumptions:
- Prefix hit ratio = `prefix_tokens / (prefix_tokens + newly_append_tokens)` per step: the share of input tokens that were a cache read rather than newly charged.
- Step eligibility / trigger. A step is included only when its first timing event (`timing_events` with `event_index = 1`, the in-order first event) is a `user_message` or a `tool_result`; all other first-event types (and steps with no timing events) are dropped. That first event also sets the trigger: `user` for `user_message`, `tool_result` otherwise. The `first_input_event_type` column is not used: it diverges from the first timing event (it has nulls where a first timing event exists), so the legacy `timing_events[0]` semantics are reproduced via the timing-events table.
- Validity gate. Both `prefix_tokens` and `newly_append_tokens` must be non-null, and their sum must be `> 0` (a zero/empty step contributes nothing and never divides by zero).
- Exact, not sampled. Means, percentiles (custom linear interpolation, matching `np.percentile`), histograms, and bins are computed over every eligible step via the shared trace DuckDB; there is no reservoir cap.
- Grouping. Each step feeds both its provider scope and the `merged` scope, under both its trigger and the `all` trigger. Scopes reported: `merged` / `claude` / `codex` (a null provider falls back to `unknown`). The append-weighted views weight each step by its append tokens to show where the token mass sits, not just step counts.
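The rules above can be sketched in plain Python. This is an illustration under stated assumptions, not the experiment's actual DuckDB query: the sample steps, the helper names (`trigger_for`, `percentile_linear`, `group_keys`), and the reading of "append-weighted" as a weighted mean of the per-step ratio are all hypothetical.

```python
from collections import defaultdict
from typing import Optional


def trigger_for(first_event_type: Optional[str]) -> Optional[str]:
    """Map a step's first timing event to its trigger, or None if ineligible."""
    if first_event_type == "user_message":
        return "user"
    if first_event_type == "tool_result":
        return "tool_result"
    return None  # any other first event, or no timing events at all


def percentile_linear(values: list[float], q: float) -> float:
    """Percentile with linear interpolation (q in [0, 100]),
    the same rule as np.percentile's default method."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty list")
    rank = (len(ordered) - 1) * q / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def group_keys(provider: Optional[str], trigger: str) -> list[tuple[str, str]]:
    """Each step lands in its provider scope and `merged`,
    under its own trigger and `all` (four groups in total)."""
    scope = provider if provider is not None else "unknown"
    return [(s, t) for s in (scope, "merged") for t in (trigger, "all")]


# Hypothetical sample rows:
# (provider, first_event_type, prefix_tokens, newly_append_tokens)
steps = [
    ("claude", "user_message", 900, 100),
    ("claude", "tool_result", 1500, 500),
    ("codex", "tool_result", 0, 400),
    (None, "tool_result", 300, 100),           # null provider -> "unknown"
    ("codex", "assistant_message", 800, 200),  # dropped: ineligible first event
    ("claude", "user_message", None, 50),      # dropped: null prefix_tokens
    ("codex", "user_message", 0, 0),           # dropped: zero total
]

ratios = defaultdict(list)                   # group -> per-step hit ratios
weighted = defaultdict(lambda: [0.0, 0.0])   # group -> [sum(ratio * append), sum(append)]

for provider, first_event, prefix, append in steps:
    trigger = trigger_for(first_event)
    if trigger is None or prefix is None or append is None or prefix + append <= 0:
        continue  # eligibility and validity gates
    ratio = prefix / (prefix + append)
    for key in group_keys(provider, trigger):
        ratios[key].append(ratio)
        weighted[key][0] += ratio * append
        weighted[key][1] += append

merged_all = ratios[("merged", "all")]
mean_ratio = sum(merged_all) / len(merged_all)
median_ratio = percentile_linear(merged_all, 50)
num, den = weighted[("merged", "all")]
append_weighted_mean = num / den if den > 0 else None
```

In this toy data the append-weighted mean comes out lower than the unweighted mean, because the step with an empty prefix also appends many tokens. That gap between the two figures is what the append-weighted views are meant to expose.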