SyFI TraceLab
Trace assistant
Reading the public SYFI pool
357,161 agent steps across Claude & Codex — public, shareable.
Answers run real DuckDB/Python in a sandbox · code is shown
All figures
Session
one continuous trace of work, often spanning multiple requests or problems.
Request
one user input through the agent's final response.
Agent step
one model call inside a request.
User-initiated step
an agent step started by user input.
Tool-triggered step
an agent step started by a tool result.
Problem

From a consumer perspective, how much extra append-prefill and API cost does a user pay because “thinking” time can turn prefix-cache hits into misses?

Tables
MetricClaudeCodexTotal
User-initiated steps with predecessor16,92717,03333,960
Observed append tokens1.19B1.15B2.34B
Append after retained cache541.9M721.7M1.26B
Append-token reduction648.1M (54.5%)423.9M (37.0%)1.07B (45.9%)
Observed total cost$22,654$17,777$40,431
Cost after retained cache$18,973$16,269$35,242
Total cost reduction$3,680 (16.2%)$1,508 (8.5%)$5,189 (12.8%)
Cost saved / reduced step avg$0.263$0.116$0.192
Table 1Upper-bound append-token and cost savings from eliminating user-thinking-induced prefix-cache misses.

From this consumer perspective, if user-initiated steps retained their prefix cache across human thinking time, append-prefill would drop from 2.34B to 1.26B tokens in the merged trace: a 1.07B-token reduction, or 45.9% of all append tokens. Because the estimate only changes user-initiated steps, that reduction is 95.1% of observed user-initiated append tokens.

With pricing.json prices as of 2026-06, the estimated final cost falls from $40,431 to $35,242, saving $5,189 (12.8%) over priced rounds. The split is 648.1M fewer append tokens and $3,680 saved for Claude, and 423.9M fewer append tokens and $1,508 saved for Codex. Dollar totals price 99.1% of rounds; token reductions include all rounds.

Reference
Experiment overview

This is an upper-bound savings estimate, not an observed cache metric. We analyze prefix caching from a consumer perspective: the user-visible cost of “thinking” is the extra fresh-input prefill billed when a user-initiated step resumes after the prefix cache has expired. For every user-initiated step S with a predecessor P in the same session, the estimate caps observed append at the step’s net context growth:

total_input(S)            = prefix_tokens(S) + newly_append_tokens(S)
context_growth(S)         = max(0, total_input(S) - total_input(P))
append_after_retained_cache(S)  = min(newly_append_tokens(S), context_growth(S))
prefix_after_retained_cache(S)  = total_input(S) - append_after_retained_cache(S)

All other steps keep their observed prefix_tokens / newly_append_tokens split. The total input length and output tokens do not change. Any append-token reduction is moved into prefix tokens and billed at the cache-read price; remaining Claude cache-creation tokens are billed at the 5-minute cache-write rate. Because the estimate assumes all shifted tokens can be served from cache at the cache-read rate, the resulting savings are an upper bound rather than an achievable policy guarantee.

Method and assumptions:

  • A user-initiated step is a round whose first timing event is user_message, matching the trigger convention used by cache_hit_ratio and redundant_prefill.
  • Pairing follows redundant_prefill / session/total_input_growth: the predecessor is the last round seen for the same session_id in round_pk file order. Session-first user steps have no predecessor and remain unchanged.
  • This isolates the consumer-side cost of human thinking time. Tool-result steps and session-first steps are left as observed.
  • Costs use artifacts/utils/pricing.json through artifacts/web_analytics/pricing.py: append at fresh-input/cache-write rates, prefix at the cache-read rate, output unchanged. Unpriced rounds contribute to token counts but are excluded from dollar totals.
Code structure
  • collect(con) streams rounds joined with each round’s first timing event, walks sessions in file order, and accumulates observed vs. retained-cache token/cost totals by scope.
  • ScopeAccum stores token totals, user-step coverage, priced-round coverage, and observed / retained-cache cost buckets. It also stores per-reduced-step cost-saved samples and preceding human idle gaps for avg / p50 / p90 rows.
  • write_summary_csv(...), render_md(...), and write_latex_table(...) emit the raw summary, web table, and optional paper table.
Running it
# default merged trace (materialized to a temp DuckDB cache on first use)
uv run python artifacts/prefix_cache/human_idle_cache_counterfactual/analyze.py

# a prebuilt DB, into a chosen dir
uv run python artifacts/prefix_cache/human_idle_cache_counterfactual/analyze.py \
  --db trace/syfi_coding_trace.duckdb -o /tmp/out
Outputs
  • human_idle_cache_counterfactual_summary.csv - raw observed vs. retained-cache token and cost totals per scope (merged, claude, codex).
  • human_idle_cache_counterfactual.md - GFM table for the web detail page.
  • human_idle_cache_counterfactual.tex - optional LaTeX table.
  • headline.json - headline values for the Overview gallery card.
SyFI TraceLab · experiment detail