SyFI TraceLab
Covers 357,161 agent steps from Claude and Codex, all publicly shareable.
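Session
A continuous record of work, usually containing multiple requests or questions.
Request
The full process from one user input to the agent's final response.
Agent step
A single model call within a request.
User-initiated step
An agent step started by user input.
Tool-initiated step
An agent step started by a tool result.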
Question

From a consumer perspective, how much extra append-prefill and API cost does a user pay because “thinking” time can turn prefix-cache hits into misses?

Table
| Metric | Claude | Codex | Total |
|---|---:|---:|---:|
| User-initiated steps with predecessor | 16,927 | 17,033 | 33,960 |
| Observed append tokens | 1.19B | 1.15B | 2.34B |
| Append after retained cache | 541.9M | 721.7M | 1.26B |
| Append-token reduction | 648.1M (54.5%) | 423.9M (37.0%) | 1.07B (45.9%) |
| Observed total cost | $22,654 | $17,777 | $40,431 |
| Cost after retained cache | $18,973 | $16,269 | $35,242 |
| Total cost reduction | $3,680 (16.2%) | $1,508 (8.5%) | $5,189 (12.8%) |
| Avg. cost saved per reduced step | $0.263 | $0.116 | $0.192 |
Table 1. Upper-bound append-token and cost savings from eliminating prefix-cache misses induced by user thinking time.

From this consumer perspective, if user-initiated steps retained their prefix cache across human thinking time, append-prefill in the merged trace would drop from 2.34B to 1.26B tokens: a 1.07B-token reduction, or 45.9% of all append tokens. Because the estimate changes only user-initiated steps, this reduction amounts to 95.1% of the observed append tokens in user-initiated steps.

Using pricing.json prices as of 2026-06, the estimated total cost falls from $40,431 to $35,242, a saving of $5,189 (12.8%) across priced rounds. By provider, Claude sees 648.1M fewer append tokens and $3,680 saved, and Codex sees 423.9M fewer append tokens and $1,508 saved. Dollar totals cover the 99.1% of rounds that have pricing; token reductions include all rounds.
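
Because every entry in Table 1 is rounded, a quick consistency check is to rebuild the percentages from the printed values. The sketch below is a minimal illustration, not part of analyze.py: it treats observed append as retained-cache append plus the reduction (the printed 1.19B and 1.15B are themselves rounded) and backs out the implied user-initiated append pool from the 95.1% figure. The Claude dollar saving comes out as $3,681 rather than $3,680 because the two cost entries it is derived from are rounded independently.

```python
"""Re-derive the headline percentages from the rounded values in Table 1."""

# (retained-cache append, append reduction) in millions of tokens, from Table 1.
append_m = {"claude": (541.9, 648.1), "codex": (721.7, 423.9)}
# (observed cost, cost after retained cache) in USD, from Table 1.
cost_usd = {"claude": (22_654, 18_973), "codex": (17_777, 16_269)}


def pct(part: float, whole: float) -> float:
    """Percentage rounded to one decimal, matching the table's precision."""
    return round(100.0 * part / whole, 1)


# Token rows: observed append is reconstructed as retained + reduction.
token_rows = {}
for scope, (retained, reduction) in append_m.items():
    observed = retained + reduction
    token_rows[scope] = (observed, reduction, pct(reduction, observed))

total_obs = sum(row[0] for row in token_rows.values())
total_red = sum(row[1] for row in token_rows.values())
token_rows["total"] = (total_obs, total_red, pct(total_red, total_obs))

# Cost rows: saving = observed cost - cost after retained cache.
cost_rows = {}
for scope, (observed, retained) in cost_usd.items():
    cost_rows[scope] = (observed - retained, pct(observed - retained, observed))
tot_o = sum(o for o, _ in cost_usd.values())
tot_r = sum(r for _, r in cost_usd.values())
cost_rows["total"] = (tot_o - tot_r, pct(tot_o - tot_r, tot_o))

# The 1.07B reduction is 95.1% of user-initiated append, so that pool is ~1.07B / 0.951.
implied_user_append_m = total_red / 0.951

for scope in ("claude", "codex", "total"):
    obs, red, p = token_rows[scope]
    saved, cp = cost_rows[scope]
    print(f"{scope}: {obs:.1f}M observed, -{red:.1f}M ({p}%), -${saved:,} ({cp}%)")
print(f"implied user-initiated append: ~{implied_user_append_m:.0f}M tokens")
```

All reconstructed percentages match the table to one decimal place.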

Reference
Experiment overview

This is an upper-bound savings estimate, not an observed cache metric. We analyze prefix caching from a consumer perspective: the user-visible cost of “thinking” is the extra fresh-input prefill billed when a user-initiated step resumes after the prefix cache has expired. For every user-initiated step S with a predecessor P in the same session, the estimate caps observed append at the step’s net context growth:

total_input(S)                  = prefix_tokens(S) + newly_append_tokens(S)
context_growth(S)               = max(0, total_input(S) - total_input(P))
append_after_retained_cache(S)  = min(newly_append_tokens(S), context_growth(S))
prefix_after_retained_cache(S)  = total_input(S) - append_after_retained_cache(S)
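
A minimal Python sketch of the per-step rule, assuming each step can be reduced to its prefix_tokens and newly_append_tokens counts. `Step` and `retained_cache_split` are hypothetical names, not functions from analyze.py, and the token counts in the example are invented: a predecessor with 100,000 input tokens, followed after a long idle gap by a step that re-appends almost everything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Token counts for one agent step (hypothetical; fields mirror the formulas)."""
    prefix_tokens: int
    newly_append_tokens: int

    @property
    def total_input(self) -> int:
        return self.prefix_tokens + self.newly_append_tokens


def retained_cache_split(step: Step, predecessor: Step) -> tuple[int, int]:
    """Return (prefix, append) for `step` as if its prefix cache had been retained."""
    # Net context growth since the predecessor; shrinking context counts as 0.
    context_growth = max(0, step.total_input - predecessor.total_input)
    # Cap the observed append at that growth; total input stays the same.
    append = min(step.newly_append_tokens, context_growth)
    prefix = step.total_input - append
    return prefix, append


p = Step(prefix_tokens=90_000, newly_append_tokens=10_000)  # total 100,000
s = Step(prefix_tokens=2_000, newly_append_tokens=103_000)  # total 105,000, cache expired
prefix, append = retained_cache_split(s, p)
print(prefix, append, s.newly_append_tokens - append)  # 100000 5000 98000
```

If the context shrinks between P and S (for example after compaction), context_growth is 0 and the counterfactual append drops to zero. A step whose observed append is already no larger than its growth is returned unchanged.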

All other steps keep their observed prefix_tokens / newly_append_tokens split. The total input length and output tokens do not change. Any append-token reduction is moved into prefix tokens and billed at the cache-read price; remaining Claude cache-creation tokens are billed at the 5-minute cache-write rate. Because the estimate assumes all shifted tokens can be served from cache at the cache-read rate, the resulting savings are an upper bound rather than an achievable policy guarantee.
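
Because total input and output are held fixed, the saving for a step reduces to the number of shifted tokens times the gap between the append rate and the cache-read rate. The sketch below checks that identity on one hypothetical step (2,000 prefix + 103,000 append, re-split as 100,000 prefix + 5,000 append, with 1,000 output tokens). `Prices`, `step_cost`, and `shift_saving` are hypothetical helpers; the prices are invented round numbers, not values from pricing.json; and the single append rate is a simplification of the real billing, which prices fresh input and Claude's remaining cache-creation tokens (at the 5-minute cache-write rate) separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """Per-million-token USD prices. HYPOTHETICAL round numbers, not pricing.json."""
    append_per_m: float      # one rate for all appended tokens (simplification)
    cache_read_per_m: float  # rate for prefix (cache-hit) tokens
    output_per_m: float      # output rate; output is unchanged by the estimate


def step_cost(prefix: int, append: int, output: int, p: Prices) -> float:
    """Cost of one step under a prefix/append/output split."""
    return (append * p.append_per_m
            + prefix * p.cache_read_per_m
            + output * p.output_per_m) / 1e6


def shift_saving(shifted_tokens: int, p: Prices) -> float:
    """Saving when tokens move from append to prefix at fixed total input."""
    return shifted_tokens * (p.append_per_m - p.cache_read_per_m) / 1e6


prices = Prices(append_per_m=4.0, cache_read_per_m=0.4, output_per_m=16.0)
observed = step_cost(prefix=2_000, append=103_000, output=1_000, p=prices)
retained = step_cost(prefix=100_000, append=5_000, output=1_000, p=prices)
print(f"observed ${observed:.4f}  retained ${retained:.4f}  saved ${observed - retained:.4f}")
print(f"shift formula: ${shift_saving(98_000, prices):.4f}")
```

The same identity suggests one reason the 45.9% token reduction in Table 1 becomes a much smaller 12.8% cost reduction: output and cache-read charges are left in place, and the shifted tokens are still billed at the cache-read rate.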

Method and assumptions:

  • A user-initiated step is a round whose first timing event is user_message, matching the trigger convention used by cache_hit_ratio and redundant_prefill.
  • Pairing follows redundant_prefill / session/total_input_growth: the predecessor is the last round seen for the same session_id in round_pk file order. Session-first user steps have no predecessor and remain unchanged.
  • This isolates the consumer-side cost of human thinking time. Tool-result steps and session-first steps are left as observed.
  • Costs use artifacts/utils/pricing.json through artifacts/web_analytics/pricing.py: append at fresh-input/cache-write rates, prefix at the cache-read rate, output unchanged. Unpriced rounds contribute to token counts but are excluded from dollar totals.
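
The trigger and pairing rules above can be sketched in Python as a single pass over rounds. The `Round` fields, the `pair_user_steps` helper, and the "tool_result" event label are hypothetical; the sketch also assumes that sorting by round_pk reproduces file order.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Round:
    """One agent step; field names are assumptions, not the trace schema."""
    round_pk: int
    session_id: str
    first_event: str  # first timing event, e.g. "user_message"
    prefix_tokens: int
    newly_append_tokens: int


def pair_user_steps(rounds: Iterable[Round]) -> Iterator[tuple[Round, Optional[Round]]]:
    """Yield (user-initiated step, predecessor) pairs in round_pk order.

    The predecessor is the last round seen for the same session_id, whatever
    its own trigger; session-first user steps get None and stay unchanged.
    """
    last_in_session: dict[str, Round] = {}
    for r in sorted(rounds, key=lambda r: r.round_pk):
        if r.first_event == "user_message":
            yield r, last_in_session.get(r.session_id)
        # Every round, user- or tool-initiated, becomes the next predecessor.
        last_in_session[r.session_id] = r


rounds = [
    Round(1, "s1", "user_message", 0, 5_000),
    Round(2, "s1", "tool_result", 5_000, 800),
    Round(3, "s2", "user_message", 0, 4_000),
    Round(4, "s1", "user_message", 1_000, 7_000),
]
pairs = [(s.round_pk, p.round_pk if p else None) for s, p in pair_user_steps(rounds)]
print(pairs)  # [(1, None), (3, None), (4, 2)]
```

Note that round 4's predecessor is the tool-result round 2, not the earlier user step: under this rule the counterfactual measures context growth since the last model call in the session.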
Code structure
  • collect(con) streams rounds joined with each round’s first timing event, walks sessions in file order, and accumulates observed vs. retained-cache token/cost totals by scope.
  • ScopeAccum stores token totals, user-step coverage, priced-round coverage, and observed / retained-cache cost buckets. It also stores per-reduced-step cost-saved samples and preceding human idle gaps for avg / p50 / p90 rows.
  • write_summary_csv(...), render_md(...), and write_latex_table(...) emit the raw summary, web table, and optional paper table.
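
The bookkeeping described for ScopeAccum can be approximated with a small dataclass. `ScopeAccumSketch` and `summarize` are hypothetical stand-ins rather than the real class, and the percentile method (`statistics.quantiles` with `method="inclusive"`) is an assumption; analyze.py may compute p50/p90 differently. The example costs and idle gaps are invented.

```python
import statistics
from dataclasses import dataclass, field
from typing import Optional


def summarize(samples: list[float]) -> dict[str, float]:
    """avg / p50 / p90 of a sample list (inclusive quantile method assumed)."""
    if not samples:
        return {}
    if len(samples) == 1:
        v = samples[0]
        return {"avg": v, "p50": v, "p90": v}
    deciles = statistics.quantiles(samples, n=10, method="inclusive")
    return {"avg": statistics.fmean(samples), "p50": statistics.median(samples), "p90": deciles[8]}


@dataclass
class ScopeAccumSketch:
    """Hypothetical accumulator for one scope (merged, claude, or codex)."""
    observed_append: int = 0
    retained_append: int = 0
    rounds: int = 0
    priced_rounds: int = 0
    observed_cost: float = 0.0
    retained_cost: float = 0.0
    saved_per_reduced_step: list[float] = field(default_factory=list)
    idle_gaps_s: list[float] = field(default_factory=list)

    def add(self, obs_append: int, ret_append: int,
            obs_cost: Optional[float], ret_cost: Optional[float],
            idle_gap_s: Optional[float] = None) -> None:
        # Token totals include every round, priced or not.
        self.rounds += 1
        self.observed_append += obs_append
        self.retained_append += ret_append
        # Dollar totals only include rounds that have a price.
        priced = obs_cost is not None and ret_cost is not None
        if priced:
            self.priced_rounds += 1
            self.observed_cost += obs_cost
            self.retained_cost += ret_cost
        # Per-step samples are kept only for steps whose append actually shrank.
        if ret_append < obs_append:
            if priced:
                self.saved_per_reduced_step.append(obs_cost - ret_cost)
            if idle_gap_s is not None:
                self.idle_gaps_s.append(idle_gap_s)


acc = ScopeAccumSketch()
acc.add(103_000, 5_000, 0.4288, 0.0760, idle_gap_s=600.0)
acc.add(800, 800, 0.0040, 0.0040)                      # tool-result step: unchanged
acc.add(50_000, 2_000, None, None, idle_gap_s=1200.0)  # unpriced round
print(acc.observed_append - acc.retained_append, acc.priced_rounds / acc.rounds)
print(summarize(acc.saved_per_reduced_step), summarize(acc.idle_gaps_s))
```

Keeping token and dollar coverage separate mirrors the report's convention that token reductions include all rounds while dollar totals include only priced rounds.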
Running it
# default merged trace (materialized to a temp DuckDB cache on first use)
uv run python artifacts/prefix_cache/human_idle_cache_counterfactual/analyze.py

# a prebuilt DB, into a chosen dir
uv run python artifacts/prefix_cache/human_idle_cache_counterfactual/analyze.py \
  --db trace/syfi_coding_trace.duckdb -o /tmp/out
Outputs
  • human_idle_cache_counterfactual_summary.csv - raw observed vs. retained-cache token and cost totals per scope (merged, claude, codex).
  • human_idle_cache_counterfactual.md - GFM table for the web detail page.
  • human_idle_cache_counterfactual.tex - optional LaTeX table.
  • headline.json - headline values for the Overview gallery card.