Cost distribution · SyFI TraceLab

Session

one continuous trace of work, often spanning multiple requests or problems.

Request

one user input through the agent's final response.

Agent step

one model call inside a request.

User-initiated step

an agent step started by user input.

Tool-triggered step

an agent step started by a tool result.

Problem

What does a coding session / request / step cost, and where does the money go?

Computes the USD-cost distribution behind tab:cost_distribution (src/04_SessionContext.tex). For each granularity (per session, per request, per step) and each billed category, the paper table reports the cost as avg / p50 / p90 / p99 plus the category’s share of total spend (the script also prints the underlying token distributions, incl. p25, to stdout):

Append tokens — newly_append_tokens, billed at the fresh-input/cache-write rates.
Prefix tokens — prefix_tokens, billed at the cache-read rate.
Output tokens — output_tokens (reasoning included), billed at the output rate.
Total — the sum of the three.

Tables

Per session

Metric	Avg	P50	P90	P99	% cost
Total	$9.36	$0.59	$13.2	$172
Append tokens	$2.50	$0.21	$3.33	$40.2	26.7%
Prefix tokens	$5.77	$0.17	$7.55	$111	61.7%
Output tokens	$1.09	$0.16	$1.92	$16.9	11.6%

Per request

Metric	Avg	P50	P90	P99	% cost
Total	$0.97	$0.32	$2.35	$8.94
Append tokens	$0.25	$0.03	$0.68	$3.22	26.7%
Prefix tokens	$0.60	$0.16	$1.35	$6.51	61.7%
Output tokens	$0.11	$0.03	$0.29	$1.08	11.6%

Per step

Metric	Avg	P50	P90	P99	% cost
Total	$0.11	$0.07	$0.20	$0.67
Append tokens	$0.03	$0.00	$0.03	$0.62	26.7%
Prefix tokens	$0.07	$0.05	$0.12	$0.41	61.7%
Output tokens	$0.01	$0.01	$0.03	$0.12	11.6%

Table 1Per-session, per-request, and per-step cost (USD) by category; % cost is each category’s share of total spend.

For a coding agent the bill is dominated by re-reading context, not by generation (the paper’s tab:cost_distribution). Cached prefix tokens are 59.5% of total spend even though they are billed at roughly a tenth of the fresh-input rate — pure volume, since the accumulating context is replayed on every step — against 29.2% for append/new-input and only 11.2% for output. Output is cheap in aggregate despite its high per-token price because each step emits so few tokens. The absolute costs are modest at the median ($0.61/session, $0.33/request, $0.07/step) but carry a heavy tail: the average session is $9.70 and p99 reaches $178, a few very long sessions driving most of the spend. This inverts the usual intuition that generation is the expensive part.

Reference

Definitions

Cost uses the single-source price table artifacts/utils/pricing.json via web_analytics/pricing.py (price_for → per-model exact/family resolve; round_cost → append at input/cache-write rates, prefix at cache-read rate, output at output rate — the same billing the web dashboard uses). Rounds whose model has no price are unpriced and excluded; 99.1% of rounds are priced (the rest are codex:codex-auto-review / null-model rows). Coverage is printed.
Request — one user turn, via the same turn state machine as human_in_the_loop/user_turn_decomposition (39,202 turns, matching user_turn_response_time and session_internal_counts). Step — one LLM round. Session — one session_id.

Running it

uv run python artifacts/session/session_cost_distribution/analyze.py -i trace/syfi_coding_trace.jsonl
uv run python artifacts/session/session_cost_distribution/analyze.py            # default merged trace

Outputs

session_cost_distribution.tex — the merged single-column cost table (Avg / P50 / P90 / P99
- % cost) for the paper.
session_cost_distribution.md — GFM Markdown mirror of the table, rendered on the web detail page.
headline.json — the few headline numbers for the Overview gallery card.
stdout — merged + per-provider (Claude / Codex) token and cost percentiles, plus the append / prefix / output cost composition.

Headline numbers (public trace, list prices as of 2026-06)

Cost composition: prefix/cached 59.5%, append/new-input 29.2%, output 11.2%. Cached input dominates spend despite the ~10× cache-read discount, purely on volume.
Avg cost: $9.70 / session, $1.01 / request, $0.11 / step; medians are far lower ($0.61 / $0.33 / $0.074) with a heavy session tail (p99 = $178).

No figures.