SyFI TraceLab

Tool-triggered steps that grew 99.4%

Session internal counts

Requests, steps, and tool calls per session, request, and step.

Tool-triggered steps that shrank 0.5%

Avg growth 1,820 tok

Sessions w/ ≥1 compaction 8.8%

Total input growth

Net context growth after tool-triggered agent steps.

Tool-initiated 79.2%

Avg / active session 3.5

Prefix share of cost 58.6%

Context compactions

How often a session summarizes and drops its context near the limit.

Avg cost / session $11.8

Append / output share 30% / 11%

Avg response / request 3.7m

Cost distribution

USD per session, request, and step — and where the money goes.

Human thinking 91.6%

Request: gen / tool 38% / 61%

Timing distribution

Human thinking vs LLM generation vs tool execution across the wall clock.

Session token steps

How context grows across a single session.

Human input wait

How long the agent waits on a human.

Human wait count CDF

How quickly human-response waits resolve by provider.

Median prefix tokens 141K / 120K

Human wait total-time CDF

Where the summed human idle time accumulates.
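
The two human-wait CDFs answer different questions: one counts waits, the other weights each wait by its duration. A minimal sketch of the distinction, assuming waits are available as a plain list of seconds (the function name and sample values are hypothetical, not taken from TraceLab):

```python
from bisect import bisect_right
from itertools import accumulate


def count_and_time_cdf(waits_s, threshold_s):
    """Return (share of waits <= threshold, share of total wait time in those waits)."""
    ordered = sorted(waits_s)
    if not ordered:
        return 0.0, 0.0
    k = bisect_right(ordered, threshold_s)  # number of waits at or below the threshold
    running = list(accumulate(ordered))     # cumulative wait time in sorted order
    total = running[-1]
    time_below = running[k - 1] if k else 0.0
    return k / len(ordered), (time_below / total if total else 0.0)


# Hypothetical sample: four short waits and one hour-long wait.
waits = [5, 10, 20, 60, 3600]
count_share, time_share = count_and_time_cdf(waits, 60)
```

With this sample, most waits resolve within a minute, yet nearly all of the summed idle time sits in the single long wait, which is the kind of gap the total-time CDF is meant to expose.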

LLM generation

Token composition, output length, output attribution, and end-to-end generation timing.

Median append tokens 992 / 1K

Median output tokens 354 / 188

Token length distribution

Prefix, append, and output token lengths per step, by provider.

Median append, <1k prefix 96.1K

Prefix vs append token composition

Cached prefix against freshly appended input.

Median append, >64k prefix 1.0K

Busiest prefix bin 128-256k (158,520 steps)

Append by prefix bin

How append length collapses as the cached prefix fills.

Append token mass bins

Short agent steps by count, large agent steps by appended-token mass.

Output token distribution

How long the agents' completions run.

Output attribution

Two ways a prior step’s output is accounted for in the next step.

Previous output placement

Whether a long response returns as fresh append or cached-prefix growth.

Context vs decode speed

Observed LLM timing against total input context length.

Prefix / append CDF

Median and tail length for cached prefix and fresh append.

Adjusted append scatter

Fresh context after subtracting replayed prior output.
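
One plausible reading of "adjusted append" is sketched below. It assumes the previous step's output is subtracted only when it was replayed as fresh append rather than absorbed into the cached prefix; the function and parameter names are hypothetical, and the dashboard's exact rule is not stated in this page.

```python
def adjusted_append(append_tokens, prev_output_tokens, output_replayed_in_append):
    """Estimate genuinely new input tokens for a step.

    Assumption: if the prior output came back as part of the append, it is
    removed; if it landed in the cached prefix, the append is left unchanged.
    """
    if not output_replayed_in_append:
        return append_tokens
    # Clamp at zero in case the recorded output exceeds the recorded append.
    return max(append_tokens - prev_output_tokens, 0)


# Hypothetical step: 1,200 appended tokens following a 354-token response.
fresh_if_replayed = adjusted_append(1200, 354, True)
fresh_if_cached = adjusted_append(1200, 354, False)
```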

Token spindles

Prefix, adjusted append, and output distributions on one axis.

Generation-time CDF

Wall-clock time to produce a full response.

Generation total-time CDF

Where summed model-generation time accumulates.

Tool calls

How agents choose tools, how often they call them, how long those calls take, and their overhead.

Tool call counts by tool

Which tools the agents use the most.

Tool latency mass bins

Fast-call counts versus where aggregate tool time accumulates.

Residual (E2E − internal) 129.7h (22%)

Tool latency distribution

Per-tool latency spread for the most-used tools.

Top tool residual 108.7h

Median residual 0.16s

Codex tool overhead

Codex tool end-to-end time versus internal execution time.
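
The residual figures on this page follow the "E2E − internal" definition shown in the stat label. A small sketch of that arithmetic, assuming each call is recorded as an (end-to-end seconds, internal seconds) pair; the record layout and sample values are hypothetical:

```python
from statistics import median


def residual_summary(calls):
    """Summarize overhead for calls given as (e2e_seconds, internal_seconds) pairs."""
    residuals = [e2e - internal for e2e, internal in calls]
    total_e2e = sum(e2e for e2e, _ in calls)
    total_residual = sum(residuals)
    return {
        "total_residual_s": total_residual,
        # Share of end-to-end time not explained by internal execution.
        "residual_share": total_residual / total_e2e if total_e2e else 0.0,
        "median_residual_s": median(residuals) if residuals else 0.0,
    }


# Hypothetical sample of three tool calls.
summary = residual_summary([(1.0, 0.9), (2.0, 1.5), (10.0, 8.0)])
```

A small median next to a large total, as in the tiles above, would suggest that a few slow calls carry most of the overhead.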

Tool category distribution

How tool calls and latency split across coarse categories.

Total tool time by kind

Which tool kinds account for the most attributed work.

Tool latency CDF by provider

Per-call tool latency, split by provider.

Tool total-latency CDF

Summed tool latency by threshold, split by provider.

Bash command breakdown

Which executables shell commands run, and how long each takes.

Prefix cache

Cache reuse, idle-gap eviction, redundant prefill, and the share of context kept active.

Cache hit ratio

How much input is served from the prefix cache.
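
A token-weighted hit ratio can be sketched as cached prefix tokens over total input tokens. This assumes each step reports a cached-prefix count and an append count; whether the dashboard weights by tokens or by steps is not stated here, and the names below are hypothetical.

```python
def cache_hit_ratio(steps):
    """steps: iterable of (cached_prefix_tokens, append_tokens) pairs."""
    steps = list(steps)
    cached = sum(prefix for prefix, _ in steps)
    total = cached + sum(append for _, append in steps)
    return cached / total if total else 0.0


# Hypothetical sample: one warm step and one cold step with no cached prefix.
ratio = cache_hit_ratio([(900, 100), (0, 1000)])
```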

Cache hit after human waits

Prefix-cache hit rate against the preceding human idle gap.

Prefill amplification 6.3x

Cache hit after tool waits

Prefix-cache hit rate after tool-triggered waits.

Fresh % of append 16.0%

Tool-result fresh % 33.9%

Redundant prefill

How much prefilled context is genuinely fresh versus replayed.
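
To make the fresh-versus-replayed split concrete, the sketch below computes a fresh share and its reciprocal. Treating the reciprocal as "prefill amplification" is an assumption, not a definition taken from this page; it is offered only because 1 / 0.160 = 6.25, which is close to the 6.3x tile.

```python
def prefill_redundancy(appended_tokens, fresh_tokens):
    """Return (fresh share of append, appended-to-fresh ratio).

    Assumption: amplification is modeled as appended / fresh tokens.
    """
    if appended_tokens <= 0 or fresh_tokens <= 0:
        raise ValueError("token counts must be positive")
    fresh_share = fresh_tokens / appended_tokens
    amplification = appended_tokens / fresh_tokens
    return fresh_share, amplification


# Hypothetical totals chosen to give a 16% fresh share.
fresh_share, amplification = prefill_redundancy(1000, 160)
```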

Append reduction 2.70B

Cost reduction $14,333

Final cost saved 15.8%

Cost of human thinking time

Upper-bound savings if user-initiated steps kept their prefix cache.

Eviction trade-off

Cache hit rate versus storage as the eviction timeout grows.
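
The trade-off can be illustrated with a simple timeout model: a step hits the cache if the idle gap before it is no longer than the timeout, and each prefix occupies storage until it is reused or evicted. This is a sketch under those assumptions, with token-seconds as a stand-in storage metric; it is not the simulator behind the chart.

```python
def eviction_tradeoff(steps, timeout_s):
    """steps: list of (idle_gap_seconds, prefix_tokens), one per follow-up step."""
    hits = sum(1 for gap, _ in steps if gap <= timeout_s)
    # Storage proxy: a prefix is held until reuse or until the timeout expires.
    token_seconds = sum(prefix * min(gap, timeout_s) for gap, prefix in steps)
    hit_rate = hits / len(steps) if steps else 0.0
    return hit_rate, token_seconds


# Hypothetical gaps: a tool wait, a short pause, and a ten-minute human wait.
sample = [(2, 1000), (30, 2000), (600, 4000)]
short_timeout = eviction_tradeoff(sample, 60)
long_timeout = eviction_tradeoff(sample, 600)
```

In this sample, raising the timeout from 60 s to 600 s recovers the last hit but multiplies the storage proxy roughly eightfold.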