Of the wall-clock time a coding agent consumes, how much is the human thinking, the LLM generating, and the tools executing — per session, per request, per step, and per individual latency?
The time-domain sibling of `session_cost_distribution`. Computes the data behind
`tab:timing_distribution` (`src/04_SessionContext.tex`): for each granularity and category, avg /
p50 / p90 / p99 per unit, plus the category’s share of total time where a block has a meaningful
total (same Avg/P50/P90/P99 + % layout as the cost table). The category set differs by granularity
because human thinking is a between-request quantity:
- Per session: `Total elapsed` (wall-clock, first→last timing event), with session-total `Human thinking`, `LLM generation`, and `Tool execution` shares.
- Per session, human capped (1h): the same session units, but each human idle gap is clamped to one hour before summing. This is the prompt-cache-TTL view used for cache-relevant time.
- Per request: `Total (response time)` (turn e2e) = `LLM generation` + `Tool execution` + possible overlap. No human term: human wait sits between requests, never inside one.
- Per step: `LLM generation` vs `Tool execution` only (one round has no human term, no e2e).
- Per individual latency: the strictly-positive human-input waits, positive observable per-round generation spans, and positive per-tool effective latencies. These rows line up with the human-wait, generation-time, and tool-latency CDF/summary views.
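
The per-category row layout and the one-hour human cap described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the names `percentile`, `summarize`, `capped_human_time`, and `HUMAN_CAP_SECONDS` are hypothetical, durations are assumed to be in seconds, percentiles are assumed to use linear interpolation, and the share is assumed to be the pooled ratio (category sum over block total) rather than a mean of per-unit shares.

```python
from statistics import mean

# Assumed unit: seconds. One hour, matching the "human capped (1h)" block.
HUMAN_CAP_SECONDS = 3600.0


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) over a non-empty list."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def summarize(values, block_total=None):
    """Build one table row: avg / p50 / p90 / p99 per unit.

    When block_total is given (the block has a meaningful total), also
    report the category's pooled share of that total as a percentage.
    """
    row = {
        "avg": mean(values),
        "p50": percentile(values, 50),
        "p90": percentile(values, 90),
        "p99": percentile(values, 99),
    }
    if block_total:
        row["share_pct"] = 100.0 * sum(values) / block_total
    return row


def capped_human_time(gaps, cap=HUMAN_CAP_SECONDS):
    """Clamp each human idle gap to `cap` seconds, then sum per session."""
    return sum(min(gap, cap) for gap in gaps)
```

For example, a session with human gaps of 10 s and 7200 s contributes `capped_human_time([10.0, 7200.0])`, which is 3610.0 s, to the capped view, while the uncapped view would count 7210 s. Blocks without a meaningful total (such as the per-individual-latency rows) would call `summarize` without `block_total`, so no share is reported.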