Across all tool calls, which tool kinds account for the most total effective time — separately for Claude Code and Codex?
Every agent step in the trace carries a tools[] list of tool calls, each with a measured latency.
This experiment attributes that latency to tool kinds and asks where the aggregate tool-execution
time goes, rendering one horizontal-bar panel per provider with tools ordered by summed latency
(each bar annotated with its call count n).
Method and assumptions:
- Effective tool latency = tool_internal_latency_ms if present, else tool_wall_latency_ms (the shared trace_db.EFFECTIVE_TOOL_LATENCY_MS_SQL precedence; the legacy latency_ms field is not in the normalized data). Only strictly positive latencies contribute to a tool's summed/averaged time; calls with no effective latency are counted as missing_latency_calls.
- Additive over calls. Latency is summed per tool kind. Parallel tool calls are not merged into elapsed wall-clock time, so this measures attributed work, not end-to-end session time.
- One row per call. We aggregate entries in tool_calls (the UNNESTed tools[]), not agent steps.
- MCP tools are merged (figure only). Any tool whose name starts with mcp_ is aliased to a single mcp bucket; the long server-qualified names are individually rare. The CSV keeps the raw unaliased names.
- Rare tools collapse (figure only). Tools with fewer than --min-tool-calls-for-plot provider-local calls (default 20) are summed into one Other (<N calls/tool) bar. The CSV keeps full per-tool detail: nothing is dropped from the data, only from the plot.
- Exact, not sampled. Sums, counts and averages are computed over all tool calls in SQL (the old per-tool reservoir sampler is gone), so the totals are exact.
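
The rules above can be sketched in plain Python. This is an illustration only, not the experiment's SQL: the function names (effective_latency_ms, aggregate, plot_buckets) and the per-call dict layout with provider and tool keys are hypothetical. It also makes two assumptions the text does not settle: a call whose effective latency is absent or not strictly positive is counted under missing_latency_calls, and the rare-tool threshold is applied after MCP aliasing.

```python
from collections import defaultdict


def effective_latency_ms(call):
    """Internal latency if present, else wall latency; None if neither is set."""
    internal = call.get("tool_internal_latency_ms")
    if internal is not None:
        return internal
    return call.get("tool_wall_latency_ms")


def aggregate(calls):
    """Exact per-(provider, tool) totals, one input row per tool call.

    Latency is summed additively; parallel calls are not merged.
    """
    stats = defaultdict(
        lambda: {"calls": 0, "total_ms": 0.0, "missing_latency_calls": 0}
    )
    for call in calls:
        row = stats[(call["provider"], call["tool"])]
        row["calls"] += 1
        latency = effective_latency_ms(call)
        if latency is not None and latency > 0:
            row["total_ms"] += latency
        else:
            # Assumption: absent or non-positive latencies count as missing.
            row["missing_latency_calls"] += 1
    return dict(stats)


def plot_buckets(stats, provider, min_tool_calls_for_plot=20):
    """Figure-only view for one provider: alias MCP tools, collapse rare tools.

    Returns (label, {"calls", "total_ms"}) pairs ordered by summed latency.
    """
    # Step 1: merge every tool whose name starts with "mcp_" into one bucket.
    merged = defaultdict(lambda: {"calls": 0, "total_ms": 0.0})
    for (prov, tool), row in stats.items():
        if prov != provider:
            continue
        name = "mcp" if tool.startswith("mcp_") else tool
        merged[name]["calls"] += row["calls"]
        merged[name]["total_ms"] += row["total_ms"]

    # Step 2 (assumed to run after aliasing): sum rare tools into one bar.
    other = f"Other (<{min_tool_calls_for_plot} calls/tool)"
    bars = defaultdict(lambda: {"calls": 0, "total_ms": 0.0})
    for name, row in merged.items():
        key = name if row["calls"] >= min_tool_calls_for_plot else other
        bars[key]["calls"] += row["calls"]
        bars[key]["total_ms"] += row["total_ms"]

    return sorted(bars.items(), key=lambda kv: kv[1]["total_ms"], reverse=True)
```

Keeping aggregate separate from plot_buckets mirrors the split described above: the first produces the full per-tool detail that the CSV retains, and the second only reshapes that detail for the figure.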