When tools are folded into a handful of coarse categories (execute, file write/edit, file read/search, agent/task, web/lookup, …), how are calls and latency distributed across those categories — and how concentrated is the latency long tail in a few slow calls?
Individual tool names are numerous and provider-specific; this experiment groups them into coarse categories that mean the same thing across Claude Code and Codex, then reports how calls and effective latency split across those categories.
Method and assumptions:
- One row per call. We count entries in `tool_calls` (the UNNESTed `tools[]`), not agent steps.
- Two fixed tool→category maps. A 5-category-plus-`other` map (`Execute command`, `File write/edit`, `File read/search`, `Agent/task`, `Web/remote/lookup`, `Other`) drives the count ring and latency bar; a 7-bucket presentation map (which additionally splits out `Planning`) drives the dashboard. Both maps are explicit name→category sets ported verbatim; `tool_category_tool_map.csv` records the realized `(category, provider, tool)` breakdown for auditing.
- Effective tool latency = `tool_internal_latency_ms` if present, else `tool_wall_latency_ms` (the legacy `latency_ms` fallback is not in the normalized schema). Only positive latencies contribute to summed latency and to the percentile/long-tail views; missing and non-positive latencies are counted separately but excluded from the sums.
- Long-tail bins. Positive latencies are bucketed into `<1s`, `1–10s`, `10s–1m`, `>1m` to contrast each bucket's share of calls against its share of total latency.
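
The rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the experiment's actual code. The `TOOL_CATEGORY` excerpt, the `tool` key on each call record, and the function names are hypothetical. The bin edges are treated as half-open (a call of exactly 1,000 ms lands in `1–10s`), which the description above does not specify. "Present" is taken to mean "not null".

```python
from collections import defaultdict

# Hypothetical excerpt of a tool -> category map, for illustration only.
# The real maps are larger and are not reproduced here.
TOOL_CATEGORY = {
    "Bash": "Execute command",
    "Read": "File read/search",
    "Edit": "File write/edit",
}

# (label, exclusive upper bound in ms). Half-open edges are an assumption.
BINS = [
    ("<1s", 1_000),
    ("1–10s", 10_000),
    ("10s–1m", 60_000),
    (">1m", float("inf")),
]


def effective_latency_ms(call):
    """Internal latency if present (not None), else wall latency, else None."""
    internal = call.get("tool_internal_latency_ms")
    if internal is not None:
        return internal
    return call.get("tool_wall_latency_ms")


def latency_bin(ms):
    """Return the label of the first bin whose upper bound exceeds ms."""
    for label, upper in BINS:
        if ms < upper:
            return label
    return BINS[-1][0]  # only reached for an infinite latency


def summarize(calls):
    """Aggregate calls and positive effective latency per category and per bin."""
    calls_by_cat = defaultdict(int)
    latency_by_cat = defaultdict(float)
    bin_calls = defaultdict(int)
    bin_latency = defaultdict(float)
    missing = non_positive = 0

    for call in calls:
        # Unmapped tool names fall into the catch-all category.
        cat = TOOL_CATEGORY.get(call["tool"], "Other")
        calls_by_cat[cat] += 1  # every call counts, with or without latency

        ms = effective_latency_ms(call)
        if ms is None:
            missing += 1
            continue
        if ms <= 0:
            non_positive += 1
            continue

        # Only positive latencies reach the sums and the long-tail bins.
        latency_by_cat[cat] += ms
        label = latency_bin(ms)
        bin_calls[label] += 1
        bin_latency[label] += ms

    n_positive = sum(bin_calls.values())
    total_ms = sum(bin_latency.values())
    long_tail = {}
    if n_positive:
        for label, _ in BINS:
            long_tail[label] = {
                "call_share": bin_calls[label] / n_positive,
                "latency_share": bin_latency[label] / total_ms,
            }

    return {
        "calls_by_category": dict(calls_by_cat),
        "latency_ms_by_category": dict(latency_by_cat),
        "missing": missing,
        "non_positive": non_positive,
        "long_tail": long_tail,
    }
```

Comparing `call_share` with `latency_share` for the `>1m` bin gives the concentration the opening question asks about: a bin holding a small fraction of calls but most of the summed latency indicates a heavy tail.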