How long do tool calls take, per tool and per provider, and where does the long tail of tool latency live — for Claude Code vs Codex?
Every agent step carries a tools[] list of tool calls, each with a measured latency. This experiment
profiles that per-call latency four ways: a per-tool/per-provider box-and-whisker view, a
count-vs-latency mass breakdown over coarse latency bins, and two cumulative CDFs (by call count and
by summed latency) over the per-call latency threshold.
Method and assumptions:
- Effective tool latency =
tool_internal_latency_msif present, elsetool_wall_latency_ms(=result_at − emitted_at; the sharedtrace_db.EFFECTIVE_TOOL_LATENCY_MS_SQLprecedence — the legacylatency_msfield is not in the normalized data). Internal timing is the runner-reported duration (Codex wrapperWall time, ClaudedurationMs). - Only strictly-positive latencies feed the distribution. A call with no effective latency is
counted as
missing_latency; a call with a non-positive effective latency isnonpositive_latency— neither enters the boxplots, bins, percentiles, or CDFs (matching the oldToolStats). - MCP tools are merged (figure only). Any tool whose name starts with
mcp_is aliased to a singlemcpbucket; the long server-qualified names are individually rare. CSV summaries keep the raw, unaliased names. - Rare tools collapse (figure only). Tools with fewer than
--min-tool-calls-for-plotprovider-local calls (default 20) fold into oneOther (<N calls/tool)box. CSV summaries keep full per-tool detail. - CDFs are additive over calls. Latency totals sum per call — parallel tools are not collapsed into wall-clock time, so the total-latency CDF measures attributed work, not elapsed session time.
- Exact, not sampled. Box quartiles, whiskers, percentiles and CDFs are computed over every
positive effective latency pulled from SQL. The old per-tool 50k reservoir sampler is gone, so the
summary CSV reports
sampled=Falseandsample_count= the fulllatency_countfor every tool (previously the two highest-volume tools,exec_commandandBash, were reservoir-sampled).