Between finishing a response and the human’s next message, how long does the agent sit idle — and where does total idle time accumulate?
Human input wait is the gap, within one session, from the model’s previous output to the
next user message. It is computed by a stateful single-pass walk over agent steps in ingestion order
(`round_pk` == file order), keeping `last_model_output_by_session: {session_id -> datetime}`. For
each step, in order:
- `start` = the request-start user-message timestamp for the step: among the step’s `timing_events`, take the earliest model-output (`reasoning`/`text`/`tool_call`) timestamp as `first_output`, keep the `user_message` timestamps at or before `first_output`, and take the latest such candidate (`None` if there is no user message or no output, or none qualifies).
- If `start` is not `None` and the step has a non-empty string `session_id`, and that session already has a recorded previous model-output timestamp `prev`, the wait is `(start - prev).total_seconds()`; when strictly positive it is appended to the `"all"` list and to that step’s provider bucket.
- The session’s `last_model_output_by_session[session_id]` is then updated with this step’s last model-output timestamp (the latest `reasoning`/`text`/`tool_call` timestamp), when present.
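The walk above can be sketched in a few lines of Python. This is a minimal illustration, not the experiment's actual code: the step layout (a dict with `session_id`, `provider`, and `timing_events` as `(kind, timestamp)` tuples) and the helper names `request_start` and `human_input_waits` are assumptions made for the example. It also assumes the session state is updated even when `start` is `None`, and uses a simplified stand-in for the provider fallback.

```python
from collections import defaultdict
from datetime import datetime, timedelta

MODEL_OUTPUT_KINDS = {"reasoning", "text", "tool_call"}


def request_start(timing_events):
    """Latest user_message timestamp at or before the first model output, else None."""
    outputs = [ts for kind, ts in timing_events if kind in MODEL_OUTPUT_KINDS]
    if not outputs:
        return None
    first_output = min(outputs)
    candidates = [
        ts for kind, ts in timing_events
        if kind == "user_message" and ts <= first_output
    ]
    return max(candidates) if candidates else None


def human_input_waits(steps):
    """Single pass over steps, which must already be in ingestion (round_pk) order."""
    last_model_output_by_session = {}
    waits = defaultdict(list)
    for step in steps:
        events = step["timing_events"]
        session_id = step.get("session_id")
        has_session = isinstance(session_id, str) and session_id != ""
        start = request_start(events)
        if start is not None and has_session:
            prev = last_model_output_by_session.get(session_id)
            if prev is not None:
                wait = (start - prev).total_seconds()
                if wait > 0:  # only strictly positive waits are kept
                    # Simplified stand-in for the provider fallback (assumption).
                    provider = step.get("provider") or "<unknown-provider>"
                    waits["all"].append(wait)
                    waits[provider].append(wait)
        # Update session state with this step's latest model output, when present.
        outputs = [ts for kind, ts in events if kind in MODEL_OUTPUT_KINDS]
        if outputs and has_session:
            last_model_output_by_session[session_id] = max(outputs)
    return dict(waits)


# Hypothetical two-step session: the model finishes at t0+10s and the next
# user message arrives at t0+70s.
t0 = datetime(2024, 1, 1, 12, 0, 0)
example_steps = [
    {"session_id": "s1", "provider": "p", "timing_events": [
        ("user_message", t0),
        ("text", t0 + timedelta(seconds=5)),
        ("tool_call", t0 + timedelta(seconds=10)),
    ]},
    {"session_id": "s1", "provider": None, "timing_events": [
        ("user_message", t0 + timedelta(seconds=70)),
        ("text", t0 + timedelta(seconds=75)),
    ]},
]
example_waits = human_input_waits(example_steps)
```

The first step of a session never produces a wait, because no previous model output has been recorded yet; it only seeds the session state.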
This is a trace-level estimate, not a serving-engine timer; it reflects only recorded events. The wait spans the human think/read time between requests and excludes the model’s own generation.
The experiment renders the wait distribution three ways, with the x-axis on a log duration scale and the count/total panels capped at 1h (a 5-minute reference line marks a plausible cache-eviction horizon):
- a single-axis wait CDF overlaying `all` and each provider;
- a per-provider count CDF: fraction of waits `≤ T`;
- a per-provider total CDF: share of summed idle time from waits `≤ T`.
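The difference between the count CDF and the total CDF is easiest to see on a small example. The sketch below is illustrative only; the function name `count_and_total_cdf` and the sample waits are made up for the example and do not come from the experiment.

```python
def count_and_total_cdf(waits, thresholds):
    """For each T, return (T, fraction of waits <= T, share of summed wait time from waits <= T)."""
    n = len(waits)
    total = sum(waits)
    rows = []
    for t in thresholds:
        below = [w for w in waits if w <= t]
        count_frac = len(below) / n if n else 0.0
        total_frac = sum(below) / total if total else 0.0
        rows.append((t, count_frac, total_frac))
    return rows


# Hypothetical waits in seconds: three short gaps and one 15-minute gap.
sample_waits = [10, 30, 60, 900]
cdf_rows = count_and_total_cdf(sample_waits, thresholds=[60, 300, 3600])
```

In this made-up sample, three of the four waits are at most 60 s, yet they account for only a tenth of the summed idle time. That skew is what the total CDF exposes and the count CDF hides.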
Method and assumptions:
- Exact, not sampled. Every positive wait contributes one value to its provider’s list (and to `all`); the CDFs, percentiles, and summed-time bins run over the full set. The old loader already kept every wait here (there was never a reservoir cap on this metric), so the migration is value-for-value identical.
- File-order state. The walk is over `round_pk` (ingestion ordinal == file order), reproducing the line-order tie-break the old single-pass JSONL loader relied on for its session state.
- Provider grouping mirrors the old loader’s `str(provider) or "<unknown-provider>"` fallback, so a missing/empty provider falls into `<unknown-provider>`.
- Engine-independent timestamps. Timestamps are read from the DB as integer epoch-microseconds (`CAST(epoch_us(timestamp) AS BIGINT)`) and rebuilt to naive datetimes in Python, never fetched as a raw `TIMESTAMP` (native duckdb marshals that to a `datetime`, duckdb-wasm to a string). A difference between two same-timezone datetimes equals the naive-microsecond difference exactly, so the waits match the pre-DuckDB result bit-for-bit.
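One way to rebuild a naive datetime from integer epoch-microseconds is to add a `timedelta` to a naive epoch, which stays in integer arithmetic and avoids both float rounding and local-timezone conversion. The text does not say which conversion the experiment uses, so the helper `from_epoch_us` below is a hypothetical sketch.

```python
from datetime import datetime, timedelta

NAIVE_EPOCH = datetime(1970, 1, 1)  # naive (tz-unaware) Unix epoch


def from_epoch_us(us: int) -> datetime:
    """Rebuild a naive datetime from integer microseconds since the epoch."""
    return NAIVE_EPOCH + timedelta(microseconds=us)


# Two hypothetical timestamps 61.5 seconds apart, as they might come back
# from a BIGINT column of epoch-microseconds.
prev_us = 1_700_000_000_000_000
start_us = prev_us + 61_500_000
wait_seconds = (from_epoch_us(start_us) - from_epoch_us(prev_us)).total_seconds()
```

Because both datetimes are built from the same naive epoch, their difference depends only on the integer microsecond gap.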