Within a single coding session, how does the total input length (prefix + append) move from one
agent step to the next — and when it shrinks, is that accounting jitter or a real context
compaction?
Each row in the trace is one agent step. Walking the steps of a session in order, this
experiment records the per-step change in total input length (prefix_tokens + newly_append_tokens) — a positive delta is the window growing, a negative delta is it shrinking —
and classifies every drop into one of three buckets.
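As a minimal sketch of the per-step metric, assuming each trace row can be reduced to a `(session_id, prefix_tokens, newly_append_tokens)` tuple (the `session_id` name and the tuple layout are illustrative, not taken from the trace schema), the signed delta can be computed in one pass over the rows in trace order:

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical row shape: (session_id, prefix_tokens, newly_append_tokens).
Row = Tuple[str, int, int]


def signed_deltas(rows: Iterable[Row]) -> Iterator[Tuple[str, int]]:
    """Yield (session_id, delta) for every step after the first in its session."""
    last_total: Dict[str, int] = {}
    for session_id, prefix_tokens, newly_append_tokens in rows:
        total = prefix_tokens + newly_append_tokens
        if session_id in last_total:
            # Positive: the window grew. Negative: it shrank.
            yield session_id, total - last_total[session_id]
        last_total[session_id] = total


# Two interleaved sessions, listed in trace order.
trace = [
    ("s1", 0, 1200),
    ("s2", 0, 800),
    ("s1", 1200, 300),
    ("s1", 1400, 50),
    ("s2", 800, 100),
]
deltas = list(signed_deltas(trace))
```

This sketch ignores the trigger filter and reports a delta for every step whose session has already been seen.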
Method and assumptions:
- Total input for a step is `prefix_tokens + newly_append_tokens` (cached prefix plus freshly appended input). The per-step metric is the signed delta of that quantity from the previous step seen in the same session.
- Pairing. A growth event is emitted only when the current step’s first timing event is a visible input event, a `user_message` or a `tool_result` (the step’s trigger), and the session has been seen before. The *previous* step is whatever step was last observed for that session in trace order, regardless of its trigger. Steps are ordered within a session by ingestion order (`round_pk` = file order), the same line-order sequencing the pre-DuckDB scan used.
- Reduction buckets (thresholds from `artifacts/utils/growth.py`, overridable on the CLI):
  - micro-reduction: drop `≤ 1024` tokens (accounting jitter);
  - major-compact: drop `≥ 50000` tokens (a real context compaction);
  - ordinary reduction: anything between the two.
- Triggers reported. Summary rows are cut three ways (`all`, `user`, and `tool_result`) by the current step’s trigger, and per scope (`merged` plus each provider).
- Shares the growth helpers (`build_growth_stats`, `reduction_bucket`, the CSV writers) with the `trace_facts` overview summaries.
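
The pairing rule and the reduction buckets above can be sketched together as follows. This is an illustrative reading of the text, not the code in `artifacts/utils/growth.py`: the `Step` class, `classify_drop`, `growth_events`, and the `"other"` trigger value are hypothetical names, and only the two thresholds and the two visible trigger names come from the description.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

MICRO_MAX_DROP = 1024    # drop <= 1024 tokens: micro-reduction
MAJOR_MIN_DROP = 50_000  # drop >= 50000 tokens: major-compact
VISIBLE_TRIGGERS = {"user_message", "tool_result"}


@dataclass
class Step:
    session_id: str  # hypothetical field name
    trigger: str     # the step's first timing event
    prefix_tokens: int
    newly_append_tokens: int


def classify_drop(delta: int,
                  micro_max: int = MICRO_MAX_DROP,
                  major_min: int = MAJOR_MIN_DROP) -> Optional[str]:
    """Return the reduction bucket for a negative delta, or None otherwise."""
    if delta >= 0:
        return None
    drop = -delta
    if drop <= micro_max:
        return "micro-reduction"
    if drop >= major_min:
        return "major-compact"
    return "ordinary reduction"


def growth_events(steps: List[Step]) -> List[Tuple[str, str, int, Optional[str]]]:
    """Emit (session_id, trigger, delta, bucket) for each qualifying step."""
    last_total: Dict[str, int] = {}
    events = []
    for step in steps:  # list order stands in for ingestion order
        total = step.prefix_tokens + step.newly_append_tokens
        previous = last_total.get(step.session_id)
        if previous is not None and step.trigger in VISIBLE_TRIGGERS:
            delta = total - previous
            events.append((step.session_id, step.trigger, delta, classify_drop(delta)))
        # The baseline advances on every step, whatever its trigger.
        last_total[step.session_id] = total
    return events


steps = [
    Step("s1", "user_message", 0, 2000),
    Step("s1", "tool_result", 2000, 500),
    Step("s1", "other", 2500, 100),  # not a visible input event: no event emitted
    Step("s1", "tool_result", 2000, 100),
    Step("s1", "tool_result", 90000, 0),
    Step("s1", "user_message", 20000, 1000),
]
events = growth_events(steps)
```

Note that the step with the non-visible trigger emits nothing but still becomes the baseline for the next step, which is why the fourth step registers a 500-token drop rather than a 400-token one.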