---
name: redline
description: Run a logic-preserving wallclock optimization loop for a named basket of invocations. Use when Codex needs to make a command, workload, or benchmark basket faster by locking measurement, collecting hardware counters, justifying each bottleneck thesis with hard numbers, and iterating until blocked or stopped.
---

# Redline

Use this skill to reduce wallclock for a concrete basket of invocations without changing logic. This skill is measurement-first. It is not a generic cleanup pass, and it is not a speculative "maybe this is faster" pass. Language-agnostic, but code-centric.

## Contract

- Require a concrete basket of invocations.
- Require a semantic guard. If the user did not provide one, infer the narrowest credible guard from tests, golden outputs, invariants, or diffable results.
- Default basket weights to `1` unless the user specifies otherwise.
- Create one persistent `/tmp` worklog before the first serious measurement. Chat is summary only; the worklog is the durable source of truth.
- Lock the measurement surface before optimization: commands, inputs, cwd, env, build profile, thread settings, and any relevant runtime knobs.
- Do not begin optimization until the basket and semantic guard are explicit in the worklog.
- Attempt to obtain hardware-counter measurements early. If they are unavailable, record the exact blocker and treat the run as degraded mode.
- Do not run a long blind optimization campaign without either hardware counters or a clear recorded reason why they are unavailable.
- Measure the baseline repeatedly enough to bound noise before making changes.
- Use the current best measured line as the champion. Do not compare against a drifting or remembered baseline.
- Every bottleneck thesis must be justified by hard numbers: timing plus profiling and/or counters.
- Profile before speculative rewrites.
- Work one measured optimization line at a time unless a bundled change is strictly inseparable.
- Re-measure every challenger against the same basket and the same guard.
- Keep only logic-preserving wins and real enablers.
- Do not stop because you found one plausible speedup. Keep pushing until stopped, blocked, or genuinely out of credible next theses.

## Optimization Law

The objective is lower wallclock for the locked basket. The default scalar decision metric is:

```text
weighted_total_median_ms = sum(weight_i * median_ms_i)
```

Per-item medians still matter. Do not accept a "win" that quietly trashes one basket item unless the user explicitly allows that tradeoff.

A challenger becomes the new champion only if:

- the semantic guard passes
- the wallclock win is credible against observed noise
- the tradeoff pattern is acceptable for the basket
- the thesis is supported by the measured evidence, not just the result

## Flow

### 0. Create the worklog

Create a path shaped like:

```text
/tmp/redline--.md
```

Seed it with the embedded worklog skeleton from `Embedded Forms`.

### 1. Lock the basket

Write the exact basket first. For each basket item, record:

- command
- cwd
- relevant env
- input or scenario
- build profile
- weight
- semantic guard target

Do not optimize an implied basket.

### 2. Lock the measurement harness

Choose the actual measurement tools and parameters before chasing speed. Prefer:

- repeated wallclock sampling with `hyperfine` or an equivalent harness
- hardware counters with `perf stat` or platform equivalent
- sampled profiling with `perf record`, flamegraph tooling, `pprof`, or equivalent

Record:

- warmup policy
- sample count
- timing metric
- spread metric
- counter set
- profiler choice

If the target is a release workload, do not benchmark debug builds.

### 3. Establish the baseline champion

Measure the untouched line repeatedly enough to estimate noise.
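The sampling and summary statistics behind a baseline measurement can be sketched as follows. This is a minimal illustration only: `hyperfine` or an equivalent harness remains the preferred tool, and the no-op interpreter command stands in for a real basket item.

```python
import statistics
import subprocess
import sys
import time

def sample_wallclock(cmd, warmup=1, runs=9):
    """Run `cmd` repeatedly; return per-run wallclock in milliseconds."""
    for _ in range(warmup):
        subprocess.run(cmd, check=True, capture_output=True)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples

def summarize(samples_ms):
    """Median plus a spread figure: range as a percentage of the median."""
    median = statistics.median(samples_ms)
    spread_pct = 100.0 * (max(samples_ms) - min(samples_ms)) / median
    return median, spread_pct

# Hypothetical basket item: a no-op interpreter start, for illustration only.
samples = sample_wallclock([sys.executable, "-c", "pass"])
median_ms, spread_pct = summarize(samples)
print(f"median={median_ms:.2f}ms spread={spread_pct:.1f}%")
```

The spread figure is what decides whether a later champion/challenger delta is credible or just noise.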
Minimum default:

- at least 1 warmup run per basket item
- at least 9 timed runs per basket item for champion decisions

Record per basket item:

- median wallclock
- spread or noise estimate
- hardware counters if available
- semantic-guard result

If noise is too high to trust decisions, stabilize the harness before optimizing further.

### 4. Profile the champion

Profile the current champion on the basket items that dominate total cost. Do not start with "obvious" micro-optimizations. Find where the time, cycles, cache misses, allocations, syscalls, or contention actually live. The first real bottleneck thesis should come from this data.

### 5. Write the bottleneck thesis

Every optimization line needs a thesis before code changes. A thesis must name:

- the hotspot or bottleneck site
- the measured evidence
- why this should move wallclock
- which basket items should improve
- which counters or profile signatures should change if the thesis is right

Bad theses:

- "this code looks slow"
- "this alloc seems unnecessary"
- "we should probably inline this"

Good theses are anchored to numbers.

### 6. Execute one optimization line

Make one coherent optimization move. Typical move classes:

- remove repeated work
- improve data layout or locality
- cut allocations or copies
- reduce parsing or formatting overhead
- reduce locking or contention
- reduce syscall or I/O frequency
- improve algorithmic complexity
- hoist invariant work
- batch expensive operations
- exploit cheaper library or runtime primitives
- remove abstraction overhead when it is actually measured

Do not mix semantic redesign into a speed pass. If a large architectural move is needed, record that explicitly.

### 7. Re-measure the challenger

Run the same basket, the same harness, and the same semantic guard. Record:

- challenger timings
- challenger counters
- challenger profile notes if needed
- pass/fail against the thesis

Compare the challenger against the current champion, not against memory.

### 8. Decide

Choose exactly one:

- `promote`
- `enabler`
- `reject`
- `rework`
- `blocked`

Use:

- `promote` when the challenger is a credible logic-preserving improvement on the current basket
- `enabler` when the challenger is neutral or near-neutral now but materially unlocks stronger follow-on optimization work
- `reject` when the change loses, is noisy, or fails the guard
- `rework` when the line is promising but the implementation or measurement is not yet trustworthy
- `blocked` when an external constraint prevents meaningful progress

If rejected, revert or otherwise unwind the losing line before starting the next serious experiment.

### 9. Repeat

After every decision:

- update the champion snapshot
- update the candidate queue
- choose the next strongest measured thesis
- keep going

If progress stalls, return to profiling and widen the search rather than polishing a dead line.

## Measurement Discipline

- Use one stable basket definition for the whole run unless the user explicitly changes the objective.
- Keep build profile, env, and thread settings fixed unless changing them is the point of the experiment.
- Do not trust one-shot timings.
- Prefer medians over means for decision-making.
- Record a noise or spread figure for every serious comparison.
- Attempt hardware counters early rather than as an afterthought.
- Prefer the same counter set across champion/challenger comparisons.
- If counters are unavailable because of permissions, virtualization, unsupported PMU access, or host policy, record the exact failure in the worklog.
- Do not cite counters you did not actually collect.
- Do not cite profiler output without cashing it out into a bottleneck thesis.

A bottleneck thesis must cite at least:

- one timing signal
- one profile or counter signal

## Semantic Discipline

Logic preservation is mandatory. Use the narrowest credible semantic guard that actually protects the behavior under optimization.
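One narrow guard shape can be sketched as a golden-output digest comparison. This is a minimal illustration; the function names and the solver-style outputs are hypothetical, and byte-identical output is the strictest variant of such a guard.

```python
import hashlib

def output_digest(text: str) -> str:
    """Stable digest of a program's output, used as the guard target."""
    return hashlib.sha256(text.encode()).hexdigest()

def guard_passes(champion_output: str, challenger_output: str) -> bool:
    """Narrow guard: byte-identical output, compared via digest."""
    return output_digest(champion_output) == output_digest(challenger_output)

# Identical output passes; any drift fails.
print(guard_passes("objective=42.0\n", "objective=42.0\n"))  # True
print(guard_passes("objective=42.0\n", "objective=42.1\n"))  # False
```

When exact equality is too strict (e.g. floating-point tolerances), the guard should compare a normalized form instead, but the pass/fail shape stays the same.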
Examples:

- targeted tests
- golden-output diffs
- checksum or snapshot comparisons
- invariant checks
- exact response comparisons
- benchmark harnesses with correctness validation

Do not smuggle behavior changes in as performance work.

## Embedded Forms

### Worklog Skeleton

```text
worklog_path:
objective:
scope_root:
mode: audit_plus_refactor
degraded_mode_reason:
measurement_harness:
basket_spec:
semantic_guard:
baseline_champion:
counter_status:
profile_status:
candidate_queue:
experiment_log:
residual_risks:
```

Rules:

- create this file before the first serious measurement
- update it after every real experiment
- treat it as the durable source of truth for the run
- report the final worklog path in the user-facing response

### Basket Spec

```text
| item_id | command                               | cwd | env_summary         | input_or_scenario | build_profile | weight | semantic_guard                        |
|---------|---------------------------------------|-----|---------------------|-------------------|---------------|--------|---------------------------------------|
| B01     | cargo run --release -- solve case1.lp | .   | RAYON_NUM_THREADS=1 | case1.lp          | release       | 1      | compare objective and solution digest |
| B02     | cargo run --release -- solve case2.lp | .   | RAYON_NUM_THREADS=1 | case2.lp          | release       | 2      | compare objective and solution digest |
```

Rules:

- every serious measurement must use this locked basket
- weights default to `1` if unspecified
- if the basket changes, record it explicitly as a new objective surface

### Champion Snapshot

```text
champion_id:
commit_or_worktree_state:
decision_metric: weighted_total_median_ms
per_item:
  - item_id: B01
    median_ms:
    spread_pct:
    counters:
      cycles:
      instructions:
      branches:
      branch_misses:
      cache_misses:
    guard: pass
  - item_id: B02
    median_ms:
    spread_pct:
    counters:
      cycles:
      instructions:
      branches:
      branch_misses:
      cache_misses:
    guard: pass
weighted_total_median_ms:
notes:
```

Rules:

- keep exactly one current champion
- update this after every promoted line
- if counters are unavailable, say so explicitly instead of leaving them blank without explanation

### Bottleneck Thesis

```text
thesis_id:
hotspot:
affected_basket_items:
timing_evidence:
profile_or_counter_evidence:
expected_change:
planned_move:
risk_to_semantics:
```

Rules:

- do not start a serious optimization line without this
- the evidence must be measured, not aesthetic
- expected change should name what should get better in timing or counters

### Experiment Record

```text
experiment_id:
thesis_id:
change_summary:
semantic_guard_result:
challenger_weighted_total_median_ms:
per_item_deltas:
counter_deltas:
decision: promote | enabler | reject | rework | blocked
decision_reason:
follow_on:
```

Rules:

- one serious optimization line per record
- rejected lines must still be recorded
- if the result is noisy or surprising, say so plainly
- do not mark a change as `enabler` without naming the specific follow-on optimization it unlocks
- do not accumulate long chains of `enabler` changes without cashing them out into measured wins

## Final Response

Always include:

- the `/tmp` worklog path
- locked basket summary
- semantic-guard summary
- champion metric summary
- hardware-counter availability summary
- strongest accepted lines
- strongest rejected or blocked lines
- residual next moves

If you edited code, also include:

- the promoted optimization lines
- verification summary
- any tradeoffs accepted inside the basket

## Hard Failure Modes

- do not optimize before locking the basket
- do not optimize before locking the semantic guard
- do not turn the run into speculative cleanup
- do not trust one-shot timings
- do not compare against a drifting baseline
- do not mix build profiles between champion and challenger
- do not accept "probably faster"
- do not keep a change that fails the semantic guard
- do not make a bottleneck thesis without hard numbers
- do not cite profiler screenshots or flamegraphs as if they were the win itself
- do not run a long campaign without attempting hardware counters
- do not keep multiple half-measured changes live at once
- do not mark a change as `enabler` without a concrete unlocked next move
- do not stop after one plausible win if credible next theses remain
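As a closing illustration, the decision metric from `Optimization Law` and a noise-aware promote check can be sketched as follows. The 2% noise threshold, the sample data, and the `promote` helper are hypothetical; in a real run the threshold comes from the measured spread and the guard result comes from the locked semantic guard.

```python
import statistics

def weighted_total_median_ms(items):
    """items: list of (weight, samples_ms) pairs.
    Implements weighted_total_median_ms = sum(weight_i * median_ms_i)."""
    return sum(w * statistics.median(samples) for w, samples in items)

def promote(champion, challenger, noise_pct=2.0, guard_ok=True):
    """Challenger wins only if the semantic guard passes and the
    improvement clears the observed noise floor (illustrative threshold)."""
    champ = weighted_total_median_ms(champion)
    chall = weighted_total_median_ms(challenger)
    win_pct = 100.0 * (champ - chall) / champ
    return guard_ok and win_pct > noise_pct

# Hypothetical basket: item weights 1 and 2, three timed samples each.
champion = [(1, [100.0, 102.0, 101.0]), (2, [50.0, 51.0, 50.0])]
challenger = [(1, [90.0, 91.0, 90.0]), (2, [48.0, 49.0, 48.0])]
print(promote(champion, challenger))  # True
```

Note that a challenger identical to the champion is rejected by this check: a 0% delta never clears the noise floor, which is exactly the "do not accept 'probably faster'" rule in numeric form.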