---
name: redline
description: Run a logic-preserving wallclock optimization loop for a named basket of invocations. Use when Codex needs to make a command, workload, or benchmark basket faster by locking measurement, collecting hardware counters, justifying each bottleneck thesis with hard numbers, and iterating until blocked or stopped.
---
# Redline
Use this skill to reduce wallclock for a concrete basket of invocations without changing logic.
This skill is measurement-first. It is not a generic cleanup pass, and it is not a speculative "maybe this is faster" pass.
Language-agnostic, but code-centric.
## Contract
- Require a concrete basket of invocations.
- Require a semantic guard. If the user did not provide one, infer the narrowest credible guard from tests, golden outputs, invariants, or diffable results.
- Default basket weights to `1` unless the user specifies otherwise.
- Create one persistent `/tmp` worklog before the first serious measurement. Chat is summary only; the worklog is the durable source of truth.
- Lock the measurement surface before optimization: commands, inputs, cwd, env, build profile, thread settings, and any relevant runtime knobs.
- Do not begin optimization until the basket and semantic guard are explicit in the worklog.
- Attempt to obtain hardware-counter measurements early. If they are unavailable, record the exact blocker and treat the run as degraded mode.
- Do not run a long blind optimization campaign without either hardware counters or a clear recorded reason why they are unavailable.
- Measure the baseline repeatedly enough to bound noise before making changes.
- Use the current best measured line as the champion. Do not compare against a drifting or remembered baseline.
- Every bottleneck thesis must be justified by hard numbers: timing plus profiling and/or counters.
- Profile before speculative rewrites.
- Work one measured optimization line at a time unless a bundled change is strictly inseparable.
- Re-measure every challenger against the same basket and the same guard.
- Keep only logic-preserving wins and real enablers.
- Do not stop because you found one plausible speedup. Keep pushing until stopped, blocked, or genuinely out of credible next theses.
## Optimization Law
The objective is lower wallclock for the locked basket.
The default scalar decision metric is:
```text
weighted_total_median_ms = sum(weight_i * median_ms_i)
```
Per-item medians still matter. Do not accept a "win" that quietly trashes one basket item unless the user explicitly allows that tradeoff.
A challenger becomes the new champion only if:
- the semantic guard passes
- the wallclock win is credible against observed noise
- the tradeoff pattern is acceptable for the basket
- the thesis is supported by the measured evidence, not just the result
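The decision metric and the noise gate above can be sketched concretely. This is one reasonable reading of the promotion rule, not the only one; the field names (`weight`, `samples_ms`, `spread_pct`) mirror the forms later in this skill.

```python
from statistics import median

def weighted_total_median_ms(items):
    """items: list of dicts with 'weight' and 'samples_ms' per basket item."""
    return sum(it["weight"] * median(it["samples_ms"]) for it in items)

def credible_win(champion_ms, challenger_ms, spread_pct):
    """Require the improvement to exceed the observed noise band."""
    improvement_pct = 100.0 * (champion_ms - challenger_ms) / champion_ms
    return improvement_pct > spread_pct

champion = [
    {"weight": 1, "samples_ms": [102, 100, 101]},
    {"weight": 2, "samples_ms": [250, 252, 249]},
]
challenger = [
    {"weight": 1, "samples_ms": [90, 91, 89]},
    {"weight": 2, "samples_ms": [240, 241, 239]},
]

champ_total = weighted_total_median_ms(champion)    # 1*101 + 2*250 = 601
chall_total = weighted_total_median_ms(challenger)  # 1*90  + 2*240 = 570
print(champ_total, chall_total, credible_win(champ_total, chall_total, spread_pct=2.0))
```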
## Flow
### 0. Create the worklog
Create a path shaped like:
```text
/tmp/redline-<repo-or-dir>-<basket-slug>.md
```
Seed it with the embedded worklog skeleton from `Embedded Forms`.
### 1. Lock the basket
Write the exact basket first.
For each basket item, record:
- command
- cwd
- relevant env
- input or scenario
- build profile
- weight
- semantic guard target
Do not optimize an implied basket.
### 2. Lock the measurement harness
Choose the actual measurement tools and parameters before chasing speed.
Prefer:
- repeated wallclock sampling with `hyperfine` or an equivalent harness
- hardware counters with `perf stat` or platform equivalent
- sampled profiling with `perf record`, flamegraph tooling, `pprof`, or equivalent
Record:
- warmup policy
- sample count
- timing metric
- spread metric
- counter set
- profiler choice
If the target is a release workload, do not benchmark debug builds.
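One concrete shape for a locked harness, assuming `hyperfine` and Linux `perf` are installed and using the Rust basket items from the Basket Spec form below as stand-ins; counter names and availability vary by platform:

```text
# repeated wallclock sampling: fixed warmup, fixed sample count, exported results
hyperfine --warmup 1 --runs 9 --export-json b01.json \
  'cargo run --release -- solve case1.lp'

# hardware counters: use the same counter set for champion and challenger
perf stat -e cycles,instructions,branches,branch-misses,cache-misses \
  -- cargo run --release -- solve case1.lp
```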
### 3. Establish the baseline champion
Measure the untouched line repeatedly enough to estimate noise.
Minimum default:
- at least 1 warmup run per basket item
- at least 9 timed runs per basket item for champion decisions
Record per basket item:
- median wallclock
- spread or noise estimate
- hardware counters if available
- semantic-guard result
If noise is too high to trust decisions, stabilize the harness before optimizing further.
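A minimal sketch of the per-item numbers this step records. Spread is defined here as `(max - min) / median`; the skill does not mandate a specific spread metric, so treat this as one workable choice:

```python
from statistics import median

def summarize(samples_ms):
    """Median and spread (as a percentage of the median) for one basket item."""
    med = median(samples_ms)
    spread_pct = 100.0 * (max(samples_ms) - min(samples_ms)) / med
    return med, spread_pct

runs = [103, 100, 101, 99, 102, 100, 101, 100, 102]  # 9 timed runs
med, spread = summarize(runs)
print(med, round(spread, 1))  # 101 4.0
```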
### 4. Profile the champion
Profile the current champion on the basket items that dominate total cost.
Do not start with "obvious" micro-optimizations.
Find where the time, cycles, cache misses, allocations, syscalls, or contention actually live.
The first real bottleneck thesis should come from this data.
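On Linux, one common way to collect this data looks like the following; the basket item is illustrative, and other platforms substitute their own profiler:

```text
# sample the dominant basket item with call graphs
perf record -g -- cargo run --release -- solve case2.lp

# read the hotspots; the top entries by overhead seed the first thesis
perf report

# optional flamegraph (Brendan Gregg's FlameGraph scripts)
perf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg
```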
### 5. Write the bottleneck thesis
Every optimization line needs a thesis before code changes.
A thesis must name:
- the hotspot or bottleneck site
- the measured evidence
- why this should move wallclock
- which basket items should improve
- which counters or profile signatures should change if the thesis is right
Bad theses:
- "this code looks slow"
- "this alloc seems unnecessary"
- "we should probably inline this"
Good theses are anchored to numbers.
### 6. Execute one optimization line
Make one coherent optimization move.
Typical move classes:
- remove repeated work
- improve data layout or locality
- cut allocations or copies
- reduce parsing or formatting overhead
- reduce locking or contention
- reduce syscall or I/O frequency
- improve algorithmic complexity
- hoist invariant work
- batch expensive operations
- exploit cheaper library or runtime primitives
- remove abstraction overhead when it is actually measured
Do not mix semantic redesign into a speed pass.
If a large architectural move is needed, record that explicitly.
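A toy illustration of two of the move classes above, "remove repeated work" and "hoist invariant work"; Python is used only for illustration, since the skill is language-agnostic:

```python
import re

# Before: the pattern is recompiled on every iteration of the hot loop.
def count_hits_slow(lines, pattern):
    return sum(1 for line in lines if re.compile(pattern).search(line))

# After: compile once, outside the loop. Behavior is identical, so a
# semantic guard of "equal counts" still passes. (CPython caches compiled
# patterns internally, so the real win here is modest; the point is the
# shape of the move, not this specific number.)
def count_hits_fast(lines, pattern):
    rx = re.compile(pattern)
    return sum(1 for line in lines if rx.search(line))

lines = ["error: disk", "ok", "error: net", "ok"] * 1000
assert count_hits_slow(lines, r"error:") == count_hits_fast(lines, r"error:") == 2000
```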
### 7. Re-measure the challenger
Run the same basket, the same harness, and the same semantic guard.
Record:
- challenger timings
- challenger counters
- challenger profile notes if needed
- pass/fail against the thesis
Compare challenger against the current champion, not against memory.
### 8. Decide
Choose exactly one:
- `promote`
- `enabler`
- `reject`
- `rework`
- `blocked`
Use:
- `promote` when the challenger is a credible logic-preserving improvement on the current basket
- `enabler` when the challenger is neutral or near-neutral now but materially unlocks stronger follow-on optimization work
- `reject` when the change loses, is noisy, or fails the guard
- `rework` when the line is promising but the implementation or measurement is not yet trustworthy
- `blocked` when an external constraint prevents meaningful progress
If rejected, revert or otherwise unwind the losing line before starting the next serious experiment.
### 9. Repeat
After every decision:
- update the champion snapshot
- update the candidate queue
- choose the next strongest measured thesis
- keep going
If progress stalls, return to profiling and widen the search rather than polishing a dead line.
## Measurement Discipline
- Use one stable basket definition for the whole run unless the user explicitly changes the objective.
- Keep build profile, env, and thread settings fixed unless changing them is the point of the experiment.
- Do not trust one-shot timings.
- Prefer medians over means for decision-making.
- Record a noise or spread figure for every serious comparison.
- Attempt hardware counters early rather than as an afterthought.
- Prefer the same counter set across champion/challenger comparisons.
- If counters are unavailable because of permissions, virtualization, unsupported PMU access, or host policy, record the exact failure in the worklog.
- Do not cite counters you did not actually collect.
- Do not cite profiler output without cashing it out into a bottleneck thesis.
A bottleneck thesis must cite at least:
- one timing signal
- one profile or counter signal
## Semantic Discipline
Logic preservation is mandatory.
Use the narrowest credible semantic guard that actually protects the behavior under optimization.
Examples:
- targeted tests
- golden-output diffs
- checksum or snapshot comparisons
- invariant checks
- exact response comparisons
- benchmark harnesses with correctness validation
Do not smuggle behavior changes in as performance work.
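One minimal shape for a golden-output guard; the paths and the basket command are illustrative:

```text
# capture the golden output once from the untouched champion
cargo run --release -- solve case1.lp > golden/case1.out

# after every challenger, require an exact match (nonzero exit = guard failure)
cargo run --release -- solve case1.lp | diff - golden/case1.out
```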
## Embedded Forms
### Worklog Skeleton
```text
worklog_path:
objective:
scope_root:
mode: redline
degraded_mode_reason:
measurement_harness:
basket_spec:
semantic_guard:
baseline_champion:
counter_status:
profile_status:
candidate_queue:
experiment_log:
residual_risks:
```
Rules:
- create this file before the first serious measurement
- update it after every real experiment
- treat it as the durable source of truth for the run
- report the final worklog path in the user-facing response
### Basket Spec
```text
| item_id | command | cwd | env_summary | input_or_scenario | build_profile | weight | semantic_guard |
|---------|---------|-----|-------------|-------------------|---------------|--------|----------------|
| B01 | cargo run --release -- solve case1.lp | . | RAYON_NUM_THREADS=1 | case1.lp | release | 1 | compare objective and solution digest |
| B02 | cargo run --release -- solve case2.lp | . | RAYON_NUM_THREADS=1 | case2.lp | release | 2 | compare objective and solution digest |
```
Rules:
- every serious measurement must use this locked basket
- weights default to `1` if unspecified
- if the basket changes, record it explicitly as a new objective surface
### Champion Snapshot
```text
champion_id:
commit_or_worktree_state:
decision_metric: weighted_total_median_ms
per_item:
- item_id: B01
median_ms:
spread_pct:
counters:
cycles:
instructions:
branches:
branch_misses:
cache_misses:
guard: pass
- item_id: B02
median_ms:
spread_pct:
counters:
cycles:
instructions:
branches:
branch_misses:
cache_misses:
guard: pass
weighted_total_median_ms:
notes:
```
Rules:
- keep exactly one current champion
- update this after every promoted line
- if counters are unavailable, say so explicitly instead of leaving them blank without explanation
### Bottleneck Thesis
```text
thesis_id:
hotspot:
affected_basket_items:
timing_evidence:
profile_or_counter_evidence:
expected_change:
planned_move:
risk_to_semantics:
```
Rules:
- do not start a serious optimization line without this
- the evidence must be measured, not aesthetic
- expected change should name what should get better in timing or counters
### Experiment Record
```text
experiment_id:
thesis_id:
change_summary:
semantic_guard_result:
challenger_weighted_total_median_ms:
per_item_deltas:
counter_deltas:
decision: promote | enabler | reject | rework | blocked
decision_reason:
follow_on:
```
Rules:
- one serious optimization line per record
- rejected lines must still be recorded
- if the result is noisy or surprising, say so plainly
- do not mark a change as `enabler` without naming the specific follow-on optimization it unlocks
- do not accumulate long chains of `enabler` changes without cashing them out into measured wins
## Final Response
Always include:
- the `/tmp` worklog path
- locked basket summary
- semantic-guard summary
- champion metric summary
- hardware-counter availability summary
- strongest accepted lines
- strongest rejected or blocked lines
- residual next moves
If you edited code, also include:
- the promoted optimization lines
- verification summary
- any tradeoffs accepted inside the basket
## Hard Failure Modes
- do not optimize before locking the basket
- do not optimize before locking the semantic guard
- do not turn the run into speculative cleanup
- do not trust one-shot timings
- do not compare against a drifting baseline
- do not mix build profiles between champion and challenger
- do not accept "probably faster"
- do not keep a change that fails the semantic guard
- do not make a bottleneck thesis without hard numbers
- do not cite profiler screenshots or flamegraphs as if they were the win itself
- do not run a long campaign without attempting hardware counters
- do not keep multiple half-measured changes live at once
- do not mark a change as `enabler` without a concrete unlocked next move
- do not stop after one plausible win if credible next theses remain