commit 4633c44d8018bdfe83883aff3b4ebfd984cbdc35 (parent 6600904bcc9494a1c381d679ecdee74a864189c6)
author: main <main@swarm.moe>, 2026-04-03 23:43:03 -0400

Add redline skill

redline/SKILL.md (new file): 1 file changed, 407 insertions, 0 deletions
---
name: redline
description: Run a logic-preserving wallclock optimization loop for a named basket of invocations. Use when Codex needs to make a command, workload, or benchmark basket faster by locking measurement, collecting hardware counters, justifying each bottleneck thesis with hard numbers, and iterating until blocked or stopped.
---

# Redline

Use this skill to reduce wallclock for a concrete basket of invocations without changing logic.

This skill is measurement-first. It is not a generic cleanup pass, and it is not a speculative "maybe this is faster" pass.

Language-agnostic, but code-centric.

## Contract

- Require a concrete basket of invocations.
- Require a semantic guard. If the user did not provide one, infer the narrowest credible guard from tests, golden outputs, invariants, or diffable results.
- Default basket weights to `1` unless the user specifies otherwise.
- Create one persistent `/tmp` worklog before the first serious measurement. Chat is summary only; the worklog is the durable source of truth.
- Lock the measurement surface before optimization: commands, inputs, cwd, env, build profile, thread settings, and any relevant runtime knobs.
- Do not begin optimization until the basket and semantic guard are explicit in the worklog.
- Attempt to obtain hardware-countered measurements early. If they are unavailable, record the exact blocker and treat the run as degraded mode.
- Do not run a long blind optimization campaign without either hardware counters or a clear recorded reason why they are unavailable.
- Measure the baseline repeatedly enough to bound noise before making changes.
- Use the current best measured line as the champion. Do not compare against a drifting or remembered baseline.
- Every bottleneck thesis must be justified by hard numbers: timing plus profiling and/or counters.
- Profile before speculative rewrites.
- Work one measured optimization line at a time unless a bundled change is strictly inseparable.
- Re-measure every challenger against the same basket and the same guard.
- Keep only logic-preserving wins and real enablers.
- Do not stop because you found one plausible speedup. Keep pushing until stopped, blocked, or genuinely out of credible next theses.

## Optimization Law

The objective is lower wallclock for the locked basket.

The default scalar decision metric is:

```text
weighted_total_median_ms = sum(weight_i * median_ms_i)
```
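
For a basket with per-item timing samples, the metric can be computed directly. A minimal sketch; the `basket` dictionary shape and field names here are illustrative assumptions, not part of the skill:

```python
from statistics import median

def weighted_total_median_ms(basket):
    """Scalar decision metric: sum of weight * per-item median wallclock.

    `basket` maps item_id -> {"weight": int, "samples_ms": [float, ...]}.
    (Hypothetical shape, for illustration only.)
    """
    return sum(
        item["weight"] * median(item["samples_ms"])
        for item in basket.values()
    )

basket = {
    "B01": {"weight": 1, "samples_ms": [104.0, 101.0, 99.0]},   # median 101.0
    "B02": {"weight": 2, "samples_ms": [250.0, 248.0, 252.0]},  # median 250.0
}
print(weighted_total_median_ms(basket))  # 1*101.0 + 2*250.0 = 601.0
```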

Per-item medians still matter. Do not accept a "win" that quietly trashes one basket item unless the user explicitly allows that tradeoff.

A challenger becomes the new champion only if:

- the semantic guard passes
- the wallclock win is credible against observed noise
- the tradeoff pattern is acceptable for the basket
- the thesis is supported by the measured evidence, not just the result
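
One way to make "credible against observed noise" concrete is to require the measured win to clear the champion line's own spread. A sketch under that assumption; the skill does not mandate this particular noise rule:

```python
def promote(champion_ms, challenger_ms, spread_pct, guard_passed):
    """Promote only if the guard passes and the win exceeds observed noise.

    spread_pct: measured noise of the champion line, as a percent of its
    median (e.g. 2.0 for roughly +/-2%). This threshold rule is one
    reasonable choice, not the skill's mandated test.
    """
    if not guard_passed:
        return False
    win_pct = 100.0 * (champion_ms - challenger_ms) / champion_ms
    return win_pct > spread_pct

print(promote(601.0, 540.0, 2.0, True))   # ~10.1% win vs 2% noise -> True
print(promote(601.0, 595.0, 2.0, True))   # ~1.0% win vs 2% noise -> False
print(promote(601.0, 300.0, 2.0, False))  # guard failed -> False
```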

## Flow

### 0. Create the worklog

Create a path shaped like:

```text
/tmp/redline-<repo-or-dir>-<basket-slug>.md
```

Seed it with the embedded worklog skeleton from `Embedded Forms`.
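
A hypothetical helper for shaping that path; the slug rule (lowercase, non-alphanumerics collapsed to `-`) is an assumption, and any stable slug works:

```python
import re

def worklog_path(repo_or_dir, basket_slug):
    """Build the /tmp worklog path in the shape the skill expects."""
    def slug(s):
        # Assumed slug rule: lowercase, runs of non-alphanumerics become '-'.
        return re.sub(r"[^a-z0-9]+", "-", s.lower()).strip("-")
    return f"/tmp/redline-{slug(repo_or_dir)}-{slug(basket_slug)}.md"

print(worklog_path("My Solver", "lp cases"))  # /tmp/redline-my-solver-lp-cases.md
```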

### 1. Lock the basket

Write the exact basket first.

For each basket item, record:

- command
- cwd
- relevant env
- input or scenario
- build profile
- weight
- semantic guard target

Do not optimize an implied basket.

### 2. Lock the measurement harness

Choose the actual measurement tools and parameters before chasing speed.

Prefer:

- repeated wallclock sampling with `hyperfine` or an equivalent harness
- hardware counters with `perf stat` or platform equivalent
- sampled profiling with `perf record`, flamegraph tooling, `pprof`, or equivalent

Record:

- warmup policy
- sample count
- timing metric
- spread metric
- counter set
- profiler choice

If the target is a release workload, do not benchmark debug builds.
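
When `hyperfine` is not available, a crude stand-in can still enforce warmup and repeated sampling. A sketch only; it controls none of the things a real harness does (CPU frequency, pinning, cache and I/O state):

```python
import statistics
import subprocess
import time

def sample_wallclock(cmd, warmup=1, runs=9, cwd=None, env=None):
    """Repeated wallclock sampling for one basket item.

    Returns (median_ms, spread_pct). A stand-in for hyperfine when it is
    unavailable, not a replacement for it.
    """
    for _ in range(warmup):
        subprocess.run(cmd, cwd=cwd, env=env, check=True, capture_output=True)
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, cwd=cwd, env=env, check=True, capture_output=True)
        samples_ms.append(1000.0 * (time.perf_counter() - start))
    med = statistics.median(samples_ms)
    spread_pct = 100.0 * (max(samples_ms) - min(samples_ms)) / med
    return med, spread_pct
```

Usage would look like `sample_wallclock(["cargo", "run", "--release", "--", "solve", "case1.lp"])`, with the command, cwd, and env taken verbatim from the locked basket spec.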

### 3. Establish the baseline champion

Measure the untouched line repeatedly enough to estimate noise.

Minimum default:

- at least 1 warmup run per basket item
- at least 9 timed runs per basket item for champion decisions

Record per basket item:

- median wallclock
- spread or noise estimate
- hardware counters if available
- semantic-guard result

If noise is too high to trust decisions, stabilize the harness before optimizing further.
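
The "too high to trust" judgment can be made mechanical with a noise gate applied before any champion decision. A sketch; the 5% default threshold is illustrative, not a skill requirement:

```python
from statistics import median

def baseline_ok(samples_ms, max_spread_pct=5.0):
    """Accept a baseline only if observed spread is within the gate.

    Returns (ok, median_ms, spread_pct). The gate value is an assumed
    default; pick one that matches the wins you need to detect.
    """
    med = median(samples_ms)
    spread_pct = 100.0 * (max(samples_ms) - min(samples_ms)) / med
    return spread_pct <= max_spread_pct, med, spread_pct

ok, med, spread = baseline_ok([100.0, 102.0, 101.0])
print(ok, med, round(spread, 1))   # True 101.0 2.0
ok, _, _ = baseline_ok([100.0, 160.0, 101.0])
print(ok)                          # False: ~59% spread, stabilize first
```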

### 4. Profile the champion

Profile the current champion on the basket items that dominate total cost.

Do not start with "obvious" micro-optimizations.
Find where the time, cycles, cache misses, allocations, syscalls, or contention actually live.

The first real bottleneck thesis should come from this data.

### 5. Write the bottleneck thesis

Every optimization line needs a thesis before code changes.

A thesis must name:

- the hotspot or bottleneck site
- the measured evidence
- why this should move wallclock
- which basket items should improve
- which counters or profile signatures should change if the thesis is right

Bad theses:

- "this code looks slow"
- "this alloc seems unnecessary"
- "we should probably inline this"

Good theses are anchored to numbers.

### 6. Execute one optimization line

Make one coherent optimization move.

Typical move classes:

- remove repeated work
- improve data layout or locality
- cut allocations or copies
- reduce parsing or formatting overhead
- reduce locking or contention
- reduce syscall or I/O frequency
- improve algorithmic complexity
- hoist invariant work
- batch expensive operations
- exploit cheaper library or runtime primitives
- remove abstraction overhead when it is actually measured

Do not mix semantic redesign into a speed pass.
If a large architectural move is needed, record that explicitly.
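
As one concrete instance of the move classes above, hoisting invariant work out of a hot loop while preserving logic. Illustrative only; note that `re` caches compiled patterns internally, so in real code this exact win is small and would still have to be measured:

```python
import re

# Before: the pattern lookup happens on every call, inside the hot loop.
def count_words_slow(lines):
    total = 0
    for line in lines:
        total += len(re.findall(r"[A-Za-z]+", line))
    return total

# After: hoist the invariant compile out of the loop.
WORD = re.compile(r"[A-Za-z]+")

def count_words_fast(lines):
    total = 0
    for line in lines:
        total += len(WORD.findall(line))
    return total

lines = ["one two", "three"] * 3
assert count_words_slow(lines) == count_words_fast(lines)  # logic preserved
print(count_words_fast(lines))  # 9
```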

### 7. Re-measure the challenger

Run the same basket, the same harness, and the same semantic guard.

Record:

- challenger timings
- challenger counters
- challenger profile notes if needed
- pass/fail against the thesis

Compare challenger against the current champion, not against memory.

### 8. Decide

Choose exactly one:

- `promote`
- `enabler`
- `reject`
- `rework`
- `blocked`

Use:

- `promote` when the challenger is a credible logic-preserving improvement on the current basket
- `enabler` when the challenger is neutral or near-neutral now but materially unlocks stronger follow-on optimization work
- `reject` when the change loses, is noisy, or fails the guard
- `rework` when the line is promising but the implementation or measurement is not yet trustworthy
- `blocked` when an external constraint prevents meaningful progress

If rejected, revert or otherwise unwind the losing line before starting the next serious experiment.

### 9. Repeat

After every decision:

- update the champion snapshot
- update the candidate queue
- choose the next strongest measured thesis
- keep going

If progress stalls, return to profiling and widen the search rather than polishing a dead line.

## Measurement Discipline

- Use one stable basket definition for the whole run unless the user explicitly changes the objective.
- Keep build profile, env, and thread settings fixed unless changing them is the point of the experiment.
- Do not trust one-shot timings.
- Prefer medians over means for decision-making.
- Record a noise or spread figure for every serious comparison.
- Attempt hardware counters early rather than as an afterthought.
- Prefer the same counter set across champion/challenger comparisons.
- If counters are unavailable because of permissions, virtualization, unsupported PMU access, or host policy, record the exact failure in the worklog.
- Do not cite counters you did not actually collect.
- Do not cite profiler output without cashing it out into a bottleneck thesis.

A bottleneck thesis must cite at least:

- one timing signal
- one profile or counter signal

## Semantic Discipline

Logic preservation is mandatory.

Use the narrowest credible semantic guard that actually protects the behavior under optimization.

Examples:

- targeted tests
- golden-output diffs
- checksum or snapshot comparisons
- invariant checks
- exact response comparisons
- benchmark harnesses with correctness validation
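
The golden-output and checksum guards above can be as small as a digest comparison. A sketch; the exact-bytes rule is the narrowest form, and looser guards (e.g. numeric tolerance) need their own comparison logic rather than a digest:

```python
import hashlib

def output_digest(data: bytes) -> str:
    """Stable digest of a run's output, for champion/challenger comparison."""
    return hashlib.sha256(data).hexdigest()

def guard_passes(champion_output: bytes, challenger_output: bytes) -> bool:
    """Exact-output semantic guard: the challenger must reproduce the
    champion output byte for byte."""
    return output_digest(challenger_output) == output_digest(champion_output)

print(guard_passes(b"objective=42\n", b"objective=42\n"))  # True
print(guard_passes(b"objective=42\n", b"objective=41\n"))  # False
```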

Do not smuggle behavior changes in as performance work.

## Embedded Forms

### Worklog Skeleton

```text
worklog_path:
objective:
scope_root:
mode: audit_plus_refactor
degraded_mode_reason:

measurement_harness:
basket_spec:
semantic_guard:
baseline_champion:
counter_status:
profile_status:
candidate_queue:
experiment_log:
residual_risks:
```

Rules:

- create this file before the first serious measurement
- update it after every real experiment
- treat it as the durable source of truth for the run
- report the final worklog path in the user-facing response

### Basket Spec

```text
| item_id | command | cwd | env_summary | input_or_scenario | build_profile | weight | semantic_guard |
|---------|---------|-----|-------------|-------------------|---------------|--------|----------------|
| B01 | cargo run --release -- solve case1.lp | . | RAYON_NUM_THREADS=1 | case1.lp | release | 1 | compare objective and solution digest |
| B02 | cargo run --release -- solve case2.lp | . | RAYON_NUM_THREADS=1 | case2.lp | release | 2 | compare objective and solution digest |
```

Rules:

- every serious measurement must use this locked basket
- weights default to `1` if unspecified
- if the basket changes, record it explicitly as a new objective surface

### Champion Snapshot

```text
champion_id:
commit_or_worktree_state:
decision_metric: weighted_total_median_ms

per_item:
- item_id: B01
  median_ms:
  spread_pct:
  counters:
    cycles:
    instructions:
    branches:
    branch_misses:
    cache_misses:
  guard: pass
- item_id: B02
  median_ms:
  spread_pct:
  counters:
    cycles:
    instructions:
    branches:
    branch_misses:
    cache_misses:
  guard: pass

weighted_total_median_ms:
notes:
```

Rules:

- keep exactly one current champion
- update this after every promoted line
- if counters are unavailable, say so explicitly instead of leaving them blank without explanation

### Bottleneck Thesis

```text
thesis_id:
hotspot:
affected_basket_items:
timing_evidence:
profile_or_counter_evidence:
expected_change:
planned_move:
risk_to_semantics:
```

Rules:

- do not start a serious optimization line without this
- the evidence must be measured, not aesthetic
- expected change should name what should get better in timing or counters

### Experiment Record

```text
experiment_id:
thesis_id:
change_summary:
semantic_guard_result:
challenger_weighted_total_median_ms:
per_item_deltas:
counter_deltas:
decision: promote | enabler | reject | rework | blocked
decision_reason:
follow_on:
```

Rules:

- one serious optimization line per record
- rejected lines must still be recorded
- if the result is noisy or surprising, say so plainly
- do not mark a change as `enabler` without naming the specific follow-on optimization it unlocks
- do not accumulate long chains of `enabler` changes without cashing them out into measured wins

## Final Response

Always include:

- the `/tmp` worklog path
- locked basket summary
- semantic-guard summary
- champion metric summary
- hardware-counter availability summary
- strongest accepted lines
- strongest rejected or blocked lines
- residual next moves

If you edited code, also include:

- the promoted optimization lines
- verification summary
- any tradeoffs accepted inside the basket

## Hard Failure Modes

- do not optimize before locking the basket
- do not optimize before locking the semantic guard
- do not turn the run into speculative cleanup
- do not trust one-shot timings
- do not compare against a drifting baseline
- do not mix build profiles between champion and challenger
- do not accept "probably faster"
- do not keep a change that fails the semantic guard
- do not make a bottleneck thesis without hard numbers
- do not cite profiler screenshots or flamegraphs as if they were the win itself
- do not run a long campaign without attempting hardware counters
- do not keep multiple half-measured changes live at once
- do not mark a change as `enabler` without a concrete unlocked next move
- do not stop after one plausible win if credible next theses remain