| field | value | date |
|---|---|---|
| author | main <main@swarm.moe> | 2026-04-03 23:43:03 -0400 |
| committer | main <main@swarm.moe> | 2026-04-03 23:43:03 -0400 |
| commit | 4633c44d8018bdfe83883aff3b4ebfd984cbdc35 (patch) | |
| tree | 323ae841a5d9ec20526952e5123c99a952319215 | |
| parent | 6600904bcc9494a1c381d679ecdee74a864189c6 (diff) | |
| download | skills-main.zip | |
| mode | file | lines |
|---|---|---|
| -rw-r--r-- | README.md | 1 |
| -rw-r--r-- | redline/SKILL.md | 407 |
| -rw-r--r-- | redline/agents/openai.yaml | 4 |
3 files changed, 412 insertions, 0 deletions
@@ -6,3 +6,4 @@ Current skills:
 - `fahrenheit-451`: zero-based documentation purge and consolidation for markdown and plaintext notes
 - `haussmann`: zero-based Rust source-tree audit and reorganization for layout, module topology, and shared support placement
+- `redline`: logic-preserving wallclock optimization for a locked basket of invocations with measurement-first discipline
diff --git a/redline/SKILL.md b/redline/SKILL.md
new file mode 100644
index 0000000..7662365
--- /dev/null
+++ b/redline/SKILL.md
@@ -0,0 +1,407 @@
+---
+name: redline
+description: Run a logic-preserving wallclock optimization loop for a named basket of invocations. Use when Codex needs to make a command, workload, or benchmark basket faster by locking measurement, collecting hardware counters, justifying each bottleneck thesis with hard numbers, and iterating until blocked or stopped.
+---
+
+# Redline
+
+Use this skill to reduce wallclock for a concrete basket of invocations without changing logic.
+
+This skill is measurement-first. It is not a generic cleanup pass, and it is not a speculative "maybe this is faster" pass.
+
+Language-agnostic, but code-centric.
+
+## Contract
+
+- Require a concrete basket of invocations.
+- Require a semantic guard. If the user did not provide one, infer the narrowest credible guard from tests, golden outputs, invariants, or diffable results.
+- Default basket weights to `1` unless the user specifies otherwise.
+- Create one persistent `/tmp` worklog before the first serious measurement. Chat is summary only; the worklog is the durable source of truth.
+- Lock the measurement surface before optimization: commands, inputs, cwd, env, build profile, thread settings, and any relevant runtime knobs.
+- Do not begin optimization until the basket and semantic guard are explicit in the worklog.
+- Attempt to obtain hardware-counter measurements early. If they are unavailable, record the exact blocker and treat the run as degraded mode.
+- Do not run a long blind optimization campaign without either hardware counters or a clear recorded reason why they are unavailable.
+- Measure the baseline repeatedly enough to bound noise before making changes.
+- Use the current best measured line as the champion. Do not compare against a drifting or remembered baseline.
+- Every bottleneck thesis must be justified by hard numbers: timing plus profiling and/or counters.
+- Profile before speculative rewrites.
+- Work one measured optimization line at a time unless a bundled change is strictly inseparable.
+- Re-measure every challenger against the same basket and the same guard.
+- Keep only logic-preserving wins and real enablers.
+- Do not stop because you found one plausible speedup. Keep pushing until stopped, blocked, or genuinely out of credible next theses.
+
+## Optimization Law
+
+The objective is lower wallclock for the locked basket.
+
+The default scalar decision metric is:
+
+```text
+weighted_total_median_ms = sum(weight_i * median_ms_i)
+```
+
+Per-item medians still matter. Do not accept a "win" that quietly trashes one basket item unless the user explicitly allows that tradeoff.
+
+A challenger becomes the new champion only if:
+
+- the semantic guard passes
+- the wallclock win is credible against observed noise
+- the tradeoff pattern is acceptable for the basket
+- the thesis is supported by the measured evidence, not just the result
+
+## Flow
+
+### 0. Create the worklog
+
+Create a path shaped like:
+
+```text
+/tmp/redline-<repo-or-dir>-<basket-slug>.md
+```
+
+Seed it with the embedded worklog skeleton from `Embedded Forms`.
+
+### 1. Lock the basket
+
+Write the exact basket first.
+
+For each basket item, record:
+
+- command
+- cwd
+- relevant env
+- input or scenario
+- build profile
+- weight
+- semantic guard target
+
+Do not optimize an implied basket.
+
+### 2. Lock the measurement harness
+
+Choose the actual measurement tools and parameters before chasing speed.
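The decision metric and the minimum sampling defaults can be sketched as follows. This is a hypothetical helper, not part of the skill; a dedicated harness such as `hyperfine` is still preferred for real runs, and the spread formula here (range over median) is an assumption, not a prescribed statistic.

```python
import statistics
import subprocess
import time

def measure_item(cmd, warmup=1, runs=9):
    """Time one basket item: discard warmup runs, then report median ms and spread %."""
    for _ in range(warmup):
        subprocess.run(cmd, shell=True, check=True, capture_output=True)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, shell=True, check=True, capture_output=True)
        samples.append((time.perf_counter() - start) * 1000.0)
    median_ms = statistics.median(samples)
    spread_pct = 100.0 * (max(samples) - min(samples)) / median_ms
    return median_ms, spread_pct

def weighted_total_median_ms(items):
    """Decision metric: sum(weight_i * median_ms_i) over the locked basket."""
    return sum(weight * median_ms for weight, median_ms in items)
```

Per-item medians and spreads still go into the worklog; only the weighted total is the scalar decision metric.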
+
+Prefer:
+
+- repeated wallclock sampling with `hyperfine` or an equivalent harness
+- hardware counters with `perf stat` or platform equivalent
+- sampled profiling with `perf record`, flamegraph tooling, `pprof`, or equivalent
+
+Record:
+
+- warmup policy
+- sample count
+- timing metric
+- spread metric
+- counter set
+- profiler choice
+
+If the target is a release workload, do not benchmark debug builds.
+
+### 3. Establish the baseline champion
+
+Measure the untouched line repeatedly enough to estimate noise.
+
+Minimum default:
+
+- at least 1 warmup run per basket item
+- at least 9 timed runs per basket item for champion decisions
+
+Record per basket item:
+
+- median wallclock
+- spread or noise estimate
+- hardware counters if available
+- semantic-guard result
+
+If noise is too high to trust decisions, stabilize the harness before optimizing further.
+
+### 4. Profile the champion
+
+Profile the current champion on the basket items that dominate total cost.
+
+Do not start with "obvious" micro-optimizations.
+Find where the time, cycles, cache misses, allocations, syscalls, or contention actually live.
+
+The first real bottleneck thesis should come from this data.
+
+### 5. Write the bottleneck thesis
+
+Every optimization line needs a thesis before code changes.
+
+A thesis must name:
+
+- the hotspot or bottleneck site
+- the measured evidence
+- why this should move wallclock
+- which basket items should improve
+- which counters or profile signatures should change if the thesis is right
+
+Bad theses:
+
+- "this code looks slow"
+- "this alloc seems unnecessary"
+- "we should probably inline this"
+
+Good theses are anchored to numbers.
+
+### 6. Execute one optimization line
+
+Make one coherent optimization move.
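The "credible against observed noise" rule from the Optimization Law can be sketched as a decision helper. This is one possible rule, assumed for illustration: a challenger wins only if its median beats the champion's by more than the champion's observed run-to-run spread.

```python
def credible_win(champion_median_ms, challenger_median_ms, spread_pct):
    """Hypothetical credibility rule: the improvement must exceed the noise margin.

    spread_pct is the champion's observed spread as a percentage of its median.
    """
    noise_margin_ms = champion_median_ms * spread_pct / 100.0
    return challenger_median_ms < champion_median_ms - noise_margin_ms
```

With a 5% spread on a 1000 ms champion, a 980 ms challenger is inside the noise margin and does not count as a win; a 900 ms challenger does.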
+
+Typical move classes:
+
+- remove repeated work
+- improve data layout or locality
+- cut allocations or copies
+- reduce parsing or formatting overhead
+- reduce locking or contention
+- reduce syscall or I/O frequency
+- improve algorithmic complexity
+- hoist invariant work
+- batch expensive operations
+- exploit cheaper library or runtime primitives
+- remove abstraction overhead when it is actually measured
+
+Do not mix semantic redesign into a speed pass.
+If a large architectural move is needed, record that explicitly.
+
+### 7. Re-measure the challenger
+
+Run the same basket, the same harness, and the same semantic guard.
+
+Record:
+
+- challenger timings
+- challenger counters
+- challenger profile notes if needed
+- pass/fail against the thesis
+
+Compare challenger against the current champion, not against memory.
+
+### 8. Decide
+
+Choose exactly one:
+
+- `promote`
+- `enabler`
+- `reject`
+- `rework`
+- `blocked`
+
+Use:
+
+- `promote` when the challenger is a credible logic-preserving improvement on the current basket
+- `enabler` when the challenger is neutral or near-neutral now but materially unlocks stronger follow-on optimization work
+- `reject` when the change loses, is noisy, or fails the guard
+- `rework` when the line is promising but the implementation or measurement is not yet trustworthy
+- `blocked` when an external constraint prevents meaningful progress
+
+If rejected, revert or otherwise unwind the losing line before starting the next serious experiment.
+
+### 9. Repeat
+
+After every decision:
+
+- update the champion snapshot
+- update the candidate queue
+- choose the next strongest measured thesis
+- keep going
+
+If progress stalls, return to profiling and widen the search rather than polishing a dead line.
+
+## Measurement Discipline
+
+- Use one stable basket definition for the whole run unless the user explicitly changes the objective.
+- Keep build profile, env, and thread settings fixed unless changing them is the point of the experiment.
+- Do not trust one-shot timings.
+- Prefer medians over means for decision-making.
+- Record a noise or spread figure for every serious comparison.
+- Attempt hardware counters early rather than as an afterthought.
+- Prefer the same counter set across champion/challenger comparisons.
+- If counters are unavailable because of permissions, virtualization, unsupported PMU access, or host policy, record the exact failure in the worklog.
+- Do not cite counters you did not actually collect.
+- Do not cite profiler output without cashing it out into a bottleneck thesis.
+
+A bottleneck thesis must cite at least:
+
+- one timing signal
+- one profile or counter signal
+
+## Semantic Discipline
+
+Logic preservation is mandatory.
+
+Use the narrowest credible semantic guard that actually protects the behavior under optimization.
+
+Examples:
+
+- targeted tests
+- golden-output diffs
+- checksum or snapshot comparisons
+- invariant checks
+- exact response comparisons
+- benchmark harnesses with correctness validation
+
+Do not smuggle behavior changes in as performance work.
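One concrete guard shape, the checksum comparison, can be sketched as follows. This is a hypothetical helper that assumes the basket item produces deterministic, byte-stable stdout; when output is not byte-stable, a narrower invariant check is the right guard instead.

```python
import hashlib
import subprocess

def output_digest(cmd):
    """Run a basket item and digest its stdout for golden-output comparison."""
    result = subprocess.run(cmd, shell=True, check=True, capture_output=True)
    return hashlib.sha256(result.stdout).hexdigest()

def guard_passes(cmd, golden_digest):
    """Challenger passes only if its output is byte-identical to the golden run."""
    return output_digest(cmd) == golden_digest
```

Record the golden digest once against the baseline champion, then re-check it after every challenger run.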
+
+## Embedded Forms
+
+### Worklog Skeleton
+
+```text
+worklog_path:
+objective:
+scope_root:
+mode: optimize
+degraded_mode_reason:
+
+measurement_harness:
+basket_spec:
+semantic_guard:
+baseline_champion:
+counter_status:
+profile_status:
+candidate_queue:
+experiment_log:
+residual_risks:
+```
+
+Rules:
+
+- create this file before the first serious measurement
+- update it after every real experiment
+- treat it as the durable source of truth for the run
+- report the final worklog path in the user-facing response
+
+### Basket Spec
+
+```text
+| item_id | command | cwd | env_summary | input_or_scenario | build_profile | weight | semantic_guard |
+|---------|---------|-----|-------------|-------------------|---------------|--------|----------------|
+| B01 | cargo run --release -- solve case1.lp | . | RAYON_NUM_THREADS=1 | case1.lp | release | 1 | compare objective and solution digest |
+| B02 | cargo run --release -- solve case2.lp | . | RAYON_NUM_THREADS=1 | case2.lp | release | 2 | compare objective and solution digest |
+```
+
+Rules:
+
+- every serious measurement must use this locked basket
+- weights default to `1` if unspecified
+- if the basket changes, record it explicitly as a new objective surface
+
+### Champion Snapshot
+
+```text
+champion_id:
+commit_or_worktree_state:
+decision_metric: weighted_total_median_ms
+
+per_item:
+- item_id: B01
+  median_ms:
+  spread_pct:
+  counters:
+    cycles:
+    instructions:
+    branches:
+    branch_misses:
+    cache_misses:
+  guard: pass
+- item_id: B02
+  median_ms:
+  spread_pct:
+  counters:
+    cycles:
+    instructions:
+    branches:
+    branch_misses:
+    cache_misses:
+  guard: pass
+
+weighted_total_median_ms:
+notes:
+```
+
+Rules:
+
+- keep exactly one current champion
+- update this after every promoted line
+- if counters are unavailable, say so explicitly instead of leaving them blank without explanation
+
+### Bottleneck Thesis
+
+```text
+thesis_id:
+hotspot:
+affected_basket_items:
+timing_evidence:
+profile_or_counter_evidence:
+expected_change:
+planned_move:
+risk_to_semantics:
+```
+
+Rules:
+
+- do not start a serious optimization line without this
+- the evidence must be measured, not aesthetic
+- expected change should name what should get better in timing or counters
+
+### Experiment Record
+
+```text
+experiment_id:
+thesis_id:
+change_summary:
+semantic_guard_result:
+challenger_weighted_total_median_ms:
+per_item_deltas:
+counter_deltas:
+decision: promote | enabler | reject | rework | blocked
+decision_reason:
+follow_on:
+```
+
+Rules:
+
+- one serious optimization line per record
+- rejected lines must still be recorded
+- if the result is noisy or surprising, say so plainly
+- do not mark a change as `enabler` without naming the specific follow-on optimization it unlocks
+- do not accumulate long chains of `enabler` changes without cashing them out into measured wins
+
+## Final Response
+
+Always include:
+
+- the `/tmp` worklog path
+- locked basket summary
+- semantic-guard summary
+- champion metric summary
+- hardware-counter availability summary
+- strongest accepted lines
+- strongest rejected or blocked lines
+- residual next moves
+
+If you edited code, also include:
+
+- the promoted optimization lines
+- verification summary
+- any tradeoffs accepted inside the basket
+
+## Hard Failure Modes
+
+- do not optimize before locking the basket
+- do not optimize before locking the semantic guard
+- do not turn the run into speculative cleanup
+- do not trust one-shot timings
+- do not compare against a drifting baseline
+- do not mix build profiles between champion and challenger
+- do not accept "probably faster"
+- do not keep a change that fails the semantic guard
+- do not make a bottleneck thesis without hard numbers
+- do not cite profiler screenshots or flamegraphs as if they were the win itself
+- do not run a long campaign without attempting hardware counters
+- do not keep multiple half-measured changes live at once
+- do not mark a change as `enabler` without a concrete unlocked next move
+- do not stop after one plausible win if credible next theses remain
diff --git a/redline/agents/openai.yaml b/redline/agents/openai.yaml
new file mode 100644
index 0000000..76907d7
--- /dev/null
+++ b/redline/agents/openai.yaml
@@ -0,0 +1,4 @@
+interface:
+  display_name: "Redline"
+  short_description: "Ruthless wallclock optimization"
+  default_prompt: "Use $redline to run a logic-preserving wallclock optimization loop for a named basket of invocations."