---
name: exterminate-slop
description: Perform an exhaustive Rust-only subtree audit and refactor pass for primitive obsession, under-abstraction, and DRY violations. Use when Codex needs to inspect every Rust file in a requested code subtree for strings, bools, tuples, parameter clumps, duplicated logic, repeated field bundles, or weak abstractions, and must follow a rigid ledgerized process instead of a loose best-effort review.
---

# Exterminate Slop

Read the repo's [AGENTS.md](../../AGENTS.md) first.

Rust only. Do not spend context on language-agnostic framing.

## Contract

- Require a concrete Rust subtree. If the user names only a repo, choose the smallest plausible subtree and state the assumption.
- Default to `audit_plus_refactor` unless the user clearly wants findings only.
- Inspect every `*.rs` file in scope. No sampling and no “representative files.”
- Create one persistent `/tmp` worklog before the first code read. Chat is summary only; the worklog is the durable source of truth.
- Partition the manifest into intentional logical cliques before deep reading.
- Read at most 8 files per wave.
- Write a `/tmp` checkpoint after every wave before reading more files.
- Record raw suspects during coverage, then promote only threshold-clearing suspects into the formal ledger after full coverage.
- Force a decision on every promoted ledger row.
- Do not edit until the promoted ledger exists, unless the user explicitly asked for a narrow hotfix instead of a full extermination pass.

Files may be re-read. Re-reading is expected when a file belongs to multiple logical cliques or when a later duplication cluster changes the correct abstraction boundary.

## Flow

### 0. Create the worklog

Create a path shaped like:

```text
/tmp/exterminate-slop-<repo-or-dir>-<subtree-slug>.md
```

Seed it with the embedded worklog skeleton from `Embedded Forms`.

The worklog must hold:

- the manifest
- the clique plan
- every wave checkpoint
- the rejected clusters
- the promoted ledger
- the residual sweep

If interrupted or resumed, reopen the same worklog before continuing.

### 1. Lock scope and manifest

Enumerate the full Rust-file manifest, for example with:

```bash
rg --files <subtree> -g '*.rs'
```

Write the manifest in the embedded form from `Embedded Forms` and persist it into the `/tmp` worklog immediately.

### 2. Plan logical cliques

Before deep reading, group manifest files into small logical cliques using the embedded clique-wave form from `Embedded Forms`.

Good clique causes:

- shared domain types
- repeated parse / validate / normalize flows
- repeated rendering / formatting flows
- repeated control-flow skeletons
- repeated construction / patch / conversion logic
- module-boundary interactions
- suspected duplication clusters

Cliques may overlap. A file may appear in multiple cliques. Keep cliques to 2-8 files; split larger conceptual groups into multiple related waves.

### 3. Run bounded coverage waves

For each clique wave:

- read at most 8 files
- use `rust_analyzer` aggressively for definitions, references, rename feasibility, and boundary understanding
- prefer `rust_analyzer` from live use sites in the current clique, not blind declaration-first jumps
- update the manifest coverage
- write a `/tmp` checkpoint containing:
  - clique id and purpose
  - files inspected in the wave
  - raw suspects
  - duplication hypotheses strengthened or weakened
  - files worth re-reading in later cliques

Do not read any file from the next wave until that checkpoint exists; in particular, never read a 9th file without one. A file counts as covered once it has been read in at least one wave.

Concrete `rust_analyzer` pattern:

- start at a live use site: field access, constructor call, function call, match arm, or repeated conversion site
- use hover or definition there to anchor the actual symbol in play
- then expand with references once the anchor is correct
- use rename feasibility only after the symbol has been anchored from a concrete use site

If declaration-first references are noisy or incomplete, return to a concrete use site and trace outward again.

### 4. Run smell passes

Run distinct passes over the manifest and checkpointed suspects. If a pass needs more context, form another clique and respect the 8-file cap again.

#### String suspicion

Interrogate meaningful `String`, `&str`, string discriminators, and stringly keys.

Ask whether the value is really an id, slug, path, URL, email, unit, currency, locale, timestamp, status, or other validated domain token; whether normalization or parsing repeats at call sites; whether the value set is finite; and whether multiple strings travel together as a latent record.

#### Bool suspicion

Interrogate meaningful `bool`.

Ask whether it is really a mode, strategy, phase, policy, capability, or tri-state; whether call sites become unreadable because the meaning is positional; and whether correlated bools are encoding illegal states that want an enum or policy type.

#### Tuple suspicion

Interrogate tuple types, tuple returns, and repeated destructuring.

Ask whether the order is accidental, whether the elements have stable names in human discussion, whether the tuple crosses module boundaries, and whether the same shape is repeatedly rebuilt or unpacked.

#### Parameter-clump and field-bundle suspicion

Interrogate functions, constructors, builders, and repeated local bundles.

Ask whether the same 2-5 values travel together repeatedly, whether repeated bundles imply invariants, and whether an argument list wants a domain struct or value object.

#### Duplication and under-abstraction

Search for repeated parse / validate / normalize pipelines, repeated format / render / serialize logic, repeated match skeletons, repeated field-by-field copying, repeated conversions, repeated error shaping, and repeated query predicates.

#### Primitive control encodings

Interrogate string discriminators, integer type codes, sentinel values, parallel collections implying a missing domain object, and repeated `Option` / `bool` combinations that really encode richer states.
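A minimal sketch of the `Option` / `bool` case. It uses a hypothetical job record (every name here is illustrative, not from any real codebase) whose `running`, `finished_at`, and `error` fields jointly encode one lifecycle. The enum makes the contradictory combinations unrepresentable and funnels interpretation through a single conversion.

```rust
/// Hypothetical "before" shape: three primitives jointly encode one state,
/// so contradictory combinations (running with an error) are representable.
pub struct JobRecord {
    pub running: bool,
    pub finished_at: Option<u64>,
    pub error: Option<String>,
}

/// Hypothetical "after" shape: each legal state is exactly one variant.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum JobState {
    Pending,
    Running,
    Succeeded { finished_at: u64 },
    Failed { finished_at: u64, error: String },
}

impl JobState {
    /// Converts the legacy combination once, rejecting combinations that never
    /// had a defined meaning instead of letting each call site guess.
    pub fn from_record(r: &JobRecord) -> Result<Self, String> {
        match (r.running, r.finished_at, r.error.as_deref()) {
            (false, None, None) => Ok(JobState::Pending),
            (true, None, None) => Ok(JobState::Running),
            (false, Some(t), None) => Ok(JobState::Succeeded { finished_at: t }),
            (false, Some(t), Some(e)) => Ok(JobState::Failed {
                finished_at: t,
                error: e.to_string(),
            }),
            _ => Err(format!(
                "illegal job record: running={}, finished_at={:?}, error={:?}",
                r.running, r.finished_at, r.error
            )),
        }
    }

    /// Questions that used to re-derive state from the primitives become one match.
    pub fn is_terminal(&self) -> bool {
        matches!(self, JobState::Succeeded { .. } | JobState::Failed { .. })
    }
}
```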

### 5. Promote suspects

During coverage, keep only raw suspects. After every manifest file has been inspected at least once, promote a suspect into the formal ledger only if at least one is true:

- it appears in more than one file or module
- it crosses a crate or module boundary
- it shows invariant pressure rather than mere cosmetic repetition
- the fix is locally actionable now
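The promotion test can be modeled as a pure function over a hypothetical in-memory suspect record. This is a sketch, not part of any tooling. It returns the criteria that hold rather than a bare `bool`, so the reason can be carried into `decision_reason`. Counting distinct files is only a file-level approximation of "more than one file or module".

```rust
use std::collections::BTreeSet;

/// Which promotion criterion a raw suspect satisfied.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PromotionReason {
    MultipleSites,
    CrossesBoundary,
    InvariantPressure,
    LocallyActionable,
}

/// Hypothetical model of a raw suspect copied out of a wave checkpoint.
/// The bools are independent observations, not a hidden mode.
pub struct RawSuspect {
    /// Files where the suspect was observed (file-level approximation).
    pub files: BTreeSet<String>,
    pub crosses_boundary: bool,
    pub invariant_pressure: bool,
    pub locally_actionable: bool,
}

/// Returns every criterion that holds. An empty result means the suspect
/// belongs in a rejected cluster instead of the promoted ledger.
pub fn promotion_reasons(s: &RawSuspect) -> Vec<PromotionReason> {
    let checks = [
        (s.files.len() > 1, PromotionReason::MultipleSites),
        (s.crosses_boundary, PromotionReason::CrossesBoundary),
        (s.invariant_pressure, PromotionReason::InvariantPressure),
        (s.locally_actionable, PromotionReason::LocallyActionable),
    ];
    checks
        .iter()
        .filter(|(holds, _)| *holds)
        .map(|(_, reason)| *reason)
        .collect()
}
```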

Anything not promoted must still be accounted for as a short rejected cluster using the embedded rejected-cluster form from `Embedded Forms`. Persist both promoted and rejected outcomes in the `/tmp` worklog.

### 6. Populate and adjudicate the promoted ledger

Populate the formal ledger with the embedded ledger form from `Embedded Forms`. Only promoted candidates belong there.

For every row, run the embedded adjudication checklist and choose exactly one:

- `refactor_now`
- `defer`
- `false_positive`
- `needs_broader_design`

Keep the authoritative ledger in the `/tmp` worklog even if chat shows only a condensed view.

### 7. Refactor in batches

Group work by abstraction move rather than by file.

Preferred moves:

- primitive id or validated token → newtype / value object
- string or bool discriminator → enum
- tuple or clump → named struct
- repeated validation or parsing → constructor or smart parser
- repeated conversion or patch logic → dedicated helper, trait, or module
- repeated branch skeleton → extracted abstraction

Apply the more specific move guidance in `Embedded Refactor Playbook`.

Preserve public API compatibility only if the user asked for it. Otherwise prefer the clean shape.

### 8. Verify and sweep

After edits:

- run relevant checks on the touched surface
- re-scan the same manifest
- update the worklog with eliminated rows, deferred rows, false positives, and residual hotspots

Prefer the narrowest checks that still validate the changed surface, then widen if needed.

## Embedded Forms

### Worklog Skeleton

Create one persistent worklog file per extermination run.

```text
worklog_path:
scope_root:
inspection_mode: audit_only | audit_plus_refactor
wave_size_limit: 8

manifest:

clique_plan:

wave_checkpoints:

rejected_clusters:

promoted_ledger:

residual_sweep:
```

Rules:

- create this file before the first code read
- update it after every wave (at most 8 files) before reading more files
- treat it as the durable source of truth for the run
- if chat summaries are shorter, the worklog still must remain complete
- report the final worklog path in the user-facing response

### File Manifest

Build this before analysis starts.

```text
scope_root:
inspection_mode: audit_only | audit_plus_refactor
wave_size_limit: 8

files:
- [ ] path/to/file_a.rs
- [ ] path/to/file_b.rs
- [ ] path/to/file_c.rs
```

Rules:

- include every `*.rs` file in the requested subtree
- mark files as inspected only after reading them
- allow files to appear in multiple clique waves
- do not delete rows from the manifest during refactoring
- use the same manifest again for the residual sweep

### Clique Wave Checkpoint

Plan and checkpoint every bounded read wave with this format.

```text
clique_id:
purpose:
why_these_files_belong_together:
wave_size:

files:
- path/to/file_a.rs
- path/to/file_b.rs

checkpoint:
- raw suspects:
- duplication hypotheses:
- likely rereads:
- abstraction pressure:
```

Rules:

- `wave_size` must never exceed `8`
- write a checkpoint into the `/tmp` worklog after every wave before reading more files
- keep the checkpoint terse and raw; do not inflate every suspect into a formal ledger row yet
- a file may appear in multiple cliques when duplication or boundary reasoning demands it
- split oversized conceptual groups into multiple related cliques rather than inflating one wave
- name the clique by the suspected relationship, not by arbitrary adjacency

### Rejected Cluster

Use this for suspect clusters that were noticed during coverage but did not earn formal ledger rows.

```text
cluster_id:
kind:
sites:
- path/to/file_a.rs :: short note
- path/to/file_b.rs :: short note

rejection_reason:
still_watch_for:
```

Good rejection reasons:

- boundary-shaped transport code
- serialization or protocol surface where primitive form is correct
- cosmetic repetition without invariant pressure
- local-only pattern with no cross-site duplication
- abstraction name still too foggy to justify extraction

Rules:

- keep this short
- account for every non-promoted suspect cluster somewhere
- do not use rejected clusters to hide real actionable refactors

### Promoted Ledger

Use one row per promoted candidate site.

```text
| row_id | path | symbol_or_site | kind | evidence | suspected_abstraction | duplication_cluster | confidence | decision | decision_reason |
|--------|------|----------------|------|----------|-----------------------|---------------------|------------|----------|-----------------|
| S001 | crates/foo/src/bar.rs | User.id: String | string | parsed, validated, and compared in 3 places | UserId newtype | id-handling-1 | high | refactor_now | domain identifier with repeated normalization |
| B001 | crates/foo/src/baz.rs | frobnicate(..., dry_run: bool) | bool | positional flag controls strategy | ExecutionMode enum | mode-flags-1 | high | refactor_now | bool hides a mode split |
| T001 | crates/foo/src/qux.rs | (start, end, step) | tuple | repeated destructuring across module boundary | RangeSpec struct | tuple-shapes-2 | medium | defer | local only today, but cluster suggests future extraction |
```

Allowed `kind` values:

- `string`
- `bool`
- `tuple`
- `parameter_clump`
- `field_bundle`
- `duplicate_logic`
- `primitive_control_encoding`

Allowed `confidence` values:

- `high`
- `medium`
- `low`

Allowed `decision` values:

- `refactor_now`
- `defer`
- `false_positive`
- `needs_broader_design`

Requirements:

- `evidence` must name the concrete smell, not a vague feeling
- `suspected_abstraction` must be specific
- `duplication_cluster` must group related rows when applicable
- `decision_reason` must explain why the row did or did not graduate into an edit
- do not create rows for suspects that failed promotion; summarize those as rejected clusters instead

## Embedded Adjudication Checklist

Run this checklist for every ledger row.

### Semantic Pressure

- Does this primitive carry domain semantics rather than mere transport?
- Does it have validation, normalization, parsing, or formatting rules?
- Does the code rely on an implied finite value set?
- Would naming the concept clarify the surrounding code immediately?

### Invariant Pressure

- Are some values invalid but still representable today?
- Are correlated values allowed to drift apart?
- Does ordering matter only because a tuple hid the field names?
- Are impossible states currently encoded with primitive combinations?

### Boundary Pressure

- Does the primitive cross module or crate boundaries?
- Does it appear in public APIs, trait methods, or repeated call chains?
- Is the same primitive interpretation duplicated at multiple boundaries?

### Duplication Pressure

- Is the same parse, validate, normalize, compare, or convert logic repeated?
- Does the same field bundle move together repeatedly?
- Is there a repeated branch skeleton with only small local differences?

### Refactor Readiness

- Can a local type or helper eliminate repetition immediately?
- Will the fix cascade cleanly through references if renamed mechanically?
- Is there a natural owner module for the new abstraction?
- Does the change need a broader domain redesign first?

### Decision Mapping

Choose exactly one:

- `refactor_now`: the abstraction is clear and locally actionable now
- `defer`: the smell is real but the present diff would be premature
- `false_positive`: the primitive is genuinely incidental or boundary-shaped
- `needs_broader_design`: the smell is real but the right abstraction spans a larger domain cut

## Embedded Refactor Playbook

Prefer these moves when the ledger supports them.

### String

- identifiers → newtypes
- finite tags / statuses / modes → enums
- validated textual concepts → smart constructors plus opaque wrappers
- repeated path / URL / email / locale handling → dedicated domain type or parser boundary
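Ledger row `S001` proposes a `UserId` newtype. Here is a minimal sketch of that move. The normalization and validation rules (trim, ASCII-lowercase, non-empty, ASCII alphanumerics and `-` only) are assumptions chosen for illustration, not rules from any real codebase.

```rust
use std::fmt;
use std::str::FromStr;

/// Opaque identifier: the inner `String` is private, so every `UserId`
/// in the program has passed through `parse`.
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct UserId(String);

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum UserIdError {
    Empty,
    InvalidChar(char),
}

impl UserId {
    /// Smart constructor: normalization and validation (assumed rules)
    /// happen here once instead of at every call site.
    pub fn parse(raw: &str) -> Result<Self, UserIdError> {
        let normalized = raw.trim().to_ascii_lowercase();
        if normalized.is_empty() {
            return Err(UserIdError::Empty);
        }
        if let Some(bad) = normalized
            .chars()
            .find(|c| !(c.is_ascii_alphanumeric() || *c == '-'))
        {
            return Err(UserIdError::InvalidChar(bad));
        }
        Ok(UserId(normalized))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

impl fmt::Display for UserId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Lets boundary code write `"...".parse::<UserId>()`.
impl FromStr for UserId {
    type Err = UserIdError;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        UserId::parse(s)
    }
}
```

Because the only way to obtain a `UserId` is through `parse`, downstream code can drop its repeated trimming and comparison fixes.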

### Bool

- mode switches → enums
- policy flags → policy types
- correlated bool sets → state enum or config struct with named fields
- public boolean parameters → named option type unless the meaning is truly trivial
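Ledger row `B001` flags `frobnicate(..., dry_run: bool)`. Here is a sketch of the mode-switch move, assuming a hypothetical body that only reports what it did. The enum replaces an unreadable positional `true` with a named variant at every call site.

```rust
/// Replaces a positional `dry_run: bool`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecutionMode {
    /// Report what would change without changing anything.
    DryRun,
    /// Perform the changes.
    Apply,
}

/// Hypothetical body: call sites now read `frobnicate(&targets, ExecutionMode::DryRun)`
/// instead of `frobnicate(&targets, true)`.
pub fn frobnicate(targets: &[&str], mode: ExecutionMode) -> Vec<String> {
    targets
        .iter()
        .map(|t| match mode {
            ExecutionMode::DryRun => format!("would frobnicate {}", t),
            ExecutionMode::Apply => format!("frobnicated {}", t),
        })
        .collect()
}
```

Adding a third mode later becomes a compile-checked change, since every exhaustive `match` on `ExecutionMode` must handle the new variant.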

### Tuple

- cross-module tuples → named structs
- repeated destructuring → named fields
- semantically rich returns → domain object instead of positional packs
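Ledger row `T001` suggests replacing a `(start, end, step)` tuple with `RangeSpec`. A minimal sketch follows. The invariants (non-zero step, step pointing toward `end`, `end` exclusive) are assumptions for illustration. Named fields stop callers from swapping positions, and the constructor states the invariant once.

```rust
/// Named replacement for a positional `(start, end, step)` tuple.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RangeSpec {
    start: i64,
    end: i64,
    step: i64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RangeSpecError {
    ZeroStep,
    StepAgainstDirection,
}

impl RangeSpec {
    /// Rejects shapes that the bare tuple allowed silently (assumed rules).
    pub fn new(start: i64, end: i64, step: i64) -> Result<Self, RangeSpecError> {
        if step == 0 {
            return Err(RangeSpecError::ZeroStep);
        }
        if (end > start && step < 0) || (end < start && step > 0) {
            return Err(RangeSpecError::StepAgainstDirection);
        }
        Ok(RangeSpec { start, end, step })
    }

    /// Yields values from `start` toward `end`, excluding `end`.
    pub fn values(&self) -> impl Iterator<Item = i64> {
        let (start, end, step) = (self.start, self.end, self.step);
        std::iter::successors(Some(start), move |&v| v.checked_add(step))
            .take_while(move |&v| if step > 0 { v < end } else { v > end })
    }
}
```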

### Parameter Clumps And Bundles

- repeated argument groups → parameter object
- repeated local field packs → extracted struct
- repeated construction/update logic → dedicated constructor or helper module
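A sketch of the parameter-object move, assuming a hypothetical `offset, limit` pair that travels together through several `fetch_*` functions. `Page`, `MAX_LIMIT`, and `fetch_users` are illustrative names. The limit check and the slicing arithmetic move into one owner instead of being repeated at each call site.

```rust
/// Parameter object for an `offset, limit` clump.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Page {
    offset: usize,
    limit: usize,
}

impl Page {
    /// Assumed upper bound, chosen for illustration only.
    pub const MAX_LIMIT: usize = 100;

    /// The limit invariant lives here once instead of at every call site.
    pub fn new(offset: usize, limit: usize) -> Option<Self> {
        (1..=Self::MAX_LIMIT)
            .contains(&limit)
            .then(|| Page { offset, limit })
    }

    /// The page immediately after this one.
    pub fn next_page(self) -> Self {
        Page {
            offset: self.offset.saturating_add(self.limit),
            limit: self.limit,
        }
    }

    /// Replaces repeated hand-written `start..end` arithmetic; out-of-range
    /// offsets clamp to an empty slice.
    pub fn slice<'a, T>(&self, items: &'a [T]) -> &'a [T] {
        let start = self.offset.min(items.len());
        let end = start.saturating_add(self.limit).min(items.len());
        &items[start..end]
    }
}

/// Hypothetical call site: one named value instead of two positional `usize`s.
pub fn fetch_users(all: &[String], page: Page) -> &[String] {
    page.slice(all)
}
```

Clamping in `slice` is a design choice for the sketch; a real owner module might prefer returning an error for an out-of-range offset.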

### Duplicate Logic

- repeated validation → single constructor / validator
- repeated conversions → `From` / `TryFrom` / dedicated conversion function
- repeated branch skeletons → extracted helper, trait, or dispatch enum
- repeated formatting → single renderer / formatter
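A sketch of consolidating repeated conversions behind `TryFrom`. It assumes a hypothetical wire row that carries a raw `u8` kind code, where several call sites used to decode that code and trim the name by hand. The row keeps its primitive shape because it is the external format; everything inside the crate receives the domain type.

```rust
use std::convert::TryFrom;

/// Domain enum replacing a raw integer type code (assumed codes: 1, 2).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AccountKind {
    Personal,
    Business,
}

impl TryFrom<u8> for AccountKind {
    type Error = String;
    fn try_from(code: u8) -> Result<Self, Self::Error> {
        match code {
            1 => Ok(AccountKind::Personal),
            2 => Ok(AccountKind::Business),
            other => Err(format!("unknown account kind code {}", other)),
        }
    }
}

/// Boundary shape: primitives are correct here because this is the wire format.
pub struct AccountRow {
    pub name: String,
    pub kind: u8,
}

/// Domain shape used everywhere inside the crate.
#[derive(Debug, PartialEq, Eq)]
pub struct Account {
    pub name: String,
    pub kind: AccountKind,
}

/// The single conversion that replaces per-call-site decoding and trimming.
impl TryFrom<AccountRow> for Account {
    type Error = String;
    fn try_from(row: AccountRow) -> Result<Self, Self::Error> {
        let name = row.name.trim().to_string();
        if name.is_empty() {
            return Err("account name is empty".to_string());
        }
        Ok(Account {
            name,
            kind: AccountKind::try_from(row.kind)?,
        })
    }
}
```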

### Rejection Criteria

Do not introduce an abstraction merely because a primitive exists.

Reject or defer when:

- the primitive is truly incidental and carries no stable semantics
- the abstraction name is still foggy
- the same concept appears only once and shows no invariant pressure
- the code is at a serialization boundary where the primitive form is the correct external shape

## Final Response

Always include:

- the `/tmp` worklog path
- manifest coverage summary
- clique-wave summary
- rejected-cluster summary
- promoted-ledger summary

If you edited code, also include:

- abstraction batches executed
- verification summary
- residual sweep summary

If you did not edit code, include:

- highest-value next moves

## Hard Failure Modes

- do not inspect only “important” files
- do not skip the `/tmp` worklog or keep checkpoint state only in chat
- do not read more than 8 files without a persisted checkpoint
- do not silently compact a wave because you feel you “basically got it”
- do not treat one clique as if it exhausts all duplication relationships
- do not promote everything into the formal ledger just to feel exhaustive
- do not drop non-promoted suspects on the floor; summarize them as rejected clusters
- do not jump into edits before the promoted ledger exists
- do not propose generic cleanup without row-level evidence
- do not treat Clippy output as a substitute for this pass