swarm repositories / source
path: root/docs/libgrid-dogfood.md
author     main <main@swarm.moe>  2026-03-20 16:00:30 -0400
committer  main <main@swarm.moe>  2026-03-20 16:00:30 -0400
commit     9d63844f3a28fde70b19500422f17379e99e588a (patch)
tree       163cfbd65a8d3528346561410ef39eb1183a16f2 /docs/libgrid-dogfood.md
parent     22fe3d2ce7478450a1d7443c4ecbd85fd4c46716 (diff)
download   fidget_spinner-9d63844f3a28fde70b19500422f17379e99e588a.zip
Refound Spinner as an austere frontier ledger
Diffstat (limited to 'docs/libgrid-dogfood.md')
-rw-r--r--  docs/libgrid-dogfood.md  |  199
1 file changed, 65 insertions(+), 134 deletions(-)
diff --git a/docs/libgrid-dogfood.md b/docs/libgrid-dogfood.md
index 206c4d7..9d81993 100644
--- a/docs/libgrid-dogfood.md
+++ b/docs/libgrid-dogfood.md
@@ -6,26 +6,19 @@
failure mode Fidget Spinner is designed to kill:
- long autonomous optimization loops
-- heavy worktree usage
-- benchmark-driven decisions
-- huge markdown logs that blur evidence, narrative, and verdicts
+- heavy benchmark slicing
+- worktree churn
+- huge markdown logs that blur intervention, result, and verdict
That is the proving ground.
-## Immediate MVP Goal
+## Immediate Goal
-The MVP does not need to solve all of `libgrid`.
+The goal is not “ingest every scrap of prose.”
-It needs to solve this specific problem:
-
-replace the giant freeform experiment log with a machine in which the active
-frontier, the accepted lines, the live evidence, and the dead ends are all
-explicit and queryable.
-
-When using a global unbound MCP session from a `libgrid` worktree, the first
-project-local action should be `project.bind` against the `libgrid` worktree
-root or any nested path inside it. The session should not assume the MCP host's
-own repo.
+The goal is to replace the giant freeform experiment log with a machine in
+which the active frontier, live hypotheses, current experiments, verdicts, and
+best benchmark lines are explicit and queryable.
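The shape of that machine can be sketched as plain queryable records. This is a hypothetical sketch, not the shipped schema — the names `Frontier`, `Hypothesis`, `Experiment`, and `open_lines` are illustrative assumptions:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical record shapes -- the real Fidget Spinner store defines its own.
@dataclass
class Experiment:
    slice_name: str
    metrics: dict
    verdict: Optional[str] = None  # accepted | kept | parked | rejected

@dataclass
class Hypothesis:
    title: str
    experiments: list = field(default_factory=list)

@dataclass
class Frontier:
    objective: str
    hypotheses: list = field(default_factory=list)

    def open_lines(self):
        # "Queryable" means this question needs no prose archaeology:
        # which lines still have an experiment awaiting a verdict?
        return [h.title for h in self.hypotheses
                if any(e.verdict is None for e in h.experiments)]

frontier = Frontier("reduce wall-clock time")
line = Hypothesis("warm-start LP basis")
line.experiments.append(Experiment("instance=a17", {"wall_clock_s": 41.2}))
frontier.hypotheses.append(line)
```

A freeform log can answer the same question only by rereading prose; records answer it mechanically.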
## Mapping Libgrid Work Into The Model
@@ -33,163 +26,101 @@ own repo.
One optimization objective becomes one frontier:
-- improve MILP solve quality
-- reduce wall-clock time
-- reduce LP pressure
-- improve node throughput
-- improve best-bound quality
-
-### Contract node
-
-The root contract should state:
-
-- objective in plain language
-- benchmark suite set
-- primary metric
-- supporting metrics
-- promotion criteria
-
-### Change node
-
-Use `hypothesis.record` to capture:
-
-- what hypothesis is being tested
-- what benchmark suite matters
-- any terse sketch of the intended delta
-
-### Run node
-
-The run node should capture:
-
-- exact command
-- cwd
-- backend kind
-- run dimensions
-- resulting metrics
+- root cash-out
+- LP spend reduction
+- primal improvement
+- search throughput
+- cut pipeline quality
-### Decision node
+The frontier brief should answer where the campaign stands right now, not dump
+historical narrative.
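A brief with that contract might be assembled like this — the tuple shape and field names are assumptions for illustration, not a fixed format:

```python
def frontier_brief(objective, lines):
    """Answer 'where does the campaign stand', not 'what happened'.

    `lines` is a hypothetical list of (title, verdict, primary_metric)
    tuples; lower primary_metric is assumed better here.
    """
    live = [l for l in lines if l[1] in (None, "accepted", "kept")]
    best = min((l for l in live if l[2] is not None),
               key=lambda l: l[2], default=None)
    return {
        "objective": objective,
        "live_lines": [l[0] for l in live],
        "best": best[0] if best else None,
        # Deliberately no historical-narrative field.
    }

brief = frontier_brief("reduce wall-clock time", [
    ("warm-start LP basis", "accepted", 41.2),
    ("aggressive cut purge", "rejected", 58.0),
    ("node batching", None, None),
])
```

Rejected lines drop out of the brief entirely; they stay in the ledger, not in the orientation surface.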
-The decision should make the verdict explicit:
+### Hypothesis
-- accepted
-- kept
-- parked
-- rejected
+A hypothesis should capture one concrete intervention claim:
-### Off-path nodes
+- terse title
+- one-line summary
+- one-paragraph body
-Use these freely:
+If the body wants to become a design memo, it is too large.
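One way to keep that constraint honest is a soft size guard at record time. This is a sketch under assumptions — the function name, threshold, and warning convention are all hypothetical, not the real `hypothesis.record` behavior:

```python
def record_hypothesis(title, summary, body, max_body_chars=800):
    """Hypothetical shape for a hypothesis record with a soft size guard.

    The one-paragraph rule is enforced as a warning, not a rejection
    (assumed threshold: no blank-line paragraph breaks, under 800 chars).
    """
    warnings = []
    if "\n\n" in body.strip() or len(body) > max_body_chars:
        warnings.append("body reads like a design memo; split or trim it")
    return {"title": title, "summary": summary, "body": body,
            "warnings": warnings}

rec = record_hypothesis(
    "warm-start LP basis",
    "Reusing the parent basis should cut LP spend at deep nodes.",
    "Deep nodes re-solve near-identical LPs; warm starts should reduce "
    "iterations without changing the search path.",
)
```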
-- `source` for ideas, external references, algorithm sketches
-- `source` for scaffolding that is not yet a benchmarked experiment
-- `note` for quick observations
+### Experiment
-This is how the system avoids forcing every useful thought into experiment
-closure.
+Each measured slice becomes one experiment under exactly one hypothesis.
-## Suggested Libgrid Project Schema
+The experiment closes with:
-The `libgrid` project should eventually define richer payload conventions in
-`.fidget_spinner/schema.json`.
-
-The MVP does not need hard rejection. It does need meaningful warnings.
-
-Good first project fields:
+- dimensions such as `instance`, `profile`, `duration_s`
+- primary metric
+- supporting metrics
+- verdict: `accepted | kept | parked | rejected`
+- rationale
+- optional analysis
-- `hypothesis` on `hypothesis`
-- `benchmark_suite` on `hypothesis` and `run`
-- `body` on `hypothesis`, `source`, and `note`
-- `comparison_claim` on `analysis`
-- `rationale` on `decision`
+If a tranche doc reports multiple benchmark slices, it should become multiple
+experiments, not one prose blob.
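A close for one slice might look like the following — field names mirror the list above, but the concrete payload shape is an assumption, not the shipped schema:

```python
# Hypothetical close payload for one measured slice.
close = {
    "dimensions": {"instance": "a17", "profile": "default",
                   "duration_s": 600},
    "primary_metric": {"wall_clock_s": 41.2},
    "supporting_metrics": {"lp_calls": 12840, "nodes_expanded": 5120},
    "verdict": "kept",  # accepted | kept | parked | rejected
    "rationale": "Faster on a17 but not yet replicated on the full suite.",
    "analysis": None,  # optional
}
```

A tranche doc with three benchmark slices becomes three such payloads under the same hypothesis, each with its own verdict.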
-Good first metric vocabulary:
+### Artifact
-- `wall_clock_s`
-- `solved_instance_count`
-- `nodes_expanded`
-- `best_bound_delta`
-- `lp_calls`
-- `memory_bytes`
+Historical markdown, logs, tables, and other large dumps should be attached as
+artifacts by reference when they matter. They should not live in the ledger as
+default-enumerated prose.
-## Libgrid MVP Workflow
+## Libgrid Workflow
-### 1. Seed the frontier
+### 1. Ground
-1. Initialize the project store.
-2. Create a frontier contract.
+1. Bind the MCP session to the `libgrid` worktree.
+2. Read `frontier.open`.
+3. Decide whether the next move is a new hypothesis, a new experiment on an
+ existing hypothesis, or a frontier brief update.
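The triage in step 3 can be sketched as a small decision function. The rules below are assumptions about a sensible priority order, not shipped policy:

```python
def next_move(state):
    """Pick the next grounding move (hypothetical triage order):
    finish open work first, then refresh orientation, then expand."""
    if any(e["verdict"] is None
           for h in state["hypotheses"] for e in h["experiments"]):
        return "finish the open experiment"
    if state.get("brief_stale"):
        return "update the frontier brief"
    return "record a new hypothesis"

state = {"hypotheses": [{"experiments": [{"verdict": "kept"}]}],
         "brief_stale": True}
```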
### 2. Start a line of attack
-1. Read the current frontier and the recent DAG tail.
-2. Record a `hypothesis`.
-3. If needed, attach off-path `source` or `note` nodes first.
+1. Record a hypothesis.
+2. Attach any necessary artifacts by reference.
+3. Open one experiment for the concrete slice being tested.
-### 3. Execute one experiment
+### 3. Execute
1. Modify the worktree.
2. Run the benchmark protocol.
-3. Close the experiment atomically.
+3. Close the experiment atomically with parsed metrics and an explicit verdict.
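"Parsed metrics" means the close step extracts numbers from benchmark output rather than pasting the transcript. A minimal sketch, assuming a `key=value` log format from the benchmark harness (the format itself is an assumption):

```python
import re

def parse_metrics(stdout):
    """Turn raw harness output into structured metrics before closing.

    Assumes the harness emits `name=number` pairs; anything else in the
    transcript is ignored rather than stored.
    """
    return {key: float(val)
            for key, val in re.findall(r"(\w+)=([0-9.]+)", stdout)}

metrics = parse_metrics("done: wall_clock_s=41.2 lp_calls=12840")
```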
### 4. Judge and continue
-1. Mark the line accepted, kept, parked, or rejected.
-2. Archive dead ends instead of leaving them noisy and active.
-3. Repeat.
+1. Use `accepted`, `kept`, `parked`, and `rejected` honestly.
+2. Let the frontier brief summarize the current strategic state.
+3. Let historical tranche markdown live as artifacts when preservation matters.
## Benchmark Discipline
-For `libgrid`, the benchmark evidence needs to be structurally trustworthy.
-
-The MVP should always preserve at least:
+For `libgrid`, the minimum trustworthy record is:
- run dimensions
- primary metric
-- supporting metrics
-- command envelope
-
-This is the minimum needed to prevent "I think this was faster" folklore.
-
-## What The MVP Can Defer
-
-These are useful but not required for the first real dogfood loop:
-
-- strong markdown migration
-- multi-agent coordination
-- rich artifact bundling
-- pruning or vacuum passes beyond archive
-- UI-heavy analysis
-
-The right sequence is:
-
-1. start a clean front
-2. run new work through Fidget Spinner
-3. backfill old markdown only when it is worth the effort
-
-## Repo-Local Dogfood Before Libgrid
+- supporting metrics that materially explain the verdict
+- rationale
-This repository itself is a valid off-path dogfood target even though it is not
-a benchmark-heavy repo.
+This is the minimum needed to prevent “I think this was faster” folklore.
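That floor can be checked mechanically. A sketch, with assumed field names matching the list above (whether supporting metrics "materially explain" the verdict stays a judgment call, so only the hard requirements are checked):

```python
def check_minimum_record(exp):
    """Return the required fields missing from an experiment record.

    Hypothetical check: dimensions, primary metric, and rationale are
    hard requirements; supporting metrics are judged, not counted.
    """
    required = ("dimensions", "primary_metric", "rationale")
    return [k for k in required if not exp.get(k)]

folklore = {"rationale": "felt faster"}
solid = {"dimensions": {"instance": "a17"},
         "primary_metric": {"wall_clock_s": 41.2},
         "rationale": "7% faster at equal node counts"}
```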
-That means we can already use it to test:
+## Active Metric Discipline
-- project initialization
-- schema visibility
-- frontier creation and status projection
-- off-path source recording
-- hidden annotations
-- MCP read and write flows
+`libgrid` will accumulate many niche metrics.
-What it cannot honestly test is heavy benchmark ingestion and the retrieval
-pressure that comes with it. That still belongs in a real optimization corpus
-such as the `libgrid` worktree.
+The hot path should care about live metrics only: the metrics touched by the
+active experimental frontier and its immediate comparison set. Old, situational
+metrics may remain in the registry without dominating `frontier.open`.
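That split is a simple filter over the registry. A sketch — the registry shape and the idea of a named live set are assumptions about how `frontier.open` might stay bounded:

```python
def live_metrics(registry, frontier_metric_names):
    """Surface only metrics touched by the active frontier and its
    comparison set; everything else stays registered but quiet."""
    return {name: meta for name, meta in registry.items()
            if name in frontier_metric_names}

registry = {"wall_clock_s": "seconds",
            "lp_calls": "count",
            "cut_purge_ratio": "percent"}  # old, situational
hot = live_metrics(registry, {"wall_clock_s", "lp_calls"})
```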
-## Acceptance Bar For Libgrid
+## Acceptance Bar
Fidget Spinner is ready for serious `libgrid` use when:
-- an agent can run for hours without generating a giant markdown graveyard
-- the operator can identify accepted, kept, parked, and rejected lines mechanically
-- each completed experiment has result, note, and verdict
-- off-path side investigations stay preserved but do not pollute the core path
+- an agent can run for hours without generating a markdown graveyard
+- `frontier.open` gives a truthful, bounded orientation surface
+- active hypotheses and open experiments are obvious
+- closed experiments carry parsed metrics rather than prose-only results
+- artifacts preserve source texture without flooding the hot path
- the system feels like a machine for evidence rather than a diary with better
typography