From 9d63844f3a28fde70b19500422f17379e99e588a Mon Sep 17 00:00:00 2001
From: main <main@swarm.moe>
Date: Fri, 20 Mar 2026 16:00:30 -0400
Subject: Refound Spinner as an austere frontier ledger

---
 docs/libgrid-dogfood.md | 199 ++++++++++++++++--------------------------------
 1 file changed, 65 insertions(+), 134 deletions(-)

diff --git a/docs/libgrid-dogfood.md b/docs/libgrid-dogfood.md
index 206c4d7..9d81993 100644
--- a/docs/libgrid-dogfood.md
+++ b/docs/libgrid-dogfood.md
@@ -6,26 +6,19 @@
 failure mode Fidget Spinner is designed to kill:
 
 - long autonomous optimization loops
-- heavy worktree usage
-- benchmark-driven decisions
-- huge markdown logs that blur evidence, narrative, and verdicts
+- heavy benchmark slicing
+- worktree churn
+- huge markdown logs that blur intervention, result, and verdict
 
 That is the proving ground.
 
-## Immediate MVP Goal
+## Immediate Goal
 
-The MVP does not need to solve all of `libgrid`.
+The goal is not “ingest every scrap of prose.”
 
-It needs to solve this specific problem:
-
-replace the giant freeform experiment log with a machine in which the active
-frontier, the accepted lines, the live evidence, and the dead ends are all
-explicit and queryable.
-
-When using a global unbound MCP session from a `libgrid` worktree, the first
-project-local action should be `project.bind` against the `libgrid` worktree
-root or any nested path inside it. The session should not assume the MCP host's
-own repo.
+The goal is to replace the giant freeform experiment log with a machine in
+which the active frontier, live hypotheses, current experiments, verdicts, and
+best benchmark lines are explicit and queryable.
 
 ## Mapping Libgrid Work Into The Model
 
@@ -33,163 +26,101 @@ own repo.
 
 One optimization objective becomes one frontier:
 
-- improve MILP solve quality
-- reduce wall-clock time
-- reduce LP pressure
-- improve node throughput
-- improve best-bound quality
-
-### Contract node
-
-The root contract should state:
-
-- objective in plain language
-- benchmark suite set
-- primary metric
-- supporting metrics
-- promotion criteria
-
-### Change node
-
-Use `hypothesis.record` to capture:
-
-- what hypothesis is being tested
-- what benchmark suite matters
-- any terse sketch of the intended delta
-
-### Run node
-
-The run node should capture:
-
-- exact command
-- cwd
-- backend kind
-- run dimensions
-- resulting metrics
+- root cash-out
+- LP spend reduction
+- primal improvement
+- search throughput
+- cut pipeline quality
 
-### Decision node
+The frontier brief should answer where the campaign stands right now, not dump
+historical narrative.
 
-The decision should make the verdict explicit:
+### Hypothesis
 
-- accepted
-- kept
-- parked
-- rejected
+A hypothesis should capture one concrete intervention claim:
 
-### Off-path nodes
+- terse title
+- one-line summary
+- one-paragraph body
 
-Use these freely:
+If the body wants to become a design memo, it is too large.
 
-- `source` for ideas, external references, algorithm sketches
-- `source` for scaffolding that is not yet a benchmarked experiment
-- `note` for quick observations
+### Experiment
 
-This is how the system avoids forcing every useful thought into experiment
-closure.
+Each measured slice becomes one experiment under exactly one hypothesis.
 
-## Suggested Libgrid Project Schema
+The experiment closes with:
 
-The `libgrid` project should eventually define richer payload conventions in
-`.fidget_spinner/schema.json`.
-
-The MVP does not need hard rejection. It does need meaningful warnings.
-
-Good first project fields:
+- dimensions such as `instance`, `profile`, `duration_s`
+- primary metric
+- supporting metrics
+- verdict: `accepted | kept | parked | rejected`
+- rationale
+- optional analysis
 
-- `hypothesis` on `hypothesis`
-- `benchmark_suite` on `hypothesis` and `run`
-- `body` on `hypothesis`, `source`, and `note`
-- `comparison_claim` on `analysis`
-- `rationale` on `decision`
+If a tranche doc reports multiple benchmark slices, it should become multiple
+experiments, not one prose blob.
 
-Good first metric vocabulary:
+### Artifact
 
-- `wall_clock_s`
-- `solved_instance_count`
-- `nodes_expanded`
-- `best_bound_delta`
-- `lp_calls`
-- `memory_bytes`
+Historical markdown, logs, tables, and other large dumps should be attached as
+artifacts by reference when they matter. They should not live in the ledger as
+default-enumerated prose.
 
-## Libgrid MVP Workflow
+## Libgrid Workflow
 
-### 1. Seed the frontier
+### 1. Ground
 
-1. Initialize the project store.
-2. Create a frontier contract.
+1. Bind the MCP to the libgrid worktree.
+2. Read `frontier.open`.
+3. Decide whether the next move is a new hypothesis, a new experiment on an
+   existing hypothesis, or a frontier brief update.
 
 ### 2. Start a line of attack
 
-1. Read the current frontier and the recent DAG tail.
-2. Record a `hypothesis`.
-3. If needed, attach off-path `source` or `note` nodes first.
+1. Record a hypothesis.
+2. Attach any necessary artifacts by reference.
+3. Open one experiment for the concrete slice being tested.
 
-### 3. Execute one experiment
+### 3. Execute
 
 1. Modify the worktree.
 2. Run the benchmark protocol.
-3. Close the experiment atomically.
+3. Close the experiment atomically with parsed metrics and an explicit verdict.
 
 ### 4. Judge and continue
 
-1. Mark the line accepted, kept, parked, or rejected.
-2. Archive dead ends instead of leaving them noisy and active.
-3. Repeat.
+1. Use `accepted`, `kept`, `parked`, and `rejected` honestly.
+2. Let the frontier brief summarize the current strategic state.
+3. Let historical tranche markdown live as artifacts when preservation matters.
 
 ## Benchmark Discipline
 
-For `libgrid`, the benchmark evidence needs to be structurally trustworthy.
-
-The MVP should always preserve at least:
+For `libgrid`, the minimum trustworthy record is:
 
 - run dimensions
 - primary metric
-- supporting metrics
-- command envelope
-
-This is the minimum needed to prevent "I think this was faster" folklore.
-
-## What The MVP Can Defer
-
-These are useful but not required for the first real dogfood loop:
-
-- strong markdown migration
-- multi-agent coordination
-- rich artifact bundling
-- pruning or vacuum passes beyond archive
-- UI-heavy analysis
-
-The right sequence is:
-
-1. start a clean front
-2. run new work through Fidget Spinner
-3. backfill old markdown only when it is worth the effort
-
-## Repo-Local Dogfood Before Libgrid
+- supporting metrics that materially explain the verdict
+- rationale
 
-This repository itself is a valid off-path dogfood target even though it is not
-a benchmark-heavy repo.
+This is the minimum needed to prevent “I think this was faster” folklore.
 
-That means we can already use it to test:
+## Active Metric Discipline
 
-- project initialization
-- schema visibility
-- frontier creation and status projection
-- off-path source recording
-- hidden annotations
-- MCP read and write flows
+`libgrid` will accumulate many niche metrics.
 
-What it cannot honestly test is heavy benchmark ingestion and the retrieval
-pressure that comes with it. That still belongs in a real optimization corpus
-such as the `libgrid` worktree.
+The hot path should care about live metrics only: the metrics touched by the
+active experimental frontier and its immediate comparison set. Old, situational
+metrics may remain in the registry without dominating `frontier.open`.
 
-## Acceptance Bar For Libgrid
+## Acceptance Bar
 
 Fidget Spinner is ready for serious `libgrid` use when:
 
-- an agent can run for hours without generating a giant markdown graveyard
-- the operator can identify accepted, kept, parked, and rejected lines mechanically
-- each completed experiment has result, note, and verdict
-- off-path side investigations stay preserved but do not pollute the core path
+- an agent can run for hours without generating a markdown graveyard
+- `frontier.open` gives a truthful, bounded orientation surface
+- active hypotheses and open experiments are obvious
+- closed experiments carry parsed metrics rather than prose-only results
+- artifacts preserve source texture without flooding the hot path
 - the system feels like a machine for evidence rather than a diary with better
   typography
-- 
cgit v1.2.3