From 9d63844f3a28fde70b19500422f17379e99e588a Mon Sep 17 00:00:00 2001
From: main
Date: Fri, 20 Mar 2026 16:00:30 -0400
Subject: Refound Spinner as an austere frontier ledger

---
 docs/libgrid-dogfood.md | 199 ++++++++++++++++--------------------------------
 1 file changed, 65 insertions(+), 134 deletions(-)

(limited to 'docs/libgrid-dogfood.md')

diff --git a/docs/libgrid-dogfood.md b/docs/libgrid-dogfood.md
index 206c4d7..9d81993 100644
--- a/docs/libgrid-dogfood.md
+++ b/docs/libgrid-dogfood.md
@@ -6,26 +6,19 @@
 failure mode Fidget Spinner is designed to kill:
 
 - long autonomous optimization loops
-- heavy worktree usage
-- benchmark-driven decisions
-- huge markdown logs that blur evidence, narrative, and verdicts
+- heavy benchmark slicing
+- worktree churn
+- huge markdown logs that blur intervention, result, and verdict
 
 That is the proving ground.
 
-## Immediate MVP Goal
+## Immediate Goal
 
-The MVP does not need to solve all of `libgrid`.
+The goal is not “ingest every scrap of prose.”
 
-It needs to solve this specific problem:
-
-replace the giant freeform experiment log with a machine in which the active
-frontier, the accepted lines, the live evidence, and the dead ends are all
-explicit and queryable.
-
-When using a global unbound MCP session from a `libgrid` worktree, the first
-project-local action should be `project.bind` against the `libgrid` worktree
-root or any nested path inside it. The session should not assume the MCP host's
-own repo.
+The goal is to replace the giant freeform experiment log with a machine in
+which the active frontier, live hypotheses, current experiments, verdicts, and
+best benchmark lines are explicit and queryable.
 
 ## Mapping Libgrid Work Into The Model
 
@@ -33,163 +26,101 @@ own repo.
 
 One optimization objective becomes one frontier:
 
-- improve MILP solve quality
-- reduce wall-clock time
-- reduce LP pressure
-- improve node throughput
-- improve best-bound quality
-
-### Contract node
-
-The root contract should state:
-
-- objective in plain language
-- benchmark suite set
-- primary metric
-- supporting metrics
-- promotion criteria
-
-### Change node
-
-Use `hypothesis.record` to capture:
-
-- what hypothesis is being tested
-- what benchmark suite matters
-- any terse sketch of the intended delta
-
-### Run node
-
-The run node should capture:
-
-- exact command
-- cwd
-- backend kind
-- run dimensions
-- resulting metrics
+- root cash-out
+- LP spend reduction
+- primal improvement
+- search throughput
+- cut pipeline quality
 
-### Decision node
+The frontier brief should answer where the campaign stands right now, not dump
+historical narrative.
 
-The decision should make the verdict explicit:
+### Hypothesis
 
-- accepted
-- kept
-- parked
-- rejected
+A hypothesis should capture one concrete intervention claim:
 
-### Off-path nodes
+- terse title
+- one-line summary
+- one-paragraph body
 
-Use these freely:
+If the body wants to become a design memo, it is too large.
 
-- `source` for ideas, external references, algorithm sketches
-- `source` for scaffolding that is not yet a benchmarked experiment
-- `note` for quick observations
+### Experiment
 
-This is how the system avoids forcing every useful thought into experiment
-closure.
+Each measured slice becomes one experiment under exactly one hypothesis.
 
-## Suggested Libgrid Project Schema
+The experiment closes with:
 
-The `libgrid` project should eventually define richer payload conventions in
-`.fidget_spinner/schema.json`.
-
-The MVP does not need hard rejection. It does need meaningful warnings.
-
-Good first project fields:
+- dimensions such as `instance`, `profile`, `duration_s`
+- primary metric
+- supporting metrics
+- verdict: `accepted | kept | parked | rejected`
+- rationale
+- optional analysis
 
-- `hypothesis` on `hypothesis`
-- `benchmark_suite` on `hypothesis` and `run`
-- `body` on `hypothesis`, `source`, and `note`
-- `comparison_claim` on `analysis`
-- `rationale` on `decision`
+If a tranche doc reports multiple benchmark slices, it should become multiple
+experiments, not one prose blob.
 
-Good first metric vocabulary:
+### Artifact
 
-- `wall_clock_s`
-- `solved_instance_count`
-- `nodes_expanded`
-- `best_bound_delta`
-- `lp_calls`
-- `memory_bytes`
+Historical markdown, logs, tables, and other large dumps should be attached as
+artifacts by reference when they matter. They should not live in the ledger as
+default-enumerated prose.
 
-## Libgrid MVP Workflow
+## Libgrid Workflow
 
-### 1. Seed the frontier
+### 1. Ground
 
-1. Initialize the project store.
-2. Create a frontier contract.
+1. Bind the MCP to the libgrid worktree.
+2. Read `frontier.open`.
+3. Decide whether the next move is a new hypothesis, a new experiment on an
+   existing hypothesis, or a frontier brief update.
 
 ### 2. Start a line of attack
 
-1. Read the current frontier and the recent DAG tail.
-2. Record a `hypothesis`.
-3. If needed, attach off-path `source` or `note` nodes first.
+1. Record a hypothesis.
+2. Attach any necessary artifacts by reference.
+3. Open one experiment for the concrete slice being tested.
 
-### 3. Execute one experiment
+### 3. Execute
 
 1. Modify the worktree.
 2. Run the benchmark protocol.
-3. Close the experiment atomically.
+3. Close the experiment atomically with parsed metrics and an explicit verdict.
 
 ### 4. Judge and continue
 
-1. Mark the line accepted, kept, parked, or rejected.
-2. Archive dead ends instead of leaving them noisy and active.
-3. Repeat.
+1. Use `accepted`, `kept`, `parked`, and `rejected` honestly.
+2. Let the frontier brief summarize the current strategic state.
+3. Let historical tranche markdown live as artifacts when preservation matters.
 
 ## Benchmark Discipline
 
-For `libgrid`, the benchmark evidence needs to be structurally trustworthy.
-
-The MVP should always preserve at least:
+For `libgrid`, the minimum trustworthy record is:
 
 - run dimensions
 - primary metric
-- supporting metrics
-- command envelope
-
-This is the minimum needed to prevent "I think this was faster" folklore.
-
-## What The MVP Can Defer
-
-These are useful but not required for the first real dogfood loop:
-
-- strong markdown migration
-- multi-agent coordination
-- rich artifact bundling
-- pruning or vacuum passes beyond archive
-- UI-heavy analysis
-
-The right sequence is:
-
-1. start a clean front
-2. run new work through Fidget Spinner
-3. backfill old markdown only when it is worth the effort
-
-## Repo-Local Dogfood Before Libgrid
+- supporting metrics that materially explain the verdict
+- rationale
 
-This repository itself is a valid off-path dogfood target even though it is not
-a benchmark-heavy repo.
+This is the minimum needed to prevent “I think this was faster” folklore.
 
-That means we can already use it to test:
+## Active Metric Discipline
 
-- project initialization
-- schema visibility
-- frontier creation and status projection
-- off-path source recording
-- hidden annotations
-- MCP read and write flows
+`libgrid` will accumulate many niche metrics.
 
-What it cannot honestly test is heavy benchmark ingestion and the retrieval
-pressure that comes with it. That still belongs in a real optimization corpus
-such as the `libgrid` worktree.
+The hot path should care about live metrics only: the metrics touched by the
+active experimental frontier and its immediate comparison set. Old, situational
+metrics may remain in the registry without dominating `frontier.open`.
 
-## Acceptance Bar For Libgrid
+## Acceptance Bar
 
 Fidget Spinner is ready for serious `libgrid` use when:
 
-- an agent can run for hours without generating a giant markdown graveyard
-- the operator can identify accepted, kept, parked, and rejected lines mechanically
-- each completed experiment has result, note, and verdict
-- off-path side investigations stay preserved but do not pollute the core path
+- an agent can run for hours without generating a markdown graveyard
+- `frontier.open` gives a truthful, bounded orientation surface
+- active hypotheses and open experiments are obvious
+- closed experiments carry parsed metrics rather than prose-only results
+- artifacts preserve source texture without flooding the hot path
 - the system feels like a machine for evidence rather than a diary with better
   typography
-- 
cgit v1.2.3
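
The closed-experiment record that the patched document describes (dimensions, primary metric, supporting metrics, one of four verdicts, a rationale, optional analysis) could be modeled roughly as below. This is only an illustrative sketch: `Verdict`, `Metric`, `ClosedExperiment`, and `close_experiment` are hypothetical names and are not taken from Fidget Spinner's actual schema or MCP API. The four verdict values and the example dimension names come from the document itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Mapping, Optional, Tuple


class Verdict(Enum):
    # The four verdicts named in the document.
    ACCEPTED = "accepted"
    KEPT = "kept"
    PARKED = "parked"
    REJECTED = "rejected"


@dataclass(frozen=True)
class Metric:
    # Hypothetical shape: a parsed numeric value, not prose.
    name: str
    value: float


@dataclass(frozen=True)
class ClosedExperiment:
    # Hypothetical record; one experiment belongs to exactly one hypothesis.
    hypothesis_id: str
    dimensions: Mapping[str, object]
    primary_metric: Metric
    supporting_metrics: Tuple[Metric, ...]
    verdict: Verdict
    rationale: str
    analysis: Optional[str] = None


def close_experiment(
    hypothesis_id: str,
    dimensions: Mapping[str, object],
    primary_metric: Metric,
    verdict: str,
    rationale: str,
    supporting_metrics: Tuple[Metric, ...] = (),
    analysis: Optional[str] = None,
) -> ClosedExperiment:
    """Build the record in one step, refusing anything below the minimum."""
    if not hypothesis_id:
        raise ValueError("an experiment must sit under exactly one hypothesis")
    if not dimensions:
        raise ValueError("run dimensions are required")
    if not rationale.strip():
        raise ValueError("a rationale is required")
    # Verdict(...) raises ValueError for anything outside the four values.
    return ClosedExperiment(
        hypothesis_id=hypothesis_id,
        dimensions=dict(dimensions),
        primary_metric=primary_metric,
        supporting_metrics=tuple(supporting_metrics),
        verdict=Verdict(verdict),
        rationale=rationale,
        analysis=analysis,
    )
```

Under these assumptions, a call such as `close_experiment("h-1", {"instance": "a", "profile": "p", "duration_s": 60}, Metric("wall_clock_s", 1.5), "kept", "faster on this slice")` yields a record whose verdict is `Verdict.KEPT`, while a free-text verdict like `"maybe"` or an empty rationale is rejected before anything is recorded.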