| author | main <main@swarm.moe> | 2026-03-20 16:00:30 -0400 |
|---|---|---|
| committer | main <main@swarm.moe> | 2026-03-20 16:00:30 -0400 |
| commit | 9d63844f3a28fde70b19500422f17379e99e588a (patch) | |
| tree | 163cfbd65a8d3528346561410ef39eb1183a16f2 /docs | |
| parent | 22fe3d2ce7478450a1d7443c4ecbd85fd4c46716 (diff) | |
| download | fidget_spinner-9d63844f3a28fde70b19500422f17379e99e588a.zip | |
Refound Spinner as an austere frontier ledger
Diffstat (limited to 'docs')
| -rw-r--r-- | docs/architecture.md | 578 |
| -rw-r--r-- | docs/libgrid-dogfood.md | 199 |
| -rw-r--r-- | docs/product-spec.md | 401 |
3 files changed, 326 insertions, 852 deletions
diff --git a/docs/architecture.md b/docs/architecture.md index e274ad5..2882c72 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -1,9 +1,8 @@ # Fidget Spinner Architecture -## Current Shape +## Runtime Shape -The current MVP implementation is intentionally narrower than the eventual full -product: +The current runtime is intentionally simple and hardened: ```text agent host @@ -22,21 +21,19 @@ spinner MCP host +-- disposable MCP worker | | | +-- per-project SQLite store - | +-- per-project blob directory - | +-- git/worktree introspection - | +-- atomic experiment closure + | +-- frontier / hypothesis / experiment / artifact services + | +-- navigator projections | v <project root>/.fidget_spinner/ ``` -There is no long-lived daemon yet. The first usable slice runs MCP from the CLI -binary, but it already follows the hardened host/worker split required for -long-lived sessions and safe replay behavior. +There is no long-lived daemon yet. The CLI binary owns the stdio host and the +local navigator. ## Package Boundary -The package currently contains three coupled layers: +The package contains three coupled crates: - `fidget-spinner-core` - `fidget-spinner-store-sqlite` @@ -47,7 +44,7 @@ And two bundled agent assets: - `assets/codex-skills/fidget-spinner/SKILL.md` - `assets/codex-skills/frontier-loop/SKILL.md` -Those parts should be treated as one release unit. +These are one release unit. ## Storage Topology @@ -56,524 +53,161 @@ Every initialized project owns a private state root: ```text <project root>/.fidget_spinner/ project.json - schema.json state.sqlite - blobs/ ``` Why this shape: -- schema freedom stays per project - migrations stay local - backup and portability stay simple -- we avoid premature pressure toward a single global schema +- no global store is required +- git remains the code substrate instead of being mirrored into Spinner -Cross-project search can come later as an additive index. 
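The per-project state root implies a simple bind rule: walk upward from any path until a `.fidget_spinner/` store is found. A minimal sketch in Python (the shipped implementation is Rust; the helper name and the use of `project.json` as the store sentinel are assumptions here):

```python
import tempfile
from pathlib import Path

def find_state_root(start: Path):
    """Walk upward from `start` to the nearest directory holding a
    `.fidget_spinner/` store, mirroring the documented bind rule:
    an existing store anywhere above the requested path wins."""
    for candidate in (start, *start.parents):
        if (candidate / ".fidget_spinner" / "project.json").is_file():
            return candidate
    return None

# Demo: a throwaway project tree with a store at its root.
demo = Path(tempfile.mkdtemp())
(demo / ".fidget_spinner").mkdir()
(demo / ".fidget_spinner" / "project.json").write_text("{}")
nested = demo / "crates" / "core"
nested.mkdir(parents=True)
```

Binding from `nested` resolves to `demo`, so nested worktree paths and the project root behave identically.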
+## Canonical Types -## State Layers +### Frontier -### 1. Global engine spine +Frontier is a scope and grounding object, not a graph vertex. -The engine depends on a stable, typed spine stored in SQLite: +It owns: -- nodes -- node annotations -- node edges -- frontiers -- runs -- metrics -- experiments -- event log - -This layer powers traversal, indexing, archiving, and frontier projection. - -### 2. Project payload layer - -Each node stores a project payload as JSON, namespaced and versioned by the -project schema in `.fidget_spinner/schema.json`. - -This is where domain-specific richness lives. - -Project field specs may optionally declare a light-touch `value_type` of: - -- `string` -- `numeric` -- `boolean` -- `timestamp` - -These are intentionally soft hints for validation and rendering, not rigid -engine-schema commitments. - -### 3. Annotation sidecar - -Annotations are stored separately from payload and are default-hidden unless -explicitly surfaced. - -That separation is important. It prevents free-form scratch text from silently -mutating into a shadow schema. - -## Validation Model - -Validation has three tiers. - -### Storage validity - -Hard-fail conditions: - -- malformed engine envelope -- broken ids -- invalid enum values -- broken relational integrity - -### Semantic quality - -Project field expectations are warning-heavy: - -- missing recommended fields emit diagnostics -- missing projection-gated fields remain storable -- mistyped typed fields emit diagnostics -- ingest usually succeeds - -### Operational eligibility - -Specific actions may refuse incomplete records. - -Examples: - -- core-path experiment closure requires complete run/result/note/verdict state -- future promotion helpers may require a projection-ready hypothesis payload +- label +- objective +- status +- brief -## SQLite Schema +And it partitions hypotheses and experiments. -### `nodes` +### Hypothesis -Stores the global node envelope: +Hypothesis is a true graph vertex. 
It carries: -- id -- class -- track -- frontier id -- archived flag - title - summary -- schema namespace -- schema version -- payload JSON -- diagnostics JSON -- agent session id -- timestamps - -### `node_annotations` - -Stores sidecar free-form annotations: - -- annotation id -- owning node id -- visibility -- optional label -- body -- created timestamp - -### `node_edges` - -Stores typed DAG edges: - -- source node id -- target node id -- edge kind +- exactly one paragraph of body +- tags +- influence parents -The current edge kinds are enough for the MVP: +### Experiment -- `lineage` -- `evidence` -- `comparison` -- `supersedes` -- `annotation` +Experiment is also a true graph vertex. It carries: -### `frontiers` - -Stores derived operational frontier records: - -- frontier id -- label -- root contract node id +- one mandatory owning hypothesis +- optional influence parents +- title +- summary +- tags - status -- timestamps - -Important constraint: - -- the root contract node itself also carries the same frontier id - -That keeps frontier filtering honest. - -### `runs` +- outcome when closed -Stores run envelopes: +The outcome contains: -- run id -- run node id -- frontier id - backend -- status -- run dimensions - command envelope -- started and finished timestamps - -### `metrics` - -Stores primary and supporting run metrics: - -- run id -- metric key -- value -- unit -- optimization objective - -### `experiments` - -Stores the atomic closure object for core-path work: - -- experiment id -- frontier id -- hypothesis node id -- run node id and run id -- optional analysis node id -- decision node id -- title -- summary +- run dimensions +- primary metric +- supporting metrics - verdict -- note payload -- created timestamp - -This table is the enforcement layer for frontier discipline. 
- -### `events` - -Stores durable audit events: - -- event id -- entity kind -- entity id -- event kind -- payload -- created timestamp - -## Core Types - -### Node classes - -Core path: - -- `contract` -- `hypothesis` -- `run` -- `analysis` -- `decision` - -Off path: - -- `source` -- `source` -- `note` - -### Node tracks - -- `core_path` -- `off_path` - -Track is derived from class, not operator whim. - -### Frontier projection - -The frontier projection currently exposes: - -- frontier record -- open experiment count -- completed experiment count -- verdict counts - -This projection is derived from canonical state and intentionally rebuildable. - -## Write Surfaces - -### Low-ceremony off-path writes - -These are intentionally cheap: - -- `note.quick`, but only with explicit tags from the repo-local registry -- `source.record`, optionally tagged into the same repo-local taxonomy -- generic `node.create` for escape-hatch use -- `node.annotate` - -### Low-ceremony core-path entry - -`hypothesis.record` exists to capture intent before worktree state becomes muddy. - -### Atomic core-path closure - -`experiment.close` is the important write path. - -It persists, in one transaction: - -- run node -- run record -- decision node -- experiment record -- lineage and evidence edges -- frontier touch and verdict accounting inputs - -That atomic boundary is the answer to the ceremony/atomicity pre-mortem. +- rationale +- optional analysis -## MCP Surface +### Artifact -The MVP MCP server is stdio-only and follows newline-delimited JSON-RPC message -framing. The public server is a stable host. It owns initialization state, -replay policy, telemetry, and host rollout. Execution happens in a disposable -worker subprocess. +Artifact is metadata plus a locator for an external thing. It attaches to +frontiers, hypotheses, and experiments. Spinner never reads or stores the +artifact body. 
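The four canonical families above can be sketched as plain records. This is an illustrative Python shape, not the Rust core model; field names follow the prose, and the flat `outcome` dict is an assumption:

```python
from dataclasses import dataclass, field

VERDICTS = ("accepted", "kept", "parked", "rejected")

@dataclass
class Frontier:
    """Scope and grounding object, not a graph vertex."""
    label: str
    objective: str
    status: str
    brief: str

@dataclass
class Hypothesis:
    """True graph vertex: a terse intervention claim."""
    title: str
    summary: str
    body: str                        # exactly one paragraph
    tags: list = field(default_factory=list)
    influence_parents: list = field(default_factory=list)

@dataclass
class Experiment:
    """True graph vertex with exactly one owning hypothesis."""
    owning_hypothesis: str
    title: str
    summary: str
    status: str = "open"
    outcome: dict = None             # dimensions, metrics, verdict, rationale

@dataclass
class Artifact:
    """Metadata plus a locator; the body is never read or stored."""
    label: str
    locator: str
    attached_to: str
```

Note what is absent: no free-form note or source nodes, and no artifact body field anywhere.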
-Presentation is orthogonal to payload detail: +## Graph Semantics -- `render=porcelain|json` -- `detail=concise|full` +Two relations matter: -Porcelain is the terse model-facing surface, not a pretty-printed JSON dump. +### Ownership -### Host responsibilities +Every experiment has exactly one owning hypothesis. -- own the public JSON-RPC session -- enforce initialize-before-use -- classify tools and resources by replay contract -- retry only explicitly safe operations after retryable worker faults -- expose health and telemetry -- re-exec the host binary while preserving initialization seed and counters +This is the canonical tree spine. -### Worker responsibilities +### Influence -- open the per-project store -- execute tool logic and resource reads -- return typed success or typed fault records -- remain disposable without losing canonical state +Hypotheses and experiments may both cite later hypotheses or experiments as +influence parents. -## Minimal Navigator +This is the sparse DAG over the canonical tree. -The CLI also exposes a minimal localhost navigator through `ui serve`. +The product should make the ownership spine easy to read and the influence +network available without flooding the hot path. -Current shape: +## SQLite Shape -- left rail of repo-local tags -- single linear node feed in reverse chronological order -- full entry rendering in the main pane -- lightweight hyperlinking for text fields -- typed field badges for `string`, `numeric`, `boolean`, and `timestamp` +The store is normalized around the new ontology: -This is intentionally not a full DAG canvas. It is a text-first operator window -over the canonical store. 
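The two relations described under Graph Semantics can be sketched as a grouping over ownership pairs plus a walk over influence edges. A hypothetical helper pair, assuming id-based records:

```python
from collections import defaultdict

def ownership_spine(experiments):
    """Group experiments under their single owning hypothesis.

    `experiments` is a list of (experiment_id, owning_hypothesis_id)
    pairs; the exactly-one-owner rule is what makes this a tree spine.
    """
    spine = defaultdict(list)
    for exp_id, owner in experiments:
        spine[owner].append(exp_id)
    return dict(spine)

def influence_closure(edges, start):
    """Walk the sparse influence DAG (child -> list of parents) from
    `start`, collecting every transitive influence parent."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for parent in edges.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

The spine is what the hot path renders; the closure is the deliberate, on-demand traversal of the influence network.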
+- `frontiers` +- `frontier_briefs` +- `hypotheses` +- `experiments` +- `vertex_influences` +- `artifacts` +- `artifact_attachments` +- `metric_definitions` +- `run_dimension_definitions` +- `experiment_metrics` +- `events` -## Binding Bootstrap +The important boundary is this: -`project.bind` may bootstrap a project store when the requested target root is -an existing empty directory. +- hypotheses and experiments are the scientific ledger +- artifacts are reference sidecars +- frontier projections are derived -That is intentionally narrow: +## Presentation Model -- empty root: initialize and bind -- non-empty uninitialized root: fail -- existing store anywhere above the requested path: bind to that discovered root +The system is designed to be hostile to accidental context burn. -### Fault model +`frontier.open` is the only sanctioned overview dump. It should be enough to +answer: -Faults are typed by: +- where the frontier stands +- which tags are active +- which metrics are live +- which hypotheses are active +- which experiments are open -- kind: `invalid_input`, `not_initialized`, `transient`, `internal` -- stage: `host`, `worker`, `store`, `transport`, `protocol`, `rollout` +Everything after that should require deliberate traversal: -Those faults are surfaced both as JSON-RPC errors and as structured tool -errors, depending on call type. +- `hypothesis.read` +- `experiment.read` +- `artifact.read` -### Replay contracts +Artifact reads stay metadata-only by design. 
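The normalized shape above might look like the following DDL. This is a hypothetical sketch of four of the listed tables; column names are assumptions, not the shipped store's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE frontiers (
    id     TEXT PRIMARY KEY,
    label  TEXT NOT NULL,
    status TEXT NOT NULL
);
CREATE TABLE hypotheses (
    id          TEXT PRIMARY KEY,
    frontier_id TEXT NOT NULL REFERENCES frontiers(id),
    title       TEXT NOT NULL,
    body        TEXT NOT NULL
);
CREATE TABLE experiments (
    id            TEXT PRIMARY KEY,
    -- exactly one owning hypothesis: the canonical tree spine
    hypothesis_id TEXT NOT NULL REFERENCES hypotheses(id),
    status        TEXT NOT NULL DEFAULT 'open'
);
-- the sparse influence DAG over hypotheses and experiments
CREATE TABLE vertex_influences (
    child_id  TEXT NOT NULL,
    parent_id TEXT NOT NULL,
    PRIMARY KEY (child_id, parent_id)
);
""")
```

With foreign keys on, an experiment without a live owning hypothesis is rejected at the storage layer, which is the ledger/sidecar boundary enforced in data rather than convention.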
-The tool catalog explicitly marks each operation as one of: +## Replay Model -- `safe_replay` -- `never_replay` +The MCP host owns: -Current policy: +- the public JSON-RPC session +- initialize-before-use semantics +- replay contracts +- health and telemetry +- host rollout -- reads such as `project.status`, `project.schema`, `tag.list`, `frontier.list`, - `frontier.status`, `node.list`, `node.read`, `skill.list`, `skill.show`, and - resource reads - are safe to replay once after a retryable worker fault -- mutating tools such as `tag.add`, `frontier.init`, `node.create`, `hypothesis.record`, - `node.annotate`, `node.archive`, `note.quick`, `source.record`, and - `experiment.close` are never auto-replayed +The worker owns: -This is the hardening answer to side-effect safety. +- project-store access +- tool execution +- typed success and fault results -Implemented server features: +Reads and safe operational surfaces may be replayed after retryable worker +faults. Mutating operations are never auto-replayed unless they are explicitly +designed to be safe. 
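The replay model reduces to a per-tool contract consulted by the host after a retryable worker fault. A sketch of that dispatch, assuming a simple contract table (the tool names follow the surrounding docs; the dispatch machinery itself is hypothetical):

```python
# Every tool is classified up front; only `safe_replay` operations are
# retried, and only once, after a retryable worker fault.
REPLAY_CONTRACT = {
    "frontier.open": "safe_replay",
    "hypothesis.read": "safe_replay",
    "experiment.read": "safe_replay",
    "hypothesis.record": "never_replay",
    "experiment.close": "never_replay",
}

class RetryableWorkerFault(Exception):
    pass

def call_with_replay(tool, invoke):
    """Invoke a tool once; replay exactly once only if its contract
    allows it, otherwise surface the fault to the caller."""
    try:
        return invoke(tool)
    except RetryableWorkerFault:
        if REPLAY_CONTRACT.get(tool, "never_replay") == "safe_replay":
            return invoke(tool)
        raise
```

Defaulting unknown tools to `never_replay` keeps the failure mode conservative: an unclassified mutation is never silently re-executed.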
-- tools -- resources +## Navigator -### Tools +The local navigator mirrors the same philosophy: -Implemented tools: +- root page lists frontiers +- frontier page is the only overview page +- hypothesis and experiment pages are detail reads +- artifacts are discoverable but never expanded into body dumps -- `system.health` -- `system.telemetry` -- `project.bind` -- `project.status` -- `project.schema` -- `schema.field.upsert` -- `schema.field.remove` -- `tag.add` -- `tag.list` -- `frontier.list` -- `frontier.status` -- `frontier.init` -- `node.create` -- `hypothesis.record` -- `node.list` -- `node.read` -- `node.annotate` -- `node.archive` -- `note.quick` -- `source.record` -- `metric.define` -- `metric.keys` -- `metric.best` -- `metric.migrate` -- `run.dimension.define` -- `run.dimension.list` -- `experiment.close` -- `skill.list` -- `skill.show` - -### Resources - -Implemented resources: - -- `fidget-spinner://project/config` -- `fidget-spinner://project/schema` -- `fidget-spinner://skill/fidget-spinner` -- `fidget-spinner://skill/frontier-loop` - -### Operational tools - -`system.health` returns a typed operational snapshot. Concise/default output -stays on immediate session state; full detail widens to the entire health -object: - -- initialization state -- binding state -- worker generation and liveness -- current executable path -- launch-path stability -- rollout-pending state -- last recorded fault in full detail - -`system.telemetry` returns cumulative counters: - -- requests -- successes -- errors -- retries -- worker restarts -- host rollouts -- last recorded fault -- per-operation counts and last latencies - -### Rollout model - -The host fingerprints its executable at startup. If the binary changes on disk, -or if a rollout is explicitly requested, the host re-execs itself after sending -the current response. 
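The rollout trigger is just a comparison between the executable digest taken at startup and the digest on disk now. A minimal sketch, assuming a content hash is the fingerprint (the real host may fingerprint differently):

```python
import hashlib
import tempfile
from pathlib import Path

def fingerprint(path: Path) -> str:
    """Digest of the executable bytes, taken once at host startup."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def rollout_pending(path: Path, startup_digest: str) -> bool:
    """True when the binary changed on disk, meaning the host should
    re-exec itself after sending the current response."""
    return fingerprint(path) != startup_digest

# Demo: simulate hot binary replacement on disk.
exe = Path(tempfile.mkstemp()[1])
exe.write_bytes(b"spinner-v1")
startup = fingerprint(exe)
exe.write_bytes(b"spinner-v2")
```

Because the check runs per response rather than per timer, a replaced binary is picked up at the next natural request boundary without interrupting an in-flight call.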
The re-exec carries forward: - -- initialization seed -- project binding -- telemetry counters -- request id sequence -- worker generation -- one-shot rollout and crash-test markers - -This keeps the public session stable while still allowing hot binary replacement. - -## CLI Surface - -The CLI remains thin and operational. - -Current commands: - -- `init` -- `schema show` -- `schema upsert-field` -- `schema remove-field` -- `frontier init` -- `frontier status` -- `node add` -- `node list` -- `node show` -- `node annotate` -- `node archive` -- `note quick` -- `tag add` -- `tag list` -- `source add` -- `metric define` -- `metric keys` -- `metric best` -- `metric migrate` -- `dimension define` -- `dimension list` -- `experiment close` -- `mcp serve` -- `ui serve` -- hidden internal `mcp worker` -- `skill list` -- `skill install` -- `skill show` - -The CLI is not the strategic write plane, but it is the easiest repair and -bootstrap surface. Its naming is intentionally parallel but not identical to -the MCP surface: - -- CLI subcommands use spaces such as `schema upsert-field` and `dimension define` -- MCP tools use dotted names such as `schema.field.upsert` and `run.dimension.define` - -## Bundled Skill - -The bundled `fidget-spinner` and `frontier-loop` skills should -be treated as part of the product, not stray prompts. - -Their job is to teach agents: - -- DAG first -- schema first -- cheap off-path pushes -- disciplined core-path closure -- archive rather than delete -- and, for the frontier-loop specialization, how to run an indefinite push - -The asset lives in-tree so it can drift only via an explicit code change. - -## Full-Product Trajectory - -The full product should add, not replace, the MVP implementation. 
- -Planned next layers: - -- `spinnerd` as a long-lived local daemon -- HTTP and SSE -- read-mostly local UI -- runner orchestration beyond direct process execution -- interruption recovery and resumable long loops -- archive and pruning passes -- optional cross-project indexing - -The invariant for that future work is strict: - -- keep the DAG canonical -- keep frontier state derived -- keep project payloads local and flexible -- keep off-path writes cheap -- keep core-path closure atomic -- keep host-owned replay contracts explicit and auditable +The UI should help a model or operator walk the graph conservatively, not tempt +it into giant all-history feeds. diff --git a/docs/libgrid-dogfood.md b/docs/libgrid-dogfood.md index 206c4d7..9d81993 100644 --- a/docs/libgrid-dogfood.md +++ b/docs/libgrid-dogfood.md @@ -6,26 +6,19 @@ failure mode Fidget Spinner is designed to kill: - long autonomous optimization loops -- heavy worktree usage -- benchmark-driven decisions -- huge markdown logs that blur evidence, narrative, and verdicts +- heavy benchmark slicing +- worktree churn +- huge markdown logs that blur intervention, result, and verdict That is the proving ground. -## Immediate MVP Goal +## Immediate Goal -The MVP does not need to solve all of `libgrid`. +The goal is not “ingest every scrap of prose.” -It needs to solve this specific problem: - -replace the giant freeform experiment log with a machine in which the active -frontier, the accepted lines, the live evidence, and the dead ends are all -explicit and queryable. - -When using a global unbound MCP session from a `libgrid` worktree, the first -project-local action should be `project.bind` against the `libgrid` worktree -root or any nested path inside it. The session should not assume the MCP host's -own repo. 
+The goal is to replace the giant freeform experiment log with a machine in +which the active frontier, live hypotheses, current experiments, verdicts, and +best benchmark lines are explicit and queryable. ## Mapping Libgrid Work Into The Model @@ -33,163 +26,101 @@ own repo. One optimization objective becomes one frontier: -- improve MILP solve quality -- reduce wall-clock time -- reduce LP pressure -- improve node throughput -- improve best-bound quality - -### Contract node - -The root contract should state: - -- objective in plain language -- benchmark suite set -- primary metric -- supporting metrics -- promotion criteria - -### Change node - -Use `hypothesis.record` to capture: - -- what hypothesis is being tested -- what benchmark suite matters -- any terse sketch of the intended delta - -### Run node - -The run node should capture: - -- exact command -- cwd -- backend kind -- run dimensions -- resulting metrics +- root cash-out +- LP spend reduction +- primal improvement +- search throughput +- cut pipeline quality -### Decision node +The frontier brief should answer where the campaign stands right now, not dump +historical narrative. -The decision should make the verdict explicit: +### Hypothesis -- accepted -- kept -- parked -- rejected +A hypothesis should capture one concrete intervention claim: -### Off-path nodes +- terse title +- one-line summary +- one-paragraph body -Use these freely: +If the body wants to become a design memo, it is too large. -- `source` for ideas, external references, algorithm sketches -- `source` for scaffolding that is not yet a benchmarked experiment -- `note` for quick observations +### Experiment -This is how the system avoids forcing every useful thought into experiment -closure. +Each measured slice becomes one experiment under exactly one hypothesis. 
-## Suggested Libgrid Project Schema +The experiment closes with: -The `libgrid` project should eventually define richer payload conventions in -`.fidget_spinner/schema.json`. - -The MVP does not need hard rejection. It does need meaningful warnings. - -Good first project fields: +- dimensions such as `instance`, `profile`, `duration_s` +- primary metric +- supporting metrics +- verdict: `accepted | kept | parked | rejected` +- rationale +- optional analysis -- `hypothesis` on `hypothesis` -- `benchmark_suite` on `hypothesis` and `run` -- `body` on `hypothesis`, `source`, and `note` -- `comparison_claim` on `analysis` -- `rationale` on `decision` +If a tranche doc reports multiple benchmark slices, it should become multiple +experiments, not one prose blob. -Good first metric vocabulary: +### Artifact -- `wall_clock_s` -- `solved_instance_count` -- `nodes_expanded` -- `best_bound_delta` -- `lp_calls` -- `memory_bytes` +Historical markdown, logs, tables, and other large dumps should be attached as +artifacts by reference when they matter. They should not live in the ledger as +default-enumerated prose. -## Libgrid MVP Workflow +## Libgrid Workflow -### 1. Seed the frontier +### 1. Ground -1. Initialize the project store. -2. Create a frontier contract. +1. Bind the MCP to the libgrid worktree. +2. Read `frontier.open`. +3. Decide whether the next move is a new hypothesis, a new experiment on an + existing hypothesis, or a frontier brief update. ### 2. Start a line of attack -1. Read the current frontier and the recent DAG tail. -2. Record a `hypothesis`. -3. If needed, attach off-path `source` or `note` nodes first. +1. Record a hypothesis. +2. Attach any necessary artifacts by reference. +3. Open one experiment for the concrete slice being tested. -### 3. Execute one experiment +### 3. Execute 1. Modify the worktree. 2. Run the benchmark protocol. -3. Close the experiment atomically. +3. Close the experiment atomically with parsed metrics and an explicit verdict. 
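Closing an experiment atomically means the verdict, rationale, and parsed metrics commit together or not at all. A sketch of that single-transaction write against an illustrative two-table layout (column names and the helper signature are assumptions, not the shipped schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE experiments (
    id        TEXT PRIMARY KEY,
    status    TEXT NOT NULL DEFAULT 'open',
    verdict   TEXT,
    rationale TEXT
);
CREATE TABLE experiment_metrics (
    experiment_id TEXT NOT NULL,
    key           TEXT NOT NULL,
    value         REAL NOT NULL,
    is_primary    INTEGER NOT NULL DEFAULT 0
);
""")

def close_experiment(conn, exp_id, verdict, rationale, primary, supporting=()):
    """One transaction: status flip, verdict, rationale, and metrics
    land together, or the whole closure rolls back."""
    if verdict not in ("accepted", "kept", "parked", "rejected"):
        raise ValueError(f"bad verdict: {verdict}")
    with conn:  # commits on success, rolls back on any exception
        conn.execute(
            "UPDATE experiments SET status='closed', verdict=?, rationale=? WHERE id=?",
            (verdict, rationale, exp_id))
        key, value = primary
        conn.execute(
            "INSERT INTO experiment_metrics VALUES (?, ?, ?, 1)", (exp_id, key, value))
        for key, value in supporting:
            conn.execute(
                "INSERT INTO experiment_metrics VALUES (?, ?, ?, 0)", (exp_id, key, value))
```

The point of the `with conn:` boundary is that no caller can ever observe a closed experiment missing its verdict or metrics.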
### 4. Judge and continue -1. Mark the line accepted, kept, parked, or rejected. -2. Archive dead ends instead of leaving them noisy and active. -3. Repeat. +1. Use `accepted`, `kept`, `parked`, and `rejected` honestly. +2. Let the frontier brief summarize the current strategic state. +3. Let historical tranche markdown live as artifacts when preservation matters. ## Benchmark Discipline -For `libgrid`, the benchmark evidence needs to be structurally trustworthy. - -The MVP should always preserve at least: +For `libgrid`, the minimum trustworthy record is: - run dimensions - primary metric -- supporting metrics -- command envelope - -This is the minimum needed to prevent "I think this was faster" folklore. - -## What The MVP Can Defer - -These are useful but not required for the first real dogfood loop: - -- strong markdown migration -- multi-agent coordination -- rich artifact bundling -- pruning or vacuum passes beyond archive -- UI-heavy analysis - -The right sequence is: - -1. start a clean front -2. run new work through Fidget Spinner -3. backfill old markdown only when it is worth the effort - -## Repo-Local Dogfood Before Libgrid +- supporting metrics that materially explain the verdict +- rationale -This repository itself is a valid off-path dogfood target even though it is not -a benchmark-heavy repo. +This is the minimum needed to prevent “I think this was faster” folklore. -That means we can already use it to test: +## Active Metric Discipline -- project initialization -- schema visibility -- frontier creation and status projection -- off-path source recording -- hidden annotations -- MCP read and write flows +`libgrid` will accumulate many niche metrics. -What it cannot honestly test is heavy benchmark ingestion and the retrieval -pressure that comes with it. That still belongs in a real optimization corpus -such as the `libgrid` worktree. 
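The minimum trustworthy record above is mechanically checkable. A hypothetical validator sketch (the record shape and function name are assumptions for illustration):

```python
REQUIRED = ("dimensions", "primary_metric", "rationale")

def folklore_risk(record: dict) -> list:
    """Return the missing pieces of the minimum trustworthy record.

    An empty list means the result can be trusted beyond 'I think
    this was faster' folklore; supporting metrics stay optional.
    """
    return [key for key in REQUIRED if not record.get(key)]
```

A gate like this belongs at closure time, so an experiment cannot reach a verdict on vibes alone.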
+The hot path should care about live metrics only: the metrics touched by the +active experimental frontier and its immediate comparison set. Old, situational +metrics may remain in the registry without dominating `frontier.open`. -## Acceptance Bar For Libgrid +## Acceptance Bar Fidget Spinner is ready for serious `libgrid` use when: -- an agent can run for hours without generating a giant markdown graveyard -- the operator can identify accepted, kept, parked, and rejected lines mechanically -- each completed experiment has result, note, and verdict -- off-path side investigations stay preserved but do not pollute the core path +- an agent can run for hours without generating a markdown graveyard +- `frontier.open` gives a truthful, bounded orientation surface +- active hypotheses and open experiments are obvious +- closed experiments carry parsed metrics rather than prose-only results +- artifacts preserve source texture without flooding the hot path - the system feels like a machine for evidence rather than a diary with better typography diff --git a/docs/product-spec.md b/docs/product-spec.md index 85561ad..ce881c6 100644 --- a/docs/product-spec.md +++ b/docs/product-spec.md @@ -2,341 +2,250 @@ ## Thesis -Fidget Spinner is a local-first, agent-first frontier machine for autonomous -program optimization, source capture, and experiment adjudication. +Fidget Spinner is a local-first, agent-first frontier ledger for autonomous +optimization work. -The immediate target is brutally practical: replace gigantic freeform -experiment markdown with a machine that preserves evidence as structure. +It is not a notebook. It is not a generic DAG memory. It is not an inner +platform for git. It is a hard experimental spine whose job is to preserve +scientific truth with enough structure that agents can resume work without +reconstructing everything from prose. 
The package is deliberately two things at once: -- a local MCP-backed DAG substrate -- bundled skills that teach agents how to drive that substrate +- a local MCP-backed frontier ledger +- bundled skills that teach agents how to drive that ledger -Those two halves should be versioned together and treated as one product. +Those two halves are one product and should be versioned together. ## Product Position -This is not a hosted lab notebook. +This is a machine for long-running frontier work in local repos. -This is not a cloud compute marketplace. +Humans and agents should be able to answer: -This is not a collaboration shell with experiments bolted on. +- what frontier is active +- which hypotheses are live +- which experiments are still open +- what the latest accepted, kept, parked, and rejected outcomes are +- which metrics matter right now -This is a local machine for indefinite frontier pushes, with agents as primary -writers and humans as auditors, reviewers, and occasional editors. +without opening a markdown graveyard. ## Non-Goals These are explicitly out of scope for the core product: -- OAuth - hosted identity - cloud tenancy -- billing, credits, and subscriptions -- managed provider brokerage +- billing or credits - chat as the system of record - mandatory remote control planes - replacing git +- storing or rendering large artifact bodies -Git remains the code substrate. Fidget Spinner is the evidence substrate. +Git remains the code substrate. Fidget Spinner is the experimental ledger. ## Locked Design Decisions -These are the load-bearing decisions to hold fixed through the MVP push. +### 1. The ledger is austere -### 1. The DAG is canonical truth +The only freeform overview surface is the frontier brief, read through +`frontier.open`. -The canonical record is the DAG plus its normalized supporting tables. +Everything else should require deliberate traversal one selector at a time. +Slow is better than burning tokens on giant feeds. 
-Frontier state is not a rival authority. It is a derived, rebuildable -projection over the DAG and related run/experiment records. +### 2. The ontology is small -### 2. Storage is per-project +The canonical object families are: -Each project owns its own local store under: - -```text -<project root>/.fidget_spinner/ - state.sqlite - project.json - schema.json - blobs/ -``` - -There is no mandatory global database in the MVP. - -### 3. Node structure is layered - -Every node has three layers: - -- a hard global envelope for indexing and traversal -- a project-local structured payload -- free-form sidecar annotations as an escape hatch - -The engine only hard-depends on the envelope. Project payloads remain flexible. - -### 4. Validation is warning-heavy - -Engine integrity is hard-validated. - -Project semantics are diagnostically validated. - -Workflow eligibility is action-gated. - -In other words: - -- bad engine state is rejected -- incomplete project payloads are usually admitted with diagnostics -- projections and frontier actions may refuse incomplete nodes later +- `frontier` +- `hypothesis` +- `experiment` +- `artifact` -### 5. Core-path and off-path work must diverge +There are no canonical `note` or `source` ledger nodes. -Core-path work is disciplined and atomic. +### 3. Frontier is scope, not a graph vertex -Off-path work is cheap and permissive. +A frontier is a named scope and grounding object. It owns: -The point is to avoid forcing every scrap of source digestion or note-taking through the full -benchmark/decision bureaucracy while still preserving it in the DAG. +- objective +- status +- brief -### 6. Completed core-path experiments are atomic +And it partitions hypotheses and experiments. -A completed experiment exists only when all of these exist together: +### 4. Hypothesis and experiment are the true graph vertices -- measured result -- terse note -- explicit verdict +A hypothesis is a terse intervention claim. 
-The write surface should make that one atomic mutation, not a loose sequence of -low-level calls. +An experiment is a stateful scientific record. Every experiment has: -## Node Model +- one mandatory owning hypothesis +- optional influence parents drawn from hypotheses or experiments -### Global envelope +This gives the product a canonical tree spine plus a sparse influence network. -The hard spine should be stable across projects. It includes at least: +### 5. Artifacts are references only -- node id -- node class -- node track -- frontier id if any -- archived flag -- title -- summary -- schema namespace and version -- timestamps -- diagnostics -- hidden or visible annotations +Artifacts are metadata plus locators for external material: -This is the engine layer: the part that powers indexing, traversal, archiving, -default enumeration, and model-facing summaries. +- files +- links +- logs +- tables +- plots +- dumps +- bibliographies -### Project-local payload +Spinner never reads artifact bodies. If a wall of text matters, attach it as an +artifact and summarize the operational truth elsewhere. -Every project may define richer payload fields in: +### 6. Experiment closure is atomic -`<project root>/.fidget_spinner/schema.json` +A closed experiment exists only when all of these exist together: -That file is a model-facing contract. It defines field names and soft -validation tiers without forcing global schema churn. +- dimensions +- primary metric +- verdict +- rationale +- optional supporting metrics +- optional analysis -Per-field settings should express at least: +Closing an experiment is one atomic mutation, not a loose pile of lower-level +writes. -- presence: `required`, `recommended`, `optional` -- severity: `error`, `warning`, `info` -- role: `index`, `projection_gate`, `render_only`, `opaque` -- inference policy: whether the model may infer the field +### 7. 
Live metrics are derived -These settings are advisory at ingest time and stricter at projection/action -time. +The hot-path metric surface is not “all metrics that have ever existed.” -### Free-form annotations +The hot-path metric surface is the derived live set for the active frontier. +That set should stay small, frontier-relevant, and queryable. -Any node may carry free-form annotations. +## Canonical Data Model -These are explicitly sidecar, not primary payload. They are: +### Frontier -- allowed everywhere -- hidden from default enumeration -- useful as a scratchpad or escape hatch -- not allowed to become the only home of critical operational truth +Frontier is a scope/partition object with one mutable brief. -If a fact matters to automation, comparison, or promotion, it must migrate into -the spine or project payload. +The brief is the sanctioned grounding object. It should stay short and answer: -## Node Taxonomy +- situation +- roadmap +- unknowns -### Core-path node classes +### Hypothesis -These are the disciplined frontier-loop classes: +A hypothesis is a disciplined claim: -- `contract` -- `hypothesis` -- `run` -- `analysis` -- `decision` +- title +- summary +- exactly one paragraph of body +- tags +- influence parents -### Off-path node classes +It is not a design doc and not a catch-all prose bucket. -These are deliberately low-ceremony: +### Experiment -- `source` -- `source` -- `note` +An experiment is a stateful object: -They exist so the product can absorb real thinking instead of forcing users and -agents back into sprawling markdown. +- open while the work is live +- closed when the result is in -## Frontier Model +A closed experiment stores: -The frontier is a derived operational view over the canonical DAG. 
+- dimensions
+- primary metric
+- supporting metrics
+- verdict: `accepted | kept | parked | rejected`
+- rationale
+- optional analysis
+- attached artifacts
-It answers:
+### Artifact
-- what objective is active
-- how many experiments are open
-- how many experiments are completed
-- how the verdict mix currently breaks down
+Artifacts preserve external material by reference. They are deliberately off the
+token hot path. Artifact metadata should be enough to discover the thing; the
+body lives elsewhere.
-The DAG answers:
+## Token Discipline
-- what changed
-- what ran
-- what evidence was collected
-- what was concluded
-- what dead ends and side investigations exist
+`frontier.open` is the only sanctioned overview dump. It should return:
-That split is deliberate. It prevents "frontier state" from turning into a
-second unofficial database.
+- frontier brief
+- active tags
+- live metric keys
+- active hypotheses with deduped current state
+- open experiments
-## First Usable MVP
+After that, the model should walk explicitly:
-The first usable MVP is the first cut that can already replace a meaningful
-slice of the markdown habit without pretending the whole full-product vision is
-done.
+- `hypothesis.read`
+- `experiment.read`
+- `artifact.read`
-### MVP deliverables
+No broad list surface should dump large prose. Artifact bodies are never in the
+MCP path.
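The token-discipline split added above (one small sanctioned overview, then explicit per-object reads) can be sketched as a projection function. Everything here is a hypothetical stand-in: plain dicts substitute for the SQLite store, and the field names are invented for illustration, not the actual `frontier.open` wire format.

```python
def frontier_open(store: dict, frontier_id: str) -> dict:
    """Build the one sanctioned overview for a frontier.

    Returns brief, tags, live metric keys, hypothesis one-liners, and
    open experiment ids. Hypothesis bodies and artifact contents are
    deliberately excluded; the model must fetch those with explicit
    per-object reads.
    """
    f = store["frontiers"][frontier_id]
    return {
        "brief": f["brief"],
        "tags": sorted(f["tags"]),
        "live_metric_keys": sorted(f["live_metrics"]),
        "hypotheses": [
            # One-line current state only: id, title, state.
            {"id": h["id"], "title": h["title"], "state": h["state"]}
            for h in store["hypotheses"].values()
            if h["frontier"] == frontier_id
        ],
        "open_experiments": [
            e["id"] for e in store["experiments"].values()
            if e["frontier"] == frontier_id and e["state"] == "open"
        ],
    }
```

The design point is that the projection is small by construction: nothing in the return shape can carry a wall of prose, so the overview stays cheap no matter how large the ledger grows.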
-- per-project `.fidget_spinner/` state
-- local SQLite backing store
-- local blob directory
-- typed Rust core model
-- optional light-touch project field types: `string`, `numeric`, `boolean`, `timestamp`
-- thin CLI for bootstrap and repair
-- hardened stdio MCP host exposed from the CLI
-- minimal read-only web navigator with tag filtering and linear node rendering
-- disposable MCP worker execution runtime
-- bundled `fidget-spinner` base skill
-- bundled `frontier-loop` skill
-- low-ceremony off-path note and source recording
-- explicit experiment open/close lifecycle for the core path
+## Storage
-### Explicitly deferred from the MVP
+Every project owns a private state root:
-- long-lived `spinnerd`
-- web UI
-- remote runners
-- multi-agent hardening
-- aggressive pruning and vacuuming
-- strong markdown migration tooling
-- cross-project indexing
+```text
+<project root>/.fidget_spinner/
+  project.json
+  state.sqlite
+```
-### MVP model-facing surface
+There is no required global database.
-The model-facing surface is a local MCP server oriented around frontier work.
+## MVP Surface
-The initial tools should be:
+The current model-facing surface is:
- `system.health`
- `system.telemetry`
- `project.bind`
- `project.status`
-- `project.schema`
- `tag.add`
- `tag.list`
+- `frontier.create`
- `frontier.list`
-- `frontier.status`
-- `frontier.init`
-- `node.create`
+- `frontier.read`
+- `frontier.open`
+- `frontier.brief.update`
+- `frontier.history`
- `hypothesis.record`
-- `node.list`
-- `node.read`
-- `node.annotate`
-- `node.archive`
-- `note.quick`
-- `source.record`
+- `hypothesis.list`
+- `hypothesis.read`
+- `hypothesis.update`
+- `hypothesis.history`
- `experiment.open`
- `experiment.list`
- `experiment.read`
+- `experiment.update`
- `experiment.close`
-- `skill.list`
-- `skill.show`
-
-The important point is not the exact names.
-The important point is the shape:
-
-- cheap read access to project and frontier context
-- cheap off-path writes
-- low-ceremony hypothesis capture
-- one explicit experiment-open step plus one experiment-close step
-- explicit operational introspection for long-lived agent sessions
-- explicit replay boundaries so side effects are never duplicated by accident
-
-### MVP skill posture
-
-The bundled skills should instruct agents to:
-
-1. inspect `system.health` first
-2. bind the MCP session to the target project before project-local reads or writes
-3. read project schema, tag registry, and frontier state
-4. pull context from the DAG instead of giant prose dumps
-5. use `note.quick` and `source.record` freely off path, but always pass an explicit tag list for notes
-6. use `hypothesis.record` before worktree thrash becomes ambiguous
-7. use `experiment.open` before running a live hypothesis-owned line
-8. use `experiment.close` to seal that line with measured evidence
-9. archive detritus instead of deleting it
-10. use the base `fidget-spinner` skill for ordinary DAG work and add
-    `frontier-loop` only when the task becomes a true autonomous frontier push
-
-### MVP acceptance bar
-
-The MVP is successful when:
-
-- a project can be initialized locally with no hosted dependencies
-- an agent can inspect frontier state through MCP
-- an agent can inspect MCP health and telemetry through MCP
-- an agent can record off-path sources and notes without bureaucratic pain
-- the project schema can softly declare whether payload fields are strings, numbers, booleans, or timestamps
-- an operator can inspect recent nodes through a minimal localhost web navigator filtered by tag
-- a project can close a real core-path experiment atomically
-- retryable worker faults do not duplicate side effects
-- stale nodes can be archived instead of polluting normal enumeration
-- a human can answer "what was tried, what ran, what was accepted or parked,
-  and why?" without doing markdown archaeology
-
-## Full Product
-
-The full product grows outward from the MVP rather than replacing it.
-
-### Planned additions
-
-- `spinnerd` as a long-lived local daemon
-- local HTTP and SSE
-- read-mostly graph and run inspection UI
-- richer artifact handling
-- model-driven pruning and archive passes
-- stronger interruption recovery
-- local runner backends beyond direct process execution
-- optional global indexing across projects
-- import/export and subgraph packaging
-
-### Invariant for all later stages
-
-No future layer should invalidate the MVP spine:
-
-- DAG canonical
-- frontier derived
-- project-local store
-- layered node model
-- warning-heavy schema validation
-- cheap off-path writes
-- atomic core-path closure
+- `experiment.history`
+- `artifact.record`
+- `artifact.list`
+- `artifact.read`
+- `artifact.update`
+- `artifact.history`
+- `metric.define`
+- `metric.keys`
+- `metric.best`
+- `run.dimension.define`
+- `run.dimension.list`
+
+## Explicitly Deferred
+
+Still out of scope:
+
+- remote runners
+- hosted multi-user control planes
+- broad artifact ingestion
+- reading artifact bodies through Spinner
+- giant auto-generated context dumps
+- replacing git or reconstructing git inside the ledger
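Bootstrapping the per-project state root described in the new Storage section can be sketched in a few lines. This is an illustrative sketch only: the `project.json` fields and the single table created here are assumptions for the example, not the real on-disk contract written by the Rust CLI.

```python
import json
import sqlite3
from pathlib import Path

def init_state_root(project_root: str, name: str) -> Path:
    """Create the private state root: `.fidget_spinner/` holding
    project.json and the SQLite store. Idempotent: re-running against
    an already-initialized project leaves existing state untouched."""
    root = Path(project_root) / ".fidget_spinner"
    root.mkdir(parents=True, exist_ok=True)
    manifest = root / "project.json"
    if not manifest.exists():
        # Hypothetical manifest fields; the real contract may differ.
        manifest.write_text(json.dumps({"name": name, "schema_version": 1},
                                       indent=2))
    db = sqlite3.connect(root / "state.sqlite")
    db.execute("CREATE TABLE IF NOT EXISTS frontiers "
               "(id INTEGER PRIMARY KEY, brief TEXT)")
    db.commit()
    db.close()
    return root
```

Because everything lives under one directory, the promised properties fall out directly: migrations stay local, and backup is a copy of `.fidget_spinner/`.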