Diffstat (limited to 'docs')
-rw-r--r--  docs/architecture.md    | 578
-rw-r--r--  docs/libgrid-dogfood.md | 199
-rw-r--r--  docs/product-spec.md    | 401
3 files changed, 326 insertions, 852 deletions
diff --git a/docs/architecture.md b/docs/architecture.md
index e274ad5..2882c72 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -1,9 +1,8 @@
# Fidget Spinner Architecture
-## Current Shape
+## Runtime Shape
-The current MVP implementation is intentionally narrower than the eventual full
-product:
+The current runtime is intentionally simple and hardened:
```text
agent host
@@ -22,21 +21,19 @@ spinner MCP host
+-- disposable MCP worker
| |
| +-- per-project SQLite store
- | +-- per-project blob directory
- | +-- git/worktree introspection
- | +-- atomic experiment closure
+ | +-- frontier / hypothesis / experiment / artifact services
+ | +-- navigator projections
|
v
<project root>/.fidget_spinner/
```
-There is no long-lived daemon yet. The first usable slice runs MCP from the CLI
-binary, but it already follows the hardened host/worker split required for
-long-lived sessions and safe replay behavior.
+There is no long-lived daemon yet. The CLI binary owns the stdio host and the
+local navigator.
## Package Boundary
-The package currently contains three coupled layers:
+The package contains three coupled crates:
- `fidget-spinner-core`
- `fidget-spinner-store-sqlite`
@@ -47,7 +44,7 @@ And two bundled agent assets:
- `assets/codex-skills/fidget-spinner/SKILL.md`
- `assets/codex-skills/frontier-loop/SKILL.md`
-Those parts should be treated as one release unit.
+These are one release unit.
## Storage Topology
@@ -56,524 +53,161 @@ Every initialized project owns a private state root:
```text
<project root>/.fidget_spinner/
project.json
- schema.json
state.sqlite
- blobs/
```
Why this shape:
-- schema freedom stays per project
- migrations stay local
- backup and portability stay simple
-- we avoid premature pressure toward a single global schema
+- no global store is required
+- git remains the code substrate instead of being mirrored into Spinner
-Cross-project search can come later as an additive index.
+## Canonical Types
-## State Layers
+### Frontier
-### 1. Global engine spine
+Frontier is a scope and grounding object, not a graph vertex.
-The engine depends on a stable, typed spine stored in SQLite:
+It owns:
-- nodes
-- node annotations
-- node edges
-- frontiers
-- runs
-- metrics
-- experiments
-- event log
-
-This layer powers traversal, indexing, archiving, and frontier projection.
-
-### 2. Project payload layer
-
-Each node stores a project payload as JSON, namespaced and versioned by the
-project schema in `.fidget_spinner/schema.json`.
-
-This is where domain-specific richness lives.
-
-Project field specs may optionally declare a light-touch `value_type` of:
-
-- `string`
-- `numeric`
-- `boolean`
-- `timestamp`
-
-These are intentionally soft hints for validation and rendering, not rigid
-engine-schema commitments.
-
-### 3. Annotation sidecar
-
-Annotations are stored separately from payload and are default-hidden unless
-explicitly surfaced.
-
-That separation is important. It prevents free-form scratch text from silently
-mutating into a shadow schema.
-
-## Validation Model
-
-Validation has three tiers.
-
-### Storage validity
-
-Hard-fail conditions:
-
-- malformed engine envelope
-- broken ids
-- invalid enum values
-- broken relational integrity
-
-### Semantic quality
-
-Project field expectations are warning-heavy:
-
-- missing recommended fields emit diagnostics
-- missing projection-gated fields remain storable
-- mistyped typed fields emit diagnostics
-- ingest usually succeeds
-
-### Operational eligibility
-
-Specific actions may refuse incomplete records.
-
-Examples:
-
-- core-path experiment closure requires complete run/result/note/verdict state
-- future promotion helpers may require a projection-ready hypothesis payload
+- label
+- objective
+- status
+- brief
-## SQLite Schema
+And it partitions hypotheses and experiments.
-### `nodes`
+### Hypothesis
-Stores the global node envelope:
+Hypothesis is a true graph vertex. It carries:
-- id
-- class
-- track
-- frontier id
-- archived flag
- title
- summary
-- schema namespace
-- schema version
-- payload JSON
-- diagnostics JSON
-- agent session id
-- timestamps
-
-### `node_annotations`
-
-Stores sidecar free-form annotations:
-
-- annotation id
-- owning node id
-- visibility
-- optional label
-- body
-- created timestamp
-
-### `node_edges`
-
-Stores typed DAG edges:
-
-- source node id
-- target node id
-- edge kind
+- exactly one paragraph of body
+- tags
+- influence parents
-The current edge kinds are enough for the MVP:
+### Experiment
-- `lineage`
-- `evidence`
-- `comparison`
-- `supersedes`
-- `annotation`
+Experiment is also a true graph vertex. It carries:
-### `frontiers`
-
-Stores derived operational frontier records:
-
-- frontier id
-- label
-- root contract node id
+- one mandatory owning hypothesis
+- optional influence parents
+- title
+- summary
+- tags
- status
-- timestamps
-
-Important constraint:
-
-- the root contract node itself also carries the same frontier id
-
-That keeps frontier filtering honest.
-
-### `runs`
+- outcome when closed
-Stores run envelopes:
+The outcome contains:
-- run id
-- run node id
-- frontier id
- backend
-- status
-- run dimensions
- command envelope
-- started and finished timestamps
-
-### `metrics`
-
-Stores primary and supporting run metrics:
-
-- run id
-- metric key
-- value
-- unit
-- optimization objective
-
-### `experiments`
-
-Stores the atomic closure object for core-path work:
-
-- experiment id
-- frontier id
-- hypothesis node id
-- run node id and run id
-- optional analysis node id
-- decision node id
-- title
-- summary
+- run dimensions
+- primary metric
+- supporting metrics
- verdict
-- note payload
-- created timestamp
-
-This table is the enforcement layer for frontier discipline.
-
-### `events`
-
-Stores durable audit events:
-
-- event id
-- entity kind
-- entity id
-- event kind
-- payload
-- created timestamp
-
-## Core Types
-
-### Node classes
-
-Core path:
-
-- `contract`
-- `hypothesis`
-- `run`
-- `analysis`
-- `decision`
-
-Off path:
-
-- `source`
-- `note`
-
-### Node tracks
-
-- `core_path`
-- `off_path`
-
-Track is derived from class, not operator whim.
-
-### Frontier projection
-
-The frontier projection currently exposes:
-
-- frontier record
-- open experiment count
-- completed experiment count
-- verdict counts
-
-This projection is derived from canonical state and intentionally rebuildable.
-
-## Write Surfaces
-
-### Low-ceremony off-path writes
-
-These are intentionally cheap:
-
-- `note.quick`, but only with explicit tags from the repo-local registry
-- `source.record`, optionally tagged into the same repo-local taxonomy
-- generic `node.create` for escape-hatch use
-- `node.annotate`
-
-### Low-ceremony core-path entry
-
-`hypothesis.record` exists to capture intent before worktree state becomes muddy.
-
-### Atomic core-path closure
-
-`experiment.close` is the important write path.
-
-It persists, in one transaction:
-
-- run node
-- run record
-- decision node
-- experiment record
-- lineage and evidence edges
-- frontier touch and verdict accounting inputs
-
-That atomic boundary is the answer to the ceremony/atomicity pre-mortem.
+- rationale
+- optional analysis
-## MCP Surface
+### Artifact
-The MVP MCP server is stdio-only and follows newline-delimited JSON-RPC message
-framing. The public server is a stable host. It owns initialization state,
-replay policy, telemetry, and host rollout. Execution happens in a disposable
-worker subprocess.
+Artifact is metadata plus a locator for an external thing. It attaches to
+frontiers, hypotheses, and experiments. Spinner never reads or stores the
+artifact body.
-Presentation is orthogonal to payload detail:
+## Graph Semantics
-- `render=porcelain|json`
-- `detail=concise|full`
+Two relations matter:
-Porcelain is the terse model-facing surface, not a pretty-printed JSON dump.
+### Ownership
-### Host responsibilities
+Every experiment has exactly one owning hypothesis.
-- own the public JSON-RPC session
-- enforce initialize-before-use
-- classify tools and resources by replay contract
-- retry only explicitly safe operations after retryable worker faults
-- expose health and telemetry
-- re-exec the host binary while preserving initialization seed and counters
+This is the canonical tree spine.
-### Worker responsibilities
+### Influence
-- open the per-project store
-- execute tool logic and resource reads
-- return typed success or typed fault records
-- remain disposable without losing canonical state
+Hypotheses and experiments may both cite earlier hypotheses or experiments as
+influence parents.
-## Minimal Navigator
+This is the sparse DAG over the canonical tree.
-The CLI also exposes a minimal localhost navigator through `ui serve`.
+The product should make the ownership spine easy to read and the influence
+network available without flooding the hot path.
-Current shape:
+## SQLite Shape
-- left rail of repo-local tags
-- single linear node feed in reverse chronological order
-- full entry rendering in the main pane
-- lightweight hyperlinking for text fields
-- typed field badges for `string`, `numeric`, `boolean`, and `timestamp`
+The store is normalized around the new ontology:
-This is intentionally not a full DAG canvas. It is a text-first operator window
-over the canonical store.
+- `frontiers`
+- `frontier_briefs`
+- `hypotheses`
+- `experiments`
+- `vertex_influences`
+- `artifacts`
+- `artifact_attachments`
+- `metric_definitions`
+- `run_dimension_definitions`
+- `experiment_metrics`
+- `events`
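A minimal sketch of how the ownership spine and influence network could be normalized over these tables. Column names such as `owning_hypothesis_id`, `source_id`, and `target_id` are illustrative assumptions, not the actual schema:

```python
import sqlite3

# Hypothetical column names; the real schema is not shown in this doc.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE frontiers (
    frontier_id TEXT PRIMARY KEY,
    label       TEXT NOT NULL,
    status      TEXT NOT NULL
);
CREATE TABLE hypotheses (
    hypothesis_id TEXT PRIMARY KEY,
    frontier_id   TEXT NOT NULL REFERENCES frontiers(frontier_id),
    title         TEXT NOT NULL,
    body          TEXT NOT NULL  -- exactly one paragraph by convention
);
CREATE TABLE experiments (
    experiment_id        TEXT PRIMARY KEY,
    owning_hypothesis_id TEXT NOT NULL REFERENCES hypotheses(hypothesis_id),
    title                TEXT NOT NULL,
    status               TEXT NOT NULL CHECK (status IN ('open', 'closed'))
);
-- Sparse influence DAG layered over the canonical ownership tree.
CREATE TABLE vertex_influences (
    source_id TEXT NOT NULL,  -- citing hypothesis or experiment
    target_id TEXT NOT NULL,  -- cited hypothesis or experiment
    PRIMARY KEY (source_id, target_id)
);
""")

conn.execute("INSERT INTO frontiers VALUES ('f1', 'lp-spend', 'active')")
conn.execute("INSERT INTO hypotheses VALUES "
             "('h1', 'f1', 'warm starts help', 'One paragraph.')")
conn.execute("INSERT INTO experiments VALUES "
             "('e1', 'h1', 'warm start on suite A', 'open')")
conn.execute("INSERT INTO vertex_influences VALUES ('e1', 'h1')")

# Every experiment resolves to exactly one owning hypothesis.
row = conn.execute("""
    SELECT h.title FROM experiments e
    JOIN hypotheses h ON h.hypothesis_id = e.owning_hypothesis_id
    WHERE e.experiment_id = 'e1'
""").fetchone()
print(row[0])
```

The mandatory `owning_hypothesis_id` foreign key keeps the tree spine canonical, while `vertex_influences` stays a separate, optional relation.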
-## Binding Bootstrap
+The important boundary is this:
-`project.bind` may bootstrap a project store when the requested target root is
-an existing empty directory.
+- hypotheses and experiments are the scientific ledger
+- artifacts are reference sidecars
+- frontier projections are derived
-That is intentionally narrow:
+## Presentation Model
-- empty root: initialize and bind
-- non-empty uninitialized root: fail
-- existing store anywhere above the requested path: bind to that discovered root
+The system is designed to be hostile to accidental context burn.
-### Fault model
+`frontier.open` is the only sanctioned overview dump. It should be enough to
+answer:
-Faults are typed by:
+- where the frontier stands
+- which tags are active
+- which metrics are live
+- which hypotheses are active
+- which experiments are open
-- kind: `invalid_input`, `not_initialized`, `transient`, `internal`
-- stage: `host`, `worker`, `store`, `transport`, `protocol`, `rollout`
+Everything after that should require deliberate traversal:
-Those faults are surfaced both as JSON-RPC errors and as structured tool
-errors, depending on call type.
+- `hypothesis.read`
+- `experiment.read`
+- `artifact.read`
-### Replay contracts
+Artifact reads stay metadata-only by design.
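One possible shape of the bounded `frontier.open` result and the deliberate follow-up read; every field name here is an illustrative assumption, not the actual wire format:

```python
# Hypothetical payload for the one sanctioned overview dump.
frontier_open = {
    "frontier": {"label": "lp-spend", "status": "active", "brief": "short text"},
    "active_tags": ["lp", "warm-start"],
    "live_metrics": ["wall_clock_s", "lp_calls"],
    "active_hypotheses": [{"id": "h1", "title": "warm starts help"}],
    "open_experiments": [{"id": "e1", "title": "warm start on suite A"}],
}

# Everything deeper is a deliberate, single-selector traversal step.
detail_request = {
    "tool": "hypothesis.read",
    "id": frontier_open["active_hypotheses"][0]["id"],
}
print(detail_request["id"])
```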
-The tool catalog explicitly marks each operation as one of:
+## Replay Model
-- `safe_replay`
-- `never_replay`
+The MCP host owns:
-Current policy:
+- the public JSON-RPC session
+- initialize-before-use semantics
+- replay contracts
+- health and telemetry
+- host rollout
-- reads such as `project.status`, `project.schema`, `tag.list`, `frontier.list`,
- `frontier.status`, `node.list`, `node.read`, `skill.list`, `skill.show`, and
- resource reads
- are safe to replay once after a retryable worker fault
-- mutating tools such as `tag.add`, `frontier.init`, `node.create`, `hypothesis.record`,
- `node.annotate`, `node.archive`, `note.quick`, `source.record`, and
- `experiment.close` are never auto-replayed
+The worker owns:
-This is the hardening answer to side-effect safety.
+- project-store access
+- tool execution
+- typed success and fault results
-Implemented server features:
+Reads and safe operational surfaces may be replayed after retryable worker
+faults. Mutating operations are never auto-replayed unless they are explicitly
+designed to be safe.
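A sketch of the host-side retry rule under these contracts. The contract names come from the catalog above; the catalog literal, helper names, and fault type here are illustrative assumptions:

```python
class RetryableWorkerFault(Exception):
    """Stands in for a typed retryable worker fault (illustrative)."""

# Hypothetical catalog excerpt: each operation carries an explicit contract.
REPLAY_CONTRACTS = {
    "frontier.open": "safe_replay",
    "hypothesis.read": "safe_replay",
    "experiment.read": "safe_replay",
    "artifact.read": "safe_replay",
    "experiment.close": "never_replay",  # mutating: never auto-replayed
}

def handle(tool, execute):
    """Run a tool once; replay exactly once only if its contract allows it."""
    try:
        return execute(tool)
    except RetryableWorkerFault:
        if REPLAY_CONTRACTS.get(tool, "never_replay") == "safe_replay":
            return execute(tool)  # one replay after a retryable worker fault
        raise  # mutating tools surface the fault instead of replaying

# Usage: a worker that dies once, then recovers.
calls = {"n": 0}
def flaky(tool):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RetryableWorkerFault("worker died")
    return {"ok": True, "tool": tool}

result = handle("frontier.open", flaky)
print(result)
```

Unknown tools default to `never_replay`, which keeps the policy fail-safe for side effects.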
-- tools
-- resources
+## Navigator
-### Tools
+The local navigator mirrors the same philosophy:
-Implemented tools:
+- root page lists frontiers
+- frontier page is the only overview page
+- hypothesis and experiment pages are detail reads
+- artifacts are discoverable but never expanded into body dumps
-- `system.health`
-- `system.telemetry`
-- `project.bind`
-- `project.status`
-- `project.schema`
-- `schema.field.upsert`
-- `schema.field.remove`
-- `tag.add`
-- `tag.list`
-- `frontier.list`
-- `frontier.status`
-- `frontier.init`
-- `node.create`
-- `hypothesis.record`
-- `node.list`
-- `node.read`
-- `node.annotate`
-- `node.archive`
-- `note.quick`
-- `source.record`
-- `metric.define`
-- `metric.keys`
-- `metric.best`
-- `metric.migrate`
-- `run.dimension.define`
-- `run.dimension.list`
-- `experiment.close`
-- `skill.list`
-- `skill.show`
-
-### Resources
-
-Implemented resources:
-
-- `fidget-spinner://project/config`
-- `fidget-spinner://project/schema`
-- `fidget-spinner://skill/fidget-spinner`
-- `fidget-spinner://skill/frontier-loop`
-
-### Operational tools
-
-`system.health` returns a typed operational snapshot. Concise/default output
-stays on immediate session state; full detail widens to the entire health
-object:
-
-- initialization state
-- binding state
-- worker generation and liveness
-- current executable path
-- launch-path stability
-- rollout-pending state
-- last recorded fault in full detail
-
-`system.telemetry` returns cumulative counters:
-
-- requests
-- successes
-- errors
-- retries
-- worker restarts
-- host rollouts
-- last recorded fault
-- per-operation counts and last latencies
-
-### Rollout model
-
-The host fingerprints its executable at startup. If the binary changes on disk,
-or if a rollout is explicitly requested, the host re-execs itself after sending
-the current response. The re-exec carries forward:
-
-- initialization seed
-- project binding
-- telemetry counters
-- request id sequence
-- worker generation
-- one-shot rollout and crash-test markers
-
-This keeps the public session stable while still allowing hot binary replacement.
-
-## CLI Surface
-
-The CLI remains thin and operational.
-
-Current commands:
-
-- `init`
-- `schema show`
-- `schema upsert-field`
-- `schema remove-field`
-- `frontier init`
-- `frontier status`
-- `node add`
-- `node list`
-- `node show`
-- `node annotate`
-- `node archive`
-- `note quick`
-- `tag add`
-- `tag list`
-- `source add`
-- `metric define`
-- `metric keys`
-- `metric best`
-- `metric migrate`
-- `dimension define`
-- `dimension list`
-- `experiment close`
-- `mcp serve`
-- `ui serve`
-- hidden internal `mcp worker`
-- `skill list`
-- `skill install`
-- `skill show`
-
-The CLI is not the strategic write plane, but it is the easiest repair and
-bootstrap surface. Its naming is intentionally parallel but not identical to
-the MCP surface:
-
-- CLI subcommands use spaces such as `schema upsert-field` and `dimension define`
-- MCP tools use dotted names such as `schema.field.upsert` and `run.dimension.define`
-
-## Bundled Skill
-
-The bundled `fidget-spinner` and `frontier-loop` skills should
-be treated as part of the product, not stray prompts.
-
-Their job is to teach agents:
-
-- DAG first
-- schema first
-- cheap off-path pushes
-- disciplined core-path closure
-- archive rather than delete
-- and, for the frontier-loop specialization, how to run an indefinite push
-
-The asset lives in-tree so it can drift only via an explicit code change.
-
-## Full-Product Trajectory
-
-The full product should add, not replace, the MVP implementation.
-
-Planned next layers:
-
-- `spinnerd` as a long-lived local daemon
-- HTTP and SSE
-- read-mostly local UI
-- runner orchestration beyond direct process execution
-- interruption recovery and resumable long loops
-- archive and pruning passes
-- optional cross-project indexing
-
-The invariant for that future work is strict:
-
-- keep the DAG canonical
-- keep frontier state derived
-- keep project payloads local and flexible
-- keep off-path writes cheap
-- keep core-path closure atomic
-- keep host-owned replay contracts explicit and auditable
+The UI should help a model or operator walk the graph conservatively, not tempt
+it into giant all-history feeds.
diff --git a/docs/libgrid-dogfood.md b/docs/libgrid-dogfood.md
index 206c4d7..9d81993 100644
--- a/docs/libgrid-dogfood.md
+++ b/docs/libgrid-dogfood.md
@@ -6,26 +6,19 @@
failure mode Fidget Spinner is designed to kill:
- long autonomous optimization loops
-- heavy worktree usage
-- benchmark-driven decisions
-- huge markdown logs that blur evidence, narrative, and verdicts
+- heavy benchmark slicing
+- worktree churn
+- huge markdown logs that blur intervention, result, and verdict
That is the proving ground.
-## Immediate MVP Goal
+## Immediate Goal
-The MVP does not need to solve all of `libgrid`.
+The goal is not “ingest every scrap of prose.”
-It needs to solve this specific problem:
-
-replace the giant freeform experiment log with a machine in which the active
-frontier, the accepted lines, the live evidence, and the dead ends are all
-explicit and queryable.
-
-When using a global unbound MCP session from a `libgrid` worktree, the first
-project-local action should be `project.bind` against the `libgrid` worktree
-root or any nested path inside it. The session should not assume the MCP host's
-own repo.
+The goal is to replace the giant freeform experiment log with a machine in
+which the active frontier, live hypotheses, current experiments, verdicts, and
+best benchmark lines are explicit and queryable.
## Mapping Libgrid Work Into The Model
@@ -33,163 +26,101 @@ own repo.
One optimization objective becomes one frontier:
-- improve MILP solve quality
-- reduce wall-clock time
-- reduce LP pressure
-- improve node throughput
-- improve best-bound quality
-
-### Contract node
-
-The root contract should state:
-
-- objective in plain language
-- benchmark suite set
-- primary metric
-- supporting metrics
-- promotion criteria
-
-### Change node
-
-Use `hypothesis.record` to capture:
-
-- what hypothesis is being tested
-- what benchmark suite matters
-- any terse sketch of the intended delta
-
-### Run node
-
-The run node should capture:
-
-- exact command
-- cwd
-- backend kind
-- run dimensions
-- resulting metrics
+- root cash-out
+- LP spend reduction
+- primal improvement
+- search throughput
+- cut pipeline quality
-### Decision node
+The frontier brief should answer where the campaign stands right now, not dump
+historical narrative.
-The decision should make the verdict explicit:
+### Hypothesis
-- accepted
-- kept
-- parked
-- rejected
+A hypothesis should capture one concrete intervention claim:
-### Off-path nodes
+- terse title
+- one-line summary
+- one-paragraph body
-Use these freely:
+If the body wants to become a design memo, it is too large.
-- `source` for ideas, external references, algorithm sketches
-- `source` for scaffolding that is not yet a benchmarked experiment
-- `note` for quick observations
+### Experiment
-This is how the system avoids forcing every useful thought into experiment
-closure.
+Each measured slice becomes one experiment under exactly one hypothesis.
-## Suggested Libgrid Project Schema
+The experiment closes with:
-The `libgrid` project should eventually define richer payload conventions in
-`.fidget_spinner/schema.json`.
-
-The MVP does not need hard rejection. It does need meaningful warnings.
-
-Good first project fields:
+- dimensions such as `instance`, `profile`, `duration_s`
+- primary metric
+- supporting metrics
+- verdict: `accepted | kept | parked | rejected`
+- rationale
+- optional analysis
-- `hypothesis` on `hypothesis`
-- `benchmark_suite` on `hypothesis` and `run`
-- `body` on `hypothesis`, `source`, and `note`
-- `comparison_claim` on `analysis`
-- `rationale` on `decision`
+If a tranche doc reports multiple benchmark slices, it should become multiple
+experiments, not one prose blob.
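The slicing rule above can be sketched as a simple mapping; the tranche data, field names, and `slice_to_experiment` helper are all hypothetical, not real libgrid records:

```python
# Hypothetical tranche report: several benchmark slices measured under one
# hypothesis. Instances and metric values are illustrative only.
tranche_slices = [
    {"instance": "suite_a", "profile": "default", "wall_clock_s": 41.2},
    {"instance": "suite_b", "profile": "default", "wall_clock_s": 88.7},
]

def slice_to_experiment(hypothesis_id, s):
    """One measured slice becomes one experiment record, not a prose blob."""
    return {
        "owning_hypothesis": hypothesis_id,
        "dimensions": {"instance": s["instance"], "profile": s["profile"]},
        "primary_metric": {"key": "wall_clock_s", "value": s["wall_clock_s"]},
    }

experiments = [slice_to_experiment("h1", s) for s in tranche_slices]
print(len(experiments))
```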
-Good first metric vocabulary:
+### Artifact
-- `wall_clock_s`
-- `solved_instance_count`
-- `nodes_expanded`
-- `best_bound_delta`
-- `lp_calls`
-- `memory_bytes`
+Historical markdown, logs, tables, and other large dumps should be attached as
+artifacts by reference when they matter. They should not live in the ledger as
+default-enumerated prose.
-## Libgrid MVP Workflow
+## Libgrid Workflow
-### 1. Seed the frontier
+### 1. Ground
-1. Initialize the project store.
-2. Create a frontier contract.
+1. Bind the MCP to the libgrid worktree.
+2. Read `frontier.open`.
+3. Decide whether the next move is a new hypothesis, a new experiment on an
+ existing hypothesis, or a frontier brief update.
### 2. Start a line of attack
-1. Read the current frontier and the recent DAG tail.
-2. Record a `hypothesis`.
-3. If needed, attach off-path `source` or `note` nodes first.
+1. Record a hypothesis.
+2. Attach any necessary artifacts by reference.
+3. Open one experiment for the concrete slice being tested.
-### 3. Execute one experiment
+### 3. Execute
1. Modify the worktree.
2. Run the benchmark protocol.
-3. Close the experiment atomically.
+3. Close the experiment atomically with parsed metrics and an explicit verdict.
### 4. Judge and continue
-1. Mark the line accepted, kept, parked, or rejected.
-2. Archive dead ends instead of leaving them noisy and active.
-3. Repeat.
+1. Use `accepted`, `kept`, `parked`, and `rejected` honestly.
+2. Let the frontier brief summarize the current strategic state.
+3. Let historical tranche markdown live as artifacts when preservation matters.
## Benchmark Discipline
-For `libgrid`, the benchmark evidence needs to be structurally trustworthy.
-
-The MVP should always preserve at least:
+For `libgrid`, the minimum trustworthy record is:
- run dimensions
- primary metric
-- supporting metrics
-- command envelope
-
-This is the minimum needed to prevent "I think this was faster" folklore.
-
-## What The MVP Can Defer
-
-These are useful but not required for the first real dogfood loop:
-
-- strong markdown migration
-- multi-agent coordination
-- rich artifact bundling
-- pruning or vacuum passes beyond archive
-- UI-heavy analysis
-
-The right sequence is:
-
-1. start a clean front
-2. run new work through Fidget Spinner
-3. backfill old markdown only when it is worth the effort
-
-## Repo-Local Dogfood Before Libgrid
+- supporting metrics that materially explain the verdict
+- rationale
-This repository itself is a valid off-path dogfood target even though it is not
-a benchmark-heavy repo.
+This is the minimum needed to prevent “I think this was faster” folklore.
-That means we can already use it to test:
+## Active Metric Discipline
-- project initialization
-- schema visibility
-- frontier creation and status projection
-- off-path source recording
-- hidden annotations
-- MCP read and write flows
+`libgrid` will accumulate many niche metrics.
-What it cannot honestly test is heavy benchmark ingestion and the retrieval
-pressure that comes with it. That still belongs in a real optimization corpus
-such as the `libgrid` worktree.
+The hot path should care about live metrics only: the metrics touched by the
+active experimental frontier and its immediate comparison set. Old, situational
+metrics may remain in the registry without dominating `frontier.open`.
-## Acceptance Bar For Libgrid
+## Acceptance Bar
Fidget Spinner is ready for serious `libgrid` use when:
-- an agent can run for hours without generating a giant markdown graveyard
-- the operator can identify accepted, kept, parked, and rejected lines mechanically
-- each completed experiment has result, note, and verdict
-- off-path side investigations stay preserved but do not pollute the core path
+- an agent can run for hours without generating a markdown graveyard
+- `frontier.open` gives a truthful, bounded orientation surface
+- active hypotheses and open experiments are obvious
+- closed experiments carry parsed metrics rather than prose-only results
+- artifacts preserve source texture without flooding the hot path
- the system feels like a machine for evidence rather than a diary with better
typography
diff --git a/docs/product-spec.md b/docs/product-spec.md
index 85561ad..ce881c6 100644
--- a/docs/product-spec.md
+++ b/docs/product-spec.md
@@ -2,341 +2,250 @@
## Thesis
-Fidget Spinner is a local-first, agent-first frontier machine for autonomous
-program optimization, source capture, and experiment adjudication.
+Fidget Spinner is a local-first, agent-first frontier ledger for autonomous
+optimization work.
-The immediate target is brutally practical: replace gigantic freeform
-experiment markdown with a machine that preserves evidence as structure.
+It is not a notebook. It is not a generic DAG memory. It is not an inner
+platform for git. It is a hard experimental spine whose job is to preserve
+scientific truth with enough structure that agents can resume work without
+reconstructing everything from prose.
The package is deliberately two things at once:
-- a local MCP-backed DAG substrate
-- bundled skills that teach agents how to drive that substrate
+- a local MCP-backed frontier ledger
+- bundled skills that teach agents how to drive that ledger
-Those two halves should be versioned together and treated as one product.
+Those two halves are one product and should be versioned together.
## Product Position
-This is not a hosted lab notebook.
+This is a machine for long-running frontier work in local repos.
-This is not a cloud compute marketplace.
+Humans and agents should be able to answer:
-This is not a collaboration shell with experiments bolted on.
+- what frontier is active
+- which hypotheses are live
+- which experiments are still open
+- what the latest accepted, kept, parked, and rejected outcomes are
+- which metrics matter right now
-This is a local machine for indefinite frontier pushes, with agents as primary
-writers and humans as auditors, reviewers, and occasional editors.
+without opening a markdown graveyard.
## Non-Goals
These are explicitly out of scope for the core product:
-- OAuth
- hosted identity
- cloud tenancy
-- billing, credits, and subscriptions
-- managed provider brokerage
+- billing or credits
- chat as the system of record
- mandatory remote control planes
- replacing git
+- storing or rendering large artifact bodies
-Git remains the code substrate. Fidget Spinner is the evidence substrate.
+Git remains the code substrate. Fidget Spinner is the experimental ledger.
## Locked Design Decisions
-These are the load-bearing decisions to hold fixed through the MVP push.
+### 1. The ledger is austere
-### 1. The DAG is canonical truth
+The only freeform overview surface is the frontier brief, read through
+`frontier.open`.
-The canonical record is the DAG plus its normalized supporting tables.
+Everything else should require deliberate traversal one selector at a time.
+Slow is better than burning tokens on giant feeds.
-Frontier state is not a rival authority. It is a derived, rebuildable
-projection over the DAG and related run/experiment records.
+### 2. The ontology is small
-### 2. Storage is per-project
+The canonical object families are:
-Each project owns its own local store under:
-
-```text
-<project root>/.fidget_spinner/
- state.sqlite
- project.json
- schema.json
- blobs/
-```
-
-There is no mandatory global database in the MVP.
-
-### 3. Node structure is layered
-
-Every node has three layers:
-
-- a hard global envelope for indexing and traversal
-- a project-local structured payload
-- free-form sidecar annotations as an escape hatch
-
-The engine only hard-depends on the envelope. Project payloads remain flexible.
-
-### 4. Validation is warning-heavy
-
-Engine integrity is hard-validated.
-
-Project semantics are diagnostically validated.
-
-Workflow eligibility is action-gated.
-
-In other words:
-
-- bad engine state is rejected
-- incomplete project payloads are usually admitted with diagnostics
-- projections and frontier actions may refuse incomplete nodes later
+- `frontier`
+- `hypothesis`
+- `experiment`
+- `artifact`
-### 5. Core-path and off-path work must diverge
+There are no canonical `note` or `source` ledger nodes.
-Core-path work is disciplined and atomic.
+### 3. Frontier is scope, not a graph vertex
-Off-path work is cheap and permissive.
+A frontier is a named scope and grounding object. It owns:
-The point is to avoid forcing every scrap of source digestion or note-taking through the full
-benchmark/decision bureaucracy while still preserving it in the DAG.
+- objective
+- status
+- brief
-### 6. Completed core-path experiments are atomic
+And it partitions hypotheses and experiments.
-A completed experiment exists only when all of these exist together:
+### 4. Hypothesis and experiment are the true graph vertices
-- measured result
-- terse note
-- explicit verdict
+A hypothesis is a terse intervention claim.
-The write surface should make that one atomic mutation, not a loose sequence of
-low-level calls.
+An experiment is a stateful scientific record. Every experiment has:
-## Node Model
+- one mandatory owning hypothesis
+- optional influence parents drawn from hypotheses or experiments
-### Global envelope
+This gives the product a canonical tree spine plus a sparse influence network.
-The hard spine should be stable across projects. It includes at least:
+### 5. Artifacts are references only
-- node id
-- node class
-- node track
-- frontier id if any
-- archived flag
-- title
-- summary
-- schema namespace and version
-- timestamps
-- diagnostics
-- hidden or visible annotations
+Artifacts are metadata plus locators for external material:
-This is the engine layer: the part that powers indexing, traversal, archiving,
-default enumeration, and model-facing summaries.
+- files
+- links
+- logs
+- tables
+- plots
+- dumps
+- bibliographies
-### Project-local payload
+Spinner never reads artifact bodies. If a wall of text matters, attach it as an
+artifact and summarize the operational truth elsewhere.
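A sketch of what an artifact record might carry; the field names and locator path are illustrative assumptions:

```python
# Hypothetical artifact record: metadata plus a locator, never the body.
artifact = {
    "artifact_id": "a1",
    "kind": "log",
    "label": "tranche benchmark log",
    "locator": "file://bench/logs/tranche.log",  # hypothetical external path
    "attached_to": ["h1", "e1"],  # frontier, hypothesis, or experiment ids
}

# A metadata-only read surfaces this record as-is; the body stays external
# and is never fetched, stored, or rendered by Spinner.
assert "body" not in artifact
print(artifact["locator"])
```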
-Every project may define richer payload fields in:
+### 6. Experiment closure is atomic
-`<project root>/.fidget_spinner/schema.json`
+A closed experiment exists only when all of these exist together:
-That file is a model-facing contract. It defines field names and soft
-validation tiers without forcing global schema churn.
+- dimensions
+- primary metric
+- verdict
+- rationale
+- optional supporting metrics
+- optional analysis
-Per-field settings should express at least:
+Closing an experiment is one atomic mutation, not a loose pile of lower-level
+writes.
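The atomicity requirement can be sketched with a single transaction; table and column names here are illustrative assumptions, not the actual store schema:

```python
import sqlite3

# Either the full closed record lands, or nothing does.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE experiments (
    experiment_id TEXT PRIMARY KEY, status TEXT NOT NULL,
    verdict TEXT, rationale TEXT)""")
conn.execute("""CREATE TABLE experiment_metrics (
    experiment_id TEXT, key TEXT, value REAL, role TEXT)""")
conn.execute("INSERT INTO experiments VALUES ('e1', 'open', NULL, NULL)")

def close_experiment(conn, exp_id, verdict, rationale, primary_metric):
    if verdict not in ("accepted", "kept", "parked", "rejected"):
        raise ValueError("explicit verdict required")
    with conn:  # one transaction: status, verdict, rationale, metric together
        conn.execute(
            "UPDATE experiments SET status='closed', verdict=?, rationale=? "
            "WHERE experiment_id=? AND status='open'",
            (verdict, rationale, exp_id))
        key, value = primary_metric
        conn.execute(
            "INSERT INTO experiment_metrics VALUES (?, ?, ?, 'primary')",
            (exp_id, key, value))

close_experiment(conn, "e1", "accepted", "faster on suite A",
                 ("wall_clock_s", 41.2))
row = conn.execute("SELECT status, verdict FROM experiments").fetchone()
print(row)
```

The `with conn:` block commits on success and rolls back on any exception, so a half-closed experiment can never be observed.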
-- presence: `required`, `recommended`, `optional`
-- severity: `error`, `warning`, `info`
-- role: `index`, `projection_gate`, `render_only`, `opaque`
-- inference policy: whether the model may infer the field
+### 7. Live metrics are derived
-These settings are advisory at ingest time and stricter at projection/action
-time.
+The hot-path metric surface is not “all metrics that have ever existed.”
-### Free-form annotations
+The hot-path metric surface is the derived live set for the active frontier.
+That set should stay small, frontier-relevant, and queryable.
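Deriving the live set can be sketched as a projection over open experiments; the data and function name are illustrative assumptions:

```python
# The full registry keeps every metric ever defined.
metric_registry = ["wall_clock_s", "lp_calls", "nodes_expanded", "memory_bytes"]

# Hypothetical open experiments, each tagged with its frontier.
open_experiments = [
    {"frontier": "f1", "metrics": ["wall_clock_s", "lp_calls"]},
    {"frontier": "f1", "metrics": ["wall_clock_s"]},
    {"frontier": "f2", "metrics": ["memory_bytes"]},  # different frontier
]

def live_metrics(active_frontier, experiments):
    """Derive the hot-path metric set from the active frontier only."""
    keys = set()
    for e in experiments:
        if e["frontier"] == active_frontier:
            keys.update(e["metrics"])
    return sorted(keys)

live = live_metrics("f1", open_experiments)
print(live)
```

The derived set stays a strict subset of the registry, so old situational metrics persist without dominating the overview.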
-Any node may carry free-form annotations.
+## Canonical Data Model
-These are explicitly sidecar, not primary payload. They are:
+### Frontier
-- allowed everywhere
-- hidden from default enumeration
-- useful as a scratchpad or escape hatch
-- not allowed to become the only home of critical operational truth
+Frontier is a scope/partition object with one mutable brief.
-If a fact matters to automation, comparison, or promotion, it must migrate into
-the spine or project payload.
+The brief is the sanctioned grounding object. It should stay short and answer:
-## Node Taxonomy
+- situation
+- roadmap
+- unknowns
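+
+A purely illustrative brief, kept to those three answers:
+
+```text
+Situation: tokenizer swap landed; val_loss has plateaued at 0.20.
+Roadmap: sweep the lr schedule, then revisit the data mix.
+Unknowns: whether the plateau is optimizer-bound or data-bound.
+```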
-### Core-path node classes
+### Hypothesis
-These are the disciplined frontier-loop classes:
+A hypothesis is a disciplined claim:
-- `contract`
-- `hypothesis`
-- `run`
-- `analysis`
-- `decision`
+- title
+- summary
+- exactly one paragraph of body
+- tags
+- influence parents
-### Off-path node classes
+It is not a design doc and not a catch-all prose bucket.
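+
+A minimal `hypothesis.record` payload might look like this (field names are
+assumed from the list above, not fixed by this spec):
+
+```json
+{
+  "title": "Cosine lr schedule beats constant lr",
+  "summary": "Cosine decay should cut val_loss at equal budget.",
+  "body": "One paragraph of reasoning, no more.",
+  "tags": ["lr-sweep"],
+  "influence_parents": ["hyp-003"]
+}
+```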
-These are deliberately low-ceremony:
+### Experiment
-- `source`
-- `note`
+An experiment is a stateful object:
-They exist so the product can absorb real thinking instead of forcing users and
-agents back into sprawling markdown.
+- open while the work is live
+- closed when the result is in
-## Frontier Model
+A closed experiment stores:
-The frontier is a derived operational view over the canonical DAG.
+- dimensions
+- primary metric
+- supporting metrics
+- verdict: `accepted | kept | parked | rejected`
+- rationale
+- optional analysis
+- attached artifacts
-It answers:
+### Artifact
-- what objective is active
-- how many experiments are open
-- how many experiments are completed
-- how the verdict mix currently breaks down
+Artifacts preserve external material by reference. They are deliberately off the
+token hot path. Artifact metadata should be enough to discover the thing; the
+body lives elsewhere.
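+
+A hypothetical `artifact.record` call, metadata only (field names assumed):
+
+```json
+{
+  "kind": "log",
+  "title": "run 042 training log",
+  "ref": "file://runs/042/train.log",
+  "tags": ["lr-sweep"]
+}
+```
+
+Only this metadata travels through MCP; the log body stays on disk.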
-The DAG answers:
+## Token Discipline
-- what changed
-- what ran
-- what evidence was collected
-- what was concluded
-- what dead ends and side investigations exist
+`frontier.open` is the only sanctioned overview dump. It should return:
-That split is deliberate. It prevents "frontier state" from turning into a
-second unofficial database.
+- frontier brief
+- active tags
+- live metric keys
+- active hypotheses with deduped current state
+- open experiments
-## First Usable MVP
+After that, the model should walk explicitly:
-The first usable MVP is the first cut that can already replace a meaningful
-slice of the markdown habit without pretending the whole full-product vision is
-done.
+- `hypothesis.read`
+- `experiment.read`
+- `artifact.read`
-### MVP deliverables
+No broad list surface should dump large prose. Artifact bodies are never in the
+MCP path.
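+
+To make the shape concrete, an illustrative `frontier.open` response (keys
+assumed) could be:
+
+```json
+{
+  "brief": { "situation": "...", "roadmap": "...", "unknowns": "..." },
+  "active_tags": ["lr-sweep"],
+  "live_metric_keys": ["val_loss"],
+  "hypotheses": [{ "id": "hyp-007", "state": "active" }],
+  "open_experiments": [{ "id": "exp-042", "title": "cosine lr sweep" }]
+}
+```
+
+Anything larger is pulled on demand through the per-object `read` tools.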
-- per-project `.fidget_spinner/` state
-- local SQLite backing store
-- local blob directory
-- typed Rust core model
-- optional light-touch project field types: `string`, `numeric`, `boolean`, `timestamp`
-- thin CLI for bootstrap and repair
-- hardened stdio MCP host exposed from the CLI
-- minimal read-only web navigator with tag filtering and linear node rendering
-- disposable MCP worker execution runtime
-- bundled `fidget-spinner` base skill
-- bundled `frontier-loop` skill
-- low-ceremony off-path note and source recording
-- explicit experiment open/close lifecycle for the core path
+## Storage
-### Explicitly deferred from the MVP
+Every project owns a private state root:
-- long-lived `spinnerd`
-- web UI
-- remote runners
-- multi-agent hardening
-- aggressive pruning and vacuuming
-- strong markdown migration tooling
-- cross-project indexing
+```text
+<project root>/.fidget_spinner/
+ project.json
+ state.sqlite
+```
-### MVP model-facing surface
+There is no required global database.
-The model-facing surface is a local MCP server oriented around frontier work.
+## MVP Surface
-The initial tools should be:
+The current model-facing surface is:
- `system.health`
- `system.telemetry`
- `project.bind`
- `project.status`
-- `project.schema`
- `tag.add`
- `tag.list`
+- `frontier.create`
- `frontier.list`
-- `frontier.status`
-- `frontier.init`
-- `node.create`
+- `frontier.read`
+- `frontier.open`
+- `frontier.brief.update`
+- `frontier.history`
- `hypothesis.record`
-- `node.list`
-- `node.read`
-- `node.annotate`
-- `node.archive`
-- `note.quick`
-- `source.record`
+- `hypothesis.list`
+- `hypothesis.read`
+- `hypothesis.update`
+- `hypothesis.history`
- `experiment.open`
- `experiment.list`
- `experiment.read`
+- `experiment.update`
- `experiment.close`
-- `skill.list`
-- `skill.show`
-
-The important point is not the exact names. The important point is the shape:
-
-- cheap read access to project and frontier context
-- cheap off-path writes
-- low-ceremony hypothesis capture
-- one explicit experiment-open step plus one experiment-close step
-- explicit operational introspection for long-lived agent sessions
-- explicit replay boundaries so side effects are never duplicated by accident
-
-### MVP skill posture
-
-The bundled skills should instruct agents to:
-
-1. inspect `system.health` first
-2. bind the MCP session to the target project before project-local reads or writes
-3. read project schema, tag registry, and frontier state
-4. pull context from the DAG instead of giant prose dumps
-5. use `note.quick` and `source.record` freely off path, but always pass an explicit tag list for notes
-6. use `hypothesis.record` before worktree thrash becomes ambiguous
-7. use `experiment.open` before running a live hypothesis-owned line
-8. use `experiment.close` to seal that line with measured evidence
-9. archive detritus instead of deleting it
-10. use the base `fidget-spinner` skill for ordinary DAG work and add
- `frontier-loop` only when the task becomes a true autonomous frontier push
-
-### MVP acceptance bar
-
-The MVP is successful when:
-
-- a project can be initialized locally with no hosted dependencies
-- an agent can inspect frontier state through MCP
-- an agent can inspect MCP health and telemetry through MCP
-- an agent can record off-path sources and notes without bureaucratic pain
-- the project schema can softly declare whether payload fields are strings, numbers, booleans, or timestamps
-- an operator can inspect recent nodes through a minimal localhost web navigator filtered by tag
-- a project can close a real core-path experiment atomically
-- retryable worker faults do not duplicate side effects
-- stale nodes can be archived instead of polluting normal enumeration
-- a human can answer "what was tried, what ran, what was accepted or parked,
- and why?" without doing markdown archaeology
-
-## Full Product
-
-The full product grows outward from the MVP rather than replacing it.
-
-### Planned additions
-
-- `spinnerd` as a long-lived local daemon
-- local HTTP and SSE
-- read-mostly graph and run inspection UI
-- richer artifact handling
-- model-driven pruning and archive passes
-- stronger interruption recovery
-- local runner backends beyond direct process execution
-- optional global indexing across projects
-- import/export and subgraph packaging
-
-### Invariant for all later stages
-
-No future layer should invalidate the MVP spine:
-
-- DAG canonical
-- frontier derived
-- project-local store
-- layered node model
-- warning-heavy schema validation
-- cheap off-path writes
-- atomic core-path closure
+- `experiment.history`
+- `artifact.record`
+- `artifact.list`
+- `artifact.read`
+- `artifact.update`
+- `artifact.history`
+- `metric.define`
+- `metric.keys`
+- `metric.best`
+- `run.dimension.define`
+- `run.dimension.list`
+
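+A typical core-path session, sketched as a call sequence over the tools above:
+
+```text
+system.health -> project.bind -> frontier.open
+  -> hypothesis.record -> experiment.open
+  -> ... do the work ...
+  -> experiment.close
+```
+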
+## Explicitly Deferred
+
+Still out of scope:
+
+- remote runners
+- hosted multi-user control planes
+- broad artifact ingestion
+- reading artifact bodies through Spinner
+- giant auto-generated context dumps
+- replacing git or reconstructing git inside the ledger