Diffstat (limited to 'docs')
| -rw-r--r-- | docs/architecture.md | 527 |
| -rw-r--r-- | docs/libgrid-dogfood.md | 202 |
| -rw-r--r-- | docs/product-spec.md | 341 |
3 files changed, 1070 insertions, 0 deletions
diff --git a/docs/architecture.md b/docs/architecture.md
new file mode 100644
index 0000000..acab8fe
--- /dev/null
+++ b/docs/architecture.md
@@ -0,0 +1,527 @@
+# Fidget Spinner Architecture
+
+## Current Shape
+
+The current MVP implementation is intentionally narrower than the eventual full
+product:
+
+```text
+agent host
+  |
+  | bundled fidget-spinner skills + stdio MCP
+  v
+spinner MCP host
+  |
+  +-- public JSON-RPC transport
+  +-- session seed capture and restore
+  +-- explicit project binding
+  +-- tool catalog and replay contracts
+  +-- health and telemetry
+  +-- hot rollout / re-exec
+  |
+  +-- disposable MCP worker
+  |   |
+  |   +-- per-project SQLite store
+  |   +-- per-project blob directory
+  |   +-- git/worktree introspection
+  |   +-- atomic experiment closure
+  |
+  v
+<project root>/.fidget_spinner/
+```
+
+There is no long-lived daemon yet. The first usable slice runs MCP from the CLI
+binary, but it already follows the hardened host/worker split required for
+long-lived sessions and safe replay behavior.
+
+## Package Boundary
+
+The package currently contains three coupled layers:
+
+- `fidget-spinner-core`
+- `fidget-spinner-store-sqlite`
+- `fidget-spinner-cli`
+
+And two bundled agent assets:
+
+- `assets/codex-skills/fidget-spinner/SKILL.md`
+- `assets/codex-skills/frontier-loop/SKILL.md`
+
+Those parts should be treated as one release unit.
+
+## Storage Topology
+
+Every initialized project owns a private state root:
+
+```text
+<project root>/.fidget_spinner/
+  project.json
+  schema.json
+  state.sqlite
+  blobs/
+```
+
+Why this shape:
+
+- schema freedom stays per project
+- migrations stay local
+- backup and portability stay simple
+- we avoid premature pressure toward a single global schema
+
+Cross-project search can come later as an additive index.
+
+## State Layers
+
+### 1. Global engine spine
+
+The engine depends on a stable, typed spine stored in SQLite:
+
+- nodes
+- node annotations
+- node edges
+- frontiers
+- checkpoints
+- runs
+- metrics
+- experiments
+- event log
+
+This layer powers traversal, indexing, archiving, and frontier projection.
+
+### 2. Project payload layer
+
+Each node stores a project payload as JSON, namespaced and versioned by the
+project schema in `.fidget_spinner/schema.json`.
+
+This is where domain-specific richness lives.
+
+### 3. Annotation sidecar
+
+Annotations are stored separately from payload and are default-hidden unless
+explicitly surfaced.
+
+That separation is important. It prevents free-form scratch text from silently
+mutating into a shadow schema.
+
+## Validation Model
+
+Validation has three tiers.
+
+### Storage validity
+
+Hard-fail conditions:
+
+- malformed engine envelope
+- broken ids
+- invalid enum values
+- broken relational integrity
+
+### Semantic quality
+
+Project field expectations are warning-heavy:
+
+- missing recommended fields emit diagnostics
+- missing projection-gated fields remain storable
+- ingest usually succeeds
+
+### Operational eligibility
+
+Specific actions may refuse incomplete records.
+
+Examples:
+
+- core-path experiment closure requires complete run/result/note/verdict state
+- future promotion helpers may require a projection-ready change payload
+
+## SQLite Schema
+
+### `nodes`
+
+Stores the global node envelope:
+
+- id
+- class
+- track
+- frontier id
+- archived flag
+- title
+- summary
+- schema namespace
+- schema version
+- payload JSON
+- diagnostics JSON
+- agent session id
+- timestamps
+
+### `node_annotations`
+
+Stores sidecar free-form annotations:
+
+- annotation id
+- owning node id
+- visibility
+- optional label
+- body
+- created timestamp
+
+### `node_edges`
+
+Stores typed DAG edges:
+
+- source node id
+- target node id
+- edge kind
+
+The current edge kinds are enough for the MVP:
+
+- `lineage`
+- `evidence`
+- `comparison`
+- `supersedes`
+- `annotation`
+
+### `frontiers`
+
+Stores derived operational frontier records:
+
+- frontier id
+- label
+- root contract node id
+- status
+- timestamps
+
+Important constraint:
+
+- the root contract node itself also carries the same frontier id
+
+That keeps frontier filtering honest.
+
+### `checkpoints`
+
+Stores committed candidate or champion checkpoints:
+
+- checkpoint id
+- frontier id
+- anchoring node id
+- repo/worktree metadata
+- commit hash
+- disposition
+- summary
+- created timestamp
+
+In the current codebase, a frontier may temporarily exist without a champion if
+it was initialized outside a git repo. Core-path experimentation is only fully
+available once git-backed checkpoints exist.
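Invalid enum values are a storage-validity hard failure, so the edge-kind column of `node_edges` is a natural place to parse strictly. Below is a minimal sketch. It assumes the kinds are stored as the lowercase strings listed above. `EdgeKind` and `InvalidEdgeKind` are hypothetical names, not the crate's actual types.

```rust
use std::str::FromStr;

/// Edge kinds from the `node_edges` table (hypothetical Rust mirror).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EdgeKind {
    Lineage,
    Evidence,
    Comparison,
    Supersedes,
    Annotation,
}

/// Raised when a stored edge kind is not one of the known values.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidEdgeKind(pub String);

impl EdgeKind {
    pub const ALL: [EdgeKind; 5] = [
        EdgeKind::Lineage,
        EdgeKind::Evidence,
        EdgeKind::Comparison,
        EdgeKind::Supersedes,
        EdgeKind::Annotation,
    ];

    /// Text form written to SQLite (assumed to match the documented names).
    pub fn as_str(self) -> &'static str {
        match self {
            EdgeKind::Lineage => "lineage",
            EdgeKind::Evidence => "evidence",
            EdgeKind::Comparison => "comparison",
            EdgeKind::Supersedes => "supersedes",
            EdgeKind::Annotation => "annotation",
        }
    }
}

impl FromStr for EdgeKind {
    type Err = InvalidEdgeKind;

    /// Strict parse: anything unknown is a hard failure, never a fallback.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        EdgeKind::ALL
            .iter()
            .copied()
            .find(|kind| kind.as_str() == s)
            .ok_or_else(|| InvalidEdgeKind(s.to_string()))
    }
}
```

The parse also rejects case variants such as `Lineage`. That keeps the column's vocabulary closed instead of silently normalizing it.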
+
+### `runs`
+
+Stores run envelopes:
+
+- run id
+- run node id
+- frontier id
+- backend
+- status
+- code snapshot metadata
+- benchmark suite
+- command envelope
+- started and finished timestamps
+
+### `metrics`
+
+Stores primary and supporting run metrics:
+
+- run id
+- metric key
+- value
+- unit
+- optimization objective
+
+### `experiments`
+
+Stores the atomic closure object for core-path work:
+
+- experiment id
+- frontier id
+- base checkpoint id
+- candidate checkpoint id
+- change node id
+- run node id and run id
+- optional analysis node id
+- decision node id
+- verdict
+- note payload
+- created timestamp
+
+This table is the enforcement layer for frontier discipline.
+
+### `events`
+
+Stores durable audit events:
+
+- event id
+- entity kind
+- entity id
+- event kind
+- payload
+- created timestamp
+
+## Core Types
+
+### Node classes
+
+Core path:
+
+- `contract`
+- `change`
+- `run`
+- `analysis`
+- `decision`
+
+Off path:
+
+- `research`
+- `enabling`
+- `note`
+
+### Node tracks
+
+- `core_path`
+- `off_path`
+
+Track is derived from class, not operator whim.
+
+### Frontier projection
+
+The frontier projection currently exposes:
+
+- frontier record
+- champion checkpoint id
+- active candidate checkpoint ids
+- experiment count
+
+This projection is derived from canonical state and intentionally rebuildable.
+
+## Write Surfaces
+
+### Low-ceremony off-path writes
+
+These are intentionally cheap:
+
+- `note.quick`
+- `research.record`
+- generic `node.create` for escape-hatch use
+- `node.annotate`
+
+### Low-ceremony core-path entry
+
+`change.record` exists to capture intent before worktree state becomes muddy.
+
+### Atomic core-path closure
+
+`experiment.close` is the important write path.
+
+It persists, in one transaction:
+
+- run node
+- run record
+- candidate checkpoint
+- decision node
+- experiment record
+- lineage and evidence edges
+- frontier touch and champion demotion when needed
+
+That atomic boundary is the answer to the ceremony/atomicity pre-mortem.
+
+## MCP Surface
+
+The MVP MCP server is stdio-only and follows newline-delimited JSON-RPC message
+framing. The public server is a stable host. It owns initialization state,
+replay policy, telemetry, and host rollout. Execution happens in a disposable
+worker subprocess.
+
+### Host responsibilities
+
+- own the public JSON-RPC session
+- enforce initialize-before-use
+- classify tools and resources by replay contract
+- retry only explicitly safe operations after retryable worker faults
+- expose health and telemetry
+- re-exec the host binary while preserving initialization seed and counters
+
+### Worker responsibilities
+
+- open the per-project store
+- execute tool logic and resource reads
+- return typed success or typed fault records
+- remain disposable without losing canonical state
+
+### Fault model
+
+Faults are typed by:
+
+- kind: `invalid_input`, `not_initialized`, `transient`, `internal`
+- stage: `host`, `worker`, `store`, `transport`, `protocol`, `rollout`
+
+Those faults are surfaced either as JSON-RPC errors or as structured tool
+errors, depending on call type.
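When the fault model is combined with the replay contracts described in the next section, the host's retry decision reduces to a small pure function. Below is a minimal sketch with hypothetical type and function names. Two behaviors are assumptions rather than documented policy. First, only `transient` faults raised at the `worker` stage count as retryable. Second, tools the policy does not list, such as `project.bind`, default to `never_replay` as a conservative choice.

```rust
/// Fault kinds and stages from the fault model (hypothetical Rust mirror).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FaultKind { InvalidInput, NotInitialized, Transient, Internal }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FaultStage { Host, Worker, Store, Transport, Protocol, Rollout }

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Fault {
    pub kind: FaultKind,
    pub stage: FaultStage,
    pub message: String,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReplayContract { SafeReplay, NeverReplay }

impl Fault {
    /// Assumption: only transient worker faults are worth a retry.
    pub fn is_retryable(&self) -> bool {
        self.kind == FaultKind::Transient && self.stage == FaultStage::Worker
    }
}

/// Read-only tools listed as safe to replay; every other tool is never replayed.
pub fn contract_for(tool: &str) -> ReplayContract {
    const SAFE: [&str; 8] = [
        "project.status", "project.schema", "frontier.list", "frontier.status",
        "node.list", "node.read", "skill.list", "skill.show",
    ];
    if SAFE.contains(&tool) {
        ReplayContract::SafeReplay
    } else {
        ReplayContract::NeverReplay
    }
}

/// Replay at most once, and only for safe operations after a retryable fault.
pub fn should_replay(contract: ReplayContract, fault: &Fault, retries_so_far: u32) -> bool {
    contract == ReplayContract::SafeReplay && fault.is_retryable() && retries_so_far == 0
}
```

Resource reads, which the policy also marks as safe, would need their own lookup keyed by URI. The design point is that a mutating call such as `experiment.close` can never reach the replay branch, whatever the fault.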
+
+### Replay contracts
+
+The tool catalog explicitly marks each operation as one of:
+
+- `safe_replay`
+- `never_replay`
+
+Current policy:
+
+- reads such as `project.status`, `project.schema`, `frontier.list`,
+  `frontier.status`, `node.list`, `node.read`, `skill.list`, `skill.show`, and
+  resource reads
+  are safe to replay once after a retryable worker fault
+- mutating tools such as `frontier.init`, `node.create`, `change.record`,
+  `node.annotate`, `node.archive`, `note.quick`, `research.record`, and
+  `experiment.close` are never auto-replayed
+
+This is the hardening answer to side-effect safety.
+
+Implemented server features:
+
+- tools
+- resources
+
+### Tools
+
+Implemented tools:
+
+- `system.health`
+- `system.telemetry`
+- `project.bind`
+- `project.status`
+- `project.schema`
+- `frontier.list`
+- `frontier.status`
+- `frontier.init`
+- `node.create`
+- `change.record`
+- `node.list`
+- `node.read`
+- `node.annotate`
+- `node.archive`
+- `note.quick`
+- `research.record`
+- `experiment.close`
+- `skill.list`
+- `skill.show`
+
+### Resources
+
+Implemented resources:
+
+- `fidget-spinner://project/config`
+- `fidget-spinner://project/schema`
+- `fidget-spinner://skill/fidget-spinner`
+- `fidget-spinner://skill/frontier-loop`
+
+### Operational tools
+
+`system.health` returns a typed operational snapshot:
+
+- initialization state
+- binding state
+- worker generation and liveness
+- current executable path
+- launch-path stability
+- rollout-pending state
+- last recorded fault
+
+`system.telemetry` returns cumulative counters:
+
+- requests
+- successes
+- errors
+- retries
+- worker restarts
+- host rollouts
+- per-operation counts and last latencies
+
+### Rollout model
+
+The host fingerprints its executable at startup. If the binary changes on disk,
+or if a rollout is explicitly requested, the host re-execs itself after sending the current response.
+The re-exec carries forward:
+
+- initialization seed
+- project binding
+- telemetry counters
+- request id sequence
+- worker generation
+- one-shot rollout and crash-test markers
+
+This keeps the public session stable while still allowing hot binary replacement.
+
+## CLI Surface
+
+The CLI remains thin and operational.
+
+Current commands:
+
+- `init`
+- `schema show`
+- `frontier init`
+- `frontier status`
+- `node add`
+- `node list`
+- `node show`
+- `node annotate`
+- `node archive`
+- `note quick`
+- `research add`
+- `experiment close`
+- `mcp serve`
+- hidden internal `mcp worker`
+- `skill list`
+- `skill install`
+- `skill show`
+
+The CLI is not the strategic write plane, but it is the easiest repair and
+bootstrap surface.
+
+## Bundled Skills
+
+The bundled `fidget-spinner` and `frontier-loop` skills should
+be treated as part of the product, not stray prompts.
+
+Their job is to teach agents:
+
+- DAG first
+- schema first
+- cheap off-path pushes
+- disciplined core-path closure
+- archive rather than delete
+- and, for the frontier-loop specialization, how to run an indefinite push
+
+The assets live in-tree so they can drift only via an explicit code change.
+
+## Full-Product Trajectory
+
+The full product should add, not replace, the MVP implementation.
+
+Planned next layers:
+
+- `spinnerd` as a long-lived local daemon
+- HTTP and SSE
+- read-mostly local UI
+- runner orchestration beyond direct process execution
+- interruption recovery and resumable long loops
+- archive and pruning passes
+- optional cross-project indexing
+
+The invariant for that future work is strict:
+
+- keep the DAG canonical
+- keep frontier state derived
+- keep project payloads local and flexible
+- keep off-path writes cheap
+- keep core-path closure atomic
+- keep host-owned replay contracts explicit and auditable
diff --git a/docs/libgrid-dogfood.md b/docs/libgrid-dogfood.md
new file mode 100644
index 0000000..5e13e51
--- /dev/null
+++ b/docs/libgrid-dogfood.md
@@ -0,0 +1,202 @@
+# Libgrid Dogfood Plan
+
+## Why Libgrid
+
+`libgrid` is the right first serious dogfood target because it has exactly the
+failure mode Fidget Spinner is designed to kill:
+
+- long autonomous optimization loops
+- heavy worktree usage
+- benchmark-driven decisions
+- huge markdown logs that blur evidence, narrative, and verdicts
+
+That is the proving ground.
+
+## Immediate MVP Goal
+
+The MVP does not need to solve all of `libgrid`.
+
+It needs to solve this specific problem:
+
+replace the giant freeform experiment log with a machine in which the active
+frontier, the current champion, the candidate evidence, and the dead ends are
+all explicit and queryable.
+
+When using a global unbound MCP session from a `libgrid` worktree, the first
+project-local action should be `project.bind` against the `libgrid` worktree
+root or any nested path inside it. The session should not assume the MCP host's
+own repo.
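Binding from any nested path implies a search for the project root. Below is a minimal sketch. It assumes `project.bind` resolves a path by walking up to the nearest directory that contains a `.fidget_spinner/` state root; the real implementation may also consult git. `resolve_project_root` is a hypothetical name.

```rust
use std::path::{Path, PathBuf};

/// Per-project state directory name, as documented in the storage topology.
const STATE_DIR: &str = ".fidget_spinner";

/// Returns the closest ancestor of `start` (including `start` itself) that
/// owns a `.fidget_spinner/` directory, or `None` if no ancestor does.
pub fn resolve_project_root(start: &Path) -> Option<PathBuf> {
    start
        .ancestors()
        .find(|dir| dir.join(STATE_DIR).is_dir())
        .map(Path::to_path_buf)
}
```

`Path::ancestors` is purely lexical, so a caller would likely canonicalize `start` first. That way, symlinked worktrees resolve to the same root.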
+
+## Mapping Libgrid Work Into The Model
+
+### Frontier
+
+One optimization objective becomes one frontier:
+
+- improve MILP solve quality
+- reduce wall-clock time
+- reduce LP pressure
+- improve node throughput
+- improve best-bound quality
+
+### Contract node
+
+The root contract should state:
+
+- objective in plain language
+- benchmark suite set
+- primary metric
+- supporting metrics
+- promotion criteria
+
+### Change node
+
+Use `change.record` to capture:
+
+- what hypothesis is being tested
+- what base checkpoint it starts from
+- what benchmark suite matters
+- any terse sketch of the intended delta
+
+### Run node
+
+The run node should capture:
+
+- exact command
+- cwd
+- backend kind
+- benchmark suite
+- code snapshot
+- resulting metrics
+
+### Decision node
+
+The decision should make the verdict explicit:
+
+- promote to champion
+- keep on frontier
+- revert to champion
+- archive dead end
+- needs more evidence
+
+### Off-path nodes
+
+Use these freely:
+
+- `research` for ideas, external references, algorithm sketches
+- `enabling` for scaffolding that is not yet a benchmarked experiment
+- `note` for quick observations
+
+This is how the system avoids forcing every useful thought into experiment
+closure.
+
+## Suggested Libgrid Project Schema
+
+The `libgrid` project should eventually define richer payload conventions in
+`.fidget_spinner/schema.json`.
+
+The MVP does not need hard rejection. It does need meaningful warnings.
+
+Good first project fields:
+
+- `hypothesis` on `change`
+- `base_checkpoint_id` on `change`
+- `benchmark_suite` on `change` and `run`
+- `body` on `change`, `research`, and `note`
+- `comparison_claim` on `analysis`
+- `rationale` on `decision`
+
+Good first metric vocabulary:
+
+- `wall_clock_s`
+- `solved_instance_count`
+- `nodes_expanded`
+- `best_bound_delta`
+- `lp_calls`
+- `memory_bytes`
+
+## Libgrid MVP Workflow
+
+### 1. Seed the frontier
+
+1. Initialize the project store.
+2. Create a frontier contract.
+3. Capture the incumbent git checkpoint if available.
+
+### 2. Start a line of attack
+
+1. Read the current frontier and the recent DAG tail.
+2. Record a `change`.
+3. If needed, attach off-path `research` or `note` nodes first.
+
+### 3. Execute one experiment
+
+1. Modify the worktree.
+2. Commit the candidate checkpoint.
+3. Run the benchmark protocol.
+4. Close the experiment atomically.
+
+### 4. Judge and continue
+
+1. Promote the checkpoint or keep it alive.
+2. Archive dead ends instead of leaving them noisy and active.
+3. Repeat.
+
+## Benchmark Discipline
+
+For `libgrid`, the benchmark evidence needs to be structurally trustworthy.
+
+The MVP should always preserve at least:
+
+- benchmark suite identity
+- primary metric
+- supporting metrics
+- command envelope
+- host/worktree metadata
+- git commit identity
+
+This is the minimum needed to prevent "I think this was faster" folklore.
+
+## What The MVP Can Defer
+
+These are useful but not required for the first real dogfood loop:
+
+- strong markdown migration
+- multi-agent coordination
+- rich artifact bundling
+- pruning or vacuum passes beyond archive
+- UI-heavy analysis
+
+The right sequence is:
+
+1. start a clean front
+2. run new work through Fidget Spinner
+3. backfill old markdown only when it is worth the effort
+
+## Repo-Local Dogfood Before Libgrid
+
+This repository itself is a valid off-path dogfood target even though it is not
+currently a git repo.
+
+That means we can already use it to test:
+
+- project initialization
+- schema visibility
+- frontier creation without a champion
+- off-path research recording
+- hidden annotations
+- MCP read and write flows
+
+What it cannot honestly test is full git-backed core-path experiment closure.
+That still belongs in a real repo such as the `libgrid` worktree.
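A promotion test could turn "I think this was faster" into a mechanical check. It would compare a candidate's primary metric against the champion's, using the optimization objective stored in the `metrics` table. Below is a minimal sketch with hypothetical names `Objective` and `beats`. The relative-gain threshold is an assumption that stands in for whatever promotion criteria the contract node actually states.

```rust
/// Direction stored alongside each metric (hypothetical mirror of the
/// `metrics` table's optimization objective column).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Objective {
    Minimize,
    Maximize,
}

/// True when `candidate` improves on `champion` by at least `min_rel_gain`
/// (0.02 means 2%) in the metric's preferred direction.
pub fn beats(objective: Objective, candidate: f64, champion: f64, min_rel_gain: f64) -> bool {
    if !candidate.is_finite() || !champion.is_finite() {
        return false; // never promote on NaN or infinite measurements
    }
    let gain = match objective {
        Objective::Minimize => champion - candidate,
        Objective::Maximize => candidate - champion,
    };
    if champion == 0.0 {
        // A relative gain is undefined against zero; require a strict improvement.
        return gain > 0.0;
    }
    gain / champion.abs() >= min_rel_gain
}
```

For example, `wall_clock_s` would be compared with `Objective::Minimize` and `solved_instance_count` with `Objective::Maximize`.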
+
+## Acceptance Bar For Libgrid
+
+Fidget Spinner is ready for serious `libgrid` use when:
+
+- an agent can run for hours without generating a giant markdown graveyard
+- the operator can identify the champion checkpoint mechanically
+- each completed experiment has checkpoint, result, note, and verdict
+- off-path side investigations stay preserved but do not pollute the core path
+- the system feels like a machine for evidence rather than a diary with better
+  typography
diff --git a/docs/product-spec.md b/docs/product-spec.md
new file mode 100644
index 0000000..8ab6210
--- /dev/null
+++ b/docs/product-spec.md
@@ -0,0 +1,341 @@
+# Fidget Spinner Product Spec
+
+## Thesis
+
+Fidget Spinner is a local-first, agent-first frontier machine for autonomous
+program optimization and research.
+
+The immediate target is brutally practical: replace gigantic freeform
+experiment markdown with a machine that preserves evidence as structure.
+
+The package is deliberately two things at once:
+
+- a local MCP-backed DAG substrate
+- bundled skills that teach agents how to drive that substrate
+
+Those two halves should be versioned together and treated as one product.
+
+## Product Position
+
+This is not a hosted lab notebook.
+
+This is not a cloud compute marketplace.
+
+This is not a collaboration shell with experiments bolted on.
+
+This is a local machine for indefinite frontier pushes, with agents as primary
+writers and humans as auditors, reviewers, and occasional editors.
+
+## Non-Goals
+
+These are explicitly out of scope for the core product:
+
+- OAuth
+- hosted identity
+- cloud tenancy
+- billing, credits, and subscriptions
+- managed provider brokerage
+- chat as the system of record
+- mandatory remote control planes
+- replacing git
+
+Git remains the code substrate. Fidget Spinner is the evidence substrate.
+
+## Locked Design Decisions
+
+These are the load-bearing decisions to hold fixed through the MVP push.
+
+### 1. The DAG is canonical truth
+
+The canonical record is the DAG plus its normalized supporting tables.
+
+Frontier state is not a rival authority. It is a derived, rebuildable
+projection over the DAG and related run/checkpoint/experiment records.
+
+### 2. Storage is per-project
+
+Each project owns its own local store under:
+
+```text
+<project root>/.fidget_spinner/
+  state.sqlite
+  project.json
+  schema.json
+  blobs/
+```
+
+There is no mandatory global database in the MVP.
+
+### 3. Node structure is layered
+
+Every node has three layers:
+
+- a hard global envelope for indexing and traversal
+- a project-local structured payload
+- free-form sidecar annotations as an escape hatch
+
+The engine only hard-depends on the envelope. Project payloads remain flexible.
+
+### 4. Validation is warning-heavy
+
+Engine integrity is hard-validated.
+
+Project semantics are diagnostically validated.
+
+Workflow eligibility is action-gated.
+
+In other words:
+
+- bad engine state is rejected
+- incomplete project payloads are usually admitted with diagnostics
+- projections and frontier actions may refuse incomplete nodes later
+
+### 5. Core-path and off-path work must diverge
+
+Core-path work is disciplined and atomic.
+
+Off-path work is cheap and permissive.
+
+The point is to avoid forcing every scrap of research through the full
+benchmark/decision bureaucracy while still preserving it in the DAG.
+
+### 6. Completed core-path experiments are atomic
+
+A completed experiment exists only when all of these exist together:
+
+- base checkpoint
+- candidate checkpoint
+- measured result
+- terse note
+- explicit verdict
+
+The write surface should make that one atomic mutation, not a loose sequence of
+low-level calls.
+
+### 7. Checkpoints are git-backed
+
+Dirty worktree snapshots are useful as descriptive context, but a completed
+core-path experiment should anchor to a committed candidate checkpoint.
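Decisions 6 and 7 together define when a core-path experiment may be sealed. Below is a minimal sketch of that all-or-nothing precondition. It assumes a request whose parts stay optional until they are supplied. `CloseRequest`, `seal`, and the `committed` flag are hypothetical. The sketch only checks completeness. The real `experiment.close` also writes every record in one SQLite transaction, which this sketch does not model.

```rust
/// A checkpoint reference; `committed` is false for a dirty worktree snapshot.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Checkpoint {
    pub commit: String,
    pub committed: bool,
}

/// Parts gathered before closure; any of them may still be missing.
#[derive(Debug, Default)]
pub struct CloseRequest {
    pub base: Option<Checkpoint>,
    pub candidate: Option<Checkpoint>,
    pub result: Option<f64>,
    pub note: Option<String>,
    pub verdict: Option<String>,
}

/// Exists only when every part is present.
#[derive(Debug, Clone, PartialEq)]
pub struct CompletedExperiment {
    pub base: Checkpoint,
    pub candidate: Checkpoint,
    pub result: f64,
    pub note: String,
    pub verdict: String,
}

/// Seals the experiment, or reports every missing part at once.
pub fn seal(req: CloseRequest) -> Result<CompletedExperiment, Vec<&'static str>> {
    let mut missing = Vec::new();
    if req.base.is_none() {
        missing.push("base checkpoint");
    }
    match &req.candidate {
        None => missing.push("candidate checkpoint"),
        Some(c) if !c.committed => missing.push("committed candidate checkpoint"),
        Some(_) => {}
    }
    if req.result.is_none() {
        missing.push("measured result");
    }
    if req.note.as_deref().map_or(true, |n| n.trim().is_empty()) {
        missing.push("terse note");
    }
    if req.verdict.is_none() {
        missing.push("explicit verdict");
    }
    match (req.base, req.candidate, req.result, req.note, req.verdict) {
        (Some(base), Some(candidate), Some(result), Some(note), Some(verdict))
            if missing.is_empty() =>
        {
            Ok(CompletedExperiment { base, candidate, result, note, verdict })
        }
        _ => Err(missing),
    }
}
```

Reporting every missing part in one error, rather than failing on the first, lets an agent repair the whole request in a single pass.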
+
+Off-path notes and research can remain lightweight and non-committal.
+
+## Node Model
+
+### Global envelope
+
+The hard spine should be stable across projects. It includes at least:
+
+- node id
+- node class
+- node track
+- frontier id if any
+- archived flag
+- title
+- summary
+- schema namespace and version
+- timestamps
+- diagnostics
+- hidden or visible annotations
+
+This is the engine layer: the part that powers indexing, traversal, archiving,
+default enumeration, and model-facing summaries.
+
+### Project-local payload
+
+Every project may define richer payload fields in:
+
+`<project root>/.fidget_spinner/schema.json`
+
+That file is a model-facing contract. It defines field names and soft
+validation tiers without forcing global schema churn.
+
+Per-field settings should express at least:
+
+- presence: `required`, `recommended`, `optional`
+- severity: `error`, `warning`, `info`
+- role: `index`, `projection_gate`, `render_only`, `opaque`
+- inference policy: whether the model may infer the field
+
+These settings are advisory at ingest time and stricter at projection/action
+time.
+
+### Free-form annotations
+
+Any node may carry free-form annotations.
+
+These are explicitly sidecar, not primary payload. They are:
+
+- allowed everywhere
+- hidden from default enumeration
+- useful as a scratchpad or escape hatch
+- not allowed to become the only home of critical operational truth
+
+If a fact matters to automation, comparison, or promotion, it must migrate into
+the spine or project payload.
+
+## Node Taxonomy
+
+### Core-path node classes
+
+These are the disciplined frontier-loop classes:
+
+- `contract`
+- `change`
+- `run`
+- `analysis`
+- `decision`
+
+### Off-path node classes
+
+These are deliberately low-ceremony:
+
+- `research`
+- `enabling`
+- `note`
+
+They exist so the product can absorb real thinking instead of forcing users and
+agents back into sprawling markdown.
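The per-field settings under "Project-local payload" imply two different checks. At ingest, missing fields only produce diagnostics. At projection and action time, every `projection_gate` field is required. Below is a minimal sketch with hypothetical type and function names. For illustration, it assumes the payload is flattened to string keys and values; the real payload is JSON. It also omits the inference policy.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Presence { Required, Recommended, Optional }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity { Error, Warning, Info }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role { Index, ProjectionGate, RenderOnly, Opaque }

/// One field entry from a project's schema.json (hypothetical shape).
#[derive(Debug, Clone, Copy)]
pub struct FieldSpec {
    pub name: &'static str,
    pub presence: Presence,
    pub severity: Severity,
    pub role: Role,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Diagnostic {
    pub field: &'static str,
    pub severity: Severity,
}

/// Ingest never rejects on payload grounds: each absent required or
/// recommended field becomes a diagnostic at the field's declared severity.
pub fn ingest_diagnostics(specs: &[FieldSpec], payload: &BTreeMap<String, String>) -> Vec<Diagnostic> {
    specs
        .iter()
        .filter(|spec| spec.presence != Presence::Optional && !payload.contains_key(spec.name))
        .map(|spec| Diagnostic { field: spec.name, severity: spec.severity })
        .collect()
}

/// Projection/action time is stricter: every `projection_gate` field must exist.
pub fn projection_ready(specs: &[FieldSpec], payload: &BTreeMap<String, String>) -> bool {
    specs
        .iter()
        .filter(|spec| spec.role == Role::ProjectionGate)
        .all(|spec| payload.contains_key(spec.name))
}
```

A `change` node that lacks `hypothesis` is therefore stored with a warning, but it is not yet projection-ready.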
+
+## Frontier Model
+
+The frontier is a derived operational view over the canonical DAG.
+
+It answers:
+
+- what objective is active
+- what the current champion checkpoint is
+- which candidate checkpoints are still alive
+- how many completed experiments exist
+
+The DAG answers:
+
+- what changed
+- what ran
+- what evidence was collected
+- what was concluded
+- what dead ends and side investigations exist
+
+That split is deliberate. It prevents "frontier state" from turning into a
+second unofficial database.
+
+## First Usable MVP
+
+The first usable MVP is the first cut that can already replace a meaningful
+slice of the markdown habit without pretending the whole full-product vision is
+done.
+
+### MVP deliverables
+
+- per-project `.fidget_spinner/` state
+- local SQLite backing store
+- local blob directory
+- typed Rust core model
+- thin CLI for bootstrap and repair
+- hardened stdio MCP host exposed from the CLI
+- disposable MCP worker execution runtime
+- bundled `fidget-spinner` base skill
+- bundled `frontier-loop` skill
+- low-ceremony off-path note and research recording
+- atomic core-path experiment closure
+
+### Explicitly deferred from the MVP
+
+- long-lived `spinnerd`
+- web UI
+- remote runners
+- multi-agent hardening
+- aggressive pruning and vacuuming
+- strong markdown migration tooling
+- cross-project indexing
+
+### MVP model-facing surface
+
+The model-facing surface is a local MCP server oriented around frontier work.
+
+The initial tools should be:
+
+- `system.health`
+- `system.telemetry`
+- `project.bind`
+- `project.status`
+- `project.schema`
+- `frontier.list`
+- `frontier.status`
+- `frontier.init`
+- `node.create`
+- `change.record`
+- `node.list`
+- `node.read`
+- `node.annotate`
+- `node.archive`
+- `note.quick`
+- `research.record`
+- `experiment.close`
+- `skill.list`
+- `skill.show`
+
+The important point is not the exact names. The important point is the shape:
+
+- cheap read access to project and frontier context
+- cheap off-path writes
+- low-ceremony change capture
+- one atomic "close the experiment" tool
+- explicit operational introspection for long-lived agent sessions
+- explicit replay boundaries so side effects are never duplicated by accident
+
+### MVP skill posture
+
+The bundled skills should instruct agents to:
+
+1. inspect `system.health` first
+2. bind the MCP session to the target project before project-local reads or writes
+3. read project schema and frontier state
+4. pull context from the DAG instead of giant prose dumps
+5. use `note.quick` and `research.record` freely off path
+6. use `change.record` before worktree thrash becomes ambiguous
+7. use `experiment.close` to atomically seal core-path work
+8. archive detritus instead of deleting it
+9. use the base `fidget-spinner` skill for ordinary DAG work and add
+   `frontier-loop` only when the task becomes a true autonomous frontier push
+
+### MVP acceptance bar
+
+The MVP is successful when:
+
+- a project can be initialized locally with no hosted dependencies
+- an agent can inspect frontier state through MCP
+- an agent can inspect MCP health and telemetry through MCP
+- an agent can record off-path research without bureaucratic pain
+- a git-backed project can close a real core-path experiment atomically
+- retryable worker faults do not duplicate side effects
+- stale nodes can be archived instead of polluting normal enumeration
+- a human can answer "what changed, what ran, what is the current champion,
+  and why?" without doing markdown archaeology
+
+## Full Product
+
+The full product grows outward from the MVP rather than replacing it.
+
+### Planned additions
+
+- `spinnerd` as a long-lived local daemon
+- local HTTP and SSE
+- read-mostly graph and run inspection UI
+- richer artifact handling
+- model-driven pruning and archive passes
+- stronger interruption recovery
+- local runner backends beyond direct process execution
+- optional global indexing across projects
+- import/export and subgraph packaging
+
+### Invariant for all later stages
+
+No future layer should invalidate the MVP spine:
+
+- DAG canonical
+- frontier derived
+- project-local store
+- layered node model
+- warning-heavy schema validation
+- cheap off-path writes
+- atomic core-path closure
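The "frontier derived" invariant means the projection can be thrown away and recomputed from canonical records. Below is a minimal sketch. It assumes experiments are replayed in closure order, and the way each verdict affects candidates (see the match arms) is itself an assumption. `Verdict`, `ExperimentRecord`, and `rebuild` are hypothetical names. A frontier initialized outside a git repo simply starts with no champion.

```rust
use std::collections::BTreeSet;

/// The five decision verdicts from the libgrid plan (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    PromoteToChampion,
    KeepOnFrontier,
    NeedsMoreEvidence,
    RevertToChampion,
    ArchiveDeadEnd,
}

/// Minimal canonical record for one closed experiment (hypothetical shape).
pub struct ExperimentRecord {
    pub candidate_checkpoint: String,
    pub verdict: Verdict,
}

#[derive(Debug, PartialEq, Eq)]
pub struct FrontierProjection {
    pub champion: Option<String>,
    pub active_candidates: BTreeSet<String>,
    pub experiment_count: usize,
}

/// Rebuilds the projection from scratch by folding over experiments in
/// closure order; nothing here is stored, so it can be recomputed at will.
pub fn rebuild(initial_champion: Option<&str>, experiments: &[ExperimentRecord]) -> FrontierProjection {
    let mut champion = initial_champion.map(str::to_string);
    let mut active: BTreeSet<String> = BTreeSet::new();
    for exp in experiments {
        let id = &exp.candidate_checkpoint;
        match exp.verdict {
            // Assumption: the new champion leaves the candidate set.
            Verdict::PromoteToChampion => {
                active.remove(id);
                champion = Some(id.clone());
            }
            // Assumption: both verdicts keep the candidate alive.
            Verdict::KeepOnFrontier | Verdict::NeedsMoreEvidence => {
                active.insert(id.clone());
            }
            // Assumption: both verdicts retire the candidate.
            Verdict::RevertToChampion | Verdict::ArchiveDeadEnd => {
                active.remove(id);
            }
        }
    }
    FrontierProjection { champion, active_candidates: active, experiment_count: experiments.len() }
}
```

Because `rebuild` is a pure fold over canonical records, rerunning it must always yield the same projection. That determinism is what keeps frontier state from becoming a second authority.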