| author | main <main@swarm.moe> | 2026-03-20 00:33:08 -0400 |
|---|---|---|
| committer | main <main@swarm.moe> | 2026-03-20 00:33:08 -0400 |
| commit | ce41a229dcd57f9a2c35359fe77d9f54f603e985 (patch) | |
| tree | 1d42649c5711bf83cb738c40d83b81cbe7b08238 | |
| parent | 5047a141c45d18ef23ddd369fb262ecac867da11 (diff) | |
| download | fidget_spinner-ce41a229dcd57f9a2c35359fe77d9f54f603e985.zip | |
Refound ontology around hypotheses and experiments
| -rw-r--r-- | AGENTS.md | 6 |
| -rw-r--r-- | README.md | 54 |
| -rw-r--r-- | assets/codex-skills/fidget-spinner/SKILL.md | 40 |
| -rw-r--r-- | assets/codex-skills/frontier-loop/SKILL.md | 7 |
| -rw-r--r-- | crates/fidget-spinner-cli/src/main.rs | 207 |
| -rw-r--r-- | crates/fidget-spinner-cli/src/mcp/catalog.rs | 104 |
| -rw-r--r-- | crates/fidget-spinner-cli/src/mcp/service.rs | 313 |
| -rw-r--r-- | crates/fidget-spinner-cli/tests/mcp_hardening.rs | 99 |
| -rw-r--r-- | crates/fidget-spinner-core/src/lib.rs | 2 |
| -rw-r--r-- | crates/fidget-spinner-core/src/model.rs | 37 |
| -rw-r--r-- | crates/fidget-spinner-store-sqlite/src/lib.rs | 534 |
| -rw-r--r-- | docs/architecture.md | 24 |
| -rw-r--r-- | docs/libgrid-dogfood.md | 20 |
| -rw-r--r-- | docs/product-spec.md | 40 |
14 files changed, 1123 insertions, 364 deletions
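Before the file-by-file diff, the shape of the rename is worth stating plainly: the commit replaces the `change` node class with `hypothesis` and collapses `research`/`enabling` into a single `source` class, while `note` stays as the other off-path class. A minimal Rust sketch of that ontology, with illustrative helper methods that are not part of the crate:

```rust
// Illustrative sketch of the node-class ontology after this commit.
// The variant names mirror the diff below; the helper methods are
// hypothetical and exist only to show the core-path / off-path split.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum NodeClass {
    Contract,
    Hypothesis, // was `Change` before this commit
    Run,
    Analysis,
    Decision,
    Source, // absorbs the old `Research` and `Enabling` classes
    Note,
}

impl NodeClass {
    /// Core-path classes participate in the hypothesis -> experiment gate.
    fn is_core_path(self) -> bool {
        matches!(
            self,
            NodeClass::Contract
                | NodeClass::Hypothesis
                | NodeClass::Run
                | NodeClass::Analysis
                | NodeClass::Decision
        )
    }

    /// Off-path classes are cheap memory: no experiment required.
    fn is_off_path(self) -> bool {
        matches!(self, NodeClass::Source | NodeClass::Note)
    }
}

fn main() {
    assert!(NodeClass::Hypothesis.is_core_path());
    assert!(NodeClass::Source.is_off_path());
    assert!(!NodeClass::Note.is_core_path());
}
```

The split is what the commit message calls "refounding the ontology": prose capture stays cheap and off-path, while anything measured must pass through a hypothesis.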
diff --git a/AGENTS.md b/AGENTS.md
@@ -1,7 +1,7 @@
 # Fidget Spinner
 
 Fidget Spinner is a local-first, agent-first experimental DAG for autonomous
-program optimization and research.
+program optimization, source capture, and experiment adjudication.
 
 Constraints that are part of the product:
 
@@ -14,7 +14,7 @@ Constraints that are part of the product:
 - per-project state lives under `.fidget_spinner/`
 - project payload schemas are local and warning-heavy, not globally rigid
 - off-path nodes should remain cheap
-- core-path experiment closure should remain atomic
+- core-path work should remain hypothesis-owned and experiment-gated
 
 Engineering posture:
 
@@ -28,7 +28,7 @@ MVP target:
 - dogfood against `libgrid` worktrees
 - replace sprawling freeform experiment markdown with structured
-  contract/change/run/analysis/decision nodes plus cheap research/note side paths
+  contract/hypothesis/run/analysis/decision nodes plus cheap source/note side paths
 - make runs, comparisons, artifacts, and code snapshots first-class
 - bundle the frontier-loop skill with the MCP surface instead of treating it as
   folklore
diff --git a/README.md b/README.md
@@ -1,7 +1,7 @@
 # Fidget Spinner
 
 Fidget Spinner is a local-first, agent-first experimental DAG for autonomous
-program optimization and research.
+program optimization, source capture, and experiment adjudication.
 
 It is aimed at the ugly, practical problem of replacing sprawling experiment
 markdown in worktree-heavy optimization projects such as `libgrid` with a
@@ -12,7 +12,7 @@ The current shape is built around four ideas:
 
 - the DAG is canonical truth
 - frontier state is a derived projection
 - project payload schemas are local and flexible
-- core-path experiment closure is atomic
+- core-path work is hypothesis-owned and experiment-gated
 
 ## Current Scope
 
@@ -85,7 +85,7 @@ lot of experiments:
 cargo run -p fidget-spinner-cli -- schema upsert-field \
   --project . \
   --name scenario \
-  --class change \
+  --class hypothesis \
   --class analysis \
   --presence recommended \
   --severity warning \
@@ -129,7 +129,7 @@ cargo run -p fidget-spinner-cli -- tag add \
 ```
 
 ```bash
-cargo run -p fidget-spinner-cli -- research add \
+cargo run -p fidget-spinner-cli -- source add \
   --project . \
   --title "next feature slate" \
   --summary "Investigate the next tranche of high-value product work." \
@@ -146,6 +146,27 @@ cargo run -p fidget-spinner-cli -- note quick \
   --tag dogfood/mvp
 ```
 
+Record a core-path hypothesis and open an experiment against it:
+
+```bash
+cargo run -p fidget-spinner-cli -- hypothesis add \
+  --project . \
+  --frontier <frontier-id> \
+  --title "inline metric table" \
+  --summary "Rendering candidate metrics on cards will improve navigator utility." \
+  --body "Surface experiment metrics and objective-aware deltas directly on change cards."
+```
+
+```bash
+cargo run -p fidget-spinner-cli -- experiment open \
+  --project . \
+  --frontier <frontier-id> \
+  --base-checkpoint <checkpoint-id> \
+  --hypothesis-node <hypothesis-node-id> \
+  --title "navigator metric card pass" \
+  --summary "Evaluate inline metrics on experiment-bearing cards."
+```
+
 ```bash
 cargo run -p fidget-spinner-cli -- metric keys --project .
 ```
@@ -229,13 +250,16 @@ The current MCP tools are:
 
 - `frontier.status`
 - `frontier.init`
 - `node.create`
-- `change.record`
+- `hypothesis.record`
+- `experiment.open`
+- `experiment.list`
+- `experiment.read`
 - `node.list`
 - `node.read`
 - `node.annotate`
 - `node.archive`
 - `note.quick`
-- `research.record`
+- `source.record`
 - `metric.define`
 - `metric.keys`
 - `metric.best`
@@ -262,11 +286,11 @@ created with `tag.add`, each with a required human description.
 
 `note.quick` accepts `tags: []` when no existing tag applies, but the field
 itself is still mandatory so note classification is always conscious.
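The hypothesis-owned lifecycle this commit introduces (record a hypothesis, open exactly one experiment against it, close that experiment exactly once with a verdict) can be sketched as a small Rust state machine. All names here are hypothetical; this is not the crate's real API, only the rule the diff enforces:

```rust
// Illustrative sketch of the hypothesis-owned, experiment-gated core path.
// Types and fields are hypothetical; the lifecycle rules mirror the commit:
// an experiment opens against one hypothesis and one base checkpoint, and
// closing consumes it, so it can only be closed once.

#[derive(Debug, PartialEq)]
enum Verdict {
    Accept,
    Reject,
}

struct Hypothesis {
    id: u64,
    title: String,
}

struct OpenExperiment {
    hypothesis_id: u64,
    base_checkpoint: String,
    title: String,
}

struct ClosedExperiment {
    hypothesis_id: u64,
    verdict: Verdict,
    // Optional interpretation, mirroring the new `analysis` attachment.
    analysis: Option<String>,
}

impl OpenExperiment {
    /// Opening requires a hypothesis and a base checkpoint up front.
    fn open(h: &Hypothesis, base_checkpoint: &str, title: &str) -> Self {
        OpenExperiment {
            hypothesis_id: h.id,
            base_checkpoint: base_checkpoint.to_owned(),
            title: title.to_owned(),
        }
    }

    /// Closing consumes the open experiment, so it can happen only once.
    fn close(self, verdict: Verdict, analysis: Option<String>) -> ClosedExperiment {
        ClosedExperiment {
            hypothesis_id: self.hypothesis_id,
            verdict,
            analysis,
        }
    }
}

fn main() {
    let h = Hypothesis { id: 1, title: "inline metric table".to_owned() };
    assert_eq!(h.title, "inline metric table");
    let open = OpenExperiment::open(&h, "ckpt-a", "navigator metric card pass");
    assert_eq!(open.base_checkpoint, "ckpt-a");
    assert_eq!(open.title, "navigator metric card pass");
    let closed = open.close(Verdict::Accept, Some("metrics improved triage".to_owned()));
    assert_eq!(closed.hypothesis_id, h.id);
    assert_eq!(closed.verdict, Verdict::Accept);
    assert!(closed.analysis.is_some());
    let _ = Verdict::Reject;
}
```

Because `close` takes `self` by value, a second close is a compile error rather than a runtime check, which is the ownership-flavored version of the "seal core-path work with `experiment.close`" rule.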
-`research.record` now also accepts optional `tags`, so rich imported documents +`source.record` now also accepts optional `tags`, so rich imported documents can join the same campaign/subsystem index as terse notes without falling back to the generic escape hatch. -`note.quick`, `research.record`, and generic `node create` for `note`/`research` +`note.quick`, `source.record`, and generic `node create` for `note`/`source` now enforce the same strict prose split: `title` is terse identity, `summary` is the triage/search layer, and `body` holds the full text. List-like surfaces stay on `title` + `summary`; full prose is for explicit reads only. @@ -292,17 +316,19 @@ The intended flow is: 1. inspect `system.health` 2. `project.bind` to the target project root or any nested path inside it 3. read `project.status`, `tag.list`, and `frontier.list` -4. read `project.schema` only when payload rules are actually relevant -5. pull context from the DAG -6. use cheap off-path writes liberally -7. record a `change` before core-path work -8. seal core-path work with one atomic `experiment.close` +4. read `experiment.list` if the session may be resuming in-flight work +5. read `project.schema` only when payload rules are actually relevant +6. pull context from the DAG +7. use `source.record` for documentary context and `note.quick` for atomic takeaways +8. record a `hypothesis` before core-path work +9. open the live experiment explicitly with `experiment.open` +10. seal core-path work with `experiment.close` ## Git-Backed Vs Plain Local Projects Off-path work does not require git. 
You can initialize a local project and use: -- `research add` +- `source add` - `tag add` - `note quick` - `metric keys` diff --git a/assets/codex-skills/fidget-spinner/SKILL.md b/assets/codex-skills/fidget-spinner/SKILL.md index 1e4c2a3..cfa3521 100644 --- a/assets/codex-skills/fidget-spinner/SKILL.md +++ b/assets/codex-skills/fidget-spinner/SKILL.md @@ -1,6 +1,6 @@ --- name: fidget-spinner -description: Use Fidget Spinner as the local system of record for structured research and optimization work. Read health, schema, and frontier state first; prefer cheap off-path DAG writes; reserve atomic experiment closure for benchmarked core-path work. +description: Use Fidget Spinner as the local system of record for source capture, hypothesis tracking, and experiment adjudication. Read health, schema, and frontier state first; keep off-path prose cheap; drive core-path work through hypothesis-owned experiments. --- # Fidget Spinner @@ -22,6 +22,7 @@ Then read: - `tag.list` - `frontier.list` - `frontier.status` for the active frontier +- `experiment.list` if you may be resuming in-flight core-path work Read `project.schema` only when payload authoring, validation rules, or local field vocabulary are actually relevant. 
When in doubt, start with @@ -31,6 +32,7 @@ If you need more context, pull it from: - `node.list` - `node.read` +- `experiment.read` ## Posture @@ -38,6 +40,8 @@ If you need more context, pull it from: - frontier state is a derived projection - project payload validation is warning-heavy at ingest - annotations are sidecar and hidden by default +- `source` and `note` are off-path memory +- `hypothesis` and `experiment` are the disciplined core path ## Choose The Cheapest Tool @@ -45,19 +49,35 @@ If you need more context, pull it from: - `tag.list` before inventing note tags by memory - `schema.field.upsert` when one project payload field needs to become canonical without hand-editing `schema.json` - `schema.field.remove` when one project payload field definition should be purged cleanly -- `research.record` for exploratory work, design notes, dead ends, and enabling ideas; always pass `title`, `summary`, and `body`, and pass `tags` when the research belongs in a campaign/subsystem index -- `note.quick` for terse state pushes, always with an explicit `tags` list plus `title`, `summary`, and `body`; use `[]` only when no registered tag applies +- `source.record` for imported source material, documentary context, or one substantial source digest; always pass `title`, `summary`, and `body`, and pass `tags` when the source belongs in a campaign/subsystem index +- `note.quick` for atomic reusable takeaways, always with an explicit `tags` list plus `title`, `summary`, and `body`; use `[]` only when no registered tag applies +- `hypothesis.record` before core-path work; every experiment must hang off exactly one hypothesis +- `experiment.open` once a hypothesis has a concrete base checkpoint and is ready to be tested +- `experiment.list` or `experiment.read` when resuming a session and you need to recover open experimental state - `metric.define` when a project-level metric key needs a canonical unit, objective, or human description - `run.dimension.define` when a new 
experiment slicer such as `scenario` or `duration_s` becomes query-worthy - `run.dimension.list` before guessing which run dimensions actually exist in the store - `metric.keys` before guessing which numeric signals are actually rankable; pass exact run-dimension filters when narrowing to one workload slice - `metric.best` when you need the best closed experiments by one numeric key; pass `order` for noncanonical payload fields and exact run-dimension filters when comparing one slice - `node.annotate` for scratch text that should stay off the main path -- `change.record` before core-path work -- `experiment.close` only when you have checkpoint, measured result, note, and verdict +- `experiment.close` only for an already-open experiment and only when you have checkpoint, measured result, note, and verdict; attach `analysis` when the result needs explicit interpretation - `node.archive` to hide stale detritus without deleting evidence - `node.create` only as a true escape hatch +## Workflow + +1. Preserve source texture with `source.record` only when keeping the source itself matters. +2. Extract reusable claims into `note.quick`. +3. State the intended intervention with `hypothesis.record`. +4. Open a live experiment with `experiment.open`. +5. Do the work. +6. Close the experiment with `experiment.close`, including metrics, verdict, and optional analysis. + +Do not dump a whole markdown tranche into one giant prose node and call that progress. +If a later agent should enumerate it by tag or node list, it should usually be a `note.quick`. +If the point is to preserve or digest a source document, it should be `source.record`. +If the point is to test a claim, it should become a hypothesis plus an experiment. + ## Discipline 1. Pull context from the DAG, not from sprawling prompt prose. @@ -70,7 +90,11 @@ If you need more context, pull it from: 6. 
Treat metric keys as project-level registry entries and run dimensions as the first-class slice surface for experiment comparison; do not encode scenario context into the metric key itself. -7. Porcelain is the terse triage surface. Use `detail=full` only when concise +7. A source node is not a dumping ground for every thought spawned by that source. + Preserve one source digest if needed, then extract reusable claims into notes. +8. A hypothesis is not an experiment. Open the experiment explicitly; do not + smuggle “planned work” into off-path prose. +9. Porcelain is the terse triage surface. Use `detail=full` only when concise output stops being decision-sufficient. -8. When the task becomes a true indefinite optimization push, pair this skill - with `frontier-loop`. +10. When the task becomes a true indefinite optimization push, pair this skill + with `frontier-loop`. diff --git a/assets/codex-skills/frontier-loop/SKILL.md b/assets/codex-skills/frontier-loop/SKILL.md index b3ea44d..e8f1b94 100644 --- a/assets/codex-skills/frontier-loop/SKILL.md +++ b/assets/codex-skills/frontier-loop/SKILL.md @@ -62,7 +62,7 @@ ASSUME YOU ARE RUNNING OVERNIGHT. 2. Study existing evidence from `fidget-spinner`. 3. Search outward if the local frontier looks exhausted or you are starting to take unambitious strides. 4. Form a strong, falsifiable hypothesis. -5. Make the change. +5. Record the hypothesis and open the experiment. 6. Measure it. 7. If the result is surprising, noisy, or broken, debug the implementation and rerun only enough to understand the outcome. @@ -99,11 +99,12 @@ Every real experiment must leave an auditable record in `fidget-spinner`. If something matters to the frontier, put it in the DAG. -Use off-path records liberally for enabling work, side investigations, and dead +Use off-path records liberally for source capture, side investigations, and dead ends. 
When a line becomes a real measured experiment, close it through the proper -`fidget-spinner` path instead of improvising a chain of half-recorded steps. +`fidget-spinner` path instead of improvising a chain of half-recorded steps: +`hypothesis.record` -> `experiment.open` -> `experiment.close`. ## Resume Discipline diff --git a/crates/fidget-spinner-cli/src/main.rs b/crates/fidget-spinner-cli/src/main.rs index 7711cb4..491e30d 100644 --- a/crates/fidget-spinner-cli/src/main.rs +++ b/crates/fidget-spinner-cli/src/main.rs @@ -17,9 +17,10 @@ use fidget_spinner_core::{ }; use fidget_spinner_store_sqlite::{ CloseExperimentRequest, CreateFrontierRequest, CreateNodeRequest, DefineMetricRequest, - DefineRunDimensionRequest, EdgeAttachment, EdgeAttachmentDirection, ListNodesQuery, - MetricBestQuery, MetricFieldSource, MetricKeyQuery, MetricRankOrder, ProjectStore, - RemoveSchemaFieldRequest, STORE_DIR_NAME, StoreError, UpsertSchemaFieldRequest, + DefineRunDimensionRequest, EdgeAttachment, EdgeAttachmentDirection, ExperimentAnalysisDraft, + ListNodesQuery, MetricBestQuery, MetricFieldSource, MetricKeyQuery, MetricRankOrder, + OpenExperimentRequest, ProjectStore, RemoveSchemaFieldRequest, STORE_DIR_NAME, StoreError, + UpsertSchemaFieldRequest, }; use serde::Serialize; use serde_json::{Map, Value, json}; @@ -57,13 +58,15 @@ enum Command { }, /// Record terse off-path notes. Note(NoteCommand), + /// Record core-path hypotheses before experimental work begins. + Hypothesis(HypothesisCommand), /// Manage the repo-local tag registry. Tag { #[command(subcommand)] command: TagCommand, }, - /// Record off-path research and enabling work. - Research(ResearchCommand), + /// Record imported sources and documentary context. + Source(SourceCommand), /// Inspect rankable metrics across closed experiments. Metric { #[command(subcommand)] @@ -186,10 +189,10 @@ struct NodeAddArgs { #[arg(long)] title: String, #[arg(long)] - /// Required for `note` and `research` nodes. 
+ /// Required for `note` and `source` nodes. summary: Option<String>, #[arg(long = "payload-json")] - /// JSON object payload. `note` and `research` nodes require a non-empty `body` string. + /// JSON object payload. `note` and `source` nodes require a non-empty `body` string. payload_json: Option<String>, #[arg(long = "payload-file")] payload_file: Option<PathBuf>, @@ -263,6 +266,12 @@ struct NoteCommand { command: NoteSubcommand, } +#[derive(Args)] +struct HypothesisCommand { + #[command(subcommand)] + command: HypothesisSubcommand, +} + #[derive(Subcommand)] enum NoteSubcommand { /// Record a quick off-path note. @@ -270,6 +279,12 @@ enum NoteSubcommand { } #[derive(Subcommand)] +enum HypothesisSubcommand { + /// Record a core-path hypothesis with low ceremony. + Add(QuickHypothesisArgs), +} + +#[derive(Subcommand)] enum TagCommand { /// Register a new repo-local tag. Add(TagAddArgs), @@ -278,15 +293,15 @@ enum TagCommand { } #[derive(Args)] -struct ResearchCommand { +struct SourceCommand { #[command(subcommand)] - command: ResearchSubcommand, + command: SourceSubcommand, } #[derive(Subcommand)] -enum ResearchSubcommand { - /// Record off-path research or enabling work. - Add(QuickResearchArgs), +enum SourceSubcommand { + /// Record imported source material or documentary context. 
+ Add(QuickSourceArgs), } #[derive(Subcommand)] @@ -376,6 +391,22 @@ struct QuickNoteArgs { } #[derive(Args)] +struct QuickHypothesisArgs { + #[command(flatten)] + project: ProjectArg, + #[arg(long)] + frontier: String, + #[arg(long)] + title: String, + #[arg(long)] + summary: String, + #[arg(long)] + body: String, + #[arg(long = "parent")] + parents: Vec<String>, +} + +#[derive(Args)] struct TagAddArgs { #[command(flatten)] project: ProjectArg, @@ -386,7 +417,7 @@ struct TagAddArgs { } #[derive(Args)] -struct QuickResearchArgs { +struct QuickSourceArgs { #[command(flatten)] project: ProjectArg, #[arg(long)] @@ -459,8 +490,12 @@ struct MetricBestArgs { #[derive(Subcommand)] enum ExperimentCommand { + /// Open a stateful experiment against one hypothesis and base checkpoint. + Open(ExperimentOpenArgs), + /// List open experiments, optionally narrowed to one frontier. + List(ExperimentListArgs), /// Close a core-path experiment with checkpoint, run, note, and verdict. - Close(ExperimentCloseArgs), + Close(Box<ExperimentCloseArgs>), } #[derive(Subcommand)] @@ -481,12 +516,8 @@ enum UiCommand { struct ExperimentCloseArgs { #[command(flatten)] project: ProjectArg, - #[arg(long)] - frontier: String, - #[arg(long = "base-checkpoint")] - base_checkpoint: String, - #[arg(long = "change-node")] - change_node: String, + #[arg(long = "experiment")] + experiment_id: String, #[arg(long = "candidate-summary")] candidate_summary: String, #[arg(long = "run-title")] @@ -518,12 +549,42 @@ struct ExperimentCloseArgs { next_hypotheses: Vec<String>, #[arg(long = "verdict", value_enum)] verdict: CliFrontierVerdict, + #[arg(long = "analysis-title")] + analysis_title: Option<String>, + #[arg(long = "analysis-summary")] + analysis_summary: Option<String>, + #[arg(long = "analysis-body")] + analysis_body: Option<String>, #[arg(long = "decision-title")] decision_title: String, #[arg(long = "decision-rationale")] decision_rationale: String, } +#[derive(Args)] +struct ExperimentOpenArgs { + 
#[command(flatten)] + project: ProjectArg, + #[arg(long)] + frontier: String, + #[arg(long = "base-checkpoint")] + base_checkpoint: String, + #[arg(long = "hypothesis-node")] + hypothesis_node: String, + #[arg(long)] + title: String, + #[arg(long)] + summary: Option<String>, +} + +#[derive(Args)] +struct ExperimentListArgs { + #[command(flatten)] + project: ProjectArg, + #[arg(long)] + frontier: Option<String>, +} + #[derive(Subcommand)] enum SkillCommand { /// List bundled skills. @@ -588,12 +649,11 @@ struct UiServeArgs { #[derive(Clone, Copy, Debug, Eq, PartialEq, ValueEnum)] enum CliNodeClass { Contract, - Change, + Hypothesis, Run, Analysis, Decision, - Research, - Enabling, + Source, Note, } @@ -623,7 +683,7 @@ enum CliExecutionBackend { #[derive(Clone, Copy, Debug, Eq, PartialEq, ValueEnum)] enum CliMetricSource { RunMetric, - ChangePayload, + HypothesisPayload, RunPayload, AnalysisPayload, DecisionPayload, @@ -713,12 +773,15 @@ fn run() -> Result<(), StoreError> { Command::Note(command) => match command.command { NoteSubcommand::Quick(args) => run_quick_note(args), }, + Command::Hypothesis(command) => match command.command { + HypothesisSubcommand::Add(args) => run_quick_hypothesis(args), + }, Command::Tag { command } => match command { TagCommand::Add(args) => run_tag_add(args), TagCommand::List(project) => run_tag_list(project), }, - Command::Research(command) => match command.command { - ResearchSubcommand::Add(args) => run_quick_research(args), + Command::Source(command) => match command.command { + SourceSubcommand::Add(args) => run_quick_source(args), }, Command::Metric { command } => match command { MetricCommand::Define(args) => run_metric_define(args), @@ -731,7 +794,9 @@ fn run() -> Result<(), StoreError> { DimensionCommand::List(project) => run_dimension_list(project), }, Command::Experiment { command } => match command { - ExperimentCommand::Close(args) => run_experiment_close(args), + ExperimentCommand::Open(args) => run_experiment_open(args), 
+ ExperimentCommand::List(args) => run_experiment_list(args), + ExperimentCommand::Close(args) => run_experiment_close(*args), }, Command::Mcp { command } => match command { McpCommand::Serve(args) => mcp::serve(args.project), @@ -942,6 +1007,25 @@ fn run_quick_note(args: QuickNoteArgs) -> Result<(), StoreError> { print_json(&node) } +fn run_quick_hypothesis(args: QuickHypothesisArgs) -> Result<(), StoreError> { + let mut store = open_store(&args.project.project)?; + let payload = NodePayload::with_schema( + store.schema().schema_ref(), + json_object(json!({ "body": args.body }))?, + ); + let node = store.add_node(CreateNodeRequest { + class: NodeClass::Hypothesis, + frontier_id: Some(parse_frontier_id(&args.frontier)?), + title: NonEmptyText::new(args.title)?, + summary: Some(NonEmptyText::new(args.summary)?), + tags: None, + payload, + annotations: Vec::new(), + attachments: lineage_attachments(args.parents)?, + })?; + print_json(&node) +} + fn run_tag_add(args: TagAddArgs) -> Result<(), StoreError> { let mut store = open_store(&args.project.project)?; let tag = store.add_tag( @@ -956,14 +1040,14 @@ fn run_tag_list(args: ProjectArg) -> Result<(), StoreError> { print_json(&store.list_tags()?) } -fn run_quick_research(args: QuickResearchArgs) -> Result<(), StoreError> { +fn run_quick_source(args: QuickSourceArgs) -> Result<(), StoreError> { let mut store = open_store(&args.project.project)?; let payload = NodePayload::with_schema( store.schema().schema_ref(), json_object(json!({ "body": args.body }))?, ); let node = store.add_node(CreateNodeRequest { - class: NodeClass::Research, + class: NodeClass::Source, frontier_id: args .frontier .as_deref() @@ -1042,9 +1126,31 @@ fn run_dimension_list(args: ProjectArg) -> Result<(), StoreError> { print_json(&store.list_run_dimensions()?) 
} +fn run_experiment_open(args: ExperimentOpenArgs) -> Result<(), StoreError> { + let mut store = open_store(&args.project.project)?; + let summary = args.summary.map(NonEmptyText::new).transpose()?; + let experiment = store.open_experiment(OpenExperimentRequest { + frontier_id: parse_frontier_id(&args.frontier)?, + base_checkpoint_id: parse_checkpoint_id(&args.base_checkpoint)?, + hypothesis_node_id: parse_node_id(&args.hypothesis_node)?, + title: NonEmptyText::new(args.title)?, + summary, + })?; + print_json(&experiment) +} + +fn run_experiment_list(args: ExperimentListArgs) -> Result<(), StoreError> { + let store = open_store(&args.project.project)?; + let frontier_id = args + .frontier + .as_deref() + .map(parse_frontier_id) + .transpose()?; + print_json(&store.list_open_experiments(frontier_id)?) +} + fn run_experiment_close(args: ExperimentCloseArgs) -> Result<(), StoreError> { let mut store = open_store(&args.project.project)?; - let frontier_id = parse_frontier_id(&args.frontier)?; let snapshot = store .auto_capture_checkpoint(NonEmptyText::new(args.candidate_summary.clone())?)? 
.map(|seed| seed.snapshot) @@ -1058,10 +1164,28 @@ fn run_experiment_close(args: ExperimentCloseArgs) -> Result<(), StoreError> { to_text_vec(args.argv)?, parse_env(args.env), )?; + let analysis = match ( + args.analysis_title, + args.analysis_summary, + args.analysis_body, + ) { + (Some(title), Some(summary), Some(body)) => Some(ExperimentAnalysisDraft { + title: NonEmptyText::new(title)?, + summary: NonEmptyText::new(summary)?, + body: NonEmptyText::new(body)?, + }), + (None, None, None) => None, + _ => { + return Err(StoreError::Json(serde_json::Error::io( + std::io::Error::new( + std::io::ErrorKind::InvalidInput, + "analysis-title, analysis-summary, and analysis-body must be provided together", + ), + ))); + } + }; let receipt = store.close_experiment(CloseExperimentRequest { - frontier_id, - base_checkpoint_id: parse_checkpoint_id(&args.base_checkpoint)?, - change_node_id: parse_node_id(&args.change_node)?, + experiment_id: parse_experiment_id(&args.experiment_id)?, candidate_summary: NonEmptyText::new(args.candidate_summary)?, candidate_snapshot: snapshot, run_title: NonEmptyText::new(args.run_title)?, @@ -1081,9 +1205,9 @@ fn run_experiment_close(args: ExperimentCloseArgs) -> Result<(), StoreError> { next_hypotheses: to_text_vec(args.next_hypotheses)?, }, verdict: args.verdict.into(), + analysis, decision_title: NonEmptyText::new(args.decision_title)?, decision_rationale: NonEmptyText::new(args.decision_rationale)?, - analysis_node_id: None, })?; print_json(&receipt) } @@ -1378,7 +1502,7 @@ fn validate_cli_prose_payload( summary: Option<&str>, payload: &NodePayload, ) -> Result<(), StoreError> { - if !matches!(class, NodeClass::Note | NodeClass::Research) { + if !matches!(class, NodeClass::Note | NodeClass::Source) { return Ok(()); } if summary.is_none() { @@ -1584,6 +1708,12 @@ fn parse_checkpoint_id(raw: &str) -> Result<fidget_spinner_core::CheckpointId, S )) } +fn parse_experiment_id(raw: &str) -> Result<fidget_spinner_core::ExperimentId, StoreError> { + 
Ok(fidget_spinner_core::ExperimentId::from_uuid( + Uuid::parse_str(raw)?, + )) +} + fn print_json<T: Serialize>(value: &T) -> Result<(), StoreError> { println!("{}", to_pretty_json(value)?); Ok(()) @@ -1604,12 +1734,11 @@ impl From<CliNodeClass> for NodeClass { fn from(value: CliNodeClass) -> Self { match value { CliNodeClass::Contract => Self::Contract, - CliNodeClass::Change => Self::Change, + CliNodeClass::Hypothesis => Self::Hypothesis, CliNodeClass::Run => Self::Run, CliNodeClass::Analysis => Self::Analysis, CliNodeClass::Decision => Self::Decision, - CliNodeClass::Research => Self::Research, - CliNodeClass::Enabling => Self::Enabling, + CliNodeClass::Source => Self::Source, CliNodeClass::Note => Self::Note, } } @@ -1651,7 +1780,7 @@ impl From<CliMetricSource> for MetricFieldSource { fn from(value: CliMetricSource) -> Self { match value { CliMetricSource::RunMetric => Self::RunMetric, - CliMetricSource::ChangePayload => Self::ChangePayload, + CliMetricSource::HypothesisPayload => Self::HypothesisPayload, CliMetricSource::RunPayload => Self::RunPayload, CliMetricSource::AnalysisPayload => Self::AnalysisPayload, CliMetricSource::DecisionPayload => Self::DecisionPayload, diff --git a/crates/fidget-spinner-cli/src/mcp/catalog.rs b/crates/fidget-spinner-cli/src/mcp/catalog.rs index 0831ba4..3b8abcc 100644 --- a/crates/fidget-spinner-cli/src/mcp/catalog.rs +++ b/crates/fidget-spinner-cli/src/mcp/catalog.rs @@ -115,9 +115,9 @@ pub(crate) fn tool_spec(name: &str) -> Option<ToolSpec> { dispatch: DispatchTarget::Worker, replay: ReplayContract::NeverReplay, }), - "change.record" => Some(ToolSpec { - name: "change.record", - description: "Record a core-path change hypothesis with low ceremony.", + "hypothesis.record" => Some(ToolSpec { + name: "hypothesis.record", + description: "Record a core-path hypothesis with low ceremony.", dispatch: DispatchTarget::Worker, replay: ReplayContract::NeverReplay, }), @@ -151,9 +151,9 @@ pub(crate) fn tool_spec(name: &str) -> 
Option<ToolSpec> { dispatch: DispatchTarget::Worker, replay: ReplayContract::NeverReplay, }), - "research.record" => Some(ToolSpec { - name: "research.record", - description: "Record off-path research or enabling work that should live in the DAG but not on the bureaucratic core path.", + "source.record" => Some(ToolSpec { + name: "source.record", + description: "Record imported sources and documentary context that should live in the DAG without polluting the core path.", dispatch: DispatchTarget::Worker, replay: ReplayContract::NeverReplay, }), @@ -193,9 +193,27 @@ pub(crate) fn tool_spec(name: &str) -> Option<ToolSpec> { dispatch: DispatchTarget::Worker, replay: ReplayContract::NeverReplay, }), + "experiment.open" => Some(ToolSpec { + name: "experiment.open", + description: "Open a stateful experiment against one hypothesis and one base checkpoint.", + dispatch: DispatchTarget::Worker, + replay: ReplayContract::NeverReplay, + }), + "experiment.list" => Some(ToolSpec { + name: "experiment.list", + description: "List currently open experiments, optionally narrowed to one frontier.", + dispatch: DispatchTarget::Worker, + replay: ReplayContract::Convergent, + }), + "experiment.read" => Some(ToolSpec { + name: "experiment.read", + description: "Read one currently open experiment by id.", + dispatch: DispatchTarget::Worker, + replay: ReplayContract::Convergent, + }), "experiment.close" => Some(ToolSpec { name: "experiment.close", - description: "Atomically close a core-path experiment with typed run dimensions, preregistered metric observations, candidate checkpoint capture, note, and verdict.", + description: "Close one open experiment with typed run dimensions, preregistered metric observations, candidate checkpoint capture, optional analysis, note, and verdict.", dispatch: DispatchTarget::Worker, replay: ReplayContract::NeverReplay, }), @@ -268,19 +286,22 @@ pub(crate) fn tool_definitions() -> Vec<Value> { "frontier.status", "frontier.init", "node.create", - 
"change.record", + "hypothesis.record", "node.list", "node.read", "node.annotate", "node.archive", "note.quick", - "research.record", + "source.record", "metric.define", "run.dimension.define", "run.dimension.list", "metric.keys", "metric.best", "metric.migrate", + "experiment.open", + "experiment.list", + "experiment.read", "experiment.close", "skill.list", "skill.show", @@ -414,29 +435,26 @@ fn input_schema(name: &str) -> Value { "class": node_class_schema(), "frontier_id": { "type": "string" }, "title": { "type": "string" }, - "summary": { "type": "string", "description": "Required for `note` and `research` nodes." }, + "summary": { "type": "string", "description": "Required for `note` and `source` nodes." }, "tags": { "type": "array", "items": tag_name_schema(), "description": "Required for `note` nodes; optional for other classes." }, - "payload": { "type": "object", "description": "`note` and `research` nodes require a non-empty string `body` field." }, + "payload": { "type": "object", "description": "`note` and `source` nodes require a non-empty string `body` field." 
}, "annotations": { "type": "array", "items": annotation_schema() }, "parents": { "type": "array", "items": { "type": "string" } } }, "required": ["class", "title"], "additionalProperties": false }), - "change.record" => json!({ + "hypothesis.record" => json!({ "type": "object", "properties": { "frontier_id": { "type": "string" }, "title": { "type": "string" }, "summary": { "type": "string" }, "body": { "type": "string" }, - "hypothesis": { "type": "string" }, - "base_checkpoint_id": { "type": "string" }, - "benchmark_suite": { "type": "string" }, "annotations": { "type": "array", "items": annotation_schema() }, "parents": { "type": "array", "items": { "type": "string" } } }, - "required": ["frontier_id", "title", "body"], + "required": ["frontier_id", "title", "summary", "body"], "additionalProperties": false }), "node.list" => json!({ @@ -483,7 +501,7 @@ fn input_schema(name: &str) -> Value { "required": ["title", "summary", "body", "tags"], "additionalProperties": false }), - "research.record" => json!({ + "source.record" => json!({ "type": "object", "properties": { "frontier_id": { "type": "string" }, @@ -540,12 +558,37 @@ fn input_schema(name: &str) -> Value { "required": ["key"], "additionalProperties": false }), - "experiment.close" => json!({ + "experiment.open" => json!({ "type": "object", "properties": { "frontier_id": { "type": "string" }, "base_checkpoint_id": { "type": "string" }, - "change_node_id": { "type": "string" }, + "hypothesis_node_id": { "type": "string" }, + "title": { "type": "string" }, + "summary": { "type": "string" } + }, + "required": ["frontier_id", "base_checkpoint_id", "hypothesis_node_id", "title"], + "additionalProperties": false + }), + "experiment.list" => json!({ + "type": "object", + "properties": { + "frontier_id": { "type": "string" } + }, + "additionalProperties": false + }), + "experiment.read" => json!({ + "type": "object", + "properties": { + "experiment_id": { "type": "string" } + }, + "required": ["experiment_id"], + 
"additionalProperties": false + }), + "experiment.close" => json!({ + "type": "object", + "properties": { + "experiment_id": { "type": "string" }, "candidate_summary": { "type": "string" }, "run": run_schema(), "primary_metric": metric_value_schema(), @@ -554,12 +597,10 @@ fn input_schema(name: &str) -> Value { "verdict": verdict_schema(), "decision_title": { "type": "string" }, "decision_rationale": { "type": "string" }, - "analysis_node_id": { "type": "string" } + "analysis": analysis_schema() }, "required": [ - "frontier_id", - "base_checkpoint_id", - "change_node_id", + "experiment_id", "candidate_summary", "run", "primary_metric", @@ -612,6 +653,19 @@ fn annotation_schema() -> Value { }) } +fn analysis_schema() -> Value { + json!({ + "type": "object", + "properties": { + "title": { "type": "string" }, + "summary": { "type": "string" }, + "body": { "type": "string" } + }, + "required": ["title", "summary", "body"], + "additionalProperties": false + }) +} + fn tag_name_schema() -> Value { json!({ "type": "string", @@ -622,7 +676,7 @@ fn tag_name_schema() -> Value { fn node_class_schema() -> Value { json!({ "type": "string", - "enum": ["contract", "change", "run", "analysis", "decision", "research", "enabling", "note"] + "enum": ["contract", "hypothesis", "run", "analysis", "decision", "source", "note"] }) } @@ -638,7 +692,7 @@ fn metric_source_schema() -> Value { "type": "string", "enum": [ "run_metric", - "change_payload", + "hypothesis_payload", "run_payload", "analysis_payload", "decision_payload" diff --git a/crates/fidget-spinner-cli/src/mcp/service.rs b/crates/fidget-spinner-cli/src/mcp/service.rs index 62e3641..05f2382 100644 --- a/crates/fidget-spinner-cli/src/mcp/service.rs +++ b/crates/fidget-spinner-cli/src/mcp/service.rs @@ -11,10 +11,10 @@ use fidget_spinner_core::{ }; use fidget_spinner_store_sqlite::{ CloseExperimentRequest, CreateFrontierRequest, CreateNodeRequest, DefineMetricRequest, - DefineRunDimensionRequest, EdgeAttachment, 
EdgeAttachmentDirection, ExperimentReceipt,
- ListNodesQuery, MetricBestQuery, MetricFieldSource, MetricKeyQuery, MetricKeySummary,
- MetricRankOrder, NodeSummary, ProjectStore, RemoveSchemaFieldRequest, StoreError,
- UpsertSchemaFieldRequest,
+ DefineRunDimensionRequest, EdgeAttachment, EdgeAttachmentDirection, ExperimentAnalysisDraft,
+ ExperimentReceipt, ListNodesQuery, MetricBestQuery, MetricFieldSource, MetricKeyQuery,
+ MetricKeySummary, MetricRankOrder, NodeSummary, OpenExperimentRequest, OpenExperimentSummary,
+ ProjectStore, RemoveSchemaFieldRequest, StoreError, UpsertSchemaFieldRequest,
};
use serde::Deserialize;
use serde_json::{Map, Value, json};
@@ -303,51 +303,43 @@ impl WorkerService {
"tools/call:node.create",
)
}
- "change.record" => {
- let args = deserialize::<ChangeRecordToolArgs>(arguments)?;
- let mut fields = Map::new();
- let _ = fields.insert("body".to_owned(), Value::String(args.body));
- if let Some(hypothesis) = args.hypothesis {
- let _ = fields.insert("hypothesis".to_owned(), Value::String(hypothesis));
- }
- if let Some(base_checkpoint_id) = args.base_checkpoint_id {
- let _ = fields.insert(
- "base_checkpoint_id".to_owned(),
- Value::String(base_checkpoint_id),
- );
- }
- if let Some(benchmark_suite) = args.benchmark_suite {
- let _ =
- fields.insert("benchmark_suite".to_owned(), Value::String(benchmark_suite));
- }
+ "hypothesis.record" => {
+ let args = deserialize::<HypothesisRecordToolArgs>(arguments)?;
let node = self
.store
.add_node(CreateNodeRequest {
- class: NodeClass::Change,
+ class: NodeClass::Hypothesis,
frontier_id: Some(
crate::parse_frontier_id(&args.frontier_id)
- .map_err(store_fault("tools/call:change.record"))?,
+ .map_err(store_fault("tools/call:hypothesis.record"))?,
),
title: NonEmptyText::new(args.title)
- .map_err(store_fault("tools/call:change.record"))?,
- summary: args
- .summary
- .map(NonEmptyText::new)
- .transpose()
- .map_err(store_fault("tools/call:change.record"))?,
+ .map_err(store_fault("tools/call:hypothesis.record"))?,
+ summary: Some(
+ NonEmptyText::new(args.summary)
+ .map_err(store_fault("tools/call:hypothesis.record"))?,
+ ),
tags: None,
- payload: NodePayload::with_schema(self.store.schema().schema_ref(), fields),
+ payload: NodePayload::with_schema(
+ self.store.schema().schema_ref(),
+ crate::json_object(json!({ "body": args.body }))
+ .map_err(store_fault("tools/call:hypothesis.record"))?,
+ ),
annotations: tool_annotations(args.annotations)
- .map_err(store_fault("tools/call:change.record"))?,
+ .map_err(store_fault("tools/call:hypothesis.record"))?,
attachments: lineage_attachments(args.parents)
- .map_err(store_fault("tools/call:change.record"))?,
+ .map_err(store_fault("tools/call:hypothesis.record"))?,
})
- .map_err(store_fault("tools/call:change.record"))?;
+ .map_err(store_fault("tools/call:hypothesis.record"))?;
tool_success(
- created_node_output("recorded change", &node, "tools/call:change.record")?,
+ created_node_output(
+ "recorded hypothesis",
+ &node,
+ "tools/call:hypothesis.record",
+ )?,
presentation,
FaultStage::Worker,
- "tools/call:change.record",
+ "tools/call:hypothesis.record",
)
}
"node.list" => {
@@ -498,44 +490,45 @@ impl WorkerService {
"tools/call:note.quick",
)
}
- "research.record" => {
- let args = deserialize::<ResearchRecordToolArgs>(arguments)?;
+ "source.record" => {
+ let args = deserialize::<SourceRecordToolArgs>(arguments)?;
let node = self
.store
.add_node(CreateNodeRequest {
- class: NodeClass::Research,
+ class: NodeClass::Source,
frontier_id: args
.frontier_id
.as_deref()
.map(crate::parse_frontier_id)
.transpose()
- .map_err(store_fault("tools/call:research.record"))?,
+ .map_err(store_fault("tools/call:source.record"))?,
title: NonEmptyText::new(args.title)
- .map_err(store_fault("tools/call:research.record"))?,
+ .map_err(store_fault("tools/call:source.record"))?,
summary: Some(
NonEmptyText::new(args.summary)
- .map_err(store_fault("tools/call:research.record"))?,
- ),
- tags: Some(
- parse_tag_set(args.tags)
- .map_err(store_fault("tools/call:research.record"))?,
+ .map_err(store_fault("tools/call:source.record"))?,
),
+ tags: args
+ .tags
+ .map(parse_tag_set)
+ .transpose()
+ .map_err(store_fault("tools/call:source.record"))?,
payload: NodePayload::with_schema(
self.store.schema().schema_ref(),
crate::json_object(json!({ "body": args.body }))
- .map_err(store_fault("tools/call:research.record"))?,
+ .map_err(store_fault("tools/call:source.record"))?,
),
annotations: tool_annotations(args.annotations)
- .map_err(store_fault("tools/call:research.record"))?,
+ .map_err(store_fault("tools/call:source.record"))?,
attachments: lineage_attachments(args.parents)
- .map_err(store_fault("tools/call:research.record"))?,
+ .map_err(store_fault("tools/call:source.record"))?,
})
- .map_err(store_fault("tools/call:research.record"))?;
+ .map_err(store_fault("tools/call:source.record"))?;
tool_success(
- created_node_output("recorded research", &node, "tools/call:research.record")?,
+ created_node_output("recorded source", &node, "tools/call:source.record")?,
presentation,
FaultStage::Worker,
- "tools/call:research.record",
+ "tools/call:source.record",
)
}
"metric.define" => {
@@ -702,10 +695,74 @@ impl WorkerService {
"tools/call:metric.migrate",
)
}
+ "experiment.open" => {
+ let args = deserialize::<ExperimentOpenToolArgs>(arguments)?;
+ let item = self
+ .store
+ .open_experiment(OpenExperimentRequest {
+ frontier_id: crate::parse_frontier_id(&args.frontier_id)
+ .map_err(store_fault("tools/call:experiment.open"))?,
+ base_checkpoint_id: crate::parse_checkpoint_id(&args.base_checkpoint_id)
+ .map_err(store_fault("tools/call:experiment.open"))?,
+ hypothesis_node_id: crate::parse_node_id(&args.hypothesis_node_id)
+ .map_err(store_fault("tools/call:experiment.open"))?,
+ title: NonEmptyText::new(args.title)
+ .map_err(store_fault("tools/call:experiment.open"))?,
+ summary: args
+ .summary
+ .map(NonEmptyText::new)
+ .transpose()
+ .map_err(store_fault("tools/call:experiment.open"))?,
+ })
+ .map_err(store_fault("tools/call:experiment.open"))?;
+ tool_success(
+ experiment_open_output(
+ &item,
+ "tools/call:experiment.open",
+ "opened experiment",
+ )?,
+ presentation,
+ FaultStage::Worker,
+ "tools/call:experiment.open",
+ )
+ }
+ "experiment.list" => {
+ let args = deserialize::<ExperimentListToolArgs>(arguments)?;
+ let items = self
+ .store
+ .list_open_experiments(
+ args.frontier_id
+ .as_deref()
+ .map(crate::parse_frontier_id)
+ .transpose()
+ .map_err(store_fault("tools/call:experiment.list"))?,
+ )
+ .map_err(store_fault("tools/call:experiment.list"))?;
+ tool_success(
+ experiment_list_output(items.as_slice())?,
+ presentation,
+ FaultStage::Worker,
+ "tools/call:experiment.list",
+ )
+ }
+ "experiment.read" => {
+ let args = deserialize::<ExperimentReadToolArgs>(arguments)?;
+ let item = self
+ .store
+ .read_open_experiment(
+ crate::parse_experiment_id(&args.experiment_id)
+ .map_err(store_fault("tools/call:experiment.read"))?,
+ )
+ .map_err(store_fault("tools/call:experiment.read"))?;
+ tool_success(
+ experiment_open_output(&item, "tools/call:experiment.read", "open experiment")?,
+ presentation,
+ FaultStage::Worker,
+ "tools/call:experiment.read",
+ )
+ }
"experiment.close" => {
let args = deserialize::<ExperimentCloseToolArgs>(arguments)?;
- let frontier_id = crate::parse_frontier_id(&args.frontier_id)
- .map_err(store_fault("tools/call:experiment.close"))?;
let snapshot = self
.store
.auto_capture_checkpoint(
@@ -728,10 +785,7 @@ impl WorkerService {
let receipt = self
.store
.close_experiment(CloseExperimentRequest {
- frontier_id,
- base_checkpoint_id: crate::parse_checkpoint_id(&args.base_checkpoint_id)
- .map_err(store_fault("tools/call:experiment.close"))?,
- change_node_id: crate::parse_node_id(&args.change_node_id)
+ experiment_id: crate::parse_experiment_id(&args.experiment_id)
.map_err(store_fault("tools/call:experiment.close"))?,
candidate_summary:
NonEmptyText::new(args.candidate_summary)
.map_err(store_fault("tools/call:experiment.close"))?,
@@ -776,16 +830,15 @@ impl WorkerService {
},
verdict: parse_verdict_name(&args.verdict)
.map_err(store_fault("tools/call:experiment.close"))?,
+ analysis: args
+ .analysis
+ .map(experiment_analysis_from_wire)
+ .transpose()
+ .map_err(store_fault("tools/call:experiment.close"))?,
decision_title: NonEmptyText::new(args.decision_title)
.map_err(store_fault("tools/call:experiment.close"))?,
decision_rationale: NonEmptyText::new(args.decision_rationale)
.map_err(store_fault("tools/call:experiment.close"))?,
- analysis_node_id: args
- .analysis_node_id
- .as_deref()
- .map(crate::parse_node_id)
- .transpose()
- .map_err(store_fault("tools/call:experiment.close"))?,
})
.map_err(store_fault("tools/call:experiment.close"))?;
tool_success(
@@ -1296,6 +1349,7 @@ fn experiment_close_output(
"candidate_checkpoint_id": receipt.experiment.candidate_checkpoint_id,
"verdict": format!("{:?}", receipt.experiment.verdict).to_ascii_lowercase(),
"run_id": receipt.run.run_id,
+ "hypothesis_node_id": receipt.experiment.hypothesis_node_id,
"decision_node_id": receipt.decision_node.id,
"dimensions": run_dimensions_value(&receipt.experiment.result.dimensions),
"primary_metric": metric_value(store, &receipt.experiment.result.primary_metric)?,
@@ -1308,6 +1362,7 @@ fn experiment_close_output(
"closed experiment {} on frontier {}",
receipt.experiment.id, receipt.experiment.frontier_id
),
+ format!("hypothesis: {}", receipt.experiment.hypothesis_node_id),
format!("candidate: {}", receipt.experiment.candidate_checkpoint_id),
format!(
"verdict: {}",
@@ -1330,6 +1385,71 @@ fn experiment_close_output(
)
}

+fn experiment_open_output(
+ item: &OpenExperimentSummary,
+ operation: &'static str,
+ action: &'static str,
+) -> Result<ToolOutput, FaultRecord> {
+ let concise = json!({
+ "experiment_id": item.id,
+ "frontier_id": item.frontier_id,
+ "base_checkpoint_id": item.base_checkpoint_id,
+ "hypothesis_node_id": item.hypothesis_node_id,
+ "title": item.title,
+ "summary": item.summary,
+ });
+ detailed_tool_output(
+ &concise,
+ item,
+ [
+ format!("{action} {}", item.id),
+ format!("frontier: {}", item.frontier_id),
+ format!("hypothesis: {}", item.hypothesis_node_id),
+ format!("base checkpoint: {}", item.base_checkpoint_id),
+ format!("title: {}", item.title),
+ item.summary
+ .as_ref()
+ .map(|summary| format!("summary: {summary}"))
+ .unwrap_or_else(|| "summary: <none>".to_owned()),
+ ]
+ .join("\n"),
+ None,
+ FaultStage::Worker,
+ operation,
+ )
+}
+
+fn experiment_list_output(items: &[OpenExperimentSummary]) -> Result<ToolOutput, FaultRecord> {
+ let concise = items
+ .iter()
+ .map(|item| {
+ json!({
+ "experiment_id": item.id,
+ "frontier_id": item.frontier_id,
+ "base_checkpoint_id": item.base_checkpoint_id,
+ "hypothesis_node_id": item.hypothesis_node_id,
+ "title": item.title,
+ "summary": item.summary,
+ })
+ })
+ .collect::<Vec<_>>();
+ let mut lines = vec![format!("{} open experiment(s)", items.len())];
+ lines.extend(items.iter().map(|item| {
+ format!(
+ "{} {} | hypothesis={} | checkpoint={}",
+ item.id, item.title, item.hypothesis_node_id, item.base_checkpoint_id,
+ )
+ }));
+ detailed_tool_output(
+ &concise,
+ &items,
+ lines.join("\n"),
+ None,
+ FaultStage::Worker,
+ "tools/call:experiment.list",
+ )
+}
+
fn metric_keys_output(keys: &[MetricKeySummary]) -> Result<ToolOutput, FaultRecord> {
let concise = keys
.iter()
@@ -1392,8 +1512,8 @@ fn metric_best_output(
"order": item.order.as_str(),
"experiment_id": item.experiment_id,
"frontier_id": item.frontier_id,
- "change_node_id": item.change_node_id,
- "change_title": item.change_title,
+ "hypothesis_node_id": item.hypothesis_node_id,
+ "hypothesis_title": item.hypothesis_title,
"verdict": metric_verdict_name(item.verdict),
"candidate_checkpoint_id": item.candidate_checkpoint_id,
"candidate_commit_hash": item.candidate_commit_hash,
@@ -1412,7 +1532,7 @@ fn metric_best_output(
item.key, item.value,
item.source.as_str(),
- item.change_title,
+ item.hypothesis_title,
metric_verdict_name(item.verdict),
item.candidate_commit_hash,
item.candidate_checkpoint_id,
@@ -1775,7 +1895,7 @@ fn filtered_payload_fields(
fields: &Map<String, Value>,
) -> impl Iterator<Item = (&String, &Value)> + '_ {
fields.iter().filter(move |(name, _)| {
- !matches!(class, NodeClass::Note | NodeClass::Research) || name.as_str() != "body"
+ !matches!(class, NodeClass::Note | NodeClass::Source) || name.as_str() != "body"
})
}
@@ -1817,7 +1937,7 @@ fn payload_value_preview(value: &Value) -> Value {
}

fn is_prose_node(class: NodeClass) -> bool {
- matches!(class, NodeClass::Note | NodeClass::Research)
+ matches!(class, NodeClass::Note | NodeClass::Source)
}

fn truncated_inline_preview(text: &str, limit: usize) -> String {
@@ -2017,6 +2137,14 @@ fn metric_value_from_wire(raw: WireMetricValue) -> Result<MetricValue, StoreError> {
})
}

+fn experiment_analysis_from_wire(raw: WireAnalysis) -> Result<ExperimentAnalysisDraft, StoreError> {
+ Ok(ExperimentAnalysisDraft {
+ title: NonEmptyText::new(raw.title)?,
+ summary: NonEmptyText::new(raw.summary)?,
+ body: NonEmptyText::new(raw.body)?,
+ })
+}
+
fn metric_definition(store: &ProjectStore, key: &NonEmptyText) -> Result<MetricSpec, FaultRecord> {
store
.list_metric_definitions()
@@ -2071,12 +2199,11 @@ fn capture_code_snapshot(project_root: &Utf8Path) -> Result<CodeSnapshotRef, StoreError> {
fn parse_node_class_name(raw: &str) -> Result<NodeClass, StoreError> {
match raw {
"contract" => Ok(NodeClass::Contract),
- "change" => Ok(NodeClass::Change),
+ "hypothesis" => Ok(NodeClass::Hypothesis),
"run" => Ok(NodeClass::Run),
"analysis" => Ok(NodeClass::Analysis),
"decision" => Ok(NodeClass::Decision),
- "research" => Ok(NodeClass::Research),
- "enabling" => Ok(NodeClass::Enabling),
+ "source" => Ok(NodeClass::Source),
"note" => Ok(NodeClass::Note),
other => Err(crate::invalid_input(format!(
"unknown node class `{other}`"
@@ -2091,7 +2218,7 @@ fn parse_metric_unit_name(raw: &str) -> Result<MetricUnit, StoreError> {
fn parse_metric_source_name(raw: &str) -> Result<MetricFieldSource, StoreError> {
match raw {
"run_metric" => Ok(MetricFieldSource::RunMetric),
- "change_payload" => Ok(MetricFieldSource::ChangePayload),
+ "hypothesis_payload" => Ok(MetricFieldSource::HypothesisPayload),
"run_payload" => Ok(MetricFieldSource::RunPayload),
"analysis_payload" => Ok(MetricFieldSource::AnalysisPayload),
"decision_payload" => Ok(MetricFieldSource::DecisionPayload),
@@ -2234,14 +2361,11 @@ struct NodeCreateToolArgs {
}

#[derive(Debug, Deserialize)]
- struct ChangeRecordToolArgs {
+ struct HypothesisRecordToolArgs {
frontier_id: String,
title: String,
- summary: Option<String>,
+ summary: String,
body: String,
- hypothesis: Option<String>,
- base_checkpoint_id: Option<String>,
- benchmark_suite: Option<String>,
#[serde(default)]
annotations: Vec<WireAnnotation>,
#[serde(default)]
@@ -2292,13 +2416,12 @@ struct QuickNoteToolArgs {
}

#[derive(Debug, Deserialize)]
- struct ResearchRecordToolArgs {
+ struct SourceRecordToolArgs {
frontier_id: Option<String>,
title: String,
summary: String,
body: String,
- #[serde(default)]
- tags: Vec<String>,
+ tags: Option<Vec<String>>,
#[serde(default)]
annotations: Vec<WireAnnotation>,
#[serde(default)]
@@ -2355,10 +2478,27 @@ struct MetricBestToolArgs {
}

#[derive(Debug, Deserialize)]
- struct ExperimentCloseToolArgs {
+ struct ExperimentOpenToolArgs {
frontier_id: String,
base_checkpoint_id: String,
- change_node_id: String,
+ hypothesis_node_id: String,
+ title: String,
+ summary: Option<String>,
+}
+
+#[derive(Debug, Deserialize, Default)]
+struct ExperimentListToolArgs {
+ frontier_id: Option<String>,
+}
+
+#[derive(Debug, Deserialize)]
+struct ExperimentReadToolArgs {
+ experiment_id: String,
+}
+
+#[derive(Debug, Deserialize)]
+struct ExperimentCloseToolArgs {
+ experiment_id: String,
candidate_summary: String,
run: WireRun,
primary_metric: WireMetricValue,
@@ -2368,7 +2508,7 @@ struct
ExperimentCloseToolArgs {
verdict: String,
decision_title: String,
decision_rationale: String,
- analysis_node_id: Option<String>,
+ analysis: Option<WireAnalysis>,
}

#[derive(Debug, Deserialize)]
@@ -2403,6 +2543,13 @@ struct WireRun {
}

#[derive(Debug, Deserialize)]
+struct WireAnalysis {
+ title: String,
+ summary: String,
+ body: String,
+}
+
+#[derive(Debug, Deserialize)]
struct WireRunCommand {
working_directory: Option<String>,
argv: Vec<String>,
diff --git a/crates/fidget-spinner-cli/tests/mcp_hardening.rs b/crates/fidget-spinner-cli/tests/mcp_hardening.rs
index f076d74..0142b77 100644
--- a/crates/fidget-spinner-cli/tests/mcp_hardening.rs
+++ b/crates/fidget-spinner-cli/tests/mcp_hardening.rs
@@ -397,7 +397,7 @@ fn side_effecting_request_is_not_replayed_after_worker_crash() -> TestResult {
None,
&[(
"FIDGET_SPINNER_MCP_TEST_HOST_CRASH_ONCE_KEY",
- "tools/call:research.record".to_owned(),
+ "tools/call:source.record".to_owned(),
)],
)?;
let _ = harness.initialize()?;
@@ -407,7 +407,7 @@ fn side_effecting_request_is_not_replayed_after_worker_crash() -> TestResult {
let response = harness.call_tool(
7,
- "research.record",
+ "source.record",
json!({
"title": "should not duplicate",
"summary": "dedupe check",
@@ -710,7 +710,7 @@ fn research_record_accepts_tags_and_filtering() -> TestResult {
let research = harness.call_tool(
453,
- "research.record",
+ "source.record",
json!({
"title": "ingest tranche",
"summary": "Import the next libgrid tranche.",
@@ -726,7 +726,7 @@ fn research_record_accepts_tags_and_filtering() -> TestResult {
"filtered research nodes",
)?;
assert_eq!(nodes.len(), 1);
- assert_eq!(nodes[0]["class"].as_str(), Some("research"));
+ assert_eq!(nodes[0]["class"].as_str(), Some("source"));
assert_eq!(nodes[0]["tags"][0].as_str(), Some("campaign/libgrid"));
Ok(())
}
@@ -762,7 +762,7 @@ fn prose_tools_reject_invalid_shapes_over_mcp() -> TestResult {
let missing_research_summary = harness.call_tool(
48,
- "research.record",
+ "source.record",
json!({
"title": "research only",
"body": "body only",
@@ -798,7 +798,7 @@ fn prose_tools_reject_invalid_shapes_over_mcp() -> TestResult {
50,
"node.create",
json!({
- "class": "research",
+ "class": "source",
"title": "missing summary",
"payload": { "body": "full research body" },
}),
@@ -874,7 +874,7 @@ fn concise_prose_reads_only_surface_payload_field_names() -> TestResult {
532,
"node.create",
json!({
- "class": "research",
+ "class": "source",
"title": "rich import",
"summary": "triage layer only",
"payload": {
@@ -983,7 +983,7 @@ fn schema_field_tools_mutate_project_schema() -> TestResult {
"schema.field.upsert",
json!({
"name": "scenario",
- "node_classes": ["change", "analysis"],
+ "node_classes": ["hypothesis", "analysis"],
"presence": "recommended",
"severity": "warning",
"role": "projection_gate",
@@ -998,7 +998,7 @@
);
assert_eq!(
tool_content(&upsert)["field"]["node_classes"],
- json!(["change", "analysis"])
+ json!(["hypothesis", "analysis"])
);
let schema = harness.call_tool(862, "project.schema", json!({ "detail": "full" }))?;
@@ -1013,7 +1013,7 @@
"schema.field.remove",
json!({
"name": "scenario",
- "node_classes": ["change", "analysis"]
+ "node_classes": ["hypothesis", "analysis"]
}),
)?;
assert_eq!(remove["result"]["isError"].as_bool(), Some(false));
@@ -1041,7 +1041,7 @@ fn bind_open_backfills_legacy_missing_summary() -> TestResult {
let mut store = must(ProjectStore::open(&project_root), "open project store")?;
let node = must(
store.add_node(fidget_spinner_store_sqlite::CreateNodeRequest {
- class: fidget_spinner_core::NodeClass::Research,
+ class: fidget_spinner_core::NodeClass::Source,
frontier_id: None,
title: must(NonEmptyText::new("legacy research"), "legacy title")?,
summary: Some(must(
@@ -1096,7 +1096,7 @@ fn bind_open_backfills_legacy_missing_summary() -> TestResult {
Some("Derived summary first paragraph.")
);

- let listed = harness.call_tool(62, "node.list", json!({ "class": "research" }))?;
+ let listed = harness.call_tool(62, "node.list", json!({ "class": "source" }))?;
let items = must_some(tool_content(&listed).as_array(), "research node list")?;
assert_eq!(items.len(), 1);
assert_eq!(
@@ -1202,7 +1202,7 @@ fn metric_tools_rank_closed_experiments_and_enforce_disambiguation() -> TestResu
71,
"node.create",
json!({
- "class": "change",
+ "class": "hypothesis",
"frontier_id": frontier_id,
"title": "first change",
"summary": "first change summary",
@@ -1217,15 +1217,29 @@ fn metric_tools_rank_closed_experiments_and_enforce_disambiguation() -> TestResu
tool_content(&first_change)["id"].as_str(),
"first change id",
)?;
+ let first_experiment = harness.call_tool(
+ 711,
+ "experiment.open",
+ json!({
+ "frontier_id": frontier_id,
+ "base_checkpoint_id": base_checkpoint_id,
+ "hypothesis_node_id": first_change_id,
+ "title": "first experiment",
+ "summary": "first experiment summary"
+ }),
+ )?;
+ assert_eq!(first_experiment["result"]["isError"].as_bool(), Some(false));
+ let first_experiment_id = must_some(
+ tool_content(&first_experiment)["experiment_id"].as_str(),
+ "first experiment id",
+ )?;
let _first_commit = commit_project_state(&project_root, "candidate-one.txt", "candidate one")?;
let first_close = harness.call_tool(
72,
"experiment.close",
json!({
- "frontier_id": frontier_id,
- "base_checkpoint_id": base_checkpoint_id,
- "change_node_id": first_change_id,
+ "experiment_id": first_experiment_id,
"candidate_summary": "candidate one",
"run": {
"title": "first run",
@@ -1265,7 +1279,7 @@ fn metric_tools_rank_closed_experiments_and_enforce_disambiguation() -> TestResu
73,
"node.create",
json!({
- "class": "change",
+ "class": "hypothesis",
"frontier_id": frontier_id,
"title": "second change",
"summary": "second change summary",
@@ -1280,15 +1294,32 @@ fn metric_tools_rank_closed_experiments_and_enforce_disambiguation() -> TestResu
tool_content(&second_change)["id"].as_str(),
"second change id",
)?;
+ let second_experiment = harness.call_tool(
+ 712,
+ "experiment.open",
+ json!({
+ "frontier_id": frontier_id,
+ "base_checkpoint_id": base_checkpoint_id,
+ "hypothesis_node_id": second_change_id,
+ "title": "second experiment",
+ "summary": "second experiment summary"
+ }),
+ )?;
+ assert_eq!(
+ second_experiment["result"]["isError"].as_bool(),
+ Some(false)
+ );
+ let second_experiment_id = must_some(
+ tool_content(&second_experiment)["experiment_id"].as_str(),
+ "second experiment id",
+ )?;
let second_commit = commit_project_state(&project_root, "candidate-two.txt", "candidate two")?;
let second_close = harness.call_tool(
74,
"experiment.close",
json!({
- "frontier_id": frontier_id,
- "base_checkpoint_id": base_checkpoint_id,
- "change_node_id": second_change_id,
+ "experiment_id": second_experiment_id,
"candidate_summary": "candidate two",
"run": {
"title": "second run",
@@ -1355,7 +1386,7 @@ fn metric_tools_rank_closed_experiments_and_enforce_disambiguation() -> TestResu
81,
"node.create",
json!({
- "class": "change",
+ "class": "hypothesis",
"frontier_id": second_frontier_id,
"title": "third change",
"summary": "third change summary",
@@ -1370,6 +1401,22 @@ fn metric_tools_rank_closed_experiments_and_enforce_disambiguation() -> TestResu
tool_content(&third_change)["id"].as_str(),
"third change id",
)?;
+ let third_experiment = harness.call_tool(
+ 811,
+ "experiment.open",
+ json!({
+ "frontier_id": second_frontier_id,
+ "base_checkpoint_id": second_base_checkpoint_id,
+ "hypothesis_node_id": third_change_id,
+ "title": "third experiment",
+ "summary": "third experiment summary"
+ }),
+ )?;
+ assert_eq!(third_experiment["result"]["isError"].as_bool(), Some(false));
+ let third_experiment_id = must_some(
+ tool_content(&third_experiment)["experiment_id"].as_str(),
+ "third experiment id",
+ )?;
let third_commit = commit_project_state(&project_root, "candidate-three.txt", "candidate three")?;
@@ -1377,9 +1424,7 @@ fn metric_tools_rank_closed_experiments_and_enforce_disambiguation() -> TestResu
82,
"experiment.close",
json!({
- "frontier_id": second_frontier_id,
- "base_checkpoint_id": second_base_checkpoint_id,
- "change_node_id": third_change_id,
+ "experiment_id": third_experiment_id,
"candidate_summary": "candidate three",
"run": {
"title": "third run",
@@ -1428,7 +1473,7 @@ fn metric_tools_rank_closed_experiments_and_enforce_disambiguation() -> TestResu
}));
assert!(key_rows.iter().any(|row| {
row["key"].as_str() == Some("wall_clock_s")
- && row["source"].as_str() == Some("change_payload")
+ && row["source"].as_str() == Some("hypothesis_payload")
}));

let filtered_keys = harness.call_tool(
@@ -1502,7 +1547,7 @@ fn metric_tools_rank_closed_experiments_and_enforce_disambiguation() -> TestResu
"metric.best",
json!({
"key": "wall_clock_s",
- "source": "change_payload"
+ "source": "hypothesis_payload"
}),
)?;
assert_eq!(
@@ -1519,7 +1564,7 @@ fn metric_tools_rank_closed_experiments_and_enforce_disambiguation() -> TestResu
"metric.best",
json!({
"key": "wall_clock_s",
- "source": "change_payload",
+ "source": "hypothesis_payload",
"dimensions": {
"scenario": "belt_4x5",
"duration_s": 60.0
diff --git a/crates/fidget-spinner-core/src/lib.rs b/crates/fidget-spinner-core/src/lib.rs
index c0d6fe2..3c9aaac 100644
--- a/crates/fidget-spinner-core/src/lib.rs
+++ b/crates/fidget-spinner-core/src/lib.rs
@@ -22,7 +22,7 @@ pub use crate::model::{
FrontierProjection, FrontierRecord, FrontierStatus, FrontierVerdict, GitCommitHash,
InferencePolicy, JsonObject, MetricDefinition, MetricObservation, MetricSpec, MetricUnit,
MetricValue, NodeAnnotation, NodeClass, NodeDiagnostics, NodePayload, NodeTrack, NonEmptyText,
- OptimizationObjective, PayloadSchemaRef, ProjectFieldSpec, ProjectSchema,
+ OpenExperiment, OptimizationObjective, PayloadSchemaRef, ProjectFieldSpec, ProjectSchema,
RunDimensionDefinition, RunDimensionValue, RunRecord, RunStatus, TagName,
TagRecord, ValidationDiagnostic,
};
diff --git a/crates/fidget-spinner-core/src/model.rs b/crates/fidget-spinner-core/src/model.rs
index a77566f..170f49c 100644
--- a/crates/fidget-spinner-core/src/model.rs
+++ b/crates/fidget-spinner-core/src/model.rs
@@ -117,12 +117,11 @@ pub type JsonObject = Map<String, Value>;
#[derive(Clone, Copy, Debug, Deserialize, Eq, Ord, PartialEq, PartialOrd, Serialize)]
pub enum NodeClass {
Contract,
- Change,
+ Hypothesis,
Run,
Analysis,
Decision,
- Research,
- Enabling,
+ Source,
Note,
}

@@ -131,12 +130,11 @@ impl NodeClass {
pub const fn as_str(self) -> &'static str {
match self {
Self::Contract => "contract",
- Self::Change => "change",
+ Self::Hypothesis => "hypothesis",
Self::Run => "run",
Self::Analysis => "analysis",
Self::Decision => "decision",
- Self::Research => "research",
- Self::Enabling => "enabling",
+ Self::Source => "source",
Self::Note => "note",
}
}
@@ -144,10 +142,10 @@ impl NodeClass {
#[must_use]
pub const fn default_track(self) -> NodeTrack {
match self {
- Self::Contract | Self::Change | Self::Run | Self::Analysis | Self::Decision => {
+ Self::Contract | Self::Hypothesis | Self::Run | Self::Analysis | Self::Decision => {
NodeTrack::CorePath
}
- Self::Research | Self::Enabling | Self::Note => NodeTrack::OffPath,
+ Self::Source | Self::Note => NodeTrack::OffPath,
}
}
}
@@ -867,6 +865,17 @@ pub struct ExperimentResult {
}

#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize)]
+pub struct OpenExperiment {
+ pub id: ExperimentId,
+ pub frontier_id: FrontierId,
+ pub base_checkpoint_id: CheckpointId,
+ pub hypothesis_node_id: NodeId,
+ pub title: NonEmptyText,
+ pub summary: Option<NonEmptyText>,
+ pub created_at: OffsetDateTime,
+}
+
+#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize)]
pub struct FrontierNote {
pub summary: NonEmptyText,
pub next_hypotheses: Vec<NonEmptyText>,
@@ -878,11 +887,13 @@ pub struct CompletedExperiment {
pub frontier_id: FrontierId,
pub base_checkpoint_id: CheckpointId,
pub candidate_checkpoint_id: CheckpointId,
- pub change_node_id: NodeId,
+ pub hypothesis_node_id: NodeId,
pub run_node_id: NodeId,
pub run_id: RunId,
pub analysis_node_id: Option<NodeId>,
pub decision_node_id: NodeId,
+ pub title: NonEmptyText,
+ pub summary: Option<NonEmptyText>,
pub result: ExperimentResult,
pub note: FrontierNote,
pub verdict: FrontierVerdict,
@@ -934,7 +945,7 @@ mod tests {
version: 1,
fields: vec![ProjectFieldSpec {
name: NonEmptyText::new("hypothesis")?,
- node_classes: BTreeSet::from([NodeClass::Change]),
+ node_classes: BTreeSet::from([NodeClass::Hypothesis]),
presence: FieldPresence::Required,
severity: DiagnosticSeverity::Warning,
role: FieldRole::ProjectionGate,
...
}],
};
let payload = NodePayload::with_schema(schema.schema_ref(), JsonObject::new());
- let diagnostics = schema.validate_node(NodeClass::Change, &payload);
+ let diagnostics = schema.validate_node(NodeClass::Hypothesis, &payload);

assert_eq!(diagnostics.admission, super::AdmissionState::Admitted);
assert_eq!(diagnostics.items.len(), 1);
@@ -979,13 +990,13 @@ mod tests {
}

#[test]
- fn research_nodes_default_to_off_path() -> Result<(), CoreError> {
+ fn source_nodes_default_to_off_path() -> Result<(), CoreError> {
let payload = NodePayload {
schema: None,
fields: JsonObject::from_iter([("topic".to_owned(), json!("ideas"))]),
};
let node = DagNode::new(
- NodeClass::Research,
+ NodeClass::Source,
None,
NonEmptyText::new("feature scouting")?,
None,
diff --git a/crates/fidget-spinner-store-sqlite/src/lib.rs b/crates/fidget-spinner-store-sqlite/src/lib.rs
index bcdbc01..1862590 100644
--- a/crates/fidget-spinner-store-sqlite/src/lib.rs
+++ b/crates/fidget-spinner-store-sqlite/src/lib.rs
@@ -13,7 +13,7 @@ use fidget_spinner_core::{
FrontierContract, FrontierNote, FrontierProjection, FrontierRecord, FrontierStatus,
FrontierVerdict, GitCommitHash, InferencePolicy, JsonObject, MetricDefinition, MetricSpec,
MetricUnit, MetricValue, NodeAnnotation, NodeClass, NodeDiagnostics, NodePayload, NonEmptyText,
- OptimizationObjective, ProjectFieldSpec, ProjectSchema, RunDimensionDefinition,
+ OpenExperiment, OptimizationObjective, ProjectFieldSpec, ProjectSchema, RunDimensionDefinition,
RunDimensionValue, RunRecord, RunStatus, TagName, TagRecord,
};
use rusqlite::types::Value as SqlValue;
@@ -29,6 +29,7 @@ pub const STORE_DIR_NAME: &str = ".fidget_spinner";
pub const STATE_DB_NAME: &str = "state.sqlite";
pub const PROJECT_CONFIG_NAME: &str = "project.json";
pub const PROJECT_SCHEMA_NAME: &str = "schema.json";
+pub const CURRENT_STORE_FORMAT_VERSION: u32 = 2;

#[derive(Debug, Error)]
pub enum StoreError {
@@ -59,8 +60,14 @@ pub enum StoreError {
FrontierNotFound(fidget_spinner_core::FrontierId),
#[error("checkpoint {0} was not found")]
CheckpointNotFound(fidget_spinner_core::CheckpointId),
- #[error("node {0} is not a change node")]
- NodeNotChange(fidget_spinner_core::NodeId),
+ #[error("experiment {0} was not found")]
+ ExperimentNotFound(fidget_spinner_core::ExperimentId),
+ #[error("node {0} is not a hypothesis node")]
+ NodeNotHypothesis(fidget_spinner_core::NodeId),
+ #[error(
+ "project store format {observed} is incompatible with this binary (expected {expected}); reinitialize the store"
+ )]
+ IncompatibleStoreFormatVersion { observed: u32, expected: u32 },
#[error("frontier {frontier_id} has no champion checkpoint")]
MissingChampionCheckpoint {
frontier_id: fidget_spinner_core::FrontierId,
@@ -130,7 +137,7 @@ impl ProjectConfig {
Self {
display_name,
created_at: OffsetDateTime::now_utc(),
- store_format_version: 1,
+ store_format_version: CURRENT_STORE_FORMAT_VERSION,
}
}
}
@@ -219,7 +226,7 @@ pub struct NodeSummary {
#[serde(rename_all = "snake_case")]
pub enum MetricFieldSource {
RunMetric,
- ChangePayload,
+ HypothesisPayload,
RunPayload,
AnalysisPayload,
DecisionPayload,
@@ -230,7 +237,7 @@ impl MetricFieldSource {
pub const fn as_str(self) -> &'static str {
match self {
Self::RunMetric => "run_metric",
- Self::ChangePayload => "change_payload",
+ Self::HypothesisPayload => "hypothesis_payload",
Self::RunPayload => "run_payload",
Self::AnalysisPayload => "analysis_payload",
Self::DecisionPayload => "decision_payload",
@@ -240,13 +247,11 @@ impl MetricFieldSource {
#[must_use]
pub const fn from_payload_class(class: NodeClass) -> Option<Self> {
match class {
- NodeClass::Change => Some(Self::ChangePayload),
+ NodeClass::Hypothesis => Some(Self::HypothesisPayload),
NodeClass::Run => Some(Self::RunPayload),
NodeClass::Analysis => Some(Self::AnalysisPayload),
NodeClass::Decision => Some(Self::DecisionPayload),
- NodeClass::Contract | NodeClass::Research | NodeClass::Enabling | NodeClass::Note => {
- None
- }
+ NodeClass::Contract | NodeClass::Source | NodeClass::Note => None,
}
}
}
@@ -297,8 +302,8 @@ pub struct MetricBestEntry {
pub order: MetricRankOrder,
pub experiment_id: fidget_spinner_core::ExperimentId,
pub frontier_id: fidget_spinner_core::FrontierId,
- pub change_node_id: fidget_spinner_core::NodeId,
- pub change_title: NonEmptyText,
+ pub hypothesis_node_id: fidget_spinner_core::NodeId,
+ pub hypothesis_title: NonEmptyText,
pub run_id: fidget_spinner_core::RunId,
pub verdict: FrontierVerdict,
pub candidate_checkpoint_id: fidget_spinner_core::CheckpointId,
@@ -380,10 +385,35 @@ pub struct CheckpointSeed {
}

#[derive(Clone, Debug)]
- pub struct CloseExperimentRequest {
+ pub struct OpenExperimentRequest {
pub frontier_id: fidget_spinner_core::FrontierId,
pub base_checkpoint_id: fidget_spinner_core::CheckpointId,
- pub change_node_id: fidget_spinner_core::NodeId,
+ pub hypothesis_node_id: fidget_spinner_core::NodeId,
+ pub title: NonEmptyText,
+ pub summary: Option<NonEmptyText>,
+}
+
+#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize)]
+pub struct OpenExperimentSummary {
+ pub id: fidget_spinner_core::ExperimentId,
+ pub frontier_id: fidget_spinner_core::FrontierId,
+ pub base_checkpoint_id: fidget_spinner_core::CheckpointId,
+ pub hypothesis_node_id: fidget_spinner_core::NodeId,
+ pub title: NonEmptyText,
+ pub summary: Option<NonEmptyText>,
+ pub created_at: OffsetDateTime,
+}
+
+#[derive(Clone, Debug)]
+pub struct ExperimentAnalysisDraft {
+ pub title: NonEmptyText,
+ pub summary: NonEmptyText,
+ pub body: NonEmptyText,
+}
+
+#[derive(Clone, Debug)]
+pub struct CloseExperimentRequest {
+ pub experiment_id: fidget_spinner_core::ExperimentId,
pub candidate_summary: NonEmptyText,
pub candidate_snapshot: CheckpointSnapshotRef,
pub run_title: NonEmptyText,
@@ -396,16 +426,18 @@ pub struct CloseExperimentRequest {
pub supporting_metrics: Vec<MetricValue>,
pub note: FrontierNote,
pub verdict: FrontierVerdict,
+ pub analysis: Option<ExperimentAnalysisDraft>,
pub decision_title: NonEmptyText,
pub decision_rationale: NonEmptyText,
- pub analysis_node_id: Option<fidget_spinner_core::NodeId>,
}

#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
pub struct ExperimentReceipt {
+ pub open_experiment: OpenExperiment,
pub checkpoint: CheckpointRecord,
pub run_node: DagNode,
pub run: RunRecord,
+ pub analysis_node: Option<DagNode>,
pub decision_node: DagNode,
pub experiment: CompletedExperiment,
}
@@ -450,6 +482,12 @@ impl ProjectStore {
.ok_or(StoreError::MissingProjectStore(requested_root))?;
let state_root = state_root(&project_root);
let config = read_json_file::<ProjectConfig>(&state_root.join(PROJECT_CONFIG_NAME))?;
+ if config.store_format_version != CURRENT_STORE_FORMAT_VERSION {
+ return Err(StoreError::IncompatibleStoreFormatVersion {
+ observed: config.store_format_version,
+ expected: CURRENT_STORE_FORMAT_VERSION,
+ });
+ }
let schema = read_json_file::<ProjectSchema>(&state_root.join(PROJECT_SCHEMA_NAME))?;
let mut connection = Connection::open(state_root.join(STATE_DB_NAME).as_std_path())?;
upgrade_store(&mut connection)?;
@@ -1098,17 +1136,17 @@ impl ProjectStore {
.map_err(StoreError::from)
}

- pub fn close_experiment(
+ pub fn open_experiment(
&mut self,
- request: CloseExperimentRequest,
- ) ->
Result<ExperimentReceipt, StoreError> { - let change_node = self - .get_node(request.change_node_id)? - .ok_or(StoreError::NodeNotFound(request.change_node_id))?; - if change_node.class != NodeClass::Change { - return Err(StoreError::NodeNotChange(request.change_node_id)); + request: OpenExperimentRequest, + ) -> Result<OpenExperimentSummary, StoreError> { + let hypothesis_node = self + .get_node(request.hypothesis_node_id)? + .ok_or(StoreError::NodeNotFound(request.hypothesis_node_id))?; + if hypothesis_node.class != NodeClass::Hypothesis { + return Err(StoreError::NodeNotHypothesis(request.hypothesis_node_id)); } - if change_node.frontier_id != Some(request.frontier_id) { + if hypothesis_node.frontier_id != Some(request.frontier_id) { return Err(StoreError::FrontierNotFound(request.frontier_id)); } let base_checkpoint = self @@ -1117,6 +1155,102 @@ impl ProjectStore { if base_checkpoint.frontier_id != request.frontier_id { return Err(StoreError::CheckpointNotFound(request.base_checkpoint_id)); } + let experiment = OpenExperiment { + id: fidget_spinner_core::ExperimentId::fresh(), + frontier_id: request.frontier_id, + base_checkpoint_id: request.base_checkpoint_id, + hypothesis_node_id: request.hypothesis_node_id, + title: request.title, + summary: request.summary, + created_at: OffsetDateTime::now_utc(), + }; + let tx = self.connection.transaction()?; + insert_open_experiment(&tx, &experiment)?; + touch_frontier(&tx, request.frontier_id)?; + insert_event( + &tx, + "experiment", + &experiment.id.to_string(), + "experiment.opened", + json!({ + "frontier_id": experiment.frontier_id, + "hypothesis_node_id": experiment.hypothesis_node_id, + "base_checkpoint_id": experiment.base_checkpoint_id, + }), + )?; + tx.commit()?; + Ok(summarize_open_experiment(&experiment)) + } + + pub fn list_open_experiments( + &self, + frontier_id: Option<fidget_spinner_core::FrontierId>, + ) -> Result<Vec<OpenExperimentSummary>, StoreError> { + let mut statement = self.connection.prepare( + 
"SELECT + id, + frontier_id, + base_checkpoint_id, + hypothesis_node_id, + title, + summary, + created_at + FROM open_experiments + WHERE (?1 IS NULL OR frontier_id = ?1) + ORDER BY created_at DESC", + )?; + let mut rows = statement.query(params![frontier_id.map(|id| id.to_string())])?; + let mut items = Vec::new(); + while let Some(row) = rows.next()? { + items.push(OpenExperimentSummary { + id: parse_experiment_id(&row.get::<_, String>(0)?)?, + frontier_id: parse_frontier_id(&row.get::<_, String>(1)?)?, + base_checkpoint_id: parse_checkpoint_id(&row.get::<_, String>(2)?)?, + hypothesis_node_id: parse_node_id(&row.get::<_, String>(3)?)?, + title: NonEmptyText::new(row.get::<_, String>(4)?)?, + summary: row + .get::<_, Option<String>>(5)? + .map(NonEmptyText::new) + .transpose()?, + created_at: decode_timestamp(&row.get::<_, String>(6)?)?, + }); + } + Ok(items) + } + + pub fn read_open_experiment( + &self, + experiment_id: fidget_spinner_core::ExperimentId, + ) -> Result<OpenExperimentSummary, StoreError> { + load_open_experiment(&self.connection, experiment_id)? + .map(|experiment| summarize_open_experiment(&experiment)) + .ok_or(StoreError::ExperimentNotFound(experiment_id)) + } + + pub fn close_experiment( + &mut self, + request: CloseExperimentRequest, + ) -> Result<ExperimentReceipt, StoreError> { + let open_experiment = load_open_experiment(&self.connection, request.experiment_id)? + .ok_or(StoreError::ExperimentNotFound(request.experiment_id))?; + let hypothesis_node = self + .get_node(open_experiment.hypothesis_node_id)? + .ok_or(StoreError::NodeNotFound(open_experiment.hypothesis_node_id))?; + if hypothesis_node.class != NodeClass::Hypothesis { + return Err(StoreError::NodeNotHypothesis( + open_experiment.hypothesis_node_id, + )); + } + let base_checkpoint = self + .load_checkpoint(open_experiment.base_checkpoint_id)? 
+ .ok_or(StoreError::CheckpointNotFound( + open_experiment.base_checkpoint_id, + ))?; + if base_checkpoint.frontier_id != open_experiment.frontier_id { + return Err(StoreError::CheckpointNotFound( + open_experiment.base_checkpoint_id, + )); + } let tx = self.connection.transaction()?; let dimensions = validate_run_dimensions_tx(&tx, &request.dimensions)?; let primary_metric_definition = @@ -1144,7 +1278,7 @@ impl ProjectStore { let run_diagnostics = self.schema.validate_node(NodeClass::Run, &run_payload); let run_node = DagNode::new( NodeClass::Run, - Some(request.frontier_id), + Some(open_experiment.frontier_id), request.run_title, request.run_summary, run_payload, @@ -1155,7 +1289,7 @@ impl ProjectStore { let run = RunRecord { node_id: run_node.id, run_id, - frontier_id: Some(request.frontier_id), + frontier_id: Some(open_experiment.frontier_id), status: RunStatus::Succeeded, backend: request.backend, code_snapshot: request.code_snapshot, @@ -1165,6 +1299,27 @@ impl ProjectStore { finished_at: Some(now), }; + let analysis_node = request + .analysis + .map(|analysis| -> Result<DagNode, StoreError> { + let payload = NodePayload::with_schema( + self.schema.schema_ref(), + json_object(json!({ + "body": analysis.body.as_str(), + }))?, + ); + let diagnostics = self.schema.validate_node(NodeClass::Analysis, &payload); + Ok(DagNode::new( + NodeClass::Analysis, + Some(open_experiment.frontier_id), + analysis.title, + Some(analysis.summary), + payload, + diagnostics, + )) + }) + .transpose()?; + let decision_payload = NodePayload::with_schema( self.schema.schema_ref(), json_object(json!({ @@ -1177,7 +1332,7 @@ impl ProjectStore { .validate_node(NodeClass::Decision, &decision_payload); let decision_node = DagNode::new( NodeClass::Decision, - Some(request.frontier_id), + Some(open_experiment.frontier_id), request.decision_title, Some(request.decision_rationale.clone()), decision_payload, @@ -1186,7 +1341,7 @@ impl ProjectStore { let checkpoint = CheckpointRecord { id: 
fidget_spinner_core::CheckpointId::fresh(), - frontier_id: request.frontier_id, + frontier_id: open_experiment.frontier_id, node_id: run_node.id, snapshot: request.candidate_snapshot, disposition: match request.verdict { @@ -1202,15 +1357,17 @@ impl ProjectStore { }; let experiment = CompletedExperiment { - id: fidget_spinner_core::ExperimentId::fresh(), - frontier_id: request.frontier_id, - base_checkpoint_id: request.base_checkpoint_id, + id: open_experiment.id, + frontier_id: open_experiment.frontier_id, + base_checkpoint_id: open_experiment.base_checkpoint_id, candidate_checkpoint_id: checkpoint.id, - change_node_id: request.change_node_id, + hypothesis_node_id: open_experiment.hypothesis_node_id, run_node_id: run_node.id, run_id, - analysis_node_id: request.analysis_node_id, + analysis_node_id: analysis_node.as_ref().map(|node| node.id), decision_node_id: decision_node.id, + title: open_experiment.title.clone(), + summary: open_experiment.summary.clone(), result: ExperimentResult { dimensions: dimensions.clone(), primary_metric: request.primary_metric, @@ -1222,23 +1379,45 @@ impl ProjectStore { created_at: now, }; insert_node(&tx, &run_node)?; + if let Some(node) = analysis_node.as_ref() { + insert_node(&tx, node)?; + } insert_node(&tx, &decision_node)?; insert_edge( &tx, &DagEdge { - source_id: request.change_node_id, + source_id: open_experiment.hypothesis_node_id, target_id: run_node.id, kind: EdgeKind::Lineage, }, )?; - insert_edge( - &tx, - &DagEdge { - source_id: run_node.id, - target_id: decision_node.id, - kind: EdgeKind::Evidence, - }, - )?; + if let Some(node) = analysis_node.as_ref() { + insert_edge( + &tx, + &DagEdge { + source_id: run_node.id, + target_id: node.id, + kind: EdgeKind::Evidence, + }, + )?; + insert_edge( + &tx, + &DagEdge { + source_id: node.id, + target_id: decision_node.id, + kind: EdgeKind::Evidence, + }, + )?; + } else { + insert_edge( + &tx, + &DagEdge { + source_id: run_node.id, + target_id: decision_node.id, + kind: 
EdgeKind::Evidence, + }, + )?; + } insert_run( &tx, &run, @@ -1251,7 +1430,7 @@ impl ProjectStore { insert_run_dimensions(&tx, run.run_id, &dimensions)?; match request.verdict { FrontierVerdict::PromoteToChampion => { - demote_previous_champion(&tx, request.frontier_id)?; + demote_previous_champion(&tx, open_experiment.frontier_id)?; } FrontierVerdict::KeepOnFrontier | FrontierVerdict::NeedsMoreEvidence @@ -1260,14 +1439,16 @@ impl ProjectStore { } insert_checkpoint(&tx, &checkpoint)?; insert_experiment(&tx, &experiment)?; - touch_frontier(&tx, request.frontier_id)?; + delete_open_experiment(&tx, open_experiment.id)?; + touch_frontier(&tx, open_experiment.frontier_id)?; insert_event( &tx, "experiment", &experiment.id.to_string(), "experiment.closed", json!({ - "frontier_id": request.frontier_id, + "frontier_id": open_experiment.frontier_id, + "hypothesis_node_id": open_experiment.hypothesis_node_id, "verdict": format!("{:?}", request.verdict), "candidate_checkpoint_id": checkpoint.id, }), @@ -1275,9 +1456,11 @@ impl ProjectStore { tx.commit()?; Ok(ExperimentReceipt { + open_experiment, checkpoint, run_node, run, + analysis_node, decision_node, experiment, }) @@ -1363,7 +1546,7 @@ fn upgrade_store(connection: &mut Connection) -> Result<(), StoreError> { } fn validate_prose_node_request(request: &CreateNodeRequest) -> Result<(), StoreError> { - if !matches!(request.class, NodeClass::Note | NodeClass::Research) { + if !matches!(request.class, NodeClass::Note | NodeClass::Source) { return Ok(()); } if request.summary.is_none() { @@ -1382,8 +1565,8 @@ struct MetricSample { value: f64, frontier_id: fidget_spinner_core::FrontierId, experiment_id: fidget_spinner_core::ExperimentId, - change_node_id: fidget_spinner_core::NodeId, - change_title: NonEmptyText, + hypothesis_node_id: fidget_spinner_core::NodeId, + hypothesis_title: NonEmptyText, run_id: fidget_spinner_core::RunId, verdict: FrontierVerdict, candidate_checkpoint_id: fidget_spinner_core::CheckpointId, @@ -1402,8 
+1585,8 @@ impl MetricSample { order, experiment_id: self.experiment_id, frontier_id: self.frontier_id, - change_node_id: self.change_node_id, - change_title: self.change_title, + hypothesis_node_id: self.hypothesis_node_id, + hypothesis_title: self.hypothesis_title, run_id: self.run_id, verdict: self.verdict, candidate_checkpoint_id: self.candidate_checkpoint_id, @@ -1558,7 +1741,7 @@ struct ExperimentMetricRow { run_id: fidget_spinner_core::RunId, verdict: FrontierVerdict, candidate_checkpoint: CheckpointRecord, - change_node: DagNode, + hypothesis_node: DagNode, run_node: DagNode, analysis_node: Option<DagNode>, decision_node: DagNode, @@ -1574,7 +1757,7 @@ fn load_experiment_rows(store: &ProjectStore) -> Result<Vec<ExperimentMetricRow> id, frontier_id, run_id, - change_node_id, + hypothesis_node_id, run_node_id, analysis_node_id, decision_node_id, @@ -1587,7 +1770,7 @@ fn load_experiment_rows(store: &ProjectStore) -> Result<Vec<ExperimentMetricRow> let mut rows = statement.query([])?; let mut items = Vec::new(); while let Some(row) = rows.next()? { - let change_node_id = parse_node_id(&row.get::<_, String>(3)?)?; + let hypothesis_node_id = parse_node_id(&row.get::<_, String>(3)?)?; let run_id = parse_run_id(&row.get::<_, String>(2)?)?; let run_node_id = parse_node_id(&row.get::<_, String>(4)?)?; let analysis_node_id = row @@ -1604,9 +1787,9 @@ fn load_experiment_rows(store: &ProjectStore) -> Result<Vec<ExperimentMetricRow> candidate_checkpoint: store .load_checkpoint(candidate_checkpoint_id)? .ok_or(StoreError::CheckpointNotFound(candidate_checkpoint_id))?, - change_node: store - .get_node(change_node_id)? - .ok_or(StoreError::NodeNotFound(change_node_id))?, + hypothesis_node: store + .get_node(hypothesis_node_id)? + .ok_or(StoreError::NodeNotFound(hypothesis_node_id))?, run_node: store .get_node(run_node_id)? 
.ok_or(StoreError::NodeNotFound(run_node_id))?, @@ -1647,7 +1830,11 @@ fn metric_samples_for_row( MetricFieldSource::RunMetric, ) })); - samples.extend(metric_samples_from_payload(schema, row, &row.change_node)); + samples.extend(metric_samples_from_payload( + schema, + row, + &row.hypothesis_node, + )); samples.extend(metric_samples_from_payload(schema, row, &row.run_node)); if let Some(node) = row.analysis_node.as_ref() { samples.extend(metric_samples_from_payload(schema, row, node)); @@ -1669,8 +1856,8 @@ fn metric_sample_from_observation( value: metric.value, frontier_id: row.frontier_id, experiment_id: row.experiment_id, - change_node_id: row.change_node.id, - change_title: row.change_node.title.clone(), + hypothesis_node_id: row.hypothesis_node.id, + hypothesis_title: row.hypothesis_node.title.clone(), run_id: row.run_id, verdict: row.verdict, candidate_checkpoint_id: row.candidate_checkpoint.id, @@ -1708,8 +1895,8 @@ fn metric_samples_from_payload( value, frontier_id: row.frontier_id, experiment_id: row.experiment_id, - change_node_id: row.change_node.id, - change_title: row.change_node.title.clone(), + hypothesis_node_id: row.hypothesis_node.id, + hypothesis_title: row.hypothesis_node.title.clone(), run_id: row.run_id, verdict: row.verdict, candidate_checkpoint_id: row.candidate_checkpoint.id, @@ -1847,16 +2034,28 @@ fn migrate(connection: &Connection) -> Result<(), StoreError> { PRIMARY KEY (run_id, dimension_key) ); + CREATE TABLE IF NOT EXISTS open_experiments ( + id TEXT PRIMARY KEY, + frontier_id TEXT NOT NULL REFERENCES frontiers(id) ON DELETE CASCADE, + base_checkpoint_id TEXT NOT NULL REFERENCES checkpoints(id) ON DELETE RESTRICT, + hypothesis_node_id TEXT NOT NULL REFERENCES nodes(id) ON DELETE RESTRICT, + title TEXT NOT NULL, + summary TEXT, + created_at TEXT NOT NULL + ); + CREATE TABLE IF NOT EXISTS experiments ( id TEXT PRIMARY KEY, frontier_id TEXT NOT NULL REFERENCES frontiers(id) ON DELETE CASCADE, base_checkpoint_id TEXT NOT NULL REFERENCES 
checkpoints(id) ON DELETE RESTRICT, candidate_checkpoint_id TEXT NOT NULL REFERENCES checkpoints(id) ON DELETE RESTRICT, - change_node_id TEXT NOT NULL REFERENCES nodes(id) ON DELETE RESTRICT, + hypothesis_node_id TEXT NOT NULL REFERENCES nodes(id) ON DELETE RESTRICT, run_node_id TEXT NOT NULL REFERENCES nodes(id) ON DELETE RESTRICT, run_id TEXT NOT NULL REFERENCES runs(run_id) ON DELETE RESTRICT, analysis_node_id TEXT REFERENCES nodes(id) ON DELETE RESTRICT, decision_node_id TEXT NOT NULL REFERENCES nodes(id) ON DELETE RESTRICT, + title TEXT NOT NULL, + summary TEXT, benchmark_suite TEXT NOT NULL, primary_metric_json TEXT NOT NULL, supporting_metrics_json TEXT NOT NULL, @@ -1870,6 +2069,7 @@ fn migrate(connection: &Connection) -> Result<(), StoreError> { CREATE INDEX IF NOT EXISTS run_dimensions_by_key_text ON run_dimensions(dimension_key, value_text); CREATE INDEX IF NOT EXISTS run_dimensions_by_key_numeric ON run_dimensions(dimension_key, value_numeric); CREATE INDEX IF NOT EXISTS run_dimensions_by_run ON run_dimensions(run_id, dimension_key); + CREATE INDEX IF NOT EXISTS open_experiments_by_frontier ON open_experiments(frontier_id, created_at DESC); CREATE INDEX IF NOT EXISTS experiments_by_frontier ON experiments(frontier_id, created_at DESC); CREATE TABLE IF NOT EXISTS events ( @@ -1889,7 +2089,7 @@ fn backfill_prose_summaries(connection: &Connection) -> Result<(), StoreError> { let mut statement = connection.prepare( "SELECT id, payload_json FROM nodes - WHERE class IN ('note', 'research') + WHERE class IN ('note', 'source') AND (summary IS NULL OR trim(summary) = '')", )?; let mut rows = statement.query([])?; @@ -2838,6 +3038,44 @@ fn insert_run( Ok(()) } +fn insert_open_experiment( + tx: &Transaction<'_>, + experiment: &OpenExperiment, +) -> Result<(), StoreError> { + let _ = tx.execute( + "INSERT INTO open_experiments ( + id, + frontier_id, + base_checkpoint_id, + hypothesis_node_id, + title, + summary, + created_at + ) VALUES (?1, ?2, ?3, ?4, ?5, ?6, 
?7)", + params![ + experiment.id.to_string(), + experiment.frontier_id.to_string(), + experiment.base_checkpoint_id.to_string(), + experiment.hypothesis_node_id.to_string(), + experiment.title.as_str(), + experiment.summary.as_ref().map(NonEmptyText::as_str), + encode_timestamp(experiment.created_at)?, + ], + )?; + Ok(()) +} + +fn delete_open_experiment( + tx: &Transaction<'_>, + experiment_id: fidget_spinner_core::ExperimentId, +) -> Result<(), StoreError> { + let _ = tx.execute( + "DELETE FROM open_experiments WHERE id = ?1", + params![experiment_id.to_string()], + )?; + Ok(()) +} + fn insert_experiment( tx: &Transaction<'_>, experiment: &CompletedExperiment, @@ -2848,11 +3086,13 @@ fn insert_experiment( frontier_id, base_checkpoint_id, candidate_checkpoint_id, - change_node_id, + hypothesis_node_id, run_node_id, run_id, analysis_node_id, decision_node_id, + title, + summary, benchmark_suite, primary_metric_json, supporting_metrics_json, @@ -2860,17 +3100,19 @@ fn insert_experiment( note_next_json, verdict, created_at - ) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10, ?11, ?12, ?13, ?14, ?15, ?16)", + ) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10, ?11, ?12, ?13, ?14, ?15, ?16, ?17, ?18)", params![ experiment.id.to_string(), experiment.frontier_id.to_string(), experiment.base_checkpoint_id.to_string(), experiment.candidate_checkpoint_id.to_string(), - experiment.change_node_id.to_string(), + experiment.hypothesis_node_id.to_string(), experiment.run_node_id.to_string(), experiment.run_id.to_string(), experiment.analysis_node_id.map(|id| id.to_string()), experiment.decision_node_id.to_string(), + experiment.title.as_str(), + experiment.summary.as_ref().map(NonEmptyText::as_str), benchmark_suite_label(&experiment.result.dimensions), encode_json(&experiment.result.primary_metric)?, encode_json(&experiment.result.supporting_metrics)?, @@ -2904,6 +3146,60 @@ fn insert_event( Ok(()) } +fn load_open_experiment( + connection: &Connection, + experiment_id: 
fidget_spinner_core::ExperimentId, +) -> Result<Option<OpenExperiment>, StoreError> { + let mut statement = connection.prepare( + "SELECT + id, + frontier_id, + base_checkpoint_id, + hypothesis_node_id, + title, + summary, + created_at + FROM open_experiments + WHERE id = ?1", + )?; + statement + .query_row(params![experiment_id.to_string()], |row| { + Ok(OpenExperiment { + id: parse_experiment_id(&row.get::<_, String>(0)?) + .map_err(to_sql_conversion_error)?, + frontier_id: parse_frontier_id(&row.get::<_, String>(1)?) + .map_err(to_sql_conversion_error)?, + base_checkpoint_id: parse_checkpoint_id(&row.get::<_, String>(2)?) + .map_err(to_sql_conversion_error)?, + hypothesis_node_id: parse_node_id(&row.get::<_, String>(3)?) + .map_err(to_sql_conversion_error)?, + title: NonEmptyText::new(row.get::<_, String>(4)?) + .map_err(core_to_sql_conversion_error)?, + summary: row + .get::<_, Option<String>>(5)? + .map(NonEmptyText::new) + .transpose() + .map_err(core_to_sql_conversion_error)?, + created_at: decode_timestamp(&row.get::<_, String>(6)?) 
+ .map_err(to_sql_conversion_error)?, + }) + }) + .optional() + .map_err(StoreError::from) +} + +fn summarize_open_experiment(experiment: &OpenExperiment) -> OpenExperimentSummary { + OpenExperimentSummary { + id: experiment.id, + frontier_id: experiment.frontier_id, + base_checkpoint_id: experiment.base_checkpoint_id, + hypothesis_node_id: experiment.hypothesis_node_id, + title: experiment.title.clone(), + summary: experiment.summary.clone(), + created_at: experiment.created_at, + } +} + fn touch_frontier( tx: &Transaction<'_>, frontier_id: fidget_spinner_core::FrontierId, @@ -3188,12 +3484,11 @@ fn parse_annotation_id(raw: &str) -> Result<fidget_spinner_core::AnnotationId, S fn parse_node_class(raw: &str) -> Result<NodeClass, StoreError> { match raw { "contract" => Ok(NodeClass::Contract), - "change" => Ok(NodeClass::Change), + "hypothesis" => Ok(NodeClass::Hypothesis), "run" => Ok(NodeClass::Run), "analysis" => Ok(NodeClass::Analysis), "decision" => Ok(NodeClass::Decision), - "research" => Ok(NodeClass::Research), - "enabling" => Ok(NodeClass::Enabling), + "source" => Ok(NodeClass::Source), "note" => Ok(NodeClass::Note), other => Err(StoreError::Json(serde_json::Error::io(io::Error::new( io::ErrorKind::InvalidData, @@ -3486,7 +3781,7 @@ mod tests { use super::{ CloseExperimentRequest, CreateFrontierRequest, CreateNodeRequest, DefineMetricRequest, DefineRunDimensionRequest, ListNodesQuery, MetricBestQuery, MetricFieldSource, - MetricKeyQuery, MetricRankOrder, PROJECT_SCHEMA_NAME, ProjectStore, + MetricKeyQuery, MetricRankOrder, OpenExperimentRequest, PROJECT_SCHEMA_NAME, ProjectStore, RemoveSchemaFieldRequest, UpsertSchemaFieldRequest, }; use fidget_spinner_core::{ @@ -3528,7 +3823,7 @@ mod tests { NonEmptyText::new("local.test")?, )?; let node = store.add_node(CreateNodeRequest { - class: NodeClass::Research, + class: NodeClass::Source, frontier_id: None, title: NonEmptyText::new("feature sketch")?, summary: Some(NonEmptyText::new("research note")?), @@ -3747,7 
+4042,7 @@ mod tests { )?; let missing_summary = store.add_node(CreateNodeRequest { - class: NodeClass::Research, + class: NodeClass::Source, frontier_id: None, title: NonEmptyText::new("research note")?, summary: None, @@ -3761,7 +4056,7 @@ mod tests { }); assert!(matches!( missing_summary, - Err(super::StoreError::ProseSummaryRequired(NodeClass::Research)) + Err(super::StoreError::ProseSummaryRequired(NodeClass::Source)) )); let missing_body = store.add_node(CreateNodeRequest { @@ -3790,7 +4085,7 @@ mod tests { NonEmptyText::new("local.test")?, )?; let node = store.add_node(CreateNodeRequest { - class: NodeClass::Research, + class: NodeClass::Source, frontier_id: None, title: NonEmptyText::new("research note")?, summary: Some(NonEmptyText::new("temporary summary")?), @@ -3838,7 +4133,7 @@ mod tests { let field = store.upsert_schema_field(UpsertSchemaFieldRequest { name: NonEmptyText::new("scenario")?, - node_classes: BTreeSet::from([NodeClass::Change, NodeClass::Analysis]), + node_classes: BTreeSet::from([NodeClass::Hypothesis, NodeClass::Analysis]), presence: FieldPresence::Recommended, severity: DiagnosticSeverity::Warning, role: FieldRole::ProjectionGate, @@ -3868,7 +4163,7 @@ mod tests { let removed = reopened.remove_schema_field(RemoveSchemaFieldRequest { name: NonEmptyText::new("scenario")?, - node_classes: Some(BTreeSet::from([NodeClass::Change, NodeClass::Analysis])), + node_classes: Some(BTreeSet::from([NodeClass::Hypothesis, NodeClass::Analysis])), })?; assert_eq!(removed, 1); assert_eq!(reopened.schema().version, initial_version + 2); @@ -3934,11 +4229,11 @@ mod tests { description: Some(NonEmptyText::new("time budget in seconds")?), })?; - let first_change = store.add_node(CreateNodeRequest { - class: NodeClass::Change, + let first_hypothesis = store.add_node(CreateNodeRequest { + class: NodeClass::Hypothesis, frontier_id: Some(frontier_id), - title: NonEmptyText::new("first change")?, - summary: Some(NonEmptyText::new("first change summary")?), + 
title: NonEmptyText::new("first hypothesis")?, + summary: Some(NonEmptyText::new("first hypothesis summary")?), tags: None, payload: NodePayload::with_schema( store.schema().schema_ref(), @@ -3947,11 +4242,11 @@ mod tests { annotations: Vec::new(), attachments: Vec::new(), })?; - let second_change = store.add_node(CreateNodeRequest { - class: NodeClass::Change, + let second_hypothesis = store.add_node(CreateNodeRequest { + class: NodeClass::Hypothesis, frontier_id: Some(frontier_id), - title: NonEmptyText::new("second change")?, - summary: Some(NonEmptyText::new("second change summary")?), + title: NonEmptyText::new("second hypothesis")?, + summary: Some(NonEmptyText::new("second hypothesis summary")?), tags: None, payload: NodePayload::with_schema( store.schema().schema_ref(), @@ -3960,12 +4255,22 @@ mod tests { annotations: Vec::new(), attachments: Vec::new(), })?; + let first_experiment = store.open_experiment(open_experiment_request( + frontier_id, + base_checkpoint_id, + first_hypothesis.id, + "first experiment", + )?)?; + let second_experiment = store.open_experiment(open_experiment_request( + frontier_id, + base_checkpoint_id, + second_hypothesis.id, + "second experiment", + )?)?; let first_receipt = store.close_experiment(experiment_request( &root, - frontier_id, - base_checkpoint_id, - first_change.id, + first_experiment.id, "bbbbbbbbbbbbbbbb", "first run", 10.0, @@ -3973,9 +4278,7 @@ mod tests { )?)?; let second_receipt = store.close_experiment(experiment_request( &root, - frontier_id, - base_checkpoint_id, - second_change.id, + second_experiment.id, "cccccccccccccccc", "second run", 5.0, @@ -3987,7 +4290,7 @@ mod tests { key.key.as_str() == "wall_clock_s" && key.source == MetricFieldSource::RunMetric })); assert!(keys.iter().any(|key| { - key.key.as_str() == "latency_hint" && key.source == MetricFieldSource::ChangePayload + key.key.as_str() == "latency_hint" && key.source == MetricFieldSource::HypothesisPayload })); assert!(keys.iter().any(|key| { 
key.key.as_str() == "wall_clock_s" @@ -4044,19 +4347,19 @@ mod tests { let payload_best = store.best_metrics(MetricBestQuery { key: NonEmptyText::new("latency_hint")?, frontier_id: Some(frontier_id), - source: Some(MetricFieldSource::ChangePayload), + source: Some(MetricFieldSource::HypothesisPayload), dimensions: run_dimensions("belt_4x5", 60.0)?, order: Some(MetricRankOrder::Asc), limit: 5, })?; assert_eq!(payload_best.len(), 1); assert_eq!(payload_best[0].value, 7.0); - assert_eq!(payload_best[0].change_node_id, second_change.id); + assert_eq!(payload_best[0].hypothesis_node_id, second_hypothesis.id); let missing_order = store.best_metrics(MetricBestQuery { key: NonEmptyText::new("latency_hint")?, frontier_id: Some(frontier_id), - source: Some(MetricFieldSource::ChangePayload), + source: Some(MetricFieldSource::HypothesisPayload), dimensions: BTreeMap::new(), order: None, limit: 5, @@ -4107,11 +4410,11 @@ mod tests { let base_checkpoint_id = projection .champion_checkpoint_id .ok_or_else(|| super::StoreError::MissingChampionCheckpoint { frontier_id })?; - let change = store.add_node(CreateNodeRequest { - class: NodeClass::Change, + let hypothesis = store.add_node(CreateNodeRequest { + class: NodeClass::Hypothesis, frontier_id: Some(frontier_id), - title: NonEmptyText::new("candidate change")?, - summary: Some(NonEmptyText::new("candidate change summary")?), + title: NonEmptyText::new("candidate hypothesis")?, + summary: Some(NonEmptyText::new("candidate hypothesis summary")?), tags: None, payload: NodePayload::with_schema( store.schema().schema_ref(), @@ -4120,11 +4423,15 @@ mod tests { annotations: Vec::new(), attachments: Vec::new(), })?; - let _ = store.close_experiment(experiment_request( - &root, + let experiment = store.open_experiment(open_experiment_request( frontier_id, base_checkpoint_id, - change.id, + hypothesis.id, + "migration experiment", + )?)?; + let _ = store.close_experiment(experiment_request( + &root, + experiment.id, "bbbbbbbbbbbbbbbb", 
"migration run", 11.0, @@ -4177,20 +4484,31 @@ mod tests { }) } - fn experiment_request( - root: &camino::Utf8Path, + fn open_experiment_request( frontier_id: fidget_spinner_core::FrontierId, base_checkpoint_id: fidget_spinner_core::CheckpointId, - change_node_id: fidget_spinner_core::NodeId, + hypothesis_node_id: fidget_spinner_core::NodeId, + title: &str, + ) -> Result<OpenExperimentRequest, super::StoreError> { + Ok(OpenExperimentRequest { + frontier_id, + base_checkpoint_id, + hypothesis_node_id, + title: NonEmptyText::new(title)?, + summary: Some(NonEmptyText::new(format!("{title} summary"))?), + }) + } + + fn experiment_request( + root: &camino::Utf8Path, + experiment_id: fidget_spinner_core::ExperimentId, candidate_commit: &str, run_title: &str, wall_clock_s: f64, dimensions: BTreeMap<NonEmptyText, RunDimensionValue>, ) -> Result<CloseExperimentRequest, super::StoreError> { Ok(CloseExperimentRequest { - frontier_id, - base_checkpoint_id, - change_node_id, + experiment_id, candidate_summary: NonEmptyText::new(format!("candidate {candidate_commit}"))?, candidate_snapshot: checkpoint_snapshot(root, candidate_commit)?, run_title: NonEmptyText::new(run_title)?, @@ -4213,9 +4531,9 @@ mod tests { next_hypotheses: Vec::new(), }, verdict: FrontierVerdict::KeepOnFrontier, + analysis: None, decision_title: NonEmptyText::new("decision")?, decision_rationale: NonEmptyText::new("decision rationale")?, - analysis_node_id: None, }) } diff --git a/docs/architecture.md b/docs/architecture.md index 37f5c55..30d01fc 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -142,7 +142,7 @@ Specific actions may refuse incomplete records. 
 Examples:

 - core-path experiment closure requires complete run/result/note/verdict state
-- future promotion helpers may require a projection-ready change payload
+- future promotion helpers may require a projection-ready hypothesis payload

 ## SQLite Schema
@@ -256,7 +256,7 @@ Stores the atomic closure object for core-path work:
 - frontier id
 - base checkpoint id
 - candidate checkpoint id
-- change node id
+- hypothesis node id
 - run node id and run id
 - optional analysis node id
 - decision node id
@@ -284,15 +284,14 @@ Stores durable audit events:
 Core path:

 - `contract`
-- `change`
+- `hypothesis`
 - `run`
 - `analysis`
 - `decision`

 Off path:

-- `research`
-- `enabling`
+- `source`
 - `note`

 ### Node tracks
@@ -320,13 +320,13 @@ This projection is derived from canonical state and intentionally rebuildable.

 These are intentionally cheap:

 - `note.quick`, but only with explicit tags from the repo-local registry
-- `research.record`, optionally tagged into the same repo-local taxonomy
+- `source.record`, optionally tagged into the same repo-local taxonomy
 - generic `node.create` for escape-hatch use
 - `node.annotate`

 ### Low-ceremony core-path entry

-`change.record` exists to capture intent before worktree state becomes muddy.
+`hypothesis.record` exists to capture intent before worktree state becomes muddy.

 ### Atomic core-path closure
@@ -423,8 +423,8 @@ Current policy:
   `frontier.status`, `node.list`, `node.read`, `skill.list`, `skill.show`,
   and resource reads are safe to replay once after a retryable worker fault
-- mutating tools such as `tag.add`, `frontier.init`, `node.create`, `change.record`,
-  `node.annotate`, `node.archive`, `note.quick`, `research.record`, and
+- mutating tools such as `tag.add`, `frontier.init`, `node.create`, `hypothesis.record`,
+  `node.annotate`, `node.archive`, `note.quick`, `source.record`, and
   `experiment.close` are never auto-replayed

 This is the hardening answer to side-effect safety.
@@ -451,13 +451,13 @@ Implemented tools:
 - `frontier.status`
 - `frontier.init`
 - `node.create`
-- `change.record`
+- `hypothesis.record`
 - `node.list`
 - `node.read`
 - `node.annotate`
 - `node.archive`
 - `note.quick`
-- `research.record`
+- `source.record`
 - `metric.define`
 - `metric.keys`
 - `metric.best`
@@ -537,7 +537,7 @@ Current commands:
 - `note quick`
 - `tag add`
 - `tag list`
-- `research add`
+- `source add`
 - `metric define`
 - `metric keys`
 - `metric best`
diff --git a/docs/libgrid-dogfood.md b/docs/libgrid-dogfood.md
index 5e13e51..59e214e 100644
--- a/docs/libgrid-dogfood.md
+++ b/docs/libgrid-dogfood.md
@@ -51,7 +51,7 @@ The root contract should state:

-### Change node
+### Hypothesis node

-Use `change.record` to capture:
+Use `hypothesis.record` to capture:

 - what hypothesis is being tested
 - what base checkpoint it starts from
@@ -83,8 +83,8 @@ The decision should make the verdict explicit:

 Use these freely:

-- `research` for ideas, external references, algorithm sketches
-- `enabling` for scaffolding that is not yet a benchmarked experiment
+- `source` for ideas, external references, algorithm sketches, and
+  scaffolding that is not yet a benchmarked experiment
 - `note` for quick observations

 This is how the system avoids forcing every useful thought into experiment
@@ -99,10 +99,10 @@ The MVP does not need hard rejection. It does need meaningful warnings.

 Good first project fields:

-- `hypothesis` on `change`
-- `base_checkpoint_id` on `change`
-- `benchmark_suite` on `change` and `run`
-- `body` on `change`, `research`, and `note`
+- `hypothesis` on `hypothesis`
+- `base_checkpoint_id` on `hypothesis`
+- `benchmark_suite` on `hypothesis` and `run`
+- `body` on `hypothesis`, `source`, and `note`
 - `comparison_claim` on `analysis`
 - `rationale` on `decision`
@@ -126,8 +126,8 @@ Good first metric vocabulary:

 ### 2. Start a line of attack

 1. Read the current frontier and the recent DAG tail.
-2. Record a `change`.
-3. If needed, attach off-path `research` or `note` nodes first.
+2. Record a `hypothesis`.
+3. If needed, attach off-path `source` or `note` nodes first.

 ### 3. Execute one experiment
@@ -183,7 +183,7 @@ That means we can already use it to test:

 - project initialization
 - schema visibility
 - frontier creation without a champion
-- off-path research recording
+- off-path source recording
 - hidden annotations
 - MCP read and write flows
diff --git a/docs/product-spec.md b/docs/product-spec.md
index 89d392c..efa57df 100644
--- a/docs/product-spec.md
+++ b/docs/product-spec.md
@@ -3,7 +3,7 @@
 ## Thesis

 Fidget Spinner is a local-first, agent-first frontier machine for autonomous
-program optimization and research.
+program optimization, source capture, and experiment adjudication.

 The immediate target is brutally practical: replace gigantic freeform
 experiment markdown with a machine that preserves evidence as structure.
@@ -96,7 +96,7 @@ Core-path work is disciplined and atomic.

 Off-path work is cheap and permissive.

-The point is to avoid forcing every scrap of research through the full
+The point is to avoid forcing every scrap of source digestion or note-taking through the full
 benchmark/decision bureaucracy while still preserving it in the DAG.

 ### 6. Completed core-path experiments are atomic
@@ -117,7 +117,7 @@ low-level calls.

 Dirty worktree snapshots are useful as descriptive context, but a completed
 core-path experiment should anchor to a committed candidate checkpoint.

-Off-path notes and research can remain lightweight and non-committal.
+Off-path notes and source captures can remain lightweight and non-committal.

 ## Node Model
@@ -180,7 +180,7 @@ the spine or project payload.
 These are the disciplined frontier-loop classes:

 - `contract`
-- `change`
+- `hypothesis`
 - `run`
 - `analysis`
 - `decision`
@@ -189,8 +189,7 @@ These are the disciplined frontier-loop classes:

 These are deliberately low-ceremony:

-- `research`
-- `enabling`
+- `source`
 - `note`

 They exist so the product can absorb real thinking instead of forcing users and
@@ -237,8 +237,8 @@ done.

 - disposable MCP worker execution runtime
 - bundled `fidget-spinner` base skill
 - bundled `frontier-loop` skill
-- low-ceremony off-path note and research recording
-- atomic core-path experiment closure
+- low-ceremony off-path note and source recording
+- explicit experiment open/close lifecycle for the core path

 ### Explicitly deferred from the MVP
@@ -267,13 +267,16 @@ The initial tools should be:
 - `frontier.status`
 - `frontier.init`
 - `node.create`
-- `change.record`
+- `hypothesis.record`
 - `node.list`
 - `node.read`
 - `node.annotate`
 - `node.archive`
 - `note.quick`
-- `research.record`
+- `source.record`
+- `experiment.open`
+- `experiment.list`
+- `experiment.read`
 - `experiment.close`
 - `skill.list`
 - `skill.show`
@@ -282,8 +285,8 @@ The important point is not the exact names. The important point is the shape:

 - cheap read access to project and frontier context
 - cheap off-path writes
-- low-ceremony change capture
-- one atomic "close the experiment" tool
+- low-ceremony hypothesis capture
+- one explicit experiment-open step plus one experiment-close step
 - explicit operational introspection for long-lived agent sessions
 - explicit replay boundaries so side effects are never duplicated by accident
@@ -295,11 +298,12 @@ The bundled skills should instruct agents to:
 2. bind the MCP session to the target project before project-local reads or writes
 3. read project schema, tag registry, and frontier state
 4. pull context from the DAG instead of giant prose dumps
-5. use `note.quick` and `research.record` freely off path, but always pass an explicit tag list for notes
-6. use `change.record` before worktree thrash becomes ambiguous
-7. use `experiment.close` to atomically seal core-path work
-8. archive detritus instead of deleting it
-9. use the base `fidget-spinner` skill for ordinary DAG work and add
+5. use `note.quick` and `source.record` freely off path, but always pass an explicit tag list for notes
+6. use `hypothesis.record` before worktree thrash becomes ambiguous
+7. use `experiment.open` before running a live hypothesis-owned line
+8. use `experiment.close` to seal that line with measured evidence
+9. archive detritus instead of deleting it
+10. use the base `fidget-spinner` skill for ordinary DAG work and add
    `frontier-loop` only when the task becomes a true autonomous frontier push

 ### MVP acceptance bar
@@ -309,7 +313,7 @@ The MVP is successful when:

 - a project can be initialized locally with no hosted dependencies
 - an agent can inspect frontier state through MCP
 - an agent can inspect MCP health and telemetry through MCP
-- an agent can record off-path research without bureaucratic pain
+- an agent can record off-path sources and notes without bureaucratic pain
 - the project schema can softly declare whether payload fields are strings,
   numbers, booleans, or timestamps
 - an operator can inspect recent nodes through a minimal localhost web navigator filtered by tag
 - a git-backed project can close a real core-path experiment atomically
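The product-spec changes above replace the single-shot atomic closure with an explicit open/close lifecycle: an experiment is opened against a hypothesis node and later sealed exactly once. The gating invariant can be sketched as a tiny state machine. This is an illustrative stand-in, not the real `fidget-spinner-core` API; the `Experiment` type, its fields, and `open`/`close` here are invented for the sketch.

```rust
// Minimal sketch of the hypothesis-owned, experiment-gated core path this
// commit introduces. Types are illustrative stand-ins, not the crate's API.

#[derive(Debug, PartialEq)]
enum ExperimentState {
    Open,
    Closed,
}

struct Experiment {
    // Every experiment is owned by exactly one hypothesis node.
    hypothesis_node_id: u64,
    state: ExperimentState,
}

impl Experiment {
    // `experiment.open`: bind the line to its owning hypothesis before any
    // live work starts.
    fn open(hypothesis_node_id: u64) -> Self {
        Experiment {
            hypothesis_node_id,
            state: ExperimentState::Open,
        }
    }

    // `experiment.close`: seal the line with a verdict. A second close is
    // rejected, which is the "experiment-gated" invariant on the core path.
    fn close(&mut self, verdict: &str) -> Result<String, String> {
        match self.state {
            ExperimentState::Open => {
                self.state = ExperimentState::Closed;
                Ok(format!(
                    "hypothesis {} closed with verdict: {}",
                    self.hypothesis_node_id, verdict
                ))
            }
            ExperimentState::Closed => Err("experiment already closed".to_string()),
        }
    }
}

fn main() {
    let mut exp = Experiment::open(42);
    assert_eq!(exp.state, ExperimentState::Open);
    // First close succeeds and records the verdict.
    assert!(exp.close("KeepOnFrontier").is_ok());
    // Core-path closure happens exactly once; a replay must fail.
    assert!(exp.close("KeepOnFrontier").is_err());
    println!("lifecycle ok");
}
```

The real store enforces this transactionally in SQLite; the sketch only shows the lifecycle shape that the MCP tools `experiment.open` and `experiment.close` expose.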