 README.md                                         |   99
 assets/codex-skills/fidget-spinner/SKILL.md       |   18
 crates/fidget-spinner-cli/src/main.rs             |  533
 crates/fidget-spinner-cli/src/mcp/catalog.rs      |  209
 crates/fidget-spinner-cli/src/mcp/host/runtime.rs |   22
 crates/fidget-spinner-cli/src/mcp/service.rs      |  925
 crates/fidget-spinner-cli/src/mcp/telemetry.rs    |    1
 crates/fidget-spinner-cli/tests/mcp_hardening.rs  | 1040
 crates/fidget-spinner-core/src/lib.rs             |    7
 crates/fidget-spinner-core/src/model.rs           |  144
 crates/fidget-spinner-store-sqlite/src/lib.rs     | 2257
 docs/architecture.md                              |   41
 12 files changed, 5083 insertions(+), 213 deletions(-)
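The headline CLI change in this patch is that `experiment close` drops the structured `--primary-metric-key/--primary-metric-unit/--primary-metric-objective/--primary-metric-value` flags in favor of thin `key=value` observations resolved against the project-level metric registry. As a hedged, standalone sketch of that parsing shape (mirroring the patch's `parse_metric_value`, but with plain tuples and `String` errors standing in for the crate's `MetricValue` and `StoreError` types):

```rust
// Sketch only: the real helper in crates/fidget-spinner-cli/src/main.rs returns
// `MetricValue` / `StoreError`; plain tuples and `String` errors stand in here.
fn parse_metric_value(raw: &str) -> Result<(String, f64), String> {
    // The patch replaces `key:unit:objective:value` flags with thin `key=value`
    // observations; unit and objective now come from the metric registry.
    let Some((key, value)) = raw.split_once('=') else {
        return Err("metrics must look like key=value".to_owned());
    };
    if key.trim().is_empty() {
        return Err("metric key must be non-empty".to_owned());
    }
    let value = value
        .parse::<f64>()
        .map_err(|error| format!("invalid metric value: {error}"))?;
    Ok((key.to_owned(), value))
}

fn main() {
    assert_eq!(
        parse_metric_value("wall_clock_s=12.5"),
        Ok(("wall_clock_s".to_owned(), 12.5))
    );
    // The old four-part form is rejected outright.
    assert!(parse_metric_value("wall_clock_s:seconds:minimize:12.5").is_err());
}
```

The same `key=value` split is reused by the new `--dimension` flags, except that dimension values are then coerced against the registered `FieldValueType` instead of always parsing as `f64`.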
@@ -63,6 +63,47 @@ cargo run -p fidget-spinner-cli -- frontier init \
   --primary-metric-objective maximize
 ```
 
+Register project-level metric and run-dimension vocabulary before recording a
+lot of experiments:
+
+```bash
+cargo run -p fidget-spinner-cli -- schema upsert-field \
+  --project . \
+  --name scenario \
+  --class change \
+  --class analysis \
+  --presence recommended \
+  --severity warning \
+  --role projection-gate \
+  --inference manual-only \
+  --type string
+```
+
+```bash
+cargo run -p fidget-spinner-cli -- metric define \
+  --project . \
+  --key wall_clock_s \
+  --unit seconds \
+  --objective minimize \
+  --description "elapsed wall time"
+```
+
+```bash
+cargo run -p fidget-spinner-cli -- dimension define \
+  --project . \
+  --key scenario \
+  --type string \
+  --description "workload family"
+```
+
+```bash
+cargo run -p fidget-spinner-cli -- dimension define \
+  --project . \
+  --key duration_s \
+  --type numeric \
+  --description "time budget in seconds"
+```
+
 Record low-ceremony off-path work:
 
 ```bash
@@ -76,17 +117,33 @@ cargo run -p fidget-spinner-cli -- tag add \
 cargo run -p fidget-spinner-cli -- research add \
   --project . \
   --title "next feature slate" \
-  --body "Investigate pruning, richer projections, and libgrid schema presets."
+  --summary "Investigate the next tranche of high-value product work." \
+  --body "Investigate pruning, richer projections, and libgrid schema presets." \
+  --tag dogfood/mvp
 ```
 
 ```bash
 cargo run -p fidget-spinner-cli -- note quick \
   --project . \
   --title "first tagged note" \
+  --summary "Tag-aware note capture is live." \
   --body "Tag-aware note capture is live." \
   --tag dogfood/mvp
 ```
 
+```bash
+cargo run -p fidget-spinner-cli -- metric keys --project .
+```
+
+```bash
+cargo run -p fidget-spinner-cli -- metric best \
+  --project . \
+  --key wall_clock_s \
+  --dimension scenario=belt_4x5 \
+  --dimension duration_s=60 \
+  --source run-metric
+```
+
 Serve the local MCP surface in unbound mode:
 
 ```bash
@@ -145,6 +202,8 @@ The current MCP tools are:
 - `project.bind`
 - `project.status`
 - `project.schema`
+- `schema.field.upsert`
+- `schema.field.remove`
 - `tag.add`
 - `tag.list`
 - `frontier.list`
@@ -158,10 +217,22 @@ The current MCP tools are:
 - `node.archive`
 - `note.quick`
 - `research.record`
+- `metric.define`
+- `metric.keys`
+- `metric.best`
+- `metric.migrate`
+- `run.dimension.define`
+- `run.dimension.list`
 - `experiment.close`
 - `skill.list`
 - `skill.show`
 
+Nontrivial MCP tools follow the shared presentation contract:
+
+- `render=porcelain|json` chooses terse text vs structured JSON rendering
+- `detail=concise|full` chooses triage payload vs widened detail
+- porcelain is default and is intentionally not just pretty-printed JSON
+
 Operationally, the MCP now runs as a stable host process that owns the public
 JSON-RPC session and delegates tool execution to an internal worker subprocess.
 Safe replay is only allowed for explicitly read-only operations and resources.
@@ -172,6 +243,31 @@ created with `tag.add`, each with a required human description. `note.quick`
 accepts `tags: []` when no existing tag applies, but the field itself is still
 mandatory so note classification is always conscious.
 
+`research.record` now also accepts optional `tags`, so rich imported documents
+can join the same campaign/subsystem index as terse notes without falling back
+to the generic escape hatch.
+
+`note.quick`, `research.record`, and generic `node create` for `note`/`research`
+now enforce the same strict prose split: `title` is terse identity, `summary`
+is the triage/search layer, and `body` holds the full text. List-like surfaces
+stay on `title` + `summary`; full prose is for explicit reads only.
+
+Schema authoring no longer has to happen by hand in `.fidget_spinner/schema.json`.
+The CLI exposes `schema upsert-field` / `schema remove-field`, and the MCP +surface exposes the corresponding `schema.field.upsert` / `schema.field.remove` +tools. The CLI uses space-separated subcommands; the MCP uses dotted tool names. + +Metrics and run dimensions are now project-level registries. Frontier contracts +still declare the evaluation metric vocabulary, but closed experiments report +only thin `key=value` metrics plus typed run dimensions. `metric.define` can +enrich metric descriptions, CLI `dimension define` / MCP `run.dimension.define` +preregister slicers such as `scenario` or `duration_s`, `metric.keys` +discovers rankable numeric signals, and `metric.best` ranks one key within +optional exact dimension filters. +Legacy `benchmark_suite` data is normalized into a builtin string dimension on +store open, and `metric.migrate` can be invoked explicitly as an idempotent +repair pass. + The intended flow is: 1. inspect `system.health` @@ -190,6 +286,7 @@ Off-path work does not require git. 
You can initialize a local project and use: - `research add` - `tag add` - `note quick` +- `metric keys` - `node annotate` - `mcp serve` diff --git a/assets/codex-skills/fidget-spinner/SKILL.md b/assets/codex-skills/fidget-spinner/SKILL.md index e4cb7b7..1e4c2a3 100644 --- a/assets/codex-skills/fidget-spinner/SKILL.md +++ b/assets/codex-skills/fidget-spinner/SKILL.md @@ -43,8 +43,15 @@ If you need more context, pull it from: - `tag.add` when a new note taxonomy token is genuinely needed; every tag must carry a description - `tag.list` before inventing note tags by memory -- `research.record` for exploratory work, design notes, dead ends, and enabling ideas -- `note.quick` for terse state pushes, always with an explicit `tags` list; use `[]` only when no registered tag applies +- `schema.field.upsert` when one project payload field needs to become canonical without hand-editing `schema.json` +- `schema.field.remove` when one project payload field definition should be purged cleanly +- `research.record` for exploratory work, design notes, dead ends, and enabling ideas; always pass `title`, `summary`, and `body`, and pass `tags` when the research belongs in a campaign/subsystem index +- `note.quick` for terse state pushes, always with an explicit `tags` list plus `title`, `summary`, and `body`; use `[]` only when no registered tag applies +- `metric.define` when a project-level metric key needs a canonical unit, objective, or human description +- `run.dimension.define` when a new experiment slicer such as `scenario` or `duration_s` becomes query-worthy +- `run.dimension.list` before guessing which run dimensions actually exist in the store +- `metric.keys` before guessing which numeric signals are actually rankable; pass exact run-dimension filters when narrowing to one workload slice +- `metric.best` when you need the best closed experiments by one numeric key; pass `order` for noncanonical payload fields and exact run-dimension filters when comparing one slice - 
`node.annotate` for scratch text that should stay off the main path - `change.record` before core-path work - `experiment.close` only when you have checkpoint, measured result, note, and verdict @@ -60,5 +67,10 @@ If you need more context, pull it from: and `system.telemetry` before pushing further. 5. Keep fetches narrow by default; widen only when stale or archived context is actually needed. -6. When the task becomes a true indefinite optimization push, pair this skill +6. Treat metric keys as project-level registry entries and run dimensions as the + first-class slice surface for experiment comparison; do not encode scenario + context into the metric key itself. +7. Porcelain is the terse triage surface. Use `detail=full` only when concise + output stops being decision-sufficient. +8. When the task becomes a true indefinite optimization push, pair this skill with `frontier-loop`. diff --git a/crates/fidget-spinner-cli/src/main.rs b/crates/fidget-spinner-cli/src/main.rs index fe4cb5f..3ad9534 100644 --- a/crates/fidget-spinner-cli/src/main.rs +++ b/crates/fidget-spinner-cli/src/main.rs @@ -10,13 +10,16 @@ use std::path::{Path, PathBuf}; use camino::{Utf8Path, Utf8PathBuf}; use clap::{Args, Parser, Subcommand, ValueEnum}; use fidget_spinner_core::{ - AnnotationVisibility, CodeSnapshotRef, CommandRecipe, ExecutionBackend, FrontierContract, - FrontierNote, FrontierVerdict, GitCommitHash, MetricObservation, MetricSpec, MetricUnit, - NodeAnnotation, NodeClass, NodePayload, NonEmptyText, OptimizationObjective, TagName, + AnnotationVisibility, CodeSnapshotRef, CommandRecipe, DiagnosticSeverity, ExecutionBackend, + FieldPresence, FieldRole, FieldValueType, FrontierContract, FrontierNote, FrontierVerdict, + GitCommitHash, InferencePolicy, MetricSpec, MetricUnit, MetricValue, NodeAnnotation, NodeClass, + NodePayload, NonEmptyText, OptimizationObjective, ProjectFieldSpec, TagName, }; use fidget_spinner_store_sqlite::{ - CloseExperimentRequest, CreateFrontierRequest, 
CreateNodeRequest, EdgeAttachment, - EdgeAttachmentDirection, ListNodesQuery, ProjectStore, StoreError, + CloseExperimentRequest, CreateFrontierRequest, CreateNodeRequest, DefineMetricRequest, + DefineRunDimensionRequest, EdgeAttachment, EdgeAttachmentDirection, ListNodesQuery, + MetricBestQuery, MetricFieldSource, MetricKeyQuery, MetricRankOrder, ProjectStore, + RemoveSchemaFieldRequest, StoreError, UpsertSchemaFieldRequest, }; use serde::Serialize; use serde_json::{Map, Value, json}; @@ -61,6 +64,16 @@ enum Command { }, /// Record off-path research and enabling work. Research(ResearchCommand), + /// Inspect rankable metrics across closed experiments. + Metric { + #[command(subcommand)] + command: MetricCommand, + }, + /// Define and inspect run dimensions used to slice experiment metrics. + Dimension { + #[command(subcommand)] + command: DimensionCommand, + }, /// Close a core-path experiment atomically. Experiment { #[command(subcommand)] @@ -100,6 +113,10 @@ struct InitArgs { enum SchemaCommand { /// Show the current project schema as JSON. Show(ProjectArg), + /// Add or replace one project schema field definition. + UpsertField(SchemaFieldUpsertArgs), + /// Remove one project schema field definition. + RemoveField(SchemaFieldRemoveArgs), } #[derive(Subcommand)] @@ -169,8 +186,10 @@ struct NodeAddArgs { #[arg(long)] title: String, #[arg(long)] + /// Required for `note` and `research` nodes. summary: Option<String>, #[arg(long = "payload-json")] + /// JSON object payload. `note` and `research` nodes require a non-empty `body` string. payload_json: Option<String>, #[arg(long = "payload-file")] payload_file: Option<PathBuf>, @@ -270,6 +289,74 @@ enum ResearchSubcommand { Add(QuickResearchArgs), } +#[derive(Subcommand)] +enum MetricCommand { + /// Register a project-level metric definition. + Define(MetricDefineArgs), + /// List rankable numeric keys observed in completed experiments. + Keys(MetricKeysArgs), + /// Rank completed experiments by one numeric key. 
+ Best(MetricBestArgs), + /// Re-run the idempotent legacy metric-plane normalization. + Migrate(ProjectArg), +} + +#[derive(Subcommand)] +enum DimensionCommand { + /// Register a project-level run dimension definition. + Define(DimensionDefineArgs), + /// List run dimensions and sample values observed in completed runs. + List(ProjectArg), +} + +#[derive(Args)] +struct MetricDefineArgs { + #[command(flatten)] + project: ProjectArg, + /// Metric key used in experiment closure and ranking. + #[arg(long)] + key: String, + /// Canonical unit for this metric key. + #[arg(long, value_enum)] + unit: CliMetricUnit, + /// Optimization direction for this metric key. + #[arg(long, value_enum)] + objective: CliOptimizationObjective, + /// Optional human description shown in metric listings. + #[arg(long)] + description: Option<String>, +} + +#[derive(Args)] +struct MetricKeysArgs { + #[command(flatten)] + project: ProjectArg, + /// Restrict results to one frontier. + #[arg(long)] + frontier: Option<String>, + /// Restrict results to one metric source. + #[arg(long, value_enum)] + source: Option<CliMetricSource>, + /// Exact run-dimension filter in the form `key=value`. + #[arg(long = "dimension")] + dimensions: Vec<String>, +} + +#[derive(Args)] +struct DimensionDefineArgs { + #[command(flatten)] + project: ProjectArg, + /// Run-dimension key used to slice experiments. + #[arg(long)] + key: String, + /// Canonical value type for this run dimension. + #[arg(long = "type", value_enum)] + value_type: CliFieldValueType, + /// Optional human description shown in dimension listings. 
+ #[arg(long)] + description: Option<String>, +} + #[derive(Args)] struct QuickNoteArgs { #[command(flatten)] @@ -279,6 +366,8 @@ struct QuickNoteArgs { #[arg(long)] title: String, #[arg(long)] + summary: String, + #[arg(long)] body: String, #[command(flatten)] tag_selection: ExplicitTagSelectionArgs, @@ -305,13 +394,69 @@ struct QuickResearchArgs { #[arg(long)] title: String, #[arg(long)] - body: String, + summary: String, #[arg(long)] - summary: Option<String>, + body: String, + #[command(flatten)] + tag_selection: ExplicitTagSelectionArgs, #[arg(long = "parent")] parents: Vec<String>, } +#[derive(Args)] +struct SchemaFieldUpsertArgs { + #[command(flatten)] + project: ProjectArg, + #[arg(long)] + name: String, + #[arg(long = "class", value_enum)] + classes: Vec<CliNodeClass>, + #[arg(long, value_enum)] + presence: CliFieldPresence, + #[arg(long, value_enum)] + severity: CliDiagnosticSeverity, + #[arg(long, value_enum)] + role: CliFieldRole, + #[arg(long = "inference", value_enum)] + inference_policy: CliInferencePolicy, + #[arg(long = "type", value_enum)] + value_type: Option<CliFieldValueType>, +} + +#[derive(Args)] +struct SchemaFieldRemoveArgs { + #[command(flatten)] + project: ProjectArg, + #[arg(long)] + name: String, + #[arg(long = "class", value_enum)] + classes: Vec<CliNodeClass>, +} + +#[derive(Args)] +struct MetricBestArgs { + #[command(flatten)] + project: ProjectArg, + /// Metric key to rank on. + #[arg(long)] + key: String, + /// Restrict results to one frontier. + #[arg(long)] + frontier: Option<String>, + /// Restrict results to one metric source. + #[arg(long, value_enum)] + source: Option<CliMetricSource>, + /// Explicit ordering for sources whose objective cannot be inferred. + #[arg(long, value_enum)] + order: Option<CliMetricOrder>, + /// Exact run-dimension filter in the form `key=value`. + #[arg(long = "dimension")] + dimensions: Vec<String>, + /// Maximum number of ranked experiments to return. 
+ #[arg(long, default_value_t = 10)] + limit: u32, +} + #[derive(Subcommand)] enum ExperimentCommand { /// Close a core-path experiment with checkpoint, run, note, and verdict. @@ -348,24 +493,23 @@ struct ExperimentCloseArgs { run_title: String, #[arg(long = "run-summary")] run_summary: Option<String>, - #[arg(long = "benchmark-suite")] - benchmark_suite: String, + /// Repeat for each run dimension as `key=value`. + #[arg(long = "dimension")] + dimensions: Vec<String>, #[arg(long = "backend", value_enum, default_value_t = CliExecutionBackend::Worktree)] backend: CliExecutionBackend, #[arg(long = "cwd")] working_directory: Option<PathBuf>, + /// Repeat for each argv token passed to the recorded command. #[arg(long = "argv")] argv: Vec<String>, + /// Repeat for each environment override as `KEY=VALUE`. #[arg(long = "env")] env: Vec<String>, - #[arg(long = "primary-metric-key")] - primary_metric_key: String, - #[arg(long = "primary-metric-unit", value_enum)] - primary_metric_unit: CliMetricUnit, - #[arg(long = "primary-metric-objective", value_enum)] - primary_metric_objective: CliOptimizationObjective, - #[arg(long = "primary-metric-value")] - primary_metric_value: f64, + /// Primary metric in the form `key=value`; key must be preregistered. + #[arg(long = "primary-metric")] + primary_metric: String, + /// Supporting metric in the form `key=value`; repeat as needed. 
#[arg(long = "metric")] metrics: Vec<String>, #[arg(long)] @@ -475,6 +619,57 @@ enum CliExecutionBackend { } #[derive(Clone, Copy, Debug, Eq, PartialEq, ValueEnum)] +enum CliMetricSource { + RunMetric, + ChangePayload, + RunPayload, + AnalysisPayload, + DecisionPayload, +} + +#[derive(Clone, Copy, Debug, Eq, PartialEq, ValueEnum)] +enum CliMetricOrder { + Asc, + Desc, +} + +#[derive(Clone, Copy, Debug, Eq, PartialEq, ValueEnum)] +enum CliFieldValueType { + String, + Numeric, + Boolean, + Timestamp, +} + +#[derive(Clone, Copy, Debug, Eq, PartialEq, ValueEnum)] +enum CliDiagnosticSeverity { + Error, + Warning, + Info, +} + +#[derive(Clone, Copy, Debug, Eq, PartialEq, ValueEnum)] +enum CliFieldPresence { + Required, + Recommended, + Optional, +} + +#[derive(Clone, Copy, Debug, Eq, PartialEq, ValueEnum)] +enum CliFieldRole { + Index, + ProjectionGate, + RenderOnly, + Opaque, +} + +#[derive(Clone, Copy, Debug, Eq, PartialEq, ValueEnum)] +enum CliInferencePolicy { + ManualOnly, + ModelMayInfer, +} + +#[derive(Clone, Copy, Debug, Eq, PartialEq, ValueEnum)] enum CliFrontierVerdict { PromoteToChampion, KeepOnFrontier, @@ -499,6 +694,8 @@ fn run() -> Result<(), StoreError> { let store = open_store(&project.project)?; print_json(store.schema()) } + SchemaCommand::UpsertField(args) => run_schema_field_upsert(args), + SchemaCommand::RemoveField(args) => run_schema_field_remove(args), }, Command::Frontier { command } => match command { FrontierCommand::Init(args) => run_frontier_init(args), @@ -521,6 +718,16 @@ fn run() -> Result<(), StoreError> { Command::Research(command) => match command.command { ResearchSubcommand::Add(args) => run_quick_research(args), }, + Command::Metric { command } => match command { + MetricCommand::Define(args) => run_metric_define(args), + MetricCommand::Keys(args) => run_metric_keys(args), + MetricCommand::Best(args) => run_metric_best(args), + MetricCommand::Migrate(project) => run_metric_migrate(project), + }, + Command::Dimension { command } => 
match command { + DimensionCommand::Define(args) => run_dimension_define(args), + DimensionCommand::List(project) => run_dimension_list(project), + }, Command::Experiment { command } => match command { ExperimentCommand::Close(args) => run_experiment_close(args), }, @@ -597,27 +804,58 @@ fn run_frontier_status(args: FrontierStatusArgs) -> Result<(), StoreError> { print_json(&frontiers) } +fn run_schema_field_upsert(args: SchemaFieldUpsertArgs) -> Result<(), StoreError> { + let mut store = open_store(&args.project.project)?; + let field = store.upsert_schema_field(UpsertSchemaFieldRequest { + name: NonEmptyText::new(args.name)?, + node_classes: parse_node_class_set(args.classes), + presence: args.presence.into(), + severity: args.severity.into(), + role: args.role.into(), + inference_policy: args.inference_policy.into(), + value_type: args.value_type.map(Into::into), + })?; + print_json(&json!({ + "schema": store.schema().schema_ref(), + "field": schema_field_json(&field), + })) +} + +fn run_schema_field_remove(args: SchemaFieldRemoveArgs) -> Result<(), StoreError> { + let mut store = open_store(&args.project.project)?; + let removed_count = store.remove_schema_field(RemoveSchemaFieldRequest { + name: NonEmptyText::new(args.name)?, + node_classes: (!args.classes.is_empty()).then(|| parse_node_class_set(args.classes)), + })?; + print_json(&json!({ + "schema": store.schema().schema_ref(), + "removed_count": removed_count, + })) +} + fn run_node_add(args: NodeAddArgs) -> Result<(), StoreError> { let mut store = open_store(&args.project.project)?; + let class: NodeClass = args.class.into(); let frontier_id = args .frontier .as_deref() .map(parse_frontier_id) .transpose()?; - let tags = optional_cli_tags(args.tag_selection, args.class == CliNodeClass::Note)?; + let tags = optional_cli_tags(args.tag_selection, class == NodeClass::Note)?; let payload = load_payload( store.schema().schema_ref(), args.payload_json, args.payload_file, args.fields, )?; + 
validate_cli_prose_payload(class, args.summary.as_deref(), &payload)?; let annotations = args .annotations .into_iter() .map(|body| Ok(NodeAnnotation::hidden(NonEmptyText::new(body)?))) .collect::<Result<Vec<_>, StoreError>>()?; let node = store.add_node(CreateNodeRequest { - class: args.class.into(), + class, frontier_id, title: NonEmptyText::new(args.title)?, summary: args.summary.map(NonEmptyText::new).transpose()?, @@ -693,7 +931,7 @@ fn run_quick_note(args: QuickNoteArgs) -> Result<(), StoreError> { .map(parse_frontier_id) .transpose()?, title: NonEmptyText::new(args.title)?, - summary: None, + summary: Some(NonEmptyText::new(args.summary)?), tags: Some(explicit_cli_tags(args.tag_selection)?), payload, annotations: Vec::new(), @@ -730,8 +968,8 @@ fn run_quick_research(args: QuickResearchArgs) -> Result<(), StoreError> { .map(parse_frontier_id) .transpose()?, title: NonEmptyText::new(args.title)?, - summary: args.summary.map(NonEmptyText::new).transpose()?, - tags: None, + summary: Some(NonEmptyText::new(args.summary)?), + tags: optional_cli_tags(args.tag_selection, false)?, payload, annotations: Vec::new(), attachments: lineage_attachments(args.parents)?, @@ -739,6 +977,69 @@ fn run_quick_research(args: QuickResearchArgs) -> Result<(), StoreError> { print_json(&node) } +fn run_metric_define(args: MetricDefineArgs) -> Result<(), StoreError> { + let mut store = open_store(&args.project.project)?; + let record = store.define_metric(DefineMetricRequest { + key: NonEmptyText::new(args.key)?, + unit: args.unit.into(), + objective: args.objective.into(), + description: args.description.map(NonEmptyText::new).transpose()?, + })?; + print_json(&record) +} + +fn run_metric_keys(args: MetricKeysArgs) -> Result<(), StoreError> { + let store = open_store(&args.project.project)?; + print_json( + &store.list_metric_keys_filtered(MetricKeyQuery { + frontier_id: args + .frontier + .as_deref() + .map(parse_frontier_id) + .transpose()?, + source: args.source.map(Into::into), + 
dimensions: coerce_cli_dimension_filters(&store, args.dimensions)?, + })?, + ) +} + +fn run_metric_best(args: MetricBestArgs) -> Result<(), StoreError> { + let store = open_store(&args.project.project)?; + let entries = store.best_metrics(MetricBestQuery { + key: NonEmptyText::new(args.key)?, + frontier_id: args + .frontier + .as_deref() + .map(parse_frontier_id) + .transpose()?, + source: args.source.map(Into::into), + dimensions: coerce_cli_dimension_filters(&store, args.dimensions)?, + order: args.order.map(Into::into), + limit: args.limit, + })?; + print_json(&entries) +} + +fn run_metric_migrate(args: ProjectArg) -> Result<(), StoreError> { + let mut store = open_store(&args.project)?; + print_json(&store.migrate_metric_plane()?) +} + +fn run_dimension_define(args: DimensionDefineArgs) -> Result<(), StoreError> { + let mut store = open_store(&args.project.project)?; + let record = store.define_run_dimension(DefineRunDimensionRequest { + key: NonEmptyText::new(args.key)?, + value_type: args.value_type.into(), + description: args.description.map(NonEmptyText::new).transpose()?, + })?; + print_json(&record) +} + +fn run_dimension_list(args: ProjectArg) -> Result<(), StoreError> { + let store = open_store(&args.project)?; + print_json(&store.list_run_dimensions()?) 
+} + fn run_experiment_close(args: ExperimentCloseArgs) -> Result<(), StoreError> { let mut store = open_store(&args.project.project)?; let frontier_id = parse_frontier_id(&args.frontier)?; @@ -764,19 +1065,14 @@ fn run_experiment_close(args: ExperimentCloseArgs) -> Result<(), StoreError> { run_title: NonEmptyText::new(args.run_title)?, run_summary: args.run_summary.map(NonEmptyText::new).transpose()?, backend: args.backend.into(), - benchmark_suite: NonEmptyText::new(args.benchmark_suite)?, + dimensions: coerce_cli_dimension_filters(&store, args.dimensions)?, command, code_snapshot: Some(capture_code_snapshot(store.project_root())?), - primary_metric: MetricObservation { - metric_key: NonEmptyText::new(args.primary_metric_key)?, - unit: args.primary_metric_unit.into(), - objective: args.primary_metric_objective.into(), - value: args.primary_metric_value, - }, + primary_metric: parse_metric_value(args.primary_metric)?, supporting_metrics: args .metrics .into_iter() - .map(parse_metric_observation) + .map(parse_metric_value) .collect::<Result<Vec<_>, _>>()?, note: FrontierNote { summary: NonEmptyText::new(args.note)?, @@ -1011,6 +1307,23 @@ fn load_payload( Ok(NodePayload::with_schema(schema, map)) } +fn validate_cli_prose_payload( + class: NodeClass, + summary: Option<&str>, + payload: &NodePayload, +) -> Result<(), StoreError> { + if !matches!(class, NodeClass::Note | NodeClass::Research) { + return Ok(()); + } + if summary.is_none() { + return Err(StoreError::ProseSummaryRequired(class)); + } + match payload.field("body") { + Some(Value::String(body)) if !body.trim().is_empty() => Ok(()), + _ => Err(StoreError::ProseBodyRequired(class)), + } +} + fn json_object(value: Value) -> Result<Map<String, Value>, StoreError> { match value { Value::Object(map) => Ok(map), @@ -1020,6 +1333,22 @@ fn json_object(value: Value) -> Result<Map<String, Value>, StoreError> { } } +fn schema_field_json(field: &ProjectFieldSpec) -> Value { + json!({ + "name": field.name, + 
"node_classes": field.node_classes.iter().map(ToString::to_string).collect::<Vec<_>>(), + "presence": field.presence.as_str(), + "severity": field.severity.as_str(), + "role": field.role.as_str(), + "inference_policy": field.inference_policy.as_str(), + "value_type": field.value_type.map(FieldValueType::as_str), + }) +} + +fn parse_node_class_set(classes: Vec<CliNodeClass>) -> BTreeSet<NodeClass> { + classes.into_iter().map(Into::into).collect() +} + fn capture_code_snapshot(project_root: &Utf8Path) -> Result<CodeSnapshotRef, StoreError> { let head_commit = run_git(project_root, &["rev-parse", "HEAD"])?; let dirty_paths = run_git(project_root, &["status", "--porcelain"])? @@ -1084,23 +1413,71 @@ fn maybe_print_gitignore_hint(project_root: &Utf8Path) -> Result<(), StoreError> } } -fn parse_metric_observation(raw: String) -> Result<MetricObservation, StoreError> { - let parts = raw.split(':').collect::<Vec<_>>(); - if parts.len() != 4 { - return Err(invalid_input( - "metrics must look like key:unit:objective:value", - )); - } - Ok(MetricObservation { - metric_key: NonEmptyText::new(parts[0])?, - unit: parse_metric_unit(parts[1])?, - objective: parse_optimization_objective(parts[2])?, - value: parts[3] +fn parse_metric_value(raw: String) -> Result<MetricValue, StoreError> { + let Some((key, value)) = raw.split_once('=') else { + return Err(invalid_input("metrics must look like key=value")); + }; + Ok(MetricValue { + key: NonEmptyText::new(key)?, + value: value .parse::<f64>() .map_err(|error| invalid_input(format!("invalid metric value: {error}")))?, }) } +fn coerce_cli_dimension_filters( + store: &ProjectStore, + raw_dimensions: Vec<String>, +) -> Result<BTreeMap<NonEmptyText, fidget_spinner_core::RunDimensionValue>, StoreError> { + let definitions = store + .list_run_dimensions()? + .into_iter() + .map(|summary| (summary.key.to_string(), summary.value_type)) + .collect::<BTreeMap<_, _>>(); + let raw_dimensions = parse_dimension_assignments(raw_dimensions)? 
+ .into_iter() + .map(|(key, raw_value)| { + let Some(value_type) = definitions.get(&key) else { + return Err(invalid_input(format!( + "unknown run dimension `{key}`; register it first" + ))); + }; + Ok((key, parse_cli_dimension_value(*value_type, &raw_value)?)) + }) + .collect::<Result<BTreeMap<_, _>, StoreError>>()?; + store.coerce_run_dimensions(raw_dimensions) +} + +fn parse_dimension_assignments( + raw_dimensions: Vec<String>, +) -> Result<BTreeMap<String, String>, StoreError> { + raw_dimensions + .into_iter() + .map(|raw| { + let Some((key, value)) = raw.split_once('=') else { + return Err(invalid_input("dimensions must look like key=value")); + }; + Ok((key.to_owned(), value.to_owned())) + }) + .collect() +} + +fn parse_cli_dimension_value(value_type: FieldValueType, raw: &str) -> Result<Value, StoreError> { + match value_type { + FieldValueType::String | FieldValueType::Timestamp => Ok(Value::String(raw.to_owned())), + FieldValueType::Numeric => Ok(json!(raw.parse::<f64>().map_err(|error| { + invalid_input(format!("invalid numeric dimension value: {error}")) + })?)), + FieldValueType::Boolean => match raw { + "true" => Ok(Value::Bool(true)), + "false" => Ok(Value::Bool(false)), + other => Err(invalid_input(format!( + "invalid boolean dimension value `{other}`" + ))), + }, + } +} + fn parse_metric_unit(raw: &str) -> Result<MetricUnit, StoreError> { match raw { "seconds" => Ok(MetricUnit::Seconds), @@ -1204,6 +1581,78 @@ impl From<CliExecutionBackend> for ExecutionBackend { } } +impl From<CliMetricSource> for MetricFieldSource { + fn from(value: CliMetricSource) -> Self { + match value { + CliMetricSource::RunMetric => Self::RunMetric, + CliMetricSource::ChangePayload => Self::ChangePayload, + CliMetricSource::RunPayload => Self::RunPayload, + CliMetricSource::AnalysisPayload => Self::AnalysisPayload, + CliMetricSource::DecisionPayload => Self::DecisionPayload, + } + } +} + +impl From<CliMetricOrder> for MetricRankOrder { + fn from(value: CliMetricOrder) -> 
Self { + match value { + CliMetricOrder::Asc => Self::Asc, + CliMetricOrder::Desc => Self::Desc, + } + } +} + +impl From<CliFieldValueType> for FieldValueType { + fn from(value: CliFieldValueType) -> Self { + match value { + CliFieldValueType::String => Self::String, + CliFieldValueType::Numeric => Self::Numeric, + CliFieldValueType::Boolean => Self::Boolean, + CliFieldValueType::Timestamp => Self::Timestamp, + } + } +} + +impl From<CliDiagnosticSeverity> for DiagnosticSeverity { + fn from(value: CliDiagnosticSeverity) -> Self { + match value { + CliDiagnosticSeverity::Error => Self::Error, + CliDiagnosticSeverity::Warning => Self::Warning, + CliDiagnosticSeverity::Info => Self::Info, + } + } +} + +impl From<CliFieldPresence> for FieldPresence { + fn from(value: CliFieldPresence) -> Self { + match value { + CliFieldPresence::Required => Self::Required, + CliFieldPresence::Recommended => Self::Recommended, + CliFieldPresence::Optional => Self::Optional, + } + } +} + +impl From<CliFieldRole> for FieldRole { + fn from(value: CliFieldRole) -> Self { + match value { + CliFieldRole::Index => Self::Index, + CliFieldRole::ProjectionGate => Self::ProjectionGate, + CliFieldRole::RenderOnly => Self::RenderOnly, + CliFieldRole::Opaque => Self::Opaque, + } + } +} + +impl From<CliInferencePolicy> for InferencePolicy { + fn from(value: CliInferencePolicy) -> Self { + match value { + CliInferencePolicy::ManualOnly => Self::ManualOnly, + CliInferencePolicy::ModelMayInfer => Self::ModelMayInfer, + } + } +} + impl From<CliFrontierVerdict> for FrontierVerdict { fn from(value: CliFrontierVerdict) -> Self { match value { diff --git a/crates/fidget-spinner-cli/src/mcp/catalog.rs b/crates/fidget-spinner-cli/src/mcp/catalog.rs index b23cb31..0831ba4 100644 --- a/crates/fidget-spinner-cli/src/mcp/catalog.rs +++ b/crates/fidget-spinner-cli/src/mcp/catalog.rs @@ -67,6 +67,18 @@ pub(crate) fn tool_spec(name: &str) -> Option<ToolSpec> { dispatch: DispatchTarget::Worker, replay: 
ReplayContract::Convergent, }), + "schema.field.upsert" => Some(ToolSpec { + name: "schema.field.upsert", + description: "Add or replace one project-local payload schema field definition.", + dispatch: DispatchTarget::Worker, + replay: ReplayContract::NeverReplay, + }), + "schema.field.remove" => Some(ToolSpec { + name: "schema.field.remove", + description: "Remove one project-local payload schema field definition, optionally narrowed by node-class set.", + dispatch: DispatchTarget::Worker, + replay: ReplayContract::NeverReplay, + }), "tag.add" => Some(ToolSpec { name: "tag.add", description: "Register one repo-local tag with a required description. Notes may only reference tags from this registry.", @@ -145,9 +157,45 @@ pub(crate) fn tool_spec(name: &str) -> Option<ToolSpec> { dispatch: DispatchTarget::Worker, replay: ReplayContract::NeverReplay, }), + "metric.define" => Some(ToolSpec { + name: "metric.define", + description: "Register one project-level metric definition so experiment ingestion only has to send key/value observations.", + dispatch: DispatchTarget::Worker, + replay: ReplayContract::NeverReplay, + }), + "run.dimension.define" => Some(ToolSpec { + name: "run.dimension.define", + description: "Register one project-level run dimension used to slice metrics across scenarios, budgets, and flags.", + dispatch: DispatchTarget::Worker, + replay: ReplayContract::NeverReplay, + }), + "run.dimension.list" => Some(ToolSpec { + name: "run.dimension.list", + description: "List registered run dimensions together with observed value counts and sample values.", + dispatch: DispatchTarget::Worker, + replay: ReplayContract::Convergent, + }), + "metric.keys" => Some(ToolSpec { + name: "metric.keys", + description: "List rankable metric keys, including registered run metrics and observed payload-derived numeric fields.", + dispatch: DispatchTarget::Worker, + replay: ReplayContract::Convergent, + }), + "metric.best" => Some(ToolSpec { + name: "metric.best", + 
description: "Rank completed experiments by one numeric key, with optional run-dimension filters and candidate commit surfacing.", + dispatch: DispatchTarget::Worker, + replay: ReplayContract::Convergent, + }), + "metric.migrate" => Some(ToolSpec { + name: "metric.migrate", + description: "Re-run the idempotent legacy metric-plane normalization that registers canonical metrics and backfills benchmark_suite dimensions.", + dispatch: DispatchTarget::Worker, + replay: ReplayContract::NeverReplay, + }), "experiment.close" => Some(ToolSpec { name: "experiment.close", - description: "Atomically close a core-path experiment with candidate checkpoint capture, measured result, note, and verdict.", + description: "Atomically close a core-path experiment with typed run dimensions, preregistered metric observations, candidate checkpoint capture, note, and verdict.", dispatch: DispatchTarget::Worker, replay: ReplayContract::NeverReplay, }), @@ -212,6 +260,8 @@ pub(crate) fn tool_definitions() -> Vec<Value> { "project.bind", "project.status", "project.schema", + "schema.field.upsert", + "schema.field.remove", "tag.add", "tag.list", "frontier.list", @@ -225,6 +275,12 @@ pub(crate) fn tool_definitions() -> Vec<Value> { "node.archive", "note.quick", "research.record", + "metric.define", + "run.dimension.define", + "run.dimension.list", + "metric.keys", + "metric.best", + "metric.migrate", "experiment.close", "skill.list", "skill.show", @@ -277,7 +333,32 @@ pub(crate) fn list_resources() -> Vec<Value> { fn input_schema(name: &str) -> Value { match name { "project.status" | "project.schema" | "tag.list" | "skill.list" | "system.health" - | "system.telemetry" => json!({"type":"object","additionalProperties":false}), + | "system.telemetry" | "run.dimension.list" | "metric.migrate" => { + json!({"type":"object","additionalProperties":false}) + } + "schema.field.upsert" => json!({ + "type": "object", + "properties": { + "name": { "type": "string", "description": "Project payload field 
name." }, + "node_classes": { "type": "array", "items": node_class_schema(), "description": "Optional node-class scope. Omit or pass [] for all classes." }, + "presence": field_presence_schema(), + "severity": diagnostic_severity_schema(), + "role": field_role_schema(), + "inference_policy": inference_policy_schema(), + "value_type": field_value_type_schema(), + }, + "required": ["name", "presence", "severity", "role", "inference_policy"], + "additionalProperties": false + }), + "schema.field.remove" => json!({ + "type": "object", + "properties": { + "name": { "type": "string", "description": "Project payload field name." }, + "node_classes": { "type": "array", "items": node_class_schema(), "description": "Optional exact node-class scope to remove." } + }, + "required": ["name"], + "additionalProperties": false + }), "project.bind" => json!({ "type": "object", "properties": { @@ -333,9 +414,9 @@ fn input_schema(name: &str) -> Value { "class": node_class_schema(), "frontier_id": { "type": "string" }, "title": { "type": "string" }, - "summary": { "type": "string" }, - "tags": { "type": "array", "items": tag_name_schema() }, - "payload": { "type": "object" }, + "summary": { "type": "string", "description": "Required for `note` and `research` nodes." }, + "tags": { "type": "array", "items": tag_name_schema(), "description": "Required for `note` nodes; optional for other classes." }, + "payload": { "type": "object", "description": "`note` and `research` nodes require a non-empty string `body` field." 
}, "annotations": { "type": "array", "items": annotation_schema() }, "parents": { "type": "array", "items": { "type": "string" } } }, @@ -393,12 +474,13 @@ fn input_schema(name: &str) -> Value { "properties": { "frontier_id": { "type": "string" }, "title": { "type": "string" }, + "summary": { "type": "string" }, "body": { "type": "string" }, "tags": { "type": "array", "items": tag_name_schema() }, "annotations": { "type": "array", "items": annotation_schema() }, "parents": { "type": "array", "items": { "type": "string" } } }, - "required": ["title", "body", "tags"], + "required": ["title", "summary", "body", "tags"], "additionalProperties": false }), "research.record" => json!({ @@ -408,10 +490,54 @@ fn input_schema(name: &str) -> Value { "title": { "type": "string" }, "summary": { "type": "string" }, "body": { "type": "string" }, + "tags": { "type": "array", "items": tag_name_schema() }, "annotations": { "type": "array", "items": annotation_schema() }, "parents": { "type": "array", "items": { "type": "string" } } }, - "required": ["title", "body"], + "required": ["title", "summary", "body"], + "additionalProperties": false + }), + "metric.define" => json!({ + "type": "object", + "properties": { + "key": { "type": "string" }, + "unit": metric_unit_schema(), + "objective": optimization_objective_schema(), + "description": { "type": "string" } + }, + "required": ["key", "unit", "objective"], + "additionalProperties": false + }), + "run.dimension.define" => json!({ + "type": "object", + "properties": { + "key": { "type": "string" }, + "value_type": field_value_type_schema(), + "description": { "type": "string" } + }, + "required": ["key", "value_type"], + "additionalProperties": false + }), + "metric.keys" => json!({ + "type": "object", + "properties": { + "frontier_id": { "type": "string" }, + "source": metric_source_schema(), + "dimensions": { "type": "object" } + }, + "additionalProperties": false + }), + "metric.best" => json!({ + "type": "object", + "properties": 
{ + "key": { "type": "string" }, + "frontier_id": { "type": "string" }, + "source": metric_source_schema(), + "dimensions": { "type": "object" }, + "order": metric_order_schema(), + "limit": { "type": "integer", "minimum": 1, "maximum": 500 } + }, + "required": ["key"], "additionalProperties": false }), "experiment.close" => json!({ @@ -422,8 +548,8 @@ fn input_schema(name: &str) -> Value { "change_node_id": { "type": "string" }, "candidate_summary": { "type": "string" }, "run": run_schema(), - "primary_metric": metric_observation_schema(), - "supporting_metrics": { "type": "array", "items": metric_observation_schema() }, + "primary_metric": metric_value_schema(), + "supporting_metrics": { "type": "array", "items": metric_value_schema() }, "note": note_schema(), "verdict": verdict_schema(), "decision_title": { "type": "string" }, @@ -461,16 +587,14 @@ fn metric_spec_schema() -> Value { }) } -fn metric_observation_schema() -> Value { +fn metric_value_schema() -> Value { json!({ "type": "object", "properties": { "key": { "type": "string" }, - "unit": metric_unit_schema(), - "objective": optimization_objective_schema(), "value": { "type": "number" } }, - "required": ["key", "unit", "objective", "value"], + "required": ["key", "value"], "additionalProperties": false }) } @@ -509,6 +633,61 @@ fn metric_unit_schema() -> Value { }) } +fn metric_source_schema() -> Value { + json!({ + "type": "string", + "enum": [ + "run_metric", + "change_payload", + "run_payload", + "analysis_payload", + "decision_payload" + ] + }) +} + +fn metric_order_schema() -> Value { + json!({ + "type": "string", + "enum": ["asc", "desc"] + }) +} + +fn field_value_type_schema() -> Value { + json!({ + "type": "string", + "enum": ["string", "numeric", "boolean", "timestamp"] + }) +} + +fn diagnostic_severity_schema() -> Value { + json!({ + "type": "string", + "enum": ["error", "warning", "info"] + }) +} + +fn field_presence_schema() -> Value { + json!({ + "type": "string", + "enum": ["required", 
"recommended", "optional"] + }) +} + +fn field_role_schema() -> Value { + json!({ + "type": "string", + "enum": ["index", "projection_gate", "render_only", "opaque"] + }) +} + +fn inference_policy_schema() -> Value { + json!({ + "type": "string", + "enum": ["manual_only", "model_may_infer"] + }) +} + fn optimization_objective_schema() -> Value { json!({ "type": "string", @@ -539,7 +718,7 @@ fn run_schema() -> Value { "type": "string", "enum": ["local_process", "worktree_process", "ssh_process"] }, - "benchmark_suite": { "type": "string" }, + "dimensions": { "type": "object" }, "command": { "type": "object", "properties": { @@ -554,7 +733,7 @@ fn run_schema() -> Value { "additionalProperties": false } }, - "required": ["title", "backend", "benchmark_suite", "command"], + "required": ["title", "backend", "dimensions", "command"], "additionalProperties": false }) } diff --git a/crates/fidget-spinner-cli/src/mcp/host/runtime.rs b/crates/fidget-spinner-cli/src/mcp/host/runtime.rs index f84f604..d57a21e 100644 --- a/crates/fidget-spinner-cli/src/mcp/host/runtime.rs +++ b/crates/fidget-spinner-cli/src/mcp/host/runtime.rs @@ -838,20 +838,6 @@ fn system_health_output(health: &HealthSnapshot) -> Result<ToolOutput, FaultReco "rollout_pending".to_owned(), json!(health.binary.rollout_pending), ); - if let Some(fault) = health.last_fault.as_ref() { - let _ = concise.insert( - "last_fault".to_owned(), - json!({ - "kind": format!("{:?}", fault.kind).to_ascii_lowercase(), - "stage": format!("{:?}", fault.stage).to_ascii_lowercase(), - "operation": fault.operation, - "message": fault.message, - "retryable": fault.retryable, - "retried": fault.retried, - }), - ); - } - let mut lines = vec![format!( "{} | {}", if health.initialization.ready && health.initialization.seed_captured { @@ -886,14 +872,6 @@ fn system_health_output(health: &HealthSnapshot) -> Result<ToolOutput, FaultReco "" } )); - if let Some(fault) = health.last_fault.as_ref() { - lines.push(format!( - "fault: {} {} {}", - 
format!("{:?}", fault.kind).to_ascii_lowercase(), - fault.operation, - fault.message, - )); - } detailed_tool_output( &Value::Object(concise), health, diff --git a/crates/fidget-spinner-cli/src/mcp/service.rs b/crates/fidget-spinner-cli/src/mcp/service.rs index aee53e0..62e3641 100644 --- a/crates/fidget-spinner-cli/src/mcp/service.rs +++ b/crates/fidget-spinner-cli/src/mcp/service.rs @@ -3,15 +3,18 @@ use std::fs; use camino::{Utf8Path, Utf8PathBuf}; use fidget_spinner_core::{ - AdmissionState, AnnotationVisibility, CodeSnapshotRef, CommandRecipe, ExecutionBackend, - FrontierContract, FrontierNote, FrontierProjection, FrontierRecord, FrontierVerdict, - MetricObservation, MetricSpec, MetricUnit, NodeAnnotation, NodeClass, NodePayload, - NonEmptyText, ProjectSchema, TagName, TagRecord, + AdmissionState, AnnotationVisibility, CodeSnapshotRef, CommandRecipe, DiagnosticSeverity, + ExecutionBackend, FieldPresence, FieldRole, FieldValueType, FrontierContract, FrontierNote, + FrontierProjection, FrontierRecord, FrontierVerdict, InferencePolicy, MetricSpec, MetricUnit, + MetricValue, NodeAnnotation, NodeClass, NodePayload, NonEmptyText, ProjectFieldSpec, + ProjectSchema, RunDimensionValue, TagName, TagRecord, }; use fidget_spinner_store_sqlite::{ - CloseExperimentRequest, CreateFrontierRequest, CreateNodeRequest, EdgeAttachment, - EdgeAttachmentDirection, ExperimentReceipt, ListNodesQuery, NodeSummary, ProjectStore, - StoreError, + CloseExperimentRequest, CreateFrontierRequest, CreateNodeRequest, DefineMetricRequest, + DefineRunDimensionRequest, EdgeAttachment, EdgeAttachmentDirection, ExperimentReceipt, + ListNodesQuery, MetricBestQuery, MetricFieldSource, MetricKeyQuery, MetricKeySummary, + MetricRankOrder, NodeSummary, ProjectStore, RemoveSchemaFieldRequest, StoreError, + UpsertSchemaFieldRequest, }; use serde::Deserialize; use serde_json::{Map, Value, json}; @@ -74,6 +77,73 @@ impl WorkerService { FaultStage::Worker, "tools/call:project.schema", ), + 
"schema.field.upsert" => { + let args = deserialize::<SchemaFieldUpsertToolArgs>(arguments)?; + let field = self + .store + .upsert_schema_field(UpsertSchemaFieldRequest { + name: NonEmptyText::new(args.name) + .map_err(store_fault("tools/call:schema.field.upsert"))?, + node_classes: args + .node_classes + .unwrap_or_default() + .into_iter() + .map(|class| { + parse_node_class_name(&class) + .map_err(store_fault("tools/call:schema.field.upsert")) + }) + .collect::<Result<_, _>>()?, + presence: parse_field_presence_name(&args.presence) + .map_err(store_fault("tools/call:schema.field.upsert"))?, + severity: parse_diagnostic_severity_name(&args.severity) + .map_err(store_fault("tools/call:schema.field.upsert"))?, + role: parse_field_role_name(&args.role) + .map_err(store_fault("tools/call:schema.field.upsert"))?, + inference_policy: parse_inference_policy_name(&args.inference_policy) + .map_err(store_fault("tools/call:schema.field.upsert"))?, + value_type: args + .value_type + .as_deref() + .map(parse_field_value_type_name) + .transpose() + .map_err(store_fault("tools/call:schema.field.upsert"))?, + }) + .map_err(store_fault("tools/call:schema.field.upsert"))?; + tool_success( + schema_field_upsert_output(self.store.schema(), &field)?, + presentation, + FaultStage::Worker, + "tools/call:schema.field.upsert", + ) + } + "schema.field.remove" => { + let args = deserialize::<SchemaFieldRemoveToolArgs>(arguments)?; + let removed_count = self + .store + .remove_schema_field(RemoveSchemaFieldRequest { + name: NonEmptyText::new(args.name) + .map_err(store_fault("tools/call:schema.field.remove"))?, + node_classes: args + .node_classes + .map(|node_classes| { + node_classes + .into_iter() + .map(|class| { + parse_node_class_name(&class) + .map_err(store_fault("tools/call:schema.field.remove")) + }) + .collect::<Result<_, _>>() + }) + .transpose()?, + }) + .map_err(store_fault("tools/call:schema.field.remove"))?; + tool_success( + schema_field_remove_output(self.store.schema(), 
removed_count)?, + presentation, + FaultStage::Worker, + "tools/call:schema.field.remove", + ) + } "tag.add" => { let args = deserialize::<TagAddToolArgs>(arguments)?; let tag = self @@ -402,7 +472,10 @@ impl WorkerService { .map_err(store_fault("tools/call:note.quick"))?, title: NonEmptyText::new(args.title) .map_err(store_fault("tools/call:note.quick"))?, - summary: None, + summary: Some( + NonEmptyText::new(args.summary) + .map_err(store_fault("tools/call:note.quick"))?, + ), tags: Some( parse_tag_set(args.tags) .map_err(store_fault("tools/call:note.quick"))?, @@ -439,12 +512,14 @@ impl WorkerService { .map_err(store_fault("tools/call:research.record"))?, title: NonEmptyText::new(args.title) .map_err(store_fault("tools/call:research.record"))?, - summary: args - .summary - .map(NonEmptyText::new) - .transpose() - .map_err(store_fault("tools/call:research.record"))?, - tags: None, + summary: Some( + NonEmptyText::new(args.summary) + .map_err(store_fault("tools/call:research.record"))?, + ), + tags: Some( + parse_tag_set(args.tags) + .map_err(store_fault("tools/call:research.record"))?, + ), payload: NodePayload::with_schema( self.store.schema().schema_ref(), crate::json_object(json!({ "body": args.body })) @@ -463,6 +538,170 @@ impl WorkerService { "tools/call:research.record", ) } + "metric.define" => { + let args = deserialize::<MetricDefineToolArgs>(arguments)?; + let metric = self + .store + .define_metric(DefineMetricRequest { + key: NonEmptyText::new(args.key) + .map_err(store_fault("tools/call:metric.define"))?, + unit: parse_metric_unit_name(&args.unit) + .map_err(store_fault("tools/call:metric.define"))?, + objective: crate::parse_optimization_objective(&args.objective) + .map_err(store_fault("tools/call:metric.define"))?, + description: args + .description + .map(NonEmptyText::new) + .transpose() + .map_err(store_fault("tools/call:metric.define"))?, + }) + .map_err(store_fault("tools/call:metric.define"))?; + tool_success( + json_created_output( + 
"registered metric", + json!({ + "key": metric.key, + "unit": metric_unit_name(metric.unit), + "objective": metric_objective_name(metric.objective), + "description": metric.description, + }), + "tools/call:metric.define", + )?, + presentation, + FaultStage::Worker, + "tools/call:metric.define", + ) + } + "run.dimension.define" => { + let args = deserialize::<RunDimensionDefineToolArgs>(arguments)?; + let dimension = self + .store + .define_run_dimension(DefineRunDimensionRequest { + key: NonEmptyText::new(args.key) + .map_err(store_fault("tools/call:run.dimension.define"))?, + value_type: parse_field_value_type_name(&args.value_type) + .map_err(store_fault("tools/call:run.dimension.define"))?, + description: args + .description + .map(NonEmptyText::new) + .transpose() + .map_err(store_fault("tools/call:run.dimension.define"))?, + }) + .map_err(store_fault("tools/call:run.dimension.define"))?; + tool_success( + json_created_output( + "registered run dimension", + json!({ + "key": dimension.key, + "value_type": dimension.value_type.as_str(), + "description": dimension.description, + }), + "tools/call:run.dimension.define", + )?, + presentation, + FaultStage::Worker, + "tools/call:run.dimension.define", + ) + } + "run.dimension.list" => { + let items = self + .store + .list_run_dimensions() + .map_err(store_fault("tools/call:run.dimension.list"))?; + tool_success( + run_dimension_list_output(items.as_slice())?, + presentation, + FaultStage::Worker, + "tools/call:run.dimension.list", + ) + } + "metric.keys" => { + let args = deserialize::<MetricKeysToolArgs>(arguments)?; + let keys = self + .store + .list_metric_keys_filtered(MetricKeyQuery { + frontier_id: args + .frontier_id + .as_deref() + .map(crate::parse_frontier_id) + .transpose() + .map_err(store_fault("tools/call:metric.keys"))?, + source: args + .source + .as_deref() + .map(parse_metric_source_name) + .transpose() + .map_err(store_fault("tools/call:metric.keys"))?, + dimensions: coerce_tool_dimensions( + 
&self.store, + args.dimensions.unwrap_or_default(), + "tools/call:metric.keys", + )?, + }) + .map_err(store_fault("tools/call:metric.keys"))?; + tool_success( + metric_keys_output(keys.as_slice())?, + presentation, + FaultStage::Worker, + "tools/call:metric.keys", + ) + } + "metric.best" => { + let args = deserialize::<MetricBestToolArgs>(arguments)?; + let items = self + .store + .best_metrics(MetricBestQuery { + key: NonEmptyText::new(args.key) + .map_err(store_fault("tools/call:metric.best"))?, + frontier_id: args + .frontier_id + .as_deref() + .map(crate::parse_frontier_id) + .transpose() + .map_err(store_fault("tools/call:metric.best"))?, + source: args + .source + .as_deref() + .map(parse_metric_source_name) + .transpose() + .map_err(store_fault("tools/call:metric.best"))?, + dimensions: coerce_tool_dimensions( + &self.store, + args.dimensions.unwrap_or_default(), + "tools/call:metric.best", + )?, + order: args + .order + .as_deref() + .map(parse_metric_order_name) + .transpose() + .map_err(store_fault("tools/call:metric.best"))?, + limit: args.limit.unwrap_or(10), + }) + .map_err(store_fault("tools/call:metric.best"))?; + tool_success( + metric_best_output(items.as_slice())?, + presentation, + FaultStage::Worker, + "tools/call:metric.best", + ) + } + "metric.migrate" => { + let report = self + .store + .migrate_metric_plane() + .map_err(store_fault("tools/call:metric.migrate"))?; + tool_success( + json_created_output( + "normalized legacy metric plane", + json!(report), + "tools/call:metric.migrate", + )?, + presentation, + FaultStage::Worker, + "tools/call:metric.migrate", + ) + } "experiment.close" => { let args = deserialize::<ExperimentCloseToolArgs>(arguments)?; let frontier_id = crate::parse_frontier_id(&args.frontier_id) @@ -507,8 +746,11 @@ impl WorkerService { .map_err(store_fault("tools/call:experiment.close"))?, backend: parse_backend_name(&args.run.backend) .map_err(store_fault("tools/call:experiment.close"))?, - benchmark_suite: 
NonEmptyText::new(args.run.benchmark_suite) - .map_err(store_fault("tools/call:experiment.close"))?, + dimensions: coerce_tool_dimensions( + &self.store, + args.run.dimensions, + "tools/call:experiment.close", + )?, command: command_recipe_from_wire( args.run.command, self.store.project_root(), @@ -518,12 +760,12 @@ impl WorkerService { capture_code_snapshot(self.store.project_root()) .map_err(store_fault("tools/call:experiment.close"))?, ), - primary_metric: metric_observation_from_wire(args.primary_metric) + primary_metric: metric_value_from_wire(args.primary_metric) .map_err(store_fault("tools/call:experiment.close"))?, supporting_metrics: args .supporting_metrics .into_iter() - .map(metric_observation_from_wire) + .map(metric_value_from_wire) .collect::<Result<Vec<_>, _>>() .map_err(store_fault("tools/call:experiment.close"))?, note: FrontierNote { @@ -547,7 +789,7 @@ impl WorkerService { }) .map_err(store_fault("tools/call:experiment.close"))?; tool_success( - experiment_close_output(&receipt)?, + experiment_close_output(&self.store, &receipt)?, presentation, FaultStage::Worker, "tools/call:experiment.close", @@ -692,8 +934,8 @@ fn project_schema_output(schema: &ProjectSchema) -> Result<ToolOutput, FaultReco .collect::<Vec<_>>() .join(",") }, - format!("{:?}", field.presence).to_ascii_lowercase(), - format!("{:?}", field.role).to_ascii_lowercase(), + field.presence.as_str(), + field.role.as_str(), )); } if schema.fields.len() > 8 { @@ -709,6 +951,59 @@ fn project_schema_output(schema: &ProjectSchema) -> Result<ToolOutput, FaultReco ) } +fn schema_field_upsert_output( + schema: &ProjectSchema, + field: &ProjectFieldSpec, +) -> Result<ToolOutput, FaultRecord> { + let concise = json!({ + "schema": schema.schema_ref(), + "field": project_schema_field_value(field), + }); + detailed_tool_output( + &concise, + &concise, + format!( + "upserted schema field {}\nschema: {}\nclasses: {}\npresence: {}\nseverity: {}\nrole: {}\ninference: {}{}", + field.name, + 
schema_label(schema), + render_schema_node_classes(&field.node_classes), + field.presence.as_str(), + field.severity.as_str(), + field.role.as_str(), + field.inference_policy.as_str(), + field + .value_type + .map(|value_type| format!("\nvalue_type: {}", value_type.as_str())) + .unwrap_or_default(), + ), + None, + FaultStage::Worker, + "tools/call:schema.field.upsert", + ) +} + +fn schema_field_remove_output( + schema: &ProjectSchema, + removed_count: u64, +) -> Result<ToolOutput, FaultRecord> { + let concise = json!({ + "schema": schema.schema_ref(), + "removed_count": removed_count, + }); + detailed_tool_output( + &concise, + &concise, + format!( + "removed {} schema field definition(s)\nschema: {}", + removed_count, + schema_label(schema), + ), + None, + FaultStage::Worker, + "tools/call:schema.field.remove", + ) +} + fn tag_add_output(tag: &TagRecord) -> Result<ToolOutput, FaultRecord> { let concise = json!({ "name": tag.name, @@ -892,14 +1187,33 @@ fn node_read_output(node: &fidget_spinner_core::DagNode) -> Result<ToolOutput, F ); } if !node.payload.fields.is_empty() { - let _ = concise.insert( - "payload_field_count".to_owned(), - json!(node.payload.fields.len()), - ); - let _ = concise.insert( - "payload_preview".to_owned(), - payload_preview_value(&node.payload.fields), - ); + let filtered_fields = + filtered_payload_fields(node.class, &node.payload.fields).collect::<Vec<_>>(); + if !filtered_fields.is_empty() { + let _ = concise.insert( + "payload_field_count".to_owned(), + json!(filtered_fields.len()), + ); + if is_prose_node(node.class) { + let _ = concise.insert( + "payload_fields".to_owned(), + json!( + filtered_fields + .iter() + .take(6) + .map(|(name, _)| (*name).clone()) + .collect::<Vec<_>>() + ), + ); + } else { + let payload_preview = payload_preview_value(node.class, &node.payload.fields); + if let Value::Object(object) = &payload_preview + && !object.is_empty() + { + let _ = concise.insert("payload_preview".to_owned(), payload_preview); + } + 
} + } } if !node.diagnostics.items.is_empty() { let _ = concise.insert( @@ -930,7 +1244,7 @@ fn node_read_output(node: &fidget_spinner_core::DagNode) -> Result<ToolOutput, F if !node.tags.is_empty() { lines.push(format!("tags: {}", format_tags(&node.tags))); } - lines.extend(payload_preview_lines(&node.payload.fields)); + lines.extend(payload_preview_lines(node.class, &node.payload.fields)); if !node.diagnostics.items.is_empty() { lines.push(format!( "diagnostics: {}", @@ -972,7 +1286,10 @@ fn node_read_output(node: &fidget_spinner_core::DagNode) -> Result<ToolOutput, F ) } -fn experiment_close_output(receipt: &ExperimentReceipt) -> Result<ToolOutput, FaultRecord> { +fn experiment_close_output( + store: &ProjectStore, + receipt: &ExperimentReceipt, +) -> Result<ToolOutput, FaultRecord> { let concise = json!({ "experiment_id": receipt.experiment.id, "frontier_id": receipt.experiment.frontier_id, @@ -980,7 +1297,8 @@ fn experiment_close_output(receipt: &ExperimentReceipt) -> Result<ToolOutput, Fa "verdict": format!("{:?}", receipt.experiment.verdict).to_ascii_lowercase(), "run_id": receipt.run.run_id, "decision_node_id": receipt.decision_node.id, - "primary_metric": metric_value(&receipt.experiment.result.primary_metric), + "dimensions": run_dimensions_value(&receipt.experiment.result.dimensions), + "primary_metric": metric_value(store, &receipt.experiment.result.primary_metric)?, }); detailed_tool_output( &concise, @@ -997,7 +1315,11 @@ fn experiment_close_output(receipt: &ExperimentReceipt) -> Result<ToolOutput, Fa ), format!( "primary metric: {}", - metric_text(&receipt.experiment.result.primary_metric) + metric_text(store, &receipt.experiment.result.primary_metric)? 
+ ), + format!( + "dimensions: {}", + render_dimension_kv(&receipt.experiment.result.dimensions) ), format!("run: {}", receipt.run.run_id), ] @@ -1008,7 +1330,181 @@ fn experiment_close_output(receipt: &ExperimentReceipt) -> Result<ToolOutput, Fa ) } -fn project_schema_field_value(field: &fidget_spinner_core::ProjectFieldSpec) -> Value { +fn metric_keys_output(keys: &[MetricKeySummary]) -> Result<ToolOutput, FaultRecord> { + let concise = keys + .iter() + .map(|key| { + json!({ + "key": key.key, + "source": key.source.as_str(), + "experiment_count": key.experiment_count, + "unit": key.unit.map(metric_unit_name), + "objective": key.objective.map(metric_objective_name), + "description": key.description, + "requires_order": key.requires_order, + }) + }) + .collect::<Vec<_>>(); + let mut lines = vec![format!("{} metric key(s)", keys.len())]; + lines.extend(keys.iter().map(|key| { + let mut line = format!( + "{} [{}] experiments={}", + key.key, + key.source.as_str(), + key.experiment_count + ); + if let Some(unit) = key.unit { + line.push_str(format!(" unit={}", metric_unit_name(unit)).as_str()); + } + if let Some(objective) = key.objective { + line.push_str(format!(" objective={}", metric_objective_name(objective)).as_str()); + } + if let Some(description) = key.description.as_ref() { + line.push_str(format!(" | {description}").as_str()); + } + if key.requires_order { + line.push_str(" order=required"); + } + line + })); + detailed_tool_output( + &concise, + &keys, + lines.join("\n"), + None, + FaultStage::Worker, + "tools/call:metric.keys", + ) +} + +fn metric_best_output( + items: &[fidget_spinner_store_sqlite::MetricBestEntry], +) -> Result<ToolOutput, FaultRecord> { + let concise = items + .iter() + .enumerate() + .map(|(index, item)| { + json!({ + "rank": index + 1, + "key": item.key, + "source": item.source.as_str(), + "value": item.value, + "order": item.order.as_str(), + "experiment_id": item.experiment_id, + "frontier_id": item.frontier_id, + "change_node_id": 
item.change_node_id, + "change_title": item.change_title, + "verdict": metric_verdict_name(item.verdict), + "candidate_checkpoint_id": item.candidate_checkpoint_id, + "candidate_commit_hash": item.candidate_commit_hash, + "run_id": item.run_id, + "unit": item.unit.map(metric_unit_name), + "objective": item.objective.map(metric_objective_name), + "dimensions": run_dimensions_value(&item.dimensions), + }) + }) + .collect::<Vec<_>>(); + let mut lines = vec![format!("{} ranked experiment(s)", items.len())]; + lines.extend(items.iter().enumerate().map(|(index, item)| { + format!( + "{}. {}={} [{}] {} | verdict={} | commit={} | checkpoint={}", + index + 1, + item.key, + item.value, + item.source.as_str(), + item.change_title, + metric_verdict_name(item.verdict), + item.candidate_commit_hash, + item.candidate_checkpoint_id, + ) + })); + lines.extend( + items + .iter() + .map(|item| format!(" dims: {}", render_dimension_kv(&item.dimensions))), + ); + detailed_tool_output( + &concise, + &items, + lines.join("\n"), + None, + FaultStage::Worker, + "tools/call:metric.best", + ) +} + +fn run_dimension_list_output( + items: &[fidget_spinner_store_sqlite::RunDimensionSummary], +) -> Result<ToolOutput, FaultRecord> { + let concise = items + .iter() + .map(|item| { + json!({ + "key": item.key, + "value_type": item.value_type.as_str(), + "description": item.description, + "observed_run_count": item.observed_run_count, + "distinct_value_count": item.distinct_value_count, + "sample_values": item.sample_values, + }) + }) + .collect::<Vec<_>>(); + let mut lines = vec![format!("{} run dimension(s)", items.len())]; + lines.extend(items.iter().map(|item| { + let mut line = format!( + "{} [{}] runs={} distinct={}", + item.key, + item.value_type.as_str(), + item.observed_run_count, + item.distinct_value_count + ); + if let Some(description) = item.description.as_ref() { + line.push_str(format!(" | {description}").as_str()); + } + if !item.sample_values.is_empty() { + line.push_str( + 
format!( + " | samples={}", + item.sample_values + .iter() + .map(value_summary) + .collect::<Vec<_>>() + .join(", ") + ) + .as_str(), + ); + } + line + })); + detailed_tool_output( + &concise, + &items, + lines.join("\n"), + None, + FaultStage::Worker, + "tools/call:run.dimension.list", + ) +} + +fn json_created_output( + headline: &str, + structured: Value, + operation: &'static str, +) -> Result<ToolOutput, FaultRecord> { + detailed_tool_output( + &structured, + &structured, + format!( + "{headline}\n{}", + crate::to_pretty_json(&structured).map_err(store_fault(operation))? + ), + None, + FaultStage::Worker, + operation, + ) +} + +fn project_schema_field_value(field: &ProjectFieldSpec) -> Value { let mut value = Map::new(); let _ = value.insert("name".to_owned(), json!(field.name)); if !field.node_classes.is_empty() { @@ -1023,21 +1519,12 @@ fn project_schema_field_value(field: &fidget_spinner_core::ProjectFieldSpec) -> ), ); } - let _ = value.insert( - "presence".to_owned(), - json!(format!("{:?}", field.presence).to_ascii_lowercase()), - ); - let _ = value.insert( - "severity".to_owned(), - json!(format!("{:?}", field.severity).to_ascii_lowercase()), - ); - let _ = value.insert( - "role".to_owned(), - json!(format!("{:?}", field.role).to_ascii_lowercase()), - ); + let _ = value.insert("presence".to_owned(), json!(field.presence.as_str())); + let _ = value.insert("severity".to_owned(), json!(field.severity.as_str())); + let _ = value.insert("role".to_owned(), json!(field.role.as_str())); let _ = value.insert( "inference_policy".to_owned(), - json!(format!("{:?}", field.inference_policy).to_ascii_lowercase()), + json!(field.inference_policy.as_str()), ); if let Some(value_type) = field.value_type { let _ = value.insert("value_type".to_owned(), json!(value_type.as_str())); @@ -1045,6 +1532,17 @@ fn project_schema_field_value(field: &fidget_spinner_core::ProjectFieldSpec) -> Value::Object(value) } +fn render_schema_node_classes(node_classes: &BTreeSet<NodeClass>) 
-> String { + if node_classes.is_empty() { + return "any".to_owned(); + } + node_classes + .iter() + .map(ToString::to_string) + .collect::<Vec<_>>() + .join(", ") +} + fn frontier_projection_summary_value(projection: &FrontierProjection) -> Value { json!({ "frontier_id": projection.frontier.id, @@ -1211,17 +1709,17 @@ fn diagnostic_tally(diagnostics: &fidget_spinner_core::NodeDiagnostics) -> Diagn .fold(DiagnosticTally::default(), |mut tally, item| { tally.total += 1; match item.severity { - fidget_spinner_core::DiagnosticSeverity::Error => tally.errors += 1, - fidget_spinner_core::DiagnosticSeverity::Warning => tally.warnings += 1, - fidget_spinner_core::DiagnosticSeverity::Info => tally.infos += 1, + DiagnosticSeverity::Error => tally.errors += 1, + DiagnosticSeverity::Warning => tally.warnings += 1, + DiagnosticSeverity::Info => tally.infos += 1, } tally }) } -fn payload_preview_value(fields: &Map<String, Value>) -> Value { +fn payload_preview_value(class: NodeClass, fields: &Map<String, Value>) -> Value { let mut preview = Map::new(); - for (index, (name, value)) in fields.iter().enumerate() { + for (index, (name, value)) in filtered_payload_fields(class, fields).enumerate() { if index == 6 { let _ = preview.insert( "...".to_owned(), @@ -1234,14 +1732,33 @@ fn payload_preview_value(fields: &Map<String, Value>) -> Value { Value::Object(preview) } -fn payload_preview_lines(fields: &Map<String, Value>) -> Vec<String> { - if fields.is_empty() { +fn payload_preview_lines(class: NodeClass, fields: &Map<String, Value>) -> Vec<String> { + let filtered = filtered_payload_fields(class, fields).collect::<Vec<_>>(); + if filtered.is_empty() { return Vec::new(); } - let mut lines = vec![format!("payload fields: {}", fields.len())]; - for (index, (name, value)) in fields.iter().enumerate() { + if is_prose_node(class) { + let preview_names = filtered + .iter() + .take(6) + .map(|(name, _)| (*name).clone()) + .collect::<Vec<_>>(); + let mut lines = vec![format!("payload 
fields: {}", preview_names.join(", "))]; + if filtered.len() > preview_names.len() { + lines.push(format!( + "payload fields: +{} more field(s)", + filtered.len() - preview_names.len() + )); + } + return lines; + } + let mut lines = vec![format!("payload fields: {}", filtered.len())]; + for (index, (name, value)) in filtered.iter().enumerate() { if index == 6 { - lines.push(format!("payload: +{} more field(s)", fields.len() - index)); + lines.push(format!( + "payload: +{} more field(s)", + filtered.len() - index + )); break; } lines.push(format!( @@ -1253,10 +1770,19 @@ fn payload_preview_lines(fields: &Map<String, Value>) -> Vec<String> { lines } +fn filtered_payload_fields( + class: NodeClass, + fields: &Map<String, Value>, +) -> impl Iterator<Item = (&String, &Value)> + '_ { + fields.iter().filter(move |(name, _)| { + !matches!(class, NodeClass::Note | NodeClass::Research) || name.as_str() != "body" + }) +} + fn payload_value_preview(value: &Value) -> Value { match value { Value::Null | Value::Bool(_) | Value::Number(_) => value.clone(), - Value::String(text) => Value::String(libmcp::collapse_inline_whitespace(text)), + Value::String(text) => Value::String(truncated_inline_preview(text, 96)), Value::Array(items) => { let preview = items .iter() @@ -1290,25 +1816,89 @@ fn payload_value_preview(value: &Value) -> Value { } } -fn metric_value(metric: &MetricObservation) -> Value { - json!({ - "key": metric.metric_key, +fn is_prose_node(class: NodeClass) -> bool { + matches!(class, NodeClass::Note | NodeClass::Research) +} + +fn truncated_inline_preview(text: &str, limit: usize) -> String { + let collapsed = libmcp::collapse_inline_whitespace(text); + let truncated = libmcp::render::truncate_chars(&collapsed, Some(limit)); + if truncated.truncated { + format!("{}...", truncated.text) + } else { + truncated.text + } +} + +fn metric_value(store: &ProjectStore, metric: &MetricValue) -> Result<Value, FaultRecord> { + let definition = metric_definition(store, 
&metric.key)?; + Ok(json!({ + "key": metric.key, "value": metric.value, - "unit": format!("{:?}", metric.unit).to_ascii_lowercase(), - "objective": format!("{:?}", metric.objective).to_ascii_lowercase(), - }) + "unit": metric_unit_name(definition.unit), + "objective": metric_objective_name(definition.objective), + })) } -fn metric_text(metric: &MetricObservation) -> String { - format!( +fn metric_text(store: &ProjectStore, metric: &MetricValue) -> Result<String, FaultRecord> { + let definition = metric_definition(store, &metric.key)?; + Ok(format!( "{}={} {} ({})", - metric.metric_key, + metric.key, metric.value, - format!("{:?}", metric.unit).to_ascii_lowercase(), - format!("{:?}", metric.objective).to_ascii_lowercase(), + metric_unit_name(definition.unit), + metric_objective_name(definition.objective), + )) +} + +fn metric_unit_name(unit: MetricUnit) -> &'static str { + match unit { + MetricUnit::Seconds => "seconds", + MetricUnit::Bytes => "bytes", + MetricUnit::Count => "count", + MetricUnit::Ratio => "ratio", + MetricUnit::Custom => "custom", + } +} + +fn metric_objective_name(objective: fidget_spinner_core::OptimizationObjective) -> &'static str { + match objective { + fidget_spinner_core::OptimizationObjective::Minimize => "minimize", + fidget_spinner_core::OptimizationObjective::Maximize => "maximize", + fidget_spinner_core::OptimizationObjective::Target => "target", + } +} + +fn metric_verdict_name(verdict: FrontierVerdict) -> &'static str { + match verdict { + FrontierVerdict::PromoteToChampion => "promote_to_champion", + FrontierVerdict::KeepOnFrontier => "keep_on_frontier", + FrontierVerdict::RevertToChampion => "revert_to_champion", + FrontierVerdict::ArchiveDeadEnd => "archive_dead_end", + FrontierVerdict::NeedsMoreEvidence => "needs_more_evidence", + } +} + +fn run_dimensions_value(dimensions: &BTreeMap<NonEmptyText, RunDimensionValue>) -> Value { + Value::Object( + dimensions + .iter() + .map(|(key, value)| (key.to_string(), value.as_json())) + 
.collect::<Map<String, Value>>(), ) } +fn render_dimension_kv(dimensions: &BTreeMap<NonEmptyText, RunDimensionValue>) -> String { + if dimensions.is_empty() { + return "none".to_owned(); + } + dimensions + .iter() + .map(|(key, value)| format!("{key}={}", value_summary(&value.as_json()))) + .collect::<Vec<_>>() + .join(", ") +} + fn format_tags(tags: &BTreeSet<TagName>) -> String { tags.iter() .map(ToString::to_string) @@ -1360,6 +1950,12 @@ fn classify_fault_kind(message: &str) -> FaultKind { || message.contains("empty") || message.contains("already exists") || message.contains("require an explicit tag list") + || message.contains("requires a non-empty summary") + || message.contains("requires a non-empty string payload field `body`") + || message.contains("requires an explicit order") + || message.contains("is ambiguous across sources") + || message.contains("has conflicting semantics") + || message.contains("conflicts with existing definition") { FaultKind::InvalidInput } else { @@ -1414,17 +2010,44 @@ fn metric_spec_from_wire(raw: WireMetricSpec) -> Result<MetricSpec, StoreError> }) } -fn metric_observation_from_wire( - raw: WireMetricObservation, -) -> Result<MetricObservation, StoreError> { - Ok(MetricObservation { - metric_key: NonEmptyText::new(raw.key)?, - unit: parse_metric_unit_name(&raw.unit)?, - objective: crate::parse_optimization_objective(&raw.objective)?, +fn metric_value_from_wire(raw: WireMetricValue) -> Result<MetricValue, StoreError> { + Ok(MetricValue { + key: NonEmptyText::new(raw.key)?, value: raw.value, }) } +fn metric_definition(store: &ProjectStore, key: &NonEmptyText) -> Result<MetricSpec, FaultRecord> { + store + .list_metric_definitions() + .map_err(store_fault("tools/call:experiment.close"))? 
+        .into_iter()
+        .find(|definition| definition.key == *key)
+        .map(|definition| MetricSpec {
+            metric_key: definition.key,
+            unit: definition.unit,
+            objective: definition.objective,
+        })
+        .ok_or_else(|| {
+            FaultRecord::new(
+                FaultKind::InvalidInput,
+                FaultStage::Store,
+                "tools/call:experiment.close",
+                format!("metric `{key}` is not registered"),
+            )
+        })
+}
+
+fn coerce_tool_dimensions(
+    store: &ProjectStore,
+    raw_dimensions: BTreeMap<String, Value>,
+    operation: &'static str,
+) -> Result<BTreeMap<NonEmptyText, RunDimensionValue>, FaultRecord> {
+    store
+        .coerce_run_dimensions(raw_dimensions)
+        .map_err(store_fault(operation))
+}
+
 fn command_recipe_from_wire(
     raw: WireRunCommand,
     project_root: &Utf8Path,
@@ -1465,6 +2088,85 @@ fn parse_metric_unit_name(raw: &str) -> Result<MetricUnit, StoreError> {
     crate::parse_metric_unit(raw)
 }
 
+fn parse_metric_source_name(raw: &str) -> Result<MetricFieldSource, StoreError> {
+    match raw {
+        "run_metric" => Ok(MetricFieldSource::RunMetric),
+        "change_payload" => Ok(MetricFieldSource::ChangePayload),
+        "run_payload" => Ok(MetricFieldSource::RunPayload),
+        "analysis_payload" => Ok(MetricFieldSource::AnalysisPayload),
+        "decision_payload" => Ok(MetricFieldSource::DecisionPayload),
+        other => Err(crate::invalid_input(format!(
+            "unknown metric source `{other}`"
+        ))),
+    }
+}
+
+fn parse_metric_order_name(raw: &str) -> Result<MetricRankOrder, StoreError> {
+    match raw {
+        "asc" => Ok(MetricRankOrder::Asc),
+        "desc" => Ok(MetricRankOrder::Desc),
+        other => Err(crate::invalid_input(format!(
+            "unknown metric order `{other}`"
+        ))),
+    }
+}
+
+fn parse_field_value_type_name(raw: &str) -> Result<FieldValueType, StoreError> {
+    match raw {
+        "string" => Ok(FieldValueType::String),
+        "numeric" => Ok(FieldValueType::Numeric),
+        "boolean" => 
Ok(FieldValueType::Boolean), + "timestamp" => Ok(FieldValueType::Timestamp), + other => Err(crate::invalid_input(format!( + "unknown field value type `{other}`" + ))), + } +} + +fn parse_diagnostic_severity_name(raw: &str) -> Result<DiagnosticSeverity, StoreError> { + match raw { + "error" => Ok(DiagnosticSeverity::Error), + "warning" => Ok(DiagnosticSeverity::Warning), + "info" => Ok(DiagnosticSeverity::Info), + other => Err(crate::invalid_input(format!( + "unknown diagnostic severity `{other}`" + ))), + } +} + +fn parse_field_presence_name(raw: &str) -> Result<FieldPresence, StoreError> { + match raw { + "required" => Ok(FieldPresence::Required), + "recommended" => Ok(FieldPresence::Recommended), + "optional" => Ok(FieldPresence::Optional), + other => Err(crate::invalid_input(format!( + "unknown field presence `{other}`" + ))), + } +} + +fn parse_field_role_name(raw: &str) -> Result<FieldRole, StoreError> { + match raw { + "index" => Ok(FieldRole::Index), + "projection_gate" => Ok(FieldRole::ProjectionGate), + "render_only" => Ok(FieldRole::RenderOnly), + "opaque" => Ok(FieldRole::Opaque), + other => Err(crate::invalid_input(format!( + "unknown field role `{other}`" + ))), + } +} + +fn parse_inference_policy_name(raw: &str) -> Result<InferencePolicy, StoreError> { + match raw { + "manual_only" => Ok(InferencePolicy::ManualOnly), + "model_may_infer" => Ok(InferencePolicy::ModelMayInfer), + other => Err(crate::invalid_input(format!( + "unknown inference policy `{other}`" + ))), + } +} + fn parse_backend_name(raw: &str) -> Result<ExecutionBackend, StoreError> { match raw { "local_process" => Ok(ExecutionBackend::LocalProcess), @@ -1574,6 +2282,7 @@ struct NodeArchiveToolArgs { struct QuickNoteToolArgs { frontier_id: Option<String>, title: String, + summary: String, body: String, tags: Vec<String>, #[serde(default)] @@ -1586,24 +2295,75 @@ struct QuickNoteToolArgs { struct ResearchRecordToolArgs { frontier_id: Option<String>, title: String, - summary: Option<String>, 
+ summary: String, body: String, #[serde(default)] + tags: Vec<String>, + #[serde(default)] annotations: Vec<WireAnnotation>, #[serde(default)] parents: Vec<String>, } #[derive(Debug, Deserialize)] +struct SchemaFieldUpsertToolArgs { + name: String, + node_classes: Option<Vec<String>>, + presence: String, + severity: String, + role: String, + inference_policy: String, + value_type: Option<String>, +} + +#[derive(Debug, Deserialize)] +struct SchemaFieldRemoveToolArgs { + name: String, + node_classes: Option<Vec<String>>, +} + +#[derive(Debug, Deserialize)] +struct MetricDefineToolArgs { + key: String, + unit: String, + objective: String, + description: Option<String>, +} + +#[derive(Debug, Deserialize)] +struct RunDimensionDefineToolArgs { + key: String, + value_type: String, + description: Option<String>, +} + +#[derive(Debug, Deserialize, Default)] +struct MetricKeysToolArgs { + frontier_id: Option<String>, + source: Option<String>, + dimensions: Option<BTreeMap<String, Value>>, +} + +#[derive(Debug, Deserialize)] +struct MetricBestToolArgs { + key: String, + frontier_id: Option<String>, + source: Option<String>, + dimensions: Option<BTreeMap<String, Value>>, + order: Option<String>, + limit: Option<u32>, +} + +#[derive(Debug, Deserialize)] struct ExperimentCloseToolArgs { frontier_id: String, base_checkpoint_id: String, change_node_id: String, candidate_summary: String, run: WireRun, - primary_metric: WireMetricObservation, + primary_metric: WireMetricValue, #[serde(default)] - supporting_metrics: Vec<WireMetricObservation>, + supporting_metrics: Vec<WireMetricValue>, note: WireFrontierNote, verdict: String, decision_title: String, @@ -1627,10 +2387,8 @@ struct WireMetricSpec { } #[derive(Debug, Deserialize)] -struct WireMetricObservation { +struct WireMetricValue { key: String, - unit: String, - objective: String, value: f64, } @@ -1639,7 +2397,8 @@ struct WireRun { title: String, summary: Option<String>, backend: String, - benchmark_suite: String, + 
#[serde(default)] + dimensions: BTreeMap<String, Value>, command: WireRunCommand, } diff --git a/crates/fidget-spinner-cli/src/mcp/telemetry.rs b/crates/fidget-spinner-cli/src/mcp/telemetry.rs index 7206f76..93fbe71 100644 --- a/crates/fidget-spinner-cli/src/mcp/telemetry.rs +++ b/crates/fidget-spinner-cli/src/mcp/telemetry.rs @@ -36,6 +36,7 @@ impl ServerTelemetry { pub fn record_success(&mut self, operation: &str, latency_ms: u128) { self.successes += 1; + self.last_fault = None; let entry = self.operations.entry(operation.to_owned()).or_default(); entry.successes += 1; entry.last_latency_ms = Some(latency_ms); diff --git a/crates/fidget-spinner-cli/tests/mcp_hardening.rs b/crates/fidget-spinner-cli/tests/mcp_hardening.rs index 6c81d7f..f076d74 100644 --- a/crates/fidget-spinner-cli/tests/mcp_hardening.rs +++ b/crates/fidget-spinner-cli/tests/mcp_hardening.rs @@ -20,7 +20,10 @@ use uuid as _; type TestResult<T = ()> = Result<T, Box<dyn std::error::Error>>; -fn must<T, E: std::fmt::Display>(result: Result<T, E>, context: &str) -> TestResult<T> { +fn must<T, E: std::fmt::Display, C: std::fmt::Display>( + result: Result<T, E>, + context: C, +) -> TestResult<T> { result.map_err(|error| io::Error::other(format!("{context}: {error}")).into()) } @@ -54,6 +57,48 @@ fn init_project(root: &Utf8PathBuf) -> TestResult { Ok(()) } +fn run_command(root: &Utf8PathBuf, program: &str, args: &[&str]) -> TestResult<String> { + let output = must( + Command::new(program) + .current_dir(root.as_std_path()) + .args(args) + .output(), + format!("{program} spawn"), + )?; + if !output.status.success() { + return Err(io::Error::other(format!( + "{program} {:?} failed: {}", + args, + String::from_utf8_lossy(&output.stderr) + )) + .into()); + } + Ok(String::from_utf8_lossy(&output.stdout).trim().to_owned()) +} + +fn run_git(root: &Utf8PathBuf, args: &[&str]) -> TestResult<String> { + run_command(root, "git", args) +} + +fn init_git_project(root: &Utf8PathBuf) -> TestResult<String> { + let _ = 
run_git(root, &["init", "-b", "main"])?; + let _ = run_git(root, &["config", "user.name", "main"])?; + let _ = run_git(root, &["config", "user.email", "main@swarm.moe"])?; + let _ = run_git(root, &["add", "-A"])?; + let _ = run_git(root, &["commit", "-m", "initial state"])?; + run_git(root, &["rev-parse", "HEAD"]) +} + +fn commit_project_state(root: &Utf8PathBuf, marker: &str, message: &str) -> TestResult<String> { + must( + fs::write(root.join(marker).as_std_path(), message), + format!("write marker {marker}"), + )?; + let _ = run_git(root, &["add", "-A"])?; + let _ = run_git(root, &["commit", "-m", message])?; + run_git(root, &["rev-parse", "HEAD"]) +} + fn binary_path() -> PathBuf { PathBuf::from(env!("CARGO_BIN_EXE_fidget-spinner-cli")) } @@ -172,6 +217,10 @@ fn tool_text(response: &Value) -> Option<&str> { .and_then(|entry| entry["text"].as_str()) } +fn fault_message(response: &Value) -> Option<&str> { + response["result"]["structuredContent"]["message"].as_str() +} + #[test] fn cold_start_exposes_health_and_telemetry() -> TestResult { let project_root = temp_project_root("cold_start")?; @@ -361,6 +410,7 @@ fn side_effecting_request_is_not_replayed_after_worker_crash() -> TestResult { "research.record", json!({ "title": "should not duplicate", + "summary": "dedupe check", "body": "host crash before worker execution", }), )?; @@ -463,6 +513,57 @@ fn bind_rejects_nonempty_uninitialized_root() -> TestResult { } #[test] +fn successful_bind_clears_stale_fault_from_health() -> TestResult { + let bad_root = temp_project_root("bind_fault_bad")?; + must( + fs::write(bad_root.join("README.txt").as_std_path(), "occupied"), + "seed bad bind root", + )?; + let good_root = temp_project_root("bind_fault_good")?; + init_project(&good_root)?; + + let mut harness = McpHarness::spawn(None, &[])?; + let _ = harness.initialize()?; + harness.notify_initialized()?; + + let failed_bind = harness.bind_project(301, &bad_root)?; + assert_eq!(failed_bind["result"]["isError"].as_bool(), 
Some(true)); + + let failed_health = harness.call_tool(302, "system.health", json!({ "detail": "full" }))?; + assert_eq!( + tool_content(&failed_health)["last_fault"]["operation"].as_str(), + Some("tools/call:project.bind") + ); + + let good_bind = harness.bind_project(303, &good_root)?; + assert_eq!(good_bind["result"]["isError"].as_bool(), Some(false)); + + let recovered_health = harness.call_tool(304, "system.health", json!({}))?; + assert_eq!(recovered_health["result"]["isError"].as_bool(), Some(false)); + assert!(tool_content(&recovered_health).get("last_fault").is_none()); + assert!(!must_some(tool_text(&recovered_health), "recovered health text")?.contains("fault:")); + + let recovered_health_full = + harness.call_tool(306, "system.health", json!({ "detail": "full" }))?; + assert_eq!( + tool_content(&recovered_health_full)["last_fault"], + Value::Null, + ); + + let recovered_telemetry = harness.call_tool(305, "system.telemetry", json!({}))?; + assert_eq!( + recovered_telemetry["result"]["isError"].as_bool(), + Some(false) + ); + assert_eq!( + tool_content(&recovered_telemetry)["errors"].as_u64(), + Some(1) + ); + assert!(tool_content(&recovered_telemetry)["last_fault"].is_null()); + Ok(()) +} + +#[test] fn bind_retargets_writes_to_sibling_project_root() -> TestResult { let spinner_root = temp_project_root("spinner_root")?; let libgrid_root = temp_project_root("libgrid_root")?; @@ -502,6 +603,7 @@ fn bind_retargets_writes_to_sibling_project_root() -> TestResult { "note.quick", json!({ "title": "libgrid dogfood note", + "summary": "rebind summary", "body": "rebind should redirect writes", "tags": [], }), @@ -545,6 +647,7 @@ fn tag_registry_drives_note_creation_and_lookup() -> TestResult { "note.quick", json!({ "title": "untagged", + "summary": "should fail without explicit tags", "body": "should fail", }), )?; @@ -570,6 +673,7 @@ fn tag_registry_drives_note_creation_and_lookup() -> TestResult { "note.quick", json!({ "title": "tagged note", + "summary": "tagged 
lookup summary", "body": "tagged lookup should work", "tags": ["dogfood/mcp"], }), @@ -582,3 +686,937 @@ fn tag_registry_drives_note_creation_and_lookup() -> TestResult { assert_eq!(nodes[0]["tags"][0].as_str(), Some("dogfood/mcp")); Ok(()) } + +#[test] +fn research_record_accepts_tags_and_filtering() -> TestResult { + let project_root = temp_project_root("research_tags")?; + init_project(&project_root)?; + + let mut harness = McpHarness::spawn(None, &[])?; + let _ = harness.initialize()?; + harness.notify_initialized()?; + let bind = harness.bind_project(451, &project_root)?; + assert_eq!(bind["result"]["isError"].as_bool(), Some(false)); + + let tag = harness.call_tool( + 452, + "tag.add", + json!({ + "name": "campaign/libgrid", + "description": "libgrid migration campaign", + }), + )?; + assert_eq!(tag["result"]["isError"].as_bool(), Some(false)); + + let research = harness.call_tool( + 453, + "research.record", + json!({ + "title": "ingest tranche", + "summary": "Import the next libgrid tranche.", + "body": "Full import notes live here.", + "tags": ["campaign/libgrid"], + }), + )?; + assert_eq!(research["result"]["isError"].as_bool(), Some(false)); + + let filtered = harness.call_tool(454, "node.list", json!({"tags": ["campaign/libgrid"]}))?; + let nodes = must_some( + tool_content(&filtered).as_array(), + "filtered research nodes", + )?; + assert_eq!(nodes.len(), 1); + assert_eq!(nodes[0]["class"].as_str(), Some("research")); + assert_eq!(nodes[0]["tags"][0].as_str(), Some("campaign/libgrid")); + Ok(()) +} + +#[test] +fn prose_tools_reject_invalid_shapes_over_mcp() -> TestResult { + let project_root = temp_project_root("prose_invalid")?; + init_project(&project_root)?; + + let mut harness = McpHarness::spawn(None, &[])?; + let _ = harness.initialize()?; + harness.notify_initialized()?; + let bind = harness.bind_project(46, &project_root)?; + assert_eq!(bind["result"]["isError"].as_bool(), Some(false)); + + let missing_note_summary = harness.call_tool( + 47, + 
"note.quick", + json!({ + "title": "untagged", + "body": "body only", + "tags": [], + }), + )?; + assert_eq!( + missing_note_summary["result"]["isError"].as_bool(), + Some(true) + ); + assert!( + fault_message(&missing_note_summary) + .is_some_and(|message| message.contains("summary") || message.contains("missing field")) + ); + + let missing_research_summary = harness.call_tool( + 48, + "research.record", + json!({ + "title": "research only", + "body": "body only", + }), + )?; + assert_eq!( + missing_research_summary["result"]["isError"].as_bool(), + Some(true) + ); + assert!( + fault_message(&missing_research_summary) + .is_some_and(|message| message.contains("summary") || message.contains("missing field")) + ); + + let note_without_body = harness.call_tool( + 49, + "node.create", + json!({ + "class": "note", + "title": "missing body", + "summary": "triage layer", + "tags": [], + "payload": {}, + }), + )?; + assert_eq!(note_without_body["result"]["isError"].as_bool(), Some(true)); + assert!( + fault_message(¬e_without_body) + .is_some_and(|message| message.contains("payload field `body`")) + ); + + let research_without_summary = harness.call_tool( + 50, + "node.create", + json!({ + "class": "research", + "title": "missing summary", + "payload": { "body": "full research body" }, + }), + )?; + assert_eq!( + research_without_summary["result"]["isError"].as_bool(), + Some(true) + ); + assert!( + fault_message(&research_without_summary) + .is_some_and(|message| message.contains("non-empty summary")) + ); + Ok(()) +} + +#[test] +fn concise_note_reads_do_not_leak_body_text() -> TestResult { + let project_root = temp_project_root("concise_note_read")?; + init_project(&project_root)?; + + let mut harness = McpHarness::spawn(None, &[])?; + let _ = harness.initialize()?; + harness.notify_initialized()?; + let bind = harness.bind_project(50, &project_root)?; + assert_eq!(bind["result"]["isError"].as_bool(), Some(false)); + + let note = harness.call_tool( + 51, + 
"note.quick", + json!({ + "title": "tagged note", + "summary": "triage layer", + "body": "full note body should stay out of concise reads", + "tags": [], + }), + )?; + assert_eq!(note["result"]["isError"].as_bool(), Some(false)); + let node_id = must_some(tool_content(¬e)["id"].as_str(), "created note id")?.to_owned(); + + let concise = harness.call_tool(52, "node.read", json!({ "node_id": node_id }))?; + let concise_structured = tool_content(&concise); + assert_eq!(concise_structured["summary"].as_str(), Some("triage layer")); + assert!(concise_structured["payload_preview"].get("body").is_none()); + assert!( + !must_some(tool_text(&concise), "concise note.read text")? + .contains("full note body should stay out of concise reads") + ); + + let full = harness.call_tool( + 53, + "node.read", + json!({ "node_id": node_id, "detail": "full" }), + )?; + assert_eq!( + tool_content(&full)["payload"]["fields"]["body"].as_str(), + Some("full note body should stay out of concise reads") + ); + Ok(()) +} + +#[test] +fn concise_prose_reads_only_surface_payload_field_names() -> TestResult { + let project_root = temp_project_root("concise_prose_field_names")?; + init_project(&project_root)?; + + let mut harness = McpHarness::spawn(None, &[])?; + let _ = harness.initialize()?; + harness.notify_initialized()?; + let bind = harness.bind_project(531, &project_root)?; + assert_eq!(bind["result"]["isError"].as_bool(), Some(false)); + + let research = harness.call_tool( + 532, + "node.create", + json!({ + "class": "research", + "title": "rich import", + "summary": "triage layer only", + "payload": { + "body": "Body stays out of concise output.", + "source_excerpt": "This imported excerpt is intentionally long and should never reappear in concise node reads as a value preview.", + "verbatim_snippet": "Another long snippet that belongs in full payload inspection only, not in triage surfaces." 
+ } + }), + )?; + assert_eq!(research["result"]["isError"].as_bool(), Some(false)); + let node_id = must_some( + tool_content(&research)["id"].as_str(), + "created research id", + )? + .to_owned(); + + let concise = harness.call_tool(533, "node.read", json!({ "node_id": node_id }))?; + let concise_structured = tool_content(&concise); + assert_eq!(concise_structured["payload_field_count"].as_u64(), Some(2)); + let payload_fields = must_some( + concise_structured["payload_fields"].as_array(), + "concise prose payload fields", + )?; + assert!( + payload_fields + .iter() + .any(|field| field.as_str() == Some("source_excerpt")) + ); + assert!(concise_structured.get("payload_preview").is_none()); + let concise_text = must_some(tool_text(&concise), "concise prose read text")?; + assert!(!concise_text.contains("This imported excerpt is intentionally long")); + assert!(concise_text.contains("payload fields: source_excerpt, verbatim_snippet")); + Ok(()) +} + +#[test] +fn node_list_does_not_enumerate_full_prose_bodies() -> TestResult { + let project_root = temp_project_root("node_list_no_body_leak")?; + init_project(&project_root)?; + + let mut harness = McpHarness::spawn(None, &[])?; + let _ = harness.initialize()?; + harness.notify_initialized()?; + let bind = harness.bind_project(54, &project_root)?; + assert_eq!(bind["result"]["isError"].as_bool(), Some(false)); + + let note = harness.call_tool( + 55, + "note.quick", + json!({ + "title": "tagged note", + "summary": "triage summary", + "body": "full note body should never appear in list-like surfaces", + "tags": [], + }), + )?; + assert_eq!(note["result"]["isError"].as_bool(), Some(false)); + + let listed = harness.call_tool(56, "node.list", json!({ "class": "note" }))?; + let listed_rows = must_some(tool_content(&listed).as_array(), "listed note rows")?; + assert_eq!(listed_rows.len(), 1); + assert_eq!(listed_rows[0]["summary"].as_str(), Some("triage summary")); + assert!(listed_rows[0].get("body").is_none()); + assert!( 
+ !must_some(tool_text(&listed), "node.list text")? + .contains("full note body should never appear in list-like surfaces") + ); + Ok(()) +} + +#[test] +fn metric_tools_are_listed_for_discovery() -> TestResult { + let project_root = temp_project_root("metric_tool_list")?; + init_project(&project_root)?; + + let mut harness = McpHarness::spawn(Some(&project_root), &[])?; + let _ = harness.initialize()?; + harness.notify_initialized()?; + let tools = harness.tools_list()?; + let names = must_some(tools["result"]["tools"].as_array(), "tool list")? + .iter() + .filter_map(|tool| tool["name"].as_str()) + .collect::<Vec<_>>(); + assert!(names.contains(&"metric.define")); + assert!(names.contains(&"metric.keys")); + assert!(names.contains(&"metric.best")); + assert!(names.contains(&"metric.migrate")); + assert!(names.contains(&"run.dimension.define")); + assert!(names.contains(&"run.dimension.list")); + assert!(names.contains(&"schema.field.upsert")); + assert!(names.contains(&"schema.field.remove")); + Ok(()) +} + +#[test] +fn schema_field_tools_mutate_project_schema() -> TestResult { + let project_root = temp_project_root("schema_field_tools")?; + init_project(&project_root)?; + + let mut harness = McpHarness::spawn(Some(&project_root), &[])?; + let _ = harness.initialize()?; + harness.notify_initialized()?; + + let upsert = harness.call_tool( + 861, + "schema.field.upsert", + json!({ + "name": "scenario", + "node_classes": ["change", "analysis"], + "presence": "recommended", + "severity": "warning", + "role": "projection_gate", + "inference_policy": "manual_only", + "value_type": "string" + }), + )?; + assert_eq!(upsert["result"]["isError"].as_bool(), Some(false)); + assert_eq!( + tool_content(&upsert)["field"]["name"].as_str(), + Some("scenario") + ); + assert_eq!( + tool_content(&upsert)["field"]["node_classes"], + json!(["change", "analysis"]) + ); + + let schema = harness.call_tool(862, "project.schema", json!({ "detail": "full" }))?; + 
assert_eq!(schema["result"]["isError"].as_bool(), Some(false)); + let fields = must_some(tool_content(&schema)["fields"].as_array(), "schema fields")?; + assert!(fields.iter().any(|field| { + field["name"].as_str() == Some("scenario") && field["value_type"].as_str() == Some("string") + })); + + let remove = harness.call_tool( + 863, + "schema.field.remove", + json!({ + "name": "scenario", + "node_classes": ["change", "analysis"] + }), + )?; + assert_eq!(remove["result"]["isError"].as_bool(), Some(false)); + assert_eq!(tool_content(&remove)["removed_count"].as_u64(), Some(1)); + + let schema_after = harness.call_tool(864, "project.schema", json!({ "detail": "full" }))?; + let fields_after = must_some( + tool_content(&schema_after)["fields"].as_array(), + "schema fields after remove", + )?; + assert!( + !fields_after + .iter() + .any(|field| field["name"].as_str() == Some("scenario")) + ); + Ok(()) +} + +#[test] +fn bind_open_backfills_legacy_missing_summary() -> TestResult { + let project_root = temp_project_root("bind_backfill")?; + init_project(&project_root)?; + + let node_id = { + let mut store = must(ProjectStore::open(&project_root), "open project store")?; + let node = must( + store.add_node(fidget_spinner_store_sqlite::CreateNodeRequest { + class: fidget_spinner_core::NodeClass::Research, + frontier_id: None, + title: must(NonEmptyText::new("legacy research"), "legacy title")?, + summary: Some(must( + NonEmptyText::new("temporary summary"), + "temporary summary", + )?), + tags: None, + payload: fidget_spinner_core::NodePayload::with_schema( + store.schema().schema_ref(), + serde_json::from_value(json!({ + "body": "Derived summary first paragraph.\n\nLonger body follows." 
+ })) + .map_err(|error| io::Error::other(format!("payload object: {error}")))?, + ), + annotations: Vec::new(), + attachments: Vec::new(), + }), + "create legacy research node", + )?; + node.id.to_string() + }; + + let database_path = project_root.join(".fidget_spinner").join("state.sqlite"); + let clear_output = must( + Command::new("sqlite3") + .current_dir(project_root.as_std_path()) + .arg(database_path.as_str()) + .arg(format!( + "UPDATE nodes SET summary = NULL WHERE id = '{node_id}';" + )) + .output(), + "spawn sqlite3 for direct summary clear", + )?; + if !clear_output.status.success() { + return Err(io::Error::other(format!( + "sqlite3 summary clear failed: {}", + String::from_utf8_lossy(&clear_output.stderr) + )) + .into()); + } + + let mut harness = McpHarness::spawn(None, &[])?; + let _ = harness.initialize()?; + harness.notify_initialized()?; + let bind = harness.bind_project(60, &project_root)?; + assert_eq!(bind["result"]["isError"].as_bool(), Some(false)); + + let read = harness.call_tool(61, "node.read", json!({ "node_id": node_id }))?; + assert_eq!(read["result"]["isError"].as_bool(), Some(false)); + assert_eq!( + tool_content(&read)["summary"].as_str(), + Some("Derived summary first paragraph.") + ); + + let listed = harness.call_tool(62, "node.list", json!({ "class": "research" }))?; + let items = must_some(tool_content(&listed).as_array(), "research node list")?; + assert_eq!(items.len(), 1); + assert_eq!( + items[0]["summary"].as_str(), + Some("Derived summary first paragraph.") + ); + Ok(()) +} + +#[test] +fn metric_tools_rank_closed_experiments_and_enforce_disambiguation() -> TestResult { + let project_root = temp_project_root("metric_rank_e2e")?; + init_project(&project_root)?; + let _initial_head = init_git_project(&project_root)?; + + let mut harness = McpHarness::spawn(Some(&project_root), &[])?; + let _ = harness.initialize()?; + harness.notify_initialized()?; + + let frontier = harness.call_tool( + 70, + "frontier.init", + json!({ + 
"label": "metric frontier", + "objective": "exercise metric ranking", + "contract_title": "metric contract", + "benchmark_suites": ["smoke"], + "promotion_criteria": ["rank by one key"], + "primary_metric": { + "key": "wall_clock_s", + "unit": "seconds", + "objective": "minimize" + } + }), + )?; + assert_eq!(frontier["result"]["isError"].as_bool(), Some(false)); + let frontier_id = must_some( + tool_content(&frontier)["frontier_id"].as_str(), + "frontier id", + )? + .to_owned(); + let base_checkpoint_id = must_some( + tool_content(&frontier)["champion_checkpoint_id"].as_str(), + "base checkpoint id", + )? + .to_owned(); + let metric_define = harness.call_tool( + 701, + "metric.define", + json!({ + "key": "wall_clock_s", + "unit": "seconds", + "objective": "minimize", + "description": "elapsed wall time" + }), + )?; + assert_eq!(metric_define["result"]["isError"].as_bool(), Some(false)); + + let scenario_dimension = harness.call_tool( + 702, + "run.dimension.define", + json!({ + "key": "scenario", + "value_type": "string", + "description": "workload family" + }), + )?; + assert_eq!( + scenario_dimension["result"]["isError"].as_bool(), + Some(false) + ); + + let duration_dimension = harness.call_tool( + 703, + "run.dimension.define", + json!({ + "key": "duration_s", + "value_type": "numeric", + "description": "time budget in seconds" + }), + )?; + assert_eq!( + duration_dimension["result"]["isError"].as_bool(), + Some(false) + ); + + let dimensions = harness.call_tool(704, "run.dimension.list", json!({}))?; + assert_eq!(dimensions["result"]["isError"].as_bool(), Some(false)); + let dimension_rows = must_some(tool_content(&dimensions).as_array(), "run dimension rows")?; + assert!(dimension_rows.iter().any(|row| { + row["key"].as_str() == Some("benchmark_suite") + && row["value_type"].as_str() == Some("string") + })); + assert!(dimension_rows.iter().any(|row| { + row["key"].as_str() == Some("scenario") + && row["description"].as_str() == Some("workload family") + })); 
+    assert!(dimension_rows.iter().any(|row| {
+        row["key"].as_str() == Some("duration_s") && row["value_type"].as_str() == Some("numeric")
+    }));
+
+    let first_change = harness.call_tool(
+        71,
+        "node.create",
+        json!({
+            "class": "change",
+            "frontier_id": frontier_id,
+            "title": "first change",
+            "summary": "first change summary",
+            "payload": {
+                "body": "first change body",
+                "wall_clock_s": 14.0
+            }
+        }),
+    )?;
+    assert_eq!(first_change["result"]["isError"].as_bool(), Some(false));
+    let first_change_id = must_some(
+        tool_content(&first_change)["id"].as_str(),
+        "first change id",
+    )?;
+    let _first_commit = commit_project_state(&project_root, "candidate-one.txt", "candidate one")?;
+
+    let first_close = harness.call_tool(
+        72,
+        "experiment.close",
+        json!({
+            "frontier_id": frontier_id,
+            "base_checkpoint_id": base_checkpoint_id,
+            "change_node_id": first_change_id,
+            "candidate_summary": "candidate one",
+            "run": {
+                "title": "first run",
+                "summary": "first run summary",
+                "backend": "worktree_process",
+                "dimensions": {
+                    "benchmark_suite": "smoke",
+                    "scenario": "belt_4x5",
+                    "duration_s": 20.0
+                },
+                "command": {
+                    "working_directory": project_root.as_str(),
+                    "argv": ["true"]
+                }
+            },
+            "primary_metric": {
+                "key": "wall_clock_s",
+                "value": 10.0
+            },
+            "note": {
+                "summary": "first run note"
+            },
+            "verdict": "keep_on_frontier",
+            "decision_title": "first decision",
+            "decision_rationale": "keep first candidate around"
+        }),
+    )?;
+    assert_eq!(first_close["result"]["isError"].as_bool(), Some(false));
+
+    let first_candidate_checkpoint_id = must_some(
+        tool_content(&first_close)["candidate_checkpoint_id"].as_str(),
+        "first candidate checkpoint id",
+    )?
+    .to_owned();
+
+    let second_change = harness.call_tool(
+        73,
+        "node.create",
+        json!({
+            "class": "change",
+            "frontier_id": frontier_id,
+            "title": "second change",
+            "summary": "second change summary",
+            "payload": {
+                "body": "second change body",
+                "wall_clock_s": 7.0
+            }
+        }),
+    )?;
+    assert_eq!(second_change["result"]["isError"].as_bool(), Some(false));
+    let second_change_id = must_some(
+        tool_content(&second_change)["id"].as_str(),
+        "second change id",
+    )?;
+    let second_commit = commit_project_state(&project_root, "candidate-two.txt", "candidate two")?;
+
+    let second_close = harness.call_tool(
+        74,
+        "experiment.close",
+        json!({
+            "frontier_id": frontier_id,
+            "base_checkpoint_id": base_checkpoint_id,
+            "change_node_id": second_change_id,
+            "candidate_summary": "candidate two",
+            "run": {
+                "title": "second run",
+                "summary": "second run summary",
+                "backend": "worktree_process",
+                "dimensions": {
+                    "benchmark_suite": "smoke",
+                    "scenario": "belt_4x5",
+                    "duration_s": 60.0
+                },
+                "command": {
+                    "working_directory": project_root.as_str(),
+                    "argv": ["true"]
+                }
+            },
+            "primary_metric": {
+                "key": "wall_clock_s",
+                "value": 5.0
+            },
+            "note": {
+                "summary": "second run note"
+            },
+            "verdict": "keep_on_frontier",
+            "decision_title": "second decision",
+            "decision_rationale": "second candidate looks stronger"
+        }),
+    )?;
+    assert_eq!(second_close["result"]["isError"].as_bool(), Some(false));
+    let second_candidate_checkpoint_id = must_some(
+        tool_content(&second_close)["candidate_checkpoint_id"].as_str(),
+        "second candidate checkpoint id",
+    )?
+    .to_owned();
+
+    let second_frontier = harness.call_tool(
+        80,
+        "frontier.init",
+        json!({
+            "label": "metric frontier two",
+            "objective": "exercise frontier filtering",
+            "contract_title": "metric contract two",
+            "benchmark_suites": ["smoke"],
+            "promotion_criteria": ["frontier filters should isolate rankings"],
+            "primary_metric": {
+                "key": "wall_clock_s",
+                "unit": "seconds",
+                "objective": "minimize"
+            }
+        }),
+    )?;
+    assert_eq!(second_frontier["result"]["isError"].as_bool(), Some(false));
+    let second_frontier_id = must_some(
+        tool_content(&second_frontier)["frontier_id"].as_str(),
+        "second frontier id",
+    )?
+    .to_owned();
+    let second_base_checkpoint_id = must_some(
+        tool_content(&second_frontier)["champion_checkpoint_id"].as_str(),
+        "second frontier base checkpoint id",
+    )?
+    .to_owned();
+
+    let third_change = harness.call_tool(
+        81,
+        "node.create",
+        json!({
+            "class": "change",
+            "frontier_id": second_frontier_id,
+            "title": "third change",
+            "summary": "third change summary",
+            "payload": {
+                "body": "third change body",
+                "wall_clock_s": 3.0
+            }
+        }),
+    )?;
+    assert_eq!(third_change["result"]["isError"].as_bool(), Some(false));
+    let third_change_id = must_some(
+        tool_content(&third_change)["id"].as_str(),
+        "third change id",
+    )?;
+    let third_commit =
+        commit_project_state(&project_root, "candidate-three.txt", "candidate three")?;
+
+    let third_close = harness.call_tool(
+        82,
+        "experiment.close",
+        json!({
+            "frontier_id": second_frontier_id,
+            "base_checkpoint_id": second_base_checkpoint_id,
+            "change_node_id": third_change_id,
+            "candidate_summary": "candidate three",
+            "run": {
+                "title": "third run",
+                "summary": "third run summary",
+                "backend": "worktree_process",
+                "dimensions": {
+                    "benchmark_suite": "smoke",
+                    "scenario": "belt_4x5_alt",
+                    "duration_s": 60.0
+                },
+                "command": {
+                    "working_directory": project_root.as_str(),
+                    "argv": ["true"]
+                }
+            },
+            "primary_metric": {
+                "key": "wall_clock_s",
+                "value": 3.0
+            },
"note": { + "summary": "third run note" + }, + "verdict": "keep_on_frontier", + "decision_title": "third decision", + "decision_rationale": "third candidate is best overall but not in the first frontier" + }), + )?; + assert_eq!(third_close["result"]["isError"].as_bool(), Some(false)); + let third_candidate_checkpoint_id = must_some( + tool_content(&third_close)["candidate_checkpoint_id"].as_str(), + "third candidate checkpoint id", + )? + .to_owned(); + + let keys = harness.call_tool(75, "metric.keys", json!({}))?; + assert_eq!(keys["result"]["isError"].as_bool(), Some(false)); + let key_rows = must_some(tool_content(&keys).as_array(), "metric keys array")?; + assert!(key_rows.iter().any(|row| { + row["key"].as_str() == Some("wall_clock_s") && row["source"].as_str() == Some("run_metric") + })); + assert!(key_rows.iter().any(|row| { + row["key"].as_str() == Some("wall_clock_s") + && row["source"].as_str() == Some("run_metric") + && row["description"].as_str() == Some("elapsed wall time") + && row["requires_order"].as_bool() == Some(false) + })); + assert!(key_rows.iter().any(|row| { + row["key"].as_str() == Some("wall_clock_s") + && row["source"].as_str() == Some("change_payload") + })); + + let filtered_keys = harness.call_tool( + 750, + "metric.keys", + json!({ + "source": "run_metric", + "dimensions": { + "scenario": "belt_4x5", + "duration_s": 60.0 + } + }), + )?; + assert_eq!(filtered_keys["result"]["isError"].as_bool(), Some(false)); + let filtered_key_rows = must_some( + tool_content(&filtered_keys).as_array(), + "filtered metric keys array", + )?; + assert_eq!(filtered_key_rows.len(), 1); + assert_eq!(filtered_key_rows[0]["key"].as_str(), Some("wall_clock_s")); + assert_eq!(filtered_key_rows[0]["experiment_count"].as_u64(), Some(1)); + + let ambiguous = harness.call_tool(76, "metric.best", json!({ "key": "wall_clock_s" }))?; + assert_eq!(ambiguous["result"]["isError"].as_bool(), Some(true)); + assert!( + fault_message(&ambiguous) + .is_some_and(|message| 
message.contains("ambiguous across sources")) + ); + + let run_metric_best = harness.call_tool( + 77, + "metric.best", + json!({ + "key": "wall_clock_s", + "source": "run_metric", + "dimensions": { + "scenario": "belt_4x5", + "duration_s": 60.0 + }, + "limit": 5 + }), + )?; + assert_eq!(run_metric_best["result"]["isError"].as_bool(), Some(false)); + let run_best_rows = must_some( + tool_content(&run_metric_best).as_array(), + "run metric best array", + )?; + assert_eq!(run_best_rows[0]["value"].as_f64(), Some(5.0)); + assert_eq!(run_best_rows.len(), 1); + assert_eq!( + run_best_rows[0]["candidate_checkpoint_id"].as_str(), + Some(second_candidate_checkpoint_id.as_str()) + ); + assert_eq!( + run_best_rows[0]["candidate_commit_hash"].as_str(), + Some(second_commit.as_str()) + ); + assert_eq!( + run_best_rows[0]["dimensions"]["scenario"].as_str(), + Some("belt_4x5") + ); + assert_eq!( + run_best_rows[0]["dimensions"]["duration_s"].as_f64(), + Some(60.0) + ); + assert!(must_some(tool_text(&run_metric_best), "run metric best text")?.contains("commit=")); + assert!(must_some(tool_text(&run_metric_best), "run metric best text")?.contains("dims:")); + + let payload_requires_order = harness.call_tool( + 78, + "metric.best", + json!({ + "key": "wall_clock_s", + "source": "change_payload" + }), + )?; + assert_eq!( + payload_requires_order["result"]["isError"].as_bool(), + Some(true) + ); + assert!( + fault_message(&payload_requires_order) + .is_some_and(|message| message.contains("explicit order")) + ); + + let payload_best = harness.call_tool( + 79, + "metric.best", + json!({ + "key": "wall_clock_s", + "source": "change_payload", + "dimensions": { + "scenario": "belt_4x5", + "duration_s": 60.0 + }, + "order": "asc" + }), + )?; + assert_eq!(payload_best["result"]["isError"].as_bool(), Some(false)); + let payload_best_rows = must_some( + tool_content(&payload_best).as_array(), + "payload metric best array", + )?; + assert_eq!(payload_best_rows[0]["value"].as_f64(), Some(7.0)); 
+    assert_eq!(payload_best_rows.len(), 1);
+    assert_eq!(
+        payload_best_rows[0]["candidate_checkpoint_id"].as_str(),
+        Some(second_candidate_checkpoint_id.as_str())
+    );
+    assert_eq!(
+        payload_best_rows[0]["candidate_commit_hash"].as_str(),
+        Some(second_commit.as_str())
+    );
+
+    let filtered_best = harness.call_tool(
+        83,
+        "metric.best",
+        json!({
+            "key": "wall_clock_s",
+            "source": "run_metric",
+            "frontier_id": frontier_id,
+            "dimensions": {
+                "scenario": "belt_4x5"
+            },
+            "limit": 5
+        }),
+    )?;
+    assert_eq!(filtered_best["result"]["isError"].as_bool(), Some(false));
+    let filtered_rows = must_some(
+        tool_content(&filtered_best).as_array(),
+        "filtered metric best array",
+    )?;
+    assert_eq!(filtered_rows.len(), 2);
+    assert_eq!(
+        filtered_rows[0]["candidate_checkpoint_id"].as_str(),
+        Some(second_candidate_checkpoint_id.as_str())
+    );
+    assert!(
+        filtered_rows
+            .iter()
+            .all(|row| row["frontier_id"].as_str() == Some(frontier_id.as_str()))
+    );
+
+    let global_best = harness.call_tool(
+        84,
+        "metric.best",
+        json!({
+            "key": "wall_clock_s",
+            "source": "run_metric",
+            "limit": 5
+        }),
+    )?;
+    assert_eq!(global_best["result"]["isError"].as_bool(), Some(false));
+    let global_rows = must_some(
+        tool_content(&global_best).as_array(),
+        "global metric best array",
+    )?;
+    assert_eq!(
+        global_rows[0]["candidate_checkpoint_id"].as_str(),
+        Some(third_candidate_checkpoint_id.as_str())
+    );
+    assert_eq!(
+        global_rows[0]["candidate_commit_hash"].as_str(),
+        Some(third_commit.as_str())
+    );
+
+    let migrate = harness.call_tool(85, "metric.migrate", json!({}))?;
+    assert_eq!(migrate["result"]["isError"].as_bool(), Some(false));
+    assert_eq!(
+        tool_content(&migrate)["inserted_metric_definitions"].as_u64(),
+        Some(0)
+    );
+    assert_eq!(
+        tool_content(&migrate)["inserted_dimension_definitions"].as_u64(),
+        Some(0)
+    );
+    assert_eq!(
+        tool_content(&migrate)["inserted_dimension_values"].as_u64(),
+        Some(0)
+    );
+
+    assert_ne!(
+        first_candidate_checkpoint_id,
+        second_candidate_checkpoint_id
+    );
+    assert_ne!(
+        second_candidate_checkpoint_id,
+        third_candidate_checkpoint_id
+    );
+    Ok(())
+}
diff --git a/crates/fidget-spinner-core/src/lib.rs b/crates/fidget-spinner-core/src/lib.rs
index b5e2b23..c0d6fe2 100644
--- a/crates/fidget-spinner-core/src/lib.rs
+++ b/crates/fidget-spinner-core/src/lib.rs
@@ -20,8 +20,9 @@ pub use crate::model::{
     DagEdge, DagNode, DiagnosticSeverity, EdgeKind, EvaluationProtocol, ExecutionBackend,
     ExperimentResult, FieldPresence, FieldRole, FieldValueType, FrontierContract, FrontierNote,
     FrontierProjection, FrontierRecord, FrontierStatus, FrontierVerdict, GitCommitHash,
-    InferencePolicy, JsonObject, MetricObservation, MetricSpec, MetricUnit, NodeAnnotation,
-    NodeClass, NodeDiagnostics, NodePayload, NodeTrack, NonEmptyText, OptimizationObjective,
-    PayloadSchemaRef, ProjectFieldSpec, ProjectSchema, RunRecord, RunStatus, TagName, TagRecord,
+    InferencePolicy, JsonObject, MetricDefinition, MetricObservation, MetricSpec, MetricUnit,
+    MetricValue, NodeAnnotation, NodeClass, NodeDiagnostics, NodePayload, NodeTrack, NonEmptyText,
+    OptimizationObjective, PayloadSchemaRef, ProjectFieldSpec, ProjectSchema,
+    RunDimensionDefinition, RunDimensionValue, RunRecord, RunStatus, TagName, TagRecord,
     ValidationDiagnostic,
 };
diff --git a/crates/fidget-spinner-core/src/model.rs b/crates/fidget-spinner-core/src/model.rs
index 2de3705..a77566f 100644
--- a/crates/fidget-spinner-core/src/model.rs
+++ b/crates/fidget-spinner-core/src/model.rs
@@ -177,6 +177,17 @@ pub enum DiagnosticSeverity {
     Info,
 }
 
+impl DiagnosticSeverity {
+    #[must_use]
+    pub const fn as_str(self) -> &'static str {
+        match self {
+            Self::Error => "error",
+            Self::Warning => "warning",
+            Self::Info => "info",
+        }
+    }
+}
+
 #[derive(Clone, Copy, Debug, Deserialize, Eq, Ord, PartialEq, PartialOrd, Serialize)]
 pub enum FieldPresence {
     Required,
@@ -184,6 +195,17 @@ pub enum FieldPresence {
     Optional,
 }
 
+impl FieldPresence {
+    #[must_use]
+    pub const fn as_str(self) -> &'static str {
+        match self {
+            Self::Required => "required",
+            Self::Recommended => "recommended",
+            Self::Optional => "optional",
+        }
+    }
+}
+
 #[derive(Clone, Copy, Debug, Deserialize, Eq, Ord, PartialEq, PartialOrd, Serialize)]
 pub enum FieldRole {
     Index,
@@ -192,12 +214,34 @@
     Opaque,
 }
 
+impl FieldRole {
+    #[must_use]
+    pub const fn as_str(self) -> &'static str {
+        match self {
+            Self::Index => "index",
+            Self::ProjectionGate => "projection_gate",
+            Self::RenderOnly => "render_only",
+            Self::Opaque => "opaque",
+        }
+    }
+}
+
 #[derive(Clone, Copy, Debug, Deserialize, Eq, Ord, PartialEq, PartialOrd, Serialize)]
 pub enum InferencePolicy {
     ManualOnly,
     ModelMayInfer,
 }
 
+impl InferencePolicy {
+    #[must_use]
+    pub const fn as_str(self) -> &'static str {
+        match self {
+            Self::ManualOnly => "manual_only",
+            Self::ModelMayInfer => "model_may_infer",
+        }
+    }
+}
+
 #[derive(Clone, Copy, Debug, Deserialize, Eq, Ord, PartialEq, PartialOrd, Serialize)]
 #[serde(rename_all = "snake_case")]
 pub enum FieldValueType {
@@ -269,6 +313,96 @@ pub enum OptimizationObjective {
     Target,
 }
 
+#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize)]
+pub struct MetricDefinition {
+    pub key: NonEmptyText,
+    pub unit: MetricUnit,
+    pub objective: OptimizationObjective,
+    pub description: Option<NonEmptyText>,
+    pub created_at: OffsetDateTime,
+}
+
+impl MetricDefinition {
+    #[must_use]
+    pub fn new(
+        key: NonEmptyText,
+        unit: MetricUnit,
+        objective: OptimizationObjective,
+        description: Option<NonEmptyText>,
+    ) -> Self {
+        Self {
+            key,
+            unit,
+            objective,
+            description,
+            created_at: OffsetDateTime::now_utc(),
+        }
+    }
+}
+
+#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
+#[serde(rename_all = "snake_case", tag = "type", content = "value")]
+pub enum RunDimensionValue {
+    String(NonEmptyText),
+    Numeric(f64),
+    Boolean(bool),
+    Timestamp(NonEmptyText),
+}
+
+impl RunDimensionValue {
+    #[must_use]
+    pub const fn value_type(&self) -> FieldValueType {
+        match self {
+            Self::String(_) => FieldValueType::String,
+            Self::Numeric(_) => FieldValueType::Numeric,
+            Self::Boolean(_) => FieldValueType::Boolean,
+            Self::Timestamp(_) => FieldValueType::Timestamp,
+        }
+    }
+
+    #[must_use]
+    pub fn as_json(&self) -> Value {
+        match self {
+            Self::String(value) | Self::Timestamp(value) => Value::String(value.to_string()),
+            Self::Numeric(value) => {
+                serde_json::Number::from_f64(*value).map_or(Value::Null, Value::Number)
+            }
+            Self::Boolean(value) => Value::Bool(*value),
+        }
+    }
+}
+
+#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize)]
+pub struct RunDimensionDefinition {
+    pub key: NonEmptyText,
+    pub value_type: FieldValueType,
+    pub description: Option<NonEmptyText>,
+    pub created_at: OffsetDateTime,
+}
+
+impl RunDimensionDefinition {
+    #[must_use]
+    pub fn new(
+        key: NonEmptyText,
+        value_type: FieldValueType,
+        description: Option<NonEmptyText>,
+    ) -> Self {
+        Self {
+            key,
+            value_type,
+            description,
+            created_at: OffsetDateTime::now_utc(),
+        }
+    }
+}
+
+#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
+pub struct MetricValue {
+    #[serde(alias = "metric_key")]
+    pub key: NonEmptyText,
+    pub value: f64,
+}
+
 #[derive(Clone, Copy, Debug, Deserialize, Eq, PartialEq, Serialize)]
 pub enum RunStatus {
     Queued,
@@ -710,7 +844,7 @@ pub struct CheckpointRecord {
     pub created_at: OffsetDateTime,
 }
 
-#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize)]
+#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
 pub struct RunRecord {
     pub node_id: NodeId,
     pub run_id: RunId,
@@ -718,7 +852,7 @@ pub struct RunRecord {
     pub status: RunStatus,
     pub backend: ExecutionBackend,
     pub code_snapshot: Option<CodeSnapshotRef>,
-    pub benchmark_suite: Option<NonEmptyText>,
+    pub dimensions: BTreeMap<NonEmptyText, RunDimensionValue>,
     pub command: CommandRecipe,
    pub started_at: Option<OffsetDateTime>,
     pub finished_at: Option<OffsetDateTime>,
@@ -726,9 +860,9 @@
 #[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
 pub struct ExperimentResult {
-    pub benchmark_suite: NonEmptyText,
-    pub primary_metric: MetricObservation,
-    pub supporting_metrics: Vec<MetricObservation>,
+    pub dimensions: BTreeMap<NonEmptyText, RunDimensionValue>,
+    pub primary_metric: MetricValue,
+    pub supporting_metrics: Vec<MetricValue>,
     pub benchmark_bundle: Option<ArtifactId>,
 }
diff --git a/crates/fidget-spinner-store-sqlite/src/lib.rs b/crates/fidget-spinner-store-sqlite/src/lib.rs
index da9fa42..b970629 100644
--- a/crates/fidget-spinner-store-sqlite/src/lib.rs
+++ b/crates/fidget-spinner-store-sqlite/src/lib.rs
@@ -1,4 +1,5 @@
-use std::collections::BTreeSet;
+use std::cmp::Ordering;
+use std::collections::{BTreeMap, BTreeSet};
 use std::fmt::Write as _;
 use std::fs;
 use std::io;
@@ -7,11 +8,13 @@
 use std::process::Command;
 
 use camino::{Utf8Path, Utf8PathBuf};
 use fidget_spinner_core::{
     AnnotationVisibility, CheckpointDisposition, CheckpointRecord, CheckpointSnapshotRef,
-    CodeSnapshotRef, CommandRecipe, CompletedExperiment, DagEdge, DagNode, EdgeKind,
-    ExecutionBackend, ExperimentResult, FrontierContract, FrontierNote, FrontierProjection,
-    FrontierRecord, FrontierStatus, FrontierVerdict, GitCommitHash, JsonObject, MetricObservation,
-    MetricSpec, MetricUnit, NodeAnnotation, NodeClass, NodeDiagnostics, NodePayload, NonEmptyText,
-    OptimizationObjective, ProjectSchema, RunRecord, RunStatus, TagName, TagRecord,
+    CodeSnapshotRef, CommandRecipe, CompletedExperiment, DagEdge, DagNode, DiagnosticSeverity,
+    EdgeKind, ExecutionBackend, ExperimentResult, FieldPresence, FieldRole, FieldValueType,
+    FrontierContract, FrontierNote, FrontierProjection, FrontierRecord, FrontierStatus,
+    FrontierVerdict, GitCommitHash, InferencePolicy, JsonObject, MetricDefinition, MetricSpec,
+    MetricUnit, MetricValue, NodeAnnotation, NodeClass, NodeDiagnostics, NodePayload, NonEmptyText,
+    OptimizationObjective, ProjectFieldSpec, ProjectSchema, RunDimensionDefinition,
+    RunDimensionValue, RunRecord, RunStatus, TagName, TagRecord,
 };
 use rusqlite::types::Value as SqlValue;
 use rusqlite::{Connection, OptionalExtension, Transaction, params, params_from_iter};
@@ -63,8 +66,50 @@ pub enum StoreError {
     DuplicateTag(TagName),
     #[error("note nodes require an explicit tag list; use an empty list if no tags apply")]
     NoteTagsRequired,
+    #[error("{0} nodes require a non-empty summary")]
+    ProseSummaryRequired(NodeClass),
+    #[error("{0} nodes require a non-empty string payload field `body`")]
+    ProseBodyRequired(NodeClass),
     #[error("git repository inspection failed for {0}")]
     GitInspectionFailed(Utf8PathBuf),
+    #[error("metric `{0}` is not registered")]
+    UnknownMetricDefinition(NonEmptyText),
+    #[error(
+        "metric `{key}` conflicts with existing definition ({existing_unit}/{existing_objective} vs {new_unit}/{new_objective})"
+    )]
+    ConflictingMetricDefinition {
+        key: String,
+        existing_unit: String,
+        existing_objective: String,
+        new_unit: String,
+        new_objective: String,
+    },
+    #[error("run dimension `{0}` is not registered")]
+    UnknownRunDimension(NonEmptyText),
+    #[error("run dimension `{0}` already exists")]
+    DuplicateRunDimension(NonEmptyText),
+    #[error(
+        "run dimension `{key}` conflicts with existing definition ({existing_type} vs {new_type})"
+    )]
+    ConflictingRunDimensionDefinition {
+        key: String,
+        existing_type: String,
+        new_type: String,
+    },
+    #[error("run dimension `{key}` expects {expected} values, got {observed}")]
+    InvalidRunDimensionValue {
+        key: String,
+        expected: String,
+        observed: String,
+    },
+    #[error("schema field `{0}` was not found")]
+    SchemaFieldNotFound(String),
+    #[error("metric key `{key}` is ambiguous across sources: {sources}")]
+    AmbiguousMetricKey { key: String, sources: String },
+    #[error("metric key `{key}` for source `{metric_source}` requires an explicit order")]
+    MetricOrderRequired { key: String, metric_source: String },
+    #[error("metric key `{key}` for source `{metric_source}` has conflicting semantics")]
+    MetricSemanticsAmbiguous { key: String, metric_source: String },
 }
 
 #[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize)]
@@ -165,6 +210,155 @@ pub struct NodeSummary {
     pub updated_at: OffsetDateTime,
 }
 
+#[derive(Clone, Copy, Debug, Deserialize, Eq, Ord, PartialEq, PartialOrd, Serialize)]
+#[serde(rename_all = "snake_case")]
+pub enum MetricFieldSource {
+    RunMetric,
+    ChangePayload,
+    RunPayload,
+    AnalysisPayload,
+    DecisionPayload,
+}
+
+impl MetricFieldSource {
+    #[must_use]
+    pub const fn as_str(self) -> &'static str {
+        match self {
+            Self::RunMetric => "run_metric",
+            Self::ChangePayload => "change_payload",
+            Self::RunPayload => "run_payload",
+            Self::AnalysisPayload => "analysis_payload",
+            Self::DecisionPayload => "decision_payload",
+        }
+    }
+
+    #[must_use]
+    pub const fn from_payload_class(class: NodeClass) -> Option<Self> {
+        match class {
+            NodeClass::Change => Some(Self::ChangePayload),
+            NodeClass::Run => Some(Self::RunPayload),
+            NodeClass::Analysis => Some(Self::AnalysisPayload),
+            NodeClass::Decision => Some(Self::DecisionPayload),
+            NodeClass::Contract | NodeClass::Research | NodeClass::Enabling | NodeClass::Note => {
+                None
+            }
+        }
+    }
+}
+
+#[derive(Clone, Copy, Debug, Deserialize, Eq, Ord, PartialEq, PartialOrd, Serialize)]
+#[serde(rename_all = "snake_case")]
+pub enum MetricRankOrder {
+    Asc,
+    Desc,
+}
+
+impl MetricRankOrder {
+    #[must_use]
+    pub const fn as_str(self) -> &'static str {
+        match self {
+            Self::Asc => "asc",
+            Self::Desc => "desc",
+        }
+    }
+}
+
+#[derive(Clone, Debug)]
+pub struct MetricBestQuery {
+    pub key: NonEmptyText,
+    pub frontier_id: Option<fidget_spinner_core::FrontierId>,
+    pub source: Option<MetricFieldSource>,
+    pub dimensions: BTreeMap<NonEmptyText, RunDimensionValue>,
+    pub order: Option<MetricRankOrder>,
+    pub limit: u32,
+}
+
+#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize)]
+pub struct MetricKeySummary {
+    pub key: NonEmptyText,
+    pub source: MetricFieldSource,
+    pub experiment_count: u64,
+    pub unit: Option<MetricUnit>,
+    pub objective: Option<OptimizationObjective>,
+    pub description: Option<NonEmptyText>,
+    pub requires_order: bool,
+}
+
+#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
+pub struct MetricBestEntry {
+    pub key: NonEmptyText,
+    pub source: MetricFieldSource,
+    pub value: f64,
+    pub order: MetricRankOrder,
+    pub experiment_id: fidget_spinner_core::ExperimentId,
+    pub frontier_id: fidget_spinner_core::FrontierId,
+    pub change_node_id: fidget_spinner_core::NodeId,
+    pub change_title: NonEmptyText,
+    pub run_id: fidget_spinner_core::RunId,
+    pub verdict: FrontierVerdict,
+    pub candidate_checkpoint_id: fidget_spinner_core::CheckpointId,
+    pub candidate_commit_hash: GitCommitHash,
+    pub unit: Option<MetricUnit>,
+    pub objective: Option<OptimizationObjective>,
+    pub dimensions: BTreeMap<NonEmptyText, RunDimensionValue>,
+}
+
+#[derive(Clone, Debug, Default)]
+pub struct MetricKeyQuery {
+    pub frontier_id: Option<fidget_spinner_core::FrontierId>,
+    pub source: Option<MetricFieldSource>,
+    pub dimensions: BTreeMap<NonEmptyText, RunDimensionValue>,
+}
+
+#[derive(Clone, Debug)]
+pub struct DefineMetricRequest {
+    pub key: NonEmptyText,
+    pub unit: MetricUnit,
+    pub objective: OptimizationObjective,
+    pub description: Option<NonEmptyText>,
+}
+
+#[derive(Clone, Debug)]
+pub struct DefineRunDimensionRequest {
+    pub key: NonEmptyText,
+    pub value_type: FieldValueType,
+    pub description: Option<NonEmptyText>,
+}
+
+#[derive(Clone, Debug)]
+pub struct UpsertSchemaFieldRequest {
+    pub name: NonEmptyText,
+    pub node_classes: BTreeSet<NodeClass>,
+    pub presence: FieldPresence,
+    pub severity: DiagnosticSeverity,
+    pub role: FieldRole,
+    pub inference_policy: InferencePolicy,
+    pub value_type: Option<FieldValueType>,
+}
+
+#[derive(Clone, Debug)]
+pub struct RemoveSchemaFieldRequest {
+    pub name: NonEmptyText,
+    pub node_classes: Option<BTreeSet<NodeClass>>,
+}
+
+#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize)]
+pub struct RunDimensionSummary {
+    pub key: NonEmptyText,
+    pub value_type: FieldValueType,
+    pub description: Option<NonEmptyText>,
+    pub observed_run_count: u64,
+    pub distinct_value_count: u64,
+    pub sample_values: Vec<Value>,
+}
+
+#[derive(Clone, Debug, Default, Deserialize, Eq, PartialEq, Serialize)]
+pub struct MetricPlaneMigrationReport {
+    pub inserted_metric_definitions: u64,
+    pub inserted_dimension_definitions: u64,
+    pub inserted_dimension_values: u64,
+}
+
 #[derive(Clone, Debug)]
 pub struct CreateFrontierRequest {
     pub label: NonEmptyText,
@@ -190,11 +384,11 @@ pub struct CloseExperimentRequest {
     pub run_title: NonEmptyText,
     pub run_summary: Option<NonEmptyText>,
     pub backend: ExecutionBackend,
-    pub benchmark_suite: NonEmptyText,
+    pub dimensions: BTreeMap<NonEmptyText, RunDimensionValue>,
     pub command: CommandRecipe,
     pub code_snapshot: Option<CodeSnapshotRef>,
-    pub primary_metric: MetricObservation,
-    pub supporting_metrics: Vec<MetricObservation>,
+    pub primary_metric: MetricValue,
+    pub supporting_metrics: Vec<MetricValue>,
     pub note: FrontierNote,
     pub verdict: FrontierVerdict,
     pub decision_title: NonEmptyText,
@@ -233,8 +427,8 @@
         let schema = ProjectSchema::default_with_namespace(schema_namespace);
         write_json_file(&state_root.join(PROJECT_SCHEMA_NAME), &schema)?;
 
-        let connection = Connection::open(state_root.join(STATE_DB_NAME).as_std_path())?;
-        migrate(&connection)?;
+        let mut connection = Connection::open(state_root.join(STATE_DB_NAME).as_std_path())?;
+        upgrade_store(&mut connection)?;
 
         Ok(Self {
             project_root,
@@ -252,8 +446,8 @@
         let state_root = state_root(&project_root);
         let config = read_json_file::<ProjectConfig>(&state_root.join(PROJECT_CONFIG_NAME))?;
         let schema = read_json_file::<ProjectSchema>(&state_root.join(PROJECT_SCHEMA_NAME))?;
-        let connection = Connection::open(state_root.join(STATE_DB_NAME).as_std_path())?;
-        migrate(&connection)?;
+        let mut connection = Connection::open(state_root.join(STATE_DB_NAME).as_std_path())?;
+        upgrade_store(&mut connection)?;
         Ok(Self {
             project_root,
             state_root,
@@ -273,6 +467,59 @@ impl ProjectStore {
         &self.schema
     }
 
+    pub fn upsert_schema_field(
+        &mut self,
+        request: UpsertSchemaFieldRequest,
+    ) -> Result<ProjectFieldSpec, StoreError> {
+        let field = ProjectFieldSpec {
+            name: request.name,
+            node_classes: request.node_classes,
+            presence: request.presence,
+            severity: request.severity,
+            role: request.role,
+            inference_policy: request.inference_policy,
+            value_type: request.value_type,
+        };
+        if let Some(existing) = self.schema.fields.iter_mut().find(|existing| {
+            existing.name == field.name && existing.node_classes == field.node_classes
+        }) {
+            if *existing == field {
+                return Ok(field);
+            }
+            *existing = field.clone();
+        } else {
+            self.schema.fields.push(field.clone());
+        }
+        sort_schema_fields(&mut self.schema.fields);
+        self.bump_schema_version();
+        self.save_schema()?;
+        Ok(field)
+    }
+
+    pub fn remove_schema_field(
+        &mut self,
+        request: RemoveSchemaFieldRequest,
+    ) -> Result<u64, StoreError> {
+        let before = self.schema.fields.len();
+        self.schema.fields.retain(|field| {
+            field.name != request.name
+                || request
+                    .node_classes
+                    .as_ref()
+                    .is_some_and(|node_classes| field.node_classes != *node_classes)
+        });
+        let removed = before.saturating_sub(self.schema.fields.len()) as u64;
+        if removed == 0 {
+            return Err(StoreError::SchemaFieldNotFound(
+                request.name.as_str().to_owned(),
+            ));
+        }
+        sort_schema_fields(&mut self.schema.fields);
+        self.bump_schema_version();
+        self.save_schema()?;
+        Ok(removed)
+    }
+
     #[must_use]
     pub fn project_root(&self) -> &Utf8Path {
         &self.project_root
@@ -283,6 +530,14 @@
         &self.state_root
     }
 
+    fn bump_schema_version(&mut self) {
+        self.schema.version = self.schema.version.saturating_add(1);
+    }
+
+    fn save_schema(&self) -> Result<(), StoreError> {
+        write_json_file(&self.state_root.join(PROJECT_SCHEMA_NAME), &self.schema)
+    }
+
     pub fn create_frontier(
         &mut self,
         request: CreateFrontierRequest,
@@ -304,6 +559,31 @@
         let frontier = FrontierRecord::with_id(frontier_id, request.label, contract_node.id);
 
         let tx = self.connection.transaction()?;
+        let _ = upsert_metric_definition_tx(
+            &tx,
+            &MetricDefinition::new(
+                request
+                    .contract
+                    .evaluation
+                    .primary_metric
+                    .metric_key
+                    .clone(),
+                request.contract.evaluation.primary_metric.unit,
+                request.contract.evaluation.primary_metric.objective,
+                None,
+            ),
+        )?;
+        for metric in &request.contract.evaluation.supporting_metrics {
+            let _ = upsert_metric_definition_tx(
+                &tx,
+                &MetricDefinition::new(
+                    metric.metric_key.clone(),
+                    metric.unit,
+                    metric.objective,
+                    None,
+                ),
+            )?;
+        }
         insert_node(&tx, &contract_node)?;
         insert_frontier(&tx, &frontier)?;
         if let Some(seed) = request.initial_checkpoint {
@@ -330,6 +610,75 @@
         self.frontier_projection(frontier.id)
     }
 
+    pub fn define_metric(
+        &mut self,
+        request: DefineMetricRequest,
+    ) -> Result<MetricDefinition, StoreError> {
+        let record = MetricDefinition::new(
+            request.key,
+            request.unit,
+            request.objective,
+            request.description,
+        );
+        let tx = self.connection.transaction()?;
+        let _ = upsert_metric_definition_tx(&tx, &record)?;
+        tx.commit()?;
+        Ok(record)
+    }
+
+    pub fn list_metric_definitions(&self) -> Result<Vec<MetricDefinition>, StoreError> {
+        let mut statement = self.connection.prepare(
+            "SELECT metric_key, unit, objective, description, created_at
+             FROM metric_definitions
+             ORDER BY metric_key ASC",
+        )?;
+        let mut rows = statement.query([])?;
+        let mut items = Vec::new();
+        while let Some(row) = rows.next()? {
+            items.push(MetricDefinition {
+                key: NonEmptyText::new(row.get::<_, String>(0)?)?,
+                unit: decode_metric_unit(&row.get::<_, String>(1)?)?,
+                objective: decode_optimization_objective(&row.get::<_, String>(2)?)?,
+                description: row
+                    .get::<_, Option<String>>(3)?
+                    .map(NonEmptyText::new)
+                    .transpose()?,
+                created_at: decode_timestamp(&row.get::<_, String>(4)?)?,
+            });
+        }
+        Ok(items)
+    }
+
+    pub fn define_run_dimension(
+        &mut self,
+        request: DefineRunDimensionRequest,
+    ) -> Result<RunDimensionDefinition, StoreError> {
+        let record =
+            RunDimensionDefinition::new(request.key, request.value_type, request.description);
+        let tx = self.connection.transaction()?;
+        let _ = insert_run_dimension_definition_tx(&tx, &record)?;
+        tx.commit()?;
+        Ok(record)
+    }
+
+    pub fn list_run_dimensions(&self) -> Result<Vec<RunDimensionSummary>, StoreError> {
+        load_run_dimension_summaries(self)
+    }
+
+    pub fn coerce_run_dimensions(
+        &self,
+        raw_dimensions: BTreeMap<String, Value>,
+    ) -> Result<BTreeMap<NonEmptyText, RunDimensionValue>, StoreError> {
+        coerce_run_dimension_map(&run_dimension_definitions_by_key(self)?, raw_dimensions)
+    }
+
+    pub fn migrate_metric_plane(&mut self) -> Result<MetricPlaneMigrationReport, StoreError> {
+        let tx = self.connection.transaction()?;
+        let report = normalize_metric_plane_tx(&tx)?;
+        tx.commit()?;
+        Ok(report)
+    }
+
     pub fn add_tag(
         &mut self,
         name: TagName,
@@ -372,6 +721,7 @@
     }
 
     pub fn add_node(&mut self, request: CreateNodeRequest) -> Result<DagNode, StoreError> {
+        validate_prose_node_request(&request)?;
         let diagnostics = self.schema.validate_node(request.class, &request.payload);
         let mut node = DagNode::new(
             request.class,
@@ -406,6 +756,100 @@
         Ok(node)
     }
 
+    pub fn list_metric_keys(&self) -> Result<Vec<MetricKeySummary>, StoreError> {
+        self.list_metric_keys_filtered(MetricKeyQuery::default())
+    }
+
+    pub fn list_metric_keys_filtered(
+        &self,
+        query: MetricKeyQuery,
+    ) -> Result<Vec<MetricKeySummary>, StoreError> {
+        let mut summaries = collect_metric_samples(self, &query)?
+ .into_iter() + .fold( + BTreeMap::<(MetricFieldSource, String), MetricKeyAccumulator>::new(), + |mut accumulators, sample| { + let key = (sample.source, sample.key.as_str().to_owned()); + let _ = accumulators + .entry(key) + .and_modify(|entry| entry.observe(&sample)) + .or_insert_with(|| MetricKeyAccumulator::from_sample(&sample)); + accumulators + }, + ) + .into_values() + .map(MetricKeyAccumulator::finish) + .collect::<Vec<_>>(); + if query + .source + .is_none_or(|source| source == MetricFieldSource::RunMetric) + { + merge_registered_run_metric_summaries(self, &mut summaries)?; + } + summaries.sort_by(|left, right| { + left.key + .cmp(&right.key) + .then(left.source.cmp(&right.source)) + }); + Ok(summaries) + } + + pub fn best_metrics(&self, query: MetricBestQuery) -> Result<Vec<MetricBestEntry>, StoreError> { + let matching = collect_metric_samples( + self, + &MetricKeyQuery { + frontier_id: query.frontier_id, + source: query.source, + dimensions: query.dimensions.clone(), + }, + )? 
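The `list_metric_keys_filtered` fold above groups every sample into a per-`(source, key)` accumulator and poisons a key's unit/objective the moment two samples disagree. A stripped-down, std-only sketch of that grouping rule (the `Sample` and `Acc` types here are illustrative stand-ins, not the crate's types):

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, PartialEq, Debug)]
enum Objective {
    Minimize,
    Maximize,
}

struct Sample {
    key: &'static str,
    objective: Option<Objective>,
}

#[derive(Debug)]
struct Acc {
    count: u64,
    objective: Option<Objective>,
    ambiguous: bool,
}

fn summarize(samples: &[Sample]) -> BTreeMap<&'static str, Acc> {
    samples.iter().fold(BTreeMap::new(), |mut map, s| {
        map.entry(s.key)
            .and_modify(|a: &mut Acc| {
                a.count += 1;
                // Disagreeing semantics poison the key: drop the objective
                // and remember that an explicit order is now required.
                if a.objective != s.objective {
                    a.ambiguous = true;
                    a.objective = None;
                }
            })
            .or_insert(Acc { count: 1, objective: s.objective, ambiguous: false });
        map
    })
}

fn main() {
    let samples = [
        Sample { key: "wall_clock_s", objective: Some(Objective::Minimize) },
        Sample { key: "wall_clock_s", objective: Some(Objective::Minimize) },
        Sample { key: "tokens_per_s", objective: Some(Objective::Maximize) },
        Sample { key: "tokens_per_s", objective: Some(Objective::Minimize) },
    ];
    let summary = summarize(&samples);
    assert_eq!(summary["wall_clock_s"].count, 2);
    assert!(!summary["wall_clock_s"].ambiguous);
    assert!(summary["tokens_per_s"].ambiguous);
}
```

Keys that end up ambiguous surface with `requires_order = true`, so a caller must pass an explicit rank order rather than trusting a conflicting objective.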
+ .into_iter() + .filter(|sample| sample.key == query.key) + .collect::<Vec<_>>(); + if matching.is_empty() { + return Ok(Vec::new()); + } + + let source = if let Some(source) = query.source { + source + } else { + let sources = matching + .iter() + .map(|sample| sample.source) + .collect::<BTreeSet<_>>(); + if sources.len() != 1 { + return Err(StoreError::AmbiguousMetricKey { + key: query.key.as_str().to_owned(), + sources: sources + .into_iter() + .map(MetricFieldSource::as_str) + .collect::<Vec<_>>() + .join(", "), + }); + } + let Some(source) = sources.iter().copied().next() else { + return Ok(Vec::new()); + }; + source + }; + + let mut matching = matching + .into_iter() + .filter(|sample| sample.source == source) + .collect::<Vec<_>>(); + if matching.is_empty() { + return Ok(Vec::new()); + } + + let order = resolve_metric_order(&matching, &query, source)?; + matching.sort_by(|left, right| compare_metric_samples(left, right, order)); + matching.truncate(query.limit as usize); + Ok(matching + .into_iter() + .map(|sample| sample.into_entry(order)) + .collect()) + } + pub fn archive_node(&mut self, node_id: fidget_spinner_core::NodeId) -> Result<(), StoreError> { let updated_at = encode_timestamp(OffsetDateTime::now_utc())?; let changed = self.connection.execute( @@ -668,11 +1112,26 @@ impl ProjectStore { if base_checkpoint.frontier_id != request.frontier_id { return Err(StoreError::CheckpointNotFound(request.base_checkpoint_id)); } + let tx = self.connection.transaction()?; + let dimensions = validate_run_dimensions_tx(&tx, &request.dimensions)?; + let primary_metric_definition = + load_metric_definition_tx(&tx, &request.primary_metric.key)?.ok_or_else(|| { + StoreError::UnknownMetricDefinition(request.primary_metric.key.clone()) + })?; + let supporting_metric_definitions = request + .supporting_metrics + .iter() + .map(|metric| { + load_metric_definition_tx(&tx, &metric.key)? 
+ .ok_or_else(|| StoreError::UnknownMetricDefinition(metric.key.clone())) + }) + .collect::<Result<Vec<_>, StoreError>>()?; + let benchmark_suite = benchmark_suite_label(&dimensions); let run_payload = NodePayload::with_schema( self.schema.schema_ref(), json_object(json!({ - "benchmark_suite": request.benchmark_suite.as_str(), + "dimensions": run_dimensions_json(&dimensions), "backend": format!("{:?}", request.backend), "command": request.command.argv.iter().map(NonEmptyText::as_str).collect::<Vec<_>>(), }))?, @@ -695,7 +1154,7 @@ impl ProjectStore { status: RunStatus::Succeeded, backend: request.backend, code_snapshot: request.code_snapshot, - benchmark_suite: Some(request.benchmark_suite.clone()), + dimensions: dimensions.clone(), command: request.command, started_at: Some(now), finished_at: Some(now), @@ -748,7 +1207,7 @@ impl ProjectStore { analysis_node_id: request.analysis_node_id, decision_node_id: decision_node.id, result: ExperimentResult { - benchmark_suite: request.benchmark_suite, + dimensions: dimensions.clone(), primary_metric: request.primary_metric, supporting_metrics: request.supporting_metrics, benchmark_bundle: None, @@ -757,8 +1216,6 @@ impl ProjectStore { verdict: request.verdict, created_at: now, }; - - let tx = self.connection.transaction()?; insert_node(&tx, &run_node)?; insert_node(&tx, &decision_node)?; insert_edge( @@ -780,9 +1237,13 @@ impl ProjectStore { insert_run( &tx, &run, + benchmark_suite.as_deref(), &experiment.result.primary_metric, + &primary_metric_definition, &experiment.result.supporting_metrics, + supporting_metric_definitions.as_slice(), )?; + insert_run_dimensions(&tx, run.run_id, &dimensions)?; match request.verdict { FrontierVerdict::PromoteToChampion => { demote_previous_champion(&tx, request.frontier_id)?; @@ -887,6 +1348,375 @@ impl ProjectStore { } } +fn upgrade_store(connection: &mut Connection) -> Result<(), StoreError> { + migrate(connection)?; + backfill_prose_summaries(connection)?; + let tx = 
connection.transaction()?; + let _ = normalize_metric_plane_tx(&tx)?; + tx.commit()?; + Ok(()) +} + +fn validate_prose_node_request(request: &CreateNodeRequest) -> Result<(), StoreError> { + if !matches!(request.class, NodeClass::Note | NodeClass::Research) { + return Ok(()); + } + if request.summary.is_none() { + return Err(StoreError::ProseSummaryRequired(request.class)); + } + match request.payload.field("body") { + Some(Value::String(body)) if !body.trim().is_empty() => Ok(()), + _ => Err(StoreError::ProseBodyRequired(request.class)), + } +} + +#[derive(Clone, Debug)] +struct MetricSample { + key: NonEmptyText, + source: MetricFieldSource, + value: f64, + frontier_id: fidget_spinner_core::FrontierId, + experiment_id: fidget_spinner_core::ExperimentId, + change_node_id: fidget_spinner_core::NodeId, + change_title: NonEmptyText, + run_id: fidget_spinner_core::RunId, + verdict: FrontierVerdict, + candidate_checkpoint_id: fidget_spinner_core::CheckpointId, + candidate_commit_hash: GitCommitHash, + unit: Option<MetricUnit>, + objective: Option<OptimizationObjective>, + dimensions: BTreeMap<NonEmptyText, RunDimensionValue>, +} + +impl MetricSample { + fn into_entry(self, order: MetricRankOrder) -> MetricBestEntry { + MetricBestEntry { + key: self.key, + source: self.source, + value: self.value, + order, + experiment_id: self.experiment_id, + frontier_id: self.frontier_id, + change_node_id: self.change_node_id, + change_title: self.change_title, + run_id: self.run_id, + verdict: self.verdict, + candidate_checkpoint_id: self.candidate_checkpoint_id, + candidate_commit_hash: self.candidate_commit_hash, + unit: self.unit, + objective: self.objective, + dimensions: self.dimensions, + } + } +} + +#[derive(Clone, Debug)] +struct MetricKeyAccumulator { + key: NonEmptyText, + source: MetricFieldSource, + experiment_ids: BTreeSet<fidget_spinner_core::ExperimentId>, + unit: Option<MetricUnit>, + objective: Option<OptimizationObjective>, + ambiguous_semantics: bool, +} + +impl 
MetricKeyAccumulator { + fn from_sample(sample: &MetricSample) -> Self { + Self { + key: sample.key.clone(), + source: sample.source, + experiment_ids: BTreeSet::from([sample.experiment_id]), + unit: sample.unit, + objective: sample.objective, + ambiguous_semantics: false, + } + } + + fn observe(&mut self, sample: &MetricSample) { + let _ = self.experiment_ids.insert(sample.experiment_id); + if self.unit != sample.unit || self.objective != sample.objective { + self.ambiguous_semantics = true; + self.unit = None; + self.objective = None; + } + } + + fn finish(self) -> MetricKeySummary { + MetricKeySummary { + key: self.key, + source: self.source, + experiment_count: self.experiment_ids.len() as u64, + unit: self.unit, + objective: self.objective, + description: None, + requires_order: self.source != MetricFieldSource::RunMetric + || self.ambiguous_semantics + || !matches!( + self.objective, + Some(OptimizationObjective::Minimize | OptimizationObjective::Maximize) + ), + } + } +} + +fn collect_metric_samples( + store: &ProjectStore, + query: &MetricKeyQuery, +) -> Result<Vec<MetricSample>, StoreError> { + let rows = load_experiment_rows(store)?; + let metric_definitions = metric_definitions_by_key(store)?; + let mut samples = Vec::new(); + for row in rows { + if query + .frontier_id + .is_some_and(|frontier_id| row.frontier_id != frontier_id) + { + continue; + } + if !dimensions_match(&row.dimensions, &query.dimensions) { + continue; + } + samples.extend(metric_samples_for_row( + store.schema(), + &row, + &metric_definitions, + )); + } + Ok(if let Some(source) = query.source { + samples + .into_iter() + .filter(|sample| sample.source == source) + .collect() + } else { + samples + }) +} + +fn resolve_metric_order( + matching: &[MetricSample], + query: &MetricBestQuery, + source: MetricFieldSource, +) -> Result<MetricRankOrder, StoreError> { + if let Some(order) = query.order { + return Ok(order); + } + if source != MetricFieldSource::RunMetric { + return 
Err(StoreError::MetricOrderRequired { + key: query.key.as_str().to_owned(), + metric_source: source.as_str().to_owned(), + }); + } + let objectives = matching + .iter() + .map(|sample| sample.objective) + .collect::<BTreeSet<_>>(); + match objectives.len() { + 1 => match objectives.into_iter().next().flatten() { + Some(OptimizationObjective::Minimize) => Ok(MetricRankOrder::Asc), + Some(OptimizationObjective::Maximize) => Ok(MetricRankOrder::Desc), + Some(OptimizationObjective::Target) | None => Err(StoreError::MetricOrderRequired { + key: query.key.as_str().to_owned(), + metric_source: source.as_str().to_owned(), + }), + }, + _ => Err(StoreError::MetricSemanticsAmbiguous { + key: query.key.as_str().to_owned(), + metric_source: source.as_str().to_owned(), + }), + } +} + +fn compare_metric_samples( + left: &MetricSample, + right: &MetricSample, + order: MetricRankOrder, +) -> Ordering { + let metric_order = match order { + MetricRankOrder::Asc => left + .value + .partial_cmp(&right.value) + .unwrap_or(Ordering::Equal), + MetricRankOrder::Desc => right + .value + .partial_cmp(&left.value) + .unwrap_or(Ordering::Equal), + }; + metric_order + .then_with(|| right.experiment_id.cmp(&left.experiment_id)) + .then_with(|| left.key.cmp(&right.key)) +} + +#[derive(Clone, Debug)] +struct ExperimentMetricRow { + experiment_id: fidget_spinner_core::ExperimentId, + frontier_id: fidget_spinner_core::FrontierId, + run_id: fidget_spinner_core::RunId, + verdict: FrontierVerdict, + candidate_checkpoint: CheckpointRecord, + change_node: DagNode, + run_node: DagNode, + analysis_node: Option<DagNode>, + decision_node: DagNode, + primary_metric: MetricValue, + supporting_metrics: Vec<MetricValue>, + dimensions: BTreeMap<NonEmptyText, RunDimensionValue>, +} + +fn load_experiment_rows(store: &ProjectStore) -> Result<Vec<ExperimentMetricRow>, StoreError> { + let run_dimensions = load_run_dimensions_by_run_id(store)?; + let mut statement = store.connection.prepare( + "SELECT + id, + 
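`resolve_metric_order` maps a `minimize` objective to ascending rank and `maximize` to descending, and `compare_metric_samples` then sorts NaN-safely with a newest-experiment tiebreak. A minimal sketch of that comparator over `(value, experiment_id)` pairs:

```rust
use std::cmp::Ordering;

#[derive(Clone, Copy)]
enum RankOrder {
    Asc,
    Desc,
}

fn compare(left: (f64, u64), right: (f64, u64), order: RankOrder) -> Ordering {
    let (lv, lid) = left;
    let (rv, rid) = right;
    let by_value = match order {
        // partial_cmp falls back to Equal on NaN so the tiebreaker decides.
        RankOrder::Asc => lv.partial_cmp(&rv).unwrap_or(Ordering::Equal),
        RankOrder::Desc => rv.partial_cmp(&lv).unwrap_or(Ordering::Equal),
    };
    // Prefer the newer experiment (larger id) when the values tie.
    by_value.then_with(|| rid.cmp(&lid))
}

fn main() {
    let mut rows = vec![(2.0_f64, 1_u64), (1.0, 2), (1.0, 3)];
    rows.sort_by(|l, r| compare(*l, *r, RankOrder::Asc));
    assert_eq!(rows, vec![(1.0, 3), (1.0, 2), (2.0, 1)]);
}
```

Falling back to `Ordering::Equal` on incomparable floats keeps the sort total, and the id tiebreak makes the ranking deterministic for equal metric values.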
frontier_id, + run_id, + change_node_id, + run_node_id, + analysis_node_id, + decision_node_id, + candidate_checkpoint_id, + primary_metric_json, + supporting_metrics_json, + verdict + FROM experiments", + )?; + let mut rows = statement.query([])?; + let mut items = Vec::new(); + while let Some(row) = rows.next()? { + let change_node_id = parse_node_id(&row.get::<_, String>(3)?)?; + let run_id = parse_run_id(&row.get::<_, String>(2)?)?; + let run_node_id = parse_node_id(&row.get::<_, String>(4)?)?; + let analysis_node_id = row + .get::<_, Option<String>>(5)? + .map(|raw| parse_node_id(&raw)) + .transpose()?; + let decision_node_id = parse_node_id(&row.get::<_, String>(6)?)?; + let candidate_checkpoint_id = parse_checkpoint_id(&row.get::<_, String>(7)?)?; + items.push(ExperimentMetricRow { + experiment_id: parse_experiment_id(&row.get::<_, String>(0)?)?, + frontier_id: parse_frontier_id(&row.get::<_, String>(1)?)?, + run_id, + verdict: parse_frontier_verdict(&row.get::<_, String>(10)?)?, + candidate_checkpoint: store + .load_checkpoint(candidate_checkpoint_id)? + .ok_or(StoreError::CheckpointNotFound(candidate_checkpoint_id))?, + change_node: store + .get_node(change_node_id)? + .ok_or(StoreError::NodeNotFound(change_node_id))?, + run_node: store + .get_node(run_node_id)? + .ok_or(StoreError::NodeNotFound(run_node_id))?, + analysis_node: analysis_node_id + .map(|node_id| { + store + .get_node(node_id)? + .ok_or(StoreError::NodeNotFound(node_id)) + }) + .transpose()?, + decision_node: store + .get_node(decision_node_id)? 
+ .ok_or(StoreError::NodeNotFound(decision_node_id))?, + primary_metric: decode_json(&row.get::<_, String>(8)?)?, + supporting_metrics: decode_json(&row.get::<_, String>(9)?)?, + dimensions: run_dimensions.get(&run_id).cloned().unwrap_or_default(), + }); + } + Ok(items) +} + +fn metric_samples_for_row( + schema: &ProjectSchema, + row: &ExperimentMetricRow, + metric_definitions: &BTreeMap<String, MetricDefinition>, +) -> Vec<MetricSample> { + let mut samples = vec![metric_sample_from_observation( + row, + &row.primary_metric, + metric_definitions, + MetricFieldSource::RunMetric, + )]; + samples.extend(row.supporting_metrics.iter().map(|metric| { + metric_sample_from_observation( + row, + metric, + metric_definitions, + MetricFieldSource::RunMetric, + ) + })); + samples.extend(metric_samples_from_payload(schema, row, &row.change_node)); + samples.extend(metric_samples_from_payload(schema, row, &row.run_node)); + if let Some(node) = row.analysis_node.as_ref() { + samples.extend(metric_samples_from_payload(schema, row, node)); + } + samples.extend(metric_samples_from_payload(schema, row, &row.decision_node)); + samples +} + +fn metric_sample_from_observation( + row: &ExperimentMetricRow, + metric: &MetricValue, + metric_definitions: &BTreeMap<String, MetricDefinition>, + source: MetricFieldSource, +) -> MetricSample { + let registry = metric_definitions.get(metric.key.as_str()); + MetricSample { + key: metric.key.clone(), + source, + value: metric.value, + frontier_id: row.frontier_id, + experiment_id: row.experiment_id, + change_node_id: row.change_node.id, + change_title: row.change_node.title.clone(), + run_id: row.run_id, + verdict: row.verdict, + candidate_checkpoint_id: row.candidate_checkpoint.id, + candidate_commit_hash: row.candidate_checkpoint.snapshot.commit_hash.clone(), + unit: registry.map(|definition| definition.unit), + objective: registry.map(|definition| definition.objective), + dimensions: row.dimensions.clone(), + } +} + +fn 
metric_samples_from_payload( + schema: &ProjectSchema, + row: &ExperimentMetricRow, + node: &DagNode, +) -> Vec<MetricSample> { + let Some(source) = MetricFieldSource::from_payload_class(node.class) else { + return Vec::new(); + }; + node.payload + .fields + .iter() + .filter_map(|(key, value)| { + let value = value.as_f64()?; + let spec = schema.field_spec(node.class, key); + if spec.is_some_and(|field| { + field + .value_type + .is_some_and(|kind| kind != FieldValueType::Numeric) + }) { + return None; + } + Some(MetricSample { + key: NonEmptyText::new(key.clone()).ok()?, + source, + value, + frontier_id: row.frontier_id, + experiment_id: row.experiment_id, + change_node_id: row.change_node.id, + change_title: row.change_node.title.clone(), + run_id: row.run_id, + verdict: row.verdict, + candidate_checkpoint_id: row.candidate_checkpoint.id, + candidate_commit_hash: row.candidate_checkpoint.snapshot.commit_hash.clone(), + unit: None, + objective: None, + dimensions: row.dimensions.clone(), + }) + }) + .collect() +} + fn migrate(connection: &Connection) -> Result<(), StoreError> { connection.execute_batch( " @@ -986,6 +1816,32 @@ fn migrate(connection: &Connection) -> Result<(), StoreError> { value REAL NOT NULL ); + CREATE TABLE IF NOT EXISTS metric_definitions ( + metric_key TEXT PRIMARY KEY, + unit TEXT NOT NULL, + objective TEXT NOT NULL, + description TEXT, + created_at TEXT NOT NULL + ); + + CREATE TABLE IF NOT EXISTS run_dimension_definitions ( + dimension_key TEXT PRIMARY KEY, + value_type TEXT NOT NULL, + description TEXT, + created_at TEXT NOT NULL + ); + + CREATE TABLE IF NOT EXISTS run_dimensions ( + run_id TEXT NOT NULL REFERENCES runs(run_id) ON DELETE CASCADE, + dimension_key TEXT NOT NULL REFERENCES run_dimension_definitions(dimension_key) ON DELETE RESTRICT, + value_type TEXT NOT NULL, + value_text TEXT, + value_numeric REAL, + value_boolean INTEGER, + value_timestamp TEXT, + PRIMARY KEY (run_id, dimension_key) + ); + CREATE TABLE IF NOT EXISTS 
experiments ( id TEXT PRIMARY KEY, frontier_id TEXT NOT NULL REFERENCES frontiers(id) ON DELETE CASCADE, @@ -1005,6 +1861,12 @@ fn migrate(connection: &Connection) -> Result<(), StoreError> { created_at TEXT NOT NULL ); + CREATE INDEX IF NOT EXISTS metrics_by_key ON metrics(metric_key); + CREATE INDEX IF NOT EXISTS run_dimensions_by_key_text ON run_dimensions(dimension_key, value_text); + CREATE INDEX IF NOT EXISTS run_dimensions_by_key_numeric ON run_dimensions(dimension_key, value_numeric); + CREATE INDEX IF NOT EXISTS run_dimensions_by_run ON run_dimensions(run_id, dimension_key); + CREATE INDEX IF NOT EXISTS experiments_by_frontier ON experiments(frontier_id, created_at DESC); + CREATE TABLE IF NOT EXISTS events ( id INTEGER PRIMARY KEY AUTOINCREMENT, entity_kind TEXT NOT NULL, @@ -1018,6 +1880,672 @@ fn migrate(connection: &Connection) -> Result<(), StoreError> { Ok(()) } +fn backfill_prose_summaries(connection: &Connection) -> Result<(), StoreError> { + let mut statement = connection.prepare( + "SELECT id, payload_json + FROM nodes + WHERE class IN ('note', 'research') + AND (summary IS NULL OR trim(summary) = '')", + )?; + let mut rows = statement.query([])?; + let mut updates = Vec::new(); + while let Some(row) = rows.next()? 
{ + let node_id = row.get::<_, String>(0)?; + let payload = decode_json::<NodePayload>(&row.get::<_, String>(1)?)?; + let Some(Value::String(body)) = payload.field("body") else { + continue; + }; + let Some(summary) = derive_summary_from_body(body) else { + continue; + }; + updates.push((node_id, summary)); + } + for (node_id, summary) in updates { + let _ = connection.execute( + "UPDATE nodes SET summary = ?1 WHERE id = ?2", + params![summary.as_str(), node_id], + )?; + } + Ok(()) +} + +fn sort_schema_fields(fields: &mut [ProjectFieldSpec]) { + fields.sort_by(|left, right| { + left.name + .cmp(&right.name) + .then_with(|| left.node_classes.iter().cmp(right.node_classes.iter())) + }); +} + +fn normalize_metric_plane_tx( + tx: &Transaction<'_>, +) -> Result<MetricPlaneMigrationReport, StoreError> { + let mut report = MetricPlaneMigrationReport::default(); + + if insert_run_dimension_definition_tx( + tx, + &RunDimensionDefinition::new( + NonEmptyText::new("benchmark_suite")?, + FieldValueType::String, + Some(NonEmptyText::new("Legacy coarse benchmark label")?), + ), + )? { + report.inserted_dimension_definitions += 1; + } + + { + let mut statement = tx.prepare( + "SELECT DISTINCT metric_key, unit, objective + FROM metrics + ORDER BY metric_key ASC", + )?; + let mut rows = statement.query([])?; + while let Some(row) = rows.next()? { + let definition = MetricDefinition::new( + NonEmptyText::new(row.get::<_, String>(0)?)?, + decode_metric_unit(&row.get::<_, String>(1)?)?, + decode_optimization_objective(&row.get::<_, String>(2)?)?, + None, + ); + if upsert_metric_definition_tx(tx, &definition)? { + report.inserted_metric_definitions += 1; + } + } + } + + { + let mut statement = tx.prepare( + "SELECT payload_json + FROM nodes + WHERE class = 'contract'", + )?; + let mut rows = statement.query([])?; + while let Some(row) = rows.next()? { + let payload = decode_json::<NodePayload>(&row.get::<_, String>(0)?)?; + for definition in contract_metric_definitions(&payload)? 
{ + if upsert_metric_definition_tx(tx, &definition)? { + report.inserted_metric_definitions += 1; + } + } + } + } + + { + let mut statement = tx.prepare( + "SELECT run_id, benchmark_suite + FROM runs + WHERE benchmark_suite IS NOT NULL + AND trim(benchmark_suite) != ''", + )?; + let mut rows = statement.query([])?; + while let Some(row) = rows.next()? { + let run_id = parse_run_id(&row.get::<_, String>(0)?)?; + let value = RunDimensionValue::String(NonEmptyText::new(row.get::<_, String>(1)?)?); + if insert_run_dimension_value_tx( + tx, + run_id, + &NonEmptyText::new("benchmark_suite")?, + &value, + )? { + report.inserted_dimension_values += 1; + } + } + } + + Ok(report) +} + +fn contract_metric_definitions(payload: &NodePayload) -> Result<Vec<MetricDefinition>, StoreError> { + let mut definitions = Vec::new(); + if let Some(primary) = payload.field("primary_metric") { + definitions.push(metric_definition_from_json(primary, None)?); + } + if let Some(Value::Array(items)) = payload.field("supporting_metrics") { + for item in items { + definitions.push(metric_definition_from_json(item, None)?); + } + } + Ok(definitions) +} + +fn metric_definition_from_json( + value: &Value, + description: Option<NonEmptyText>, +) -> Result<MetricDefinition, StoreError> { + let Some(object) = value.as_object() else { + return Err(StoreError::Json(serde_json::Error::io(io::Error::new( + io::ErrorKind::InvalidData, + "metric definition payload must be an object", + )))); + }; + let key = object + .get("metric_key") + .or_else(|| object.get("key")) + .and_then(Value::as_str) + .ok_or_else(|| { + StoreError::Json(serde_json::Error::io(io::Error::new( + io::ErrorKind::InvalidData, + "metric definition missing key", + ))) + })?; + let unit = object.get("unit").and_then(Value::as_str).ok_or_else(|| { + StoreError::Json(serde_json::Error::io(io::Error::new( + io::ErrorKind::InvalidData, + "metric definition missing unit", + ))) + })?; + let objective = object + .get("objective") + 
.and_then(Value::as_str) + .ok_or_else(|| { + StoreError::Json(serde_json::Error::io(io::Error::new( + io::ErrorKind::InvalidData, + "metric definition missing objective", + ))) + })?; + Ok(MetricDefinition::new( + NonEmptyText::new(key)?, + decode_metric_unit(unit)?, + decode_optimization_objective(objective)?, + description, + )) +} + +fn upsert_metric_definition_tx( + tx: &Transaction<'_>, + definition: &MetricDefinition, +) -> Result<bool, StoreError> { + let existing = tx + .query_row( + "SELECT unit, objective, description + FROM metric_definitions + WHERE metric_key = ?1", + params![definition.key.as_str()], + |row| { + Ok(( + row.get::<_, String>(0)?, + row.get::<_, String>(1)?, + row.get::<_, Option<String>>(2)?, + )) + }, + ) + .optional()?; + if let Some((existing_unit, existing_objective, existing_description)) = existing { + let new_unit = encode_metric_unit(definition.unit).to_owned(); + let new_objective = encode_optimization_objective(definition.objective).to_owned(); + if existing_unit != new_unit || existing_objective != new_objective { + return Err(StoreError::ConflictingMetricDefinition { + key: definition.key.as_str().to_owned(), + existing_unit, + existing_objective, + new_unit, + new_objective, + }); + } + if existing_description.is_none() && definition.description.is_some() { + let _ = tx.execute( + "UPDATE metric_definitions SET description = ?2 WHERE metric_key = ?1", + params![ + definition.key.as_str(), + definition.description.as_ref().map(NonEmptyText::as_str) + ], + )?; + } + Ok(false) + } else { + let _ = tx.execute( + "INSERT INTO metric_definitions (metric_key, unit, objective, description, created_at) + VALUES (?1, ?2, ?3, ?4, ?5)", + params![ + definition.key.as_str(), + encode_metric_unit(definition.unit), + encode_optimization_objective(definition.objective), + definition.description.as_ref().map(NonEmptyText::as_str), + encode_timestamp(definition.created_at)?, + ], + )?; + Ok(true) + } +} + +fn 
insert_run_dimension_definition_tx( + tx: &Transaction<'_>, + definition: &RunDimensionDefinition, +) -> Result<bool, StoreError> { + let existing = tx + .query_row( + "SELECT value_type, description + FROM run_dimension_definitions + WHERE dimension_key = ?1", + params![definition.key.as_str()], + |row| Ok((row.get::<_, String>(0)?, row.get::<_, Option<String>>(1)?)), + ) + .optional()?; + if let Some((existing_type, existing_description)) = existing { + let new_type = encode_field_value_type(definition.value_type).to_owned(); + if existing_type != new_type { + return Err(StoreError::ConflictingRunDimensionDefinition { + key: definition.key.as_str().to_owned(), + existing_type, + new_type, + }); + } + if existing_description.is_none() && definition.description.is_some() { + let _ = tx.execute( + "UPDATE run_dimension_definitions SET description = ?2 WHERE dimension_key = ?1", + params![ + definition.key.as_str(), + definition.description.as_ref().map(NonEmptyText::as_str) + ], + )?; + } + Ok(false) + } else { + let _ = tx.execute( + "INSERT INTO run_dimension_definitions (dimension_key, value_type, description, created_at) + VALUES (?1, ?2, ?3, ?4)", + params![ + definition.key.as_str(), + encode_field_value_type(definition.value_type), + definition.description.as_ref().map(NonEmptyText::as_str), + encode_timestamp(definition.created_at)?, + ], + )?; + Ok(true) + } +} + +fn load_metric_definition_tx( + tx: &Transaction<'_>, + key: &NonEmptyText, +) -> Result<Option<MetricDefinition>, StoreError> { + tx.query_row( + "SELECT metric_key, unit, objective, description, created_at + FROM metric_definitions + WHERE metric_key = ?1", + params![key.as_str()], + |row| { + Ok(MetricDefinition { + key: NonEmptyText::new(row.get::<_, String>(0)?) + .map_err(core_to_sql_conversion_error)?, + unit: decode_metric_unit(&row.get::<_, String>(1)?) + .map_err(to_sql_conversion_error)?, + objective: decode_optimization_objective(&row.get::<_, String>(2)?) 
+ .map_err(to_sql_conversion_error)?, + description: row + .get::<_, Option<String>>(3)? + .map(NonEmptyText::new) + .transpose() + .map_err(core_to_sql_conversion_error)?, + created_at: decode_timestamp(&row.get::<_, String>(4)?) + .map_err(to_sql_conversion_error)?, + }) + }, + ) + .optional() + .map_err(StoreError::from) +} + +fn metric_definitions_by_key( + store: &ProjectStore, +) -> Result<BTreeMap<String, MetricDefinition>, StoreError> { + Ok(store + .list_metric_definitions()? + .into_iter() + .map(|definition| (definition.key.as_str().to_owned(), definition)) + .collect()) +} + +fn run_dimension_definitions_by_key( + store: &ProjectStore, +) -> Result<BTreeMap<String, RunDimensionDefinition>, StoreError> { + let mut statement = store.connection.prepare( + "SELECT dimension_key, value_type, description, created_at + FROM run_dimension_definitions", + )?; + let mut rows = statement.query([])?; + let mut items = BTreeMap::new(); + while let Some(row) = rows.next()? { + let definition = RunDimensionDefinition { + key: NonEmptyText::new(row.get::<_, String>(0)?)?, + value_type: decode_field_value_type(&row.get::<_, String>(1)?)?, + description: row + .get::<_, Option<String>>(2)? 
+ .map(NonEmptyText::new) + .transpose()?, + created_at: decode_timestamp(&row.get::<_, String>(3)?)?, + }; + let _ = items.insert(definition.key.as_str().to_owned(), definition); + } + Ok(items) +} + +fn coerce_run_dimension_map( + definitions: &BTreeMap<String, RunDimensionDefinition>, + raw_dimensions: BTreeMap<String, Value>, +) -> Result<BTreeMap<NonEmptyText, RunDimensionValue>, StoreError> { + let mut dimensions = BTreeMap::new(); + for (raw_key, raw_value) in raw_dimensions { + let key = NonEmptyText::new(raw_key)?; + let Some(definition) = definitions.get(key.as_str()) else { + return Err(StoreError::UnknownRunDimension(key)); + }; + let value = coerce_run_dimension_value(definition, raw_value)?; + let _ = dimensions.insert(key, value); + } + Ok(dimensions) +} + +fn coerce_run_dimension_value( + definition: &RunDimensionDefinition, + raw_value: Value, +) -> Result<RunDimensionValue, StoreError> { + match definition.value_type { + FieldValueType::String => match raw_value { + Value::String(value) => Ok(RunDimensionValue::String(NonEmptyText::new(value)?)), + other => Err(StoreError::InvalidRunDimensionValue { + key: definition.key.as_str().to_owned(), + expected: definition.value_type.as_str().to_owned(), + observed: value_kind_name(&other).to_owned(), + }), + }, + FieldValueType::Numeric => match raw_value.as_f64() { + Some(value) => Ok(RunDimensionValue::Numeric(value)), + None => Err(StoreError::InvalidRunDimensionValue { + key: definition.key.as_str().to_owned(), + expected: definition.value_type.as_str().to_owned(), + observed: value_kind_name(&raw_value).to_owned(), + }), + }, + FieldValueType::Boolean => match raw_value { + Value::Bool(value) => Ok(RunDimensionValue::Boolean(value)), + other => Err(StoreError::InvalidRunDimensionValue { + key: definition.key.as_str().to_owned(), + expected: definition.value_type.as_str().to_owned(), + observed: value_kind_name(&other).to_owned(), + }), + }, + FieldValueType::Timestamp => match raw_value { + 
Value::String(value) => { + let _ = OffsetDateTime::parse(&value, &Rfc3339)?; + Ok(RunDimensionValue::Timestamp(NonEmptyText::new(value)?)) + } + other => Err(StoreError::InvalidRunDimensionValue { + key: definition.key.as_str().to_owned(), + expected: definition.value_type.as_str().to_owned(), + observed: value_kind_name(&other).to_owned(), + }), + }, + } +} + +fn insert_run_dimension_value_tx( + tx: &Transaction<'_>, + run_id: fidget_spinner_core::RunId, + key: &NonEmptyText, + value: &RunDimensionValue, +) -> Result<bool, StoreError> { + let (value_text, value_numeric, value_boolean, value_timestamp) = + encode_run_dimension_columns(value)?; + let changed = tx.execute( + "INSERT OR IGNORE INTO run_dimensions ( + run_id, + dimension_key, + value_type, + value_text, + value_numeric, + value_boolean, + value_timestamp + ) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7)", + params![ + run_id.to_string(), + key.as_str(), + encode_field_value_type(value.value_type()), + value_text, + value_numeric, + value_boolean, + value_timestamp, + ], + )?; + Ok(changed > 0) +} + +fn insert_run_dimensions( + tx: &Transaction<'_>, + run_id: fidget_spinner_core::RunId, + dimensions: &BTreeMap<NonEmptyText, RunDimensionValue>, +) -> Result<(), StoreError> { + for (key, value) in dimensions { + let _ = insert_run_dimension_value_tx(tx, run_id, key, value)?; + } + Ok(()) +} + +fn validate_run_dimensions_tx( + tx: &Transaction<'_>, + dimensions: &BTreeMap<NonEmptyText, RunDimensionValue>, +) -> Result<BTreeMap<NonEmptyText, RunDimensionValue>, StoreError> { + for (key, value) in dimensions { + let Some(expected_type) = tx + .query_row( + "SELECT value_type + FROM run_dimension_definitions + WHERE dimension_key = ?1", + params![key.as_str()], + |row| row.get::<_, String>(0), + ) + .optional()? 
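`coerce_run_dimension_value` admits a raw value only when it matches the dimension's registered type; anything else becomes an `InvalidRunDimensionValue` error. The shape of that check, sketched with a toy `Raw` enum standing in for `serde_json::Value` (one simplification: the real store accepts any JSON number via `as_f64`, and timestamps get an extra RFC 3339 parse):

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum ValueType {
    String,
    Numeric,
    Boolean,
}

#[derive(Debug, PartialEq)]
enum Raw {
    Str(&'static str),
    Num(f64),
    Bool(bool),
}

#[derive(Debug, PartialEq)]
enum Dimension {
    String(String),
    Numeric(f64),
    Boolean(bool),
}

fn coerce(expected: ValueType, raw: Raw) -> Result<Dimension, String> {
    match (expected, raw) {
        // Empty strings are rejected, mirroring the NonEmptyText constraint.
        (ValueType::String, Raw::Str(s)) if !s.is_empty() => Ok(Dimension::String(s.to_owned())),
        (ValueType::Numeric, Raw::Num(n)) => Ok(Dimension::Numeric(n)),
        (ValueType::Boolean, Raw::Bool(b)) => Ok(Dimension::Boolean(b)),
        (expected, other) => Err(format!("expected {expected:?}, got {other:?}")),
    }
}

fn main() {
    assert_eq!(coerce(ValueType::Numeric, Raw::Num(30.0)), Ok(Dimension::Numeric(30.0)));
    assert!(coerce(ValueType::Numeric, Raw::Str("30")).is_err());
    assert_eq!(coerce(ValueType::Boolean, Raw::Bool(true)), Ok(Dimension::Boolean(true)));
}
```

Note that a numeric string like `"30"` is never silently parsed: the declared type gates the value's JSON kind, not its lexical content.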
+ else { + return Err(StoreError::UnknownRunDimension(key.clone())); + }; + let expected_type = decode_field_value_type(&expected_type)?; + let observed_type = value.value_type(); + if expected_type != observed_type { + return Err(StoreError::InvalidRunDimensionValue { + key: key.as_str().to_owned(), + expected: expected_type.as_str().to_owned(), + observed: observed_type.as_str().to_owned(), + }); + } + if matches!(value, RunDimensionValue::Timestamp(raw) if OffsetDateTime::parse(raw.as_str(), &Rfc3339).is_err()) + { + return Err(StoreError::InvalidRunDimensionValue { + key: key.as_str().to_owned(), + expected: FieldValueType::Timestamp.as_str().to_owned(), + observed: "string".to_owned(), + }); + } + } + Ok(dimensions.clone()) +} + +fn load_run_dimensions_by_run_id( + store: &ProjectStore, +) -> Result< + BTreeMap<fidget_spinner_core::RunId, BTreeMap<NonEmptyText, RunDimensionValue>>, + StoreError, +> { + let mut statement = store.connection.prepare( + "SELECT run_id, dimension_key, value_type, value_text, value_numeric, value_boolean, value_timestamp + FROM run_dimensions + ORDER BY dimension_key ASC", + )?; + let mut rows = statement.query([])?; + let mut values = + BTreeMap::<fidget_spinner_core::RunId, BTreeMap<NonEmptyText, RunDimensionValue>>::new(); + while let Some(row) = rows.next()? 
{ + let run_id = parse_run_id(&row.get::<_, String>(0)?)?; + let key = NonEmptyText::new(row.get::<_, String>(1)?)?; + let value_type = decode_field_value_type(&row.get::<_, String>(2)?)?; + let value = decode_run_dimension_value( + value_type, + row.get::<_, Option<String>>(3)?, + row.get::<_, Option<f64>>(4)?, + row.get::<_, Option<i64>>(5)?, + row.get::<_, Option<String>>(6)?, + )?; + let _ = values.entry(run_id).or_default().insert(key, value); + } + Ok(values) +} + +fn load_run_dimension_summaries( + store: &ProjectStore, +) -> Result<Vec<RunDimensionSummary>, StoreError> { + let definitions = { + let mut statement = store.connection.prepare( + "SELECT dimension_key, value_type, description, created_at + FROM run_dimension_definitions + ORDER BY dimension_key ASC", + )?; + let mut rows = statement.query([])?; + let mut items = Vec::new(); + while let Some(row) = rows.next()? { + items.push(RunDimensionDefinition { + key: NonEmptyText::new(row.get::<_, String>(0)?)?, + value_type: decode_field_value_type(&row.get::<_, String>(1)?)?, + description: row + .get::<_, Option<String>>(2)? + .map(NonEmptyText::new) + .transpose()?, + created_at: decode_timestamp(&row.get::<_, String>(3)?)?, + }); + } + items + }; + + let mut summaries = Vec::new(); + for definition in definitions { + let mut statement = store.connection.prepare( + "SELECT value_text, value_numeric, value_boolean, value_timestamp + FROM run_dimensions + WHERE dimension_key = ?1", + )?; + let mut rows = statement.query(params![definition.key.as_str()])?; + let mut observed_run_count = 0_u64; + let mut distinct = BTreeSet::new(); + let mut sample_values = Vec::new(); + while let Some(row) = rows.next()? 
{ + observed_run_count += 1; + let value = decode_run_dimension_value( + definition.value_type, + row.get::<_, Option<String>>(0)?, + row.get::<_, Option<f64>>(1)?, + row.get::<_, Option<i64>>(2)?, + row.get::<_, Option<String>>(3)?, + )?; + let serialized = encode_json(&value.as_json())?; + if distinct.insert(serialized) && sample_values.len() < 5 { + sample_values.push(value.as_json()); + } + } + summaries.push(RunDimensionSummary { + key: definition.key, + value_type: definition.value_type, + description: definition.description, + observed_run_count, + distinct_value_count: distinct.len() as u64, + sample_values, + }); + } + Ok(summaries) +} + +fn merge_registered_run_metric_summaries( + store: &ProjectStore, + summaries: &mut Vec<MetricKeySummary>, +) -> Result<(), StoreError> { + let definitions = store.list_metric_definitions()?; + for definition in definitions { + if let Some(summary) = summaries.iter_mut().find(|summary| { + summary.source == MetricFieldSource::RunMetric && summary.key == definition.key + }) { + summary.unit = Some(definition.unit); + summary.objective = Some(definition.objective); + summary.description.clone_from(&definition.description); + summary.requires_order = matches!(definition.objective, OptimizationObjective::Target); + continue; + } + summaries.push(MetricKeySummary { + key: definition.key, + source: MetricFieldSource::RunMetric, + experiment_count: 0, + unit: Some(definition.unit), + objective: Some(definition.objective), + description: definition.description, + requires_order: matches!(definition.objective, OptimizationObjective::Target), + }); + } + Ok(()) +} + +fn dimensions_match( + haystack: &BTreeMap<NonEmptyText, RunDimensionValue>, + needle: &BTreeMap<NonEmptyText, RunDimensionValue>, +) -> bool { + needle + .iter() + .all(|(key, value)| haystack.get(key) == Some(value)) +} + +fn run_dimensions_json(dimensions: &BTreeMap<NonEmptyText, RunDimensionValue>) -> Value { + Value::Object( + dimensions + .iter() + .map(|(key, 
value)| (key.to_string(), value.as_json())) + .collect::<serde_json::Map<String, Value>>(), + ) +} + +fn benchmark_suite_label(dimensions: &BTreeMap<NonEmptyText, RunDimensionValue>) -> Option<String> { + dimensions + .get(&NonEmptyText::new("benchmark_suite").ok()?) + .and_then(|value| match value { + RunDimensionValue::String(item) => Some(item.to_string()), + _ => None, + }) + .or_else(|| { + if dimensions.is_empty() { + None + } else { + Some( + dimensions + .iter() + .map(|(key, value)| format!("{key}={}", dimension_value_text(value))) + .collect::<Vec<_>>() + .join(", "), + ) + } + }) +} + +fn derive_summary_from_body(body: &str) -> Option<NonEmptyText> { + const MAX_SUMMARY_CHARS: usize = 240; + + let paragraph = body + .split("\n\n") + .map(collapse_inline_whitespace) + .map(|text| text.trim().to_owned()) + .find(|text| !text.is_empty())?; + let summary = truncate_chars(¶graph, MAX_SUMMARY_CHARS); + NonEmptyText::new(summary).ok() +} + +fn collapse_inline_whitespace(raw: &str) -> String { + raw.split_whitespace().collect::<Vec<_>>().join(" ") +} + +fn truncate_chars(value: &str, max_chars: usize) -> String { + if value.chars().count() <= max_chars { + return value.to_owned(); + } + let mut truncated = value.chars().take(max_chars).collect::<String>(); + if let Some(index) = truncated.rfind(char::is_whitespace) { + truncated.truncate(index); + } + format!("{}…", truncated.trim_end()) +} + fn insert_node(tx: &Transaction<'_>, node: &DagNode) -> Result<(), StoreError> { let schema_namespace = node .payload @@ -1210,8 +2738,11 @@ fn insert_checkpoint( fn insert_run( tx: &Transaction<'_>, run: &RunRecord, - primary_metric: &MetricObservation, - supporting_metrics: &[MetricObservation], + benchmark_suite: Option<&str>, + primary_metric: &MetricValue, + primary_metric_definition: &MetricDefinition, + supporting_metrics: &[MetricValue], + supporting_metric_definitions: &[MetricDefinition], ) -> Result<(), StoreError> { let (repo_root, worktree_root, worktree_name, 
head_commit, dirty_paths) = run .code_snapshot @@ -1273,7 +2804,7 @@ fn insert_run( worktree_name.map(|item| item.to_string()), head_commit.map(|item| item.to_string()), dirty_paths_json, - run.benchmark_suite.as_ref().map(NonEmptyText::as_str), + benchmark_suite, run.command.working_directory.as_str(), encode_json(&run.command.argv)?, encode_json(&run.command.env)?, @@ -1282,15 +2813,19 @@ fn insert_run( ], )?; - for metric in std::iter::once(primary_metric).chain(supporting_metrics.iter()) { + for (metric, definition) in std::iter::once((primary_metric, primary_metric_definition)).chain( + supporting_metrics + .iter() + .zip(supporting_metric_definitions.iter()), + ) { let _ = tx.execute( "INSERT INTO metrics (run_id, metric_key, unit, objective, value) VALUES (?1, ?2, ?3, ?4, ?5)", params![ run.run_id.to_string(), - metric.metric_key.as_str(), - encode_metric_unit(metric.unit), - encode_optimization_objective(metric.objective), + metric.key.as_str(), + encode_metric_unit(definition.unit), + encode_optimization_objective(definition.objective), metric.value, ], )?; @@ -1331,7 +2866,7 @@ fn insert_experiment( experiment.run_id.to_string(), experiment.analysis_node_id.map(|id| id.to_string()), experiment.decision_node_id.to_string(), - experiment.result.benchmark_suite.as_str(), + benchmark_suite_label(&experiment.result.dimensions), encode_json(&experiment.result.primary_metric)?, encode_json(&experiment.result.supporting_metrics)?, experiment.note.summary.as_str(), @@ -1623,6 +3158,16 @@ fn parse_checkpoint_id(raw: &str) -> Result<fidget_spinner_core::CheckpointId, S )?)) } +fn parse_experiment_id(raw: &str) -> Result<fidget_spinner_core::ExperimentId, StoreError> { + Ok(fidget_spinner_core::ExperimentId::from_uuid(parse_uuid( + raw, + )?)) +} + +fn parse_run_id(raw: &str) -> Result<fidget_spinner_core::RunId, StoreError> { + Ok(fidget_spinner_core::RunId::from_uuid(parse_uuid(raw)?)) +} + fn parse_agent_session_id(raw: &str) -> 
Result<fidget_spinner_core::AgentSessionId, StoreError> { Ok(fidget_spinner_core::AgentSessionId::from_uuid(parse_uuid( raw, @@ -1762,6 +3307,23 @@ fn encode_backend(backend: ExecutionBackend) -> &'static str { } } +fn encode_field_value_type(value_type: FieldValueType) -> &'static str { + value_type.as_str() +} + +fn decode_field_value_type(raw: &str) -> Result<FieldValueType, StoreError> { + match raw { + "string" => Ok(FieldValueType::String), + "numeric" => Ok(FieldValueType::Numeric), + "boolean" => Ok(FieldValueType::Boolean), + "timestamp" => Ok(FieldValueType::Timestamp), + other => Err(StoreError::Json(serde_json::Error::io(io::Error::new( + io::ErrorKind::InvalidData, + format!("unknown field value type `{other}`"), + )))), + } +} + fn encode_metric_unit(unit: MetricUnit) -> &'static str { match unit { MetricUnit::Seconds => "seconds", @@ -1772,6 +3334,20 @@ fn encode_metric_unit(unit: MetricUnit) -> &'static str { } } +fn decode_metric_unit(raw: &str) -> Result<MetricUnit, StoreError> { + match raw { + "seconds" => Ok(MetricUnit::Seconds), + "bytes" => Ok(MetricUnit::Bytes), + "count" => Ok(MetricUnit::Count), + "ratio" => Ok(MetricUnit::Ratio), + "custom" => Ok(MetricUnit::Custom), + other => Err(StoreError::Json(serde_json::Error::io(io::Error::new( + io::ErrorKind::InvalidData, + format!("unknown metric unit `{other}`"), + )))), + } +} + fn encode_optimization_objective(objective: OptimizationObjective) -> &'static str { match objective { OptimizationObjective::Minimize => "minimize", @@ -1780,6 +3356,18 @@ fn encode_optimization_objective(objective: OptimizationObjective) -> &'static s } } +fn decode_optimization_objective(raw: &str) -> Result<OptimizationObjective, StoreError> { + match raw { + "minimize" => Ok(OptimizationObjective::Minimize), + "maximize" => Ok(OptimizationObjective::Maximize), + "target" => Ok(OptimizationObjective::Target), + other => Err(StoreError::Json(serde_json::Error::io(io::Error::new( + io::ErrorKind::InvalidData, + 
format!("unknown optimization objective `{other}`"), + )))), + } +} + fn encode_frontier_verdict(verdict: FrontierVerdict) -> &'static str { match verdict { FrontierVerdict::PromoteToChampion => "promote-to-champion", @@ -1790,18 +3378,117 @@ fn encode_frontier_verdict(verdict: FrontierVerdict) -> &'static str { } } +fn parse_frontier_verdict(raw: &str) -> Result<FrontierVerdict, StoreError> { + match raw { + "promote-to-champion" => Ok(FrontierVerdict::PromoteToChampion), + "keep-on-frontier" => Ok(FrontierVerdict::KeepOnFrontier), + "revert-to-champion" => Ok(FrontierVerdict::RevertToChampion), + "archive-dead-end" => Ok(FrontierVerdict::ArchiveDeadEnd), + "needs-more-evidence" => Ok(FrontierVerdict::NeedsMoreEvidence), + other => Err(StoreError::Json(serde_json::Error::io(io::Error::new( + io::ErrorKind::InvalidData, + format!("unknown frontier verdict `{other}`"), + )))), + } +} + +type RunDimensionColumns = (Option<String>, Option<f64>, Option<i64>, Option<String>); + +fn encode_run_dimension_columns( + value: &RunDimensionValue, +) -> Result<RunDimensionColumns, StoreError> { + match value { + RunDimensionValue::String(item) => Ok((Some(item.to_string()), None, None, None)), + RunDimensionValue::Numeric(item) => Ok((None, Some(*item), None, None)), + RunDimensionValue::Boolean(item) => Ok((None, None, Some(i64::from(*item)), None)), + RunDimensionValue::Timestamp(item) => { + let _ = OffsetDateTime::parse(item.as_str(), &Rfc3339)?; + Ok((None, None, None, Some(item.to_string()))) + } + } +} + +fn decode_run_dimension_value( + value_type: FieldValueType, + value_text: Option<String>, + value_numeric: Option<f64>, + value_boolean: Option<i64>, + value_timestamp: Option<String>, +) -> Result<RunDimensionValue, StoreError> { + match value_type { + FieldValueType::String => Ok(RunDimensionValue::String(NonEmptyText::new( + value_text.ok_or_else(|| { + StoreError::Json(serde_json::Error::io(io::Error::new( + io::ErrorKind::InvalidData, + "missing string dimension 
value", + ))) + })?, + )?)), + FieldValueType::Numeric => Ok(RunDimensionValue::Numeric(value_numeric.ok_or_else( + || { + StoreError::Json(serde_json::Error::io(io::Error::new( + io::ErrorKind::InvalidData, + "missing numeric dimension value", + ))) + }, + )?)), + FieldValueType::Boolean => Ok(RunDimensionValue::Boolean( + value_boolean.ok_or_else(|| { + StoreError::Json(serde_json::Error::io(io::Error::new( + io::ErrorKind::InvalidData, + "missing boolean dimension value", + ))) + })? != 0, + )), + FieldValueType::Timestamp => { + let value = value_timestamp.ok_or_else(|| { + StoreError::Json(serde_json::Error::io(io::Error::new( + io::ErrorKind::InvalidData, + "missing timestamp dimension value", + ))) + })?; + let _ = OffsetDateTime::parse(&value, &Rfc3339)?; + Ok(RunDimensionValue::Timestamp(NonEmptyText::new(value)?)) + } + } +} + +fn dimension_value_text(value: &RunDimensionValue) -> String { + match value { + RunDimensionValue::String(item) | RunDimensionValue::Timestamp(item) => item.to_string(), + RunDimensionValue::Numeric(item) => item.to_string(), + RunDimensionValue::Boolean(item) => item.to_string(), + } +} + +fn value_kind_name(value: &Value) -> &'static str { + match value { + Value::Null => "null", + Value::Bool(_) => "boolean", + Value::Number(_) => "numeric", + Value::String(_) => "string", + Value::Array(_) => "array", + Value::Object(_) => "object", + } +} + #[cfg(test)] mod tests { - use std::collections::BTreeSet; + use std::collections::{BTreeMap, BTreeSet}; use serde_json::json; use super::{ - CreateFrontierRequest, CreateNodeRequest, ListNodesQuery, PROJECT_SCHEMA_NAME, ProjectStore, + CloseExperimentRequest, CreateFrontierRequest, CreateNodeRequest, DefineMetricRequest, + DefineRunDimensionRequest, ListNodesQuery, MetricBestQuery, MetricFieldSource, + MetricKeyQuery, MetricRankOrder, PROJECT_SCHEMA_NAME, ProjectStore, + RemoveSchemaFieldRequest, UpsertSchemaFieldRequest, }; use fidget_spinner_core::{ - CheckpointSnapshotRef, 
EvaluationProtocol, FrontierContract, MetricSpec, MetricUnit, - NodeAnnotation, NodeClass, NodePayload, NonEmptyText, OptimizationObjective, TagName, + CheckpointSnapshotRef, CommandRecipe, DiagnosticSeverity, EvaluationProtocol, + FieldPresence, FieldRole, FieldValueType, FrontierContract, FrontierNote, FrontierVerdict, + GitCommitHash, InferencePolicy, MetricSpec, MetricUnit, MetricValue, NodeAnnotation, + NodeClass, NodePayload, NonEmptyText, OptimizationObjective, RunDimensionValue, TagName, }; fn temp_project_root(label: &str) -> camino::Utf8PathBuf { @@ -1893,7 +3580,7 @@ mod tests { repo_root: root.clone(), worktree_root: root, worktree_name: Some(NonEmptyText::new("main")?), - commit_hash: fidget_spinner_core::GitCommitHash::new("0123456789abcdef")?, + commit_hash: GitCommitHash::new("0123456789abcdef")?, }, }), })?; @@ -1914,7 +3601,7 @@ mod tests { class: NodeClass::Note, frontier_id: None, title: NonEmptyText::new("quick note")?, - summary: None, + summary: Some(NonEmptyText::new("quick note summary")?), tags: Some(BTreeSet::new()), payload: NodePayload::with_schema( store.schema().schema_ref(), @@ -1987,7 +3674,7 @@ mod tests { class: NodeClass::Note, frontier_id: None, title: NonEmptyText::new("quick note")?, - summary: None, + summary: Some(NonEmptyText::new("quick note summary")?), tags: None, payload: NodePayload::with_schema( store.schema().schema_ref(), @@ -2021,7 +3708,7 @@ mod tests { class: NodeClass::Note, frontier_id: None, title: NonEmptyText::new("tagged note")?, - summary: None, + summary: Some(NonEmptyText::new("tagged note summary")?), tags: Some(BTreeSet::from([cuts.name.clone(), heuristics.name.clone()])), payload: NodePayload::with_schema( store.schema().schema_ref(), @@ -2044,4 +3731,506 @@ mod tests { assert_eq!(filtered[0].tags.len(), 2); Ok(()) } + + #[test] + fn prose_nodes_require_summary_and_body() -> Result<(), super::StoreError> { + let root = temp_project_root("prose-summary"); + let mut store = ProjectStore::init( + &root, 
+ NonEmptyText::new("test project")?, + NonEmptyText::new("local.test")?, + )?; + + let missing_summary = store.add_node(CreateNodeRequest { + class: NodeClass::Research, + frontier_id: None, + title: NonEmptyText::new("research note")?, + summary: None, + tags: None, + payload: NodePayload::with_schema( + store.schema().schema_ref(), + super::json_object(json!({"body": "research body"}))?, + ), + annotations: Vec::new(), + attachments: Vec::new(), + }); + assert!(matches!( + missing_summary, + Err(super::StoreError::ProseSummaryRequired(NodeClass::Research)) + )); + + let missing_body = store.add_node(CreateNodeRequest { + class: NodeClass::Note, + frontier_id: None, + title: NonEmptyText::new("quick note")?, + summary: Some(NonEmptyText::new("quick note summary")?), + tags: Some(BTreeSet::new()), + payload: NodePayload::with_schema(store.schema().schema_ref(), serde_json::Map::new()), + annotations: Vec::new(), + attachments: Vec::new(), + }); + assert!(matches!( + missing_body, + Err(super::StoreError::ProseBodyRequired(NodeClass::Note)) + )); + Ok(()) + } + + #[test] + fn opening_store_backfills_missing_prose_summaries() -> Result<(), super::StoreError> { + let root = temp_project_root("summary-backfill"); + let mut store = ProjectStore::init( + &root, + NonEmptyText::new("test project")?, + NonEmptyText::new("local.test")?, + )?; + let node = store.add_node(CreateNodeRequest { + class: NodeClass::Research, + frontier_id: None, + title: NonEmptyText::new("research note")?, + summary: Some(NonEmptyText::new("temporary summary")?), + tags: None, + payload: NodePayload::with_schema( + store.schema().schema_ref(), + super::json_object(json!({"body": "First paragraph.\n\nSecond paragraph."}))?, + ), + annotations: Vec::new(), + attachments: Vec::new(), + })?; + drop(store); + + let connection = rusqlite::Connection::open( + root.join(super::STORE_DIR_NAME) + .join(super::STATE_DB_NAME) + .as_std_path(), + )?; + let _ = connection.execute( + "UPDATE nodes SET summary 
= NULL WHERE id = ?1", + rusqlite::params![node.id.to_string()], + )?; + drop(connection); + + let reopened = ProjectStore::open(&root)?; + let loaded = reopened + .get_node(node.id)? + .ok_or(super::StoreError::NodeNotFound(node.id))?; + assert_eq!( + loaded.summary.as_ref().map(NonEmptyText::as_str), + Some("First paragraph.") + ); + Ok(()) + } + + #[test] + fn schema_field_upsert_remove_persists_and_bumps_version() -> Result<(), super::StoreError> { + let root = temp_project_root("schema-upsert-remove"); + let mut store = ProjectStore::init( + &root, + NonEmptyText::new("test project")?, + NonEmptyText::new("local.test")?, + )?; + let initial_version = store.schema().version; + + let field = store.upsert_schema_field(UpsertSchemaFieldRequest { + name: NonEmptyText::new("scenario")?, + node_classes: BTreeSet::from([NodeClass::Change, NodeClass::Analysis]), + presence: FieldPresence::Recommended, + severity: DiagnosticSeverity::Warning, + role: FieldRole::ProjectionGate, + inference_policy: InferencePolicy::ManualOnly, + value_type: Some(FieldValueType::String), + })?; + assert_eq!(field.name.as_str(), "scenario"); + assert_eq!(store.schema().version, initial_version + 1); + assert!( + store + .schema() + .fields + .iter() + .any(|item| item.name.as_str() == "scenario") + ); + drop(store); + + let mut reopened = ProjectStore::open(&root)?; + assert_eq!(reopened.schema().version, initial_version + 1); + assert!( + reopened + .schema() + .fields + .iter() + .any(|item| item.name.as_str() == "scenario") + ); + + let removed = reopened.remove_schema_field(RemoveSchemaFieldRequest { + name: NonEmptyText::new("scenario")?, + node_classes: Some(BTreeSet::from([NodeClass::Change, NodeClass::Analysis])), + })?; + assert_eq!(removed, 1); + assert_eq!(reopened.schema().version, initial_version + 2); + assert!( + !reopened + .schema() + .fields + .iter() + .any(|item| item.name.as_str() == "scenario") + ); + Ok(()) + } + + #[test] + fn 
metric_queries_surface_canonical_and_payload_numeric_fields() -> Result<(), super::StoreError> + { + let root = temp_project_root("metric-best"); + let mut store = ProjectStore::init( + &root, + NonEmptyText::new("test project")?, + NonEmptyText::new("local.test")?, + )?; + let projection = store.create_frontier(CreateFrontierRequest { + label: NonEmptyText::new("optimization frontier")?, + contract_title: NonEmptyText::new("contract root")?, + contract_summary: None, + contract: FrontierContract { + objective: NonEmptyText::new("improve wall time")?, + evaluation: EvaluationProtocol { + benchmark_suites: BTreeSet::from([NonEmptyText::new("smoke")?]), + primary_metric: MetricSpec { + metric_key: NonEmptyText::new("wall_clock_s")?, + unit: MetricUnit::Seconds, + objective: OptimizationObjective::Minimize, + }, + supporting_metrics: BTreeSet::new(), + }, + promotion_criteria: vec![NonEmptyText::new("strict speedup")?], + }, + initial_checkpoint: Some(super::CheckpointSeed { + summary: NonEmptyText::new("seed")?, + snapshot: checkpoint_snapshot(&root, "aaaaaaaaaaaaaaaa")?, + }), + })?; + let frontier_id = projection.frontier.id; + let base_checkpoint_id = projection + .champion_checkpoint_id + .ok_or_else(|| super::StoreError::MissingChampionCheckpoint { frontier_id })?; + let _ = store.define_metric(DefineMetricRequest { + key: NonEmptyText::new("wall_clock_s")?, + unit: MetricUnit::Seconds, + objective: OptimizationObjective::Minimize, + description: Some(NonEmptyText::new("elapsed wall time")?), + })?; + let _ = store.define_run_dimension(DefineRunDimensionRequest { + key: NonEmptyText::new("scenario")?, + value_type: FieldValueType::String, + description: Some(NonEmptyText::new("workload family")?), + })?; + let _ = store.define_run_dimension(DefineRunDimensionRequest { + key: NonEmptyText::new("duration_s")?, + value_type: FieldValueType::Numeric, + description: Some(NonEmptyText::new("time budget in seconds")?), + })?; + + let first_change = 
store.add_node(CreateNodeRequest { + class: NodeClass::Change, + frontier_id: Some(frontier_id), + title: NonEmptyText::new("first change")?, + summary: Some(NonEmptyText::new("first change summary")?), + tags: None, + payload: NodePayload::with_schema( + store.schema().schema_ref(), + super::json_object(json!({"body": "first body", "latency_hint": 14.0}))?, + ), + annotations: Vec::new(), + attachments: Vec::new(), + })?; + let second_change = store.add_node(CreateNodeRequest { + class: NodeClass::Change, + frontier_id: Some(frontier_id), + title: NonEmptyText::new("second change")?, + summary: Some(NonEmptyText::new("second change summary")?), + tags: None, + payload: NodePayload::with_schema( + store.schema().schema_ref(), + super::json_object(json!({"body": "second body", "latency_hint": 7.0}))?, + ), + annotations: Vec::new(), + attachments: Vec::new(), + })?; + + let first_receipt = store.close_experiment(experiment_request( + &root, + frontier_id, + base_checkpoint_id, + first_change.id, + "bbbbbbbbbbbbbbbb", + "first run", + 10.0, + run_dimensions("belt_4x5", 20.0)?, + )?)?; + let second_receipt = store.close_experiment(experiment_request( + &root, + frontier_id, + base_checkpoint_id, + second_change.id, + "cccccccccccccccc", + "second run", + 5.0, + run_dimensions("belt_4x5", 60.0)?, + )?)?; + + let keys = store.list_metric_keys()?; + assert!(keys.iter().any(|key| { + key.key.as_str() == "wall_clock_s" && key.source == MetricFieldSource::RunMetric + })); + assert!(keys.iter().any(|key| { + key.key.as_str() == "latency_hint" && key.source == MetricFieldSource::ChangePayload + })); + assert!(keys.iter().any(|key| { + key.key.as_str() == "wall_clock_s" + && key.source == MetricFieldSource::RunMetric + && key.description.as_ref().map(NonEmptyText::as_str) == Some("elapsed wall time") + })); + + let filtered_keys = store.list_metric_keys_filtered(MetricKeyQuery { + frontier_id: Some(frontier_id), + source: Some(MetricFieldSource::RunMetric), + dimensions: 
run_dimensions("belt_4x5", 60.0)?, + })?; + assert_eq!(filtered_keys.len(), 1); + assert_eq!(filtered_keys[0].experiment_count, 1); + + let dimension_summaries = store.list_run_dimensions()?; + assert!(dimension_summaries.iter().any(|dimension| { + dimension.key.as_str() == "benchmark_suite" + && dimension.value_type == FieldValueType::String + && dimension.observed_run_count == 2 + })); + assert!(dimension_summaries.iter().any(|dimension| { + dimension.key.as_str() == "scenario" + && dimension.description.as_ref().map(NonEmptyText::as_str) + == Some("workload family") + })); + assert!(dimension_summaries.iter().any(|dimension| { + dimension.key.as_str() == "duration_s" + && dimension.value_type == FieldValueType::Numeric + && dimension.distinct_value_count == 2 + })); + + let canonical_best = store.best_metrics(MetricBestQuery { + key: NonEmptyText::new("wall_clock_s")?, + frontier_id: Some(frontier_id), + source: Some(MetricFieldSource::RunMetric), + dimensions: run_dimensions("belt_4x5", 60.0)?, + order: None, + limit: 5, + })?; + assert_eq!(canonical_best.len(), 1); + assert_eq!(canonical_best[0].value, 5.0); + assert_eq!( + canonical_best[0].candidate_checkpoint_id, + second_receipt.checkpoint.id + ); + assert_eq!( + canonical_best[0] + .dimensions + .get(&NonEmptyText::new("duration_s")?), + Some(&RunDimensionValue::Numeric(60.0)) + ); + + let payload_best = store.best_metrics(MetricBestQuery { + key: NonEmptyText::new("latency_hint")?, + frontier_id: Some(frontier_id), + source: Some(MetricFieldSource::ChangePayload), + dimensions: run_dimensions("belt_4x5", 60.0)?, + order: Some(MetricRankOrder::Asc), + limit: 5, + })?; + assert_eq!(payload_best.len(), 1); + assert_eq!(payload_best[0].value, 7.0); + assert_eq!(payload_best[0].change_node_id, second_change.id); + + let missing_order = store.best_metrics(MetricBestQuery { + key: NonEmptyText::new("latency_hint")?, + frontier_id: Some(frontier_id), + source: Some(MetricFieldSource::ChangePayload), + 
dimensions: BTreeMap::new(), + order: None, + limit: 5, + }); + assert!(matches!( + missing_order, + Err(super::StoreError::MetricOrderRequired { .. }) + )); + assert_eq!( + first_receipt.checkpoint.snapshot.commit_hash.as_str(), + "bbbbbbbbbbbbbbbb" + ); + Ok(()) + } + + #[test] + fn opening_store_backfills_legacy_benchmark_suite_dimensions() -> Result<(), super::StoreError> + { + let root = temp_project_root("metric-plane-backfill"); + let mut store = ProjectStore::init( + &root, + NonEmptyText::new("test project")?, + NonEmptyText::new("local.test")?, + )?; + let projection = store.create_frontier(CreateFrontierRequest { + label: NonEmptyText::new("migration frontier")?, + contract_title: NonEmptyText::new("migration contract")?, + contract_summary: None, + contract: FrontierContract { + objective: NonEmptyText::new("exercise metric migration")?, + evaluation: EvaluationProtocol { + benchmark_suites: BTreeSet::from([NonEmptyText::new("smoke")?]), + primary_metric: MetricSpec { + metric_key: NonEmptyText::new("wall_clock_s")?, + unit: MetricUnit::Seconds, + objective: OptimizationObjective::Minimize, + }, + supporting_metrics: BTreeSet::new(), + }, + promotion_criteria: vec![NonEmptyText::new("keep the metric plane queryable")?], + }, + initial_checkpoint: Some(super::CheckpointSeed { + summary: NonEmptyText::new("seed")?, + snapshot: checkpoint_snapshot(&root, "aaaaaaaaaaaaaaaa")?, + }), + })?; + let frontier_id = projection.frontier.id; + let base_checkpoint_id = projection + .champion_checkpoint_id + .ok_or_else(|| super::StoreError::MissingChampionCheckpoint { frontier_id })?; + let change = store.add_node(CreateNodeRequest { + class: NodeClass::Change, + frontier_id: Some(frontier_id), + title: NonEmptyText::new("candidate change")?, + summary: Some(NonEmptyText::new("candidate change summary")?), + tags: None, + payload: NodePayload::with_schema( + store.schema().schema_ref(), + super::json_object(json!({"latency_hint": 9.0}))?, + ), + annotations: 
Vec::new(), + attachments: Vec::new(), + })?; + let _ = store.close_experiment(experiment_request( + &root, + frontier_id, + base_checkpoint_id, + change.id, + "bbbbbbbbbbbbbbbb", + "migration run", + 11.0, + BTreeMap::from([( + NonEmptyText::new("benchmark_suite")?, + RunDimensionValue::String(NonEmptyText::new("smoke")?), + )]), + )?)?; + drop(store); + + let connection = rusqlite::Connection::open( + root.join(super::STORE_DIR_NAME) + .join(super::STATE_DB_NAME) + .as_std_path(), + )?; + let _ = connection.execute("DELETE FROM run_dimensions", [])?; + drop(connection); + + let reopened = ProjectStore::open(&root)?; + let dimensions = reopened.list_run_dimensions()?; + assert!(dimensions.iter().any(|dimension| { + dimension.key.as_str() == "benchmark_suite" && dimension.observed_run_count == 1 + })); + + let best = reopened.best_metrics(MetricBestQuery { + key: NonEmptyText::new("wall_clock_s")?, + frontier_id: Some(frontier_id), + source: Some(MetricFieldSource::RunMetric), + dimensions: BTreeMap::from([( + NonEmptyText::new("benchmark_suite")?, + RunDimensionValue::String(NonEmptyText::new("smoke")?), + )]), + order: None, + limit: 5, + })?; + assert_eq!(best.len(), 1); + assert_eq!(best[0].value, 11.0); + Ok(()) + } + + fn checkpoint_snapshot( + root: &camino::Utf8Path, + commit: &str, + ) -> Result<CheckpointSnapshotRef, super::StoreError> { + Ok(CheckpointSnapshotRef { + repo_root: root.to_path_buf(), + worktree_root: root.to_path_buf(), + worktree_name: Some(NonEmptyText::new("main")?), + commit_hash: GitCommitHash::new(commit)?, + }) + } + + fn experiment_request( + root: &camino::Utf8Path, + frontier_id: fidget_spinner_core::FrontierId, + base_checkpoint_id: fidget_spinner_core::CheckpointId, + change_node_id: fidget_spinner_core::NodeId, + candidate_commit: &str, + run_title: &str, + wall_clock_s: f64, + dimensions: BTreeMap<NonEmptyText, RunDimensionValue>, + ) -> Result<CloseExperimentRequest, super::StoreError> { + Ok(CloseExperimentRequest { + 
frontier_id, + base_checkpoint_id, + change_node_id, + candidate_summary: NonEmptyText::new(format!("candidate {candidate_commit}"))?, + candidate_snapshot: checkpoint_snapshot(root, candidate_commit)?, + run_title: NonEmptyText::new(run_title)?, + run_summary: Some(NonEmptyText::new("run summary")?), + backend: fidget_spinner_core::ExecutionBackend::WorktreeProcess, + dimensions, + command: CommandRecipe::new( + root.to_path_buf(), + vec![NonEmptyText::new("true")?], + BTreeMap::new(), + )?, + code_snapshot: None, + primary_metric: MetricValue { + key: NonEmptyText::new("wall_clock_s")?, + value: wall_clock_s, + }, + supporting_metrics: Vec::new(), + note: FrontierNote { + summary: NonEmptyText::new("note summary")?, + next_hypotheses: Vec::new(), + }, + verdict: FrontierVerdict::KeepOnFrontier, + decision_title: NonEmptyText::new("decision")?, + decision_rationale: NonEmptyText::new("decision rationale")?, + analysis_node_id: None, + }) + } + + fn run_dimensions( + scenario: &str, + duration_s: f64, + ) -> Result<BTreeMap<NonEmptyText, RunDimensionValue>, super::StoreError> { + Ok(BTreeMap::from([ + ( + NonEmptyText::new("benchmark_suite")?, + RunDimensionValue::String(NonEmptyText::new("smoke")?), + ), + ( + NonEmptyText::new("scenario")?, + RunDimensionValue::String(NonEmptyText::new(scenario)?), + ), + ( + NonEmptyText::new("duration_s")?, + RunDimensionValue::Numeric(duration_s), + ), + ])) + } } diff --git a/docs/architecture.md b/docs/architecture.md index 5db53fb..37f5c55 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -320,7 +320,7 @@ This projection is derived from canonical state and intentionally rebuildable. These are intentionally cheap: - `note.quick`, but only with explicit tags from the repo-local registry -- `research.record` +- `research.record`, optionally tagged into the same repo-local taxonomy - generic `node.create` for escape-hatch use - `node.annotate` @@ -351,6 +351,13 @@ framing. The public server is a stable host. 
It owns initialization state, replay policy, telemetry, and host rollout. Execution happens in a disposable worker subprocess. +Presentation is orthogonal to payload detail: + +- `render=porcelain|json` +- `detail=concise|full` + +Porcelain is the terse model-facing surface, not a pretty-printed JSON dump. + ### Host responsibilities - own the public JSON-RPC session @@ -436,6 +443,8 @@ Implemented tools: - `project.bind` - `project.status` - `project.schema` +- `schema.field.upsert` +- `schema.field.remove` - `tag.add` - `tag.list` - `frontier.list` @@ -449,6 +458,12 @@ Implemented tools: - `node.archive` - `note.quick` - `research.record` +- `metric.define` +- `metric.keys` +- `metric.best` +- `metric.migrate` +- `run.dimension.define` +- `run.dimension.list` - `experiment.close` - `skill.list` - `skill.show` @@ -464,7 +479,9 @@ Implemented resources: ### Operational tools -`system.health` returns a typed operational snapshot: +`system.health` returns a typed operational snapshot. Concise/default output +stays on immediate session state; full detail widens to the entire health +object: - initialization state - binding state @@ -472,7 +489,7 @@ Implemented resources: - current executable path - launch-path stability - rollout-pending state -- last recorded fault +- last recorded fault in full detail `system.telemetry` returns cumulative counters: @@ -482,6 +499,7 @@ Implemented resources: - retries - worker restarts - host rollouts +- last recorded fault - per-operation counts and last latencies ### Rollout model @@ -507,6 +525,8 @@ Current commands: - `init` - `schema show` +- `schema upsert-field` +- `schema remove-field` - `frontier init` - `frontier status` - `node add` @@ -515,16 +535,29 @@ Current commands: - `node annotate` - `node archive` - `note quick` +- `tag add` +- `tag list` - `research add` +- `metric define` +- `metric keys` +- `metric best` +- `metric migrate` +- `dimension define` +- `dimension list` - `experiment close` - `mcp serve` +- `ui 
serve` - hidden internal `mcp worker` - `skill list` - `skill install` - `skill show` The CLI is not the strategic write plane, but it is the easiest repair and -bootstrap surface. +bootstrap surface. Its naming is intentionally parallel but not identical to +the MCP surface: + +- CLI subcommands use spaces such as `schema upsert-field` and `dimension define` +- MCP tools use dotted names such as `schema.field.upsert` and `run.dimension.define` ## Bundled Skill
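As a concrete sketch of the CLI-vs-MCP naming split described in the CLI section: the flags below mirror the README examples added in this change, and the dotted names are only labels for the corresponding MCP tools — the actual tool invocation syntax depends on the MCP client.

```bash
# CLI surface: space-separated subcommands, flags as in the README examples.
cargo run -p fidget-spinner-cli -- schema upsert-field \
  --project . \
  --name scenario \
  --class change \
  --class analysis \
  --presence recommended \
  --severity warning \
  --role projection-gate \
  --inference manual-only \
  --type string

cargo run -p fidget-spinner-cli -- dimension define \
  --project . \
  --key scenario \
  --type string \
  --description "workload family"

# MCP surface: the same operations are addressed by the dotted tool names
# `schema.field.upsert` and `run.dimension.define`; how a tool call is
# expressed on the wire is up to the MCP client.
```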