-rw-r--r--  README.md                                          |   99
-rw-r--r--  assets/codex-skills/fidget-spinner/SKILL.md        |   18
-rw-r--r--  crates/fidget-spinner-cli/src/main.rs              |  533
-rw-r--r--  crates/fidget-spinner-cli/src/mcp/catalog.rs       |  209
-rw-r--r--  crates/fidget-spinner-cli/src/mcp/host/runtime.rs  |   22
-rw-r--r--  crates/fidget-spinner-cli/src/mcp/service.rs       |  925
-rw-r--r--  crates/fidget-spinner-cli/src/mcp/telemetry.rs     |    1
-rw-r--r--  crates/fidget-spinner-cli/tests/mcp_hardening.rs   | 1040
-rw-r--r--  crates/fidget-spinner-core/src/lib.rs              |    7
-rw-r--r--  crates/fidget-spinner-core/src/model.rs            |  144
-rw-r--r--  crates/fidget-spinner-store-sqlite/src/lib.rs      | 2257
-rw-r--r--  docs/architecture.md                               |   41
12 files changed, 5083 insertions, 213 deletions
diff --git a/README.md b/README.md
index 4540fcd..9e30b28 100644
--- a/README.md
+++ b/README.md
@@ -63,6 +63,47 @@ cargo run -p fidget-spinner-cli -- frontier init \
--primary-metric-objective maximize
```
+Register the project-level metric and run-dimension vocabulary before recording
+experiments in bulk:
+
+```bash
+cargo run -p fidget-spinner-cli -- schema upsert-field \
+ --project . \
+ --name scenario \
+ --class change \
+ --class analysis \
+ --presence recommended \
+ --severity warning \
+ --role projection-gate \
+ --inference manual-only \
+ --type string
+```
+
+```bash
+cargo run -p fidget-spinner-cli -- metric define \
+ --project . \
+ --key wall_clock_s \
+ --unit seconds \
+ --objective minimize \
+ --description "elapsed wall time"
+```
+
+```bash
+cargo run -p fidget-spinner-cli -- dimension define \
+ --project . \
+ --key scenario \
+ --type string \
+ --description "workload family"
+```
+
+```bash
+cargo run -p fidget-spinner-cli -- dimension define \
+ --project . \
+ --key duration_s \
+ --type numeric \
+ --description "time budget in seconds"
+```
+
Record low-ceremony off-path work:
```bash
@@ -76,17 +117,33 @@ cargo run -p fidget-spinner-cli -- tag add \
cargo run -p fidget-spinner-cli -- research add \
--project . \
--title "next feature slate" \
- --body "Investigate pruning, richer projections, and libgrid schema presets."
+ --summary "Investigate the next tranche of high-value product work." \
+ --body "Investigate pruning, richer projections, and libgrid schema presets." \
+ --tag dogfood/mvp
```
```bash
cargo run -p fidget-spinner-cli -- note quick \
--project . \
--title "first tagged note" \
+ --summary "Tag-aware note capture is live." \
--body "Tag-aware note capture is live." \
--tag dogfood/mvp
```
+```bash
+cargo run -p fidget-spinner-cli -- metric keys --project .
+```
+
+```bash
+cargo run -p fidget-spinner-cli -- metric best \
+ --project . \
+ --key wall_clock_s \
+ --dimension scenario=belt_4x5 \
+ --dimension duration_s=60 \
+ --source run-metric
+```
+
Serve the local MCP surface in unbound mode:
```bash
@@ -145,6 +202,8 @@ The current MCP tools are:
- `project.bind`
- `project.status`
- `project.schema`
+- `schema.field.upsert`
+- `schema.field.remove`
- `tag.add`
- `tag.list`
- `frontier.list`
@@ -158,10 +217,22 @@ The current MCP tools are:
- `node.archive`
- `note.quick`
- `research.record`
+- `metric.define`
+- `metric.keys`
+- `metric.best`
+- `metric.migrate`
+- `run.dimension.define`
+- `run.dimension.list`
- `experiment.close`
- `skill.list`
- `skill.show`
+Nontrivial MCP tools follow the shared presentation contract:
+
+- `render=porcelain|json` chooses terse text vs structured JSON rendering
+- `detail=concise|full` chooses triage payload vs widened detail
+- porcelain is the default and is intentionally not just pretty-printed JSON
+
Operationally, the MCP now runs as a stable host process that owns the public
JSON-RPC session and delegates tool execution to an internal worker subprocess.
Safe replay is only allowed for explicitly read-only operations and resources.
@@ -172,6 +243,31 @@ created with `tag.add`, each with a required human description. `note.quick`
accepts `tags: []` when no existing tag applies, but the field itself is still
mandatory so note classification is always conscious.
+`research.record` now also accepts optional `tags`, so rich imported documents
+can join the same campaign/subsystem index as terse notes without falling back
+to the generic escape hatch.
+
+`note.quick`, `research.record`, and generic `node create` for `note`/`research`
+now enforce the same strict prose split: `title` is terse identity, `summary`
+is the triage/search layer, and `body` holds the full text. List-like surfaces
+stay on `title` + `summary`; full prose is for explicit reads only.
+
+Schema authoring no longer has to happen by hand in `.fidget_spinner/schema.json`.
+The CLI exposes `schema upsert-field` / `schema remove-field`, and the MCP
+surface exposes the corresponding `schema.field.upsert` / `schema.field.remove`
+tools. The CLI uses space-separated subcommands; the MCP uses dotted tool names.
+
+Metrics and run dimensions are now project-level registries. Frontier contracts
+still declare the evaluation metric vocabulary, but closed experiments report
+only thin `key=value` metrics plus typed run dimensions. `metric.define` can
+enrich metric descriptions, CLI `dimension define` / MCP `run.dimension.define`
+preregister slicers such as `scenario` or `duration_s`, `metric.keys`
+discovers rankable numeric signals, and `metric.best` ranks one key within
+optional exact dimension filters.
+Legacy `benchmark_suite` data is normalized into a built-in string dimension on
+store open, and `metric.migrate` can be invoked explicitly as an idempotent
+repair pass.
+
The intended flow is:
1. inspect `system.health`
@@ -190,6 +286,7 @@ Off-path work does not require git. You can initialize a local project and use:
- `research add`
- `tag add`
- `note quick`
+- `metric keys`
- `node annotate`
- `mcp serve`
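The README changes above move experiment closure to a thin `key=value` metric contract (`--primary-metric wall_clock_s=12.5`, repeated `--metric` flags). A minimal standalone sketch of that parsing rule follows; `MetricValue` and `parse_metric_value` mirror names from the diff but are reimplemented here for illustration, not the project's actual code.

```rust
// Sketch of the `key=value` metric contract: keys are preregistered via
// `metric define`, so closure-time parsing only has to split key from value.

#[derive(Debug, PartialEq)]
struct MetricValue {
    key: String,
    value: f64,
}

fn parse_metric_value(raw: &str) -> Result<MetricValue, String> {
    // Split exactly once on `=`; everything before is the key.
    let (key, value) = raw
        .split_once('=')
        .ok_or_else(|| "metrics must look like key=value".to_string())?;
    if key.trim().is_empty() {
        return Err("metric key must be non-empty".to_string());
    }
    let value = value
        .parse::<f64>()
        .map_err(|error| format!("invalid metric value: {error}"))?;
    Ok(MetricValue { key: key.to_owned(), value })
}

fn main() {
    assert_eq!(
        parse_metric_value("wall_clock_s=12.5").unwrap(),
        MetricValue { key: "wall_clock_s".to_owned(), value: 12.5 }
    );
    assert!(parse_metric_value("no_equals_sign").is_err());
    println!("ok");
}
```

The same `split_once('=')` shape is reused by the `--dimension` flags, so one malformed-input error message covers both surfaces.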
diff --git a/assets/codex-skills/fidget-spinner/SKILL.md b/assets/codex-skills/fidget-spinner/SKILL.md
index e4cb7b7..1e4c2a3 100644
--- a/assets/codex-skills/fidget-spinner/SKILL.md
+++ b/assets/codex-skills/fidget-spinner/SKILL.md
@@ -43,8 +43,15 @@ If you need more context, pull it from:
- `tag.add` when a new note taxonomy token is genuinely needed; every tag must carry a description
- `tag.list` before inventing note tags by memory
-- `research.record` for exploratory work, design notes, dead ends, and enabling ideas
-- `note.quick` for terse state pushes, always with an explicit `tags` list; use `[]` only when no registered tag applies
+- `schema.field.upsert` when one project payload field needs to become canonical without hand-editing `schema.json`
+- `schema.field.remove` when one project payload field definition should be purged cleanly
+- `research.record` for exploratory work, design notes, dead ends, and enabling ideas; always pass `title`, `summary`, and `body`, and pass `tags` when the research belongs in a campaign/subsystem index
+- `note.quick` for terse state pushes, always with an explicit `tags` list plus `title`, `summary`, and `body`; use `[]` only when no registered tag applies
+- `metric.define` when a project-level metric key needs a canonical unit, objective, or human description
+- `run.dimension.define` when a new experiment slicer such as `scenario` or `duration_s` becomes query-worthy
+- `run.dimension.list` before guessing which run dimensions actually exist in the store
+- `metric.keys` before guessing which numeric signals are actually rankable; pass exact run-dimension filters when narrowing to one workload slice
+- `metric.best` when you need the best closed experiments by one numeric key; pass `order` for noncanonical payload fields and exact run-dimension filters when comparing one slice
- `node.annotate` for scratch text that should stay off the main path
- `change.record` before core-path work
- `experiment.close` only when you have checkpoint, measured result, note, and verdict
@@ -60,5 +67,10 @@ If you need more context, pull it from:
and `system.telemetry` before pushing further.
5. Keep fetches narrow by default; widen only when stale or archived context is
actually needed.
-6. When the task becomes a true indefinite optimization push, pair this skill
+6. Treat metric keys as project-level registry entries and run dimensions as the
+ first-class slice surface for experiment comparison; do not encode scenario
+ context into the metric key itself.
+7. Porcelain is the terse triage surface. Use `detail=full` only when concise
+ output stops being decision-sufficient.
+8. When the task becomes a true indefinite optimization push, pair this skill
with `frontier-loop`.
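The SKILL.md guidance above says to pass `order` to `metric.best` only "for noncanonical payload fields" whose objective cannot be inferred. A plausible reading of that rule, sketched here as a guess (the names `Objective`, `Order`, and `effective_order` are hypothetical, not from the diff): an explicit order always wins, a registered objective implies a direction, and neither is an error.

```rust
// Hypothetical ranking-direction rule for a `metric.best`-style query:
// minimize => ascending, maximize => descending, explicit order overrides.

enum Objective {
    Minimize,
    Maximize,
}

#[derive(Debug, PartialEq)]
enum Order {
    Asc,
    Desc,
}

fn effective_order(objective: Option<Objective>, explicit: Option<Order>) -> Result<Order, String> {
    match (explicit, objective) {
        (Some(order), _) => Ok(order),
        (None, Some(Objective::Minimize)) => Ok(Order::Asc),
        (None, Some(Objective::Maximize)) => Ok(Order::Desc),
        (None, None) => Err("pass `order` when the objective cannot be inferred".to_string()),
    }
}

fn main() {
    assert_eq!(effective_order(Some(Objective::Minimize), None).unwrap(), Order::Asc);
    assert_eq!(effective_order(None, Some(Order::Desc)).unwrap(), Order::Desc);
    assert!(effective_order(None, None).is_err());
    println!("ok");
}
```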
diff --git a/crates/fidget-spinner-cli/src/main.rs b/crates/fidget-spinner-cli/src/main.rs
index fe4cb5f..3ad9534 100644
--- a/crates/fidget-spinner-cli/src/main.rs
+++ b/crates/fidget-spinner-cli/src/main.rs
@@ -10,13 +10,16 @@ use std::path::{Path, PathBuf};
use camino::{Utf8Path, Utf8PathBuf};
use clap::{Args, Parser, Subcommand, ValueEnum};
use fidget_spinner_core::{
- AnnotationVisibility, CodeSnapshotRef, CommandRecipe, ExecutionBackend, FrontierContract,
- FrontierNote, FrontierVerdict, GitCommitHash, MetricObservation, MetricSpec, MetricUnit,
- NodeAnnotation, NodeClass, NodePayload, NonEmptyText, OptimizationObjective, TagName,
+ AnnotationVisibility, CodeSnapshotRef, CommandRecipe, DiagnosticSeverity, ExecutionBackend,
+ FieldPresence, FieldRole, FieldValueType, FrontierContract, FrontierNote, FrontierVerdict,
+ GitCommitHash, InferencePolicy, MetricSpec, MetricUnit, MetricValue, NodeAnnotation, NodeClass,
+ NodePayload, NonEmptyText, OptimizationObjective, ProjectFieldSpec, TagName,
};
use fidget_spinner_store_sqlite::{
- CloseExperimentRequest, CreateFrontierRequest, CreateNodeRequest, EdgeAttachment,
- EdgeAttachmentDirection, ListNodesQuery, ProjectStore, StoreError,
+ CloseExperimentRequest, CreateFrontierRequest, CreateNodeRequest, DefineMetricRequest,
+ DefineRunDimensionRequest, EdgeAttachment, EdgeAttachmentDirection, ListNodesQuery,
+ MetricBestQuery, MetricFieldSource, MetricKeyQuery, MetricRankOrder, ProjectStore,
+ RemoveSchemaFieldRequest, StoreError, UpsertSchemaFieldRequest,
};
use serde::Serialize;
use serde_json::{Map, Value, json};
@@ -61,6 +64,16 @@ enum Command {
},
/// Record off-path research and enabling work.
Research(ResearchCommand),
+ /// Inspect rankable metrics across closed experiments.
+ Metric {
+ #[command(subcommand)]
+ command: MetricCommand,
+ },
+ /// Define and inspect run dimensions used to slice experiment metrics.
+ Dimension {
+ #[command(subcommand)]
+ command: DimensionCommand,
+ },
/// Close a core-path experiment atomically.
Experiment {
#[command(subcommand)]
@@ -100,6 +113,10 @@ struct InitArgs {
enum SchemaCommand {
/// Show the current project schema as JSON.
Show(ProjectArg),
+ /// Add or replace one project schema field definition.
+ UpsertField(SchemaFieldUpsertArgs),
+ /// Remove one project schema field definition.
+ RemoveField(SchemaFieldRemoveArgs),
}
#[derive(Subcommand)]
@@ -169,8 +186,10 @@ struct NodeAddArgs {
#[arg(long)]
title: String,
#[arg(long)]
+ /// Required for `note` and `research` nodes.
summary: Option<String>,
#[arg(long = "payload-json")]
+ /// JSON object payload. `note` and `research` nodes require a non-empty `body` string.
payload_json: Option<String>,
#[arg(long = "payload-file")]
payload_file: Option<PathBuf>,
@@ -270,6 +289,74 @@ enum ResearchSubcommand {
Add(QuickResearchArgs),
}
+#[derive(Subcommand)]
+enum MetricCommand {
+ /// Register a project-level metric definition.
+ Define(MetricDefineArgs),
+ /// List rankable numeric keys observed in completed experiments.
+ Keys(MetricKeysArgs),
+ /// Rank completed experiments by one numeric key.
+ Best(MetricBestArgs),
+ /// Re-run the idempotent legacy metric-plane normalization.
+ Migrate(ProjectArg),
+}
+
+#[derive(Subcommand)]
+enum DimensionCommand {
+ /// Register a project-level run dimension definition.
+ Define(DimensionDefineArgs),
+ /// List run dimensions and sample values observed in completed runs.
+ List(ProjectArg),
+}
+
+#[derive(Args)]
+struct MetricDefineArgs {
+ #[command(flatten)]
+ project: ProjectArg,
+ /// Metric key used in experiment closure and ranking.
+ #[arg(long)]
+ key: String,
+ /// Canonical unit for this metric key.
+ #[arg(long, value_enum)]
+ unit: CliMetricUnit,
+ /// Optimization direction for this metric key.
+ #[arg(long, value_enum)]
+ objective: CliOptimizationObjective,
+ /// Optional human description shown in metric listings.
+ #[arg(long)]
+ description: Option<String>,
+}
+
+#[derive(Args)]
+struct MetricKeysArgs {
+ #[command(flatten)]
+ project: ProjectArg,
+ /// Restrict results to one frontier.
+ #[arg(long)]
+ frontier: Option<String>,
+ /// Restrict results to one metric source.
+ #[arg(long, value_enum)]
+ source: Option<CliMetricSource>,
+ /// Exact run-dimension filter in the form `key=value`.
+ #[arg(long = "dimension")]
+ dimensions: Vec<String>,
+}
+
+#[derive(Args)]
+struct DimensionDefineArgs {
+ #[command(flatten)]
+ project: ProjectArg,
+ /// Run-dimension key used to slice experiments.
+ #[arg(long)]
+ key: String,
+ /// Canonical value type for this run dimension.
+ #[arg(long = "type", value_enum)]
+ value_type: CliFieldValueType,
+ /// Optional human description shown in dimension listings.
+ #[arg(long)]
+ description: Option<String>,
+}
+
#[derive(Args)]
struct QuickNoteArgs {
#[command(flatten)]
@@ -279,6 +366,8 @@ struct QuickNoteArgs {
#[arg(long)]
title: String,
#[arg(long)]
+ summary: String,
+ #[arg(long)]
body: String,
#[command(flatten)]
tag_selection: ExplicitTagSelectionArgs,
@@ -305,13 +394,69 @@ struct QuickResearchArgs {
#[arg(long)]
title: String,
#[arg(long)]
- body: String,
+ summary: String,
#[arg(long)]
- summary: Option<String>,
+ body: String,
+ #[command(flatten)]
+ tag_selection: ExplicitTagSelectionArgs,
#[arg(long = "parent")]
parents: Vec<String>,
}
+#[derive(Args)]
+struct SchemaFieldUpsertArgs {
+ #[command(flatten)]
+ project: ProjectArg,
+ #[arg(long)]
+ name: String,
+ #[arg(long = "class", value_enum)]
+ classes: Vec<CliNodeClass>,
+ #[arg(long, value_enum)]
+ presence: CliFieldPresence,
+ #[arg(long, value_enum)]
+ severity: CliDiagnosticSeverity,
+ #[arg(long, value_enum)]
+ role: CliFieldRole,
+ #[arg(long = "inference", value_enum)]
+ inference_policy: CliInferencePolicy,
+ #[arg(long = "type", value_enum)]
+ value_type: Option<CliFieldValueType>,
+}
+
+#[derive(Args)]
+struct SchemaFieldRemoveArgs {
+ #[command(flatten)]
+ project: ProjectArg,
+ #[arg(long)]
+ name: String,
+ #[arg(long = "class", value_enum)]
+ classes: Vec<CliNodeClass>,
+}
+
+#[derive(Args)]
+struct MetricBestArgs {
+ #[command(flatten)]
+ project: ProjectArg,
+ /// Metric key to rank on.
+ #[arg(long)]
+ key: String,
+ /// Restrict results to one frontier.
+ #[arg(long)]
+ frontier: Option<String>,
+ /// Restrict results to one metric source.
+ #[arg(long, value_enum)]
+ source: Option<CliMetricSource>,
+ /// Explicit ordering for sources whose objective cannot be inferred.
+ #[arg(long, value_enum)]
+ order: Option<CliMetricOrder>,
+ /// Exact run-dimension filter in the form `key=value`.
+ #[arg(long = "dimension")]
+ dimensions: Vec<String>,
+ /// Maximum number of ranked experiments to return.
+ #[arg(long, default_value_t = 10)]
+ limit: u32,
+}
+
#[derive(Subcommand)]
enum ExperimentCommand {
/// Close a core-path experiment with checkpoint, run, note, and verdict.
@@ -348,24 +493,23 @@ struct ExperimentCloseArgs {
run_title: String,
#[arg(long = "run-summary")]
run_summary: Option<String>,
- #[arg(long = "benchmark-suite")]
- benchmark_suite: String,
+ /// Repeat for each run dimension as `key=value`.
+ #[arg(long = "dimension")]
+ dimensions: Vec<String>,
#[arg(long = "backend", value_enum, default_value_t = CliExecutionBackend::Worktree)]
backend: CliExecutionBackend,
#[arg(long = "cwd")]
working_directory: Option<PathBuf>,
+ /// Repeat for each argv token passed to the recorded command.
#[arg(long = "argv")]
argv: Vec<String>,
+ /// Repeat for each environment override as `KEY=VALUE`.
#[arg(long = "env")]
env: Vec<String>,
- #[arg(long = "primary-metric-key")]
- primary_metric_key: String,
- #[arg(long = "primary-metric-unit", value_enum)]
- primary_metric_unit: CliMetricUnit,
- #[arg(long = "primary-metric-objective", value_enum)]
- primary_metric_objective: CliOptimizationObjective,
- #[arg(long = "primary-metric-value")]
- primary_metric_value: f64,
+ /// Primary metric in the form `key=value`; key must be preregistered.
+ #[arg(long = "primary-metric")]
+ primary_metric: String,
+ /// Supporting metric in the form `key=value`; repeat as needed.
#[arg(long = "metric")]
metrics: Vec<String>,
#[arg(long)]
@@ -475,6 +619,57 @@ enum CliExecutionBackend {
}
#[derive(Clone, Copy, Debug, Eq, PartialEq, ValueEnum)]
+enum CliMetricSource {
+ RunMetric,
+ ChangePayload,
+ RunPayload,
+ AnalysisPayload,
+ DecisionPayload,
+}
+
+#[derive(Clone, Copy, Debug, Eq, PartialEq, ValueEnum)]
+enum CliMetricOrder {
+ Asc,
+ Desc,
+}
+
+#[derive(Clone, Copy, Debug, Eq, PartialEq, ValueEnum)]
+enum CliFieldValueType {
+ String,
+ Numeric,
+ Boolean,
+ Timestamp,
+}
+
+#[derive(Clone, Copy, Debug, Eq, PartialEq, ValueEnum)]
+enum CliDiagnosticSeverity {
+ Error,
+ Warning,
+ Info,
+}
+
+#[derive(Clone, Copy, Debug, Eq, PartialEq, ValueEnum)]
+enum CliFieldPresence {
+ Required,
+ Recommended,
+ Optional,
+}
+
+#[derive(Clone, Copy, Debug, Eq, PartialEq, ValueEnum)]
+enum CliFieldRole {
+ Index,
+ ProjectionGate,
+ RenderOnly,
+ Opaque,
+}
+
+#[derive(Clone, Copy, Debug, Eq, PartialEq, ValueEnum)]
+enum CliInferencePolicy {
+ ManualOnly,
+ ModelMayInfer,
+}
+
+#[derive(Clone, Copy, Debug, Eq, PartialEq, ValueEnum)]
enum CliFrontierVerdict {
PromoteToChampion,
KeepOnFrontier,
@@ -499,6 +694,8 @@ fn run() -> Result<(), StoreError> {
let store = open_store(&project.project)?;
print_json(store.schema())
}
+ SchemaCommand::UpsertField(args) => run_schema_field_upsert(args),
+ SchemaCommand::RemoveField(args) => run_schema_field_remove(args),
},
Command::Frontier { command } => match command {
FrontierCommand::Init(args) => run_frontier_init(args),
@@ -521,6 +718,16 @@ fn run() -> Result<(), StoreError> {
Command::Research(command) => match command.command {
ResearchSubcommand::Add(args) => run_quick_research(args),
},
+ Command::Metric { command } => match command {
+ MetricCommand::Define(args) => run_metric_define(args),
+ MetricCommand::Keys(args) => run_metric_keys(args),
+ MetricCommand::Best(args) => run_metric_best(args),
+ MetricCommand::Migrate(project) => run_metric_migrate(project),
+ },
+ Command::Dimension { command } => match command {
+ DimensionCommand::Define(args) => run_dimension_define(args),
+ DimensionCommand::List(project) => run_dimension_list(project),
+ },
Command::Experiment { command } => match command {
ExperimentCommand::Close(args) => run_experiment_close(args),
},
@@ -597,27 +804,58 @@ fn run_frontier_status(args: FrontierStatusArgs) -> Result<(), StoreError> {
print_json(&frontiers)
}
+fn run_schema_field_upsert(args: SchemaFieldUpsertArgs) -> Result<(), StoreError> {
+ let mut store = open_store(&args.project.project)?;
+ let field = store.upsert_schema_field(UpsertSchemaFieldRequest {
+ name: NonEmptyText::new(args.name)?,
+ node_classes: parse_node_class_set(args.classes),
+ presence: args.presence.into(),
+ severity: args.severity.into(),
+ role: args.role.into(),
+ inference_policy: args.inference_policy.into(),
+ value_type: args.value_type.map(Into::into),
+ })?;
+ print_json(&json!({
+ "schema": store.schema().schema_ref(),
+ "field": schema_field_json(&field),
+ }))
+}
+
+fn run_schema_field_remove(args: SchemaFieldRemoveArgs) -> Result<(), StoreError> {
+ let mut store = open_store(&args.project.project)?;
+ let removed_count = store.remove_schema_field(RemoveSchemaFieldRequest {
+ name: NonEmptyText::new(args.name)?,
+ node_classes: (!args.classes.is_empty()).then(|| parse_node_class_set(args.classes)),
+ })?;
+ print_json(&json!({
+ "schema": store.schema().schema_ref(),
+ "removed_count": removed_count,
+ }))
+}
+
fn run_node_add(args: NodeAddArgs) -> Result<(), StoreError> {
let mut store = open_store(&args.project.project)?;
+ let class: NodeClass = args.class.into();
let frontier_id = args
.frontier
.as_deref()
.map(parse_frontier_id)
.transpose()?;
- let tags = optional_cli_tags(args.tag_selection, args.class == CliNodeClass::Note)?;
+ let tags = optional_cli_tags(args.tag_selection, class == NodeClass::Note)?;
let payload = load_payload(
store.schema().schema_ref(),
args.payload_json,
args.payload_file,
args.fields,
)?;
+ validate_cli_prose_payload(class, args.summary.as_deref(), &payload)?;
let annotations = args
.annotations
.into_iter()
.map(|body| Ok(NodeAnnotation::hidden(NonEmptyText::new(body)?)))
.collect::<Result<Vec<_>, StoreError>>()?;
let node = store.add_node(CreateNodeRequest {
- class: args.class.into(),
+ class,
frontier_id,
title: NonEmptyText::new(args.title)?,
summary: args.summary.map(NonEmptyText::new).transpose()?,
@@ -693,7 +931,7 @@ fn run_quick_note(args: QuickNoteArgs) -> Result<(), StoreError> {
.map(parse_frontier_id)
.transpose()?,
title: NonEmptyText::new(args.title)?,
- summary: None,
+ summary: Some(NonEmptyText::new(args.summary)?),
tags: Some(explicit_cli_tags(args.tag_selection)?),
payload,
annotations: Vec::new(),
@@ -730,8 +968,8 @@ fn run_quick_research(args: QuickResearchArgs) -> Result<(), StoreError> {
.map(parse_frontier_id)
.transpose()?,
title: NonEmptyText::new(args.title)?,
- summary: args.summary.map(NonEmptyText::new).transpose()?,
- tags: None,
+ summary: Some(NonEmptyText::new(args.summary)?),
+ tags: optional_cli_tags(args.tag_selection, false)?,
payload,
annotations: Vec::new(),
attachments: lineage_attachments(args.parents)?,
@@ -739,6 +977,69 @@ fn run_quick_research(args: QuickResearchArgs) -> Result<(), StoreError> {
print_json(&node)
}
+fn run_metric_define(args: MetricDefineArgs) -> Result<(), StoreError> {
+ let mut store = open_store(&args.project.project)?;
+ let record = store.define_metric(DefineMetricRequest {
+ key: NonEmptyText::new(args.key)?,
+ unit: args.unit.into(),
+ objective: args.objective.into(),
+ description: args.description.map(NonEmptyText::new).transpose()?,
+ })?;
+ print_json(&record)
+}
+
+fn run_metric_keys(args: MetricKeysArgs) -> Result<(), StoreError> {
+ let store = open_store(&args.project.project)?;
+ print_json(
+ &store.list_metric_keys_filtered(MetricKeyQuery {
+ frontier_id: args
+ .frontier
+ .as_deref()
+ .map(parse_frontier_id)
+ .transpose()?,
+ source: args.source.map(Into::into),
+ dimensions: coerce_cli_dimension_filters(&store, args.dimensions)?,
+ })?,
+ )
+}
+
+fn run_metric_best(args: MetricBestArgs) -> Result<(), StoreError> {
+ let store = open_store(&args.project.project)?;
+ let entries = store.best_metrics(MetricBestQuery {
+ key: NonEmptyText::new(args.key)?,
+ frontier_id: args
+ .frontier
+ .as_deref()
+ .map(parse_frontier_id)
+ .transpose()?,
+ source: args.source.map(Into::into),
+ dimensions: coerce_cli_dimension_filters(&store, args.dimensions)?,
+ order: args.order.map(Into::into),
+ limit: args.limit,
+ })?;
+ print_json(&entries)
+}
+
+fn run_metric_migrate(args: ProjectArg) -> Result<(), StoreError> {
+ let mut store = open_store(&args.project)?;
+ print_json(&store.migrate_metric_plane()?)
+}
+
+fn run_dimension_define(args: DimensionDefineArgs) -> Result<(), StoreError> {
+ let mut store = open_store(&args.project.project)?;
+ let record = store.define_run_dimension(DefineRunDimensionRequest {
+ key: NonEmptyText::new(args.key)?,
+ value_type: args.value_type.into(),
+ description: args.description.map(NonEmptyText::new).transpose()?,
+ })?;
+ print_json(&record)
+}
+
+fn run_dimension_list(args: ProjectArg) -> Result<(), StoreError> {
+ let store = open_store(&args.project)?;
+ print_json(&store.list_run_dimensions()?)
+}
+
fn run_experiment_close(args: ExperimentCloseArgs) -> Result<(), StoreError> {
let mut store = open_store(&args.project.project)?;
let frontier_id = parse_frontier_id(&args.frontier)?;
@@ -764,19 +1065,14 @@ fn run_experiment_close(args: ExperimentCloseArgs) -> Result<(), StoreError> {
run_title: NonEmptyText::new(args.run_title)?,
run_summary: args.run_summary.map(NonEmptyText::new).transpose()?,
backend: args.backend.into(),
- benchmark_suite: NonEmptyText::new(args.benchmark_suite)?,
+ dimensions: coerce_cli_dimension_filters(&store, args.dimensions)?,
command,
code_snapshot: Some(capture_code_snapshot(store.project_root())?),
- primary_metric: MetricObservation {
- metric_key: NonEmptyText::new(args.primary_metric_key)?,
- unit: args.primary_metric_unit.into(),
- objective: args.primary_metric_objective.into(),
- value: args.primary_metric_value,
- },
+ primary_metric: parse_metric_value(args.primary_metric)?,
supporting_metrics: args
.metrics
.into_iter()
- .map(parse_metric_observation)
+ .map(parse_metric_value)
.collect::<Result<Vec<_>, _>>()?,
note: FrontierNote {
summary: NonEmptyText::new(args.note)?,
@@ -1011,6 +1307,23 @@ fn load_payload(
Ok(NodePayload::with_schema(schema, map))
}
+fn validate_cli_prose_payload(
+ class: NodeClass,
+ summary: Option<&str>,
+ payload: &NodePayload,
+) -> Result<(), StoreError> {
+ if !matches!(class, NodeClass::Note | NodeClass::Research) {
+ return Ok(());
+ }
+ if summary.is_none() {
+ return Err(StoreError::ProseSummaryRequired(class));
+ }
+ match payload.field("body") {
+ Some(Value::String(body)) if !body.trim().is_empty() => Ok(()),
+ _ => Err(StoreError::ProseBodyRequired(class)),
+ }
+}
+
fn json_object(value: Value) -> Result<Map<String, Value>, StoreError> {
match value {
Value::Object(map) => Ok(map),
@@ -1020,6 +1333,22 @@ fn json_object(value: Value) -> Result<Map<String, Value>, StoreError> {
}
}
+fn schema_field_json(field: &ProjectFieldSpec) -> Value {
+ json!({
+ "name": field.name,
+ "node_classes": field.node_classes.iter().map(ToString::to_string).collect::<Vec<_>>(),
+ "presence": field.presence.as_str(),
+ "severity": field.severity.as_str(),
+ "role": field.role.as_str(),
+ "inference_policy": field.inference_policy.as_str(),
+ "value_type": field.value_type.map(FieldValueType::as_str),
+ })
+}
+
+fn parse_node_class_set(classes: Vec<CliNodeClass>) -> BTreeSet<NodeClass> {
+ classes.into_iter().map(Into::into).collect()
+}
+
fn capture_code_snapshot(project_root: &Utf8Path) -> Result<CodeSnapshotRef, StoreError> {
let head_commit = run_git(project_root, &["rev-parse", "HEAD"])?;
let dirty_paths = run_git(project_root, &["status", "--porcelain"])?
@@ -1084,23 +1413,71 @@ fn maybe_print_gitignore_hint(project_root: &Utf8Path) -> Result<(), StoreError>
}
}
-fn parse_metric_observation(raw: String) -> Result<MetricObservation, StoreError> {
- let parts = raw.split(':').collect::<Vec<_>>();
- if parts.len() != 4 {
- return Err(invalid_input(
- "metrics must look like key:unit:objective:value",
- ));
- }
- Ok(MetricObservation {
- metric_key: NonEmptyText::new(parts[0])?,
- unit: parse_metric_unit(parts[1])?,
- objective: parse_optimization_objective(parts[2])?,
- value: parts[3]
+fn parse_metric_value(raw: String) -> Result<MetricValue, StoreError> {
+ let Some((key, value)) = raw.split_once('=') else {
+ return Err(invalid_input("metrics must look like key=value"));
+ };
+ Ok(MetricValue {
+ key: NonEmptyText::new(key)?,
+ value: value
.parse::<f64>()
.map_err(|error| invalid_input(format!("invalid metric value: {error}")))?,
})
}
+fn coerce_cli_dimension_filters(
+ store: &ProjectStore,
+ raw_dimensions: Vec<String>,
+) -> Result<BTreeMap<NonEmptyText, fidget_spinner_core::RunDimensionValue>, StoreError> {
+ let definitions = store
+ .list_run_dimensions()?
+ .into_iter()
+ .map(|summary| (summary.key.to_string(), summary.value_type))
+ .collect::<BTreeMap<_, _>>();
+ let raw_dimensions = parse_dimension_assignments(raw_dimensions)?
+ .into_iter()
+ .map(|(key, raw_value)| {
+ let Some(value_type) = definitions.get(&key) else {
+ return Err(invalid_input(format!(
+ "unknown run dimension `{key}`; register it first"
+ )));
+ };
+ Ok((key, parse_cli_dimension_value(*value_type, &raw_value)?))
+ })
+ .collect::<Result<BTreeMap<_, _>, StoreError>>()?;
+ store.coerce_run_dimensions(raw_dimensions)
+}
+
+fn parse_dimension_assignments(
+ raw_dimensions: Vec<String>,
+) -> Result<BTreeMap<String, String>, StoreError> {
+ raw_dimensions
+ .into_iter()
+ .map(|raw| {
+ let Some((key, value)) = raw.split_once('=') else {
+ return Err(invalid_input("dimensions must look like key=value"));
+ };
+ Ok((key.to_owned(), value.to_owned()))
+ })
+ .collect()
+}
+
+fn parse_cli_dimension_value(value_type: FieldValueType, raw: &str) -> Result<Value, StoreError> {
+ match value_type {
+ FieldValueType::String | FieldValueType::Timestamp => Ok(Value::String(raw.to_owned())),
+ FieldValueType::Numeric => Ok(json!(raw.parse::<f64>().map_err(|error| {
+ invalid_input(format!("invalid numeric dimension value: {error}"))
+ })?)),
+ FieldValueType::Boolean => match raw {
+ "true" => Ok(Value::Bool(true)),
+ "false" => Ok(Value::Bool(false)),
+ other => Err(invalid_input(format!(
+ "invalid boolean dimension value `{other}`"
+ ))),
+ },
+ }
+}
+
fn parse_metric_unit(raw: &str) -> Result<MetricUnit, StoreError> {
match raw {
"seconds" => Ok(MetricUnit::Seconds),
@@ -1204,6 +1581,78 @@ impl From<CliExecutionBackend> for ExecutionBackend {
}
}
+impl From<CliMetricSource> for MetricFieldSource {
+ fn from(value: CliMetricSource) -> Self {
+ match value {
+ CliMetricSource::RunMetric => Self::RunMetric,
+ CliMetricSource::ChangePayload => Self::ChangePayload,
+ CliMetricSource::RunPayload => Self::RunPayload,
+ CliMetricSource::AnalysisPayload => Self::AnalysisPayload,
+ CliMetricSource::DecisionPayload => Self::DecisionPayload,
+ }
+ }
+}
+
+impl From<CliMetricOrder> for MetricRankOrder {
+ fn from(value: CliMetricOrder) -> Self {
+ match value {
+ CliMetricOrder::Asc => Self::Asc,
+ CliMetricOrder::Desc => Self::Desc,
+ }
+ }
+}
+
+impl From<CliFieldValueType> for FieldValueType {
+ fn from(value: CliFieldValueType) -> Self {
+ match value {
+ CliFieldValueType::String => Self::String,
+ CliFieldValueType::Numeric => Self::Numeric,
+ CliFieldValueType::Boolean => Self::Boolean,
+ CliFieldValueType::Timestamp => Self::Timestamp,
+ }
+ }
+}
+
+impl From<CliDiagnosticSeverity> for DiagnosticSeverity {
+ fn from(value: CliDiagnosticSeverity) -> Self {
+ match value {
+ CliDiagnosticSeverity::Error => Self::Error,
+ CliDiagnosticSeverity::Warning => Self::Warning,
+ CliDiagnosticSeverity::Info => Self::Info,
+ }
+ }
+}
+
+impl From<CliFieldPresence> for FieldPresence {
+ fn from(value: CliFieldPresence) -> Self {
+ match value {
+ CliFieldPresence::Required => Self::Required,
+ CliFieldPresence::Recommended => Self::Recommended,
+ CliFieldPresence::Optional => Self::Optional,
+ }
+ }
+}
+
+impl From<CliFieldRole> for FieldRole {
+ fn from(value: CliFieldRole) -> Self {
+ match value {
+ CliFieldRole::Index => Self::Index,
+ CliFieldRole::ProjectionGate => Self::ProjectionGate,
+ CliFieldRole::RenderOnly => Self::RenderOnly,
+ CliFieldRole::Opaque => Self::Opaque,
+ }
+ }
+}
+
+impl From<CliInferencePolicy> for InferencePolicy {
+ fn from(value: CliInferencePolicy) -> Self {
+ match value {
+ CliInferencePolicy::ManualOnly => Self::ManualOnly,
+ CliInferencePolicy::ModelMayInfer => Self::ModelMayInfer,
+ }
+ }
+}
+
impl From<CliFrontierVerdict> for FrontierVerdict {
fn from(value: CliFrontierVerdict) -> Self {
match value {
diff --git a/crates/fidget-spinner-cli/src/mcp/catalog.rs b/crates/fidget-spinner-cli/src/mcp/catalog.rs
index b23cb31..0831ba4 100644
--- a/crates/fidget-spinner-cli/src/mcp/catalog.rs
+++ b/crates/fidget-spinner-cli/src/mcp/catalog.rs
@@ -67,6 +67,18 @@ pub(crate) fn tool_spec(name: &str) -> Option<ToolSpec> {
dispatch: DispatchTarget::Worker,
replay: ReplayContract::Convergent,
}),
+ "schema.field.upsert" => Some(ToolSpec {
+ name: "schema.field.upsert",
+ description: "Add or replace one project-local payload schema field definition.",
+ dispatch: DispatchTarget::Worker,
+ replay: ReplayContract::NeverReplay,
+ }),
+ "schema.field.remove" => Some(ToolSpec {
+ name: "schema.field.remove",
+            description: "Remove one project-local payload schema field definition, optionally narrowed to an exact node-class scope.",
+ dispatch: DispatchTarget::Worker,
+ replay: ReplayContract::NeverReplay,
+ }),
"tag.add" => Some(ToolSpec {
name: "tag.add",
description: "Register one repo-local tag with a required description. Notes may only reference tags from this registry.",
@@ -145,9 +157,45 @@ pub(crate) fn tool_spec(name: &str) -> Option<ToolSpec> {
dispatch: DispatchTarget::Worker,
replay: ReplayContract::NeverReplay,
}),
+ "metric.define" => Some(ToolSpec {
+ name: "metric.define",
+ description: "Register one project-level metric definition so experiment ingestion only has to send key/value observations.",
+ dispatch: DispatchTarget::Worker,
+ replay: ReplayContract::NeverReplay,
+ }),
+ "run.dimension.define" => Some(ToolSpec {
+ name: "run.dimension.define",
+ description: "Register one project-level run dimension used to slice metrics across scenarios, budgets, and flags.",
+ dispatch: DispatchTarget::Worker,
+ replay: ReplayContract::NeverReplay,
+ }),
+ "run.dimension.list" => Some(ToolSpec {
+ name: "run.dimension.list",
+ description: "List registered run dimensions together with observed value counts and sample values.",
+ dispatch: DispatchTarget::Worker,
+ replay: ReplayContract::Convergent,
+ }),
+ "metric.keys" => Some(ToolSpec {
+ name: "metric.keys",
+ description: "List rankable metric keys, including registered run metrics and observed payload-derived numeric fields.",
+ dispatch: DispatchTarget::Worker,
+ replay: ReplayContract::Convergent,
+ }),
+ "metric.best" => Some(ToolSpec {
+ name: "metric.best",
+ description: "Rank completed experiments by one numeric key, with optional run-dimension filters and candidate commit surfacing.",
+ dispatch: DispatchTarget::Worker,
+ replay: ReplayContract::Convergent,
+ }),
+ "metric.migrate" => Some(ToolSpec {
+ name: "metric.migrate",
+ description: "Re-run the idempotent legacy metric-plane normalization that registers canonical metrics and backfills benchmark_suite dimensions.",
+ dispatch: DispatchTarget::Worker,
+ replay: ReplayContract::NeverReplay,
+ }),
"experiment.close" => Some(ToolSpec {
name: "experiment.close",
- description: "Atomically close a core-path experiment with candidate checkpoint capture, measured result, note, and verdict.",
+ description: "Atomically close a core-path experiment with typed run dimensions, preregistered metric observations, candidate checkpoint capture, note, and verdict.",
dispatch: DispatchTarget::Worker,
replay: ReplayContract::NeverReplay,
}),
@@ -212,6 +260,8 @@ pub(crate) fn tool_definitions() -> Vec<Value> {
"project.bind",
"project.status",
"project.schema",
+ "schema.field.upsert",
+ "schema.field.remove",
"tag.add",
"tag.list",
"frontier.list",
@@ -225,6 +275,12 @@ pub(crate) fn tool_definitions() -> Vec<Value> {
"node.archive",
"note.quick",
"research.record",
+ "metric.define",
+ "run.dimension.define",
+ "run.dimension.list",
+ "metric.keys",
+ "metric.best",
+ "metric.migrate",
"experiment.close",
"skill.list",
"skill.show",
@@ -277,7 +333,32 @@ pub(crate) fn list_resources() -> Vec<Value> {
fn input_schema(name: &str) -> Value {
match name {
"project.status" | "project.schema" | "tag.list" | "skill.list" | "system.health"
- | "system.telemetry" => json!({"type":"object","additionalProperties":false}),
+ | "system.telemetry" | "run.dimension.list" | "metric.migrate" => {
+ json!({"type":"object","additionalProperties":false})
+ }
+ "schema.field.upsert" => json!({
+ "type": "object",
+ "properties": {
+ "name": { "type": "string", "description": "Project payload field name." },
+ "node_classes": { "type": "array", "items": node_class_schema(), "description": "Optional node-class scope. Omit or pass [] for all classes." },
+ "presence": field_presence_schema(),
+ "severity": diagnostic_severity_schema(),
+ "role": field_role_schema(),
+ "inference_policy": inference_policy_schema(),
+ "value_type": field_value_type_schema(),
+ },
+ "required": ["name", "presence", "severity", "role", "inference_policy"],
+ "additionalProperties": false
+ }),
+ "schema.field.remove" => json!({
+ "type": "object",
+ "properties": {
+ "name": { "type": "string", "description": "Project payload field name." },
+ "node_classes": { "type": "array", "items": node_class_schema(), "description": "Optional exact node-class scope to remove." }
+ },
+ "required": ["name"],
+ "additionalProperties": false
+ }),
"project.bind" => json!({
"type": "object",
"properties": {
@@ -333,9 +414,9 @@ fn input_schema(name: &str) -> Value {
"class": node_class_schema(),
"frontier_id": { "type": "string" },
"title": { "type": "string" },
- "summary": { "type": "string" },
- "tags": { "type": "array", "items": tag_name_schema() },
- "payload": { "type": "object" },
+ "summary": { "type": "string", "description": "Required for `note` and `research` nodes." },
+ "tags": { "type": "array", "items": tag_name_schema(), "description": "Required for `note` nodes; optional for other classes." },
+ "payload": { "type": "object", "description": "`note` and `research` nodes require a non-empty string `body` field." },
"annotations": { "type": "array", "items": annotation_schema() },
"parents": { "type": "array", "items": { "type": "string" } }
},
@@ -393,12 +474,13 @@ fn input_schema(name: &str) -> Value {
"properties": {
"frontier_id": { "type": "string" },
"title": { "type": "string" },
+ "summary": { "type": "string" },
"body": { "type": "string" },
"tags": { "type": "array", "items": tag_name_schema() },
"annotations": { "type": "array", "items": annotation_schema() },
"parents": { "type": "array", "items": { "type": "string" } }
},
- "required": ["title", "body", "tags"],
+ "required": ["title", "summary", "body", "tags"],
"additionalProperties": false
}),
"research.record" => json!({
@@ -408,10 +490,54 @@ fn input_schema(name: &str) -> Value {
"title": { "type": "string" },
"summary": { "type": "string" },
"body": { "type": "string" },
+ "tags": { "type": "array", "items": tag_name_schema() },
"annotations": { "type": "array", "items": annotation_schema() },
"parents": { "type": "array", "items": { "type": "string" } }
},
- "required": ["title", "body"],
+ "required": ["title", "summary", "body"],
+ "additionalProperties": false
+ }),
+ "metric.define" => json!({
+ "type": "object",
+ "properties": {
+ "key": { "type": "string" },
+ "unit": metric_unit_schema(),
+ "objective": optimization_objective_schema(),
+ "description": { "type": "string" }
+ },
+ "required": ["key", "unit", "objective"],
+ "additionalProperties": false
+ }),
+ "run.dimension.define" => json!({
+ "type": "object",
+ "properties": {
+ "key": { "type": "string" },
+ "value_type": field_value_type_schema(),
+ "description": { "type": "string" }
+ },
+ "required": ["key", "value_type"],
+ "additionalProperties": false
+ }),
+ "metric.keys" => json!({
+ "type": "object",
+ "properties": {
+ "frontier_id": { "type": "string" },
+ "source": metric_source_schema(),
+ "dimensions": { "type": "object" }
+ },
+ "additionalProperties": false
+ }),
+ "metric.best" => json!({
+ "type": "object",
+ "properties": {
+ "key": { "type": "string" },
+ "frontier_id": { "type": "string" },
+ "source": metric_source_schema(),
+ "dimensions": { "type": "object" },
+ "order": metric_order_schema(),
+ "limit": { "type": "integer", "minimum": 1, "maximum": 500 }
+ },
+ "required": ["key"],
"additionalProperties": false
}),
"experiment.close" => json!({
@@ -422,8 +548,8 @@ fn input_schema(name: &str) -> Value {
"change_node_id": { "type": "string" },
"candidate_summary": { "type": "string" },
"run": run_schema(),
- "primary_metric": metric_observation_schema(),
- "supporting_metrics": { "type": "array", "items": metric_observation_schema() },
+ "primary_metric": metric_value_schema(),
+ "supporting_metrics": { "type": "array", "items": metric_value_schema() },
"note": note_schema(),
"verdict": verdict_schema(),
"decision_title": { "type": "string" },
@@ -461,16 +587,14 @@ fn metric_spec_schema() -> Value {
})
}
-fn metric_observation_schema() -> Value {
+fn metric_value_schema() -> Value {
json!({
"type": "object",
"properties": {
"key": { "type": "string" },
- "unit": metric_unit_schema(),
- "objective": optimization_objective_schema(),
"value": { "type": "number" }
},
- "required": ["key", "unit", "objective", "value"],
+ "required": ["key", "value"],
"additionalProperties": false
})
}
@@ -509,6 +633,61 @@ fn metric_unit_schema() -> Value {
})
}
+fn metric_source_schema() -> Value {
+ json!({
+ "type": "string",
+ "enum": [
+ "run_metric",
+ "change_payload",
+ "run_payload",
+ "analysis_payload",
+ "decision_payload"
+ ]
+ })
+}
+
+fn metric_order_schema() -> Value {
+ json!({
+ "type": "string",
+ "enum": ["asc", "desc"]
+ })
+}
+
+fn field_value_type_schema() -> Value {
+ json!({
+ "type": "string",
+ "enum": ["string", "numeric", "boolean", "timestamp"]
+ })
+}
+
+fn diagnostic_severity_schema() -> Value {
+ json!({
+ "type": "string",
+ "enum": ["error", "warning", "info"]
+ })
+}
+
+fn field_presence_schema() -> Value {
+ json!({
+ "type": "string",
+ "enum": ["required", "recommended", "optional"]
+ })
+}
+
+fn field_role_schema() -> Value {
+ json!({
+ "type": "string",
+ "enum": ["index", "projection_gate", "render_only", "opaque"]
+ })
+}
+
+fn inference_policy_schema() -> Value {
+ json!({
+ "type": "string",
+ "enum": ["manual_only", "model_may_infer"]
+ })
+}
+
fn optimization_objective_schema() -> Value {
json!({
"type": "string",
@@ -539,7 +718,7 @@ fn run_schema() -> Value {
"type": "string",
"enum": ["local_process", "worktree_process", "ssh_process"]
},
- "benchmark_suite": { "type": "string" },
+ "dimensions": { "type": "object" },
"command": {
"type": "object",
"properties": {
@@ -554,7 +733,7 @@ fn run_schema() -> Value {
"additionalProperties": false
}
},
- "required": ["title", "backend", "benchmark_suite", "command"],
+ "required": ["title", "backend", "dimensions", "command"],
"additionalProperties": false
})
}
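The reshaped `run` schema above replaces the scalar `benchmark_suite` with a free-form `dimensions` object whose keys must be registered via `run.dimension.define`. A minimal, hypothetical sketch of the kind of type check this implies — the real coercion lives in `coerce_tool_dimensions` and the store; `ValueType`, `check`, and the sample keys here are illustrative, not the project's API:

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for the registered run-dimension value types.
#[derive(Clone, Copy)]
enum ValueType {
    String,
    Numeric,
    Boolean,
}

// Each incoming dimension key must be registered, and its raw value must
// parse as the registered value type.
fn check(registry: &BTreeMap<&str, ValueType>, key: &str, raw: &str) -> Result<(), String> {
    match registry.get(key).copied() {
        None => Err(format!("unregistered dimension: {key}")),
        Some(ValueType::Numeric) if raw.parse::<f64>().is_err() => {
            Err(format!("dimension {key} expects a number, got {raw:?}"))
        }
        Some(ValueType::Boolean) if raw.parse::<bool>().is_err() => {
            Err(format!("dimension {key} expects a boolean, got {raw:?}"))
        }
        Some(_) => Ok(()),
    }
}

fn main() {
    let mut registry = BTreeMap::new();
    registry.insert("scenario", ValueType::String);
    registry.insert("duration_s", ValueType::Numeric);

    assert!(check(&registry, "scenario", "steady_state").is_ok());
    assert!(check(&registry, "duration_s", "30").is_ok());
    assert!(check(&registry, "duration_s", "fast").is_err()); // wrong type
    assert!(check(&registry, "threads", "8").is_err()); // never registered
}
```

Under this shape, `experiment.close` only has to send `"dimensions": { "scenario": "...", "duration_s": 30 }` and bare `{ "key", "value" }` metric observations; units and objectives resolve from the project-level registries.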
diff --git a/crates/fidget-spinner-cli/src/mcp/host/runtime.rs b/crates/fidget-spinner-cli/src/mcp/host/runtime.rs
index f84f604..d57a21e 100644
--- a/crates/fidget-spinner-cli/src/mcp/host/runtime.rs
+++ b/crates/fidget-spinner-cli/src/mcp/host/runtime.rs
@@ -838,20 +838,6 @@ fn system_health_output(health: &HealthSnapshot) -> Result<ToolOutput, FaultReco
"rollout_pending".to_owned(),
json!(health.binary.rollout_pending),
);
- if let Some(fault) = health.last_fault.as_ref() {
- let _ = concise.insert(
- "last_fault".to_owned(),
- json!({
- "kind": format!("{:?}", fault.kind).to_ascii_lowercase(),
- "stage": format!("{:?}", fault.stage).to_ascii_lowercase(),
- "operation": fault.operation,
- "message": fault.message,
- "retryable": fault.retryable,
- "retried": fault.retried,
- }),
- );
- }
-
let mut lines = vec![format!(
"{} | {}",
if health.initialization.ready && health.initialization.seed_captured {
@@ -886,14 +872,6 @@ fn system_health_output(health: &HealthSnapshot) -> Result<ToolOutput, FaultReco
""
}
));
- if let Some(fault) = health.last_fault.as_ref() {
- lines.push(format!(
- "fault: {} {} {}",
- format!("{:?}", fault.kind).to_ascii_lowercase(),
- fault.operation,
- fault.message,
- ));
- }
detailed_tool_output(
&Value::Object(concise),
health,
diff --git a/crates/fidget-spinner-cli/src/mcp/service.rs b/crates/fidget-spinner-cli/src/mcp/service.rs
index aee53e0..62e3641 100644
--- a/crates/fidget-spinner-cli/src/mcp/service.rs
+++ b/crates/fidget-spinner-cli/src/mcp/service.rs
@@ -3,15 +3,18 @@ use std::fs;
use camino::{Utf8Path, Utf8PathBuf};
use fidget_spinner_core::{
- AdmissionState, AnnotationVisibility, CodeSnapshotRef, CommandRecipe, ExecutionBackend,
- FrontierContract, FrontierNote, FrontierProjection, FrontierRecord, FrontierVerdict,
- MetricObservation, MetricSpec, MetricUnit, NodeAnnotation, NodeClass, NodePayload,
- NonEmptyText, ProjectSchema, TagName, TagRecord,
+ AdmissionState, AnnotationVisibility, CodeSnapshotRef, CommandRecipe, DiagnosticSeverity,
+ ExecutionBackend, FieldPresence, FieldRole, FieldValueType, FrontierContract, FrontierNote,
+ FrontierProjection, FrontierRecord, FrontierVerdict, InferencePolicy, MetricSpec, MetricUnit,
+ MetricValue, NodeAnnotation, NodeClass, NodePayload, NonEmptyText, ProjectFieldSpec,
+ ProjectSchema, RunDimensionValue, TagName, TagRecord,
};
use fidget_spinner_store_sqlite::{
- CloseExperimentRequest, CreateFrontierRequest, CreateNodeRequest, EdgeAttachment,
- EdgeAttachmentDirection, ExperimentReceipt, ListNodesQuery, NodeSummary, ProjectStore,
- StoreError,
+ CloseExperimentRequest, CreateFrontierRequest, CreateNodeRequest, DefineMetricRequest,
+ DefineRunDimensionRequest, EdgeAttachment, EdgeAttachmentDirection, ExperimentReceipt,
+ ListNodesQuery, MetricBestQuery, MetricFieldSource, MetricKeyQuery, MetricKeySummary,
+ MetricRankOrder, NodeSummary, ProjectStore, RemoveSchemaFieldRequest, StoreError,
+ UpsertSchemaFieldRequest,
};
use serde::Deserialize;
use serde_json::{Map, Value, json};
@@ -74,6 +77,73 @@ impl WorkerService {
FaultStage::Worker,
"tools/call:project.schema",
),
+ "schema.field.upsert" => {
+ let args = deserialize::<SchemaFieldUpsertToolArgs>(arguments)?;
+ let field = self
+ .store
+ .upsert_schema_field(UpsertSchemaFieldRequest {
+ name: NonEmptyText::new(args.name)
+ .map_err(store_fault("tools/call:schema.field.upsert"))?,
+ node_classes: args
+ .node_classes
+ .unwrap_or_default()
+ .into_iter()
+ .map(|class| {
+ parse_node_class_name(&class)
+ .map_err(store_fault("tools/call:schema.field.upsert"))
+ })
+ .collect::<Result<_, _>>()?,
+ presence: parse_field_presence_name(&args.presence)
+ .map_err(store_fault("tools/call:schema.field.upsert"))?,
+ severity: parse_diagnostic_severity_name(&args.severity)
+ .map_err(store_fault("tools/call:schema.field.upsert"))?,
+ role: parse_field_role_name(&args.role)
+ .map_err(store_fault("tools/call:schema.field.upsert"))?,
+ inference_policy: parse_inference_policy_name(&args.inference_policy)
+ .map_err(store_fault("tools/call:schema.field.upsert"))?,
+ value_type: args
+ .value_type
+ .as_deref()
+ .map(parse_field_value_type_name)
+ .transpose()
+ .map_err(store_fault("tools/call:schema.field.upsert"))?,
+ })
+ .map_err(store_fault("tools/call:schema.field.upsert"))?;
+ tool_success(
+ schema_field_upsert_output(self.store.schema(), &field)?,
+ presentation,
+ FaultStage::Worker,
+ "tools/call:schema.field.upsert",
+ )
+ }
+ "schema.field.remove" => {
+ let args = deserialize::<SchemaFieldRemoveToolArgs>(arguments)?;
+ let removed_count = self
+ .store
+ .remove_schema_field(RemoveSchemaFieldRequest {
+ name: NonEmptyText::new(args.name)
+ .map_err(store_fault("tools/call:schema.field.remove"))?,
+ node_classes: args
+ .node_classes
+ .map(|node_classes| {
+ node_classes
+ .into_iter()
+ .map(|class| {
+ parse_node_class_name(&class)
+ .map_err(store_fault("tools/call:schema.field.remove"))
+ })
+ .collect::<Result<_, _>>()
+ })
+ .transpose()?,
+ })
+ .map_err(store_fault("tools/call:schema.field.remove"))?;
+ tool_success(
+ schema_field_remove_output(self.store.schema(), removed_count)?,
+ presentation,
+ FaultStage::Worker,
+ "tools/call:schema.field.remove",
+ )
+ }
"tag.add" => {
let args = deserialize::<TagAddToolArgs>(arguments)?;
let tag = self
@@ -402,7 +472,10 @@ impl WorkerService {
.map_err(store_fault("tools/call:note.quick"))?,
title: NonEmptyText::new(args.title)
.map_err(store_fault("tools/call:note.quick"))?,
- summary: None,
+ summary: Some(
+ NonEmptyText::new(args.summary)
+ .map_err(store_fault("tools/call:note.quick"))?,
+ ),
tags: Some(
parse_tag_set(args.tags)
.map_err(store_fault("tools/call:note.quick"))?,
@@ -439,12 +512,14 @@ impl WorkerService {
.map_err(store_fault("tools/call:research.record"))?,
title: NonEmptyText::new(args.title)
.map_err(store_fault("tools/call:research.record"))?,
- summary: args
- .summary
- .map(NonEmptyText::new)
- .transpose()
- .map_err(store_fault("tools/call:research.record"))?,
- tags: None,
+ summary: Some(
+ NonEmptyText::new(args.summary)
+ .map_err(store_fault("tools/call:research.record"))?,
+ ),
+ tags: Some(
+ parse_tag_set(args.tags)
+ .map_err(store_fault("tools/call:research.record"))?,
+ ),
payload: NodePayload::with_schema(
self.store.schema().schema_ref(),
crate::json_object(json!({ "body": args.body }))
@@ -463,6 +538,170 @@ impl WorkerService {
"tools/call:research.record",
)
}
+ "metric.define" => {
+ let args = deserialize::<MetricDefineToolArgs>(arguments)?;
+ let metric = self
+ .store
+ .define_metric(DefineMetricRequest {
+ key: NonEmptyText::new(args.key)
+ .map_err(store_fault("tools/call:metric.define"))?,
+ unit: parse_metric_unit_name(&args.unit)
+ .map_err(store_fault("tools/call:metric.define"))?,
+ objective: crate::parse_optimization_objective(&args.objective)
+ .map_err(store_fault("tools/call:metric.define"))?,
+ description: args
+ .description
+ .map(NonEmptyText::new)
+ .transpose()
+ .map_err(store_fault("tools/call:metric.define"))?,
+ })
+ .map_err(store_fault("tools/call:metric.define"))?;
+ tool_success(
+ json_created_output(
+ "registered metric",
+ json!({
+ "key": metric.key,
+ "unit": metric_unit_name(metric.unit),
+ "objective": metric_objective_name(metric.objective),
+ "description": metric.description,
+ }),
+ "tools/call:metric.define",
+ )?,
+ presentation,
+ FaultStage::Worker,
+ "tools/call:metric.define",
+ )
+ }
+ "run.dimension.define" => {
+ let args = deserialize::<RunDimensionDefineToolArgs>(arguments)?;
+ let dimension = self
+ .store
+ .define_run_dimension(DefineRunDimensionRequest {
+ key: NonEmptyText::new(args.key)
+ .map_err(store_fault("tools/call:run.dimension.define"))?,
+ value_type: parse_field_value_type_name(&args.value_type)
+ .map_err(store_fault("tools/call:run.dimension.define"))?,
+ description: args
+ .description
+ .map(NonEmptyText::new)
+ .transpose()
+ .map_err(store_fault("tools/call:run.dimension.define"))?,
+ })
+ .map_err(store_fault("tools/call:run.dimension.define"))?;
+ tool_success(
+ json_created_output(
+ "registered run dimension",
+ json!({
+ "key": dimension.key,
+ "value_type": dimension.value_type.as_str(),
+ "description": dimension.description,
+ }),
+ "tools/call:run.dimension.define",
+ )?,
+ presentation,
+ FaultStage::Worker,
+ "tools/call:run.dimension.define",
+ )
+ }
+ "run.dimension.list" => {
+ let items = self
+ .store
+ .list_run_dimensions()
+ .map_err(store_fault("tools/call:run.dimension.list"))?;
+ tool_success(
+ run_dimension_list_output(items.as_slice())?,
+ presentation,
+ FaultStage::Worker,
+ "tools/call:run.dimension.list",
+ )
+ }
+ "metric.keys" => {
+ let args = deserialize::<MetricKeysToolArgs>(arguments)?;
+ let keys = self
+ .store
+ .list_metric_keys_filtered(MetricKeyQuery {
+ frontier_id: args
+ .frontier_id
+ .as_deref()
+ .map(crate::parse_frontier_id)
+ .transpose()
+ .map_err(store_fault("tools/call:metric.keys"))?,
+ source: args
+ .source
+ .as_deref()
+ .map(parse_metric_source_name)
+ .transpose()
+ .map_err(store_fault("tools/call:metric.keys"))?,
+ dimensions: coerce_tool_dimensions(
+ &self.store,
+ args.dimensions.unwrap_or_default(),
+ "tools/call:metric.keys",
+ )?,
+ })
+ .map_err(store_fault("tools/call:metric.keys"))?;
+ tool_success(
+ metric_keys_output(keys.as_slice())?,
+ presentation,
+ FaultStage::Worker,
+ "tools/call:metric.keys",
+ )
+ }
+ "metric.best" => {
+ let args = deserialize::<MetricBestToolArgs>(arguments)?;
+ let items = self
+ .store
+ .best_metrics(MetricBestQuery {
+ key: NonEmptyText::new(args.key)
+ .map_err(store_fault("tools/call:metric.best"))?,
+ frontier_id: args
+ .frontier_id
+ .as_deref()
+ .map(crate::parse_frontier_id)
+ .transpose()
+ .map_err(store_fault("tools/call:metric.best"))?,
+ source: args
+ .source
+ .as_deref()
+ .map(parse_metric_source_name)
+ .transpose()
+ .map_err(store_fault("tools/call:metric.best"))?,
+ dimensions: coerce_tool_dimensions(
+ &self.store,
+ args.dimensions.unwrap_or_default(),
+ "tools/call:metric.best",
+ )?,
+ order: args
+ .order
+ .as_deref()
+ .map(parse_metric_order_name)
+ .transpose()
+ .map_err(store_fault("tools/call:metric.best"))?,
+ limit: args.limit.unwrap_or(10),
+ })
+ .map_err(store_fault("tools/call:metric.best"))?;
+ tool_success(
+ metric_best_output(items.as_slice())?,
+ presentation,
+ FaultStage::Worker,
+ "tools/call:metric.best",
+ )
+ }
+ "metric.migrate" => {
+ let report = self
+ .store
+ .migrate_metric_plane()
+ .map_err(store_fault("tools/call:metric.migrate"))?;
+ tool_success(
+ json_created_output(
+ "normalized legacy metric plane",
+ json!(report),
+ "tools/call:metric.migrate",
+ )?,
+ presentation,
+ FaultStage::Worker,
+ "tools/call:metric.migrate",
+ )
+ }
"experiment.close" => {
let args = deserialize::<ExperimentCloseToolArgs>(arguments)?;
let frontier_id = crate::parse_frontier_id(&args.frontier_id)
@@ -507,8 +746,11 @@ impl WorkerService {
.map_err(store_fault("tools/call:experiment.close"))?,
backend: parse_backend_name(&args.run.backend)
.map_err(store_fault("tools/call:experiment.close"))?,
- benchmark_suite: NonEmptyText::new(args.run.benchmark_suite)
- .map_err(store_fault("tools/call:experiment.close"))?,
+ dimensions: coerce_tool_dimensions(
+ &self.store,
+ args.run.dimensions,
+ "tools/call:experiment.close",
+ )?,
command: command_recipe_from_wire(
args.run.command,
self.store.project_root(),
@@ -518,12 +760,12 @@ impl WorkerService {
capture_code_snapshot(self.store.project_root())
.map_err(store_fault("tools/call:experiment.close"))?,
),
- primary_metric: metric_observation_from_wire(args.primary_metric)
+ primary_metric: metric_value_from_wire(args.primary_metric)
.map_err(store_fault("tools/call:experiment.close"))?,
supporting_metrics: args
.supporting_metrics
.into_iter()
- .map(metric_observation_from_wire)
+ .map(metric_value_from_wire)
.collect::<Result<Vec<_>, _>>()
.map_err(store_fault("tools/call:experiment.close"))?,
note: FrontierNote {
@@ -547,7 +789,7 @@ impl WorkerService {
})
.map_err(store_fault("tools/call:experiment.close"))?;
tool_success(
- experiment_close_output(&receipt)?,
+ experiment_close_output(&self.store, &receipt)?,
presentation,
FaultStage::Worker,
"tools/call:experiment.close",
@@ -692,8 +934,8 @@ fn project_schema_output(schema: &ProjectSchema) -> Result<ToolOutput, FaultReco
.collect::<Vec<_>>()
.join(",")
},
- format!("{:?}", field.presence).to_ascii_lowercase(),
- format!("{:?}", field.role).to_ascii_lowercase(),
+ field.presence.as_str(),
+ field.role.as_str(),
));
}
if schema.fields.len() > 8 {
@@ -709,6 +951,59 @@ fn project_schema_output(schema: &ProjectSchema) -> Result<ToolOutput, FaultReco
)
}
+fn schema_field_upsert_output(
+ schema: &ProjectSchema,
+ field: &ProjectFieldSpec,
+) -> Result<ToolOutput, FaultRecord> {
+ let concise = json!({
+ "schema": schema.schema_ref(),
+ "field": project_schema_field_value(field),
+ });
+ detailed_tool_output(
+ &concise,
+ &concise,
+ format!(
+ "upserted schema field {}\nschema: {}\nclasses: {}\npresence: {}\nseverity: {}\nrole: {}\ninference: {}{}",
+ field.name,
+ schema_label(schema),
+ render_schema_node_classes(&field.node_classes),
+ field.presence.as_str(),
+ field.severity.as_str(),
+ field.role.as_str(),
+ field.inference_policy.as_str(),
+ field
+ .value_type
+ .map(|value_type| format!("\nvalue_type: {}", value_type.as_str()))
+ .unwrap_or_default(),
+ ),
+ None,
+ FaultStage::Worker,
+ "tools/call:schema.field.upsert",
+ )
+}
+
+fn schema_field_remove_output(
+ schema: &ProjectSchema,
+ removed_count: u64,
+) -> Result<ToolOutput, FaultRecord> {
+ let concise = json!({
+ "schema": schema.schema_ref(),
+ "removed_count": removed_count,
+ });
+ detailed_tool_output(
+ &concise,
+ &concise,
+ format!(
+ "removed {} schema field definition(s)\nschema: {}",
+ removed_count,
+ schema_label(schema),
+ ),
+ None,
+ FaultStage::Worker,
+ "tools/call:schema.field.remove",
+ )
+}
+
fn tag_add_output(tag: &TagRecord) -> Result<ToolOutput, FaultRecord> {
let concise = json!({
"name": tag.name,
@@ -892,14 +1187,33 @@ fn node_read_output(node: &fidget_spinner_core::DagNode) -> Result<ToolOutput, F
);
}
if !node.payload.fields.is_empty() {
- let _ = concise.insert(
- "payload_field_count".to_owned(),
- json!(node.payload.fields.len()),
- );
- let _ = concise.insert(
- "payload_preview".to_owned(),
- payload_preview_value(&node.payload.fields),
- );
+ let filtered_fields =
+ filtered_payload_fields(node.class, &node.payload.fields).collect::<Vec<_>>();
+ if !filtered_fields.is_empty() {
+ let _ = concise.insert(
+ "payload_field_count".to_owned(),
+ json!(filtered_fields.len()),
+ );
+ if is_prose_node(node.class) {
+ let _ = concise.insert(
+ "payload_fields".to_owned(),
+ json!(
+ filtered_fields
+ .iter()
+ .take(6)
+ .map(|(name, _)| (*name).clone())
+ .collect::<Vec<_>>()
+ ),
+ );
+ } else {
+ let payload_preview = payload_preview_value(node.class, &node.payload.fields);
+ if let Value::Object(object) = &payload_preview
+ && !object.is_empty()
+ {
+ let _ = concise.insert("payload_preview".to_owned(), payload_preview);
+ }
+ }
+ }
}
if !node.diagnostics.items.is_empty() {
let _ = concise.insert(
@@ -930,7 +1244,7 @@ fn node_read_output(node: &fidget_spinner_core::DagNode) -> Result<ToolOutput, F
if !node.tags.is_empty() {
lines.push(format!("tags: {}", format_tags(&node.tags)));
}
- lines.extend(payload_preview_lines(&node.payload.fields));
+ lines.extend(payload_preview_lines(node.class, &node.payload.fields));
if !node.diagnostics.items.is_empty() {
lines.push(format!(
"diagnostics: {}",
@@ -972,7 +1286,10 @@ fn node_read_output(node: &fidget_spinner_core::DagNode) -> Result<ToolOutput, F
)
}
-fn experiment_close_output(receipt: &ExperimentReceipt) -> Result<ToolOutput, FaultRecord> {
+fn experiment_close_output(
+ store: &ProjectStore,
+ receipt: &ExperimentReceipt,
+) -> Result<ToolOutput, FaultRecord> {
let concise = json!({
"experiment_id": receipt.experiment.id,
"frontier_id": receipt.experiment.frontier_id,
@@ -980,7 +1297,8 @@ fn experiment_close_output(receipt: &ExperimentReceipt) -> Result<ToolOutput, Fa
"verdict": format!("{:?}", receipt.experiment.verdict).to_ascii_lowercase(),
"run_id": receipt.run.run_id,
"decision_node_id": receipt.decision_node.id,
- "primary_metric": metric_value(&receipt.experiment.result.primary_metric),
+ "dimensions": run_dimensions_value(&receipt.experiment.result.dimensions),
+ "primary_metric": metric_value(store, &receipt.experiment.result.primary_metric)?,
});
detailed_tool_output(
&concise,
@@ -997,7 +1315,11 @@ fn experiment_close_output(receipt: &ExperimentReceipt) -> Result<ToolOutput, Fa
),
format!(
"primary metric: {}",
- metric_text(&receipt.experiment.result.primary_metric)
+ metric_text(store, &receipt.experiment.result.primary_metric)?
+ ),
+ format!(
+ "dimensions: {}",
+ render_dimension_kv(&receipt.experiment.result.dimensions)
),
format!("run: {}", receipt.run.run_id),
]
@@ -1008,7 +1330,181 @@ fn experiment_close_output(receipt: &ExperimentReceipt) -> Result<ToolOutput, Fa
)
}
-fn project_schema_field_value(field: &fidget_spinner_core::ProjectFieldSpec) -> Value {
+fn metric_keys_output(keys: &[MetricKeySummary]) -> Result<ToolOutput, FaultRecord> {
+ let concise = keys
+ .iter()
+ .map(|key| {
+ json!({
+ "key": key.key,
+ "source": key.source.as_str(),
+ "experiment_count": key.experiment_count,
+ "unit": key.unit.map(metric_unit_name),
+ "objective": key.objective.map(metric_objective_name),
+ "description": key.description,
+ "requires_order": key.requires_order,
+ })
+ })
+ .collect::<Vec<_>>();
+ let mut lines = vec![format!("{} metric key(s)", keys.len())];
+ lines.extend(keys.iter().map(|key| {
+ let mut line = format!(
+ "{} [{}] experiments={}",
+ key.key,
+ key.source.as_str(),
+ key.experiment_count
+ );
+ if let Some(unit) = key.unit {
+ line.push_str(format!(" unit={}", metric_unit_name(unit)).as_str());
+ }
+ if let Some(objective) = key.objective {
+ line.push_str(format!(" objective={}", metric_objective_name(objective)).as_str());
+ }
+ if let Some(description) = key.description.as_ref() {
+ line.push_str(format!(" | {description}").as_str());
+ }
+ if key.requires_order {
+ line.push_str(" order=required");
+ }
+ line
+ }));
+ detailed_tool_output(
+ &concise,
+ &keys,
+ lines.join("\n"),
+ None,
+ FaultStage::Worker,
+ "tools/call:metric.keys",
+ )
+}
+
+fn metric_best_output(
+ items: &[fidget_spinner_store_sqlite::MetricBestEntry],
+) -> Result<ToolOutput, FaultRecord> {
+ let concise = items
+ .iter()
+ .enumerate()
+ .map(|(index, item)| {
+ json!({
+ "rank": index + 1,
+ "key": item.key,
+ "source": item.source.as_str(),
+ "value": item.value,
+ "order": item.order.as_str(),
+ "experiment_id": item.experiment_id,
+ "frontier_id": item.frontier_id,
+ "change_node_id": item.change_node_id,
+ "change_title": item.change_title,
+ "verdict": metric_verdict_name(item.verdict),
+ "candidate_checkpoint_id": item.candidate_checkpoint_id,
+ "candidate_commit_hash": item.candidate_commit_hash,
+ "run_id": item.run_id,
+ "unit": item.unit.map(metric_unit_name),
+ "objective": item.objective.map(metric_objective_name),
+ "dimensions": run_dimensions_value(&item.dimensions),
+ })
+ })
+ .collect::<Vec<_>>();
+ let mut lines = vec![format!("{} ranked experiment(s)", items.len())];
+    lines.extend(items.iter().enumerate().flat_map(|(index, item)| {
+        [
+            format!(
+                "{}. {}={} [{}] {} | verdict={} | commit={} | checkpoint={}",
+                index + 1,
+                item.key,
+                item.value,
+                item.source.as_str(),
+                item.change_title,
+                metric_verdict_name(item.verdict),
+                item.candidate_commit_hash,
+                item.candidate_checkpoint_id,
+            ),
+            format!("   dims: {}", render_dimension_kv(&item.dimensions)),
+        ]
+    }));
+ detailed_tool_output(
+ &concise,
+ &items,
+ lines.join("\n"),
+ None,
+ FaultStage::Worker,
+ "tools/call:metric.best",
+ )
+}
+
+fn run_dimension_list_output(
+ items: &[fidget_spinner_store_sqlite::RunDimensionSummary],
+) -> Result<ToolOutput, FaultRecord> {
+ let concise = items
+ .iter()
+ .map(|item| {
+ json!({
+ "key": item.key,
+ "value_type": item.value_type.as_str(),
+ "description": item.description,
+ "observed_run_count": item.observed_run_count,
+ "distinct_value_count": item.distinct_value_count,
+ "sample_values": item.sample_values,
+ })
+ })
+ .collect::<Vec<_>>();
+ let mut lines = vec![format!("{} run dimension(s)", items.len())];
+ lines.extend(items.iter().map(|item| {
+ let mut line = format!(
+ "{} [{}] runs={} distinct={}",
+ item.key,
+ item.value_type.as_str(),
+ item.observed_run_count,
+ item.distinct_value_count
+ );
+ if let Some(description) = item.description.as_ref() {
+ line.push_str(format!(" | {description}").as_str());
+ }
+ if !item.sample_values.is_empty() {
+ line.push_str(
+ format!(
+ " | samples={}",
+ item.sample_values
+ .iter()
+ .map(value_summary)
+ .collect::<Vec<_>>()
+ .join(", ")
+ )
+ .as_str(),
+ );
+ }
+ line
+ }));
+ detailed_tool_output(
+ &concise,
+ &items,
+ lines.join("\n"),
+ None,
+ FaultStage::Worker,
+ "tools/call:run.dimension.list",
+ )
+}
+
+fn json_created_output(
+ headline: &str,
+ structured: Value,
+ operation: &'static str,
+) -> Result<ToolOutput, FaultRecord> {
+ detailed_tool_output(
+ &structured,
+ &structured,
+ format!(
+ "{headline}\n{}",
+ crate::to_pretty_json(&structured).map_err(store_fault(operation))?
+ ),
+ None,
+ FaultStage::Worker,
+ operation,
+ )
+}
+
+fn project_schema_field_value(field: &ProjectFieldSpec) -> Value {
let mut value = Map::new();
let _ = value.insert("name".to_owned(), json!(field.name));
if !field.node_classes.is_empty() {
@@ -1023,21 +1519,12 @@ fn project_schema_field_value(field: &fidget_spinner_core::ProjectFieldSpec) ->
),
);
}
- let _ = value.insert(
- "presence".to_owned(),
- json!(format!("{:?}", field.presence).to_ascii_lowercase()),
- );
- let _ = value.insert(
- "severity".to_owned(),
- json!(format!("{:?}", field.severity).to_ascii_lowercase()),
- );
- let _ = value.insert(
- "role".to_owned(),
- json!(format!("{:?}", field.role).to_ascii_lowercase()),
- );
+ let _ = value.insert("presence".to_owned(), json!(field.presence.as_str()));
+ let _ = value.insert("severity".to_owned(), json!(field.severity.as_str()));
+ let _ = value.insert("role".to_owned(), json!(field.role.as_str()));
let _ = value.insert(
"inference_policy".to_owned(),
- json!(format!("{:?}", field.inference_policy).to_ascii_lowercase()),
+ json!(field.inference_policy.as_str()),
);
if let Some(value_type) = field.value_type {
let _ = value.insert("value_type".to_owned(), json!(value_type.as_str()));
@@ -1045,6 +1532,17 @@ fn project_schema_field_value(field: &fidget_spinner_core::ProjectFieldSpec) ->
Value::Object(value)
}
+fn render_schema_node_classes(node_classes: &BTreeSet<NodeClass>) -> String {
+ if node_classes.is_empty() {
+ return "any".to_owned();
+ }
+ node_classes
+ .iter()
+ .map(ToString::to_string)
+ .collect::<Vec<_>>()
+ .join(", ")
+}
+
fn frontier_projection_summary_value(projection: &FrontierProjection) -> Value {
json!({
"frontier_id": projection.frontier.id,
@@ -1211,17 +1709,17 @@ fn diagnostic_tally(diagnostics: &fidget_spinner_core::NodeDiagnostics) -> Diagn
.fold(DiagnosticTally::default(), |mut tally, item| {
tally.total += 1;
match item.severity {
- fidget_spinner_core::DiagnosticSeverity::Error => tally.errors += 1,
- fidget_spinner_core::DiagnosticSeverity::Warning => tally.warnings += 1,
- fidget_spinner_core::DiagnosticSeverity::Info => tally.infos += 1,
+ DiagnosticSeverity::Error => tally.errors += 1,
+ DiagnosticSeverity::Warning => tally.warnings += 1,
+ DiagnosticSeverity::Info => tally.infos += 1,
}
tally
})
}
-fn payload_preview_value(fields: &Map<String, Value>) -> Value {
+fn payload_preview_value(class: NodeClass, fields: &Map<String, Value>) -> Value {
let mut preview = Map::new();
- for (index, (name, value)) in fields.iter().enumerate() {
+ for (index, (name, value)) in filtered_payload_fields(class, fields).enumerate() {
if index == 6 {
let _ = preview.insert(
"...".to_owned(),
@@ -1234,14 +1732,33 @@ fn payload_preview_value(fields: &Map<String, Value>) -> Value {
Value::Object(preview)
}
-fn payload_preview_lines(fields: &Map<String, Value>) -> Vec<String> {
- if fields.is_empty() {
+fn payload_preview_lines(class: NodeClass, fields: &Map<String, Value>) -> Vec<String> {
+ let filtered = filtered_payload_fields(class, fields).collect::<Vec<_>>();
+ if filtered.is_empty() {
return Vec::new();
}
- let mut lines = vec![format!("payload fields: {}", fields.len())];
- for (index, (name, value)) in fields.iter().enumerate() {
+ if is_prose_node(class) {
+ let preview_names = filtered
+ .iter()
+ .take(6)
+ .map(|(name, _)| (*name).clone())
+ .collect::<Vec<_>>();
+ let mut lines = vec![format!("payload fields: {}", preview_names.join(", "))];
+ if filtered.len() > preview_names.len() {
+ lines.push(format!(
+ "payload fields: +{} more field(s)",
+ filtered.len() - preview_names.len()
+ ));
+ }
+ return lines;
+ }
+ let mut lines = vec![format!("payload fields: {}", filtered.len())];
+ for (index, (name, value)) in filtered.iter().enumerate() {
if index == 6 {
- lines.push(format!("payload: +{} more field(s)", fields.len() - index));
+ lines.push(format!(
+ "payload: +{} more field(s)",
+ filtered.len() - index
+ ));
break;
}
lines.push(format!(
@@ -1253,10 +1770,19 @@ fn payload_preview_lines(fields: &Map<String, Value>) -> Vec<String> {
lines
}
+/// Payload-field iterator that hides the `body` field for prose nodes so
+/// concise read surfaces never leak full prose text.
+fn filtered_payload_fields(
+    class: NodeClass,
+    fields: &Map<String, Value>,
+) -> impl Iterator<Item = (&String, &Value)> + '_ {
+    fields
+        .iter()
+        .filter(move |(name, _)| !is_prose_node(class) || name.as_str() != "body")
+}
+
fn payload_value_preview(value: &Value) -> Value {
match value {
Value::Null | Value::Bool(_) | Value::Number(_) => value.clone(),
- Value::String(text) => Value::String(libmcp::collapse_inline_whitespace(text)),
+ Value::String(text) => Value::String(truncated_inline_preview(text, 96)),
Value::Array(items) => {
let preview = items
.iter()
@@ -1290,25 +1816,89 @@ fn payload_value_preview(value: &Value) -> Value {
}
}
-fn metric_value(metric: &MetricObservation) -> Value {
- json!({
- "key": metric.metric_key,
+/// Prose classes (notes, research) carry a long-form `body` payload field
+/// that must stay out of concise read surfaces.
+fn is_prose_node(class: NodeClass) -> bool {
+ matches!(class, NodeClass::Note | NodeClass::Research)
+}
+
+/// Collapses inline whitespace and truncates to `limit` characters, appending
+/// `...` when the preview was cut short.
+fn truncated_inline_preview(text: &str, limit: usize) -> String {
+ let collapsed = libmcp::collapse_inline_whitespace(text);
+ let truncated = libmcp::render::truncate_chars(&collapsed, Some(limit));
+ if truncated.truncated {
+ format!("{}...", truncated.text)
+ } else {
+ truncated.text
+ }
+}
+
+fn metric_value(store: &ProjectStore, metric: &MetricValue) -> Result<Value, FaultRecord> {
+ let definition = metric_definition(store, &metric.key)?;
+ Ok(json!({
+ "key": metric.key,
"value": metric.value,
- "unit": format!("{:?}", metric.unit).to_ascii_lowercase(),
- "objective": format!("{:?}", metric.objective).to_ascii_lowercase(),
- })
+ "unit": metric_unit_name(definition.unit),
+ "objective": metric_objective_name(definition.objective),
+ }))
}
-fn metric_text(metric: &MetricObservation) -> String {
- format!(
+fn metric_text(store: &ProjectStore, metric: &MetricValue) -> Result<String, FaultRecord> {
+ let definition = metric_definition(store, &metric.key)?;
+ Ok(format!(
"{}={} {} ({})",
- metric.metric_key,
+ metric.key,
metric.value,
- format!("{:?}", metric.unit).to_ascii_lowercase(),
- format!("{:?}", metric.objective).to_ascii_lowercase(),
+ metric_unit_name(definition.unit),
+ metric_objective_name(definition.objective),
+ ))
+}
+
+fn metric_unit_name(unit: MetricUnit) -> &'static str {
+ match unit {
+ MetricUnit::Seconds => "seconds",
+ MetricUnit::Bytes => "bytes",
+ MetricUnit::Count => "count",
+ MetricUnit::Ratio => "ratio",
+ MetricUnit::Custom => "custom",
+ }
+}
+
+fn metric_objective_name(objective: fidget_spinner_core::OptimizationObjective) -> &'static str {
+    use fidget_spinner_core::OptimizationObjective as Objective;
+    match objective {
+        Objective::Minimize => "minimize",
+        Objective::Maximize => "maximize",
+        Objective::Target => "target",
+    }
+}
+
+fn metric_verdict_name(verdict: FrontierVerdict) -> &'static str {
+ match verdict {
+ FrontierVerdict::PromoteToChampion => "promote_to_champion",
+ FrontierVerdict::KeepOnFrontier => "keep_on_frontier",
+ FrontierVerdict::RevertToChampion => "revert_to_champion",
+ FrontierVerdict::ArchiveDeadEnd => "archive_dead_end",
+ FrontierVerdict::NeedsMoreEvidence => "needs_more_evidence",
+ }
+}
+
+fn run_dimensions_value(dimensions: &BTreeMap<NonEmptyText, RunDimensionValue>) -> Value {
+ Value::Object(
+ dimensions
+ .iter()
+ .map(|(key, value)| (key.to_string(), value.as_json()))
+ .collect::<Map<String, Value>>(),
)
}
+fn render_dimension_kv(dimensions: &BTreeMap<NonEmptyText, RunDimensionValue>) -> String {
+ if dimensions.is_empty() {
+ return "none".to_owned();
+ }
+ dimensions
+ .iter()
+ .map(|(key, value)| format!("{key}={}", value_summary(&value.as_json())))
+ .collect::<Vec<_>>()
+ .join(", ")
+}
+
fn format_tags(tags: &BTreeSet<TagName>) -> String {
tags.iter()
.map(ToString::to_string)
@@ -1360,6 +1950,12 @@ fn classify_fault_kind(message: &str) -> FaultKind {
|| message.contains("empty")
|| message.contains("already exists")
|| message.contains("require an explicit tag list")
+ || message.contains("requires a non-empty summary")
+ || message.contains("requires a non-empty string payload field `body`")
+ || message.contains("requires an explicit order")
+ || message.contains("is ambiguous across sources")
+ || message.contains("has conflicting semantics")
+ || message.contains("conflicts with existing definition")
{
FaultKind::InvalidInput
} else {
@@ -1414,17 +2010,44 @@ fn metric_spec_from_wire(raw: WireMetricSpec) -> Result<MetricSpec, StoreError>
})
}
-fn metric_observation_from_wire(
- raw: WireMetricObservation,
-) -> Result<MetricObservation, StoreError> {
- Ok(MetricObservation {
- metric_key: NonEmptyText::new(raw.key)?,
- unit: parse_metric_unit_name(&raw.unit)?,
- objective: crate::parse_optimization_objective(&raw.objective)?,
+fn metric_value_from_wire(raw: WireMetricValue) -> Result<MetricValue, StoreError> {
+ Ok(MetricValue {
+ key: NonEmptyText::new(raw.key)?,
value: raw.value,
})
}
+/// Looks up a registered metric definition by key; unregistered metrics fault
+/// as invalid input so callers cannot record metrics outside the vocabulary.
+fn metric_definition(store: &ProjectStore, key: &NonEmptyText) -> Result<MetricSpec, FaultRecord> {
+ store
+ .list_metric_definitions()
+ .map_err(store_fault("tools/call:experiment.close"))?
+ .into_iter()
+ .find(|definition| definition.key == *key)
+ .map(|definition| MetricSpec {
+ metric_key: definition.key,
+ unit: definition.unit,
+ objective: definition.objective,
+ })
+ .ok_or_else(|| {
+ FaultRecord::new(
+ FaultKind::InvalidInput,
+ FaultStage::Store,
+ "tools/call:experiment.close",
+ format!("metric `{key}` is not registered"),
+ )
+ })
+}
+
+fn coerce_tool_dimensions(
+ store: &ProjectStore,
+ raw_dimensions: BTreeMap<String, Value>,
+ operation: &'static str,
+) -> Result<BTreeMap<NonEmptyText, RunDimensionValue>, FaultRecord> {
+ store
+ .coerce_run_dimensions(raw_dimensions)
+ .map_err(store_fault(operation))
+}
+
fn command_recipe_from_wire(
raw: WireRunCommand,
project_root: &Utf8Path,
@@ -1465,6 +2088,91 @@ fn parse_metric_unit_name(raw: &str) -> Result<MetricUnit, StoreError> {
crate::parse_metric_unit(raw)
}
+fn parse_metric_source_name(raw: &str) -> Result<MetricFieldSource, StoreError> {
+    match raw {
+        "run_metric" => Ok(MetricFieldSource::RunMetric),
+        "change_payload" => Ok(MetricFieldSource::ChangePayload),
+        "run_payload" => Ok(MetricFieldSource::RunPayload),
+        "analysis_payload" => Ok(MetricFieldSource::AnalysisPayload),
+        "decision_payload" => Ok(MetricFieldSource::DecisionPayload),
+        other => Err(crate::invalid_input(format!(
+            "unknown metric source `{other}`"
+        ))),
+    }
+}
+
+fn parse_metric_order_name(raw: &str) -> Result<MetricRankOrder, StoreError> {
+    match raw {
+        "asc" => Ok(MetricRankOrder::Asc),
+        "desc" => Ok(MetricRankOrder::Desc),
+        other => Err(crate::invalid_input(format!(
+            "unknown metric order `{other}`"
+        ))),
+    }
+}
+
+fn parse_field_value_type_name(raw: &str) -> Result<FieldValueType, StoreError> {
+ match raw {
+ "string" => Ok(FieldValueType::String),
+ "numeric" => Ok(FieldValueType::Numeric),
+ "boolean" => Ok(FieldValueType::Boolean),
+ "timestamp" => Ok(FieldValueType::Timestamp),
+ other => Err(crate::invalid_input(format!(
+ "unknown field value type `{other}`"
+ ))),
+ }
+}
+
+fn parse_diagnostic_severity_name(raw: &str) -> Result<DiagnosticSeverity, StoreError> {
+ match raw {
+ "error" => Ok(DiagnosticSeverity::Error),
+ "warning" => Ok(DiagnosticSeverity::Warning),
+ "info" => Ok(DiagnosticSeverity::Info),
+ other => Err(crate::invalid_input(format!(
+ "unknown diagnostic severity `{other}`"
+ ))),
+ }
+}
+
+fn parse_field_presence_name(raw: &str) -> Result<FieldPresence, StoreError> {
+ match raw {
+ "required" => Ok(FieldPresence::Required),
+ "recommended" => Ok(FieldPresence::Recommended),
+ "optional" => Ok(FieldPresence::Optional),
+ other => Err(crate::invalid_input(format!(
+ "unknown field presence `{other}`"
+ ))),
+ }
+}
+
+fn parse_field_role_name(raw: &str) -> Result<FieldRole, StoreError> {
+ match raw {
+ "index" => Ok(FieldRole::Index),
+ "projection_gate" => Ok(FieldRole::ProjectionGate),
+ "render_only" => Ok(FieldRole::RenderOnly),
+ "opaque" => Ok(FieldRole::Opaque),
+ other => Err(crate::invalid_input(format!(
+ "unknown field role `{other}`"
+ ))),
+ }
+}
+
+fn parse_inference_policy_name(raw: &str) -> Result<InferencePolicy, StoreError> {
+ match raw {
+ "manual_only" => Ok(InferencePolicy::ManualOnly),
+ "model_may_infer" => Ok(InferencePolicy::ModelMayInfer),
+ other => Err(crate::invalid_input(format!(
+ "unknown inference policy `{other}`"
+ ))),
+ }
+}
+
fn parse_backend_name(raw: &str) -> Result<ExecutionBackend, StoreError> {
match raw {
"local_process" => Ok(ExecutionBackend::LocalProcess),
@@ -1574,6 +2282,7 @@ struct NodeArchiveToolArgs {
struct QuickNoteToolArgs {
frontier_id: Option<String>,
title: String,
+ summary: String,
body: String,
tags: Vec<String>,
#[serde(default)]
@@ -1586,24 +2295,75 @@ struct QuickNoteToolArgs {
struct ResearchRecordToolArgs {
frontier_id: Option<String>,
title: String,
- summary: Option<String>,
+ summary: String,
body: String,
#[serde(default)]
+ tags: Vec<String>,
+ #[serde(default)]
annotations: Vec<WireAnnotation>,
#[serde(default)]
parents: Vec<String>,
}
#[derive(Debug, Deserialize)]
+struct SchemaFieldUpsertToolArgs {
+ name: String,
+ node_classes: Option<Vec<String>>,
+ presence: String,
+ severity: String,
+ role: String,
+ inference_policy: String,
+ value_type: Option<String>,
+}
+
+#[derive(Debug, Deserialize)]
+struct SchemaFieldRemoveToolArgs {
+ name: String,
+ node_classes: Option<Vec<String>>,
+}
+
+#[derive(Debug, Deserialize)]
+struct MetricDefineToolArgs {
+ key: String,
+ unit: String,
+ objective: String,
+ description: Option<String>,
+}
+
+#[derive(Debug, Deserialize)]
+struct RunDimensionDefineToolArgs {
+ key: String,
+ value_type: String,
+ description: Option<String>,
+}
+
+#[derive(Debug, Deserialize, Default)]
+struct MetricKeysToolArgs {
+ frontier_id: Option<String>,
+ source: Option<String>,
+ dimensions: Option<BTreeMap<String, Value>>,
+}
+
+#[derive(Debug, Deserialize)]
+struct MetricBestToolArgs {
+ key: String,
+ frontier_id: Option<String>,
+ source: Option<String>,
+ dimensions: Option<BTreeMap<String, Value>>,
+ order: Option<String>,
+ limit: Option<u32>,
+}
+
+#[derive(Debug, Deserialize)]
struct ExperimentCloseToolArgs {
frontier_id: String,
base_checkpoint_id: String,
change_node_id: String,
candidate_summary: String,
run: WireRun,
- primary_metric: WireMetricObservation,
+ primary_metric: WireMetricValue,
#[serde(default)]
- supporting_metrics: Vec<WireMetricObservation>,
+ supporting_metrics: Vec<WireMetricValue>,
note: WireFrontierNote,
verdict: String,
decision_title: String,
@@ -1627,10 +2387,8 @@ struct WireMetricSpec {
}
#[derive(Debug, Deserialize)]
-struct WireMetricObservation {
+struct WireMetricValue {
key: String,
- unit: String,
- objective: String,
value: f64,
}
@@ -1639,7 +2397,8 @@ struct WireRun {
title: String,
summary: Option<String>,
backend: String,
- benchmark_suite: String,
+ #[serde(default)]
+ dimensions: BTreeMap<String, Value>,
command: WireRunCommand,
}
diff --git a/crates/fidget-spinner-cli/src/mcp/telemetry.rs b/crates/fidget-spinner-cli/src/mcp/telemetry.rs
index 7206f76..93fbe71 100644
--- a/crates/fidget-spinner-cli/src/mcp/telemetry.rs
+++ b/crates/fidget-spinner-cli/src/mcp/telemetry.rs
@@ -36,6 +36,7 @@ impl ServerTelemetry {
pub fn record_success(&mut self, operation: &str, latency_ms: u128) {
self.successes += 1;
+ self.last_fault = None;
let entry = self.operations.entry(operation.to_owned()).or_default();
entry.successes += 1;
entry.last_latency_ms = Some(latency_ms);
diff --git a/crates/fidget-spinner-cli/tests/mcp_hardening.rs b/crates/fidget-spinner-cli/tests/mcp_hardening.rs
index 6c81d7f..f076d74 100644
--- a/crates/fidget-spinner-cli/tests/mcp_hardening.rs
+++ b/crates/fidget-spinner-cli/tests/mcp_hardening.rs
@@ -20,7 +20,10 @@ use uuid as _;
type TestResult<T = ()> = Result<T, Box<dyn std::error::Error>>;
-fn must<T, E: std::fmt::Display>(result: Result<T, E>, context: &str) -> TestResult<T> {
+fn must<T, E: std::fmt::Display, C: std::fmt::Display>(
+ result: Result<T, E>,
+ context: C,
+) -> TestResult<T> {
result.map_err(|error| io::Error::other(format!("{context}: {error}")).into())
}
@@ -54,6 +57,48 @@ fn init_project(root: &Utf8PathBuf) -> TestResult {
Ok(())
}
+fn run_command(root: &Utf8PathBuf, program: &str, args: &[&str]) -> TestResult<String> {
+ let output = must(
+ Command::new(program)
+ .current_dir(root.as_std_path())
+ .args(args)
+ .output(),
+ format!("{program} spawn"),
+ )?;
+ if !output.status.success() {
+ return Err(io::Error::other(format!(
+ "{program} {:?} failed: {}",
+ args,
+ String::from_utf8_lossy(&output.stderr)
+ ))
+ .into());
+ }
+ Ok(String::from_utf8_lossy(&output.stdout).trim().to_owned())
+}
+
+fn run_git(root: &Utf8PathBuf, args: &[&str]) -> TestResult<String> {
+ run_command(root, "git", args)
+}
+
+fn init_git_project(root: &Utf8PathBuf) -> TestResult<String> {
+ let _ = run_git(root, &["init", "-b", "main"])?;
+ let _ = run_git(root, &["config", "user.name", "main"])?;
+ let _ = run_git(root, &["config", "user.email", "main@swarm.moe"])?;
+ let _ = run_git(root, &["add", "-A"])?;
+ let _ = run_git(root, &["commit", "-m", "initial state"])?;
+ run_git(root, &["rev-parse", "HEAD"])
+}
+
+fn commit_project_state(root: &Utf8PathBuf, marker: &str, message: &str) -> TestResult<String> {
+ must(
+ fs::write(root.join(marker).as_std_path(), message),
+ format!("write marker {marker}"),
+ )?;
+ let _ = run_git(root, &["add", "-A"])?;
+ let _ = run_git(root, &["commit", "-m", message])?;
+ run_git(root, &["rev-parse", "HEAD"])
+}
+
fn binary_path() -> PathBuf {
PathBuf::from(env!("CARGO_BIN_EXE_fidget-spinner-cli"))
}
@@ -172,6 +217,10 @@ fn tool_text(response: &Value) -> Option<&str> {
.and_then(|entry| entry["text"].as_str())
}
+fn fault_message(response: &Value) -> Option<&str> {
+ response["result"]["structuredContent"]["message"].as_str()
+}
+
#[test]
fn cold_start_exposes_health_and_telemetry() -> TestResult {
let project_root = temp_project_root("cold_start")?;
@@ -361,6 +410,7 @@ fn side_effecting_request_is_not_replayed_after_worker_crash() -> TestResult {
"research.record",
json!({
"title": "should not duplicate",
+ "summary": "dedupe check",
"body": "host crash before worker execution",
}),
)?;
@@ -463,6 +513,57 @@ fn bind_rejects_nonempty_uninitialized_root() -> TestResult {
}
#[test]
+fn successful_bind_clears_stale_fault_from_health() -> TestResult {
+ let bad_root = temp_project_root("bind_fault_bad")?;
+ must(
+ fs::write(bad_root.join("README.txt").as_std_path(), "occupied"),
+ "seed bad bind root",
+ )?;
+ let good_root = temp_project_root("bind_fault_good")?;
+ init_project(&good_root)?;
+
+ let mut harness = McpHarness::spawn(None, &[])?;
+ let _ = harness.initialize()?;
+ harness.notify_initialized()?;
+
+ let failed_bind = harness.bind_project(301, &bad_root)?;
+ assert_eq!(failed_bind["result"]["isError"].as_bool(), Some(true));
+
+ let failed_health = harness.call_tool(302, "system.health", json!({ "detail": "full" }))?;
+ assert_eq!(
+ tool_content(&failed_health)["last_fault"]["operation"].as_str(),
+ Some("tools/call:project.bind")
+ );
+
+ let good_bind = harness.bind_project(303, &good_root)?;
+ assert_eq!(good_bind["result"]["isError"].as_bool(), Some(false));
+
+ let recovered_health = harness.call_tool(304, "system.health", json!({}))?;
+ assert_eq!(recovered_health["result"]["isError"].as_bool(), Some(false));
+ assert!(tool_content(&recovered_health).get("last_fault").is_none());
+ assert!(!must_some(tool_text(&recovered_health), "recovered health text")?.contains("fault:"));
+
+ let recovered_health_full =
+ harness.call_tool(306, "system.health", json!({ "detail": "full" }))?;
+ assert_eq!(
+ tool_content(&recovered_health_full)["last_fault"],
+ Value::Null,
+ );
+
+ let recovered_telemetry = harness.call_tool(305, "system.telemetry", json!({}))?;
+ assert_eq!(
+ recovered_telemetry["result"]["isError"].as_bool(),
+ Some(false)
+ );
+ assert_eq!(
+ tool_content(&recovered_telemetry)["errors"].as_u64(),
+ Some(1)
+ );
+ assert!(tool_content(&recovered_telemetry)["last_fault"].is_null());
+ Ok(())
+}
+
+#[test]
fn bind_retargets_writes_to_sibling_project_root() -> TestResult {
let spinner_root = temp_project_root("spinner_root")?;
let libgrid_root = temp_project_root("libgrid_root")?;
@@ -502,6 +603,7 @@ fn bind_retargets_writes_to_sibling_project_root() -> TestResult {
"note.quick",
json!({
"title": "libgrid dogfood note",
+ "summary": "rebind summary",
"body": "rebind should redirect writes",
"tags": [],
}),
@@ -545,6 +647,7 @@ fn tag_registry_drives_note_creation_and_lookup() -> TestResult {
"note.quick",
json!({
"title": "untagged",
+ "summary": "should fail without explicit tags",
"body": "should fail",
}),
)?;
@@ -570,6 +673,7 @@ fn tag_registry_drives_note_creation_and_lookup() -> TestResult {
"note.quick",
json!({
"title": "tagged note",
+ "summary": "tagged lookup summary",
"body": "tagged lookup should work",
"tags": ["dogfood/mcp"],
}),
@@ -582,3 +686,937 @@ fn tag_registry_drives_note_creation_and_lookup() -> TestResult {
assert_eq!(nodes[0]["tags"][0].as_str(), Some("dogfood/mcp"));
Ok(())
}
+
+#[test]
+fn research_record_accepts_tags_and_filtering() -> TestResult {
+ let project_root = temp_project_root("research_tags")?;
+ init_project(&project_root)?;
+
+ let mut harness = McpHarness::spawn(None, &[])?;
+ let _ = harness.initialize()?;
+ harness.notify_initialized()?;
+ let bind = harness.bind_project(451, &project_root)?;
+ assert_eq!(bind["result"]["isError"].as_bool(), Some(false));
+
+ let tag = harness.call_tool(
+ 452,
+ "tag.add",
+ json!({
+ "name": "campaign/libgrid",
+ "description": "libgrid migration campaign",
+ }),
+ )?;
+ assert_eq!(tag["result"]["isError"].as_bool(), Some(false));
+
+ let research = harness.call_tool(
+ 453,
+ "research.record",
+ json!({
+ "title": "ingest tranche",
+ "summary": "Import the next libgrid tranche.",
+ "body": "Full import notes live here.",
+ "tags": ["campaign/libgrid"],
+ }),
+ )?;
+ assert_eq!(research["result"]["isError"].as_bool(), Some(false));
+
+ let filtered = harness.call_tool(454, "node.list", json!({"tags": ["campaign/libgrid"]}))?;
+ let nodes = must_some(
+ tool_content(&filtered).as_array(),
+ "filtered research nodes",
+ )?;
+ assert_eq!(nodes.len(), 1);
+ assert_eq!(nodes[0]["class"].as_str(), Some("research"));
+ assert_eq!(nodes[0]["tags"][0].as_str(), Some("campaign/libgrid"));
+ Ok(())
+}
+
+#[test]
+fn prose_tools_reject_invalid_shapes_over_mcp() -> TestResult {
+ let project_root = temp_project_root("prose_invalid")?;
+ init_project(&project_root)?;
+
+ let mut harness = McpHarness::spawn(None, &[])?;
+ let _ = harness.initialize()?;
+ harness.notify_initialized()?;
+ let bind = harness.bind_project(46, &project_root)?;
+ assert_eq!(bind["result"]["isError"].as_bool(), Some(false));
+
+ let missing_note_summary = harness.call_tool(
+ 47,
+ "note.quick",
+ json!({
+ "title": "untagged",
+ "body": "body only",
+ "tags": [],
+ }),
+ )?;
+ assert_eq!(
+ missing_note_summary["result"]["isError"].as_bool(),
+ Some(true)
+ );
+ assert!(
+ fault_message(&missing_note_summary)
+ .is_some_and(|message| message.contains("summary") || message.contains("missing field"))
+ );
+
+ let missing_research_summary = harness.call_tool(
+ 48,
+ "research.record",
+ json!({
+ "title": "research only",
+ "body": "body only",
+ }),
+ )?;
+ assert_eq!(
+ missing_research_summary["result"]["isError"].as_bool(),
+ Some(true)
+ );
+ assert!(
+ fault_message(&missing_research_summary)
+ .is_some_and(|message| message.contains("summary") || message.contains("missing field"))
+ );
+
+ let note_without_body = harness.call_tool(
+ 49,
+ "node.create",
+ json!({
+ "class": "note",
+ "title": "missing body",
+ "summary": "triage layer",
+ "tags": [],
+ "payload": {},
+ }),
+ )?;
+ assert_eq!(note_without_body["result"]["isError"].as_bool(), Some(true));
+ assert!(
+ fault_message(&note_without_body)
+ .is_some_and(|message| message.contains("payload field `body`"))
+ );
+
+ let research_without_summary = harness.call_tool(
+ 50,
+ "node.create",
+ json!({
+ "class": "research",
+ "title": "missing summary",
+ "payload": { "body": "full research body" },
+ }),
+ )?;
+ assert_eq!(
+ research_without_summary["result"]["isError"].as_bool(),
+ Some(true)
+ );
+ assert!(
+ fault_message(&research_without_summary)
+ .is_some_and(|message| message.contains("non-empty summary"))
+ );
+ Ok(())
+}
+
+#[test]
+fn concise_note_reads_do_not_leak_body_text() -> TestResult {
+ let project_root = temp_project_root("concise_note_read")?;
+ init_project(&project_root)?;
+
+ let mut harness = McpHarness::spawn(None, &[])?;
+ let _ = harness.initialize()?;
+ harness.notify_initialized()?;
+ let bind = harness.bind_project(50, &project_root)?;
+ assert_eq!(bind["result"]["isError"].as_bool(), Some(false));
+
+ let note = harness.call_tool(
+ 51,
+ "note.quick",
+ json!({
+ "title": "tagged note",
+ "summary": "triage layer",
+ "body": "full note body should stay out of concise reads",
+ "tags": [],
+ }),
+ )?;
+ assert_eq!(note["result"]["isError"].as_bool(), Some(false));
+ let node_id = must_some(tool_content(&note)["id"].as_str(), "created note id")?.to_owned();
+
+ let concise = harness.call_tool(52, "node.read", json!({ "node_id": node_id }))?;
+ let concise_structured = tool_content(&concise);
+ assert_eq!(concise_structured["summary"].as_str(), Some("triage layer"));
+ assert!(concise_structured["payload_preview"].get("body").is_none());
+ assert!(
+ !must_some(tool_text(&concise), "concise note.read text")?
+ .contains("full note body should stay out of concise reads")
+ );
+
+ let full = harness.call_tool(
+ 53,
+ "node.read",
+ json!({ "node_id": node_id, "detail": "full" }),
+ )?;
+ assert_eq!(
+ tool_content(&full)["payload"]["fields"]["body"].as_str(),
+ Some("full note body should stay out of concise reads")
+ );
+ Ok(())
+}
+
+#[test]
+fn concise_prose_reads_only_surface_payload_field_names() -> TestResult {
+ let project_root = temp_project_root("concise_prose_field_names")?;
+ init_project(&project_root)?;
+
+ let mut harness = McpHarness::spawn(None, &[])?;
+ let _ = harness.initialize()?;
+ harness.notify_initialized()?;
+ let bind = harness.bind_project(531, &project_root)?;
+ assert_eq!(bind["result"]["isError"].as_bool(), Some(false));
+
+ let research = harness.call_tool(
+ 532,
+ "node.create",
+ json!({
+ "class": "research",
+ "title": "rich import",
+ "summary": "triage layer only",
+ "payload": {
+ "body": "Body stays out of concise output.",
+ "source_excerpt": "This imported excerpt is intentionally long and should never reappear in concise node reads as a value preview.",
+ "verbatim_snippet": "Another long snippet that belongs in full payload inspection only, not in triage surfaces."
+ }
+ }),
+ )?;
+ assert_eq!(research["result"]["isError"].as_bool(), Some(false));
+ let node_id = must_some(
+ tool_content(&research)["id"].as_str(),
+ "created research id",
+ )?
+ .to_owned();
+
+ let concise = harness.call_tool(533, "node.read", json!({ "node_id": node_id }))?;
+ let concise_structured = tool_content(&concise);
+ assert_eq!(concise_structured["payload_field_count"].as_u64(), Some(2));
+ let payload_fields = must_some(
+ concise_structured["payload_fields"].as_array(),
+ "concise prose payload fields",
+ )?;
+ assert!(
+ payload_fields
+ .iter()
+ .any(|field| field.as_str() == Some("source_excerpt"))
+ );
+ assert!(concise_structured.get("payload_preview").is_none());
+ let concise_text = must_some(tool_text(&concise), "concise prose read text")?;
+ assert!(!concise_text.contains("This imported excerpt is intentionally long"));
+ assert!(concise_text.contains("payload fields: source_excerpt, verbatim_snippet"));
+ Ok(())
+}
+
+#[test]
+fn node_list_does_not_enumerate_full_prose_bodies() -> TestResult {
+ let project_root = temp_project_root("node_list_no_body_leak")?;
+ init_project(&project_root)?;
+
+ let mut harness = McpHarness::spawn(None, &[])?;
+ let _ = harness.initialize()?;
+ harness.notify_initialized()?;
+ let bind = harness.bind_project(54, &project_root)?;
+ assert_eq!(bind["result"]["isError"].as_bool(), Some(false));
+
+ let note = harness.call_tool(
+ 55,
+ "note.quick",
+ json!({
+ "title": "tagged note",
+ "summary": "triage summary",
+ "body": "full note body should never appear in list-like surfaces",
+ "tags": [],
+ }),
+ )?;
+ assert_eq!(note["result"]["isError"].as_bool(), Some(false));
+
+ let listed = harness.call_tool(56, "node.list", json!({ "class": "note" }))?;
+ let listed_rows = must_some(tool_content(&listed).as_array(), "listed note rows")?;
+ assert_eq!(listed_rows.len(), 1);
+ assert_eq!(listed_rows[0]["summary"].as_str(), Some("triage summary"));
+ assert!(listed_rows[0].get("body").is_none());
+ assert!(
+ !must_some(tool_text(&listed), "node.list text")?
+ .contains("full note body should never appear in list-like surfaces")
+ );
+ Ok(())
+}
+
+#[test]
+fn metric_tools_are_listed_for_discovery() -> TestResult {
+ let project_root = temp_project_root("metric_tool_list")?;
+ init_project(&project_root)?;
+
+ let mut harness = McpHarness::spawn(Some(&project_root), &[])?;
+ let _ = harness.initialize()?;
+ harness.notify_initialized()?;
+ let tools = harness.tools_list()?;
+ let names = must_some(tools["result"]["tools"].as_array(), "tool list")?
+ .iter()
+ .filter_map(|tool| tool["name"].as_str())
+ .collect::<Vec<_>>();
+ assert!(names.contains(&"metric.define"));
+ assert!(names.contains(&"metric.keys"));
+ assert!(names.contains(&"metric.best"));
+ assert!(names.contains(&"metric.migrate"));
+ assert!(names.contains(&"run.dimension.define"));
+ assert!(names.contains(&"run.dimension.list"));
+ assert!(names.contains(&"schema.field.upsert"));
+ assert!(names.contains(&"schema.field.remove"));
+ Ok(())
+}
+
+#[test]
+fn schema_field_tools_mutate_project_schema() -> TestResult {
+ let project_root = temp_project_root("schema_field_tools")?;
+ init_project(&project_root)?;
+
+ let mut harness = McpHarness::spawn(Some(&project_root), &[])?;
+ let _ = harness.initialize()?;
+ harness.notify_initialized()?;
+
+ let upsert = harness.call_tool(
+ 861,
+ "schema.field.upsert",
+ json!({
+ "name": "scenario",
+ "node_classes": ["change", "analysis"],
+ "presence": "recommended",
+ "severity": "warning",
+ "role": "projection_gate",
+ "inference_policy": "manual_only",
+ "value_type": "string"
+ }),
+ )?;
+ assert_eq!(upsert["result"]["isError"].as_bool(), Some(false));
+ assert_eq!(
+ tool_content(&upsert)["field"]["name"].as_str(),
+ Some("scenario")
+ );
+ assert_eq!(
+ tool_content(&upsert)["field"]["node_classes"],
+ json!(["change", "analysis"])
+ );
+
+ let schema = harness.call_tool(862, "project.schema", json!({ "detail": "full" }))?;
+ assert_eq!(schema["result"]["isError"].as_bool(), Some(false));
+ let fields = must_some(tool_content(&schema)["fields"].as_array(), "schema fields")?;
+ assert!(fields.iter().any(|field| {
+ field["name"].as_str() == Some("scenario") && field["value_type"].as_str() == Some("string")
+ }));
+
+ let remove = harness.call_tool(
+ 863,
+ "schema.field.remove",
+ json!({
+ "name": "scenario",
+ "node_classes": ["change", "analysis"]
+ }),
+ )?;
+ assert_eq!(remove["result"]["isError"].as_bool(), Some(false));
+ assert_eq!(tool_content(&remove)["removed_count"].as_u64(), Some(1));
+
+ let schema_after = harness.call_tool(864, "project.schema", json!({ "detail": "full" }))?;
+ let fields_after = must_some(
+ tool_content(&schema_after)["fields"].as_array(),
+ "schema fields after remove",
+ )?;
+ assert!(
+ !fields_after
+ .iter()
+ .any(|field| field["name"].as_str() == Some("scenario"))
+ );
+ Ok(())
+}
+
+#[test]
+fn bind_open_backfills_legacy_missing_summary() -> TestResult {
+ let project_root = temp_project_root("bind_backfill")?;
+ init_project(&project_root)?;
+
+ let node_id = {
+ let mut store = must(ProjectStore::open(&project_root), "open project store")?;
+ let node = must(
+ store.add_node(fidget_spinner_store_sqlite::CreateNodeRequest {
+ class: fidget_spinner_core::NodeClass::Research,
+ frontier_id: None,
+ title: must(NonEmptyText::new("legacy research"), "legacy title")?,
+ summary: Some(must(
+ NonEmptyText::new("temporary summary"),
+ "temporary summary",
+ )?),
+ tags: None,
+ payload: fidget_spinner_core::NodePayload::with_schema(
+ store.schema().schema_ref(),
+ serde_json::from_value(json!({
+ "body": "Derived summary first paragraph.\n\nLonger body follows."
+ }))
+ .map_err(|error| io::Error::other(format!("payload object: {error}")))?,
+ ),
+ annotations: Vec::new(),
+ attachments: Vec::new(),
+ }),
+ "create legacy research node",
+ )?;
+ node.id.to_string()
+ };
+
+ let database_path = project_root.join(".fidget_spinner").join("state.sqlite");
+ let clear_output = must(
+ Command::new("sqlite3")
+ .current_dir(project_root.as_std_path())
+ .arg(database_path.as_str())
+ .arg(format!(
+ "UPDATE nodes SET summary = NULL WHERE id = '{node_id}';"
+ ))
+ .output(),
+ "spawn sqlite3 for direct summary clear",
+ )?;
+ if !clear_output.status.success() {
+ return Err(io::Error::other(format!(
+ "sqlite3 summary clear failed: {}",
+ String::from_utf8_lossy(&clear_output.stderr)
+ ))
+ .into());
+ }
+
+ let mut harness = McpHarness::spawn(None, &[])?;
+ let _ = harness.initialize()?;
+ harness.notify_initialized()?;
+ let bind = harness.bind_project(60, &project_root)?;
+ assert_eq!(bind["result"]["isError"].as_bool(), Some(false));
+
+ let read = harness.call_tool(61, "node.read", json!({ "node_id": node_id }))?;
+ assert_eq!(read["result"]["isError"].as_bool(), Some(false));
+ assert_eq!(
+ tool_content(&read)["summary"].as_str(),
+ Some("Derived summary first paragraph.")
+ );
+
+ let listed = harness.call_tool(62, "node.list", json!({ "class": "research" }))?;
+ let items = must_some(tool_content(&listed).as_array(), "research node list")?;
+ assert_eq!(items.len(), 1);
+ assert_eq!(
+ items[0]["summary"].as_str(),
+ Some("Derived summary first paragraph.")
+ );
+ Ok(())
+}
+
+#[test]
+fn metric_tools_rank_closed_experiments_and_enforce_disambiguation() -> TestResult {
+ let project_root = temp_project_root("metric_rank_e2e")?;
+ init_project(&project_root)?;
+ let _initial_head = init_git_project(&project_root)?;
+
+ let mut harness = McpHarness::spawn(Some(&project_root), &[])?;
+ let _ = harness.initialize()?;
+ harness.notify_initialized()?;
+
+ let frontier = harness.call_tool(
+ 70,
+ "frontier.init",
+ json!({
+ "label": "metric frontier",
+ "objective": "exercise metric ranking",
+ "contract_title": "metric contract",
+ "benchmark_suites": ["smoke"],
+ "promotion_criteria": ["rank by one key"],
+ "primary_metric": {
+ "key": "wall_clock_s",
+ "unit": "seconds",
+ "objective": "minimize"
+ }
+ }),
+ )?;
+ assert_eq!(frontier["result"]["isError"].as_bool(), Some(false));
+ let frontier_id = must_some(
+ tool_content(&frontier)["frontier_id"].as_str(),
+ "frontier id",
+ )?
+ .to_owned();
+ let base_checkpoint_id = must_some(
+ tool_content(&frontier)["champion_checkpoint_id"].as_str(),
+ "base checkpoint id",
+ )?
+ .to_owned();
+ let metric_define = harness.call_tool(
+ 701,
+ "metric.define",
+ json!({
+ "key": "wall_clock_s",
+ "unit": "seconds",
+ "objective": "minimize",
+ "description": "elapsed wall time"
+ }),
+ )?;
+ assert_eq!(metric_define["result"]["isError"].as_bool(), Some(false));
+
+ let scenario_dimension = harness.call_tool(
+ 702,
+ "run.dimension.define",
+ json!({
+ "key": "scenario",
+ "value_type": "string",
+ "description": "workload family"
+ }),
+ )?;
+ assert_eq!(
+ scenario_dimension["result"]["isError"].as_bool(),
+ Some(false)
+ );
+
+ let duration_dimension = harness.call_tool(
+ 703,
+ "run.dimension.define",
+ json!({
+ "key": "duration_s",
+ "value_type": "numeric",
+ "description": "time budget in seconds"
+ }),
+ )?;
+ assert_eq!(
+ duration_dimension["result"]["isError"].as_bool(),
+ Some(false)
+ );
+
+ let dimensions = harness.call_tool(704, "run.dimension.list", json!({}))?;
+ assert_eq!(dimensions["result"]["isError"].as_bool(), Some(false));
+ let dimension_rows = must_some(tool_content(&dimensions).as_array(), "run dimension rows")?;
+ assert!(dimension_rows.iter().any(|row| {
+ row["key"].as_str() == Some("benchmark_suite")
+ && row["value_type"].as_str() == Some("string")
+ }));
+ assert!(dimension_rows.iter().any(|row| {
+ row["key"].as_str() == Some("scenario")
+ && row["description"].as_str() == Some("workload family")
+ }));
+ assert!(dimension_rows.iter().any(|row| {
+ row["key"].as_str() == Some("duration_s") && row["value_type"].as_str() == Some("numeric")
+ }));
+
+ let first_change = harness.call_tool(
+ 71,
+ "node.create",
+ json!({
+ "class": "change",
+ "frontier_id": frontier_id,
+ "title": "first change",
+ "summary": "first change summary",
+ "payload": {
+ "body": "first change body",
+ "wall_clock_s": 14.0
+ }
+ }),
+ )?;
+ assert_eq!(first_change["result"]["isError"].as_bool(), Some(false));
+ let first_change_id = must_some(
+ tool_content(&first_change)["id"].as_str(),
+ "first change id",
+ )?;
+ let _first_commit = commit_project_state(&project_root, "candidate-one.txt", "candidate one")?;
+
+ let first_close = harness.call_tool(
+ 72,
+ "experiment.close",
+ json!({
+ "frontier_id": frontier_id,
+ "base_checkpoint_id": base_checkpoint_id,
+ "change_node_id": first_change_id,
+ "candidate_summary": "candidate one",
+ "run": {
+ "title": "first run",
+ "summary": "first run summary",
+ "backend": "worktree_process",
+ "dimensions": {
+ "benchmark_suite": "smoke",
+ "scenario": "belt_4x5",
+ "duration_s": 20.0
+ },
+ "command": {
+ "working_directory": project_root.as_str(),
+ "argv": ["true"]
+ }
+ },
+ "primary_metric": {
+ "key": "wall_clock_s",
+ "value": 10.0
+ },
+ "note": {
+ "summary": "first run note"
+ },
+ "verdict": "keep_on_frontier",
+ "decision_title": "first decision",
+ "decision_rationale": "keep first candidate around"
+ }),
+ )?;
+ assert_eq!(first_close["result"]["isError"].as_bool(), Some(false));
+
+ let first_candidate_checkpoint_id = must_some(
+ tool_content(&first_close)["candidate_checkpoint_id"].as_str(),
+ "first candidate checkpoint id",
+ )?
+ .to_owned();
+
+ let second_change = harness.call_tool(
+ 73,
+ "node.create",
+ json!({
+ "class": "change",
+ "frontier_id": frontier_id,
+ "title": "second change",
+ "summary": "second change summary",
+ "payload": {
+ "body": "second change body",
+ "wall_clock_s": 7.0
+ }
+ }),
+ )?;
+ assert_eq!(second_change["result"]["isError"].as_bool(), Some(false));
+ let second_change_id = must_some(
+ tool_content(&second_change)["id"].as_str(),
+ "second change id",
+ )?;
+ let second_commit = commit_project_state(&project_root, "candidate-two.txt", "candidate two")?;
+
+ let second_close = harness.call_tool(
+ 74,
+ "experiment.close",
+ json!({
+ "frontier_id": frontier_id,
+ "base_checkpoint_id": base_checkpoint_id,
+ "change_node_id": second_change_id,
+ "candidate_summary": "candidate two",
+ "run": {
+ "title": "second run",
+ "summary": "second run summary",
+ "backend": "worktree_process",
+ "dimensions": {
+ "benchmark_suite": "smoke",
+ "scenario": "belt_4x5",
+ "duration_s": 60.0
+ },
+ "command": {
+ "working_directory": project_root.as_str(),
+ "argv": ["true"]
+ }
+ },
+ "primary_metric": {
+ "key": "wall_clock_s",
+ "value": 5.0
+ },
+ "note": {
+ "summary": "second run note"
+ },
+ "verdict": "keep_on_frontier",
+ "decision_title": "second decision",
+ "decision_rationale": "second candidate looks stronger"
+ }),
+ )?;
+ assert_eq!(second_close["result"]["isError"].as_bool(), Some(false));
+ let second_candidate_checkpoint_id = must_some(
+ tool_content(&second_close)["candidate_checkpoint_id"].as_str(),
+ "second candidate checkpoint id",
+ )?
+ .to_owned();
+
+ let second_frontier = harness.call_tool(
+ 80,
+ "frontier.init",
+ json!({
+ "label": "metric frontier two",
+ "objective": "exercise frontier filtering",
+ "contract_title": "metric contract two",
+ "benchmark_suites": ["smoke"],
+ "promotion_criteria": ["frontier filters should isolate rankings"],
+ "primary_metric": {
+ "key": "wall_clock_s",
+ "unit": "seconds",
+ "objective": "minimize"
+ }
+ }),
+ )?;
+ assert_eq!(second_frontier["result"]["isError"].as_bool(), Some(false));
+ let second_frontier_id = must_some(
+ tool_content(&second_frontier)["frontier_id"].as_str(),
+ "second frontier id",
+ )?
+ .to_owned();
+ let second_base_checkpoint_id = must_some(
+ tool_content(&second_frontier)["champion_checkpoint_id"].as_str(),
+ "second frontier base checkpoint id",
+ )?
+ .to_owned();
+
+ let third_change = harness.call_tool(
+ 81,
+ "node.create",
+ json!({
+ "class": "change",
+ "frontier_id": second_frontier_id,
+ "title": "third change",
+ "summary": "third change summary",
+ "payload": {
+ "body": "third change body",
+ "wall_clock_s": 3.0
+ }
+ }),
+ )?;
+ assert_eq!(third_change["result"]["isError"].as_bool(), Some(false));
+ let third_change_id = must_some(
+ tool_content(&third_change)["id"].as_str(),
+ "third change id",
+ )?;
+ let third_commit =
+ commit_project_state(&project_root, "candidate-three.txt", "candidate three")?;
+
+ let third_close = harness.call_tool(
+ 82,
+ "experiment.close",
+ json!({
+ "frontier_id": second_frontier_id,
+ "base_checkpoint_id": second_base_checkpoint_id,
+ "change_node_id": third_change_id,
+ "candidate_summary": "candidate three",
+ "run": {
+ "title": "third run",
+ "summary": "third run summary",
+ "backend": "worktree_process",
+ "dimensions": {
+ "benchmark_suite": "smoke",
+ "scenario": "belt_4x5_alt",
+ "duration_s": 60.0
+ },
+ "command": {
+ "working_directory": project_root.as_str(),
+ "argv": ["true"]
+ }
+ },
+ "primary_metric": {
+ "key": "wall_clock_s",
+ "value": 3.0
+ },
+ "note": {
+ "summary": "third run note"
+ },
+ "verdict": "keep_on_frontier",
+ "decision_title": "third decision",
+ "decision_rationale": "third candidate is best overall but not in the first frontier"
+ }),
+ )?;
+ assert_eq!(third_close["result"]["isError"].as_bool(), Some(false));
+ let third_candidate_checkpoint_id = must_some(
+ tool_content(&third_close)["candidate_checkpoint_id"].as_str(),
+ "third candidate checkpoint id",
+ )?
+ .to_owned();
+
+ let keys = harness.call_tool(75, "metric.keys", json!({}))?;
+ assert_eq!(keys["result"]["isError"].as_bool(), Some(false));
+ let key_rows = must_some(tool_content(&keys).as_array(), "metric keys array")?;
+ assert!(key_rows.iter().any(|row| {
+ row["key"].as_str() == Some("wall_clock_s") && row["source"].as_str() == Some("run_metric")
+ }));
+ assert!(key_rows.iter().any(|row| {
+ row["key"].as_str() == Some("wall_clock_s")
+ && row["source"].as_str() == Some("run_metric")
+ && row["description"].as_str() == Some("elapsed wall time")
+ && row["requires_order"].as_bool() == Some(false)
+ }));
+ assert!(key_rows.iter().any(|row| {
+ row["key"].as_str() == Some("wall_clock_s")
+ && row["source"].as_str() == Some("change_payload")
+ }));
+
+ let filtered_keys = harness.call_tool(
+ 750,
+ "metric.keys",
+ json!({
+ "source": "run_metric",
+ "dimensions": {
+ "scenario": "belt_4x5",
+ "duration_s": 60.0
+ }
+ }),
+ )?;
+ assert_eq!(filtered_keys["result"]["isError"].as_bool(), Some(false));
+ let filtered_key_rows = must_some(
+ tool_content(&filtered_keys).as_array(),
+ "filtered metric keys array",
+ )?;
+ assert_eq!(filtered_key_rows.len(), 1);
+ assert_eq!(filtered_key_rows[0]["key"].as_str(), Some("wall_clock_s"));
+ assert_eq!(filtered_key_rows[0]["experiment_count"].as_u64(), Some(1));
+
+ let ambiguous = harness.call_tool(76, "metric.best", json!({ "key": "wall_clock_s" }))?;
+ assert_eq!(ambiguous["result"]["isError"].as_bool(), Some(true));
+ assert!(
+ fault_message(&ambiguous)
+ .is_some_and(|message| message.contains("ambiguous across sources"))
+ );
+
+ let run_metric_best = harness.call_tool(
+ 77,
+ "metric.best",
+ json!({
+ "key": "wall_clock_s",
+ "source": "run_metric",
+ "dimensions": {
+ "scenario": "belt_4x5",
+ "duration_s": 60.0
+ },
+ "limit": 5
+ }),
+ )?;
+ assert_eq!(run_metric_best["result"]["isError"].as_bool(), Some(false));
+ let run_best_rows = must_some(
+ tool_content(&run_metric_best).as_array(),
+ "run metric best array",
+ )?;
+    assert_eq!(run_best_rows.len(), 1);
+    assert_eq!(run_best_rows[0]["value"].as_f64(), Some(5.0));
+ assert_eq!(
+ run_best_rows[0]["candidate_checkpoint_id"].as_str(),
+ Some(second_candidate_checkpoint_id.as_str())
+ );
+ assert_eq!(
+ run_best_rows[0]["candidate_commit_hash"].as_str(),
+ Some(second_commit.as_str())
+ );
+ assert_eq!(
+ run_best_rows[0]["dimensions"]["scenario"].as_str(),
+ Some("belt_4x5")
+ );
+ assert_eq!(
+ run_best_rows[0]["dimensions"]["duration_s"].as_f64(),
+ Some(60.0)
+ );
+ assert!(must_some(tool_text(&run_metric_best), "run metric best text")?.contains("commit="));
+ assert!(must_some(tool_text(&run_metric_best), "run metric best text")?.contains("dims:"));
+
+ let payload_requires_order = harness.call_tool(
+ 78,
+ "metric.best",
+ json!({
+ "key": "wall_clock_s",
+ "source": "change_payload"
+ }),
+ )?;
+ assert_eq!(
+ payload_requires_order["result"]["isError"].as_bool(),
+ Some(true)
+ );
+ assert!(
+ fault_message(&payload_requires_order)
+ .is_some_and(|message| message.contains("explicit order"))
+ );
+
+ let payload_best = harness.call_tool(
+ 79,
+ "metric.best",
+ json!({
+ "key": "wall_clock_s",
+ "source": "change_payload",
+ "dimensions": {
+ "scenario": "belt_4x5",
+ "duration_s": 60.0
+ },
+ "order": "asc"
+ }),
+ )?;
+ assert_eq!(payload_best["result"]["isError"].as_bool(), Some(false));
+ let payload_best_rows = must_some(
+ tool_content(&payload_best).as_array(),
+ "payload metric best array",
+ )?;
+    assert_eq!(payload_best_rows.len(), 1);
+    assert_eq!(payload_best_rows[0]["value"].as_f64(), Some(7.0));
+ assert_eq!(
+ payload_best_rows[0]["candidate_checkpoint_id"].as_str(),
+ Some(second_candidate_checkpoint_id.as_str())
+ );
+ assert_eq!(
+ payload_best_rows[0]["candidate_commit_hash"].as_str(),
+ Some(second_commit.as_str())
+ );
+
+ let filtered_best = harness.call_tool(
+ 83,
+ "metric.best",
+ json!({
+ "key": "wall_clock_s",
+ "source": "run_metric",
+ "frontier_id": frontier_id,
+ "dimensions": {
+ "scenario": "belt_4x5"
+ },
+ "limit": 5
+ }),
+ )?;
+ assert_eq!(filtered_best["result"]["isError"].as_bool(), Some(false));
+ let filtered_rows = must_some(
+ tool_content(&filtered_best).as_array(),
+ "filtered metric best array",
+ )?;
+ assert_eq!(filtered_rows.len(), 2);
+ assert_eq!(
+ filtered_rows[0]["candidate_checkpoint_id"].as_str(),
+ Some(second_candidate_checkpoint_id.as_str())
+ );
+ assert!(
+ filtered_rows
+ .iter()
+ .all(|row| row["frontier_id"].as_str() == Some(frontier_id.as_str()))
+ );
+
+ let global_best = harness.call_tool(
+ 84,
+ "metric.best",
+ json!({
+ "key": "wall_clock_s",
+ "source": "run_metric",
+ "limit": 5
+ }),
+ )?;
+ assert_eq!(global_best["result"]["isError"].as_bool(), Some(false));
+ let global_rows = must_some(
+ tool_content(&global_best).as_array(),
+ "global metric best array",
+ )?;
+ assert_eq!(
+ global_rows[0]["candidate_checkpoint_id"].as_str(),
+ Some(third_candidate_checkpoint_id.as_str())
+ );
+ assert_eq!(
+ global_rows[0]["candidate_commit_hash"].as_str(),
+ Some(third_commit.as_str())
+ );
+
+ let migrate = harness.call_tool(85, "metric.migrate", json!({}))?;
+ assert_eq!(migrate["result"]["isError"].as_bool(), Some(false));
+ assert_eq!(
+ tool_content(&migrate)["inserted_metric_definitions"].as_u64(),
+ Some(0)
+ );
+ assert_eq!(
+ tool_content(&migrate)["inserted_dimension_definitions"].as_u64(),
+ Some(0)
+ );
+ assert_eq!(
+ tool_content(&migrate)["inserted_dimension_values"].as_u64(),
+ Some(0)
+ );
+
+ assert_ne!(
+ first_candidate_checkpoint_id,
+ second_candidate_checkpoint_id
+ );
+ assert_ne!(
+ second_candidate_checkpoint_id,
+ third_candidate_checkpoint_id
+ );
+ Ok(())
+}
diff --git a/crates/fidget-spinner-core/src/lib.rs b/crates/fidget-spinner-core/src/lib.rs
index b5e2b23..c0d6fe2 100644
--- a/crates/fidget-spinner-core/src/lib.rs
+++ b/crates/fidget-spinner-core/src/lib.rs
@@ -20,8 +20,9 @@ pub use crate::model::{
DagEdge, DagNode, DiagnosticSeverity, EdgeKind, EvaluationProtocol, ExecutionBackend,
ExperimentResult, FieldPresence, FieldRole, FieldValueType, FrontierContract, FrontierNote,
FrontierProjection, FrontierRecord, FrontierStatus, FrontierVerdict, GitCommitHash,
- InferencePolicy, JsonObject, MetricObservation, MetricSpec, MetricUnit, NodeAnnotation,
- NodeClass, NodeDiagnostics, NodePayload, NodeTrack, NonEmptyText, OptimizationObjective,
- PayloadSchemaRef, ProjectFieldSpec, ProjectSchema, RunRecord, RunStatus, TagName, TagRecord,
+ InferencePolicy, JsonObject, MetricDefinition, MetricObservation, MetricSpec, MetricUnit,
+ MetricValue, NodeAnnotation, NodeClass, NodeDiagnostics, NodePayload, NodeTrack, NonEmptyText,
+ OptimizationObjective, PayloadSchemaRef, ProjectFieldSpec, ProjectSchema,
+ RunDimensionDefinition, RunDimensionValue, RunRecord, RunStatus, TagName, TagRecord,
ValidationDiagnostic,
};
diff --git a/crates/fidget-spinner-core/src/model.rs b/crates/fidget-spinner-core/src/model.rs
index 2de3705..a77566f 100644
--- a/crates/fidget-spinner-core/src/model.rs
+++ b/crates/fidget-spinner-core/src/model.rs
@@ -177,6 +177,17 @@ pub enum DiagnosticSeverity {
Info,
}
+impl DiagnosticSeverity {
+ #[must_use]
+ pub const fn as_str(self) -> &'static str {
+ match self {
+ Self::Error => "error",
+ Self::Warning => "warning",
+ Self::Info => "info",
+ }
+ }
+}
+
#[derive(Clone, Copy, Debug, Deserialize, Eq, Ord, PartialEq, PartialOrd, Serialize)]
pub enum FieldPresence {
Required,
@@ -184,6 +195,17 @@ pub enum FieldPresence {
Optional,
}
+impl FieldPresence {
+ #[must_use]
+ pub const fn as_str(self) -> &'static str {
+ match self {
+ Self::Required => "required",
+ Self::Recommended => "recommended",
+ Self::Optional => "optional",
+ }
+ }
+}
+
#[derive(Clone, Copy, Debug, Deserialize, Eq, Ord, PartialEq, PartialOrd, Serialize)]
pub enum FieldRole {
Index,
@@ -192,12 +214,34 @@ pub enum FieldRole {
Opaque,
}
+impl FieldRole {
+ #[must_use]
+ pub const fn as_str(self) -> &'static str {
+ match self {
+ Self::Index => "index",
+ Self::ProjectionGate => "projection_gate",
+ Self::RenderOnly => "render_only",
+ Self::Opaque => "opaque",
+ }
+ }
+}
+
#[derive(Clone, Copy, Debug, Deserialize, Eq, Ord, PartialEq, PartialOrd, Serialize)]
pub enum InferencePolicy {
ManualOnly,
ModelMayInfer,
}
+impl InferencePolicy {
+ #[must_use]
+ pub const fn as_str(self) -> &'static str {
+ match self {
+ Self::ManualOnly => "manual_only",
+ Self::ModelMayInfer => "model_may_infer",
+ }
+ }
+}
+
#[derive(Clone, Copy, Debug, Deserialize, Eq, Ord, PartialEq, PartialOrd, Serialize)]
#[serde(rename_all = "snake_case")]
pub enum FieldValueType {
@@ -269,6 +313,96 @@ pub enum OptimizationObjective {
Target,
}
+#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize)]
+pub struct MetricDefinition {
+ pub key: NonEmptyText,
+ pub unit: MetricUnit,
+ pub objective: OptimizationObjective,
+ pub description: Option<NonEmptyText>,
+ pub created_at: OffsetDateTime,
+}
+
+impl MetricDefinition {
+ #[must_use]
+ pub fn new(
+ key: NonEmptyText,
+ unit: MetricUnit,
+ objective: OptimizationObjective,
+ description: Option<NonEmptyText>,
+ ) -> Self {
+ Self {
+ key,
+ unit,
+ objective,
+ description,
+ created_at: OffsetDateTime::now_utc(),
+ }
+ }
+}
+
+#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
+#[serde(rename_all = "snake_case", tag = "type", content = "value")]
+pub enum RunDimensionValue {
+ String(NonEmptyText),
+ Numeric(f64),
+ Boolean(bool),
+ Timestamp(NonEmptyText),
+}
+
+impl RunDimensionValue {
+ #[must_use]
+ pub const fn value_type(&self) -> FieldValueType {
+ match self {
+ Self::String(_) => FieldValueType::String,
+ Self::Numeric(_) => FieldValueType::Numeric,
+ Self::Boolean(_) => FieldValueType::Boolean,
+ Self::Timestamp(_) => FieldValueType::Timestamp,
+ }
+ }
+
+ #[must_use]
+ pub fn as_json(&self) -> Value {
+ match self {
+ Self::String(value) | Self::Timestamp(value) => Value::String(value.to_string()),
+ Self::Numeric(value) => {
+ serde_json::Number::from_f64(*value).map_or(Value::Null, Value::Number)
+ }
+ Self::Boolean(value) => Value::Bool(*value),
+ }
+ }
+}
+
+#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize)]
+pub struct RunDimensionDefinition {
+ pub key: NonEmptyText,
+ pub value_type: FieldValueType,
+ pub description: Option<NonEmptyText>,
+ pub created_at: OffsetDateTime,
+}
+
+impl RunDimensionDefinition {
+ #[must_use]
+ pub fn new(
+ key: NonEmptyText,
+ value_type: FieldValueType,
+ description: Option<NonEmptyText>,
+ ) -> Self {
+ Self {
+ key,
+ value_type,
+ description,
+ created_at: OffsetDateTime::now_utc(),
+ }
+ }
+}
+
+#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
+pub struct MetricValue {
+ #[serde(alias = "metric_key")]
+ pub key: NonEmptyText,
+ pub value: f64,
+}
+
#[derive(Clone, Copy, Debug, Deserialize, Eq, PartialEq, Serialize)]
pub enum RunStatus {
Queued,
@@ -710,7 +844,7 @@ pub struct CheckpointRecord {
pub created_at: OffsetDateTime,
}
-#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize)]
+#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
pub struct RunRecord {
pub node_id: NodeId,
pub run_id: RunId,
@@ -718,7 +852,7 @@ pub struct RunRecord {
pub status: RunStatus,
pub backend: ExecutionBackend,
pub code_snapshot: Option<CodeSnapshotRef>,
- pub benchmark_suite: Option<NonEmptyText>,
+ pub dimensions: BTreeMap<NonEmptyText, RunDimensionValue>,
pub command: CommandRecipe,
pub started_at: Option<OffsetDateTime>,
pub finished_at: Option<OffsetDateTime>,
@@ -726,9 +860,9 @@ pub struct RunRecord {
#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
pub struct ExperimentResult {
- pub benchmark_suite: NonEmptyText,
- pub primary_metric: MetricObservation,
- pub supporting_metrics: Vec<MetricObservation>,
+ pub dimensions: BTreeMap<NonEmptyText, RunDimensionValue>,
+ pub primary_metric: MetricValue,
+ pub supporting_metrics: Vec<MetricValue>,
pub benchmark_bundle: Option<ArtifactId>,
}
diff --git a/crates/fidget-spinner-store-sqlite/src/lib.rs b/crates/fidget-spinner-store-sqlite/src/lib.rs
index da9fa42..b970629 100644
--- a/crates/fidget-spinner-store-sqlite/src/lib.rs
+++ b/crates/fidget-spinner-store-sqlite/src/lib.rs
@@ -1,4 +1,5 @@
-use std::collections::BTreeSet;
+use std::cmp::Ordering;
+use std::collections::{BTreeMap, BTreeSet};
use std::fmt::Write as _;
use std::fs;
use std::io;
@@ -7,11 +8,13 @@ use std::process::Command;
use camino::{Utf8Path, Utf8PathBuf};
use fidget_spinner_core::{
AnnotationVisibility, CheckpointDisposition, CheckpointRecord, CheckpointSnapshotRef,
- CodeSnapshotRef, CommandRecipe, CompletedExperiment, DagEdge, DagNode, EdgeKind,
- ExecutionBackend, ExperimentResult, FrontierContract, FrontierNote, FrontierProjection,
- FrontierRecord, FrontierStatus, FrontierVerdict, GitCommitHash, JsonObject, MetricObservation,
- MetricSpec, MetricUnit, NodeAnnotation, NodeClass, NodeDiagnostics, NodePayload, NonEmptyText,
- OptimizationObjective, ProjectSchema, RunRecord, RunStatus, TagName, TagRecord,
+ CodeSnapshotRef, CommandRecipe, CompletedExperiment, DagEdge, DagNode, DiagnosticSeverity,
+ EdgeKind, ExecutionBackend, ExperimentResult, FieldPresence, FieldRole, FieldValueType,
+ FrontierContract, FrontierNote, FrontierProjection, FrontierRecord, FrontierStatus,
+ FrontierVerdict, GitCommitHash, InferencePolicy, JsonObject, MetricDefinition, MetricSpec,
+ MetricUnit, MetricValue, NodeAnnotation, NodeClass, NodeDiagnostics, NodePayload, NonEmptyText,
+ OptimizationObjective, ProjectFieldSpec, ProjectSchema, RunDimensionDefinition,
+ RunDimensionValue, RunRecord, RunStatus, TagName, TagRecord,
};
use rusqlite::types::Value as SqlValue;
use rusqlite::{Connection, OptionalExtension, Transaction, params, params_from_iter};
@@ -63,8 +66,50 @@ pub enum StoreError {
DuplicateTag(TagName),
#[error("note nodes require an explicit tag list; use an empty list if no tags apply")]
NoteTagsRequired,
+ #[error("{0} nodes require a non-empty summary")]
+ ProseSummaryRequired(NodeClass),
+ #[error("{0} nodes require a non-empty string payload field `body`")]
+ ProseBodyRequired(NodeClass),
#[error("git repository inspection failed for {0}")]
GitInspectionFailed(Utf8PathBuf),
+ #[error("metric `{0}` is not registered")]
+ UnknownMetricDefinition(NonEmptyText),
+ #[error(
+ "metric `{key}` conflicts with existing definition ({existing_unit}/{existing_objective} vs {new_unit}/{new_objective})"
+ )]
+ ConflictingMetricDefinition {
+ key: String,
+ existing_unit: String,
+ existing_objective: String,
+ new_unit: String,
+ new_objective: String,
+ },
+ #[error("run dimension `{0}` is not registered")]
+ UnknownRunDimension(NonEmptyText),
+ #[error("run dimension `{0}` already exists")]
+ DuplicateRunDimension(NonEmptyText),
+ #[error(
+ "run dimension `{key}` conflicts with existing definition ({existing_type} vs {new_type})"
+ )]
+ ConflictingRunDimensionDefinition {
+ key: String,
+ existing_type: String,
+ new_type: String,
+ },
+ #[error("run dimension `{key}` expects {expected} values, got {observed}")]
+ InvalidRunDimensionValue {
+ key: String,
+ expected: String,
+ observed: String,
+ },
+ #[error("schema field `{0}` was not found")]
+ SchemaFieldNotFound(String),
+ #[error("metric key `{key}` is ambiguous across sources: {sources}")]
+ AmbiguousMetricKey { key: String, sources: String },
+ #[error("metric key `{key}` for source `{metric_source}` requires an explicit order")]
+ MetricOrderRequired { key: String, metric_source: String },
+ #[error("metric key `{key}` for source `{metric_source}` has conflicting semantics")]
+ MetricSemanticsAmbiguous { key: String, metric_source: String },
}
#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize)]
@@ -165,6 +210,155 @@ pub struct NodeSummary {
pub updated_at: OffsetDateTime,
}
+#[derive(Clone, Copy, Debug, Deserialize, Eq, Ord, PartialEq, PartialOrd, Serialize)]
+#[serde(rename_all = "snake_case")]
+pub enum MetricFieldSource {
+ RunMetric,
+ ChangePayload,
+ RunPayload,
+ AnalysisPayload,
+ DecisionPayload,
+}
+
+impl MetricFieldSource {
+ #[must_use]
+ pub const fn as_str(self) -> &'static str {
+ match self {
+ Self::RunMetric => "run_metric",
+ Self::ChangePayload => "change_payload",
+ Self::RunPayload => "run_payload",
+ Self::AnalysisPayload => "analysis_payload",
+ Self::DecisionPayload => "decision_payload",
+ }
+ }
+
+ #[must_use]
+ pub const fn from_payload_class(class: NodeClass) -> Option<Self> {
+ match class {
+ NodeClass::Change => Some(Self::ChangePayload),
+ NodeClass::Run => Some(Self::RunPayload),
+ NodeClass::Analysis => Some(Self::AnalysisPayload),
+ NodeClass::Decision => Some(Self::DecisionPayload),
+ NodeClass::Contract | NodeClass::Research | NodeClass::Enabling | NodeClass::Note => {
+ None
+ }
+ }
+ }
+}
+
+#[derive(Clone, Copy, Debug, Deserialize, Eq, Ord, PartialEq, PartialOrd, Serialize)]
+#[serde(rename_all = "snake_case")]
+pub enum MetricRankOrder {
+ Asc,
+ Desc,
+}
+
+impl MetricRankOrder {
+ #[must_use]
+ pub const fn as_str(self) -> &'static str {
+ match self {
+ Self::Asc => "asc",
+ Self::Desc => "desc",
+ }
+ }
+}
+
+#[derive(Clone, Debug)]
+pub struct MetricBestQuery {
+ pub key: NonEmptyText,
+ pub frontier_id: Option<fidget_spinner_core::FrontierId>,
+ pub source: Option<MetricFieldSource>,
+ pub dimensions: BTreeMap<NonEmptyText, RunDimensionValue>,
+ pub order: Option<MetricRankOrder>,
+ pub limit: u32,
+}
+
+#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize)]
+pub struct MetricKeySummary {
+ pub key: NonEmptyText,
+ pub source: MetricFieldSource,
+ pub experiment_count: u64,
+ pub unit: Option<MetricUnit>,
+ pub objective: Option<OptimizationObjective>,
+ pub description: Option<NonEmptyText>,
+ pub requires_order: bool,
+}
+
+#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
+pub struct MetricBestEntry {
+ pub key: NonEmptyText,
+ pub source: MetricFieldSource,
+ pub value: f64,
+ pub order: MetricRankOrder,
+ pub experiment_id: fidget_spinner_core::ExperimentId,
+ pub frontier_id: fidget_spinner_core::FrontierId,
+ pub change_node_id: fidget_spinner_core::NodeId,
+ pub change_title: NonEmptyText,
+ pub run_id: fidget_spinner_core::RunId,
+ pub verdict: FrontierVerdict,
+ pub candidate_checkpoint_id: fidget_spinner_core::CheckpointId,
+ pub candidate_commit_hash: GitCommitHash,
+ pub unit: Option<MetricUnit>,
+ pub objective: Option<OptimizationObjective>,
+ pub dimensions: BTreeMap<NonEmptyText, RunDimensionValue>,
+}
+
+#[derive(Clone, Debug, Default)]
+pub struct MetricKeyQuery {
+ pub frontier_id: Option<fidget_spinner_core::FrontierId>,
+ pub source: Option<MetricFieldSource>,
+ pub dimensions: BTreeMap<NonEmptyText, RunDimensionValue>,
+}
+
+#[derive(Clone, Debug)]
+pub struct DefineMetricRequest {
+ pub key: NonEmptyText,
+ pub unit: MetricUnit,
+ pub objective: OptimizationObjective,
+ pub description: Option<NonEmptyText>,
+}
+
+#[derive(Clone, Debug)]
+pub struct DefineRunDimensionRequest {
+ pub key: NonEmptyText,
+ pub value_type: FieldValueType,
+ pub description: Option<NonEmptyText>,
+}
+
+#[derive(Clone, Debug)]
+pub struct UpsertSchemaFieldRequest {
+ pub name: NonEmptyText,
+ pub node_classes: BTreeSet<NodeClass>,
+ pub presence: FieldPresence,
+ pub severity: DiagnosticSeverity,
+ pub role: FieldRole,
+ pub inference_policy: InferencePolicy,
+ pub value_type: Option<FieldValueType>,
+}
+
+#[derive(Clone, Debug)]
+pub struct RemoveSchemaFieldRequest {
+ pub name: NonEmptyText,
+ pub node_classes: Option<BTreeSet<NodeClass>>,
+}
+
+#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize)]
+pub struct RunDimensionSummary {
+ pub key: NonEmptyText,
+ pub value_type: FieldValueType,
+ pub description: Option<NonEmptyText>,
+ pub observed_run_count: u64,
+ pub distinct_value_count: u64,
+ pub sample_values: Vec<Value>,
+}
+
+#[derive(Clone, Debug, Default, Deserialize, Eq, PartialEq, Serialize)]
+pub struct MetricPlaneMigrationReport {
+ pub inserted_metric_definitions: u64,
+ pub inserted_dimension_definitions: u64,
+ pub inserted_dimension_values: u64,
+}
+
#[derive(Clone, Debug)]
pub struct CreateFrontierRequest {
pub label: NonEmptyText,
@@ -190,11 +384,11 @@ pub struct CloseExperimentRequest {
pub run_title: NonEmptyText,
pub run_summary: Option<NonEmptyText>,
pub backend: ExecutionBackend,
- pub benchmark_suite: NonEmptyText,
+ pub dimensions: BTreeMap<NonEmptyText, RunDimensionValue>,
pub command: CommandRecipe,
pub code_snapshot: Option<CodeSnapshotRef>,
- pub primary_metric: MetricObservation,
- pub supporting_metrics: Vec<MetricObservation>,
+ pub primary_metric: MetricValue,
+ pub supporting_metrics: Vec<MetricValue>,
pub note: FrontierNote,
pub verdict: FrontierVerdict,
pub decision_title: NonEmptyText,
@@ -233,8 +427,8 @@ impl ProjectStore {
let schema = ProjectSchema::default_with_namespace(schema_namespace);
write_json_file(&state_root.join(PROJECT_SCHEMA_NAME), &schema)?;
- let connection = Connection::open(state_root.join(STATE_DB_NAME).as_std_path())?;
- migrate(&connection)?;
+ let mut connection = Connection::open(state_root.join(STATE_DB_NAME).as_std_path())?;
+ upgrade_store(&mut connection)?;
Ok(Self {
project_root,
@@ -252,8 +446,8 @@ impl ProjectStore {
let state_root = state_root(&project_root);
let config = read_json_file::<ProjectConfig>(&state_root.join(PROJECT_CONFIG_NAME))?;
let schema = read_json_file::<ProjectSchema>(&state_root.join(PROJECT_SCHEMA_NAME))?;
- let connection = Connection::open(state_root.join(STATE_DB_NAME).as_std_path())?;
- migrate(&connection)?;
+ let mut connection = Connection::open(state_root.join(STATE_DB_NAME).as_std_path())?;
+ upgrade_store(&mut connection)?;
Ok(Self {
project_root,
state_root,
@@ -273,6 +467,59 @@ impl ProjectStore {
&self.schema
}
+ pub fn upsert_schema_field(
+ &mut self,
+ request: UpsertSchemaFieldRequest,
+ ) -> Result<ProjectFieldSpec, StoreError> {
+ let field = ProjectFieldSpec {
+ name: request.name,
+ node_classes: request.node_classes,
+ presence: request.presence,
+ severity: request.severity,
+ role: request.role,
+ inference_policy: request.inference_policy,
+ value_type: request.value_type,
+ };
+ if let Some(existing) = self.schema.fields.iter_mut().find(|existing| {
+ existing.name == field.name && existing.node_classes == field.node_classes
+ }) {
+ if *existing == field {
+ return Ok(field);
+ }
+ *existing = field.clone();
+ } else {
+ self.schema.fields.push(field.clone());
+ }
+ sort_schema_fields(&mut self.schema.fields);
+ self.bump_schema_version();
+ self.save_schema()?;
+ Ok(field)
+ }
+
+ /// Removes schema fields matching `name` (optionally narrowed to a
+ /// node-class set) and returns how many were removed; errors when
+ /// nothing matched.
+ pub fn remove_schema_field(
+ &mut self,
+ request: RemoveSchemaFieldRequest,
+ ) -> Result<u64, StoreError> {
+ let before = self.schema.fields.len();
+ self.schema.fields.retain(|field| {
+ field.name != request.name
+ || request
+ .node_classes
+ .as_ref()
+ .is_some_and(|node_classes| field.node_classes != *node_classes)
+ });
+ let removed = before.saturating_sub(self.schema.fields.len()) as u64;
+ if removed == 0 {
+ return Err(StoreError::SchemaFieldNotFound(
+ request.name.as_str().to_owned(),
+ ));
+ }
+ sort_schema_fields(&mut self.schema.fields);
+ self.bump_schema_version();
+ self.save_schema()?;
+ Ok(removed)
+ }
+
#[must_use]
pub fn project_root(&self) -> &Utf8Path {
&self.project_root
@@ -283,6 +530,14 @@ impl ProjectStore {
&self.state_root
}
+ fn bump_schema_version(&mut self) {
+ self.schema.version = self.schema.version.saturating_add(1);
+ }
+
+ fn save_schema(&self) -> Result<(), StoreError> {
+ write_json_file(&self.state_root.join(PROJECT_SCHEMA_NAME), &self.schema)
+ }
+
pub fn create_frontier(
&mut self,
request: CreateFrontierRequest,
@@ -304,6 +559,31 @@ impl ProjectStore {
let frontier = FrontierRecord::with_id(frontier_id, request.label, contract_node.id);
let tx = self.connection.transaction()?;
+ let _ = upsert_metric_definition_tx(
+ &tx,
+ &MetricDefinition::new(
+ request
+ .contract
+ .evaluation
+ .primary_metric
+ .metric_key
+ .clone(),
+ request.contract.evaluation.primary_metric.unit,
+ request.contract.evaluation.primary_metric.objective,
+ None,
+ ),
+ )?;
+ for metric in &request.contract.evaluation.supporting_metrics {
+ let _ = upsert_metric_definition_tx(
+ &tx,
+ &MetricDefinition::new(
+ metric.metric_key.clone(),
+ metric.unit,
+ metric.objective,
+ None,
+ ),
+ )?;
+ }
insert_node(&tx, &contract_node)?;
insert_frontier(&tx, &frontier)?;
if let Some(seed) = request.initial_checkpoint {
@@ -330,6 +610,75 @@ impl ProjectStore {
self.frontier_projection(frontier.id)
}
+ /// Registers a metric definition, erroring if the key already exists
+ /// with a conflicting unit or objective.
+ pub fn define_metric(
+ &mut self,
+ request: DefineMetricRequest,
+ ) -> Result<MetricDefinition, StoreError> {
+ let record = MetricDefinition::new(
+ request.key,
+ request.unit,
+ request.objective,
+ request.description,
+ );
+ let tx = self.connection.transaction()?;
+ let _ = upsert_metric_definition_tx(&tx, &record)?;
+ tx.commit()?;
+ Ok(record)
+ }
+
+ pub fn list_metric_definitions(&self) -> Result<Vec<MetricDefinition>, StoreError> {
+ let mut statement = self.connection.prepare(
+ "SELECT metric_key, unit, objective, description, created_at
+ FROM metric_definitions
+ ORDER BY metric_key ASC",
+ )?;
+ let mut rows = statement.query([])?;
+ let mut items = Vec::new();
+ while let Some(row) = rows.next()? {
+ items.push(MetricDefinition {
+ key: NonEmptyText::new(row.get::<_, String>(0)?)?,
+ unit: decode_metric_unit(&row.get::<_, String>(1)?)?,
+ objective: decode_optimization_objective(&row.get::<_, String>(2)?)?,
+ description: row
+ .get::<_, Option<String>>(3)?
+ .map(NonEmptyText::new)
+ .transpose()?,
+ created_at: decode_timestamp(&row.get::<_, String>(4)?)?,
+ });
+ }
+ Ok(items)
+ }
+
+ pub fn define_run_dimension(
+ &mut self,
+ request: DefineRunDimensionRequest,
+ ) -> Result<RunDimensionDefinition, StoreError> {
+ let record =
+ RunDimensionDefinition::new(request.key, request.value_type, request.description);
+ let tx = self.connection.transaction()?;
+ let _ = insert_run_dimension_definition_tx(&tx, &record)?;
+ tx.commit()?;
+ Ok(record)
+ }
+
+ pub fn list_run_dimensions(&self) -> Result<Vec<RunDimensionSummary>, StoreError> {
+ load_run_dimension_summaries(self)
+ }
+
+ /// Coerces raw JSON dimension values against the registered
+ /// run-dimension definitions, rejecting unknown keys and type
+ /// mismatches.
+ pub fn coerce_run_dimensions(
+ &self,
+ raw_dimensions: BTreeMap<String, Value>,
+ ) -> Result<BTreeMap<NonEmptyText, RunDimensionValue>, StoreError> {
+ coerce_run_dimension_map(&run_dimension_definitions_by_key(self)?, raw_dimensions)
+ }
+
+ /// Backfills the metric and run-dimension registries from legacy rows
+ /// and reports what was inserted.
+ pub fn migrate_metric_plane(&mut self) -> Result<MetricPlaneMigrationReport, StoreError> {
+ let tx = self.connection.transaction()?;
+ let report = normalize_metric_plane_tx(&tx)?;
+ tx.commit()?;
+ Ok(report)
+ }
+
pub fn add_tag(
&mut self,
name: TagName,
@@ -372,6 +721,7 @@ impl ProjectStore {
}
pub fn add_node(&mut self, request: CreateNodeRequest) -> Result<DagNode, StoreError> {
+ validate_prose_node_request(&request)?;
let diagnostics = self.schema.validate_node(request.class, &request.payload);
let mut node = DagNode::new(
request.class,
@@ -406,6 +756,100 @@ impl ProjectStore {
Ok(node)
}
+ pub fn list_metric_keys(&self) -> Result<Vec<MetricKeySummary>, StoreError> {
+ self.list_metric_keys_filtered(MetricKeyQuery::default())
+ }
+
+ pub fn list_metric_keys_filtered(
+ &self,
+ query: MetricKeyQuery,
+ ) -> Result<Vec<MetricKeySummary>, StoreError> {
+ let mut summaries = collect_metric_samples(self, &query)?
+ .into_iter()
+ .fold(
+ BTreeMap::<(MetricFieldSource, String), MetricKeyAccumulator>::new(),
+ |mut accumulators, sample| {
+ let key = (sample.source, sample.key.as_str().to_owned());
+ let _ = accumulators
+ .entry(key)
+ .and_modify(|entry| entry.observe(&sample))
+ .or_insert_with(|| MetricKeyAccumulator::from_sample(&sample));
+ accumulators
+ },
+ )
+ .into_values()
+ .map(MetricKeyAccumulator::finish)
+ .collect::<Vec<_>>();
+ if query
+ .source
+ .is_none_or(|source| source == MetricFieldSource::RunMetric)
+ {
+ merge_registered_run_metric_summaries(self, &mut summaries)?;
+ }
+ summaries.sort_by(|left, right| {
+ left.key
+ .cmp(&right.key)
+ .then(left.source.cmp(&right.source))
+ });
+ Ok(summaries)
+ }
+
+ /// Ranks matching experiment samples for `query.key`, inferring the
+ /// sort order from the metric's registered objective unless an
+ /// explicit order is supplied.
+ pub fn best_metrics(&self, query: MetricBestQuery) -> Result<Vec<MetricBestEntry>, StoreError> {
+ let matching = collect_metric_samples(
+ self,
+ &MetricKeyQuery {
+ frontier_id: query.frontier_id,
+ source: query.source,
+ dimensions: query.dimensions.clone(),
+ },
+ )?
+ .into_iter()
+ .filter(|sample| sample.key == query.key)
+ .collect::<Vec<_>>();
+ if matching.is_empty() {
+ return Ok(Vec::new());
+ }
+
+ let source = if let Some(source) = query.source {
+ source
+ } else {
+ let sources = matching
+ .iter()
+ .map(|sample| sample.source)
+ .collect::<BTreeSet<_>>();
+ if sources.len() != 1 {
+ return Err(StoreError::AmbiguousMetricKey {
+ key: query.key.as_str().to_owned(),
+ sources: sources
+ .into_iter()
+ .map(MetricFieldSource::as_str)
+ .collect::<Vec<_>>()
+ .join(", "),
+ });
+ }
+ // Defensive fallback: `sources` is non-empty here because `matching`
+ // is non-empty and `sources.len() == 1` was verified above.
+ let Some(source) = sources.iter().copied().next() else {
+ return Ok(Vec::new());
+ };
+ source
+ };
+
+ let mut matching = matching
+ .into_iter()
+ .filter(|sample| sample.source == source)
+ .collect::<Vec<_>>();
+ if matching.is_empty() {
+ return Ok(Vec::new());
+ }
+
+ let order = resolve_metric_order(&matching, &query, source)?;
+ matching.sort_by(|left, right| compare_metric_samples(left, right, order));
+ matching.truncate(query.limit as usize);
+ Ok(matching
+ .into_iter()
+ .map(|sample| sample.into_entry(order))
+ .collect())
+ }
+
pub fn archive_node(&mut self, node_id: fidget_spinner_core::NodeId) -> Result<(), StoreError> {
let updated_at = encode_timestamp(OffsetDateTime::now_utc())?;
let changed = self.connection.execute(
@@ -668,11 +1112,26 @@ impl ProjectStore {
if base_checkpoint.frontier_id != request.frontier_id {
return Err(StoreError::CheckpointNotFound(request.base_checkpoint_id));
}
+ let tx = self.connection.transaction()?;
+ let dimensions = validate_run_dimensions_tx(&tx, &request.dimensions)?;
+ let primary_metric_definition =
+ load_metric_definition_tx(&tx, &request.primary_metric.key)?.ok_or_else(|| {
+ StoreError::UnknownMetricDefinition(request.primary_metric.key.clone())
+ })?;
+ let supporting_metric_definitions = request
+ .supporting_metrics
+ .iter()
+ .map(|metric| {
+ load_metric_definition_tx(&tx, &metric.key)?
+ .ok_or_else(|| StoreError::UnknownMetricDefinition(metric.key.clone()))
+ })
+ .collect::<Result<Vec<_>, StoreError>>()?;
+ let benchmark_suite = benchmark_suite_label(&dimensions);
let run_payload = NodePayload::with_schema(
self.schema.schema_ref(),
json_object(json!({
- "benchmark_suite": request.benchmark_suite.as_str(),
+ "dimensions": run_dimensions_json(&dimensions),
"backend": format!("{:?}", request.backend),
"command": request.command.argv.iter().map(NonEmptyText::as_str).collect::<Vec<_>>(),
}))?,
@@ -695,7 +1154,7 @@ impl ProjectStore {
status: RunStatus::Succeeded,
backend: request.backend,
code_snapshot: request.code_snapshot,
- benchmark_suite: Some(request.benchmark_suite.clone()),
+ dimensions: dimensions.clone(),
command: request.command,
started_at: Some(now),
finished_at: Some(now),
@@ -748,7 +1207,7 @@ impl ProjectStore {
analysis_node_id: request.analysis_node_id,
decision_node_id: decision_node.id,
result: ExperimentResult {
- benchmark_suite: request.benchmark_suite,
+ dimensions: dimensions.clone(),
primary_metric: request.primary_metric,
supporting_metrics: request.supporting_metrics,
benchmark_bundle: None,
@@ -757,8 +1216,6 @@ impl ProjectStore {
verdict: request.verdict,
created_at: now,
};
-
- let tx = self.connection.transaction()?;
insert_node(&tx, &run_node)?;
insert_node(&tx, &decision_node)?;
insert_edge(
@@ -780,9 +1237,13 @@ impl ProjectStore {
insert_run(
&tx,
&run,
+ benchmark_suite.as_deref(),
&experiment.result.primary_metric,
+ &primary_metric_definition,
&experiment.result.supporting_metrics,
+ supporting_metric_definitions.as_slice(),
)?;
+ insert_run_dimensions(&tx, run.run_id, &dimensions)?;
match request.verdict {
FrontierVerdict::PromoteToChampion => {
demote_previous_champion(&tx, request.frontier_id)?;
@@ -887,6 +1348,375 @@ impl ProjectStore {
}
}
+/// Runs schema migrations, then one-time backfills (prose summaries,
+/// metric-plane normalization) so older stores open cleanly.
+fn upgrade_store(connection: &mut Connection) -> Result<(), StoreError> {
+ migrate(connection)?;
+ backfill_prose_summaries(connection)?;
+ let tx = connection.transaction()?;
+ let _ = normalize_metric_plane_tx(&tx)?;
+ tx.commit()?;
+ Ok(())
+}
+
+/// Requires note/research nodes to carry a summary and a non-blank
+/// string `body` field.
+fn validate_prose_node_request(request: &CreateNodeRequest) -> Result<(), StoreError> {
+ if !matches!(request.class, NodeClass::Note | NodeClass::Research) {
+ return Ok(());
+ }
+ if request.summary.is_none() {
+ return Err(StoreError::ProseSummaryRequired(request.class));
+ }
+ match request.payload.field("body") {
+ Some(Value::String(body)) if !body.trim().is_empty() => Ok(()),
+ _ => Err(StoreError::ProseBodyRequired(request.class)),
+ }
+}
+
+#[derive(Clone, Debug)]
+struct MetricSample {
+ key: NonEmptyText,
+ source: MetricFieldSource,
+ value: f64,
+ frontier_id: fidget_spinner_core::FrontierId,
+ experiment_id: fidget_spinner_core::ExperimentId,
+ change_node_id: fidget_spinner_core::NodeId,
+ change_title: NonEmptyText,
+ run_id: fidget_spinner_core::RunId,
+ verdict: FrontierVerdict,
+ candidate_checkpoint_id: fidget_spinner_core::CheckpointId,
+ candidate_commit_hash: GitCommitHash,
+ unit: Option<MetricUnit>,
+ objective: Option<OptimizationObjective>,
+ dimensions: BTreeMap<NonEmptyText, RunDimensionValue>,
+}
+
+impl MetricSample {
+ fn into_entry(self, order: MetricRankOrder) -> MetricBestEntry {
+ MetricBestEntry {
+ key: self.key,
+ source: self.source,
+ value: self.value,
+ order,
+ experiment_id: self.experiment_id,
+ frontier_id: self.frontier_id,
+ change_node_id: self.change_node_id,
+ change_title: self.change_title,
+ run_id: self.run_id,
+ verdict: self.verdict,
+ candidate_checkpoint_id: self.candidate_checkpoint_id,
+ candidate_commit_hash: self.candidate_commit_hash,
+ unit: self.unit,
+ objective: self.objective,
+ dimensions: self.dimensions,
+ }
+ }
+}
+
+#[derive(Clone, Debug)]
+struct MetricKeyAccumulator {
+ key: NonEmptyText,
+ source: MetricFieldSource,
+ experiment_ids: BTreeSet<fidget_spinner_core::ExperimentId>,
+ unit: Option<MetricUnit>,
+ objective: Option<OptimizationObjective>,
+ ambiguous_semantics: bool,
+}
+
+impl MetricKeyAccumulator {
+ fn from_sample(sample: &MetricSample) -> Self {
+ Self {
+ key: sample.key.clone(),
+ source: sample.source,
+ experiment_ids: BTreeSet::from([sample.experiment_id]),
+ unit: sample.unit,
+ objective: sample.objective,
+ ambiguous_semantics: false,
+ }
+ }
+
+ fn observe(&mut self, sample: &MetricSample) {
+ let _ = self.experiment_ids.insert(sample.experiment_id);
+ if self.unit != sample.unit || self.objective != sample.objective {
+ self.ambiguous_semantics = true;
+ self.unit = None;
+ self.objective = None;
+ }
+ }
+
+ fn finish(self) -> MetricKeySummary {
+ MetricKeySummary {
+ key: self.key,
+ source: self.source,
+ experiment_count: self.experiment_ids.len() as u64,
+ unit: self.unit,
+ objective: self.objective,
+ description: None,
+ requires_order: self.source != MetricFieldSource::RunMetric
+ || self.ambiguous_semantics
+ || !matches!(
+ self.objective,
+ Some(OptimizationObjective::Minimize | OptimizationObjective::Maximize)
+ ),
+ }
+ }
+}
+
+fn collect_metric_samples(
+ store: &ProjectStore,
+ query: &MetricKeyQuery,
+) -> Result<Vec<MetricSample>, StoreError> {
+ let rows = load_experiment_rows(store)?;
+ let metric_definitions = metric_definitions_by_key(store)?;
+ let mut samples = Vec::new();
+ for row in rows {
+ if query
+ .frontier_id
+ .is_some_and(|frontier_id| row.frontier_id != frontier_id)
+ {
+ continue;
+ }
+ if !dimensions_match(&row.dimensions, &query.dimensions) {
+ continue;
+ }
+ samples.extend(metric_samples_for_row(
+ store.schema(),
+ &row,
+ &metric_definitions,
+ ));
+ }
+ Ok(if let Some(source) = query.source {
+ samples
+ .into_iter()
+ .filter(|sample| sample.source == source)
+ .collect()
+ } else {
+ samples
+ })
+}
+
+fn resolve_metric_order(
+ matching: &[MetricSample],
+ query: &MetricBestQuery,
+ source: MetricFieldSource,
+) -> Result<MetricRankOrder, StoreError> {
+ if let Some(order) = query.order {
+ return Ok(order);
+ }
+ if source != MetricFieldSource::RunMetric {
+ return Err(StoreError::MetricOrderRequired {
+ key: query.key.as_str().to_owned(),
+ metric_source: source.as_str().to_owned(),
+ });
+ }
+ let objectives = matching
+ .iter()
+ .map(|sample| sample.objective)
+ .collect::<BTreeSet<_>>();
+ match objectives.len() {
+ 1 => match objectives.into_iter().next().flatten() {
+ Some(OptimizationObjective::Minimize) => Ok(MetricRankOrder::Asc),
+ Some(OptimizationObjective::Maximize) => Ok(MetricRankOrder::Desc),
+ Some(OptimizationObjective::Target) | None => Err(StoreError::MetricOrderRequired {
+ key: query.key.as_str().to_owned(),
+ metric_source: source.as_str().to_owned(),
+ }),
+ },
+ _ => Err(StoreError::MetricSemanticsAmbiguous {
+ key: query.key.as_str().to_owned(),
+ metric_source: source.as_str().to_owned(),
+ }),
+ }
+}
+
+fn compare_metric_samples(
+ left: &MetricSample,
+ right: &MetricSample,
+ order: MetricRankOrder,
+) -> Ordering {
+ let metric_order = match order {
+ MetricRankOrder::Asc => left
+ .value
+ .partial_cmp(&right.value)
+ .unwrap_or(Ordering::Equal),
+ MetricRankOrder::Desc => right
+ .value
+ .partial_cmp(&left.value)
+ .unwrap_or(Ordering::Equal),
+ };
+ metric_order
+ .then_with(|| right.experiment_id.cmp(&left.experiment_id))
+ .then_with(|| left.key.cmp(&right.key))
+}
+
+#[derive(Clone, Debug)]
+struct ExperimentMetricRow {
+ experiment_id: fidget_spinner_core::ExperimentId,
+ frontier_id: fidget_spinner_core::FrontierId,
+ run_id: fidget_spinner_core::RunId,
+ verdict: FrontierVerdict,
+ candidate_checkpoint: CheckpointRecord,
+ change_node: DagNode,
+ run_node: DagNode,
+ analysis_node: Option<DagNode>,
+ decision_node: DagNode,
+ primary_metric: MetricValue,
+ supporting_metrics: Vec<MetricValue>,
+ dimensions: BTreeMap<NonEmptyText, RunDimensionValue>,
+}
+
+fn load_experiment_rows(store: &ProjectStore) -> Result<Vec<ExperimentMetricRow>, StoreError> {
+ let run_dimensions = load_run_dimensions_by_run_id(store)?;
+ let mut statement = store.connection.prepare(
+ "SELECT
+ id,
+ frontier_id,
+ run_id,
+ change_node_id,
+ run_node_id,
+ analysis_node_id,
+ decision_node_id,
+ candidate_checkpoint_id,
+ primary_metric_json,
+ supporting_metrics_json,
+ verdict
+ FROM experiments",
+ )?;
+ let mut rows = statement.query([])?;
+ let mut items = Vec::new();
+ while let Some(row) = rows.next()? {
+ let change_node_id = parse_node_id(&row.get::<_, String>(3)?)?;
+ let run_id = parse_run_id(&row.get::<_, String>(2)?)?;
+ let run_node_id = parse_node_id(&row.get::<_, String>(4)?)?;
+ let analysis_node_id = row
+ .get::<_, Option<String>>(5)?
+ .map(|raw| parse_node_id(&raw))
+ .transpose()?;
+ let decision_node_id = parse_node_id(&row.get::<_, String>(6)?)?;
+ let candidate_checkpoint_id = parse_checkpoint_id(&row.get::<_, String>(7)?)?;
+ items.push(ExperimentMetricRow {
+ experiment_id: parse_experiment_id(&row.get::<_, String>(0)?)?,
+ frontier_id: parse_frontier_id(&row.get::<_, String>(1)?)?,
+ run_id,
+ verdict: parse_frontier_verdict(&row.get::<_, String>(10)?)?,
+ candidate_checkpoint: store
+ .load_checkpoint(candidate_checkpoint_id)?
+ .ok_or(StoreError::CheckpointNotFound(candidate_checkpoint_id))?,
+ change_node: store
+ .get_node(change_node_id)?
+ .ok_or(StoreError::NodeNotFound(change_node_id))?,
+ run_node: store
+ .get_node(run_node_id)?
+ .ok_or(StoreError::NodeNotFound(run_node_id))?,
+ analysis_node: analysis_node_id
+ .map(|node_id| {
+ store
+ .get_node(node_id)?
+ .ok_or(StoreError::NodeNotFound(node_id))
+ })
+ .transpose()?,
+ decision_node: store
+ .get_node(decision_node_id)?
+ .ok_or(StoreError::NodeNotFound(decision_node_id))?,
+ primary_metric: decode_json(&row.get::<_, String>(8)?)?,
+ supporting_metrics: decode_json(&row.get::<_, String>(9)?)?,
+ dimensions: run_dimensions.get(&run_id).cloned().unwrap_or_default(),
+ });
+ }
+ Ok(items)
+}
+
+fn metric_samples_for_row(
+ schema: &ProjectSchema,
+ row: &ExperimentMetricRow,
+ metric_definitions: &BTreeMap<String, MetricDefinition>,
+) -> Vec<MetricSample> {
+ let mut samples = vec![metric_sample_from_observation(
+ row,
+ &row.primary_metric,
+ metric_definitions,
+ MetricFieldSource::RunMetric,
+ )];
+ samples.extend(row.supporting_metrics.iter().map(|metric| {
+ metric_sample_from_observation(
+ row,
+ metric,
+ metric_definitions,
+ MetricFieldSource::RunMetric,
+ )
+ }));
+ samples.extend(metric_samples_from_payload(schema, row, &row.change_node));
+ samples.extend(metric_samples_from_payload(schema, row, &row.run_node));
+ if let Some(node) = row.analysis_node.as_ref() {
+ samples.extend(metric_samples_from_payload(schema, row, node));
+ }
+ samples.extend(metric_samples_from_payload(schema, row, &row.decision_node));
+ samples
+}
+
+fn metric_sample_from_observation(
+ row: &ExperimentMetricRow,
+ metric: &MetricValue,
+ metric_definitions: &BTreeMap<String, MetricDefinition>,
+ source: MetricFieldSource,
+) -> MetricSample {
+ let registry = metric_definitions.get(metric.key.as_str());
+ MetricSample {
+ key: metric.key.clone(),
+ source,
+ value: metric.value,
+ frontier_id: row.frontier_id,
+ experiment_id: row.experiment_id,
+ change_node_id: row.change_node.id,
+ change_title: row.change_node.title.clone(),
+ run_id: row.run_id,
+ verdict: row.verdict,
+ candidate_checkpoint_id: row.candidate_checkpoint.id,
+ candidate_commit_hash: row.candidate_checkpoint.snapshot.commit_hash.clone(),
+ unit: registry.map(|definition| definition.unit),
+ objective: registry.map(|definition| definition.objective),
+ dimensions: row.dimensions.clone(),
+ }
+}
+
+fn metric_samples_from_payload(
+ schema: &ProjectSchema,
+ row: &ExperimentMetricRow,
+ node: &DagNode,
+) -> Vec<MetricSample> {
+ let Some(source) = MetricFieldSource::from_payload_class(node.class) else {
+ return Vec::new();
+ };
+ node.payload
+ .fields
+ .iter()
+ .filter_map(|(key, value)| {
+ let value = value.as_f64()?;
+ let spec = schema.field_spec(node.class, key);
+ if spec.is_some_and(|field| {
+ field
+ .value_type
+ .is_some_and(|kind| kind != FieldValueType::Numeric)
+ }) {
+ return None;
+ }
+ Some(MetricSample {
+ key: NonEmptyText::new(key.clone()).ok()?,
+ source,
+ value,
+ frontier_id: row.frontier_id,
+ experiment_id: row.experiment_id,
+ change_node_id: row.change_node.id,
+ change_title: row.change_node.title.clone(),
+ run_id: row.run_id,
+ verdict: row.verdict,
+ candidate_checkpoint_id: row.candidate_checkpoint.id,
+ candidate_commit_hash: row.candidate_checkpoint.snapshot.commit_hash.clone(),
+ unit: None,
+ objective: None,
+ dimensions: row.dimensions.clone(),
+ })
+ })
+ .collect()
+}
+
fn migrate(connection: &Connection) -> Result<(), StoreError> {
connection.execute_batch(
"
@@ -986,6 +1816,32 @@ fn migrate(connection: &Connection) -> Result<(), StoreError> {
value REAL NOT NULL
);
+ CREATE TABLE IF NOT EXISTS metric_definitions (
+ metric_key TEXT PRIMARY KEY,
+ unit TEXT NOT NULL,
+ objective TEXT NOT NULL,
+ description TEXT,
+ created_at TEXT NOT NULL
+ );
+
+ CREATE TABLE IF NOT EXISTS run_dimension_definitions (
+ dimension_key TEXT PRIMARY KEY,
+ value_type TEXT NOT NULL,
+ description TEXT,
+ created_at TEXT NOT NULL
+ );
+
+ CREATE TABLE IF NOT EXISTS run_dimensions (
+ run_id TEXT NOT NULL REFERENCES runs(run_id) ON DELETE CASCADE,
+ dimension_key TEXT NOT NULL REFERENCES run_dimension_definitions(dimension_key) ON DELETE RESTRICT,
+ value_type TEXT NOT NULL,
+ value_text TEXT,
+ value_numeric REAL,
+ value_boolean INTEGER,
+ value_timestamp TEXT,
+ PRIMARY KEY (run_id, dimension_key)
+ );
+
CREATE TABLE IF NOT EXISTS experiments (
id TEXT PRIMARY KEY,
frontier_id TEXT NOT NULL REFERENCES frontiers(id) ON DELETE CASCADE,
@@ -1005,6 +1861,12 @@ fn migrate(connection: &Connection) -> Result<(), StoreError> {
created_at TEXT NOT NULL
);
+ CREATE INDEX IF NOT EXISTS metrics_by_key ON metrics(metric_key);
+ CREATE INDEX IF NOT EXISTS run_dimensions_by_key_text ON run_dimensions(dimension_key, value_text);
+ CREATE INDEX IF NOT EXISTS run_dimensions_by_key_numeric ON run_dimensions(dimension_key, value_numeric);
+ CREATE INDEX IF NOT EXISTS run_dimensions_by_run ON run_dimensions(run_id, dimension_key);
+ CREATE INDEX IF NOT EXISTS experiments_by_frontier ON experiments(frontier_id, created_at DESC);
+
CREATE TABLE IF NOT EXISTS events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
entity_kind TEXT NOT NULL,
@@ -1018,6 +1880,672 @@ fn migrate(connection: &Connection) -> Result<(), StoreError> {
Ok(())
}
+fn backfill_prose_summaries(connection: &Connection) -> Result<(), StoreError> {
+ let mut statement = connection.prepare(
+ "SELECT id, payload_json
+ FROM nodes
+ WHERE class IN ('note', 'research')
+ AND (summary IS NULL OR trim(summary) = '')",
+ )?;
+ let mut rows = statement.query([])?;
+ let mut updates = Vec::new();
+ while let Some(row) = rows.next()? {
+ let node_id = row.get::<_, String>(0)?;
+ let payload = decode_json::<NodePayload>(&row.get::<_, String>(1)?)?;
+ let Some(Value::String(body)) = payload.field("body") else {
+ continue;
+ };
+ let Some(summary) = derive_summary_from_body(body) else {
+ continue;
+ };
+ updates.push((node_id, summary));
+ }
+ for (node_id, summary) in updates {
+ let _ = connection.execute(
+ "UPDATE nodes SET summary = ?1 WHERE id = ?2",
+ params![summary.as_str(), node_id],
+ )?;
+ }
+ Ok(())
+}
+
+fn sort_schema_fields(fields: &mut [ProjectFieldSpec]) {
+ fields.sort_by(|left, right| {
+ left.name
+ .cmp(&right.name)
+ .then_with(|| left.node_classes.iter().cmp(right.node_classes.iter()))
+ });
+}
+
+/// Idempotently seeds the metric and run-dimension registries from
+/// legacy `metrics` rows, contract payloads, and `runs.benchmark_suite`
+/// labels.
+fn normalize_metric_plane_tx(
+ tx: &Transaction<'_>,
+) -> Result<MetricPlaneMigrationReport, StoreError> {
+ let mut report = MetricPlaneMigrationReport::default();
+
+ if insert_run_dimension_definition_tx(
+ tx,
+ &RunDimensionDefinition::new(
+ NonEmptyText::new("benchmark_suite")?,
+ FieldValueType::String,
+ Some(NonEmptyText::new("Legacy coarse benchmark label")?),
+ ),
+ )? {
+ report.inserted_dimension_definitions += 1;
+ }
+
+ {
+ let mut statement = tx.prepare(
+ "SELECT DISTINCT metric_key, unit, objective
+ FROM metrics
+ ORDER BY metric_key ASC",
+ )?;
+ let mut rows = statement.query([])?;
+ while let Some(row) = rows.next()? {
+ let definition = MetricDefinition::new(
+ NonEmptyText::new(row.get::<_, String>(0)?)?,
+ decode_metric_unit(&row.get::<_, String>(1)?)?,
+ decode_optimization_objective(&row.get::<_, String>(2)?)?,
+ None,
+ );
+ if upsert_metric_definition_tx(tx, &definition)? {
+ report.inserted_metric_definitions += 1;
+ }
+ }
+ }
+
+ {
+ let mut statement = tx.prepare(
+ "SELECT payload_json
+ FROM nodes
+ WHERE class = 'contract'",
+ )?;
+ let mut rows = statement.query([])?;
+ while let Some(row) = rows.next()? {
+ let payload = decode_json::<NodePayload>(&row.get::<_, String>(0)?)?;
+ for definition in contract_metric_definitions(&payload)? {
+ if upsert_metric_definition_tx(tx, &definition)? {
+ report.inserted_metric_definitions += 1;
+ }
+ }
+ }
+ }
+
+ {
+ let mut statement = tx.prepare(
+ "SELECT run_id, benchmark_suite
+ FROM runs
+ WHERE benchmark_suite IS NOT NULL
+ AND trim(benchmark_suite) != ''",
+ )?;
+ let mut rows = statement.query([])?;
+ while let Some(row) = rows.next()? {
+ let run_id = parse_run_id(&row.get::<_, String>(0)?)?;
+ let value = RunDimensionValue::String(NonEmptyText::new(row.get::<_, String>(1)?)?);
+ if insert_run_dimension_value_tx(
+ tx,
+ run_id,
+ &NonEmptyText::new("benchmark_suite")?,
+ &value,
+ )? {
+ report.inserted_dimension_values += 1;
+ }
+ }
+ }
+
+ Ok(report)
+}
+
+fn contract_metric_definitions(payload: &NodePayload) -> Result<Vec<MetricDefinition>, StoreError> {
+ let mut definitions = Vec::new();
+ if let Some(primary) = payload.field("primary_metric") {
+ definitions.push(metric_definition_from_json(primary, None)?);
+ }
+ if let Some(Value::Array(items)) = payload.field("supporting_metrics") {
+ for item in items {
+ definitions.push(metric_definition_from_json(item, None)?);
+ }
+ }
+ Ok(definitions)
+}
+
+fn metric_definition_from_json(
+ value: &Value,
+ description: Option<NonEmptyText>,
+) -> Result<MetricDefinition, StoreError> {
+ let Some(object) = value.as_object() else {
+ return Err(StoreError::Json(serde_json::Error::io(io::Error::new(
+ io::ErrorKind::InvalidData,
+ "metric definition payload must be an object",
+ ))));
+ };
+ let key = object
+ .get("metric_key")
+ .or_else(|| object.get("key"))
+ .and_then(Value::as_str)
+ .ok_or_else(|| {
+ StoreError::Json(serde_json::Error::io(io::Error::new(
+ io::ErrorKind::InvalidData,
+ "metric definition missing key",
+ )))
+ })?;
+ let unit = object.get("unit").and_then(Value::as_str).ok_or_else(|| {
+ StoreError::Json(serde_json::Error::io(io::Error::new(
+ io::ErrorKind::InvalidData,
+ "metric definition missing unit",
+ )))
+ })?;
+ let objective = object
+ .get("objective")
+ .and_then(Value::as_str)
+ .ok_or_else(|| {
+ StoreError::Json(serde_json::Error::io(io::Error::new(
+ io::ErrorKind::InvalidData,
+ "metric definition missing objective",
+ )))
+ })?;
+ Ok(MetricDefinition::new(
+ NonEmptyText::new(key)?,
+ decode_metric_unit(unit)?,
+ decode_optimization_objective(objective)?,
+ description,
+ ))
+}
+
+/// Returns `Ok(true)` when a new definition row was inserted; an
+/// existing row with a different unit or objective is a conflict error.
+fn upsert_metric_definition_tx(
+ tx: &Transaction<'_>,
+ definition: &MetricDefinition,
+) -> Result<bool, StoreError> {
+ let existing = tx
+ .query_row(
+ "SELECT unit, objective, description
+ FROM metric_definitions
+ WHERE metric_key = ?1",
+ params![definition.key.as_str()],
+ |row| {
+ Ok((
+ row.get::<_, String>(0)?,
+ row.get::<_, String>(1)?,
+ row.get::<_, Option<String>>(2)?,
+ ))
+ },
+ )
+ .optional()?;
+ if let Some((existing_unit, existing_objective, existing_description)) = existing {
+ let new_unit = encode_metric_unit(definition.unit).to_owned();
+ let new_objective = encode_optimization_objective(definition.objective).to_owned();
+ if existing_unit != new_unit || existing_objective != new_objective {
+ return Err(StoreError::ConflictingMetricDefinition {
+ key: definition.key.as_str().to_owned(),
+ existing_unit,
+ existing_objective,
+ new_unit,
+ new_objective,
+ });
+ }
+ if existing_description.is_none() && definition.description.is_some() {
+ let _ = tx.execute(
+ "UPDATE metric_definitions SET description = ?2 WHERE metric_key = ?1",
+ params![
+ definition.key.as_str(),
+ definition.description.as_ref().map(NonEmptyText::as_str)
+ ],
+ )?;
+ }
+ Ok(false)
+ } else {
+ let _ = tx.execute(
+ "INSERT INTO metric_definitions (metric_key, unit, objective, description, created_at)
+ VALUES (?1, ?2, ?3, ?4, ?5)",
+ params![
+ definition.key.as_str(),
+ encode_metric_unit(definition.unit),
+ encode_optimization_objective(definition.objective),
+ definition.description.as_ref().map(NonEmptyText::as_str),
+ encode_timestamp(definition.created_at)?,
+ ],
+ )?;
+ Ok(true)
+ }
+}
+
+fn insert_run_dimension_definition_tx(
+ tx: &Transaction<'_>,
+ definition: &RunDimensionDefinition,
+) -> Result<bool, StoreError> {
+ let existing = tx
+ .query_row(
+ "SELECT value_type, description
+ FROM run_dimension_definitions
+ WHERE dimension_key = ?1",
+ params![definition.key.as_str()],
+ |row| Ok((row.get::<_, String>(0)?, row.get::<_, Option<String>>(1)?)),
+ )
+ .optional()?;
+ if let Some((existing_type, existing_description)) = existing {
+ let new_type = encode_field_value_type(definition.value_type).to_owned();
+ if existing_type != new_type {
+ return Err(StoreError::ConflictingRunDimensionDefinition {
+ key: definition.key.as_str().to_owned(),
+ existing_type,
+ new_type,
+ });
+ }
+ if existing_description.is_none() && definition.description.is_some() {
+ let _ = tx.execute(
+ "UPDATE run_dimension_definitions SET description = ?2 WHERE dimension_key = ?1",
+ params![
+ definition.key.as_str(),
+ definition.description.as_ref().map(NonEmptyText::as_str)
+ ],
+ )?;
+ }
+ Ok(false)
+ } else {
+ let _ = tx.execute(
+ "INSERT INTO run_dimension_definitions (dimension_key, value_type, description, created_at)
+ VALUES (?1, ?2, ?3, ?4)",
+ params![
+ definition.key.as_str(),
+ encode_field_value_type(definition.value_type),
+ definition.description.as_ref().map(NonEmptyText::as_str),
+ encode_timestamp(definition.created_at)?,
+ ],
+ )?;
+ Ok(true)
+ }
+}
+
+fn load_metric_definition_tx(
+ tx: &Transaction<'_>,
+ key: &NonEmptyText,
+) -> Result<Option<MetricDefinition>, StoreError> {
+ tx.query_row(
+ "SELECT metric_key, unit, objective, description, created_at
+ FROM metric_definitions
+ WHERE metric_key = ?1",
+ params![key.as_str()],
+ |row| {
+ Ok(MetricDefinition {
+ key: NonEmptyText::new(row.get::<_, String>(0)?)
+ .map_err(core_to_sql_conversion_error)?,
+ unit: decode_metric_unit(&row.get::<_, String>(1)?)
+ .map_err(to_sql_conversion_error)?,
+ objective: decode_optimization_objective(&row.get::<_, String>(2)?)
+ .map_err(to_sql_conversion_error)?,
+ description: row
+ .get::<_, Option<String>>(3)?
+ .map(NonEmptyText::new)
+ .transpose()
+ .map_err(core_to_sql_conversion_error)?,
+ created_at: decode_timestamp(&row.get::<_, String>(4)?)
+ .map_err(to_sql_conversion_error)?,
+ })
+ },
+ )
+ .optional()
+ .map_err(StoreError::from)
+}
+
+fn metric_definitions_by_key(
+ store: &ProjectStore,
+) -> Result<BTreeMap<String, MetricDefinition>, StoreError> {
+ Ok(store
+ .list_metric_definitions()?
+ .into_iter()
+ .map(|definition| (definition.key.as_str().to_owned(), definition))
+ .collect())
+}
+
+fn run_dimension_definitions_by_key(
+ store: &ProjectStore,
+) -> Result<BTreeMap<String, RunDimensionDefinition>, StoreError> {
+ let mut statement = store.connection.prepare(
+ "SELECT dimension_key, value_type, description, created_at
+ FROM run_dimension_definitions",
+ )?;
+ let mut rows = statement.query([])?;
+ let mut items = BTreeMap::new();
+ while let Some(row) = rows.next()? {
+ let definition = RunDimensionDefinition {
+ key: NonEmptyText::new(row.get::<_, String>(0)?)?,
+ value_type: decode_field_value_type(&row.get::<_, String>(1)?)?,
+ description: row
+ .get::<_, Option<String>>(2)?
+ .map(NonEmptyText::new)
+ .transpose()?,
+ created_at: decode_timestamp(&row.get::<_, String>(3)?)?,
+ };
+ let _ = items.insert(definition.key.as_str().to_owned(), definition);
+ }
+ Ok(items)
+}
+
+fn coerce_run_dimension_map(
+ definitions: &BTreeMap<String, RunDimensionDefinition>,
+ raw_dimensions: BTreeMap<String, Value>,
+) -> Result<BTreeMap<NonEmptyText, RunDimensionValue>, StoreError> {
+ let mut dimensions = BTreeMap::new();
+ for (raw_key, raw_value) in raw_dimensions {
+ let key = NonEmptyText::new(raw_key)?;
+ let Some(definition) = definitions.get(key.as_str()) else {
+ return Err(StoreError::UnknownRunDimension(key));
+ };
+ let value = coerce_run_dimension_value(definition, raw_value)?;
+ let _ = dimensions.insert(key, value);
+ }
+ Ok(dimensions)
+}
+
+fn coerce_run_dimension_value(
+ definition: &RunDimensionDefinition,
+ raw_value: Value,
+) -> Result<RunDimensionValue, StoreError> {
+ match definition.value_type {
+ FieldValueType::String => match raw_value {
+ Value::String(value) => Ok(RunDimensionValue::String(NonEmptyText::new(value)?)),
+ other => Err(StoreError::InvalidRunDimensionValue {
+ key: definition.key.as_str().to_owned(),
+ expected: definition.value_type.as_str().to_owned(),
+ observed: value_kind_name(&other).to_owned(),
+ }),
+ },
+ FieldValueType::Numeric => match raw_value.as_f64() {
+ Some(value) => Ok(RunDimensionValue::Numeric(value)),
+ None => Err(StoreError::InvalidRunDimensionValue {
+ key: definition.key.as_str().to_owned(),
+ expected: definition.value_type.as_str().to_owned(),
+ observed: value_kind_name(&raw_value).to_owned(),
+ }),
+ },
+ FieldValueType::Boolean => match raw_value {
+ Value::Bool(value) => Ok(RunDimensionValue::Boolean(value)),
+ other => Err(StoreError::InvalidRunDimensionValue {
+ key: definition.key.as_str().to_owned(),
+ expected: definition.value_type.as_str().to_owned(),
+ observed: value_kind_name(&other).to_owned(),
+ }),
+ },
+ FieldValueType::Timestamp => match raw_value {
+ Value::String(value) => {
+ let _ = OffsetDateTime::parse(&value, &Rfc3339)?;
+ Ok(RunDimensionValue::Timestamp(NonEmptyText::new(value)?))
+ }
+ other => Err(StoreError::InvalidRunDimensionValue {
+ key: definition.key.as_str().to_owned(),
+ expected: definition.value_type.as_str().to_owned(),
+ observed: value_kind_name(&other).to_owned(),
+ }),
+ },
+ }
+}
+
+fn insert_run_dimension_value_tx(
+ tx: &Transaction<'_>,
+ run_id: fidget_spinner_core::RunId,
+ key: &NonEmptyText,
+ value: &RunDimensionValue,
+) -> Result<bool, StoreError> {
+ let (value_text, value_numeric, value_boolean, value_timestamp) =
+ encode_run_dimension_columns(value)?;
+ let changed = tx.execute(
+ "INSERT OR IGNORE INTO run_dimensions (
+ run_id,
+ dimension_key,
+ value_type,
+ value_text,
+ value_numeric,
+ value_boolean,
+ value_timestamp
+ ) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7)",
+ params![
+ run_id.to_string(),
+ key.as_str(),
+ encode_field_value_type(value.value_type()),
+ value_text,
+ value_numeric,
+ value_boolean,
+ value_timestamp,
+ ],
+ )?;
+ Ok(changed > 0)
+}
+
+fn insert_run_dimensions(
+ tx: &Transaction<'_>,
+ run_id: fidget_spinner_core::RunId,
+ dimensions: &BTreeMap<NonEmptyText, RunDimensionValue>,
+) -> Result<(), StoreError> {
+ for (key, value) in dimensions {
+ let _ = insert_run_dimension_value_tx(tx, run_id, key, value)?;
+ }
+ Ok(())
+}
+
+fn validate_run_dimensions_tx(
+ tx: &Transaction<'_>,
+ dimensions: &BTreeMap<NonEmptyText, RunDimensionValue>,
+) -> Result<BTreeMap<NonEmptyText, RunDimensionValue>, StoreError> {
+ for (key, value) in dimensions {
+ let Some(expected_type) = tx
+ .query_row(
+ "SELECT value_type
+ FROM run_dimension_definitions
+ WHERE dimension_key = ?1",
+ params![key.as_str()],
+ |row| row.get::<_, String>(0),
+ )
+ .optional()?
+ else {
+ return Err(StoreError::UnknownRunDimension(key.clone()));
+ };
+ let expected_type = decode_field_value_type(&expected_type)?;
+ let observed_type = value.value_type();
+ if expected_type != observed_type {
+ return Err(StoreError::InvalidRunDimensionValue {
+ key: key.as_str().to_owned(),
+ expected: expected_type.as_str().to_owned(),
+ observed: observed_type.as_str().to_owned(),
+ });
+ }
+ if matches!(value, RunDimensionValue::Timestamp(raw) if OffsetDateTime::parse(raw.as_str(), &Rfc3339).is_err())
+ {
+ return Err(StoreError::InvalidRunDimensionValue {
+ key: key.as_str().to_owned(),
+ expected: FieldValueType::Timestamp.as_str().to_owned(),
+                observed: "non-rfc3339 string".to_owned(),
+ });
+ }
+ }
+ Ok(dimensions.clone())
+}
+
+fn load_run_dimensions_by_run_id(
+ store: &ProjectStore,
+) -> Result<
+ BTreeMap<fidget_spinner_core::RunId, BTreeMap<NonEmptyText, RunDimensionValue>>,
+ StoreError,
+> {
+ let mut statement = store.connection.prepare(
+ "SELECT run_id, dimension_key, value_type, value_text, value_numeric, value_boolean, value_timestamp
+ FROM run_dimensions
+ ORDER BY dimension_key ASC",
+ )?;
+ let mut rows = statement.query([])?;
+ let mut values =
+ BTreeMap::<fidget_spinner_core::RunId, BTreeMap<NonEmptyText, RunDimensionValue>>::new();
+ while let Some(row) = rows.next()? {
+ let run_id = parse_run_id(&row.get::<_, String>(0)?)?;
+ let key = NonEmptyText::new(row.get::<_, String>(1)?)?;
+ let value_type = decode_field_value_type(&row.get::<_, String>(2)?)?;
+ let value = decode_run_dimension_value(
+ value_type,
+ row.get::<_, Option<String>>(3)?,
+ row.get::<_, Option<f64>>(4)?,
+ row.get::<_, Option<i64>>(5)?,
+ row.get::<_, Option<String>>(6)?,
+ )?;
+ let _ = values.entry(run_id).or_default().insert(key, value);
+ }
+ Ok(values)
+}
+
+fn load_run_dimension_summaries(
+ store: &ProjectStore,
+) -> Result<Vec<RunDimensionSummary>, StoreError> {
+ let definitions = {
+ let mut statement = store.connection.prepare(
+ "SELECT dimension_key, value_type, description, created_at
+ FROM run_dimension_definitions
+ ORDER BY dimension_key ASC",
+ )?;
+ let mut rows = statement.query([])?;
+ let mut items = Vec::new();
+ while let Some(row) = rows.next()? {
+ items.push(RunDimensionDefinition {
+ key: NonEmptyText::new(row.get::<_, String>(0)?)?,
+ value_type: decode_field_value_type(&row.get::<_, String>(1)?)?,
+ description: row
+ .get::<_, Option<String>>(2)?
+ .map(NonEmptyText::new)
+ .transpose()?,
+ created_at: decode_timestamp(&row.get::<_, String>(3)?)?,
+ });
+ }
+ items
+ };
+
+    let mut summaries = Vec::new();
+    let mut statement = store.connection.prepare(
+        "SELECT value_text, value_numeric, value_boolean, value_timestamp
+         FROM run_dimensions
+         WHERE dimension_key = ?1",
+    )?;
+    for definition in definitions {
+        let mut rows = statement.query(params![definition.key.as_str()])?;
+ let mut observed_run_count = 0_u64;
+ let mut distinct = BTreeSet::new();
+ let mut sample_values = Vec::new();
+ while let Some(row) = rows.next()? {
+ observed_run_count += 1;
+ let value = decode_run_dimension_value(
+ definition.value_type,
+ row.get::<_, Option<String>>(0)?,
+ row.get::<_, Option<f64>>(1)?,
+ row.get::<_, Option<i64>>(2)?,
+ row.get::<_, Option<String>>(3)?,
+ )?;
+ let serialized = encode_json(&value.as_json())?;
+ if distinct.insert(serialized) && sample_values.len() < 5 {
+ sample_values.push(value.as_json());
+ }
+ }
+ summaries.push(RunDimensionSummary {
+ key: definition.key,
+ value_type: definition.value_type,
+ description: definition.description,
+ observed_run_count,
+ distinct_value_count: distinct.len() as u64,
+ sample_values,
+ });
+ }
+ Ok(summaries)
+}
+
+fn merge_registered_run_metric_summaries(
+ store: &ProjectStore,
+ summaries: &mut Vec<MetricKeySummary>,
+) -> Result<(), StoreError> {
+ let definitions = store.list_metric_definitions()?;
+ for definition in definitions {
+ if let Some(summary) = summaries.iter_mut().find(|summary| {
+ summary.source == MetricFieldSource::RunMetric && summary.key == definition.key
+ }) {
+ summary.unit = Some(definition.unit);
+ summary.objective = Some(definition.objective);
+ summary.description.clone_from(&definition.description);
+ summary.requires_order = matches!(definition.objective, OptimizationObjective::Target);
+ continue;
+ }
+ summaries.push(MetricKeySummary {
+ key: definition.key,
+ source: MetricFieldSource::RunMetric,
+ experiment_count: 0,
+ unit: Some(definition.unit),
+ objective: Some(definition.objective),
+ description: definition.description,
+ requires_order: matches!(definition.objective, OptimizationObjective::Target),
+ });
+ }
+ Ok(())
+}
+
+fn dimensions_match(
+ haystack: &BTreeMap<NonEmptyText, RunDimensionValue>,
+ needle: &BTreeMap<NonEmptyText, RunDimensionValue>,
+) -> bool {
+ needle
+ .iter()
+ .all(|(key, value)| haystack.get(key) == Some(value))
+}
+
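`dimensions_match` treats the query as a subset filter: every key/value pair in the needle must also appear in the haystack, and extra haystack dimensions are ignored. A std-only sketch over `BTreeMap<String, String>` (the value type is simplified; names here are illustrative, not the crate's):

```rust
use std::collections::BTreeMap;

/// True when every (key, value) pair in `needle` is present in `haystack`.
fn dimensions_match(
    haystack: &BTreeMap<String, String>,
    needle: &BTreeMap<String, String>,
) -> bool {
    needle
        .iter()
        .all(|(key, value)| haystack.get(key) == Some(value))
}
```

An empty needle matches every run, which is why a query with no dimension filters considers all candidates.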
+fn run_dimensions_json(dimensions: &BTreeMap<NonEmptyText, RunDimensionValue>) -> Value {
+ Value::Object(
+ dimensions
+ .iter()
+ .map(|(key, value)| (key.to_string(), value.as_json()))
+ .collect::<serde_json::Map<String, Value>>(),
+ )
+}
+
+fn benchmark_suite_label(dimensions: &BTreeMap<NonEmptyText, RunDimensionValue>) -> Option<String> {
+ dimensions
+ .get(&NonEmptyText::new("benchmark_suite").ok()?)
+ .and_then(|value| match value {
+ RunDimensionValue::String(item) => Some(item.to_string()),
+ _ => None,
+ })
+ .or_else(|| {
+ if dimensions.is_empty() {
+ None
+ } else {
+ Some(
+ dimensions
+ .iter()
+ .map(|(key, value)| format!("{key}={}", dimension_value_text(value)))
+ .collect::<Vec<_>>()
+ .join(", "),
+ )
+ }
+ })
+}
+
+fn derive_summary_from_body(body: &str) -> Option<NonEmptyText> {
+ const MAX_SUMMARY_CHARS: usize = 240;
+
+ let paragraph = body
+ .split("\n\n")
+ .map(collapse_inline_whitespace)
+ .map(|text| text.trim().to_owned())
+ .find(|text| !text.is_empty())?;
+ let summary = truncate_chars(&paragraph, MAX_SUMMARY_CHARS);
+ NonEmptyText::new(summary).ok()
+}
+
+fn collapse_inline_whitespace(raw: &str) -> String {
+ raw.split_whitespace().collect::<Vec<_>>().join(" ")
+}
+
+fn truncate_chars(value: &str, max_chars: usize) -> String {
+ if value.chars().count() <= max_chars {
+ return value.to_owned();
+ }
+ let mut truncated = value.chars().take(max_chars).collect::<String>();
+ if let Some(index) = truncated.rfind(char::is_whitespace) {
+ truncated.truncate(index);
+ }
+ format!("{}…", truncated.trim_end())
+}
+
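The summary-derivation helpers above can be exercised in isolation. This std-only sketch mirrors their behavior (first non-empty paragraph, collapsed whitespace, 240-character budget with whitespace-boundary truncation) but drops the crate's `NonEmptyText` wrapper in favor of `Option<String>`; `derive_summary` is an illustrative name:

```rust
/// Collapse runs of whitespace (including newlines) into single spaces.
fn collapse_inline_whitespace(raw: &str) -> String {
    raw.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Truncate to at most `max_chars` characters, backing up to the last
/// whitespace boundary and appending an ellipsis when truncation occurs.
fn truncate_chars(value: &str, max_chars: usize) -> String {
    if value.chars().count() <= max_chars {
        return value.to_owned();
    }
    let mut truncated = value.chars().take(max_chars).collect::<String>();
    if let Some(index) = truncated.rfind(char::is_whitespace) {
        truncated.truncate(index);
    }
    format!("{}…", truncated.trim_end())
}

/// First non-empty paragraph of `body`, collapsed and truncated.
fn derive_summary(body: &str) -> Option<String> {
    const MAX_SUMMARY_CHARS: usize = 240;
    let paragraph = body
        .split("\n\n")
        .map(collapse_inline_whitespace)
        .find(|text| !text.is_empty())?;
    Some(truncate_chars(&paragraph, MAX_SUMMARY_CHARS))
}
```

Splitting on `"\n\n"` before collapsing means a body of only blank paragraphs yields no summary rather than an empty one.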
fn insert_node(tx: &Transaction<'_>, node: &DagNode) -> Result<(), StoreError> {
let schema_namespace = node
.payload
@@ -1210,8 +2738,11 @@ fn insert_checkpoint(
fn insert_run(
tx: &Transaction<'_>,
run: &RunRecord,
- primary_metric: &MetricObservation,
- supporting_metrics: &[MetricObservation],
+ benchmark_suite: Option<&str>,
+ primary_metric: &MetricValue,
+ primary_metric_definition: &MetricDefinition,
+ supporting_metrics: &[MetricValue],
+ supporting_metric_definitions: &[MetricDefinition],
) -> Result<(), StoreError> {
let (repo_root, worktree_root, worktree_name, head_commit, dirty_paths) = run
.code_snapshot
@@ -1273,7 +2804,7 @@ fn insert_run(
worktree_name.map(|item| item.to_string()),
head_commit.map(|item| item.to_string()),
dirty_paths_json,
- run.benchmark_suite.as_ref().map(NonEmptyText::as_str),
+ benchmark_suite,
run.command.working_directory.as_str(),
encode_json(&run.command.argv)?,
encode_json(&run.command.env)?,
@@ -1282,15 +2813,19 @@ fn insert_run(
],
)?;
- for metric in std::iter::once(primary_metric).chain(supporting_metrics.iter()) {
+ for (metric, definition) in std::iter::once((primary_metric, primary_metric_definition)).chain(
+ supporting_metrics
+ .iter()
+ .zip(supporting_metric_definitions.iter()),
+ ) {
let _ = tx.execute(
"INSERT INTO metrics (run_id, metric_key, unit, objective, value)
VALUES (?1, ?2, ?3, ?4, ?5)",
params![
run.run_id.to_string(),
- metric.metric_key.as_str(),
- encode_metric_unit(metric.unit),
- encode_optimization_objective(metric.objective),
+ metric.key.as_str(),
+ encode_metric_unit(definition.unit),
+ encode_optimization_objective(definition.objective),
metric.value,
],
)?;
@@ -1331,7 +2866,7 @@ fn insert_experiment(
experiment.run_id.to_string(),
experiment.analysis_node_id.map(|id| id.to_string()),
experiment.decision_node_id.to_string(),
- experiment.result.benchmark_suite.as_str(),
+ benchmark_suite_label(&experiment.result.dimensions),
encode_json(&experiment.result.primary_metric)?,
encode_json(&experiment.result.supporting_metrics)?,
experiment.note.summary.as_str(),
@@ -1623,6 +3158,16 @@ fn parse_checkpoint_id(raw: &str) -> Result<fidget_spinner_core::CheckpointId, S
)?))
}
+fn parse_experiment_id(raw: &str) -> Result<fidget_spinner_core::ExperimentId, StoreError> {
+ Ok(fidget_spinner_core::ExperimentId::from_uuid(parse_uuid(
+ raw,
+ )?))
+}
+
+fn parse_run_id(raw: &str) -> Result<fidget_spinner_core::RunId, StoreError> {
+ Ok(fidget_spinner_core::RunId::from_uuid(parse_uuid(raw)?))
+}
+
fn parse_agent_session_id(raw: &str) -> Result<fidget_spinner_core::AgentSessionId, StoreError> {
Ok(fidget_spinner_core::AgentSessionId::from_uuid(parse_uuid(
raw,
@@ -1762,6 +3307,23 @@ fn encode_backend(backend: ExecutionBackend) -> &'static str {
}
}
+fn encode_field_value_type(value_type: FieldValueType) -> &'static str {
+ value_type.as_str()
+}
+
+fn decode_field_value_type(raw: &str) -> Result<FieldValueType, StoreError> {
+ match raw {
+ "string" => Ok(FieldValueType::String),
+ "numeric" => Ok(FieldValueType::Numeric),
+ "boolean" => Ok(FieldValueType::Boolean),
+ "timestamp" => Ok(FieldValueType::Timestamp),
+ other => Err(StoreError::Json(serde_json::Error::io(io::Error::new(
+ io::ErrorKind::InvalidData,
+ format!("unknown field value type `{other}`"),
+ )))),
+ }
+}
+
fn encode_metric_unit(unit: MetricUnit) -> &'static str {
match unit {
MetricUnit::Seconds => "seconds",
@@ -1772,6 +3334,20 @@ fn encode_metric_unit(unit: MetricUnit) -> &'static str {
}
}
+fn decode_metric_unit(raw: &str) -> Result<MetricUnit, StoreError> {
+ match raw {
+ "seconds" => Ok(MetricUnit::Seconds),
+ "bytes" => Ok(MetricUnit::Bytes),
+ "count" => Ok(MetricUnit::Count),
+ "ratio" => Ok(MetricUnit::Ratio),
+ "custom" => Ok(MetricUnit::Custom),
+ other => Err(StoreError::Json(serde_json::Error::io(io::Error::new(
+ io::ErrorKind::InvalidData,
+ format!("unknown metric unit `{other}`"),
+ )))),
+ }
+}
+
fn encode_optimization_objective(objective: OptimizationObjective) -> &'static str {
match objective {
OptimizationObjective::Minimize => "minimize",
@@ -1780,6 +3356,18 @@ fn encode_optimization_objective(objective: OptimizationObjective) -> &'static s
}
}
+fn decode_optimization_objective(raw: &str) -> Result<OptimizationObjective, StoreError> {
+ match raw {
+ "minimize" => Ok(OptimizationObjective::Minimize),
+ "maximize" => Ok(OptimizationObjective::Maximize),
+ "target" => Ok(OptimizationObjective::Target),
+ other => Err(StoreError::Json(serde_json::Error::io(io::Error::new(
+ io::ErrorKind::InvalidData,
+ format!("unknown optimization objective `{other}`"),
+ )))),
+ }
+}
+
fn encode_frontier_verdict(verdict: FrontierVerdict) -> &'static str {
match verdict {
FrontierVerdict::PromoteToChampion => "promote-to-champion",
@@ -1790,18 +3378,117 @@ fn encode_frontier_verdict(verdict: FrontierVerdict) -> &'static str {
}
}
+fn parse_frontier_verdict(raw: &str) -> Result<FrontierVerdict, StoreError> {
+ match raw {
+ "promote-to-champion" => Ok(FrontierVerdict::PromoteToChampion),
+ "keep-on-frontier" => Ok(FrontierVerdict::KeepOnFrontier),
+ "revert-to-champion" => Ok(FrontierVerdict::RevertToChampion),
+ "archive-dead-end" => Ok(FrontierVerdict::ArchiveDeadEnd),
+ "needs-more-evidence" => Ok(FrontierVerdict::NeedsMoreEvidence),
+ other => Err(StoreError::Json(serde_json::Error::io(io::Error::new(
+ io::ErrorKind::InvalidData,
+ format!("unknown frontier verdict `{other}`"),
+ )))),
+ }
+}
+
+type RunDimensionColumns = (Option<String>, Option<f64>, Option<i64>, Option<String>);
+
+fn encode_run_dimension_columns(
+ value: &RunDimensionValue,
+) -> Result<RunDimensionColumns, StoreError> {
+ match value {
+ RunDimensionValue::String(item) => Ok((Some(item.to_string()), None, None, None)),
+ RunDimensionValue::Numeric(item) => Ok((None, Some(*item), None, None)),
+ RunDimensionValue::Boolean(item) => Ok((None, None, Some(i64::from(*item)), None)),
+ RunDimensionValue::Timestamp(item) => {
+ let _ = OffsetDateTime::parse(item.as_str(), &Rfc3339)?;
+ Ok((None, None, None, Some(item.to_string())))
+ }
+ }
+}
+
+fn decode_run_dimension_value(
+ value_type: FieldValueType,
+ value_text: Option<String>,
+ value_numeric: Option<f64>,
+ value_boolean: Option<i64>,
+ value_timestamp: Option<String>,
+) -> Result<RunDimensionValue, StoreError> {
+ match value_type {
+ FieldValueType::String => Ok(RunDimensionValue::String(NonEmptyText::new(
+ value_text.ok_or_else(|| {
+ StoreError::Json(serde_json::Error::io(io::Error::new(
+ io::ErrorKind::InvalidData,
+ "missing string dimension value",
+ )))
+ })?,
+ )?)),
+ FieldValueType::Numeric => Ok(RunDimensionValue::Numeric(value_numeric.ok_or_else(
+ || {
+ StoreError::Json(serde_json::Error::io(io::Error::new(
+ io::ErrorKind::InvalidData,
+ "missing numeric dimension value",
+ )))
+ },
+ )?)),
+ FieldValueType::Boolean => Ok(RunDimensionValue::Boolean(
+ value_boolean.ok_or_else(|| {
+ StoreError::Json(serde_json::Error::io(io::Error::new(
+ io::ErrorKind::InvalidData,
+ "missing boolean dimension value",
+ )))
+ })? != 0,
+ )),
+ FieldValueType::Timestamp => {
+ let value = value_timestamp.ok_or_else(|| {
+ StoreError::Json(serde_json::Error::io(io::Error::new(
+ io::ErrorKind::InvalidData,
+ "missing timestamp dimension value",
+ )))
+ })?;
+ let _ = OffsetDateTime::parse(&value, &Rfc3339)?;
+ Ok(RunDimensionValue::Timestamp(NonEmptyText::new(value)?))
+ }
+ }
+}
+
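The four nullable columns give each value type a dedicated slot, and encode/decode are intended to be inverses. A std-only round-trip sketch with a simplified value enum (`DimValue`, `Columns`, and the string tags are illustrative; timestamp validation and `NonEmptyText` are omitted):

```rust
/// Simplified stand-in for the store's run-dimension value.
#[derive(Debug, PartialEq)]
enum DimValue {
    Str(String),
    Num(f64),
    Bool(bool),
    Ts(String),
}

/// (value_text, value_numeric, value_boolean, value_timestamp)
type Columns = (Option<String>, Option<f64>, Option<i64>, Option<String>);

/// Spread a typed value across the four nullable columns.
fn encode(value: &DimValue) -> Columns {
    match value {
        DimValue::Str(s) => (Some(s.clone()), None, None, None),
        DimValue::Num(n) => (None, Some(*n), None, None),
        DimValue::Bool(b) => (None, None, Some(i64::from(*b)), None),
        DimValue::Ts(t) => (None, None, None, Some(t.clone())),
    }
}

/// Rebuild the typed value; `None` signals a row whose populated column
/// does not match the declared type.
fn decode(kind: &str, cols: Columns) -> Option<DimValue> {
    match kind {
        "string" => cols.0.map(DimValue::Str),
        "numeric" => cols.1.map(DimValue::Num),
        "boolean" => cols.2.map(|b| DimValue::Bool(b != 0)),
        "timestamp" => cols.3.map(DimValue::Ts),
        _ => None,
    }
}
```

Storing booleans as `i64` matches SQLite's lack of a native boolean column type.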
+fn dimension_value_text(value: &RunDimensionValue) -> String {
+ match value {
+ RunDimensionValue::String(item) | RunDimensionValue::Timestamp(item) => item.to_string(),
+ RunDimensionValue::Numeric(item) => item.to_string(),
+ RunDimensionValue::Boolean(item) => item.to_string(),
+ }
+}
+
+fn value_kind_name(value: &Value) -> &'static str {
+ match value {
+ Value::Null => "null",
+ Value::Bool(_) => "boolean",
+ Value::Number(_) => "numeric",
+ Value::String(_) => "string",
+ Value::Array(_) => "array",
+ Value::Object(_) => "object",
+ }
+}
+
#[cfg(test)]
mod tests {
- use std::collections::BTreeSet;
+ use std::collections::{BTreeMap, BTreeSet};
use serde_json::json;
use super::{
- CreateFrontierRequest, CreateNodeRequest, ListNodesQuery, PROJECT_SCHEMA_NAME, ProjectStore,
+ CloseExperimentRequest, CreateFrontierRequest, CreateNodeRequest, DefineMetricRequest,
+ DefineRunDimensionRequest, ListNodesQuery, MetricBestQuery, MetricFieldSource,
+ MetricKeyQuery, MetricRankOrder, PROJECT_SCHEMA_NAME, ProjectStore,
+ RemoveSchemaFieldRequest, UpsertSchemaFieldRequest,
};
use fidget_spinner_core::{
- CheckpointSnapshotRef, EvaluationProtocol, FrontierContract, MetricSpec, MetricUnit,
- NodeAnnotation, NodeClass, NodePayload, NonEmptyText, OptimizationObjective, TagName,
+ CheckpointSnapshotRef, CommandRecipe, DiagnosticSeverity, EvaluationProtocol,
+ FieldPresence, FieldRole, FieldValueType, FrontierContract, FrontierNote, FrontierVerdict,
+ GitCommitHash, InferencePolicy, MetricSpec, MetricUnit, MetricValue, NodeAnnotation,
+ NodeClass, NodePayload, NonEmptyText, OptimizationObjective, RunDimensionValue, TagName,
};
fn temp_project_root(label: &str) -> camino::Utf8PathBuf {
@@ -1893,7 +3580,7 @@ mod tests {
repo_root: root.clone(),
worktree_root: root,
worktree_name: Some(NonEmptyText::new("main")?),
- commit_hash: fidget_spinner_core::GitCommitHash::new("0123456789abcdef")?,
+ commit_hash: GitCommitHash::new("0123456789abcdef")?,
},
}),
})?;
@@ -1914,7 +3601,7 @@ mod tests {
class: NodeClass::Note,
frontier_id: None,
title: NonEmptyText::new("quick note")?,
- summary: None,
+ summary: Some(NonEmptyText::new("quick note summary")?),
tags: Some(BTreeSet::new()),
payload: NodePayload::with_schema(
store.schema().schema_ref(),
@@ -1987,7 +3674,7 @@ mod tests {
class: NodeClass::Note,
frontier_id: None,
title: NonEmptyText::new("quick note")?,
- summary: None,
+ summary: Some(NonEmptyText::new("quick note summary")?),
tags: None,
payload: NodePayload::with_schema(
store.schema().schema_ref(),
@@ -2021,7 +3708,7 @@ mod tests {
class: NodeClass::Note,
frontier_id: None,
title: NonEmptyText::new("tagged note")?,
- summary: None,
+ summary: Some(NonEmptyText::new("tagged note summary")?),
tags: Some(BTreeSet::from([cuts.name.clone(), heuristics.name.clone()])),
payload: NodePayload::with_schema(
store.schema().schema_ref(),
@@ -2044,4 +3731,506 @@ mod tests {
assert_eq!(filtered[0].tags.len(), 2);
Ok(())
}
+
+ #[test]
+ fn prose_nodes_require_summary_and_body() -> Result<(), super::StoreError> {
+ let root = temp_project_root("prose-summary");
+ let mut store = ProjectStore::init(
+ &root,
+ NonEmptyText::new("test project")?,
+ NonEmptyText::new("local.test")?,
+ )?;
+
+ let missing_summary = store.add_node(CreateNodeRequest {
+ class: NodeClass::Research,
+ frontier_id: None,
+ title: NonEmptyText::new("research note")?,
+ summary: None,
+ tags: None,
+ payload: NodePayload::with_schema(
+ store.schema().schema_ref(),
+ super::json_object(json!({"body": "research body"}))?,
+ ),
+ annotations: Vec::new(),
+ attachments: Vec::new(),
+ });
+ assert!(matches!(
+ missing_summary,
+ Err(super::StoreError::ProseSummaryRequired(NodeClass::Research))
+ ));
+
+ let missing_body = store.add_node(CreateNodeRequest {
+ class: NodeClass::Note,
+ frontier_id: None,
+ title: NonEmptyText::new("quick note")?,
+ summary: Some(NonEmptyText::new("quick note summary")?),
+ tags: Some(BTreeSet::new()),
+ payload: NodePayload::with_schema(store.schema().schema_ref(), serde_json::Map::new()),
+ annotations: Vec::new(),
+ attachments: Vec::new(),
+ });
+ assert!(matches!(
+ missing_body,
+ Err(super::StoreError::ProseBodyRequired(NodeClass::Note))
+ ));
+ Ok(())
+ }
+
+ #[test]
+ fn opening_store_backfills_missing_prose_summaries() -> Result<(), super::StoreError> {
+ let root = temp_project_root("summary-backfill");
+ let mut store = ProjectStore::init(
+ &root,
+ NonEmptyText::new("test project")?,
+ NonEmptyText::new("local.test")?,
+ )?;
+ let node = store.add_node(CreateNodeRequest {
+ class: NodeClass::Research,
+ frontier_id: None,
+ title: NonEmptyText::new("research note")?,
+ summary: Some(NonEmptyText::new("temporary summary")?),
+ tags: None,
+ payload: NodePayload::with_schema(
+ store.schema().schema_ref(),
+ super::json_object(json!({"body": "First paragraph.\n\nSecond paragraph."}))?,
+ ),
+ annotations: Vec::new(),
+ attachments: Vec::new(),
+ })?;
+ drop(store);
+
+ let connection = rusqlite::Connection::open(
+ root.join(super::STORE_DIR_NAME)
+ .join(super::STATE_DB_NAME)
+ .as_std_path(),
+ )?;
+ let _ = connection.execute(
+ "UPDATE nodes SET summary = NULL WHERE id = ?1",
+ rusqlite::params![node.id.to_string()],
+ )?;
+ drop(connection);
+
+ let reopened = ProjectStore::open(&root)?;
+ let loaded = reopened
+ .get_node(node.id)?
+ .ok_or(super::StoreError::NodeNotFound(node.id))?;
+ assert_eq!(
+ loaded.summary.as_ref().map(NonEmptyText::as_str),
+ Some("First paragraph.")
+ );
+ Ok(())
+ }
+
+ #[test]
+ fn schema_field_upsert_remove_persists_and_bumps_version() -> Result<(), super::StoreError> {
+ let root = temp_project_root("schema-upsert-remove");
+ let mut store = ProjectStore::init(
+ &root,
+ NonEmptyText::new("test project")?,
+ NonEmptyText::new("local.test")?,
+ )?;
+ let initial_version = store.schema().version;
+
+ let field = store.upsert_schema_field(UpsertSchemaFieldRequest {
+ name: NonEmptyText::new("scenario")?,
+ node_classes: BTreeSet::from([NodeClass::Change, NodeClass::Analysis]),
+ presence: FieldPresence::Recommended,
+ severity: DiagnosticSeverity::Warning,
+ role: FieldRole::ProjectionGate,
+ inference_policy: InferencePolicy::ManualOnly,
+ value_type: Some(FieldValueType::String),
+ })?;
+ assert_eq!(field.name.as_str(), "scenario");
+ assert_eq!(store.schema().version, initial_version + 1);
+ assert!(
+ store
+ .schema()
+ .fields
+ .iter()
+ .any(|item| item.name.as_str() == "scenario")
+ );
+ drop(store);
+
+ let mut reopened = ProjectStore::open(&root)?;
+ assert_eq!(reopened.schema().version, initial_version + 1);
+ assert!(
+ reopened
+ .schema()
+ .fields
+ .iter()
+ .any(|item| item.name.as_str() == "scenario")
+ );
+
+ let removed = reopened.remove_schema_field(RemoveSchemaFieldRequest {
+ name: NonEmptyText::new("scenario")?,
+ node_classes: Some(BTreeSet::from([NodeClass::Change, NodeClass::Analysis])),
+ })?;
+ assert_eq!(removed, 1);
+ assert_eq!(reopened.schema().version, initial_version + 2);
+ assert!(
+ !reopened
+ .schema()
+ .fields
+ .iter()
+ .any(|item| item.name.as_str() == "scenario")
+ );
+ Ok(())
+ }
+
+ #[test]
+ fn metric_queries_surface_canonical_and_payload_numeric_fields() -> Result<(), super::StoreError>
+ {
+ let root = temp_project_root("metric-best");
+ let mut store = ProjectStore::init(
+ &root,
+ NonEmptyText::new("test project")?,
+ NonEmptyText::new("local.test")?,
+ )?;
+ let projection = store.create_frontier(CreateFrontierRequest {
+ label: NonEmptyText::new("optimization frontier")?,
+ contract_title: NonEmptyText::new("contract root")?,
+ contract_summary: None,
+ contract: FrontierContract {
+ objective: NonEmptyText::new("improve wall time")?,
+ evaluation: EvaluationProtocol {
+ benchmark_suites: BTreeSet::from([NonEmptyText::new("smoke")?]),
+ primary_metric: MetricSpec {
+ metric_key: NonEmptyText::new("wall_clock_s")?,
+ unit: MetricUnit::Seconds,
+ objective: OptimizationObjective::Minimize,
+ },
+ supporting_metrics: BTreeSet::new(),
+ },
+ promotion_criteria: vec![NonEmptyText::new("strict speedup")?],
+ },
+ initial_checkpoint: Some(super::CheckpointSeed {
+ summary: NonEmptyText::new("seed")?,
+ snapshot: checkpoint_snapshot(&root, "aaaaaaaaaaaaaaaa")?,
+ }),
+ })?;
+ let frontier_id = projection.frontier.id;
+ let base_checkpoint_id = projection
+ .champion_checkpoint_id
+ .ok_or_else(|| super::StoreError::MissingChampionCheckpoint { frontier_id })?;
+ let _ = store.define_metric(DefineMetricRequest {
+ key: NonEmptyText::new("wall_clock_s")?,
+ unit: MetricUnit::Seconds,
+ objective: OptimizationObjective::Minimize,
+ description: Some(NonEmptyText::new("elapsed wall time")?),
+ })?;
+ let _ = store.define_run_dimension(DefineRunDimensionRequest {
+ key: NonEmptyText::new("scenario")?,
+ value_type: FieldValueType::String,
+ description: Some(NonEmptyText::new("workload family")?),
+ })?;
+ let _ = store.define_run_dimension(DefineRunDimensionRequest {
+ key: NonEmptyText::new("duration_s")?,
+ value_type: FieldValueType::Numeric,
+ description: Some(NonEmptyText::new("time budget in seconds")?),
+ })?;
+
+ let first_change = store.add_node(CreateNodeRequest {
+ class: NodeClass::Change,
+ frontier_id: Some(frontier_id),
+ title: NonEmptyText::new("first change")?,
+ summary: Some(NonEmptyText::new("first change summary")?),
+ tags: None,
+ payload: NodePayload::with_schema(
+ store.schema().schema_ref(),
+ super::json_object(json!({"body": "first body", "latency_hint": 14.0}))?,
+ ),
+ annotations: Vec::new(),
+ attachments: Vec::new(),
+ })?;
+ let second_change = store.add_node(CreateNodeRequest {
+ class: NodeClass::Change,
+ frontier_id: Some(frontier_id),
+ title: NonEmptyText::new("second change")?,
+ summary: Some(NonEmptyText::new("second change summary")?),
+ tags: None,
+ payload: NodePayload::with_schema(
+ store.schema().schema_ref(),
+ super::json_object(json!({"body": "second body", "latency_hint": 7.0}))?,
+ ),
+ annotations: Vec::new(),
+ attachments: Vec::new(),
+ })?;
+
+ let first_receipt = store.close_experiment(experiment_request(
+ &root,
+ frontier_id,
+ base_checkpoint_id,
+ first_change.id,
+ "bbbbbbbbbbbbbbbb",
+ "first run",
+ 10.0,
+ run_dimensions("belt_4x5", 20.0)?,
+ )?)?;
+ let second_receipt = store.close_experiment(experiment_request(
+ &root,
+ frontier_id,
+ base_checkpoint_id,
+ second_change.id,
+ "cccccccccccccccc",
+ "second run",
+ 5.0,
+ run_dimensions("belt_4x5", 60.0)?,
+ )?)?;
+
+ let keys = store.list_metric_keys()?;
+ assert!(keys.iter().any(|key| {
+ key.key.as_str() == "wall_clock_s" && key.source == MetricFieldSource::RunMetric
+ }));
+ assert!(keys.iter().any(|key| {
+ key.key.as_str() == "latency_hint" && key.source == MetricFieldSource::ChangePayload
+ }));
+ assert!(keys.iter().any(|key| {
+ key.key.as_str() == "wall_clock_s"
+ && key.source == MetricFieldSource::RunMetric
+ && key.description.as_ref().map(NonEmptyText::as_str) == Some("elapsed wall time")
+ }));
+
+ let filtered_keys = store.list_metric_keys_filtered(MetricKeyQuery {
+ frontier_id: Some(frontier_id),
+ source: Some(MetricFieldSource::RunMetric),
+ dimensions: run_dimensions("belt_4x5", 60.0)?,
+ })?;
+ assert_eq!(filtered_keys.len(), 1);
+ assert_eq!(filtered_keys[0].experiment_count, 1);
+
+ let dimension_summaries = store.list_run_dimensions()?;
+ assert!(dimension_summaries.iter().any(|dimension| {
+ dimension.key.as_str() == "benchmark_suite"
+ && dimension.value_type == FieldValueType::String
+ && dimension.observed_run_count == 2
+ }));
+ assert!(dimension_summaries.iter().any(|dimension| {
+ dimension.key.as_str() == "scenario"
+ && dimension.description.as_ref().map(NonEmptyText::as_str)
+ == Some("workload family")
+ }));
+ assert!(dimension_summaries.iter().any(|dimension| {
+ dimension.key.as_str() == "duration_s"
+ && dimension.value_type == FieldValueType::Numeric
+ && dimension.distinct_value_count == 2
+ }));
+
+ let canonical_best = store.best_metrics(MetricBestQuery {
+ key: NonEmptyText::new("wall_clock_s")?,
+ frontier_id: Some(frontier_id),
+ source: Some(MetricFieldSource::RunMetric),
+ dimensions: run_dimensions("belt_4x5", 60.0)?,
+ order: None,
+ limit: 5,
+ })?;
+ assert_eq!(canonical_best.len(), 1);
+ assert_eq!(canonical_best[0].value, 5.0);
+ assert_eq!(
+ canonical_best[0].candidate_checkpoint_id,
+ second_receipt.checkpoint.id
+ );
+ assert_eq!(
+ canonical_best[0]
+ .dimensions
+ .get(&NonEmptyText::new("duration_s")?),
+ Some(&RunDimensionValue::Numeric(60.0))
+ );
+
+ let payload_best = store.best_metrics(MetricBestQuery {
+ key: NonEmptyText::new("latency_hint")?,
+ frontier_id: Some(frontier_id),
+ source: Some(MetricFieldSource::ChangePayload),
+ dimensions: run_dimensions("belt_4x5", 60.0)?,
+ order: Some(MetricRankOrder::Asc),
+ limit: 5,
+ })?;
+ assert_eq!(payload_best.len(), 1);
+ assert_eq!(payload_best[0].value, 7.0);
+ assert_eq!(payload_best[0].change_node_id, second_change.id);
+
+ let missing_order = store.best_metrics(MetricBestQuery {
+ key: NonEmptyText::new("latency_hint")?,
+ frontier_id: Some(frontier_id),
+ source: Some(MetricFieldSource::ChangePayload),
+ dimensions: BTreeMap::new(),
+ order: None,
+ limit: 5,
+ });
+ assert!(matches!(
+ missing_order,
+ Err(super::StoreError::MetricOrderRequired { .. })
+ ));
+ assert_eq!(
+ first_receipt.checkpoint.snapshot.commit_hash.as_str(),
+ "bbbbbbbbbbbbbbbb"
+ );
+ Ok(())
+ }
+
+ #[test]
+ fn opening_store_backfills_legacy_benchmark_suite_dimensions() -> Result<(), super::StoreError>
+ {
+ let root = temp_project_root("metric-plane-backfill");
+ let mut store = ProjectStore::init(
+ &root,
+ NonEmptyText::new("test project")?,
+ NonEmptyText::new("local.test")?,
+ )?;
+ let projection = store.create_frontier(CreateFrontierRequest {
+ label: NonEmptyText::new("migration frontier")?,
+ contract_title: NonEmptyText::new("migration contract")?,
+ contract_summary: None,
+ contract: FrontierContract {
+ objective: NonEmptyText::new("exercise metric migration")?,
+ evaluation: EvaluationProtocol {
+ benchmark_suites: BTreeSet::from([NonEmptyText::new("smoke")?]),
+ primary_metric: MetricSpec {
+ metric_key: NonEmptyText::new("wall_clock_s")?,
+ unit: MetricUnit::Seconds,
+ objective: OptimizationObjective::Minimize,
+ },
+ supporting_metrics: BTreeSet::new(),
+ },
+ promotion_criteria: vec![NonEmptyText::new("keep the metric plane queryable")?],
+ },
+ initial_checkpoint: Some(super::CheckpointSeed {
+ summary: NonEmptyText::new("seed")?,
+ snapshot: checkpoint_snapshot(&root, "aaaaaaaaaaaaaaaa")?,
+ }),
+ })?;
+ let frontier_id = projection.frontier.id;
+ let base_checkpoint_id = projection
+ .champion_checkpoint_id
+ .ok_or_else(|| super::StoreError::MissingChampionCheckpoint { frontier_id })?;
+ let change = store.add_node(CreateNodeRequest {
+ class: NodeClass::Change,
+ frontier_id: Some(frontier_id),
+ title: NonEmptyText::new("candidate change")?,
+ summary: Some(NonEmptyText::new("candidate change summary")?),
+ tags: None,
+ payload: NodePayload::with_schema(
+ store.schema().schema_ref(),
+ super::json_object(json!({"latency_hint": 9.0}))?,
+ ),
+ annotations: Vec::new(),
+ attachments: Vec::new(),
+ })?;
+ let _ = store.close_experiment(experiment_request(
+ &root,
+ frontier_id,
+ base_checkpoint_id,
+ change.id,
+ "bbbbbbbbbbbbbbbb",
+ "migration run",
+ 11.0,
+ BTreeMap::from([(
+ NonEmptyText::new("benchmark_suite")?,
+ RunDimensionValue::String(NonEmptyText::new("smoke")?),
+ )]),
+ )?)?;
+ drop(store);
+
+ let connection = rusqlite::Connection::open(
+ root.join(super::STORE_DIR_NAME)
+ .join(super::STATE_DB_NAME)
+ .as_std_path(),
+ )?;
+ let _ = connection.execute("DELETE FROM run_dimensions", [])?;
+ drop(connection);
+
+ let reopened = ProjectStore::open(&root)?;
+ let dimensions = reopened.list_run_dimensions()?;
+ assert!(dimensions.iter().any(|dimension| {
+ dimension.key.as_str() == "benchmark_suite" && dimension.observed_run_count == 1
+ }));
+
+ let best = reopened.best_metrics(MetricBestQuery {
+ key: NonEmptyText::new("wall_clock_s")?,
+ frontier_id: Some(frontier_id),
+ source: Some(MetricFieldSource::RunMetric),
+ dimensions: BTreeMap::from([(
+ NonEmptyText::new("benchmark_suite")?,
+ RunDimensionValue::String(NonEmptyText::new("smoke")?),
+ )]),
+ order: None,
+ limit: 5,
+ })?;
+ assert_eq!(best.len(), 1);
+ assert_eq!(best[0].value, 11.0);
+ Ok(())
+ }
+
+ fn checkpoint_snapshot(
+ root: &camino::Utf8Path,
+ commit: &str,
+ ) -> Result<CheckpointSnapshotRef, super::StoreError> {
+ Ok(CheckpointSnapshotRef {
+ repo_root: root.to_path_buf(),
+ worktree_root: root.to_path_buf(),
+ worktree_name: Some(NonEmptyText::new("main")?),
+ commit_hash: GitCommitHash::new(commit)?,
+ })
+ }
+
+ fn experiment_request(
+ root: &camino::Utf8Path,
+ frontier_id: fidget_spinner_core::FrontierId,
+ base_checkpoint_id: fidget_spinner_core::CheckpointId,
+ change_node_id: fidget_spinner_core::NodeId,
+ candidate_commit: &str,
+ run_title: &str,
+ wall_clock_s: f64,
+ dimensions: BTreeMap<NonEmptyText, RunDimensionValue>,
+ ) -> Result<CloseExperimentRequest, super::StoreError> {
+ Ok(CloseExperimentRequest {
+ frontier_id,
+ base_checkpoint_id,
+ change_node_id,
+ candidate_summary: NonEmptyText::new(format!("candidate {candidate_commit}"))?,
+ candidate_snapshot: checkpoint_snapshot(root, candidate_commit)?,
+ run_title: NonEmptyText::new(run_title)?,
+ run_summary: Some(NonEmptyText::new("run summary")?),
+ backend: fidget_spinner_core::ExecutionBackend::WorktreeProcess,
+ dimensions,
+ command: CommandRecipe::new(
+ root.to_path_buf(),
+ vec![NonEmptyText::new("true")?],
+ BTreeMap::new(),
+ )?,
+ code_snapshot: None,
+ primary_metric: MetricValue {
+ key: NonEmptyText::new("wall_clock_s")?,
+ value: wall_clock_s,
+ },
+ supporting_metrics: Vec::new(),
+ note: FrontierNote {
+ summary: NonEmptyText::new("note summary")?,
+ next_hypotheses: Vec::new(),
+ },
+ verdict: FrontierVerdict::KeepOnFrontier,
+ decision_title: NonEmptyText::new("decision")?,
+ decision_rationale: NonEmptyText::new("decision rationale")?,
+ analysis_node_id: None,
+ })
+ }
+
+ fn run_dimensions(
+ scenario: &str,
+ duration_s: f64,
+ ) -> Result<BTreeMap<NonEmptyText, RunDimensionValue>, super::StoreError> {
+ Ok(BTreeMap::from([
+ (
+ NonEmptyText::new("benchmark_suite")?,
+ RunDimensionValue::String(NonEmptyText::new("smoke")?),
+ ),
+ (
+ NonEmptyText::new("scenario")?,
+ RunDimensionValue::String(NonEmptyText::new(scenario)?),
+ ),
+ (
+ NonEmptyText::new("duration_s")?,
+ RunDimensionValue::Numeric(duration_s),
+ ),
+ ]))
+ }
}
diff --git a/docs/architecture.md b/docs/architecture.md
index 5db53fb..37f5c55 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -320,7 +320,7 @@ This projection is derived from canonical state and intentionally rebuildable.
These are intentionally cheap:
- `note.quick`, but only with explicit tags from the repo-local registry
-- `research.record`
+- `research.record`, optionally tagged into the same repo-local taxonomy
- generic `node.create` for escape-hatch use
- `node.annotate`
@@ -351,6 +351,13 @@ framing. The public server is a stable host. It owns initialization state,
replay policy, telemetry, and host rollout. Execution happens in a disposable
worker subprocess.
+Presentation is orthogonal to payload detail:
+
+- `render=porcelain|json`
+- `detail=concise|full`
+
+Porcelain is the terse model-facing surface, not a pretty-printed JSON dump.
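+
+For instance, a `tools/call` request might combine the two axes like this (a
+sketch only; the exact placement of `render` and `detail` in this project's
+tool argument schemas is an assumption):
+
+```json
+{
+  "method": "tools/call",
+  "params": {
+    "name": "project.status",
+    "arguments": { "render": "porcelain", "detail": "full" }
+  }
+}
+```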
+
### Host responsibilities
- own the public JSON-RPC session
@@ -436,6 +443,8 @@ Implemented tools:
- `project.bind`
- `project.status`
- `project.schema`
+- `schema.field.upsert`
+- `schema.field.remove`
- `tag.add`
- `tag.list`
- `frontier.list`
@@ -449,6 +458,12 @@ Implemented tools:
- `node.archive`
- `note.quick`
- `research.record`
+- `metric.define`
+- `metric.keys`
+- `metric.best`
+- `metric.migrate`
+- `run.dimension.define`
+- `run.dimension.list`
- `experiment.close`
- `skill.list`
- `skill.show`
@@ -464,7 +479,9 @@ Implemented resources:
### Operational tools
-`system.health` returns a typed operational snapshot:
+`system.health` returns a typed operational snapshot. Concise output (the
+default) covers only immediate session state; full detail widens to the
+entire health object:
- initialization state
- binding state
@@ -472,7 +489,7 @@ Implemented resources:
- current executable path
- launch-path stability
- rollout-pending state
-- last recorded fault
+- last recorded fault (full detail only)
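+
+A concise `system.health` payload might look roughly like this (the field
+names below are illustrative, not the actual wire schema):
+
+```json
+{
+  "initialized": true,
+  "bound": true,
+  "rollout_pending": false
+}
+```
+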
`system.telemetry` returns cumulative counters:
@@ -482,6 +499,7 @@ Implemented resources:
- retries
- worker restarts
- host rollouts
+- last recorded fault
- per-operation counts and last latencies
### Rollout model
@@ -507,6 +525,8 @@ Current commands:
- `init`
- `schema show`
+- `schema upsert-field`
+- `schema remove-field`
- `frontier init`
- `frontier status`
- `node add`
@@ -515,16 +535,29 @@ Current commands:
- `node annotate`
- `node archive`
- `note quick`
+- `tag add`
+- `tag list`
- `research add`
+- `metric define`
+- `metric keys`
+- `metric best`
+- `metric migrate`
+- `dimension define`
+- `dimension list`
- `experiment close`
- `mcp serve`
+- `ui serve`
- hidden internal `mcp worker`
- `skill list`
- `skill install`
- `skill show`
The CLI is not the strategic write plane, but it is the easiest repair and
-bootstrap surface.
+bootstrap surface. Its naming is intentionally parallel but not identical to
+the MCP surface:
+
+- CLI subcommands use spaces such as `schema upsert-field` and `dimension define`
+- MCP tools use dotted names such as `schema.field.upsert` and `run.dimension.define`
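+
+As a sketch of the parallel, the same registration can be reached from either
+surface; the CLI spelling below matches the README, while the MCP spelling is
+the dotted tool identifier:
+
+```bash
+# CLI: space-separated subcommands
+cargo run -p fidget-spinner-cli -- dimension define \
+  --project . \
+  --key scenario \
+  --type string \
+  --description "workload family"
+# MCP: the equivalent dotted tool name is `run.dimension.define`
+```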
## Bundled Skill