# Fresh Bootstrap

Use this when the MCP can still adopt the hard posture directly.

## 1. Durable host, disposable worker

Long-lived MCPs should separate durable session ownership from fragile backend
execution.

- The host owns the MCP transport, initialization state, request IDs, replay
  policy, rollout, and user-facing error shaping.
- The worker owns backend runtimes, backend-specific retries, and tool
  execution.
- Use `libmcp`'s host-session kernel and snapshot-file handoff instead of
  hand-rolling custom initialize-seed and re-exec glue.

If the worker dies, the session should survive.
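
The split can be illustrated with a minimal sketch. It does not use `libmcp`; every name here (`SessionState`, `Worker`, `Host`, `FlakyWorker`, `spawn_flaky`) is hypothetical and exists only to show the ownership boundary: the host keeps session state and replaces the worker when it dies.

```rust
// Hypothetical sketch: none of these types come from `libmcp`.

/// State the host must keep across worker restarts.
#[derive(Debug, Default)]
struct SessionState {
    initialized: bool,
    next_request_id: u64,
}

/// Marker error: the worker process is gone.
#[derive(Debug)]
struct WorkerDied;

/// Anything that executes a tool call and may die while doing so.
trait Worker {
    fn call(&mut self, tool: &str) -> Result<String, WorkerDied>;
}

/// The host owns the session plus a factory for replacement workers.
struct Host<W: Worker> {
    session: SessionState,
    worker: W,
    spawn: fn() -> W,
    restarts: u32,
}

impl<W: Worker> Host<W> {
    fn new(spawn: fn() -> W) -> Self {
        Host {
            session: SessionState { initialized: true, next_request_id: 1 },
            worker: spawn(),
            spawn,
            restarts: 0,
        }
    }

    /// Run one request. If the worker dies, replace it and report a shaped
    /// error; the session itself is never torn down.
    fn handle(&mut self, tool: &str) -> Result<String, String> {
        let id = self.session.next_request_id;
        self.session.next_request_id += 1;
        match self.worker.call(tool) {
            Ok(out) => Ok(format!("#{} {}", id, out)),
            Err(WorkerDied) => {
                self.worker = (self.spawn)();
                self.restarts += 1;
                Err(format!("#{} worker restarted; request not completed", id))
            }
        }
    }
}

/// Fake worker that dies whenever it is asked to run the tool "crash".
struct FlakyWorker;

impl Worker for FlakyWorker {
    fn call(&mut self, tool: &str) -> Result<String, WorkerDied> {
        if tool == "crash" {
            Err(WorkerDied)
        } else {
            Ok(format!("{}: ok", tool))
        }
    }
}

fn spawn_flaky() -> FlakyWorker {
    FlakyWorker
}
```

With `Host::new(spawn_flaky)`, a call to `handle("crash")` returns an error and bumps `restarts`, while `session.initialized` stays `true` and the next request is served by the replacement worker.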

## 2. Replay as a typed contract

Every request surface needs an explicit replay class:

- `Convergent`
- `ProbeRequired`
- `NeverReplay`

Do not add blanket retry or replay logic. The replay class belongs in code, not
in scattered comments.
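
One way to keep the replay class in code is an exhaustive `match`, sketched below. The three class names come from the list above; their exact semantics are an assumption here (convergent requests are safe to re-run, probe-required requests need a state check first, never-replay requests surface a fault). The `Request` variants, `ReplayDecision`, and `decide` are hypothetical.

```rust
/// Replay classes named in the text. The doc comments reflect an assumed
/// reading of each class, not a definition from `libmcp`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ReplayClass {
    /// Assumed: re-running converges on the same result.
    Convergent,
    /// Assumed: backend state must be probed before re-running.
    ProbeRequired,
    /// Assumed: must not be re-run automatically.
    NeverReplay,
}

/// Hypothetical request surfaces for an imaginary file-editing MCP.
#[derive(Debug, Clone, Copy)]
enum Request {
    ReadFile,
    ApplyPatch,
    RunCommand,
}

impl Request {
    /// Exhaustive match: adding a `Request` variant without choosing a
    /// replay class is a compile error, not a forgotten comment.
    fn replay_class(self) -> ReplayClass {
        match self {
            Request::ReadFile => ReplayClass::Convergent,
            Request::ApplyPatch => ReplayClass::ProbeRequired,
            Request::RunCommand => ReplayClass::NeverReplay,
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
enum ReplayDecision {
    Replay,
    /// The probe showed the work was already applied.
    Skip,
    SurfaceFault,
}

/// `probe_says_applied` is the outcome of a probe, if one was run.
fn decide(class: ReplayClass, probe_says_applied: Option<bool>) -> ReplayDecision {
    match (class, probe_says_applied) {
        (ReplayClass::Convergent, _) => ReplayDecision::Replay,
        (ReplayClass::ProbeRequired, Some(false)) => ReplayDecision::Replay,
        (ReplayClass::ProbeRequired, Some(true)) => ReplayDecision::Skip,
        // No probe result means replay legality is unknown.
        (ReplayClass::ProbeRequired, None) => ReplayDecision::SurfaceFault,
        (ReplayClass::NeverReplay, _) => ReplayDecision::SurfaceFault,
    }
}
```

Because there is no wildcard arm in `replay_class`, a blanket "retry everything" default has nowhere to hide.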

## 3. Typed faults

Represent failures as operational faults with recovery semantics.

Baseline classes:

- transport
- process
- protocol
- timeout
- downstream response
- resource

Faults should flow through health, telemetry, and user-facing shaping.
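
A minimal sketch of a fault type that carries its own recovery semantics follows. The six classes mirror the baseline list; the `Recovery` enum and the class-to-recovery mapping are illustrative assumptions, and a real server would choose its own.

```rust
use std::fmt;

/// Baseline fault classes from the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FaultClass {
    Transport,
    Process,
    Protocol,
    Timeout,
    DownstreamResponse,
    Resource,
}

/// Hypothetical recovery actions; the set is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Recovery {
    RestartWorker,
    RetryLater,
    Surface,
}

#[derive(Debug, Clone)]
struct Fault {
    class: FaultClass,
    detail: String,
}

impl Fault {
    /// Illustrative mapping only: each class decides its recovery in one place.
    fn recovery(&self) -> Recovery {
        match self.class {
            FaultClass::Transport | FaultClass::Process => Recovery::RestartWorker,
            FaultClass::Timeout | FaultClass::Resource => Recovery::RetryLater,
            FaultClass::Protocol | FaultClass::DownstreamResponse => Recovery::Surface,
        }
    }

    /// Stable label usable in health and telemetry output.
    fn label(&self) -> &'static str {
        match self.class {
            FaultClass::Transport => "transport",
            FaultClass::Process => "process",
            FaultClass::Protocol => "protocol",
            FaultClass::Timeout => "timeout",
            FaultClass::DownstreamResponse => "downstream_response",
            FaultClass::Resource => "resource",
        }
    }
}

/// User-facing shaping: one line, class first, recovery hint last.
impl fmt::Display for Fault {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "fault[{}]: {} (recovery: {:?})",
            self.label(),
            self.detail,
            self.recovery()
        )
    }
}
```

The same `Fault` value can then feed a health check (via `recovery`), a telemetry event (via `label`), and the message shown to the model (via `Display`).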

## 4. Porcelain by default

Nontrivial tools should default to `render=porcelain`.

Porcelain should be:

- line-oriented
- deterministic
- bounded
- summary-first

Structured `render=json` should remain available.

Use library rendering helpers where possible. Do not default to pretty-printed
JSON dumps and call that porcelain.
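
As a rough illustration of the four porcelain properties, here is a hand-written renderer for a hypothetical search tool. It stands in for whatever rendering helpers the library provides; `Hit`, `hit`, `MAX_ROWS`, `MAX_TEXT`, and `render_porcelain` are invented for this sketch.

```rust
/// Hypothetical search hit returned by an imaginary tool.
struct Hit {
    path: String,
    line: u32,
    text: String,
}

fn hit(path: &str, line: u32, text: &str) -> Hit {
    Hit { path: path.to_string(), line, text: text.to_string() }
}

/// Bounds chosen arbitrarily for the example.
const MAX_ROWS: usize = 3;
const MAX_TEXT: usize = 40;

fn render_porcelain(hits: &[Hit]) -> String {
    // Deterministic: order by (path, line) regardless of input order.
    let mut sorted: Vec<&Hit> = hits.iter().collect();
    sorted.sort_by(|a, b| (&a.path, a.line).cmp(&(&b.path, b.line)));

    // Summary-first: totals come before any detail rows.
    let shown = sorted.len().min(MAX_ROWS);
    let mut out = format!("hits={} shown={}\n", sorted.len(), shown);

    // Line-oriented and bounded: one hit per line, capped rows, capped width.
    for h in sorted.iter().take(MAX_ROWS) {
        let text: String = h
            .text
            .chars()
            .map(|c| if c == '\n' { ' ' } else { c })
            .take(MAX_TEXT)
            .collect();
        out.push_str(&format!("{}:{} {}\n", h.path, h.line, text));
    }
    if sorted.len() > shown {
        out.push_str(&format!("... {} more\n", sorted.len() - shown));
    }
    out
}
```

Given four hits in arbitrary order, the output always starts with `hits=4 shown=3`, lists the first three by path and line, and ends with `... 1 more`.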

## 5. Boundary normalization

Normalize model-facing input where it is clearly safe:

- field aliases
- integer-like strings
- `file://` URIs
- stable path style controls

The goal is to eliminate trivial friction, not to hide real ambiguity.
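
The sketch below shows what "clearly safe" normalization might look like for three of the cases above. The function names and the alias table are hypothetical. Anything ambiguous (a decimal where an integer is expected, a `file://` URI naming another host) is rejected rather than guessed at, and percent-decoding is deliberately left out.

```rust
/// Map known aliases onto one canonical field name (hypothetical table).
fn canonical_field(name: &str) -> &str {
    match name {
        "file" | "filepath" | "file_path" => "path",
        "max" | "max_results" => "limit",
        other => other,
    }
}

/// Accept integers that arrive as strings such as "42" or " 42 ".
/// Values like "4.2" or "0x10" are rejected instead of being coerced.
fn normalize_int(raw: &str) -> Result<i64, String> {
    raw.trim()
        .parse::<i64>()
        .map_err(|_| format!("not an integer: {:?}", raw))
}

/// Turn a local `file://` URI into a plain path; pass plain paths through.
/// Percent-decoding is not handled in this sketch.
fn normalize_path(raw: &str) -> Result<String, String> {
    match raw.strip_prefix("file://") {
        // Not a file URI: leave the path untouched.
        None => Ok(raw.to_string()),
        // Empty authority, e.g. file:///tmp/a.txt
        Some(rest) if rest.starts_with('/') => Ok(rest.to_string()),
        // Explicit localhost authority is still local.
        Some(rest) => match rest.strip_prefix("localhost/") {
            Some(path) => Ok(format!("/{}", path)),
            // Any other host is real ambiguity: surface it.
            None => Err(format!("non-local file URI: {:?}", raw)),
        },
    }
}
```

For instance, `normalize_path("file:///tmp/a.txt")` yields `/tmp/a.txt`, while `normalize_path("file://otherhost/tmp/a")` returns an error instead of silently dropping the host.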

## 6. Health and telemetry

Ship explicit operational tooling:

- health snapshot
- telemetry snapshot
- append-only event telemetry

Do this before feature sprawl, not after the first outage.
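
A minimal in-memory sketch of the three pieces: an append-only event log, a telemetry snapshot derived from it, and a health snapshot. All type names are hypothetical, and the health rule (degraded when the most recent event is not a success) is an arbitrary placeholder for a real policy.

```rust
/// Hypothetical event vocabulary for the append-only log.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Event {
    RequestOk { tool: String },
    RequestFailed { tool: String, fault: String },
    WorkerRestarted,
}

#[derive(Debug, PartialEq, Eq)]
enum Health {
    Healthy,
    Degraded,
}

#[derive(Debug, Default, PartialEq, Eq)]
struct TelemetrySnapshot {
    requests_ok: u64,
    requests_failed: u64,
    worker_restarts: u64,
}

/// Append-only: the only mutation offered is `record`.
#[derive(Default)]
struct Telemetry {
    events: Vec<Event>,
}

impl Telemetry {
    fn record(&mut self, event: Event) {
        self.events.push(event);
    }

    /// Counters are derived from the log, so they cannot drift from it.
    fn telemetry_snapshot(&self) -> TelemetrySnapshot {
        let mut snap = TelemetrySnapshot::default();
        for event in &self.events {
            match event {
                Event::RequestOk { .. } => snap.requests_ok += 1,
                Event::RequestFailed { .. } => snap.requests_failed += 1,
                Event::WorkerRestarted => snap.worker_restarts += 1,
            }
        }
        snap
    }

    /// Placeholder policy: healthy if the log is empty or the latest event
    /// is a successful request, degraded otherwise.
    fn health_snapshot(&self) -> Health {
        match self.events.last() {
            None | Some(Event::RequestOk { .. }) => Health::Healthy,
            Some(_) => Health::Degraded,
        }
    }
}
```

A production version would persist the event log rather than hold it in a `Vec`, but the shape is the same: events are only appended, and both snapshots are read-only views over them.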

## 7. Test the failure posture

Build fake runtimes and integration tests that exercise:

- crash recovery
- replay legality
- rollout or restart churn
- model-facing output shaping
- routing correctness where the backend is root-sensitive
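
A fake runtime can be as small as a scripted queue of outcomes plus a call log. The sketch below covers the first two bullets (crash recovery and replay legality); `FakeRuntime`, `Step`, and `dispatch` are hypothetical, and `dispatch` is a stand-in for whatever host logic is actually under test.

```rust
use std::collections::VecDeque;

/// Scripted outcome for one call into the fake runtime.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Step {
    Succeed,
    Crash,
}

/// Fake backend: plays back a script and records every call it receives.
struct FakeRuntime {
    script: VecDeque<Step>,
    calls: Vec<String>,
}

impl FakeRuntime {
    fn new(script: &[Step]) -> Self {
        FakeRuntime {
            script: script.iter().copied().collect(),
            calls: Vec::new(),
        }
    }

    fn call(&mut self, tool: &str) -> Result<String, String> {
        self.calls.push(tool.to_string());
        // Once the script runs out, every further call succeeds.
        match self.script.pop_front().unwrap_or(Step::Succeed) {
            Step::Succeed => Ok(format!("{}: ok", tool)),
            Step::Crash => Err("process exited".to_string()),
        }
    }
}

/// Stand-in for the code under test: replay once, and only when allowed.
fn dispatch(rt: &mut FakeRuntime, tool: &str, replayable: bool) -> Result<String, String> {
    match rt.call(tool) {
        Err(_) if replayable => rt.call(tool),
        other => other,
    }
}

#[test]
fn replayable_request_survives_one_crash() {
    let mut rt = FakeRuntime::new(&[Step::Crash]);
    assert_eq!(dispatch(&mut rt, "read", true), Ok("read: ok".to_string()));
    assert_eq!(rt.calls.len(), 2);
}

#[test]
fn non_replayable_request_is_sent_exactly_once() {
    let mut rt = FakeRuntime::new(&[Step::Crash]);
    assert!(dispatch(&mut rt, "exec", false).is_err());
    assert_eq!(rt.calls, vec!["exec".to_string()]);
}
```

The call log is what makes replay legality testable: the second test fails if anything re-sends a request that was not marked replayable.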