1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
|
# Fresh Bootstrap
Use this when the MCP can still adopt the hard posture directly.
## 1. Durable host, disposable worker
Long-lived MCPs should separate durable session ownership from fragile backend
execution.
- The host owns the MCP transport, initialization state, request IDs, replay
policy, rollout, and user-facing error shaping.
- The worker owns backend runtimes, backend-specific retries, and tool
execution.
- Use `libmcp`'s host-session kernel and snapshot-file handoff instead of
rolling custom initialize seed and reexec glue.
If the worker dies, the session should survive.
## 2. Replay as a typed contract
Every request surface needs an explicit replay class:
- `Convergent`
- `ProbeRequired`
- `NeverReplay`
Do not add blanket retry or replay logic. The replay class belongs in code, not
in scattered comments.
## 3. Typed faults
Represent failures as operational faults with recovery semantics.
Baseline classes:
- transport
- process
- protocol
- timeout
- downstream response
- resource
Faults should flow through health, telemetry, and user-facing shaping.
## 4. Porcelain by default
Nontrivial tools should default to `render=porcelain`.
`render` and detail are separate axes.
- `render=porcelain|json`
- `detail=concise|full`
Porcelain should be:
- line-oriented
- deterministic
- bounded
- summary-first
Structured `render=json` should remain available.
`json + concise` should be a structured summary, not merely the full payload in
different clothes.
Use library rendering helpers where possible. Do not default to pretty-printed
JSON dumps and call that porcelain.
## 5. Boundary normalization
Normalize model-facing input where it is clearly safe:
- field aliases
- integer-like strings
- `file://` URIs
- stable path style controls
The goal is to eliminate trivial friction, not to hide real ambiguity.
## 6. Health and telemetry
Ship explicit operational tooling:
- health snapshot
- telemetry snapshot
- append-only event telemetry
Do this before feature sprawl, not after the first outage.
## 7. Test the failure posture
Build fake runtimes and integration tests that exercise:
- crash recovery
- replay legality
- rollout or restart churn
- model-facing output shaping
- routing correctness where the backend is root-sensitive
|