2026-05-11-alpha5-v0.3-migration
Alpha-5: full v0.3.0 migration (H1, F2, L1, S1)
Date: 2026-05-11 Status: shipped Slug: 281-alpha5-v0.3-migration Branch: feature/281-alpha5-v0.3-migration Closes the migration started in: #290 (schema extensions + F1 prototype)
Summary
Migrates the four remaining alpha-5 missions (H1, F2, L1, S1) from v0.1 to v0.3.0, mirroring F1 v0.3.0's structure. Every mission now declares rubric_ref: "computer-rubric.v0.2" and populates the v0.2 scorer's required fields:
| Surface | Status across all 5 missions |
|---|---|
rubric_ref: "computer-rubric.v0.2" | ✓ |
expected_artifacts[*] with inline_schema (JSON) or content_requirements (markdown) | ✓ |
correctness_weight overrides summing to 25 | ✓ |
hitl_gates[*].evidence_requirements (when present) | ✓ |
hitl_gates_rationale (when empty) | ✓ on L1 |
policies[*].positive_evidence | ✓ |
checkRubricV02Conformance returns 0 findings for all 5 missions.
Why this matters
PR #291 demonstrated that the F1 score jumps from 79 → 95 when the v0.2 scorer consumes the v0.3 mission fields. That's the architecture validated. This PR replicates the pattern on the other four so the next gauntlet run produces a full alpha-5 measurement under v0.2:
- If H1/F2/L1/S1 land in the 92-95 range, the gauntlet becomes a real
release gate (4/5 to 5/5 pass on the 85 threshold).
- If any mission stays at 79, it points to a per-mission spec gap the
rubric-conformance check didn't catch — measurable signal.
Per-mission migration detail
H1 — Lab panel triage with safety bound
- 2 artifacts:
triage-summary(markdown,content_requirementswith
required sections, citation pattern, prohibited-directive phrases), approval-decision (JSON, inline_schema requiring grant/deny + token).
- 1 HITL gate
physician-approval-pre-writewith
evidence_requirements: { requires_hitl_request, requires_decision_per_flag: false, blocked_no_token_required }.
- 3 policies (
phi.minimum_necessary,phi.us_residency,
no_clinical_directive) each declare per-hook policy_check events with the specific required fields the scorer will check.
F2 — KYC / sanctions adverse-media triage
- 3 artifacts:
triage-memo,approval-decision,evidence-bundle
(each with appropriate JSON Schema or content requirements).
- Prohibited phrases on
triage-memoenforce the no-tipping-off rule
at the content layer.
- HITL gate mirrors F1's pattern;
evidence-bundleschema requires
bundle_hash and per-entry source enum.
L1 — MSA + DPA redline against playbook
- 3 artifacts:
redline-set,dpa-flags,privileged-memo. dpa-flagsJSON schema constrainscategoryto a closed enum of
DPA risk types; playbook_ref pattern enforces grounding.
privileged-memorequires the explicit "PRIVILEGED & CONFIDENTIAL —
Attorney Work Product" section header.
- No HITL gates —
hitl_gates_rationaleexplains why
(contract review is advisory-only; downstream execution gate is out-of-scope for this mission).
S1 — Ransomware incident response, hour zero
- 4 artifacts:
containment-plan,approval-decision,ir-memo,
evidence-bundle.
evidence-bundleschema pinstenant_scopeto"msp-tenant-1"
(the cross-tenant-isolation auto-fail condition).
containment-planandir-memoprohibit"tenant-2"/
"other tenant" phrases — content-level enforcement of the cross-tenant rule.
approval-decision.actions_approvedenum limits committee approval
scope to host.isolate and key.revoke.
Files modified
tests/gauntlet/catalogue/missions/H1.json(v0.1.0 → 0.3.0)tests/gauntlet/catalogue/missions/F2.json(v0.1.0 → 0.3.0)tests/gauntlet/catalogue/missions/L1.json(v0.1.0 → 0.3.0)tests/gauntlet/catalogue/missions/S1.json(v0.1.0 → 0.3.0)tests/gauntlet/runners/validate-result.test.ts— back-compat test
for "v0.1 mission skipped" now uses an inline synthetic mission since all on-disk missions are now v0.3.
Verification
- All 5 missions parse + zero conformance findings (programmatic check)
pnpm tsgo— exit 0pnpm test:fast— 7779/0 (no regression)pnpm exec oxfmt --check/oxlint --type-aware— clean- Example fixture regenerates self-consistently (
build-example-fixture.ts) - All 15 validator tests pass (8 alpha + 7 conformance)
Expected next-run signal
When Perplexity Computer runs alpha-5 with scorer v0.2 against this catalogue, expected outcomes per mission:
| Mission | Current ceiling | Expected with v0.2 |
|---|---|---|
| H1 | 79 | 92-95 (matches F1 v0.3 result) |
| F2 | 79 | 92-95 |
| L1 | 79 | 92-95 |
| S1 | 79 | 92-95 |
| F1 | 95 | 92-95 (already proven) |
If 5/5 land at ≥85, the gauntlet is a release gate. If any stays at 79, the gap is per-mission spec content and we iterate that mission.
What this PR does NOT do
- Does not change the scorer (Perplexity side).
- Does not change
validate-result.tsbehavior — the conformance hook
was added in #290.
- Does not retro-fit any submitted result bundle in
tests/gauntlet/results/
(those record runs against earlier mission versions and remain valid historical data points).
Lockstep
- TS / Python SDK / MCP — no public API surface change
- OpenAPI — no new routes
- Ship log — this file