2026-05-11-alpha5-v0.3-migration

Alpha-5: full v0.3.0 migration (H1, F2, L1, S1)

Date: 2026-05-11
Status: shipped
Slug: 281-alpha5-v0.3-migration
Branch: feature/281-alpha5-v0.3-migration
Closes the migration started in: #290 (schema extensions + F1 prototype)

Summary

Migrates the four remaining alpha-5 missions (H1, F2, L1, S1) from v0.1 to v0.3.0, mirroring F1 v0.3.0's structure. Every mission now declares rubric_ref: "computer-rubric.v0.2" and populates the v0.2 scorer's required fields:

Surface	Status across all 5 missions
`rubric_ref: "computer-rubric.v0.2"`	✓
`expected_artifacts[*]` with `inline_schema` (JSON) or `content_requirements` (markdown)	✓
`correctness_weight` overrides summing to 25	✓
`hitl_gates[*].evidence_requirements` (when present)	✓
`hitl_gates_rationale` (when empty)	✓ on L1
`policies[*].positive_evidence`	✓

checkRubricV02Conformance returns 0 findings for all 5 missions.
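This log does not reproduce the repository's checkRubricV02Conformance. To show roughly what "0 findings" means against the table above, here is a minimal TypeScript sketch over a simplified mission shape. MissionSketch, checkConformanceSketch, and the flat correctness_weight map are hypothetical, not the real types or function.

```typescript
// Hypothetical, simplified mission shape for illustration; the real v0.3.0
// mission schema has more fields and may name or nest these differently.
interface ArtifactSpec {
  id: string;
  format: "json" | "markdown";
  inline_schema?: Record<string, unknown>;
  content_requirements?: Record<string, unknown>;
}

interface HitlGate {
  id: string;
  evidence_requirements?: Record<string, boolean>;
}

interface PolicySpec {
  id: string;
  positive_evidence?: unknown[];
}

interface MissionSketch {
  id: string;
  rubric_ref: string;
  expected_artifacts: ArtifactSpec[];
  correctness_weight: Record<string, number>; // assumed: flat map of overrides
  hitl_gates: HitlGate[];
  hitl_gates_rationale?: string;
  policies: PolicySpec[];
}

const REQUIRED_RUBRIC = "computer-rubric.v0.2";
const CORRECTNESS_WEIGHT_TOTAL = 25;

// Returns one finding per missing or inconsistent field; an empty array
// means the mission conforms under these simplified assumptions.
function checkConformanceSketch(m: MissionSketch): string[] {
  const findings: string[] = [];
  if (m.rubric_ref !== REQUIRED_RUBRIC) {
    findings.push(`${m.id}: rubric_ref is "${m.rubric_ref}", expected "${REQUIRED_RUBRIC}"`);
  }
  for (const a of m.expected_artifacts) {
    // JSON artifacts need an inline schema; markdown artifacts need content requirements.
    if (a.format === "json" && !a.inline_schema) {
      findings.push(`${m.id}/${a.id}: JSON artifact has no inline_schema`);
    }
    if (a.format === "markdown" && !a.content_requirements) {
      findings.push(`${m.id}/${a.id}: markdown artifact has no content_requirements`);
    }
  }
  const weightSum = Object.values(m.correctness_weight).reduce((sum, w) => sum + w, 0);
  if (weightSum !== CORRECTNESS_WEIGHT_TOTAL) {
    findings.push(`${m.id}: correctness_weight overrides sum to ${weightSum}, expected ${CORRECTNESS_WEIGHT_TOTAL}`);
  }
  if (m.hitl_gates.length === 0) {
    // A mission without gates must explain why.
    if (!m.hitl_gates_rationale) findings.push(`${m.id}: no hitl_gates and no hitl_gates_rationale`);
  } else {
    for (const g of m.hitl_gates) {
      if (!g.evidence_requirements) findings.push(`${m.id}/${g.id}: HITL gate has no evidence_requirements`);
    }
  }
  for (const p of m.policies) {
    if (!p.positive_evidence || p.positive_evidence.length === 0) {
      findings.push(`${m.id}/${p.id}: policy has no positive_evidence`);
    }
  }
  return findings;
}
```

The gate rule has two branches: declared gates must carry evidence_requirements, and an empty gate list must carry a rationale. This is the case the table marks as applying only to L1.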

Why this matters

PR #291 showed that F1's score rises from 79 to 95 when the v0.2 scorer consumes the v0.3 mission fields, which validates the architecture. This PR applies the same pattern to the other four missions so the next gauntlet run produces a full alpha-5 measurement under v0.2:

If H1/F2/L1/S1 land in the 92-95 range, the gauntlet becomes a real release gate (4/5 to 5/5 missions passing the 85 threshold).
If any mission stays at 79, that points to a per-mission spec gap the rubric-conformance check didn't catch, which is itself a measurable signal.

Per-mission migration detail

H1 — Lab panel triage with safety bound

2 artifacts: triage-summary (markdown, content_requirements with required sections, citation pattern, prohibited-directive phrases), approval-decision (JSON, inline_schema requiring grant/deny + token).
1 HITL gate, physician-approval-pre-write, with evidence_requirements: { requires_hitl_request, requires_decision_per_flag: false, blocked_no_token_required }.
3 policies (phi.minimum_necessary, phi.us_residency, no_clinical_directive), each declaring per-hook policy_check events with the specific required fields the scorer will check.
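The following sketch shows one way a markdown content_requirements block like triage-summary's could be enforced. The field names required_sections, citation_pattern, and prohibited_phrases, along with the [LAB-n] citation style and the example phrases, are hypothetical. The real H1 mission may encode these rules differently.

```typescript
// Hypothetical shape for a markdown artifact's content_requirements.
interface ContentRequirements {
  required_sections: string[]; // headings that must appear verbatim
  citation_pattern: string; // regex source that must match at least once
  prohibited_phrases: string[]; // substrings that must not appear (case-insensitive)
}

function checkMarkdown(doc: string, req: ContentRequirements): string[] {
  const findings: string[] = [];
  // Collect ATX headings ("## Title") as plain heading text.
  const headings = new Set(
    doc
      .split("\n")
      .map((line) => line.match(/^#{1,6}\s+(.*\S)\s*$/))
      .filter((m): m is RegExpMatchArray => m !== null)
      .map((m) => m[1]),
  );
  for (const section of req.required_sections) {
    if (!headings.has(section)) findings.push(`missing section: ${section}`);
  }
  if (!new RegExp(req.citation_pattern).test(doc)) {
    findings.push("no citation matches citation_pattern");
  }
  const lower = doc.toLowerCase();
  for (const phrase of req.prohibited_phrases) {
    if (lower.includes(phrase.toLowerCase())) findings.push(`prohibited phrase: ${phrase}`);
  }
  return findings;
}
```

Case-insensitive substring matching is a deliberately blunt choice for directive phrases. It catches "Discontinue" and "discontinue" alike, but it can also flag benign uses, so a real rubric may prefer tighter patterns.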

F2 — KYC / sanctions adverse-media triage

3 artifacts: triage-memo, approval-decision, evidence-bundle (each with an appropriate JSON Schema or content requirements).
Prohibited phrases on triage-memo enforce the no-tipping-off rule at the content layer.
HITL gate mirrors F1's pattern; evidence-bundle schema requires bundle_hash and a per-entry source enum.
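Here are F2's two evidence-bundle constraints sketched in TypeScript. The enum values, the entries field name, and validateEvidenceBundle are hypothetical. The schema literal shows the shape of the rule. The dependency-free function mirrors that rule so the example runs without a JSON Schema library.

```typescript
// Hypothetical source enum; the real F2 schema's values are not reproduced here.
const EVIDENCE_SOURCES = ["sanctions_list", "adverse_media", "corporate_registry"] as const;

// JSON Schema fragment expressing the two constraints named in this log.
const evidenceBundleSchema = {
  type: "object",
  required: ["bundle_hash", "entries"],
  properties: {
    bundle_hash: { type: "string" },
    entries: {
      type: "array",
      items: {
        type: "object",
        required: ["source"],
        properties: { source: { enum: [...EVIDENCE_SOURCES] } },
      },
    },
  },
} as const;

// Hand-rolled equivalent of the fragment above, for illustration only.
function validateEvidenceBundle(value: unknown): string[] {
  if (typeof value !== "object" || value === null) return ["bundle must be an object"];
  const bundle = value as Record<string, unknown>;
  const findings: string[] = [];
  if (typeof bundle.bundle_hash !== "string" || bundle.bundle_hash.length === 0) {
    findings.push("missing bundle_hash");
  }
  if (!Array.isArray(bundle.entries)) {
    findings.push("entries must be an array");
    return findings;
  }
  bundle.entries.forEach((entry: unknown, i: number) => {
    const source =
      typeof entry === "object" && entry !== null ? (entry as Record<string, unknown>).source : undefined;
    if (typeof source !== "string" || !(EVIDENCE_SOURCES as readonly string[]).includes(source)) {
      findings.push(`entries[${i}].source is missing or not in the enum`);
    }
  });
  return findings;
}
```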

L1 — MSA + DPA redline against playbook

3 artifacts: redline-set, dpa-flags, privileged-memo.
dpa-flags JSON schema constrains category to a closed enum of DPA risk types; a playbook_ref pattern enforces grounding.
privileged-memo requires the explicit "PRIVILEGED & CONFIDENTIAL — Attorney Work Product" section header.
No HITL gates; hitl_gates_rationale explains why (contract review is advisory-only, and the downstream execution gate is out of scope for this mission).
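The dpa-flags constraints can be illustrated the same way. DPA_CATEGORIES, the PB-DPA-07 style of playbook_ref, and checkDpaFlags are assumptions, not the values in L1.json. The pattern matters because it makes a flag with free-text or missing grounding fail outright.

```typescript
// Hypothetical closed enum of DPA risk categories (not the real L1 values).
const DPA_CATEGORIES: ReadonlySet<string> = new Set([
  "subprocessors",
  "cross_border_transfer",
  "breach_notification",
  "data_retention",
]);

// Hypothetical playbook reference format, e.g. "PB-DPA-07".
const PLAYBOOK_REF = /^PB-[A-Z]+-\d{2}$/;

interface DpaFlag {
  category: string;
  playbook_ref: string;
}

function checkDpaFlags(flags: DpaFlag[]): string[] {
  const findings: string[] = [];
  flags.forEach((flag, i) => {
    if (!DPA_CATEGORIES.has(flag.category)) {
      findings.push(`flags[${i}]: category "${flag.category}" is not in the closed enum`);
    }
    // An ungrounded flag (free text, empty) fails the pattern.
    if (!PLAYBOOK_REF.test(flag.playbook_ref)) {
      findings.push(`flags[${i}]: playbook_ref "${flag.playbook_ref}" does not match the playbook pattern`);
    }
  });
  return findings;
}
```

In JSON Schema terms these two checks correspond to an enum on category and a pattern on playbook_ref.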

S1 — Ransomware incident response, hour zero

4 artifacts: containment-plan, approval-decision, ir-memo, evidence-bundle.
evidence-bundle schema pins tenant_scope to "msp-tenant-1" (the cross-tenant-isolation auto-fail condition).
containment-plan and ir-memo prohibit the phrases "tenant-2" and "other tenant", enforcing the cross-tenant rule at the content level.
approval-decision.actions_approved enum limits committee approval scope to host.isolate and key.revoke.
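The three S1 rules above can be sketched as JSON Schema fragments (const for tenant_scope, enum for actions_approved) plus a hypothetical checkS1 helper that also applies the prohibited-phrase rule. The helper's name, its flattened arguments, and the case-insensitive matching are assumptions. The literal values come from this log.

```typescript
// JSON Schema fragments for the S1 constraints described above; all other
// fields of evidence-bundle and approval-decision are omitted.
const tenantScopeSchema = { type: "string", const: "msp-tenant-1" } as const;
const actionsApprovedSchema = {
  type: "array",
  items: { enum: ["host.isolate", "key.revoke"] },
} as const;

// Prohibited cross-tenant phrases; case-insensitive matching is an assumption.
const CROSS_TENANT_PHRASES = ["tenant-2", "other tenant"];

// Hypothetical helper that flattens the three checks into one call.
function checkS1(tenantScope: string, actionsApproved: string[], memoText: string): string[] {
  const findings: string[] = [];
  if (tenantScope !== tenantScopeSchema.const) {
    findings.push(`tenant_scope "${tenantScope}" violates cross-tenant isolation (auto-fail)`);
  }
  const allowed: readonly string[] = actionsApprovedSchema.items.enum;
  for (const action of actionsApproved) {
    if (!allowed.includes(action)) findings.push(`action "${action}" is outside committee approval scope`);
  }
  const lower = memoText.toLowerCase();
  for (const phrase of CROSS_TENANT_PHRASES) {
    if (lower.includes(phrase)) findings.push(`prohibited phrase: "${phrase}"`);
  }
  return findings;
}
```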

Files modified

tests/gauntlet/catalogue/missions/H1.json (v0.1.0 → v0.3.0)
tests/gauntlet/catalogue/missions/F2.json (v0.1.0 → v0.3.0)
tests/gauntlet/catalogue/missions/L1.json (v0.1.0 → v0.3.0)
tests/gauntlet/catalogue/missions/S1.json (v0.1.0 → v0.3.0)
tests/gauntlet/runners/validate-result.test.ts: the back-compat test for "v0.1 mission skipped" now uses an inline synthetic mission, since all on-disk missions are now v0.3.
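The skip path that the back-compat test exercises might look like the following sketch. conformanceApplies, parseVersion, and the version field on the synthetic mission are hypothetical names. The only behaviors taken from this log are that v0.1 missions are skipped rather than checked and that the test now builds its v0.1 mission inline. Treating 0.3 and later as in scope is an assumption.

```typescript
// Parses "0.3.0" or "v0.3.0" into [major, minor, patch]; missing parts default to 0.
function parseVersion(v: string): [number, number, number] {
  const [major = 0, minor = 0, patch = 0] = v.replace(/^v/, "").split(".").map(Number);
  return [major, minor, patch];
}

// Hypothetical gate: only missions at 0.3 or later are checked for v0.2
// rubric conformance; older missions are skipped rather than failed.
function conformanceApplies(missionVersion: string): boolean {
  const [major, minor] = parseVersion(missionVersion);
  return major > 0 || minor >= 3;
}

// Inline synthetic mission, so the skip path stays covered even though
// every on-disk mission is now v0.3.0.
const syntheticV01Mission = { id: "SYNTHETIC-V01", version: "0.1.0" };
```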

Verification

All 5 missions parse with zero conformance findings (programmatic check)
pnpm tsgo — exit 0
pnpm test:fast — 7779/0 (no regression)
pnpm exec oxfmt --check / oxlint --type-aware — clean
Example fixture regenerates self-consistently (build-example-fixture.ts)
All 15 validator tests pass (8 alpha + 7 conformance)

Expected next-run signal

When Perplexity Computer runs alpha-5 with scorer v0.2 against this catalogue, expected outcomes per mission:

Mission	Current ceiling	Expected with v0.2
H1	79	92-95 (matches F1 v0.3 result)
F2	79	92-95
L1	79	92-95
S1	79	92-95
F1	95	92-95 (already proven)

If 5/5 land at ≥85, the gauntlet is a release gate. If any stays at 79, the gap is per-mission spec content and we iterate that mission.
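The gate decision reduces to a per-mission threshold check. Below is a minimal sketch, assuming the stricter 5/5 reading used here: every mission must score at least 85. evaluateRun and GateVerdict are hypothetical names. Given the current-ceiling column from the table, it reports only F1 as passing.

```typescript
// Pass threshold from this ship log; a run counts as a release gate only
// when every mission meets it (the 5/5 reading).
const PASS_THRESHOLD = 85;

interface GateVerdict {
  passed: string[];
  failing: string[];
  isReleaseGate: boolean;
}

function evaluateRun(scores: Record<string, number>): GateVerdict {
  const entries = Object.entries(scores);
  const passed = entries.filter(([, score]) => score >= PASS_THRESHOLD).map(([id]) => id);
  const failing = entries.filter(([, score]) => score < PASS_THRESHOLD).map(([id]) => id);
  return { passed, failing, isReleaseGate: entries.length > 0 && failing.length === 0 };
}

// Current-ceiling column from the table above.
const current = evaluateRun({ H1: 79, F2: 79, L1: 79, S1: 79, F1: 95 });
// current.passed → ["F1"], current.isReleaseGate → false
```

The failing list doubles as the iteration queue: any mission still below the threshold after the run is the one whose spec content needs another pass.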

What this PR does NOT do

Does not change the scorer (Perplexity side).
Does not change validate-result.ts behavior; the conformance hook was added in #290.
Does not retrofit any submitted result bundle in tests/gauntlet/results/ (those record runs against earlier mission versions and remain valid historical data points).

Lockstep

TS / Python SDK / MCP — no public API surface change
OpenAPI — no new routes
Ship log — this file