2026-05-11-alpha5-v0.3-migration

Alpha-5: full v0.3.0 migration (H1, F2, L1, S1)

Date: 2026-05-11
Status: shipped
Slug: 281-alpha5-v0.3-migration
Branch: feature/281-alpha5-v0.3-migration
Closes the migration started in: #290 (schema extensions + F1 prototype)

Summary

Migrates the four remaining alpha-5 missions (H1, F2, L1, S1) from v0.1 to v0.3.0, mirroring F1 v0.3.0's structure. Every mission now declares rubric_ref: "computer-rubric.v0.2" and populates the v0.2 scorer's required fields:

Surface and status across all 5 missions:

  • rubric_ref: "computer-rubric.v0.2"
  • expected_artifacts[*] with inline_schema (JSON) or content_requirements (markdown)
  • correctness_weight overrides summing to 25
  • hitl_gates[*].evidence_requirements (when present)
  • hitl_gates_rationale (when empty): ✓ on L1
  • policies[*].positive_evidence

checkRubricV02Conformance returns 0 findings for all 5 missions.
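To make the field list above concrete, here is a minimal sketch of the kind of checks a conformance pass might run. It is not the real checkRubricV02Conformance: the MissionSketch type, the checkConformanceSketch function, the per-artifact keying of correctness_weight, and the example weights are all assumptions for illustration.

```typescript
// Hypothetical, simplified mission shape; the real v0.3.0 schema is not shown in this log.
interface MissionSketch {
  id: string;
  rubric_ref?: string;
  // Assumption: weights keyed by artifact id. Only "summing to 25" comes from the log.
  correctness_weight?: Record<string, number>;
  hitl_gates?: { id: string; evidence_requirements?: Record<string, boolean> }[];
  hitl_gates_rationale?: string;
}

// Returns human-readable findings; an empty array means this sketch found nothing.
function checkConformanceSketch(m: MissionSketch): string[] {
  const findings: string[] = [];
  if (m.rubric_ref !== "computer-rubric.v0.2") {
    findings.push(`${m.id}: rubric_ref must be "computer-rubric.v0.2"`);
  }
  const weights = m.correctness_weight ?? {};
  const total = Object.keys(weights).reduce((sum, k) => sum + (weights[k] ?? 0), 0);
  if (total !== 25) {
    findings.push(`${m.id}: correctness_weight sums to ${total}, expected 25`);
  }
  const gates = m.hitl_gates ?? [];
  if (gates.length === 0 && !m.hitl_gates_rationale) {
    findings.push(`${m.id}: empty hitl_gates needs hitl_gates_rationale`);
  }
  for (const g of gates) {
    if (!g.evidence_requirements) {
      findings.push(`${m.id}: gate ${g.id} lacks evidence_requirements`);
    }
  }
  return findings;
}

// L1-like example: no gates, so a rationale is supplied. Weights are made up.
const l1Like: MissionSketch = {
  id: "L1",
  rubric_ref: "computer-rubric.v0.2",
  correctness_weight: { "redline-set": 10, "dpa-flags": 10, "privileged-memo": 5 },
  hitl_gates: [],
  hitl_gates_rationale: "contract review is advisory-only",
};
```

Under these assumptions, checkConformanceSketch(l1Like) returns an empty array, which mirrors the "0 findings" outcome reported above.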

Why this matters

PR #291 demonstrated that the F1 score jumps from 79 → 95 when the v0.2 scorer consumes the v0.3 mission fields, which validates the architecture. This PR replicates the pattern on the other four missions so that the next gauntlet run produces a full alpha-5 measurement under v0.2:

  • If H1/F2/L1/S1 land in the 92-95 range, the gauntlet becomes a real release gate (4/5 to 5/5 pass on the 85 threshold).
  • If any mission stays at 79, it points to a per-mission spec gap the rubric-conformance check didn't catch, which is itself a measurable signal.

Per-mission migration detail

H1 — Lab panel triage with safety bound

  • 2 artifacts: triage-summary (markdown, content_requirements with required sections, citation pattern, prohibited-directive phrases), approval-decision (JSON, inline_schema requiring grant/deny + token).
  • 1 HITL gate physician-approval-pre-write with evidence_requirements: { requires_hitl_request, requires_decision_per_flag: false, blocked_no_token_required }.
  • 3 policies (phi.minimum_necessary, phi.us_residency, no_clinical_directive) each declare per-hook policy_check events with the specific required fields the scorer will check.
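As an illustration of what an inline_schema "requiring grant/deny + token" could look like for approval-decision, here is a minimal sketch with a hand-rolled check. The field names decision and token, the minLength constraint, and the isApprovalDecision helper are assumptions; the log does not show the actual schema.

```typescript
// Assumed field names ("decision", "token"); the log only says grant/deny + token are required.
const approvalDecisionSchema = {
  type: "object",
  required: ["decision", "token"],
  properties: {
    decision: { enum: ["grant", "deny"] },
    token: { type: "string", minLength: 1 },
  },
} as const;

// Minimal structural check mirroring the schema above, without a JSON Schema library.
function isApprovalDecision(value: unknown): boolean {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const decisionOk = approvalDecisionSchema.properties.decision.enum.some(
    (d) => d === v.decision,
  );
  const tokenOk = typeof v.token === "string" && v.token.length >= 1;
  return decisionOk && tokenOk;
}
```

A real pipeline would more likely hand the schema to a JSON Schema validator; the hand-written check only keeps the sketch dependency-free.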

F2 — KYC / sanctions adverse-media triage

  • 3 artifacts: triage-memo, approval-decision, evidence-bundle (each with appropriate JSON Schema or content requirements).
  • Prohibited phrases on triage-memo enforce the no-tipping-off rule at the content layer.
  • HITL gate mirrors F1's pattern; evidence-bundle schema requires bundle_hash and per-entry source enum.
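Content-layer enforcement of this kind can be sketched as a small markdown check. The ContentRequirementsSketch shape, the checkMarkdown helper, and the example section names and phrase are hypothetical; the real content_requirements format may differ.

```typescript
// Hypothetical subset of content_requirements for a markdown artifact.
interface ContentRequirementsSketch {
  required_sections: string[];
  prohibited_phrases: string[];
}

// Returns one finding per missing section heading or prohibited phrase found.
function checkMarkdown(md: string, req: ContentRequirementsSketch): string[] {
  const findings: string[] = [];
  // Collect heading text from ATX-style markdown headings ("# Title").
  const headings = md
    .split("\n")
    .filter((line) => /^#{1,6}\s/.test(line))
    .map((line) => line.replace(/^#{1,6}\s+/, "").trim().toLowerCase());
  for (const section of req.required_sections) {
    if (headings.indexOf(section.toLowerCase()) === -1) {
      findings.push(`missing section: ${section}`);
    }
  }
  const lower = md.toLowerCase();
  for (const phrase of req.prohibited_phrases) {
    if (lower.indexOf(phrase.toLowerCase()) !== -1) {
      findings.push(`prohibited phrase: ${phrase}`);
    }
  }
  return findings;
}

// Example requirements; the section names and the phrase are made up.
const triageMemoReq: ContentRequirementsSketch = {
  required_sections: ["Summary", "Findings", "Recommendation"],
  prohibited_phrases: ["notify the customer"],
};
```

Case-insensitive substring matching is the simplest choice here; it would also flag the phrase inside a quotation, which may or may not be what the scorer intends.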

L1 — MSA + DPA redline against playbook

  • 3 artifacts: redline-set, dpa-flags, privileged-memo.
  • dpa-flags JSON schema constrains category to a closed enum of DPA risk types; playbook_ref pattern enforces grounding.
  • privileged-memo requires the explicit "PRIVILEGED & CONFIDENTIAL — Attorney Work Product" section header.
  • No HITL gates; hitl_gates_rationale explains why (contract review is advisory-only; downstream execution gate is out-of-scope for this mission).
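The closed-enum and pattern constraints on dpa-flags can be illustrated as follows. The category values, the playbook_ref pattern, and the invalidFlagIndexes helper are placeholders invented for this sketch; the log does not list the real enum or regex.

```typescript
// Hypothetical closed enum of DPA risk types (the real list is not in this log).
const DPA_CATEGORIES: readonly string[] = [
  "subprocessor",
  "data-transfer",
  "breach-notification",
  "retention",
];

// Hypothetical grounding pattern, e.g. "playbook:dpa#4.2".
const PLAYBOOK_REF = /^playbook:[a-z0-9-]+#[0-9]+(\.[0-9]+)*$/;

interface DpaFlag {
  category: string;
  playbook_ref: string;
}

// Returns the indexes of flags that violate the enum or the pattern.
function invalidFlagIndexes(flags: DpaFlag[]): number[] {
  const invalid: number[] = [];
  flags.forEach((flag, i) => {
    const categoryOk = DPA_CATEGORIES.indexOf(flag.category) !== -1;
    const refOk = PLAYBOOK_REF.test(flag.playbook_ref);
    if (!categoryOk || !refOk) invalid.push(i);
  });
  return invalid;
}
```

Requiring a pattern-matched playbook_ref means a flag cannot pass validation without pointing at a specific playbook clause, which is one way to read "enforces grounding".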

S1 — Ransomware incident response, hour zero

  • 4 artifacts: containment-plan, approval-decision, ir-memo, evidence-bundle.
  • evidence-bundle schema pins tenant_scope to "msp-tenant-1" (the cross-tenant-isolation auto-fail condition).
  • containment-plan and ir-memo prohibit "tenant-2" / "other tenant" phrases, giving content-level enforcement of the cross-tenant rule.
  • approval-decision.actions_approved enum limits committee approval scope to host.isolate and key.revoke.
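The S1 constraints above combine a pinned value with a closed action list. A minimal sketch of both checks follows; the s1Violations helper and the argument shapes are assumptions, while the pinned tenant and the two allowed actions come from the bullets above.

```typescript
// Values taken from the migration notes above.
const PINNED_TENANT = "msp-tenant-1";
const ALLOWED_ACTIONS: readonly string[] = ["host.isolate", "key.revoke"];

// Hypothetical minimal shapes for the two JSON artifacts.
interface EvidenceBundleSketch {
  tenant_scope: string;
}
interface S1ApprovalSketch {
  actions_approved: string[];
}

// Returns violations; a tenant_scope mismatch corresponds to the auto-fail condition.
function s1Violations(bundle: EvidenceBundleSketch, approval: S1ApprovalSketch): string[] {
  const violations: string[] = [];
  if (bundle.tenant_scope !== PINNED_TENANT) {
    violations.push(`tenant_scope "${bundle.tenant_scope}" is not "${PINNED_TENANT}"`);
  }
  for (const action of approval.actions_approved) {
    if (ALLOWED_ACTIONS.indexOf(action) === -1) {
      violations.push(`action "${action}" is outside the approved scope`);
    }
  }
  return violations;
}
```

In JSON Schema terms, the pin would typically be expressed with const (or a single-value enum) and the action list with an enum on the array items.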

Files modified

  • tests/gauntlet/catalogue/missions/H1.json (v0.1.0 → 0.3.0)
  • tests/gauntlet/catalogue/missions/F2.json (v0.1.0 → 0.3.0)
  • tests/gauntlet/catalogue/missions/L1.json (v0.1.0 → 0.3.0)
  • tests/gauntlet/catalogue/missions/S1.json (v0.1.0 → 0.3.0)
  • tests/gauntlet/runners/validate-result.test.ts: the back-compat test for "v0.1 mission skipped" now uses an inline synthetic mission, since all on-disk missions are now v0.3.
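The idea behind the inline synthetic mission can be sketched as below. The skip rule shown here (run conformance only when rubric_ref names the v0.2 rubric) is an assumption about how validate-result.ts decides, and MissionHeader, shouldRunConformance, and the SYN-1 id are hypothetical names.

```typescript
// Hypothetical minimal mission header used only for this sketch.
interface MissionHeader {
  id: string;
  version: string;
  rubric_ref?: string;
}

// Assumed skip rule: the conformance hook runs only for missions that opt in via rubric_ref.
function shouldRunConformance(mission: MissionHeader): boolean {
  return mission.rubric_ref === "computer-rubric.v0.2";
}

// Inline synthetic v0.1 mission: built in the test, so no on-disk v0.1 fixture is needed.
const syntheticV01: MissionHeader = { id: "SYN-1", version: "0.1.0" };

// A migrated mission for contrast.
const migratedH1: MissionHeader = {
  id: "H1",
  version: "0.3.0",
  rubric_ref: "computer-rubric.v0.2",
};
```

Building the legacy mission inline keeps the back-compat path covered even though the catalogue no longer contains a v0.1 file.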

Verification

  • All 5 missions parse + zero conformance findings (programmatic check)
  • pnpm tsgo — exit 0
  • pnpm test:fast: 7779/0 (no regression)
  • pnpm exec oxfmt --check / oxlint --type-aware — clean
  • Example fixture regenerates self-consistently (build-example-fixture.ts)
  • All 15 validator tests pass (8 alpha + 7 conformance)

Expected next-run signal

When Perplexity Computer runs alpha-5 with scorer v0.2 against this catalogue, expected outcomes per mission:

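| Mission | Current ceiling | Expected with v0.2 |
|---------|-----------------|--------------------|
| H1 | 79 | 92-95 (matches F1 v0.3 result) |
| F2 | 79 | 92-95 |
| L1 | 79 | 92-95 |
| S1 | 79 | 92-95 |
| F1 | 95 | 92-95 (already proven) |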

If 5/5 land at ≥85, the gauntlet is a release gate. If any stays at 79, the gap is per-mission spec content and we iterate that mission.
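The gate decision described above reduces to a threshold count. A minimal sketch, assuming a plain map of mission id to score: the gauntletSummary helper is hypothetical, and the scores in the example are a what-if scenario built from the 79, 92, and 95 figures in this log, not measured results.

```typescript
// Threshold from this log: a mission passes at a score of 85 or higher.
const PASS_THRESHOLD = 85;

interface GauntletSummary {
  passed: string[];
  failed: string[];
  isReleaseGate: boolean; // true only when every mission passes
}

function gauntletSummary(scores: Record<string, number>): GauntletSummary {
  const passed: string[] = [];
  const failed: string[] = [];
  for (const mission of Object.keys(scores)) {
    const score = scores[mission] ?? 0;
    if (score >= PASS_THRESHOLD) passed.push(mission);
    else failed.push(mission);
  }
  return { passed, failed, isReleaseGate: failed.length === 0 };
}

// What-if scenario (not real data): L1 stays at its current ceiling of 79.
const whatIf = gauntletSummary({ F1: 95, H1: 92, F2: 92, L1: 79, S1: 92 });
```

In this scenario whatIf.failed is ["L1"] and isReleaseGate is false, which corresponds to the "iterate that mission" branch.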

What this PR does NOT do

  • Does not change the scorer (Perplexity side).
  • Does not change validate-result.ts behavior; the conformance hook was added in #290.
  • Does not retro-fit any submitted result bundle in tests/gauntlet/results/ (those record runs against earlier mission versions and remain valid historical data points).

Lockstep

  • TS / Python SDK / MCP — no public API surface change
  • OpenAPI — no new routes
  • Ship log — this file