Model Self-Defense — security-aware system instructions per trust level

2026-03-23

security

What We Built

Security-aware system instruction prepending with 4 trust levels. Models become aware of attack patterns and flag suspicious input via structured markers.

Lockstep Checklist

  • [x] API: Self-defense module + response parsing
  • [x] Tests: 14 tests
  • [x] Docs: Ship log