Model Self-Defense — security-aware system instructions per trust level

2026-03-23

security

What We Built

Security-aware system instruction prepending with 4 trust levels. Models become aware of attack patterns and flag suspicious input via structured markers.

Lockstep Checklist

[x] API: Self-defense module + response parsing
[x] Tests: 14 tests
[x] Docs: Ship log