Model Self-Defense — security-aware system instructions per trust level
2026-03-23
What We Built
Security-aware system instruction prepending with 4 trust levels. Models become aware of attack patterns and flag suspicious input via structured markers.
Lockstep Checklist
- [x] API: Self-defense module + response parsing
- [x] Tests: 14 tests
- [x] Docs: Ship log