Intent Detection Engine — 4 adversarial intent types with confidence scoring
2026-03-23
What We Built
Pattern-based adversarial intent detection with 4 intent types, 29 detection patterns, confidence scoring, and configurable sensitivity levels.
Intent types: manipulation (bypass instructions), exfiltration (extract system prompts), privilege_escalation (gain unauthorized access), social_engineering (urgency, authority).
4 sensitivity levels: low (0.8 threshold), medium (0.6), high (0.4), paranoid (0.2).
Lockstep Checklist
- [x] API: Intent detection module + risk score
- [x] Tests: 16 tests for all intent types, risk scoring, performance, sensitivity
- [x] Docs: Ship log