Intent Detection Engine — 4 adversarial intent types with confidence scoring

2026-03-23

security

What We Built

Pattern-based adversarial intent detection with 4 intent types, 29 detection patterns, confidence scoring, and configurable sensitivity levels.

Intent types: manipulation (bypass instructions), exfiltration (extract system prompts), privilege_escalation (gain unauthorized access), social_engineering (urgency, authority).

4 sensitivity levels: low (0.8 threshold), medium (0.6), high (0.4), paranoid (0.2).

Lockstep Checklist

  • [x] API: Intent detection module + risk score
  • [x] Tests: 16 tests for all intent types, risk scoring, performance, sensitivity
  • [x] Docs: Ship log