Outcome feedback: every code compile/test/acceptance feeds Thompson sampling
2026-03-24
What We Built
POST /v1/feedback/:requestId — the outcome feedback endpoint that closes the Thompson sampling learning loop. When Brainstorm CLI reports whether the model's output actually worked (code compiled, tests passed, user accepted changes), BR converts these signals into a 0-1 reward and feeds it into ModelPerformanceTracker.record().
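As an illustration of the endpoint shape, here is a minimal sketch of how a client might assemble the request. Only the `POST /v1/feedback/:requestId` path comes from this entry. The base URL, the `signals` wrapper key, the JSON content type, and the `buildFeedbackRequest` helper are assumptions made for the example, not the documented payload.

```typescript
// Hypothetical helper: builds (but does not send) a feedback request.
// Only the /v1/feedback/:requestId path is taken from the entry above;
// the body layout and header are illustrative assumptions.
interface FeedbackRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildFeedbackRequest(
  baseUrl: string,
  requestId: string,
  signals: Record<string, boolean>,
): FeedbackRequest {
  // Strip a trailing slash so the path is not doubled up.
  const root = baseUrl.replace(/\/+$/, "");
  return {
    url: `${root}/v1/feedback/${encodeURIComponent(requestId)}`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ signals }), // assumed wrapper key
  };
}

const exampleRequest = buildFeedbackRequest("https://example.invalid/", "req_123", {
  code_compiled: true,
  tests_passed: false,
});
```

The resulting object can be handed to any HTTP client; authentication is left out because the entry does not describe it.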
Five signal dimensions are supported: code_compiled, tests_passed, user_accepted, tool_calls_correct, and approach_worked. Together they determine the reward: 1.0 when every reported signal succeeds, a weighted average when results are mixed, and 0.0 on failure (with an epsilon applied for partial signals).
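The reward mapping can be sketched roughly as follows. The five signal names come from this entry. The equal weights, the `EPSILON` value, and the reading of "epsilon for partial signals" (an all-failed report that covers only some dimensions scores epsilon rather than 0.0) are assumptions, and `computeReward` is a hypothetical name rather than the actual implementation.

```typescript
// Signal names are from the entry; weights and epsilon are placeholders.
const SIGNAL_NAMES = [
  "code_compiled",
  "tests_passed",
  "user_accepted",
  "tool_calls_correct",
  "approach_worked",
] as const;

type Signal = (typeof SIGNAL_NAMES)[number];
type Signals = Partial<Record<Signal, boolean>>;

const WEIGHTS: Record<Signal, number> = {
  code_compiled: 1,
  tests_passed: 1,
  user_accepted: 1,
  tool_calls_correct: 1,
  approach_worked: 1,
};
const EPSILON = 0.05; // assumed value

function computeReward(signals: Signals): number {
  let reported = 0;
  let passed = 0;
  let weightTotal = 0;
  let weightPassed = 0;
  for (const name of SIGNAL_NAMES) {
    const value = signals[name];
    if (value === undefined) continue; // dimension not reported
    reported += 1;
    weightTotal += WEIGHTS[name];
    if (value) {
      passed += 1;
      weightPassed += WEIGHTS[name];
    }
  }
  if (reported === 0) throw new Error("at least one signal is required");
  if (passed === reported) return 1.0; // full success
  if (passed === 0) {
    // Full failure: 0.0 only when all five dimensions were reported,
    // otherwise epsilon (one possible reading of the entry).
    return reported === SIGNAL_NAMES.length ? 0.0 : EPSILON;
  }
  return weightPassed / weightTotal; // mixed result: weighted average
}
```

For example, a report with `code_compiled: true` and `tests_passed: false` gives a reward of 0.5 under these equal weights.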
Why It Matters
This is THE most important learning signal. Without feedback, Thompson sampling only sees latency and errors. With feedback, it learns which models actually produce code that compiles, passes tests, and gets accepted by users. Every Brainstorm user's success or failure improves routing for everyone.
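To make the learning loop concrete, here is a minimal sketch of Thompson sampling with a Beta posterior per model, updated with a fractional 0-1 reward. This is a textbook formulation, not a description of how ModelPerformanceTracker works internally. The `Posterior` type, `recordReward`, `pickModel`, and the Beta(1, 1) prior are all assumptions for illustration.

```typescript
interface Posterior {
  alpha: number; // prior 1 + accumulated reward
  beta: number; // prior 1 + accumulated (1 - reward)
}

// Fractional Beta update: a reward r in [0, 1] adds r to alpha and 1 - r to beta.
function recordReward(p: Posterior, reward: number): Posterior {
  if (reward < 0 || reward > 1) throw new Error("reward must be in [0, 1]");
  return { alpha: p.alpha + reward, beta: p.beta + (1 - reward) };
}

// Box-Muller transform; 1 - Math.random() keeps the log argument in (0, 1].
function standardNormal(): number {
  const u1 = 1 - Math.random();
  const u2 = Math.random();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Marsaglia-Tsang gamma sampler, valid for shape >= 1. That is enough here
// because a Beta(1, 1) prior keeps both parameters at or above 1.
function sampleGamma(shape: number): number {
  if (shape < 1) throw new Error("shape must be >= 1");
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    const x = standardNormal();
    const t = 1 + c * x;
    if (t <= 0) continue;
    const v = t * t * t;
    const u = 1 - Math.random();
    if (Math.log(u) < 0.5 * x * x + d * (1 - v + Math.log(v))) return d * v;
  }
}

function sampleBeta(p: Posterior): number {
  const x = sampleGamma(p.alpha);
  const y = sampleGamma(p.beta);
  return x / (x + y);
}

// Draw one sample per model and route to the highest draw.
function pickModel(posteriors: Record<string, Posterior>): string {
  let best = "";
  let bestDraw = -1;
  for (const name of Object.keys(posteriors)) {
    const draw = sampleBeta(posteriors[name]);
    if (draw > bestDraw) {
      bestDraw = draw;
      best = name;
    }
  }
  return best;
}
```

With only latency and error data, the posteriors cannot separate a model that responds quickly with broken code from one that responds quickly with working code. Outcome rewards shift alpha and beta so the sampled draws begin to favor the latter.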
Lockstep Checklist
- [x] API Routes: src/api/routes/feedback.ts (POST /v1/feedback/:requestId)
- [x] TS SDK: packages/sdk-ts/src/resources/feedback.ts (feedback.report())
- [x] Python SDK: packages/sdk-py/src/brainstormrouter/resources/feedback.py (feedback.report())
- [x] MCP Schemas: br_feedback_report tool added to manifest and server, dep wired