Outcome feedback: every code compile/test/acceptance feeds Thompson sampling

2026-03-24

performance-tracker model-bandit thompson-sampling

What We Built

POST /v1/feedback/:requestId is the outcome feedback endpoint that closes the Thompson sampling learning loop. When Brainstorm CLI reports whether the model's output actually worked (code compiled, tests passed, user accepted changes), BR converts these signals into a reward between 0 and 1 and passes it to ModelPerformanceTracker.record().

Five signal dimensions: code_compiled, tests_passed, user_accepted, tool_calls_correct, approach_worked. Each one feeds the reward computation: if every reported signal succeeded the reward is 1.0, if every one failed it is 0.0, and a mixed outcome is scored as a weighted average of the signals (with epsilon for partial signals).
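The reward mapping described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/api/routes/feedback.ts: the OutcomeSignals shape, the equal placeholder weights, and the computeReward name are all assumptions, and the epsilon adjustment is left out because its exact behavior is not described here.

```typescript
// Hypothetical signal shape: each dimension is optional.
// true = it worked, false = it failed, undefined = not reported.
type OutcomeSignals = {
  code_compiled?: boolean;
  tests_passed?: boolean;
  user_accepted?: boolean;
  tool_calls_correct?: boolean;
  approach_worked?: boolean;
};

// Placeholder weights (an assumption; the real weights are not given).
const WEIGHTS: Record<keyof OutcomeSignals, number> = {
  code_compiled: 1,
  tests_passed: 1,
  user_accepted: 1,
  tool_calls_correct: 1,
  approach_worked: 1,
};

// Weighted average over the signals that were actually reported.
// All successes give 1.0, all failures give 0.0, mixed outcomes land in between.
// Returns null when no signal was reported, so the caller can skip recording.
function computeReward(signals: OutcomeSignals): number | null {
  let weighted = 0;
  let total = 0;
  for (const key of Object.keys(WEIGHTS) as (keyof OutcomeSignals)[]) {
    const value = signals[key];
    if (value === undefined) continue;
    total += WEIGHTS[key];
    if (value) weighted += WEIGHTS[key];
  }
  return total === 0 ? null : weighted / total;
}
```

With equal weights, a report where the code compiled but the tests failed scores 0.5. Averaging only over reported signals means a client that sends a single dimension still produces a usable reward.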

Why It Matters

This is the most important learning signal. Without feedback, Thompson sampling only sees latency and errors. With feedback, it learns which models actually produce code that compiles, passes tests, and gets accepted by users. Every Brainstorm user's successes and failures improve routing for everyone.
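To make the learning loop concrete, here is a small sketch of how a reward between 0 and 1 could update a Thompson sampling bandit. It assumes one Beta posterior per model with a fractional update (alpha grows by the reward, beta by one minus the reward); whether ModelPerformanceTracker works this way is not stated here, and recordReward, pickModel, and the model names are hypothetical.

```typescript
// One Beta(alpha, beta) posterior per model, starting from a uniform prior.
type Arm = { alpha: number; beta: number };
const arms = new Map<string, Arm>();

function getArm(model: string): Arm {
  let arm = arms.get(model);
  if (!arm) {
    arm = { alpha: 1, beta: 1 };
    arms.set(model, arm);
  }
  return arm;
}

// Fractional update: a reward of 0.5 counts as half a success, half a failure.
function recordReward(model: string, reward: number): void {
  const r = Math.min(1, Math.max(0, reward));
  const arm = getArm(model);
  arm.alpha += r;
  arm.beta += 1 - r;
}

// Standard normal draw via the Box-Muller transform.
function sampleNormal(): number {
  const u = 1 - Math.random(); // in (0, 1], avoids log(0)
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Gamma(shape, 1) draw via Marsaglia-Tsang; valid for shape >= 1,
// which always holds here because alpha and beta start at 1 and only grow.
function sampleGamma(shape: number): number {
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  while (true) {
    const x = sampleNormal();
    const t = 1 + c * x;
    const v = t * t * t;
    if (v <= 0) continue;
    const u = 1 - Math.random();
    if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) return d * v;
  }
}

function sampleBeta(alpha: number, beta: number): number {
  const x = sampleGamma(alpha);
  const y = sampleGamma(beta);
  return x / (x + y);
}

// Thompson sampling: draw once from each posterior and route to the highest draw.
function pickModel(candidates: string[]): string {
  let best = candidates[0];
  let bestDraw = -1;
  for (const model of candidates) {
    const { alpha, beta } = getArm(model);
    const draw = sampleBeta(alpha, beta);
    if (draw > bestDraw) {
      bestDraw = draw;
      best = model;
    }
  }
  return best;
}

// Usage with hypothetical model names.
recordReward("model-a", 1.0); // compiled, tests passed, accepted
recordReward("model-b", 0.0); // nothing worked
const choice = pickModel(["model-a", "model-b"]);
```

Because selection is a random draw from each posterior, a model with poor feedback is still tried occasionally, which lets it recover if its outcomes improve.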

Lockstep Checklist

[x] API Routes: src/api/routes/feedback.ts — POST /v1/feedback/:requestId
[x] TS SDK: packages/sdk-ts/src/resources/feedback.ts — feedback.report()
[x] Python SDK: packages/sdk-py/src/brainstormrouter/resources/feedback.py — feedback.report()
[x] MCP Schemas: br_feedback_report tool added to manifest and server, dependency wired