ECS deploy workflow now pins all 8 provider API keys (closes silent-disable class)

2026-05-08

infra · ecs · ci-cd · ingestors

What We Built

R23 closure verification surfaced a deploy-pipeline gap: PR #236 merged the Groq catalog ingestor code, the ECS deploy completed successfully, and /v1/intelligence/status reported only 7 ingestors registered. CW Logs explained why:

[router/model-router-init] Groq catalog ingestor not registered: GROQ_API_KEY not set

The ingestor's registration is gated on process.env.GROQ_API_KEY being present at boot. But GROQ_API_KEY was never wired into the ECS task definition's secrets array — the 7 prior provider keys (anthropic / openai / deepseek / google / perplexity / xai / moonshot) had been added to ECS task definitions manually, BEFORE the workflow's ensure_secret list became the deploy-time source of truth. New providers fell into a silent-disable hole.

Two coupled fixes:

  1. Manual rev 753: added GROQ_API_KEY (pointing to the new brainstorm-router/production/groq-api-key secret) to the active task def. Service updated to 753; /v1/intelligence/status now reports 8/8 ingestors.

  2. Workflow hardening (this PR): pinned all 8 provider API keys in the ecs-deploy.yml ensure_secret list. Future deploys will idempotently re-add any key that drifts out of the active task def, making the workflow authoritative and preventing the silent-disable class entirely.

Why It Matters

The silent-disable failure mode is the worst kind of catalog drift: the ingestor scheduler dutifully registers 7 of 8 ingestors, the service logs "not registered" once at boot, and unless an operator checks /v1/intelligence/status against the expected provider count, the gap is invisible. Pinning the keys in ensure_secret means a future operator could roll back the task def to revision 1 and the next CI deploy would re-add every provider key.

The 7 prior provider keys carried the same latent risk: any of them could have been silently disabled if a manual ECS rollback landed on a task def revision that lacked its secret entry. The fix preempts that risk too.
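The manual check described above (comparing what is wired against what is expected) can be sketched as a small guard. This is an illustrative helper, not part of the actual workflow: the function name missing_provider_keys and the EXPECTED_KEYS variable are hypothetical, and the key names are taken from the ensure_secret calls in this entry.

```shell
#!/usr/bin/env bash
# Provider key names, as listed in the ensure_secret calls in this entry.
EXPECTED_KEYS="ANTHROPIC_API_KEY OPENAI_API_KEY DEEPSEEK_API_KEY GOOGLE_API_KEY PERPLEXITY_API_KEY XAI_API_KEY MOONSHOT_API_KEY GROQ_API_KEY"

# missing_provider_keys NAME...   (hypothetical helper)
# Takes the secret names present in a task definition as arguments.
# Prints every expected provider key that is absent, one per line.
# Returns 0 when nothing is missing, 1 otherwise.
missing_provider_keys() {
  local present=" $* " key missing=0
  for key in $EXPECTED_KEYS; do
    case "$present" in
      *" $key "*) ;;                   # key is wired, nothing to report
      *) echo "$key"; missing=1 ;;     # key would be silently disabled
    esac
  done
  return "$missing"
}
```

One way to feed it, assuming the AWS CLI is configured and the secrets live on the first container, is to pass the output of `aws ecs describe-task-definition --task-definition <task-def> --query 'taskDefinition.containerDefinitions[0].secrets[].name' --output text` as the arguments. A non-zero return could then fail a post-deploy step loudly instead of leaving the gap to a single boot-time log line.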

How It Works

SECRET_FILTERS="$SECRET_FILTERS | $(ensure_secret ANTHROPIC_API_KEY arn:...)"
SECRET_FILTERS="$SECRET_FILTERS | $(ensure_secret OPENAI_API_KEY arn:...)"
SECRET_FILTERS="$SECRET_FILTERS | $(ensure_secret DEEPSEEK_API_KEY arn:...)"
SECRET_FILTERS="$SECRET_FILTERS | $(ensure_secret GOOGLE_API_KEY arn:...)"
SECRET_FILTERS="$SECRET_FILTERS | $(ensure_secret PERPLEXITY_API_KEY arn:...)"
SECRET_FILTERS="$SECRET_FILTERS | $(ensure_secret XAI_API_KEY arn:...)"
SECRET_FILTERS="$SECRET_FILTERS | $(ensure_secret MOONSHOT_API_KEY arn:...)"
SECRET_FILTERS="$SECRET_FILTERS | $(ensure_secret GROQ_API_KEY arn:...)"

ensure_secret is idempotent — it only appends an entry if the secret name doesn't already exist in the container's secrets array. So this is a no-op on a healthy task def and a self-heal on a drifted one.
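For readers unfamiliar with the pattern, here is a minimal sketch of how a helper like ensure_secret could emit an idempotent jq filter fragment. This is an assumption about its shape, not the actual helper in ecs-deploy.yml: the function body, the apply_secret_filters wrapper, and the choice of containerDefinitions[0] are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical re-implementation; the real ensure_secret may differ.
# ensure_secret NAME ARN
# Prints a jq filter that appends {name, valueFrom} to the first container's
# secrets array only when no entry with that name is already present.
# Assumes NAME and ARN contain no double quotes or backslashes.
ensure_secret() {
  local name="$1" arn="$2"
  printf 'if any(.containerDefinitions[0].secrets[]?; .name == "%s") then . else .containerDefinitions[0].secrets += [{"name": "%s", "valueFrom": "%s"}] end' \
    "$name" "$name" "$arn"
}

# apply_secret_filters FILTER_CHAIN   (hypothetical wrapper)
# Reads task definition JSON on stdin and writes the patched JSON to stdout.
apply_secret_filters() {
  jq -c "$1"
}

# Usage (requires jq; the ARN below is a placeholder, not a real secret):
#   SECRET_FILTERS="."
#   SECRET_FILTERS="$SECRET_FILTERS | $(ensure_secret GROQ_API_KEY arn:example:groq)"
#   apply_secret_filters "$SECRET_FILTERS" < taskdef.json
```

Because each fragment passes its input through unchanged when the name already exists, running the chain twice yields the same JSON as running it once, which matches the no-op and self-heal behavior described above.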

The Numbers

  • ensure_secret entries: 6 → 14 (8 new provider keys)
  • Production /v1/intelligence/status ingestor count: 7 → 8 (verified post-rev-753 deploy)

  • 0 new code surface — workflow YAML only

Lockstep Checklist

  • [x] API Routes: unchanged
  • [x] TS SDK: unchanged
  • [x] Python SDK: unchanged
  • [x] MCP Schemas: unchanged
  • [x] Master Record: unchanged