Audio API — TTS + transcription with provider routing and cost tracking

2026-03-22

apirouter

What We Built

Two OpenAI-compatible audio endpoints:

  • POST /v1/audio/speech — Text-to-speech. Supports the tts-1 and tts-1-hd models, 6 voices, 6 output formats, and configurable speed. Returns binary audio with a Content-Type that matches the requested format.
  • POST /v1/audio/transcriptions — Audio transcription (Whisper-compatible). Accepts a multipart file upload of up to 25 MB and returns one of several output formats (json/text/srt/vtt).
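
The request constraints above can be sketched as a pair of validators. This is a minimal illustration, not the code in src/api/routes/audio.ts. The voice names, format names, Content-Type mapping, and 0.25 to 4.0 speed range are assumptions taken from OpenAI's published audio API, since the entry only gives counts. The function names and the reading of 25 MB as 25 * 1024 * 1024 bytes are also hypothetical.

```typescript
// Assumed values: the entry says "6 voices" and "6 output formats" without
// listing them; these are the ones OpenAI documents for tts-1 / tts-1-hd.
const TTS_MODELS: readonly string[] = ["tts-1", "tts-1-hd"];
const VOICES: readonly string[] = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"];

// Assumed mapping from response_format to the Content-Type header.
const CONTENT_TYPES: { [format: string]: string } = {
  mp3: "audio/mpeg",
  opus: "audio/opus",
  aac: "audio/aac",
  flac: "audio/flac",
  wav: "audio/wav",
  pcm: "audio/pcm",
};

const TRANSCRIPTION_FORMATS: readonly string[] = ["json", "text", "srt", "vtt"];
const MAX_UPLOAD_BYTES = 25 * 1024 * 1024; // assumes binary megabytes

interface SpeechRequest {
  model: string;
  input: string;
  voice: string;
  response_format?: string; // defaults to "mp3" in this sketch
  speed?: number; // defaults to 1.0 in this sketch
}

// Returns the Content-Type for a valid speech request, or throws with the reason.
function contentTypeForSpeech(req: SpeechRequest): string {
  if (TTS_MODELS.indexOf(req.model) === -1) {
    throw new Error("unsupported model: " + req.model);
  }
  if (VOICES.indexOf(req.voice) === -1) {
    throw new Error("unsupported voice: " + req.voice);
  }
  const format = req.response_format === undefined ? "mp3" : req.response_format;
  if (!Object.prototype.hasOwnProperty.call(CONTENT_TYPES, format)) {
    throw new Error("unsupported response_format: " + format);
  }
  const speed = req.speed === undefined ? 1.0 : req.speed;
  if (!(speed >= 0.25 && speed <= 4.0)) {
    throw new Error("speed must be between 0.25 and 4.0");
  }
  return CONTENT_TYPES[format];
}

// Throws if an upload is over the size limit or asks for an unknown format.
function validateTranscriptionUpload(sizeBytes: number, responseFormat: string = "json"): void {
  if (sizeBytes > MAX_UPLOAD_BYTES) {
    throw new Error("file exceeds the 25 MB limit");
  }
  if (TRANSCRIPTION_FORMATS.indexOf(responseFormat) === -1) {
    throw new Error("unsupported response_format: " + responseFormat);
  }
}
```

Rejecting a bad request before it reaches a provider keeps invalid calls from being routed or billed, which is one reason to validate at the route rather than relay the provider's error.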

Both route through ModelRouter with Guardian cost tracking (per-character for TTS, per-minute for transcription), BYOK support, and budget enforcement.
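
The two billing units can be illustrated with a small cost estimator. This is a sketch under stated assumptions rather than Guardian's actual logic: the function names, the rate constants (placeholders, not the project's configured prices), the use of integer micro-USD, and the rounding rule are all hypothetical.

```typescript
// Placeholder rates in micro-USD (1 USD = 1,000,000 micro-USD).
// Integer units avoid floating-point drift when costs are summed.
const TTS_MICRO_USD_PER_CHAR: { [model: string]: number } = {
  "tts-1": 15,
  "tts-1-hd": 30,
};
const TRANSCRIPTION_MICRO_USD_PER_MINUTE = 6000;

// Per-character billing for TTS.
// Note: String.length counts UTF-16 code units, so characters outside the
// Basic Multilingual Plane count as two in this sketch.
function estimateTtsCost(model: string, input: string): number {
  const rate = TTS_MICRO_USD_PER_CHAR[model];
  if (rate === undefined) {
    throw new Error("no rate configured for model: " + model);
  }
  return input.length * rate;
}

// Per-minute billing for transcription, prorated by the second and rounded
// up to a whole micro-USD (assumed rounding rule).
function estimateTranscriptionCost(durationSeconds: number): number {
  if (durationSeconds < 0) {
    throw new Error("duration must be non-negative");
  }
  return Math.ceil((durationSeconds * TRANSCRIPTION_MICRO_USD_PER_MINUTE) / 60);
}

// Budget enforcement: true if charging `cost` would push spend past the limit.
function wouldExceedBudget(spent: number, cost: number, limit: number): boolean {
  return spent + cost > limit;
}
```

With these placeholder rates, a 5-character tts-1 request costs 75 micro-USD and a 90-second transcription costs 9,000 micro-USD. Checking wouldExceedBudget before dispatching to the provider is one way to enforce a budget without first incurring the charge.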

Lockstep Checklist

  • [x] API Routes: src/api/routes/audio.ts + mount in server.ts
  • [x] TS SDK: audio.speech() + audio.transcriptions()
  • [x] Python SDK: audio.speech() + audio.transcriptions() sync + async
  • [x] MCP Schemas: N/A — not agent-facing
  • [x] Master Record: N/A — auto-updated by llms-full.txt