Audio API — TTS + transcription with provider routing and cost tracking
2026-03-22
What We Built
Two OpenAI-compatible audio endpoints:
- POST /v1/audio/speech: Text-to-speech. Supports tts-1/tts-1-hd models, 6 voices, 6 output formats, configurable speed. Returns binary audio with proper Content-Type.
- POST /v1/audio/transcriptions: Audio transcription (Whisper-compatible). Multipart file upload, 25 MB limit, multiple output formats (json/text/srt/vtt).
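The request constraints for the two endpoints can be sketched as plain validation functions. This is a minimal illustration, not the code in src/api/routes/audio.ts: the voice names, format names, and speed range are taken from OpenAI's public audio API and are only assumed to match here, the 25 MB limit is assumed to mean 25 * 1024 * 1024 bytes, and the names `validateSpeechRequest` and `validateTranscriptionUpload` are hypothetical.

```typescript
// Assumed to mirror OpenAI's audio API; this entry only confirms the counts
// (2 models, 6 voices, 6 formats) and the transcription formats.
const TTS_MODELS: readonly string[] = ["tts-1", "tts-1-hd"];
const TTS_VOICES: readonly string[] = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"];
const TTS_FORMATS: readonly string[] = ["mp3", "opus", "aac", "flac", "wav", "pcm"];
const TRANSCRIPTION_FORMATS: readonly string[] = ["json", "text", "srt", "vtt"];

// Assumption: "25 MB" is interpreted as 25 * 1024 * 1024 bytes.
const MAX_UPLOAD_BYTES = 25 * 1024 * 1024;

// Hypothetical request shape, following OpenAI-style field names.
interface SpeechRequest {
  model: string;
  input: string;
  voice: string;
  response_format?: string;
  speed?: number;
}

// Returns a list of human-readable problems; an empty list means the request is valid.
function validateSpeechRequest(req: SpeechRequest): string[] {
  const errors: string[] = [];
  if (TTS_MODELS.indexOf(req.model) === -1) {
    errors.push(`unsupported model: ${req.model}`);
  }
  if (req.input.length === 0) {
    errors.push("input must not be empty");
  }
  if (TTS_VOICES.indexOf(req.voice) === -1) {
    errors.push(`unsupported voice: ${req.voice}`);
  }
  if (req.response_format !== undefined && TTS_FORMATS.indexOf(req.response_format) === -1) {
    errors.push(`unsupported response_format: ${req.response_format}`);
  }
  // Assumed speed range (0.25 to 4.0), as in OpenAI's API.
  if (req.speed !== undefined && (req.speed < 0.25 || req.speed > 4.0)) {
    errors.push(`speed out of range: ${req.speed}`);
  }
  return errors;
}

// Checks the upload size limit and the requested output format for a transcription.
function validateTranscriptionUpload(fileSizeBytes: number, responseFormat: string = "json"): string[] {
  const errors: string[] = [];
  if (fileSizeBytes > MAX_UPLOAD_BYTES) {
    errors.push(`file too large: ${fileSizeBytes} bytes (limit ${MAX_UPLOAD_BYTES})`);
  }
  if (TRANSCRIPTION_FORMATS.indexOf(responseFormat) === -1) {
    errors.push(`unsupported response_format: ${responseFormat}`);
  }
  return errors;
}
```

Returning a list of errors rather than throwing on the first one lets a route handler report every problem in a single 400 response, which is one reasonable design among several.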
Both route through ModelRouter with Guardian cost tracking (per-character for TTS, per-minute for transcription), BYOK support, and budget enforcement.
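The two billing units can be illustrated with a small sketch. It is not Guardian's actual implementation: the rate values, the `AudioRates` shape, and the function names are hypothetical placeholders, and only the per-character and per-minute units and the idea of a budget check come from this entry.

```typescript
// Hypothetical rate table; the real Guardian pricing is not stated in this entry.
interface AudioRates {
  ttsUsdPerCharacter: number;
  transcriptionUsdPerMinute: number;
}

// Placeholder values for illustration only.
const EXAMPLE_RATES: AudioRates = {
  ttsUsdPerCharacter: 0.000015,
  transcriptionUsdPerMinute: 0.006,
};

// TTS is billed per character of input text.
// Note: string length counts UTF-16 code units, which is an assumption about
// how "characters" are counted.
function estimateTtsCost(text: string, rates: AudioRates): number {
  return text.length * rates.ttsUsdPerCharacter;
}

// Transcription is billed per minute of audio; fractional minutes are
// charged proportionally here (an assumption, rounding rules may differ).
function estimateTranscriptionCost(durationSeconds: number, rates: AudioRates): number {
  return (durationSeconds / 60) * rates.transcriptionUsdPerMinute;
}

// Budget enforcement: allow the request only if the estimated cost still
// fits under the budget.
function withinBudget(spentUsd: number, estimatedCostUsd: number, budgetUsd: number): boolean {
  return spentUsd + estimatedCostUsd <= budgetUsd;
}
```

In a setup like this, the TTS estimate can be computed before the provider call because the input text is known up front, whereas a transcription estimate needs the audio duration, which may only be available after the file has been parsed or processed.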
Lockstep Checklist
- [x] API Routes: src/api/routes/audio.ts + mount in server.ts
- [x] TS SDK: audio.speech() + audio.transcriptions()
- [x] Python SDK: audio.speech() + audio.transcriptions() (sync + async)
- [x] MCP Schemas: N/A, not agent-facing
- [x] Master Record: N/A, auto-updated by llms-full.txt