Cohere released Transcribe, a 2B-parameter open-source ASR model that beats major rivals on the Hugging Face Open ASR leaderboard with an average word error rate (WER) of 5.42.
Cohere launched Transcribe, its first voice model: a 2B-parameter open-source automatic speech recognition (ASR) model, available for free via API and on its Model Vault platform. It supports 14 languages and processes 525 minutes of audio per minute. On the Hugging Face Open ASR leaderboard it posts an average WER of 5.42, outperforming ElevenLabs Scribe v2, IBM Granite 4.0 1B, and Zoom Scribe v1. Human evaluators gave it a 61% win rate across accuracy, coherence, and usability, though it underperforms on Portuguese, German, and Spanish. Cohere plans to integrate Transcribe into its enterprise agent platform, North.
Transcribe is a 2B-parameter model that runs on consumer GPUs and processes audio at 525x real-time, a rare combination of speed, cost, and deployability. Available for free via Cohere's API today, it undercuts paid ASR providers on price and beats several of them on benchmark performance. The one caveat: WER on Portuguese, German, and Spanish lags behind competitors, so validate against a corpus in your target language before replacing your current provider.
Try Transcribe via Cohere's API this week and benchmark it against your current ASR provider on a 10-minute real-world audio sample. If its WER is within 1 point of your provider's and latency holds, you could eliminate your transcription cost line for as long as the API stays free.
Go to dashboard.cohere.com, sign in, and grab your API key from the API Keys tab.
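To score the benchmark comparison suggested above, WER can be computed as (substitutions + deletions + insertions) divided by the number of words in a human-verified reference transcript. The sketch below is a minimal, dependency-free version of that calculation. The function names `word_error_rate` and `within_one_point` are hypothetical helpers, not part of any Cohere SDK, and the normalization (lowercasing and splitting on whitespace) is an assumption that is simpler than what a public leaderboard is likely to use, so the scores will not be directly comparable to published figures.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: (S + D + I) / reference word count * 100."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def within_one_point(candidate_wer: float, incumbent_wer: float,
                     tolerance: float = 1.0) -> bool:
    """True if the candidate is at most `tolerance` WER points worse than the incumbent."""
    return candidate_wer - incumbent_wer <= tolerance
```

In use, the same reference transcript would be scored against the output of each provider, and `within_one_point` applied to the two results. A 10-minute sample contains relatively few words, so a 1-point gap may sit within noise; scoring several samples in each target language gives a steadier comparison.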