The API transcription market has been shifting since 2024-2025 toward production audio (noisy meetings, accents, telephony), and Solaria-3, released by Gladia on June 10, 2026, formally acknowledges this shift through a deliberate trade-off: the model improves on real-world audio, but its word error rate (WER) on Multilingual LibriSpeech rises by 36% relative to Solaria-1 (8.0% versus 5.9%). The Paris-based startup claims first place on Earnings22 Cleaned AA with a 6.4% WER, according to its own measurements.
This trade-off is explicit: Gladia keeps Solaria-1 running in parallel for broader multilingual use cases. The gains from Solaria-3 are also uneven across languages: on its internal audio, Gladia reports an error reduction of 26% in English but only 3% in German.
A table that reads both ways
The figures published by Gladia show a clear shift in specialization. The model improves under the audio conditions typically found in call centers (8 kHz telephony, multi-speaker meetings, non-native accents) and falls behind in lab-style conditions, where Solaria-1 still holds the advantage. The table below reproduces the measurements published by Gladia on June 10, 2026 (WER = word error rate, the number of substituted, deleted, and inserted words divided by the number of words in the reference transcript; lower is better).
| Benchmark | Audio condition | Solaria-3 WER | Reference | Source |
|---|---|---|---|---|
| Earnings22 Cleaned AA | financial / business speech | 6.4% | AssemblyAI Universal-2: 6.9% | Gladia |
| Switchboard | degraded 8 kHz telephony | 33.9% | ElevenLabs: 55.2% | Gladia |
| Noisy audio | background noise | 1.4% | Mistral Voxtral: 1.0% | Gladia |
| Multilingual LibriSpeech | studio read speech, multilingual | 8.0% | Solaria-1: 5.9% (+36%) | Gladia |
| VoxPopuli Cleaned AA | institutional / parliamentary audio | 2.9% | Solaria-1: 2.2% (+32%) | Gladia |
According to Gladia, Solaria-3 beats AssemblyAI Universal-2 on Earnings22 by 0.5 points (6.4% versus 6.9%), a gap small enough to fall within the typical noise margin of WER measurements. On Switchboard, the vendor presents its model as the only one in its in-house comparison to fall below 35%. The scope is narrower, however: Solaria-3 is optimized for five European languages (English, French, German, Spanish, Italian), while Solaria-1 is still said by Gladia to support more than 100 languages, including 42 exclusive ones. All these figures rely on Gladia’s internal dataset, owned and annotated in-house. It is not public, which makes third-party replication impossible for now.
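The table mixes two kinds of comparison: absolute gaps in percentage points (the 0.5 points on Earnings22) and relative changes (the +36% on Multilingual LibriSpeech). As a minimal sketch, the Python snippet below recomputes both from the figures in the table above. The names `ROWS`, `gap_points`, and `relative_change` are illustrative and do not come from Gladia's publication.

```python
# WER figures in percent, copied from the table above (as published by Gladia).
ROWS = [
    # (benchmark, Solaria-3 WER, comparison model, comparison WER)
    ("Earnings22 Cleaned AA", 6.4, "AssemblyAI Universal-2", 6.9),
    ("Switchboard", 33.9, "ElevenLabs", 55.2),
    ("Noisy audio", 1.4, "Mistral Voxtral", 1.0),
    ("Multilingual LibriSpeech", 8.0, "Solaria-1", 5.9),
    ("VoxPopuli Cleaned AA", 2.9, "Solaria-1", 2.2),
]


def gap_points(wer: float, ref: float) -> float:
    """Absolute gap in percentage points; negative means Solaria-3 has fewer errors."""
    return round(wer - ref, 1)


def relative_change(wer: float, ref: float) -> int:
    """Relative change in percent versus the comparison WER, rounded to an integer."""
    return round((wer - ref) / ref * 100)


for name, wer, ref_name, ref in ROWS:
    print(
        f"{name}: {gap_points(wer, ref):+.1f} pts, "
        f"{relative_change(wer, ref):+d}% vs {ref_name}"
    )
```

Read this way, the same 0.5-point gap that looks decisive in a ranking is about a 7% relative difference on Earnings22, while the 2.1-point regression on Multilingual LibriSpeech is the +36% quoted in the table.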
What this trade-off says about the STT market for a B2B buyer
Earnings22, Switchboard, and VoxPopuli capture what a B2B buyer encounters every day: earnings calls, 8 kHz phone conversations, accented parliamentary speech. Since 2024-2025, the sector has been redefining itself around this kind of production audio rather than studio read speech, and Solaria-3 confirms that shift for Gladia, the Paris-based startup founded in 2022 and backed by a $16 million Series A round in October 2024.
The closest European competitor is Voxtral by Mistral AI, released in July 2025 and then iterated up to Voxtral Transcribe 2 in early 2026. Gladia chooses not to include it in its main comparison table, even though the details of Gladia's own publication show Voxtral outperforming Solaria-3 on noisy audio (1.0% versus 1.4% WER). On compliance, Gladia highlights SOC 2 Type II, HIPAA, GDPR, and ISO 27001, with EU and US clusters. This sovereignty argument should be weighed carefully: it applies to inference and customer data, not to training.
For a decision-maker selecting a transcription provider, the evaluation criterion therefore shifts with the market. A use case centered on meetings and call centers (close to Earnings22 and Switchboard) calls for testing Solaria-3; broader multilingual needs or clean audio (documentary transcription, institutional reading) point to Solaria-1 or a competitor. Gladia’s claim to the top spot will be settled by one thing: the publication, by a third-party evaluator, of WER measurements under the same audio conditions (Earnings22, Switchboard, noisy audio), covering Voxtral, Whisper, and the APIs of the major cloud providers absent from the in-house comparison.
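Until such a third-party evaluation exists, a buyer can measure WER on a sample of their own audio. The sketch below is a minimal illustration of the standard metric, computed as a word-level edit distance between a human reference transcript and a provider's output. The function name `word_error_rate` is hypothetical, and lowercasing is the only text normalization applied here, which is an assumption: published benchmarks usually apply heavier normalization (punctuation, numbers), so absolute values will not be comparable with Gladia's figures.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    # Assumption: lowercasing and whitespace splitting are the only normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (Levenshtein, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("revenue grew ten percent", "revenue grew two percent"))
```

For a corpus, the usual practice is to sum the errors over all utterances and divide by the total number of reference words, rather than averaging per-utterance scores, so that short clips do not dominate the result.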
