
Solaria-3: Gladia leads in production audio, according to its own measurements

Stephane Nachez · 4 min read

The API transcription market has been shifting since 2024-2025 toward production audio (noisy meetings, accents, telephony), and Solaria-3, released by Gladia on June 10, 2026, formalizes this shift through a deliberate trade-off: the model improves on real-world audio but regresses on Multilingual LibriSpeech, where its word error rate (WER) rises by 36% relative to Solaria-1, from 5.9% to 8.0%. The Paris-based startup claims first place on Earnings22 Cleaned AA with a 6.4% WER, according to its own measurements.

This trade-off is explicit: Gladia keeps Solaria-1 running in parallel for broader multilingual use cases. Solaria-3's gains are not uniform across languages either: on its internal audio, Gladia reports a 26% WER reduction in English but only 3% in German.

A table that reads both ways

The figures published by Gladia show a clear shift in specialization. The model improves under the audio conditions typically found in call centers (8 kHz telephony, multi-speaker meetings, non-native accents) and falls behind in lab-style conditions, where Solaria-1 still holds the advantage. The table below reproduces the measurements published by Gladia on June 10, 2026 (WER = word error rate, the share of words transcribed incorrectly; lower is better).

| Benchmark | Audio condition | Solaria-3 WER | Reference | Source |
| --- | --- | --- | --- | --- |
| Earnings22 Cleaned AA | financial / business speech | 6.4% | AssemblyAI Universal-2: 6.9% | Gladia |
| Switchboard | degraded 8 kHz telephony | 33.9% | ElevenLabs: 55.2% | Gladia |
| Noisy audio | background noise | 1.4% | Mistral Voxtral: 1.0% | Gladia |
| Multilingual LibriSpeech | studio read speech, multilingual | 8.0% | Solaria-1: 5.9% (+36%) | Gladia |
| VoxPopuli Cleaned AA | institutional / parliamentary audio | 2.9% | Solaria-1: 2.2% (+32%) | Gladia |
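The table mixes two kinds of comparison: gaps in percentage points (6.4% versus 6.9%) and relative changes (+36%, +32%). As a minimal sketch, the arithmetic can be checked in a few lines of Python. The WER values are the ones published by Gladia; the dictionary layout and the helper names `point_gap` and `relative_change` are illustrative choices, not anything from Gladia's publication.

```python
# WER values in percent, copied from Gladia's published table.
# The structure and helper names below are illustrative assumptions.
published_wer = {
    "Earnings22 Cleaned AA": {"solaria_3": 6.4, "reference": 6.9},
    "Multilingual LibriSpeech": {"solaria_3": 8.0, "reference": 5.9},
    "VoxPopuli Cleaned AA": {"solaria_3": 2.9, "reference": 2.2},
}


def point_gap(new: float, ref: float) -> float:
    """Absolute difference in percentage points (negative means fewer errors)."""
    return round(new - ref, 1)


def relative_change(new: float, ref: float) -> int:
    """Relative change in percent of the reference WER, rounded to an integer."""
    return round((new - ref) / ref * 100)


for benchmark, values in published_wer.items():
    gap = point_gap(values["solaria_3"], values["reference"])
    rel = relative_change(values["solaria_3"], values["reference"])
    print(f"{benchmark}: {gap:+.1f} points, {rel:+d}% relative")
```

The two regressions against Solaria-1 come out at +36% and +32%, matching the table. The Earnings22 lead over AssemblyAI Universal-2 is 0.5 points, a much smaller relative gap.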

According to Gladia, Solaria-3 beats AssemblyAI Universal-2 on Earnings22 by 0.5 points (6.4% versus 6.9%), a gap that should be read against the typical noise margin of WER measurements. On Switchboard, the vendor presents its model as the only one in its in-house comparison to fall below 35%. The scope is narrower, however: Solaria-3 is optimized for five European languages (English, French, German, Spanish, Italian), while Solaria-1 is still said by Gladia to support more than 100 languages, including 42 exclusive ones. All these figures rely on Gladia’s internal dataset, owned and annotated in-house. It is not public, which makes third-party replication impossible for now.

What this trade-off says about the STT market for a B2B buyer

Earnings22, Switchboard, and VoxPopuli capture what a B2B buyer encounters every day: earnings calls, 8 kHz phone conversations, accented parliamentary speech. Since 2024-2025, the sector has been redefining itself around this kind of production audio rather than studio-clean read speech, and Solaria-3 confirms that shift for Gladia, the Paris-based startup founded in 2022 and backed by a $16 million Series A round in October 2024.

The closest European competitor is Voxtral by Mistral AI, released in July 2025 and then iterated up to Voxtral Transcribe 2 in early 2026. Gladia chooses not to include it in its main comparison table, even though the details of its own publication show Voxtral outperforming Solaria-3 on noisy audio (1.0% versus 1.4% WER). On compliance, Gladia highlights SOC 2 Type II and ISO 27001 certifications, HIPAA and GDPR compliance, and EU and US clusters. This sovereignty argument should be weighed carefully: it applies to inference and customer data, not to training.

For a decision-maker selecting a transcription provider, the evaluation criterion therefore shifts with the market. A use case centered on meetings and call centers (close to Earnings22 and Switchboard) calls for testing Solaria-3; broader multilingual needs or clean audio (documentary transcription, institutional reading) point to Solaria-1 or a competitor. What will settle Gladia’s claim to the top spot is a single missing piece: WER measurements published by a third-party evaluator under the same audio conditions (Earnings22, Switchboard, noisy audio) and covering Voxtral, Whisper, and the major cloud providers' APIs, all absent from the in-house comparison.
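A buyer running this kind of test on their own recordings needs a way to score transcripts. The sketch below computes WER in the conventional way, as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It is not Gladia's scoring pipeline: the function name `word_error_rate` and the normalization (lowercasing and whitespace splitting only) are assumptions for illustration. Published benchmarks usually apply heavier text normalization, which can move the figures noticeably.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Normalization here is deliberately minimal (lowercase + whitespace split);
    this is an illustrative assumption, not any vendor's scoring method.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example transcripts, not taken from any benchmark.
score = word_error_rate(
    "the quarterly revenue grew ten percent",
    "the quarterly revenue grew two percent",
)
print(f"WER: {score:.1%}")
```

Because the denominator is the reference length, a hypothesis with many inserted words can push WER above 100%, and on short test sets a single misrecognized word shifts the score by several points. That sensitivity is one reason sub-point gaps between vendors deserve caution.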

ActuIA editorial team — news, data and analysis on artificial intelligence for decision-makers.