Chatterbox: An Open Source Breakthrough in Speech Synthesis

The Canadian startup Resemble AI recently introduced Chatterbox, its first open-source TTS (Text-to-Speech) model. Distributed under the MIT license, this voice cloning model positions itself as a credible alternative to proprietary market solutions, while introducing unprecedented features for an open-source model.

Stephane Nachez · 2 min read
Chatterbox is built on a 0.5-billion-parameter architecture and was trained on 500,000 hours of cleaned data.
Key model features:
  • Zero-Shot Voice Cloning: With just a few seconds of reference audio, Chatterbox can mimic a voice without any additional training;
  • Emotion Control: Unlike most other speech synthesis models, Chatterbox lets users adjust the emotional intensity of speech, from monotone to dramatically expressive;
  • Real-Time Speech Synthesis: Thanks to alignment-based generation, the model synthesizes speech faster than real time, making it well suited to voice assistants, video games, and interactive applications;
  • Security Watermark: Every generated audio file includes a perceptual watermark (PerTh Watermarker), ensuring transparency and traceability of the generated content.
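The "faster than real time" claim in the list above can be made concrete with the real-time factor (RTF): the time spent synthesizing divided by the duration of the audio produced. A value below 1 means the model generates speech faster than it takes to play it back. The sketch below is a minimal illustration; the function name and the sample numbers are assumptions chosen for the example, not measurements published by Resemble AI.

```python
def real_time_factor(synthesis_seconds: float, num_samples: int, sample_rate: int) -> float:
    """Return synthesis time divided by the duration of the generated audio."""
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_seconds = num_samples / sample_rate
    return synthesis_seconds / audio_seconds


# Hypothetical numbers for illustration only: 2.0 s of compute
# producing 96,000 samples at 24 kHz, i.e. 4.0 s of audio.
rtf = real_time_factor(2.0, 96_000, 24_000)
print(f"RTF = {rtf:.2f}")  # a value below 1.0 means faster than real time
```

In practice the synthesis time would be measured around the generation call with a timer such as `time.perf_counter()`.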
Chatterbox ships with a dedicated Python library (chatterbox-tts) that is compatible with CUDA. The model can be loaded from local files or from the published pre-trained weights. Developers can also supply custom voice samples (audio prompts) to set the style or the target voice.
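As a rough illustration of that workflow, the sketch below loads the pre-trained model, optionally passes an audio prompt, and writes the result to a WAV file. It is not taken from the article: the import path `chatterbox.tts`, the `ChatterboxTTS.from_pretrained` and `generate` calls, the `audio_prompt_path` and `exaggeration` parameters, and the `sr` attribute are assumptions based on the project's public README and may differ between versions. The helper names `build_generate_kwargs` and `synthesize` are hypothetical.

```python
from typing import Optional


def build_generate_kwargs(audio_prompt_path: Optional[str] = None,
                          exaggeration: float = 0.5) -> dict:
    """Collect the optional arguments for the (assumed) generate() call."""
    if exaggeration < 0:
        raise ValueError("exaggeration must be non-negative")
    kwargs = {"exaggeration": exaggeration}  # assumed emotion-intensity knob
    if audio_prompt_path is not None:
        # Assumed parameter name for the reference voice sample.
        kwargs["audio_prompt_path"] = audio_prompt_path
    return kwargs


def synthesize(text: str, out_path: str, device: str = "cuda", **options) -> None:
    """Generate speech for `text` and save it to `out_path` as a WAV file."""
    # Imported lazily so the helper above works without the heavy dependencies.
    import torchaudio
    from chatterbox.tts import ChatterboxTTS  # assumed import path

    # Assumed constructor that fetches the published pre-trained weights.
    model = ChatterboxTTS.from_pretrained(device=device)
    wav = model.generate(text, **build_generate_kwargs(**options))
    # model.sr is assumed to hold the output sample rate.
    torchaudio.save(out_path, wav, model.sr)
```

Under those assumptions, a call such as `synthesize("Hello from Chatterbox.", "cloned.wav", audio_prompt_path="reference_voice.wav", exaggeration=0.7)` would clone the voice in the hypothetical file `reference_voice.wav` with a more expressive delivery. Passing `device="cpu"` should work on machines without a CUDA GPU, though more slowly.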
Resemble AI compared Chatterbox with proprietary models on the market.


Chatterbox vs Competition

| Feature | Chatterbox | ElevenLabs | Google TTS | Azure TTS |
|---|---|---|---|---|
| License | MIT (Free) | Proprietary | Proprietary | Proprietary |
| Emotion Control | ✅ Advanced | ✅ Basic | ❌ | ❌ |
| Latency | <200 ms | ~300 ms | ~400 ms | ~500 ms |
| User Preference | 63.75% | 36.25% | N/A | N/A |
| Watermarking | ✅ Integrated | ❌ | ❌ | ❌ |
| Voice Cloning | ✅ Yes | ✅ Yes | ❌ | ✅ Limited |
 
In a comparative test conducted by Podonos, listeners preferred Chatterbox in 63.75% of cases over the proprietary model from ElevenLabs, which is considered one of the market leaders.
Resemble AI provides a Gradio demo hosted on Hugging Face, allowing users to test the model without installing anything locally. For more demanding or critical uses, the company offers a commercial version of the TTS engine with latency below 200 ms.
 
 