Just a month after introducing its Hunyuan TurboS model, Chinese conglomerate Tencent has unveiled the deep-reasoning model built on it: Hunyuan-T1. According to Tencent, large-scale post-training has significantly expanded its reasoning capability and aligned it with human preferences, allowing it to compete with DeepSeek R1.
In 2024, DeepSeek sparked a price war in the Chinese AI market with V2, a high-performance language model offered at a competitive cost, prompting Tencent and its main competitors, including Zhipu AI, ByteDance, Alibaba, and Baidu, to lower their prices. Since the emergence of R1, the technological rivalry over AI between the United States and China has only intensified, and competition within the Middle Kingdom is also reaching new heights.
A Model Focused on Deep Reasoning
After Baidu and Alibaba, it is now the giant Tencent's turn to try to establish itself in the Chinese market against DeepSeek.
T1 relies on a Hybrid-Transformer-Mamba MoE architecture which, as its name suggests, combines the advantages of Transformers and Mamba models while adding a mixture of experts, which limits the number of active parameters. It is particularly suited to tasks requiring long-context processing and high precision. T1 thus reduces context loss and optimizes the use of computing resources, while being twice as fast at decoding.
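To make the idea more concrete, here is a minimal, illustrative sketch of such a hybrid block in PyTorch. It is not Tencent's implementation: the state-space layer below is a simplified gated linear recurrence standing in for Mamba's selective SSM, the expert routing is a naive top-k mixture, and all dimensions are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleSSM(nn.Module):
    """Simplified stand-in for a Mamba-style selective state-space layer."""

    def __init__(self, d_model: int):
        super().__init__()
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        self.decay = nn.Parameter(torch.zeros(d_model))  # learned per-channel decay
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):  # x: (batch, seq, d_model)
        u, gate = self.in_proj(x).chunk(2, dim=-1)
        a = torch.sigmoid(self.decay)  # keep the decay in (0, 1)
        h = torch.zeros_like(u[:, 0])
        outs = []
        for t in range(u.size(1)):  # linear-time recurrent scan over the sequence
            h = a * h + (1 - a) * u[:, t]
            outs.append(h)
        y = torch.stack(outs, dim=1) * F.silu(gate)  # gated output
        return self.out_proj(y)


class MoEFFN(nn.Module):
    """Sparse mixture-of-experts FFN: only top_k experts are active per token."""

    def __init__(self, d_model: int, n_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        # For clarity, every expert runs on all tokens and masking keeps only the
        # routed ones; a real MoE dispatches each token to its chosen experts only.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e).unsqueeze(-1).float()
                out = out + mask * weights[..., k:k + 1] * expert(x)
        return out


class HybridBlock(nn.Module):
    """One hybrid layer: attention and SSM token mixing, then a sparse MoE FFN."""

    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ssm = SimpleSSM(d_model)
        self.moe = MoEFFN(d_model)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + attn_out + self.ssm(h)  # combine both mixing paths
        return x + self.moe(self.norm2(x))


x = torch.randn(2, 16, 64)     # (batch, seq, d_model)
print(HybridBlock()(x).shape)  # torch.Size([2, 16, 64])
```

In a real hybrid model, attention and Mamba layers are typically interleaved across the depth of the network rather than summed within one block; the point here is only to show how sparse expert routing keeps most parameters inactive for any given token.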
Thanks to post-training based on RLHF (Reinforcement Learning from Human Feedback), Tencent positions its model as a serious competitor to OpenAI o1 and DeepSeek R1.
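As a rough illustration of how reward-guided post-training works, the sketch below uses a simplified best-of-n selection loop scored by a reward model, rather than a full PPO-based RLHF pipeline; the generator and reward function are hypothetical placeholders, not Tencent's actual components.

```python
import random


def generate_candidates(prompt: str, n: int = 4) -> list[str]:
    # Placeholder: a real pipeline would sample n responses from the policy model.
    return [f"{prompt} -> draft {i}" for i in range(n)]


def reward_model(prompt: str, response: str) -> float:
    # Placeholder scorer: a real reward model is trained on human preference data.
    return random.random()


def collect_finetuning_pairs(prompts: list[str]) -> list[tuple[str, str]]:
    """Keep the highest-reward response per prompt for further fine-tuning."""
    pairs = []
    for prompt in prompts:
        candidates = generate_candidates(prompt)
        best = max(candidates, key=lambda r: reward_model(prompt, r))
        pairs.append((prompt, best))
    return pairs


print(collect_finetuning_pairs(["Explain why 0.1 + 0.2 != 0.3 in floating point"]))
```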
According to evaluations shared by Tencent, Hunyuan-T1 delivers performance that is:
- Superior or comparable to competing models on several benchmarks (MMLU-Pro, C-Eval, AIME, ZebraLogic);
- Particularly strong in mathematics, with an impressive score of 96.2 on MATH-500;
- Robust in engineering and coding, demonstrating advanced ability to solve technical problems.


Benchmarks provided by Tencent
To better understand
What is the Hybrid-Transformer-Mamba MoE architecture and why is it used in Hunyuan-T1?
The Hybrid-Transformer-Mamba MoE architecture combines the benefits of Transformers and Mamba models, incorporating experts to limit the number of active parameters. It is used to reduce context loss and optimize computing resources, particularly for tasks requiring long context processing and precision. This boosts efficiency and decoding speed, making Hunyuan-T1 competitive for complex tasks.