IndexTTS2, the most realistic and expressive text-to-speech model so far, has leaked their demos ahead of the official launch! And... wow!

Posted by pilkyton@reddit | LocalLLaMA

IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech

https://arxiv.org/abs/2506.21619

Features:

Here are a few real-world use cases:

So how did it leak?

I can't wait to play around with this. Absolutely crazy how realistic these emotions are!