Looking for High-Quality Open-Source Local TTS That’s Faster Than IndexTTS2

Posted by TomNaughtyy@reddit | LocalLLaMA

My cousin and I have been using IndexTTS2 for a while and really like the voice quality. It sounds natural and expressive. The only issue is that it’s slow. He’s getting an RTF (real-time factor) of around 1.6 on his 3090, which makes it hard to generate longer audio efficiently (we work with long audio, not real-time use).
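To put that number in perspective, here is a rough sketch of what an RTF of 1.6 means for long audio. It assumes the common convention that RTF is synthesis time divided by audio duration (some tools report the inverse), and the helper names `synthesis_seconds` and `format_hms` are made up for illustration, not part of IndexTTS2.

```python
def synthesis_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimate wall-clock synthesis time.

    Assumes RTF = synthesis time / audio duration, so RTF > 1 is
    slower than real time. Some tools report the inverse ratio.
    """
    return audio_seconds * rtf


def format_hms(seconds: float) -> str:
    """Format a duration in seconds as 'Hh MMm SSs'."""
    total = round(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


# One hour of finished narration at the RTF quoted in the post.
one_hour_of_audio = 3600
print(format_hms(synthesis_seconds(one_hour_of_audio, 1.6)))
```

Under that assumption, each hour of narration costs a bit over an hour and a half of GPU time, which is why a faster model matters for long-form work.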

We’ve also tried Kokoro TTS and CosyVoice 2. Kokoro is super fast, but most of the voices sound too synthetic or “AI-like” for our needs. The one voice we actually liked was “Nicole” in Kokoro, which has a more natural, calm tone that works well for us. CosyVoice 2 was more expressive and sounded promising, but it had a habit of changing words or pronouncing them weirdly, which broke the consistency.

We’re only interested in open-source models. No commercial or cloud APIs.

A few things to note: we’re not planning to use emotion vectors, style tokens, or any prompt engineering tricks, just clean, straightforward narration. We’re on strong hardware (3090 and 4090), so GPU resources aren’t a problem. We just want something with good voice quality that runs faster than IndexTTS2 and ideally has at least one solid voice that sounds natural.

Any models or voices you recommend?
Thanks