What is the best open-source TTS model with multi-language support?
Posted by Anxietrap@reddit | LocalLLaMA | 22 comments
I'm currently developing an add-on for Anki (an open-source flashcard program). Part of my plan is to add an option that generates audio samples from the existing content of the flashcards (for language learning). The point is to use a local TTS model that doesn't require any paid services or APIs. As far as I know, none of the add-ons currently available for this offer a free option that still generates good-quality audio.
I've looked around a lot on HF (Hugging Face), but I'm struggling to figure out which models are actually suitable and versatile enough to support enough languages. My current bet would be XTTS2 because of its broad language support and its leaderboard rankings, but I find it a little "glitchy" at times.
I don't know if it's a good pick, because it's mostly focused on voice cloning. Could that be an issue? Are there legal concerns I need to think about when using such a model? Which voice samples am I allowed to distribute so people can use them for voice cloning? I guess it wouldn't be user-friendly to ask them to find their own 10-second voice samples for generating audio.
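Whatever engine ends up being used, an add-on like this needs a bit of engine-independent bookkeeping. Below is a minimal sketch, assuming the add-on writes one audio file per field value and references it with Anki's `[sound:...]` syntax. The function names (`strip_html`, `audio_filename`, `sound_tag`) are hypothetical, and the HTML stripping is deliberately crude.

```python
import hashlib
import html
import re


def strip_html(field: str) -> str:
    """Crudely turn an Anki field's HTML into plain text for the TTS engine."""
    text = re.sub(r"<[^>]+>", " ", field)  # drop tags such as <b> or <br>
    text = html.unescape(text)             # decode entities like &nbsp;
    return re.sub(r"\s+", " ", text).strip()


def audio_filename(text: str, voice: str, language: str, ext: str = "wav") -> str:
    """Deterministic media filename: the same text, voice and language always
    map to the same file, so unchanged cards don't need re-synthesizing."""
    key = "\x1f".join((language, voice, text)).encode("utf-8")
    digest = hashlib.sha1(key).hexdigest()[:16]
    return f"tts-{language}-{digest}.{ext}"


def sound_tag(filename: str) -> str:
    """Anki plays media files referenced as [sound:filename] in a field."""
    return f"[sound:{filename}]"
```

The add-on would then synthesize the stripped text into that filename with whichever engine it uses, store the file in the collection's media folder, and append the `[sound:...]` tag to the target field. Hashing the inputs is one possible design choice: it makes regeneration idempotent without keeping a separate database.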
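If the add-on does end up accepting user-supplied reference clips for a cloning model, it could at least sanity-check their length before using them. Here is a minimal sketch using only the standard library. It assumes clips are uncompressed PCM WAV (the `wave` module can't read MP3), and the 5 to 30 second limits are placeholder assumptions, not documented XTTS requirements. It does nothing to answer the licensing question.

```python
import wave
from typing import List


def clip_duration_seconds(path: str) -> float:
    """Duration of an uncompressed PCM WAV file, in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def validate_reference_clip(path: str, min_seconds: float = 5.0,
                            max_seconds: float = 30.0) -> List[str]:
    """Return a list of problems with a user-supplied reference clip.

    An empty list means the clip passed these checks. The limits are
    illustrative placeholders, not requirements of any specific model.
    """
    try:
        duration = clip_duration_seconds(path)
    except (wave.Error, EOFError, OSError) as exc:
        return [f"could not read {path} as PCM WAV: {exc}"]
    problems = []
    if duration < min_seconds:
        problems.append(f"clip is {duration:.1f}s; aim for at least {min_seconds:g}s")
    if duration > max_seconds:
        problems.append(f"clip is {duration:.1f}s; trim it to {max_seconds:g}s or less")
    return problems
```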
So my question to my beloved local model nerds is:
Which models have you tested and which ones would you say are the most consistent and reliable?
Evening_Ad6637@reddit
Piper is very lightweight and supports a lot of languages.
Consistent_Ad_2309@reddit
Piper was archived on Oct 7, 2025.
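For context on why Piper is attractive for an add-on: it runs as a standalone executable that reads text from stdin and writes a WAV file, so it can be driven with `subprocess`. Below is a minimal sketch using the `--model` and `--output_file` flags shown in the original rhasspy/piper README. Since that repository is archived, check whichever fork or successor you actually install, because its CLI may differ. The executable and model paths are assumptions supplied by the caller.

```python
import subprocess
from pathlib import Path
from typing import List


def piper_command(piper_exe: str, model_path: str, output_path: str) -> List[str]:
    """argv for the original rhasspy/piper CLI: text arrives on stdin, and
    audio is written to --output_file using the .onnx voice given by --model."""
    return [piper_exe, "--model", model_path, "--output_file", output_path]


def synthesize_with_piper(text: str, piper_exe: str, model_path: str,
                          output_path: str) -> Path:
    """Run Piper once for one card's text; raises CalledProcessError on failure."""
    subprocess.run(
        piper_command(piper_exe, model_path, output_path),
        input=text.encode("utf-8"),
        check=True,
        capture_output=True,  # capture Piper's log output instead of printing it
    )
    return Path(output_path)
```

Spawning one process per card is simple but reloads the voice model each time. For a large deck, batching several lines per invocation would likely be faster.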
GeeKanJi@reddit
There are more and more high-quality open-source AI models emerging. One of the latest is VibeVoice, which delivers impressive audio quality but is currently limited to English and Chinese.
I’ve spent a lot of time testing different models. XTTS-v2 is a solid option, especially for its broad multilingual support, while Higgs Audio V2 stands out when it comes to reproducing emotions. There are also other alternatives depending on hardware constraints. I’ve put together an overview of the main open-source TTS models on my page:
https://cosmo-edge.com/best-open-source-tts-models-comparison/
Feedback is always welcome. TTS technology is evolving rapidly, as shown by the launch of VibeVoice. If you’ve experimented with other TTS models, I’d be very interested to hear your experiences.
Life_Current_4115@reddit
Google TTS and ElevenLabs are the best.
Here is a detailed comparison: https://www.lavivienpost.com/comparison-of-text-to-speech-tts-models/
Frakur24@reddit
I just found this thread from a Google search because I'm also looking for a local model to generate audio for my language-learning Anki cards. Do post your add-on if you get it working!
rbgo404@reddit
Check out this blog post and Hugging Face Space; we've covered the 12 latest open-source TTS models.
Here's a comparison table from the blog.
Demo Space: https://huggingface.co/spaces/Inferless/Open-Source-TTS-Gallary
Blog: https://www.inferless.com/learn/comparing-different-text-to-speech---tts--models-part-2
bafil596@reddit
XTTS v2 and Kokoro TTS are pretty good. There are also some other multilingual TTS models in this repo. You can try them out in Google Colab with the links.
JohnnyOR@reddit
Kokoro 82M is very lightweight and supports something like 8 languages in its v1.0 release, but there are probably others by now. The TTS arena is worth checking for promising leads.
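Kokoro's Python package selects a language with a single-letter `lang_code` passed to `KPipeline`. The README lists 'a' American English, 'b' British English, 'e' Spanish, 'f' French, 'h' Hindi, 'i' Italian, 'j' Japanese, 'p' Brazilian Portuguese and 'z' Mandarin Chinese. An add-on would need to map the deck's locale onto those codes. Below is a minimal sketch: the mapping reflects the README at the time of the v1.0 release and should be re-checked against the version you install, and the function name is hypothetical.

```python
from typing import Optional

# Single-letter lang_code values as listed in the hexgrad/kokoro README
# (v1.0 era); verify against the installed version.
KOKORO_LANG_CODES = {
    "en-us": "a", "en-gb": "b", "es": "e", "fr": "f", "hi": "h",
    "it": "i", "ja": "j", "pt-br": "p", "zh": "z",
}


def kokoro_lang_code(locale: str) -> Optional[str]:
    """Map a locale such as 'ja_JP' or 'pt-BR' to a Kokoro lang_code.

    Returns None when Kokoro has no matching language, so the caller can
    fall back to another engine.
    """
    tag = locale.strip().replace("_", "-").lower()
    if tag in KOKORO_LANG_CODES:
        return KOKORO_LANG_CODES[tag]
    base = tag.split("-", 1)[0]
    if base == "en":
        # Assumption: unlisted English variants fall back to American English.
        return KOKORO_LANG_CODES["en-us"]
    # Only exact base-language keys match; e.g. 'pt-PT' is not mapped
    # because only Brazilian Portuguese is listed.
    return KOKORO_LANG_CODES.get(base)
```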
mocker_jks@reddit
I've tried it on English and Hindi. I have 16 GB of RAM and an MX450 GPU, and it runs very fast on my stone-age laptop. The English voice is superb, but the Hindi performance is meh.
JohnnyOR@reddit
Didn't IIT do a Hindi fine-tune of one of the big TTS models, like XTTS or F5 or something? I think I remember seeing an ex-colleague post about it some months ago.
mocker_jks@reddit
Oh, now I remember: IITM has their AI4Bharat TTS, which supports many languages!
OP can check this out.
RickyRickC137@reddit
There's Veena TTS for Indian languages.
Lazy-Pattern-5171@reddit
Kokoro hands down.
NullPointerJack@reddit
Yeah, XTTS2 has a solid language range but leans toward voice cloning. I'd look at Mozilla TTS if you want something more speech-focused and open. Piper is great for speed and low resource use, but quality varies by language. For serious multi-language consistency, AI4Bharat's TTS models are worth testing. Just double-check model licenses if you're bundling voices.
Silver-Champion-4846@reddit
Mozilla TTS? Didn't that become Coqui TTS, which was then abandoned when the company shut down?
MaruluVR@reddit
GPT-SoVITS supports English, Chinese, Japanese, Korean and Cantonese, with zero-shot voice cloning and custom voice fine-tuning. You can make the voices sigh and laugh. It excels at Asian languages (it's probably the best open-source option for Japanese), but it isn't the best at English.
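Because the engines in this thread are reported to be strong in different languages, an add-on could bundle more than one and route each card by language. Below is a minimal sketch of such routing. The engine identifiers and preference order are hypothetical. The GPT-SoVITS language set comes from the comment above, the Kokoro set from its README, and both should be treated as assumptions to re-check against each model card.

```python
from typing import Dict, List, Optional, Set

# Supported languages per engine (assumptions taken from this thread and the
# Kokoro README; re-check each model card before relying on them).
SUPPORTED: Dict[str, Set[str]] = {
    "gpt-sovits": {"en", "zh", "ja", "ko", "yue"},
    "kokoro": {"en", "es", "fr", "hi", "it", "ja", "pt-br", "zh"},
}

# Hypothetical preference order: GPT-SoVITS first for the East Asian
# languages it is said to excel at, Kokoro first elsewhere ("*" = default).
PREFERENCES: Dict[str, List[str]] = {
    "ja": ["gpt-sovits", "kokoro"],
    "zh": ["gpt-sovits", "kokoro"],
    "ko": ["gpt-sovits"],
    "yue": ["gpt-sovits"],
    "*": ["kokoro", "gpt-sovits"],
}


def pick_engine(language: str, fallback: Optional[str] = None) -> Optional[str]:
    """Return the first preferred engine that lists `language`, else `fallback`."""
    order = PREFERENCES.get(language, PREFERENCES["*"])
    for engine in order:
        if language in SUPPORTED.get(engine, set()):
            return engine
    return fallback
```

Keeping the table as data rather than branching logic makes it easy to adjust once you've actually listened to each engine in each language.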
randomanoni@reddit
Piper is good. Otherwise Kaldi might be a good fit.
Ok_Needleworker_5247@reddit
You might want to explore MBROLA or eSpeak NG, which offer good multi-language support and work offline. Mozilla TTS has also been evolving well, with diverse voice models. Check the licensing terms for each, especially if you distribute voice samples, since voice cloning often requires explicit permission. This article might help in assessing model characteristics and compatibility.
DeProgrammer99@reddit
Anki can use any installed TTS engine on Android; I'm not sure about other OSes. I couldn't find a TTS engine package that's as good as or better than Kokoro and supports Japanese, though. It would be awesome to see one.
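As an alternative to generating audio files, Anki's card templates can ask the platform's installed TTS engine to read a field at review time. On desktop (2.1.20 and later) this is done with a `{{tts ...}}` template tag, e.g. `{{tts en_US:Front}}`, with optional `voices=` and `speed=` settings. Below is a minimal sketch of a helper that builds such a tag. The function name is hypothetical, the tag format follows the Anki manual, and which voice names are valid depends on the OS; the ones in the usage below are only examples.

```python
from typing import Optional, Sequence


def tts_template_tag(field: str, lang: str,
                     voices: Optional[Sequence[str]] = None,
                     speed: Optional[float] = None) -> str:
    """Build an Anki {{tts ...}} template tag that reads `field` aloud with
    the platform's TTS engine at review time."""
    options = [lang]
    if voices:
        options.append("voices=" + ",".join(voices))  # comma-separated voice names
    if speed is not None:
        options.append(f"speed={speed:g}")
    return "{{tts " + " ".join(options) + ":" + field + "}}"
```

For example, `tts_template_tag("Front", "ja_JP")` yields `{{tts ja_JP:Front}}`. The trade-off is that audio quality then depends on each user's installed engine, which is exactly the limitation the comment above describes.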
Routine_Internal_771@reddit
(I wrote the code and didn't think people were going into the weeds with TTS)
AnkiDroid only uses the currently selected TTS engine for voice discovery.
If this is causing you problems, please open an issue on our GitHub and we can fix it to use all system voices.
Black-Mack@reddit
I think they meant a TTS model problem not an AnkiDroid problem.
In fact they mention AnkiDroid picking the system TTS as a feature not a bug.
Thank you for the work you've done to improve the app :)
utilitycoder@reddit
How do you develop an Anki add-on? I'd be interested in reverse speech and pronunciation scoring.