What is the best open-source TTS model with multi-language support?

Posted by Anxietrap@reddit | LocalLLaMA | 22 comments

I'm currently developing an add-on for Anki (an open-source flashcard app). Part of my plan is to add an option to generate audio from the existing content of the flashcards (for language learning). The point is to use a local TTS model that doesn't require any paid services or APIs. As far as I know, the add-ons currently available for this don't offer a free option that still generates reasonably good audio.

I've searched a lot on HF, but I'm struggling to figure out which models are actually suitable and versatile enough to cover a wide range of languages. My current bet would be XTTS2 because of its broad language support and how it ranks on leaderboards, but I find it a little "glitchy" at times.

I'm not sure it's a good pick, because it's mostly focused on voice cloning. Could that be an issue? Are there legal concerns I need to think about when using such a model? Which voice samples am I allowed to distribute to people so they can be used for voice cloning? I guess it wouldn't be user-friendly to ask them to find their own 10-second voice samples for generating audio.

So my question to my beloved local model nerds is:
Which models have you tested and which ones would you say are the most consistent and reliable?
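A minimal sketch of the pipeline described in the post, assuming Piper (suggested in the replies below) as the local engine and its command-line interface, which reads text on stdin and writes a WAV file given `--model` and `--output_file`. The voice file names in `VOICES`, the helper names, and the hash-based file naming are illustrative assumptions, not part of any existing add-on. The result is Anki's `[sound:filename]` tag, which can be appended to a note field.

```python
import hashlib
import html
import re
import subprocess
from pathlib import Path

# Hypothetical language -> Piper voice mapping; these are example names from the
# public piper-voices collection. Use whichever .onnx models you actually download.
VOICES = {
    "en": "en_US-lessac-medium.onnx",
    "de": "de_DE-thorsten-medium.onnx",
    "fr": "fr_FR-siwis-medium.onnx",
    "es": "es_ES-davefx-medium.onnx",
}

TAG_RE = re.compile(r"<[^>]+>")            # HTML tags inside an Anki field
SOUND_RE = re.compile(r"\[sound:[^\]]*\]")  # existing Anki sound tags


def field_to_text(field_html: str) -> str:
    """Strip existing sound tags and HTML from a field and collapse whitespace."""
    text = SOUND_RE.sub("", field_html)
    text = TAG_RE.sub(" ", text)  # replace tags (e.g. <br>) with spaces
    text = html.unescape(text)    # &nbsp; -> non-breaking space, &amp; -> &, ...
    return " ".join(text.split())


def audio_filename(text: str, lang: str) -> str:
    """Deterministic file name, so identical text is only synthesized once."""
    digest = hashlib.sha1(f"{lang}:{text}".encode("utf-8")).hexdigest()[:16]
    return f"tts_{lang}_{digest}.wav"


def piper_command(model: str, out_path: Path, piper_bin: str = "piper") -> list[str]:
    """Build the Piper CLI invocation; the text itself is sent on stdin."""
    return [piper_bin, "--model", model, "--output_file", str(out_path)]


def synthesize(field_html: str, lang: str, media_dir: Path) -> str:
    """Render a field with Piper and return an Anki [sound:...] tag ('' if empty)."""
    text = field_to_text(field_html)
    if not text:
        return ""
    name = audio_filename(text, lang)
    out_path = media_dir / name
    if not out_path.exists():
        # Raises KeyError for languages without a configured voice.
        subprocess.run(
            piper_command(VOICES[lang], out_path),
            input=(text + "\n").encode("utf-8"),
            check=True,
        )
    return f"[sound:{name}]"
```

Inside an add-on, `media_dir` would presumably be `Path(mw.col.media.dir())`, with the returned tag appended to a target field of the note. Hashing the cleaned text together with the language means re-running the add-on skips audio that already exists.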


Evening_Ad6637@reddit

Piper is very lightweight and supports a lot of languages.

Consistent_Ad_2309@reddit

Piper was archived on Oct 7, 2025.

GeeKanJi@reddit

There are more and more high-quality open-source AI models emerging. One of the latest is VibeVoice, which delivers impressive audio quality but is currently limited to English and Chinese.

I’ve spent a lot of time testing different models. XTTS-v2 is a solid option, especially for its broad multilingual support, while Higgs Audio V2 stands out when it comes to reproducing emotions. There are also other alternatives depending on hardware constraints. I’ve put together an overview of the main open-source TTS models on my page:
https://cosmo-edge.com/best-open-source-tts-models-comparison/

Feedback is always welcome. TTS technology is evolving rapidly, as shown by the launch of VibeVoice. If you’ve experimented with other TTS models, I’d be very interested to hear your experiences.

Life_Current_4115@reddit

Google TTS and ElevenLabs are the best.

ElevenLabs: Supports 32 languages with over 3,000 voices, offering extensive flexibility for multilingual projects. Its advanced voice cloning and AI dubbing make it ideal for localized content creation, such as audiobooks or global media.
Google TTS: Supports a broad range of languages (likely exceeding 100, though exact counts vary by source), with robust SSML support for fine-tuning. Its extensive language coverage makes it a top choice for enterprise-grade, global applications.
Cartesia: Supports 15 languages with ~130 voices, suitable for many applications but less comprehensive than ElevenLabs or Google TTS.
Kokoro: Supports 8 languages with 54 voices, adequate for smaller-scale multilingual projects but limited compared to top performers.
Deepgram: Primarily supports English (with accents like British/Australian), with multilingual support in development. It is the least suitable for diverse language needs.
OpenAI TTS: Supports multiple languages (exact count unclear) but is constrained by only 10 voices, limiting its flexibility for multilingual applications.

Here's a detailed comparison: https://www.lavivienpost.com/comparison-of-text-to-speech-tts-models/

Frakur24@reddit

I just found this thread from a Google search, because I'm also looking for a local model to generate audio for my language-learning Anki cards. Do post your add-on if you get it working!

rbgo404@reddit

Check out this blog and Hugging Face Space; we've covered the 12 latest open-source TTS models.
Here's a comparison table from the blog.

Demo Space: https://huggingface.co/spaces/Inferless/Open-Source-TTS-Gallary
Blog: https://www.inferless.com/learn/comparing-different-text-to-speech---tts--models-part-2

bafil596@reddit

XTTS-v2 and Kokoro TTS are pretty good. There are also some other multilingual TTS models in this repo. You can try them out in Google Colab with the links.

JohnnyOR@reddit

Kokoro 82M is very lightweight and supports around 8 languages in its v1.0 release, but there are probably others by now. It's worth checking the TTS Arena for promising leads.

mocker_jks@reddit

I've tried it with English and Hindi. I have 16 GB of RAM and an MX450 GPU, and it runs very fast on my stone-age laptop. The English voice is superb, but the Hindi output is meh.

Didn't IIT do a Hindi fine-tune of one of the big TTS models, like XTTS or F5 or something? I feel like I remember an ex-colleague posting about it a few months ago.

Oh, now I remember: IITM has their AI4Bharat TTS, which supports many languages!

OP can check this out.
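For the Kokoro suggestions in this thread, here is a minimal sketch of how an add-on might route a card's language to Kokoro v1.0. It assumes the `kokoro` package's `KPipeline` interface as shown in its README: a one-letter `lang_code`, a `voice` name, and a generator of `(graphemes, phonemes, audio)` chunks at 24 kHz. The tag mapping and function names are hypothetical. Note that some languages (German, for example) are not among the v1.0 languages, so an add-on would need a fallback engine for them.

```python
from pathlib import Path

# Hypothetical mapping from language tags to Kokoro v1.0's one-letter lang_code.
KOKORO_LANG_CODES = {
    "en": "a",     # bare English defaults to American English
    "en-us": "a",  # American English
    "en-gb": "b",  # British English
    "es": "e",     # Spanish
    "fr": "f",     # French
    "hi": "h",     # Hindi
    "it": "i",     # Italian
    "ja": "j",     # Japanese (needs misaki[ja])
    "pt": "p",     # Portuguese, served by the Brazilian Portuguese frontend
    "pt-br": "p",  # Brazilian Portuguese
    "zh": "z",     # Mandarin Chinese (needs misaki[zh])
}


def kokoro_lang_code(lang_tag: str) -> str:
    """Map a tag such as 'en_GB' or 'fr-CA' to a Kokoro lang_code."""
    tag = lang_tag.strip().lower().replace("_", "-")
    if tag in KOKORO_LANG_CODES:
        return KOKORO_LANG_CODES[tag]
    base = tag.split("-")[0]  # fall back to the bare language, e.g. 'fr-ca' -> 'fr'
    if base in KOKORO_LANG_CODES:
        return KOKORO_LANG_CODES[base]
    raise ValueError(f"Kokoro v1.0 has no frontend for {lang_tag!r}")


def synthesize_kokoro(text: str, lang_tag: str, voice: str, out_path: Path) -> None:
    """Render text to a WAV file (assumes kokoro, numpy and soundfile are installed)."""
    # Imported lazily so the mapping above works without the heavy dependencies.
    import numpy as np
    import soundfile as sf
    from kokoro import KPipeline

    pipeline = KPipeline(lang_code=kokoro_lang_code(lang_tag))
    # Each chunk unpacks to (graphemes, phonemes, audio); assumes CPU audio tensors.
    chunks = [np.asarray(audio) for _, _, audio in pipeline(text, voice=voice)]
    sf.write(str(out_path), np.concatenate(chunks), 24000)  # Kokoro outputs 24 kHz
```

The voice has to match the language (the README's English example uses `af_heart`); Kokoro publishes its full voice list alongside the model.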

RickyRickC137@reddit

There's Veena TTS for Indian languages.

Lazy-Pattern-5171@reddit

Kokoro hands down.

NullPointerJack@reddit

Yeah, XTTS2 has a solid language range but leans toward voice cloning. I'd look at Mozilla TTS if you want something more speech-focused and open. Piper is great for speed and low resource use, but quality varies by language. For serious multi-language consistency, AI4Bharat's TTS models are worth testing. Just double-check the model licenses if you're bundling voices.

Silver-Champion-4846@reddit

Mozilla TTS? Didn't that become Coqui TTS, which then became abandoned as the company shut down?

MaruluVR@reddit

GPT-SoVITS supports English, Chinese, Japanese, Korean and Cantonese, with zero-shot voice cloning and custom voice fine-tuning. You can make the voices sigh and laugh. It excels at Asian languages (it's probably the best open-source option for Japanese out there) but isn't the best at English.

randomanoni@reddit

Piper is good. Otherwise Kaldi might be a good fit.

Ok_Needleworker_5247@reddit

You might want to explore MBROLA or eSpeak NG, which offer good multi-language support and work offline. Also, Mozilla TTS has been evolving well with diverse voice models. Check the licensing terms for each, especially if you distribute voice samples: voice cloning often requires explicit permission to use someone's voice. This article might help in assessing model characteristics and compatibility.

DeProgrammer99@reddit

Anki can use any installed TTS engine on Android; not sure about other OSes. I couldn't find a package for a TTS engine that's as good as or better than Kokoro and supports Japanese, though. Would be awesome to see that.

Routine_Internal_771@reddit

(I wrote the code and didn't think people were going into the weeds with TTS)

AnkiDroid only uses the currently selected TTS engine for voice discovery

If this is causing you problems, please add an issue to our GitHub and we can fix it to use all system voices

Black-Mack@reddit

I think they meant a TTS model problem not an AnkiDroid problem.

In fact, they mention AnkiDroid picking the system TTS as a feature, not a bug.

Thank you for the work you've done to improve the app :)

utilitycoder@reddit

How do you develop an Anki add-on? I would be interested in the reverse speech and pronunciation scoring.