[RELEASE] - Finally, my first TTS model is out! 🎙️ Flare-TTS 28M
Posted by LH-Tech_AI@reddit | LocalLLaMA | 25 comments
Hey r/LocalLLaMA !
I am back with a new model, and it's something special today 😃
It's Flare-TTS 28M, my first text-to-speech (TTS) model, trained completely from scratch on a single A6000 GPU for ~24 hours, ~300 epochs, on the full LJSpeech dataset!
Link to the HF model: https://huggingface.co/LH-Tech-AI/Flare-TTS-28M
Example result:
https://cdn-uploads.huggingface.co/production/uploads/697f2832c2c5e4daa93cece7/vluuHSnp9Ietk7Uk1-hvG.mpga
It speaks English, but it still sounds a bit robotic 😂
You can use it if you want - it's free and open-source 😃
Have fun ❤️
LegacyRemaster@reddit
I also train LLMs and I know how much effort it takes! Great job!
LH-Tech_AI@reddit (OP)
Ok. Thanks.
Want to join our CompactAI Discord Server? We're all people who enjoy small models.
Link: https://discord.gg/y2jTct6Cxv
Please feel free to join! :-)
crantob@reddit
Yeah, but Discord is a steaming hot pile of crap
yoop001@reddit
Lovely! Can you do ONNX + multilingual next?
LH-Tech_AI@reddit (OP)
Thanks! :-)
I tried to export to ONNX, but it somehow didn't work... :-/ Sorry...
Multilingual support would be even more complex and would need more datasets in different languages... but maybe v2 or v3 will do that... :-)
yoop001@reddit
No worries at all! The project already looks super promising. Looking forward to seeing where v2/v3 goes.
LH-Tech_AI@reddit (OP)
Happy to hear that :)
Apart_Boat9666@reddit
Can you give short explanation and minimum specs needed to train model from scratch?
LH-Tech_AI@reddit (OP)
Thanks for your interest! :-)
You can read everything in the HF repo: https://huggingface.co/LH-Tech-AI/Flare-TTS-28M
You will need an A6000 GPU, for example, and ~24 hours of time. Download all the .py scripts from the repo first, then run the training script.
Awwtifishal@reddit
That's actually a really cool kind of robotic voice, in my opinion.
LH-Tech_AI@reddit (OP)
Thank you :-)
autonomousdev_@reddit
Tested on some podcast transcripts yesterday. Works pretty decent for how small it is. But man, 8 seconds for a 30 second clip on CPU is brutal. You guys gonna do an ONNX version? Wanna run this on my phone or something.
LH-Tech_AI@reddit (OP)
Okay. Hey, thanks for your feedback.
ONNX isn't planned yet, but I can do it. I'll do it tomorrow, put it on HF, and also add inference code for the ONNX model there, alright?
LH-Tech_AI@reddit (OP)
Errr... it seems it's NOT possible to export an ONNX file, sorry... :-/
Maybe you can look into converting it yourself, please?
No_Hunter_7786@reddit
That's impressive for 28M parameters and only 24 hours of training! Quality will definitely improve with more data and epochs. What architecture did you use for the vocoder?
LH-Tech_AI@reddit (OP)
I sadly forgot to use a vocoder 😭😂
v2 will add one :-)
Thanks for the nice feedback.
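For context on the vocoder question: a vocoder turns the model's predicted spectrogram into an actual waveform, which is why skipping it tends to sound robotic. As a sketch (assuming a plain magnitude spectrogram output, which may not match how Flare-TTS actually works), the classic non-neural option is Griffin-Lim phase reconstruction:

```python
# Minimal Griffin-Lim sketch: recover a waveform from a magnitude
# spectrogram by iteratively re-estimating phase.  A toy stand-in for a
# neural vocoder, not the Flare-TTS pipeline.
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, n_iter=32, nperseg=512):
    rng = np.random.default_rng(0)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    for _ in range(n_iter):
        _, wav = istft(mag * phase, nperseg=nperseg)   # spectrogram -> signal
        _, _, spec = stft(wav, nperseg=nperseg)        # signal -> spectrogram
        frames = min(mag.shape[1], spec.shape[1])      # roundtrip can drift a frame
        mag = mag[:, :frames]
        phase = np.exp(1j * np.angle(spec[:, :frames]))
    _, wav = istft(mag * phase, nperseg=nperseg)
    return wav

# Toy demo: rebuild a 1-second 440 Hz tone from its magnitude spectrogram.
t = np.linspace(0, 1, 22050, endpoint=False)
_, _, spec = stft(np.sin(2 * np.pi * 440 * t), nperseg=512)
wav = griffin_lim(np.abs(spec))
```

Neural vocoders like HiFi-GAN do the same magnitude-to-waveform job with far better quality, which is presumably what v2 would add.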
2Norn@reddit
new rival for elevenlabs!
LH-Tech_AI@reddit (OP)
Haha, not really, but it's a proof of concept of getting there :D
devvie@reddit
I think it's awesome. Creepy.
LH-Tech_AI@reddit (OP)
Thanks. Cool :-)
CoUsT@reddit
Love your enthusiasm and positive energy. Keep it up!
LH-Tech_AI@reddit (OP)
Thanks 🙏🏻👍🏻 I will definitely keep creating models. If you want, you can see all my models here: https://huggingface.co/LH-Tech-AI/
Connect-Bid9700@reddit
good
LH-Tech_AI@reddit (OP)
Thanks ❤️😊
LH-Tech_AI@reddit (OP)
And hey, v2 is definitely coming soon... 😃