[RELEASE] - Finally, my first TTS model is out! 🎙️ Flare-TTS 28M
Posted by LH-Tech_AI@reddit | LocalLLaMA | 25 comments
Hey r/LocalLLaMA !
I am back with a new model, and it's something special today 😃
It's Flare-TTS 28M, my first text-to-speech (TTS) model, trained completely from scratch on a single A6000 GPU for ~24 hours, ~300 epochs, on the full LJSpeech dataset!
Link to the HF model: https://huggingface.co/LH-Tech-AI/Flare-TTS-28M
Example result:
https://cdn-uploads.huggingface.co/production/uploads/697f2832c2c5e4daa93cece7/vluuHSnp9Ietk7Uk1-hvG.mpga
It speaks English, but it still sounds a bit robotic 😂
You can use it if you want - it's free and open-source 😃
Have fun ❤️
LegacyRemaster@reddit
I also train LLMs and I know how much effort it takes! Great job!
LH-Tech_AI@reddit (OP)
Ok. Thanks.
Want to join our CompactAI Discord Server? We're all people who enjoy small models.
Link: https://discord.gg/y2jTct6Cxv
Please feel free to join! :-)
crantob@reddit
Yeah, but Discord is a steaming hot pile of crap
yoop001@reddit
Lovely! Can you do ONNX + multilingual next?
LH-Tech_AI@reddit (OP)
Thanks! :-)
I tried to export to ONNX, but it somehow didn't work... :-/ Sorry...
Multilingual support would be even more complex and would need more datasets in different languages... but maybe v2 or v3 will do that... :-)
yoop001@reddit
No worries at all! The project already looks super promising. Looking forward to seeing where v2/v3 goes.
LH-Tech_AI@reddit (OP)
Happy to hear that :)
Apart_Boat9666@reddit
Can you give short explanation and minimum specs needed to train model from scratch?
LH-Tech_AI@reddit (OP)
Thanks for your interest! :-)
You can read everything in the HF repo: https://huggingface.co/LH-Tech-AI/Flare-TTS-28M
You will need an A6000 GPU, for example, and ~24 hours of time. Download all the .py scripts from the repo first, then run the training script.
Awwtifishal@reddit
That's actually a really cool kind of robotic voice, in my opinion.
LH-Tech_AI@reddit (OP)
Thank you :-)
autonomousdev_@reddit
Tested on some podcast transcripts yesterday. Works pretty decent for how small it is. But man, 8 seconds for a 30 second clip on CPU is brutal. You guys gonna do an ONNX version? Wanna run this on my phone or something.
LH-Tech_AI@reddit (OP)
Okay. Hey, thanks for your feedback.
ONNX isn't planned yet, but I can do it. I'll do it tomorrow, put it on HF, and also add inference code for the ONNX model there, alright?
LH-Tech_AI@reddit (OP)
Errr... it seems it's NOT possible to export an ONNX file, sorry... :-/
Maybe you can look into converting it yourself, please?
No_Hunter_7786@reddit
That's impressive for 28M parameters and only 24 hours of training! Quality will definitely improve with more data and epochs. What architecture did you use for the vocoder?
LH-Tech_AI@reddit (OP)
I sadly forgot to use a vocoder 😭😂
v2 will add one :-)
Thanks for the nice feedback.
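For context on the vocoder question: a vocoder turns the model's predicted spectrogram into an actual waveform, which is why skipping it tends to sound robotic. As a sketch (assuming a plain magnitude spectrogram output, which may not match how Flare-TTS actually works), the classic non-neural option is Griffin-Lim phase reconstruction:

```python
# Minimal Griffin-Lim sketch: recover a waveform from a magnitude
# spectrogram by iteratively re-estimating phase.  A toy stand-in for a
# neural vocoder, not the Flare-TTS pipeline.
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, n_iter=32, nperseg=512):
    rng = np.random.default_rng(0)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    for _ in range(n_iter):
        _, wav = istft(mag * phase, nperseg=nperseg)   # spectrogram -> signal
        _, _, spec = stft(wav, nperseg=nperseg)        # signal -> spectrogram
        frames = min(mag.shape[1], spec.shape[1])      # roundtrip can drift a frame
        mag = mag[:, :frames]
        phase = np.exp(1j * np.angle(spec[:, :frames]))
    _, wav = istft(mag * phase, nperseg=nperseg)
    return wav

# Toy demo: rebuild a 1-second 440 Hz tone from its magnitude spectrogram.
t = np.linspace(0, 1, 22050, endpoint=False)
_, _, spec = stft(np.sin(2 * np.pi * 440 * t), nperseg=512)
wav = griffin_lim(np.abs(spec))
```

Neural vocoders like HiFi-GAN do the same magnitude-to-waveform job with far better quality, which is presumably what v2 would add.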
2Norn@reddit
new rival for elevenlabs!
LH-Tech_AI@reddit (OP)
Haha, not really, but it's a proof of concept of getting there :D
devvie@reddit
I think it's awesome. Creepy.
LH-Tech_AI@reddit (OP)
Thanks. Cool :-)
CoUsT@reddit
Love your enthusiasm and positive energy. Keep it up!
LH-Tech_AI@reddit (OP)
Thanks 🙏🏻👍🏻 I will definitely keep creating models. If you want, you can see all my models here: https://huggingface.co/LH-Tech-AI/
Connect-Bid9700@reddit
good
LH-Tech_AI@reddit (OP)
Thanks ❤️😊
LH-Tech_AI@reddit (OP)
And hey, v2 is definitely coming soon... 😃