Good TTS model for AMD GPUs?
Posted by Alternative-Ad5958@reddit | LocalLLaMA | View on Reddit | 10 comments
Hello, just wanted to know which models are supported in AMD hardware (specifically a single R9700 which is a 9070 XT with more VRAM).
Tried Qwen3-TTS via koboldcpp, but it's almost as slow via Vulkan as on CPU.
I'd like something with lower generation time.
Streaming and voice cloning not needed but would be a plus. Same if chunking can be done automatically.
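(For reference, the kind of automatic chunking meant here can be as simple as sentence-aligned splitting before each TTS call. A minimal sketch — the function name and the 300-character limit are illustrative assumptions, not tied to any particular TTS server:)

```python
import re

def chunk_text(text: str, max_chars: int = 300) -> list[str]:
    """Split text into sentence-aligned chunks no longer than max_chars,
    so each TTS request stays short enough to generate quickly."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Start a new chunk if adding this sentence would exceed the limit.
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```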
Thanks! I love this community.
Seismoforg@reddit
Search for "Faster Qwen3 TTS" on Google — it's really good. I used it to implement a Qwen TTS streaming server for my own experiments; it works very well and is fast. Qwen's quality is by far the best I've seen so far. F5-TTS is also good and fast, but it lacks multi-language support.
JamesEvoAI@reddit
I've had pretty good luck with Omnivoice on my Strix Halo machine:
https://sleepingrobots.com/dreams/omnivoice-strix-halo/
Alternative-Ad5958@reddit (OP)
Just tested it, and with some small changes based on https://github.com/k2-fsa/OmniVoice/issues/67 I got it working. Works great, thanks!
I'm fairly busy at the moment but will try to submit a PR for it.
Alternative-Ad5958@reddit (OP)
Thanks! I will try it.
Adventurous-Paper566@reddit
You should try parakeet v3.
It's very fast on an AMD CPU (7900X3D), so you could keep your VRAM for your LLMs.
https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3
ttkciar@reddit
That appears to be only an STT model. Does it also do TTS?
Adventurous-Paper566@reddit
Well, excuse me, I was tired when I read your post. I only know Kokoro, which is fast even on a CPU but maybe not good enough for your needs, and Qwen3-TTS, which is very good but too slow for real-time use even on a 5090.
A good TTS needs more power than an LLM...
The problem with lightweight TTS models is that they struggle with multilanguage sentences, acronyms, etc. With a light TTS you need to correct the input text with an LLM to get perfect output :/
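(A tiny rule-based sketch of that pre-processing step — a hypothetical stand-in for the LLM correction described above. The acronym table and function name are my own illustrative assumptions:)

```python
import re

# Hypothetical spoken-form table; a real pipeline would use an LLM or a
# much larger text-normalization ruleset.
ACRONYMS = {"GPU": "G P U", "VRAM": "V RAM", "TTS": "T T S"}

def normalize_for_tts(text: str) -> str:
    """Expand known acronyms so a lightweight TTS model spells them out
    instead of mispronouncing them as words."""
    def expand(match: re.Match) -> str:
        word = match.group(0)
        return ACRONYMS.get(word, word)
    # Match runs of 2+ uppercase letters as acronym candidates.
    return re.sub(r"\b[A-Z]{2,}\b", expand, text)
```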
Bite_It_You_Scum@reddit
Not really for GPU, since it runs entirely on two CPU cores, but you should look into Pocket TTS. It supports cloning, it's very fast (faster than real time), and the quality is around XTTS level.
RudeboyRudolfo@reddit
https://github.com/TrevorS/voxtral-mini-realtime-rs
ttkciar@reddit
Following this as well. I'm going to need a fast Linux/AMD-friendly TTS solution and have been neglecting this side of the tech stack.