Good TTS model for AMD GPUs?
Posted by Alternative-Ad5958@reddit | LocalLLaMA | View on Reddit | 10 comments
Hello, just wanted to know which models are supported in AMD hardware (specifically a single R9700 which is a 9070 XT with more VRAM).
Tried Qwen3-TTS via koboldcpp, but it's almost as slow via Vulkan as on CPU.
I'd like something with lower generation time.
Streaming and voice cloning not needed but would be a plus. Same if chunking can be done automatically.
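(For reference, the kind of automatic chunking meant here can be as simple as sentence-aligned splitting before each TTS call. A minimal sketch — the function name and the 300-character limit are illustrative assumptions, not tied to any particular TTS server:)

```python
import re

def chunk_text(text: str, max_chars: int = 300) -> list[str]:
    """Split text into sentence-aligned chunks no longer than max_chars,
    so each TTS request stays short enough to generate quickly."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Start a new chunk if adding this sentence would exceed the limit.
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```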
Thanks! I love this community.
Seismoforg@reddit
Search for "Faster Qwen3 TTS" on Google — it's really good. I used it to implement a Qwen TTS streaming server for my own experiments; it works very well and is fast. Qwen's quality is by far the best I've seen so far. F5-TTS is also good and fast, but it lacks multi-language support.
JamesEvoAI@reddit
I've had pretty good luck with Omnivoice on my Strix Halo machine:
https://sleepingrobots.com/dreams/omnivoice-strix-halo/
Alternative-Ad5958@reddit (OP)
Just tested it, and with some small changes based on https://github.com/k2-fsa/OmniVoice/issues/67 I got it working. Works great, thanks!
I'm fairly busy at the moment but will try to submit a PR for it.
Alternative-Ad5958@reddit (OP)
Thanks! I will try it.
Adventurous-Paper566@reddit
You should try parakeet v3.
It's very fast on an AMD CPU (7900X3D), so you could keep your VRAM for your LLMs.
https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3
ttkciar@reddit
That appears to be only an STT model. Does it also do TTS?
Adventurous-Paper566@reddit
Well, excuse me, I was tired when I read your post. I only know Kokoro, which is fast even on a CPU but maybe not good enough for your needs, and Qwen3-TTS, which is very good but too slow for real-time use even on a 5090.
A good TTS needs more power than an LLM...
The problem with lightweight TTS models is that they struggle with multilanguage sentences, acronyms, etc. With a light TTS you need to correct the input text with an LLM to get perfect output :/
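(A tiny rule-based sketch of that pre-processing step — a hypothetical stand-in for the LLM correction described above. The acronym table and function name are my own illustrative assumptions:)

```python
import re

# Hypothetical spoken-form table; a real pipeline would use an LLM or a
# much larger text-normalization ruleset.
ACRONYMS = {"GPU": "G P U", "VRAM": "V RAM", "TTS": "T T S"}

def normalize_for_tts(text: str) -> str:
    """Expand known acronyms so a lightweight TTS model spells them out
    instead of mispronouncing them as words."""
    def expand(match: re.Match) -> str:
        word = match.group(0)
        return ACRONYMS.get(word, word)
    # Match runs of 2+ uppercase letters as acronym candidates.
    return re.sub(r"\b[A-Z]{2,}\b", expand, text)
```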
Bite_It_You_Scum@reddit
Not really for GPU, since it runs entirely on two CPU cores, but you should look into Pocket TTS. It supports cloning, it's very fast (faster than real time), and the quality is around XTTS level.
RudeboyRudolfo@reddit
https://github.com/TrevorS/voxtral-mini-realtime-rs
ttkciar@reddit
Following this as well. I'm going to need a fast Linux/AMD-friendly TTS solution and have been neglecting this side of the tech stack.