Tried to build a local voice cloning audiobook pipeline for Bulgarian — XTTS-v2 sounds Russian, Fish Speech 1.5 won't load on Windows. Anyone solved Cyrillic TTS locally?
Posted by Binqta@reddit | LocalLLaMA | 9 comments
Hi Everyone,
I tried this with the help of Claude because I'm not very familiar with CMD, PowerShell, etc.
Tried to build a local Bulgarian audiobook voice cloner — here's what actually happened
Spent a full day trying to clone my voice locally and use it to read a book in Bulgarian. Here's the honest breakdown.
My setup: RTX 5070 Ti, 64GB RAM, Windows 11
Attempt 1: XTTS-v2 (Coqui TTS)
Looked promising — voice cloning from just 30 seconds of audio, runs locally, free. Got it installed after fighting some transformers version conflicts. Generated audio successfully.
Result: it sounds Russian, not even close to Bulgarian. XTTS-v2 officially supports 17 languages and Bulgarian isn't one of them. Using language="ru" is the community workaround, but the output is clearly Russian-accented. The similarity to my actual voice was also poor regardless of language.
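For anyone who wants to reproduce the language="ru" workaround, here is a minimal sketch using the Coqui `TTS` Python API. The `resolve_language` helper, the fallback table and the file paths are hypothetical names made up for illustration, and the language set is the list from the XTTS-v2 model card as best I recall it, so check it against the card. Mapping Bulgarian to Russian is exactly what produces the Russian accent; it is a workaround, not a fix.

```python
# Language codes listed on the XTTS-v2 model card (assumption: verify against the card).
XTTS_V2_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
}

# Hypothetical fallback table: Bulgarian is unsupported, so borrow Russian.
FALLBACKS = {"bg": "ru"}


def resolve_language(code: str) -> str:
    """Return a language code XTTS-v2 accepts, using a fallback if needed."""
    code = code.lower()
    if code in XTTS_V2_LANGUAGES:
        return code
    if code in FALLBACKS:
        return FALLBACKS[code]
    raise ValueError(f"No XTTS-v2 language or fallback for {code!r}")


def clone_and_speak(text: str, speaker_wav: str, out_path: str, lang: str = "bg") -> None:
    """Synthesize `text` in the voice sampled from `speaker_wav` (placeholder paths)."""
    # Imported lazily so resolve_language() works without Coqui TTS installed.
    from TTS.api import TTS

    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to("cuda")
    tts.tts_to_file(
        text=text,
        speaker_wav=speaker_wav,
        language=resolve_language(lang),
        file_path=out_path,
    )


if __name__ == "__main__":
    # Shows which language code would actually be sent to the model.
    print(resolve_language("bg"))
```

Calling `clone_and_speak("Здравей, свят.", "my_voice.wav", "out.wav")` would then run the model with Russian phonetics on Bulgarian text, which matches what I heard.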
Attempt 2: Fish Speech 1.5
More promising on paper — trained on 80+ languages including Cyrillic scripts, no language-specific preprocessing needed. Got it installed. Still working through some model loading issues on Windows.
What made everything harder than it should be:
The RTX 5070 Ti (Blackwell architecture) isn't supported by stable PyTorch yet. Had to use nightly builds. Every single package install would silently downgrade PyTorch back to 2.5.1, breaking GPU support. Had to force reinstall the nightly after almost every step.
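A small sanity check that can be run after each package install would have saved me a lot of time. The sketch below is an assumption-heavy illustration: `is_nightly` and `check_torch` are hypothetical helper names, nightly builds are detected simply by `.dev` in the version string, and `sm_120` is what I believe the compute architecture name is for this card, so adjust it if your build reports something else.

```python
def is_nightly(version: str) -> bool:
    """True if a PyTorch version string looks like a nightly (dev) build."""
    return ".dev" in version


def check_torch(required_arch: str = "sm_120") -> list[str]:
    """Return a list of problems with the installed PyTorch; empty means OK."""
    try:
        import torch
    except ImportError:
        return ["torch is not installed"]

    problems = []
    if not is_nightly(torch.__version__):
        # This is the silent downgrade case, e.g. back to 2.5.1.
        problems.append(f"torch {torch.__version__} is not a nightly build")
    if not torch.cuda.is_available():
        problems.append("CUDA is not available to torch")
    elif required_arch not in torch.cuda.get_arch_list():
        # Assumption: the GPU needs kernels compiled for `required_arch`.
        problems.append(f"this torch build has no kernels for {required_arch}")
    return problems


if __name__ == "__main__":
    for problem in check_torch():
        print("WARNING:", problem)
```

If it prints a warning, that is the moment to force reinstall the nightly before going any further.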
Bottom line so far:
There is no good free local TTS solution with voice cloning for Bulgarian right now. ElevenLabs supports it natively but it's paid beyond 10k characters. If anyone has actually solved this I'd love to know.
I appreciate any help or suggestions on what software I can use to create my own audiobooks with a good-sounding cloned voice.
I also tried ElevenLabs, but they want so much money for one small book that I can't imagine what a 1000-page book would cost.
It's all for my own personal use, not for selling or sharing.
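To put a rough number on the 1000-page question, here is a back-of-envelope sketch. The 1800 characters per page is purely an assumption (real books vary a lot), the function names are made up for illustration, and the 10k and 100k quota sizes are just the figures mentioned in this thread, not verified plan details. It counts characters only and makes no claim about prices.

```python
import math

CHARS_PER_PAGE = 1800  # assumption: rough average page, spaces included


def estimate_characters(pages: int, chars_per_page: int = CHARS_PER_PAGE) -> int:
    """Approximate total characters in a book of `pages` pages."""
    return pages * chars_per_page


def quotas_needed(total_chars: int, quota_chars: int) -> int:
    """How many character quotas of size `quota_chars` cover `total_chars`."""
    return math.ceil(total_chars / quota_chars)


if __name__ == "__main__":
    total = estimate_characters(1000)
    print(total)                          # 1800000
    print(quotas_needed(total, 10_000))   # 180
    print(quotas_needed(total, 100_000))  # 18
```

Under that assumption a 1000-page book is around 1.8 million characters, which is 180 times a 10k allowance or 18 times a 100k one.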
Thanks a lot. x.o.x.o...
Maximum_Ad64@reddit
I've searched a lot and I found this. It gets the job done for me:
https://huggingface.co/beleata74/BgTTS-38M-V2
i88i8i8y@reddit
Try this one
https://github.com/Kugelaudio/kugelaudio-open
https://huggingface.co/spaces/multimodalart/kugelaudio
Binqta@reddit (OP)
Just tried it. It isn't working for audiobooks.
Sliouges@reddit
There is no good voice cloning, and unfortunately the market in Bulgaria is far too small for a "state of the art" product. In other words, there's no money in it...
Binqta@reddit (OP)
I liked it, quite a lot actually, but for personal use like mine it is too expensive. For Instagram content it might be worth it.
Sliouges@reddit
I checked. fish.audio has just opened it up (S2 Pro), but unfortunately the word stress sounds terrible. I tried it just now. It is better than the previous one, but that's a pity. I have no other ideas.
Binqta@reddit (OP)
Here is an example: Volume 1 of Чамкория. The 100k on the second tier is not even enough to read all 4/5 chapters, and for the full export they require another 400k "large" ones, or whatever they are called. I find that outrageous.
Jealous-Astronaut457@reddit
Unfortunately there is no decent TTS for Bulgarian apart from the paid ElevenLabs. I think F5-TTS could be a good base for training on Bulgarian, and then for cloning after that.
Binqta@reddit (OP)
I tried everything that GPT and Claude suggested, and the results are terrible.