DramaBox - Most Expressive Voice model ever based on LTX 2.3
Posted by manmaynakhashi@reddit | LocalLLaMA | 37 comments
The Most Expressive Voice Model.
Github: https://github.com/resemble-ai/DramaBox
Jeidoz@reddit
I am dumb dumb and GitHub's readme is not enough for me to run the project. Can someone share more detailed instructions? I suppose I may need to install some Python dependencies, download the models and put them somewhere, and toggle CUDA 13 usage?
toothpastespiders@reddit
I haven't tried it yet, but I'm always excited for this kind of thing just on a practical level for people with cancer or similar issues. People really don't get how horrible it is to have something so personal stolen by the thing killing you. It's not just about being able to say something out loud. It's about the personal nature of it being "your" voice, another thing that makes you who you are, being taken. Being able to clone your voice before it's lost, or even reclaim it from old recordings, can be such a huge win just in terms of quality of life.
markeus101@reddit
Always happy to see new open source TTS. Would be nice if they could run on edge devices, but I think if something like that existed it wouldn't be open source.
EndlessZone123@reddit
It feels like we hit 95% likeness, but the audio is still robotic and low quality.
Euchale@reddit
Yeah, I feel like I am taking crazy pills when I hear people say this sounds great. It's still far too echo-y.
manmaynakhashi@reddit (OP)
Use better reference audio and you'll get better sound; it can do voice cloning.
HelpfulHand3@reddit
It's based on LTX, so it's going to sound bad even if it is expressive.
It has nothing to do with the reference voice - this is how the audio in the videos sounds too.
ghulamalchik@reddit
People talk about the fidelity, the expressiveness, you're talking about the quality. Both are true.
RAZA_2666R@reddit
Finally, an open model that actually emotes like a real person.
silenceimpaired@reddit
Sounds like it was trained on the cartoon Joker from the Batman series.
Sanity_N0t_Included@reddit
Luke Skywalker?
manmaynakhashi@reddit (OP)
Nope, it's just a reference for voice cloning. It's based on LTX 2.3, so I think the base model might have been trained on that; I have just repurposed it for audio only.
ShawnnSmuts90@reddit
close.. not there though
TheGoddessInari@reddit
Huh. Random Conan.
Guinness@reddit
/r/gonewildaudio (NSFW) would fucking love this.
dyeusyt@reddit
Sounds perfect for indie game devs to use in their games.
Salt-Powered@reddit
Why would people who famously put their soul into their art use a soulless machine in their creation? AAA studios for sure, though.
wntersnw@reddit
Maybe finding and hiring voice actors, negotiating rates, licensing, budgeting, etc. feels more soulless than just creating what they want on their computer?
o5mfiHTNsH748KVq@reddit
Those types of people are vocal, but not the majority of developers. You find more of that mentality around the indie dev scene and people that hold their products on a pedestal.
Anyway, the future of gaming is dynamic content on demand. Not all games, but this is an emerging genre.
Sixhaunt@reddit
Lots of indie game devs have more specialized skillsets or enjoy certain aspects of game development more than others, so they would prefer to automate away the annoying parts and focus on the creative parts that they enjoy doing. Many people have a vision for something but not every single skillset required to pull it off.
iMakeSense@reddit
They say, typing on the soulless machine they're on, on this soulless website.
polawiaczperel@reddit
Costs
manmaynakhashi@reddit (OP)
To save the studio money?
o5mfiHTNsH748KVq@reddit
Don’t let gamers see this comment, they’ll fucking panic.
Disposable110@reddit
No one has 24GB of free VRAM though, especially not when running a game on the side that already wants at least half of that.
manmaynakhashi@reddit (OP)
You can run it on 8 GB of VRAM. For an indie game you can generate the audio and use it in the game, not literally run the model inside the game lmao.
Xp_12@reddit
They were probably thinking this was the other project. FWIW I got scenema running on a 5060 Ti 16 GB, but it didn't sound great at int8 and was slow with CPU offload. I'll give your project a go.
manmaynakhashi@reddit (OP)
Lots of use cases, more on the creative side than the agentic side.
polawiaczperel@reddit
I remember your first post a while ago. Thanks for the code.
rm-rf-rm@reddit
Yeah, grateful as well, as I wanted to find this again, and given how many new projects I come across every day I had no idea how I was going to find it, even in my GitHub stars.
manmaynakhashi@reddit (OP)
Thank you for the support.
ghulamalchik@reddit
Impressive fidelity, bad quality. I wish it didn't sound like they're speaking through a pipe.
Genebra_Checklist@reddit
Is it community only, or can we use it for monetized projects?
manmaynakhashi@reddit (OP)
I don't know. It's based on LTX 2.3, so I have to add the same license, according to what they have mentioned. I think you will be fine until you hit 10M; you can refer to the license. Not legal advice.
addictiveboi@reddit
This is AWESOME. I thought when I used LTX a couple of months ago "this has way better voice acting than TTS engines". You guys are awesome for actually creating this, and the fact that you have voice cloning as well is just mind blowing to me. Gonna download this and try it in a little bit!!!
EveningIncrease7579@reddit
What about scenema audio? Is this lighter?
manmaynakhashi@reddit (OP)
Yes, much lighter. If you offload the Gemma model, you can do inference in under 8 GB of VRAM.