mtmd: add Gemma 4 audio conformer encoder support
Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 10 comments
audio processing support for Gemma 4 models
andy2na@reddit
Would be amazing to somehow integrate this into Home Assistant voice assist as the STT
OpeningAd8687@reddit
Have you tried the open-source Sesame AI software for a natural voice?
sersoniko@reddit
You can use the wyoming_openai project, which is middleware between the two protocols.
andy2na@reddit
I currently use that for parakeet. I'll mess with it and see if I can get it working and whether it's better than parakeet
ML-Future@reddit
We need a new benchmark for this.
sterby92@reddit
When will the change land in llama.cpp? Looking forward to using this for my agent setup and getting rid of whisper :)
coder543@reddit
Gemma 4 only supports like 30 second audio clips, so… it’s a neat trick, but seems to be pretty limiting
sterby92@reddit
Looks like there is chunking in place?
From the PR: "30-second chunking (splits long audio into 30s segments)"
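For anyone wondering what that kind of chunking looks like, here is a minimal Python sketch of splitting audio into fixed 30-second segments. It is not the PR's code: the function name `split_into_chunks`, the 16 kHz sample rate, and the plain-list audio representation are all assumptions made for illustration.

```python
from typing import List, Sequence


def split_into_chunks(
    samples: Sequence[float],
    sample_rate: int = 16000,      # assumed sample rate, not taken from the PR
    chunk_seconds: float = 30.0,   # segment length quoted in the PR description
) -> List[Sequence[float]]:
    """Split mono audio samples into consecutive fixed-length segments.

    The final segment is shorter when the audio length is not an
    exact multiple of the chunk length.
    """
    if sample_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("sample_rate and chunk_seconds must be positive")
    chunk_len = int(sample_rate * chunk_seconds)
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


# Hypothetical usage: 70 seconds of silence at 16 kHz.
audio = [0.0] * (16000 * 70)
chunks = split_into_chunks(audio)
print([len(c) / 16000 for c in chunks])  # [30.0, 30.0, 10.0]
```

Hard cuts like this can split a word across two segments, so a real pipeline might overlap segments or cut at silences. Whether the PR does either is not stated in the quote above.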
sterby92@reddit
:(
jacek2023@reddit (OP)
it's merged