mtmd: add Gemma 4 audio conformer encoder support

Adds audio processing support for Gemma 4 models.

It would be amazing to integrate this into Home Assistant's voice assist as the STT (speech-to-text) engine.

Have you tried the open-source Sesame AI software for a more natural voice?

You can use the wyoming_openai project, which acts as middleware between the two protocols.

I currently use that for Parakeet. I'll mess with it and see if I can get it working, and whether it's better than Parakeet.

We need a new benchmark for this.

When will the change land in llama.cpp? Looking forward to using this for my agent setup and getting rid of Whisper :)

Gemma 4 only supports roughly 30-second audio clips, so… it's a neat trick, but it seems pretty limiting.

Looks like there is chunking in place?

From the PR: "30-second chunking (splits long audio into 30s segments)"
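For anyone wondering what "splits long audio into 30s segments" amounts to, here is a minimal sketch of naive fixed-length chunking. It assumes 16 kHz mono PCM samples and no overlap between segments; `chunk_audio` is a hypothetical helper for illustration, not the PR's code, and the actual implementation may handle resampling or segment boundaries differently.

```python
from typing import List, Sequence

SAMPLE_RATE_HZ = 16_000  # assumption: 16 kHz mono PCM input
CHUNK_SECONDS = 30       # segment length quoted in the PR description


def chunk_audio(samples: Sequence[float],
                sample_rate: int = SAMPLE_RATE_HZ,
                chunk_seconds: int = CHUNK_SECONDS) -> List[Sequence[float]]:
    """Hypothetical helper: split a mono sample buffer into consecutive
    fixed-length segments with no overlap.

    The last segment holds whatever is left over and may be shorter.
    """
    if sample_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("sample_rate and chunk_seconds must be positive")
    chunk_len = sample_rate * chunk_seconds  # 480,000 samples at 16 kHz / 30 s
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


if __name__ == "__main__":
    # 75 seconds of silence splits into 30 s + 30 s + 15 s
    audio = [0.0] * (75 * SAMPLE_RATE_HZ)
    chunks = chunk_audio(audio)
    print([len(c) / SAMPLE_RATE_HZ for c in chunks])  # [30.0, 30.0, 15.0]
```

With naive cuts like these, a boundary can land in the middle of a word, so each segment is transcribed without the context of its neighbors.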

It's merged.