What framework support audio / video input for gemma 4?
Posted by ResponsibleTruck4717@reddit | LocalLLaMA | View on Reddit | 4 comments
I tried with transformers but it was too slow.
llama.cpp doesnt support it.
So any good framework?
TokenRingAI@reddit
I haven't verified that it supports Gemma 4 in particular, but VLLM supports single/multi image, video, and audio input.
KokaOP@reddit
not audio, tested it just now, the docs are sheet, gemma4 requires latest vllm which has command for image and audio, exmaples are wacked , TBH just wait for llama.cpp
TokenRingAI@reddit
VLLM supports audio, have not tested it specifically with Gemma 4
https://docs.vllm.ai/en/stable/features/multimodal_inputs/#audio-inputs_1
VLLM is miles ahead of llama.cpp when it comes to fully supporting model features.
No-Blood-9115@reddit
you can search github. I remember seeing a framework handling visual input. but I forgot the name. mlx VL?