Looking for Llama.cpp Alternative to Run Recent Vision Language Models on Apple Silicon
Posted by chibop1@reddit | LocalLLaMA | 9 comments
I'm looking for a backend engine that can run recent VLMs (vision language models) on Apple Silicon.
I'm a huge fan of llama.cpp, but they've hardly paid attention to VLMs since they dropped VLM support from their server on March 7, 2024.
Unfortunately, none of the recent VLMs such as Qwen2-VL, Phi-3.5-vision, Idefics3, InternVL2, Yi-VL, Chameleon, CogVLM2, GLM-4v, etc. are supported. MiniCPM-V 2.6 is the only recent model that was added.
Instead of just waiting and wishing, I think it's time to move on and look for an alternative. :(
Thank you for your help!
Qnt-@reddit
I have no idea what you're asking, but try LLaVA on Ollama, it's wonderful
chibop1@reddit (OP)
Unfortunately, there are better models out there than LLaVA now. Also, Ollama uses llama.cpp under the hood, so it's the same situation.
dreamfoilcreations@reddit
Ollama just added MiniCPM-V 2.6, but yeah, we need more models. Qwen2-VL is very good too; hoping they add it.
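For anyone who wants to try it, here's a minimal sketch of querying MiniCPM-V 2.6 through Ollama's REST API. It assumes a local Ollama instance with the model pulled under the tag `minicpm-v`; the image path is just a placeholder.

```python
# Minimal sketch: send an image to a vision model via Ollama's /api/generate.
# Assumes `ollama pull minicpm-v` has been run and the daemon is on its
# default port; "photo.jpg" is a hypothetical local file.
import base64
import requests

with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "minicpm-v",
        "prompt": "Describe this image.",
        "images": [image_b64],
        "stream": False,
    },
)
print(resp.json()["response"])
```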
Qnt-@reddit
We just got the new Mistral :) Hope it's running normally on Ollama soon, that would be awesome
sammcj@reddit
Try out mistral.rs - it's pretty neat: it has vision support, plus a few fancy tricks to run completely different models together as MoEs.
bobby-chan@reddit
Take a look at https://pypi.org/project/mlx-vlm/
Some of the models you mention are supported here https://huggingface.co/mlx-community
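If it helps, a rough sketch of the mlx-vlm Python API is below, following its quick-start. The `generate()` signature has shifted between releases and the checkpoint id here is just an assumption, so check the README for your installed version; there's also a CLI entry point (`python -m mlx_vlm.generate`).

```python
# Rough sketch of mlx-vlm usage; argument order for generate() may
# differ between versions, and the checkpoint id is an assumption.
from mlx_vlm import load, generate

# Any converted VLM checkpoint from the mlx-community hub should work here.
model, processor = load("mlx-community/Qwen2-VL-2B-Instruct-4bit")

output = generate(
    model,
    processor,
    "photo.jpg",             # hypothetical local image path
    "Describe this image.",
    max_tokens=256,
)
print(output)
```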
chibop1@reddit (OP)
Thanks! It seems to support more recent models like Idefics2, DeepSeek-VL, and Phi-3.5-vision!
Hinged31@reddit
I would follow Awni Hannun for info about this. He’s been posting a lot lately about his experience with vision models. https://x.com/awnihannun?s=21&t=BVhfPLwVzzqRJOcJ7VU3tw
christianweyer@reddit
You could try mistral.rs, which supports some VLMs such as Phi-3.5 Vision. https://github.com/EricLBuehler/mistral.rs/blob/master/docs/PHI3V.md
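For what it's worth, mistral.rs exposes an OpenAI-compatible HTTP server (the linked doc has the exact launch command), so once it's running, a vision request could look roughly like this. The port, model id, and image URL are assumptions, not values from the docs.

```python
# Rough sketch: query a local mistral.rs server through its
# OpenAI-compatible endpoint. Port, model id, and image URL are assumed.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="phi3v",  # assumed id; match whatever the server was launched with
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
                {"type": "text", "text": "What is in this image?"},
            ],
        }
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```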