Looking for Llama.cpp Alternative to Run Recent Vision Language Models on Apple Silicon
Posted by chibop1@reddit | LocalLLaMA | 9 comments
I'm looking for a backend engine that can run recent VLMs (vision language models) on Apple Silicon.
I'm a huge fan of llama.cpp, but they've hardly paid attention to VLMs since they dropped VLM support from their server on March 7, 2024.
Unfortunately, none of the recent VLMs such as Qwen2-VL, Phi-3.5-vision, Idefics3, InternVL2, Yi-VL, Chameleon, CogVLM2, GLM-4v, etc. are supported. MiniCPM-V 2.6 is the only recent model that was added.
Instead of just waiting and wishing, I think it's time to move on and look for an alternative. :(
Thank you for your help!
Qnt-@reddit
I have no idea what you're asking, but try LLaVA on Ollama, it's wonderful
chibop1@reddit (OP)
Unfortunately, there are better models out there than LLaVA now. Also, Ollama uses llama.cpp under the hood, so it's the same situation.
dreamfoilcreations@reddit
Ollama just added MiniCPM-V 2.6, but yeah, we need more models. Qwen2-VL is very good too; hoping they add it.
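For anyone who wants to try it, here's a minimal sketch of querying MiniCPM-V 2.6 through Ollama's REST API. It assumes a local Ollama instance with the model pulled under the tag `minicpm-v`; the image path is just a placeholder.

```python
# Minimal sketch: send an image to a vision model via Ollama's /api/generate.
# Assumes `ollama pull minicpm-v` has been run and the daemon is on its
# default port; "photo.jpg" is a hypothetical local file.
import base64
import requests

with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "minicpm-v",
        "prompt": "Describe this image.",
        "images": [image_b64],
        "stream": False,
    },
)
print(resp.json()["response"])
```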
Qnt-@reddit
We just got the new Mistral :) Hope it's running normally on Ollama soon, that would be awesome
sammcj@reddit
Try out mistral.rs - it's pretty neat: it has vision support, plus a few fancy tricks to run completely different models together as MoEs.
bobby-chan@reddit
Take a look at https://pypi.org/project/mlx-vlm/
Some of the models you mention are supported here https://huggingface.co/mlx-community
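If it helps, a rough sketch of the mlx-vlm Python API is below, following its quick-start. The `generate()` signature has shifted between releases and the checkpoint id here is just an assumption, so check the README for your installed version; there's also a CLI entry point (`python -m mlx_vlm.generate`).

```python
# Rough sketch of mlx-vlm usage; argument order for generate() may
# differ between versions, and the checkpoint id is an assumption.
from mlx_vlm import load, generate

# Any converted VLM checkpoint from the mlx-community hub should work here.
model, processor = load("mlx-community/Qwen2-VL-2B-Instruct-4bit")

output = generate(
    model,
    processor,
    "photo.jpg",             # hypothetical local image path
    "Describe this image.",
    max_tokens=256,
)
print(output)
```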
chibop1@reddit (OP)
Thanks! It seems to support more recent models like Idefics2, DeepSeek-VL, and Phi-3.5-vision!
Hinged31@reddit
I would follow Awni Hannun for info about this. He’s been posting a lot lately about his experience with vision models. https://x.com/awnihannun?s=21&t=BVhfPLwVzzqRJOcJ7VU3tw
christianweyer@reddit
You could try mistral.rs, which supports some VLMs such as Phi-3.5 Vision. https://github.com/EricLBuehler/mistral.rs/blob/master/docs/PHI3V.md
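For what it's worth, mistral.rs exposes an OpenAI-compatible HTTP server (the linked doc has the exact launch command), so once it's running, a vision request could look roughly like this. The port, model id, and image URL are assumptions, not values from the docs.

```python
# Rough sketch: query a local mistral.rs server through its
# OpenAI-compatible endpoint. Port, model id, and image URL are assumed.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="phi3v",  # assumed id; match whatever the server was launched with
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
                {"type": "text", "text": "What is in this image?"},
            ],
        }
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```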