Mac Users: New Mistral Large MLX Quants for Apple Silicon

Posted by thezachlandes@reddit | LocalLLaMA | View on Reddit | 15 comments

Hey! I’ve created Q2 and Q4 MLX quants of the new Mistral Large for Apple Silicon. The Q2 is up, and the Q4 is uploading. I used the mlx-lm library to convert and quantize the full Mistral release.
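If you want to reproduce the conversion yourself, the mlx-lm Python API can do it in a few lines. This is a rough sketch rather than my exact invocation; the output path and group size below are just illustrative defaults:

```python
# Sketch: convert the full Mistral release to a quantized MLX model with mlx-lm.
# Parameters (output path, group size) are illustrative, not the exact ones I used.
from mlx_lm import convert

convert(
    "mistralai/Mistral-Large-Instruct-2411",        # full-precision source repo
    mlx_path="Mistral-Large-Instruct-2411-Q2-MLX",  # local output directory
    quantize=True,
    q_bits=2,          # use 4 for the Q4 variant
    q_group_size=64,   # mlx-lm's default quantization group size
)
```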

With Q2 I got 7.4 tokens/sec on my M4 Max with 128GB of RAM, and the model took about 42.3GB of RAM. These should run significantly faster than GGUF on M-series chips.

You can run this in LM Studio or any other app that supports MLX.
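If you'd rather run it from Python instead of LM Studio, mlx-lm can load and generate from the quantized repo directly. A minimal sketch (the prompt and max_tokens are just examples):

```python
# Minimal sketch: load the Q2 quant and generate with mlx-lm from Python.
from mlx_lm import load, generate

model, tokenizer = load("zachlandes/Mistral-Large-Instruct-2411-Q2-MLX")

# Build a chat-formatted prompt with the model's chat template.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Give me a one-line summary of MLX."}],
    tokenize=False,
    add_generation_prompt=True,
)

text = generate(model, tokenizer, prompt=prompt, max_tokens=200, verbose=True)
print(text)
```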

Models:

https://huggingface.co/zachlandes/Mistral-Large-Instruct-2411-Q2-MLX

https://huggingface.co/zachlandes/Mistral-Large-Instruct-2411-Q4-MLX