Mac Users: New Mistral Large MLX Quants for Apple Silicon
Posted by thezachlandes@reddit | LocalLLaMA | View on Reddit | 15 comments
Hey! I’ve created q2 and q4 MLX quants of the new Mistral Large for Apple Silicon. The q2 is up, and the q4 is uploading. I used the MLX-LM library for conversion and quantization from the full Mistral release.
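For anyone who wants to reproduce this, the conversion step with mlx-lm looks roughly like the sketch below (a minimal sketch only; the exact function signature and parameter names can vary between mlx-lm versions, so check the docs for the one you have installed):

```python
# Minimal sketch of converting + quantizing with mlx-lm (pip install mlx-lm).
# Parameter names are from recent mlx-lm versions and may differ slightly.
from mlx_lm import convert

convert(
    hf_path="mistralai/Mistral-Large-Instruct-2411",  # full-precision release
    mlx_path="Mistral-Large-Instruct-2411-Q2-MLX",    # output directory
    quantize=True,
    q_bits=2,         # 2 for the q2 quant, 4 for the q4 quant
    q_group_size=64,  # default group size
)
```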
With q2 I got 7.4 tokens/sec on my M4 Max with 128GB RAM, and the model took about 42.3GB of RAM. These should run significantly faster than GGUF on M-series chips.
You can run this in LM Studio or any other system that supports MLX.
Models:
https://huggingface.co/zachlandes/Mistral-Large-Instruct-2411-Q2-MLX
https://huggingface.co/zachlandes/Mistral-Large-Instruct-2411-Q4-MLX
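If you'd rather script it than use LM Studio, a minimal sketch with mlx-lm (assuming `pip install mlx-lm` on an Apple Silicon Mac; argument names may differ slightly between versions):

```python
# Rough sketch of running the q2 quant directly with mlx-lm.
from mlx_lm import load, generate

model, tokenizer = load("zachlandes/Mistral-Large-Instruct-2411-Q2-MLX")

prompt = "Explain the difference between MLX and GGUF in one paragraph."
# verbose=True also prints generation stats such as tokens/sec.
response = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
print(response)
```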
Durian881@reddit
Thank you very much!
thenomadexplorerlife@reddit
How good would Mistral Large q2 be compared to Llama 70B q4? I'm getting an M4 Pro with 64GB, but I was feeling bad that I can't run Mistral Large q4 due to the limited memory.
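For a rough sense of what fits in a given amount of unified memory, here is a back-of-the-envelope sketch. The bits-per-weight values are approximations that include some quantization overhead; real usage adds KV cache and runtime overhead on top, which is consistent with the ~42GB the OP reported for q2:

```python
# Back-of-the-envelope estimate of quantized weight size (approximate only).
def approx_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Mistral Large 2411 is ~123B parameters; Llama 70B is ~70B.
print(approx_weight_gb(123, 2.5))  # ~38 GB -> q2 fits in 64GB
print(approx_weight_gb(123, 4.5))  # ~69 GB -> q4 does not fit in 64GB
print(approx_weight_gb(70, 4.5))   # ~39 GB -> 70B q4 fits comfortably
```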
matadorius@reddit
Damn, I am wondering if I should go for 64GB rather than 48 now.
thezachlandes@reddit (OP)
64GB on the Max chip has a higher memory bandwidth than 48GB. Double-check to be sure, but that's what I figured out from the table on the MacBook Pro Wikipedia page.
matadorius@reddit
Yeah, but if I step up to the 16-inch with the Max chip, I might as well pay the €600 extra and get 128GB. Still, it seems like a waste of money to pay 2x what I initially wanted.
cm8ty@reddit
Curious to know the tok/sec w/ q4. Congrats on the new beast-of-a-machine btw
thezachlandes@reddit (OP)
Very slow: 0.58 tokens/sec. I'm sure there are use cases!
SomeOddCodeGuy@reddit
What processing time are you seeing on a larger prompt? Really curious to see what the total time is for MLX vs GGUF; I've only ever tried GGUFs on the Mac.
MaxDPS@reddit
I did a comparison between MLX and GGUF with Codestral earlier today. MLX was roughly 20% faster.
thezachlandes@reddit (OP)
I saw about 20% in a test I did with another model.
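For anyone wanting to run this kind of MLX-vs-GGUF comparison themselves (including prompt-processing time), a rough timing sketch with mlx-lm is below; note that recent mlx-lm versions will also print prompt and generation speeds if you pass `verbose=True` to `generate`:

```python
# Rough sketch: time a generation with mlx-lm and compute tokens/sec,
# to compare against a GGUF runtime on the same prompt. Approximate only.
import time
from mlx_lm import load, generate

model, tokenizer = load("zachlandes/Mistral-Large-Instruct-2411-Q4-MLX")

prompt = "Write a 200-word summary of the history of the Mac."
start = time.time()
response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
elapsed = time.time() - start

n_tokens = len(tokenizer.encode(response))
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.2f} tok/s")
```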
busylivin_322@reddit
Anyone know if MLX quants work with Ollama?
thezachlandes@reddit (OP)
It should
jzn21@reddit
For some reason all mistral large models run very slow on my M2 Ultra. Will try this one!
Such_Advantage_6949@reddit
I just got my Mac with a Max chip and I'm new to MLX. What library do I use to run it, and is there any format-enforcement option, like enforcing JSON output?
Special_System_6627@reddit
Try LM Studio.
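A hedged sketch of what JSON enforcement can look like once the model is loaded in LM Studio, via its OpenAI-compatible local server (default http://localhost:1234/v1). Structured-output support and the exact `response_format` shape depend on your LM Studio version, and the model name is whatever identifier LM Studio shows for the loaded model:

```python
# Sketch of schema-constrained JSON output through LM Studio's local server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

schema = {
    "name": "capital_info",
    "schema": {
        "type": "object",
        "properties": {
            "country": {"type": "string"},
            "capital": {"type": "string"},
        },
        "required": ["country", "capital"],
    },
}

resp = client.chat.completions.create(
    model="Mistral-Large-Instruct-2411-Q4-MLX",  # name of the model loaded in LM Studio
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    response_format={"type": "json_schema", "json_schema": schema},
)
print(resp.choices[0].message.content)
```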