Mistral medium 3.5 128B, MLX 4bit, ~70 GB
Posted by ex-arman68@reddit | LocalLLaMA | 12 comments
I converted Mistral medium 3.5 128B to MLX 4bit. Eagle model for speculative decoding is not yet supported by MLX.
Vision encoder included (full BF16, unquantized). Thinking mode works (reasoning_effort="high" gives you the [THINK]...[/THINK] chain), tool calling works, 256K context.
There was a bug in mlx-vlm's mistral3 sanitize function: it wasn't stripping the "model." prefix from vision tower and projector keys. This caused 438 parameters to be skipped. I patched it locally before converting. Details in the HF readme.
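For illustration only (the function and key names here are assumptions, not the actual mlx-vlm source): a minimal sketch of what such a sanitize fix looks like, stripping the leading "model." so checkpoint keys like "model.vision_tower.*" match the module tree:

```python
def sanitize(weights: dict) -> dict:
    """Strip the leading 'model.' prefix from checkpoint keys.

    Without this, keys such as 'model.vision_tower.patch_embed.weight'
    never match the module tree and the parameters get skipped.
    """
    sanitized = {}
    for key, value in weights.items():
        if key.startswith("model."):
            key = key[len("model."):]
        sanitized[key] = value
    return sanitized
```

The real patch may scope the strip to vision tower / projector keys only; this sketch applies it to every prefixed key.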
I am getting ~5 tok/s on a 96 GB M2 Max. For sampling I recommend using temp 0.7 / top_p 0.95 / top_k 20 in reasoning mode, or temp 0.0–0.7 / top_p 0.8 for quick replies. Mistral recommends leaving repeat penalty disabled, but I am getting too many loops; I am not sure what the best value should be.
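Not from the thread, but for context on the repeat-penalty question: a minimal sketch of the standard repetition-penalty rule (divide positive logits by the penalty, multiply negative ones), assuming plain Python lists of logits and generated token ids. A penalty of 1.0 is a no-op; values around 1.05–1.1 are a common starting point when loops appear:

```python
def apply_repeat_penalty(logits, generated_ids, penalty=1.1):
    """Penalize tokens that already appeared in the output.

    Standard rule: positive logits are divided by the penalty,
    negative logits are multiplied by it, so both move down.
    """
    out = list(logits)
    for t in set(generated_ids):
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out
```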
No_Algae1753@reddit
By the way, since you are using the same Mac as I do, can you share your setup and the settings you use? I'm open to every tip.
ex-arman68@reddit (OP)
put this in your .zshrc then type moreram in the terminal after each reboot:
No_Algae1753@reddit
Already set that to 96000 lol
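The alias itself is not shown in the thread; a commonly used version (assuming the macOS iogpu.wired_limit_mb sysctl, which the "96000" reply suggests) looks like this:

```shell
# Hypothetical ~/.zshrc alias; assumes the macOS iogpu.wired_limit_mb
# sysctl, which raises the cap on wired memory available to the GPU.
# The value is in MB and resets on reboot, hence re-running it each boot.
alias moreram='sudo sysctl iogpu.wired_limit_mb=96000'
```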
silenceimpaired@reddit
I really don't get why they don't release a 70b model again.
chuvadenovembro@reddit
Thanks, I plan to test it on a Mac Studio M2 Ultra with 128 GB
ex-arman68@reddit (OP)
This model seems utterly broken for now. I do not recommend downloading or using it, unless you are planning to help troubleshoot it. This is not a problem with the conversion, but with the model itself.
chuvadenovembro@reddit
I just tested it now and saw that it really is broken
starkruzr@reddit
Llama 2 architecture. It appears to exist just so that Euros can say they're using something built by Euros.
BaronRabban@reddit
Use RecViking/Mistral-Medium-3.5-128B-NVFP4.
It is very good and I confirm it works.
No_Algae1753@reddit
Nice work! Can you also upload one without vision? I think this could also reduce the memory footprint, right? And about the loops, they are working on this issue.
ex-arman68@reddit (OP)
This model seems utterly broken for now. I do not recommend downloading or using it, unless you are planning to help troubleshoot it. This is not a problem with the conversion, but with the model itself.
No_Algae1753@reddit
Yeah, I also experienced this issue with the GGUFs from unsloth