MiniMax M2 Llama.cpp support
Posted by ilintar@reddit | LocalLLaMA | 18 comments
By popular demand, here it is:
https://github.com/ggml-org/llama.cpp/pull/16831
I'll upload GGUFs to https://huggingface.co/ilintar/MiniMax-M2-GGUF. For now I'm uploading Q8_0 (no BF16/F16, since the original model was quantized to FP8) and generating an imatrix. I don't expect problems getting this PR accepted; as I said, the model is pretty typical :)
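For anyone curious what "uploading Q8_0 and generating an imatrix" involves, this is roughly the llama.cpp pipeline (a sketch: the model directory, calibration text, and output filenames are placeholders, not the exact commands used here):

```shell
# Convert the downloaded HF checkpoint straight to a Q8_0 GGUF.
python convert_hf_to_gguf.py ./MiniMax-M2 --outtype q8_0 --outfile minimax-m2-q8_0.gguf

# Compute an importance matrix from calibration text.
llama-imatrix -m minimax-m2-q8_0.gguf -f calibration.txt -o imatrix.dat

# Produce smaller imatrix-weighted quants from the Q8_0.
llama-quantize --imatrix imatrix.dat minimax-m2-q8_0.gguf minimax-m2-iq4_xs.gguf IQ4_XS
```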
Finanzamt_kommt@reddit
Even though I can't run it, you're a legend 🙏
ilintar@reddit (OP)
Me neither; Johannes Gaessler from the Llama.cpp team has kindly provided a server that can run/convert those beasts.
6969its_a_great_time@reddit
What kind of specs are on that thing?
ilintar@reddit (OP)
6 x 5090 and 512 GB RAM, I believe.
Muted-Celebration-47@reddit
I run Q2 on my single 3090 + 64 GB DDR5 and get 15-16 t/s. It's fast!
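For reference, that kind of speed on a single 24 GB GPU usually comes from keeping the small active set of an MoE model on the GPU while the expert tensors sit in system RAM. A sketch of one way to do that (flag names as in recent llama.cpp builds; the GGUF filename is a placeholder):

```shell
# Offload all layers to the GPU, then force the MoE expert tensors
# back onto the CPU, so only the active-parameter work stays in VRAM.
llama-server -m MiniMax-M2-Q2_K.gguf -ngl 99 --n-cpu-moe 999 -c 8192
```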
onil_gova@reddit
What are the VRAM requirements for MiniMax M2 at q4_0?
Tasty_Lynx2378@reddit
LM Studio reports:
- Cturan Q4_K GGUF: 138.34 GB
- MLX Community Q4 MLX: 128.69 GB
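As a quick sanity check on those numbers: a GGUF file's size is essentially parameter count times average bits per weight. Assuming roughly 230B total parameters for MiniMax M2 and about 4.8 effective bits/weight for a Q4_K mix (both ballpark assumptions):

```shell
# params * bits-per-weight / 8 bits-per-byte, in GB (10^9 bytes)
awk 'BEGIN { printf "%.0f GB\n", 230e9 * 4.8 / 8 / 1e9 }'
# prints: 138 GB
```

That lines up with the 138.34 GB figure above. Actual VRAM needs add KV cache and compute buffers on top, and with CPU offload only part of the model has to live in VRAM.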
AlbeHxT9@reddit
How fast do you want it?
bullerwins@reddit
Piotr's aren't up yet, but I have already uploaded quants here:
https://huggingface.co/bullerwins/MiniMax-M2-GGUF
Wait for his or bart's for the imatrix versions
spaceman_@reddit
Great! Am I correct in interpreting this PR as implementing the structure and architecture of Minimax M2, with all of the shaders and compute implementations reused from other existing models?
ilintar@reddit (OP)
Yeah, that's how Llama.cpp works: it's modular and operation-based, so when a model introduces no new operations, it reuses the existing optimized implementations.
spaceman_@reddit
Interesting! Thanks for taking the time to respond and explain.
The PR mentions that there is no chat template yet, as this model has interleaved think blocks. I'm guessing this also means that most tools won't be able to work with this model out of the box without changes on the client side?
ilintar@reddit (OP)
I guess so, but I might actually detach tool calling from reasoning support and just try to add tool calls if they don't work out of the box.
No_Conversation9561@reddit
https://buymeacoffee.com/ilintar
lumos675@reddit
I am one of his supporters, and I will support him again. Guys, please support such a genius; let him have as much money as he wants so he can focus on his work.
Leflakk@reddit
You are amazing, thank you
noctrex@reddit
Excellent work, as always!
AccordingRespect3599@reddit
Piotr is unstoppable.