MiniMax M2 Llama.cpp support

Posted by ilintar@reddit | LocalLLaMA

By popular demand, here it is:

https://github.com/ggml-org/llama.cpp/pull/16831

I'll be uploading GGUFs to https://huggingface.co/ilintar/MiniMax-M2-GGUF. For now I'm uploading Q8_0 (no BF16/F16, since the original model was released quantized in FP8) and generating an imatrix. I don't expect any problems getting this PR accepted; as I said, the model is pretty typical :)
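
If you want to try it once the upload finishes, the usual llama.cpp workflow should apply. A minimal sketch, assuming you build from the PR branch until it's merged and that the repo exposes a Q8_0 quant under that tag (check the HF repo for the actual filenames/tags):

```bash
# Build llama.cpp from the PR branch (or master once the PR is merged)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build && cmake --build build --config Release -j

# Pull and run the GGUF straight from Hugging Face.
# The ":Q8_0" tag is an assumption - adjust to whatever quants end up in the repo.
./build/bin/llama-cli -hf ilintar/MiniMax-M2-GGUF:Q8_0 -p "Hello" -n 128

# Or serve it over the OpenAI-compatible API instead:
./build/bin/llama-server -hf ilintar/MiniMax-M2-GGUF:Q8_0
```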