MiniMax-M2 llama.cpp

Posted by butlan@reddit | LocalLLaMA

I tried to implement it; it's fully Cursor-generated AI slop code, sorry. The chat template is strange, and I'm 100% sure it's not correctly implemented, but at least it works with Roo Code (Q2 is bad, Q4 is fine). Anyone who wants to burn 100 GB of bandwidth can give it a try.

Test device and command: 2x RTX 4090 and a lot of RAM

./llama-server -m minimax-m2-Q4_K.gguf -ngl 999 --cpu-moe --jinja -fa on -c 50000 --reasoning-format auto
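Once the server is up, a quick sanity check is to hit llama-server's OpenAI-compatible chat endpoint. A minimal sketch, assuming the default port 8080 (adjust if you pass --port):

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Write hello world in C."}],"max_tokens":256}'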

Code: here. GGUF: here.

Demo video: https://reddit.com/link/1oilwvm/video/ofpwt9vn4xxf1/player