MiniMax-M2 llama.cpp
Posted by butlan@reddit | LocalLLaMA | View on Reddit | 11 comments
I tried to implement it; it's fully Cursor-generated AI slop code, sorry. The chat template is strange, and I'm 100% sure it's not correctly implemented, but it works with Roo Code at least (Q2 is bad, Q4 is fine). Anyone who wants to spend 100 GB of bandwidth can give it a try.
Test device and command: 2x 4090 and a lot of RAM
./llama-server -m minimax-m2-Q4_K.gguf -ngl 999 --cpu-moe --jinja -fa on -c 50000 --reasoning-format auto
https://reddit.com/link/1oilwvm/video/ofpwt9vn4xxf1/player
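If anyone wants to sanity-check the server before hooking up Roo Code, a plain OpenAI-style request against llama-server's built-in endpoint is enough. This is a minimal sketch assuming the default 127.0.0.1:8080 from the command above (adjust if you pass --host/--port); the model name is just a placeholder, llama-server serves whichever GGUF it loaded.

# Minimal sanity check against llama-server's OpenAI-compatible API.
# Assumes the default host/port from the command above.
import requests

resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "model": "minimax-m2",  # placeholder; the server uses the loaded model
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 128,
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])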
jacek2023@reddit
what about
https://github.com/ggml-org/llama.cpp/pull/16831
FullstackSensei@reddit
Cursor can handle 20k-line files?!! Dang!!!
butlan@reddit (OP)
Up to 50k lines is fine.
Qwen30bEnjoyer@reddit
How does the Q2 compare to GPT-OSS 120B Q4 or GLM 4.5 Air Q4, given that they have roughly the same memory footprint and all three are at the limit of what I can run on my laptop?
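For a rough sense of why those three end up in a similar ballpark, here's a back-of-the-envelope estimate (the parameter counts and bits-per-weight below are assumptions from memory, not measured GGUF sizes):

# Very rough GGUF size: total params * bits per weight / 8, ignoring
# embeddings, KV cache and runtime overhead. Counts are approximate.
models = {
    "MiniMax-M2 Q2_K": (230e9, 3.0),   # ~230B total params, ~3 bits/weight avg
    "GPT-OSS-120B Q4": (117e9, 4.6),   # ~117B total params
    "GLM-4.5-Air Q4":  (106e9, 4.6),   # ~106B total params
}
for name, (params, bpw) in models.items():
    print(f"{name}: ~{params * bpw / 8 / 1e9:.0f} GB")
# Prints roughly 86, 67 and 61 GB: the same broad size class, even if the
# Q2 file is somewhat larger than the other two.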
butlan@reddit (OP)
It's much better than GPT-OSS 120B for my use case.
ilintar@reddit
Thanks, I made a stupid mistake in the (non-vibe-coded :>) implementation I'm working on, so it's nice to have a working one to run comparisons against ;>
butlan@reddit (OP)
I saw your comment about the chat template being tricky; you were spot on. Wise man! I bet you could implement it properly in a day. The model doesn't even look that complex, though it's still kind of a mystery whether this implementation actually works right.
ilintar@reddit
I did implement it, in fact, by popular demand ;> But the chat implementation will have to wait a bit, since we have to figure out how to properly serve interleaved thinking (a non-trivial issue; for now it's best to leave all the thinking parsing to the client).
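For anyone wiring this up client-side in the meantime, here's a minimal sketch of splitting interleaved thinking out of a response, assuming the model wraps reasoning in <think>...</think> tags (an assumption about the M2 output format; adjust the delimiters to whatever the template actually emits):

# Split interleaved reasoning from the visible answer on the client.
# The <think>...</think> delimiters are an assumption, not confirmed.
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thinking(text: str) -> tuple[list[str], str]:
    """Return (reasoning_segments, visible_text)."""
    reasoning = THINK_RE.findall(text)
    visible = THINK_RE.sub("", text).strip()
    return reasoning, visible

thoughts, answer = split_thinking(
    "<think>check the units first</think>The answer is 42.<think>done</think>"
)
print(thoughts)  # ['check the units first', 'done']
print(answer)    # 'The answer is 42.'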
muxxington@reddit
Pretty cool. We always have to remember that things will never be worse than this; they can only get better.
FullOf_Bad_Ideas@reddit
You should 100% update the model card on HF to mention the fork you're using to run it. Otherwise it will confuse people a lot. Great stuff otherwise!
solidsnakeblue@reddit
Dang, nicely done