MiniMax M2.7 GGUF Investigation, Fixes, Benchmarks

Posted by danielhanchen@reddit | LocalLLaMA

Hey r/LocalLLaMA, we investigated MiniMax-M2.7 GGUFs producing NaNs during perplexity evaluation. Our findings show the issue affects 21%-38% of all MiniMax-M2.7 GGUFs on Hugging Face (not just ours).
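If you want to check a quant yourself, here's a minimal sketch that shells out to llama.cpp's llama-perplexity tool and scans its output for NaNs. The model and text file paths are placeholders, and it assumes a recent llama.cpp build with llama-perplexity on your PATH:

```python
import re
import subprocess

# Placeholder paths -- substitute your own GGUF and calibration text.
MODEL = "MiniMax-M2.7-Q4_K_M.gguf"
TEXT = "wiki.test.raw"

# Run llama.cpp's perplexity tool and capture everything it prints.
proc = subprocess.run(
    ["llama-perplexity", "-m", MODEL, "-f", TEXT],
    capture_output=True,
    text=True,
)
output = proc.stdout + proc.stderr

# A broken quant typically shows "nan" in the chunk-by-chunk PPL readout.
if re.search(r"\bnan\b", output, re.IGNORECASE):
    print("NaN detected in perplexity output -- quant is likely broken.")
else:
    print("No NaNs found in perplexity output.")
```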

Which quants did we test?

Also, CUDA 13.2 is still definitely an issue: it causes some low-bit quants of all models to produce gibberish. Some people have dismissed it as not being a real issue, but from what we've seen, more than 50 people have now confirmed that downgrading to CUDA 13.1 or lower fixes it. You can also see some of the public comments in our Hugging Face discussions, Reddit posts, etc. NVIDIA has acknowledged that they are investigating - see Unsloth issue 4849 and llama.cpp issues 21255 and 21371.
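As a quick sanity check before debugging gibberish output, here's a minimal sketch that parses the installed CUDA toolkit version from nvcc and warns if it's 13.2 or newer. The version threshold comes from the report above, and the nvcc output format it parses is an assumption about a typical install:

```python
import re
import subprocess

# Query the installed CUDA toolkit version via nvcc.
out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout

# nvcc typically prints a line like "Cuda compilation tools, release 13.2, V13.2.xx".
match = re.search(r"release (\d+)\.(\d+)", out)
if match:
    major, minor = int(match.group(1)), int(match.group(2))
    if (major, minor) >= (13, 2):
        print(f"CUDA {major}.{minor} detected: low-bit quants may produce "
              "gibberish; consider downgrading to CUDA 13.1 or lower.")
    else:
        print(f"CUDA {major}.{minor} detected: should be unaffected.")
else:
    print("Could not parse nvcc output; is the CUDA toolkit installed?")
```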

If you have any questions, please do ask, and thank you again for all the support as always. Appreciate it, and hope you have a lovely week!