llama.cpp's Preliminary SM120 Native NVFP4 MMQ Is Merged

Posted by ggonavyy@reddit | LocalLLaMA | 38 comments

https://github.com/ggml-org/llama.cpp/pull/22196

And somehow we already have some GGUFs for it:

https://huggingface.co/CISCai/gemma-4-31B-it-NVFP4-turbo-GGUF

https://huggingface.co/stevelikesrhino/gemma-4-31B-it-nvfp4-GGUF

(the one below is from the PR author himself)

https://huggingface.co/michaelw9999/Nemotron-Cascade-2-30B-A3B-NVFP4-GGUF

https://huggingface.co/valikk123/Qwen3.5-35B-A3B-NVFP4-GGUF