Is Mistral-3.5-Medium-128B broken in Llama CPP?
Posted by EmPips@reddit | LocalLLaMA | View on Reddit | 7 comments
Trying some of Bartowski's Q4 quants. Using Vulkan with the latest main branch as of a few hours ago.
The model is coherent - but incredibly weak. I've tried a few sampling settings and toggled reasoning on and off. It lacks knowledge depth on topics that Magistral Small handles decently, and its code tasks fail to run, let alone produce anything that would register on SWE-Bench.
Wondering if anyone's put more time in, tried vLLM, or tried other quants of this model and had a better experience?
a_beautiful_rhind@reddit
When in doubt, try the hosted version from the company itself for some number of messages. Gemma was different for a while so I assume the same story here. The quants might even be fine but the implementation isn't finished.
pmttyji@reddit
https://huggingface.co/unsloth/Mistral-Medium-3.5-128B-GGUF/discussions/1#69f2574c5d2a92da86823371
Mistral has now labeled GGUF support as a WIP (work in progress). The issue appears most likely to be with the current GGUF parser. Will update you guys once resolved! Thank you.
The vision issue was also something NVIDIA and Mistral ran into while converting the GGUFs, so that needs to be investigated as well.
ambient_temp_xeno@reddit
The parser, now there's a surprise. They had to manually add a special one for Gemma 4.
Flinchie76@reddit
I tried the full unquantized version on vLLM nightly. Gave it a Python coding task to build an actor system inspired by Akka and Erlang/BEAM. It tried to define a method called `def /:` for operator overloading in Python (which isn't valid syntax), and did various other things like writing the code in `/tmp` despite being instructed to "use the current directory", which made it unusable for me. There are better models in that size range.
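For context on why `def /:` is a red flag: Python doesn't let you name a method after an operator symbol (that's a `SyntaxError`); operator overloading goes through "dunder" methods like `__truediv__`. A minimal sketch of what a correct version would look like (the `Ratio` class here is just an illustrative example, not from the original task):

```python
# Python operator overloading uses special "dunder" methods,
# not literal operator names like `def /:` (which is a SyntaxError).
class Ratio:
    def __init__(self, num, den):
        self.num = num
        self.den = den

    def __truediv__(self, other):
        # `a / b` dispatches to a.__truediv__(b)
        return Ratio(self.num * other.den, self.den * other.num)

    def __repr__(self):
        return f"Ratio({self.num}, {self.den})"

r = Ratio(1, 2) / Ratio(3, 4)
print(r)  # Ratio(4, 6)
```

A model that emits Scala-style operator definitions in Python is usually a sign of broken inference or a damaged quant rather than a sampling issue.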
Terminator857@reddit
Bugs seem to be found after every major new model release and get fixed quickly in the first week.
seamonn@reddit
Surprised Pikachu face
ResidentPositive4122@reddit
As usual, give it a few weeks. There are "gremlins" everywhere, not just in gpt5.5 :)