Is Mistral-3.5-Medium-128B broken in Llama CPP?
Posted by EmPips@reddit | LocalLLaMA | View on Reddit | 7 comments
Trying some of Bartowski's Q4 quants. Using Vulkan with the latest main branch as of a few hours ago.
The model is coherent - but incredibly weak. I've tried a few sampling settings and toggled reasoning on and off. It lacks knowledge depth on topics that Magistral Small handles decently, and its code tasks fail to run, let alone produce anything that would register on SWE-Bench.
Wondering if anyone's put more time in, tried vLLM, or tried other quants of this model and had a better experience?
a_beautiful_rhind@reddit
When in doubt, try the hosted version from the company itself for some number of messages. Gemma was different for a while so I assume the same story here. The quants might even be fine but the implementation isn't finished.
pmttyji@reddit
https://huggingface.co/unsloth/Mistral-Medium-3.5-128B-GGUF/discussions/1#69f2574c5d2a92da86823371
Mistral has now labeled GGUF support as a WIP (work in progress). The issue appears most likely to be with the current GGUF parser. Will update you guys once resolved! Thank you.
The vision issue was also something NVIDIA and Mistral ran into while converting the GGUFs, so that needs to be investigated as well.
ambient_temp_xeno@reddit
The parser, now there's a surprise. They had to manually add a special one for Gemma 4.
Flinchie76@reddit
I tried the full unquantized version on vLLM nightly. Gave it a Python coding task to build an actor system inspired by Akka and Erlang/BEAM. It tried to define a method called `def /:` for operator overloading in Python (which isn't valid syntax), and did various other things like writing the code in `/tmp` despite being instructed to "use the current directory", which made it unusable for me. There are better models in that size range.
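For context on why `def /:` is a red flag: Python doesn't let you name a method after an operator symbol (that's a `SyntaxError`); operator overloading goes through "dunder" methods like `__truediv__`. A minimal sketch of what a correct version would look like (the `Ratio` class here is just an illustrative example, not from the original task):

```python
# Python operator overloading uses special "dunder" methods,
# not literal operator names like `def /:` (which is a SyntaxError).
class Ratio:
    def __init__(self, num, den):
        self.num = num
        self.den = den

    def __truediv__(self, other):
        # `a / b` dispatches to a.__truediv__(b)
        return Ratio(self.num * other.den, self.den * other.num)

    def __repr__(self):
        return f"Ratio({self.num}, {self.den})"

r = Ratio(1, 2) / Ratio(3, 4)
print(r)  # Ratio(4, 6)
```

A model that emits Scala-style operator definitions in Python is usually a sign of broken inference or a damaged quant rather than a sampling issue.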
Terminator857@reddit
Bugs seem to be found after every major new model release and get fixed quickly in the first week.
seamonn@reddit
Surprised Pikachu face
ResidentPositive4122@reddit
As usual, give it a few weeks. There are "gremlins" everywhere, not just in gpt5.5 :)