Unsloth solved bug in Mistral Medium 3.5 implementation
Posted by Snail_Inference@reddit | LocalLLaMA | View on Reddit | 38 comments
https://unsloth.ai/docs/models/mistral-3.5
"May 1, 2026 Update: We worked with Mistral to fix Mistral Medium 3.5 inference affecting some implementations, and released updated GGUFs with the fix (NOT related to Unsloth or our quants). The issue was caused by a YaRN parsing quirk affecting several implementations, including transformers and llama.cpp. Changing mscale_all_dim from 1 to 0 resolved it. We also fixed mmproj files not being generated correctly."
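For readers who want to see what the described change looks like, here is a minimal sketch of patching the YaRN `rope_scaling` block in a model config. Only the `mscale_all_dim` field and its 1 → 0 change come from the note above; the other keys and values are illustrative assumptions, and real key names vary by implementation.

```python
import json

# Illustrative sketch of the described fix: the YaRN `rope_scaling` block in
# the model's config carries an `mscale_all_dim` field, and the fix was
# changing its value from 1 to 0. Other keys/values here are assumptions.
config = {
    "rope_scaling": {
        "rope_type": "yarn",   # assumed key name; varies by implementation
        "factor": 8.0,         # illustrative value, not from the thread
        "mscale_all_dim": 1,   # pre-fix value
    }
}

# Apply the change described in the update note.
config["rope_scaling"]["mscale_all_dim"] = 0
print(json.dumps(config["rope_scaling"]))
```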
yoracale@reddit
Thank you to the Mistral team for working with us on this. And thank you to the first few people who reported that the GGUFs didn't work properly when conversations broke down at longer context. It was a tricky bug, but glad it all works now.
So be sure to try out the model again, whether in transformers or GGUF format. It really is great!
Major-System6752@reddit
Hello. Great work. Do you know about the reported "training bug in Qwen3.5 35B A3B model"? Is it true, and does it affect your quants?
yoracale@reddit
Did you read OP's first comment? It says: "The bug is in the original Qwen 3.5 weights released by Alibaba. Not GGUF. Not HauhauCS. Alibaba shipped it broken. I just fixed it. The cause is training-related - AdamW + MoE + DeltaNet causes rare experts in the last layers to drift. This is a known challenge with recurrent MoE architectures, but Alibaba didn't calibrate it before release."
Major-System6752@reddit
https://www.reddit.com/r/LocalLLaMA/comments/1sfwauj/comment/ofeclx1/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
DeepOrangeSky@reddit
Was that the same bug that was being discussed in this thread: https://old.reddit.com/r/LocalLLaMA/comments/1t09anw/mistral_medium_35_128b_mlx_4bit_70_gb/ ?
danielhanchen@reddit
Yes, it should most likely fix that as well! The MLX folks will have to reconvert, though.
No_Hunter_7786@reddit
About time, this bug was causing weird outputs for a lot of people.
crantob@reddit
Unsloth are some of my favorite teachers.
Limp_Classroom_2645@reddit
Hot damn
uti24@reddit
Yeah, it's been some time since release and there are still no good reviews of Mistral Medium 128B, and it will be a while until LM Studio gets an update to run it. Not good.
BaronRabban@reddit
I can give a review. It's good. Very good. Definitely a step up from the 123B Mistral Large.
Creative, unique. It is top notch. I am giving it an A+++. I have it running on vllm and llama.cpp, and I am going to try to make an exl3 quant.
The launch was botched, and hopefully that doesn't taint people's perspective of the model. They would be missing out.
segmond@reddit
woot woot! Let's sing praises to team unsloth. Whilst y'all download models from whomever on HF, remember who made this happen before you start yapping away about how you don't like unsloth's quants.
yoracale@reddit
Thanks for the constant support really appreciate it! 🙏🥰
schneeble_schnobble@reddit
I'm continuously impressed with how awesome the team at Unsloth is. Not only providing amazing service to the community, but also diving into the hard stuff and working with providers and other oss projects again and again.
Regular-Forever5876@reddit
you chooms are incredible 🎉😇
danielhanchen@reddit
Thank you! :)
Regular-Forever5876@reddit
would you chooms from the unsloth team come and do a quick interview on our YouTube channel?
relmny@reddit
And that's why Unsloth releasing models as soon as possible is a good thing, and not a bad thing as some claim.
mantafloppy@reddit
You don't need users to test your model when just loading the model fails.
Unsloth constantly skips basic quality checks just to be first to release, wasting users' bandwidth and time.
Being able to fix things doesn't exonerate them from their repeated, constant, expected sloppy release practices.
segmond@reddit
Anyone who's a professional software developer will understand the importance of a fast feedback cycle for finding and fixing bugs. A lot of the complainers are just non-tech folks without appreciation for the craft.
jacek2023@reddit
I posted my Mistral Medium experiences tonight and got downvoted by reddit experts because "unsloth said it doesn't work correctly" :)
autonomousdev_@reddit
spent all weekend chasing a memory leak in some mistral fork. attention mask was getting computed twice. unsloth found it in like 5 minutes. 30 hours gone. now i just figure every new llm thing has at least one of these bugs built in
danielhanchen@reddit
Oh no no - we need more community help! Keep doing what you do - when we started finding issues, it took a lot of time as well - so keep it up - the community needs all the help on making OSS models work well!
ThePrimeClock@reddit
I've been trying and failing to create a Leanstral quant for some time so I can use it locally. Any chance you could pop that on your list to look at sometime?
AXYZE8@reddit
You replied to a bot btw
All of his comments that are fully lowercase follow the same structure ("did this, got that"), and he writes about outdated stuff like DALL-E: https://www.reddit.com/r/ChatGPT/comments/1szvtvz/comment/oj6b0ha/?context=3
The ones that are not fully lowercase are his own comments, and they suddenly don't follow that structure.
a_beautiful_rhind@reddit
Great news! I'm itching to try it and someone volunteered to port to IK_llama.
Cr4xy@reddit
I'm not sure if it's been fixed already, but the Devstral 2 Small template also has tool-calling issues. Maybe the fix could be included in the unsloth GGUFs? https://www.reddit.com/r/MistralAI/comments/1q2u60e/comment/nzn5u1z/
danielhanchen@reddit
Julien from Mistral added a nice note as well here: https://huggingface.co/mistralai/Mistral-Medium-3.5-128B/discussions/18
DigThatData@reddit
YaRN parsing?
danielhanchen@reddit
https://huggingface.co/unsloth/Mistral-Medium-3.5-128B-GGUF/discussions/6 and https://huggingface.co/mistralai/Mistral-Medium-3.5-128B/commit/c4be198050fb5789774a55b92ed697becfbf20ae
DigThatData@reddit
ok here we go, YaRN is a post-training method for lengthening context. https://arxiv.org/pdf/2309.00071
danielhanchen@reddit
Oh yes that paper is correct!
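To connect the paper to the config flag discussed above: YaRN scales attention by a "temperature" term, and `mscale`-style coefficients feed into it. The sketch below is modeled on DeepSeek-style YaRN code; whether Mistral Medium 3.5's implementations compute it exactly this way is an assumption, but it shows how flipping `mscale_all_dim` from 1 to 0 can change the effective attention scale.

```python
import math

def yarn_get_mscale(scale: float, mscale: float) -> float:
    # YaRN attention-temperature rule from the paper:
    # sqrt(1/t) = 0.1 * ln(s) + 1 for context-extension factor s,
    # weighted here by an implementation-specific `mscale` coefficient.
    if scale <= 1.0 or mscale == 0:
        return 1.0
    return 0.1 * mscale * math.log(scale) + 1.0

def effective_mscale(scale: float, mscale: float, mscale_all_dim: float) -> float:
    # In DeepSeek-style implementations (an assumption here, not confirmed
    # for Mistral Medium 3.5), the applied factor is a ratio of two mscale
    # terms, so the value of `mscale_all_dim` changes the effective
    # attention scaling at long context.
    return yarn_get_mscale(scale, mscale) / yarn_get_mscale(scale, mscale_all_dim)

# With mscale == mscale_all_dim == 1 the ratio cancels to exactly 1.0;
# with mscale_all_dim == 0 the denominator is 1.0 and the scaling applies.
print(effective_mscale(4.0, 1.0, 1.0))  # 1.0
print(effective_mscale(4.0, 1.0, 0.0))  # ~1.139
```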
brown2green@reddit
Did this affect Ministral 3 too? That one also uses YaRN, with "mscale_all_dim": 1.0, and to me that model never worked right.
danielhanchen@reddit
That should be OK for now. I can re-check, but Ministral's YaRN is fine; we did a large sweep as part of this fix.
ambient_temp_xeno@reddit
Good work, sounds like it was a very sneaky bug.
danielhanchen@reddit
Thank you!
spaceman_@reddit
Unsloth's GGUFs were updated 6-7 hours ago, 6 hours before the README was updated about the fix.
Do last night's GGUFs include this fix? Can I pull the models now and try them out?
yoracale@reddit
Yes, they have the fix; we just hadn't updated people about it yet.