Unsloth solved bug in Mistral Medium 3.5 implementation
Posted by Snail_Inference@reddit | LocalLLaMA | View on Reddit | 38 comments
https://unsloth.ai/docs/models/mistral-3.5
"May 1, 2026 Update: We worked with Mistral to fix Mistral Medium 3.5 inference affecting some implementations, and released updated GGUFs with the fix (NOT related to Unsloth or our quants). The issue was caused by a YaRN parsing quirk affecting several implementations, including transformers and llama.cpp. Changing mscale_all_dim from 1 to 0 resolved it. We also fixed mmproj files not being generated correctly."
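For readers who want to see what the described change looks like, here is a minimal sketch of patching the YaRN `rope_scaling` block in a model config. Only the `mscale_all_dim` field and its 1 → 0 change come from the note above; the other keys and values are illustrative assumptions, and real key names vary by implementation.

```python
import json

# Illustrative sketch of the described fix: the YaRN `rope_scaling` block in
# the model's config carries an `mscale_all_dim` field, and the fix was
# changing its value from 1 to 0. Other keys/values here are assumptions.
config = {
    "rope_scaling": {
        "rope_type": "yarn",   # assumed key name; varies by implementation
        "factor": 8.0,         # illustrative value, not from the thread
        "mscale_all_dim": 1,   # pre-fix value
    }
}

# Apply the change described in the update note.
config["rope_scaling"]["mscale_all_dim"] = 0
print(json.dumps(config["rope_scaling"]))
```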
yoracale@reddit
Thank you to the Mistral team for working with us on this. And thank you to the first few people who reported that the GGUFs didn't work properly when conversations broke down at longer context. It was a tricky bug, but glad it all works now.
So be sure to try out the model again, whether in transformers or GGUF format. It really is great!
Major-System6752@reddit
Hello. Great work. Do you know about the reported "training bug in Qwen3.5 35B A3B model"? Is it true, and does it affect your quants?
yoracale@reddit
Did you read OP's first comment? It says: "The bug is in the original Qwen 3.5 weights released by Alibaba. Not GGUF. Not HauhauCS. Alibaba shipped it broken. I just fixed it. The cause is training-related - AdamW + MoE + DeltaNet causes rare experts in the last layers to drift. This is a known challenge with recurrent MoE architectures, but Alibaba didn't calibrate it before release."
Major-System6752@reddit
https://www.reddit.com/r/LocalLLaMA/comments/1sfwauj/comment/ofeclx1/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
DeepOrangeSky@reddit
Was that the same bug that was being discussed in this thread: https://old.reddit.com/r/LocalLLaMA/comments/1t09anw/mistral_medium_35_128b_mlx_4bit_70_gb/ ?
danielhanchen@reddit
Yes, it should most likely fix that as well! The MLX folks will have to reconvert, though.
No_Hunter_7786@reddit
About time, this bug was causing weird outputs for a lot of people.
crantob@reddit
Unsloth are some of my favorite teachers.
Limp_Classroom_2645@reddit
Hot damn
uti24@reddit
Yeah, it's been some time since release and there are still no good reviews of Mistral Medium 128B, and it will be a while until LM Studio gets an update to run it. Not good.
BaronRabban@reddit
I can give a review. It's good. Very good. Definitely a step up from the 123B Mistral Large.
Creative, unique. It is top notch. I am giving it an A+++. I have it running on vllm and llama.cpp, and I am going to try to make an exl3 quant.
The launch was botched, and hopefully that doesn't taint people's perspective of the model. They would be missing out.
segmond@reddit
woot woot! Let's sing praises to team unsloth. Whilst y'all download models from whomever on HF, remember who made this happen before you start yapping away about how you don't like unsloth's quants.
yoracale@reddit
Thanks for the constant support really appreciate it! 🙏🥰
schneeble_schnobble@reddit
I'm continuously impressed with how awesome the team at Unsloth is. Not only providing amazing service to the community, but also diving into the hard stuff and working with providers and other oss projects again and again.
Regular-Forever5876@reddit
you chooms are incredible 🎉😇
danielhanchen@reddit
Thank you! :)
Regular-Forever5876@reddit
would you chooms from the unsloth team come and do a quick interview on our YouTube channel?
relmny@reddit
And that's why Unsloth releasing models as soon as possible is a good thing, and not a bad thing as some claim.
mantafloppy@reddit
You don't need users to test your model when just loading the model fails.
Unsloth constantly skips basic quality checks just to be first to release, wasting users' bandwidth and time.
Being able to fix things doesn't exonerate them from their repeated, constant, expected sloppy release practices.
segmond@reddit
Anyone who's a professional software developer will understand the importance of a fast feedback cycle for finding and fixing bugs. A lot of the complainers are just non-tech folks without appreciation for the craft.
jacek2023@reddit
I posted my Mistral Medium experiences tonight and got downvoted by reddit experts because "unsloth said it doesn't work correctly" :)
autonomousdev_@reddit
spent all weekend chasing a memory leak in some mistral fork. attention mask was getting computed twice. unsloth found it in like 5 minutes. 30 hours gone. now i just figure every new llm thing has at least one of these bugs built in
danielhanchen@reddit
Oh no no - we need more community help! Keep doing what you do - when we started finding issues, it took a lot of time as well - so keep it up - the community needs all the help on making OSS models work well!
ThePrimeClock@reddit
I've been trying and failing to create a Leanstral quant for some time so I can use it locally. Any chance you could pop that on your list to look at sometime?
AXYZE8@reddit
You replied to a bot btw
All of his comments that are fully lowercase follow the same structure ("did this, got that"), and he writes about outdated stuff like DALL-E: https://www.reddit.com/r/ChatGPT/comments/1szvtvz/comment/oj6b0ha/?context=3
The ones that are not fully lowercase are his own comments, and they suddenly don't follow that structure.
a_beautiful_rhind@reddit
Great news! I'm itching to try it and someone volunteered to port to IK_llama.
Cr4xy@reddit
I'm not sure if it's been fixed already, but the Devstral 2 Small template also has tool-calling issues. Maybe the fix could be included in the unsloth GGUFs? https://www.reddit.com/r/MistralAI/comments/1q2u60e/comment/nzn5u1z/
danielhanchen@reddit
Julien from Mistral added a nice note as well here: https://huggingface.co/mistralai/Mistral-Medium-3.5-128B/discussions/18
DigThatData@reddit
YaRN parsing?
danielhanchen@reddit
https://huggingface.co/unsloth/Mistral-Medium-3.5-128B-GGUF/discussions/6 and https://huggingface.co/mistralai/Mistral-Medium-3.5-128B/commit/c4be198050fb5789774a55b92ed697becfbf20ae
DigThatData@reddit
ok here we go, YaRN is a post-training method for lengthening context. https://arxiv.org/pdf/2309.00071
danielhanchen@reddit
Oh yes that paper is correct!
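To connect the paper to the config flag discussed above: YaRN scales attention by a "temperature" term, and `mscale`-style coefficients feed into it. The sketch below is modeled on DeepSeek-style YaRN code; whether Mistral Medium 3.5's implementations compute it exactly this way is an assumption, but it shows how flipping `mscale_all_dim` from 1 to 0 can change the effective attention scale.

```python
import math

def yarn_get_mscale(scale: float, mscale: float) -> float:
    # YaRN attention-temperature rule from the paper:
    # sqrt(1/t) = 0.1 * ln(s) + 1 for context-extension factor s,
    # weighted here by an implementation-specific `mscale` coefficient.
    if scale <= 1.0 or mscale == 0:
        return 1.0
    return 0.1 * mscale * math.log(scale) + 1.0

def effective_mscale(scale: float, mscale: float, mscale_all_dim: float) -> float:
    # In DeepSeek-style implementations (an assumption here, not confirmed
    # for Mistral Medium 3.5), the applied factor is a ratio of two mscale
    # terms, so the value of `mscale_all_dim` changes the effective
    # attention scaling at long context.
    return yarn_get_mscale(scale, mscale) / yarn_get_mscale(scale, mscale_all_dim)

# With mscale == mscale_all_dim == 1 the ratio cancels to exactly 1.0;
# with mscale_all_dim == 0 the denominator is 1.0 and the scaling applies.
print(effective_mscale(4.0, 1.0, 1.0))  # 1.0
print(effective_mscale(4.0, 1.0, 0.0))  # ~1.139
```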
brown2green@reddit
Did this affect Ministral 3 too? That one also uses YaRN, with "mscale_all_dim": 1.0, and to me that model never worked right.
danielhanchen@reddit
That should be OK for now. I can re-check, but Ministral's YaRN is fine; we did a large sweep as part of this fix.
ambient_temp_xeno@reddit
Good work, sounds like it was a very sneaky bug.
danielhanchen@reddit
Thank you!
spaceman_@reddit
Unsloth's GGUFs were updated 6-7 hours ago, 6 hours before the README was updated about the fix.
Do last night's GGUFs include this fix? Can I pull the models now and try them out?
yoracale@reddit
Yes, they have the fix; we just hadn't updated people about it yet.