dobomex761604@reddit
Their insistence on mistral-common is very prudish; this is not how llama.cpp works and not how models are tested. It has been discussed in a pull request, but the Mistral team is not ready to align with the community, it seems. Oh well, another mistake.
fish312@reddit
Worse news.
They added it as a dependency, so it's not possible to even convert any other model without mistral-common installed ever since https://github.com/ggml-org/llama.cpp/pull/14737 was merged!
Please make your displeasure known, as this kind of favoritism can lead to the degradation of FOSS projects.
dobomex761604@reddit
In this PR, https://github.com/ggml-org/llama.cpp/pull/15420, they discussed it in more depth with the llama.cpp team. You can also see TheLocalDrummer's issues working with it, and even a discussion of the message Mistral have put into the model description. This is how companies fake open-source support.
ttkciar@reddit
Thanks for that link. It looks like the Mistral team is at least willing to be flexible and comply with the llama.cpp vision.
Regarding MaggotHate's comment there earlier today, I too am a frequent user of llama-cli, so I look forward to a resolution.
dobomex761604@reddit
Like TheLocalDrummer has pointed out in that same pull request, mistral-common is now required to convert Mistral models. I don't think moves like that can be called "flexible".
silenceimpaired@reddit
I don’t understand this concern. What are they doing?
dobomex761604@reddit
They essentially don't want to write out the prompt format; they don't want to include it in the metadata either, and instead want everyone to use their library. This instantly cuts off a number of testing tools and, potentially, third-party clients.
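To make concrete what "shipping the prompt format" would mean, here is a purely illustrative sketch of a small Jinja chat template - not Mistral's official template; the exact token spellings and spacing are precisely what is being asked for - that any tool could apply without extra dependencies:

```python
from jinja2 import Template  # pip install jinja2

# Hypothetical Mistral-style template, for illustration only.
CHAT_TEMPLATE = (
    "{{ bos_token }}"
    "{% for m in messages %}"
    "{% if m.role == 'user' %}[INST] {{ m.content }} [/INST]"
    "{% elif m.role == 'assistant' %}{{ m.content }}{{ eos_token }}"
    "{% endif %}"
    "{% endfor %}"
)

rendered = Template(CHAT_TEMPLATE).render(
    bos_token="<s>",
    eos_token="</s>",
    messages=[
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hi there!"},
    ],
)
print(rendered)  # <s>[INST] Hello [/INST]Hi there!</s>
```

A string like this, stored in the GGUF's tokenizer.chat_template metadata, is what llama.cpp's --jinja path and most third-party clients read.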
ForsookComparison@reddit
I love Mistral, but my crazy conspiracy theory - that someone at that company is truly banking on regulators declaring them "the EU-compliant model" - is creeping into not-crazy territory. You don't do stuff like this unless you expect some artificial moat in your favor.
ttkciar@reddit
From my perspective, it looks like the industry is figuring out that chat really needs a protocol, not a template, and the transition from one to the other is rough.
OpenAI's Harmony "response format" is also more of a protocol than template.
We should expect that evolution to continue, I think.
dobomex761604@reddit
The Large Language Model industry, built on Natural Language Processing, is forgetting what natural language means and is forcing programming onto chat templates - that's what's happening, and it's very unfortunate.
Final_Wheel_7486@reddit
Maybe they're talking about the model architecture or, less likely, the chat template, I'd guess - but no idea, tbh.
pvp239@reddit
Hey,
Mistral employee here! Just a note on mistral-common and llama.cpp.
As written in the model card: https://huggingface.co/mistralai/Magistral-Small-2509-GGUF#usage
"We release the model with mistral_common to ensure correctness"
We welcome by all means community GGUFs with a chat template - we just provide mistral_common as a reference that has ensured correct chat behavior.
It's not true that you need mistral_common to convert Mistral checkpoints; you can just convert without it and provide a chat template.
I think from the discussion on the pull request it should become clear that we've added mistral_common as an additional dependency (it's not even the default for Mistral models).
fish312@reddit
You do need it to convert the model. Ever since https://github.com/ggml-org/llama.cpp/pull/14737 was merged, it has been a hard dependency: the import does not fall back gracefully, and the convert script will crash if mistral-common is not installed.
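For context, a minimal sketch - my assumption of what a graceful fallback could look like, not llama.cpp's actual code - where mistral-common is treated as an optional dependency and only the Mistral-specific path fails without it:

```python
# Sketch only: treat mistral-common as optional so that converting
# non-Mistral models never requires it.
try:
    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
except ImportError:
    MistralTokenizer = None

def load_mistral_reference_tokenizer(tokenizer_path: str):
    """Return the mistral-common tokenizer, failing only when it is actually needed."""
    if MistralTokenizer is None:
        raise RuntimeError(
            "mistral-common is not installed; install it, or convert with the "
            "regular Hugging Face tokenizer and a chat template instead."
        )
    # Loader name taken from mistral-common's docs; verify against your installed version.
    return MistralTokenizer.from_file(tokenizer_path)
```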
dobomex761604@reddit
"We welcome by all means community GGUFs with chat template - we just provide mistral_common as a reference that has ensured correct chat behavior"
Hi! In this case, why don't you provide the template? What exactly prevents you from giving us both the template and still recommending mistral-common? For now, you leave the community without an option.
"It's not true that you need mistral_common to convert mistral checkpoints, you can just convert without and provide a chat template"
How about you go and read this comment by TheDrummer.
"I think from the discussion on the pull request it should become clear that we've added mistral_common as an additional dependency (it's not even the default for mistral models)"
The model card description makes it look the opposite.
pvp239@reddit
If you want to use the checkpoint with mistral_common you can use unsloth's repo: https://huggingface.co/unsloth/Magistral-Small-2509-GGUF, no? We link to it at the very top of the model card.
We don’t provide the chat template because we don’t have time to test it before releases and/or because the behavior is not yet supported.
We are worried that incorrect chat templates lead to people believing the checkpoint doesn't work, which happened a couple of times in the past, e.g. with Devstral.
dobomex761604@reddit
Just to add to the whole conversation: I've just tested Magistral 2509, and while it's much better than the previous Magistral, the model is less stable than Mistral 3 (the first one) and all your previous models on the same local setup - Mistral 7, Mistral Nemo, Mistral Small 22b all work without issues.
It really seems like you should spend time on testing chat templates. Something changed since Small 3.1; go back to that setup and see what you've changed in your workflows. Of course, you don't have to believe me - my only job is to warn you that something is off, and it will continue to cause you problems in the future unless fixed. We love your models, and we want them to be better, not worse.
cobbleplox@reddit
"If you want to use checkpoint with mistral_common you can use unsloth's repo"
Did you mean without, maybe?
Tekken is terrible enough, btw - it's hard enough to have it as part of a solution with exchangeable models as it is. An extra dependency is the last thing that's needed.
Regarding Tekken, the worst thing about it is the restriction to message pairs and the lack of the usual ways of setting system instructions. And if that's wrong, well, one can read your entire guide about Tekken v3 without getting a proper example. Is it still impossible to even have the correct format in the text that goes into a standard tokenizer, because they are protected? Sorry if I got that mixed up with some other format.
dobomex761604@reddit
The whole question of templates is huge; I still think that ChatML was a mistake because of its strict "user-assistant" roles, and the older Alpaca templates were more natural. In some ways Tekken could've solved this... but nope, no roles for you.
mikael110@reddit
It's true that models with wrong templates have been an issue in the past, and it can seriously impact the reputation of a model. But the best way to combat that is to provide the correct template yourself.
99% of people that use llama.cpp will not use mistral-common, period. That's simply not how people use llama.cpp. So I'd strongly argue that putting the resources you put into mistral-common into actually testing a regular chat template with the model would achieve far more, if you actually want users to have a positive first impression of the model.
There's also community sentiment to consider: as this very thread shows, the llama.cpp community at large is not a fan of the mistral-common approach. That should be something you take into account.
dobomex761604@reddit
What do you mean by "the behavior is not yet supported" for the chat template of your own model? mistral-common is supposed to contain the same template; that's how all instruct-tuned LLMs work.
If you are worried about incorrect chat templates, then provide a correct one! It's your model - how could you not know which chat template is correct and which is not?
You had https://github.com/mistralai/cookbook/blob/main/concept-deep-dive/tokenization/chat_templates.md, which was useful - why not link it? By forcing mistral-common you avoid the issue, not fix it.
a_beautiful_rhind@reddit
Don't understand this problem...
What am I missing here? Some kind of tokenization problem? [INST] becomes different values? Spaces are placed dynamically? Tool calls?
Could this not be done with a Python script and the output uploaded to HF? It would have been less work than trying to shoehorn Python into llama.cpp. This stuff is not rocket science.
dobomex761604@reddit
Not everyone uses Python for LLMs.
a_beautiful_rhind@reddit
Right, that's the point. But mistral-common is a Python package, and some sample output could be used to craft a template to use anywhere.
Instead, the company forces a Python dependency into llama.cpp.
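A rough sketch of that idea, assuming mistral-common's documented API (class and method names as in its README; the tokenizer version is a guess and should match the actual checkpoint): render a sample conversation once, print the exact text, and use it to write or verify a portable template.

```python
from mistral_common.protocol.instruct.messages import (
    AssistantMessage,
    SystemMessage,
    UserMessage,
)
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

# Tokenizer version is an assumption; pick the one matching the release.
tok = MistralTokenizer.v3(is_tekken=True)

req = ChatCompletionRequest(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Hi"),
        AssistantMessage(content="Hello!"),
        UserMessage(content="How are you?"),
    ]
)

encoded = tok.encode_chat_completion(req)
print(encoded.text)          # the exact prompt string the reference tokenizer produces
print(len(encoded.tokens))   # and the token count, for sanity checking
```

Publishing output like this next to the weights would let anyone cross-check a hand-written template without pulling in the dependency.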
silenceimpaired@reddit
I am sure you don’t have the power to choose or comment but if you could pass along this idea I would appreciate it:
Mistral could release their base model for Medium, without fine-tuning, under Apache, and leave the fine-tuned instruct behind the API. I think it would serve hobbyists and Mistral. Businesses could see how much better a fine-tune from Mistral is via the API, and hobbyists could create their own fine-tunes… which typically include open data that Mistral could add to their closed API model.
There is a lot I like about Mistral models and I want to see them thrive, but 24B compared against the model sizes Qwen releases reveals, I think, quite a wide gap in capability.
_bachrc@reddit
Any idea on how to make the custom think tags work with lm studio? :(
Iory1998@reddit
Go to the Model section, find your model, click on the gear icon next to it, and go to the model template. Scroll down, and you will find the default think tags. Change them there.
H3g3m0n@reddit
The GGUF isn't working for me with llama.cpp.
It ignores my prompt and outputs generic information about Mistral AI.
Using the following args:
My_Unbiased_Opinion@reddit
Mistral 3.2 2506 is my go-to jack-of-all-trades model. I used Magistral before, but it doesn't have proper vision support, which I need. I also noticed it would go into repetition loops.
If that's fixed, I'm 100% switching to this. Mistral models are extremely versatile. No hate on Qwen, but these models are not one trick ponies.
alew3@reddit
how do you run it? I really like it, but tool calling is broken with vLLM unfortunately.
claytonkb@reddit
Same here -- what tools are folks running vision models locally with?
ThrowThrowThrowYourC@reddit
For me, Magistral 1.1 was my go-to model. Really excited to give this a go. If the benchmarks translate into real-life results, it seems pretty awesome.
No_Conversation9561@reddit
Wish they opened up Medium.
jacek2023@reddit (OP)
I believe medium is important for their business model
silenceimpaired@reddit
They could release the base model without fine tuning.
Odd-Ordinary-5922@reddit
if only it was moe :c
ttkciar@reddit
Some of us prefer dense models. MoE has its place and value, but it's nice to see not everyone has jumped on the MoE bandwagon.
Models in the 24B to 32B range, once quantized, are just the right size for 32GB VRAM systems.
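As a rough back-of-envelope (approximate bits-per-weight figures, ignoring KV cache and runtime overhead), that sizing works out like this:

```python
def approx_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB: parameters * bits / 8, nothing else."""
    return params_billion * bits_per_weight / 8

# Roughly typical effective sizes for common llama.cpp quants (approximate values).
for name, bpw in [("Q4_K_M", 4.8), ("Q6_K", 6.6), ("Q8_0", 8.5)]:
    print(f"24B @ {name} (~{bpw} bpw) ≈ {approx_weight_gb(24, bpw):.1f} GB")
```

So a 24B dense model lands around 14-26 GB of weights depending on the quant, leaving room for context on a 32GB card.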
jacek2023@reddit (OP)
It's small
Odd-Ordinary-5922@reddit
a model that can fit in a 4090 once quantized is not small bro
jacek2023@reddit (OP)
Why use 4090 if you could use two 3090s?
sleepingsysadmin@reddit
Wow, epic. I can't wait for the Unsloth conversion.
Small 1.2 is better than medium 1.1 by a fair amount? Amazing.
thetobesgeorge@reddit
Forgive my ignorance, what is the benefit of the Unsloth version?
And is there any special way to run it?
Every Unsloth version I’ve tried I’ve had issues with random gibberish coming out compared to the “vanilla” version, with all other settings being equal
Xamanthas@reddit
You posted this 4 minutes after daniel linked them himself in the comments 🤨
sleepingsysadmin@reddit
When I clicked the thread, there were no comments. I guess I spent a few minutes checking the links and typing my comment.
DinoAmino@reddit
Caching be like that. Happens all the time for me.
sleepingsysadmin@reddit
Luckily I said I can't wait, and I didn't have to wait, because the Unsloth team is epic.
sleepingsysadmin@reddit
First benchmark test. It took a bit of time; it's only giving me 16 tokens/s. I'll have to tinker with the settings, because usually I get 40+ from Devstral Small.
But the one-shot result was a success. Impressive.
Cool-Chemical-5629@reddit
What did you one shot this time?
sleepingsysadmin@reddit
My personal private benchmark that can't be trained for. I certainly believe the LiveCodeBench score.
My_Unbiased_Opinion@reddit
Unsloth is already up! Looks like they worked together behind the scenes.
sleepingsysadmin@reddit
That team is so great. Weird, LM Studio refused to see it until I specifically searched "magistral 2509".
Cool-Chemical-5629@reddit
Just copy & paste the whole model path from HF using that Copy button. That always works for me.
Qual_@reddit
oh ohohoh I'll test it with my battleslop benchmark :D
jacek2023@reddit (OP)
How does it work?
Qual_@reddit
It's a stupid variation of Battleship, but with cards, mana management, etc. There are around 20 different cards (ranging from simple shots to large-area nukes, intel gathering via satellites, defense stuff, etc.).
toothpastespiders@reddit
These kinds of weird benchmarks are always my favorite. I think the further we get from a strict test-x, test-y, test-z setup, the better it often reflects the complexities of real-world use. Or I could be totally off. But they're fun.
danielhanchen@reddit
We made dynamic Unsloth GGUFs and float8 dynamic versions for those interested!
Magistral GGUFs | Magistral FP8 | Magistral FP8 torchAO
Also, a free Kaggle fine-tuning notebook using 2x Tesla T4s, plus fine-tuning and inference guides, are on our docs.
IrisColt@reddit
Thanks!!!
danielhanchen@reddit
:)
mj_katzer@reddit
Nice :) Thank you. Any idea how much VRAM a 128-rank LoRA would need with 64k tokens of context length?
danielhanchen@reddit
Oh good question uhhh QLoRA might need ~48GB maybe? LoRA will be much more.
Free-Internet1981@reddit
Goated
danielhanchen@reddit
:)
Gildarts777@reddit
Thank you a lot
danielhanchen@reddit
:)
Wemos_D1@reddit
Thank you !
danielhanchen@reddit
Thanks!
tomakorea@reddit
AWQ when?
danielhanchen@reddit
Actually I could do one!
Phaelon74@reddit
I don't think they do AWQs, could be wrong tho.
bacocololo@reddit
Take care not to release your model before Mistral next time :)
danielhanchen@reddit
haha :)
sleepingsysadmin@reddit
great work!
danielhanchen@reddit
Thanks!
ActivitySpare9399@reddit
Hey Dan,
You're bloody amazing, I don't know how you get so much done. Being both meticulous and efficient is incredibly rare. Thanks for all of your incredible work.
Some feedback, if it's helpful: could you briefly explain the difference between GGUF, Dynamic FP* and FP8 torchAO in the model cards? I had a look at the model cards, but they don't mention why that format should be chosen or how it is different from the standard safetensors or GGUF.
I read the guide and there's a tiny bit at the bottom: "Both are fantastic to deploy via vLLM. Read up on using TorchAO based FP8 quants in vLLM here", and I read that link, but it still didn't make clear whether there was some benefit I should be taking advantage of or not. Some text in the model cards explaining why you offer that format and how to choose between them would be amazing.
It also says "Unsloth Dynamic 2.0 achieves SOTA performance in model quantization." But this model isn't in the "Unsloth Dynamic 2.0 Quants" model list. As I understand it, you might not be updating that list for every model, but they are all in fact UD 2.0 GGUFs now?
Just wanted to clarify. Thanks again for your fantastic work. Endlessly appreciate how much you're doing for the local team.
danielhanchen@reddit
Thanks! So we're still experimenting with vLLM and TorchAO-based quants - our main goal is to collaborate with everyone in the community to deliver the best quants :) The plan is to provide MXFP4 (so float4) quants as well in the future.
For now both torchAO and vLLM type quants should be great!
Zestyclose-Ad-6147@reddit
GGUF wh… oh, there it is 😆
danielhanchen@reddit
:)
HollowInfinity@reddit
Hm, I'm trying your 8-bit GGUF, but the output doesn't seem to be wrapping the thinking in tags. The Jinja template seems to have THINK in plaintext, and according to the readme it should be a special token instead?
danielhanchen@reddit
Oh wait, can you try with the flag --special when launching llama.cpp? Since it's a special token, it won't be shown - using --special will render it in llama.cpp, and I'm pretty sure it comes up - but best to confirm again.
HollowInfinity@reddit
Perfect, that was it! Thanks!
danielhanchen@reddit
:)
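Putting the two flags from this exchange together, a minimal sketch of the invocation (model filename and prompt are placeholders; the flags are the ones discussed above), wrapped in a small Python launcher:

```python
import subprocess

# Placeholder model path; --jinja applies the GGUF's embedded chat template,
# and --special renders special tokens such as the [THINK] markers.
cmd = [
    "./llama-cli",
    "-m", "Magistral-Small-2509-Q8_0.gguf",  # placeholder filename
    "--jinja",
    "--special",
    "-p", "Prove that sqrt(2) is irrational.",
]
subprocess.run(cmd, check=True)
```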
jacobpederson@reddit
You need to include the system prompt:
"First draft your thinking process (inner monologue) until you arrive at a response. Format your response using Markdown, and use LaTeX for any mathematical equations. Write both your thoughts and the response in the same language as the input.
Your thinking process must follow the template below: [THINK] Your thoughts or/and draft, like working through an exercise on scratch paper. Be as casual and as long as you want until you are confident to generate the response. Use the same language as the input. [/THINK] Here, provide a self-contained response."
HollowInfinity@reddit
That seems to already be passed in via the --jinja argument + template, since the thinking process does happen.
jacobpederson@reddit
Are the think tags case sensitive? Aren't they usually lowercase? It is working for me in LM Studio after changing the case of the tags.
Fair-Spring9113@reddit
goat
danielhanchen@reddit
Thanks!
jacek2023@reddit (OP)
damn you are quick
danielhanchen@reddit
:)
rm-rf-rm@reddit
Why don't they release Magistral Medium?
Wemos_D1@reddit
For code, I did some small tests and I think Devstral is still better, alongside Qwen Coder 30B, GLM 32B and GPT-OSS 20B.
Don't hesitate to post your feedback, dear friends.
silenceimpaired@reddit
I wish they would release their base model of Medium and leave the fine-tuned instruct behind the API. I think it would serve hobbyists and them. Businesses could see how much better a fine-tune from Mistral is, and hobbyists could create their own fine-tunes… which typically include open data that Mistral could add to their closed API model.
a_beautiful_rhind@reddit
we're never getting miqu back.
toothpastespiders@reddit
Miqu really was the end of an era in a lot of ways.
silenceimpaired@reddit
I get that… but this isn't that. This would just be their base model before they fine-tune it. I'm holding out hope that someone from the company will see my post and reconsider, as I think it would benefit them. Chinese models continue to be released larger and with the same licensing. I think this would keep their company in focus.
That said you’re probably right.
a_beautiful_rhind@reddit
Unfortunately fewer and fewer companies release any base models at all. It's all instruct tuned to some extent.
silenceimpaired@reddit
Which is weird to me… I guess there could be a safety element, but the special sauce of the instruct tune seems like it has higher value. So for companies hesitant to give away their cash cow… it seems an elegant solution. You can point out how much better the instruct version of your model is compared to the base model.
brown2green@reddit
Nowadays the final instruct models aren't simply base models with some instruction fine-tuning that hobbyists can easily compete with. The final training phase (post-training) for SOTA models can be very extensive. Just releasing a base model that almost nobody can hope to turn into something useful probably wouldn't look good.
markole@reddit
What are your llama.cpp flags to use with this one?
TheLocalDrummer@reddit
Oh wow
Artistic_Composer825@reddit
I hear your L40s from here
Background-Ad-5398@reddit
Awesome, I like the tone of Mistral's models for small local use; only 27B Gemma 3 is as easy to talk to relative to its intelligence. Qwen is not a chatbot.
Ill_Barber8709@reddit
So Small 1.2 is now better than Medium 1.1? That's crazy impressive. Glad to see my fellow Frenchies continue to deliver! Now I'm waiting for MLX and support in LM Studio. Let's hope it won't take too much time.
beedunc@reddit
And the crowd went… mild.
PermanentLiminality@reddit
I was looking for a vision model like this one.
Substantial-Dig-8766@reddit
noooooo reasoning nooooooooo noooooooo stop this aaaaaaa
alew3@reddit
The vLLM implementation of tool calling with Mistral models is broken; any chance it could be fixed?
igorwarzocha@reddit
"Small" \^_\^
[insert a sexist joke]
(still downloads it)
some_user_2021@reddit
I hope it has a small PP
bymihaj@reddit
Magistral Small 1.2 is just better than Magistral Medium 1.0...
jacek2023@reddit (OP)
to be honest it's hard to trust benchmarks now
FlamaVadim@reddit
true 😢
Cool-Chemical-5629@reddit
Agreed - heck, I'm getting anxiety just from seeing benchmarks claiming that small model X is better than big model Y. Sheer experience from the endless chain of disappointments drove me to the conclusion that such claims should always be seen as a red flag. I love Mistral models, so I'm hoping this one is a different story.
unsolved-problems@reddit
Yeah, measuring performance is among the biggest open questions in the ML ecosystem. It's so easy to trick benchmarks (overfitting), and in my experience models that look terrific on benchmarks can somehow perform very average.
S1M0N38@reddit
let's appreciate the consistent naming scheme used by Mistral
NoFudge4700@reddit
Nice