dobomex761604@reddit
Their insistence on mistral-common is very prudish; this is not how llama.cpp works and not how models are tested. It has been discussed in a pull request, but the Mistral team is not ready to align with the community, it seems. Oh well, another mistake.
fish312@reddit
Worse news.
They added it as a dependency, so it's not possible to even convert any other model without mistral-common installed ever since https://github.com/ggml-org/llama.cpp/pull/14737 was merged!
Please make your displeasure known, as this kind of favoritism can lead to the degradation of FOSS projects.
dobomex761604@reddit
In this PR, https://github.com/ggml-org/llama.cpp/pull/15420, they discussed it in more depth with the llama.cpp team. You can also see TheLocalDrummer's issues working with it, and even a discussion of the message Mistral have put into the model description. This is how companies fake open-source support.
ttkciar@reddit
Thanks for that link. It looks like the Mistral team is at least willing to be flexible and comply with the llama.cpp vision.
Regarding MaggotHate's comment there earlier today, I too am a frequent user of llama-cli, so I look forward to a resolution.
dobomex761604@reddit
Like TheLocalDrummer has pointed out in that same pull request, mistral-common is now required to convert Mistral models. I don't think moves like that can be called "flexible".
silenceimpaired@reddit
I don’t understand this concern. What are they doing?
dobomex761604@reddit
They essentially don't want to write out the prompt format; they don't want to include it in the metadata either, and instead want everyone to use their library. This instantly cuts off a number of testing tools and, potentially, third-party clients.
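To make concrete what "shipping the prompt format" would mean, here is a purely illustrative sketch of a small Jinja chat template - not Mistral's official template; the exact token spellings and spacing are precisely what is being asked for - that any tool could apply without extra dependencies:

```python
from jinja2 import Template  # pip install jinja2

# Hypothetical Mistral-style template, for illustration only.
CHAT_TEMPLATE = (
    "{{ bos_token }}"
    "{% for m in messages %}"
    "{% if m.role == 'user' %}[INST] {{ m.content }} [/INST]"
    "{% elif m.role == 'assistant' %}{{ m.content }}{{ eos_token }}"
    "{% endif %}"
    "{% endfor %}"
)

rendered = Template(CHAT_TEMPLATE).render(
    bos_token="<s>",
    eos_token="</s>",
    messages=[
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hi there!"},
    ],
)
print(rendered)  # <s>[INST] Hello [/INST]Hi there!</s>
```

A string like this, stored in the GGUF's tokenizer.chat_template metadata, is what llama.cpp's --jinja path and most third-party clients read.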
ForsookComparison@reddit
I love Mistral, but my crazy conspiracy theory - that someone at that company is truly banking on regulators declaring them "the EU-compliant model" - is creeping into not-crazy territory. You don't do stuff like this unless you expect some artificial moat in your favor.
ttkciar@reddit
From my perspective, it looks like the industry is figuring out that chat really needs a protocol, not a template, and the transition from one to the other is rough.
OpenAI's Harmony "response format" is also more of a protocol than template.
We should expect that evolution to continue, I think.
dobomex761604@reddit
The Large Language Model industry, built on Natural Language Processing, is forgetting what natural language means and is forcing programming onto chat templates - that's what's happening, and it's very unfortunate.
Final_Wheel_7486@reddit
Maybe they're talking about the model architecture or, less likely, the chat template, I'd guess - but no idea, tbh.
pvp239@reddit
Hey,
Mistral employee here! Just a note on mistral-common and llama.cpp.
As written in the model card: https://huggingface.co/mistralai/Magistral-Small-2509-GGUF#usage
"We release the model with mistral_common to ensure correctness"
We welcome by all means community GGUFs with a chat template - we just provide mistral_common as a reference that has ensured correct chat behavior.
It's not true that you need mistral_common to convert Mistral checkpoints; you can just convert without it and provide a chat template.
I think from the discussion on the pull request it should become clear that we've added mistral_common as an additional dependency (it's not even the default for Mistral models).
fish312@reddit
You do need it to convert the model. Ever since https://github.com/ggml-org/llama.cpp/pull/14737 was merged, it has been a hard dependency: the import does not fall back gracefully, and the convert script will crash if mistral-common is not installed.
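For context, a minimal sketch - my assumption of what a graceful fallback could look like, not llama.cpp's actual code - where mistral-common is treated as an optional dependency and only the Mistral-specific path fails without it:

```python
# Sketch only: treat mistral-common as optional so that converting
# non-Mistral models never requires it.
try:
    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
except ImportError:
    MistralTokenizer = None

def load_mistral_reference_tokenizer(tokenizer_path: str):
    """Return the mistral-common tokenizer, failing only when it is actually needed."""
    if MistralTokenizer is None:
        raise RuntimeError(
            "mistral-common is not installed; install it, or convert with the "
            "regular Hugging Face tokenizer and a chat template instead."
        )
    # Loader name taken from mistral-common's docs; verify against your installed version.
    return MistralTokenizer.from_file(tokenizer_path)
```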
dobomex761604@reddit
"We welcome by all means community GGUFs with chat template - we just provide mistral_common as a reference that has ensured correct chat behavior"
Hi! In this case, why don't you provide the template? What exactly prevents you from giving us both the template and still recommending mistral-common? For now, you leave the community without an option.
"It's not true that you need mistral_common to convert mistral checkpoints, you can just convert without and provide a chat template"
How about you go and read this comment by TheDrummer.
"I think from the discussion on the pull request it should become clear that we've added mistral_common as an additional dependency (it's not even the default for mistral models)"
The model card description makes it look the opposite.
pvp239@reddit
If you want to use the checkpoint with mistral_common you can use unsloth's repo: https://huggingface.co/unsloth/Magistral-Small-2509-GGUF, no? We link to it at the very top of the model card.
We don’t provide the chat template because we don’t have time to test it before releases and/or because the behavior is not yet supported.
We are worried that incorrect chat templates lead to people believing the checkpoint doesn't work, which happened a couple of times in the past, e.g. with Devstral.
dobomex761604@reddit
Just to add to the whole conversation: I've just tested Magistral 2509, and while it's much better than the previous Magistral, the model is less stable than Mistral 3 (the first one) and all your previous models on the same local setup - Mistral 7, Mistral Nemo, Mistral Small 22b all work without issues.
It really seems like you should spend time on testing chat templates. Something changed since Small 3.1; go back to that setup and see what you've changed in your workflows. Of course, you don't have to believe me - my only job is to warn you that something is off, and it will continue to cause you problems in the future unless fixed. We love your models, and we want them to be better, not worse.
cobbleplox@reddit
"If you want to use checkpoint with mistral_common you can use unsloth's repo"
Did you mean without, maybe?
Tekken is terrible enough, btw - it's hard enough to have it as part of a solution with exchangeable models as it is. An extra dependency is the last thing that's needed.
Regarding Tekken, the worst thing about it is the restriction to message pairs and the lack of the usual ways of setting system instructions. And if that's wrong, well, one can read your entire guide about Tekken v3 without getting a proper example. Is it still impossible to even have the correct format in the text that goes into a standard tokenizer, because they are protected? Sorry if I got that mixed up with some other format.
dobomex761604@reddit
The whole question of templates is huge; I still think that ChatML was a mistake because of its strict "user-assistant" roles, and the older Alpaca templates were more natural. In some ways Tekken could've solved this... but nope, no roles for you.
mikael110@reddit
It's true that models with wrong templates have been an issue in the past, and it can seriously impact the reputation of a model. But the best way to combat that is to provide the correct template yourself.
99% of people that use llama.cpp will not use mistral-common, period. That's simply not how people use llama.cpp. So I'd strongly argue that putting the resources you put into mistral-common into actually testing a regular chat template with the model would achieve far more, if you actually want users to have a positive first impression of the model.
There's also community sentiment to consider: as this very thread shows, the llama.cpp community at large is not a fan of the mistral-common approach. That should be something you take into account.
dobomex761604@reddit
What do you mean by "the behavior is not yet supported" for the chat template of your own model? mistral-common is supposed to contain the same template; that's how all instruct-tuned LLMs work.
If you are worried about incorrect chat templates, then provide a correct one! It's your model - how could you not know which chat template is correct and which is not?
You had https://github.com/mistralai/cookbook/blob/main/concept-deep-dive/tokenization/chat_templates.md, which was useful - why not link it? By forcing mistral-common you avoid the issue, not fix it.
a_beautiful_rhind@reddit
Don't understand this problem...
What am I missing here? Some kind of tokenization problem? [INST] becomes different values? Spaces are placed dynamically? Tool calls?
Could this not be done with a Python script and the output uploaded to HF? It would have been less work than trying to shoehorn Python into llama.cpp. This stuff is not rocket science.
dobomex761604@reddit
Not everyone uses Python for LLMs.
a_beautiful_rhind@reddit
Right, that's the point. But mistral-common is a Python package, and some sample output could be used to craft a template to use anywhere.
Instead, the company forces a Python dependency into llama.cpp.
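A rough sketch of that idea, assuming mistral-common's documented API (class and method names as in its README; the tokenizer version is a guess and should match the actual checkpoint): render a sample conversation once, print the exact text, and use it to write or verify a portable template.

```python
from mistral_common.protocol.instruct.messages import (
    AssistantMessage,
    SystemMessage,
    UserMessage,
)
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

# Tokenizer version is an assumption; pick the one matching the release.
tok = MistralTokenizer.v3(is_tekken=True)

req = ChatCompletionRequest(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Hi"),
        AssistantMessage(content="Hello!"),
        UserMessage(content="How are you?"),
    ]
)

encoded = tok.encode_chat_completion(req)
print(encoded.text)          # the exact prompt string the reference tokenizer produces
print(len(encoded.tokens))   # and the token count, for sanity checking
```

Publishing output like this next to the weights would let anyone cross-check a hand-written template without pulling in the dependency.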
silenceimpaired@reddit
I am sure you don’t have the power to choose or comment but if you could pass along this idea I would appreciate it:
Mistral could release their base model for Medium, without fine-tuning, under Apache, and leave the fine-tuned instruct behind the API. I think it would serve hobbyists and Mistral. Businesses could see how much better a fine-tune from Mistral is via the API, and hobbyists could create their own fine-tunes… which typically include open data that Mistral could add to their closed API model.
There is a lot I like about Mistral models and I want to see them thrive, but 24B compared against the model sizes Qwen releases reveals, I think, quite a wide gap in capability.
_bachrc@reddit
Any idea on how to make the custom think tags work with lm studio? :(
Iory1998@reddit
Go to the Model section, find your model, click on the gear icon next to it, and go to the model template. Scroll down, and you will find the default think tags. Change them there.
H3g3m0n@reddit
The GGUF isn't working for me with llama.cpp.
It ignores my prompt and outputs generic information about Mistral AI.
Using the following args:
My_Unbiased_Opinion@reddit
Mistral 3.2 2506 is my go-to jack-of-all-trades model. I used Magistral before, but it doesn't have proper vision support, which I need. I also noticed it would go into repetition loops.
If that's fixed, I'm 100% switching to this. Mistral models are extremely versatile. No hate on Qwen, but these models are not one trick ponies.
alew3@reddit
how do you run it? I really like it, but tool calling is broken with vLLM unfortunately.
claytonkb@reddit
Same here -- what tools are folks running vision models locally with?
ThrowThrowThrowYourC@reddit
For me, Magistral 1.1 was my go-to model. Really excited to give this a go. If the benchmarks translate into real-life results, it seems pretty awesome.
No_Conversation9561@reddit
Wish they opened up Medium.
jacek2023@reddit (OP)
I believe medium is important for their business model
silenceimpaired@reddit
They could release the base model without fine tuning.
Odd-Ordinary-5922@reddit
if only it was moe :c
ttkciar@reddit
Some of us prefer dense models. MoE has its place and value, but it's nice to see not everyone has jumped on the MoE bandwagon.
Models in the 24B to 32B range, once quantized, are just the right size for 32GB VRAM systems.
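As a rough back-of-envelope (approximate bits-per-weight figures, ignoring KV cache and runtime overhead), that sizing works out like this:

```python
def approx_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB: parameters * bits / 8, nothing else."""
    return params_billion * bits_per_weight / 8

# Roughly typical effective sizes for common llama.cpp quants (approximate values).
for name, bpw in [("Q4_K_M", 4.8), ("Q6_K", 6.6), ("Q8_0", 8.5)]:
    print(f"24B @ {name} (~{bpw} bpw) ≈ {approx_weight_gb(24, bpw):.1f} GB")
```

So a 24B dense model lands around 14-26 GB of weights depending on the quant, leaving room for context on a 32GB card.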
jacek2023@reddit (OP)
It's small
Odd-Ordinary-5922@reddit
a model that can fit in a 4090 once quantized is not small bro
jacek2023@reddit (OP)
Why use 4090 if you could use two 3090s?
sleepingsysadmin@reddit
Wow, epic. I can't wait for the Unsloth conversion.
Small 1.2 is better than medium 1.1 by a fair amount? Amazing.
thetobesgeorge@reddit
Forgive my ignorance, what is the benefit of the Unsloth version?
And is there any special way to run it?
Every Unsloth version I’ve tried I’ve had issues with random gibberish coming out compared to the “vanilla” version, with all other settings being equal
Xamanthas@reddit
You posted this 4 minutes after daniel linked them himself in the comments 🤨
sleepingsysadmin@reddit
When I clicked the thread, there were no comments. I guess I spent a few minutes checking the links and typing my comment.
DinoAmino@reddit
Caching be like that. Happens all the time for me.
sleepingsysadmin@reddit
Luckily I said I can't wait, and I didn't have to wait, because the Unsloth team is epic.
sleepingsysadmin@reddit
First benchmark test. It took a bit of time; it's only giving me 16 tokens/s. I'll have to tinker with the settings, because usually I get 40+ from Devstral Small.
But the one-shot result was a success. Impressive.
Cool-Chemical-5629@reddit
What did you one shot this time?
sleepingsysadmin@reddit
My personal private benchmark that can't be trained for. I certainly believe the LiveCodeBench score.
My_Unbiased_Opinion@reddit
Unsloth is already up! Looks like they worked together behind the scenes.
sleepingsysadmin@reddit
That team is so great. Weird, LM Studio refused to see it until I specifically searched "magistral 2509".
Cool-Chemical-5629@reddit
Just copy & paste the whole model path from HF using that Copy button. That always works for me.
Qual_@reddit
oh ohohoh I'll test it with my battleslop benchmark :D
jacek2023@reddit (OP)
How does it work?
Qual_@reddit
It's a stupid variation of Battleship, but with cards, mana management, etc. There are around 20 different cards (ranging from simple shots to large-area nukes, intel gathering via satellites, defense stuff, etc.).
toothpastespiders@reddit
These kinds of weird benchmarks are always my favorite. I think the further we get from a strict test-x, test-y, test-z setup, the better it often reflects the complexities of real-world use. Or I could be totally off. But they're fun.
danielhanchen@reddit
We made dynamic Unsloth GGUFs and float8 dynamic versions for those interested!
Magistral GGUFs | Magistral FP8 | Magistral FP8 torchAO
Also, a free Kaggle fine-tuning notebook using 2x Tesla T4s, plus fine-tuning and inference guides, are on our docs.
IrisColt@reddit
Thanks!!!
danielhanchen@reddit
:)
mj_katzer@reddit
Nice :) Thank you. Any idea how much VRAM a 128-rank LoRA would need with 64k tokens of context length?
danielhanchen@reddit
Oh good question uhhh QLoRA might need ~48GB maybe? LoRA will be much more.
Free-Internet1981@reddit
Goated
danielhanchen@reddit
:)
Gildarts777@reddit
Thank you a lot
danielhanchen@reddit
:)
Wemos_D1@reddit
Thank you !
danielhanchen@reddit
Thanks!
tomakorea@reddit
AWQ when?
danielhanchen@reddit
Actually I could do one!
Phaelon74@reddit
I don't think they do AWQs, could be wrong tho.
bacocololo@reddit
Take care not to release your model before Mistral next time :)
danielhanchen@reddit
haha :)
sleepingsysadmin@reddit
great work!
danielhanchen@reddit
Thanks!
ActivitySpare9399@reddit
Hey Dan,
You're bloody amazing, I don't know how you get so much done. Being both meticulous and efficient is incredibly rare. Thanks for all of your incredible work.
Some feedback, if it's helpful: could you briefly explain the difference between GGUF, Dynamic FP* and FP8 torchAO in the model cards? I had a look at the model cards, but they don't mention why that format should be chosen or how it is different from the standard safetensors or GGUF.
I read the guide and there's a tiny bit at the bottom: "Both are fantastic to deploy via vLLM. Read up on using TorchAO based FP8 quants in vLLM here", and I read that link, but it still didn't make clear whether there was some benefit I should be taking advantage of or not. Some text in the model cards explaining why you offer that format and how to choose between them would be amazing.
It also says "Unsloth Dynamic 2.0 achieves SOTA performance in model quantization." But this model isn't in the "Unsloth Dynamic 2.0 Quants" model list. As I understand it, you might not be updating that list for every model, but they are all in fact UD 2.0 GGUFs now?
Just wanted to clarify. Thanks again for your fantastic work. Endlessly appreciate how much you're doing for the local team.
danielhanchen@reddit
Thanks! So we're still experimenting with vLLM and TorchAO-based quants - our main goal is to collaborate with everyone in the community to deliver the best quants :) The plan is to provide MXFP4 (so float4) quants as well in the future.
For now both torchAO and vLLM type quants should be great!
Zestyclose-Ad-6147@reddit
GGUF wh… oh, there it is 😆
danielhanchen@reddit
:)
HollowInfinity@reddit
Hm, I'm trying your 8-bit GGUF, but the output doesn't seem to be wrapping the thinking in tags. The Jinja template seems to have THINK in plaintext, and according to the readme it should be a special token instead?
danielhanchen@reddit
Oh wait, can you try with the flag --special when launching llama.cpp? Since it's a special token, it won't be shown - using --special will render it in llama.cpp, and I'm pretty sure it comes up - but best to confirm again.
HollowInfinity@reddit
Perfect, that was it! Thanks!
danielhanchen@reddit
:)
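Putting the two flags from this exchange together, a minimal sketch of the invocation (model filename and prompt are placeholders; the flags are the ones discussed above), wrapped in a small Python launcher:

```python
import subprocess

# Placeholder model path; --jinja applies the GGUF's embedded chat template,
# and --special renders special tokens such as the [THINK] markers.
cmd = [
    "./llama-cli",
    "-m", "Magistral-Small-2509-Q8_0.gguf",  # placeholder filename
    "--jinja",
    "--special",
    "-p", "Prove that sqrt(2) is irrational.",
]
subprocess.run(cmd, check=True)
```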
jacobpederson@reddit
You need to include the system prompt:
"First draft your thinking process (inner monologue) until you arrive at a response. Format your response using Markdown, and use LaTeX for any mathematical equations. Write both your thoughts and the response in the same language as the input.
Your thinking process must follow the template below: [THINK] Your thoughts or/and draft, like working through an exercise on scratch paper. Be as casual and as long as you want until you are confident to generate the response. Use the same language as the input. [/THINK] Here, provide a self-contained response."
HollowInfinity@reddit
That seems to already be passed in via the --jinja argument + template, since the thinking process does happen.
jacobpederson@reddit
Are the think tags case sensitive? Aren't they usually lowercase? It is working for me in LM Studio after changing the case of the tags.
Fair-Spring9113@reddit
goat
danielhanchen@reddit
Thanks!
jacek2023@reddit (OP)
damn you are quick
danielhanchen@reddit
:)
rm-rf-rm@reddit
Why don't they release Magistral Medium?
Wemos_D1@reddit
For code, I did some small tests and I think Devstral is still better, alongside Qwen Coder 30B, GLM 32B and GPT-OSS 20B.
Don't hesitate to post your feedback, dear friends.
silenceimpaired@reddit
I wish they would release their base model of Medium and leave the fine-tuned instruct behind the API. I think it would serve hobbyists and them. Businesses could see how much better a fine-tune from Mistral is, and hobbyists could create their own fine-tunes… which typically include open data that Mistral could add to their closed API model.
a_beautiful_rhind@reddit
we're never getting miqu back.
toothpastespiders@reddit
Miqu really was the end of an era in a lot of ways.
silenceimpaired@reddit
I get that… but this isn't that. This would just be their base model before they fine-tune it. I'm holding out hope that someone from the company will see my post and reconsider, as I think it would benefit them. Chinese models continue to be released larger and with the same licensing. I think this would keep their company in focus.
That said you’re probably right.
a_beautiful_rhind@reddit
Unfortunately fewer and fewer companies release any base models at all. It's all instruct tuned to some extent.
silenceimpaired@reddit
Which is weird to me… I guess there could be a safety element, but the special sauce of the instruct tune seems like it has higher value. So for companies hesitant to give away their cash cow… it seems an elegant solution. You can point out how much better the instruct version of your model is compared to the base model.
brown2green@reddit
Nowadays the final instruct models aren't simply base models with some instruction fine-tuning that hobbyists can easily compete with. The final training phase (post-training) for SOTA models can be very extensive. Just releasing a base model that almost nobody can hope to turn into something useful probably wouldn't look good.
markole@reddit
What are your llama.cpp flags to use with this one?
TheLocalDrummer@reddit
Oh wow
Artistic_Composer825@reddit
I hear your L40s from here
Background-Ad-5398@reddit
Awesome, I like the tone of Mistral's models for small local use; only 27B Gemma 3 is as easy to talk to relative to its intelligence. Qwen is not a chatbot.
Ill_Barber8709@reddit
So Small 1.2 is now better than Medium 1.1? That's crazy impressive. Glad to see my fellow Frenchies continue to deliver! Now I'm waiting for MLX and support in LM Studio. Let's hope it won't take too much time.
beedunc@reddit
And the crowd went… mild.
PermanentLiminality@reddit
I was looking for a vision model like this one.
Substantial-Dig-8766@reddit
noooooo reasoning nooooooooo noooooooo stop this aaaaaaa
alew3@reddit
The vLLM implementation of tool calling with Mistral models is broken; any chance it could be fixed?
igorwarzocha@reddit
"Small" \^_\^
[insert a sexist joke]
(still downloads it)
some_user_2021@reddit
I hope it has a small PP
bymihaj@reddit
Magistral Small 1.2 is just better than Magistral Medium 1.0...
jacek2023@reddit (OP)
to be honest it's hard to trust benchmarks now
FlamaVadim@reddit
true 😢
Cool-Chemical-5629@reddit
Agreed - heck, I'm getting anxiety just from seeing benchmarks claiming that small model X is better than big model Y. Sheer experience from the endless chain of disappointments drove me to the conclusion that such claims should always be seen as a red flag. I love Mistral models, so I'm hoping this one is a different story.
unsolved-problems@reddit
Yeah, measuring performance is among the biggest open questions in the ML ecosystem. It's so easy to trick benchmarks (overfitting), and in my experience models that look terrific on benchmarks can somehow perform very average.
S1M0N38@reddit
let's appreciate the consistent naming scheme used by Mistral
NoFudge4700@reddit
Nice