Mistral Medium Is On The Way
Posted by Few_Painter_5588@reddit | LocalLLaMA | View on Reddit | 53 comments
Interestingly enough, Mistral Small is written as Mistral-Small-4-119B-2603. Their medium model will have 128B parameters. Either it will be a dense model, or a less sparse MoE than Mistral Small.
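To make the dense-vs-MoE distinction concrete, here is a back-of-envelope sketch. The 6B "active" figure is an assumption borrowed from the A6B guess elsewhere in the thread, not a confirmed spec:

```python
# Back-of-envelope comparison of dense vs. sparse-MoE cost for a ~119B model.
# The 6B active-parameter count is an assumption for illustration only.

def weight_gib(params_b, bytes_per_param=2):
    """GiB needed to hold the weights (bf16 by default)."""
    return params_b * 1e9 * bytes_per_param / 2**30

total_b = 119             # total parameters, billions
active_moe_b = 6          # hypothetical active params per token (sparse MoE)
active_dense_b = total_b  # a dense model activates every weight per token

print(f"weights in memory: {weight_gib(total_b):.0f} GiB either way")
print(f"per-token compute ratio, dense vs MoE: {active_dense_b / active_moe_b:.1f}x")
```

Memory cost is set by total parameters either way; only the per-token compute changes, which is why a denser Medium at similar total size would be much slower but potentially stronger per parameter.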
LegacyRemaster@reddit
so Medium 3.5, Small 4....
CryptoUsher@reddit
so maybe the "4" in Small 4 isn't about versioning at all, but refers to its 4-token context expansion or some internal training batch thing?
if that's the case, could "Medium" actually be a step sideways instead of up?
AdIllustrious436@reddit
The main version number tracks the base model. Medium 3.5 shares its base with Medium 3, 3.1 etc. (just RL'd on top and a new vision encoder), while Small 4 is a brand new architecture succeeding Small 3.2.
unjustifiably_angry@reddit
Maybe it refers to the model's intelligence?
CryptoUsher@reddit
could be, though iirc the early leaks pointed more to training batch sizes. still, "medium" as sideways move makes sense if "4" isn't about scale at all.
CryptoUsher@reddit
could be, though i doubt they'd number intelligence like that. fwiw, i've seen some folks on hugging face say the numbers might just be arbitrary internal tags.
Mickenfox@reddit
I don't see why they'd name their next model Medium 3.5 after releasing Small 4
AdIllustrious436@reddit
Because it's the same base model as Medium 3, 3.1 etc. The main version number is the base model version.
Acrobatic_Donkey5089@reddit
"small" is 120b nowadays...
seamonn@reddit
119*, 120b+ is medium
ApprehensiveAd3629@reddit
Waiting to see mistral 3.5 24b 🙏🙏
AdIllustrious436@reddit
Probably not anytime soon. The 24B was the base of the 3rd Small series, and they've since pivoted to a sparse architecture for the 4th.
t4a8945@reddit
Well I'm "content" for them, but every model I've tried from Mistral (cloud and local) has been dogshit compared to other open-weight models.
Hopes aren't high.
lorddumpy@reddit
sorry for doubting you, the model is actually dogshit.
t4a8945@reddit
xD thanks for the feedback, I won't waste my time with it
lorddumpy@reddit
It's crazy how hostile/passive-aggressive this sub can be for OSS releases. Especially when it isn't even out yet.
ayylmaonade@reddit
While the person you responded to was maybe a little aggressive in their phrasing, I'm not a fan of the rhetoric that criticism is off the table simply because a model is open-weight. And I know you're not saying that directly, but almost every time I see somebody comment this, it gives that feeling.
Like for me personally, I've been rooting for Mistral. I really liked their models from the OG Mistral 7B, Mixtral, and Mistral Small 3.1/3.2, but everything since has been rather disappointing. Small 4 is a good example: a 120B-A6B that performs worse not only than 26-35B models like Qwen/Gemma, but even than stuff like Nemotron-3-Nano and GPT-OSS.
lorddumpy@reddit
I never said criticism is off the table, but calling them "dogshit" without any points on why they are "dogshit" is kinda lame IMO. At least say they suck at coding or that they are slow.
I love people being critical but leave out the toxicity
ayylmaonade@reddit
Oh I know, that's why I said I know you're not directly saying you're against criticism. Wasn't intended to be taken personally in any way, was just sharing my opinion. Apologies for any confusion!
lorddumpy@reddit
I just tested it out and yeah, it's not great. Something about its tone is incredibly irritating to me, it didn't get my vibe, and it answered a bunch of questions wrong. "Dogshit" is still a strong word but I definitely feel him more lol
lorddumpy@reddit
none taken, all good!
kerighan@reddit
Yes, well, it's hard to train models, and especially hard when you compete against pre-existing multi-billion dollar companies. Criticism is easy, but we should support them as much as we can given the ultra-competitive nature of the AI landscape.
rm-rf-rm@reddit
Chill, he isn't being hostile or passive-aggressive. He's just voicing the reality that their models are simply not competitive in the open-weight space, and I tend to agree.
All the same, it's important they keep releasing new models even if they're a bit behind, as we need diversity (in this case geographic, maybe political/cultural) in the space. And the EU wants to build on them amid its sovereign tech push.
t4a8945@reddit
Sorry, I'm still mad about the day I lost trying to run "small" 4, only to discover it was perfectly useless compared to other models of the same size.
stddealer@reddit
It's still better than Llama 4 lol
Orolol@reddit
People are mad that free stuff isn't exactly perfect for them.
Septerium@reddit
Devstral Small 2 used to be my best sub-30GB partner for handling small tasks on Roo Code at the time it was released
SnooPaintings8639@reddit
You don't remember Mixtral, eh? The OG MoE that made me build an AI dedicated home PC. The rig was ready for Llama 3. Wonderful times.
rm-rf-rm@reddit
Yeah but those days are long gone, Mistral doesn't seem competitive now
Soft-Air5097@reddit
Medium-3.5 has just been dropped: https://huggingface.co/mistralai/Mistral-Medium-3.5-128B
Numbers indicate its performance sits between Sonnet 4.5 and Sonnet 4.6.
RegularRecipe6175@reddit
Waiting for Qwen 3.6 Coder 80b / 3.6 122b. No, I really hope Mistral Medium is good. I mean, those guys are French.
Majestical-psyche@reddit
I really wish we got a new mistral Nemo... That model was a beast for creative writing... It still is. Whatever they did with the new mistral 3 models, they absolutely suck for creative writing 😪
pigeon57434@reddit
i dont want to be a downer but can we be real for a sec and understand this model will perform worse than like qwen3.6-27b on every possible metric
kaliku@reddit
What's that? mistral meh?
I would 100% use mistral instead of any chinese models for my local shenanigans, sadly Qwen&Co. raised the bar so high...
seamonn@reddit
Hype!
I wished it into reality, haha.
Few_Painter_5588@reddit (OP)
And it's 128B, so it can actually fit in consumer hardware!
SkyFeistyLlama8@reddit
Yeah I can run it at Q1
Mistral should make something that fits on 64GB unified RAM.
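The 64 GB point can be sanity-checked with a rough weight-size estimate. The bits-per-weight figures below approximate common GGUF quant levels (assumptions, not exact), and KV cache plus runtime overhead are ignored, so real usage is higher:

```python
# Rough weight-size estimate for a 128B model at common quantization levels.
# Bits-per-weight values approximate GGUF quants; KV cache and overhead ignored.

def quant_gib(params_b, bits_per_weight):
    return params_b * 1e9 * bits_per_weight / 8 / 2**30

BUDGET_GIB = 64
for name, bits in [("Q8_0", 8.5), ("Q4_K_M", 4.8), ("Q2_K", 2.6)]:
    size = quant_gib(128, bits)
    verdict = "fits" if size <= BUDGET_GIB else "does not fit"
    print(f"{name}: {size:6.1f} GiB -> {verdict} in {BUDGET_GIB} GiB")
```

Even a ~4.8 bpw quant of 128B overflows a 64 GB budget once you account for the OS and KV cache, which is why the Q1 joke above isn't far off.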
tarruda@reddit
So mistral small 4 was 119b and medium 3.5 is 128B?
Few_Painter_5588@reddit (OP)
Medium 3.5 probably has more active parameters, or it could even be a dense model.
AvocadoArray@reddit
A proper modern 128b dense model would absolutely shred. Inference speed would be slow on most consumer hardware, but MTP could help mitigate that.
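For a sense of "slow": single-stream decode on a dense model is roughly memory-bandwidth bound, since every weight is read once per generated token. A sketch with assumed hardware numbers (not measurements):

```python
# Bandwidth-bound decode estimate for a dense model: each generated token
# reads all weights once, so tokens/s ~ bandwidth / weight bytes.
# Hardware bandwidth figures below are assumptions for illustration.

def decode_tok_per_s(params_b, bits_per_weight, bandwidth_gb_s):
    weight_gb = params_b * 1e9 * bits_per_weight / 8 / 1e9
    return bandwidth_gb_s / weight_gb

# 128B dense at ~4.5 bits/weight:
print(f"{decode_tok_per_s(128, 4.5, 273):.1f} tok/s on ~273 GB/s unified memory")
print(f"{decode_tok_per_s(128, 4.5, 1008):.1f} tok/s on a ~1 TB/s GPU")
```

A few tokens per second on typical unified-memory machines is why MTP (multi-token prediction) or speculative decoding matters: verifying several drafted tokens per weight pass multiplies the effective rate.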
PhotographerUSA@reddit
It's great for writing long stories.
Kathane37@reddit
Why the split between Small and Medium? 3.5 screams disappointment
Few_Painter_5588@reddit (OP)
Mistral has three model categories: Large, Medium, and Small. All three are on different architectures, so the version numbers aren't really comparable.
Kathane37@reddit
Come on. They push all three at the same time. The roadmap needs to be clear for their clients. OpenAI f*cked themselves for a full year because people thought that 4o > o3. If someone at Mistral chose to push a Small 4 and then, months later, put a hard stop on the Medium 4 branding, it's because something fishy happened during training.
Technical-Earth-3254@reddit
What's the difference between eagle and non-eagle models? I saw Mistral Small 4 also has both, but I couldn't really get the difference.
AXYZE8@reddit
EAGLE is an add-on to the main model: a specialized draft model for speculative decoding, which boosts single-user inference by a huge margin.
You can learn more about it here https://arxiv.org/abs/2401.15077
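For intuition, the draft-then-verify loop at the heart of speculative decoding can be sketched with toy stand-in "models" (this is the generic scheme, not EAGLE's feature-level drafting; all functions here are hypothetical):

```python
# Toy sketch of speculative decoding's draft-then-verify loop. EAGLE itself
# drafts from the target model's hidden features, but the accept/reject idea
# is the same. Both "models" below are stand-ins operating on integer tokens.

def draft_model(ctx):
    # cheap drafter: guesses the next token from the last one
    return ctx[-1] + 1

def target_model(ctx):
    # expensive target: agrees with the drafter most of the time
    return ctx[-1] + (1 if ctx[-1] % 5 else 2)

def speculative_step(ctx, k=4):
    """Draft k tokens, keep the longest prefix the target agrees with,
    then append the target's correction at the first mismatch."""
    draft, tmp = [], list(ctx)
    for _ in range(k):
        tok = draft_model(tmp)
        draft.append(tok)
        tmp.append(tok)

    accepted, tmp = [], list(ctx)
    for tok in draft:
        expected = target_model(tmp)  # in practice: one batched forward pass
        if expected == tok:
            accepted.append(tok)      # draft token verified
            tmp.append(tok)
        else:
            accepted.append(expected) # target's correction ends the step
            break
    return accepted

print(speculative_step([1, 2, 3]))
```

The speedup comes from the target checking all k drafts in one batched forward pass instead of k sequential ones; the output distribution is unchanged, since rejected drafts are replaced by the target's own token.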
DinoAmino@reddit
It boosts code generation for sure, but the ~2x gains there can be offset by as much as a 0.5x slowdown on non-code text generation. At least that's been my limited experience.
jacek2023@reddit
Could you share a link, what is this code?
Few_Painter_5588@reddit (OP)
My bad, it's a new VLLM PR: https://github.com/vllm-project/vllm/pull/41024/files
SnooPaintings8639@reddit
For 120b model I'd prefer PR for llama.cpp, vllm requires full gpu offloading :(
jacek2023@reddit
thanks!
fizzy1242@reddit
maybe this time they'll get it right