Mistral Medium Is On The Way

Posted by Few_Painter_5588@reddit | LocalLLaMA | 54 comments

Interestingly enough, Mistral Small is written as Mistral-Small-4-119B-2603. Their medium model will have 128B parameters. Either it will be a dense model, or a less sparse MoE than Mistral Small.
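
To make the naming comparison concrete, here is a minimal sketch that splits names of this shape into tier, version, and parameter count. The pattern is an assumption inferred only from the two names that appear in this thread (`Mistral-Small-4-119B-2603` and `Mistral-Medium-3.5-128B`), and `parse_model_name` is a hypothetical helper. The trailing `2603` is kept as an opaque tag, since the thread does not say what it encodes.

```python
import re

# Assumed shape: Mistral-<Tier>-<Version>-<Size>B[-<Tag>]
# Inferred from the two names in this thread, not from any official naming spec.
NAME_RE = re.compile(
    r"^Mistral-(?P<tier>[A-Za-z]+)"
    r"-(?P<version>\d+(?:\.\d+)?)"
    r"-(?P<size_b>\d+)B"
    r"(?:-(?P<tag>\d+))?$"
)


def parse_model_name(name):
    """Split a model name into tier, version, size in billions, and optional tag."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized model name: {name!r}")
    return {
        "tier": match["tier"],
        "version": match["version"],
        "size_b": int(match["size_b"]),
        "tag": match["tag"],  # None when the name has no trailing tag
    }


if __name__ == "__main__":
    for name in ("Mistral-Small-4-119B-2603", "Mistral-Medium-3.5-128B"):
        print(name, "->", parse_model_name(name))
```

Parsed this way, the two names differ by only 9B total parameters, which is why the dense-versus-sparse question matters more than the headline size.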

54 Comments

LegacyRemaster@reddit

https://preview.redd.it/2xiugbos1zxg1.png?width=797&format=png&auto=webp&s=71fe6baa534d13d9bcd8723883fa3e2c66dfd92f So Medium 3.5, Small 4...

CryptoUsher@reddit

so maybe the "4" in Small 4 isn't about versioning at all, but refers to its 4-token context expansion or some internal training batch thing? if that's the case, could "Medium" actually be a step sideways instead of up?

unjustifiably_angry@reddit

Maybe it refers to the model's intelligence?

CryptoUsher@reddit

could be, though iirc the early leaks pointed more to training batch sizes. still, "medium" as sideways move makes sense if "4" isn't about scale at all.

unjustifiably_angry@reddit

I was being sarcastic. Mistral has gone severely downhill unfortunately.

CryptoUsher@reddit

could be, though i doubt they'd number intelligence like that. fwiw, i've seen some folks on hugging face say the numbers might just be arbitrary internal tags.

AdIllustrious436@reddit

The main version number tracks the base model. Medium 3.5 shares its base with Medium 3, 3.1, etc. (just RL'd on top, plus a new vision encoder), while Small 4 is a brand new architecture succeeding Small 3.2.

Mickenfox@reddit

I don't see why they'd name their next model Medium 3.5 after releasing Small 4

AdIllustrious436@reddit

Because it's the same base model as Medium 3, 3.1, etc. The main version number is the base model version.

Acrobatic_Donkey5089@reddit

"small" is 120b nowadays...

seamonn@reddit

119*, 120b+ is medium

ApprehensiveAd3629@reddit

Waiting to see mistral 3.5 24b 🙏🙏

AdIllustrious436@reddit

Probably not anytime soon. The 24B was the base of the 3rd Small series, and they've since pivoted to a sparse architecture for the 4th.

t4a8945@reddit

Well I'm "content" for them, but every model I've tried from Mistral (cloud and local) has been dogshit compared to other open-weight models. Hopes aren't high.

lorddumpy@reddit

sorry for doubting you, the model is actually dogshit.

t4a8945@reddit

xD thanks for the feedback, I won't waste my time with it

lorddumpy@reddit

> Well I'm "content" for them, but every model I've tried from Mistral (cloud and local) have been dogshit compared to other open-weight models.

It's crazy how hostile/passive-aggressive this sub can be for OSS releases. Especially when it isn't even out yet.

ayylmaonade@reddit

While the person you responded to was maybe a little aggressive in their phrasing, I'm not a fan of this rhetoric that criticism is off the table simply because a model is open-weight. And I know you're not saying that directly, but almost every time I see somebody comment this, it *really* gives that feeling. Like for me personally, I've been rooting for Mistral. I really liked their models from the OG Mistral 7B, Mixtral, and Mistral Small 3.1/3.2, but everything since has been rather disappointing. Small 4 is a good example: a 120B-A6B model that performs worse than 26-35B models like Qwen/Gemma, ofc, but also worse than even stuff like Nemotron-3-Nano and GPT-OSS.

lorddumpy@reddit

I never said criticism is off the table, but calling them "dogshit" without any points on why they are "dogshit" is kinda lame IMO. At least say they suck at coding or that they are slow. I love people being critical but leave out the toxicity

ayylmaonade@reddit

Oh I know, that's why I said I know you're not directly saying you're against criticism. Wasn't intended to be taken personally in any way, was just sharing my opinion. Apologies for any confusion!

lorddumpy@reddit

I just tested it out and yeah, it's not great. Something about its tone is incredibly irritating to me, it didn't get my vibe, and it answered a bunch of questions wrong. "Dogshit" is still a strong word but I definitely feel him more lol

lorddumpy@reddit

none taken, all good!

kerighan@reddit

Yes, well, it's hard to train models, and especially hard when you compete against pre-existing multi-billion dollar companies. Criticism is easy, but we should support them to fight as much as they can given the ultra-competitive nature of the AI landscape.

rm-rf-rm@reddit

Chill, he isn't being hostile or passive-aggressive. He's just voicing the reality that their models are simply not competitive in the open-weight space, and I tend to agree. All the same, it's important they keep releasing new models even if they're a bit behind, as we need diversity (in this case geographic, maybe political/cultural) in the space. And the EU wants to build on them, and they're in the middle of a sovereign tech push.

t4a8945@reddit

Sorry, I'm still mad about the day I lost trying to run "small" 4, only to discover it was perfectly useless compared to other models of the same size.

stddealer@reddit

It's still better than Llama 4 lol

Orolol@reddit

People are mad that free stuff isn't exactly perfect for them.

Septerium@reddit

Devstral Small 2 used to be my best sub-30GB partner for handling small tasks on Roo Code at the time it was released

SnooPaintings8639@reddit

You don't remember Mixtral, eh? The OG MoE that made me build an AI dedicated home PC. The rig was ready for Llama 3. Wonderful times.

rm-rf-rm@reddit

Yeah but those days are long gone, Mistral doesn't seem competitive now

Soft-Air5097@reddit

Medium-3.5 has just been dropped: [https://huggingface.co/mistralai/Mistral-Medium-3.5-128B](https://huggingface.co/mistralai/Mistral-Medium-3.5-128B) Numbers indicate its performance sits between Sonnet 4.5 and Sonnet 4.6.

RegularRecipe6175@reddit

Waiting for Qwen 3.6 Coder 80b / 3.6 122b. No, I really hope Mistral Medium is good. I mean, those guys are French.

Majestical-psyche@reddit

I really wish we got a new mistral Nemo... That model was a beast for creative writing... It still is. Whatever they did with the new mistral 3 models, they absolutely suck for creative writing 😪

pigeon57434@reddit

i don't want to be a downer but can we be real for a sec and understand this model will perform worse than like qwen3.6-27b on every possible metric

kaliku@reddit

What's that? mistral meh? I would 100% use mistral instead of any chinese models for my local shenanigans, sadly Qwen&Co. raised the bar so high...

seamonn@reddit

Hype! [I wished it into reality, haha.](https://www.reddit.com/r/LocalLLaMA/comments/1sw5mim/im_starvin/oid9y57/)

Few_Painter_5588@reddit (OP)

And it's 128B, so it can actually fit in consumer hardware!

SkyFeistyLlama8@reddit

Yeah, I can run it at Q1. Mistral should make something that fits in 64GB of unified RAM.
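
The "fits in consumer hardware" question comes down to simple arithmetic, sketched below. This is a weights-only estimate under stated assumptions: nominal bits per weight, 1 GB taken as 10^9 bytes, and no allowance for KV cache, activations, or the OS. `weight_memory_gb` is a hypothetical helper, and real quantization formats store extra scaling data, so actual files come out somewhat larger than these figures.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Weights-only footprint in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and per-block quantization overhead.
    """
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # 128B is the parameter count discussed in the thread;
    # the bit widths are nominal values chosen for illustration.
    for bits in (16, 8, 4, 2, 1):
        print(f"128B at {bits:>2} bits/weight: {weight_memory_gb(128, bits):6.1f} GB")
```

At a nominal 4 bits per weight the weights alone already take 64 GB, so a 64GB unified-memory machine has no headroom left for context. That lines up with the point above about having to drop to very aggressive quantization.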

tarruda@reddit

So mistral small 4 was 119b and medium 3.5 is 128B?

Few_Painter_5588@reddit (OP)

Medium 3.5 probably has more active parameters, or it could even be a dense model.

AvocadoArray@reddit

A proper modern 128b dense model would absolutely shred. Inference speed would be slow on most consumer hardware, but MTP could help mitigate that.
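
A rough way to see why a dense 128B model would decode slowly: at batch size 1, each generated token has to read every active weight from memory, so memory bandwidth sets a ceiling on tokens per second. The sketch below assumes a purely bandwidth-bound decode, 4-bit weights, and a made-up 500 GB/s of memory bandwidth. The 6B active-parameter figure for Small 4 is taken from a commenter's "120B-A6B" description elsewhere in this thread, not from an official spec, and `decode_tokens_per_second` is a hypothetical helper.

```python
def decode_tokens_per_second(active_params_billion, bits_per_weight, bandwidth_gb_s):
    """Upper bound on single-stream decode speed for a bandwidth-bound model.

    Assumes every active weight is read once per token and nothing else
    (KV cache reads, compute, overheads) costs any time.
    """
    gb_read_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


if __name__ == "__main__":
    BANDWIDTH_GB_S = 500  # hypothetical figure, not a measurement of any device
    dense = decode_tokens_per_second(128, 4, BANDWIDTH_GB_S)
    sparse = decode_tokens_per_second(6, 4, BANDWIDTH_GB_S)
    print(f"dense 128B ceiling:     {dense:.1f} tok/s")
    print(f"MoE, 6B active ceiling: {sparse:.1f} tok/s")
    print(f"ratio: {sparse / dense:.1f}x")
```

Under these assumptions the ratio is just 128 / 6, which is the gap that techniques like MTP or speculative decoding would be trying to claw back for a dense model.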

PhotographerUSA@reddit

It's great for writing long stories.

Kathane37@reddit

Why the split between Small and Medium? 3.5 screams disappointment

Few_Painter_5588@reddit (OP)

Mistral has three model categories (Large, Medium, Small), and all three of them are on different architectures, so the numbers are not really comparable.

Kathane37@reddit

Come on. They push all the 3 at the same time. There is a roadmap that needs to be clear for their clients. OpenAI f*cked themselves for a full year because people thought that 4o > o3. If someone at Mistral has chosen to push a Small 4 then months later put a hard stop for the brand Medium 4 it is because something fishy happened during training.

somthing_tn@reddit

https://preview.redd.it/1jlr83kiqzxg1.jpeg?width=1080&format=pjpg&auto=webp&s=74f4aae51c454ca184933276db248ed920df620b

Technical-Earth-3254@reddit

What's the difference between eagle and non-eagle models? I saw Mistral 4 Small also having both, but I couldn't really get the difference.

AXYZE8@reddit

EAGLE is an add-on for the main model: it's a specialized model for speculative decoding, which boosts single-user inference by a huge margin. You can learn more about it here [https://arxiv.org/abs/2401.15077](https://arxiv.org/abs/2401.15077)
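
For anyone unfamiliar with the idea, here is a toy sketch of generic draft-and-verify speculative decoding with greedy acceptance. It is not EAGLE's actual method (the linked paper covers that). Both "models" are hypothetical stand-in functions over integer tokens, and the verification loop stands in for what a real system would do in one batched forward pass of the large model.

```python
def target_next(tokens):
    """Toy 'large' model: a deterministic next token for a given context."""
    return (sum(tokens) * 31 + len(tokens)) % 50


def draft_next(tokens):
    """Toy 'small' model: agrees with the target except at every 4th position."""
    guess = target_next(tokens)
    return (guess + 1) % 50 if len(tokens) % 4 == 0 else guess


def greedy_decode(prompt, n_new):
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target_next(tokens))
    return tokens


def speculative_decode(prompt, n_new, k=4):
    """Draft k tokens cheaply, then keep the prefix the target agrees with."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    verify_passes = 0
    while len(tokens) < goal:
        # 1. The draft model proposes k tokens autoregressively.
        proposal = []
        for _ in range(k):
            proposal.append(draft_next(tokens + proposal))
        # 2. The target checks the proposal (one batched pass in a real system).
        verify_passes += 1
        for guess in proposal:
            expected = target_next(tokens)
            tokens.append(expected)  # the target's token is always the one kept
            if guess != expected:
                break  # first mismatch: discard the rest of the draft
        else:
            # Every draft token was accepted, so the same pass yields a bonus token.
            tokens.append(target_next(tokens))
    return tokens[:goal], verify_passes


if __name__ == "__main__":
    prompt, n_new = [1, 2, 3], 20
    spec_tokens, passes = speculative_decode(prompt, n_new)
    print("matches plain greedy decoding:", spec_tokens == greedy_decode(prompt, n_new))
    print("target verification passes:", passes, "for", n_new, "new tokens")
```

The output is identical to decoding with the large model alone. The saving comes from needing fewer large-model passes, and it shrinks whenever the draft model guesses wrong more often.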

DinoAmino@reddit

It boosts code generation for sure. But the 2x perf gains will be destroyed by as much as 0.5x perf on non-code text generation. At least that's been my limited experience.
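
To put numbers on that trade-off, here is a small worked example. It reads the comment as "about 2x faster on code, as low as 0.5x on other text", treats both figures as anecdotal, and weights them by the share of baseline decoding time spent on code. `blended_speedup` is a hypothetical helper.

```python
def blended_speedup(code_fraction, code_speedup=2.0, text_speedup=0.5):
    """Overall speedup when code_fraction of baseline decode time is code.

    The default speedups are the anecdotal figures from the comment above,
    not benchmark results.
    """
    new_time = code_fraction / code_speedup + (1 - code_fraction) / text_speedup
    return 1 / new_time


if __name__ == "__main__":
    for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"{fraction:4.0%} code -> {blended_speedup(fraction):.2f}x overall")
```

With these figures the break-even point is at two thirds code: below that share, turning the draft model on makes the workload slower overall.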

jacek2023@reddit

Could you share a link, what is this code?

Few_Painter_5588@reddit (OP)

My bad, it's a new vLLM PR: [https://github.com/vllm-project/vllm/pull/41024/files](https://github.com/vllm-project/vllm/pull/41024/files)

SnooPaintings8639@reddit

For a 120B model I'd prefer a PR for llama.cpp; vLLM requires full GPU offloading :(

jacek2023@reddit

thanks!

fizzy1242@reddit

maybe this time they'll get it right