Mistral Medium Is On The Way
Posted by Few_Painter_5588@reddit | LocalLLaMA | View on Reddit | 53 comments
Interestingly enough, Mistral Small is written as Mistral-Small-4-119B-2603. Their medium model will have 128B parameters. Either it will be a dense model, or a less sparse MoE than Mistral Small.
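To make the dense-vs-MoE distinction concrete, here is a back-of-envelope sketch. The 6B "active" figure is an assumption borrowed from the A6B guess elsewhere in the thread, not a confirmed spec:

```python
# Back-of-envelope comparison of dense vs. sparse-MoE cost for a ~119B model.
# The 6B active-parameter count is an assumption for illustration only.

def weight_gib(params_b, bytes_per_param=2):
    """GiB needed to hold the weights (bf16 by default)."""
    return params_b * 1e9 * bytes_per_param / 2**30

total_b = 119             # total parameters, billions
active_moe_b = 6          # hypothetical active params per token (sparse MoE)
active_dense_b = total_b  # a dense model activates every weight per token

print(f"weights in memory: {weight_gib(total_b):.0f} GiB either way")
print(f"per-token compute ratio, dense vs MoE: {active_dense_b / active_moe_b:.1f}x")
```

Memory cost is set by total parameters either way; only the per-token compute changes, which is why a denser Medium at similar total size would be much slower but potentially stronger per parameter.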
LegacyRemaster@reddit
so Medium 3.5, Small 4....
CryptoUsher@reddit
so maybe the "4" in Small 4 isn't about versioning at all, but refers to its 4-token context expansion or some internal training batch thing?
if that's the case, could "Medium" actually be a step sideways instead of up?
AdIllustrious436@reddit
The main version number tracks the base model. Medium 3.5 shares its base with Medium 3, 3.1 etc. (just RL'd on top and a new vision encoder), while Small 4 is a brand new architecture succeeding Small 3.2.
unjustifiably_angry@reddit
Maybe it refers to the model's intelligence?
CryptoUsher@reddit
could be, though iirc the early leaks pointed more to training batch sizes. still, "medium" as sideways move makes sense if "4" isn't about scale at all.
CryptoUsher@reddit
could be, though i doubt they'd number intelligence like that. fwiw, i've seen some folks on hugging face say the numbers might just be arbitrary internal tags.
Mickenfox@reddit
I don't see why they'd name their next model Medium 3.5 after releasing Small 4
AdIllustrious436@reddit
Because it's the same base model as Medium 3, 3.1 etc. The main version number is the base model version.
Acrobatic_Donkey5089@reddit
"small" is 120b nowadays...
seamonn@reddit
119*, 120b+ is medium
ApprehensiveAd3629@reddit
Waiting to see mistral 3.5 24b 🙏🙏
AdIllustrious436@reddit
Probably not anytime soon. The 24B was the base of the 3rd Small series, and they've since pivoted to a sparse architecture for the 4th.
t4a8945@reddit
Well I'm "content" for them, but every model I've tried from Mistral (cloud and local) has been dogshit compared to other open-weight models.
Hopes aren't high.
lorddumpy@reddit
sorry for doubting you, the model is actually dogshit.
t4a8945@reddit
xD thanks for the feedback, I won't waste my time with it
lorddumpy@reddit
It's crazy how hostile/passive-aggressive this sub can be for OSS releases. Especially when it isn't even out yet.
ayylmaonade@reddit
While the person you responded to was maybe a little aggressive in their phrasing, I'm not a fan of the rhetoric that criticism is off the table simply because a model is open-weight. And I know you're not saying that directly, but almost every time I see somebody comment this, it gives that feeling.
Like for me personally, I've been rooting for Mistral. I really liked their models from the OG Mistral 7B, Mixtral, and Mistral Small 3.1/3.2, but everything since has been rather disappointing. Small 4 is a good example: a 120B-A6B that performs worse not only than 26-35B models like Qwen/Gemma, but even than stuff like Nemotron-3-Nano and GPT-OSS.
lorddumpy@reddit
I never said criticism is off the table, but calling them "dogshit" without any points on why they are "dogshit" is kinda lame IMO. At least say they suck at coding or that they are slow.
I love people being critical but leave out the toxicity
ayylmaonade@reddit
Oh I know, that's why I said I know you're not directly saying you're against criticism. Wasn't intended to be taken personally in any way, was just sharing my opinion. Apologies for any confusion!
lorddumpy@reddit
I just tested it out and yeah, it's not great. Something about its tone is incredibly irritating to me, it didn't get my vibe, and it answered a bunch of questions wrong. "Dogshit" is still a strong word but I definitely feel him more lol
lorddumpy@reddit
none taken, all good!
kerighan@reddit
Yes, well, it's hard to train models, and especially hard when you compete against pre-existing multi-billion dollar companies. Criticism is easy, but we should support them as much as we can given the ultra-competitive nature of the AI landscape.
rm-rf-rm@reddit
Chill, he isn't being hostile or passive-aggressive. He's just voicing the reality that their models are simply not competitive in the open-weight space, and I tend to agree.
All the same, it's important they keep releasing new models even if they're a bit behind, as we need diversity (in this case geographic, maybe political/cultural) in the space. And the EU wants to build on them amid its sovereign tech push.
t4a8945@reddit
Sorry, I'm still mad about the day I lost trying to run "small" 4, only to discover it was perfectly useless compared to other models of the same size.
stddealer@reddit
It's still better than Llama 4 lol
Orolol@reddit
People are mad that free stuff isn't exactly perfect for them.
Septerium@reddit
Devstral Small 2 used to be my best sub-30GB partner for handling small tasks on Roo Code at the time it was released
SnooPaintings8639@reddit
You don't remember Mixtral, eh? The OG MoE that made me build an AI dedicated home PC. The rig was ready for Llama 3. Wonderful times.
rm-rf-rm@reddit
Yeah but those days are long gone, Mistral doesn't seem competitive now
Soft-Air5097@reddit
Medium-3.5 has just been dropped: https://huggingface.co/mistralai/Mistral-Medium-3.5-128B
Numbers indicate its performance sits between Sonnet 4.5 and Sonnet 4.6.
RegularRecipe6175@reddit
Waiting for Qwen 3.6 Coder 80b / 3.6 122b. No, I really hope Mistral Medium is good. I mean, those guys are French.
Majestical-psyche@reddit
I really wish we got a new mistral Nemo... That model was a beast for creative writing... It still is. Whatever they did with the new mistral 3 models, they absolutely suck for creative writing 😪
pigeon57434@reddit
i dont want to be a downer but can we be real for a sec and understand this model will perform worse than like qwen3.6-27b on every possible metric
kaliku@reddit
What's that? mistral meh?
I would 100% use mistral instead of any chinese models for my local shenanigans, sadly Qwen&Co. raised the bar so high...
seamonn@reddit
Hype!
I wished it into reality, haha.
Few_Painter_5588@reddit (OP)
And it's 128B, so it can actually fit in consumer hardware!
SkyFeistyLlama8@reddit
Yeah I can run it at Q1
Mistral should make something that fits on 64GB unified RAM.
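The 64 GB point can be sanity-checked with a rough weight-size estimate. The bits-per-weight figures below approximate common GGUF quant levels (assumptions, not exact), and KV cache plus runtime overhead are ignored, so real usage is higher:

```python
# Rough weight-size estimate for a 128B model at common quantization levels.
# Bits-per-weight values approximate GGUF quants; KV cache and overhead ignored.

def quant_gib(params_b, bits_per_weight):
    return params_b * 1e9 * bits_per_weight / 8 / 2**30

BUDGET_GIB = 64
for name, bits in [("Q8_0", 8.5), ("Q4_K_M", 4.8), ("Q2_K", 2.6)]:
    size = quant_gib(128, bits)
    verdict = "fits" if size <= BUDGET_GIB else "does not fit"
    print(f"{name}: {size:6.1f} GiB -> {verdict} in {BUDGET_GIB} GiB")
```

Even a ~4.8 bpw quant of 128B overflows a 64 GB budget once you account for the OS and KV cache, which is why the Q1 joke above isn't far off.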
tarruda@reddit
So mistral small 4 was 119b and medium 3.5 is 128B?
Few_Painter_5588@reddit (OP)
Medium 3.5 probably has more active parameters, or it could even be a dense model.
AvocadoArray@reddit
A proper modern 128b dense model would absolutely shred. Inference speed would be slow on most consumer hardware, but MTP could help mitigate that.
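For a sense of "slow": single-stream decode on a dense model is roughly memory-bandwidth bound, since every weight is read once per generated token. A sketch with assumed hardware numbers (not measurements):

```python
# Bandwidth-bound decode estimate for a dense model: each generated token
# reads all weights once, so tokens/s ~ bandwidth / weight bytes.
# Hardware bandwidth figures below are assumptions for illustration.

def decode_tok_per_s(params_b, bits_per_weight, bandwidth_gb_s):
    weight_gb = params_b * 1e9 * bits_per_weight / 8 / 1e9
    return bandwidth_gb_s / weight_gb

# 128B dense at ~4.5 bits/weight:
print(f"{decode_tok_per_s(128, 4.5, 273):.1f} tok/s on ~273 GB/s unified memory")
print(f"{decode_tok_per_s(128, 4.5, 1008):.1f} tok/s on a ~1 TB/s GPU")
```

A few tokens per second on typical unified-memory machines is why MTP (multi-token prediction) or speculative decoding matters: verifying several drafted tokens per weight pass multiplies the effective rate.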
PhotographerUSA@reddit
It's great for writing long stories.
Kathane37@reddit
Why the split between Small and Medium? 3.5 screams disappointment
Few_Painter_5588@reddit (OP)
Mistral has three model categories: Large, Medium, and Small. All three are on different architectures, so the version numbers aren't really comparable.
Kathane37@reddit
Come on. They push all three at the same time. The roadmap needs to be clear for their clients. OpenAI f*cked themselves for a full year because people thought that 4o > o3. If someone at Mistral chose to push a Small 4 and then, months later, put a hard stop on the Medium 4 branding, it's because something fishy happened during training.
Technical-Earth-3254@reddit
What's the difference between eagle and non-eagle models? I saw Mistral Small 4 also has both, but I couldn't really get the difference.
AXYZE8@reddit
EAGLE is an add-on to the main model: a specialized draft model for speculative decoding, which boosts single-user inference by a huge margin.
You can learn more about it here https://arxiv.org/abs/2401.15077
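For intuition, the draft-then-verify loop at the heart of speculative decoding can be sketched with toy stand-in "models" (this is the generic scheme, not EAGLE's feature-level drafting; all functions here are hypothetical):

```python
# Toy sketch of speculative decoding's draft-then-verify loop. EAGLE itself
# drafts from the target model's hidden features, but the accept/reject idea
# is the same. Both "models" below are stand-ins operating on integer tokens.

def draft_model(ctx):
    # cheap drafter: guesses the next token from the last one
    return ctx[-1] + 1

def target_model(ctx):
    # expensive target: agrees with the drafter most of the time
    return ctx[-1] + (1 if ctx[-1] % 5 else 2)

def speculative_step(ctx, k=4):
    """Draft k tokens, keep the longest prefix the target agrees with,
    then append the target's correction at the first mismatch."""
    draft, tmp = [], list(ctx)
    for _ in range(k):
        tok = draft_model(tmp)
        draft.append(tok)
        tmp.append(tok)

    accepted, tmp = [], list(ctx)
    for tok in draft:
        expected = target_model(tmp)  # in practice: one batched forward pass
        if expected == tok:
            accepted.append(tok)      # draft token verified
            tmp.append(tok)
        else:
            accepted.append(expected) # target's correction ends the step
            break
    return accepted

print(speculative_step([1, 2, 3]))
```

The speedup comes from the target checking all k drafts in one batched forward pass instead of k sequential ones; the output distribution is unchanged, since rejected drafts are replaced by the target's own token.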
DinoAmino@reddit
It boosts code generation for sure, but the ~2x gains there can be offset by as much as a 0.5x slowdown on non-code text generation. At least that's been my limited experience.
jacek2023@reddit
Could you share a link, what is this code?
Few_Painter_5588@reddit (OP)
My bad, it's a new VLLM PR: https://github.com/vllm-project/vllm/pull/41024/files
SnooPaintings8639@reddit
For 120b model I'd prefer PR for llama.cpp, vllm requires full gpu offloading :(
jacek2023@reddit
thanks!
fizzy1242@reddit
maybe this time they'll get it right