[Ministral 3] Add ministral 3 - Pull Request #42498 · huggingface/transformers
Posted by bratao@reddit | LocalLLaMA | View on Reddit | 27 comments
Cool-Chemical-5629@reddit
Now wait a damn minute. Is this the reveal of Bert-Nebulon Alpha? Because if it is, then I'm all in!
FlamaVadim@reddit
not possible ☹️
brown2green@reddit
It has 256k tokens context and vision support like the model on OpenRouter. That one also has "small model smell" in some aspects.
sschuhmann@reddit
In the PR the context window is mentioned 😉
brown2green@reddit
To me it looks like 16k native context extended to 256k with YaRN scaling. https://github.com/huggingface/transformers/blob/2d4578a7e58c298a0398c297adaf73887aa36e5b/src/transformers/models/ministral3/configuration_ministral3.py
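(A minimal sketch of how that could be checked once the config ships in transformers; the repo id and the exact rope_scaling values below are assumptions for illustration, not confirmed by the PR.)
from transformers import AutoConfig

# Hypothetical repo id; the official checkpoint name isn't published yet.
config = AutoConfig.from_pretrained("mistralai/Ministral-3-8B-Instruct")

print(config.max_position_embeddings)  # advertised context window, e.g. 262144 (256k)
print(config.rope_scaling)             # e.g. {"rope_type": "yarn", "factor": 16.0,
                                       #       "original_max_position_embeddings": 16384}

# With YaRN, the native window is original_max_position_embeddings (16k in this guess)
# and native * factor gives the extended window: 16384 * 16 = 262144.
if config.rope_scaling and config.rope_scaling.get("rope_type") == "yarn":
    native = config.rope_scaling["original_max_position_embeddings"]
    print(native, int(native * config.rope_scaling["factor"]))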
Cool-Chemical-5629@reddit
Why not? We have to believe. 🙂
dampflokfreund@reddit
Compare Bert's speed to Ministral 8B. Bert is way slower on OpenRouter, so it's a much bigger model.
Cool-Chemical-5629@reddit
Sure, but whatever Ministral 3 is, it's a new architecture, because they are adding support for it to Transformers. If this was based on the original Ministral 8B, they wouldn't need to touch Transformers to support it, right? But here they are, removing 9 lines, adding 1403 new lines of code. This is not our old Ministral architecture, so the performance of the model is yet to be disclosed.
To be fair, my suggestion that Bert-Nebulon Alpha could be this Ministral 3 is just wishful thinking. There could really be a bigger model, maybe one using the same new architecture as this Ministral 3, but we don't know that for sure. What I do know is that last time I tried Bert-Nebulon Alpha, it had some serious flaws in its logic, which I commented on in other threads in this sub. At some point I compared it to Mistral Small 3.2 and concluded that if Bert-Nebulon Alpha really is a new Mistral model, its logic is weaker than Mistral Small 3.2's, but then again maybe they fixed that in the meantime. If not, it would make more sense that it really is the 8B model, and for that size it would be a really smart one.
In the meantime, someone knowledgeable could take a look at the code and figure out what kind of architecture it really is. I'm sure it would be appreciated.
Hoblywobblesworth@reddit
Haven't looked in detail at the code additions yet, but the comments on the PR suggest it's not a major architecture update beyond a minor change to the RoPE implementation.
Cool-Chemical-5629@reddit
Minor change with 1403 additions? Hmm, okay.
Hoblywobblesworth@reddit
Comment on the PR from ~6hrs ago:
"Out of interest: if the only difference here is that the attn layer now supports L4-style rope extension, why was a whole new arch made instead of extending the regular Mistral LM arch with L4 rope support?"
rerri@reddit
I'm wondering if the upcoming Flux.2 Klein will use this as a text encoder. The image model was said to be size-distilled, so maybe a smaller text encoder would make sense too.
Zestyclose-Ad-6147@reddit
Ministral? What's that? Did I miss something?
hainesk@reddit
It's Mistral for edge computing, an 8B model.
mpasila@reddit
The code also mentions a 3B model, so there might be more.
sourceholder@reddit
I hope you're joking.
Clear-Ad-9312@reddit
I guess it's just an update to the previous Ministral, which is a small, performant model in the 8B range. Seems to skip 2 and go straight to 3, maybe to line up with the current model names? Idk, it's odd.
No_Conversation9561@reddit
I'm waiting for a 100B+ open model from Mistral.
misterflyer@reddit
8x22B V2
ResidentPositive4122@reddit
Huh? Aren't they at Mistral Small 3.2 and Mistral Medium 3.1 already?
youcef0w0@reddit
If you read the PR, it's an upcoming 8B model.
It's gonna have base, instruct, and thinking variants.
random-tomato@reddit
dayum
Klutzy-Snow8016@reddit
Their last Ministral was an 8B model. Maybe they're updating that.
jacek2023@reddit
Nice to see a new Mistral, but I will be patiently waiting for something bigger than 24B.
lacerating_aura@reddit
Yeah. Strange that Mistral were among the first to explore MoE models but have been really quiet lately.
brown2green@reddit
Mistral Medium 3.x is probably a MoE model, but it's API only.
guiopen@reddit
Apache license! Very excited for this