I love Mistral but...
Posted by Sicarius_The_First@reddit | LocalLLaMA | View on Reddit | 47 comments
Their new license sucks.
At this point, might as well use Llama. And a 3B without open weights?
Something has definitely changed, we are at the beginning of the AI era, and precedents matter.
I hope there will come a time when we can pretrain a good 20B model ourselves (distributed training is showing some promise with the 10B being trained now).
Monkey_1505@reddit
So it's non-commercial right? I'm okay with that. People can still tune it, we can still use it, you just can't make money from it, correct?
CabinetMain3163@reddit
How is this good? So only people with cutting-edge HW can use it? I want to use it with an inference provider.
Monkey_1505@reddit
They make small models and moe's you can run on a fairly standard graphics card.
They probably have their own API, if you prefer cloud.
CabinetMain3163@reddit
Their cloud will not host modified models, which is the whole point of releasing them 'open source'. Also, they are back to Apache, so wtf was this decision back then?
Monkey_1505@reddit
Wanting to specifically run community fine tunes of a model on a cloud provider huh?
Not a common opportunity regardless of license, IME.
CabinetMain3163@reddit
It makes the most sense? A lot of those models need to be ablated to be useful.
Monkey_1505@reddit
Inference providers tend to just offer the base post-trained model. Even the ones that do provide decensored or fine tuned models, don't offer them from many labs.
Like you can want that, ofc, but it's just not a very common offering.
Sicarius_The_First@reddit (OP)
Yes, but it's not just that, it's the whole fill in this, fill in that, accept this, accept that your name and details will be saved in our database... etc. Bad precedent.
he77789@reddit
Actually, not really. INTELLECT-1, presumably the 10b model you are mentioning, isn't as distributed as you think. They haven't really figured out how to let untrusted nodes take part yet, so you can't quite just let your home PC help for now. This is mentioned in their Next Steps section: https://www.primeintellect.ai/blog/intellect-1
Also, in their "Contribute Compute" page (https://docs.primeintellect.ai/tutorials-decentralized-training/contribute-compute), it says "Decentralized training of INTELLECT-1 currently requires 8x H100 SXM5 GPUs." That is not exactly what I would call a home PC.
So, I don't think we are really that close to being able to train models with everyone's PCs like BOINC or Folding@home.
Scary_Low9184@reddit
Paid weights soon, just you wait.
a_beautiful_rhind@reddit
if it was cool and uncensored sure.. if it's guardrailed and full of slop, keep it.
a_slay_nub@reddit
That's already the case, they offer a commercial license. It's expensive as hell and not worth it though IMO. $200k/year to run it on 8x A100s.
shokuninstudio@reddit
If they're giving you something for free, then you're the product, as they say.
Something like this is happening with generative imaging models. They give lighter and less accurate models away for free and then sell the pro version which can only be run on a server. Future updates to the models become completely commercial. The early versions were just tasters.
Klutzy-Smile-9839@reddit
People develop proof of concepts with free models, then they can scale with Pro licences. This is a good way to maintain their business
Sicarius_The_First@reddit (OP)
Oh, it's definitely going in that direction... I mean, I love Mistral, I really do, but this sharp change in attitude... it stings.
EastSignificance9744@reddit
we already had this moral outrage a couple months ago and they've made many awesome models since
Odd-Environment-7193@reddit
What are you people basing this on? Have you seen their range of models?
They have a whole range for you to pick and choose from.
They also give you $150 a month to rip as you see fit on their platform.
Your assertions are not based on reality. Furthermore, they have a right to turn a profit and remain sustainable. Posts like these come across as really entitled and probably belong under u/choosingbeggars
Patience.
hapliniste@reddit
This is what closed licences are. When they said "for local install, contact us", did you think it was to share a drink?
Sicarius_The_First@reddit (OP)
Exactly.
llama_ques@reddit
go with gpt-4o. It is far better.
MoffKalast@reddit
Even if that wasn't the case, Mistral has an instruct problem right now. They've settled on a template that doesn't work for anything but the core basics.
That wasn't much of a problem while everyone else was also making similar errors and Hermes/Dolphin tunes could correct it, but these days Qwen uses ChatML and Meta uses their own version of it, and they both throw a few orders of magnitude more compute at actually good instruct tuning, making these base-model-trained community alternatives entirely unviable in comparison. Pretty much everyone trains on top of the instruct tunes these days, which used to be an entirely dumb idea, but now it's the only real option to stay in the game. And Mistral's instruct tunes are dogshit.
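For anyone unfamiliar with the template issue being described, here's a rough side-by-side of the two prompt formats. This is illustrative only, typed from memory rather than pulled from any official tokenizer config:

```python
# ChatML, as used by Qwen: explicit role markers for every turn,
# including a dedicated system prompt.
chatml = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nHello!<|im_end|>\n"
    "<|im_start|>assistant\n"
)

# Mistral's [INST] template: instruction/response pairs wrapped in
# [INST] ... [/INST], with no dedicated system role.
mistral = "<s>[INST] Hello! [/INST]"

print(chatml)
print(mistral)
```

The lack of a first-class system turn in the second format is roughly what "doesn't work for anything but the core basics" refers to.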
Feztopia@reddit
According to their reported numbers the instruct version of Ministral 3b beats llama 3.1 8b instruct at 4 out of 6 benchmarks. Yes benchmarks aren't everything but that's still impressive if true.
MoffKalast@reddit
And Phi beats everything on benchmarks. Funny how nobody is actually using it eh?
Feztopia@reddit
You are comparing apples to donkeys. Phi was just trained on synthetic textbooks; it's not meant to be a general model, which isn't the case for Mistral. We were talking about instruct tuning, not base models. It's impressive how you forgot the topic you yourself started, and now you're talking about Phi and benchmarks despite me being the one who said benchmarks aren't everything.
Sicarius_The_First@reddit (OP)
Gemini 8B? No weights, Mistral 3B? No weights. Sorry I'm sad, it's allowed, right? :C
polikles@reddit
If by allowed you mean "legal", then it's allowed. Sure, it sucks to get fewer possibilities, but Mistral, just like any other company funded with VC money, needs to monetize their stuff somehow. And giving away an early version for free to gain market share, then charging for more advanced ones, is a quite common strategy.
It also upsets me a bit, but what can we do? It's the harsh reality.
Icy_Advisor_3508@reddit
Yeah, the new licensing rules have made things tricky, especially for those who want more flexibility with their models. LLaMA is definitely a solid option for open weights right now, and with the way distributed training is evolving, we might see communities pretraining larger models like 20B before long.
Maykey@reddit
Not the first time. It took time for mistral to become open
No_Afternoon_4260@reddit
Haven't read their last licence, what about it?
eNB256@reddit
Not a lawyer. From a quick look, the MRL-0.1 license only allows research if it's not connected with money in any way, so it's implied that stuff like:
- benchmarking at home
- (perhaps) having it help you understand your homework better (if any)
- learning how to do new fine-tuning methods at home, and sharing the derivative (with the rules followed)
might be okay, though stuff like causing it to be hosted (on a rented GPU or otherwise) / other personal uses might not be okay (without receiving additional permission).
arousedsquirel@reddit
Companies will try to monopolize models, but, and here we are, as a community we'll find a workaround. Every huge-capacity model can be used as a teacher for smaller models tailored to your use case. There's a difference between learning and scraping. And what if I scrape model X's output, like everyone apparently did, judging by all the AI-generated content out there? Distillation of reasoning, yes? We can train a model on that. There are so many competent training possibilities to get openly licensed models on track toward the behavior we want without breaking anything. We look at the papers, we evaluate the implementations, and we start integrating. But yes, there is a possibility IP will start ruling the game. It depends on the behavior of open source communities and what they can provide in return. Give and take.
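The teacher-to-smaller-model idea above is basically knowledge distillation. A minimal sketch of the standard soft-label loss, in plain Python for illustration only (a real training run would of course use a framework and batched tensors):

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in the standard distillation recipe."""
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return kl * temperature ** 2
```

The loss is zero when the student exactly matches the teacher's distribution and grows as they diverge; the temperature softens the targets so the student also learns from the teacher's "dark knowledge" about near-miss classes.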
Downtown-Case-1755@reddit
Or better yet, continue training a permissively licensed model like Qwen 32B?
It seems kinda stupid that everyone has to start from scratch with every model release, especially when the architecture isn't changing anyway.
Mysterious-Rent7233@reddit
The architecture isn't changing but the training techniques are, and those are reflected in the deep layer weights.
Downtown-Case-1755@reddit
Then apply the new techniques to the existing pretrain!
Part of it is definitely pride. It would (for instance) look "bad" for Mistral if they continued training an Alibaba model, and vice versa.
bobby-chan@reddit
Did it look bad for Microsoft when (before the partnership) they released wizardlm?
On the other hand, since Mistral's entire business is around AI, it might be a bit different, compared to Microsoft or Alibaba...
Downtown-Case-1755@reddit
Apparently so, as wizardlm is dead lol.
It doesn't apply to everyone. Nvidia, for instance, is totally fine using other models as starting points for usable experiments, but this is exactly what I was thinking. It would "appear" more awkward for a company like Mistral or Deepseek to start with another model, as that's ostensibly their product.
This is of course nonsense, but I can see why they'd think that way.
Many_SuchCases@reddit
You sound super entitled. The fact that we even have Llama is huge. Just use the new model, but not for business, it's that simple. That's how most commercial industries work.
brotie@reddit
Sanest comment in this thread. You think AI companies owe you anything for free? Meta is one of the most valuable companies in the world and one of the only that could bankroll what they’re doing and it was a fucking leak that started this crazy journey.
Downtown-Case-1755@reddit
Are any of them getting money from governments? If so, it kinda makes sense to publish some of their work.
IIRC Chinese AI companies like Alibaba and Deepseek are getting subsidized, but I can't find anything similar for Mistral even though I remember reading about it.
FishermanEuphoric687@reddit
I'm not as hopeful about pretraining a 20B model unless we have a substantial GPU cluster and a significant six-figure investment. Training these models costs a lot; it's more likely we'll see more hybrid-license models in the future.
I'd say focus more on high-quality datasets, since they can be just as valuable as the model itself. Distillation, parsing, etc.
PrinceOfLeon@reddit
Well the biggest models are the most accurate and expensive to train and you'll need expensive equipment to run them anyway, so they may as well move away from the permissive licenses and make their money.
But also the smallest models are the most valuable, because companies want on-device models that are capable, so they may as well move away from the permissive licenses and make their money.
And of course there's all these EU regulations to follow, so they may as well move away from the permissive licenses and...
ortegaalfredo@reddit
Mistral is great, and Mistral-Large is still the best model for a lot of tasks, but it's still a small company compared with giants like OpenAI and Meta. They need to make a buck somehow, and not even OpenAI is turning a profit yet.
Everlier@reddit
All the companies seem to be doing that. There's money in licensing edge models to Microsoft and Apple; whoever wins might finally see profits from their model.
Dark_Fire_12@reddit
I suspect they have smaller models, like a 0.5B, that they will give away.
3B seems like a sweet spot; even Qwen didn't give that one away for free, it's non-commercial.
Mistral is still open source, it just has to capture some value for the next funding round.
Last year was a bloodbath, since every hosting company (Fireworks, Together) hosted their models and captured all the upside.
Sizes above 12B but below 100B might be open-sourced, but that doesn't serve us much.
Don't lose faith.
Sicarius_The_First@reddit (OP)
12B is edge-device territory. Upcoming mobile devices are going to be orders of magnitude more powerful than what we used to have.
Calcidiol@reddit
What was the major change in the license, when did that happen?
And yes, 100% agreed: we shouldn't "allow" this technology to escape completely (from a practical standpoint) from having good free-as-in-freedom / free-as-in-beer, modern, competent models that are open in all respects.
Naturally there will be companies that only want to do SaaS or profit-seeking B2B licensing, but beyond all that there should still be a place for non-commercial ML models: for academic purposes, for personal purposes, for independent, non-commercial, open / trustworthy / verifiable research and development, etc.
So whatever "weight" the voices of individual hobbyists, professionals, researchers, students, and SMBs might have, I think it'd be nice to see it applied toward keeping some relevant aspects of AI/ML R&D in the open for the benefit of all, as opposed to having ML creation and free access almost wholly locked off behind industrial castle walls, with use gated through "robber barons" and no alternative. The technologies are too new, important, and promising to be basically monopolized and taken away from open-sector progress.