Mistral Small 3 24b is the first model under 70b I’ve seen pass the “apple” test (even using Q4).
Posted by Porespellar@reddit | LocalLLaMA | 51 comments
I put all the DeepSeek-R1 distills through the “apple” benchmark last week, and only the 70b passed the test (“Write 10 sentences that end with the word ‘apple’”), getting all 10 out of 10 sentences correct.
I tested a slew of other newer open-source models as well (all the major ones: Qwen, Phi, Llama, Gemma, Command-R, etc.), but no model under 70b had ever managed to get all 10 right... until Mistral Small 3 24b came along.
It is the first and only model under 70b parameters that I’ve found that could pass this test. Congrats Mistral Team!!