Mistral Small 3 24b is the first model under 70b I’ve seen pass the “apple” test (even using Q4).

Posted by Porespellar@reddit | LocalLLaMA | View on Reddit | 51 comments

I put all the Deepseek-R1 distills through the “apple” benchmark last week and only 70b passed the “Write 10 sentences that end with the word “apple” “ test, getting all 10 out of10 sentences correct. I tested a slew of other newer open source models (all the major ones, Qwen, Phi-, Llama, Gemma, Command-R, etc) as well, but no model under 70b has ever managed to succeed in getting all 10 right….until Mistral Small 3 24b came along. It is the first and only model under 70b parameters that I’ve found that could pass this test. Congrats Mistral Team!!