O3 mini level model vs BEST phone size model?

Posted by Conscious_Nobody9571@reddit | LocalLLaMA | 15 comments

Guys, switch your votes... we've got enough reasoning models, plus it's been a while since somebody released a nice <5B model.

a_beautiful_rhind@reddit

Bold of you to assume it's "nice".

More-Ad5919@reddit

Do we have reasoning models that can actually reason? What's the best?

Environmental-Metal9@reddit

I suppose that depends on how you define reasoning. We don’t have any models that “think” in latent space yet, but plenty of models reason pretty OK, as in: predict some tokens, analyze the tokens, re-evaluate, then generate some more tokens based on that. o1, o3-mini, DeepSeek R1, and Qwen QwQ are all pretty decent at reasoning by the latter definition, and are most definitely not “thinking” models by the former. The quality of final answers varies quite a lot from model to model, with R1 being the best in my very personal opinion (not based on benchmarks, just real use cases). If you have an alternative definition of reasoning, mind sharing it with us? It’s very possible that there aren’t any models that reason if your definition is different.

More-Ad5919@reddit

I also don't give a damn about benchmarks. I always test models with logic puzzles, and I am not very impressed, whatever I use. I am not sure what the difference between reasoning and thinking is supposed to be. I always start with: "Give me a hard logic puzzle that has a definitive answer." While digging into the result, I can't really spot any thinking or reasoning.

Environmental-Metal9@reddit

But wait! (Sorry, just a reasoning model joke) this is the output of DeepSeek R1 for your prompt. I didn’t check the logic puzzle because I perform worse than most models at that, but you can definitely see the thinking part:

It won’t let me copy the thinking section but it went on like this for a few miles worth of text, then spit out the answer below:

**Hard Logic Puzzle:**

Four cards lie on a table, each labeled 1 to 4. On the back of each card is a statement:

1. “The number of true statements is greater than my number.”
2. “The number of true statements is less than my number.”
3. “The number of true statements is not equal to my number.”
4. “Exactly three of these statements are false.”

**Question:** Which cards have **true** statements?

---

**Solution:**

Let’s analyze all possible numbers of true statements (1, 2, 3, or 4) and eliminate contradictions.

1. **Assume only 1 card is true:**
   - If card 1 is true: “True count > 1” → False (only 1). Contradiction.
   - If card 2 is true: “True count < 2” → True (1 < 2). But card 3 would also state “True count ≠ 3” → True (1 ≠ 3). This makes **2 true statements**, contradicting the assumption.
   - If card 3 is true: “True count ≠ 3” → True (1 ≠ 3). But card 1 states “True count > 1” → False, and card 2 states “True count < 2” → True (1 < 2). This again results in **2 true statements**. Contradiction.
   - Card 4 can’t be true alone, as it claims “Exactly 3 are false,” which would make itself the only true statement. But then card 2’s “True count < 2” would also be true (1 < 2), creating a contradiction.
2. **Assume 2 cards are true:**
   - If cards 1 and 3 are true:
     - Card 1: “True count > 1” → True (2 > 1).
     - Card 3: “True count ≠ 3” → True (2 ≠ 3).
     - Card 2: “True count < 2” → False (2 ≮ 2).
     - Card 4: “Exactly 3 false” → False (2 are false).

     **No contradictions.** This works.
   - Other combinations fail:
     - Cards 1 and 2: Card 2’s “True count < 2” contradicts 2 being true.
     - Cards 2 and 3: Card 2’s “True count < 2” invalidates 2 truths.
     - Cards 3 and 4: Card 4 is false.
3. **Assume 3 cards are true:**
   - Card 3 would state “True count ≠ 3” → False (3 = 3). Contradiction.
4. **Assume all 4 are true:**
   - Card 1: “True count > 1” → True, but card 4’s “Exactly 3 false” is False. Contradiction.

**Answer:** Cards **1 and 3** are true.
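
For anyone who would rather check that answer than trust the walkthrough, here is a minimal brute-force sketch in Python. It is not part of the model's output. It assumes "my number" means the card's label and that card 4's "exactly three are false" is equivalent to "exactly one is true"; the names `STATEMENTS` and `consistent_assignments` are made up for illustration.

```python
from itertools import product

# Each card's statement as a predicate on t, the number of true statements.
# Assumption: "my number" is the card's label (1 to 4), and card 4's
# "exactly three are false" is read as 4 - t == 3.
STATEMENTS = {
    1: lambda t: t > 1,
    2: lambda t: t < 2,
    3: lambda t: t != 3,
    4: lambda t: 4 - t == 3,
}


def consistent_assignments():
    """Return every set of cards whose truth values match their own claims."""
    solutions = []
    # Try all 16 ways of marking the four cards true or false.
    for values in product([False, True], repeat=4):
        assignment = dict(zip(STATEMENTS, values))
        t = sum(values)  # how many cards are marked true
        # An assignment is consistent if every card's claim, evaluated
        # against t, comes out exactly as the card was marked.
        if all(STATEMENTS[card](t) == marked for card, marked in assignment.items()):
            solutions.append(sorted(c for c, marked in assignment.items() if marked))
    return solutions


print(consistent_assignments())  # [[1, 3]]
```

Under those assumptions the only self-consistent assignment is cards 1 and 3, which matches the answer above.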

More-Ad5919@reddit

But now comes the big question! Is that true? Well, it looks complicated enough, so it must be true. I would argue: points 1 and 2 already contradict each other. There is no whole number between 1 and 2... And finally it says, yeah, you are right. It all happens because it was trained on logic puzzles and tries to produce similar stuff, but without understanding logic.

Inevitable-Rub8969@reddit

AI models like **o1, o3-mini, DeepSeek R1, Grok 3 and Qwen QwQ** are good at reasoning.

Environmental-Metal9@reddit

Ah, I have 0 experience with Grok. While I will keep having 0 experience with it because of my own personal views, I appreciate you raising awareness of it, as I can’t see my own blind spots. Thank you!

MerePotato@reddit

Models of that size are basically toys by comparison

Snoo_64233@reddit

Yeah. A couple of weeks ago people were declaring that they don't need OpenAI models anymore because they have DeepSeek. Let people have a phone-size model. "We CaN DisTill O3.." Yeah, go do that with DeepSeek or the hundreds of other models available on HuggingFace. An open-source O3 is not needed. Gotta see what OpenAI has to offer in terms of a phone-size model.

NNN_Throwaway2@reddit

wtf is anyone going to do with a phone-sized model lmao

Finanzamt_kommt@reddit

Don't you dare take o3 mini open source from us!!!

redditscraperbot2@reddit

Imagine having a capable local workhorse model taken away by phone users lol.

Relative-Flatworm827@reddit

Lol. You can vs them. But it's going to look like this: O3: millions of points. Best phone model: 0 points.

MustyMustelidae@reddit

An O3-mini-level model can be distilled and quantized: both let it trickle down to phones, and both can be done cheaply. A phone-sized model will not have any capabilities we don't already have other than being small, and that doesn't help unless OpenAI gives us some earth-shattering architecture optimizations _and_ some third party spends millions scaling those to a new model.