O3 mini level model vs BEST phone size model?


a_beautiful_rhind@reddit

Bold of you to assume it's "nice".


More-Ad5919@reddit

Do we have reasoning models that can actually reason? What's the best?

Environmental-Metal9@reddit

I suppose that depends on how you define reasoning. We don’t have any models that “think” in latent space yet, but plenty of models reason pretty well in the sense that they predict some tokens, analyze them, re-evaluate, then generate more tokens based on that. o1, o3-mini, DeepSeek R1, and Qwen QwQ are all pretty decent at reasoning by the latter definition (although the quality of final answers varies quite a lot from model to model, with R1 being the best in my very personal opinion, based on real use cases rather than benchmarks) and are most definitely not “thinking” models by the former definition. If you have an alternative definition of reasoning, mind sharing it with us? It’s very possible that no model reasons under your definition.


More-Ad5919@reddit

I don't give a damn about benchmarks either. I always test models with logic puzzles, and I'm not very impressed whatever I use. I'm not sure what the difference between reasoning and thinking is supposed to be. I always start with: "Give me a hard logic puzzle that has a definitive answer." While digging into the result, I can't really spot any thinking or reasoning.


Environmental-Metal9@reddit

But wait! (Sorry, just a reasoning model joke.) This is the output of DeepSeek R1 for your prompt. I didn’t check the logic puzzle because I perform worse than most models at that, but you can definitely see the thinking part. It won’t let me copy the thinking section, but it went on like this for a few miles worth of text, then spit out the answer below:

**Hard Logic Puzzle:** Four cards lie on a table, each labeled 1 to 4. On the back of each card is a statement:

1. “The number of true statements is greater than my number.”
2. “The number of true statements is less than my number.”
3. “The number of true statements is not equal to my number.”
4. “Exactly three of these statements are false.”

**Question:** Which cards have **true** statements?

---

**Solution:** Let’s analyze all possible numbers of true statements (1, 2, 3, or 4) and eliminate contradictions.

1. **Assume only 1 card is true:**
   - If card 1 is true: “True count > 1” → False (only 1). Contradiction.
   - If card 2 is true: “True count < 2” → True (1 < 2). But card 3 would also state “True count ≠ 3” → True (1 ≠ 3). This makes **2 true statements**, contradicting the assumption.
   - If card 3 is true: “True count ≠ 3” → True (1 ≠ 3). But card 1 states “True count > 1” → False, and card 2 states “True count < 2” → True (1 < 2). This again results in **2 true statements**. Contradiction.
   - Card 4 can’t be true alone, as it claims “Exactly 3 are false,” which would make itself the only true statement. But then card 2’s “True count < 2” would also be true (1 < 2), creating a contradiction.
2. **Assume 2 cards are true:**
   - If cards 1 and 3 are true:
     - Card 1: “True count > 1” → True (2 > 1).
     - Card 3: “True count ≠ 3” → True (2 ≠ 3).
     - Card 2: “True count < 2” → False (2 ≮ 2).
     - Card 4: “Exactly 3 false” → False (2 are false).

     **No contradictions.** This works.
   - Other combinations fail:
     - Cards 1 and 2: Card 2’s “True count < 2” contradicts 2 being true.
     - Cards 2 and 3: Card 2’s “True count < 2” invalidates 2 truths.
     - Cards 3 and 4: Card 4 is false.
3. **Assume 3 cards are true:**
   - Card 3 would state “True count ≠ 3” → False (3 = 3). Contradiction.
4. **Assume all 4 are true:**
   - Card 1: “True count > 1” → True, but card 4’s “Exactly 3 false” is False. Contradiction.

**Answer:** Cards **1 and 3** are true.
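For anyone who would rather not take the model's word for it, the puzzle is small enough to brute-force. Below is a minimal sketch, assuming the reading used in the solution above: each card's "my number" is its label (1 to 4), and "exactly three are false" means exactly one statement is true. The names `CLAIMS` and `consistent_assignments` are made up for this example. It tries all 16 true/false assignments and keeps the ones where every card's claim matches its assigned truth value.

```python
from itertools import product

# Each card's claim, written as a test on t = the number of true statements.
# Assumption: "my number" is the card's label, as in the solution above.
CLAIMS = {
    1: lambda t: t > 1,   # "greater than my number"
    2: lambda t: t < 2,   # "less than my number"
    3: lambda t: t != 3,  # "not equal to my number"
    4: lambda t: t == 1,  # "exactly three are false" = exactly one is true
}


def consistent_assignments():
    """Return every set of cards that can be true without contradiction."""
    solutions = []
    for values in product([False, True], repeat=4):
        t = sum(values)  # how many statements this assignment marks true
        # Consistent only if each claim evaluates to its assigned value.
        if all(CLAIMS[card](t) == values[card - 1] for card in CLAIMS):
            solutions.append([card for card in CLAIMS if values[card - 1]])
    return solutions


print(consistent_assignments())  # [[1, 3]]
```

Under that reading there is exactly one consistent assignment, and it matches the answer R1 gave. The sketch also covers the "zero true statements" case, which the written solution skips.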


More-Ad5919@reddit

But now comes the big question! Is that true? Well, it looks complicated enough, so it must be true. I would argue: points 1 and 2 already contradict each other. There is no whole number between 1 and 2... And finally it says, yeah, you are right. It all happens because it was trained on logic puzzles and tries to produce similar stuff, but without understanding logic.


Inevitable-Rub8969@reddit

AI models like **O1, O3-mini, DeepSeek R1, Grok 3 and Qwen QwQ** are good at reasoning.


Environmental-Metal9@reddit

Ah, I have 0 experience with Grok. While I will keep having 0 experience with it because of my own personal views, I appreciate you raising awareness of it, as I can’t see my own blind spots. Thank you!


MerePotato@reddit

Models of that size are basically toys by comparison


Snoo_64233@reddit

Yeah. A couple of weeks ago people were declaring that they don't need OpenAI models anymore because they have DeepSeek. Let people have a phone-size model. "We CaN DisTill O3.." Yeah, go do that with DeepSeek or the hundreds of other models available on HuggingFace. An open-source O3 is not needed. Gotta see what OpenAI has to offer in terms of a phone-size model.


NNN_Throwaway2@reddit

wtf is anyone going to do with a phone-sized model lmao


Finanzamt_kommt@reddit

Don't you dare take o3 mini open source from us!!!


redditscraperbot2@reddit

Imagine having a capable local workhorse model taken away by phone users lol.


Relative-Flatworm827@reddit

Lol. You can vs them, but it's going to look like this. O3: millions of points. Best phone model: 0 points.


MustyMustelidae@reddit

An O3 mini level model can be distilled and quantized: both let it trickle down to phones, and both can be done cheaply. A phone-sized model will not have any capabilities we don't already have other than being small, and that doesn't help unless OpenAI gives us some earth-shattering architecture optimizations _and_ some third party spends millions scaling those to a new model.
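To make the "quantized" half of that concrete, here is a toy sketch of symmetric int8 quantization in plain Python. It only illustrates the general idea of trading precision for size. It is not how any particular OpenAI or DeepSeek model is packaged, the function names and sample weights are invented for the example, and real toolchains typically quantize per block or per channel rather than with one scale for everything.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero input, where any scale works.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


# Made-up weights, purely for illustration.
weights = [0.82, -1.27, 0.05, 0.0, 0.4]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Rounding moves each value by at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(quantized, scale, max_error)
```

Each weight now fits in one byte instead of four (for float32), at the cost of a rounding error bounded by half the scale. That is the sense in which a bigger open model can be squeezed toward phone-class hardware after release.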
