How good is o1-pro?
Posted by MrMrsPotts@reddit | LocalLLaMA | View on Reddit | 33 comments
Is it actually much better than the other models? I am particularly interested in math and coding.
raphaelfreediver@reddit
It takes ages with any prompt. Sometimes, more than 5 minutes. But it solves hard mathematical problems and tricky bugs better than most models.
blackkettle@reddit
I honestly believe they deliberately nerfed 4o and then forced us into o1. 4o used to provide excellent long form answers. After the o1 family release it stopped doing this and now provides only much shorter responses and often gets confused. o1 does this like 4o used to, but it's way slower than 4o ever was. It's just nonsense IMO.
Thedudely1@reddit
I would not be surprised if this was the case. Not even necessarily maliciously, just like "oh, if you want thoughtful answers, just use o1!", which is obviously frustrating. Really motivates you to try services like Deepseek, just saying.
Secure_Reflection409@reddit
There are definitely huge consistency issues with chatgpt.
MrMrsPotts@reddit (OP)
Are there any other models that can compete?
frivolousfidget@reddit
Not that I know of, and I try most of them. But I only use it as my last resort because it takes ages.
raphaelfreediver@reddit
I'm not sure. But I went back to the premium subscription even if I'm more or less the optimal target for the pro subscription (mathematician developing complicated software). Honestly, I just can't wait 5-10 minutes for each answer. I still use o1 when quicker models fail though.
No_Kick7086@reddit
It is sooo slow.
codyp@reddit
They don't have a setting to control the thinking time yet? That's what I'm looking forward to: forcing it to spend hours on an essay.
raphaelfreediver@reddit
They didn't until last week.
CtrlAltDelve@reddit
Deepseek's thinking model and Gemini 2.0's thinking model are not bad either.
I had them both try to help me solve a complex metric-gathering logic problem and they both came up with something quite solid.
AppearanceHeavy6724@reddit
No, not at o1-pro's level in terms of strength.
CtrlAltDelve@reddit
Sure, which is why I said "not bad" :)
Competition is still competition, regardless of how far behind, and they're also highly accessible, given that both can currently be used for free.
Adept-Werewolf-6470@reddit
It's the best model available by far. Hate that it costs so much, but for me it's irreplaceable. You need to know how to use it properly. It fixes code errors in individual files really well but isn't going to write you a complete web app in one shot. It also excels in philosophy and has given me some very interesting angles to look at ideas.
x54675788@reddit
Yes. I'm willing to let you test it if you share a prompt. At that point, you'll have to share your result as well.
segmond@reddit
How would we know? It's not a local LLM.
Puzzleheaded_Fold466@reddit
Better than non-pro. Whether or not that’s worth $180 to you, only you can say.
frivolousfidget@reddit
Also, the ability to just use o1 as much as one could (reasonably) want. I use o1 all day long now with Pro, and o1 pro when I need it.
wow-again@reddit
Good
Sweet_Protection_163@reddit
Good
medialoungeguy@reddit
Good
codyp@reddit
good?
medialoungeguy@reddit
Ya, good
YearnMar10@reddit
No, good
sedition666@reddit
No bueno?
MealFew8619@reddit
Better
Fleshybum@reddit
If you're coming from Cursor with Claude, it's not as good for coding.
endgamefond@reddit
GitHub Copilot in VS Code for debugging.
sammoga123@reddit
Why hasn't anyone commented on Qwen 2.5 Plus? Now that Qwen has a site, I've been using it and it seems pretty complete. Btw, I'm using Lua, a language not as popular as Python or C.
me1000@reddit
o1-pro is good, but not perfect. It's got a different prompt style, I found this post quite helpful: https://www.latent.space/p/o1-skill-issue
The main complaint for me is that not seeing the output of the reasoning model makes the experience worse. The summary model on top of the reasoning output often gets very abstract. You know how Claude will write code and then give you a couple of paragraphs explaining what the code does? o1 pro will often just output that paragraph of text, which is annoying since what I really want is the raw code to copy and paste!
For that reason you have to be very explicit about what you want o1 to give you, otherwise it's often just a summary of work done behind the scenes.
I've also found that o1 can get argumentative once the context gets long enough that it's "reasoned" itself into a conclusion. Getting it to admit something is wrong is often hard.
I'm finding it useful, but it's a very different style of interaction. And tbh, it's kind of annoying they forced it into a chat bot style UI. I'm excited for all the other players to release their reasoning models, I think there's a LOT of room to innovate on the actual interaction style with these kinds of models!
Phil-Park3r@reddit
It is as good as the prompt it is given.
Claude is better if you are just going to smash in a prompt and hope it understands what you want.
o1 Pro does not make as many assumptions, so you really need to ask it for exactly what you are after and what it should and should not do.