How good is o1-pro?
Posted by MrMrsPotts@reddit | LocalLLaMA | View on Reddit | 33 comments
Is it actually much better than the other models? I am particularly interested in math and coding.
raphaelfreediver@reddit
It takes ages with any prompt. Sometimes, more than 5 minutes. But it solves hard mathematical problems and tricky bugs better than most models.
blackkettle@reddit
I honestly believe they deliberately nerfed 4o and then forced us into o1. 4o used to provide excellent long form answers. After the o1 family release it stopped doing this and now provides only much shorter responses and often gets confused. o1 does this like 4o used to, but it's way slower than 4o ever was. It's just nonsense IMO.
Thedudely1@reddit
I would not be surprised if this was the case. Not even necessarily maliciously, just like "oh, if you want thoughtful answers, just use o1!", which is obviously frustrating. Really motivates you to try services like Deepseek, just saying.
Secure_Reflection409@reddit
There are definitely huge consistency issues with chatgpt.
MrMrsPotts@reddit (OP)
Are there any other models that can compete?
frivolousfidget@reddit
Not that I know of, and I try most of them. But I only use it as my last resort because it takes ages.
raphaelfreediver@reddit
I'm not sure. But I went back to the premium subscription even if I'm more or less the optimal target for the pro subscription (mathematician developing complicated software). Honestly, I just can't wait 5-10 minutes for each answer. I still use o1 when quicker models fail though.
No_Kick7086@reddit
It is sooo slow.
codyp@reddit
They don't have a setting to control the thinking time yet? That's what I'm looking forward to: forcing it to spend hours on an essay.
raphaelfreediver@reddit
They didn't until last week.
CtrlAltDelve@reddit
Deepseek's thinking model and Gemini 2.0's thinking model are not bad either.
I had them both try to help me solve a complex metric-gathering logic problem and they both came up with something quite solid.
AppearanceHeavy6724@reddit
No, not at o1-pro's level in terms of strength.
CtrlAltDelve@reddit
Sure, which is why I said "not bad" :)
Competition is still competition, regardless of how far behind, and they're also highly accessible, given that both can currently be used for free.
Adept-Werewolf-6470@reddit
It's the best model available by far. Hate that it costs so much, but for me it's irreplaceable. You need to know how to use it properly. It fixes code errors in individual files really well but isn't going to write you a complete web app in one shot. It also excels in philosophy and has given me some very interesting angles to look at ideas.
x54675788@reddit
Yes. I'm willing to let you test it if you share a prompt. At that point, you'll have to share your result as well.
segmond@reddit
How would we know? It's not a local LLM.
Puzzleheaded_Fold466@reddit
Better than non-pro. Whether or not that’s worth $180 to you, only you can say.
frivolousfidget@reddit
Also, the ability to just use o1 as much as one could (reasonably) want. I use o1 all day long now with Pro, and o1 pro when I need it.
wow-again@reddit
Good
Sweet_Protection_163@reddit
Good
medialoungeguy@reddit
Good
codyp@reddit
good?
medialoungeguy@reddit
Ya, good
YearnMar10@reddit
No, good
sedition666@reddit
No bueno?
MealFew8619@reddit
Better
Fleshybum@reddit
If you're coming from Cursor with Claude, it's not as good for coding.
endgamefond@reddit
GitHub Copilot in VS Code for debugging.
sammoga123@reddit
Why hasn't anyone commented on Qwen 2.5 Plus? Now that Qwen has a site, I've been using it and it seems pretty complete. Btw, I'm using Lua, a language not as popular as Python or C.
me1000@reddit
o1-pro is good, but not perfect. It's got a different prompt style, I found this post quite helpful: https://www.latent.space/p/o1-skill-issue
The main complaint for me is that not seeing the output of the reasoning model makes the experience worse. The summary model on top of the reasoning output often gets very abstract. You know how Claude will write code and then give you a couple of paragraphs explaining what the code does? o1 pro will often just output that paragraph of text, which is annoying since what I really want is the raw code to copy and paste!
For that reason you have to be very explicit about what you want o1 to give you, otherwise it's often just a summary of work done behind the scenes.
I've also found that o1 can get argumentative once the context gets long enough that it's "reasoned" itself into a conclusion. Getting it to admit something is wrong is often hard.
I'm finding it useful, but it's a very different style of interaction. And tbh, it's kind of annoying they forced it into a chat bot style UI. I'm excited for all the other players to release their reasoning models, I think there's a LOT of room to innovate on the actual interaction style with these kinds of models!
Phil-Park3r@reddit
It is as good as the prompt it is given.
Claude is better if you are just going to smash in a prompt and hope it understands what you want.
o1 Pro does not make as many assumptions, so you really need to ask it for exactly what you are after and what it should and should not do.