right now what model is truly as good as gpt 4o? i wanna escape CloseAi claws
Posted by lordkamael@reddit | LocalLLaMA | View on Reddit | 51 comments
i tried deepseek after all the ruckus, in the end i didn't really vibe with it as much, but i'm sure it's very good with science stuff or coding (which i'll probably need as well at some point). i'm just trying to understand which one is objectively closest to gpt, since that's the one that fits most of my use cases. i tried llama, it was ok; mistral as well, a little better; but still, gpt was more "human like"? i guess... not sure if that's the right term to describe it. deepseek did feel more powerful overall though. i need something local and smart to help me with a bunch of projects. i work with digital art and i deal with a wide gamut of topics and philosophical questions, somewhat complex ideas that feed into my art and craft in general. something uncensored would also be appreciated! can anyone help me find a good model? my specs are: rtx 2060 super 6gb (not the strongest, i know), 16gb of RAM, and an i5 9400f 2.90GHz 6 cores. i know my machine is not the sharpest tool in the shed and that i probably won't be able to run something as powerful as gpt at its full potential, but i want to get as close as possible without burning my wings in the sun.
Ambitious-Disk-5987@reddit
Gemma 3 12B seems to be the best "successor" to GPT 4o for my workflows. I can run it fine-ish on my laptop, and it's the least corporate AI compared to models such as GPT 5.2 and 5.3, so that's quite an advantage.
ExtremePresence3030@reddit
You are comparing a 600b GPT model to a 32b model of deepseek?
It is highly unlikely that you tried the real deepseek. You probably just tried the distill version of it, which would better be called fake deepseek.
lordkamael@reddit (OP)
i'm not comparing anything brother, i'm just trying to get as close to it with my machine.
ExtremePresence3030@reddit
Honestly the small models of deepseek are not that good compared to other models of the same size. But you can still run some decent models. One of my laptops has a 6gb gpu as well. I even run 32b models on it (i know people say it's impossible, but i'm doing it at an acceptable speed). My ram is more than yours though.
I suggest you try qwen 2.5 14B and Mistral 24B models to see if they run on your system. Just avoid LM studio. People love to suggest this app and it’s quite a good app but in my experience it is bloated and not fast for lower spec systems. Use Koboldcpp and try to run those models. They are both very decent and I think you might be able to run them.
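With 6gb of VRAM those models won't fit entirely on the GPU, so the practical question is how many layers to offload (koboldcpp's GPU layers setting). Here's a rough back-of-envelope sketch; the bits-per-weight, layer count, and overhead figures are my assumptions, not measured numbers:

```python
# Rough estimate of how many transformer layers of a Q4-quantized model
# fit in a given VRAM budget. Assumptions (mine, not measured):
# Q4_K_M averages ~4.85 bits/weight, Qwen2.5-14B has 48 layers,
# and ~1 GB of VRAM goes to context/compute buffers rather than weights.

def layers_that_fit(params_b, n_layers, vram_gb, bits_per_weight=4.85, overhead_gb=1.0):
    model_gb = params_b * 1e9 * bits_per_weight / 8 / 1e9  # total weight size
    per_layer_gb = model_gb / n_layers                     # treat layers as equal-sized
    usable = max(vram_gb - overhead_gb, 0)
    return min(n_layers, int(usable / per_layer_gb)), model_gb

layers, size = layers_that_fit(params_b=14, n_layers=48, vram_gb=6)
print(f"model ~{size:.1f} GB, roughly {layers} of 48 layers fit in 6 GB VRAM")
```

So a 14b at Q4 is ~8.5 GB and only a bit over half the layers fit on the GPU; the rest run from system RAM, which is why generation will be noticeably slower than fully-offloaded small models.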
Master-Meal-77@reddit
Qwen2.5-32B-Instruct beats 4o-mini across the board according to benchmarks, and based on my personal experience with the model I think that's accurate
lordkamael@reddit (OP)
can i run it on my machine tho?
Master-Meal-77@reddit
Idk, see if an IQ4_XS quant fits. If not then no, you can't
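You can check that without downloading anything. A quick sketch, assuming IQ4_XS averages roughly 4.25 bits per weight (my figure, quants vary slightly):

```python
# Does an IQ4_XS quant of a 32B model fit in 6 GB of VRAM?
# Assumption (mine): IQ4_XS averages ~4.25 bits per weight.

def quant_size_gb(params_b, bits_per_weight):
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

size = quant_size_gb(32, 4.25)
print(f"IQ4_XS 32B ~ {size:.1f} GB -> fits in 6 GB VRAM: {size < 6}")
```

At ~17 GB it doesn't come close to fitting in 6 GB, so a 32b would be mostly in system RAM.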
BidWestern1056@reddit
use qwen 32b, it's really good. and use it with npcsh https://github.com/cagostino/npcsh !
lordkamael@reddit (OP)
interesting
YordanTU@reddit
The local models are much smaller (except the 405B+ ones), so it's not exactly apples to apples, but I assume that for your machine Mistral-Small-3 24B will be the best alternative. I have almost the same machine as yours at home and I know it runs quite acceptably in Q4_K_M.
lordkamael@reddit (OP)
nice i'll try this one next.
valdecircarvalho@reddit
None! With consumer hardware you will NEVER ever get the same results and speeds as GPT-4 or any other commercial LLM in the market.
lordkamael@reddit (OP)
that's obvious, but i'm trying to get as close as possible, that's all.
Linkpharm2@reddit
3090 + qwq will get reasonably close. 5090 + qwen2.5 72b or the R1 llama 70b distill will be better. Then you can benefit from the open source part: seeing things like logprobs, samplers, context vs quantization, actually good ttft, uncensorship, whatever really.
valdecircarvalho@reddit
Do you have a 5090? Or are you only repeating what you heard?
Linkpharm2@reddit
It's not related to OP's post. I was replying to u/valdecircarvalho. I don't have a 5090, but I assume speed increases linearly with memory bandwidth. 32gb of VRAM will fit a 70b well enough.
My comment was directed to disprove the statement "with consumer hardware you will never get the same results and speeds of GPT-4.".
valdecircarvalho@reddit
So, you don’t know. You just assumed. Ok then. Next!
Linkpharm2@reddit
You could contribute to the conversation. I know it increases. It's not an assumption that it increases. It is an assumption that above 1tb/s (I've tested 228, 448, 919, 1050) the speed would keep increasing. But c'mon, you don't have to be brilliant to realize that there isn't a random wall at that bandwidth.
valdecircarvalho@reddit
What the fu*** does that mean??????
Linkpharm2@reddit
I've tested those speeds. 1050 is the highest I've tested. The assumption is that there isn't an invisible wall above 1050 but below 1700.
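To make the linear-scaling argument concrete: autoregressive decode is memory-bound, so each generated token has to stream roughly the whole model through the GPU once, which gives tok/s ≈ bandwidth / model size as an upper bound. A quick sketch using the bandwidths mentioned above (the ~40 GB figure for a 70b at 4-bit is my assumption):

```python
# Why decode speed tracks memory bandwidth: each token reads ~all the
# active weights once, so the ceiling is bandwidth / model size.
# Bandwidths below are the ones tested in this thread, plus one above 1 TB/s.
# Assumption (mine): a 70B model at ~4-bit quantization is ~40 GB.

model_gb = 40
for bw_gbs in (228, 448, 919, 1050, 1700):
    print(f"{bw_gbs:>5} GB/s -> ~{bw_gbs / model_gb:.1f} tok/s upper bound")
```

Real throughput lands below this ceiling (compute, kv-cache reads, kernel overhead), but the linear trend in bandwidth is exactly what the formula predicts, with no wall at any particular bandwidth.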
Healthy-Nebula-3603@reddit
Better than GPT-4o? In science, math, reasoning and coding, QwQ 32b easily beats gpt4o.
Offline knowledge while still better than gpt4o? At a reasonable size, llama 3.3 70b...
lordkamael@reddit (OP)
i was thinking of trying qwen2.5 14b
Healthy-Nebula-3603@reddit
you can also try and compare the two of them.
lordkamael@reddit (OP)
it wasn't my favorite so far.
ForsookComparison@reddit
They're free. No need to think/consider, just download and try
Admirable-Star7088@reddit
I would not be surprised if QwQ 32b actually beat GPT-4o in raw intelligence, but I find it hard to believe that a 70b model would have more knowledge than a (200b?) model.
Sherwood355@reddit
GPT-4o is probably at least 400b, but more likely around 600b.
Anyway, these benchmarks honestly don't tell the whole story. The only local thing we have that actually comes close to it in terms of raw intelligence would be deepseek r1.
Just so you know, I'm not saying QwQ 32b isn't great for its size, but there's no way it's beating a huge cloud model that is more than 20 times its size, even if it does get close in some areas.
Healthy-Nebula-3603@reddit
In reasoning, math, or coding, gpt4o is not even close to QwQ, as it is not a reasoning model
ForsookComparison@reddit
I don't know where you're all getting this from but QwQ does not code as well as SOTA models. It beats Qwen-Coder 32B and Llama 3.3 70B which is a huge deal, but this community is getting way ahead of itself
Healthy-Nebula-3603@reddit
...gpt4o is SOTA in coding?
ForsookComparison@reddit
There is a huge difference in opinion between the people using these tools to create and the people reposting benchmark screenshots.
Healthy-Nebula-3603@reddit
I am a coder and can say from my experience that gpt-4o is not even close in coding to QwQ.
- DS R1 seems better
- o3 mini high also generates better code
- sonnet 3.7 non-thinking... hard to say... lately it broke my code so badly... thinking version not tested
ForsookComparison@reddit
Sonnet 3.7 is the best, but I also experience the "dice roll" where once every 10 prompts or so it'll decide to break your code or go way off the rails. Just using a system that allows easy rollbacks (aider with commits enabled) usually suffices there.
R1 is better than 4o
QwQ is impressive but I cannot and have never been able to get it to a point where I'd say it competes with 4o. It might follow instructions a hair better, as most reasoning models do, but 4o writes better code and is much better at fixing its own bugs whereas QwQ, if it makes a mistake, struggles to fix it
hainesk@reddit
100%. As good as QwQ is, it’s limited by the fact that it’s a 32b model. There’s no making up for what it just doesn’t know.
Old-Organization2431@reddit
hahahha
Linkpharm2@reddit
Who cares. It's better than the rest. If it's Chinese reroll and fix your samplers.
StandardLovers@reddit
I run local LLMs from Qwen, QwQ, Gemma to Deepseek… up to 70B.
And yet, I still spend $20/month on GPT-4o.
It's simply the best all-in-one model: unmatched knowledge, reasoning, and adaptability. You can't replicate that on a local machine. The sheer breadth of its knowledge base surpasses anything open-source.
DarkVoid42@reddit
i run deepseek r1 670b which is pretty good. definitely better than gpt 4.
valdecircarvalho@reddit
What GPU(s) do you have?
Old-Organization2431@reddit
There is no such model
Old-Organization2431@reddit
And of course I get down voted, when being honest…
valdecircarvalho@reddit
Yeah! I feel you! Here's my upvote!
Yes_but_I_think@reddit
Nothing except r1.
MaxDPS@reddit
Gemini 2.0 Pro is my daily driver for work (coding).
AliNT77@reddit
Very underrated model… 2m context, very high tps for free is insane…
LLMtwink@reddit
Probably nothing open; if you want to run it locally, especially on your system, then definitely nothing, unfortunately. The new gemmas are pretty good as far as personality goes compared to other models imo, you might wanna try that (though they're very censored). Maybe there are community finetunes out there which are better for your purposes.
Sea_Sympathy_495@reddit
Grok 3, Claude 3.7 and Deepseek v3, Flash 2.0 are all on par or better than GPT4-o. For local models go with qwen 2.5
billtsk@reddit
GPT-4o is a really good model, have to admit.
I only run locally for coding assistance, to avoid rate limits:
- Qwen2.5-7B-Instruct (codegen)
- Qwen2.5-Coder-7B or Qwen2.5-Coder-14B (codegen)
- Deepseek-R1-Distill-Qwen-7B or Deepseek-R1-Distill-Qwen-14B (planning/design)

The 32B models run too slow for 8GB VRAM + DRAM imo.
Otherwise, I mess around with free options:
- Grok3 app (good for research and long learning discussions)
- Perplexity (GPT-4o) freebie with phone service
- Llama 3.1B and Gemini 2 with browser
- Gemma 3 in Google AI Studio
Anyusername7294@reddit
Gemma 3 12b with following system prompt: "Mirror user tone and energy"
sunshinecheung@reddit
gemma-3-4b
chibop1@reddit
Try these:
https://www.reddit.com/r/LocalLLaMA/comments/1jchrro/top_5_model_recommendations_for_newbie_with_24gb/