Any Actual alternative to gpt-4o or claude?
Posted by Dragonacious@reddit | LocalLLaMA | 44 comments
I'm looking for something I can run locally that's actually close to gpt-4o or claude in terms of quality.
Kinda tight on money right now so I can't afford gpt plus or claude pro :/
I have to write a bunch of posts throughout the day, and the free gpt-4o hits its limit way too fast.
Is there anything similar out there that gives quality output like gpt-4o or claude and can run locally?
z_3454_pfk@reddit
just use Gemini via the web, or DeepSeek/Mistral/etc. free via API, or Kimi for cheap via API
Corporate_Drone31@reddit
Save your money for the hardware in the future. Instead, try Kimi K2 from the API. At least on my provider, it's extremely inexpensive, and even a single dollar of query credit will take you far.
TheRealMasonMac@reddit
If you have a relative who is a student, you can sign up for Gemini Pro for free.
Affectionate-Cap-600@reddit
you can make 1k requests/day for free on OpenRouter; search for 'free' models. (you just have to add $10 of credit one time to increase the daily limit for free models from 10 to 1k) currently they even offer DeepSeek R1 for free. (obviously, don't expect much privacy...) you can chat with those models in the OpenRouter chat UI or use the API key with another UI (e.g. OpenWebUI)
if you value privacy, use non-'free' models on OpenRouter (look at the providers for each model; every one has different policies about logging and data retention). many models are really cheap and cost around $1 per million tokens.
https://openrouter.ai/models?order=pricing-low-to-high
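if you want to use the key outside the chat UI, here's a minimal sketch with the OpenAI Python client (OpenRouter's API is OpenAI-compatible; the free model ID is just what's listed at the time of writing and may change):

```python
# Sketch: calling a ':free' model through OpenRouter's OpenAI-compatible API.
# Model IDs and limits may change; check openrouter.ai for current ones.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # your OpenRouter API key
)

response = client.chat.completions.create(
    model="deepseek/deepseek-r1:free",  # the free variant mentioned above
    messages=[{"role": "user", "content": "Draft a short post about local LLMs."}],
)
print(response.choices[0].message.content)
```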
Ylsid@reddit
If you sign up directly with the providers they route to, e.g. Chutes, you can get even better usage limits.
Affectionate-Cap-600@reddit
I didn't know that... thanks for the info!
jakegh@reddit
Use gemini 2.5 pro in google's AI studio for free (for now, anyway).
pokemonplayer2001@reddit
"Is there anything similar out there that gives quality output like gpt-4o or claude and can run locally?"
No. And,
"I got RTX 12 GB Nvidia 3060, 16 gb ram and an i5. Sorry I dont have high specs."
Nothing you can run will be close to the quality.
Use free models with openrouter.
Dragonacious@reddit (OP)
Saw a video on using the OpenAI API to access gpt-4o.
The video says the cost will be far less compared to a GPT Plus subscription. Really?
If I use gpt-4o via the API, will the responses be the same quality as when using gpt-4o via the GPT Plus subscription?
Ylsid@reddit
You can use DeepSeek for free
Logical_Divide_3595@reddit
You can buy a Gemini Pro account with a student subscription for $20, which is valid until Aug 2026.
ninjasaid13@reddit
If you're tight on money, then you can't afford the hardware that can run models close to gpt-4o or Claude.
RhubarbSimilar1683@reddit
Money aside, it's probably kimi k2
Dragonacious@reddit (OP)
My specs are not that high. I got an Nvidia RTX 3060 12 GB, 16 GB RAM, and an i5.
I can spend around $5-6 a month for an LLM that gives gpt-4o or Claude quality responses.
I came across a site called galaxy .ai which claims to provide all AI tools like Claude, gpt-4o, and Veo 3 for $15 a month. The price seems too good to be true, and it looks like a scam, so I didn't bother.
Can I use the gpt-4o API? I've heard APIs are cheaper, but I'm not sure if they give the "actual" same quality responses as gpt-4o via the GPT Plus subscription.
What are my options?
d4rk31337@reddit
You can get plenty of tokens for that budget on openrouter.ai and use different models for different purposes. There are even free models occasionally. That combined with https://openwebui.com/ should be more than enough for your requirements
Affectionate-Cap-600@reddit
yeah, also as I said in another comment, if you are not going to share sensitive/private data you have 1k requests/day for ':free' models on OpenRouter (DeepSeek R1 is currently available as a free version). you just have to add $10 one time to increase the limit for free models to 1k.
when you are going to share something you don't want to be logged, just switch to the non-free version (check the specific provider's policy/ToS), and $5-6/month will give you many tokens.
botornobotcrawler@reddit
Take your budget to OpenRouter if you cannot run the models locally. There you can basically buy every LLM via one API, as you need it! $5-6 a month will be enough for most smaller models. If you use Roo or Cline to make the calls, you get a nice UI and can keep track of your spending.
There you can run DeepSeek R1 quite cheaply or even for free.
Dark_Fire_12@reddit
Not affiliated, but you can use t3 chat; it fits your budget at $8.
Theo gives out lots of $1 discounts daily for the first month.
Most indies who build their own chat apps stop working on them, but he's had enough success that I think he and his team won't stop.
iheartmuffinz@reddit
Using large models via OpenRouter (or any API) might be for you. Instead of paying monthly, you deposit money and then pay per token generated. It is almost always cheaper than the subscriptions and by a substantial amount.
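As rough math with assumed numbers (prices vary a lot per model):

```python
# Rough cost estimate for pay-per-token API usage (all numbers assumed).
price_per_million_tokens = 1.00   # USD; many mid-size models are around this
monthly_budget = 5.00             # USD, OP's stated budget
tokens_per_post = 1_500           # assumed average size of one generated post

tokens = monthly_budget / price_per_million_tokens * 1_000_000
print(f"~{tokens:,.0f} tokens/month, ~{tokens / tokens_per_post:,.0f} posts")
# -> ~5,000,000 tokens/month, ~3,333 posts
```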
CommunityTough1@reddit
Not local, but Google is giving away $300 in AI credits to everyone for free for Gemini 2.5. Also, if you use something like OpenWebUI where you can bring your own key for API-based inference, there are a lot of really good models for free through OpenRouter, such as DeepSeek V3 and R1, as well as Kimi K2.
CupcakeSecure4094@reddit
There's nothing that will get anywhere near the quality of Claude or GPT-4o on a 12GB 3060 - your best bet to save money is to use a range of free tiers.
https://aistudio.google.com
https://gemini.google.com
https://chat.deepseek.com
Conscious_Cut_6144@reddit
On what hardware?
Dragonacious@reddit (OP)
I got an Nvidia RTX 3060 12 GB, 16 GB RAM, and an i5. Sorry I don't have high specs.
Conscious_Cut_6144@reddit
How fast do you need it?
You can run Qwen3 32b very slowly
or Qwen3 14b at better speeds.
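As a minimal sketch, assuming you serve it with Ollama (and have pulled the model first):

```python
# Sketch: running Qwen3 14B locally via the Ollama Python client.
# Assumes Ollama is running and you've done `ollama pull qwen3:14b`.
import ollama

response = ollama.chat(
    model="qwen3:14b",
    messages=[{"role": "user", "content": "Write a short post about local LLMs."}],
)
print(response["message"]["content"])
```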
skipfish@reddit
Both of those are far off in quality compared to Claude or gpt-4, unfortunately.
jacek2023@reddit
I think local LLMs are not what you expect
Annual_Cable_7865@reddit
use Gemini 2.5 Pro for free: http://ai.studio/
jkh911208@reddit
I've been using https://lmstudio.ai/models/mistralai/devstral-small-2507 for a few days now and it is very reliable.
I am using the 8-bit version, but if you downgrade to 4-bit it will need 14 GB of VRAM.
I am running it on a Mac.
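LM Studio can also expose the loaded model over a local OpenAI-compatible server (default port 1234), so a sketch like this should work, assuming the model ID matches what LM Studio shows as loaded:

```python
# Sketch: querying a model loaded in LM Studio through its local
# OpenAI-compatible server (default http://localhost:1234/v1).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # key is ignored locally

response = client.chat.completions.create(
    model="mistralai/devstral-small-2507",  # use whatever ID LM Studio shows as loaded
    messages=[{"role": "user", "content": "Review this function for bugs."}],
)
print(response.choices[0].message.content)
```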
burner-throw_away@reddit
What model & specs, if I may ask? Thank you.
jkh911208@reddit
M1 Max with 64 GB RAM, getting about 13 tokens/s with LM Studio.
simracerman@reddit
Mistral Small 3.2 -24B is amazing! Even if some of the Q4 spills into system memory, OP will still have a nice experience.
Square-Onion-1825@reddit
you need h/w to support 70B+ parameter models
CommunityTough1@reddit
Nah. The RTX Pro 6000 Blackwell is 96 GB and $8k, and can easily handle 70B models at 4-bit quants. You wouldn't need to spend $15k for the rest of the setup. You could do a whole Ryzen 9 16-core/32-thread setup with 128 GB DDR5 and a 1200 W 80 Plus Platinum PSU on top of that for another $2k. That's only $10k total.
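Rough, approximate math on why 96 GB comfortably fits a 70B at 4-bit (weights only; KV cache and context add a few more GB):

```python
# Approximate VRAM for a 70B-parameter model at a ~4-bit quant (weights only).
params = 70e9
bits_per_param = 4.5  # 4-bit quant plus per-block quantization overhead
weights_gb = params * bits_per_param / 8 / 1e9
print(f"~{weights_gb:.0f} GB")  # -> ~39 GB, comfortably under 96 GB
```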
wivaca2@reddit
GPT-4o probably uses as much electricity per user as your monthly home electric bill. Nothing that isn't consuming half a city block of racks in a datacenter is going to match these models.
Chris__Kyle@reddit
If you won't end up being able to run locally, then why not use:
1. chat.qwen.ai
2. aistudio.google.com
3. gemini.google.com
4. kimi.com
There is a YouTuber called Theo. He often gives out promo codes in his videos so you can get a t3.chat subscription for $1. But you can still pay the $8 if you don't have a code.
Double_Cause4609@reddit
Uh...It really depends on what you use it for specifically.
Depending on exactly what you do, QwQ 32B or one of the Mistral Small variants (or finetunes) might do it. You could potentially push for Jamba Mini 1.7.
It'll be slow on your hardware but in principle it's possible, at least.
Again, I'm not really sure what you're doing ("write a bunch of posts" is extremely vague. Technical articles? Lifestyle posts?), so it's really hard to say. From your description anything from Gemma 3 4B to Kimi 1T might be necessary and it's really not clear where you are on that spectrum.
tempetemplar@reddit
Welcome to DeepSeek!
Accomplished_Ad8465@reddit
Gemma or Qwen do well with this
sciencewarrior@reddit
Unless you have a beefy GPU or a Mac, you may be better off sticking with online providers. Deepseek is a solid option, and Gemini 2.5 Pro is available for free via Google's AI Studio.
vegatx40@reddit
Does your laptop have a graphics card?
A lot of low-end consumer RTX cards have between 4 and 8 GB of VRAM. With that you could run one of the smaller Gemma 3 models, or actually Gemma 2, since you don't need multimodal. And of course there's the workhorse Llama 3.1 8B.
Dragonacious@reddit (OP)
I got an Nvidia RTX 3060 12 GB, 16 GB RAM, and an i5
vegatx40@reddit
It might not be super fast, but I am guessing you could squeeze in maybe a 15 billion parameter model.
Deepseek-r1:14b
Gemma3:12b
Qwen3:14b
Llama3.1:8b
adviceguru25@reddit
At least for coding, there's DeepSeek, Mistral, and Kimi (though that's heavy). On this benchmark for models developing UI, GPT comes in behind a lot of open-source models.
kevin_1994@reddit
I actually don't think Qwen3 32B is much worse than 4o. If you want o3 or Claude, there is only DeepSeek, and there's no realistic way for you to run it, considering you use the free tier of ChatGPT lol