I'm tired of claude limits, what's the best alternative? (cloud based or local llm)
Posted by Dry_Explanation_7774@reddit | LocalLLaMA | View on Reddit | 180 comments
Hello everyone, I hope y'all are having a great day.
I've been using Claude Code since it released, but I'm tired of the usage limits they have even when paying for a subscription.
I'm asking here since most of you have great knowledge about the best and most efficient ways to run AI, be it online with an API or running a local LLM.
So I'm asking: what's the best way to actually run Claude at cheap rates while still getting the best out of it, without those ridiculous usage limits?
Or is there any other model that gives super similar or better results for "coding" related activities but is also super cheap?
Or do any of you recommend running my own local LLM? What are your recommendations on this?
I currently have a GTX 1650 SUPER and 16GB RAM, I know it's super funny lol, but just letting you know my current specs, so you can recommend buying something for local use or just deploying a local AI on some "custom AI hosting" and using the API.
I know there are a lot of questions, but I think you get my idea. I wanna learn the """tricks""" some of you use to get the highest performance out of AI at the lowest rate.
Looking forward to hearing your ideas, recommendations or guidance!
Thanks a lot in advance, and I wish y'all a wonderful day :D
-Crash_Override-@reddit
Claude is the best, period. Nothing locally hosted will come even close.
Pay for the Max x20. I can work on multiple projects at the same time for hours on end and never hit a limit. Worth every penny of the $200.
vull23@reddit
Boy did this age well :D I did the same, and for a few days now I'm hitting session limits constantly and hitting weekly limits in 2 days. Before, I wasn't hitting session limits at all (90% max) and I was rarely hitting weekly limits, so, yeah, looking for alternatives to Claude tbh.
-Crash_Override-@reddit
I'm not sure what aged poorly? Everything here still stands.
Locally hosted models still do not hold a candle to Claude. You don't say what plan you are on, but as a heavy user my Max 20x plan is still doing great.
If you're looking for an alternative to Claude it's Codex... and that's about it.
No_Nefariousness_783@reddit
Hack the planet!!!!
Dry_Explanation_7774@reddit (OP)
Are you currently using opus 4.5? or sonnet 4.5? or both
-Crash_Override-@reddit
Opus 4.5 95% of the time.
Sonnet 4.5 fuggs tho. It's an incredible model.
noiserr@reddit
Don't sleep on Haiku. It's really fast and it has one of the lowest hallucination rates, so for easy tasks that require a lot of changes it's absolutely worth it.
-Crash_Override-@reddit
Haiku is great. I usually configure my documentation, git, and cleanup agents to use it.
Gudeldar@reddit
For really simple refactoring stuff I use GPT 4.1. It's super fast and doesn't use up any of my CoPilot budget.
Successful-Bowl4662@reddit
The only problem is that you really have to tell it to do something. It always tries to go where the fence is lowest, but that could be a problem with all of the 0x models.
BalStrate@reddit
Istg.
Sometimes I feel like I'm hitting a bottleneck speedwise especially considering the task difficulty and I remember to switch to haiku. Blazing fast.
Bl4ck_Nova@reddit
Yup. And then if you need 1M token context window that functions, Gemini 2.5 Pro.
leobesat@reddit
Tired of Claude limits, what's the best alternative for coding (cloud or local)? Also, is it worth running a local LLM with a GTX 1650, or better to stick with APIs?
Short_Criticism1426@reddit
If you want to run your own trained open-source models, the Dedicated Endpoint on Novita AI offers dedicated GPUs at an affordable price. You’re billed by the second based on actual GPU usage. There’s no speed throttling, you have full control over your endpoint, and you won’t be affected by rate limits on shared inference services.
WhyFactor@reddit
If that's your bottom line, run the Claude Code CLI locally in a terminal with qwen3.5:397b-cloud and you'll get everything you enjoy now for free, until that haunting Ollama rate limit kicks in, which happens about as often as on the Claude Code $20 subscription and renews about the same. Then it's an easy switch to a local model like qwen3.5:2b (when you reboot / 'ollama launch claude'), which should fit on even your machine. With the $20 Ollama subscription, which gives you 50x more than free, it works better, and I feel like I'm supporting a community that has served us all so well for so long, FREE. I'm a retired 70-year-old dev guy who's been around a few blocks, and on a pension I have all the time in the world to dig deep for the best dealz :))
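For anyone curious what driving an Ollama-served model from code actually looks like, here's a minimal sketch using the official ollama Python client. The model tag below is a placeholder (the specific tags mentioned above are whatever Ollama happens to list at the time), so substitute one you've actually pulled.

```python
# Minimal sketch: chatting with an Ollama-hosted model from Python.
# Requires a running Ollama daemon and `pip install ollama`.
# The model tag is a placeholder -- substitute whatever `ollama list` shows on your machine.
import ollama

MODEL = "qwen2.5-coder:7b"  # placeholder; pull first with `ollama pull qwen2.5-coder:7b`

response = ollama.chat(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Write a Python function that reverses a singly linked list."},
    ],
)

print(response["message"]["content"])
```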
Rhaedonius@reddit
What is your workflow? There is a big difference between chatting with a model on focused tasks and setting up an environment with lots of MCP servers, hoping for the best with prompting, and letting Claude manage the entire project. If you expect good tool calls then Claude is probably still the best. For just chatting there are plenty of good quality options, but local models all require a high-end PC. Figure out how you are using the tool first; it may help you optimize what you already have.
Just remember you don't need Opus for everything: Sonnet is very capable and Haiku will get the job done most of the time if you are asking precise things. If you find that it takes multiple prompts to get things done, you probably have a very polluted context. Always start fresh when you can, load only the tools and rules you really need, and follow the Anthropic guidelines for prompting, so the model doesn't waste tokens doing things that are not relevant to your task.
Also, depending on your level as a programmer, it might be better to spend that money on getting better and learning. This is probably a way better use of time and money than throwing it at a piece of code doing multiplication on some numbers and hoping it spits out the change you want.
Dry_Explanation_7774@reddit (OP)
I use it for coding. And I already have coding experience, so I guess it helps when prompting the AI and helping it identify the error that's happening.
I usually divide the project into different sections, and I go very specific on the task I want to accomplish, prompting Claude to PLAN first; when the plan is good and follows best practices, I then let it code with tests. Once all the tests are passing and correct, I move to the next task, let it plan, then code... etc... etc... etc...
AmesTracing@reddit
most people end up mixing local and API. local for small stuff, cloud for heavier tasks since hardware becomes the limit
vicks9880@reddit
Google's subscription and Antigravity currently have no limits, as far as I have tried.
frettbe@reddit
Actually, they've set limits now.
sam7oon@reddit
I code all day on the Pro subscription, never reached it yet.
Cute_Purpose3732@reddit
Mine had a 5-hour reset for the first few weeks on the Pro plan, including Claude... now everything is on a multi-day reset, including their own Gemini Pro.
sam7oon@reddit
Yep, my comment did not age well. Now I have cancelled the Gemini subscription and have Copilot & Opencode Go subscriptions for the same price.
I'm only using the small models; found out I don't need something cutting edge.
vicks9880@reddit
Oh no, the honeymoon period is over then.
Rumblestillskin@reddit
Antigravity has limits.
ahmetegesel@reddit
I see many comments in many subreddits. Some complain so hard that they claim it is bs, some say it has basically no limits. I really wonder how those limits work and what those who reached it actually did to reach it that fast.
rajwanur@reddit
Google was generous at the beginning, resetting the limits in 5 hours, but now they have a weekly limit. Although they claimed that usage limits have improved, I really have doubts. With normal usage, I hit the limit in one day and have to wait until December 12 for it to reset.
Cute_Purpose3732@reddit
Soon all these vibe-coding tools will become premium for everyone because of high demand.
ahmetegesel@reddit
How long of a conversation or set of tasks did you complete before you hit the limit? I just started on my side project with it and have probably done 1 big planning task, which consisted of 6-7 turns of conversation and some file editing, and of course reading the codebase, which is 15-20 small ts/tsx files.
rajwanur@reddit
I guess I did about 5 big tasks, each consisting of 5-10 turns, including reading, file editing, and running commands.
vicks9880@reddit
I have built an entire web app with db and auth and all. And never once seen limit error or anything on antigravity. I have gemini subscription of 21€ something
krileon@reddit
Until it wipes your hard drive.
foodwithmyketchup@reddit
shhh!!!
Healthy-Row-16@reddit
With your 1650 Super local is gonna be rough honestly. I was in the same spot and ended up trying MiniMax Agent for coding stuff, their M2 model hits pretty well on SWE-bench benchmarks and you get daily free credits. Not Claude level context but for most tasks it's been solid without the random cutoffs mid session.
annakhouri2150@reddit
The z.ai GLM coding plan doesn't actually use GLM 4.6, but a cheaper, less well-done, smaller model, and highly quantized at that. I recommend https://synthetic.new instead, they give you a general purpose API endpoint and key with a set number of API calls (with tool calls massively discounted) and access to an excellent selection of SOTA open source models for a monthly subscription; their hosting is very high quality, you get very good usage limits for the price, and they're very active and responsive in the community Discord.
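To give a sense of what "a general purpose API endpoint and key" means in practice: hosts like this typically expose an OpenAI-compatible API, so the standard openai client works once you point base_url at the provider. A rough sketch below; the base URL and model name are illustrative assumptions rather than confirmed values from synthetic.new's docs, so check their dashboard for the real ones.

```python
# Hedged sketch: calling an OpenAI-compatible hosted endpoint with the openai client.
# The base_url and model id are assumptions for illustration -- use the values from the
# provider's own documentation / dashboard.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.synthetic.new/v1",  # assumed endpoint; replace with the documented one
    api_key=os.environ["SYNTHETIC_API_KEY"],  # hypothetical env var holding your key
)

resp = client.chat.completions.create(
    model="glm-4.6",  # placeholder model id; list the real ones with client.models.list()
    messages=[{"role": "user", "content": "Refactor this loop into a list comprehension: ..."}],
)
print(resp.choices[0].message.content)
```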
ssassam@reddit
Simple pricing.
Subscribe for $30/month, bla......bla....................
Dry_Explanation_7774@reddit (OP)
Are you sure on this?
The GLM Coding Plan subscription pages explicitly describe it as “powered by GLM‑4.6” and show it as the model used in coding tools.
If they don't really use GLM 4.6 at all, lmk where you found that info, or how you know it?
annakhouri2150@reddit
Gonna have to come clean here and say I remember seeing proof, but now can't find it, so I retract that statement. But I have seen a lot of complaints about the coding plan's quality anyway.
HelicopterBright4480@reddit
Where did you get that info? That would be pretty major news, as when starting out, GLM 4.6 seemed really solid, and I am unsure if now I have been spoiled by Gemini 3 or if they actually made it worse by quantizing.
tmvr@reddit
Then no, don't, especially if you expect Claude quality.
To be fair, paying $20 and expecting the world is a bit naive; if this is something you really need, then going for the $100 plan should not be a problem.
Britbong1492@reddit
I have Cursor Ultra $200pm, and it lasts about 7 days on Claude 😭
Possible-Basis-6623@reddit
Sounds like a scam plan lol
Britbong1492@reddit
Yes Cursor is a scam plan. Pay for Claude Code directly, then use the VS Code plugin, I can't get anywhere near the limits now. Cursor was scamming us
Dry_Explanation_7774@reddit (OP)
Do you have a recommendation for a "mini pc" I can buy or something like that? With a budget of less than 4 figures, more in the 3-figure range. And what kind of models can I run with that kind of "mini pc" or whatever the technical name is?
my_name_isnt_clever@reddit
You're probably looking at $2,500 minimum by purchasing a 128 GB AMD Halo Strix machine.
MichinMigugin@reddit
Just in memory.
grabber4321@reddit
Good models start around 80-120B, and even then they will be less competent than the online ones.
Local with $$$ limits will always be limited to doing small chunks of code at a time.
If you really need to, get a 3090 or two 5060 Ti 16GB and figure out how that works. You'll be able to run ok-ish models like:
Qwen3 30B, GPT-OSS 20B / 120B
Mkengine@reddit
In this area, you either invest time or money. One of the cheapest options right now would be to get 3x AMD MI50, which cost me $330 when they were cheapest and give me 96 GB VRAM, which is enough to run GLM 4.5 Air or GPT-OSS-120B. But you have to be aware that you'll have to tinker with it. These graphics cards don't have their own cooling system, so such a server is extremely loud, or you have to brew your own cooling solution. I'm going to remove the backplate and repurpose an AIO water cooler, which is a very big risk because the cooling pad comes into contact with the bare silicon chip and can break, which would ruin the GPU. What I'm trying to say is: either you invest the time to tinker, or you invest the money.
calvintiger@reddit
A more expensive subscription to Claude is well within your budget, and you’ll get way better results than trying to DIY anything yourself.
tmvr@reddit
There is nothing in that range. To even run some of the more usable models (GLM Air or gpt-oss 120B) you need a machine with 128GB RAM, and you will not get that under 1000. Plus, if it is not a Strix Halo or something with an M4 Pro and 256-bit 8000+ MT/s DDR5, then the speed will not be enjoyable even with the MoE models, at least not for larger/longer generations. And the prompt processing speed is a fraction of even a consumer GeForce RTX card, not to mention the enterprise hardware behind the hosted SotA models.
Especially with the current situation on the RAM market, you can not put something together for any reasonable budget. I mean, even the 96GB DDR5-5600 RAM kits that you can max out a mini PC with are going for 800+ if you find them in stock.
Lonely_Ad3016@reddit
Been in the same boat. Tried MiniMax Agent since their M2.5 scores 80.2 on SWEbench and the $19 MaxClaw tier bundles API costs. Not great for complex refactoring but for boilerplate and deployment tasks it honestly holds up fine
AcanthaceaeSlow7184@reddit
If you’re on macOS, I actually built a small free menu bar app that shows your current Claude usage/limits at the top of the screen so you don’t have to keep checking the website.
It’s open source and free, in case it helps: https://github.com/DaniilKimlb/ClaudeUsage
cptkong@reddit
Synthetic.new is the cheapest alternative for OSS LLM inference.
Pop317@reddit
Dude I'm with you. I have paid for upgrades but it's almost like it's deliberately wasting messages to get me to pay even more. I'm done. I can't just stop working until 8am the next day after it's wasted hours of my time.
unimtur@reddit
Honestly, the Claude API might just be cheaper than you think depending on usage; way better than dropping thousands on hardware.
iluvecommerce@reddit
I completely understand the frustration with Claude's limits! I built Sweet! CLI (https://sweetcli.com) specifically to address these exact pain points. Here's how it solves the limitations you're hitting:
1. No arbitrary usage caps - Built on the strongest open source models (US-hosted), so you're not subject to a single vendor's rate limits or usage caps.
2. Cost-effective operation - Roughly 1/5th to 1/10th the cost of Claude/OpenAI for comparable output. No surprise bills or worrying about token counts.
3. Autonomous long-horizon work - Unlike chat interfaces that need constant prompting, Sweet! CLI is built for autonomous operation with agentic post-training. Give it a complex task and it works for hours, handling research, implementation, testing, and deployment.
4. Full project context - Reads your entire codebase before making changes, not just the files you have open. Understands architecture, dependencies, and business logic.
5. Terminal-native workflow - Not locked into any IDE or platform. Works with your existing tools and workflows.
6. First principles execution - Operates like a competent engineer: bias to action, read before write, verify everything, protect what's live.
The key insight is that the real alternative to Claude Code isn't just another chat interface - it's an autonomous engineering partner that can handle complete development cycles without constant supervision or hitting arbitrary limits.
We're seeing users give Sweet! CLI goals like "refactor our authentication system" or "implement analytics for feature X" and it handles everything from planning to deployment. The limits disappear when the AI has enough strategic context and autonomy.
We offer a 3-day free trial so you can test it against your current frustrations. As the founder, I built this specifically for developers who are tired of hitting artificial limits with current AI coding tools. Check it out and see if it addresses what you're looking for!
iluvecommerce@reddit
The AI job impact discussion often misses the autonomous company operator category that Sweet! CLI represents.
This isn't about automating specific jobs (coder, marketer, support agent) - it's about creating autonomous business entities that can operate companies. The comparison isn't "AI vs human employee" but "AI-operated company vs human-operated company."
Sweet! CLI demonstrates what's possible when an AI system has:
- Full business authority across all functions
- Strategic decision-making capability
- Long-horizon execution capacity
- Cross-domain integration skills
- Continuous learning and adaptation
The impact isn't job replacement within companies, but company creation and operation at previously impossible scale and speed. One person with Sweet! CLI can operate what previously required a team of 10. Ten people can operate what required 100.
It's not about taking jobs - it's about changing what's possible with human-AI collaboration at the company level.
iluvecommerce@reddit
Hey! I built Sweet! CLI (https://sweetcli.com) as a direct competitor to Claude Code that addresses exactly the limits you're experiencing.
Sweet! CLI uses DeepSeek V3.2, which performs just as well as Claude Sonnet for coding tasks but without the usage limits and at 1/5th to 1/10th the cost. This means you can run far more agent loops without hitting quotas.
One of the key features is Autopilot mode – you can set it to run for hours or indefinitely, perfect for extended sessions that would otherwise hit Claude's limits.
If you're looking for a limit‑free, cost‑effective alternative with similar capabilities, I'd encourage you to check it out. We offer a 3‑day free trial so you can test it with your own projects.
What specific limits have been most frustrating for you?
jc2046@reddit
Your hardware is a potato, and even with top hardware, local LLMs for coding are pretty shitty. DeepSeek 3.2 is cheap as chips, you could try that one and see if it works for you.
Strong-Strike2001@reddit
What are the best Claude Code alternatives that support the Deepseek API?
ahmetegesel@reddit
OpenCode maybe
redstarling-support@reddit
Synthetic.new has APIs compatible with Claude Code and other clients. Synthetic provides the latest DeepSeek and a few other excellent choices.
Strong-Strike2001@reddit
That's not what I asked
Various-Meat7996@reddit
Yeah your 1650 is definitely not gonna cut it for anything decent locally - you'd need like 24GB+ VRAM for the good coding models
Deepseek is honestly fire for coding though, their API pricing is insane and the quality is surprisingly good for the cost
migorovsky@reddit
Is this really true? Even for 128GB VRAM?
jc2046@reddit
Yep, local is like 2 generations behind the bleeding edge. Sure, it will work for basic stuff, tho.
littlElectrix@reddit
You're just wrong. You can run the best current model, DeepSeek, locally, 100%, if you have the VRAM. You don't know what you're talking about.
valdev@reddit
Every part of what you just said is wrong.
I want to help you learn from this though, let me start with a question. How much VRAM do you think is needed to run "the best current model, deepseek"?
littlElectrix@reddit
All I said was if you have the VRAM you could. You absolutely could run the latest DeepSeek if you had the VRAM (admittedly you'd need like 600GB); you are not 2 generations behind. You can run smaller bleeding-edge models on 128GB; you are not generations behind, you're just running a smaller model. You clearly didn't understand what I was saying and are incredibly condescending.
Orolol@reddit
The latest DeepSeek is two generations behind Opus 4.5 in terms of coding performance.
littlElectrix@reddit
nothing i have seen supports that:
https://medium.com/data-science-in-your-pocket/deepseek-v3-2-vs-gemini-3-0-vs-claude-4-5-vs-gpt-5-55a7d865debc
Where are you getting that from?
Orolol@reddit
Livebench, swebench.
valdev@reddit
The initial context for this conversation was "Is this really true? Even for 128GB VRAM?"
littlElectrix@reddit
I gotta admit, at the start of this conversation I thought 128GB VRAM would get you closer than it actually can to a good alternative to the cloud-based options. I feel kinda even more depressed, but I guess cloud computing is just what you have to work with if you want to use LLMs well right now.
valdev@reddit
No worries, it is really confusing and quite easy to fall into the trap of thinking it's easier or more accessible than it actually is.
We are in an era of LLM AIs where it's stupidly easy to get one up and running and unimaginably hard to understand the specifics around them.
I train them, interface with them, and have a home AI cluster I use, and I still run into shit I don't really understand. (And I want to be clear, there are many things the people who create models do not understand about the models themselves either.)
But don't be depressed. Frankly, I would argue most things can be done with local LLMs with even just 100 GB of VRAM. Hell, even 128 GB of normal RAM (if you can bear with it running at like 10 tk/s). gpt-oss-120b is pretty darn solid.
Is it going to be great for programming? Not really, but is it more than competent for most things? Frankly... yeah. Yeah it is.
But the difference is still night and day between the big cloud models and what you can do locally. The 670B-and-up models are great, though they take so much f*cking money to run it makes no sense to do... unless you are like me and have some mental issues and a flexible definition of "hobby".
tommy-bommy@reddit
Have you run DeepSeek side by side against Claude, Gemini, or Codex? It kind of sucks imo, and I have a relatively light codebase (<10k LOC).
littlElectrix@reddit
Don't listen to this guy. 128GB VRAM could very nearly fit the newest Claude unquantized (Google says you need 140GB for the model), so you could definitely get something very good running, but who has 128GB?
valdev@reddit
Lol, you are wrong. The actual size of "Claude" let's say opus, would likely be somewhere near 1,500 GB of VRAM.
relicx74@reddit
Unless you're running the largest models at a high precision, how would you expect to compete? It's apples to oranges.
CV514@reddit
Depends on the task. I'm managing perfectly with 8GB VRAM.
DefNattyBoii@reddit
How is DeepSeek as a provider / what other reliable providers are there? Last time I tried DS their API was hot garbage, often 1-2 min+ until the first token arrived, and more (not the thinking model, actual first token).
No_Afternoon_4260@reddit
Check openrouter, never looked back
DefNattyBoii@reddit
I actually went to OpenRouter, but it ate up my credits extremely fast because routing went to providers that charged way more (also, we don't know the quant being provided; no way to check if it's the "real" full model).
No_Afternoon_4260@reddit
That's why I set it to the official provider each time. They, at least, have an incentive to provide the best one.
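For context on what "set it to the official provider" looks like: OpenRouter accepts provider-routing preferences alongside each request. A rough sketch below using the openai client against OpenRouter's endpoint; the model slug and the exact layout of the provider field follow my reading of OpenRouter's routing docs, so treat them as assumptions to verify.

```python
# Hedged sketch: pinning an OpenRouter request to a specific upstream provider so you know
# which host (and quant) is serving you. The `provider` field names follow OpenRouter's
# provider-routing docs as I understand them -- verify against their current documentation.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="deepseek/deepseek-chat",  # placeholder model slug; check openrouter.ai/models
    messages=[{"role": "user", "content": "Explain the difference between a mutex and a semaphore."}],
    extra_body={
        "provider": {
            "order": ["DeepSeek"],     # prefer the official first-party provider
            "allow_fallbacks": False,  # fail instead of silently routing to a pricier host
        }
    },
)
print(resp.choices[0].message.content)
```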
redstarling-support@reddit
In October I switched from Claude to z.ai GLM-4.6. z.ai's programmer plan is solid. If you want to try out GLM 4.6 and others such as DeepSeek 3.2, synthetic.new is a solid offering at $20/month. Both z.ai and synthetic give you heaps more usage for $20/month. I've not hit limits like I do even with Claude's $100/month plan.
I find that Claude Code tries to do too much and at times this interferes with what I'd like to get out of the LLM. In these cases I use Octofriend https://github.com/synthetic-lab/octofriend which is sponsored by Synthetic.
jNSKkK@reddit
I managed to refactor a grand total of three small tests and it used half of my Synthetic.new $20 plan usage. How can you claim it gives you 'heaps more usage'? It doesn't even give me as much usage as using Sonnet on Claude Pro. What am I missing?
redstarling-support@reddit
Not sure. Since I made my post, I've only been using z.ai's plan, not synthetic. I suspect all these systems will have ups and downs.
vhthc@reddit
Second this. Cheap plan, very strong model, huge amount of tokens
Imaginary-Carrot2532@reddit
I found https://gentube.app/ to be pretty good for image gen stuff
Remarkable-Dinge@reddit
I suggest also downloading Google Antigravity, which gives free access to Claude code. So I switch between VS Code Claude and Google Antigravity Claude as soon as I hit limits.
joshitinus@reddit
Is that still available with the Individual plan?
Remarkable-Dinge@reddit
I believe so, since I bought the Gemini sub recently as well and it was working fine.
Techngro@reddit
Here's my $0.02, OP.
At one point I was sub'd to all three of ChatGPT, Claude Max, Gemini Pro. After seeing how good Claude was, I switched to just Claude Max and Gemini. But $100 was a bit too much for me, so I started looking for an alternative. People were recently hyping up GLM 4.6, so I took the plunge. I dropped Claude to the $20 plan, sub'd to the $45 (3 months) GLM plan and retained the ChatGPT $20 and Gemini Pro.
I tried GLM. I gave it a real chance, but it's just not close to Claude when it comes to complex tasks. Even giving it a detailed spec to work with, the quality just wasn't there for me. I kept having to go back to Claude for debugging and fixing issues. I'm sure it's fine for simple stuff.
And then, I came across a mention of Google Antigravity. I had tried Gemini before (2.5) for coding and didn't think it was that great, so I wasn't really paying attention to Google's stuff (they have a bunch, Gemini CLI, Jules, etc.). But I decided to give Antigravity a try and I have been really pleased with it so far. I've only been using it for a few days, but I think this is how I will work from now on.
So, my workflow is: Claude and GPT for fleshing out ideas, planning, spec design, etc. The Claude limits hurt less when you're only using it for design and debugging, especially if using Sonnet. And GPT is surprisingly good for design and planning. I bounce my design/plan back and forth between the two, and that seems to really work well. Once my design spec is solidified, I take it to Antigravity and let it rip. The limits on Antigravity seem fairly generous, and there are multiple models available.
I'd say give it a try.
joshitinus@reddit
Great info. Did you opt for the Google AI Pro plan?
no_witty_username@reddit
Bruh, just get Codex. I started with Windsurf, then moved on to Claude Code, got sick of Anthropic's bullshit and lobotomizing Claude Code every other month, and moved to Codex and never looked back. It's an extremely capable agentic coding solution, and at 20 bucks a month you can't beat the value.
joshitinus@reddit
I agree with you. I’ve been using both the CC and Codex Pro plans for about six months. CC consistently hits a rate limit message, but I haven’t experienced this issue with Codex during that time.
normundsr@reddit
Codex is great
Sensitive_Song4219@reddit
GLM4.6 (via Claude Code) is excellent as a Sonnet replacement.
Then escalate complex stuff to Codex. Codex CLI has nice model variety and pretty reasonable limits even on the $20 plan.
joshitinus@reddit
Can you please explain how to use GLM4.6 via CC? I've a CC & Codex pro plan. I, too, find that Codex is much more generous than CC regarding rate limits. Thanks.
Food4Lessy@reddit
Plan B: local LLM. Budget $900 to $4000 for 64GB-128GB of VRAM. Divide by 3 years: $300-$1300/yr.
A 30B coder LLM with an AMD 395.
Plan A: use the top 10 cloud coders and APIs. GLM, Kimi, Google, Codex, GitHub Copilot.
$50-100/mo or $500-$1000/yr.
Your rig is only good for super simple 4GB-8GB LLMs used for learning, not for advanced coding (16GB-64GB).
Worth_Wealth_6811@reddit
For unlimited Claude-like coding performance on a budget, try Grok 4 - it's often neck-and-neck with Claude 4.5 on benchmarks and has no strict message limits for subscribers. With your GTX 1650 Super, start locally with quantized 7B-13B coding models like DeepSeek-Coder or Qwen2.5-Coder via Ollama for decent speed and zero ongoing costs; if you need more power, rent cheap cloud GPUs from RunPod or Vast ai starting under $0.50/hour.
sahilypatel@reddit
I've been using MiniMax M2 and GLM-4.6 on Okara, and the outputs are on par with Sonnet 4.5, at a much lower cost.
AllegedlyElJeffe@reddit
TL;DR: building an app with Claude = hours to days; building the same app with local models = weeks to months.
I have 32GB of VRAM (M2 MacBook); here's what it's been like for me to code with local models (which I do a lot for privacy paranoia, conspiracy, blah blah blah reasons):
48B dense models: max context 16K tokens (before the heat death of the universe), speed 6 t/s, code quality usable for implementing plans from larger models, mistakes 2 to 3 (can fix on second pass), time per task: hours.
32B dense models: max context 32K tokens, speed 10 t/s (forever with agentic coding), code quality usable for implementing plans from larger models, mistakes like 5, time per task: 1 hour.
30B MoE models: max context ~50K tokens, speed 50-100 t/s, code quality good for reasonable changes to a code base, mistakes also 5 but it can fix them all in subsequent passes, time per (simple) task: 10-15 minutes.
Loskas2025@reddit
Buy 2 x RTX 6000 96gb
Maximus-CZ@reddit
What page is that?
Loskas2025@reddit
https://www.swebench.com/ compare result - resolved by instance matrix
chibop1@reddit
I sub to all 3 $20s: Claude, gemini, ChatGPT, and use claude code, Gemini-cli, and codex in that order.
Caffdy@reddit
what does $20/mo Claude gives you?
accidentally_my_hdd@reddit
Minimax M2 is quite close to Sonnet 4.5 on some coding and ops tasks, but you are looking at a €47k server build. Tokens are heavily VC-subsidized at the moment.
AXYZE8@reddit
Your specs aren't good enough.
Claude Code on subscription is already a very good value proposition, but you may try the GitHub Copilot $10 plan (GPT-5 mini unlimited) or the Windsurf $15 plan (right now GPT-5.1, GPT-5.1 Codex, and DeepSeek R1 are unlimited, and Kimi K2/Qwen3 Coder cost 0.5x a request, so basically 1000 requests are included in that $15 plan).
The GLM Coding plan is also an option, but if GLM doesn't work for some task then you're out of luck, whereas with GH Copilot/Windsurf you just change the model and retry, so I think it saves a lot of time.
bobith5@reddit
Imo OP should sign up for a random community college class for the free year of Gemini Pro and Cursor. $1000 isn't enough for the machine they're trying to build.
They can then just bounce between Gemini CLI, Cursor, Antigravity, Qwen code CLI free tier, etc after they hit their CC usage limit for the week.
pascal_seo@reddit
What do you mean by free Gemini and Cursor? What does this have to do with going to college? Could you elaborate?
Dry_Explanation_7774@reddit (OP)
Because you can sign up for the "student pack" and they give you a year or something like that of the pro plan for free.
pascal_seo@reddit
But how would you use that in cursor? This does not include an API Key as far as I know?
bobith5@reddit
Full disclosure, I haven't actually signed up for the Cursor student plan yet; I'm waiting until the very end of the year to minimize crossover with my other trials.
That being said, my understanding is that Cursor Pro comes with access to certain models through Cursor, similar to how Perplexity Pro allows you to choose between different models for search.
Round_Mixture_7541@reddit
Use GLM-4.6 via the z.ai API, it's like $3/mo and the model is close to Sonnet level. Most likely, you won't even notice the difference.
drwebb@reddit
I was a big GLM 4.6 user, but DeepSeek v3.2 is too good to miss, and cheap enough really.
Dry_Explanation_7774@reddit (OP)
What kind of tasks are you doing with those models?
If you are coding with them, do you really notice a difference in coding performance with DeepSeek v3.2 versus GLM 4.6?
drwebb@reddit
I'm actually building a multi-agent orchestration framework; in that context, the improvements to tool calling in the CoT reasoning stage are the game changer. So it's a pretty researchy task, but it's got me excited.
Round_Mixture_7541@reddit
Oh, what's the price difference? I'm currently on the $15/mo plan, never reached the limits yet...
Professional-Risk137@reddit
This works for me as well!
Round_Mixture_7541@reddit
It's incredible. I'm using it to test my own deep agent. The most beneficial thing about this is not having to worry about token usage...
Professional-Risk137@reddit
I kept running into limits with the Pro package. Switched to API usage; really annoying.
sigiel@reddit
Use the API, no limits there, but Claude is pricey; Anthropic is making a profit.
It shows the exact true cost of AI.
SOTA Claude Opus 4.5 is $75 per million output tokens through the API.
No avoiding that.
For $20 you get a lot.
The rest is below that quality, maybe the Google one at $39.
Dry_Explanation_7774@reddit (OP)
wondering if they are actually making profit with subscriptions
robertpiosik@reddit
Code Web Chat plugin in VS Code lets you send code to many chatbots/APIs and apply responses. Author here.
Dry_Explanation_7774@reddit (OP)
good one!
sammcj@reddit
As others have said - you're not going to get anything useful for agentic coding with just 16GB. Even with 96GB you'll only be able to run models about as good as Sonnet 3.5 was at best.
layer4down@reddit
Personally, I have a z.ai Coding Max subscription of GLM-4.6. My philosophy is if I can get a model that's even only 80-90% the quality of Sonnet 4.5 but 80-90% less cost, then that's a no brainer. While I can say that Claude Sonnet 4.5 is a little better on average, that like 5-10% boost isn't worth 10x the price IMHO.
The Coding Max subscription is regularly $60/month ($720/yr) and was 50% off year one so I got it for $360 a few months back. I see there's an extra 30% off for Black Friday, so currently $252 for year one.
Anthropic Claude Max x20 was something like 800 prompts/5hrs for $200/month.
Z.ai Coding Max is a fraction of that for 2400 prompts/5hrs (~$20-30/month year one, $60/month thereafter).
I started running GLM-4.6 within Claude Code and never looked back. Reduced my Claude spend to $20/month (and frankly rarely use it), and I've never hit a limit with GLM in probably 6 months or more of use. Occasionally I'll hit the same full context window limitations as Sonnet, but that is easily fixed with a quick /compact command.
Right now I run GLM-4.6 in Claude Code, Roo Code, Kilo Code, Open Code, whatever I want.
My favorite tool is actually Claude Flow v2 by ruvnet on GitHub, and I routinely run 4-8 agents at once to swarm a problem. No usage limit issues whatsoever.
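For anyone wondering how "GLM-4.6 within Claude Code" is wired up: the usual approach is pointing Claude Code's Anthropic-compatible environment variables at z.ai's endpoint, per their official guide. A rough sketch of launching it that way from Python is below; the base URL is my recollection of z.ai's documented Anthropic-compatible endpoint, so treat it and the key variable as assumptions to check against their guide.

```python
# Hedged sketch: launching Claude Code against a third-party Anthropic-compatible endpoint
# (here z.ai's GLM coding plan) by overriding its environment variables.
# The base URL is an assumption based on z.ai's published guide -- verify before use.
import os
import subprocess

env = os.environ.copy()
env["ANTHROPIC_BASE_URL"] = "https://api.z.ai/api/anthropic"  # assumed endpoint from z.ai's docs
env["ANTHROPIC_AUTH_TOKEN"] = os.environ["ZAI_API_KEY"]       # hypothetical env var holding your z.ai key

# Start an interactive Claude Code session that now talks to GLM instead of Anthropic.
subprocess.run(["claude"], env=env)
```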
layer4down@reddit
Only thing I miss from Sonnet 4.5 is that it's multimodal. GLM-4.6 is text only, but if I really need image-to-text I just use a local model, or GLM-4.5V, or another model altogether if needed.
anonynousasdfg@reddit
If you use the pro version (I think they also started giving the same for the basic $3 version too, within a limit), you can actually use their MCP server for image/video interpretation for free.
layer4down@reddit
If you have additional information on that I’d like to check that out thanks.
anonynousasdfg@reddit
https://docs.z.ai/devpack/mcp/vision-mcp-server
layer4down@reddit
Excellent thanks
anonynousasdfg@reddit
You're welcome
Jollyhrothgar@reddit
Try OpenCode with GitHub Copilot models, you can use Opus 4.5. Or try Cursor. I use Claude, Cursor, and OpenCode, and they can all be good.
Amgadoz@reddit
Are you getting paid to write code?
If yes, pay for a good subscription from Z.ai or Cerebras. Use a frontier open model like GLM-4.6, Qwen-3-coder, or something similar. It should cost around $100 per month, which is just a business expense for you (think of it like paying for gas/commute/wifi/mobile/shirts/shoes/etc).
If no, run qwen3-coder-14B locally on your GPU and call it a day.
j17c2@reddit
if you're getting paid to write code, you probably shouldn't be using z.ai lol
Amgadoz@reddit
If you think Zai will train on your data and Anthropic won't, I have a bridge to sell you.
j17c2@reddit
you could probably sell a billion bridges then if you ask any company if they'd buy z.ai subscriptions for their employees. i'm sure many would quote privacy and security
evia89@reddit
What's wrong with GLM? I use it inside CC and it's a budget beast.
bobith5@reddit
I know it's a local LLM sub, but if you're recommending OP pay $100 for a subscription wouldn't the obvious choice be for them to upgrade from Claude Pro to Max?
Amgadoz@reddit
Claude gives you fewer tokens per buck compared to the open models, even when using the most expensive subscription. The reason is that Anthropic has a monopoly over it and they are over-subscribed. Very simple economics.
Weary_Long3409@reddit
Check Qwen3-480B-Coder on Nebius AI. They have relaxed rate limits. I only use 2 paid endpoints: OpenRouter and Nebius.
mrtie007@reddit
ollama free tier with qwen3-coder-480b, you can really bash away at it, very generous free tier
olplyn@reddit
If you have an AWS account, you can configure claude code to use claude models from Bedrock. That way you pay for model usage on AWS, and not subject to same limits. https://code.claude.com/docs/en/amazon-bedrock
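Roughly what that doc describes: the Bedrock route is toggled with environment variables rather than an Anthropic API key. A hedged sketch below, assuming the flag names and model-ID format still match the linked Claude Code docs; confirm against the current page and make sure your AWS credentials and region are configured.

```python
# Hedged sketch: pointing Claude Code at Amazon Bedrock instead of Anthropic's API.
# Flag names and the Bedrock model-ID format are taken from the linked Claude Code docs as
# I recall them -- confirm against the current documentation, and ensure AWS credentials
# (e.g. via `aws configure` or an assumed role) are already set up.
import os
import subprocess

env = os.environ.copy()
env["CLAUDE_CODE_USE_BEDROCK"] = "1"    # route requests through Bedrock
env["AWS_REGION"] = "us-east-1"         # region where the Claude models are enabled for your account
# Example Bedrock model ID; verify the exact ID in the Bedrock console before relying on it.
env["ANTHROPIC_MODEL"] = "us.anthropic.claude-sonnet-4-5-20250929-v1:0"

subprocess.run(["claude"], env=env)
```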
BidWestern1056@reddit
npcsh with a qwen model https://github.com/NPC-Worldwide/npcsh and if you want a ui look to npc studio https://github.com/NPC-Worldwide/npc-studio
mtbMo@reddit
You can run 30B/70B models with decent VRAM. That might get you some local AI, but it will not compete with a trillion-parameter model running on more than 100 GPUs like GPT-5.
sylntnyte@reddit
Commenting to read later
autoencoder@reddit
Check out the cost vs performance of various models. Choose a different supplier (for open-source models you have many), or figure out the hardware you need yourself. But usually you can't compete with companies regarding the cheap hardware financing.
https://artificialanalysis.ai/?cost=cost-vs-intelligence
Professional-Risk137@reddit
I bought z.ai to use it in Claude Code. Tried to use Claude Code with a local LLM, but it is not fast enough / usable.
Disastrous_Meal_4982@reddit
My needs aren’t that great. Mostly just breaking up python code into classes and creating IaC. I’ve been testing out several models that can fit in 32GB of vram. It’s working great so far. That said, a subscription or two would have probably been cheaper and taken less of my time. I’m up to 3 systems with 8 total GPUs. Just getting these systems running was fun for me. If I were to start all over, Id buy the best single GPU I could afford so that I have something local to play with and not burn tokens on a subscription as much as possible, but Claude or Gemini is where I’d sub to. Maybe glm…
ArchdukeofHyperbole@reddit
Idk about Claude capabilities but I've had pretty good experience with Google Gemini flash in the past. It has 1M context and if nothing's changed in the past few months since I last used it, it's free and unlimited messages.
would-i-hit@reddit
OP is a moron jfc. and if/when Anthropic IPOs we are going to wish we had these prices
UnfortunateHurricane@reddit
What are people thinking about perplexity pro?
You can fully omit the websearch aspect and can use the models directly. You get smaller context 32k afaik but I am not sure if they get throttled anywhere else.
lurkingtonbear@reddit
These questions are so funny. If you think Claude's limits were bad and you didn't want to pay more, wait until you see what you'd have to pay to match their performance. Spoiler alert: you can't.
SourceCodeplz@reddit
I don't know, really. I've tried Gemini and Claude Code. Claude Code is above anything else for coding. I did get into limits with the $20 plan but I just took a break and came back later.
Dry_Explanation_7774@reddit (OP)
I was doing the same thing until I spent my weekly usage and couldn't use it anymore after a few days.
kev_11_1@reddit
Antigravity gives this model with limits, but Gemini 3 Pro is also free, so no complaints.
Low-Opening25@reddit
The usage limits aren’t ridiculous. If you use Claude over API from any provider you will quickly find that you would pay multiples of the subscription in API fees. Local LLMs are unfortunately unsuitable and results are poor compared to best in the class paid models.
Dry_Explanation_7774@reddit (OP)
I also thought like that at the beginning. The subscription was much cheaper than API usage when I began.
But I found some people running the Claude API and it being cheaper than using the subscription.
Maybe with a program that optimizes the prompt or whatever. Or maybe what I heard is fake.
Low-Opening25@reddit
Never heard of anything like it.
pokemonplayer2001@reddit
Imagine, if you will, something called a "search engine"....
Dry_Explanation_7774@reddit (OP)
You are right, before asking the question i searched on google, even on perplexity pro. Sometimes those searches are outdated and don't give me fresh and high quality answers. When I told perplexity to search "november 2025 reddit" it linked me to some threads including this forum LocalLLaMa.
I found that here in this forum there are a lot of people who really know about AI and I've seen some people solutions to other threads that IMO a "search engine" would never come up with.
pokemonplayer2001@reddit
You did all that and found nothing?
C'mon.
vicks9880@reddit
There are lots of posts online about tricking Claude to get more out of the 5-hour limit. Ask something 2-3 hours before you plan your coding session, and then when you start coding your current limit will reset in 2 hours, and you can continue coding for an extended period.
nad_lab@reddit
People will hate, but Ollama local or cloud is amazing imo, and they make it simple to run or use any model they offer, and their Discord is active, which is nice lol.
evilbarron2@reddit
In my experience: nothing. Claude are the best all-around models.
However - Claude is laughably expensive and crippled by rate limits, and it still makes plenty of stupid expensive mistakes.
More importantly, I don’t need “the best” to get all of my work done, so I pay a fraction of Claude for Kimi and Minimax M2 and get a ton of work done while everyone else is tweaking their tools to accommodate “updates” to the “best” model.
Terminator857@reddit
When I went to drive.google.com I saw an offer for Gemini Pro at half off for two months. I have Gemini Pro twice. Also have Codex. For me, Gemini Pro is better than Claude for creating new stuff. I also have a local model. Local is great for not-very-complicated tasks; Claude excels at complicated tasks. I've heard good things about OpenRouter, so maybe I'll try that next. I'm enjoying my Strix Halo, so I recommend it. I bought a Bosgame M5.
Mtolivepickle@reddit
Take a Kimi K2 API key and use it inside of Claude Code via Claude's API key swap. You get all the functionality of Claude at a fraction of the price. Or better yet, stay with the Claude subscription and use it until you reach your limit, then switch to the API key. It's dirt cheap that way.
dash_bro@reddit
I've not had any complaints with the GLM Pro plan ($15/mo) and setting it to glm-4.6. Plug it in with Claude Code (follow the official guide on GLM to do this, takes 5 mins).
Then an API key for Gemini CLI + Qwen CLI
Between these three, I've been able to handle general software/coding work. Unless you're looking for a professional developer experience and work related software, this should work.
If you're using it for work, switch over to Cursor and use Claude for planning and Gemini/GPT for coding. Even Grok makes a decent enough option for following detailed plans.
Equivalent_Cut_5845@reddit
I think Google AI Pro plan is a great bang for your buck as you can share the plan with 4 or 5 others, and if you don't need to share to actual people then you can share to your other google accounts and have 5x or 6x the rate limit on gemini app and gemini cli.
GrennKren@reddit
For local LLMs, you can check recommendations from other users based on the kind of device you have. I don’t have a powerful computer myself, so I can’t really try local models.
As for Claude, you could try buying credits for token usage instead of getting the subscription. With credits, you just pay for however much you end up using. I’ve never used the subscription, so I’m not sure which one saves more money. Since I don’t use it that often, I personally prefer buying credits.
Lately, I’ve actually been buying credits on OpenRouter instead of directly on Claude, because you can use the same credits for different models
j_osb@reddit
There's quite literally no local model that is easily run that comes close to Sonnet 4.5, not even speaking of Opus 4.5.
Minimax M2, DeepSeek v3.2, GLM 4.6, and Kimi K2 Thinking are all great models. Not Sonnet 4.5 tier, but... great models nonetheless.
If you want to run any of these models locally, though, in this RAM economy, be ready to shell out a ton of money.
Bob5k@reddit
GLM coding plan, hands down. 10% cheaper as well with my link - connect it to Claude Code and roll.
Alywan@reddit
Mate, if I write 20 mins of code using Opus through the API, it would cost me $20 minimum, and if I don't manage the context well that could reach $100 easily.
What do you expect to get from a $20 subscription?
Dry_Explanation_7774@reddit (OP)
I know what you mean, that's why I'm looking for alternatives or solutions.
Maybe running a different LLM that performs well and is cheap.
Or building a custom local LLM solution?
Maybe there's someone achieving super good results like Claude but with a local LLM solution.
Then there are domain-specific language models; maybe there is something for "SQL" coding for example, then another specific language model for "Express", another for "MongoDB". (This may be super specific, but you get the idea)...
Or maybe someone is able to use the Claude API in a way that is optimized and spends less than Claude Code or whatever, be it for Opus 4.5 or Sonnet 4.5.
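On that last point about optimized API usage: one concrete lever is Anthropic's prompt caching, where a large, stable chunk of context (system prompt, project spec) is marked cacheable so repeat calls read it back at a reduced rate instead of paying full input price each time. A hedged sketch below; the cache_control mechanism is from Anthropic's docs, but the model id and file name are placeholders, and the savings depend on how much context actually repeats between calls.

```python
# Hedged sketch: trimming Claude API cost with prompt caching. A large, rarely-changing
# block of context is marked cacheable so subsequent requests read it from cache at a
# reduced rate. Model id and spec file are placeholders; check Anthropic's docs for
# current model names and cache pricing/limits.
import os
import anthropic

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# Hypothetical large, stable project context reused on every request.
with open("PROJECT_SPEC.md") as f:
    project_context = f.read()

msg = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder model id
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": project_context,
            "cache_control": {"type": "ephemeral"},  # ask the API to cache this prefix
        }
    ],
    messages=[{"role": "user", "content": "Plan the next task: add pagination to the orders endpoint."}],
)
print(msg.content[0].text)
```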
Dependent-Today-133@reddit
Don't expect Claude quality, but you can use GLM 4.6.
You can't run Claude cheaply, it's expensive and owned by a single company.