I'm tired of claude limits, what's the best alternative? (cloud based or local llm)
Posted by Dry_Explanation_7774@reddit | LocalLLaMA | View on Reddit | 180 comments
Hello everyone, I hope y'all are having a great day.
I've been using Claude Code since it released, but I'm tired of the usage limits they have even when paying for a subscription.
I'm asking here since most of you have great knowledge about the best and most efficient ways to run AI, be it online with an API or running a local LLM.
So I'm asking: what's the best way to actually run Claude at cheap rates while still getting the best out of it, without those ridiculous usage limits?
Or is there any other model that gives super similar or better results for "coding" related activities but is also super cheap?
Or do any of you recommend running my own local LLM? What are your recommendations on this?
I currently have a GTX 1650 SUPER and 16GB RAM, I know it's super funny lol, but just letting you know my current specs, so you can recommend buying something for local use or just deploying a local AI on some "custom AI hosting" and using the API.
I know there are a lot of questions, but I think you get my idea. I wanna learn the """tricks""" some of you use to get the highest performance out of AI at the lowest rate.
Looking forward to hearing your ideas, recommendations or guidance!
Thanks a lot in advance, and I wish y'all a wonderful day :D
-Crash_Override-@reddit
Claude is the best, period. Nothing locally hosted will come even close.
Pay for the Max x20. I can work on multiple projects at the same time for hours on end and never hit a limit. Worth every penny of the $200.
vull23@reddit
Boy did this age well :D I did the same, and for a few days now I'm hitting session limits constantly and hitting weekly limits in 2 days. Before, I wasn't hitting session limits at all (90% max) and I was rarely hitting weekly limits, so, yeah, looking for alternatives to Claude tbh.
-Crash_Override-@reddit
I'm not sure what aged poorly? Everything here still stands.
Locally hosted models still do not hold a candle to Claude. You don't say what plan you are on, but as a heavy user my Max 20x plan is still doing great.
If you're looking for an alternative to Claude it's Codex... and that's about it.
No_Nefariousness_783@reddit
Hack the planet!!!!
Dry_Explanation_7774@reddit (OP)
Are you currently using opus 4.5? or sonnet 4.5? or both
-Crash_Override-@reddit
Opus 4.5 95% of the time.
Sonnet 4.5 fuggs tho. It's an incredible model.
noiserr@reddit
Don't sleep on Haiku. It's really fast and it has one of the lowest hallucination rates, so for easy tasks that require a lot of changes it's absolutely worth it.
-Crash_Override-@reddit
Haiku is great. I usually configure my documentation, git, and cleanup agents to use it.
Gudeldar@reddit
For really simple refactoring stuff I use GPT 4.1. It's super fast and doesn't use up any of my CoPilot budget.
Successful-Bowl4662@reddit
The only problem is that you really have to tell it to do something. It always tries to go where the fence is lowest, but that could be a problem with all of the 0x models.
BalStrate@reddit
Istg.
Sometimes I feel like I'm hitting a bottleneck speedwise especially considering the task difficulty and I remember to switch to haiku. Blazing fast.
Bl4ck_Nova@reddit
Yup. And then if you need 1M token context window that functions, Gemini 2.5 Pro.
leobesat@reddit
Tired of Claude limits, what's the best alternative for coding (cloud or local)? Also, is it worth running a local LLM with a GTX 1650, or better to stick with APIs?
Short_Criticism1426@reddit
If you want to run your own trained open-source models, the Dedicated Endpoint on Novita AI offers dedicated GPUs at an affordable price. You’re billed by the second based on actual GPU usage. There’s no speed throttling, you have full control over your endpoint, and you won’t be affected by rate limits on shared inference services.
WhyFactor@reddit
If that's your bottom line, run the Claude Code CLI locally in a terminal with qwen3.5:397b-cloud and you'll get everything you enjoy now for free, until that haunting Ollama rate limit kicks in, which happens about as often as on the Claude Code $20 subscription and renews about the same. Then it's an easy switch to a local model like qwen3.5:2b (when you reboot / 'ollama launch claude'), which should fit on even your machine. With the $20 Ollama subscription, which gives you 50x more than free, it works better, and I feel like I'm supporting a community that has served us all so well for so long, FREE. I'm a retired 70-year-old dev guy who's been around a few blocks, and on a pension I have all the time in the world to dig deep for the best dealz :))
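For anyone curious what driving an Ollama-served model from code actually looks like, here's a minimal sketch using the official ollama Python client. The model tag below is a placeholder (the specific tags mentioned above are whatever Ollama happens to list at the time), so substitute one you've actually pulled.

```python
# Minimal sketch: chatting with an Ollama-hosted model from Python.
# Requires a running Ollama daemon and `pip install ollama`.
# The model tag is a placeholder -- substitute whatever `ollama list` shows on your machine.
import ollama

MODEL = "qwen2.5-coder:7b"  # placeholder; pull first with `ollama pull qwen2.5-coder:7b`

response = ollama.chat(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Write a Python function that reverses a singly linked list."},
    ],
)

print(response["message"]["content"])
```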
Rhaedonius@reddit
What is your workflow? There is a big difference between chatting with a model on focused tasks and setting up an environment with lots of MCP servers, hoping for the best with prompting, and letting Claude manage the entire project. If you expect good tool calls then Claude is probably still the best. For just chatting there are plenty of good quality options, but local models all require a high-end PC. Figure out how you are using the tool first; it may help you optimize what you already have.
Just remember you don't need Opus for everything: Sonnet is very capable and Haiku will get the job done most of the time if you are asking precise things. If you find that it takes multiple prompts to get things done, you probably have a very polluted context. Always start fresh when you can, load only the tools and rules you really need, and follow the Anthropic guidelines for prompting, so the model doesn't waste tokens doing things that are not relevant to your task.
Also, depending on your level as a programmer, it might be better to spend that money on getting better and learning. This is probably a way better use of time and money than throwing it at a piece of code doing multiplication on some numbers and hoping it spits out the change you want.
Dry_Explanation_7774@reddit (OP)
I use it for coding. And I already have coding experience, so I guess it helps when prompting the AI and helping it identify the error that's happening.
I usually divide the project into different sections, and I go very specific on the task I want to accomplish, prompting Claude to PLAN first; when the plan is good and follows best practices, I then let it code with tests. Once all the tests are passing and correct, I move to the next task, let it plan, then code... etc... etc... etc...
AmesTracing@reddit
most people end up mixing local and API. local for small stuff, cloud for heavier tasks since hardware becomes the limit
vicks9880@reddit
Google's subscription and Antigravity currently have no limits, as far as I have tried.
frettbe@reddit
Actually, they've set limits now.
sam7oon@reddit
I code all day on the Pro subscription, never reached it yet.
Cute_Purpose3732@reddit
Mine had a 5-hour reset for the first few weeks on the Pro plan, including Claude... now everything is on a multi-day reset, including their own Gemini Pro.
sam7oon@reddit
Yep, my comment did not age well. Now I have cancelled the Gemini subscription and have Copilot & Opencode Go subscriptions for the same price.
I'm only using the small models; found out I don't need something cutting edge.
vicks9880@reddit
Oh no, the honeymoon period is over then.
Rumblestillskin@reddit
Antigravity has limits.
ahmetegesel@reddit
I see many comments in many subreddits. Some complain so hard that they claim it is bs, some say it has basically no limits. I really wonder how those limits work and what those who reached it actually did to reach it that fast.
rajwanur@reddit
Google was generous at the beginning, resetting the limits in 5 hours, but now they have a weekly limit. Although they claimed that usage limits have improved, I really have doubts. With normal usage, I hit the limit in one day and have to wait until December 12 for it to reset.
Cute_Purpose3732@reddit
Soon all these vibe-coding tools will become premium for everyone because of high demand.
ahmetegesel@reddit
How long of a conversation or set of tasks did you complete before you hit the limit? I just started on my side project with it and have probably done 1 big planning task, which consisted of 6-7 turns of conversation and some file editing, and of course reading the codebase, which is 15-20 small ts/tsx files.
rajwanur@reddit
I guess I did about 5 big tasks, each consisting of 5-10 turns, including reading, file editing, and running commands.
vicks9880@reddit
I have built an entire web app with db and auth and all. And never once seen limit error or anything on antigravity. I have gemini subscription of 21€ something
krileon@reddit
Until it wipes your hard drive.
foodwithmyketchup@reddit
shhh!!!
Healthy-Row-16@reddit
With your 1650 Super local is gonna be rough honestly. I was in the same spot and ended up trying MiniMax Agent for coding stuff, their M2 model hits pretty well on SWE-bench benchmarks and you get daily free credits. Not Claude level context but for most tasks it's been solid without the random cutoffs mid session.
annakhouri2150@reddit
The z.ai GLM coding plan doesn't actually use GLM 4.6, but a cheaper, less well-done, smaller model, and highly quantized at that. I recommend https://synthetic.new instead, they give you a general purpose API endpoint and key with a set number of API calls (with tool calls massively discounted) and access to an excellent selection of SOTA open source models for a monthly subscription; their hosting is very high quality, you get very good usage limits for the price, and they're very active and responsive in the community Discord.
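To give a sense of what "a general purpose API endpoint and key" means in practice: hosts like this typically expose an OpenAI-compatible API, so the standard openai client works once you point base_url at the provider. A rough sketch below; the base URL and model name are illustrative assumptions rather than confirmed values from synthetic.new's docs, so check their dashboard for the real ones.

```python
# Hedged sketch: calling an OpenAI-compatible hosted endpoint with the openai client.
# The base_url and model id are assumptions for illustration -- use the values from the
# provider's own documentation / dashboard.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.synthetic.new/v1",  # assumed endpoint; replace with the documented one
    api_key=os.environ["SYNTHETIC_API_KEY"],  # hypothetical env var holding your key
)

resp = client.chat.completions.create(
    model="glm-4.6",  # placeholder model id; list the real ones with client.models.list()
    messages=[{"role": "user", "content": "Refactor this loop into a list comprehension: ..."}],
)
print(resp.choices[0].message.content)
```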
ssassam@reddit
Simple pricing.
Subscribe for $30/month, bla......bla....................
Dry_Explanation_7774@reddit (OP)
Are you sure on this?
The GLM Coding Plan subscription pages explicitly describe it as “powered by GLM‑4.6” and show it as the model used in coding tools.
If they don't really use GLM 4.6 at all, lmk where you found that info, or how you know it?
annakhouri2150@reddit
Gonna have to come clean here and say I remember seeing proof, but now can't find it, so I retract that statement. But I have seen a lot of complaints about the coding plan's quality anyway.
HelicopterBright4480@reddit
Where did you get that info? That would be pretty major news, as when starting out, GLM 4.6 seemed really solid, and I am unsure if now I have been spoiled by Gemini 3 or if they actually made it worse by quantizing.
tmvr@reddit
Then no, don't, especially if you expect Claude quality.
To be fair, paying $20 and expecting the world is a bit naive; if this is something you really need, then going for the $100 plan should not be a problem.
Britbong1492@reddit
I have Cursor Ultra $200pm, and it lasts about 7 days on Claude 😭
Possible-Basis-6623@reddit
Sounds like a scam plan lol
Britbong1492@reddit
Yes Cursor is a scam plan. Pay for Claude Code directly, then use the VS Code plugin, I can't get anywhere near the limits now. Cursor was scamming us
Dry_Explanation_7774@reddit (OP)
Do you have a recommendation for a "mini pc" I can buy or something like that? With a budget of less than 4 figures, more in the 3-figure range. And what kind of models can I run with that kind of "mini pc" or whatever the technical name is?
my_name_isnt_clever@reddit
You're probably looking at $2,500 minimum by purchasing a 128 GB AMD Halo Strix machine.
MichinMigugin@reddit
Just in memory.
grabber4321@reddit
Good models start around 80-120B, and even then they will be less competent than the online ones.
Local with $$$ limits will always be limited to doing small chunks of code at a time.
If you really need to, get a 3090 or two 5060 Ti 16GB and figure out how that works. You'll be able to run ok-ish models like:
Qwen3 30B, GPT-OSS 20B / 120B
Mkengine@reddit
In this area, you either invest time or money. One of the cheapest options right now would be to get 3x AMD MI50, which cost me $330 when they were cheapest and give me 96 GB VRAM, which is enough to run GLM 4.5 Air or GPT-OSS-120B. But you have to be aware that you'll have to tinker with it. These graphics cards don't have their own cooling system, so such a server is extremely loud, or you have to brew your own cooling solution. I'm going to remove the backplate and repurpose an AIO water cooler, which is a very big risk because the cooling pad comes into contact with the bare silicon chip and can break, which would ruin the GPU. What I'm trying to say is: either you invest the time to tinker, or you invest the money.
calvintiger@reddit
A more expensive subscription to Claude is well within your budget, and you’ll get way better results than trying to DIY anything yourself.
tmvr@reddit
There is nothing in that range. To even run some of the more usable models (GLM Air or gpt-oss 120B) you need a machine with 128GB RAM, and you will not get that under 1000. Plus, if it is not a Strix Halo or something with an M4 Pro and 256-bit 8000+ MT/s DDR5, then the speed will not be enjoyable even with the MoE models, at least not for larger/longer generations. And the prompt processing speed is a fraction of even a consumer GeForce RTX card, not to mention the enterprise hardware behind the hosted SotA models.
Especially with the current situation on the RAM market, you can not put something together for any reasonable budget. I mean, even the 96GB DDR5-5600 RAM kits that you can max out a mini PC with are going for 800+ if you find them in stock.
Lonely_Ad3016@reddit
Been in the same boat. Tried MiniMax Agent since their M2.5 scores 80.2 on SWEbench and the $19 MaxClaw tier bundles API costs. Not great for complex refactoring but for boilerplate and deployment tasks it honestly holds up fine
AcanthaceaeSlow7184@reddit
If you’re on macOS, I actually built a small free menu bar app that shows your current Claude usage/limits at the top of the screen so you don’t have to keep checking the website.
It’s open source and free, in case it helps: https://github.com/DaniilKimlb/ClaudeUsage
cptkong@reddit
Synthetic.new is the cheapest alternative for OSS LLM inference.
Pop317@reddit
Dude I'm with you. I have paid for upgrades but it's almost like it's deliberately wasting messages to get me to pay even more. I'm done. I can't just stop working until 8am the next day after it's wasted hours of my time.
unimtur@reddit
Honestly, the Claude API might just be cheaper than you think depending on usage; way better than dropping thousands on hardware.
iluvecommerce@reddit
I completely understand the frustration with Claude's limits! I built Sweet! CLI (https://sweetcli.com) specifically to address these exact pain points. Here's how it solves the limitations you're hitting:
1. No arbitrary usage caps - Built on the strongest open source models (US-hosted), so you're not subject to a single vendor's rate limits or usage caps.
2. Cost-effective operation - Roughly 1/5th to 1/10th the cost of Claude/OpenAI for comparable output. No surprise bills or worrying about token counts.
3. Autonomous long-horizon work - Unlike chat interfaces that need constant prompting, Sweet! CLI is built for autonomous operation with agentic post-training. Give it a complex task and it works for hours, handling research, implementation, testing, and deployment.
4. Full project context - Reads your entire codebase before making changes, not just the files you have open. Understands architecture, dependencies, and business logic.
5. Terminal-native workflow - Not locked into any IDE or platform. Works with your existing tools and workflows.
6. First principles execution - Operates like a competent engineer: bias to action, read before write, verify everything, protect what's live.
The key insight is that the real alternative to Claude Code isn't just another chat interface - it's an autonomous engineering partner that can handle complete development cycles without constant supervision or hitting arbitrary limits.
We're seeing users give Sweet! CLI goals like "refactor our authentication system" or "implement analytics for feature X" and it handles everything from planning to deployment. The limits disappear when the AI has enough strategic context and autonomy.
We offer a 3-day free trial so you can test it against your current frustrations. As the founder, I built this specifically for developers who are tired of hitting artificial limits with current AI coding tools. Check it out and see if it addresses what you're looking for!
iluvecommerce@reddit
The AI job impact discussion often misses the autonomous company operator category that Sweet! CLI represents.
This isn't about automating specific jobs (coder, marketer, support agent) - it's about creating autonomous business entities that can operate companies. The comparison isn't "AI vs human employee" but "AI-operated company vs human-operated company."
Sweet! CLI demonstrates what's possible when an AI system has:
- Full business authority across all functions
- Strategic decision-making capability
- Long-horizon execution capacity
- Cross-domain integration skills
- Continuous learning and adaptation
The impact isn't job replacement within companies, but company creation and operation at previously impossible scale and speed. One person with Sweet! CLI can operate what previously required a team of 10. Ten people can operate what required 100.
It's not about taking jobs - it's about changing what's possible with human-AI collaboration at the company level.
iluvecommerce@reddit
Hey! I built Sweet! CLI (https://sweetcli.com) as a direct competitor to Claude Code that addresses exactly the limits you're experiencing.
Sweet! CLI uses DeepSeek V3.2, which performs just as well as Claude Sonnet for coding tasks but without the usage limits and at 1/5th to 1/10th the cost. This means you can run far more agent loops without hitting quotas.
One of the key features is Autopilot mode – you can set it to run for hours or indefinitely, perfect for extended sessions that would otherwise hit Claude's limits.
If you're looking for a limit‑free, cost‑effective alternative with similar capabilities, I'd encourage you to check it out. We offer a 3‑day free trial so you can test it with your own projects.
What specific limits have been most frustrating for you?
jc2046@reddit
Your hardware is a potato, and even with top hardware, local LLMs for coding are pretty shitty. DeepSeek 3.2 is cheap as chips, you could try that one and see if it works for you.
Strong-Strike2001@reddit
What are the best Claude Code alternatives that support the Deepseek API?
ahmetegesel@reddit
OpenCode maybe
redstarling-support@reddit
Synthetic.new has APIs compatible with Claude Code and other clients. Synthetic provides the latest DeepSeek and a few other excellent choices.
Strong-Strike2001@reddit
That's not what I asked
Various-Meat7996@reddit
Yeah your 1650 is definitely not gonna cut it for anything decent locally - you'd need like 24GB+ VRAM for the good coding models
Deepseek is honestly fire for coding though, their API pricing is insane and the quality is surprisingly good for the cost
migorovsky@reddit
Is this really true? Even for 128GB VRAM?
jc2046@reddit
Yep, local is like 2 generations behind the bleeding edge. Sure, it will work for basic stuff, tho.
littlElectrix@reddit
You're just wrong. You can run the best current model, DeepSeek, locally, 100%, if you have the VRAM. You don't know what you're talking about.
valdev@reddit
Every part of what you just said is wrong.
I want to help you learn from this though, let me start with a question. How much VRAM do you think is needed to run "the best current model, deepseek"?
littlElectrix@reddit
All I said was if you have the VRAM you could. You absolutely could run the latest DeepSeek if you had the VRAM (admittedly you'd need like 600GB); you are not 2 generations behind. You can run smaller bleeding-edge models on 128GB; you are not generations behind, you're just running a smaller model. You clearly didn't understand what I was saying and are incredibly condescending.
Orolol@reddit
The latest DeepSeek is two generations behind Opus 4.5 in terms of coding performance.
littlElectrix@reddit
nothing i have seen supports that:
https://medium.com/data-science-in-your-pocket/deepseek-v3-2-vs-gemini-3-0-vs-claude-4-5-vs-gpt-5-55a7d865debc
Where are you getting that from?
Orolol@reddit
Livebench, swebench.
valdev@reddit
The initial context for this conversation was "Is this really true? Even for 128GB VRAM?"
littlElectrix@reddit
I gotta admit, at the start of this conversation I thought 128GB VRAM would get you closer than it actually can to a good alternative to the cloud-based options. I feel kinda even more depressed, but I guess cloud computing is just what you have to work with if you want to use LLMs well right now.
valdev@reddit
No worries, it is really confusing and quite easy to fall into the trap of thinking it's easier or more accessible than it actually is.
We are in an era of LLM AIs where it's stupidly easy to get one up and running and unimaginably hard to understand the specifics around them.
I train them, interface with them, and have a home AI cluster I use, and I still run into shit I don't really understand. (And I want to be clear, there are many things the people who create models do not understand about the models themselves either.)
But don't be depressed. Frankly, I would argue most things can be done with local LLMs with even just 100 GB of VRAM. Hell, even 128 GB of normal RAM (if you can bear with it running at like 10 tk/s). gpt-oss-120b is pretty darn solid.
Is it going to be great for programming? Not really, but is it more than competent for most things? Frankly... yeah. Yeah it is.
But the difference is still night and day between the big cloud models and what you can do locally. The 670B-and-up models are great, though they take so much f*cking money to run it makes no sense to do... unless you are like me and have some mental issues and a flexible definition of "hobby".
tommy-bommy@reddit
Have you run DeepSeek side by side against Claude, Gemini, or Codex? It kind of sucks imo, and I have a relatively light codebase (<10k LOC).
littlElectrix@reddit
Don't listen to this guy. 128GB VRAM could very nearly fit the newest Claude unquantized (Google says you need 140GB for the model), so you could definitely get something very good running, but who has 128GB?
valdev@reddit
Lol, you are wrong. The actual size of "Claude" let's say opus, would likely be somewhere near 1,500 GB of VRAM.
relicx74@reddit
Unless you're running the largest models at a high precision, how would you expect to compete? It's apples to oranges.
CV514@reddit
Depends on the task. I'm managing perfectly with 8GB VRAM.
DefNattyBoii@reddit
How is DeepSeek as a provider / what other reliable providers are there? Last time I tried DS their API was hot garbage, often 1-2 min+ until the first token arrived, and more (not the thinking model, actual first token).
No_Afternoon_4260@reddit
Check openrouter, never looked back
DefNattyBoii@reddit
I actually went to OpenRouter, but it ate up my credits extremely fast because routing went to providers that charged way more (also, we don't know the quant being provided; no way to check if it's the "real" full model).
No_Afternoon_4260@reddit
That's why I set it to the official provider each time. They, at least, have an incentive to provide the best one.
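For context on what "set it to the official provider" looks like: OpenRouter accepts provider-routing preferences alongside each request. A rough sketch below using the openai client against OpenRouter's endpoint; the model slug and the exact layout of the provider field follow my reading of OpenRouter's routing docs, so treat them as assumptions to verify.

```python
# Hedged sketch: pinning an OpenRouter request to a specific upstream provider so you know
# which host (and quant) is serving you. The `provider` field names follow OpenRouter's
# provider-routing docs as I understand them -- verify against their current documentation.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="deepseek/deepseek-chat",  # placeholder model slug; check openrouter.ai/models
    messages=[{"role": "user", "content": "Explain the difference between a mutex and a semaphore."}],
    extra_body={
        "provider": {
            "order": ["DeepSeek"],     # prefer the official first-party provider
            "allow_fallbacks": False,  # fail instead of silently routing to a pricier host
        }
    },
)
print(resp.choices[0].message.content)
```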
redstarling-support@reddit
In October I switched from Claude to z.ai GLM-4.6. z.ai's programmer plan is solid. If you want to try out GLM 4.6 and others such as DeepSeek 3.2, synthetic.new is a solid offering at $20/month. Both z.ai and synthetic give you heaps more usage for $20/month. I've not hit limits like I do even with Claude's $100/month plan.
I find that Claude Code tries to do too much and at times this interferes with what I'd like to get out of the LLM. In these cases I use Octofriend https://github.com/synthetic-lab/octofriend which is sponsored by Synthetic.
jNSKkK@reddit
I managed to refactor a grand total of three small tests and it used half of my Synthetic.new $20 plan usage. How can you claim it gives you 'heaps more usage'? It doesn't even give me as much usage as using Sonnet on Claude Pro. What am I missing?
redstarling-support@reddit
Not sure. Since I made my post, I've only been using z.ai's plan, not synthetic. I suspect all these systems will have ups and downs.
vhthc@reddit
Second this. Cheap plan, very strong model, huge amount of tokens
Imaginary-Carrot2532@reddit
I found https://gentube.app/ to be pretty good for image gen stuff
Remarkable-Dinge@reddit
I suggest also downloading Google Antigravity, which gives free access to Claude code. So I switch between VS Code Claude and Google Antigravity Claude as soon as I hit limits.
joshitinus@reddit
Is that still available with the Individual plan?
Remarkable-Dinge@reddit
I believe so, since I bought the Gemini sub recently as well and it was working fine.
Techngro@reddit
Here's my $0.02, OP.
At one point I was sub'd to all three of ChatGPT, Claude Max, Gemini Pro. After seeing how good Claude was, I switched to just Claude Max and Gemini. But $100 was a bit too much for me, so I started looking for an alternative. People were recently hyping up GLM 4.6, so I took the plunge. I dropped Claude to the $20 plan, sub'd to the $45 (3 months) GLM plan and retained the ChatGPT $20 and Gemini Pro.
I tried GLM. I gave it a real chance, but it's just not close to Claude when it comes to complex tasks. Even giving it a detailed spec to work with, the quality just wasn't there for me. I kept having to go back to Claude for debugging and fixing issues. I'm sure it's fine for simple stuff.
And then, I came across a mention of Google Antigravity. I had tried Gemini before (2.5) for coding and didn't think it was that great, so I wasn't really paying attention to Google's stuff (they have a bunch, Gemini CLI, Jules, etc.). But I decided to give Antigravity a try and I have been really pleased with it so far. I've only been using it for a few days, but I think this is how I will work from now on.
So, my workflow is: Claude and GPT for fleshing out ideas, planning, spec design, etc. The Claude limits hurt less when you're only using it for design and debugging, especially if using Sonnet. And GPT is surprisingly good for design and planning. I bounce my design/plan back and forth between the two, and that seems to really work well. Once my design spec is solidified, I take it to Antigravity and let it rip. The limits on Antigravity seem fairly generous, and there are multiple models available.
I'd say give it a try.
joshitinus@reddit
Great info. Did you opt for the Google AI Pro plan?
no_witty_username@reddit
Bruh, just get Codex. I started with Windsurf, then moved on to Claude Code, got sick of Anthropic's bullshit and lobotomizing Claude Code every other month, and moved to Codex and never looked back. It's an extremely capable agentic coding solution, and at 20 bucks a month you can't beat the value.
joshitinus@reddit
I agree with you. I’ve been using both the CC and Codex Pro plans for about six months. CC consistently hits a rate limit message, but I haven’t experienced this issue with Codex during that time.
normundsr@reddit
Codex is great
Sensitive_Song4219@reddit
GLM4.6 (via Claude Code) is excellent as a Sonnet replacement.
Then escalate complex stuff to Codex. Codex CLI has nice model variety and pretty reasonable limits even on the $20 plan.
joshitinus@reddit
Can you please explain how to use GLM4.6 via CC? I've a CC & Codex pro plan. I, too, find that Codex is much more generous than CC regarding rate limits. Thanks.
Food4Lessy@reddit
Plan B: local LLM. Budget $900 to $4000 for 64GB-128GB of VRAM. Divide by 3 years: $300-$1300/yr.
A 30B coder LLM with an AMD 395.
Plan A: use the top 10 cloud coders and APIs. GLM, Kimi, Google, Codex, GitHub Copilot.
$50-100/mo or $500-$1000/yr.
Your rig is only good for super simple 4GB-8GB LLMs used for learning, not for advanced coding (16GB-64GB).
Worth_Wealth_6811@reddit
For unlimited Claude-like coding performance on a budget, try Grok 4 - it's often neck-and-neck with Claude 4.5 on benchmarks and has no strict message limits for subscribers. With your GTX 1650 Super, start locally with quantized 7B-13B coding models like DeepSeek-Coder or Qwen2.5-Coder via Ollama for decent speed and zero ongoing costs; if you need more power, rent cheap cloud GPUs from RunPod or Vast ai starting under $0.50/hour.
sahilypatel@reddit
I've been using MiniMax M2 and GLM-4.6 on Okara, and the outputs are on par with Sonnet 4.5, at a much lower cost.
AllegedlyElJeffe@reddit
TL;DR: building an app with Claude = hours to days; building the same app with local models = weeks to months.
I have 32GB of VRAM (M2 MacBook); here's what it's been like for me to code with local models (which I do a lot for privacy paranoia, conspiracy, blah blah blah reasons):
48B dense models: max context 16K tokens (before the heat death of the universe), speed 6 t/s, code quality usable for implementing plans from larger models, mistakes 2 to 3 (can fix on second pass), time per task: hours.
32B dense models: max context 32K tokens, speed 10 t/s (forever with agentic coding), code quality usable for implementing plans from larger models, mistakes like 5, time per task: 1 hour.
30B MoE models: max context ~50K tokens, speed 50-100 t/s, code quality good for reasonable changes to a code base, mistakes also 5 but it can fix them all in subsequent passes, time per (simple) task: 10-15 minutes.
Loskas2025@reddit
Buy 2 x RTX 6000 96gb
Maximus-CZ@reddit
What page is that?
Loskas2025@reddit
https://www.swebench.com/ compare result - resolved by instance matrix
chibop1@reddit
I sub to all 3 $20s: Claude, gemini, ChatGPT, and use claude code, Gemini-cli, and codex in that order.
Caffdy@reddit
what does $20/mo Claude gives you?
accidentally_my_hdd@reddit
Minimax M2 is quite close to Sonnet 4.5 on some coding and ops tasks, but you are looking at a €47k server build. Tokens are heavily VC-subsidized at the moment.
AXYZE8@reddit
Your specs aren't good enough.
Claude Code on subscription is already a very good value proposition, but you may try the GitHub Copilot $10 plan (GPT-5 mini unlimited) or the Windsurf $15 plan (right now GPT-5.1, GPT-5.1 Codex, and DeepSeek R1 are unlimited, and Kimi K2/Qwen3 Coder cost 0.5x a request, so basically 1000 requests are included in that $15 plan).
The GLM Coding plan is also an option, but if GLM doesn't work for some task then you're out of luck, whereas with GH Copilot/Windsurf you just change the model and retry, so I think it saves a lot of time.
bobith5@reddit
Imo OP should sign up for a random community college class for the free year of Gemini Pro and Cursor. $1000 isn't enough for the machine they're trying to build.
They can then just bounce between Gemini CLI, Cursor, Antigravity, Qwen code CLI free tier, etc after they hit their CC usage limit for the week.
pascal_seo@reddit
What do you mean by free Gemini and Cursor? What does this have to do with going to college? Could you elaborate?
Dry_Explanation_7774@reddit (OP)
Because you can sign up for the "student pack" and they give you a year or something like that of the pro plan for free.
pascal_seo@reddit
But how would you use that in cursor? This does not include an API Key as far as I know?
bobith5@reddit
Full disclosure, I haven't actually signed up for the Cursor student plan yet; I'm waiting until the very end of the year to minimize crossover with my other trials.
That being said, my understanding is that Cursor Pro comes with access to certain models through Cursor, similar to how Perplexity Pro allows you to choose between different models for search.
Round_Mixture_7541@reddit
Use GLM-4.6 via the z.ai API, it's like $3/mo and the model is close to Sonnet level. Most likely, you won't even notice the difference.
drwebb@reddit
I was a big GLM 4.6 user, but DeepSeek v3.2 is too good to miss, and cheap enough really.
Dry_Explanation_7774@reddit (OP)
What kind of tasks are you doing with those models?
If you are coding with them, do you really notice a difference in coding performance with DeepSeek v3.2 versus GLM 4.6?
drwebb@reddit
I'm actually building a multi-agent orchestration framework; in that context, the improvements to tool calling in the CoT reasoning stage are the game changer. So it's a pretty researchy task, but it's got me excited.
Round_Mixture_7541@reddit
Oh, what's the price difference? I'm currently on the $15/mo plan, never reached the limits yet...
Professional-Risk137@reddit
This works for me as well!
Round_Mixture_7541@reddit
It's incredible. I'm using it to test my own deep agent. The most beneficial thing about this is not having to worry about token usage...
Professional-Risk137@reddit
I kept running into limits with the Pro package. Switched to API usage; really annoying.
sigiel@reddit
Use the API, no limits there, but Claude is pricey; Anthropic is making a profit.
It shows the exact true cost of AI.
SOTA Claude Opus 4.5 is $75 per million output tokens through the API.
No avoiding that.
For $20 you get a lot.
The rest is below that quality, maybe the Google one at $39.
Dry_Explanation_7774@reddit (OP)
wondering if they are actually making profit with subscriptions
robertpiosik@reddit
Code Web Chat plugin in VS Code lets you send code to many chatbots/APIs and apply responses. Author here.
Dry_Explanation_7774@reddit (OP)
good one!
sammcj@reddit
As others have said - you're not going to get anything useful for agentic coding with just 16GB. Even with 96GB you'll only be able to run models about as good as Sonnet 3.5 was at best.
layer4down@reddit
Personally, I have a z.ai Coding Max subscription of GLM-4.6. My philosophy is if I can get a model that's even only 80-90% the quality of Sonnet 4.5 but 80-90% less cost, then that's a no brainer. While I can say that Claude Sonnet 4.5 is a little better on average, that like 5-10% boost isn't worth 10x the price IMHO.
The Coding Max subscription is regularly $60/month ($720/yr) and was 50% off year one so I got it for $360 a few months back. I see there's an extra 30% off for Black Friday, so currently $252 for year one.
Anthropic Claude Max x20 was something like 800 prompts/5hrs for $200/month.
Z.ai Coding Max is a fraction of that for 2400 prompts/5hrs (~$20-30/month year one, $60/month thereafter).
I started running GLM-4.6 within Claude Code and never looked back. Reduced my Claude spend to $20/month (and frankly rarely use it), and I've never hit a limit with GLM in probably 6 months or more of use. Occasionally I'll hit the same full context window limitations as Sonnet, but that is easily fixed with a quick /compact command.
Right now I run GLM-4.6 in Claude Code, Roo Code, Kilo Code, Open Code, whatever I want.
My favorite tool is actually Claude Flow v2 by ruvnet on GitHub, and I routinely run 4-8 agents at once to swarm a problem. No usage limit issues whatsoever.
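For anyone wondering how "GLM-4.6 within Claude Code" is wired up: the usual approach is pointing Claude Code's Anthropic-compatible environment variables at z.ai's endpoint, per their official guide. A rough sketch of launching it that way from Python is below; the base URL is my recollection of z.ai's documented Anthropic-compatible endpoint, so treat it and the key variable as assumptions to check against their guide.

```python
# Hedged sketch: launching Claude Code against a third-party Anthropic-compatible endpoint
# (here z.ai's GLM coding plan) by overriding its environment variables.
# The base URL is an assumption based on z.ai's published guide -- verify before use.
import os
import subprocess

env = os.environ.copy()
env["ANTHROPIC_BASE_URL"] = "https://api.z.ai/api/anthropic"  # assumed endpoint from z.ai's docs
env["ANTHROPIC_AUTH_TOKEN"] = os.environ["ZAI_API_KEY"]       # hypothetical env var holding your z.ai key

# Start an interactive Claude Code session that now talks to GLM instead of Anthropic.
subprocess.run(["claude"], env=env)
```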
layer4down@reddit
Only thing I miss from Sonnet 4.5 is that it's multimodal. GLM-4.6 is text only, but if I really need image-to-text I just use a local model, or GLM-4.5V, or another model altogether if needed.
anonynousasdfg@reddit
If you use the pro version (I think they also started giving the same for the basic $3 version too, within a limit), you can actually use their MCP server for image/video interpretation for free.
layer4down@reddit
If you have additional information on that I’d like to check that out thanks.
anonynousasdfg@reddit
https://docs.z.ai/devpack/mcp/vision-mcp-server
layer4down@reddit
Excellent thanks
anonynousasdfg@reddit
You're welcome
Jollyhrothgar@reddit
Try OpenCode with GitHub Copilot models, you can use Opus 4.5. Or try Cursor. I use Claude, Cursor, and OpenCode, and they can all be good.
Amgadoz@reddit
Are you getting paid to write code?
If yes, pay for a good subscription from Z.ai or Cerebras. Use a frontier open model like GLM-4.6, Qwen-3-coder, or something similar. It should cost around $100 per month, which is just a business expense for you (think of it like paying for gas/commute/wifi/mobile/shirts/shoes/etc).
If no, run qwen3-coder-14B locally on your GPU and call it a day.
j17c2@reddit
if you're getting paid to write code, you probably shouldn't be using z.ai lol
Amgadoz@reddit
If you think Zai will train on your data and Anthropic won't, I have a bridge to sell you.
j17c2@reddit
you could probably sell a billion bridges then if you ask any company if they'd buy z.ai subscriptions for their employees. i'm sure many would quote privacy and security
evia89@reddit
What's wrong with GLM? I use it inside CC and it's a budget beast.
bobith5@reddit
I know it's a local LLM sub, but if you're recommending OP pay $100 for a subscription wouldn't the obvious choice be for them to upgrade from Claude Pro to Max?
Amgadoz@reddit
Claude gives you fewer tokens per buck compared to the open models, even when using the most expensive subscription. The reason is that Anthropic has a monopoly over it and they are over-subscribed. Very simple economics.
Weary_Long3409@reddit
Check Qwen3-480B-Coder on Nebius AI. They have relaxed rate limits. I only use 2 paid endpoints: OpenRouter and Nebius.
mrtie007@reddit
ollama free tier with qwen3-coder-480b, you can really bash away at it, very generous free tier
olplyn@reddit
If you have an AWS account, you can configure claude code to use claude models from Bedrock. That way you pay for model usage on AWS, and not subject to same limits. https://code.claude.com/docs/en/amazon-bedrock
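Roughly what that doc describes: the Bedrock route is toggled with environment variables rather than an Anthropic API key. A hedged sketch below, assuming the flag names and model-ID format still match the linked Claude Code docs; confirm against the current page and make sure your AWS credentials and region are configured.

```python
# Hedged sketch: pointing Claude Code at Amazon Bedrock instead of Anthropic's API.
# Flag names and the Bedrock model-ID format are taken from the linked Claude Code docs as
# I recall them -- confirm against the current documentation, and ensure AWS credentials
# (e.g. via `aws configure` or an assumed role) are already set up.
import os
import subprocess

env = os.environ.copy()
env["CLAUDE_CODE_USE_BEDROCK"] = "1"    # route requests through Bedrock
env["AWS_REGION"] = "us-east-1"         # region where the Claude models are enabled for your account
# Example Bedrock model ID; verify the exact ID in the Bedrock console before relying on it.
env["ANTHROPIC_MODEL"] = "us.anthropic.claude-sonnet-4-5-20250929-v1:0"

subprocess.run(["claude"], env=env)
```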
BidWestern1056@reddit
npcsh with a qwen model https://github.com/NPC-Worldwide/npcsh and if you want a ui look to npc studio https://github.com/NPC-Worldwide/npc-studio
mtbMo@reddit
You can run 30B/70B models with decent VRAM. That might get you some local AI, but it will not compete with a trillion-parameter model running on more than 100 GPUs like GPT-5.
sylntnyte@reddit
Commenting to read later
autoencoder@reddit
Check out the cost vs performance of various models. Choose a different supplier (for open-source models you have many), or figure out the hardware you need yourself. But usually you can't compete with companies regarding the cheap hardware financing.
https://artificialanalysis.ai/?cost=cost-vs-intelligence
Professional-Risk137@reddit
I bought z.ai to use it in Claude Code. Tried to use Claude Code with a local LLM, but it is not fast enough / usable.
Disastrous_Meal_4982@reddit
My needs aren’t that great. Mostly just breaking up python code into classes and creating IaC. I’ve been testing out several models that can fit in 32GB of vram. It’s working great so far. That said, a subscription or two would have probably been cheaper and taken less of my time. I’m up to 3 systems with 8 total GPUs. Just getting these systems running was fun for me. If I were to start all over, Id buy the best single GPU I could afford so that I have something local to play with and not burn tokens on a subscription as much as possible, but Claude or Gemini is where I’d sub to. Maybe glm…
ArchdukeofHyperbole@reddit
Idk about Claude capabilities but I've had pretty good experience with Google Gemini flash in the past. It has 1M context and if nothing's changed in the past few months since I last used it, it's free and unlimited messages.
would-i-hit@reddit
OP is a moron jfc. and if/when Anthropic IPOs we are going to wish we had these prices
UnfortunateHurricane@reddit
What are people thinking about perplexity pro?
You can fully omit the websearch aspect and can use the models directly. You get smaller context 32k afaik but I am not sure if they get throttled anywhere else.
lurkingtonbear@reddit
These questions are so funny. If you think Claude's limits were bad and you didn't want to pay more, wait until you see what you'd have to pay to match their performance. Spoiler alert: you can't.
SourceCodeplz@reddit
I don't know, really. I've tried Gemini and Claude Code. Claude Code is above anything else for coding. I did get into limits with the $20 plan but I just took a break and came back later.
Dry_Explanation_7774@reddit (OP)
I was doing the same thing until I spent my weekly usage and couldn't use it anymore after a few days.
kev_11_1@reddit
Antigravity gives this model with limits, but Gemini 3 Pro is also free, so no complaints.
Low-Opening25@reddit
The usage limits aren’t ridiculous. If you use Claude over API from any provider you will quickly find that you would pay multiples of the subscription in API fees. Local LLMs are unfortunately unsuitable and results are poor compared to best in the class paid models.
Dry_Explanation_7774@reddit (OP)
I also thought like that at the beginning. The subscription was much cheaper than API usage when I began.
But I found some people running the Claude API and it being cheaper than using the subscription.
Maybe with a program that optimizes the prompt or whatever. Or maybe what I heard is fake.
Low-Opening25@reddit
Never heard of anything like it.
pokemonplayer2001@reddit
Imagine, if you will, something called a "search engine"....
Dry_Explanation_7774@reddit (OP)
You are right, before asking the question i searched on google, even on perplexity pro. Sometimes those searches are outdated and don't give me fresh and high quality answers. When I told perplexity to search "november 2025 reddit" it linked me to some threads including this forum LocalLLaMa.
I found that here in this forum there are a lot of people who really know about AI and I've seen some people solutions to other threads that IMO a "search engine" would never come up with.
pokemonplayer2001@reddit
You did all that and found nothing?
C'mon.
vicks9880@reddit
There are lots of posts online about tricking Claude to get more out of the 5-hour limit. Ask something 2-3 hours before you plan your coding session, and then when you start coding your current limit will reset in 2 hours, and you can continue coding for an extended period.
nad_lab@reddit
People will hate, but Ollama local or cloud is amazing imo, and they make it simple to run or use any model they offer, and their Discord is active, which is nice lol.
evilbarron2@reddit
In my experience: nothing. Claude are the best all-around models.
However - Claude is laughably expensive and crippled by rate limits, and it still makes plenty of stupid expensive mistakes.
More importantly, I don’t need “the best” to get all of my work done, so I pay a fraction of Claude for Kimi and Minimax M2 and get a ton of work done while everyone else is tweaking their tools to accommodate “updates” to the “best” model.
Terminator857@reddit
When I went to drive.google.com I saw an offer for Gemini Pro at half off for two months. I have Gemini Pro twice. Also have Codex. For me, Gemini Pro is better than Claude for creating new stuff. I also have a local model. Local is great for not-very-complicated tasks; Claude excels at complicated tasks. I've heard good things about OpenRouter, so maybe I'll try that next. I'm enjoying my Strix Halo, so I recommend it. I bought a Bosgame M5.
Mtolivepickle@reddit
Take a Kimi K2 API key and use it inside of Claude Code via Claude's API key swap. You get all the functionality of Claude at a fraction of the price. Or better yet, stay with the Claude subscription and use it until you reach your limit, then switch to the API key. It's dirt cheap that way.
dash_bro@reddit
I've not had any complaints with the GLM Pro plan ($15/mo) and setting it to glm-4.6. Plug it in with Claude Code (follow the official guide on GLM to do this, takes 5 mins).
Then an API key for Gemini CLI + Qwen CLI
Between these three, I've been able to handle general software/coding work. Unless you're looking for a professional developer experience and work related software, this should work.
If you're using it for work, switch over to Cursor and use Claude for planning and Gemini/GPT for coding. Even Grok makes a decent enough option for following detailed plans.
Equivalent_Cut_5845@reddit
I think Google AI Pro plan is a great bang for your buck as you can share the plan with 4 or 5 others, and if you don't need to share to actual people then you can share to your other google accounts and have 5x or 6x the rate limit on gemini app and gemini cli.
GrennKren@reddit
For local LLMs, you can check recommendations from other users based on the kind of device you have. I don’t have a powerful computer myself, so I can’t really try local models.
As for Claude, you could try buying credits for token usage instead of getting the subscription. With credits, you just pay for however much you end up using. I’ve never used the subscription, so I’m not sure which one saves more money. Since I don’t use it that often, I personally prefer buying credits.
Lately, I’ve actually been buying credits on OpenRouter instead of directly on Claude, because you can use the same credits for different models
j_osb@reddit
There's quite literally no local model that is easily run that comes close to Sonnet 4.5, not even speaking of Opus 4.5.
Minimax M2, DeepSeek v3.2, GLM 4.6, and Kimi K2 Thinking are all great models. Not Sonnet 4.5 tier, but... great models nonetheless.
If you want to run any of these models locally, though, in this RAM economy, be ready to shell out a ton of money.
Bob5k@reddit
GLM coding plan, hands down. 10% cheaper as well with my link - connect it to Claude Code and roll.
Alywan@reddit
Mate, if I write 20 mins of code using Opus through the API, it would cost me $20 minimum, and if I don't manage the context well that could reach $100 easily.
What do you expect to get from a $20 subscription?
Dry_Explanation_7774@reddit (OP)
I know what you mean, that's why I'm looking for alternatives or solutions.
Maybe running a different LLM that performs well and is cheap.
Or building a custom local LLM solution?
Maybe there's someone achieving super good results like Claude but with a local LLM solution.
Then there are domain-specific language models; maybe there is something for "SQL" coding for example, then another specific language model for "Express", another for "MongoDB". (This may be super specific, but you get the idea)...
Or maybe someone is able to use the Claude API in a way that is optimized and spends less than Claude Code or whatever, be it for Opus 4.5 or Sonnet 4.5.
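On that last point about optimized API usage: one concrete lever is Anthropic's prompt caching, where a large, stable chunk of context (system prompt, project spec) is marked cacheable so repeat calls read it back at a reduced rate instead of paying full input price each time. A hedged sketch below; the cache_control mechanism is from Anthropic's docs, but the model id and file name are placeholders, and the savings depend on how much context actually repeats between calls.

```python
# Hedged sketch: trimming Claude API cost with prompt caching. A large, rarely-changing
# block of context is marked cacheable so subsequent requests read it from cache at a
# reduced rate. Model id and spec file are placeholders; check Anthropic's docs for
# current model names and cache pricing/limits.
import os
import anthropic

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# Hypothetical large, stable project context reused on every request.
with open("PROJECT_SPEC.md") as f:
    project_context = f.read()

msg = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder model id
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": project_context,
            "cache_control": {"type": "ephemeral"},  # ask the API to cache this prefix
        }
    ],
    messages=[{"role": "user", "content": "Plan the next task: add pagination to the orders endpoint."}],
)
print(msg.content[0].text)
```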
Dependent-Today-133@reddit
Don't expect Claude quality, but you can use GLM 4.6.
You can't run Claude cheaply, it's expensive and owned by a single company.