Is a high-end private local LLM setup worth it?
Posted by zakadit@reddit | LocalLLaMA | 214 comments
Hello, I’ve been scrolling through a lot of posts, reading personal experiences, setup advice, and replies to beginner questions from people like me.
LLMs really seem like a revolution.
But at the same time, every post raises the same issues:
they’re expensive;
even if you’re willing to spend serious money, they still seem hard to set up properly;
and in the end, even very expensive local setups still don’t seem to match the latest Claude or GPT versions, especially in terms of speed and token throughput.
So, is it worth doing?
I know it sounds like a broad question, but I do have enough money to seriously consider it. A setup like 5×3090s (I'm starting small with 64GB, a 3090 + 3060) with 128+ GB of DDR5 seems realistic for me.
But even with proper preparation, can I actually get an experience that matches Claude Pro Max x20 or GPT Pro in terms of speed, intelligence, and general smoothness?
The reason I want to do it is simple:
I genuinely hate the idea that my friends and I are basically dumping our whole lives into some 200 IQ fed hoe and paying them to monitor us. So I’d rather use a private, offline model.
high_on_meh@reddit
You really need more than ONE significant reason to blow all that money. I'm "dipping my toes" into local LLMs with an AMD Strix Halo system. It was "cheap" at $2500 (LOL). But I'm a long-career computer nerd who has changed how I work in the last couple months with Claude Code at work. For professional reasons, I need to know how the sausage is made.
Red_Redditor_Reddit@reddit
Dude, if you're going to go local, dip your toes in and start small. You don't need some monster machine to get basic LLMs to work.
You're not going to get the same results as some 5T-parameter model. That doesn't mean it can't be worthwhile.
WarmRestart157@reddit
Do you think getting a used 7900XTX is a good place to start with a potential of acquiring a second one down the road?
Monad_Maya@reddit
Depends on the price but yeah. I'm running a 7900XT, should've splurged and got the XTX but oh well.
You can also consider the R9700 Pro. Or the W7800 / 7900 if you can find them for a good price.
WarmRestart157@reddit
R9700 Pro is great but for the same price I can get two used 7900XTX, wouldn't that be better?
Monad_Maya@reddit
Yup, two 7900XTX would be better for price/performance. Can you get a w7900 for the same approximate cost (used)? 48GB of VRAM per card allows better scaling.
WarmRestart157@reddit
The W7900 is about 5 times more expensive than what I can pay for a used 7900XTX! Seeing that Qwen3.6-27B was just released, I'll bite the bullet and get one :)
Monad_Maya@reddit
Alright, get two though :)
zakadit@reddit (OP)
Thanks a lot for the reply. It at least has the merit of somewhat answering my question.
I don’t know if local ML just makes people philosophical, but it’s been surprisingly frustrating to ask what should be a much simpler question than people make it: can I get an experience comparable to what I already have?
So I’m basically taking the answer as more or less no.
Thanks.
Savantskie1@reddit
It’s like buying a car: I may like the Ford Mustang, but you may not. I could recommend it over and over, but it might be a terrible fit for you. Choosing an LLM, or whether it’d be a good fit for you, is completely subjective. And you’re trying to get a frontier-model experience from free models. You can get close, but you’ll never quite get there: open-weight models will get you close, but frontier models like Claude or ChatGPT are easily half a year to a year ahead. Yeah, you could get close with models like MiniMax, Kimi, and a few others, but they’ll always be somewhat behind. We can explain how it’s worth it to us, but we can never know whether it’s worth it for you.
The_Hanumaniac@reddit
The thing is, local is a completely different animal. One of the biggest learning curves I have found is that the context you load the model with exponentially changes its capabilities.
Like someone else said, you should just download some local models that will fit on what you currently have. Make some in-depth context files for your specific use case. You mentioned needing it for writing and medical stuff, so make a context document with some examples of your writing and the overall tone you want. Then do some A/B testing between local and what you are currently using. Your own experience will answer your question better than anyone on here can, because it really is subjective.
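The A/B testing suggested above is more trustworthy when it's blind. A minimal harness sketch: `model_a`/`model_b` stand in for "local model" and "current provider" calls, and `judge` can be a human or a scoring function (all names here are hypothetical, not any specific tool's API):

```python
import random

def ab_test(prompts, model_a, model_b, judge):
    """Run each prompt through both models, shuffle the pair so the judge
    can't tell which model produced which response, and tally the wins.
    judge(prompt, first_response, second_response) returns 1 or 2."""
    wins = {"A": 0, "B": 0}
    for prompt in prompts:
        pair = [("A", model_a(prompt)), ("B", model_b(prompt))]
        random.shuffle(pair)  # blind the ordering
        first, second = pair
        winner = judge(prompt, first[1], second[1])
        wins[first[0] if winner == 1 else second[0]] += 1
    return wins
```

Swap in real calls to your local endpoint and your current provider, run your own context documents through it, and the win counts answer the "is it good enough for me" question directly.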
giant3@reddit
Run a local LLM only for privacy and security.
If you don't need either of those, then it is probably not worth it.
You could still stay private by buying GPU time from one of the cloud providers.
For most people, it is not worth it unless you are using it for your job.
hugo-the-second@reddit
privacy, security and consistency.
Consistency in the sense of not needing to depend on the provider not nerfing the model, or increasing the price. Which both seem to be happening recently.
MaCl0wSt@reddit
Cost itself, too. Sometimes I have improvised pipelines where an LLM just acts as glue in the middle, like for semantic ranking, comparisons, or simple routing. In those cases I basically just need an LLM API endpoint, and if the task is simple enough to run locally, doing it for free is feasible. There's no noticeable quality difference compared to providers, because the task itself isn't very complex.
zakadit@reddit (OP)
I'm not doing it for profit or to make it "worth it". Let's just say I'm not looking for approval or advice on whether my reason is good enough, but for an answer to the original, main and only question I had.
Ell2509@reddit
I totally agree with that guy.
I dropped 10k on a setup, and it is taking me a long time to even get it operational in the way I imagined. It will take even longer to figure out how to make it "worth it".
I actually think it will be, for me, but HOT DAMN is it a lot of work.
Best_Control_2573@reddit
Simple answer: not as good in raw model performance at peak.
Complex answer: I'll accept this downside in raw model performance in exchange for being free from uncontrolled outages, model nerfing, session limits, inevitable price rises and arbitrary banning, provided the delta is not unacceptably large.
And, I feel this year, the delta has finally closed to acceptable levels.
Orolol@reddit
No. Even if you get your hands on a full 8xB200 server running Kimi 2.6 (which seems to be the best open-weight model right now), you won't be at Opus 4.7's intelligence level, and I doubt you can pull off enough optimization to match its speed in terms of prompt processing / token generation.
That said, I have Qwen 3.6 running blazing fast on a 5090, and I'm considering ditching Opus to use Kimi via API for big intelligence and local Qwen for fast mid/low-level work. I know I won't get the same level of raw intelligence and model output, but that's a sacrifice I'm ready to make.
lemondrops9@reddit
lol... yeah, I get why you would say people are being philosophical. It's just that with LLMs, "which is better" is not very cut and dried.
Right now people are saying Qwen3.6 35B is better than Qwen3.5 122B. I run the Q6 quant of both. The smaller 35B is around 38GB with full 256k context, whereas the 122B is 110GB with 90K context.
Where it gets philosophical is that lower quants will of course be smaller, but cause more issues. And when people get the fancy tool calling right with one model, it can outperform a different model that should be better.
Benchmarks are also subjective, so start off small with what works on your system now to figure out where you want to go, before you end up with a sports car you don't even want to drive.
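The size figures quoted above follow from simple arithmetic: quantized weight size is roughly parameter count times bits per weight, and the gap up to the observed footprint is KV cache and buffers. A back-of-envelope sketch (the ~6.5 bits/weight figure for a Q6-class quant is an approximation, not an exact spec):

```python
def quant_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough quantized weight size in GB: params * bits / 8 bits-per-byte.
    Ignores file metadata and runtime overhead."""
    return params_billions * bits_per_weight / 8

# e.g. a 35B model at ~6.5 bits/weight is ~28 GB of weights; the rest of
# the ~38 GB footprint quoted above would be the 256k KV cache and buffers.
weights_gb = quant_size_gb(35, 6.5)
```

The same formula explains why dropping to a lower quant shrinks the file so quickly, and why that shrinkage is exactly where the "more issues" trade-off comes from.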
hugo-the-second@reddit
This. It's not just philosophical, it's very practically relevant.
Fit_Window_8508@reddit
It depends on what you need it to do. I run a local 35B model that handles 70% of what I need LLMs for.
But for some things I still need frontier models. I run that model on a Mac M4 Pro with 48GB of RAM.
The setup is almost more important than which model you pick. And it will be a pain to fully optimize, but it could be worth it depending on what you need it to do. My 35B model can watch YouTube videos most of the day, summarize and narrate/clip them, etc. It also does all my personal assistant stuff like reminders, email briefs, and some daily searches and reports.
It took me about 2 weeks to dial it in, and sometimes it still feels fragile, but it works and is actually completing tasks agentically.
ChemPetE@reddit
Yeah, as someone else with an M4 Pro and 48 GB of RAM, I feel like it's a decent balance between at least some capability and not breaking the bank, for something that, for me, still mostly amounts to tinkering.
Fit_Window_8508@reddit
It's super straight forward for me. My local model is doing things for free that I used to pay for.
ea_man@reddit
> can I get an experience comparable to what I already have already?
You can go to an online provider and test the "local models" yourself, e.g. Qwen3.5-35B-A3B.
For performance: on a local NVIDIA card it will do some 50-100 tok/sec.
octoo01@reddit
All the models you might host locally you can also try first online to find out what it would be like.
Red_Redditor_Reddit@reddit
I think you should try a smaller model with what you've got first. It's not like it costs you anything. It does better than you'd think.
chisleu@reddit
I was using $1k/mo in API keys with anthropic. I built a 4x blackwell rig to replace that usage. It should pay for itself faster when I onboard users. Working on stability first.
Purpose-Effective@reddit
Just rent a pod on runpod. Start it when you need to. If you know how to code you can have it all automated. I click two buttons and it auto starts and connects my terminal to the model.
zakadit@reddit (OP)
I’m not really seeing how that relates to the simple question I’m asking: can I get the same experience or not?
Purpose-Effective@reddit
You’re not ready for locally running AI.
zakadit@reddit (OP)
Yeah, maybe I’m just not ready for some of you.
I’ll get downvoted to hell for this, but damn, some of you talk like ancient masters.
My question was simple: can local LLMs provide an experience that’s as good as the top models?
Some people answered “no,” and actually gave useful detail, like “they’re 1–2 years behind,” which sounds totally reasonable to me. Others said, “maybe, but are you willing to spend a huge amount of time to get there?” Also a fair answer.
But seriously, if I wanted riddles, metaphors, or some philosophical detour, I would have asked for that. I asked a very direct question.
That said, the pod suggestion was actually good advice. I replied a bit like an asshole because by that point I’d already gotten 5 replies, most of them being goofy metaphors instead of answers.
RobotArtichoke@reddit
I’ll answer your question.
It depends on
robogame_dev@reddit
Just look at a benchmark. Top local models are 3-6 months behind top proprietary. Does that cover your question?
Purpose-Effective@reddit
You’re looking at the wrong thing. Proprietary AI models are 100% focused on trillions of params. They have huge knowledge, but it comes at tremendous compute expense. That’s why subscriptions are so expensive.
Open source focuses on making AI models that people can actually run. Open source models are 100% better than proprietary, but you have to give them the right tools to work.
For example, my little Qwen 3.6 35B MoE beats Claude Opus 4.6. Why is that? Because I spent the time to code a setup that works for me. I gave it unrestricted access to the internet to make up for its tiny size, I gave it 1M context, and I developed a better version of OpenViking for unlimited memory, which allows it to hold complete functionality at a 1,000,000-token context.
You have to put in the work.
zakadit@reddit (OP)
Of course. Hope it wasn't too hard to reply only to my question without adding "wise" yapping.
Thanks !
Purpose-Effective@reddit
My answer is not a riddle, it's as straightforward as it can be. I understand that if you haven't been in the open-source model community for a while, you can't understand what I'm talking about.
Runpod is basically a GPU provider: you pay them for the time you use the GPUs. That's where I run my models. It's pretty sweet. You don't have to spend thousands on hardware, which will become obsolete in a few years. I replaced Claude Code with an open-source model, and it's great. But you have to give it the right tools, and you need to know how to code, or at least be able to 'vibe code'.
If people try open-source models without giving them professional tools and compare that to Claude, which has backend servers hosting its own tools, they will have a very bad experience.
It's not hard, but you will definitely spend a day or two figuring shit out. Use runpod, try locally run models, and if you like it, buy the hardware and just transfer everything from your pod to your rig.
I can help you, send me a message.
silverud@reddit
My experience running local models has led me to see them as perfectly viable for workloads that must remain local (sensitive data), but do not compare to frontier models (GPT-5.4 or Opus 4.6).
Then again, I'm comparing models that will run on a MacBook with Apple Silicon and 128gb of unified memory, while running a ton of other apps at the same time, against the leading edge frontier models that are publicly accessible. I'm not even comparing the best or largest or most capable local models - I'm comparing what can run on my laptop without slowing me down.
I think it is fair to say that local models, at least the ones I can run, feel similar to frontier models from 6-12 months ago. Perhaps I'm just easily impressed, but that surely impresses me.
_millsy@reddit
They’re literally giving you an extremely low-cost way to test exactly that. Only you can answer whether it’s “good enough”, but of course you’re not going to replicate a rack of GPUs with a few 3090s, come on.
zakadit@reddit (OP)
I’m pretty sure I made it clear that money wasn’t the issue, so I don’t really need a low-cost way to test it.
What I genuinely don’t understand is how many of you keep drifting away from the actual point when my question is very simple: if I invest enough, can I get an experience as good as Claude, or is that simply not possible?
I get that the question may sound simplistic, maybe even naive (which is totally understandable), but if you think it’s not a relevant question, then honestly ignoring it would have been a better response than answering something else.
evia89@reddit
Buy an 8 x B200 server and test it then. It will be good, a bit worse than the current nerfed Opus.
Money is not a problem, right? This ~$500k will be good.
TheMisterPirate@reddit
OP, you are missing the point entirely. You're asking a broad and subjective question, and people are telling you how you can get the answer for yourself. You don't need to spend thousands on a local rig right off the bat, you can test the exact same models you will run on that hypothetical rig and see what the performance is like. You can "try before you buy" essentially, for a low cost, using something like runpod. Heck, you can try most of these models using something like openrouter or some other provider too. Use opencode or whatever harness you like.
Are local models as good as frontier models? NO.
Are they good enough? MAYBE, that part is subjective, depends what you're trying to do with them. You can look at benchmarks to see that they've closed the gaps significantly in various areas, but all models have their strengths and weaknesses. Also, depending on your local setup, you may need quantized versions of the models, and/or have to deal with slower speeds.
But if your main concern is privacy, then local is really the only way to go tbh.
Also you come off as ungrateful in your replies. You asked for people to advise you and they are doing so for free, nobody owes you jack here.
CapitalDue7249@reddit
Kimi K2.6 is much better than Sonnet, almost as good as Opus overall, and better in some aspects. I don't know how much it will cost to run locally, but it is probably north of 100k, as it is a massive model. It's the best open-source model available, and I would argue it's the second best overall, behind Opus.
MoffKalast@reddit
I would, but I resent having to pay them a monthly ransom for holding a storage drive hostage.
Purpose-Effective@reddit
Several hundred? What are you storing?
dennis_linux@reddit
To me it's SIMPLE: "A cloud-based LLM is NOT YOUR LLM; its intent is to make money for its hosting provider." The question is what you are trading for a small delta in performance. Local open-source LLMs are gaining on traditional cloud-based LLMs. Also, do you want to always be just a user, or do you want to understand what you are using?
jannycideforever@reddit
It will virtually ALWAYS be cheaper per token to run Kimi in a giant warehouse running constantly at 90% capacity than it is to run a local version that will be idle 90% of the time. It's just economies of scale, and competition is too intense for profit margins to change the math.
The only major exception is if you were going to be using the hardware anyways. E.g., if you want a high end gaming PC, maybe consider splurging for a bit of extra VRAM to run Gemma 4 or Qwen 3.6. But you're not going to get near-frontier capabilities by any means.
kentrich@reddit
This is absolutely accurate. But it misses one thing that isn't clear from the numbers: having your own hardware shifts the mental model from "every time I press enter costs money" to "I have free tokens, let me try this!" It's a completely different way of managing people's behavior, tilting the scale towards trying things instead of discouraging them.
So if you're asking whether it ever makes sense: it does if you're also solving for something else, like human behavior and incentives.
jannycideforever@reddit
I would agree, BUT with the caveat that it's still usually cheaper to just experiment using affordable models unless you're going to have the hardware for other purposes.
An example is a private project where I need to do some image and video generation. Never done anything like that before, and the fact we have such great local options means I'm mostly just using them to learn (and may just use them for everything if they're sufficient for my requirements). However, I'm also a gaming nerd, so I already wanted to have a stupidly big GPU. If I was starting from scratch, I'd probably just pay for what I need with caps in place to make sure I don't accidentally charge my card a billion dollars by mistake.
cmdr-William-Riker@reddit
It should be noted though, if you had techniques that could get good use out of earlier foundation models like sonnet 3.5 and gpt-4o, you'd be surprised how good qwen-3.6 is for coding. It doesn't have the depth of knowledge baked into it that the foundation models have and the limited context window can be challenging to work with, but with the right skills loaded into OpenCode, it can do quite a bit
averagepoetry@reddit
Can you elaborate on this more please?
I have a larger setup so I'm basically brute forcing by loading large models right now (at the expense of speed).
But it would be super nice to know that I can use smaller models and couple it with the right techniques to get better results. If you have any pointers or could describe how you set up your system, I'd really appreciate it. Thank you so much!
Fit_Window_8508@reddit
100%. Agent context and memory scaffolding are becoming more important than models in most use cases, IMO.
I haven't tried Qwen 3.6, but I run 3.5 locally on one of the nodes in my cluster and it has impressed me more than it has disappointed.
jannycideforever@reddit
I agree and think there are perfectly reasonable use cases. The question for most people is if it's worth the opportunity cost. Lots of setup, lots of learning what it can and can't do, etc.
Claude code works out of the box and for $100 you're going to be set. If you want to go budget, cursor is $20.
Again, not shitting on qwen. I remember using qwen 2 in roo code or some shit and I wanted to die. Qwen 3.5 was a night and day difference, and for lots of people it will do exactly what they need.
stealthy_singh@reddit
Let's say you get something running on a pair of 5090s, or even a 6000 with say 96GB of RAM. What will the capability limitations be?

For example: I want it to help run my Home Assistant. I want it to write medical letters for my job; having something local means it can actually write the letters fully for me, rather than me having to give it anonymised context and copy the letter across. I'd want to keep it up to date on medical developments for these letters, maybe by training it further on new clinical papers. I also want to get it to research online, with appropriate guard rails, to give real-time answers to questions.

I just got Claude to collate the last year's data of my house's energy usage, strip out car charging, and tabulate it along with a simulation of a year's worth of solar generation (simulating cloud cover from weather history) to give me a rough idea of the cost, income and net value a solar system would net me. This actually took a while, and burned through two whole rounds of 5-hourly usage limits, to crunch the numbers and produce the Excel sheet with graphs. I'd also want it to write scripts to do jobs for me, the kind of stuff you used to read threads of IT professionals smashing out to automate mundane, time-consuming jobs. I'm not going to be writing complex code (or what I think would take complex code).

Are these kinds of things doable on such a setup? I've been looking at it for a few weeks and I can't find a guide that says what kind of outcomes I can expect beyond tokens per second. I understand the concept, in that so many words make up a token, but I still find it hard to translate tokens per second into capability. Thank you in advance if you answer.
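One way to translate tokens per second into a feel for capability, at least for the letter-writing use case, is a rough time-to-generate conversion. A sketch using the common ~0.75 words-per-token rule of thumb (both that ratio and the letter length are assumptions, and prompt-processing time is ignored):

```python
def seconds_to_generate(words: int, tok_per_sec: float,
                        words_per_token: float = 0.75) -> float:
    """Approximate generation time for a given word count.
    Ignores prompt-processing time, which adds to this."""
    tokens = words / words_per_token
    return tokens / tok_per_sec

# e.g. a ~400-word medical letter at 50 tok/s:
t = seconds_to_generate(400, 50)  # ~10.7 seconds of generation
```

So at the 50-100 tok/s quoted elsewhere in the thread, single-letter latency is comfortable; it's long agentic chains and big prompt contexts where throughput really starts to bite.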
MoffKalast@reddit
Still, though if we're looking at it objectively, is Kimi 2.6 not as useful as a frontier model from a year ago? Arguably we were just as fine using those back then.
ScoreUnique@reddit
Well, if you have a lot of RAM and patience, you can run SOTA as well. I'm running a MiniMax 2.7 IQ2 quant with 48GB VRAM and 192GB DDR5.
Spectrum1523@reddit
Minimax is good, and def the smartest thing you can run at 128GB VRAM, but nothing quantized to 2-bit is SOTA.
Savantskie1@reddit
Bleeding edge is rarely a good thing
jannycideforever@reddit
Cope, unironically. If opus 4.7 went open weight everyone would be losing their minds like it was the second coming because it's better than everything else.
I use open weight models all the time, especially for corporate work. Kimi K2.5 has been a fucking king and I probably use it more than anything because it's dirt cheap and great quality. I also know when it's time to bend the knee and pay for Opus, because when I need shit to work I don't have to worry.
CapitalDue7249@reddit
Kimi k2.6 is even better! Maybe 90% of opus and sometimes better than opus especially since opus has been nerfed
jannycideforever@reddit
My company hasn't added it to our current models 😭 but I'm starting to get some street cred so I'll be pushing to make sure we have access to new open models the moment they're online. I'm probably one of 5 people in the US offices using anything beyond openai lmao
Regarding the supposed Opus nerfs, I don't follow it closely enough to know the degree to which 4.6 was or wasn't nerfed. I wouldn't be surprised, but I'm genuinely unsure. However, 4.7 is new, and they're probably not going to start lobotomizing it right out of the gate, because that would lower enthusiasm for the model.
Kimi is genuinely phenomenal though. Everyone else is using 5.4 mini to save costs and I'm just over here paying 1/3 less while getting near-5.4 level output. It's fucking kino.
woswoissdenniii@reddit
It’s possible that you get a whole different t/sec in your environment than corpo or non-API plebs. Can you comment on work vs. private inference speed on Anthropic and/or Moonshot? Thx in advance.
The_Crazy_Cat_Guy@reddit
You sound like you know what you’re doing, so I’ll ask you this: what makes you say one model is better than another? Is it the accuracy of the responses? The speed of the response? Etc.?
jannycideforever@reddit
Just prefacing I'm a fucking idiot so take everything with a grain of salt hahaha.
For me, it all depends on what I'm doing. For example, right now I have two use cases where I need to take an unstructured document and extract information from it. Even though the documents are actually all basically identical, what and how I need to extract the information is dramatically different:
Application A - It needs to extract a metric fuckton of information. Doesn't need to be fast, and it's not doing Olympian level math, and there is some margin for error. But it's a lot of info.
Application B - It needs to extract significantly less information, BUT it needs to do it perfectly and it needs to do it in 25 seconds or less.
For option A, it's pretty simple. Pick a model that performs well in general in benchmarks, has sufficient context window, and is cheap. Kimi K2.5 it is. I did a few A/B tests compared to GPT-5.4 and the difference was marginal, but at a fraction of the price. It's not fast, but who cares.
For option B, I'm having to balance accuracy, latency, and cost.
Right now I'm shocked to find that it's 5.4 mini and Grok, because I fucking hate OpenAI and Twitter. But they're the only ones balancing what I need.
If I got another use case where it needed to be absolutely perfect but time and money was no expense, I'd consider Opus. If I got a task where it needed to be faster AND the output required significantly more tokens, I'd have to see if I'd need to swap models to prioritize throughput over latency.
In my experience, the best way to go is to just experiment a bit, whether that's doing A/B testing or just seeing how well it performs based on your own judgement. The Arena AI Pareto frontier chart is a fantastic baseline, at least for comparing cost versus output quality. When you aren't sure whether you're not getting the results you want because of the model or your setup, just try a bigger model for a bit. If it's not working any better, it's probably not the model causing the issue.
CharlesCowan@reddit
Depends on your bank account and your future. Look up the 4500 Blackwell: 36GB GDDR7, 200 watts each, $3k each. Get one at a time and start adding them. Later, power and cooling will become an issue you can't ignore.
Few_Paint_8463@reddit
No, you will never get the quality Claude has. You can come close, but it costs literally thousands, more than your current suggested setup. 5x24GB is an OK amount of VRAM, but you won't run anything comparable to Claude with that setup.
In short, if you are willing to buy hardware and wait for improvements, it could be worth it. The open-source models are getting better and better; I would say 1 year at most till Chinese models are beating American models. But at the moment it would be an expensive hobby that will not get the results you are looking for. Even if you do match the speed of Claude, the quality WON'T be there.
The best models for running locally at the moment seem to be Gemma4/Qwen3.6 for smaller setups; I'm not sure what people are running on larger setups. When I had one, I ran GLM. Qwen3.6 is actually quite powerful for its size, as is Gemma4.
I would suggest playing around a bit with the LLMs before buying all the hardware; you may find you don't need it. Plus, the hardware you are suggesting will not help you run anything really good, maybe tiny quants of higher-end models. MoEs are another story: you could run one of them efficiently, but again it won't be Claude-level good.
Vicar_of_Wibbly@reddit
Yes you can get a comparable experience, but boy is it expensive. This is mine: https://blraaz.net
This year I think models have finally reached the point where the likes of MiniMax-M2.7 and Qwen3.5 397B A17B are so good and so close to SOTA models that for most use cases they’re indistinguishable.
I’m not talking about Q3 GGUFs here. I’m talking about FP8 or NVFP4 models at full unquantized context length that fit into 384GB VRAM, which is what a 4-pack of RTX PRO 6000 96GB gets you.
A machine to host those is gonna set you back real money, but if that’s still in the realm of affordable then yes, you can have Claude code at home.
I do. I call it my datacenter in a box. It runs Claude code cli all day long. I love it.
BlobbyMcBlobber@reddit
If you need to ask, then no.
beltemps@reddit
Let me tell you about my experience with local LLMs, which is kind of a hobby of mine. I’m paying for three cloud models, ChatGPT, Gemini and Claude. Need that for my job in Pharma. My PC is a 32 GB Intel machine with a 4070 super, so nothing fancy. I’m using LM studio and my favorite models are Qwen 3.6 35B Apex I Quality, Gemma 4 31B instruct and Nemotron 3 Nano 30B. There’s no jack of all trades. Every model has a different strength. It takes them ages to load (4070 super has just 12GB VRAM) but once the model is loaded into GPU and System RAM they have pretty decent speed. As many have pointed out, from a cost perspective it doesn’t make any sense to let a PC run all day idle just for some questions. Cloud models are way cheaper. For me it’s for fun (starting to get my feet wet) and I prefer local models for some of the more sensitive questions (IP related and legal stuff of our company)
Korkin12@reddit
Yes, worth it, if you want to keep privacy and need off-network use.
Also, models update very often and their capabilities keep increasing. In a year, on the same rig, you may be able to run more advanced LLMs.
Operation_Fluffy@reddit
When I read “high-end” I was expecting to read about someone dropping around a million USD on one or two DGX nodes. That’s not “globally” high-end (it’s not high-end for data centers), but for local/home it would be. 3090s are solid for home use, but I’m not sure I’d frame them as high-end. I guess it’s a question of perspective.
Anonymous_Cyber@reddit
With Google's release of Gemma4 it's a good start. It's not bleeding edge, but if newer open models follow suit and keep targeting smaller setups, you'll be in a good spot already having the equipment.
I say splurge, and if all else fails, well, at least you'll have learned something and kept your systems private. It's your software and your data instead of someone else's.
arousedsquirel@reddit
Yes it is, but it all depends on your applications. 1. Privacy matters; no data leaves your system if setup is done correctly. 2. Training can be easily rinsed and repeated during off hours. 3. Open-weight models are getting smarter and smarter, so application value grows equally. 4. When you come to the point of running agents (like Hermes) and you have the appetite to understand the inner workings of LLMs (reward junkies), you can create your own assistant that consumes a lot of tokens to be helpful. Verdict: yes, a private setup can be rewarding. I wouldn't start with 3090s (I know they are relatively cheaper); starting over, I would go with a modded 48GB 4090, a 32GB 5090, or the 6000 Pro, because I think they are a better investment in the long run.
Fresh-Resolution182@reddit
The math is pretty clear: if you are burning >/month on API costs consistently, the hardware pays off in 12-18 months. Below that, you are buying very expensive privacy. It still might be worth it, depending on what you are putting through it.
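The 12-18 month figure is just amortization arithmetic. A sketch with purely hypothetical numbers (hardware price, API spend and power cost are all illustrative, not quotes from this thread):

```python
def breakeven_months(hardware_cost: float, monthly_api_spend: float,
                     monthly_power_cost: float) -> float:
    """Months until the rig pays for itself versus continued API spend."""
    monthly_savings = monthly_api_spend - monthly_power_cost
    if monthly_savings <= 0:
        return float("inf")  # the rig never pays off
    return hardware_cost / monthly_savings

# e.g. a hypothetical $8,500 rig replacing $700/mo of API use,
# with ~$100/mo in power:
months = breakeven_months(8500, 700, 100)  # ~14.2 months
```

Below a few hundred dollars a month of replaced API spend, the payoff horizon stretches past the useful life of the hardware, which is the "very expensive privacy" case.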
FullOf_Bad_Ideas@reddit
I have an 8x 3090 Ti setup that I paid about 8500 USD for; I run GLM 4.7, Qwen 3.5 397B, Hermes 4 405B, and more.
no, it's not worth it as in I won't really recover my money. I did some training runs for my pre-trained LLM and I think the crossover point with rented H100s would be at around 1400 hours, I did around 200 hours of that so far.
no, it's slower. It's maybe as good as Sonnet 3.7/Sonnet 4, but it's slower, and PP (prompt processing) is way off. You get less done in the same amount of time in Claude Code. To get to the same speed I guess you'd need an 8x RTX 6000 Pro setup.
4444444vr@reddit
I've been wondering whether I could take a model, run it locally, and fine-tune it just for the work I'm doing (my software stack), so I could be more or less off the grid with it. Haven't taken the time to fully explore it, but I imagine if it were that easy, I'd be hearing about it.
ttkciar@reddit
Short answer: No. The open models which match today's Opus are about a year away.
Longer answer: Whether it is worth it depends on whether the inference companies continue to nerf their services (I suspect they will) and/or restructure their price tiers out of reach (I suspect they will eventually, but maybe not for a while).
I think you will want a "serious" local rig at some point, but maybe not yet.
In the meantime, you could fiddle with a smaller model which already works on the hardware you already own, to get the learning curve out of the way at zero hardware cost. Then when you decide you are ready to pull the trigger on an Opus-killing rig, you will already be up to speed.
johnxreturn@reddit
A year ago, models really were a year away from the top private model. Today? I’m not so sure. The gap is so close it’s hard to tell sometimes.
Fit_Window_8508@reddit
I think the open-source models are getting better, and people are also getting better at setting up context and memory structures, which can elevate a model past its weight class. Opus without Anthropic's agent and skills scaffolding probably would not be a very impressive model.
Automatic-Arm8153@reddit
Disagree. That’s why opus/sonnet is recommended everywhere.
It just works. All harnesses are better with Anthropic models. Even though they are nerfing it, it still is a quality model.
Fit_Window_8508@reddit
You can disagree, but Anthropic models are recommended because they're fine tuned specifically for agentic use and designed around their own scaffolding, that's not just raw model quality, that's the whole system working together.
Don't get me started on the confirmation bias around Claude in general; every dev team I know using Opus 4.7 thinks it's shit, while it's sitting at the top of the arena.ai leaderboard for no concrete reason.
The gap between open source and closed models is also closing fast enough that "just works" is becoming less of a differentiator than it was a year ago, especially with proper context and memory structure in place.
I love Claude, but yall gotta stop glazing
4444444vr@reddit
dude. for real on the claude not being the model I trust for code.
I kind of trust it... I don't trust it as much as others.
chopticks@reddit
Agreed it's hard to tell now, especially for me where my own ability to specify problems & context in clear language has improved significantly. If I think about the performance of the 12B model I ran 1.5 years ago versus the 14B model I run locally today, I wonder whether I could really tell the difference for my own real world use cases.
As other commenters have pointed out, context and tooling has improved too. For software dev specifically I sense people appreciating clear specs and code documentation just so that LLM tooling can operate on it better (which is ironically what I've been banging on about for years before any LLMs!). If this trend continues I can imagine it becoming harder to tell the difference between open-weight and expensive proprietary models.
Fit_Window_8508@reddit
Whole bunch of people learning proper code structure without learning a single line of code.
And the wild part is it works lol
Fit_Window_8508@reddit
I totally agree, especially on the pricing-out point. Anthropic just today is fucking around with locking Claude Code behind a Max sub; local is gonna be a must for anyone on a budget.
I would not buy hardware, especially rn, prices went down but are still inflated.
IF you have hardware laying around? It would be silly to not check it out for yourself. It's a fun project and never hurts to learn how tech that is taking over the world works.
Perfect-Flounder7856@reddit
By then, what are hardware prices gonna look like? Buy now and look like a hero by then.
CapitalDue7249@reddit
I would argue K2.6 beats Opus in some aspects and comes very close in others, especially since Opus has been nerfed
mbrodie@reddit
qwen 3.6 plus is what opus 4.7 should be.
qwen 3.6 local is still extremely strong and has fixed a bunch of lazy work claude has been doing lately, i wouldn't sell them so short.
ExcellentDeparture71@reddit
I don't have the same experience at the moment. How are you using it for coding? I have a MacBook Pro M5 Pro 48GB
Such_Land_5569@reddit
Definitely not one year away. Maybe like 3~4 months...
samandiriel@reddit
I can't tell you generally if it's worth doing, but I can tell you the use case that made it worthwhile for us.
We already had a gaming rig, and we just added about $2500 of upgrades. It is not a heavy-duty box either - dual EVGA 3090 RTX FTW Ultras, 6TB of NVMe storage, a Ryzen 9 processor, and 128GB of DDR5 RAM (which weirdly was the cheapest pair per gig we could find)
The rig is used both for gaming and LLMs.
I'm a senior full stack developer / architect, depending on the day; my husband is starting a new career in devops from scratch.
We use the LLM for:
- self-guided education (particularly my husband, who has been working towards getting a devops career going); setting up the home lab with the LLM itself was valuable for this, as has been learning the accompanying technologies and stacks and writing some small MCP servers as shims (a work in progress, and probably will be for at least a year)
- a hobby we can share, as well as a learning experience for both of us
- research (for all kinds of things)
- financial and medical management
- home automation
- personal knowledge base ('second brain')
- planning assistant and project management (all kinds - household stuff, gardening, trips, etc.)
- coding assistance
So for us, it has been very effective (we've reached the point where the LLM is situated well enough to be helping us build itself out) as a hobby, a couples' activity, and career learning/development for both of us. Quite a lot of win, especially as the value seems to keep growing as time goes on.
AdventurousFly4909@reddit
Ultimate nerd couple.
V0dros@reddit
I plan on building a similar machine. Can you share more on the specs?
Torodaddy@reddit
Not unless you have a serious business being supported
AdventurousFly4909@reddit
Wait for the m5 ultra and buy that.
Ylsid@reddit
Economies of scale say no. But, there are some privacy related reasons which might make it worthwhile to you.
kaeptnphlop@reddit
Is it frontier? No. But I can run Qwen-Coder-Next with checkpoints and speculative ngram-mod in VS Code GitHub Copilot to work on code to the point where I missed, that the selector slipped from Claude Haiku 4.5 to my local model.
If you don’t want to vibe code but actually put your own noggin to use - a $2000 128GB AMD Strix Halo machine is a great place to start.
Don’t forget that the current trajectory still favors models becoming more capable at smaller sizes every quarter, with better quantization techniques on the horizon.
It’s not this year’s frontier models, but basically last year’s that you can run locally, given you have enough VRAM available.
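The "more capable at smaller size" point is largely a quantization story, and the VRAM math is simple: for a dense model, weight memory is roughly parameter count times bits per weight. A back-of-the-envelope sketch (illustrative figures only; KV cache and runtime overhead add several GB on top):

```python
def weight_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight-only memory for a dense model, in GB."""
    # params x (bits/8) bytes; the factors of 1e9 cancel (billions of params -> GB)
    return params_billions * bits_per_weight / 8

# A 70B dense model at common quantization levels:
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_gb(70, bits):.0f} GB")  # → 140, 70, 35 GB
```

This is why a 4-bit quant of a 70B-class model fits on a pair of 24GB cards while fp16 doesn't come close.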
Britbong1492@reddit
No, but mix and match, qwen3.5:9b on a Mac Pro can probably do the bulk of your work by volume, but you need to go outside for a brain now and then. If you want a degree of privacy use Venice.ai
see_spot_ruminate@reddit
How about this comparison. A person could probably survive with some cheap bus tickets, but some people want a sports car. Is it practical? Are you gonna beat one of the F1 teams? Does it matter?
Shronx_@reddit
that's the worst comparison ever
Best_Control_2573@reddit
It's like great art. It makes you think.
I mean, what is the sports car in this analogy?
Claude? The Local Rig?
Maybe it's a Sparticus-type thing. We are all the sports car.
zakadit@reddit (OP)
I don’t really see how that applies to my case. I’m not asking whether I can use it well by “driving” it. I’m asking whether my car can even realistically claim to offer performance in the same league as the billionaire team’s car.
see_spot_ruminate@reddit
I feel like your question "I’m asking whether my car can even realistically claim to offer performance in the same league as the billionaire team’s car" answers your ultimate question.
jannycideforever@reddit
Tbqh I actually agree with OP but not for the reason he thinks.
He's asking if running his own local models is more economical and will give him frontier performance. He could be a hobo or he could be Jeff Bezos. The answer will be the same: no.
zakadit@reddit (OP)
i think we agree for the same reason (i need someone on my side so i'll take you with me in my downfall).
but idk seems pretty clear to me, no need to talk money when op said no need to talk about… money
but telling me « you don’t have the skills/time » or just « nop sorry » is actually a really really good reply in my opinion
jannycideforever@reddit
I'd say you could be more clear in the original post, just because you start off by saying LLMs are a revolution but then list that they're expensive. Most people are going to interpret that as meaning that is a thing you're trying to mitigate, even if you're willing to pay more upfront for it.
But now that it's clarified, I'd say if you're willing to pay in the tens of thousands of dollars, you CAN get GLM or Kimi models running. You'll get noticeably better quality than Sonnet, noticeably worse than Opus. You'd also have to factor:
Don't get me wrong, local is fucking cool. I'm seriously considering one of the DGX Sparks knockoffs just because I think it'd be fun as hell to have 128gb of unified memory to play with. But it's ultimately a hobby that can have some practical applications, not a practical replacement for the frontier labs.
Now, if after hearing all of that you still say "Fuck it, local is cool. I'm fine with all those drawbacks and headaches and know I'm gonna have to constantly tinker with shit and it won't ever be as good as just paying for a cheap model" then go for it. I wish I didn't work so much because I want more time to do that type of stuff, but that's because it's fun and interesting for me personally.
Automatic-Arm8153@reddit
What’s a DGX spark knockoff?
zakadit@reddit (OP)
yeah mb my english is kind of shitty
i wanted to say « on one hand you see posts saying local llms are peak and on the other you see ppl only listing big, really big, inconveniences »
i just wanted to see how bad it is (or how good).
i find it kinda… impressive that it's such a mess to get a good llm at home, but i feel like anthropic / openai won't keep their pricing respectful (still laughing about how gemini went from almost being able to memorize a googlion of tokens to having the iq of a newborn)
jannycideforever@reddit
Openai and anthropic will probably never be the best for the price, but they do have to keep it competitive for their business goals. They need economies of scale and they need training data to improve their models. They seem to view the AI market as a winner-take-most game so they'd rather burn cash now to be the winner in 5 years than save now and have it still be for nothing.
I am probably more of a "defender" of the frontier companies (except OpenAI, but that's because their product is shit), though not because I think they're altruistic. They keep their models closed and expensive for the same reason Chinese models are open and cheaper: it suits their financial self-interest.
sob727@reddit
I program mostly in a language that is not in the top 10, maybe not even top 20.
I find value in paying for Claude as it is way more knowledgeable than the open-weight models out there. I have tried Llama, GPT, Qwen, and others. They simply don't match the paid options for what I do.
TripleSecretSquirrel@reddit
I’d be really curious to see the performance of a finetune of a smaller open weight model on your esoteric language.
This episode of Dev Interrupted features an interview with Tim Dettmers who’s a professor at Carnegie-Mellon. His team created a finetune of I think Qwen 3.5 or Qwen 3 coder next, trained on their specific codebase, and now it outperforms Opus for their specific use case and is a hell of a lot cheaper. All for the low cost of a couple hundred bucks of gpu rental time.
a_beautiful_rhind@reddit
People pay to access deepseek, GLM and kimi. If you build a rig that can run it, then yes, you are in the same league.
They also pay for minimax, devstral and qwen and that's even more attainable with a gaggle of GPUs.
Hefty_Wolverine_553@reddit
I would say that the frontier open models like GLM 5.1 and Kimi 2.6 probably match the middle tier of closed source models (e.g. sonnet 4.6), but they're prohibitively expensive to run locally. The newly released Qwen3.6 35b probably matches or slightly surpasses the free tier of closed source models (e.g. Haiku 4.5, GPT 5.4 mini), and you can run that quite easily, either with a decent GPU and some extra RAM or a 3090.
Direct_Turn_1484@reddit
Depends. How often are you competing in professional racing? Or do you just enjoy driving a fast car that’s maybe not backed by sponsors and million dollar teams and wheels that need to be replaced a few times a day?
TractionLayer_ai@reddit
No, a home setup won't beat the latest GPT or Claude in raw speed or intelligence.
But yes, it is 100% worth it because your main goal is privacy. You are basically trading a slight edge in top-tier performance for complete ownership of your data. A multi-GPU setup is still incredibly capable and will easily handle 95% of your daily tasks.
Starting small with your 3090 + 3060 is the smartest move. Don't over-engineer it right away—spin things up simply using Docker, test the waters, and see if the offline experience meets your needs before you drop serious money on a massive 5-GPU rack.
kmp11@reddit
If the next-gen models start using 1-bit and turbo quantization and shrink 50-90%, your 3090 + 3060 setup should be plenty.
Shoddy_Cook_864@reddit
Try this project out; it's a free open-source project that lets you use large models like Kimi K2 with Claude Code completely free by utilizing NVIDIA Cloud.
Github link: https://github.com/Ujwal397/Arbiter/
muyuu@reddit
it massively hinges on what you're planning to do
if you want something that will deal with sensitive data in some way, or that you need to be able to replicate, then really local is the only way
but will you be able to generate SotA quality code or multimedia assets? nope, not at a reasonable cost
LOL no
but you can have your orchestrator/playwright at home feeding remote trillion-parameter models, and you can run the biggest open models on rented hardware, including rental baremetal (see RunPod, Vast.ai, Jarvis Labs, Lambda Labs, etc) so you don't have to depend on specific vendors who can pull the rug from under your feet (although it's much simpler to hop models in opencode, openrouter, or kilocode)
concrete recommendation: use opencode, and consider either a smaller setup or a strix halo machine to start rather than 5x 3090s
Savantskie1@reddit
I’m of the mind that you don’t have to have the newest ai stuff. I’m a big believer in repurposing older ai cards. They still work and are great at serving. I’ve bought 2 MI50 32GB cards and plan to buy two more once their current price drops again. I’m also going to buy used server hardware with a modern power supply for the efficiency. I’ve already bought the 128GB ecc ram. Just because it’s old doesn’t make it useless. I’ve lived my whole computer life since the early 90’s relying on older hardware and making it live till it dies spectacularly. Don’t listen to these fools saying you have to buy the newest version and just play around
Best_Control_2573@reddit
Half agree, because I have a small army of old gpus running async tasks, and they're awesome to have. It was also how I started, and was a great way to learn.
But I'd also warn - most people don't want their daily driver to be a push bike.
JLeonsarmiento@reddit
A high end local LLM capable setup could also be a MacBook with 48 to 64 GB ram for 3000~3500 USD.
theabominablewonder@reddit
Hire the compute from a cloud provider, then you can test it out and see if the capabilities are good enough for what you want to use it for.
journalofassociation@reddit
Whether you do it or not, don't put highly personal stuff into a cloud-based LLM.
Automatic-Arm8153@reddit
Hard not to. So much more useful when you do.
I used to be like you. Until one day I wasn't. What are you trying to hide from your cloud AI provider that your phone doesn't already know? If not your phone manufacturer, then Google or Facebook with their heavy tracking across all domains of the internet.
WishfulAgenda@reddit
Mac Mini M4 Pro - LibreChat, Liquid LFM (summary agent), Gemma 4 4b q8 (quick chat/knowledge agent).
AMD 3950x 64 gb ram dual 5070ti 4tb nvme. Qwen 3.6 35b q6 100k context ( coding, agentic analytics, cad )
Next step is either rtx 6000 max q, a Mac Studio m5 ultra ( or both ) and have it all orchestrated via librechat
DataGOGO@reddit
No. It is a hobby, it will never be worth it financially.
To get anywhere close to Claude Opus / ChatGPT you would need a LOT more hardware than 6 3090's, you would likely spend close to $300k, and even then you would be really limited and it would not be anywhere near as good.
jacek2023@reddit
I have 3x3090 I am trying to purchase fourth one (it's not easy!) but probably instead going higher you should consider just 6000
HarrisCN@reddit
I mean this is always the answer, it depends.
Are you somebody that is excessively using tokens and already spending like $200-300+ a month? - Big yes, long term
Do you care about data privacy, handle secrets or other sensitive data? - Big yes
Everything else depends. I always compare it to buying a car: if you never use it, do you buy one or is renting better? If you always use it, just purchase it, it's the better option long term...
ortegaalfredo@reddit
It is 100% worth it for me, but I easily use >50 million tokens/day. Under that, I guess you are better off with a plan, unless you like the hobby
sob727@reddit
How do you consume this many?
ProfessionalSpend589@reddit
Probably model loops and user forgot to check on it.
ortegaalfredo@reddit
It happens sometimes, lol, particularly with quantized models.
ortegaalfredo@reddit
Code audit of large projects. Largest project ran for weeks nonstop.
My_Unbiased_Opinion@reddit
Probably has a wild OpenClaw setup
Sea_Manufacturer6590@reddit
Absolutely, you can achieve this. My company sets up LM Studio with custom MCP servers to achieve your vision. DM me if you want to see a demo of what you can do. I can also point you to a correct build; you do not need 128 GB of system RAM.
laterbreh@reddit
You're not serious if you're using LM Studio or llama.cpp. Sorry. That's called playing with toys.
MengerianMango@reddit
5x3090 and 128gb ram is far below "10's of thousands"
And llama/gguf is often the best if you're doing mixed inference. I'm running glm 5.1 q4_xl on an epyc + 6000 blackwell that I bought for $16k (before ram prices went up). You'd need a full server of 8 blackwells to run at the same precision I'm running, which would cost nearly $100k.
MengerianMango@reddit
Rent the GPU before you buy the GPU. I started with a 7900xtx. Wasn't happy, so I got a 5090. Wasn't happy, so I got a 6000. Wasn't happy, so I got an Epyc and a bunch of ram to pair with the 6000 (before ram got so darn expensive). Now I'm moderately happy. It's really smart but also pretty slow (using glm 5.1 now). I still use Claude sometimes when I'm too annoyed to wait for glm to run at ~20t/s or when the task is truly hard/niche/bleeding edge.
IMO, smaller models are just now becoming useful for agentic work. The new Gemma and Qwen are pretty decent. The old gen could only appear capable. (They would run commands and emulate doing stuff, but it was always so bad as to be effectively useless.) I think I could be moderately happy with just the 5090 now, maybe, but you will really feel the difference between the pro models and anything you can run on a single GPU, even a 6000. So definitely set a purchase budget and then try the exact hardware you're planning to buy with the model you plan to run before you sink all the money into it, or you might end up chasing the dragon and falling into sunk cost fallacy like I did. I'm not rich -- what I did was kinda dumb.
DeepWiseau@reddit
Simple answer is, no. Without $20,000 USD, you will not get within 10% of the frontier. With $5,000 you can maybe get 65%.
That is only for today. Inference will move away from the cloud sooner rather than later. Start learning now. Start getting familiar now. Figure out how to set up PostgreSQL for memory, figure out context management, get general orchestrators made, and get small workflows figured out. Things are going to start to move quickly.
2 years from today you are going to wish you figured out how this works.
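For the "PostgreSQL for memory" suggestion, here is one minimal shape such a store could take. This is a sketch, not the commenter's actual setup; it uses Python's stdlib sqlite3 so it runs anywhere, and the same schema carries over to Postgres with minor changes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # swap for a real file or a Postgres connection
conn.execute("""
    CREATE TABLE memory (
        id         INTEGER PRIMARY KEY,
        role       TEXT NOT NULL,    -- 'user' or 'assistant'
        content    TEXT NOT NULL,
        created_at TEXT DEFAULT (datetime('now'))
    )""")

def remember(role: str, content: str) -> None:
    """Append one conversation turn to the store."""
    conn.execute("INSERT INTO memory (role, content) VALUES (?, ?)", (role, content))

def recall(limit: int = 10) -> list:
    """Return the most recent turns, oldest first, ready to prepend to a prompt."""
    rows = conn.execute(
        "SELECT role, content FROM memory ORDER BY id DESC LIMIT ?", (limit,)
    ).fetchall()
    return rows[::-1]

remember("user", "My rig has 5x3090s.")
remember("assistant", "Noted: about 120 GB of VRAM total.")
print(recall())
```

Context management then becomes deciding what `recall` returns: last N turns, a summary row, or a vector-search hit.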
Herr_Drosselmeyer@reddit
Privacy is a valid reason, especially for companies/orgs dealing with sensitive data and even for private citizens.
But, the reality is that open-source models are released by commercial enterprises, and they have a strong incentive to not undermine their own commercial efforts. As a result, any open-source AI model will lag behind the state of the art, and that's not likely to ever change. As it is, we are quite lucky that the field is so competitive that they feel that releasing models in this way is needed to garner market share.
So don't try to match the latest Claude/ChatGPT/Grok etc.; you're not going to get there. Instead, ask yourself what you really need. For high-end consumer hardware, look to Gemma 4 (specifically the 31B and 26B-A4B), which, imho, are at the time of writing the best models that can be run locally. They will do most of what SotA models can do and run reasonably fast.
WishfulAgenda@reddit
Here’s my thoughts. You will never match a frontier model no matter how many $$$$$$ you spend. Yes it’s 100% worth doing.
Reasoning you can’t match them - frontier models have billions of dollars of infrastructure and thousands of the best mathematicians and engineers working on making the systems as capable as possible. Can’t really compete with that. That said diminishing returns like in most places and now you can get close for a decent amount and a little close for a reasonable amount.
Reasoning why you should - I've learnt so much setting this up for myself. The learning extends past AI and into containers, different OSes, configuration, and integration. I can now speak effectively (kinda) to how the frontier models work, and can to some degree see through the sales nonsense and bullshit that's spouted every single day. I can also now advise clients on what might work for them, as well as accelerating my own personal projects.
Further thought. I’ve been ramping up on this now for 6 months and to be fair my rig is pretty much stable. Has off days but generally runs pretty well now.
pentalobe@reddit
What’s your setup?
Ready-Hour2290@reddit
I would say yes, but that depends on you and what you use it for. Is it efficient? Cost-wise?
Long_War8748@reddit
Worth it for what?
For the experience? For a business proposition? For Fun?
What do you actually use it for?
Cergorach@reddit
As always: It depends.
It's never cheaper to run locally.
It's only 'worth it' when you're required to run it locally with data that you can't run in the 'cloud' legally or contractually. Not because your gut says so, but because you have to.
That's not to say that it's not interesting to run things locally. You can learn a lot! But that's a whole different cost evaluation, and doesn't require huge models or huge hardware investments.
Your lives are not interesting, they are not important; when you're not dumping them into an LLM, you're dumping them through other means into the ether. Too many 'privacy conscious' people who are very activist tend to conveniently forget all the ways they themselves expose their privacy every day...
I run a Mac Mini M4 Pro (64GB) locally; it can run up to 70b models (quantized), but the results were never as good as the models you can use on the internet for free, and that's not even considering the high-end paid models, which are far, far better for their specific use cases. The newer Gemma 4 models do look good, but still not as good as some of the free/paid competition.
Running (free) cloud models for hobby projects is often good enough for me, no real personal data there. For clients I run what they've approved, quite often nothing, or on their hardware/subscriptions. The only occasion I've really used the local AI in the last year and a half was for an example CV rewrite once, which I personally rewrote again. Everything else I do locally is just testing open source LLMs that fit in the 64GB of unified memory I have.
I also find that often when I need LLMs is when I'm working an edge case, and then even the MS solutions don't give the correct answers for their own solutions... I generally don't write code.
gingerbeer987654321@reddit
Rent some dedicated GPUs for a few bucks an hour and try it for a week or so. If you really get use out of it, then buy your own, eyes wide open.
A single-use machine for home LLM is a hobby, not a good investment decision
ikillas@reddit
All you need is a 12-16GB GPU at most to run local models. If you need more than that, it's better to go with those big companies.
tgromy@reddit
btw which model is best for coding TODAY? Sorry for lame question, I am out of the loop
Prudent-Ad4509@reddit
Note that you will soon want to have 12x3090, and then up to 24x3090 (or whatever you are able to power). However, 5x3090 will allow you to run a pretty decent model already.
There is no valid cost-benefit analysis for this setup. It will likely never pay off financially, but it will definitely give you more control, and this allows you to try things you wouldn't otherwise. Just make sure that you know which motherboards will actually support such a config.
megadonkeyx@reddit
There's nothing so ultra wonderful about opus or gpt 5.4, they still get things wrong.
You also don't need a crazy multi GPU rig, there's strix halo, Mac studio, dgx etc.
So yes, totally worth it.
temperature_5@reddit
If you work with sensitive data (legal contracts, medical, trade secrets, etc.) then it's worth doing. Depending on the volume and complexity of work you do, you might be able to get away with a modest system and a medium-sized MoE. (I have a 96GB Ryzen APU for this, and it's not fast but can run 1xxGB MoEs well enough for what I need, often ~20 tok/s. Cost < $1000.)
If you depend on agentic coding to make a living but are not getting the consistency/performance you need, it may make sense, but you will have to spend a lot more to run a model like GLM or Kimi completely in VRAM. If you have friends that also *need* privacy or dedicated resources, it may make sense to go in on a server. Open model intelligence keeps improving, and is as good now as Claude was a year ago, IMHO.
zakadit@reddit (OP)
None of them, i just found it cool to have my claude at home because i hate big companies and i think it's a great hobby
9gxa05s8fa8sh@reddit
there is no claude at home. check artificial analysis, models, scroll down, click coding
annodomini@reddit
You can run models like Qwen 3.6 35B-A3B, Qwen3.5 122B-A10B, the Gemma family, etc on hardware that costs a few thousand dollars. These models are benchmarking around a year behind the state of the art; so think maybe Sonnet 3.7 up to Opus 4 level for the general range of capability.
If you want to run models that are closer to modern Sonnet and approaching Opus levels (note that many people claim that they sometimes benchmark a bit stronger than actual usage, I haven't used them enough to compare), you'd be looking at GLM 5.1 or Kimi K2.6, which will require tens of thousands of dollars of hardware to run at any reasonable quant and speed, and a significant amount of power even at idle.
For those models, the economics don't really work out well unless building such a system is your hobby in itself. It generally makes more sense to pay for someone else to run it on a shared machine in a data center; you can generally run multiple inferences in parallel much more efficiently than a single one, and on a shared machine there is much less idle time where you're just wasting power.
So if you want those levels of models, just go to OpenRouter and play around with the different models and providers available.
9gxa05s8fa8sh@reddit
take advantage of cheap remote AI if you can, and buy hardware when the AI market crashes
mr_Owner@reddit
Local is safer, cloud is handy.
And with any SaaS or free services, you don't really know what happens with your data.
Saying "I don't have anything to hide" (imho a bad approach to life) means letting them train frontier models on your usage and data.
Also, the dependency is not nice when cloud models randomly get dumber and slower.
Awkward-Forever-259@reddit
Building something so users can run their own LLM for very affordable prices.
Fit-Produce420@reddit
Nope!
Fine_League311@reddit
Depends on the use case; as a company, yes. Privately, for playing around, HF & co. are enough; even at $9 per account and 25 min of H200 time you can get a lot done. 25 min on an H200 is killer.
WyattTheSkid@reddit
4x gpu rig user here. (2 3090 TIs + 2 3090s)
My honest answer isn’t a straight yes or no; it honestly depends fully on your use cases, needs, and values. I lean more towards yes if you:
1: want to have personal conversations with AI (e.g. asking about sensitive documents or revealing personal details that you’re not comfortable with anybody from OpenAI, Google, Anthropic, etc. reading)
2: want to ditch subscriptions and APIs
3: are willing to sacrifice a little speed
4: are serious about integrating AI into your daily work and life, enough to justify the up-front cost of the hardware. (Oh, and also only if you’re comfortable with building a janky box strapped together with hopes and dreams that consumes an ungodly amount of electricity...)
I don’t recommend it if this is a hobby or fleeting interest for you, though. A single modern (Ampere or newer) mid-range to high-end consumer GPU is often more than enough to get a taste of local AI if you stick to the 8b to 30b range of models with reasonable quantization. GLM 4.7 Flash is a really good example of this. It’s around 30b parameters iirc, and no, it’s not going to match the big guys in every way, but you would be incredibly surprised at just how close it can get in a lot of ways. I would say it’s a suitable ChatGPT replacement for at least the casual user group.
Tl;dr: don’t waste money and agony if you just wanna toy around with it but if you’re serious and use ai for a lot of complex stuff, want data privacy, and self sustainability, being able to run minimax 2.7 on your own hardware kinda fucking rocks
Taurus-Octopus@reddit
You're not going to be able to replace a frontier generalist model locally.
Local is worth for the right use case. Right sized for a specific job.
In my case: determining a classification for a commercial payment via the SWIFT free-text field in a wire payment; completing SQL queries when the schema is known, like transaction tables in a bank's data warehouse; and pre-processing data to flag and strip personally identifiable information or other sensitive data before I transmit a prompt to a cloud-based model. Or a moderately sized model with a decent RAG pipeline.
I was experimenting with using a RoBERTa classifier to identify risk ratings and spans of relevant narrative signals in large PDFs related to non-financial risk in a banking context. Spans and ratings were fed to a 7B model to convert to JSON, which was then used to create an SQLite table for the narrative signals as well as ratings. I'd query the db and feed results to a frontier model.
Turns out RoBERTa wasn't necessary, but that's another story.
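The spans→JSON→SQLite hand-off above can be sketched roughly like this. The records are faked here; in the real pipeline they'd come from the classifier and the 7B model, and all document names and ratings are hypothetical:

```python
import json
import sqlite3

# Hypothetical model output: one JSON record per narrative signal.
records_json = json.dumps([
    {"doc": "report_q3.pdf", "span": "repeated KYC control failures", "rating": "high"},
    {"doc": "report_q3.pdf", "span": "minor audit trail gaps", "rating": "low"},
])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signals (doc TEXT, span TEXT, rating TEXT)")
# executemany accepts the parsed dicts directly via named placeholders.
conn.executemany(
    "INSERT INTO signals VALUES (:doc, :span, :rating)",
    json.loads(records_json),
)

# Query the table; the rows become context for a frontier-model prompt.
high_risk = conn.execute(
    "SELECT doc, span FROM signals WHERE rating = 'high'"
).fetchall()
print(high_risk)  # [('report_q3.pdf', 'repeated KYC control failures')]
```

The point of the intermediate table is that you can filter and aggregate signals with plain SQL before anything reaches the frontier model.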
PermanentLiminality@reddit
It is all about what you spend and what is worth it to you. You are not going to be replacing Anthropic or OpenAI.
I have 72 GB of VRAM with 3x 24GB P40 GPUs that were about $200 each. I started with 10GB P102-100s that were $40. Just saying you don't have to spend the big bucks.
I never did much useful with them, but with the Qwen 3.6 35B and Gemma 4 models, that has changed. These are powerful enough and small enough to be useful.
I just wanted to run them myself and it is hobby money.
Before you buy hardware, test the models on OpenRouter. If you find a smaller one that does what you need, look to see what hardware you need to run it
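That try-before-you-buy step is cheap to script. Here's a minimal sketch against OpenRouter's OpenAI-compatible chat endpoint; the model slugs and the prompt are just examples (check OpenRouter's model list for current IDs), and it only fires if you've exported an API key:

```python
import json
import os
import urllib.request

def build_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }

def ask(api_key: str, model: str, prompt: str) -> str:
    """POST the payload to OpenRouter and return the reply text."""
    req = urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(build_request(model, prompt)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    key = os.environ.get("OPENROUTER_API_KEY")
    if key:
        # Example slugs only -- substitute whatever models you're evaluating.
        for model in ("qwen/qwen-2.5-72b-instruct", "meta-llama/llama-3.1-70b-instruct"):
            print(model, "->", ask(key, model, "Write a SQL query joining two tables."))
```

Run the same prompts you'd actually use for work across a few candidate models; if a small one holds up, that's the hardware target.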
zakadit@reddit (OP)
good to know!
was it a mess to set up or convenient enough? (not OpenRouter, but the "real" local LLM result)
PermanentLiminality@reddit
Not too bad. Took me a few tries to get the Nvidia drivers and the cuda stuff installed, but after that, it was relatively easy.
Zyj@reddit
This topic comes up a lot. Value your privacy! There are sweet spots in terms of bang for buck. 2 GPUs on a desktop mainboard for example.
cmndr_spanky@reddit
You could literally pay for a Claude Max plan for years and still not offset the cost of the kind of hardware you'd need for a comparable local LLM. My advice: unless you have a legitimate business with revenue, a Claude subscription is worth it. If you're operating a successful business, maybe a lot of local hardware would be worth it, and you can declare it as a business expense.
Perfect-Flounder7856@reddit
Sorry you've hit the limit of this convo
Hefty_Wolverine_553@reddit
It's feeling more and more worth it these days. With how closed-source providers are raising their API prices and making them more restrictive, and companies like Anthropic seemingly degrading their older models and overall giving an inconsistent experience, local LLMs are probably becoming more valuable. Especially with the recent releases of GLM 5.1 and Kimi 2.6 (although very difficult to run locally), they're actually approaching the level of closed-source offerings like Sonnet, which was definitely not something I expected.
Also, owning your own hardware has honestly paid off so far (although buying more hardware now is probably a bad idea with how the prices are). Even the smaller LLMs like Qwen3.6 35B have become very capable, and you can run that model on a single 3090 and some RAM. It has honestly come so far from running Llama 2 13b on my 3090, and I'm very glad I bought my 3090 when Llama 2 came around.
As far as high-end setups go, I feel like if you want to run all the new open source LLMs coming out, a DGX Spark (and potentially two in the future as an upgrade) might be a good idea now, with how overpriced GPUs and RAM have become.
Perfect-Flounder7856@reddit
Do you see the prices of new hardware going up or down in the next 6 months to a year? Same as the housing market: buy now, it's never going down.
External-Piccolo7304@reddit
Depends on your goals.
I love running local LLMs. Specs: 32GB 5090, 24-core Intel, 192GB of RAM.
Yeah it was expensive, but I like local power not just for LLMs but also Blender 3d, Davinci Resolve, Comfy UI .. etc…
I’ve been testing goose with qwen 3.6 xl, and it’s pretty impressive for VC, it’s very Cursor-like.
If my only goal was LLM inference, would I have bought the hardware? 🤷♂️
NoSegfaultPlz@reddit
Honestly AMD Strix Halo with 128 GB unified memory for 2.7k can get you pretty far as inference goes. Unless you want to also do fine-tuning I would recommend this over 5x 3090
kc858@reddit
You need a minimum of 2x RTX Pro 6000 to run MiniMax M2.7. Everything lower than that is pretty shitty.
CCloak@reddit
Claude's recent dramas, if anything, reinforce the value of going private local LLM.
shansoft@reddit
Exactly what is high end? It can range from like $1000+ to pretty much $50000....
And what are you planning on using it for?
zakadit@reddit (OP)
cooking meth and adderall, conspiracy-theory-heavy research, using my free will & right to privacy at its peak
DarePitiful5750@reddit
You might look at something like an NVIDIA DGX Spark system. Maybe $4k or so. 128GB for large models. But it isn't going to run as fast as, say, an RTX 6000.
zakadit@reddit (OP)
just looked up the RTX 6000
seems to be a messed-up thing to get in France/EU: prices range from 4-12k (and not just between the normal and Pro versions), and availability seems shittier than the 3090. I'll consider taking 2 of them, so they fit in an ATX case.
Stunning-Bit-7376@reddit
No, you can't get an experience that matches Claude even with all that investment in your local rig.
You can get an experience that matches Claude from like a year or two ago, probably. But you'll have to be your own tech support and you're relying on the companies that make the open source models to keep releasing new open source models just to stay a year or two behind the frontier models, and there's no guarantee this open source ecosystem will keep going.
zakadit@reddit (OP)
People seem to agree on "3-4 months behind", but above all, I'll accept the possibly mind-fucking tech self-support issues.
jack-dawed@reddit
It is not worth it unless you have a company and can write it off on taxes.
Even then, it is still more cost effective to run like Kimi K2.6 on Fireworks if you are looking for Claude Sonnet/Opus level. It gets close.
I use K2.6 so much on Pi now that I only use Claude Code Opus for planning.
super1701@reddit
I mean...I just find this shit fun and have the income to blow. Single male, and no kids. So fuck it.
jack-dawed@reddit
Even then, the token efficiency isn’t worth it for any meaningful work.
If you find enjoyment out of setting up local LLMs and hardware, by all means go for it. It is comparable to tinkering with a homelab.
I've been a hardware engineer for over 10 years now and still find enjoyment learning and working on these kinds of projects. But it's easy to over-index and invest too much time in your tools when you should be spending time on the project itself. Like, what are you trying to achieve with local LLMs that couldn't be accomplished with a hosted open source model?
jack-dawed@reddit
You can start an LLC even if you are unmarried and have no kids.
If you have 20-50k to spend on hardware, you should be spending that on inference.
Embarrassed-Area4652@reddit
If you’re talking about a fresh spend, make it concrete: how does it pay for itself, and how fast does it need to break even for you to consider it worth it?
I have a single RTX 2060 I bought years ago for gaming and am fine with it (and a lot of main RAM, for what that's worth). My use cases may not be yours, but I'd also be weighing it against what else I'd be buying. Like, I'd rather spend a fraction of what you're talking about on a new bike, but I can't tell if that's relevant in terms of talking about it as a hobby, or if you could legitimately say: if this delivered me X in Y days faster, it'd pay itself off over Z development cycles, or something like that. The accounting math is out there if you've got the inputs.
zakadit@reddit (OP)
It doesn't need to pay for itself, just to be as good. I don't mind if it's not worth it or if I'll need an RTX 6000 (besides, I don't understand how they range from 5 to 10k for the same model?). But sometimes, rarely, I use GPT Pro & Claude Max x20 for "really" heavy tasks, and I need it to be as good (I'm just being needy, but you got me).
Sea_Manufacturer6590@reddit
My local AI model has persistent memory, self-learning, and improves from any errors. It also has file system access, web browser access, can run scripts, build sites, make marketing content, post it, and publish files to my website. It also uses my Claude code locally.
Busy-Equipment-8958@reddit
what setup are you running?
Sea_Visual_9119@reddit
Could you share how you set it up, boss?
Mission_Biscotti3962@reddit
You want a simple answer: it depends on your budget and how much you are willing to spend on electricity.
Even with your suggested setup you will still get lower quality and slower responses than if you pay for the APIs.
zakadit@reddit (OP)
I don't think electricity will be an issue, and I haven't set a limit on the number of GPUs, but I've seen that past a certain amount it starts to get really tricky to handle the management behind it.
sleepy_quant@reddit
Running 35B locally on a Mac and i still pay for Claude Code on top. That's the honest answer: Local gets me the stuff i don't want leaving the laptop (drafting, evals, agent loops), frontier gets me the stuff that's actually hard. Privacy thing is real. "matches gpt pro" isn't. I'd start with one card and see how often you actually fire it up before stacking 5x3090
zakadit@reddit (OP)
I’m pretty sure I made it clear that money wasn’t the issue, so I don’t really need a low-cost way to test it.
What I genuinely don’t understand is how many of you keep drifting away from the actual point when my question is very simple: if I invest enough, can I get an experience as good as Claude, or is that simply not possible?
I get that the question may sound simplistic, maybe even naive (which is totally understandable), but if you think it’s not a relevant question, then honestly ignoring it would have been a better response than answering something else.
Stepfunction@reddit
I'm quite happy with a single 24GB card and 64GB RAM. Yes, I'm limited to 32B and below, but there's a ton of stuff you can do in that range.
I would say there's a lot of value in getting to that point, and with the R9700 32GB going for around $1300, you could get that instead.
Beyond that, you start to get to diminishing returns pretty quickly and start to fall into an awkward middle ground where you have a lot of GPUs, but still can't run the very big models.
ranting80@reddit
Matches? I have Qwen 3.6 35B running locally and I always use it now as my go-to for coding. I haven't needed to use my Claude sub since it released.
Most people just don't need Opus power. It's amazing to have and it produces extremely good results, but a lot of what I do is in-house, so as long as it works, it's fine. My Qwen 3.6 is at 1.2M tokens tonight and has cost me only the power from the wall, and it's a joy to run offline.
laterbreh@reddit
I'm a coder and web applications developer, about $40k in, running MiniMax 2.7 at full precision and full context window at 60 tps via vLLM (3x RTX 6000 Pros). Yes, it's fucking worth it. I also run Qwen 3.5 400B at Q4. Guess what? My throughput and latency are better than either of those models provided through OpenRouter. It's local, I'm autonomous, my throughput is higher, my clients are happier, and I'm happier.
I don't give a fuck about running a 1T-param model for a 5 to 8% difference on an SWE benchmark. I tell it what to do and it does it. A 200B model can follow instructions as well as the 1T, and so can the 400B model. And I'm not gonna get rug-pulled, and I get a huge tax write-off every time I slap another card into this machine.
If youre serious then get serious hardware.
If this is a hobby, stick with the API until you are serious. But temper your expectations and scale your actual spend with models you know well and know you can run.
jeffwadsworth@reddit
My local setup can run all of these massive local models... but at 2-3 t/s at best, and I'm fine with that. It cost me $4000 last year, but today it would run $13K or more. No regrets at all, because GLM 5.1 is great and it's fun to play with.
catplusplusok@reddit
I find this to be a similar question to "is buying a home worth it when you can just rent a furnished apartment?" Initially maybe no, but you have to play by landlord rules (censored models), rent can go up at any time, and the landlord can just decide to stop offering the apartment/model. One compromise is to pay for APIs serving open-weights models, where you can always shop around for prices, and if you can't find good options you can still host the exact same model locally. Check out MiniMax token prices for example; $200 per year will get you a lot.
sunflowerapp@reddit
You can also try the open-weights models hosted in the cloud, and if that's satisfying, you can host them yourself if you have a use case that warrants the privacy.
mohelgamal@reddit
You gotta keep in mind economies of scale. You can build your own, but you won't get performance competitive with the big players unless you spend very serious money.
And if you do spend that money, unless you have a multi-person team that can use the hardware around the clock, chances are it will be cheaper to pay a company.
If your work requires absolute privacy, like important proprietary data, legal work, etc., then you should invest in your own setup.
On the other hand, if your needs are low, you could be OK running a small model on your own fairly recent computer. You don't need Claude Opus if you just want the AI to explain some basic stuff for you.
laterbreh@reddit
This is not entirely true. $40,000 in hardware can run MiniMax in full precision and full context at 60 tps, which is typically faster than ANY provider I've ever used on OpenRouter.
wayfarer8888@reddit
I ran it through diverse chatbots and we came to the conclusion it's probably not. I have a (free) subscription and recently bought credits for Claude and DeepSeek; the first is very expensive and the other super cheap. So I do all planning on my subscription, routine repetitive screening on DeepSeek, and Claude for complex prompts. If you don't code for a living or have some 24/7 token-eating use case, I would not invest in a local LLM. I have installed some and it was underwhelming on older hardware, when DeepSeek runs 100 prompts fast for virtually free (<$0.01).
arcanemachined@reddit
I would say no, unless you need the privacy, are OK with the quality of today's models (there is no guarantee of free models in the future), and have a shitload of money that you want to get rid of.
I say that as an advocate of open models, and a fellow hater of big data.
Also you will need to spend tens of thousands of dollars to buy the hardware required to run the best open models that actually give the top-tier models a run for their money.
jonahbenton@reddit
Claude the product is a lot of combined things: the models themselves, sophisticated memory, sophisticated context management, sophisticated tools including web research, wrapped up in the best-in-class harness (Claude Code). For code-writing use cases, open-weight models like Qwen 3.6 running on $5k-$10k of local hardware can stand up to being connected to Claude Code and get some work done autonomously. But assembling the whole rest of the suite by hand, with variable quality and integration, is its own chunk of work.
aallsbury@reddit
Dude, my dual-3090 system running Qwen 3.6 35B A3B is really damn useful for a lot of things, and it's not that high-end/expensive. Does it replace SOTA API models? No. Does it greatly reduce my monthly bills by doing all the jobs that are not time- or high-intelligence-sensitive? Yes it does. Are there extra points for data privacy? Yes.
Not for everyone, but very useful for the right cases.
OddDesigner9784@reddit
I think a model like Kimi K2.6 is right up there with Claude, ChatGPT, etc. But that's around 500GB, which you need some serious hardware for; it absolutely won't be worth it. I've had a lot of experience working with Qwen and Gemma. I think playing around with quants, fine-tunes, and tool calling can get you some real good value, and they are only getting better. But yeah, working with Qwen, it will make mistakes, and I'm not sure how that scales up to higher params. So I would play around with getting something up on whatever you have now. Maybe try API requests for a bigger model and see how it does. Get good at context engineering; I like to use research → plan → work. But yeah, hardware is only getting cheaper and small models are only getting better. All I really need from Qwen, tbh, is for it to have better reinforcement learning: it uses thinking on stupid things pretty often and confuses itself, and needs to be trained on when to use thinking. So I am kinda waiting for a window to go for hardware. But don't underestimate how important token speed is. A Claude that goes 20 tok/s is worthless to me compared to a fast 30B Qwen model.
ea_man@reddit
If you already have a GPU, it costs you nothing.
How much are privacy and autonomy worth to you?
Look, buying and configuring stuff ain't never gonna be cheaper than a "free tier" online.
zakadit@reddit (OP)
It helps me a lot for work, so if the difference is something I can genuinely feel, then I’m basically looking at two possibilities:
ea_man@reddit
I don't understand.
Can you upload your data to an external provider? Yes or no?
Comfortable-End-3731@reddit
No, you will not get the same experience, not today at least. You can get close, but you'll notice the limitations. Speed and smoothness, sure. Intelligence? Depends on what intelligence you're looking for. You'll be able to write papers and book reports, sure. You might be able to code, but you'll probably need a frontier model to debug or refactor if it's complicated code. And you won't be able to do super complex tasks or super complex reasoning.
a_beautiful_rhind@reddit
I regret not buying more ram or a different board. But every time some API drama comes out or there are rate limits I can just fire something up that's not too shabby.
sinevilson@reddit
Yes! Build it.
SecondFriendly4255@reddit
For me it depends on your hobbies: if you can spend 5h per day on it, tuning, trying different models, and experimenting, local is worth it no matter your setup. The main difference for me is that local is token-free: you can do and redo with no stress about the bills. The only thing is latency, so you have to accept that it will take more time to get the results of your experiments. In terms of quality, honestly, we now have good models with vision and audio; you don't need a one-shot model since you don't pay for anything after the hardware cost.
For me, if each day or week you spend more than 5h talking to an LLM, it's time to have something local.
What to buy depends on how involved you are.
Sorry for my English :) I hope that helps. For technical advice, don't hesitate to ask me.
zakadit@reddit (OP)
I don’t really see how that applies to my case. I’m not asking whether I can use it well by “driving” it. I’m asking whether my car can even realistically claim to offer performance in the same league as the billionaire team’s car.