Does having an RTX 6000 Blackwell make any difference for LLMs?
Posted by Specialist_Fox523@reddit | LocalLLaMA | View on Reddit | 45 comments
I'm trying to find a use case to justify keeping this card. It seems like the frontier models have gotten so good, so fast, and so cheap lately that the value proposition of local models has collapsed. Are there any reasons, aside from privacy or specialized research, that an average person would benefit from this much VRAM?
pfn0@reddit
no, there's no use-case, I'll take it off your hands for $1.
BonoboTrades@reddit
I'll do .90 cents :)
pfn0@reddit
Thanks for letting me keep the high bid.
BonoboTrades@reddit
Lol
Impossible_Disk_256@reddit
I'll double that offer.
seangalie@reddit
I'd do $3.50
userscren5@reddit
The reason I bought one is that it's a good dev environment versus paying to run in the cloud, and we host demos on it. Everything in production goes to the cloud; everything not in production runs locally to save money.
user92554125@reddit
Rent it on runpod or something like that.
Specialist_Fox523@reddit (OP)
I had an LLM run some quick numbers on it, and it does seem like it could be reasonable if it were being used 12-16 hours per day. Not sure how likely that is, though, if you host on those sites.
user92554125@reddit
You could set a low TDP and undervolt the card to get the best performance-per-watt ratio. Though I have no idea what the usual "capacity factor" is for GPUs on these services. Could be an option if power isn't really expensive where you live.
Specialist_Fox523@reddit (OP)
Yeah, it's not terribly expensive where I live. Some simple math suggested it would net about $500 per month at 12 hours per day of rented uptime, presuming it's consistently booked. Not terrible, but it would limit when I could use it. RunPod doesn't seem to accept individual GPUs, though, so it would probably have to be Vast.
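For anyone who wants to sanity-check that kind of figure, here's a minimal sketch of the arithmetic; the hourly rate, utilization, and power numbers below are assumptions for illustration, not actual RunPod or Vast prices:

```python
# Back-of-envelope for renting the card out. Every number here is an assumption.
hourly_rate   = 1.50   # $/hr, assumed marketplace price for a 96GB card
hours_per_day = 12     # rented uptime, per the estimate above
power_kw      = 0.45   # assumed average draw under load, in kW
kwh_price     = 0.12   # assumed local electricity price, $/kWh

gross = hourly_rate * hours_per_day * 30           # ~$540/month before costs
power = power_kw * hours_per_day * 30 * kwh_price  # ~$19/month in electricity
print(f"gross ~${gross:.0f}/mo, power ~${power:.0f}/mo, net ~${gross - power:.0f}/mo")
```

Marketplace fees and idle hours would pull the net down from there, so the real number is probably somewhat lower.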
Signal_Ad657@reddit
It comes down to uptime.
1.) Do you actually have the types of demands that would keep a GPU running pretty routinely?
2.) Do those needs require 96GB of VRAM?
If #1 is yes, proceed to #2. If #2 is no, get a smaller GPU. If #1 is no, buy a GPU anyway, they are cool 🤓🧙❤️👾🎉
Specialist_Fox523@reddit (OP)
I really don't have a legit workload; it feels like the learning curve for something like ComfyUI or agentic coding might be too steep for a non-technical person.
Signal_Ad657@reddit
Nah. I don't want to self-advertise, but check this out. I'm having a lot of fun with it; you can just set up all your local AI stuff in one shot.
It took me like 6 months to get a great setup on my server and have all my apps and integrations figured out, and now I can just hit install, go eat lunch or watch an episode of The Office, and come back. Still early but pretty cool:
https://github.com/Light-Heart-Labs/DreamServer
I agree it’s hard right now getting started and it shouldn’t be if we want more people doing this.
NotYourMothersDildo@reddit
This looks really great! Unfortunately, the install didn’t auto detect my two GPUs. I opened a bug on GitHub.
Signal_Ad657@reddit
Yeah, sorry! I just got Windows out the door today, I'm grinding away at Mac now, and I'm hoping to have dual-GPU and multi-GPU support out this week. We are moving fast; I just brought in another person for multi-platform and OS testing, making sure we have good experiences on all the major setups. I'll ping you as soon as multi-GPU is up, and THANK YOU for checking it out and commenting!
Specialist_Fox523@reddit (OP)
This is cool, I'll check it out, thank you.
Hefty_Development813@reddit
Privacy and hobby unless you have a business where you need it. How much did you spend on it? Send to me
Specialist_Fox523@reddit (OP)
I got it from a friend but I spent around 5.5k making a build for it. I haven't put it together yet and I'm getting cold feet haha
Hefty_Development813@reddit
Have you tried out ComfyUI for making videos? That thing would be great for that.
With the recent news about OpenAI cooperating with the government, I think the privacy side is a real concern, though.
Specialist_Fox523@reddit (OP)
I watched some of the pixaroma videos to see what it would take, but there seems to be a trend of newer models being closed, so I'm a little uncertain about the future of it.
Hefty_Development813@reddit
No way. There are certainly closed models always coming out, but open models will continue; LTX 2.3 literally released today. It has always been true, as with LLMs, that the open models are not quite as good as the closed frontier models. But you gain control and freedom from censorship, plus the privacy side, as we already discussed.
Specialist_Fox523@reddit (OP)
I just see what's being put out with Grok and Wan 2.6 and I'm blown away. Doesn't it seem to you like the gap between open and closed models is getting wider?
Hefty_Development813@reddit
I think the open models trail the closed, yes, that's been the case for images, LLMs and now video since this all started. We have had big releases of open models that are more capable and I expect that to continue.
But overall, if you only care about having the current best model, then yeah, get subscriptions to the closed models and sell the GPU. That's serious compute to have if you aren't interested in running things locally somewhat on principle. The closed models run in data centers and will remain more capable; it's just that we give up control over our own access to compute by using them.
cosimoiaia@reddit
If you don't know what to do with an RTX 6000 and are advocating for subscription models, just sell it and subscribe, there's never a shortage of need for suckers.
sine120@reddit
What's your use case? If you're bulk-processing tons of information that doesn't need the largest cloud models, running it locally might save you on API costs. If you need a local coding agent, you have several great Qwen options. If you only do limited inference sporadically, the cost of that card could go a long way on more capable models in the cloud.
Specialist_Fox523@reddit (OP)
Yeah, I'm starting to think that cloud use might make more sense for someone like me.
Double_Cause4609@reddit
Where that GPU starts making a huge difference is personalization and customization of LLMs.
A 24GB GPU is enough to *run* models, but not necessarily enough to train decently sized LLMs to customize for your needs.
E.g., if you want to train a 32B LLM, even with QLoRA it can be tricky on a 24GB GPU at best, and you might even have to depend on CPU-offloaded optimizers, etc.
In contrast, with a 96GB GPU you can optimize 32B models, possibly even with full fine-tuning (FFT), or certainly at least with LoRA (not QLoRA), and you can very heavily customize them for your use cases. This could be training it on your specific codebase or coding style, updating its knowledge of a specific framework, etc.
Frontier models can generally adapt to these things in-context, but often make the same mistake a hundred times, because they aren't being updated live.
If you're not doing training, tbh, that card's kind of overkill.
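To make the plain-LoRA route above concrete, here's a minimal sketch assuming Hugging Face Transformers + PEFT and a Qwen 32B checkpoint (the stack and model are placeholders, not anything specified above):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Base weights in bf16 are roughly 64 GB for a 32B model, which is why this
# only makes sense on a 96GB-class card without quantizing (i.e. LoRA, not QLoRA).
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-32B-Instruct",      # example checkpoint, swap in whatever you actually use
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)

# Plain LoRA: small trainable adapters on top of the frozen, unquantized weights.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()    # typically well under 1% of total parameters
```

Whether full fine-tuning of a 32B model actually fits in 96GB depends heavily on the optimizer and offloading tricks; LoRA is the comfortable case.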
Specialist_Fox523@reddit (OP)
Hmm, interesting, thanks for the reply. I work in medicine, and the frontier models or wrappers like OpenEvidence already seem incredibly good; I'm not sure I could improve on them even with more customization and training.
RudeboyRudolfo@reddit
Just wait till they want the real price for hosted models. At the moment they're giving it away practically for free because they want to lock you in. If you sell the card now, you will buy a similar card for much more money in the future. For the moment the rule is: never sell a GPU.
Specialist_Fox523@reddit (OP)
I just did an analysis and it suggested prices would have to increase by ~4.5x, which still isn't too crazy for most subscription plans (~$50-60/month for 5 million tokens for GPT and Gemini, almost double that for Claude).
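A minimal sketch of that kind of comparison; every number below (card cost, amortization window, token price, usage) is an assumption plugged in for illustration, not the actual analysis:

```python
# Compare an amortized local build against API spend at today's prices and at 4.5x.
build_cost       = 8000.0          # $, assumed all-in cost of card + build
amortize_months  = 36              # assumed useful life
power_per_month  = 30.0            # $, assumed electricity for moderate daily use
local_monthly    = build_cost / amortize_months + power_per_month   # ~$252/month

tokens_per_month = 5_000_000       # assumed personal usage
blended_rate     = 2.50 / 1e6      # $/token, assumed blended input/output API rate
api_now          = tokens_per_month * blended_rate   # ~$13/month
api_4_5x         = api_now * 4.5                     # ~$56/month

print(f"local ~${local_monthly:.0f}/mo, API now ~${api_now:.0f}/mo, API at 4.5x ~${api_4_5x:.0f}/mo")
```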
RudeboyRudolfo@reddit
Pretty sure it's impossible to make such an analysis at the moment. But look at streaming, like Netflix and the rest, for an example. Pirating went down because streaming was convenient and not so expensive at the time. Then it got more expensive, then they put ads in, and then the market fractured, so you got less for a higher price...
At the moment the Americans have started another war, maybe China takes Taiwan... Nobody knows what happens next. Maybe the bubble bursts and only one company remains...
Sure, you can sell the card, but I'm telling you: there's a high chance you'll regret it in the end.
Specialist_Fox523@reddit (OP)
That's a fair point that there's a lot of uncertainty that's hard to calculate and price in, but the same is also true for the possibility of more disruptive advances that make inference cheaper, no? I think if I sold it, I would probably just try to get a 5090; it just doesn't seem like there's much for someone at my skill level that takes advantage of the extra VRAM.
RudeboyRudolfo@reddit
That's a question of efficiency. But don't count on the hosted models; that's just bait. I use the free Claude account, but I also have two cards, one with 16 GB and one with 32 GB. I'm pretty sure the small models will become much better in the future.
Easy-Unit2087@reddit
Opus 4.6 is head and shoulders above everything else. But if you're using it heavily, you'll run into usage limits or pay exorbitant amounts for the API. So it's better to let Opus 4.6 do the planning and write detailed specs, then offload some of the work to capable local LLMs. The best part is, you can just use the Claude CLI for that too! I do feel like 64GB of RAM, preferably more, is needed to load qwen3-coder-next or similar at decent quants. I run a dual GB10 node with vLLM for agentic work with local models.
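As a rough sketch of that "frontier model plans, local model executes" split, serving a local coder model with vLLM's Python API could look something like this (the checkpoint and tensor-parallel setting are placeholders, not my exact config):

```python
from vllm import LLM, SamplingParams

# Assumed model and parallelism; pick whatever coder model and quant fit your hardware.
llm = LLM(
    model="Qwen/Qwen3-Coder-30B-A3B-Instruct",
    tensor_parallel_size=2,            # e.g., split across two devices in the node
    gpu_memory_utilization=0.90,
)

params = SamplingParams(temperature=0.2, max_tokens=512)
# The prompt would carry the detailed spec written by the frontier model during planning.
prompt = "Implement the following spec as a Python module:\n..."
print(llm.generate([prompt], params)[0].outputs[0].text)
```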
ProfessionalSpend589@reddit
If you can raise your income as fast as inflation rises, or faster, then probably not.
If you can't easily do that: all services are increasing prices. Just look at what Broadcom did to VMware, cutting off smaller companies and jacking up the prices.
Specialist_Fox523@reddit (OP)
What about the gap for local models, though? Doesn't it also seem like access to decent open models is trending toward being more and more paywalled?
ProfessionalSpend589@reddit
I don’t think everyone is affected by that ‘gap’ equally.
We don't even use it at work, and certain people have been explicitly forbidden from using it (I'm not among them). I do have permission to use it, but I haven't done so on a work computer yet.
false79@reddit
The value proposition of local models has collapsed? Brah, it sounds like you bought this card without a plan for ROI on it.
Current_Ferret_4981@reddit
If you fine-tune or train, it's a game changer vs. other options. Basically nothing competes in that class, although renting compute to train is most likely going to be cheaper. But if you iterate a few times or want to fine-tune new models as they come out, you'll eventually prefer the 6000 over renting.
baseketball@reddit
I think you have it right although I'm sure many people here would disagree. The sweet spot for local models seems to be the 8b-20b range because the GPU for running that is attainable. $8000 can pay for a lot of tokens or months of subscription. It doesn't make sense unless you need absolute privacy or you're continuously crunching data. The calculus could change if the frontier providers want to stop bleeding money and start jacking up prices.
Specialist_Fox523@reddit (OP)
Any reason to think that they may do this? It seems like they're willing to burn cash as long as it takes to hit AGI.
Tall_Instance9797@reddit
In terms of speed, it's as fast as a 5090. However, it's got 96GB of VRAM, which means you can run quite a lot of larger models at pretty reasonable speeds. It certainly makes a difference.
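As a back-of-envelope for what 96GB buys you, weight memory is roughly parameter count times bits per weight divided by 8; the model sizes and quantization levels below are just illustrative assumptions:

```python
# VRAM needed for the weights alone, ignoring KV cache and activation overhead
# (budget several extra GB for those).
def weight_gb(params_billion: float, bits: int) -> float:
    return params_billion * bits / 8  # 1e9 params * (bits/8) bytes ~= params_billion * bits/8 GB

for params_billion, bits, label in [
    (32, 16, "32B @ bf16"),
    (70, 8, "70B @ 8-bit"),
    (120, 4, "~120B @ 4-bit"),
]:
    print(f"{label}: ~{weight_gb(params_billion, bits):.0f} GB of weights")
```

All of those fit under 96GB with room left for context, which is the practical difference over a 24-32GB card.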
emprahsFury@reddit
What? Are you sure you're just not using it? You can have an entire ChatGPT at home: TTS, STT, the LLM, image gen, all at low latency and super speedy because they're in VRAM. Buy yourself a domain to front it. I'm just not really sure why you'd ask this question when you can do all those things.
FPham@reddit
Clickbait post? Because this question would make sense in, like, a cats or cooking sub, but not in LocalLLaMA.