One Bottleneck After Another - First GPU & now RAM
Posted by pmttyji@reddit | LocalLLaMA | 10 comments
So many threads like this on multiple subs for the last couple of months.
Terrible timeline for some *sigh*
Massive-Question-550@reddit
Let's be clear, buying an RTX 6000 Pro was never cheap or a good value.
Kitchen-Year-8434@reddit
I’m on the fence on the good value part. Being able to get 170 t/s on gpt-oss-120b with 2.5-4K t/s prompt processing has changed when and how I think about using AI. Basically everywhere, all the time, on everything, just to see how it does.
80 t/s on glm-4.5-air at Q5. These are all eminently usable models for real local work, and the combination of latency and gen speed really does make a difference.
So it depends on what value you’re looking for I guess.
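If you want to sanity-check throughput numbers like these on your own hardware, a minimal timing sketch is below. It assumes the llama-cpp-python bindings and a local GGUF file; the model path is hypothetical, and the commenter doesn't say which runtime produced their numbers.

```python
# Rough tokens/sec measurement with llama-cpp-python (pip install llama-cpp-python).
# Assumption: a local GGUF file; the path below is hypothetical.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="gpt-oss-120b-Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=-1,  # offload every layer to the GPU if it fits
    n_ctx=4096,
    verbose=False,
)

prompt = "Explain the difference between dense and MoE transformers."
start = time.perf_counter()
out = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - start

gen = out["usage"]["completion_tokens"]
print(f"{gen} tokens in {elapsed:.2f}s -> {gen / elapsed:.1f} t/s")
```

Note this lumps prompt processing and generation into one wall-clock number; the figures quoted above report them separately, which is the more careful way to benchmark.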
Long_comment_san@reddit
Idk, is it worth like 5-10 grand? If that earns you money, sure, but I wanted 24-32 GB of VRAM at ~$800-1200 for cool roleplays, and now I'm kinda baffled sitting on my ass with my 4070 and its 12 GB of VRAM. God saved us with MoE models there. A job is one thing; casual guys like me are kind of screwed for a couple of years. It's mining all over again.
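Worth spelling out why MoE is the thing that saves 12 GB cards: only a few experts fire per token, so weights parked in system RAM cost far less per token than they would in a dense model of the same file size. A minimal partial-offload sketch, again assuming llama-cpp-python and a hypothetical GGUF path:

```python
# Partial GPU offload for a large MoE model on a 12 GB card.
# Sketch only; the path and layer count are hypothetical and need tuning
# to whatever actually fits in your VRAM.
from llama_cpp import Llama

llm = Llama(
    model_path="glm-4.5-air-Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=20,  # keep ~12 GB worth of layers on the GPU, rest on CPU
    n_ctx=8192,
    verbose=False,
)

print(llm("Hello!", max_tokens=32)["choices"][0]["text"])
```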
Kitchen-Year-8434@reddit
As disappointing as it is, my answer is: it depends. At that price point (say 5-10k), there are a lot of directions to go, and each offering kind of has its own niche based on hardware limitations and a balance of tradeoffs between speed, size, power consumption, and compute.
MoE is a godsend for sure, but tbh I remain more impressed with gemma-3-27b as a model for pretty much anything other than code gen, and the QAT version of it, while not quite small enough for 12GB of VRAM, is still quite modest: 16.8GB at Q4 (link) or just over 14GB at 3.0bpw with exllamav3.
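A quick way to reproduce size figures like these from first principles: weights take roughly params × bits-per-weight / 8 bytes, and real files run somewhat larger because embeddings and some tensors stay at higher precision. A back-of-envelope sketch:

```python
# Back-of-envelope quantized model size: params * bpw / 8 bytes.
def est_size_gb(params_billion: float, bits_per_weight: float) -> float:
    # 1e9 * params_billion weights * bpw bits / 8 bits-per-byte, expressed in GB
    return params_billion * bits_per_weight / 8

for bpw in (3.0, 4.0, 5.0):
    print(f"gemma-3-27b @ {bpw} bpw ~ {est_size_gb(27, bpw):.1f} GB")
# ~10.1 GB @ 3.0 bpw, ~13.5 @ 4.0, ~16.9 @ 5.0. Real files sit above these
# floors (embeddings and some tensors are kept at higher precision), and
# "Q4" quants average more than 4 bits per weight.
```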
12GB of VRAM on a 4070 makes a ton of sense for gaming; that's a great footprint there. It's just that when it comes to LLMs and the VRAM these huge, sparse, redundant models need, it turns out this isn't exactly the workload GPUs were designed for. It's pretty amazing we've gotten as far as we have with general-purpose GPU architecture, but take a peek at what Groq is doing for inference, or Google with their TPUs, and you realize we're all kind of hammering square pegs into round holes with our current approach to inference.
CoralBliss@reddit
I feel this and am going through a similar experience. I spent $1200 on my computer; I don't have that kind of money to play with like everyone else right now. 5 to 10k? Fuck.
Salt_Discussion8043@reddit
You're meant to utilise AI to make more money LOL, not do roleplay only
MitsotakiShogun@reddit
"buying a rtx 6000 pro" wasn't, but what about 4-8x?!
CharmingRogue851@reddit
"When's AI bubble burst?"
Looool, real
Disastrous_Meal_4982@reddit
For real, we need slow deflation that tapers off to a steady market asap. If the bubble bursts, it's going to be more pain, not less.
Aggressive-Bother470@reddit
There's no waiting for a better deal in this game.
It's buy now or get fucked later.