Is there any top level hobbyist hardware you guys are waiting to come out this year?
Posted by Tired__Dev@reddit | LocalLLaMA | View on Reddit | 74 comments
So I've explored buying everything from an RTX 6000 to a 512GB M3 Ultra Mac Studio to a DGX Spark (I need to travel) for local LLM generation. I was about to start looking into an M5 MacBook, but I figured I'd ask you guys if there was anything you were waiting for?
Double_Cause4609@reddit
If Taalas pans out and delivers a read-only LLM-on-a-card like they promised, at crazy speeds, I'd 100% buy one. There's a lot I can do with four-digit token speeds.
Anything else... I'm a little tepid about buying right now. I get the feeling that by ~2029 we're going to be looking at a fundamentally different paradigm in hardware (the industry follows roughly five-year cycles, due to how long it takes to bring up hardware, which means by 2029 you're going to see the first wave of hardware that really "gets" what running an LLM needs).
To that end I don't really want to invest too much into top-end hardware right now.
Plus a lot of things I was excited for have been pushed back to 2027 or even 2028.
I'd rather wait and see if anything comes out that's better suited to running the really big MoE models than current solutions before I pull the trigger on current-paradigm hardware, and even then I'm a bit tepid.
Comfortable_Ad_8117@reddit
I tend to agree here too. I make the best of my 3060 and 5060 for all my needs. However, as this space matures we will be able to run better models on less hardware, and at some point AI will be like spell check: everyone has it and all hardware can handle it.
Cold_Tree190@reddit
Yup, agreed. For now I've been trying to make my current hardware (3090, 64GB DDR4) work, and when that fails I just fall back to OpenRouter and use different APIs for models. Gonna play the waiting game for consumer hardware, unfortunately.
ryfromoz@reddit
Indeed, and I have double your RAM and four 3090s!
vdc_hernandez@reddit
Very good comment!
suprjami@reddit
Waiting for llama.cpp TQ/RQ to arrive, so 2x 16GB becomes the new 27B Q6 powerhouse.
gaspoweredcat@reddit
Something that hasn't yet been announced, since nothing seems to hit the mark yet. If we had better memory speeds on things like the AI Max it wouldn't be so bad (still stupidly pricey, but better), and if we could have up to 512GB. Maybe CAMM memory will help, but that'll no doubt be even pricier.
Pretend_Engineer5951@reddit
This year I bought two Strix Halos. All I wish for now is that llama.cpp supported tensor parallelism for running across these two nodes. vLLM is not as handy, and it's a pain to find an appropriate model for it.
SmartCustard9944@reddit
I’m eyeing clustering two Strix Halos for hobby purposes, just to toy around with a pseudo toy server.
How do you find the experience going so far with it? I have a Bosgame M5 coming soon and if I pull the trigger I might get a second one.
Pretend_Engineer5951@reddit
I've got GMKTec's EVO X2s, connected via USB4. I knew low latency was the first priority, so I started researching optimizations under Linux. Even with all the kernel tweaks, ping latency didn't go below 0.2ms. Then I found the OdinLink-Five kernel driver, which helped drop latency down to 0.065ms. Under a special benchmark, p95 is ~30ns. It's nice, but not as good as dedicated RoCE network cards, which operate at ~5ns.
I tested vLLM and was disappointed by poor performance on one node compared to a similar quant model on llama.cpp. I know vLLM demonstrates its benefits under many concurrent requests, but I don't need that for casual agent mode. The good news is that with tensor parallelism, even with the relatively high latency of the USB4 connection, performance scaled about 1.5-1.7x.
As for llama.cpp, the only available mode is RPC. It's very stable, but it gives almost no performance boost. The only benefit is that you can work with models up to 200-210GB (leaving headroom for KV cache). But after the Qwen 3.6 27B release and its capabilities, I wonder whether having ~200GB for a model still makes sense. When I compared against Minimax M2.7 UD-Q6 (~190GB), I didn't see much benefit.
Given all the software capabilities described, maybe there's another useful case: fine-tuning. I'll give it a try one day, maybe.
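For reference, the llama.cpp RPC setup described above looks roughly like this. A sketch only: the IP address and model path are placeholders, and exact flags may differ by llama.cpp version:

```shell
# On the worker node: build llama.cpp with RPC support, then start the RPC server.
cmake -B build -DGGML_RPC=ON
cmake --build build --config Release
./build/bin/rpc-server --host 0.0.0.0 --port 50052

# On the main node: point llama.cpp at the remote worker so model layers
# are split across both machines' memory (placeholder address and model).
./build/bin/llama-cli -m model.gguf -ngl 99 \
    --rpc 192.168.100.2:50052 -p "Hello"
```

This is why it's stable but slow: RPC only distributes layers across hosts, it doesn't parallelize the compute for a single token the way tensor parallelism does.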
Jords13xx@reddit
OdinLink-Five sounds like a solid improvement! Have you tried tweaking any other settings or hardware? I’ve heard some good things about using specialized network cards for even lower latency, but it sounds like you’re already on top of your optimizations.
Pretend_Engineer5951@reddit
Yes it is. It's confirmed working on the Minisforum S1 and GMKtec EVO X2. But another caveat is high CPU utilization even at idle: the CPU just burns at up to 86°C without any work. I suppose it could be resolved, in particular by dropping the vLLM toolbox to exclude the userspace network overhead ops, but it would take some time to rebuild vLLM with all the patches. As for other tweaks, I just followed the typical guides for Linux setup on Strix Halo.
SmartCustard9944@reddit
So, are you saying that the hardware is there and it could offer 1.7x over a single device for both pp and tg, and it's just that llama.cpp is currently undercooked here? That sounds like the type of challenge I enjoy digging into.
Pretend_Engineer5951@reddit
Yes, there's good potential for a speed boost. But I bought the 2nd Strix Halo also keeping in mind that if I fail with the USB4 link, I'll build RoCE with NVMe-to-PCIe adapters and Mellanox cards.
grunt_monkey_@reddit
RDNA5! Hopefully a Radeon 9800 or something with 64GB VRAM; that would be the sweet spot.
_underlines_@reddit
RTX 6000 Pro, 96GB VRAM, fast, $8-9k
DGX Spark, 128GB, slow, $4-5k
Mac Studio M3 Ultra, 512GB, discontinued?
MacBook M5, 128GB, slow, $5-6k
Minisforum MS-S1 Max (Ryzen AI Max+ 395), 128GB, slow, $3k
Framework Desktop (Ryzen AI Max+ 395), 128GB, slow, $3k
Pick your poison.
RemarkableGuidance44@reddit
I got 4x B70s ($6,000, 128GB VRAM) on a 64-core Threadripper with 512GB of memory. Great for large models. There are still a lot of software tweaks to come from Intel, but it's a good alternative to my dual 5090s. I run it 24/7 at 180W per card under full load.
Compared to my dual 5090s at 650W. lol
Asthenia5@reddit
How does its performance compare to the dual 5090s?
fallingdowndizzyvr@reddit
Currently it's comparable in speed to Strix Halo or Spark. I wouldn't hold your breath in hopes that the drivers get better. I'm still waiting for my A770 to get better.
RemarkableGuidance44@reddit
They're already making a ton of gains... A card from 2022? You should upgrade... You remind me of a friend of mine who is complaining that their 2080 Ti isn't getting the latest updates.
https://www.reddit.com/r/LocalLLaMA/comments/1swgwvh/mesa_pr_with_37130_llamacpp_pp_perf_gain_for/
fallingdowndizzyvr@reddit
LOL. Yeah. I've heard that before. Let's see what actually comes out.
You remind me of a newb who just got into it and doesn't know any history.
RemarkableGuidance44@reddit
Err, it's not as fast, and the software is still early, but Intel is updating daily. But I can run models that require a lot more VRAM, and I can also split models up and automate a lot of tasks 24/7 without the huge power bill.
I use my 5090's for Image Gen and 3d Gen + Gaming.
hlzn13@reddit
Are there any cons to going with the Asus GX10? I'm thinking of buying soon. There hasn't been any news about a second version of the DGX Spark, right?
redmctrashface@reddit
How the hell is the M5 128GB slow? Could you elaborate?
3dom@reddit
Its output speed is about 2/3 that of a 5090, which in turn is half that of an RTX 6000.
iMrParker@reddit
M5 Max is like 600 GB/s which is more like a third of a 5090 or Pro 6000
redmctrashface@reddit
Thx for all the answers. Another question: 96GB is not a lot, so what's the point of having a very fast device if you can't run 100+ models?
DAlmighty@reddit
You’re not running 100+ models simultaneously on anything, or do you mean 100B+ parameter models or what? Your question is a bit ambiguous.
Also, there’s not much of a great reason to choose a Pro 6000 over cheaper hardware just for inference. These accelerators are crossing over the line into model training and fine tuning.
redmctrashface@reddit
Sorry about that, I meant 100B+ parameter models. I didn't know about fine-tuning; is it because CUDA is more developed than other platforms?
PS: funny how some morons downvote me for asking questions while I'm still learning. Impressive.
DAlmighty@reddit
You can run 100B+ parameter MOE models on a pro 6000.
The internet is a strange place that’s becoming increasingly unreliable. Don’t let the anonymous downvotes get to you.
iMrParker@reddit
It's "fast", but in LLM context it's lacking in compute for pre-fill. It's excellent for a laptop, and probably good enough for most people, but not for anything serious
Norwood_Reaper_@reddit
Compared to the bandwidth of the RTX Pro 6000 Blackwell, it is slow.
fallingdowndizzyvr@reddit
You mean 3.5K.
https://www.centralcomputer.com/asus-ascent-gx10-personal-ai-supercomputer-with-nvidia-gb10-grace-blackwell-superchip-128gb-unified-lpddr5x-memory-1tb-pcie.html
Or get a M5 for $2600.
https://www.bosgamepc.com/products/bosgame-m5-ai-mini-desktop-ryzen-ai-max-395
RogerRamjet999@reddit
You seem to have forgotten that the more you buy, the more you save! /s
Glad-Audience9131@reddit
https://i.redd.it/ob7b9ioejkxg1.gif
Ok-Internal9317@reddit
4x V100s with NVLink still hold a beat, around $2k for a full system, though the power draw is a bit insane.
Hyp3rSoniX@reddit
I'm waiting for the Tiiny AI thing to come out to run local models on it. Hope I didn't get massively scammed by kickstarter...
Negative-Fishing3287@reddit
MacRumors is indicating that they expect a Mac Studio refresh mid to late this year.
DeepOrangeSky@reddit
Yeah, although to clarify: up until this past week it was expected at the WWDC conference in June (June 8th-12th). But about a week ago, the main Apple leaker said the Mac Studio is going to be significantly delayed, to probably around October rather than June.
It's also unclear whether it'll even be a Mac Studio Ultra in October, or merely one of the lesser versions, with the Ultra delayed even further.
When you consider how many iPhones' worth of RAM goes into the highest-spec Ultra, how bad the shortages are, and how many people would try to buy a maxed-out Ultra at the old price, it gets pretty scary to wonder whether we'll actually get anything really awesome and good value from an M5 Ultra anytime soon :\
thrownawaymane@reddit
Yeah, Apple is supposedly going to run through their pre-shortage RAM around that time. We as consumers will definitely pay more for the new Studios. The question is how much more... Apple does hate raising prices, so expect either some sort of crazy justification or a complete refusal to talk about the increase at all.
Southern_Sun_2106@reddit
The MacBook Pro M5 Max is probably the best right now in terms of travel/local models. However, the recent Qwen 27B release made 5090 24GB VRAM notebooks an attractive option too. I believe it will be faster on prompt processing and generation than the MacBook.
Wild-File-5926@reddit
TAALAS ASIC Card to run models without GPU and Steam Frame
GMerton@reddit
No way they get to sell it to prosumers though…?
Wild-File-5926@reddit
The market dynamics will materialize however they might; no one knows for sure. Cost aside, it's stupid fast. Reminded me of this discussion https://www.reddit.com/r/LocalLLaMA/comments/1r9e27i/free_asic_llama_31_8b_inference_at_16000_toks_no/
GMerton@reddit
Yeah it would be awesome if they could sell it to prosumers. At least make it available as a VM on the cloud.
AndreVallestero@reddit
Mac Studio M5 Ultra, though I suspect it'll be $20k... Though with the good news of DeepSeek V4 running inference on Huawei hardware, I might hold out for Chinese GPUs/NPUs.
GMerton@reddit
I remember checking the Ascend chips, and they were not very good value.
AndreVallestero@reddit
The previous Ascend chips weren't great because they used DDR4. The new ones (Ascend 950) they're using for DeepSeek V4 are based on DDR5 and HBM1.
SpicyLentils@reddit
I'm waiting for the Mac Studio M3 Ultra 256 GB to become available, or the Studio M5 Ultra with 256 GB.
flower-power-123@reddit
I just saw a kickstarter for a 3000 euro machine called the "Hilbert Agentic Computer". If you feel like the international situation is stable enough (and you trust the company) to put money into a kickstarter that might be a good option.
HopePupal@reddit
Literally just a standard Strix Halo with some completely insane marketing claims and a price tag to match. It doesn't even seem to have a PCIe slot like the Minisforum or Framework do.
randomperson32145@reddit
Yes, with the recent AI wave I think tons of potential mods for hobby enthusiasts will become available, like customizing software without needing major engineering skills. I'm looking forward to creating my own plugins, for example.
Sabin_Stargem@reddit
Not this year, no. I am waiting for AM6 and DDR6 to be released, before building a new machine.
I am figuring on taking one of two paths on the skill tree.
A: Build an endgame Threadripper DDR5 machine. This means that I can mostly just update all the firmware in one go, and be fairly sure everything will be reliable. Plus, I might be able to get bargains for this generation of gear.
B: Go for AM6/Threadripper+DDR6. More expensive and less reliable, but this generation of equipment has greater potential for running bigger AI models.
Odds are that I would use a Threadripper. If there are gaming-oriented server boards, I might go down the path of an "F" class AMD server processor. Server boards, as of now, lack things like audio. I want to hear my waifu talk someday, so that is an issue here. The main appeal of server boards is simply the huge number of cores and threads, which are cheaper, but weaker, than a Threadripper's offerings.
Currently leaning towards the AM5 endgame option. Money is the biggest issue, so going relatively cheap is probably the way to go.
HopePupal@reddit
You don't really need motherboard audio support. Your GPU is perfectly capable of sending sound to your monitor, and your monitor probably has a speaker/headphone jack. If it doesn't, or if you want a fancier amp, there are plenty of nice USB DACs.
Terminator857@reddit
AMD Gorgon halo / Intel nova lake ax . https://www.google.com/search?q=amd+gorgon+halo+intel+nova+lake+ax
2027: AMD Medusa Halo, 50% performance improvement with 6 memory channels up from 4 channels.
fallingdowndizzyvr@reddit
If you already have Strix Halo, not worth the upgrade. It's a minor rev.
ProfessionalSpend589@reddit
What if you already have 2 strix halos, but can’t run GLM 5?
mindwip@reddit
Yep, 2027 can't come fast enough. If you don't have a Strix Halo now, it might be OK to wait for the 2026 version if you can't wait until 2027.
akali1987@reddit
Gorgon looks like it finally matches the NVIDIA DGX Spark. Intel Nova Lake looks really interesting. Medusa looks to match Apple silicon; I wonder if it's going to be unified memory too.
I didn’t know these were coming out, thanks!
mindwip@reddit
An AMD GPU with more memory; hoping for a 64 to 96GB card.
As for the next Strix Halo, the 2026 version is a slight upgrade. I really want the 2027 with LPDDR6X and its wider bandwidth.
floconildo@reddit
Medusa Halo is scheduled to be unveiled H1 2026, although I personally think it will be delayed to H2 2026.
IF they manage to show it in the first half I expect new devices to be arriving by late 2026 and the software to be working properly by early 2027.
If you need it to be ultra portable then I'd argue your best option is the MacBook Pro/Ultra M6 that's speculated to launch in 2026.
None of those will be cheap, but since you're talking buying a RTX 6000 I assume money isn't exactly an issue here.
fallingdowndizzyvr@reddit
You mean M5.
floconildo@reddit
Yup! Fixed it, thanks!
taking_bullet@reddit
I've been waiting for RTX 5070 Ti Super 24GB, but it's not on the table anymore.
a_beautiful_rhind@reddit
I am waiting for RAM to go down, especially used DDR4 RAM, because the current price is ridiculous.
Will it happen? Eh.. i dunno.
jeremyckahn@reddit
I'm waiting for the Tiiny AI box to ship!
47FsXMj@reddit
I was thinking of a new Mac mini as a headless server (openclaw, opencode, hermes... whatever) and a DGX Spark for the muscle. Anybody got an idea when a DGX Spark update/refresh is likely to release?
Elegant_Tech@reddit
Memory prices and constraints have pushed almost everything back to next year. All the things I was hoping for last fall won't be coming now.
ttkciar@reddit
Me too. I'm planning on not buying any more hardware until 2028 or 2029.
hyouko@reddit
I don't think Nvidia has anything slated for this year. It's entirely possible there might be an M5-based Mac Studio, but I wonder if the 512GB RAM models are coming back with the current RAM shortage.
More-Curious816@reddit
NVIDIA has nothing consumer-targeted this year, and probably not next year either, unless they release the Ti versions of the 50-series to appeal to gamers.
Anbeeld@reddit
R.I.P 5070 Ti Super.
RogerRamjet999@reddit
Yeah, I was hyped for that one, but it looks to be gone for good.
SlimPerceptions@reddit
I just want a mac mini m5. That’s all