Is there any top level hobbyist hardware you guys are waiting to come out this year?
Posted by Tired__Dev@reddit | LocalLLaMA | View on Reddit | 74 comments
So I've explored buying everything from an RTX 6000 to a 512GB M3 Ultra Mac Studio to a DGX Spark (I need to travel) for local LLM generation. I was about to start looking into an M5 MacBook, but I figured I'd ask you guys if there was anything you were waiting for?
Double_Cause4609@reddit
If Taalas pans out and delivers a read-only LLM-on-a-card like they promised, at crazy speeds, I'd 100% buy one. There's a lot I can do with four-digit token speeds.
Anything else... I'm a little tepid about buying right now. I get the feeling that by ~2029 we're going to be looking at a fundamentally different paradigm in hardware (the industry follows roughly five-year cycles, due to how long it takes to bring up hardware, which means by 2029 you're going to see the first wave of hardware that really "gets" what running an LLM needs).
To that end I don't really want to invest too much into top-end hardware right now.
Plus a lot of things I was excited for have been pushed back to 2027 or even 2028.
I'd rather wait and see if anything comes out that's better suited to running the really big MoE models than current solutions before I pull the trigger on current-paradigm hardware, and even then I'm a bit tepid.
Comfortable_Ad_8117@reddit
I tend to agree here too. I make the best of my 3060 and 5060 for all my needs. However, as this space matures we will be able to run better models on less hardware, and at some point AI will be like spell check: everyone has it and all hardware can handle it.
Cold_Tree190@reddit
Yup, agreed. For now I've been trying to make my current hardware (3090, 64GB DDR4) work, and when that fails I just fall back to OpenRouter and use different APIs for models. Gonna play the waiting game for consumer hardware, unfortunately.
ryfromoz@reddit
Indeed, and I have double your RAM and four 3090s!
vdc_hernandez@reddit
Very good comment!
suprjami@reddit
Waiting for llama.cpp TQ/RQ to arrive, so 2x 16GB becomes the new 27B Q6 powerhouse.
gaspoweredcat@reddit
Something that hasn't yet been announced, since nothing seems to hit the mark yet. If we had better memory speeds on things like the AI Max it wouldn't be so bad (still stupidly pricey, but better), and if we could have up to 512GB. Maybe CAMM memory will help, but that'll no doubt be even pricier.
Pretend_Engineer5951@reddit
This year I bought two Strix Halos. All I wish for now is that llama.cpp supported tensor parallelism for running across these two nodes. vLLM is not as handy, and it's a pain to find an appropriate model for it.
SmartCustard9944@reddit
I’m eyeing clustering two Strix Halos for hobby purposes, just to toy around with a pseudo toy server.
How do you find the experience going so far with it? I have a Bosgame M5 coming soon and if I pull the trigger I might get a second one.
Pretend_Engineer5951@reddit
I've got GMKTec's EVO X2s, connected via USB4. I knew low latency was the first priority, so I started researching optimizations under Linux. Even with all the kernel tweaks, ping latency didn't go below 0.2ms. Then I found the OdinLink-Five kernel driver, which helped drop latency down to 0.065ms. Under a special benchmark, p95 is ~30ns. It's nice, but not as good as dedicated RoCE network cards, which operate at ~5ns.
I tested vLLM and was disappointed by poor performance on one node compared to a similar quant model on llama.cpp. I know vLLM demonstrates its benefits under many concurrent requests, but I don't need that for casual agent mode. The good news is that with tensor parallelism, even with the relatively high latency of the USB4 connection, performance scaled about 1.5-1.7x.
As for llama.cpp, the only available mode is RPC. It's very stable, but it gives almost no performance boost. The only benefit is that you can work with models up to 200-210GB (leaving headroom for KV cache). But after the Qwen 3.6 27B release and its capabilities, I wonder whether having ~200GB for a model still makes sense. When I compared against Minimax M2.7 UD-Q6 (~190GB), I didn't see much benefit.
Given all the software capabilities described, maybe there's another useful case: fine-tuning. I'll give it a try one day, maybe.
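For reference, the llama.cpp RPC setup described above looks roughly like this. A sketch only: the IP address and model path are placeholders, and exact flags may differ by llama.cpp version:

```shell
# On the worker node: build llama.cpp with RPC support, then start the RPC server.
cmake -B build -DGGML_RPC=ON
cmake --build build --config Release
./build/bin/rpc-server --host 0.0.0.0 --port 50052

# On the main node: point llama.cpp at the remote worker so model layers
# are split across both machines' memory (placeholder address and model).
./build/bin/llama-cli -m model.gguf -ngl 99 \
    --rpc 192.168.100.2:50052 -p "Hello"
```

This is why it's stable but slow: RPC only distributes layers across hosts, it doesn't parallelize the compute for a single token the way tensor parallelism does.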
Jords13xx@reddit
OdinLink-Five sounds like a solid improvement! Have you tried tweaking any other settings or hardware? I’ve heard some good things about using specialized network cards for even lower latency, but it sounds like you’re already on top of your optimizations.
Pretend_Engineer5951@reddit
Yes it is. It's confirmed working on the Minisforum S1 and GMKtec EVO X2. But another caveat is high CPU utilization even at idle: the CPU just burns at up to 86°C without any work. I suppose it could be resolved, in particular by dropping the vLLM toolbox to exclude the userspace network overhead ops, but it would take some time to rebuild vLLM with all the patches. As for other tweaks, I just followed the typical guides for Linux setup on Strix Halo.
SmartCustard9944@reddit
So, are you saying that the hardware is there and it could offer 1.7x over a single device for both pp and tg, and it's just that llama.cpp is currently undercooked here? That sounds like the type of challenge I enjoy digging into.
Pretend_Engineer5951@reddit
Yes, there's good potential for a speed boost. But I bought the 2nd Strix Halo also keeping in mind that if I fail with the USB4 link, I'll build RoCE with NVMe-to-PCIe adapters and Mellanox cards.
grunt_monkey_@reddit
RDNA5! Hopefully a Radeon 9800 or something with 64GB VRAM; that would be the sweet spot.
_underlines_@reddit
RTX 6000 Pro, 96GB VRAM, fast, $8-9k
DGX Spark, 128GB, slow, $4-5k
Mac Studio M3 Ultra, 512GB, discontinued?
MacBook M5, 128GB, slow, $5-6k
Minisforum MS-S1 Max (Ryzen AI Max+ 395), 128GB, slow, $3k
Framework Desktop (Ryzen AI Max+ 395), 128GB, slow, $3k
Pick your poison.
RemarkableGuidance44@reddit
I got 4x B70s ($6,000, 128GB VRAM) on a 64-core Threadripper with 512GB of memory. Great for large models. There are still a lot of software tweaks to come from Intel, but it's a good alternative to my dual 5090s. I run it 24/7 at 180W per card under full load.
Compared to my dual 5090s at 650W. lol
Asthenia5@reddit
How does its performance compare to the dual 5090s?
fallingdowndizzyvr@reddit
Currently it's comparable in speed to Strix Halo or Spark. I wouldn't hold your breath in hopes that the drivers get better. I'm still waiting for my A770 to get better.
RemarkableGuidance44@reddit
They're already making a ton of gains... A card from 2022? You should upgrade... You remind me of a friend of mine who is complaining that their 2080 Ti isn't getting the latest updates.
https://www.reddit.com/r/LocalLLaMA/comments/1swgwvh/mesa_pr_with_37130_llamacpp_pp_perf_gain_for/
fallingdowndizzyvr@reddit
LOL. Yeah. I've heard that before. Let's see what actually comes out.
You remind me of a newb who just got into it and doesn't know any history.
RemarkableGuidance44@reddit
Err, it's not as fast, and the software is still early, but Intel is updating daily. But I can run models that require a lot more VRAM, and I can also split models up and automate a lot of tasks 24/7 without the huge power bill.
I use my 5090's for Image Gen and 3d Gen + Gaming.
hlzn13@reddit
Are there any cons to going with the Asus GX10? I'm thinking of buying soon. There hasn't been any news about a second version of the DGX Spark, right?
redmctrashface@reddit
How the hell is the M5 128GB slow? Could you elaborate?
3dom@reddit
Its output speed is about 2/3 that of a 5090, which in turn is half that of an RTX 6000.
iMrParker@reddit
M5 Max is like 600 GB/s which is more like a third of a 5090 or Pro 6000
redmctrashface@reddit
Thx for all the answers. Another question: 96GB is not a lot, so what's the point of having a very fast device if you can't run 100+ models?
DAlmighty@reddit
You’re not running 100+ models simultaneously on anything, or do you mean 100B+ parameter models or what? Your question is a bit ambiguous.
Also, there’s not much of a great reason to choose a Pro 6000 over cheaper hardware just for inference. These accelerators are crossing over the line into model training and fine tuning.
redmctrashface@reddit
Sorry about that, I meant 100B+ parameter models. I didn't know about fine-tuning; is it because CUDA is more developed than other platforms?
PS: funny how some morons downvote me for asking questions while I'm still learning. Impressive.
DAlmighty@reddit
You can run 100B+ parameter MOE models on a pro 6000.
The internet is a strange place that’s becoming increasingly unreliable. Don’t let the anonymous downvotes get to you.
iMrParker@reddit
It's "fast", but in LLM context it's lacking in compute for pre-fill. It's excellent for a laptop, and probably good enough for most people, but not for anything serious
Norwood_Reaper_@reddit
Compared to the bandwidth of the RTX Pro 6000 Blackwell, it is slow.
fallingdowndizzyvr@reddit
You mean 3.5K.
https://www.centralcomputer.com/asus-ascent-gx10-personal-ai-supercomputer-with-nvidia-gb10-grace-blackwell-superchip-128gb-unified-lpddr5x-memory-1tb-pcie.html
Or get a M5 for $2600.
https://www.bosgamepc.com/products/bosgame-m5-ai-mini-desktop-ryzen-ai-max-395
RogerRamjet999@reddit
You seem to have forgotten that the more you buy, the more you save! /s
Glad-Audience9131@reddit
https://i.redd.it/ob7b9ioejkxg1.gif
Ok-Internal9317@reddit
4x V100s with NVLink still hold a beat, around $2k for a full system, though the power draw is a bit insane.
Hyp3rSoniX@reddit
I'm waiting for the Tiiny AI thing to come out to run local models on it. Hope I didn't get massively scammed by kickstarter...
Negative-Fishing3287@reddit
MacRumors is indicating that they expect a Mac Studio refresh mid to late this year.
DeepOrangeSky@reddit
Yeah, although to clarify: up until this past week it was expected at the WWDC conference in June (June 8th-12th). But about a week ago, the main Apple leaker said the Mac Studio is going to be significantly delayed, to probably around October rather than June.
It's also unclear whether it'll even be a Mac Studio Ultra in October, or merely one of the lesser versions, with the Ultra delayed even further.
When you consider how many iPhones' worth of RAM goes into the highest-spec Ultra, how bad the shortages are, and how many people would try to buy a maxed-out Ultra at the old price, it gets pretty scary to wonder whether we'll actually get anything really awesome and good value from an M5 Ultra anytime soon :\
thrownawaymane@reddit
Yeah, Apple is supposedly going to run through their pre-shortage RAM around that time. We as consumers will definitely pay more for the new Studios. The question is how much more... Apple does hate raising prices, so expect either some sort of crazy justification or a complete refusal to talk about the increase at all.
Southern_Sun_2106@reddit
The MacBook Pro M5 Max is probably the best right now in terms of travel/local models. However, the recent Qwen 27B release made 5090 24GB VRAM notebooks an attractive option too. I believe it will be faster on prompt processing and generation than the MacBook.
Wild-File-5926@reddit
TAALAS ASIC Card to run models without GPU and Steam Frame
GMerton@reddit
No way they get to sell it to prosumers though…?
Wild-File-5926@reddit
The market dynamics will materialize however they might; no one knows for sure. Cost aside, it's stupid fast. Reminded me of this discussion https://www.reddit.com/r/LocalLLaMA/comments/1r9e27i/free_asic_llama_31_8b_inference_at_16000_toks_no/
GMerton@reddit
Yeah it would be awesome if they could sell it to prosumers. At least make it available as a VM on the cloud.
AndreVallestero@reddit
Mac Studio M5 Ultra, though I suspect it'll be $20k... Though with the good news of DeepSeek V4 running inference on Huawei hardware, I might hold out for Chinese GPUs/NPUs.
GMerton@reddit
I remember checking the Ascend chips, and they were not very good value.
AndreVallestero@reddit
The previous Ascend chips weren't great because they used DDR4. The new ones (Ascend 950) they're using for DeepSeek V4 are based on DDR5 and HBM1.
SpicyLentils@reddit
I'm waiting for the Mac Studio M3 Ultra 256 GB to become available, or the Studio M5 Ultra with 256 GB.
flower-power-123@reddit
I just saw a kickstarter for a 3000 euro machine called the "Hilbert Agentic Computer". If you feel like the international situation is stable enough (and you trust the company) to put money into a kickstarter that might be a good option.
HopePupal@reddit
Literally just a standard Strix Halo with some completely insane marketing claims and a price tag to match. It doesn't even seem to have a PCIe slot like the Minisforum or Framework do.
randomperson32145@reddit
Yes, with the recent AI wave I think tons of potential mods for hobby enthusiasts will become available, like customizing software without needing major engineering skills. I'm looking forward to creating my own plugins, for example.
Sabin_Stargem@reddit
Not this year, no. I am waiting for AM6 and DDR6 to be released, before building a new machine.
I am figuring on taking one of two paths on the skill tree.
A: Build an endgame Threadripper DDR5 machine. This means that I can mostly just update all the firmware in one go, and be fairly sure everything will be reliable. Plus, I might be able to get bargains for this generation of gear.
B: Go for AM6/Threadripper+DDR6. More expensive and less reliable, but this generation of equipment has greater potential for running bigger AI models.
Odds are that I would use a Threadripper. If there are gaming-oriented server boards, I might go down the path of an "F" class AMD server processor. Server boards, as of now, lack things like audio. I want to hear my waifu talk someday, so that is an issue here. The main appeal of server boards is simply the huge number of cores and threads, which are cheaper, but weaker, than a Threadripper's offerings.
Currently leaning towards the AM5 endgame option. Money is the biggest issue, so going relatively cheap is probably the way to go.
HopePupal@reddit
You don't really need motherboard audio support. Your GPU is perfectly capable of sending sound to your monitor, and your monitor probably has a speaker/headphone jack. If it doesn't, or if you want a fancier amp, there are plenty of nice USB DACs.
Terminator857@reddit
AMD Gorgon halo / Intel nova lake ax . https://www.google.com/search?q=amd+gorgon+halo+intel+nova+lake+ax
2027: AMD Medusa Halo, 50% performance improvement with 6 memory channels up from 4 channels.
fallingdowndizzyvr@reddit
If you already have Strix Halo, not worth the upgrade. It's a minor rev.
ProfessionalSpend589@reddit
What if you already have 2 strix halos, but can’t run GLM 5?
mindwip@reddit
Yep, 2027 can't come fast enough. If you don't have a Strix Halo now, it might be OK to wait for the 2026 version if you can't wait until 2027.
akali1987@reddit
Gorgon looks like it finally matches the NVIDIA DGX Spark. Intel Nova Lake looks really interesting. Medusa looks to match Apple silicon; I wonder if it's going to be unified memory too.
I didn’t know these were coming out, thanks!
mindwip@reddit
An AMD GPU with more memory; hoping for a 64 to 96GB card.
As for the next Strix Halo, the 2026 version is a slight upgrade. I really want the 2027 with LPDDR6X and its wider bandwidth.
floconildo@reddit
Medusa Halo is scheduled to be unveiled H1 2026, although I personally think it will be delayed to H2 2026.
IF they manage to show it in the first half I expect new devices to be arriving by late 2026 and the software to be working properly by early 2027.
If you need it to be ultra portable then I'd argue your best option is the MacBook Pro/Ultra M6 that's speculated to launch in 2026.
None of those will be cheap, but since you're talking buying a RTX 6000 I assume money isn't exactly an issue here.
fallingdowndizzyvr@reddit
You mean M5.
floconildo@reddit
Yup! Fixed it, thanks!
taking_bullet@reddit
I've been waiting for RTX 5070 Ti Super 24GB, but it's not on the table anymore.
a_beautiful_rhind@reddit
I am waiting for RAM to go down, especially used DDR4 RAM, because the current price is ridiculous.
Will it happen? Eh.. i dunno.
jeremyckahn@reddit
I'm waiting for the Tiiny AI box to ship!
47FsXMj@reddit
I was thinking of a new Mac mini as a headless server (openclaw, opencode, hermes... whatever) and a DGX Spark for the muscle. Anybody got an idea when a DGX Spark update/refresh is likely to release?
Elegant_Tech@reddit
Memory prices and constraints have pushed almost everything back to next year. All the things I was hoping for last fall won't be coming now.
ttkciar@reddit
Me too. I'm planning on not buying any more hardware until 2028 or 2029.
hyouko@reddit
I don't think Nvidia has anything slated for this year. It's entirely possible there might be an M5-based Mac Studio, but I wonder if the 512GB RAM models are coming back with the current RAM shortage.
More-Curious816@reddit
NVIDIA has nothing consumer-targeted this year, and probably not next year either, unless they release the Ti versions of the 50-series to appeal to gamers.
Anbeeld@reddit
R.I.P 5070 Ti Super.
RogerRamjet999@reddit
Yeah, I was hyped for that one, but it looks to be gone for good.
SlimPerceptions@reddit
I just want a mac mini m5. That’s all