AMD has invented something that lets you use AI at home! They call it a "computer"
Posted by 9gxa05s8fa8sh@reddit | LocalLLaMA | View on Reddit | 107 comments
autonomousdev_@reddit
Ran a local model last year. Power bill went up like $40 and it was way slower than just using the API. Neat idea but not really ready for real products. Unless you've got some super secret stuff going on, cloud is way cheaper for now.
CatalyticDragon@reddit
All computers are the same just like all cars are the same. It's a fine comparison if you don't know anything about computers.
Otherwise, AMD is saying products based on Strix Halo, which offer relatively low cost, low power, and lots of memory, are ideal for autonomous local AI agents.
Strix will happily generate 10-20 tokens/second on a 27-35B model at less than 100 watts, so I tend to agree. If their prices hadn't gone up due to RAM shortages I'd have more than one of them.
Howard_banister@reddit
Is this for dense models or MoE? I don't think they're good for running dense models.
ComplexType568@reddit
MoE. Dense models run MUCH MUCH slower. I've heard single-digit tg t/s from Qwen 3.6 27B on a Strix, though maybe it could be better if optimized.
MoffKalast@reddit
Yeah, that's probably accurate. I'm seeing about 2 t/s for G4-31B on regular DDR5, so probably 4-5 on the Halo.
CatalyticDragon@reddit
People are getting 10+ t/s on Qwen 3.6 27B dense and twice that with the 35B MoE.
Hefty_Acanthaceae348@reddit
How come only twice the speed? I would've expected roughly a 9x increase given the active parameters
kaeptnphlop@reddit
I get a consistent 40-50 t/s with Q6 and speculative decoding in coding tasks.
HomsarWasRight@reddit
I have not been able to get spec decoding working with Qwen models on my Strix Halo machine. Can I inquire about your software stack?
Look_0ver_There@reddit
With DFlash-enabled variants of vLLM, Strix Halos are now running at 18-20 t/s with 27B, and around 60 t/s with 35B, although this is admittedly a very recent thing (like in the last 4 days).
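For anyone wondering what speculative decoding actually does in these setups: a small draft model proposes a few tokens and the big model verifies them, so every accepted token skips a full forward pass of the big model. Here's a toy greedy sketch of the idea, where `draft_next` and `target_next` are hypothetical stand-ins for real model calls (real implementations like vLLM verify all proposals in one batched pass):

```python
# Toy greedy speculative decoding. `draft_next` and `target_next` are
# hypothetical callables (list of token ids -> next token id), standing
# in for a small draft model and the large target model.

def speculative_step(tokens: list[int], draft_next, target_next,
                     k: int = 4) -> list[int]:
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposed: list[int] = []
    for _ in range(k):
        proposed.append(draft_next(tokens + proposed))

    # 2. The target model verifies each position (in practice: one batched pass).
    accepted: list[int] = []
    for tok in proposed:
        expected = target_next(tokens + accepted)
        if expected == tok:
            accepted.append(tok)        # agreement: token came "for free"
        else:
            accepted.append(expected)   # disagreement: take target's token, stop
            break
    return tokens + accepted
```

The speedup depends entirely on how often the draft agrees with the target, which is why it tends to help most on predictable text like code.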
pulse77@reddit
AMD Strix Halo has 215 GB/s memory bandwidth. A 27B model in 8-bit quantization requires ~27 GB of weight reads per token => 215 GB/s divided by 27 GB/token gives you ~8 tokens/second. With 4-bit quantization this doubles to ~16 tokens/second.
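That roofline math generalizes. A quick sketch using the thread's numbers (215 GB/s is the figure quoted above; the assumption is that decode is purely memory-bound, i.e. every token requires reading all active weights once):

```python
# Back-of-the-envelope decode speed for a memory-bandwidth-bound LLM.
# Assumes each generated token reads every active parameter exactly once
# (ignores KV-cache traffic and compute time, so these are upper bounds).

def max_tokens_per_sec(bandwidth_gb_s: float, active_params_b: float,
                       bytes_per_param: float) -> float:
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

BW = 215  # GB/s, the Strix Halo figure quoted in this thread

print(max_tokens_per_sec(BW, 27, 1.0))  # 27B dense @ 8-bit -> ~8 t/s
print(max_tokens_per_sec(BW, 27, 0.5))  # 27B dense @ 4-bit -> ~16 t/s
print(max_tokens_per_sec(BW, 3, 0.5))   # MoE, ~3B active @ 4-bit -> ~143 t/s
```

This probably also answers the "why only 2x, not 9x" question above: the MoE ceiling is far higher, but real runs pay for shared (non-expert) weights, KV-cache reads, and compute overhead, so measured gains land well below the raw active-parameter ratio.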
1ncehost@reddit
I get 31 tok/s with a 229B model on mine. That's MiniMax M2.7, which is also probably the smartest model for Strix Halo.
woct0rdho@reddit
20 t/s is slow by today's standard. I'm getting > 50 t/s for Qwen 3.6 35B-A3B Q6 with fairly long context.
ArtifartX@reddit
I think maybe you missed the joke OP was making, but disregarding that, the point the joke makes still kind of stands - there have been low-cost-per-watt, low-cost-per-performance 'computer' options out there (both prebuilt and DIY) before this.
I'm all for this product announcement btw, just finding it even more hilarious that the top comment is someone who got offended and is lashing out about OP's knowledge of computers. You sound more like someone who takes offense to anything remotely negative said about AMD than someone who knows things about computers.
turtleWatcher18@reddit
Yeah, the RAM prices really killed the utility of these unless you desperately need a large model.
taking_bullet@reddit
Dear Lisa Su, I don't care about your Agent Computers. Give me RX 9080 XT with 24GB VRAM. Thanks in advance.
suprjami@reddit
You accidentally spelled 48GB VRAM wrong.
AdOne8437@reddit
you spelled 512GB vram wrong
suprjami@reddit
This card would cost more than the GDP of a small country 😅
AdOne8437@reddit
Really? That is cheaper than expected.
EndlessZone123@reddit
You mean the AMD Radeon AI PRO R9800 48GB?
Etroarl55@reddit
All the VRAM in the world, but inference speed is still an RTX 5060.
The R9700 is a GPU I seriously considered, until the blower fan complaints and slow inference speed made it clear it's not mature yet for use.
Look_0ver_There@reddit
The fans that the PowerColor and Sapphire use are absolutely terrible. The fans on the XFX and ASRock branded cards are actually quite reasonable and don't have the annoying intrusive whine that the PowerColor and Sapphire based models have.
I've tried, and returned, the PC and Sapphire models. The XFX was great (noise-wise). The ASRock was even a bit better.
Eyelbee@reddit
I prefer a unified memory chip configurable up to 512GB+. Why do you need a 9080 XT? You can find all kinds of 24GB cards everywhere anyway.
hainesk@reddit
Or a cheaper 96GB VRAM RTX 6000 Pro competitor.
ImportancePitiful795@reddit
Well, at this point you can get 3 or 4 R9700s for the cost of a single RTX 5090.
And 128GB VRAM at 1/3 the cost of a 96GB RTX 6000. 🤔
Before someone brings up electricity... Even if you run the extra 2 R9700s (in a 4-R9700 system) 24/7 for 5 years, the power bill won't exceed the extra money the RTX 6000 costs.
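Whether that electricity claim holds depends almost entirely on duty cycle. A quick way to check it with your own numbers (the wattages and price-per-kWh below are assumptions, not measurements):

```python
# Electricity cost of the two "extra" R9700s over five years.
# Wattage and EUR/kWh are assumptions; plug in your own.

def five_year_cost_eur(extra_watts: float, eur_per_kwh: float = 0.30) -> float:
    hours = 5 * 365 * 24
    return extra_watts / 1000 * hours * eur_per_kwh

print(five_year_cost_eur(40))   # mostly idle (~40 W total): ~525 EUR
print(five_year_cost_eur(600))  # hammered 24/7 (~600 W total): ~7884 EUR
```

So at typical home duty cycles the argument holds, but a genuinely 24/7 full-load workload could eat the whole price difference.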
Rude_Ambassador_6270@reddit
It's not about electricity but rather having to deal with the overhead of running 4 GPUs.
I have a Strix Halo machine and can extend it with an RTX 6000 via USB4 and an external rig + PSU. Otherwise I'd need an entire (and expensive) machine or some other complex setup to run 4 GPUs or more. Also, AMD support is still crap while novidya is literally "plug'n'play" for most cases. CUDA supports even ancient GPUs while ROCm, to my knowledge, doesn't even fully support the current line, just a few selected models, not to mention dropping older GPUs completely. "AMD - getting your GPGPU customers fucked since Fury X."
ImportancePitiful795@reddit
It gives you the R9700 32GB, which is cheaper than the RTX 5080 16GB and 1/3 to 1/4 (depending on the model) the price of the RTX 5090 32GB.
What more do you want?
You can get 3-4 R9700s (128GB VRAM) for as much as a single 5090 these days. And 1/3 the price of an RTX 6000.
taking_bullet@reddit
I have no idea where you found such a cheap R9700. It costs literally 300€ more than an RTX 5080 at every retailer.
ImportancePitiful795@reddit
UK
The PowerColor R9700 is £1200 and the good 5080s start at £1200, unless you want to buy a Gainward for £1100. Meanwhile the cheapest 5090 is £3000.
In most European countries the R9700 is €1200-€1400, with the good 5080s starting at €1300, while the cheapest 5090 starts at €3900. So effectively 3 R9700s.
madaerodog@reddit
DGX spark with no CUDA :))
ImportancePitiful795@reddit
At half the price. Because the DGX is now close to $5000 officially, after last month's price increase...
And it can also do everything else, because it's an x86 machine. It even runs Windows.
HomsarWasRight@reddit
Really? That’s cool. Any potential benefit to performance over ROCm?
ayu-ya@reddit
I'm saving for a DGX atm and I was in pain seeing that price increase with my non-US/western-EU-level earnings. But I want to do video gen on it, not only LLMs, and it looks like the best option for that among the smaller machines, because yeah, I also need it to be small and possible to stuff in a bag. A Halo should be fine with it, from what I looked up, but from talking to people with better hardware knowledge than I have, the consensus is 'if you want to have a good time, get the Spark'.
I'm also most used to everything CUDA from my current GPoor PC, so there's that.
DeliciousGorilla@reddit
I wonder if a laptop with a 16GB 4090 and 64GB RAM would be better for image diffusion than a DGX Spark? I see them selling for ~$2,000 USD.
SomeoneSimple@reddit
For image/video the 4090 mobile will be faster; having 128GB shared memory doesn't really help unless you want to run exotic models like Flux2.dev-32B or HunyuanImage-3.0 (at a pace of several minutes per image).
ChocomelP@reddit
Why DGX over Apple with higher amounts of unified memory? I don't know much about this hardware.
ayu-ya@reddit
Macs really aren't good for video gen at the moment. I was told (and saw from others' results) that a video that takes a few short minutes on a DGX could take around an hour on a Mac. For just LLMs I'd probably aim for a Mac Studio.
ChocomelP@reddit
Do I understand it correctly that NVIDIA is faster at the same model size, but the cheaper unified memory lets you run much bigger models at the same price on Apple silicon?
redpandafire@reddit
Good luck, friend. I’m also trying to save up for the dgx for the same purpose. May we both get a sudden financial windfall.
ImportancePitiful795@reddit
Tbh I'm trying to find any store doing 6 months' financing for it, as I don't want to pay all the money upfront. 😁
ImportancePitiful795@reddit
Well, I have a 395 and still want a DGX to get my hands on, but raising the price just annoyed me.
JohnSane@reddit
The days of cuda dominance won't last forever.
i_am__not_a_robot@reddit
This CUDA dominance that we're seeing today is entirely due to Nvidia's competitors' historical negligence and complete failure to provide even the most basic developer support needed to make their architectures viable for GPGPU computation. And this goes way back to the early GPGPU days of the mid-to-late 2000s when it was CUDA vs OpenCL.
SkyFeistyLlama8@reddit
OpenCL is as good as dead and so is Vulkan, at least for ML work. It hurts to say this but CUDA is the ten ton elephant in the room squashing everyone else flat.
Altruistic_Heat_9531@reddit
"The best product nvidia has ever made is its GPU, second best product is their CUDA"
i_am__not_a_robot@reddit
You could even turn this around. There were plenty of times in the past when AMD's GPU offerings (in terms of raw compute-to-price ratio) were superior. I specifically remember the FirePro W9100/S9150/S9170 vs. Quadro K6000, Tesla K40/K80 around 2014-15, but the developer support for anything non-gaming (and, from what I heard, even gaming, but that's not my area of expertise) sucked so bad, we still went with Nvidia for prototyping, and guess what, once viability was established on CUDA, nobody bothered to port it, we just deployed on Nvidia GPUs.
SomeoneSimple@reddit
I remember this; you'd get laughed out of the Linux forums with an AMD GPU. In the early 2010s, for gaming, the closed-source 'fglrx' driver was absolutely terrible compared to the closed-source NV drivers; they were in an even worse position than the CUDA vs ROCm/Vulkan situation right now.
The open source 'radeon' driver didn't even support 3D.
Altruistic_Heat_9531@reddit
You don't have to mention the past: the MI350X on paper (before the B300) is basically a more capable B200. The MI300X is also a 192GB monster, and that was in 2023, mind you.
I've worked on the MI200 personally, and installing anything is an ultra pain in the neck.
MDSExpro@reddit
You couldn't be more wrong. OpenCL is constantly growing; Khronos provides nice yearly snapshots. It just grows in the professional space, so the average redditor can't see that and repeats nonsense.
SkyFeistyLlama8@reddit
Is OpenCL being used for machine learning applications? I doubt it.
MDSExpro@reddit
The Aurora supercomputer runs ML workloads via OpenCL (wrapped in Intel's framework, but still), to name one.
SkyFeistyLlama8@reddit
Come on, we're talking about home llamaistas here, not HPC.
ImportancePitiful795@reddit
ROCm and MLX, which are both supported by AMD. (MLX on AMD is in beta right now.)
OverclockingUnicorn@reddit
Ten ton elephant in a room full of ants.
AMD and Intel are getting close to Nvidia for inference if you are a competent user, but they are next to useless for any serious training workloads.
conockrad@reddit
Just a reminder that this "Nvidia competitor" was originally an "Intel competitor".
i_am__not_a_robot@reddit
Technically, this failure is on ATI, not AMD, since AMD's acquisition of ATI happened just around the time CUDA was first released. So, no, this is not entirely accurate.
madaerodog@reddit
For sure, but 20 years of Nvidia supporting the community and offering real performance won't disappear overnight either, especially since it continues.
JohnSane@reddit
NVIDIA supporting the community... Sorry but that gets a big fat ROFL from me.
RoomyRoots@reddit
Nvidia is infamous in the FOSS community, that is a fact. As a Linux user I still went with AMD because I can't trust Nvidia. Do I regret the decision? Kinda: their GPUs were significantly cheaper, and at least since ROCm it has been much easier to use them. But CUDA support isn't comparable and won't be for years and years.
CheatCodesOfLife@reddit
AMD MI50 (Vega 20) -> Released 2019 -> EOL 2025 (no ROCm 7 support without dodgy hacks)
Nvidia GTX 750 Ti (Maxwell) -> Released 2014 -> EOL 2025 (CUDA 12.x still works)
So Nvidia supported a cheap gaming/HTPC card for 11 years, while AMD barely made it to 5 years for a high-end data center card.
I'm happy to buy the cheap landfill AMD cards (MI50), but why would I risk real money/effort on something serious when they can't be bothered to support their cards for long?
-main@reddit
The community here is ML researchers, not localllama users.
MoistRecognition69@reddit
True
But the day they stop dominating isn't here yet :|
That's like saying that buying a PS1 is a good choice because the PS5 will be awesome a few years down the line
ANR2ME@reddit
Can't it use ZLUDA if CUDA is really needed? 🤔
protestor@reddit
ROCm doesn't support Strix Halo? That's stupid.
Prof_ChaosGeography@reddit
For the vast majority of users, no CUDA is fine and doesn't make a difference. Vulkan works just fine for inference. There's ROCm, AMD's CUDA compatibility layer, for those who need CUDA. ROCm is great for 99% of things.
deseven@reddit
No, Vulkan is typically 5-30% slower, also not everything works on Vulkan.
No, you still can't run a lot of things easily or at all, or without major speed penalties.
No, ROCm is still notoriously unstable, people often pick Vulkan over ROCm to trade speed for stability.
Due-Memory-6957@reddit
More things work in Vulkan than in anything else.
deseven@reddit
If we're focusing on LLMs specifically - yes, true.
CatalyticDragon@reddit
It has a clone of CUDA called HIP, which is a large part of why everything runs on it: if you can write for one, you can write for both.
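A concrete example of that compatibility from the Python side: ROCm builds of PyTorch keep the familiar `torch.cuda` namespace and back it with HIP, so CUDA-flavored code often runs unmodified. A minimal sketch (assumes a ROCm build of PyTorch and a supported AMD card):

```python
# On ROCm builds of PyTorch, torch.cuda is backed by HIP, so this
# CUDA-flavored snippet runs unmodified on a supported AMD GPU.
import torch

if torch.cuda.is_available():                   # True on ROCm builds too
    print(torch.cuda.get_device_name(0))        # e.g. an AMD Radeon device
    x = torch.randn(1024, 1024, device="cuda")  # allocated on the GPU via HIP
    y = x @ x                                   # matmul through rocBLAS, not cuBLAS
    print(y.shape, y.device)
```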
Mollan8686@reddit
Hence it’s better…?
Slasher1738@reddit
It'd better have a better NIC than 10G RJ-45.
dragoon7201@reddit
Dear Lord! Does this mean the abacus will be obsolete in the future?
MoneyPowerNexis@reddit
This is a slide rule moment
Quartich@reddit
I keep an abacus at both my work and home desks
xXprayerwarrior69Xx@reddit
I am a bit worried they're trying to make fingers redundant. Is that the next thing AI will take from us?
OkFly3388@reddit
I wish they'd revive their idea of an SSD attached to the GPU.
Honestly, if you had like 1TB of storage for model weights, and all the other RAM just for context, it would be so overpowered.
am2549@reddit
Yeah, but that's slow. You need RAM because it's fast and leads to high t/s.
OkFly3388@reddit
It's something like: a top-tier SSD is 10x slower than RAM and 100x slower than GPU RAM.
So if you place like 10x 128GB SSD chips, you can have the same bandwidth as RAM, and it should work at the same speed as this "unified" memory, no?
holchansg@reddit
Memory controllers don't work that way.
CPUs have like 128-bit or 256-bit memory controllers, the good ones, and this is a physical barrier. They can be in many configurations, quad channel, octa channel, but the barrier is physical. There's also a limit to how fast they can run, meaning how fast a RAM they accept; you can't push 1000GB/s on a Strix Halo by adding faster memory, or more channels...
Same goes for GPUs: each has a memory controller that goes up to X and you can't do anything about it, be it DDR, LPDDR, GDDR or SSD chips.
Also, PCIe 5 x16 for example is slow, slower than dual-channel LPDDR5X.
So the inference must be local, either in the CPU memory (and yes, you can have faster memory by adding LPDDRs, having more channels, and soldering the memory chips on top of the CPU, reducing latency and improving signal integrity, but at the end of the line the chip's memory controller is the ultimate barrier).
SSDs over PCIe will always be slower than RAM, no matter how many you add in "RAID", because the line they talk over, PCIe, is slow.
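The rough peak numbers behind that argument, for anyone who wants to check (theoretical figures computed from bus width and transfer rate, not measurements):

```python
# Theoretical peak bandwidth: bus width (bits) / 8 * transfer rate (MT/s).

def peak_gb_s(bus_bits: int, mt_s: int) -> float:
    return bus_bits / 8 * mt_s / 1000

print(peak_gb_s(256, 8000))  # Strix Halo, 256-bit LPDDR5X-8000: ~256 GB/s
print(peak_gb_s(128, 6000))  # typical dual-channel DDR5-6000:    ~96 GB/s

# PCIe 5.0 x16 tops out around ~64 GB/s per direction, so no stack of
# SSDs hanging off the bus can feed a GPU faster than plain system RAM.
```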
OkFly3388@reddit
And I don't really see where anything you said invalidates my point.
Like, on a conceptual level nothing stops you from placing SSD chips near the GPU die and allocating some transistor budget to a separate controller that manages them.
holchansg@reddit
Prohibitively expensive, prohibitively bigger, and it would make the entire setup slower and more prone to impedance and signal-integrity problems.
am2549@reddit
I think the factor is 25-50x, but that includes the pipeline, not sure about the chip itself.
ServiceOver4447@reddit
AMD doesn't even have a factory; they outsource everything.
FunkyMuse@reddit
So how will this be different from Strix Halo or the DGX Spark? No info, I guess.
DarkArtsMastery@reddit
You mean computing tokens at home? Sounds illegal to me.
triynizzles1@reddit
Lots of "it's not X, it's Y" slop in this script…
AnotherAvery@reddit
What can they do? This product isn't a lot of things! ;-)
Euphoric_Emotion5397@reddit
Yeah, I'm holding off on buying a Strix Halo... because I'm sure the next half of the year will bring new and better products that will really be good enough for the next phase of AI.
TerryTheAwesomeKitty@reddit
My gosh! A machine that does computations for you? What's next, a personal version of the computer??? Madness!!!
Southern_Sun_2106@reddit
They 'solve problems' for you too, the man said. Revolutionary!
gitsad@reddit
Next they'll do a product called the "smartphone".
sk1kn1ght@reddit
I hear that if they continue, they might be able to put it on your wrist as well. Some lunatics even say they can make a frame you put on your nose... Blasphemers!
seppe0815@reddit
no cuda no fun bye bye amd
toolsofpwnage@reddit
Hear me out: what if we use this "computer" to compute interactive motion pictures?
putrasherni@reddit
I think 256 or 512GB RAM Strix Halo variants would be nice.
_wOvAN_@reddit
1TB of unified DDR5 RAM with an x16 bus would be nice.
ImportancePitiful795@reddit
I don't get all this brouhaha. It's just an AMD 395 miniPC. 🤔
Except if they're using the new 495, which is basically the same as the 395 but with 8533MHz RAM. It would also be interesting to see the BIOS settings, because all 395 miniPCs came with 8533MHz RAM downclocked to 8000MHz. 🤔
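For what that downclock is worth, the same bus-width arithmetic as elsewhere in the thread puts the gap at about 7% (theoretical peaks, not measurements):

```python
# Peak bandwidth on the 256-bit bus these machines use, at both clocks.
for mt_s in (8000, 8533):
    print(mt_s, 256 / 8 * mt_s / 1000, "GB/s")  # 256.0 vs ~273.1 GB/s
```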
TwoPlyDreams@reddit
100% Spark RAM with 50% Spark performance at 50% Spark cost would be very compelling.
ImportancePitiful795@reddit
Actually it's an AMD 395. So 50% of the Spark cost at the same performance 😂
patricious@reddit
This hardware means absolutely nothing without the software stack support. The usual AMD fumble.
ImportancePitiful795@reddit
It's basically an AMD 395, nothing else. And it has the software stack. Hell, right now there's a closed beta by the AMD Lemonade team using MLX on the AMD 395.
Interesting_Key3421@reddit
Just give us fast RAM in more power-efficient computers at a reasonable price.
qado@reddit
r/LocalLLaMA, posted by AMD :-D
SrijSriv211@reddit
You mean Xbox?
Terminator857@reddit
Did he say there is going to be an announcement April 30th?
alfpacino2020@reddit
What? Hahahaha
https://i.redd.it/adehqb2yh2yg1.gif