AMD has invented something that lets you use AI at home! They call it a "computer"
Posted by 9gxa05s8fa8sh@reddit | LocalLLaMA | View on Reddit | 107 comments
autonomousdev_@reddit
Ran a local model last year. Power bill went up like $40 and it was way slower than just using the API. Neat idea but not really ready for real products. Unless you've got some super secret stuff going on, cloud is way cheaper for now.
CatalyticDragon@reddit
All computers are the same just like all cars are the same. It's a fine comparison if you don't know anything about computers.
Otherwise, AMD is saying products based on Strix Halo, which offer relatively low cost, low power, and lots of memory, are ideal for autonomous local AI agents.
Strix will happily generate 10-20 tokens/second on a 27-35B model at less than 100 watts, so I tend to agree. If their prices hadn't gone up due to RAM shortages I'd have more than one of them.
Howard_banister@reddit
Is this for dense models or MoE? I don't think they're good for running dense models.
ComplexType568@reddit
MoE. Dense models run MUCH MUCH slower. I've heard single-digit tg t/s from Qwen 3.6 27B on a Strix, though maybe it could be better if optimized.
MoffKalast@reddit
Yeah, that's probably accurate. I'm seeing about 2 t/s for G4-31B on regular DDR5, so probably 4-5 on the Halo.
CatalyticDragon@reddit
People are getting 10+ t/s on Qwen 3.6 27B dense and twice that with the 35B MoE.
Hefty_Acanthaceae348@reddit
How come only twice the speed? I would've expected roughly a 9x increase given the active parameters
kaeptnphlop@reddit
I get a consistent 40-50 t/s with Q6 and speculative decoding in coding tasks.
HomsarWasRight@reddit
I have not been able to get spec decoding working with Qwen models on my Strix Halo machine. Can I inquire about your software stack?
Look_0ver_There@reddit
With DFlash-enabled variants of vLLM, Strix Halos are now running at 18-20 t/s with 27B, and around 60 t/s with 35B, although this is admittedly a very recent thing (like in the last 4 days).
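For anyone wondering what speculative decoding actually does in these setups: a small draft model proposes a few tokens and the big model verifies them, so every accepted token skips a full forward pass of the big model. Here's a toy greedy sketch of the idea, where `draft_next` and `target_next` are hypothetical stand-ins for real model calls (real implementations like vLLM verify all proposals in one batched pass):

```python
# Toy greedy speculative decoding. `draft_next` and `target_next` are
# hypothetical callables (list of token ids -> next token id), standing
# in for a small draft model and the large target model.

def speculative_step(tokens: list[int], draft_next, target_next,
                     k: int = 4) -> list[int]:
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposed: list[int] = []
    for _ in range(k):
        proposed.append(draft_next(tokens + proposed))

    # 2. The target model verifies each position (in practice: one batched pass).
    accepted: list[int] = []
    for tok in proposed:
        expected = target_next(tokens + accepted)
        if expected == tok:
            accepted.append(tok)        # agreement: token came "for free"
        else:
            accepted.append(expected)   # disagreement: take target's token, stop
            break
    return tokens + accepted
```

The speedup depends entirely on how often the draft agrees with the target, which is why it tends to help most on predictable text like code.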
pulse77@reddit
AMD Strix Halo has 215 GB/s memory bandwidth. A 27B model in 8-bit quantization requires ~27 GB of weight reads per token => 215 GB/s divided by 27 GB/token gives you ~8 tokens/second. With 4-bit quantization this doubles to ~16 tokens/second.
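That roofline math generalizes. A quick sketch using the thread's numbers (215 GB/s is the figure quoted above; the assumption is that decode is purely memory-bound, i.e. every token requires reading all active weights once):

```python
# Back-of-the-envelope decode speed for a memory-bandwidth-bound LLM.
# Assumes each generated token reads every active parameter exactly once
# (ignores KV-cache traffic and compute time, so these are upper bounds).

def max_tokens_per_sec(bandwidth_gb_s: float, active_params_b: float,
                       bytes_per_param: float) -> float:
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

BW = 215  # GB/s, the Strix Halo figure quoted in this thread

print(max_tokens_per_sec(BW, 27, 1.0))  # 27B dense @ 8-bit -> ~8 t/s
print(max_tokens_per_sec(BW, 27, 0.5))  # 27B dense @ 4-bit -> ~16 t/s
print(max_tokens_per_sec(BW, 3, 0.5))   # MoE, ~3B active @ 4-bit -> ~143 t/s
```

This probably also answers the "why only 2x, not 9x" question above: the MoE ceiling is far higher, but real runs pay for shared (non-expert) weights, KV-cache reads, and compute overhead, so measured gains land well below the raw active-parameter ratio.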
1ncehost@reddit
I get 31 tok/s with a 229B model on mine. That's MiniMax M2.7, which is also probably the smartest model for Strix Halo.
woct0rdho@reddit
20 t/s is slow by today's standard. I'm getting > 50 t/s for Qwen 3.6 35B-A3B Q6 with fairly long context.
ArtifartX@reddit
I think maybe you missed the joke OP was making, but disregarding that, the point the joke makes still kind of stands - there have been low-cost-per-watt, low-cost-per-performance 'computer' options out there (both prebuilt and DIY) before this.
I'm all for this product announcement btw, just finding it even more hilarious that the top comment is someone who got offended and is lashing out about OP's knowledge of computers. You sound more like someone who takes offense to anything remotely negative said about AMD than someone who knows things about computers.
turtleWatcher18@reddit
Yeah, the RAM prices really killed the utility of these unless you desperately need a large model.
taking_bullet@reddit
Dear Lisa Su, I don't care about your Agent Computers. Give me RX 9080 XT with 24GB VRAM. Thanks in advance.
suprjami@reddit
You accidentally spelled 48GB VRAM wrong.
AdOne8437@reddit
you spelled 512GB vram wrong
suprjami@reddit
This card would cost more than the GDP of a small country 😅
AdOne8437@reddit
Really? That is cheaper than expected.
EndlessZone123@reddit
You mean the AMD Radeon AI PRO R9800 48GB?
Etroarl55@reddit
All the VRAM in the world, but inference speed is still an RTX 5060.
The R9700 is a GPU I seriously considered, until the blower fan complaints and slow inference speed made it clear it's not mature yet for use.
Look_0ver_There@reddit
The fans that the PowerColor and Sapphire use are absolutely terrible. The fans on the XFX and ASRock branded cards are actually quite reasonable and don't have the annoying intrusive whine that the PowerColor and Sapphire based models have.
I've tried, and returned, the PC and Sapphire models. The XFX was great (noise-wise). The ASRock was even a bit better.
Eyelbee@reddit
I prefer a unified memory chip configurable up to 512GB+. Why do you need a 9080 XT? You can find all kinds of 24GB cards everywhere anyway.
hainesk@reddit
Or a cheaper 96GB VRAM RTX 6000 Pro competitor.
ImportancePitiful795@reddit
Well, at this point you can get 3 or 4 R9700s for the cost of a single RTX 5090.
And 128GB VRAM at 1/3 the cost of a 96GB RTX 6000. 🤔
Before someone brings up electricity... Even if you run the extra 2 R9700s (in a 4-R9700 system) 24/7 for 5 years, the power bill won't exceed the extra money the RTX 6000 costs.
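Whether that electricity claim holds depends almost entirely on duty cycle. A quick way to check it with your own numbers (the wattages and price-per-kWh below are assumptions, not measurements):

```python
# Electricity cost of the two "extra" R9700s over five years.
# Wattage and EUR/kWh are assumptions; plug in your own.

def five_year_cost_eur(extra_watts: float, eur_per_kwh: float = 0.30) -> float:
    hours = 5 * 365 * 24
    return extra_watts / 1000 * hours * eur_per_kwh

print(five_year_cost_eur(40))   # mostly idle (~40 W total): ~525 EUR
print(five_year_cost_eur(600))  # hammered 24/7 (~600 W total): ~7884 EUR
```

So at typical home duty cycles the argument holds, but a genuinely 24/7 full-load workload could eat the whole price difference.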
Rude_Ambassador_6270@reddit
It's not about electricity but rather having to deal with the overhead of running 4 GPUs.
I have a Strix Halo machine and can extend it with an RTX 6000 via USB4 and an external rig + PSU. Otherwise I'd need an entire (and expensive) machine or some other complex setup to run 4 GPUs or more. Also, AMD support is still crap while novidya is literally "plug'n'play" for most cases. CUDA supports even ancient GPUs while ROCm, to my knowledge, doesn't even fully support the current line, just a few selected models, not to mention dropping older GPUs completely. "AMD - getting your GPGPU customers fucked since Fury X."
ImportancePitiful795@reddit
It gives you the R9700 32GB, which is cheaper than the RTX 5080 16GB and 1/3 to 1/4 (depending on the model) the price of the RTX 5090 32GB.
What more do you want?
You can get 3-4 R9700s (128GB VRAM) for as much as a single 5090 these days. And 1/3 the price of an RTX 6000.
taking_bullet@reddit
I have no idea where you found such a cheap R9700. It costs literally 300€ more than an RTX 5080 at every retailer.
ImportancePitiful795@reddit
UK
The PowerColor R9700 is £1200 and the good 5080s start at £1200, unless you want to buy a Gainward for £1100. Meanwhile the cheapest 5090 is £3000.
In most European countries the R9700 is €1200-€1400, with the good 5080s starting at €1300, while the cheapest 5090 starts at €3900. So effectively 3 R9700s.
madaerodog@reddit
DGX spark with no CUDA :))
ImportancePitiful795@reddit
At half the price. Because the DGX is now close to $5000 officially, after last month's price increase...
And it can also do everything else, because it's an x86 machine. It even runs Windows.
HomsarWasRight@reddit
Really? That’s cool. Any potential benefit to performance over ROCm?
ayu-ya@reddit
I'm saving for a DGX atm and I was in pain seeing that price increase with my non-US/western-EU-level earnings. But I want to do video gen on it, not only LLMs, and it looks like the best option for that among the smaller machines, because yeah, I also need it to be small and possible to stuff in a bag. A Halo should be fine with it, from what I looked up, but from talking to people with better hardware knowledge than I have, the consensus is 'if you want to have a good time, get the Spark'.
I'm also most used to everything CUDA from my current GPoor PC, so there's that.
DeliciousGorilla@reddit
I wonder if a laptop with a 16GB 4090 and 64GB RAM would be better for image diffusion than a DGX Spark? I see them selling for ~$2,000 USD.
SomeoneSimple@reddit
For image/video the 4090 mobile will be faster; having 128GB shared memory doesn't really help unless you want to run exotic models like Flux2.dev-32B or HunyuanImage-3.0 (at a pace of several minutes per image).
ChocomelP@reddit
Why DGX over Apple with higher amounts of unified memory? I don't know much about this hardware.
ayu-ya@reddit
Macs really aren't good for video gen at the moment. I was told (and saw from others' results) that a video that takes a few short minutes on a DGX could take around an hour on a Mac. For just LLMs I'd probably aim for a Mac Studio.
ChocomelP@reddit
Do I understand it correctly that NVIDIA is faster at the same model size, but the cheaper unified memory lets you run much bigger models at the same price on Apple silicon?
redpandafire@reddit
Good luck, friend. I’m also trying to save up for the dgx for the same purpose. May we both get a sudden financial windfall.
ImportancePitiful795@reddit
Tbh I'm trying to find any store doing 6 months' financing for it, as I don't want to pay all the money upfront. 😁
ImportancePitiful795@reddit
Well, I have a 395 and still want a DGX to get my hands on, but raising the price just annoyed me.
JohnSane@reddit
The days of cuda dominance won't last forever.
i_am__not_a_robot@reddit
This CUDA dominance that we're seeing today is entirely due to Nvidia's competitors' historical negligence and complete failure to provide even the most basic developer support needed to make their architectures viable for GPGPU computation. And this goes way back to the early GPGPU days of the mid-to-late 2000s when it was CUDA vs OpenCL.
SkyFeistyLlama8@reddit
OpenCL is as good as dead and so is Vulkan, at least for ML work. It hurts to say this but CUDA is the ten ton elephant in the room squashing everyone else flat.
Altruistic_Heat_9531@reddit
"The best product nvidia has ever made is its GPU, second best product is their CUDA"
i_am__not_a_robot@reddit
You could even turn this around. There were plenty of times in the past when AMD's GPU offerings (in terms of raw compute-to-price ratio) were superior. I specifically remember the FirePro W9100/S9150/S9170 vs. Quadro K6000, Tesla K40/K80 around 2014-15, but the developer support for anything non-gaming (and, from what I heard, even gaming, but that's not my area of expertise) sucked so bad, we still went with Nvidia for prototyping, and guess what, once viability was established on CUDA, nobody bothered to port it, we just deployed on Nvidia GPUs.
SomeoneSimple@reddit
I remember this; you'd get laughed out of the Linux forums with an AMD GPU. In the early 2010s, for gaming, the closed-source 'fglrx' driver was absolutely terrible compared to the closed-source NV drivers; they were in an even worse position than the CUDA vs ROCm/Vulkan situation right now.
The open source 'radeon' driver didn't even support 3D.
Altruistic_Heat_9531@reddit
You don't have to mention the past: the MI350X on paper (before the B300) is basically a more capable B200. The MI300X is also a 192GB monster, and that was in 2023, mind you.
I've worked on the MI200 personally, and installing anything is an ultra pain in the neck.
MDSExpro@reddit
You couldn't be more wrong. OpenCL is constantly growing; Khronos provides nice yearly snapshots. It just grows in the professional space, so the average redditor can't see that and repeats nonsense.
SkyFeistyLlama8@reddit
Is OpenCL being used for machine learning applications? I doubt it.
MDSExpro@reddit
The Aurora supercomputer runs ML workloads via OpenCL (wrapped in Intel's framework, but still), to name one.
SkyFeistyLlama8@reddit
Come on, we're talking about home llamaistas here, not HPC.
ImportancePitiful795@reddit
ROCm and MLX, which are both supported by AMD. (MLX on AMD is in beta right now.)
OverclockingUnicorn@reddit
Ten ton elephant in a room full of ants.
AMD and Intel are getting close to Nvidia for inference if you are a competent user, but they are next to useless for any serious training workloads.
conockrad@reddit
Just a reminder that this "Nvidia competitor" was originally an "Intel competitor".
i_am__not_a_robot@reddit
Technically, this failure is on ATI, not AMD, since AMD's acquisition of ATI happened just around the time CUDA was first released. So, no, this is not entirely accurate.
madaerodog@reddit
For sure, but 20 years of Nvidia supporting the community and offering real performance won't disappear overnight either, especially since it continues.
JohnSane@reddit
NVIDIA supporting the community... Sorry but that gets a big fat ROFL from me.
RoomyRoots@reddit
Nvidia is infamous in the FOSS community, that is a fact. As a Linux user I still went with AMD because I can't trust Nvidia. Do I regret the decision? Kinda: their GPUs were significantly cheaper, and at least since ROCm it has been much easier to use them. But CUDA support isn't comparable and won't be for years and years.
CheatCodesOfLife@reddit
AMD MI50 (Vega 20) -> Released 2019 -> EOL 2025 (no ROCm 7 support without dodgy hacks)
Nvidia GTX 750 Ti (Maxwell) -> Released 2014 -> EOL 2025 (CUDA 12.x still works)
So Nvidia supported a cheap gaming/HTPC card for 11 years, while AMD barely made it to 5 years for a high-end data center card.
I'm happy to buy the cheap landfill AMD cards (MI50), but why would I risk real money/effort on something serious when they can't be bothered to support their cards for long?
-main@reddit
The community here is ML researchers, not localllama users.
MoistRecognition69@reddit
True
But the day they stop dominating isn't here yet :|
That's like saying that buying a PS1 is a good choice because the PS5 will be awesome a few years down the line
ANR2ME@reddit
Can't it use ZLUDA if CUDA is really needed? 🤔
protestor@reddit
ROCm doesn't support Strix Halo? That's stupid.
Prof_ChaosGeography@reddit
For the vast majority of users, no CUDA is fine and doesn't make a difference. Vulkan works just fine for inference. There's ROCm, AMD's CUDA compatibility layer, for those who need CUDA. ROCm is great for 99% of things.
deseven@reddit
No, Vulkan is typically 5-30% slower, also not everything works on Vulkan.
No, you still can't run a lot of things easily or at all, or without major speed penalties.
No, ROCm is still notoriously unstable, people often pick Vulkan over ROCm to trade speed for stability.
Due-Memory-6957@reddit
More things work in Vulkan than in anything else.
deseven@reddit
If we're focusing on LLMs specifically - yes, true.
CatalyticDragon@reddit
It has a clone of CUDA called HIP, which is a large part of why everything runs on it: if you can write for one, you can write for both.
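A concrete example of that compatibility from the Python side: ROCm builds of PyTorch keep the familiar `torch.cuda` namespace and back it with HIP, so CUDA-flavored code often runs unmodified. A minimal sketch (assumes a ROCm build of PyTorch and a supported AMD card):

```python
# On ROCm builds of PyTorch, torch.cuda is backed by HIP, so this
# CUDA-flavored snippet runs unmodified on a supported AMD GPU.
import torch

if torch.cuda.is_available():                   # True on ROCm builds too
    print(torch.cuda.get_device_name(0))        # e.g. an AMD Radeon device
    x = torch.randn(1024, 1024, device="cuda")  # allocated on the GPU via HIP
    y = x @ x                                   # matmul through rocBLAS, not cuBLAS
    print(y.shape, y.device)
```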
Mollan8686@reddit
Hence it’s better…?
Slasher1738@reddit
It'd better have a better NIC than 10G RJ-45.
dragoon7201@reddit
Dear Lord! Does this mean the abacus will be obsolete in the future?
MoneyPowerNexis@reddit
This is a slide rule moment
Quartich@reddit
I keep an abacus at both my work and home desks
xXprayerwarrior69Xx@reddit
I am a bit worried they're trying to make fingers redundant. Is that the next thing AI will take from us?
OkFly3388@reddit
I wish they'd revive their idea of an SSD attached to the GPU.
Honestly, if you had like 1TB of storage for model weights, and all the other RAM just for context, it would be so overpowered.
am2549@reddit
Yeah, but that's slow. You need RAM because it's fast and leads to high t/s.
OkFly3388@reddit
It's something like: a top-tier SSD is 10x slower than RAM and 100x slower than GPU RAM.
So if you place like 10x 128GB SSD chips, you can have the same bandwidth as RAM, and it should work at the same speed as this "unified" memory, no?
holchansg@reddit
Memory controllers don't work that way.
CPUs have like 128-bit or 256-bit memory controllers, the good ones, and this is a physical barrier. They can be in many configurations, quad channel, octa channel, but the barrier is physical. There's also a limit to how fast they can run, meaning how fast a RAM they accept; you can't push 1000GB/s on a Strix Halo by adding faster memory, or more channels...
Same goes for GPUs: each has a memory controller that goes up to X and you can't do anything about it, be it DDR, LPDDR, GDDR or SSD chips.
Also, PCIe 5 x16 for example is slow, slower than dual-channel LPDDR5X.
So the inference must be local, either in the CPU memory (and yes, you can have faster memory by adding LPDDRs, having more channels, and soldering the memory chips on top of the CPU, reducing latency and improving signal integrity, but at the end of the line the chip's memory controller is the ultimate barrier).
SSDs over PCIe will always be slower than RAM, no matter how many you add in "RAID", because the line they talk over, PCIe, is slow.
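The rough peak numbers behind that argument, for anyone who wants to check (theoretical figures computed from bus width and transfer rate, not measurements):

```python
# Theoretical peak bandwidth: bus width (bits) / 8 * transfer rate (MT/s).

def peak_gb_s(bus_bits: int, mt_s: int) -> float:
    return bus_bits / 8 * mt_s / 1000

print(peak_gb_s(256, 8000))  # Strix Halo, 256-bit LPDDR5X-8000: ~256 GB/s
print(peak_gb_s(128, 6000))  # typical dual-channel DDR5-6000:    ~96 GB/s

# PCIe 5.0 x16 tops out around ~64 GB/s per direction, so no stack of
# SSDs hanging off the bus can feed a GPU faster than plain system RAM.
```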
OkFly3388@reddit
And I don't really see where anything you said invalidates my point.
Like, on a conceptual level nothing stops you from placing SSD chips near the GPU die and allocating some transistor budget to a separate controller that manages them.
holchansg@reddit
Prohibitively expensive, prohibitively bigger, and it would make the entire setup slower and more prone to impedance and signal-integrity problems.
am2549@reddit
I think the factor is 25-50x, but that includes the pipeline, not sure about the chip itself.
ServiceOver4447@reddit
AMD doesn't even have a factory; they outsource everything.
FunkyMuse@reddit
So how will this be different from Strix Halo or the DGX Spark? No info, I guess.
DarkArtsMastery@reddit
You mean computing tokens at home? Sounds illegal to me.
triynizzles1@reddit
Lots of "it's not X, it's Y" slop in this script…
AnotherAvery@reddit
What can they do? This product isn't a lot of things! ;-)
Euphoric_Emotion5397@reddit
Yeah, I'm holding off on buying a Strix Halo... because I'm sure the next half of the year will bring new and better products that will really be good enough for the next phase of AI.
TerryTheAwesomeKitty@reddit
My gosh! A machine that does computations for you? What's next, a personal version of the computer??? Madness!!!
Southern_Sun_2106@reddit
They 'solve problems' for you too, the man said. Revolutionary!
gitsad@reddit
Next they'll do a product called the "smartphone".
sk1kn1ght@reddit
I hear that if they continue, they might be able to put it on your wrist as well. Some lunatics even say they can make a frame you put on your nose... Blasphemers!
seppe0815@reddit
no cuda no fun bye bye amd
toolsofpwnage@reddit
Hear me out: what if we use this "computer" to compute interactive motion pictures?
putrasherni@reddit
I think 256 or 512GB RAM Strix Halo variants would be nice.
_wOvAN_@reddit
1TB of unified DDR5 RAM with an x16 bus would be nice.
ImportancePitiful795@reddit
I don't get all this brouhaha. It's just an AMD 395 miniPC. 🤔
Except if they're using the new 495, which is basically the same as the 395 but with 8533MHz RAM. It would also be interesting to see the BIOS settings, because all 395 miniPCs came with 8533MHz RAM downclocked to 8000MHz. 🤔
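For what that downclock is worth, the same bus-width arithmetic as elsewhere in the thread puts the gap at about 7% (theoretical peaks, not measurements):

```python
# Peak bandwidth on the 256-bit bus these machines use, at both clocks.
for mt_s in (8000, 8533):
    print(mt_s, 256 / 8 * mt_s / 1000, "GB/s")  # 256.0 vs ~273.1 GB/s
```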
TwoPlyDreams@reddit
100% Spark RAM with 50% Spark performance at 50% Spark cost would be very compelling.
ImportancePitiful795@reddit
Actually it's an AMD 395. So 50% of the Spark cost at the same performance 😂
patricious@reddit
This hardware means absolutely nothing without the software stack support. The usual AMD fumble.
ImportancePitiful795@reddit
It's basically an AMD 395, nothing else. And it has the software stack. Hell, right now there's a closed beta by the AMD Lemonade team using MLX on the AMD 395.
Interesting_Key3421@reddit
Just give us fast RAM in more power-efficient computers at a reasonable price.
qado@reddit
r/LocalLLaMA, posted by AMD :-D
SrijSriv211@reddit
You mean Xbox?
Terminator857@reddit
Did he say there is going to be an announcement April 30th?
alfpacino2020@reddit
What? Hahahaha
https://i.redd.it/adehqb2yh2yg1.gif