non-nvidia gpus
Posted by Ok-Secret5233@reddit | LocalLLaMA | 43 comments
Because I'm cheap, I'm seeing if non-nvidia gpus are worth the effort.
Here's the article that got me thinking: https://www.hardware-corner.net/huawei-atlas-300i-duo-96gb-llm-20250830/
Anybody want to add anything from experience?
floconildo@reddit
I'm also cheap, but the software maturity for these alternative GPUs pushed me back from buying one. If ROCm support is still somewhat wobbly for Strix Halo after a year, I can only imagine what it looks like for CANN. It has some potential though, and the Chinese (esp. Huawei) usually catch up fast to developments.
If you just want raw power on non-Nvidia consumer-level hardware, then Strix Halo, the B70, or just a plain old Mac might be your best bet. Memory bandwidth is already an issue on the first two, and the article you shared ain't exactly making a good case for the 300i if you ask me.
Confident_Ideal_5385@reddit
ROCm support for gfx1100 (7900xtx etc) seems pretty solid with ROCm 6.2. I'm not regretting the purchases at all.
Vulkan on Linux with TTM is another story entirely. A sad saga of betrayal by a memory manager that optimises memory allocation to avoid dropping frames in Wayland, at the expense of compute allocations. On Linux, stick to ROCm if you plan to use more than about 80% of your VRAM.
floconildo@reddit
ROCm support is not bad, but it can take a while for things to roll out to AMD devices (even more so if you're running an APU like me). But that's alright; Strix Halo is for my personal projects and I don't regret buying it at all, especially when I check my electricity bill haha.
Not to blame AMD engineers at all, ofc. I fully understand that it's a hard game of catch-up, and honestly they've been doing a great job so far, all things considered. Swimming against a sea of CUDA users must be tiring, and I really hope other departments at AMD are doing their part to increase adoption.
ZCEyPFOYr0MWyHDQJZO4@reddit
If your goal is only to run consumer-level models, then you probably shouldn't get a non-Nvidia GPU or Apple system. As an independent developer you can't replicate the engineering effort necessary to get the hardware working in the first place. There is no free lunch here.
Confident_Ideal_5385@reddit
As an independent developer, it's non-trivial but not at all impossible to get this stuff working, and even get your PRs merged.
Total yak shaving compared to, y'know, running LLMs tho.
Ok-Secret5233@reddit (OP)
Right, unfortunately I suspect that is going to be the bottom line for me.
ZCEyPFOYr0MWyHDQJZO4@reddit
Things used to be so much cheaper a year ago.
SSOMGDSJD@reddit
What is your use case?
That Huawei card is a dead end; it needs a Chinese CPU and mobo to function. See the relevant Gamers Nexus video: https://youtu.be/qGe_fq68x-Q?si=71WWyt6NcFXVyTG9
The cheapest GPU I would consider is a V100 16GB SXM2 ($100), plus an SXM2-to-PCIe adapter ($50-100) and an Arctic P8 Max to cool it ($10). The V100 SXM2 32GB fits much bigger models but is $500 these days. Cheaper than that, an MI50 32GB can be found on Alibaba for around $400 as of last month.
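Totting up that budget path (prices as quoted in the comment; I'm taking the adapter at the midpoint of its $50-100 range):

```python
# Rough cost of the budget V100 build, using the prices quoted above.
gpu = 100      # V100 16GB SXM2
adapter = 75   # SXM2-to-PCIe adapter, midpoint of the $50-100 range
fan = 10       # Arctic P8 Max

total = gpu + adapter + fan
print(f"~${total} for 16GB of HBM2")  # ~$185 for 16GB of HBM2
```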
Intel just isn't there price-to-performance wise; the B70 is a grand and is worse than a V100 32GB for bandwidth and driver support, for more money. Maybe it improves, idk; Intel has a bad track record of supporting their promising tech though. If you're spending a grand, get a 3090 24GB or a V100 32GB.
The A770 16GB ($250) sounds interesting, but it's more than a V100 16GB for worse bandwidth and more jank.
The MI50 16GB is like $150 on eBay, but you might as well cop the Nvidia support with the V100 16GB for a few more dollars.
Tldr: just get a V100.
Ok-Secret5233@reddit (OP)
Why are some V100 open and others closed? Looking at the results on ebay, the open ones look substantially cheaper. Is it just aesthetics or...?
SSOMGDSJD@reddit
Not sure what you mean? The skinny flat gray ones do not have a heatsink; you would need to get one to attach to the GPU, and the spring screws can be annoying. I would recommend getting one with the heatsink attached; then you just need the SXM2-to-PCIe adapter, and probably a riser cable and bracket unless you turn your PC on its side so that it stands straight up out of the PCIe slot. It's a big chunky boi.
Ok-Secret5233@reddit (OP)
I mean, what's the difference between these two?
https://www.ebay.co.uk/itm/167792617369
https://www.ebay.co.uk/itm/198270108538
SSOMGDSJD@reddit
The first one is SXM2. It uses a proprietary interface instead of PCIe, meant for datacenter servers.
The second is PCIe native; note the gold teeth at the bottom. It plugs straight into a PCIe slot. Easier, but generally more expensive. I believe you still need to supply cooling with a fan and a shroud.
For the first one, you'll need something like this: https://ebay.us/m/mcekFn which bridges the PCIe lanes from your motherboard to the SXM2 pins on the GPU. For that particular one you linked, you would also need an SXM2 V100 heatsink and thermal paste, as well as a fan like an Arctic P8 Max (high static pressure) to cool it.
Ok-Secret5233@reddit (OP)
Thank you!
Hedede@reddit
The issue with V100s is that they have very high idle power. I left 4x V100 idling for a day, and in 24 hours they consumed almost 10 kWh just from idling.
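For what it's worth, that figure is easy to sanity-check (the 10 kWh and 24 h numbers are from the comment; the electricity rate is my own rough assumption):

```python
# Back-of-envelope check of the reported idle draw: 4x V100, 10 kWh in 24 h.
energy_kwh = 10
hours = 24
num_gpus = 4
rate_usd_per_kwh = 0.16   # assumed average US residential rate

total_watts = energy_kwh * 1000 / hours        # average draw for the whole rig
watts_per_gpu = total_watts / num_gpus         # per-card idle draw
daily_cost = energy_kwh * rate_usd_per_kwh     # daily cost at the assumed rate

print(f"{total_watts:.0f} W total, {watts_per_gpu:.0f} W/GPU, ${daily_cost:.2f}/day")
# 417 W total, 104 W/GPU, $1.60/day
```

So roughly 100 W per card doing nothing, which is indeed very high idle power.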
FullstackSensei@reddit
Here's a crazy idea: shut the thing down when not in use. That will beat even the most frugal idle power consumption.
It takes 5 minutes at most to start up and load a model. You can use Wake-on-LAN, or IPMI if your board has it, to wake the system. Pair it with Tailscale or a VPN, and you can start it from anywhere on the planet.
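Wake-on-LAN doesn't even need special tooling; the "magic packet" is just 6 bytes of 0xFF followed by the target NIC's MAC address repeated 16 times, sent as a UDP broadcast. A minimal sketch (the MAC below is a placeholder; substitute your server's):

```python
import socket

def build_magic_packet(mac: str) -> bytes:
    """Wake-on-LAN magic packet: 6x 0xFF, then the MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC must be 6 bytes")
    return b"\xff" * 6 + mac_bytes * 16

def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet on the local network."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(packet, (broadcast, port))

# wake("aa:bb:cc:dd:ee:ff")  # placeholder MAC; use your server's NIC address
```

Remember WoL usually has to be enabled in the BIOS and on the NIC first.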
SSOMGDSJD@reddit
Tailscale is so goated. Checking on Claude Code (wrong sub to be mentioning this, I know) on my local computer while I'm out and about is peak, except for mosh eating my terminal history lmao. Small grievances.
sekh60@reddit
Burn the heretic!
FullstackSensei@reddit
🤷🏻‍♂️
SSOMGDSJD@reddit
Fair point, but that's like a dollar per day at avg US residential rates for 4 GPUs.
xandep@reddit
2x MI50 16GB w/ integrated cooling for 200-something on Alibaba. Can run Qwen 3.5 35B, 27B, and the new Gemmas. Or just one if you are ultra cheap, running the 35B w/ ncmoe (some 27B and 26B quants if willing to quantize to Q3, IQ4 tops).
ccbadd@reddit
I think I would get a 32GB V620 for about $400 and add a $25 cooler just so I would only need 1 slot and have a card that is still officially supported by ROCm.
International-Try467@reddit
AMD GPUs work with ROCm and Vulkan
nakedspirax@reddit
Yeah they work.
For certain use cases like image/video generation, NVIDIA wins by a mile + some
Fit-Produce420@reddit
In speed.
I generate images and video to the max length supported by Wan, LTX, etc.
It just takes longer.
adeadfetus@reddit
That's the same as saying CPU and RAM are just as good as a GPU, except they take longer.
Fit-Produce420@reddit
No, it's not.
You can't run ROCm or Vulkan on your CPU+RAM.
You're completely ignoring how APUs work.
adeadfetus@reddit
You completely missed the point but ok.
RoomyRoots@reddit
ROCm got much better, like much, much better. Sure, that's partly because it was laughable some years ago, and there is still lots of room to grow, but if you get a compatible card it's not hard to set things up.
fallingdowndizzyvr@reddit
You won't get better price-to-performance than a V340: 16GB of VRAM for $49. And now with tensor parallelism (TP) in llama.cpp, you can TP across both GPUs on that card.
LankyGuitar6528@reddit
Where are you finding one for $49?
fallingdowndizzyvr@reddit
Ebay. There are a couple of sellers that sell them for $49. Don't pay more. While they claim they are "used", the seal on the static bag was intact on mine. And there wasn't a speck of dust on it or even any wear on the fingers. So mine seemed new.
https://www.ebay.com/itm/306835007605
LankyGuitar6528@reddit
Thanks!
Several-Tax31@reddit
Yeah, seems incredibly cheap to me. You cannot get regular RAM for those prices, no?
jpedlow@reddit
And don't forget the Intel B70 just got released.
semangeIof@reddit
I'm still amazed this card sold out its initial wave on Newegg so quickly. Even though it has 32GB of VRAM, it has low bandwidth and a fairly slow chip.
Intel cards also run inefficiently on Vulkan as of now, and SYCL is hardly mature. Some models run okay when Intel works directly with a vendor (e.g. Gemma 4 on vLLM), but you still get slower tok/s compared to even a legacy card like an RTX 3090 because of a) a super low-power chip, b) low memory bandwidth, and c) CUDA being so superior to Intel's ecosystem.
There is a reason you can trip over B60s wherever you look, and it is the same reason the B70s will not sell out again following their restock on April 24.
CelvestianNesy@reddit
Unfortunately, driver support is experimental, and we'll have to wait for Intel to add more support; finicky stuff. Good VRAM but, yeah.
no-adz@reddit
1300 euros!
leonbollerup@reddit
wait a min.. you are cheap.. and want to play with AI?!? HAHAHA...
Be like the rest of us.. be poor.. but with a shit-ton of cool hardware that we use to create pictures of... cats! ;)
overflow74@reddit
Okay, the Ascend hardware is really nice, but their software (the CANN toolkit) isn't really mature enough compared to CUDA. You'll find yourself struggling a lot with errors and wasting time fixing things that you wouldn't normally face with a normal Nvidia GPU. However, if you want, you could try it out first on Huawei's cloud with the MindSpore framework; they have a clone of everything for the Ascend hardware.
overflow74@reddit
In addition, you'll have limitations on what you can run, e.g. the vLLM Ascend supported models.
Also, quantization/training is usually supported only for a specific set of cards (don't remember the list exactly), so heads up haha.
Nexter92@reddit
You say "I am cheap"; if you are, then you are a slave? 😄
Maybe "I am poor", no?
666666thats6sixes@reddit
"I'm cheap" in this context means stingy, frugal, as in j'suis radin
Creepy-Bell-4527@reddit
No, he meant he is cheap. And often rich people are some of the cheapest bastards you'll ever know.