AMD R9700: yea or nay?
Posted by regional_chumpion@reddit | LocalLLaMA | View on Reddit | 61 comments
RDNA4, 32GB VRAM, decent bandwidth. Is ROCm an option for local inference with mid-sized models or Q4 quantizations?
| Item | Price |
|---|---|
| ASRock Creator Radeon AI Pro R9700 R9700 CT 32GB 256-bit GDDR6 PCI Express 5.0 x16 Graphics Card | $1,299.99 |
ForsookComparison@reddit
The W6800 Pro has sat on used markets at about this price for over a year now.
This is that but probably with some better Prompt Processing and a hair faster inference.
If the W6800 never excited you (or you never came across it), you don't have to put much thought into the R9700, unless prompt processing was your only big blocker.
AeroelasticCowboy@reddit
For some use cases, like a self-hosted voice pipeline replacing Google Home / Alexa devices in a smart home, prompt processing speed is all that matters. Even the simplest request requires sending the entity state for all your home devices as context, so something like "turn on the basement lights" is often a 6,000-9,000 token prompt followed by maybe a 150-token response. Speed matters because you don't want users waiting 5 seconds for a response.
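The latency arithmetic behind that point can be sketched in a few lines. Time-to-first-token for such a request is dominated by prefill: context tokens divided by prompt-processing speed. The speeds below are illustrative, not measured R9700 figures.

```python
# Time-to-first-token (TTFT) for a smart-home voice request is dominated
# by prompt processing: context tokens / prefill speed.
# Prefill speeds here are illustrative, not measured R9700 numbers.

def ttft_seconds(prompt_tokens: int, prefill_tps: float) -> float:
    """Seconds spent prefilling the prompt before the first output token."""
    return prompt_tokens / prefill_tps

print(f"{ttft_seconds(8000, 800):.1f}s")   # 10.0s: far too slow for voice
print(f"{ttft_seconds(8000, 4000):.1f}s")  # 2.0s: borderline acceptable
```

At these prompt sizes, doubling prefill throughput cuts perceived response time roughly in half, which is why prompt processing can matter more than decode speed here.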
ForsookComparison@reddit
In the months since the R9700's launch, I've come around to a similar opinion. Nearly 2x prompt processing (sometimes more?) is significant.
_WaterBear@reddit
Got the card. ASRock version. It is the most affordable way to get 32GB of VRAM with solid performance. ROCm is definitely not as user-friendly as CUDA and is lacking in features, but AMD is rapidly catching up (seriously, the past 6 months have narrowed the gap in performance and compatibility drastically). If you only need inference, are comfortable working in Linux, and don't want to shell out $3k for an NVIDIA option, then this is a good choice.
It might also be a good choice if you want an affordable solution but don't want to risk the market becoming even worse in the near term.
Only issue I had with the card is a serious fan rattling/grinding problem under heavy sustained load (not to be mistaken for coil whine).
I just swapped out for a new unit, so fingers crossed this isnt an ASRock design issue…
CyclonusDecept@reddit
How is it for gaming ?
sascharobi@reddit
Suboptimal. The blower [and price] will be a dealbreaker for many.
_Cyclimse_@reddit
Sorry to comment on a 2-month-old comment, but a friend is gonna gift me the same GPU.
I've been trying to check compatibility with other components (mainly motherboards), but this card doesn't seem to be listed anywhere :/
I'll mostly use it for gaming/video editing and was wondering if you have any lead on a compatible MB? I know it might be a little overkill, but I guess I'll have no issue running stuff like CP2077 in QHD, right?
Thank you for your time, have a great day!
(Might be a good time to start learning about LLM as well, this community seems pretty chill :) )
Zeikos@reddit
I have been literally waiting for it to hit the DIY market for months.
It'll take a while more to become available in the EU, hopefully it won't get scalped to oblivion.
sascharobi@reddit
Did you buy one?
Zeikos@reddit
Two actually :>
Coincidentally I installed them this weekend
sascharobi@reddit
Did you get the ASRock ones? How happy are you with them?
Zeikos@reddit
I got the Sapphire ones.
Too early to tell; I just plugged them in, and currently they're working fine.
But I've only played Minecraft since getting them, so I haven't put them under load.
I am on Linux Mint, by the way.
This coming weekend I'll get a container set up with ROCm and give them a more rigorous spin.
sascharobi@reddit
Any particular reason you got the Sapphire, or just price or availability?
Zeikos@reddit
Availability, it was what I could get, that's all.
sascharobi@reddit
Yeah, same here. First I ordered the cheapest one here, a PowerColor. After I had paid, it was suddenly out of stock. Though, I suspect what they really meant but didn't want to say is that it was out of stock for the price I ordered it for. Then I ordered the next cheapest one, the ASRock one, somewhere else one week later. After I had paid, they told me as well they suddenly have no stock anymore. They could only source me a Sapphire for the same price. Well, online the price for the Sapphire one had already climbed up, so I just took it.
RottenPingu1@reddit
Glad I found your post. Have you had any issues running different partner cards together?
I have a Sapphire on order, but it's hard to find another one. I'm looking at ASRock.
Thoughts?
Zeikos@reddit
I grabbed two basically the hour the offer went up, and got them at 50 euros above MSRP.
The order got delayed for about a month before shipping. They insistently offered me a refund; I politely answered that I was okay with waiting, and eventually it arrived.
I expected it anyways.
Creative-Struggle603@reddit
It is already available in the EU (low stock). More brands are incoming this month.
Ssjultrainstnict@reddit
Captured some benchmarks in my thread: https://www.reddit.com/r/LocalLLaMA/comments/1on4h8q/amd_ai_pro_r9700_is_great_for_inference_with/
sascharobi@reddit
> long term support
I hope AMD doesn't let us down once they have a new series of GPUs.
sascharobi@reddit
Did you get one?
Now it's $1,349.99.
regional_chumpion@reddit (OP)
I did, for the original price from Newegg a couple of days after I posted, I think. Took me a while to test though, and now it’s back in the bag waiting for parts for an AMD Epyc build (the little AM5 Epyc, not one of the big boys).
sascharobi@reddit
How is the blower of the ASRock? Well, I suspect all R9700s are the same apart from the design.
regional_chumpion@reddit (OP)
It’s louder than my RTX Pro blowers (4000, single slot so maybe not directly comparable) if that reference helps, but I’m not sure if that’s because of its design or just because AMDs runs hotter than Nvidia gpus. I suspect the latter. It ramps up much earlier during inference and stays in high rpms for longer.
sascharobi@reddit
> It ramps up much earlier during inference and stays in high rpms for longer.
But doesn't it come down during idle times?
regional_chumpion@reddit (OP)
It does, but it takes its sweet time to slow down. I'm used to Nvidia workstation stuff operating at lower power than gaming cards; it seems the R9700 is more like a gaming card with a smaller cooler on it. The noise isn't obnoxious, it just comes up sooner and stays longer.
mustafar0111@reddit
It's a decent card, but they have it priced almost $300 too high for what it is.
sascharobi@reddit
Yes, but in today's market, the pricing doesn't look that bad anymore.
Rich_Artist_8327@reddit
Yes, 4x 7900 XTX at 600€ each is OK.
Rich_Repeat_22@reddit
However, the 7900 XTX is much slower for LLM workloads, and it has no ECC.
RnRau@reddit
Eh? The 7900 XTX has higher memory bandwidth than the R9700.
Rich_Repeat_22@reddit
And? That doesn't mean it's slower.
Also, RDNA4 has a lot of enhancements when it comes to matrix computations: it supports FP8 and BF8 with improved performance, and the R9700 comes with ECC VRAM.
The R9700 is even 50% faster than the RTX 3090 at dense FP16 matrix, and the 3090 is generally faster than the 7900 XTX.
MixtureOfAmateurs@reddit
And... LLM inference is memory bound. Faster memory means faster inference. There's a degree of compute bottlenecking, and driver optimisation where the 9000 series would have an edge, but 644 GB/s vs 960 GB/s is too big a gap.
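The memory-bound argument is easy to sketch: during single-stream decode, every generated token streams all active weights through VRAM once, so bandwidth sets a hard ceiling on tokens/s. The model size below is illustrative (a ~30B-parameter model at a Q4-ish quantization), not a benchmark.

```python
# Rough upper bound on single-stream decode speed for a memory-bound LLM:
# each generated token reads every active weight from VRAM once, so
# tokens/s <= memory bandwidth / model size in bytes.
# Ignores compute, KV-cache reads, and runtime overhead.

def decode_tps_upper_bound(bandwidth_gb_s: float, model_gb: float) -> float:
    """Ideal tokens/s ceiling implied by memory bandwidth alone."""
    return bandwidth_gb_s / model_gb

model_gb = 17.0  # illustrative: ~30B params at ~4.5 bits/weight

r9700_ceiling = decode_tps_upper_bound(644.0, model_gb)  # R9700: 644 GB/s
xtx_ceiling = decode_tps_upper_bound(960.0, model_gb)    # 7900 XTX: 960 GB/s

print(f"R9700 ceiling:    ~{r9700_ceiling:.0f} tok/s")
print(f"7900 XTX ceiling: ~{xtx_ceiling:.0f} tok/s")
```

Real-world numbers land below these ceilings, but the ratio between two cards' decode speeds tends to track the bandwidth ratio, which is the point being argued here; prefill, by contrast, is compute-bound, where RDNA4's matrix units help.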
Rich_Artist_8327@reddit
Yes and no. Wide memory bandwidth does not always mean faster inference. There are many factors.
Rich_Repeat_22@reddit
Yet:
The M3 Ultra is slow even though it has a lot of GB/s.
The 5090 has 70% more bandwidth than the 4090, yet it is only ~35% faster on average, which tracks its +30% more cores and +15% higher clocks.
The R9700 is faster than the RTX 4500 Blackwell, even though the latter has 2.5x more bandwidth.
Tell me why?
RnRau@reddit
What benchmarks are you referencing? Do you have a link?
Rich_Artist_8327@reddit
does ECC speed up inference?
shing3232@reddit
No, but ECC is needed for production deployment.
Rich_Artist_8327@reddit
Are there R9700 inference benchmarks somewhere? I've seen some YouTube videos.
sascharobi@reddit
Did you find some?
Long_comment_san@reddit
It's far too expensive. I expect the 5000 Super cards to release and then this thing drops to $1,100 max, optimally $900. It's about 5070 / 5070 Ti Super level of performance with 8GB of extra VRAM, but no CUDA and no 4-bit precision, with a lot of driver shenanigans, for an extra $300. It's not an amazing deal. $900 is where it becomes fair, and $800 is where it starts to undermine a hypothetical 5000 Super. But AMD charges a huge premium because there's no competition. The dual Intel B60 with 48GB VRAM at $1,600 is exactly the same story.
grabber4321@reddit
Yeah, that's not happening any time soon.
EvilPencil@reddit
I kinda disagree here. You're not discussing many of the features that separate "professional" cards from "gamer" cards, such as ECC and certified drivers. IMO the R9700 is more comparable to something like the RTX Pro 4000 Blackwell. On paper the R9700 punches well above the 4000, with an extra 8GB of memory, for less money.
Of course that also means stepping away from all the benefits of the CUDA ecosystem. From that lens I'd say it's fairly priced.
Long_comment_san@reddit
Well, we're looking at the mainstream/enthusiast segment at this price range. Assume people like me are going to buy this for local usage, and there's giant demand for local runners. But this comment aged like milk with the news that the Super refresh has slipped by a year. The R9700 is totally competitive without Supers as competition.
AppearanceHeavy6724@reddit
...and 650 GB/s bandwidth. For $300 extra.
NTFSynergy@reddit
I am confused why nobody talks about how hard it is to work with ROCm. Getting it to run is one thing; getting it to run well is a whole other level.
The main priority of ROCm is the MIxxx cards; the PRO is a consumer card (9070 XT) on VRAM steroids. It still has problems almost 3/4 of a year after release, PyTorch on RDNA4 is a performance rollercoaster, and Vulkan-based llama.cpp has better performance than ROCm. From experience, the PyTorch TunableOp variable must be set to get decent performance, but that has its own caveats. Tuned GEMM kernels are still not a thing on RDNA4.
I have owned a 9070 XT since March and went through all the pain: before ROCm 6.4.4, using TheRock, switching Linux kernels... Be aware that ROCm needs an older kernel (with HWE) than the latest stable. And the documentation was so broken, full of contradictions and mistakes. It got better, but, for example, you still have to take a wild guess which version of the PyTorch wheels you need to install: the official stable, the nightly, or the ROCm fork on the AMD repo (and there are two repos)? It is absolutely not "BFU" friendly.
jumpingcross@reddit
How feasible is it to just solely use Vulkan for anything involving an AMD GPU and avoid ROCm entirely?
teleprint-me@reddit
Depends. It's up to AMD to work with Khronos to release driver support. You can look it up, but it's not always accurate (unfortunately):
https://vulkan.gpuinfo.org
lly0571@reddit
Basically an AMD version of a 4080 Super with 32GB. Good if you need a warranty and can solve the possible software issues.
Baldur-Norddahl@reddit
Get a motherboard with PCIe 5 and 4x R9700. A consumer motherboard will only give you x8 lanes per card for this, but that is probably OK since we are working with slower cards. With tensor parallel, we are looking at a combined memory bandwidth of roughly 2.6 TB/s (4x 644 GB/s) and 128 GB of VRAM, for considerably cheaper than an RTX 6000 Pro (especially if you include the whole system).
Only_Situation_4713@reddit
It's slower than a 3090 and doesn't offer FP4. The 3090 can emulate FP8 and it's almost twice as fast. Also less of a headache...
Terminator857@reddit
Faster than a 3090 for models that fit in 32GB of VRAM but not 24GB, such as the popular Qwen3 Coder 30B at int8/fp8.
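The arithmetic behind that niche is straightforward: an 8-bit quantization needs roughly one byte per parameter for the weights alone, before KV cache. A crude fit check (overhead figure is illustrative, and KV cache is deliberately ignored):

```python
# Why 32GB vs 24GB matters: a ~30B-parameter model at 8 bits/weight needs
# roughly 1 byte per parameter for weights, before KV cache and overhead.

def fits(vram_gb: float, params_b: float, bits_per_weight: float,
         overhead_gb: float = 1.5) -> bool:
    """Crude fit check: weights + fixed overhead vs VRAM (KV cache ignored)."""
    weights_gb = params_b * bits_per_weight / 8
    return weights_gb + overhead_gb <= vram_gb

print(fits(32.0, 30.5, 8))  # R9700 (32GB): True, just barely
print(fits(24.0, 30.5, 8))  # 3090 (24GB):  False
```

In that window, the 3090 has to drop to a smaller quant (or spill layers to system RAM, which is far slower), so the card with more VRAM wins despite lower bandwidth.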
PaulMaximumsetting@reddit
I’ll have to give a vLLM model a try next. GGUF models are usually a bit slower.
Qwen3-VL-32B-Instruct-UD-Q6_K_XL.gguf
KillerQF@reddit
The 3090 is 35 TF FP16;
the R9700 is 97 TF FP16.
The latter can likely emulate FP4 or FP8 faster.
Where the 3090 is better is bandwidth:
936 GB/s vs 645 GB/s.
One is new with 32GB, the other is used with 24GB.
Tyme4Trouble@reddit
The 3090 is 142TF dense FP16 matrix.
KillerQF@reddit
Thanks for the correction
3090 - 142 tf
R9700 - 191 tf
b3081a@reddit
FP8 Marlin kernels are way slower than native and nowhere near the theoretical tensor performance. If all you want is single-user decode performance (rather than batch decode/prefill), then the 3090's bandwidth is much more favorable, though.
Rich_Repeat_22@reddit
The 3090 doesn't offer FP4 or FP8; it needs emulation, and performance tanks doing so.
On the R9700, FP8 and BF8 are fully supported, with improved performance.
FYI, FSR4 is FP8.
Don't confuse it with the 7900 XTX and the rest of the RDNA3/3.5 lineup.
And here is the full list of WMMA instructions:
v_wmma_f32_16x16x16_f16
v_wmma_f32_16x16x16_bf16
v_wmma_f16_16x16x16_f16
v_wmma_bf16_16x16x16_bf16
v_wmma_i32_16x16x16_iu8
v_wmma_i32_16x16x16_iu4
v_wmma_i32_16x16x32_iu4
v_wmma_f32_16x16x16_fp8_fp8
v_wmma_f32_16x16x16_fp8_bf8
v_wmma_f32_16x16x16_bf8_fp8
v_wmma_f32_16x16x16_bf8_bf8
v_swmmac_f32_16x16x32_f16
v_swmmac_f32_16x16x32_bf16
v_swmmac_f16_16x16x32_f16
v_swmmac_bf16_16x16x32_bf16
v_swmmac_i32_16x16x32_iu8
v_swmmac_i32_16x16x32_iu4
v_swmmac_i32_16x16x64_iu4
v_swmmac_f32_16x16x32_fp8_fp8
v_swmmac_f32_16x16x32_fp8_bf8
v_swmmac_f32_16x16x32_bf8_fp8
v_swmmac_f32_16x16x32_bf8_bf8
Woof9000@reddit
3.6 Roentgen, not great, not terrible.
regional_chumpion@reddit (OP)
That’s 1000 chest X-rays though.
Repsol_Honda_PL@reddit
Good price, but low core count and average performance.