AMD Radeon RX 6900 XT - ROCm vs Vulkan - Gemma 4 and Qwen 3.5 speed benchmarks
Posted by grumd@reddit | LocalLLaMA | View on Reddit | 23 comments
Did some quick tests after building llama.cpp with ROCm 6.4.2 and the latest Vulkan backend for my 6900 XT.
gemma4 E2B Q4_K
| ubatch | ROCm pp512 (t/s) | Vulkan pp512 (t/s) | ROCm tg128 (t/s) | Vulkan tg128 (t/s) |
|---|---|---|---|---|
| 32 | 1536.60 | 1423.49 | 151.92 | 174.59 |
| 64 | 1590.65 | 1930.60 | 151.41 | 173.76 |
| 128 | 2651.11 | 2998.42 | 151.53 | 173.71 |
| 256 | 3653.19 | 3233.44 | 151.45 | 173.45 |
| 512 | 3807.60 | 3950.71 | 151.47 | 173.67 |
| 1024 | 3806.77 | 3948.27 | 151.49 | 173.35 |
qwen35 4B Q8_0
| ubatch | ROCm pp512 (t/s) | Vulkan pp512 (t/s) | ROCm tg128 (t/s) | Vulkan tg128 (t/s) |
|---|---|---|---|---|
| 32 | 1368.32 | 706.18 | 77.57 | 88.58 |
| 64 | 1841.68 | 1323.46 | 77.65 | 88.57 |
| 128 | 2577.95 | 1672.51 | 77.97 | 88.46 |
| 256 | 2984.38 | 2244.62 | 77.72 | 88.50 |
| 512 | 3023.75 | 2390.09 | 77.81 | 88.57 |
| 1024 | 3019.70 | 2386.97 | 77.60 | 88.53 |
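For reference, these numbers come from llama-bench. A minimal sketch of the kind of ubatch sweep that produces tables like the ones above (model paths and build directories are placeholders, not my exact invocation):

```bash
# Sketch only: sweep ubatch sizes with llama-bench; pp512/tg128 are the default tests.
# Paths and flag values are assumptions, not copied from the original run.
./build-rocm/bin/llama-bench -m models/model-Q4_K.gguf \
    -ub 32,64,128,256,512,1024

# Same sweep against the Vulkan build for comparison.
./build-vulkan/bin/llama-bench -m models/model-Q4_K.gguf \
    -ub 32,64,128,256,512,1024
```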
FullstackSensei@reddit
Why are you still using ROCm 6? 7 has been out for a while and should bring a good performance uplift.
grumd@reddit (OP)
7.1 doesn't recognize my GPU
FunkyMuse@reddit
Have you tried 7.2.2?
grumd@reddit (OP)
Nope, and I don't really intend to. I'm running this thing as an RPC server, and my pp is ~400 anyway due to Ethernet overhead.
FullstackSensei@reddit
Why are you running it over RPC? And why don't you try updating your graphics driver?
grumd@reddit (OP)
I have two PCs, one with a 5080 and one with a 6900XT. The latter is an RPC server and the former is where I'm running my models.
I'm also using the latest GPU drivers, obviously.
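For context, the setup is roughly the stock llama.cpp RPC flow. A minimal sketch, with the host, port, and model path as placeholders rather than my actual config:

```bash
# On the 6900 XT box: build with the RPC backend and expose the GPU over the LAN.
# cmake flags and the port are assumptions about the usual llama.cpp RPC setup.
cmake -B build -DGGML_HIP=ON -DGGML_RPC=ON
cmake --build build --config Release
./build/bin/rpc-server --host 0.0.0.0 --port 50052

# On the 5080 box: point llama-server at the remote worker.
./build/bin/llama-server -m models/some-model.gguf --rpc 192.168.1.50:50052
```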
FullstackSensei@reddit
Why are you doing this though? That's like the worst possible way to run two GPUs, and arguably the least economical. Even a single PCIe lane would be much faster and cause far fewer issues. Beyond choking the card over 1Gb Ethernet, llama.cpp RPC is far from optimized, and in fact disables a ton of the optimizations you'd get if both GPUs were in the same machine. And you can run both in the same machine: you just need to build llama.cpp from source with GGML_BACKEND_DL and both the CUDA and ROCm backends enabled.
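Something like this is what I mean (a sketch, assuming current cmake option names and that both the CUDA and ROCm toolchains are installed):

```bash
# Sketch: one build with dynamically loadable backends, so the CUDA and ROCm
# backends can live side by side and be loaded at runtime (option names assumed current).
cmake -B build \
    -DGGML_BACKEND_DL=ON \
    -DGGML_CUDA=ON \
    -DGGML_HIP=ON
cmake --build build --config Release
# llama.cpp then loads the CUDA and ROCm backends at runtime and can split work across both GPUs.
```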
grumd@reddit (OP)
Because it's my and my wife's gaming PCs. I'm not building a datacenter here.
Jatilq@reddit
Test both in LM Studio, because it ships both runtimes.
grumd@reddit (OP)
No, I'm compiling these myself.
Jatilq@reddit
I understand that. I’m saying test to help nail down the problem.
grumd@reddit (OP)
And what's the problem?
Jatilq@reddit
Omg! I just realized I responded to the wrong thread. I’m sitting in a hospital gown about to go into surgery. I must be loopy. I’m sorry.
grumd@reddit (OP)
I hope the surgery goes well!
Jatilq@reddit
Thank you. If your watch ever says you have AFib, tell your doctor right away.
grumd@reddit (OP)
My watch is mechanical sadly haha, but with my WPW syndrome I should probably do an ECG once in a while
spaceman_@reddit
You should also test at non-zero context depths. As of a few months ago, Vulkan PP speeds typically decline far less at larger prompt/context sizes.
Vulkan also seems to handle "weird" quantizations like Q5/Q6 better than ROCm in my experience.
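For example, recent llama-bench builds can benchmark at a given context depth. A minimal sketch (the -d flag and the depth values here are an assumption, not something from the OP's run):

```bash
# Sketch: measure pp/tg at non-zero context depths to see how each backend
# degrades as the KV cache fills (flag assumed to be -d / --n-depth in recent builds).
./build/bin/llama-bench -m models/model.gguf -d 0,4096,16384
```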
grumd@reddit (OP)
Yeah, that's true. I only built these binaries to use this machine as an RPC server, so I didn't bother with long depths, just some quick tests.
RoomyRoots@reddit
Have you tried the preview builds of ROCm? I am getting better results with ROCm than Vulkan now.
grumd@reddit (OP)
ROCm 7.1 didn't even recognize the GPU; this is the latest build that worked.
taking_bullet@reddit
I believe in Vulkan supremacy 👌
ps5cfw@reddit
This is useless!
MikeLPU@reddit
Like your comment