What is your build? (dual gpu)
Posted by Middle-Broccoli2702@reddit | LocalLLaMA | View on Reddit | 21 comments
Hi everyone,
I want to build a dedicated PC for Local LLM + agents, starting with one gpu, and possibly a second.
From what I have read, consumer GPUs can be problematic due to their thickness and airflow constraints.
I want to build on the AMD platform and build inside a case. I do not want to have an open rig. I have an EVGA FTW 3090 to start and do not want to make a mistake with my component selection.
How did you build yours?
It would be educational to see what people have done and which components they selected.
Thank you very much
Mountain_Patience231@reddit
I'm running 2x 9070 XT for my daily AI usage.
ea_man@reddit
Does that work only on ROCm, or on Vulkan too?
Is the 9070 XT stable (no kernel panics) and well optimized with ROCm?
Mountain_Patience231@reddit
Using llama.cpp on Windows. The Tensile split for HIP seems broken (or is it just me?), so I'm using the Vulkan backend instead. With Qwen3.5 35B A3B Q4 I'm getting up to ~4000 t/s prompt processing and 70-90 t/s text generation.
ea_man@reddit
Aye, I'm asking because some time ago I uninstalled ROCm and switched to just Vulkan; I was wondering how it's been improving...
Mountain_Patience231@reddit
Can you run 2x GPUs under ROCm? I've never managed to get that working.
deepspace_9@reddit
I have 3 AMD GPUs. Try rebuilding llama.cpp with the option -DGGML_CUDA_NO_PEER_COPY=ON. That said, I prefer Vulkan over ROCm.
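For reference, a rebuild along those lines might look roughly like this, assuming a HIP/ROCm build of llama.cpp on Linux (the gfx target shown is just an example; substitute your own card's architecture):

```shell
# Configure a HIP build of llama.cpp with peer-to-peer device copies
# disabled, which works around multi-GPU copy issues on some ROCm setups.
cmake -B build \
  -DGGML_HIP=ON \
  -DGGML_CUDA_NO_PEER_COPY=ON \
  -DAMDGPU_TARGETS=gfx1100   # example target; use your GPU's gfx ID

# Build the binaries
cmake --build build --config Release -j
```

The resulting llama-server/llama-cli binaries land in build/bin as usual.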
Mountain_Patience231@reddit
Thanks, will try.
Mountain_Patience231@reddit
Vulkan used to be very slow in llama.cpp because of a bug that forced data to be processed on the CPU. It's fixed now.
ea_man@reddit
Ohhh, I'd love to be able to run another cheap GPU like mine for a 27B dense model :')
Mountain_Patience231@reddit
Yes, it's very stable and optimized for the llama.cpp Vulkan backend currently (for Qwen 3.5).
ea_man@reddit
Sorry, can you clarify please?
Do you mean you can run 2x GPUs with Vulkan, or "yes, it works only with ROCm"?
BTW: I use Vulkan too, on RDNA2. Here it's faster at small context lengths (<30k) but bogs down above 100k, so it kind of makes sense with little VRAM. Also, when your context spills outside of VRAM the LLM stays stable; I get the impression ROCm has more of a tendency to give OOM problems.
Mountain_Patience231@reddit
2x GPUs with Vulkan works fine for me; I can load the full context in 32GB of VRAM total (16GB each).
Here is my config for your reference:
"llama-cpp-qwen-3.5-35b-vision":
  name: "llama-cpp-qwen-3.5-35b-vision"
  cmd: |
    "${llama-exec}" --model "D:\gguf\unsloth\Qwen3.5-35B-A3B-GGUF\Qwen3.5-35B-A3B-UD-Q4_K_L.gguf" \
      --mmproj "D:\gguf\unsloth\Qwen3.5-35B-A3B-GGUF\mmproj-F32.gguf" \
      --port ${PORT} \
      --jinja \
      --fit on \
      --fit-target 1000 \
      --tensor-split 1,1.2 \
      --cache-type-k q8_0 \
      --cache-type-v q8_0 \
      --no-mmap \
      --mlock \
      --flash-attn on \
      --split-mode row \
      --device Vulkan0,Vulkan1 \
      --parallel 1
  ttl: 300
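If it helps anyone: before pinning devices with --device, you can check what llama.cpp actually detects. A quick sketch (binary paths are examples; --list-devices exists in recent llama.cpp builds, and vulkaninfo ships with the Vulkan SDK/driver tools):

```shell
# List the compute devices llama.cpp detected (e.g. Vulkan0, Vulkan1),
# so you know which names to pass to --device
./llama-server --list-devices

# Alternatively, show every Vulkan-capable GPU the driver exposes
vulkaninfo --summary
```

If only one GPU shows up here, fix the driver/runtime first; no --tensor-split value will help.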
ea_man@reddit
Thank you very much, that's good news; a lot of people still report that only ROCm was able to run multi-GPU.
Middle-Broccoli2702@reddit (OP)
Thank you everyone for your contributions!
reto-wyss@reddit
Two cards are not usually a problem even if they are large, as long as your motherboard has x8/x8 support and proper spacing between the slots.
I believe the X870 Taichi Creator board is good for that kind of thing. There are a few other decent AM5 options, but the ASRock one is the cheapest.
Make sure your chassis has extra space below the bottom slot because the card may overhang by 1.5 or 2 slots.
toooskies@reddit
Yep, there are usually 1-2 boards per vendor with two PCIe 5.0 slots that share bus bandwidth: Taichi, Crosshair, AI Top, Godlike.
Mountain_Patience231@reddit
Yes, I used to have a motherboard with only x16 and x4 slots. Once I changed it to the X870 Taichi, the upgrade was massive.
FishChillylly@reddit
I used to have a dual GPU setup and ended up keeping just the beefy one. It was a 4090 48GB unofficial customized edition with a custom-loop water cooling system, plus a little A2000 12GB that I only loaded with small LLMs around 7B Q4, which I eventually gave up using.
brickout@reddit
Consumer GPUs are completely fine and easy to cool, since they almost always have good built-in cooling. You can always add additional fans.
More important are power, PCIe lanes, and of course a mobo with two x8-x16 slots and the physical space. You can get around space constraints with PCIe risers, if necessary.
PCIe is shared between the SSD and GPUs. You should be fine, but you might have to run one card at x8; you'll still get something like 90% of the performance. Total VRAM is more important, of course.
AMD CPUs are great. AMD GPU support is quickly getting better, but everything is built for Nvidia, of course.
I have multiple builds I'm playing with. I'm stuck on AM4, but it's been fine: 5900XT/64GB DDR4/Titan RTX 24GB; 5600X3D/64GB/AMD Pro V620 32GB; Threadripper 3970X/128GB/2x 3090, hoping to add 2 more 3090s soon.
I also have a setup with Intel arc b580.
I'm basically trying to learn how to deploy on any basic hardware platforms, including laptops and android.
AurumDaemonHD@reddit
GPUs get their own dedicated lanes.
NVMe can be on the chipset or the CPU; it depends.
They share PCIe as a platform, but usually have their own lanes. It depends on the board, so best to check.
You should get ~100% performance even on PCIe 4.0 x8 for inference. The problem is P2P drivers if you're going this route with Nvidia, though.
One PCIe slot of space between cards is OK for me.
AMD CPUs are great, unless they fry on ASRock boards.
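On Linux you can confirm what link width each card actually negotiated with lspci (the 01:00.0 address below is just an example; look up your own GPU's address first):

```shell
# Find the PCI addresses of your GPUs
lspci | grep -i -E "vga|3d controller"

# Inspect link capability vs. the negotiated link status for one card;
# LnkSta shows the speed/width actually in use (e.g. 16GT/s, Width x8)
sudo lspci -s 01:00.0 -vv | grep -E "LnkCap:|LnkSta:"
```

If LnkSta shows a lower width than LnkCap, the card is running below its maximum, usually because of slot wiring or lane sharing.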
Signal_Ad657@reddit
Very interested to see what pops up in this chat. I haven’t seen a lot of multi GPU AMD builds here.