Anyone tried a setup like this? Is it a bad idea? 😅

Posted by Librarian-Rare@reddit | LocalLLaMA | 19 comments

I’m considering building a local machine for AI inference using a Dell Precision T5820 and 2 Intel Arc A770s.

From this I could get 32GB DDR4 RAM, a 1TB SSD, and 32GB VRAM, all for around $1000. It sounds great, but it means running on PCIe Gen 3, on a motherboard with no ReBAR support, while trying to split a model across two Intel GPUs.

I want to run Qwen 3.6 35B A3B at Q6, since everyone has been hailing it.
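For a rough sense of whether that fits, here is a back-of-the-envelope sketch in Python. It assumes exactly 35 billion parameters and a uniform ~6.56 bits per weight (the nominal rate of llama.cpp's Q6_K format); real GGUF files mix quant types per tensor, and the KV cache and compute buffers need room on top, so treat the result as a ballpark only.

```python
def weight_footprint_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    return n_params_billion * bits_per_weight / 8


# Assumption: Q6_K averages about 6.5625 bits per weight across the model.
Q6_K_BPW = 6.5625

weights_gb = weight_footprint_gb(35, Q6_K_BPW)
total_vram_gb = 2 * 16  # two 16GB Arc A770 cards
headroom_gb = total_vram_gb - weights_gb

print(f"weights: ~{weights_gb:.1f} GB, headroom for KV cache etc.: ~{headroom_gb:.1f} GB")
```

Under these assumptions the weights alone take most of the 32GB, so context length would be the thing that gets squeezed.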

Just don’t know what I’m getting myself into.

[-]

Global_Tap_1812@reddit

So I looked into this with my 7900 XTX. Apparently it's a whole different ball game when you want to share one model over two cards. First, there's a limited number of PCIe lanes, set not just by the motherboard but also by the CPU design. I've got a 24-core Intel i9-14900K, but it only has about 20 lanes, and my motherboard can only run a second GPU at PCIe x4, so there was some additional logistical complexity.
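To put numbers on the lane question, this small sketch computes theoretical one-direction PCIe bandwidth from the per-lane transfer rate and the 128b/130b line encoding used by Gen 3 through Gen 5. These are peak figures; real-world throughput is lower, and the 16GB transfer is just an illustrative payload, not a measurement.

```python
# Raw transfer rate per lane in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b encoding (Gen 3, 4, and 5)


def pcie_gb_per_s(gen: int, lanes: int) -> float:
    """Theoretical peak bandwidth in GB/s, one direction."""
    return GT_PER_S[gen] * ENCODING_EFFICIENCY / 8 * lanes


payload_gb = 16  # illustrative: roughly one card's worth of weights
for gen, lanes in [(3, 16), (3, 4), (4, 8), (4, 4)]:
    bw = pcie_gb_per_s(gen, lanes)
    print(f"Gen {gen} x{lanes}: {bw:5.2f} GB/s, {payload_gb} GB in ~{payload_gb / bw:.1f} s")
```

Link speed mostly affects model load time and any traffic between cards during inference; how much the latter matters depends on how the backend splits the model.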

On top of that, not all software is created equal. CUDA has the deepest and most mature support, which is why you see so many of those builds and comparatively fewer AMD builds, as ROCm support (and presumably Intel's as well) lags.

My experience with Qwen 3.6 35B A3B (MoE) has been that you can save a lot of VRAM by offloading the experts to CPU/RAM. Using that setup at a 128k context window (I think with an 8-bit quant for the KV cache), it takes about 13GB of VRAM and 12GB of RAM on my machine, and performance isn't noticeably worse than Qwen3 14B dense. So if I were in your shoes, for that model specifically, I would target something like a 9060 XT, which has 16GB, and then get a second one down the road. Or maybe try to find a used 5060 Ti. But again, if you're looking for a project, the Arc A770s can be that; it just strikes me as a lot of work vs. AMD or Nvidia.
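As a rough illustration of why an 8-bit KV cache helps at 128k context, here is a generic size estimator. The layer count, KV head count, and head dimension below are placeholder values, not the real config of any Qwen model; plug in the numbers from the model's metadata. The 8.5 bits per element for q8_0 comes from its block layout (32 int8 values plus one fp16 scale).

```python
def kv_cache_gb(n_attn_layers: int, n_kv_heads: int, head_dim: int,
                ctx_len: int, bits_per_elem: float) -> float:
    """Approximate KV cache size in GB (10^9 bytes) for a full context."""
    # Factor of 2 covers both the K and the V tensors.
    elems = 2 * n_attn_layers * n_kv_heads * head_dim * ctx_len
    return elems * bits_per_elem / 8 / 1e9


# Hypothetical architecture values, for illustration only.
arch = dict(n_attn_layers=10, n_kv_heads=2, head_dim=256)
ctx = 128 * 1024

f16_gb = kv_cache_gb(**arch, ctx_len=ctx, bits_per_elem=16)
q8_gb = kv_cache_gb(**arch, ctx_len=ctx, bits_per_elem=8.5)

print(f"f16 KV cache:  {f16_gb:.2f} GB")
print(f"q8_0 KV cache: {q8_gb:.2f} GB ({q8_gb / f16_gb:.0%} of f16)")
```

Whatever the real architecture numbers are, the q8_0 cache comes out at a bit over half the f16 size, which is VRAM that can go to weights or longer context instead.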

[-]

DeathGuppie@reddit

I've found that Vulkan tends to work better than ROCm for consumer AMD graphics cards. It's more robust and doesn't have all the memory routing stuff that you need for servers but not for graphics cards.

[-]

Librarian-Rare@reddit (OP)

Apparently the 9060 XT supports x16 but the 5060 Ti only supports x8?!

[-]

see_spot_ruminate@reddit

People always post about pcie lanes being limited, when the most limiting factor is vram.

That said, Intel GPUs are not good enough.

[-]

Librarian-Rare@reddit (OP)

Hmm, yeah, a lot of work doesn’t sound nice. 5060 Tis are the alternative I was eyeing, but $600 vs $300 for the Arcs makes the Arcs attractive lol

[-]

Agitated-Fly3564@reddit

Damn a lot of thought has been put into this

[-]

Normal-Ad-7114@reddit

If you just want to mess with llama.cpp and see how well Qwen runs on different hardware, just rent a server. You can choose different configs, including GPU model, GPU count, RAM size, etc., all at a fraction of the cost. They're usually billed per hour or even per minute. This way you can figure out for yourself what to expect.

[-]

etaoin314@reddit

I've been very curious about those Chinese 2080 Ti cards with the doubled VRAM. Did you run into any driver problems with those?

[-]

Normal-Ad-7114@reddit

In terms of drivers and software, they are exactly the same as the regular 2080 Ti.

[-]

Vusiwe@reddit

I had a 5820 maxed out at 512GB RAM (all 64GB DIMMs, the largest it can support) + a Max-Q

I recently upgraded to a 7920 instead

[-]

Librarian-Rare@reddit (OP)

Biggest reason to upgrade? 5820 is $350 while 7920 is $2k 😅

[-]

slavik-dev@reddit

The 7920 has a 1400W power supply, which is good. There are cheap deals under $700.

But it's dual CPU, which is bad for inference. And if you remove one CPU, you can only use half of the PCIe slots.

[-]

FullstackSensei@reddit

Gen 3 and no ReBAR will be the least of your issues with Arc.

While there aren't many, search this sub for posts or comments about using Arc cards for LLMs. One thing is sure, it won't be as pleasant as Nvidia or AMD.

Building rigs for the model of the day is generally a very bad idea. Models have a shelf life of about 3 months. While 32GB VRAM is nice, don't tie yourself to a single model. A quick Google search tells me the 5820 runs LGA2066/C422. That's the workstation/server cousin of X299, so you get quad-channel DDR4 memory. You can get some decent t/s numbers with larger models running hybrid if you choose your hardware wisely.
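To sketch why quad-channel memory matters for hybrid CPU/GPU runs: token generation is largely bound by how fast the active weights can be read, so memory bandwidth divided by bytes read per token gives a crude ceiling on t/s. The example below assumes DDR4-2666 (an assumption, check what the board and CPU actually run), takes the ~3B active parameters from the "A3B" name, and pretends every active weight is read from system RAM, so it is a worst-case CPU-side bound rather than a prediction.

```python
def dram_bandwidth_gb_s(channels: int, mt_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (64-bit channel = 8 bytes)."""
    return channels * mt_per_s * bus_bytes / 1000


def decode_tps_upper_bound(bandwidth_gb_s: float, active_params_billion: float,
                           bits_per_weight: float) -> float:
    """Crude ceiling on tokens/s if every active weight is read once per token."""
    gb_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_per_token


quad = dram_bandwidth_gb_s(channels=4, mt_per_s=2666)  # assumed DDR4-2666
dual = dram_bandwidth_gb_s(channels=2, mt_per_s=2666)

for name, bw in [("quad-channel", quad), ("dual-channel", dual)]:
    tps = decode_tps_upper_bound(bw, active_params_billion=3, bits_per_weight=6.5625)
    print(f"{name}: {bw:.1f} GB/s peak, at most ~{tps:.0f} t/s from RAM")
```

Real numbers land well below the ceiling, but the ratio holds: doubling the memory channels roughly doubles what the CPU side can deliver.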

[-]

Librarian-Rare@reddit (OP)

So you’re saying Intel GPUs tend to be a bad time

[-]

Hope springs eternal

[-]

YOU_WONT_LIKE_IT@reddit

What are the PCIe slots in the T5820?

[-]

Librarian-Rare@reddit (OP)

It has two PCIe x16 Gen 3 slots.