Anyone tried a setup like this? Is it a bad idea? 😅
Posted by Librarian-Rare@reddit | LocalLLaMA | 19 comments
I’m considering building a local machine for AI inference using a Dell Precision T5820 and two Intel Arc A770s.
With this I could get 32GB of DDR4 RAM, a 1TB SSD, and 32GB of VRAM, all for around $1000. It sounds great, but it means running on PCIe gen 3, on a motherboard with no ReBAR support, while trying to split a model across two Intel GPUs.
I’m wanting to run Qwen 3.6 35b a3b q6 since everyone has been hailing it.
Just don’t know what I’m getting myself into.
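For a rough sense of whether the model fits, here is a back-of-the-envelope sketch of the weight footprint. It assumes about 35e9 total parameters (read off the model name) and about 6.5625 bits per weight, which is the nominal rate of llama.cpp's Q6_K format. Both figures are assumptions, and the result ignores KV cache, activations, and runtime overhead.

```python
# Rough weight-size estimate for a quantized model (a sketch, not a measurement).
# Assumptions: ~35e9 total parameters (taken from the model name) and
# ~6.5625 bits per weight (nominal rate of llama.cpp's Q6_K format).

def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Return the approximate weight footprint in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

TOTAL_PARAMS = 35e9   # assumed total parameter count
Q6_K_BPW = 6.5625     # assumed bits per weight
VRAM_GB = 32.0        # 2 x 16GB Arc A770, as in the post

weights_gb = weight_size_gb(TOTAL_PARAMS, Q6_K_BPW)
headroom_gb = VRAM_GB - weights_gb

print(f"weights ~{weights_gb:.1f} GB, headroom ~{headroom_gb:.1f} GB")
```

Under these assumptions the weights alone come to roughly 28.7GB, leaving only a few GB across both cards for context, so a long context window would likely be tight.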
Global_Tap_1812@reddit
So I looked into this with my 7900 XTX. Apparently it's a whole different ball game when you want to split one model across two cards. First things first, there's a limited number of PCIe lanes, and the limit comes from the CPU design as well as the motherboard. I've got a 24-core Intel i9-14900K, but it only has like 20 lanes, and my motherboard can only run a second GPU at PCIe x4, so logistically there was some additional complexity.
On top of that, not all software is created equal. CUDA has the deepest and most mature support, which is why you see so many of those builds and comparatively fewer AMD builds: ROCm support (and presumably Intel's as well) lags behind.
My experience with Qwen 3.6 35b a3b MoE has been that you can save a lot of VRAM by offloading the experts to CPU/RAM. With that setup at a 128k context window (I think with an 8-bit quant for the KV cache), it takes up like 13GB of VRAM and 12GB of RAM on my machine, and performance isn't noticeably worse than Qwen3 14B dense. So if I were in your shoes, for that model specifically, I would target something like a 9060 XT, which has 16GB, and then get a second one down the road. Or maybe try to find a used 5060 Ti. But again, if you're looking for a project, the Arc A770s can be that; it just strikes me as a lot of work compared with AMD or Nvidia.
DeathGuppie@reddit
I've found that Vulkan tends to work better than ROCm for consumer AMD graphics cards. It's more robust and doesn't have all the memory routing stuff that you need on servers but not on graphics cards.
Librarian-Rare@reddit (OP)
Apparently the 9060 XT supports x16 but the 5060 Ti only supports x8 ?!??!??
see_spot_ruminate@reddit
People always post about PCIe lanes being limited, when the most limiting factor is VRAM.
That said, Intel GPUs are not good enough.
Librarian-Rare@reddit (OP)
Hmm, yeah, a lot of work doesn’t sound nice. 5060 Tis are the alternative I was eyeing, but $600 vs $300 for the Arcs makes the Arcs attractive lol
Agitated-Fly3564@reddit
Damn a lot of thought has been put into this
Normal-Ad-7114@reddit
If you just want to mess with llama.cpp and see how well Qwen runs on different hardware, just rent a server. You can choose different configs, including GPU model, GPU count, RAM size, etc., all at a fraction of the cost. They're usually billed per hour or even per minute. This way you can figure out for yourself what to expect.
etaoin314@reddit
I've been very curious about those Chinese 2080 Tis with the doubled memory. Did you run into any driver problems with those?
Normal-Ad-7114@reddit
In terms of drivers and software, they are exactly the same as the regular 2080 Ti.
Vusiwe@reddit
I had a 5820 maxed out at 512GB RAM (all 64GB DIMMs, the largest it can support) + a Max-Q
I recently upgraded to a 7920 instead
Librarian-Rare@reddit (OP)
Biggest reason to upgrade? 5820 is $350 while 7920 is $2k 😅
slavik-dev@reddit
The 7920 has a 1400W power supply. That's good. There are cheap deals under $700.
But it's dual CPU, which is bad for inference. And if you remove one CPU, then you can only use half of the PCIe slots.
FullstackSensei@reddit
Gen 3 and no ReBAR will be the least of your issues with Arc.
While there aren't many, search this sub for posts or comments about using Arc cards for LLMs. One thing is sure, it won't be as pleasant as Nvidia or AMD.
Building rigs for the model of the day is generally a very bad idea. Models have a shelf life of like 3 months. While 32GB VRAM is nice, don't tie yourself to a single model. A quick Google search tells me the 5820 runs LGA2066/C422. That's the workstation/server cousin of X299, so you get quad-channel DDR4 memory. You can get some decent t/s numbers with larger models running hybrid (part on GPU, part on CPU/RAM) if you choose your hardware wisely.
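To put a rough number on what quad-channel DDR4 could mean for hybrid inference, here is a small sketch of the usual bandwidth-bound estimate. The DDR4-2666 speed, the 3e9 active parameters (read off the "a3b" in the model name), and the 6.5625 bits per weight are all assumptions. It treats every active weight as being read from system RAM once per token and ignores compute, KV cache reads, and real-world efficiency, so it is an optimistic upper bound rather than a prediction.

```python
# Bandwidth-bound decode estimate (a sketch under stated assumptions).

def ddr_bandwidth_gbs(channels: int, mt_per_s: float, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (64-bit channel = 8 bytes)."""
    return channels * mt_per_s * 1e6 * bus_bytes / 1e9

def decode_tps_upper_bound(bandwidth_gbs: float, active_params: float,
                           bits_per_weight: float) -> float:
    """Tokens/s ceiling if each active weight is read once per token."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gbs * 1e9 / bytes_per_token

# Assumed values: quad-channel DDR4-2666, ~3e9 active params, ~6.5625 bpw.
bw = ddr_bandwidth_gbs(channels=4, mt_per_s=2666)
tps = decode_tps_upper_bound(bw, active_params=3e9, bits_per_weight=6.5625)

print(f"peak RAM bandwidth ~{bw:.1f} GB/s, decode ceiling ~{tps:.0f} tokens/s")
```

With these inputs the theoretical peak is about 85GB/s, roughly double a dual-channel desktop at the same memory speed. Measured throughput will land well below the ceiling.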
Librarian-Rare@reddit (OP)
So you’re saying Intel GPUs tend to be a bad time
CalligrapherFar7833@reddit
Horrible time, not bad time. Also, ReBAR is pretty much a requirement for Intel GPUs.
Triple-Tooketh@reddit
You can't run Ollama on Arcs. You'd need to dig into the specifics, but I put a week into it and then bought AMD cards. I'll dig through my notes and post what I can find, but my experience was that Ollama is a no-go.
Triple-Tooketh@reddit
This got me Googling and I found this
https://markaicode.com/intel-arc-gpu-ollama-openvino-tutorial/
Hope springs eternal
YOU_WONT_LIKE_IT@reddit
What are the PCIe slots in the T5820?
Librarian-Rare@reddit (OP)
It has two PCIe gen 3 x16 slots.
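For reference, the theoretical per-direction bandwidth of the link widths mentioned in this thread can be sketched as below. It uses the standard per-lane transfer rates for PCIe gen 3 to gen 5 and their 128b/130b line encoding; protocol overhead is ignored, so real throughput is somewhat lower. The function and constant names are just illustrative.

```python
# Theoretical PCIe bandwidth per direction (a sketch; ignores protocol overhead).

PCIE_GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}  # raw transfer rate per lane
ENCODING_EFFICIENCY = 128 / 130             # 128b/130b encoding (gen 3 and later)

def pcie_bandwidth_gbs(gen: int, lanes: int) -> float:
    """Return theoretical one-way bandwidth in GB/s for a PCIe link."""
    return PCIE_GT_PER_S[gen] * ENCODING_EFFICIENCY / 8 * lanes

for gen, lanes in [(3, 16), (3, 8), (3, 4), (4, 4)]:
    print(f"gen {gen} x{lanes}: ~{pcie_bandwidth_gbs(gen, lanes):.2f} GB/s")
```

A gen 3 x16 slot works out to roughly 15.75GB/s each way, which mainly affects model load time and cross-GPU transfers rather than how much model fits.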