Best Model to use with Arc Pro B70
Posted by Player13377@reddit | LocalLLaMA | View on Reddit | 8 comments
I am looking for the best model that can fit on an Arc Pro B70 with space to spare for context. Specifically important to me are very thorough search and some amount of coding. Currently looking at Gemma4.
semangeIof@reddit
Arc Pro memory bandwidth means dense models will be a little slow. I'd recommend using the 26B A4B. You should be able to fit a Q6 with good context if your llama.cpp command line is set up right. But if you're fine with lower output speed, you could also fit a Q4 of the dense model with very high context.
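The quant-vs-context tradeoff above can be sanity-checked with back-of-envelope arithmetic. This is a rough sketch, not a spec: the 24 GB VRAM figure and the bits-per-weight numbers for common llama.cpp quants are assumptions for illustration, and real loads add overhead for the compute buffer and KV cache.

```python
# Rough VRAM fit check for quantized GGUF weights.
# Assumptions (not from the thread): 24 GiB VRAM on the card, and
# approximate bits-per-weight for common llama.cpp quant formats.

GIB = 1024**3

def model_size_gib(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GiB for params_b billion parameters."""
    return params_b * 1e9 * bits_per_weight / 8 / GIB

vram_gib = 24.0   # assumed card VRAM
params_b = 26.0   # 26B total params; MoE weights all sit in VRAM even if only A4B are active

for name, bpw in [("Q4_K_M", 4.85), ("Q6_K", 6.56), ("Q8_0", 8.5)]:
    weights = model_size_gib(params_b, bpw)
    print(f"{name}: ~{weights:.1f} GiB weights, ~{vram_gib - weights:.1f} GiB headroom for context")
```

Under these assumptions a Q6 of a 26B model lands around 20 GiB, leaving a few GiB of headroom, which matches the "Q6 with good context" claim.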
Player13377@reddit (OP)
Regarding bandwidth: I came across a few cheapish listings for used 7900XTX. Should I consider getting two of those instead of the B70 (it is a preorder still) or is AMD and split GPU too large of a downside?
Fluffywings@reddit
For my dollar I would personally choose 2x 7900 XTX, as my case and PSU could handle them.
semangeIof@reddit
Two 7900 XTXs draw a lot of power. Also, the coil whine on these cards is terrible; I have the Sapphire Nitro+ Vapor-X in my gaming PC and it whines badly. A pair would be a little faster than the B70s thanks to a faster chip, faster memory, and better Vulkan support, but they'll still run hot.
The B70 is very much a tinkerer's card. You will probably be using vLLM over llama.cpp. Software support is less mature; there will be bugs, and things will not work consistently. But if you get it working well, you get great power efficiency and high VRAM for relatively cheap. However, a 3090/4090 or 7900 XTX will always run faster. You'll have to investigate specific benchmarks yourself.
I'll be picking up a few B70s in the late April restock because I like the above but you may not.
Also consider that B70s are new, while used 7900 XTX cards are high-power, heavy gaming units that run hot and have lived hard lives. They may require maintenance that a new card would not.
Fluffywings@reddit
I have the XFX Merc 310 and luckily no coil whine.
Player13377@reddit (OP)
I am not afraid of tinkering on either side. Repasting GPUs or fiddling with vLLM should be fine; I am mainly looking for the most intelligence per $, since I will be the only user 95% of the time and I'm looking for a replacement for crippled cloud agents. Whatever I buy would have to run on a Supermicro H11SSL-i, and I am just not experienced enough to tell whether the extra VRAM and bandwidth advantage would be killed by PCIe 3.0. A bonus for the 7900 XTXs would be that I could repurpose them for gaming VMs or whatnot when needed; on the other hand, electricity is expensive over here... tough decision.
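The PCIe 3.0 worry above can be sanity-checked with rough arithmetic. With a layer-split (pipeline) arrangement across two cards, roughly one hidden-state vector crosses the link per generated token. The hidden size, dtype, and usable link bandwidth below are illustrative assumptions, not specs from the thread.

```python
# Rough per-token PCIe traffic estimate for a model layer-split across two GPUs.
# Illustrative assumptions: hidden size of a ~26B-class model, fp16 activations,
# ~14 GB/s usable on a PCIe 3.0 x16 link.

hidden_size = 6144        # assumed hidden dimension
bytes_per_elem = 2        # fp16 activations
pcie3_x16_bps = 14e9      # ~14 GB/s usable bandwidth

bytes_per_token = hidden_size * bytes_per_elem   # one activation vector crosses the link
transfer_us = bytes_per_token / pcie3_x16_bps * 1e6

print(f"{bytes_per_token} bytes/token, ~{transfer_us:.2f} us over PCIe 3.0 x16")
```

Under these assumptions the per-token transfer is on the order of a microsecond, which is negligible next to the GPU compute itself. So for single-user generation, PCIe 3.0 mostly hurts model load time rather than tokens/s (a row/tensor-split mode would move more data per token).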
semangeIof@reddit
Considering your platform is DDR4, I would target complete offload to GPU for whatever you're running. Honestly it just depends whether you value tok/s or higher precision more. If I personally had this setup I'd stuff it with B70s, I think. Up to you though.
ea_man@reddit
You look at the size of the model, then add the VRAM consumption for the context you want to run, then you choose the Qwen or Gemma model that fits in that budget.
The B70 has "average" memory bandwidth and weak software optimization: you get roughly 2x the VRAM, but at roughly half the speed.
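The "model size + context VRAM" budgeting described above can be sketched for the context part. The KV cache grows linearly with context length; the layer count, KV head count, and head dimension below are hypothetical values for illustration, not from any specific model in the thread.

```python
# KV-cache sizing sketch for the "model size + context VRAM" rule above.
# Layer/head counts are illustrative assumptions, not a real model's specs.

GIB = 1024**3

def kv_cache_gib(ctx_tokens: int, n_layers: int, n_kv_heads: int,
                 head_dim: int, bytes_per_elem: int = 2) -> float:
    """GiB needed for K and V caches across all layers at a given context length."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem  # K and V
    return ctx_tokens * per_token / GIB

# e.g. a hypothetical 48-layer model with 8 KV heads of dim 128, fp16 cache:
for ctx in (8192, 32768, 131072):
    print(f"{ctx:>6} tokens -> ~{kv_cache_gib(ctx, 48, 8, 128):.2f} GiB KV cache")
```

With these numbers, 8K of context costs about 1.5 GiB and 32K about 6 GiB, which is why a lower quant frees meaningful room for context (quantizing the KV cache itself, where supported, roughly halves these figures).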