Intel Arc Pro B70 Open-Source Linux Performance Against NVIDIA RTX & AMD Radeon AI PRO Review
Posted by fallingdowndizzyvr@reddit | LocalLLaMA | View on Reddit | 22 comments
The R9700 costs about 30% more than the B70, but it's also more than 30% better. Overall, I'd rather have an R9700 than a B70.
Icy_Gur6890@reddit
Am I reading this wrong? Are they using llama.cpp at b8121? https://github.com/ggml-org/llama.cpp/releases/tag/b8121
A release over 2 months old that should theoretically not even have support for the card they are benchmarking?
Disclosure: I bought an Arc B70, so I might be biased. The performance hasn't been particularly optimal and there are a lot of bugs. But I've been following what they are doing and tentatively expect improvements over the next couple of weeks.
fallingdowndizzyvr@reddit (OP)
Llama.cpp doesn't "support" or "not support" any particular card; that's not how it works. Yes, it can be tweaked to handle some architectures better, but even with those tweaks the B70 is still pretty slow. Here is a thread from today using a llama.cpp release from yesterday. Still slow.
https://www.reddit.com/r/LocalLLaMA/comments/1spwztz/llamabench_results_with_sycl_backend_intel_arc/
Icy_Gur6890@reddit
I mean, there is definitely a bottleneck that I haven't found yet. I would say that SYCL is not currently optimized for the architecture, and using a 2-month-old build does have an impact if you aren't building with SYCL in mind. Even in vLLM I've had to mess with the XPU kernel used, and was able to get about 26 t/s on Gemma4 31B at q4 with 512 pp, dropping drastically afterwards. I also posted a thread with my benchmarks on the card. I'm not quite writing it off yet; I agree performance right now is atrocious. Flash attention being broken has been my biggest pain point, and stability is questionable.
fallingdowndizzyvr@reddit (OP)
I'm still waiting for it to be "optimized" for the A770. It's been two years now. Or maybe it's simply as good as it's going to get, which is well under what the paper specs promise.
Doesn't the Vulkan FA work? That works on anything as far as I know.
Icy_Gur6890@reddit
Well, that's fair. Do you run it on Linux or Windows? I've almost been considering firing it up on Windows and seeing if AI Playground does better. I'm also about to redo my host on bare-metal Ubuntu 26.04 and see what kind of performance I get.
fallingdowndizzyvr@reddit (OP)
I've mentioned it a few times over the years, but on Intel, Vulkan runs much better under Windows than Linux. Yes, the gap has closed; a year ago it was atrocious.
I discussed it in this thread; look at the update under the OP. Thankfully, the gap is not that big any more.
https://www.reddit.com/r/LocalLLaMA/comments/1hf98oy/someone_posted_some_numbers_for_llm_on_the_intel/
Icy_Gur6890@reddit
https://github.com/ggml-org/llama.cpp/issues/21517
Here is why I have a gripe with them running an old llama.cpp commit. I think it's still slow, but it feels like spreading misinformation. There are at least 3 other major game changers in the last 2 months that take it from painfully slow to almost acceptable.
fallingdowndizzyvr@reddit (OP)
But it's the same for a lot of GPUs, the 7900 XTX for example. If someone spent the time to tweak llama.cpp for that, it would also be much faster, as it would be for a lot of GPUs. But as I was saying, that's not how llama.cpp works: it's not tuned for each and every GPU. It's generic, not specific. So sure, you could tune it to be better for the B70. But by the same token, you could tune it to be better for other GPUs, and thus the gap would remain.
Here's how you can tune llama.cpp for the 7900 XTX. It makes decode up to 2.25x as fast.
"2x decode speedup on Qwen3.5-27B (12 -> 27 tok/s on a 7900 XTX)"
https://github.com/apollosenvy/kernel-anvil
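For what it's worth, the quoted figures work out like this (a trivial arithmetic check; the tok/s numbers come from the quote above, the percentage framing is mine):

```python
# Quoted figures: 12 tok/s baseline -> 27 tok/s tuned on a 7900 XTX.
# That is 2.25x the original speed, i.e. 125% *faster* (not 225%).
baseline_tps, tuned_tps = 12, 27

speedup = tuned_tps / baseline_tps    # ratio of tuned speed to baseline speed
percent_faster = (speedup - 1) * 100  # the same improvement as a percentage

print(f"{speedup}x the speed, {percent_faster:.0f}% faster")
```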
Icy_Gur6890@reddit
Yeah, I guess that's fair. I was seeing it as: the Intel SYCL improvements are substantial, so it would be like running ROCm 6.8 vs ROCm 7.0 on Strix Halo. But I do get what you mean; it doesn't change that the performance is pretty poor. I would say we just about passed the 2-week mark, and the Intel AI stack is still pretty far behind; while it feels like they are working through a bunch of issues, a whole bunch more is still broken.
Icy_Gur6890@reddit
I might go install Windows just so I can report on what the best-case scenario will be.
Icy_Gur6890@reddit
As for Vulkan FA: yes, it works, but SYCL performance is at least 3x that of Vulkan.
digiwiggles@reddit
If you are looking for a more thorough review on the AI side, I thought this one was better:
https://www.youtube.com/watch?v=RcIWhm16ouQ
RoterElephant@reddit
Alex Ziskind. I would love to hire this guy just so I could fire him on his first day for incompetence.
He might be the most annoying hardware reviewer on YouTube. He never presents any of his findings in a clear and readable way. It's just some random video overlays, never any summaries, random model picks, with some random software installation. Zero testing methodology, nothing is reproducible.
czktcx@reddit
Only a Qwen3-4B model with this 32GB card? Who knows if that's the only model that got tuned by Intel...
vasimv@reddit
Looks like something is wrong with the Intel drivers/backends. Such a low pp512 speed despite about the same memory bandwidth, and actually much higher scores in other GPU tests...
fallingdowndizzyvr@reddit (OP)
PP is about compute, not memory bandwidth. The Spark has about the same memory bandwidth as Strix Halo, but it blows away Strix Halo for PP because it has more compute.
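To make that concrete, here is a back-of-envelope sketch (all numbers hypothetical, not measurements from any of these cards) of why decode speed is capped by memory bandwidth regardless of compute:

```python
# Decode (tg) streams the full set of model weights once per generated token,
# so memory bandwidth sets a hard ceiling on tokens/s. Prefill (pp) processes
# the whole prompt in large batched matmuls, so it scales with compute instead.

def decode_upper_bound(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough ceiling on decode tokens/s: bytes moved per second / bytes per token."""
    return bandwidth_gb_s / model_size_gb

# Hypothetical example: a ~270 GB/s card running a 17 GB quantized model
# cannot decode much faster than ~16 tok/s, no matter how much compute it has.
print(round(decode_upper_bound(270, 17), 1))
```

This is why two machines with similar bandwidth can land close together on tg while one crushes the other on pp: the compute advantage only shows up in prefill.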
jacek2023@reddit
No LLM benchmarks?
fallingdowndizzyvr@reddit (OP)
It's literally the first set of benchmarks they show, using llama.cpp no less.
https://www.phoronix.com/review/intel-arc-pro-b70/2
baseketball@reddit
Only prompt processing, nothing on text generation.
mustafar0111@reddit
I saw people posting on the Level1Techs forums comparing the two. They describe token generation on the B70 as dead slow compared to the R9700 Pro.
Woof9000@reddit
They just can't figure out llama.cpp, or what to eat it with, but at least they are trying.
jacek2023@reddit
My mistake; browsing Phoronix on a phone is a pain.