Intel Pro B70 in stock at Newegg - $949
Posted by Altruistic_Call_3023@reddit | LocalLLaMA | View on Reddit | 82 comments
Just wanted to make folks aware as I just grabbed one and it says it delivers in less than a week. https://www.newegg.com/intel-arc-pro-b70-32gb-graphics-card/p/N82E16814883008
Newegg_Support@reddit
Have you received the new Pro B70? Let us know!
Altruistic_Call_3023@reddit (OP)
I have. My goal is to put it to use this weekend. Excited
lakySK@reddit
Ok, so now this is starting to be interesting. 32GB GPU with decent specs and low-ish wattage for $1k.
How do you expect a 4x B70 PC to stack up against an M5 Max (now that it has matmul support)?
Both would set you back around $5-6k. Both 128GB, similar bandwidth. Intel workstation likely winning on compute for prompt processing and M5 Max winning on power consumption and form factor? Or am I missing something important?
Dany0@reddit
Check out the level1techs vid on it, he had four of them and tested it
fallingdowndizzyvr@reddit
The performance from that is really slow. Here's the performance for a single user for Qwen 3.5 27B @ 8 bits.
"Avg prompt throughput: 85.4 tokens/s, Avg generation throughput: 13.4 tokens/s"
That PP is super slow.
I've asked others who got theirs for better performance numbers. Not one has responded. It only takes like a couple of minutes to run. Well.... unless the B70 really is that slow.
lacerating_aura@reddit
That bad? Could it just be a software optimization issue, or is the hardware that lacking? Because technically for non-Nvidia 32GB it's either this Intel card or the AMD AI Pro ones.
fallingdowndizzyvr@reddit
It shouldn't be that bad. So there's something that's not right. But the fact that people haven't responded to my request to do other benchmarks says something. Since I'm sure if it was good, they would have.
freefall_junkie@reddit
I purchased 2 on the initial release day that arrived 20 min ago. I am currently getting all the drivers configured but I will do some testing. I’ve been excited waiting on these and there is next to no info online. It seems like nobody really had them yet.
fallingdowndizzyvr@reddit
People have had them. It wasn't just the dudes at Level 1.
https://www.reddit.com/r/IntelArc/comments/1s8crqp/intel_arc_b70_for_llm_work_load/
freefall_junkie@reddit
Tbf in the first paragraph that guy specifies he is not using the recommended environment. I am working on getting the latest vLLM stuff set up to test with the stack they advertised. Could be cope but I'm still hopeful.
fallingdowndizzyvr@reddit
He is. Which I pointed out in that thread and asked him to run again with the right one. Crickets.
freefall_junkie@reddit
Hey, got my 2x Arc B70 Pro setup working with vLLM 0.17.0-xpu. Still doing more testing and plan to do a full writeup this week with configs, docker-compose files, and detailed benchmarks, but here's what I've seen so far:
Hardware: 2x B70 Pro (32GB each), Ryzen 5 3600X, 48GB RAM, PCIe 4.0 x8, Ubuntu 24.04 w/ kernel 6.17
DeepSeek-R1-Distill-Qwen-32B (dense 32B, FP8 dynamic):
Qwen3-30B-A3B (MoE 30B/3B active, FP8 dynamic):
Interesting finding: pipeline parallelism beats tensor parallelism for MoE models on PCIe 4.0 x8, but TP wins for dense models. Makes sense when you think about compute-to-communication ratio per layer. On NVLink TP would win both.
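For anyone who wants to reproduce the comparison, the two modes are just a flag swap in vLLM (the model name here is an example, not necessarily my exact config):

```shell
# Tensor parallel: each layer's weights split across both cards (won for dense models)
vllm serve Qwen/Qwen3-30B-A3B-FP8 --tensor-parallel-size 2

# Pipeline parallel: whole layers assigned per card, less cross-card
# traffic per token over PCIe 4.0 x8 (won for the MoE model)
vllm serve Qwen/Qwen3-30B-A3B-FP8 --pipeline-parallel-size 2
```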
Getting the docker setup working was the hardest part honestly. Will document all the gotchas in the full post
_W0z@reddit
Please do a write up
freefall_junkie@reddit
I am planning on getting it done by Thursday. Right now the rough plan is 2x B70 benchmarking in a single-user setting, and agent-style calling using vLLM's built-in benchmarking. I plan on monitoring power usage for a W/tok measurement. The list can expand, but for now I am going to use Qwen3.5 27B, Gemma4 31B, a model in the ~70B range, and an MoE model in the ~30B range.
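The W/tok math itself is simple; a quick sketch of what I'll compute (the numbers here are made up for illustration, not real measurements):

```python
def joules_per_token(avg_power_watts: float, gen_tok_per_sec: float) -> float:
    """Watts are joules/second, so dividing by tokens/second gives J per token."""
    return avg_power_watts / gen_tok_per_sec

# e.g. a hypothetical 240 W average draw while generating 13.4 tok/s
print(round(joules_per_token(240.0, 13.4), 1))
```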
schallau@reddit
Curious to hear how it is going. I'm considering getting 4 when stock re-appears. I suspect (or hope, really) that the support will get better soon.
damirca@reddit
I can't use 0.17 xpu docker image because it does not support fp8 kv cache.
> NotImplementedError: FlashAttention does not support fp8 kv-cache on this device.
So I have to wait for llm-scaler image where they add fp8 kv-cache on top of the publicly available vllm image.
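For context, it's the kv-cache dtype flag that trips it (model name is just a placeholder):

```shell
# Fails with the NotImplementedError above on the 0.17 xpu image
vllm serve my-org/my-model --kv-cache-dtype fp8
```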
fallingdowndizzyvr@reddit
Sweet. Thanks for that.
freefall_junkie@reddit
Yeah, honestly kind of underwhelming. The lack of software support right now can't be overstated, and even with 2 of them I can't really use 70B-class models.
fallingdowndizzyvr@reddit
It is. Which is kind of what I expected based on my experience with the A770s. They didn't come close to what the paper specs promised.
freefall_junkie@reddit
No crickets from me. I’m pulling Qwen3.5-27B-FP8 right now
wakIII@reddit
Pull faster
freefall_junkie@reddit
Lmao I had to go do something basically right as I sent that. I'm back at it, but the lack of the plug-and-play functionality of Nvidia cards is kicking my ass currently.
wakIII@reddit
lol I had to use my 4090 to run a model to fix openvino on archlinux to get my b50 pro working
prescorn@reddit
Nobody runs LLMs on intel right now, it’s unoptimized
fallingdowndizzyvr@reddit
I ran LLMs just fine on my A770s a couple of years ago. But what was just fine a couple of years ago is not fine today. Today, my A770s are on emergency standby.
prescorn@reddit
i don't think it's out of the question that performance on these newer cards improves significantly in the future, i think it's healthy for us all to want that regardless of whether we settled on red, green or blue!
ImportancePitiful795@reddit
There will be a video from Alex sometime in the next few days with 4x B70s, a follow-up to last week's, where he did the same thing with 4x B60s.
Ok_Mammoth589@reddit
I mean.. buy 8 and get 256gb for the price of 1 rtx pro 6000.
seamonn@reddit
...until you realize that the software support is crap and all you can run are 6 month old models.
UtmostProfessional@reddit
Is Qwen3-30B-A3B really that old of a model?
Running that on 2x B580s and it's pretty decent using Vulkan/Mesa and llama.cpp
seamonn@reddit
Qwen 3 is like ancient. There's no point in running it when you have Qwen 3.5.
HardlyThereAtAll@reddit
I run Qwen3.5 on my Arc B50 Pro, and have done for a couple of weeks. Intel's vllm fork is pretty decent.
yon_impostor@reddit
I was running Gemma4 on my B580 just fine day-zero. Sometimes with new models an algorithm (like GDN for qwen3.5) will fall back to CPU for a little bit but usually it gets SYCL implementation pretty quick. Vulkan of course gets implemented at the same speed as every other card, and is only a little slower on prompt processing, especially with recent drivers since I'm pretty sure it will use KHR_coopmat for the XMX cores.
prescorn@reddit
For now - intel aren’t slowing down targeting this market and I don’t see nvidia responding
Mountain_Past_6513@reddit
Finally some competition that we badly needed! Hoping for better pricing
crantob@reddit
Do you also subsidize shooting children's faces off with that purchase?
Ok_Improvement_3610@reddit
This one or two rtx 5060 16GB
Altruistic_Call_3023@reddit (OP)
I have two 5060 Ti 16GB. They work pretty well. When I get the B70 at the end of next week and have time to test, I should know more.
Ok_Improvement_3610@reddit
The MSI RTX 5060 16GB is $515 each on Amazon. Let me know if this one card makes more sense
Dave_from_the_navy@reddit
Just so everyone knows, I have one currently running in my Dell Poweredge R730XD. The hardware dictates that it should be faster than the RTX 4070 Super in my gaming PC by about 15%-20%. On the same model (Qwen3.5-9B), I'm getting about 1/3 the token generation speed (and about 1/10 of the ingest speed), using llama.cpp with the CUDA backend on the 4070 and llama.cpp with the SYCL backend on the B70. I was averaging about 22 t/s on the B70 and about 65-70 t/s on my 4070 super.
I'm still happy with my purchase, and I'm very excited for the SYCL integration to get better over the next few months (if we use the older battlemage cards as a benchmark, we'll probably see 100%+ improvements within just the next 6 months alone!), but I just want you to temper your expectations if you're expecting to buy one, plug it in, and have an equal experience to an Nvidia card with similar hardware right now.
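If anyone wants to reproduce the comparison, llama-bench makes the apples-to-apples run a one-liner (the GGUF path is an example; same flags on both machines, only the backend build differs):

```shell
# 512-token prompt processing + 128-token generation, reported separately,
# so you can see the PP and TG gaps independently
./llama-bench -m models/qwen3.5-9b-q8_0.gguf -p 512 -n 128
```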
SwingNinja@reddit
Would that be the expected speed? From 22 t/s to 44 t/s?
Dave_from_the_navy@reddit
I'd be surprised if they don't eventually have similar if not better numbers than the 4070 super (maybe in the realm of 60+ t/s), but it might take a little while for Intel to get there. I'm posting more detailed comparison benchmarks later today with some speculation. It is mostly speculation though, as far as the future numbers are concerned. I have no insider knowledge or anything.
yon_impostor@reddit
If the B70 reports are as indicated, I'm expecting it to improve a lot. My B580 is double digits percent faster than my friend's 5060 in stable diffusion. I think despite generally working properly and generally dramatically outpacing vulkan for PP, the SYCL backend is a bit under-optimized. Hoping the B70 motivates some more contributors to it.
How is your B70 behaving in vulkan? And are your drivers (I think it's mesa-dependent?) new enough that it's reporting KHR_coopmat?
Dave_from_the_navy@reddit
I've been ignoring vulkan entirely for now. To be clear, I probably shouldn't. I'll be posting actual benchmarks later tonight comparing SYCL to CUDA on my 4070 Super... I'll have to follow up tomorrow with some more Vulkan, SYCL, and OpenVINO benchmarks, but I'm mostly just excited that I'm out of driver hell for the SYCL inference, lol.
yon_impostor@reddit
I had some trouble with SYCL until I had chatgpt or claude (don't recall which) write up a script to do the OneAPI toolkit install to my user directory so I didn't end up cluttering /opt/. Now I can just nuke it and go again if they release an update. Also it seems like the package dependency graph for their LevelZero apt packages is vaguely compatible with Debian 13/14 with minor finagling now. A lot of this stuff really wants to be running on Ubuntu but it's just not my jam. Debian 13 is holding me back from KHR_coopmat on my A380 though... Probably I could solve that by adding newer mesa repos. I'm just glad I don't have to screw around with Docker anymore.
I'm not sure how long you've been messing with the Intel stuff but it's a huge improvement vs back a ~year+ ago when I was first trying to use it all. The poor showing for the B70 may seem disheartening but if I were you I'd be comforted that they seem to be making a serious effort. Pytorch support especially is WAY WAY better, upstream pytorch xpu instead of having to rely on IPEX etc is massive. Makes comfyui and a lot of other stuff basically drop-in. Llama.cpp sycl is actually still not quite as performant as IPEX-LLM llama.cpp was, but at least it's way more stable and it gets updates immensely more often.
I did buy and then sell a B60 around launch instead of keeping it though, lol. Mostly because I needed the money for life stuff and already had a 3090. With the stack improvements I've been lately pondering getting back the A770 LE I loaned out to a friend and getting another for multigpu.
I'd make sure you're building llama.cpp sycl for fp16, there's a note that it doesn't always improve performance but I've yet to see any issues on the dedicated cards. I also got this one voodoo incantation from an older post which seemed to improve things very slightly on my B580. Your mileage may vary.
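In case it helps anyone, the fp16 SYCL build is just the standard cmake flags from llama.cpp's SYCL docs (the setvars path depends on where you installed oneAPI; mine lives in my home dir per the above):

```shell
# Load the oneAPI compilers (icx/icpx) into the environment first
source /opt/intel/oneapi/setvars.sh

# GGML_SYCL_F16=ON is the fp16 switch in question
cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx \
      -DCMAKE_CXX_COMPILER=icpx -DGGML_SYCL_F16=ON
cmake --build build --config Release -j
```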
xrvz@reddit
$1000 for 32GB VRAM with this, or $3000 for 96GB+ VRAM with Strix Halo.
You get the entire rest of the computer for free with the latter, and ROCm drivers will still be easier than whatever you need to do for the Intel.
PhantomWolf83@reddit
This or R9700? All I want to do is inference, no training.
Consistent-Cold4505@reddit
yeah but it is intel. All the programs, drivers, etc... work with NVIDIA and (Sometimes AMD with quite a bit of work). Even at $1,000 for 32 GB it's not worth the headache to deal with all those issues (probably unsuccessfully) to be able to run a 14-20b model.
Altruistic_Call_3023@reddit (OP)
To each his own. Some of us love the challenge 😎
No_Afternoon_4260@reddit
Vulkan is no challenge
National_Meeting_749@reddit
And vulkan is implemented on like... Llama.cpp and kobold.cpp and.... That's it?
Vulkan support on most AI software is... Rare at best.
ThisWillPass@reddit
Except in a year when we vibe code a compatibility layer, etc.
National_Meeting_749@reddit
Claude isn't at that level yet. Claude can't do that
ThisWillPass@reddit
Yeah, I am under no hallucinations, just extrapolating. "AGI" has recently been retargeted to 2027, down from ~2029/30. Recent "step change", with labs working on it, with the same compute. Something changed; 13 months, probably they can nail it before then. AGI will be hardware agnostic. I am probably calling it too early, but for me the writing is on the wall.... (sorry, next time I'll save it for the singularity sub)
Altruistic_Call_3023@reddit (OP)
Don’t give away the secret! Then it’ll be harder to buy and more expensive! Haha
No_Afternoon_4260@reddit
🫡😅
feckdespez@reddit
No, no. I have a B50 that I got at release. It's not worth it man. I wasted so many hours and it's still pretty awful.
I'd rather buy an R9700 Pro with 32GB for $300 more than touch the B70 with a 10-foot pole.
Altruistic_Call_3023@reddit (OP)
I have a B60 and am happy with what I've gotten so far. Maybe it's just me wanting the market to grow, so I'm looking at it through blue-tinted glasses.
satireplusplus@reddit
Support in llama.cpp is actually decent and Intel oneAPI improved a lot lately. If all you want is LLM inference then it's a viable alternative. I was able to run GGUF models on the Intel iGPU of an N100 with 16GB DDR5, actually kinda impressive.
I really hope they do a 64GB version though, that's where we could really make a dent. At that point you start competing with the Nvidia Axxxx Pro series, which are still $$$.
Time-Culture2549@reddit
Honestly we should stop telling people bro, i want to grab this on sale lmao
Time-Culture2549@reddit
When i bought my b580 I was struggling so hard I gave up. Tried a week ago and it has been easy sailing honestly. I think it is much easier to use these cards now and I think this release is going to prove that. But I do hope the hate pushes it down to $700 so I can snag a few lol
justan0therusername1@reddit
Depends on your needs but the intels in my workflows (for their purposes) have done great with no green tax
jacek2023@reddit
It’s worth checking the actual benchmarks for this card in the software you intend to use, for example llama.cpp, because implementation is often much more important than the spec. For example, an AMD card may look great on paper, but CUDA kernels may be better optimized. So before you buy, make sure it will actually work for your needs: specific model on specific software.
HopePupal@reddit
benchmarking yourself is great, but i had trouble finding any AMD consumer cards attached to cloud machines to test on (Runpod had some of the big current gen Instinct GPUs but no Radeons). Intel? currently impossible.
fallingdowndizzyvr@reddit
Right here dude.
https://github.com/ggml-org/llama.cpp/discussions/10879
HopePupal@reddit
i get wanting to keep a long-running set of benchmarks consistent, but performance on llama 7B Q4_0 tells me basically nothing about how Qwen 3.5 or Gemma 4 are gonna run!
jacek2023@reddit
There are many posts on reddit and github about AMD cards
HopePupal@reddit
yeah and i'll be making my own now that there's an R9700 under my desk. but i'm just saying: you can only reliably find Nvidia cards for that kind of testing. otherwise you're going to be extrapolating from forum posts that maybe kinda sorta look like your use case.
TemporalAgent7@reddit
Why are there no benchmarks for this card? It's crazy; it's been in reviewers' hands for weeks and is now at retail, and yet no one is running / publishing inference benchmarks.
Dave_from_the_navy@reddit
Posted elsewhere in this thread, but I'm seeing 1/3 the performance of my 4070 super on the same model on the llama.cpp backend. I'll probably make a detailed post with more scientific benchmarks later since you're right, it doesn't seem like anyone is publishing benchmarks! (To be fair, I've been fighting drivers and ReBAR problems for the past week, but I finally got up and running on SYCL via llama.cpp last night!)
TemporalAgent7@reddit
Thank you, looking forward to that.
1/3 of 4070 Super sounds abysmal, I'm hoping there's a misconfiguration because we really desperately need some competition to NVIDIA's monopoly.
Dave_from_the_navy@reddit
No misconfiguration, I don't think. If I run it using OpenVINO instead of SYCL, I get a bit closer, about half the performance of the 4070 Super, but I've been running into other issues with that build that I won't get into here... The latest drivers and toolkit for SYCL are essentially treating the B70 as a generic card, using the oneAPI compilers to take the generic C/C++ math and logic and translate it into hardware instructions, rather than having the hand-tuned kernels that Nvidia has for the 4070 Super.
Also, flash attention is broken on the Xe2 architecture right now (hopefully it will be fixed in the next couple of months, per the llama.cpp GitHub). So that's a massive bottleneck for time to first token!
ThisWillPass@reddit
Nda?
fallingdowndizzyvr@reddit
NDA? For a released product. No. People got this like a week ago and posted numbers. The numbers just suck. I've asked people to run different benchmarks to see if it really sucks. They don't respond. Which is not a good sign. Since if it was good, they would have.
TemporalAgent7@reddit
It's available at retail now though. Surely if the reviewers signed an NDA they're released now.
fallingdowndizzyvr@reddit
People have posted numbers. But they pretty much suck.
Here's the performance for a single user for Qwen 3.5 27B @ 8 bits from Level 1.
"Avg prompt throughput: 85.4 tokens/s, Avg generation throughput: 13.4 tokens/s"
That PP is super slow. My Strix Halo is like 350tk/s.
overand@reddit
40% the core count and 65% of the memory bandwidth of a 3090, but 32GB rather than 24GB, and it's a new card vs ~6-year-old 3090s. It's not a home run, but if it benchmarks decently compared to a 3090, then it's a good alternative for home users. As for businesses? That's going to depend entirely on workload support, I think.
fallingdowndizzyvr@reddit
Core count only matters when comparing the same gen of tech from the same company. Core counts across architectures don't mean a thing.
bcredeur97@reddit
Unfortunately nvidia just has the monopoly on the software side of things, so it’s hard to consider anything else if you want to be “serious”
But this would be fun to play with.
WoodCreakSeagull@reddit
Always good to have competition. They've been growing their market share; at this rate I would love to see them release something like a $500 20GB VRAM card or similar that you could slot into an existing consumer system. Running models on Vulkan / splitting tensors with RPC has performance tradeoffs, but for certain use cases those tradeoffs can be tolerated if it means getting to run this class of open model.
BitterProfessional7p@reddit
I just saw it change from in stock to out of stock. F
ea_man@reddit
Let's see if the b65 hits the $800 mark.