Intel Pro B70 in stock at Newegg - $949

[-]

Newegg_Support@reddit

Have you received the new Pro B70? Let us know!

[-]

Altruistic_Call_3023@reddit (OP)

I have. My goal is to put it to use this weekend. Excited

[-]

lakySK@reddit

Ok, so now this is starting to be interesting. 32GB GPU with decent specs and low-ish wattage for $1k.

How do you expect a 4x b70 PC stack against M5 Max (now that it has the matmul support)?

Both would set you back around $5-6k. Both 128GB, similar bandwidth. Intel workstation likely winning on compute for prompt processing and M5 Max winning on power consumption and form factor? Or am I missing something important?

[-]

Dany0@reddit

Check out the level1techs vid on it, he had four of them and tested it

[-]

fallingdowndizzyvr@reddit

The performance from that is really slow. Here's the performance for a single user for Qwen 3.5 27B @ 8 bits.

"Avg prompt throughput: 85.4 tokens/s, , Avg generation throughput: 13.4 tokens/s"

That PP is super slow.

I've asked others who got there's for better performance numbers. Not one has responded. It only takes like a couple of minutes to run. Well.... unless the B70 is that super slow.

[-]

lacerating_aura@reddit

That bad? Could it just be software optimization issue or is the hard2are that lacking? Cause technically for non nvidia 32gb its either this intel card or amd ai pro ones.

[-]

fallingdowndizzyvr@reddit

It shouldn't be that bad. So there's something that not right. But the fact that people have responded to my request to do other benchmarks says something. Since I'm sure if it was good, they would have.

[-]

freefall_junkie@reddit

I purchased 2 on the initial release day that arrived 20 min ago. I am currently getting all the drivers configured but I will do some testing. I’ve been excited waiting on these and there is next to no info online. It seems like nobody really had them yet.

[-]

fallingdowndizzyvr@reddit

It seems like nobody really had them yet.

People have had them. It wasn't just the dudes at Level 1.

https://www.reddit.com/r/IntelArc/comments/1s8crqp/intel_arc_b70_for_llm_work_load/

[-]

freefall_junkie@reddit

Tbf in the first paragraph that guy specifies he is not using the recommend environment. I am working on getting the latest vLLM stuff set up to test with the stack they advertised. Could be cope but I’m still hopeful

[-]

fallingdowndizzyvr@reddit

Tbf in the first paragraph that guy specifies he is not using the recommend environment.

He is. Which I pointed out in that thread and asked him to run again with the right one. Crickets.

[-]

freefall_junkie@reddit

Hey, got my 2x Arc B70 Pro setup working with vLLM 0.17.0-xpu. Still doing more testing and plan to do a full writeup this week with configs, docker-compose files, and detailed benchmarks, but here's what I've seen so far:

Hardware: 2x B70 Pro (32GB each), Ryzen 5 3600X, 48GB RAM, PCIe 4.0 x8, Ubuntu 24.04 w/ kernel 6.17

DeepSeek-R1-Distill-Qwen-32B (dense 32B, FP8 dynamic):

\~22.5 tok/s single user (tensor parallel)
\~14.6 tok/s single user (pipeline parallel)
16 GiB weights per GPU, 84K token KV cache
TP is the clear winner for dense models on PCIe

Qwen3-30B-A3B (MoE 30B/3B active, FP8 dynamic):

\~18-19 tok/s single user (both TP and PP similar, PP slightly faster)
\~294 tok/s total throughput at 16 concurrent (PP) vs \~264 tok/s (TP)
14.5 GiB weights per GPU, 293K token KV cache
PP actually wins for MoE on bandwidth-constrained PCIe — fewer inter-GPU transfers per token

Interesting finding: pipeline parallelism beats tensor parallelism for MoE models on PCIe 4.0 x8, but TP wins for dense models. Makes sense when you think about compute-to-communication ratio per layer. On NVLink TP would win both.

Getting the docker setup working was the hardest part honestly. Will document all the gotchas in the full post

[-]

_W0z@reddit

Please do a write up

[-]

freefall_junkie@reddit

I am planning on getting it done by Thursday. Right now the rough plan is 2x B70 benchmarking in a single user setting, and agent style calling using vllms built in benchmarking. I plan on monitoring power usage for a W/tok measurement. The list can expand but for now I am going to use Qwen3.5 27B, Gemma4 31B, a model in the ~70B range, and an MoE model in the ~30B range.

[-]

schallau@reddit

Curious to hear how it is going. I'm considering getting 4 when stock re-appears. I suspect (or hope, really) that the support will get better soon.

[-]

damirca@reddit

I can't use 0.17 xpu docker image because it does not support fp8 kv cache.

> NotImplementedError: FlashAttention does not support fp8 kv-cache on this device.

So I have to wait for llm-scaler image where they add fp8 kv-cache on top of the publicly available vllm image.

[-]

fallingdowndizzyvr@reddit

Sweet. Thanks for that.

[-]

freefall_junkie@reddit

Yeah honestly kind of underwhelming. The lack of software support right now can’t be understated and even with 2 of them I can’t really use 70B class models.

[-]

fallingdowndizzyvr@reddit

It is. Which is kind of what I expected based on my experience with the A770s. They didn't come close to what the paper specs promised.

[-]

freefall_junkie@reddit

No crickets from me. I’m pulling Qwen3.5-27B-FP8 right now

[-]

wakIII@reddit

Pull faster

[-]

freefall_junkie@reddit

Lmao I had to go do something basically right as a sent that. I’m back at it, but the lack of plug and play functionality of nvidia cards is kicking my ass currently.

[-]

wakIII@reddit

lol I had to use my 4090 to run a model to fix openvino on archlinux to get my b50 pro working

[-]

prescorn@reddit

Nobody runs LLMs on intel right now, it’s unoptimized

[-]

fallingdowndizzyvr@reddit

I ran LLMs just fine on my A770s a couple of years ago. But what was just fine a couple of years ago is not fine today. Today, my A770s are on emergency standby.

[-]

prescorn@reddit

i don't think it's out of the question that performance on these newer cards improves significantly in the future, i think it's healthy for us all to want that regardless of whether we settled on red, green or blue!

[-]

ImportancePitiful795@reddit

There will be a video from Alex some time next few days with 4xB70s follow up from last week did it with 4xB60.

[-]

Ok_Mammoth589@reddit

I mean.. buy 8 and get 256gb for the price of 1 rtx pro 6000.

[-]

seamonn@reddit

...until you realize that the software support is crap and all you can run are 6 month old models.

[-]

UtmostProfessional@reddit

Is Qwen3-30B-A3B really that old of a model?

Running that on 2x B580s and it’s pretty decent using Vulkan/Mesa and Llamma.ccp

[-]

seamonn@reddit

Qwen3-30B-A3B

Qwen 3 is like ancient. There's no point in running it when you have Qwen 3.5.

[-]

HardlyThereAtAll@reddit

I run Qwen3.5 on my Arc B50 Pro, and have done for a couple of weeks. Intel's vllm fork is pretty decent.

[-]

yon_impostor@reddit

I was running Gemma4 on my B580 just fine day-zero. Sometimes with new models an algorithm (like GDN for qwen3.5) will fall back to CPU for a little bit but usually it gets SYCL implementation pretty quick. Vulkan of course gets implemented at the same speed as every other card, and is only a little slower on prompt processing, especially with recent drivers since I'm pretty sure it will use KHR_coopmat for the XMX cores.

[-]

prescorn@reddit

For now - intel aren’t slowing down targeting this market and I don’t see nvidia responding

[-]

Mountain_Past_6513@reddit

Finally some competition that we badly needed ! Hoping for better pricing

[-]

crantob@reddit

Do you also subsidize shooting children's faces off with that purchase?

[-]

Ok_Improvement_3610@reddit

This one or two rtx 5060 16GB

[-]

Altruistic_Call_3023@reddit (OP)

I have two 5060ti 16gb. Work pretty well. When I get the b70 end of next week and have time to test, should know more.

[-]

Ok_Improvement_3610@reddit

Msi rtx5060 16G is 515$ each on amazon. Let me know if this one card make more sense

[-]

Dave_from_the_navy@reddit

Just so everyone knows, I have one currently running in my Dell Poweredge R730XD. The hardware dictates that it should be faster than the RTX 4070 Super in my gaming PC by about 15%-20%. On the same model (Qwen3.5-9B), I'm getting about 1/3 the token generation speed (and about 1/10 of the ingest speed), using llama.cpp with the CUDA backend on the 4070 and llama.cpp with the SYCL backend on the B70. I was averaging about 22 t/s on the B70 and about 65-70 t/s on my 4070 super.

I'm still happy with my purchase, and I'm very excited for the SYCL integration to get better over the next few months (if we use the older battlemage cards as a benchmark, we'll probably see 100%+ improvements within just the next 6 months alone!), but I just want you to temper your expectations if you're expecting to buy one, plug it in, and have an equal experience to an Nvidia card with similar hardware right now.

[-]

SwingNinja@reddit

we'll probably see 100%+ improvements within just the next 6 months alone

Would that be the expected speed? From 22 t/s to 44 t/s?

[-]

Dave_from_the_navy@reddit

I'd be surprised if they don't eventually have similar if not better numbers than the 4070 super (maybe in the realm of 60+ t/s), but it might take a little while for Intel to get there. I'm posting more detailed comparison benchmarks later today with some speculation. It is mostly speculation though, as far as the future numbers are concerned. I have no insider knowledge or anything.

[-]

yon_impostor@reddit

If the B70 reports are as indicated, I'm expecting it to improve a lot. My B580 is double digits percent faster than my friend's 5060 in stable diffusion. I think despite generally working properly and generally dramatically outpacing vulkan for PP, the SYCL backend is a bit under-optimized. Hoping the B70 motivates some more contributors to it.

How is your B70 behaving in vulkan? And are your drivers (I think it's mesa-dependent?) new enough that it's reporting KHR_coopmat?

[-]

Dave_from_the_navy@reddit

I've been ignoring vulkan entirely for now. To be clear, I probably shouldn't. I'll be posting actual benchmarks later tonight comparing SYCL to CUDA on my 4070 Super... I'll have to follow up tomorrow with some more Vulkan, SYCL, and OpenVINO benchmarks, but I'm mostly just excited that I'm out of driver hell for the SYCL inference, lol.

[-]

yon_impostor@reddit

I had some trouble with SYCL until I had chatgpt or claude (don't recall which) write up a script to do the OneAPI toolkit install to my user directory so I didn't end up cluttering /opt/. Now I can just nuke it and go again if they release an update. Also it seems like the package dependency graph for their LevelZero apt packages are vaguely compatible with Debian 13/14 with minor finagling now. A lot of this stuff really wants to be running on Ubuntu but it's just not my jam. Debian 13 is holding me back from KHR_coopmat on my a380 though... Probably I could solve that by adding newer mesa repos. I'm just glad I don't have to screw around with Docker anymore.

I'm not sure how long you've been messing with the Intel stuff but it's a huge improvement vs back a ~year+ ago when I was first trying to use it all. The poor showing for the B70 may seem disheartening but if I were you I'd be comforted that they seem to be making a serious effort. Pytorch support especially is WAY WAY better, upstream pytorch xpu instead of having to rely on IPEX etc is massive. Makes comfyui and a lot of other stuff basically drop-in. Llama.cpp sycl is actually still not quite as performant as IPEX-LLM llama.cpp was, but at least it's way more stable and it gets updates immensely more often.

I did buy and then sell a B60 around launch instead of keeping it though, lol. Mostly because I needed the money for life stuff and already had a 3090. With the stack improvements I've been lately pondering getting back the A770 LE I loaned out to a friend and getting another for multigpu.

I'd make sure you're building llama.cpp sycl for fp16, there's a note that it doesn't always improve performance but I've yet to see any issues on the dedicated cards. I also got this one voodoo incantation from an older post which seemed to improve things very slightly on my B580. Your mileage may vary.

cmake .. -DGGML_SYCL=ON -DGGML_SYCL_GRAPH=ON -DGGML_GRAPH=ON -DGGML_BLAS_VENDOR=Intel10_64lp -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_OPENSSL=OFF -DGGML_NATIVE=ON -DGGML_SYCL_F16=ON

[-]

xrvz@reddit

1000$ for 32GB VRAM with this or 3000$ for 96GB+ VRAM with Strix Halo.

You get the entire rest of the computer for free with the latter, and ROCm drivers will still be easier than whatever you need to do for the Intel.

[-]

PhantomWolf83@reddit

This or R9700? All I want to do is inference, no training.

[-]

Consistent-Cold4505@reddit

yeah but it is intel. All the programs, drivers, etc... work with NVIDIA and (Sometimes AMD with quite a bit of work). Even at $1,000 for 32 GB it's not worth the headache to deal with all those issues (probably unsuccessfully) to be able to run a 14-20b model.

[-]

Altruistic_Call_3023@reddit (OP)

To each his own. Some of us love the challenge 😎

[-]

No_Afternoon_4260@reddit

Vulkan is no challenge

[-]

National_Meeting_749@reddit

And vulkan is implemented on like... Llama.cpp and kobold.cpp and.... That's it?

Vulkan support on most AI software is... Rare at best.

[-]

ThisWillPass@reddit

Except in a year when we vibe code a compatibility layer, etc.

[-]

National_Meeting_749@reddit

Claude isn't at that level yet. Claude can't do that

[-]

ThisWillPass@reddit

Yeah, I am under no hallucinations, just extrapolating, "AGI", has recently retargeted to 2027 down from \~20-29/30. Recent "Step change", with labs working on it, with the same compute. Something changed, 13 months, probably can nail it before. AGI will be hardware agnostic. I am probably calling it too early, but for me the writing is on the wall.... (sorry next time I'll save it for singularity sub)

[-]

Altruistic_Call_3023@reddit (OP)

Don’t give away the secret! Then it’ll be harder to buy and more expensive! Haha

[-]

No_Afternoon_4260@reddit

🫡😅

[-]

feckdespez@reddit

No, no. I have a B50 that I got at release. It's not worth it man. I wasted so many hours and it's still pretty awful.

I'd rather by a 9700 pro with 32 GB for $300 more than touch the B70 with a 10 foot pole.

[-]

Altruistic_Call_3023@reddit (OP)

I have a b60 and am happy with what I’ve gotten so far. Maybe it’s just me wanting the market to grow so I’m blue glasses tinted looking at it.

[-]

satireplusplus@reddit

Support in llama.cpp is actually decent and intel oneapi improved a lot lately. If all you want is LLM inference then its a viable alternative. I was able to run gguf models on the Intel iGPU of a N100 with 16GB DDR5, actually kinda impressive.

I really hope they do a 64GB version though, thats where we could really make a dent. At that point you start competing with the Nvidia Axxxx pro series, which are still $$$.

[-]

Time-Culture2549@reddit

Honestly we should stop telling people bro, i want to grab this on sale lmao

[-]

Time-Culture2549@reddit

When i bought my b580 I was struggling so hard I gave up. Tried a week ago and it has been easy sailing honestly. I think it is much easier to use these cards now and I think this release is going to prove that. But I do hope the hate pushes it down to $700 so I can snag a few lol

[-]

justan0therusername1@reddit

Depends on your needs but the intels in my workflows (for their purposes) have done great with no green tax

[-]

jacek2023@reddit

It’s worth checking the actual benchmarks for this card in the software you intend to use, for example llama.cpp, because implementation is often much more important than the spec. For example, an AMD card may look great on paper, but CUDA kernels may be better optimized. So before you buy, make sure it will actually work for your needs: specific model on specific software.

[-]

HopePupal@reddit

benchmarking yourself is great, but i had trouble finding any AMD consumer cards attached to cloud machines to test on (Runpod had some of the big current gen Instinct GPUs but no Radeons). Intel? currently impossible.

[-]

fallingdowndizzyvr@reddit

Right here dude.

https://github.com/ggml-org/llama.cpp/discussions/10879

[-]

HopePupal@reddit

i get wanting to keep a long-running set of benchmarks consistent, but performance on llama 7B Q4_0 tells me basically nothing about how Qwen 3.5 or Gemma 4 are gonna run!

[-]

jacek2023@reddit

There are many posts on reddit and github about AMD cards

[-]

HopePupal@reddit

yeah and i'll be making my own now that there's an R9700 under my desk. but i'm just saying: you can only reliably find Nvidia cards for that kind of testing. otherwise you're going to be extrapolating from forum posts that maybe kinda sorta look like your use case.

[-]

TemporalAgent7@reddit

Why are there no bechmarks for this card? It's crazy, it's been in reviewers' hands for weeks and now at retail and yet no one is running / publishing inference benchmarks.

[-]

Dave_from_the_navy@reddit

Posted elsewhere in this thread, but I'm seeing 1/3 the performance of my 4070 super on the same model on the llama.cpp backend. I'll probably make a detailed post with more scientific benchmarks later since you're right, it doesn't seem like anyone is publishing benchmarks! (To be fair, I've been fighting drivers and ReBAR problems for the past week, but I finally got up and running on SYCL via llama.cpp last night!)

[-]

TemporalAgent7@reddit

Thank you, looking forward to that.

1/3 of 4070 Super sounds abysmal, I'm hoping there's a misconfiguration because we really desperately need some competition to NVIDIA's monopoly.

[-]

Dave_from_the_navy@reddit

No misconfiguration I don't think. If I run it using OpenVINO instead of SYCL, I get a bit closer, about half the performance of the 4070 super, but I've been running into other issue with that build that I won't get into here... The latest drivers and toolkit for SYCL are essentially treating the B70 as a generic card, using the oneAPI compilers to take the generic C/C++ math and logic and translate it into hardware instructions rather than having the hand tuned kernels that Nvidia has for the 4070 super.

Also, flash attention is broken on the Xe2 architecture right now (hopefully will be fixed in the next couple months as per the llama.cpp GitHub). So that's a massive bottleneck for the ttfs!

[-]

ThisWillPass@reddit

Nda?

[-]

fallingdowndizzyvr@reddit

NDA? For a released product. No. People got this like a week ago and posted numbers. The numbers just suck. I've asked people to run different benchmarks to see if it really sucks. They don't respond. Which is not a good sign. Since if it was good, they would have.

[-]

TemporalAgent7@reddit

It's available at retail now though. Surely if the reviewers signed an NDA they're released now.

[-]

fallingdowndizzyvr@reddit

People have posted numbers. But they pretty much suck.

Here's the performance for a single user for Qwen 3.5 27B @ 8 bits from Level 1.

"Avg prompt throughput: 85.4 tokens/s, , Avg generation throughput: 13.4 tokens/s"

That PP is super slow. My Strix Halo is like 350tk/s.

[-]

overand@reddit

40% the core count and 65%of the memory bandwidth of a 3090, but 32GB rather than 24GB, and it's a new card vs \~6 year old 3090s. It's not a home run, but if it benchmarks decenty compared to a 3090, then it's a good alternative for home users. As for businesses? That's going to depend entirely on workload support, I think.

[-]

fallingdowndizzyvr@reddit

40% the core count

Core count only matters when comparing the same gen of tech from the same company. Core counts across architectures don't mean a thing.

[-]

bcredeur97@reddit

Unfortunately nvidia just has the monopoly on the software side of things, so it’s hard to consider anything else if you want to be “serious”

But this would be fun to play with.

[-]

WoodCreakSeagull@reddit

Always good to have competition. They've been growing their market share, at this rate I would love to see them release something like a 500 dollar 20GB VRAM card or similar that you could slot into an existing consumer system. Running models on vulkan/splitting tensors with RPC has a performance tradeoff but those tradeoffs for certain use cases can be tolerated if you're getting increasing performance of this class of open model.

[-]

BitterProfessional7p@reddit

I just seen it changing from in stock to out of stock. F

[-]

ea_man@reddit

Let's see if the b65 hits the $800 mark.