First Intel B580 inference speed test
Posted by ComprehensiveQuail77@reddit | LocalLLaMA | View on Reddit | 20 comments
At my request, someone agreed to test his B580, and this is the result:
MrTubby1@reddit
This is for stable diffusion. It won't accurately reflect LLM performance.
segmond@reddit
yup, looks like the RTX 3060 will still be a better buy.
fallingdowndizzyvr@reddit
As discussed in this thread, the 3060 is about the same speed. But considering that the 3060 is, well, Nvidia, and can also run video gen that the B580 can't, it's the better buy.
https://www.reddit.com/r/LocalLLaMA/comments/1hf98oy/someone_posted_some_numbers_for_llm_on_the_intel/
Monkeylashes@reddit
Nvidia is the winner on the gaming front as well, due to a ton of features like ray tracing, DLSS...
fallingdowndizzyvr@reddit
No, it's not. Look at the benchmarks. The B580 demolishes the 3060 and even soundly beats the 4060, the Nvidia cards that are its competitors. Even when you factor in ray tracing and XeSS.
Monkeylashes@reddit
You're right on raw performance, but my point stands. Nvidia has a lot of tricks in their software to push the cards over the edge; they don't just rely on raw performance. And if you're including the 40-series cards in your comparison, then there is frame generation too, which almost doubles the perceived performance. It isn't as simple a comparison as you're making it out to be.
Cyber-exe@reddit
The 4060 is suffocating on low VRAM, and the 3060, even at 12 GB, has earlier-gen tensor cores that barely do half of what the 4060 does. The 40 series lacks the AI TOPS for the DLSS 4 that the 50 series is getting, so there are no new tricks in the pipeline for the 3060. Nvidia could maybe put DLSS 4 on the 3090 and the 4070 Ti Super and above, but the lower GPUs within those generations will be behind the entire RTX 50 lineup in AI TOPS.
fallingdowndizzyvr@reddit
And Intel has those tricks too. XeSS is no slouch.
fallingdowndizzyvr@reddit
As discussed in the thread from a month ago, the 3060 is about the same speed. But considering that the 3060 is, well, Nvidia, and can also run video gen that the B580 can't, it's the better buy.
Here are the numbers for the 3060. Compare them to the B580 numbers I posted in my other response.
twnznz@reddit
It would be good to know which optimisations are in use. E.g., Flash-Attention.
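For reference, a minimal sketch of how one might A/B that particular optimisation with the llama-cpp-python bindings. The model path is a placeholder, and the `flash_attn` flag (llama.cpp's `-fa`) only makes a difference on backends that actually implement it:

```python
# Sketch: comparing throughput with Flash-Attention off vs. on,
# via the llama-cpp-python bindings. The model path is hypothetical.
import time
from llama_cpp import Llama

for fa in (False, True):
    llm = Llama(
        model_path="./model.gguf",  # placeholder: any local GGUF file
        n_gpu_layers=-1,            # offload all layers to the GPU
        flash_attn=fa,              # maps to llama.cpp's -fa flag
        verbose=False,
    )
    start = time.perf_counter()
    out = llm("Explain KV caching in one paragraph.", max_tokens=256)
    elapsed = time.perf_counter() - start
    n_tokens = out["usage"]["completion_tokens"]
    print(f"flash_attn={fa}: {n_tokens / elapsed:.1f} tok/s")
```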
fallingdowndizzyvr@reddit
Here are the numbers from someone who ran llama.cpp on their B580 a month ago.
pyr0kid@reddit
You realize Stable Diffusion literally isn't the same software?
getmevodka@reddit
so my two 3090s are still good 😬👍
fallingdowndizzyvr@reddit
Hardly the first. I posted a thread a month ago with numbers that someone got with llama.cpp on their B580.
"
"
cchung261@reddit
That seems a little slow.
CystralSkye@reddit
How were the AMD cards tested here? They look disproportionately slow; is it ROCm on Linux?
Finguili@reddit
Most likely Windows. I'm getting 8.66/min on a 6700 XT on Linux, which is still rather slow compared to Nvidia, but over 2x faster than what is listed here.
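For anyone who wants to verify which backend their install is actually using, a quick check of the PyTorch build works (a sketch; run it inside the same Python environment as the UI):

```python
# Sketch: identifying the PyTorch backend behind an A1111/ComfyUI install.
import torch

print(torch.__version__)          # ROCm wheels carry a "+rocmX.Y" suffix
print(torch.version.hip)          # HIP version string on ROCm builds, None otherwise
print(torch.version.cuda)         # CUDA version string on CUDA builds, None otherwise
print(torch.cuda.is_available())  # True on both CUDA and ROCm if a GPU is usable
# DirectML is a separate package (torch-directml) and won't show up in any of the above.
```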
CystralSkye@reddit
It would also very much depend on the model used and the parameters, wouldn't it?
I have a 6700 XT and a 4070, and the difference in ComfyUI is more like 2x, not the 8x this chart suggests.
Finguili@reddit
Yes, I used the SD 1.5 model at 512x512 resolution with 50 steps per image, but had to assume Euler as the sampler. The article does say they used the DirectML fork of A1111, though, so it's definitely not ROCm on Linux.
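For comparison on your own hardware, here is a rough sketch of that benchmark with diffusers; the checkpoint ID is an assumption, so substitute whatever SD 1.5 copy you have locally:

```python
# Sketch of the benchmark described above: SD 1.5, 512x512, 50 Euler steps.
import time
import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed checkpoint ID; substitute your own
    torch_dtype=torch.float16,
)
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")  # "cuda" also addresses ROCm builds of PyTorch

steps = 50
start = time.perf_counter()
pipe("a photo of an astronaut", width=512, height=512, num_inference_steps=steps)
elapsed = time.perf_counter() - start
print(f"{steps / elapsed:.2f} it/s, {60.0 / elapsed:.2f} images/min")
```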
CystralSkye@reddit
Yeah, I've seen this quite often.
Ollama with ROCm on Windows, for example, does quite well on the 6700 XT: 80 tok/s on Phi-3.
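If anyone wants to reproduce that kind of number, Ollama's HTTP API reports token counts and timings directly. A small sketch, assuming a local server on the default port with phi3 already pulled:

```python
# Sketch: measuring decode speed from Ollama's HTTP API.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "phi3", "prompt": "Why is the sky blue?", "stream": False},
).json()

# eval_count is the number of generated tokens; eval_duration is in nanoseconds.
tok_per_s = resp["eval_count"] / (resp["eval_duration"] / 1e9)
print(f"{tok_per_s:.1f} tok/s")
```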