5090 OpenCL & Vulkan leaks

Posted by segmond@reddit | LocalLLaMA | 16 comments

Ack, not crushing 4090.
https://videocardz.com/newz/nvidia-geforce-rtx-5090-appears-in-first-geekbench-opencl-vulkan-leaks

a_beautiful_rhind@reddit

Nothing's gonna happen until FP4 takes off like FP8 did, and neither of those is really an "LLM" thing.

No_Afternoon_4260@reddit

Maybe not with llama.cpp, but something like ONNX might use it? Idk

a_beautiful_rhind@reddit

New kernels will use it, probably in image models first. It could be a requirement for some SuperHappyFunTime Attention that gives you 4x context.

I see people didn't like this comment even though it's true. I couldn't give a fuck about FP8 until it was needed on image models, and I could happily keep using 3090s for LLM tasks with a rather negligible speed penalty.

No_Afternoon_4260@reddit

I think these kinds of acceleration are used in the LLM space when running big batch sizes; otherwise you really aren't compute constrained anyway.

a_beautiful_rhind@reddit

Unless you use FP8 context or weights, you don't really use them. In that case it also helps prompt processing. Sure, the extra compute helps batches, but without the precision optimizations it's not so massive. Even Nvidia was not showing that in marketing materials.

Hence little reason to get a 4090 over a 3090 here, etc. Let alone shell out for a 5090 and its 8 extra gigs of RAM.

On video/image models, it hurts not being able to compile FP8 (and future FP4) weights in torch and get the 1/3 to 2x speedup from that.

fallingdowndizzyvr@reddit

26-37% more performance with 33% more VRAM for a 25% price increase. IMO, that's pretty crushing it.
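
As a rough sanity check on those figures, here is a minimal sketch that turns the quoted percentages into per-dollar ratios. It uses only the numbers in the comment above (26-37% performance, 33% VRAM, 25% price); the variable and function names are made up for illustration.

```python
# Relative gains of the 5090 over the 4090, as quoted in the comment above.
perf_gain_low, perf_gain_high = 0.26, 0.37
vram_gain = 0.33
price_increase = 0.25


def per_dollar(gain: float, price_increase: float) -> float:
    """Return new value-per-dollar divided by old value-per-dollar."""
    return (1 + gain) / (1 + price_increase)


perf_per_dollar_low = per_dollar(perf_gain_low, price_increase)
perf_per_dollar_high = per_dollar(perf_gain_high, price_increase)
vram_per_dollar = per_dollar(vram_gain, price_increase)

print(f"perf per dollar: {perf_per_dollar_low:.3f}x to {perf_per_dollar_high:.3f}x")
print(f"VRAM per dollar: {vram_per_dollar:.3f}x")
```

On these numbers, performance per dollar improves by roughly 1% to 10% and VRAM per dollar by about 6%, so the gain is mostly in absolute terms rather than in value for money.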

LunarianCultist@reddit

For a 33% power increase... Depending on how prices go, buying two 4090s or, hell, three to four 3090s might be better!

fallingdowndizzyvr@reddit

> Depending on how prices go,

As prices go right now, you aren't buying 2x4090s for the price of a 5090. You are buying 1.25.

The force that pushed 4090s well above MSRP is no longer a factor. That force was China, which was hoovering up as many 4090s as it could before the ban. Since the ban went into effect, 4090s at MSRP have been the norm. It should be no different for the 5090.

kmouratidis@reddit

In the 3 European countries where I lived and checked prices, they were typically ~2500€ or more. Only used ones get close to MSRP, which is insane.

Then again there are people selling 3090s used for mining at ~~5090 prices~~ scam prices.

TheRealGentlefox@reddit

Isn't OpenCL the more relevant benchmark here? That would put it closer to 10%.

fallingdowndizzyvr@reddit

No, since OpenCL is pretty much obsolete. The standards group that governs OpenCL is pushing SYCL as its replacement.

OpenCL in llama.cpp has been deprecated and they say to use the Vulkan backend instead.
