Info: Nvidia Cuda 13.3 landed
Posted by parrot42@reddit | LocalLLaMA | View on Reddit | 27 comments
Has anybody tried llama.cpp with CUDA 13.3 yet?
LinkSea8324@reddit
dtdisapointingresult@reddit
I'm no expert on this but it seems to me none of these help 99.9% of users on this sub?
The FP4 improvements are for Blackwell Ultra, which none of us have, and TF32 is some weird type none of us use.
Is there any benefit for your average Blackwell consumer GPU user?
Dany0@reddit
Helllll yeah, gotta go test this. Historically, CUDA updates gave much bigger speedups than they have lately; if this is confirmed, it'd be lovely.
nmrk@reddit
Oh nice! Drivers and CUDA updated automatically in Proxmox, running fine.
ilintar@reddit
Yeah, the bug from 13.2 is finally fixed.
laginimaineb@reddit
Awesome :)
FerLuisxd@reddit
Let's go!!
Thireus@reddit
Did they solve the iq*_s quantization issues?
parrot42@reddit (OP)
I ran `test-backend-ops test -o MUL_MAT_ID -b CUDA0` with b9357 and CUDA 13.3. There are no iq errors anymore!
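For anyone who wants to repeat that check from a script, here is a minimal sketch in Python. It only wraps the exact command quoted above. The helper names `backend_ops_cmd` and `run_backend_ops` are made up for illustration, and it assumes the `test-backend-ops` binary from your llama.cpp build is on your `PATH`.

```python
import shutil
import subprocess


def backend_ops_cmd(op, backend, binary="test-backend-ops"):
    """Build the argument list for the command quoted in the comment above."""
    return [binary, "test", "-o", op, "-b", backend]


def run_backend_ops(op="MUL_MAT_ID", backend="CUDA0"):
    """Run the test and return its exit code, or None if the binary is missing."""
    cmd = backend_ops_cmd(op, backend)
    if shutil.which(cmd[0]) is None:
        # Assumption: the binary is expected on PATH; adjust to your build dir.
        return None
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    print(run_backend_ops())
```

The script just prints the exit code; read the tool's own output to see which cases passed or failed.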
mr_Owner@reddit
Is it faster? 😁
parrot42@reddit (OP)
I have no idea, but it works. I am stress-testing it by installing supabase for honcho for hermes using opencode and qwen, and it is doing well.
a_beautiful_rhind@reddit
Nothing for my 3090s in it, most likely.
jacek2023@reddit
what does it mean?
a_beautiful_rhind@reddit
Probably no speedups for older GPUs in this update.
No_Afternoon_4260@reddit
The 30 what? /S
Freonr2@reddit
Does torchao have bf16 stochastic rounding on sm12x yet?
giveen@reddit
I'll wait a few weeks.
kivaougu@reddit
Hopefully this has had better QA than 13.2
vladlearns@reddit
they definitely need to hire more good QAs
Succubus-Empress@reddit
So ai
lowlifecat@reddit
Thank you. Anything good in the update? I mean, any update is a good update, but is there a *good* update?
parrot42@reddit (OP)
Since I don't understand the release notes, I told opencode/qwen to run nvidia-smi and read the notes. It told me that cuBLAS is 5% faster, TF32 is 27% faster on Blackwell, and that it could unlock tile-based rendering once that is implemented in llama.cpp.
So I think it is a good update, but what do I know?
parrot42@reddit (OP)
Just downloaded and installed CUDA 13.3 with driver 610.43.02.
Much smoother setup under trixie with a backported 7.0 kernel than 12.2.1
Recompiled llama.cpp and everything seems to work (but I just tested with 5 messages to opencode).
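If you want to confirm which toolkit your rebuild will actually pick up, a rough sketch like the one below can help. It assumes `nvcc` is on your `PATH` and that its `--version` output contains a `release X.Y` string, as it has in the toolkit versions I have seen. The function names are just illustrative.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Return (major, minor) parsed from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Ask nvcc for its version; returns None if nvcc is not on PATH."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print(installed_cuda_release())
```

Note that this reports the toolkit version, not the driver version that nvidia-smi shows, so the two can differ.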
Late_Scarcity3455@reddit
Seems like my alias to compile with GCC 15 will not be deleted for now.
freehuntx@reddit
I love my 10 GB containers just because of CUDA... Vulkan is ~500 MB.
Velocita84@reddit
I believe someone from NVIDIA said in a llama.cpp issue that it should fix whatever problems 13.2 had with compiling llama.cpp.
Annual-Act-3614@reddit
Ah, great, thank you very much for the information. So, like you, I would like to know if anyone has already tried it and what they think about it compared to previous versions.