llama.cpp: owners of old GPUs wanted for performance testing
Posted by Remove_Ayys@reddit | LocalLLaMA | 93 comments
I created [a pull request that refactors and optimizes the llama.cpp IQ CUDA kernels](https://github.com/ggerganov/llama.cpp/pull/8215) for generating tokens. These kernels use the `__dp4a` instruction (per-byte integer dot product), which is only available on NVIDIA GPUs with compute capability 6.1 or higher. Older GPUs are supported via a workaround that performs the same calculation using other instructions. However, during testing it turned out that (on modern GPUs) this workaround is faster than the kernels that master currently uses on old GPUs for legacy quants and k-quants. So I changed the default for old GPUs to the `__dp4a` workaround.
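For readers unfamiliar with a per-byte integer dot product, here is a minimal host-side C++ sketch of the arithmetic. It is not the code from the PR and not the actual workaround used in llama.cpp. The names `pack4`, `lane`, `dp4a_emulated`, and `dp4a_example` are hypothetical helpers introduced only for illustration, and the sketch assumes the signed 8-bit interpretation of the four packed bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: pack four signed 8-bit values into one 32-bit word,
// with x0 in the lowest byte.
inline uint32_t pack4(int8_t x0, int8_t x1, int8_t x2, int8_t x3) {
    return  static_cast<uint32_t>(static_cast<uint8_t>(x0))
         | (static_cast<uint32_t>(static_cast<uint8_t>(x1)) << 8)
         | (static_cast<uint32_t>(static_cast<uint8_t>(x2)) << 16)
         | (static_cast<uint32_t>(static_cast<uint8_t>(x3)) << 24);
}

// Hypothetical helper: extract byte i (0..3) and sign-extend it to 32 bits.
inline int32_t lane(uint32_t packed, int i) {
    const int32_t v = static_cast<int32_t>((packed >> (8 * i)) & 0xFFu);
    return v >= 128 ? v - 256 : v;
}

// Illustrative emulation of a per-byte integer dot product:
// multiply the four signed bytes of a and b lane by lane
// and add the four products to the accumulator c.
inline int32_t dp4a_emulated(uint32_t a, uint32_t b, int32_t c) {
    for (int i = 0; i < 4; ++i) {
        c += lane(a, i) * lane(b, i);
    }
    return c;
}

// Small usage example: (1*5 + 2*6 + 3*7 + 4*8) + 10 == 80.
inline void dp4a_example() {
    assert(dp4a_emulated(pack4(1, 2, 3, 4), pack4(5, 6, 7, 8), 10) == 80);
}
```

On hardware with the instruction, all of this is a single operation. A fallback has to spend several ordinary instructions to reach the same result, which is why its performance on old GPUs needs to be measured rather than assumed.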
However, I don't actually own any old GPUs that I could use for performance testing. So I'm asking people who have such GPUs to report how the PR compares against master. Relevant GPUs are P100s and anything Maxwell or older. Relevant models are legacy quants and k-quants. If possible, please run the `llama-bench` utility to obtain the results.