Getting 5.3 t/s with 70B and a P40 @ IQ2S 4k context. Anyone else get the same?
Posted by My_Unbiased_Opinion@reddit | LocalLLaMA
Just wanna confirm whether what I'm getting is abnormally slow. I'm getting 32 t/s with 8B Q8, which seems to be expected.
Can anyone out there with a P40 run Llama 3 70B IQ2S @ 4096 context on one GPU and let me know if you are getting the same speed? It seems like I should be getting around 8-9 t/s according to some benchmarks I am seeing online.
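For a rough sanity check, here is a back-of-the-envelope sketch of the memory-bandwidth ceiling on generation speed. It assumes every weight is read once per generated token and ignores KV cache reads and dequantization cost. The figures are assumptions, not measurements from my setup: about 346 GB/s for the P40, about 8.5 bits per weight for Q8_0, about 2.5 bits per weight for IQ2_S, and round parameter counts of 8B and 70B. The helper names are made up for illustration.

```python
def model_size_gb(params_billion, bits_per_weight):
    """Approximate weight size in GB for a given quantization width."""
    return params_billion * bits_per_weight / 8


def bandwidth_bound_tps(model_gb, bandwidth_gbps):
    """Ceiling on tokens/s if all weights are read once per generated token."""
    return bandwidth_gbps / model_gb


# Assumed figures (not from the post): P40 bandwidth and bits per weight.
P40_BANDWIDTH_GBPS = 346.0
Q8_0_BPW = 8.5
IQ2_S_BPW = 2.5

size_8b_q8 = model_size_gb(8.0, Q8_0_BPW)
size_70b_iq2s = model_size_gb(70.0, IQ2_S_BPW)

ceiling_8b = bandwidth_bound_tps(size_8b_q8, P40_BANDWIDTH_GBPS)
ceiling_70b = bandwidth_bound_tps(size_70b_iq2s, P40_BANDWIDTH_GBPS)

# Measured speeds quoted in the post, as a fraction of each ceiling.
efficiency_8b = 32.0 / ceiling_8b
efficiency_70b = 5.3 / ceiling_70b

print(f"8B Q8:     {size_8b_q8:.1f} GB, ceiling {ceiling_8b:.1f} t/s, efficiency {efficiency_8b:.0%}")
print(f"70B IQ2_S: {size_70b_iq2s:.1f} GB, ceiling {ceiling_70b:.1f} t/s, efficiency {efficiency_70b:.0%}")
```

Under those assumptions the 8B Q8 run sits much closer to its bandwidth ceiling than the 70B run does, which is why 5.3 t/s looks low to me. The gap may also partly come from IQ quants needing more compute to dequantize, so this is only a rough bound and not a prediction.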