Getting 5.3 t/s with 70B and a P40 @ IQ2S 4k context. Anyone else get the same?

Posted by My_Unbiased_Opinion@reddit | LocalLLaMA

Just want to confirm whether what I'm getting is abnormally slow. I'm getting 32 t/s with 8B Q8, which seems to be expected. Can anyone out there with a P40 run Llama 3 70B IQ2S at 4096 context on one GPU and let me know if you're getting the same speed? It seems like I should be getting around 8-9 t/s according to some benchmarks I'm seeing online.
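
For a rough sanity check on those numbers, here is a back-of-the-envelope sketch. It assumes token generation is purely memory-bandwidth-bound, so speed scales inversely with the size of the weights read per token. The parameter counts (8.03B and 70.6B) and bits-per-weight figures (8.5 for Q8_0, 2.5 for an IQ2_S-style quant) are assumptions, not measurements from this setup, and the function names are made up for illustration.

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: parameters * bits per weight / 8 bits per byte."""
    return params_billion * bits_per_weight / 8


def scaled_tokens_per_second(measured_tps: float,
                             measured_size_gb: float,
                             target_size_gb: float) -> float:
    """Scale a measured speed to another model size, assuming speed ~ 1 / weight size."""
    return measured_tps * measured_size_gb / target_size_gb


# Assumed figures (not taken from the post): Llama 3 parameter counts and
# nominal bits per weight for each quant type.
small_gb = model_size_gb(8.03, 8.5)   # 8B at Q8
large_gb = model_size_gb(70.6, 2.5)   # 70B at a ~2.5 bpw IQ2 quant

# 32 t/s is the 8B Q8 speed reported in the post.
estimate = scaled_tokens_per_second(32.0, small_gb, large_gb)

print(f"8B Q8 weights:   ~{small_gb:.1f} GB")
print(f"70B IQ2 weights: ~{large_gb:.1f} GB")
print(f"Bandwidth-only estimate for 70B: ~{estimate:.1f} t/s")
```

Treat the result as an optimistic ceiling, not a prediction. It ignores KV cache reads, context length, and the extra dequantization work that low-bit quants may add on older cards, so real speeds can land well below it.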