LLM speed t/s

Posted by Lost-Health-8675@reddit | LocalLLaMA

All I see is "it gives me ** t/s, bla bla bla", always together with q4, q3... Even when I was chatting with Qwen3.6 (q8) the other day about the best llama.cpp command for my use case, it suggested going with q4 for better speeds (it already runs at over 40 t/s most of the time).

What I would like to know: are you really trading knowledge and reliability for speed?

I would always rather have it work 2x longer and give better output than have to try again and debug, which with lower quants adds up to more time than q8 needs to do its thing on the first or second try.
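To put rough numbers on that trade-off, here is a minimal sketch. It assumes every attempt succeeds independently with the same probability, so the expected number of attempts is 1 / success rate. The function names, timings, and success rates are all hypothetical, not measurements, and time spent debugging a bad answer is not counted.

```python
def expected_total_time(seconds_per_attempt: float, success_rate: float) -> float:
    """Expected wall-clock time until the first usable answer.

    Assumes independent retries with a fixed success rate, so the
    expected number of attempts is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return seconds_per_attempt / success_rate


def break_even_success_rate(slow_success_rate: float, speedup: float) -> float:
    """Success rate the faster quant needs in order to tie with the slower one."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return slow_success_rate / speedup


# Hypothetical numbers, for illustration only.
q8_time = expected_total_time(seconds_per_attempt=60, success_rate=0.8)
q4_time = expected_total_time(seconds_per_attempt=30, success_rate=0.35)
tie_rate = break_even_success_rate(slow_success_rate=0.8, speedup=2)

print(f"q8 expected: {q8_time:.1f} s")
print(f"q4 expected: {q4_time:.1f} s")
print(f"q4 must succeed at least {tie_rate:.0%} of the time to tie")
```

With these made-up inputs, q8 comes out ahead even though each q4 attempt takes half as long. Under this model, a quant that is 2x faster only wins if its success rate stays above half of the q8 success rate.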