milpster

MTP is nice and all, but what about PP speeds?

Posted by milpster@reddit | LocalLLaMA | View on Reddit | 31 comments
So a nearby lightning storm just crashed all my eGPUs

Posted by milpster@reddit | LocalLLaMA | View on Reddit | 49 comments
How to configure self-speculative decoding properly

Posted by milpster@reddit | LocalLLaMA | View on Reddit | 6 comments
How do I specify which GPU to use for the KV cache? How do I offload expert tensors to a specific GPU?

Posted by milpster@reddit | LocalLLaMA | View on Reddit | 4 comments
Qwen CLI web search tool without a remote API

Posted by milpster@reddit | LocalLLaMA | View on Reddit | 3 comments
How to configure self-speculative decoding properly?

Posted by milpster@reddit | LocalLLaMA | View on Reddit | 4 comments
Where to compare quants for different LLMs?

Posted by milpster@reddit | LocalLLaMA | View on Reddit | 0 comments
How to run the qwen-code CLI locally and skip the welcome screen

Posted by milpster@reddit | LocalLLaMA | View on Reddit | 5 comments