milpster
-
MTP is nice and all, but what about PP speeds?
Posted by milpster@reddit | LocalLLaMA | View on Reddit | 31 comments
-
So a nearby lightning storm just crashed all my eGPUs
Posted by milpster@reddit | LocalLLaMA | View on Reddit | 49 comments
-
How to configure self-speculative decoding properly
Posted by milpster@reddit | LocalLLaMA | View on Reddit | 6 comments
-
How do I specify which GPU to use for the KV cache? How do I offload expert tensors to a specific GPU?
Posted by milpster@reddit | LocalLLaMA | View on Reddit | 4 comments
-
Qwen CLI web search tool without a remote API
Posted by milpster@reddit | LocalLLaMA | View on Reddit | 3 comments
-
How to configure self-speculative decoding properly?
Posted by milpster@reddit | LocalLLaMA | View on Reddit | 4 comments
-
Where to compare quants for different LLMs?
Posted by milpster@reddit | LocalLLaMA | View on Reddit | 0 comments
-
How to run the qwen-code CLI locally and skip the welcome screen
Posted by milpster@reddit | LocalLLaMA | View on Reddit | 5 comments