PermanentLiminality

Is speculative decoding available with the Qwen 3.5 series?

Posted by PermanentLiminality@reddit | LocalLLaMA | 8 comments
How can we run Qwen3-omni-30b-a3b?

Posted by PermanentLiminality@reddit | LocalLLaMA | 45 comments
CPU only performance king Qwen3:32b-q4_K_M. No GPU required for usable speed.

Posted by PermanentLiminality@reddit | LocalLLaMA | 24 comments
Poor man's VRAM, or how to run Llama 3.1 8B Q8 at 35 tk/s for $40

Posted by PermanentLiminality@reddit | LocalLLaMA | 26 comments