Built LazyMoE — run 120B LLMs on 8GB RAM with no GPU using lazy expert loading + TurboQuant

Posted by ReasonableRefuse4996@reddit | LocalLLaMA | 47 comments

I'm a master's student in Germany and I got obsessed with one question: can you run a model that's "too big" for your hardware? After weeks of experimenting, I combined three techniques (lazy MoE expert loading, TurboQuant KV cache compression, and SSD streaming) into a working system.

Here's what it looks like running on my laptop with Intel UHD 620 integrated graphics, 8GB RAM, and no dedicated GPU...

GitHub: [https://github.com/patilyashvardhan2002-byte/lazy-moe](https://github.com/patilyashvardhan2002-byte/lazy-moe)

Would love feedback from this community!
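
For readers new to the idea, lazy expert loading combined with SSD streaming can be sketched roughly as below. This is a minimal illustration under stated assumptions, not code from the LazyMoE repo: the names `LazyExpertCache`, `moe_forward`, `write_demo_weights`, and `demo` are hypothetical, each expert is reduced to a single float32 matrix, the weights live in one flat file read through `numpy.memmap`, and eviction is plain LRU. Quantization and KV cache compression are left out entirely.

```python
import os
import tempfile
from collections import OrderedDict

import numpy as np


class LazyExpertCache:
    """Hypothetical sketch: keep at most `capacity` experts in RAM, rest on disk."""

    def __init__(self, path, num_experts, d_in, d_out, capacity):
        # memmap only maps the file; pages are read from disk when accessed.
        self.weights = np.memmap(
            path, dtype=np.float32, mode="r", shape=(num_experts, d_in, d_out)
        )
        self.capacity = capacity
        self.cache = OrderedDict()  # expert_id -> in-RAM copy, oldest first
        self.loads = 0              # number of disk loads, for inspection

    def get(self, expert_id):
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)  # mark as most recently used
            return self.cache[expert_id]
        expert = np.array(self.weights[expert_id])  # copy one expert into RAM
        self.loads += 1
        self.cache[expert_id] = expert
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used
        return expert


def moe_forward(x, router_w, experts, top_k=2):
    """Route one token vector to its top_k experts, loading only those."""
    logits = x @ router_w                    # one score per expert
    top = np.argsort(logits)[-top_k:]        # indices of the top_k scores
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                     # softmax over the selected experts
    return sum(g * (x @ experts.get(int(e))) for g, e in zip(gates, top))


def write_demo_weights(path, num_experts, d_in, d_out, seed=0):
    """Write random float32 expert matrices to a flat file (demo data only)."""
    rng = np.random.default_rng(seed)
    data = rng.standard_normal((num_experts, d_in, d_out)).astype(np.float32)
    data.tofile(path)


def demo():
    """Run two forward passes over toy weights and return outputs plus cache."""
    path = os.path.join(tempfile.mkdtemp(), "experts.bin")
    write_demo_weights(path, num_experts=8, d_in=4, d_out=4)
    cache = LazyExpertCache(path, num_experts=8, d_in=4, d_out=4, capacity=2)
    rng = np.random.default_rng(1)
    x = rng.standard_normal(4).astype(np.float32)
    router_w = rng.standard_normal((4, 8)).astype(np.float32)
    first = moe_forward(x, router_w, cache)
    second = moe_forward(x, router_w, cache)  # same experts, served from RAM
    return first, second, cache
```

Calling `demo()` routes the same token twice: the first pass copies only the two selected experts off disk, and the second pass reuses them from the cache. The point of the design is that resident memory scales with `capacity` rather than with the total number of experts, at the cost of a disk read whenever the router picks an expert that is not cached.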