Built LazyMoE — run 120B LLMs on 8GB RAM with no GPU using lazy expert loading + TurboQuant
Posted by ReasonableRefuse4996@reddit | LocalLLaMA | 47 comments
I'm a master's student in Germany and I got obsessed with one question:
can you run a model that's "too big" for your hardware?
After weeks of experimenting I combined three techniques — lazy MoE
expert loading, TurboQuant KV compression, and SSD streaming — into
a working system.
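For anyone unfamiliar with the first idea, lazy expert loading can be sketched roughly as below. This is a minimal illustration, not the code from the repo: the `LazyExpertStore` class, the flat one-file layout, the float32 format, and the LRU eviction policy are all assumptions made for the example. The point is only that expert weights stay on disk and are read into RAM when the router picks them, with a small cap on how many stay resident.

```python
import mmap
import os
import tempfile
from array import array
from collections import OrderedDict


class LazyExpertStore:
    """Hypothetical sketch: experts sit in one flat file on disk and are
    copied into RAM only when requested, with LRU eviction."""

    def __init__(self, path, num_experts, floats_per_expert, max_resident):
        self.num_experts = num_experts
        self.expert_bytes = floats_per_expert * 4  # assumes 4-byte float32
        self.max_resident = max_resident
        self._file = open(path, "rb")
        # Memory-map the file so the OS pages data in from SSD on demand.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._cache = OrderedDict()  # expert_id -> array of floats, LRU order
        self.disk_reads = 0

    def get(self, expert_id):
        if not 0 <= expert_id < self.num_experts:
            raise IndexError(f"no such expert: {expert_id}")
        if expert_id in self._cache:
            # Cache hit: mark as most recently used, no disk access.
            self._cache.move_to_end(expert_id)
            return self._cache[expert_id]
        # Cache miss: copy this expert's slice out of the mapped file.
        start = expert_id * self.expert_bytes
        weights = array("f")
        weights.frombytes(self._map[start:start + self.expert_bytes])
        self.disk_reads += 1
        self._cache[expert_id] = weights
        if len(self._cache) > self.max_resident:
            self._cache.popitem(last=False)  # evict least recently used
        return weights

    def close(self):
        self._map.close()
        self._file.close()


def write_dummy_experts(path, num_experts, floats_per_expert):
    """Write fake weights: every value in expert i equals float(i)."""
    with open(path, "wb") as f:
        for expert_id in range(num_experts):
            array("f", [float(expert_id)] * floats_per_expert).tofile(f)


def demo():
    """Simulate a router choosing experts 0, 1, 0, 2, 3, 0 in turn."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "experts.bin")
        write_dummy_experts(path, num_experts=8, floats_per_expert=4)
        store = LazyExpertStore(path, 8, 4, max_resident=2)
        for expert_id in [0, 1, 0, 2, 3, 0]:
            weights = store.get(expert_id)
            assert weights[0] == float(expert_id)
        reads, resident = store.disk_reads, list(store._cache)
        store.close()
    return reads, resident


if __name__ == "__main__":
    print(demo())
```

With at most two experts resident, the six requests in `demo()` cause five disk reads and one cache hit. In a real MoE model the hit rate would depend on how often the router reuses experts between tokens, which this toy example does not try to model.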
Here's what it looks like running on my Intel UHD 620 laptop with
8GB RAM and zero GPU...
GitHub: [https://github.com/patilyashvardhan2002-byte/lazy-moe](https://github.com/patilyashvardhan2002-byte/lazy-moe)
Would love feedback from this community!