Anyone tried multi-machine LLM inference?

Posted by human-exe@reddit | LocalLLaMA | 18 comments

I've stumbled upon exo-explore/exo, an LLM engine that supports multi-peer inference over a self-organizing p2p network. I got it running on a single node in an LXC container, and things generally looked good.
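Once a node is up, exo exposes a ChatGPT-compatible API you can hit with a plain HTTP request. Here's a minimal sketch of what that looks like; the port (52415) and the model name are assumptions based on exo's docs, not something from my setup, so adjust both for your install:

```python
# Minimal sketch: querying a local exo node through its ChatGPT-compatible API.
# Assumptions: the node listens on localhost:52415 and a model named
# "llama-3.1-8b" is available -- both may differ on your install.
import json
import urllib.request

payload = {
    "model": "llama-3.1-8b",
    "messages": [{"role": "user", "content": "Say hello from my homelab."}],
}

req = urllib.request.Request(
    "http://localhost:52415/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
    # Response follows the OpenAI chat-completions schema.
    print(body["choices"][0]["message"]["content"])
```

In theory the same request works no matter how many peers have joined the cluster, since exo splits the model across whatever nodes it discovers.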

That sounds quite tempting; I have a homelab server, a Windows gaming machine and a few extra nodes. Together that's 200+ GB of RAM, tens of cores, and some GPU power as well.

There are a few things that spoil the idea:

Am I missing much? Are there any reasons to run big (100+ GB) LLMs at home at snail speed? Is exo good? Is there anything like it, but more developed and better tested? Have you tried any of this, and would you advise me to try?