Inference using exo on Mac + DGX cluster?

Posted by EternalOptimister@reddit | LocalLLaMA

I read on the EXO Labs blog that you can achieve “even higher” inference speeds by combining a DGX Spark with a cluster of one or more M3 Ultras. However, I did not find any benchmarks. Has anyone tried this or run benchmarks themselves? Exo works not only on the Ultra but also on the M4 Pro and M4 Max, and likely on the upcoming M5s as well. I’m wondering what kind of inference speeds such clusters might achieve for large SOTA MoEs (Kimi, DeepSeek, …) that are currently practically impossible to run.