Llama 405B running locally!

Posted by ifioravanti@reddit | LocalLLaMA

https://preview.redd.it/foqiuzj0ezod1.png?width=3440&format=png&auto=webp&s=602c1dd1c694eb3106331d0cb1fb238873c269c2

https://preview.redd.it/wdp2aw91ezod1.png?width=2008&format=png&auto=webp&s=e4e24938e60fc30e15c40a74ce8f632ab9d68d8e

Here is Llama 405B running on a Mac Studio M2 Ultra + MacBook Pro M3 Max! 2.5 tokens/sec, but I'm sure it will improve over time.

Powered by Exo: [https://github.com/exo-explore](https://github.com/exo-explore), with Apple MLX as the backend engine.

An important trick from the Apple MLX creator in person, u/awnihannun: set these on all machines involved in the Exo network:

    sudo sysctl iogpu.wired_lwm_mb=400000
    sudo sysctl iogpu.wired_limit_mb=180000
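Since the two settings have to be applied on every machine in the Exo network, it can help to keep them in one small script. This is only a rough sketch: the values are copied verbatim from the post and were picked for these particular machines, while the `wired_cmds` helper and the `APPLY` switch are made-up names for illustration, not part of Exo or MLX. By default it only prints the commands so you can check them before running anything.

```shell
#!/bin/sh
# Values copied verbatim from the post; adjust for your own hardware.
WIRED_LWM_MB=400000
WIRED_LIMIT_MB=180000

# Hypothetical helper: print the two sysctl commands, one per line.
wired_cmds() {
    printf 'sudo sysctl iogpu.wired_lwm_mb=%s\n' "$WIRED_LWM_MB"
    printf 'sudo sysctl iogpu.wired_limit_mb=%s\n' "$WIRED_LIMIT_MB"
}

# Dry run by default. APPLY=1 is an assumed opt-in switch for this sketch;
# only set it on a Mac where these iogpu sysctl keys exist.
if [ "${APPLY:-0}" = "1" ]; then
    sudo sysctl iogpu.wired_lwm_mb="$WIRED_LWM_MB"
    sudo sysctl iogpu.wired_limit_mb="$WIRED_LIMIT_MB"
else
    wired_cmds
fi
```

Run it as is to review the commands, then run it again with `APPLY=1` on each machine once the values look right for that machine's memory.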