What is LLMFit Smoking? Can M1 Max run anything decently enough for agentic coding?

Posted by GoodhartMusic@reddit | LocalLLaMA | View on Reddit | 2 comments


As you can see in this analysis, LLMFit estimated 85 tokens per second with a 64B model. When I tried, I got 9 t/s. :'( I'm extremely new to local inference and wonder whether an M1 Max can realistically run something like this in a meaningful way, even if a substantial process takes hours.
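For what it's worth, the 9 t/s figure is roughly what a back-of-the-envelope calculation predicts: single-stream decoding is usually memory-bandwidth bound, since every weight must be read once per generated token. A minimal sketch, assuming (these numbers are not from the post) ~400 GB/s usable bandwidth on an M1 Max and a 64B model quantized to ~4.5 bits/weight (typical of Q4_K-style quants):

```python
# Back-of-the-envelope decode-speed estimate for a memory-bandwidth-bound LLM.
# Assumptions (not from the post): M1 Max memory bandwidth ~400 GB/s,
# 64B parameters at ~4.5 bits/weight (roughly a Q4_K-style quantization).

def estimate_decode_tps(params_b: float, bits_per_weight: float, bandwidth_gbps: float) -> float:
    """Each generated token streams every weight once, so tokens/s ~ bandwidth / model bytes."""
    model_gb = params_b * bits_per_weight / 8  # model size in GB
    return bandwidth_gbps / model_gb

tps = estimate_decode_tps(params_b=64, bits_per_weight=4.5, bandwidth_gbps=400)
print(f"{tps:.1f} tokens/s")  # prints "11.1 tokens/s" -- close to the observed 9 t/s, far from 85
```

If this rough model is right, the 85 t/s estimate would only be plausible for a much smaller or more aggressively quantized model; the observed 9 t/s is in line with what the hardware's memory bandwidth allows.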