Request: Someone with an M4 Max MacBook Pro (64GB)

Posted by NEEDMOREVRAM@reddit | LocalLLaMA

I know this thread is going to get downvoted to hell and back...

I'm trying to decide between the 48GB and 64GB MacBook Pro models.

If you have an M4 Max MacBook Pro with 64GB, could you download the 50GB Q5_K_M model: https://huggingface.co/mradermacher/Llama-3.1-Nemotron-70B-Instruct-HF-i1-GGUF

And let me know what your tokens-per-second and time-to-first-token numbers are? Could you also have a ~8,000-token conversation with it to see how quickly it slows down as the context grows?
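If it helps, here's a rough Python sketch of how I'd time it. I'm assuming llama-cpp-python as the runtime (any llama.cpp front end would work just as well), and the GGUF filename is only a placeholder for whatever the Q5_K_M file from that repo ends up being called:

```python
import time
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3.1-Nemotron-70B-Instruct-HF.i1-Q5_K_M.gguf",  # placeholder filename
    n_gpu_layers=-1,   # offload all layers to Metal
    n_ctx=8192,        # room for the ~8,000-token conversation test
    verbose=False,
)

prompt = "Explain the difference between unified memory and discrete VRAM."

start = time.perf_counter()
first_token_time = None
n_tokens = 0

# Stream the completion so the first token can be timed separately
# from the steady-state generation speed.
for chunk in llm(prompt, max_tokens=512, stream=True):
    if first_token_time is None:
        first_token_time = time.perf_counter() - start
    n_tokens += 1  # each streamed chunk is roughly one token

total = time.perf_counter() - start
print(f"time to first token: {first_token_time:.2f}s")
print(f"generation speed: {n_tokens / (total - first_token_time):.2f} tok/s")
```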

If I could run the Nemotron Q5_K_M quant on a MacBook Pro at even ~4 tokens per second, there would be no reason to spin up the noisy, electricity-guzzling AI server in the home office.

Thanks and I give you good karma thoughts for taking the time from your busy day to help out.