Anybody running gpt-oss-120b on a MacBook Pro M4 Max 128GB?
Posted by Appomattoxx@reddit | LocalLLaMA | 15 comments
If you are, could you *please* let me know? I'm thinking of getting one, and want to know if I can run that particular model at a reasonable speed.
Thank you!
Badger-Purple@reddit
You can run much more than OSS 120B on that computer!
Amazing_Clock5847@reddit
Does it maintain proper quality across that immense context the whole time? 1 million is huge.
Badger-Purple@reddit
dude, in AI, a comment made 153 days ago is ancient history.
This model is outdated at this point. Qwen Next Coder is a finetuned version that does well for coding and more.
1 million context is now standard in other models without rotary positional embeddings. Some now even have rotary attention!
laerien@reddit
I can also confirm it works great. I'm seeing over 60 tok/sec with Unsloth's F16 GPT OSS 120B. That said, use Qwen3 Next 80B A3B 8-bit MLX since it's better and also above 60 tok/sec on an M4 Max 128GB.
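For context, here's a minimal sketch of how one might run an MLX model like that with Apple's mlx-lm package and measure speed. The mlx-community repo id is an assumption based on their usual naming, so treat it as a placeholder:

```python
# Minimal sketch, assuming `pip install mlx-lm` on an Apple-silicon Mac.
# The repo id is an assumed mlx-community upload, not verified.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen3-Next-80B-A3B-Instruct-8bit")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Write a binary search in Swift."}],
    add_generation_prompt=True,
    tokenize=False,
)

# verbose=True prints prompt-processing and generation speed in tokens/sec,
# which is where numbers like "over 60 tok/sec" come from.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```

The same verbose output also reports prompt-processing speed, which is the number asked about below.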
committer@reddit
How fast is the prompt processing?
Appomattoxx@reddit (OP)
Thank you! Can you say what context windows you’re using?
Due_Mouse8946@reddit
Max context.
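For gpt-oss-120b, "max context" should mean the model's full 131,072-token window (assuming the published 128K limit). A hedged sketch of setting that with llama-cpp-python; the GGUF path is a hypothetical placeholder:

```python
# Sketch, assuming `pip install llama-cpp-python` built with Metal support.
from llama_cpp import Llama

llm = Llama(
    model_path="gpt-oss-120b-F16.gguf",  # hypothetical local path
    n_ctx=131072,     # full 128K window; KV-cache RAM grows with n_ctx
    n_gpu_layers=-1,  # offload all layers to the GPU (Metal)
)
```

The trade-off is memory and speed: the KV cache grows as the context fills, and prompt processing slows down on long inputs.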
Gregory-Wolf@reddit
Does Unsloth's F16 GPT OSS 120B actually give better results than the original MXFP4, in your experience?
laerien@reddit
I think it's MXFP4 labeled as F16. They call it "gpt-oss-120b-F16.gguf", but I'm pretty sure you're right and it's plain MXFP4. Unsure if they mean unquantized MXFP4 or what.
Gregory-Wolf@reddit
The weights are probably the same (the size in GB is the same, at least), but they claim they made some fixes: the chat template and some precision changes here and there. Supposedly it should be more stable and in some cases give better results. That's why I ask.
I have an M3 Max 128GB, and I use MXFP4. I wondered if you'd compared vanilla MXFP4 to Unsloth's F16 and seen any difference, and whether that's why you switched to Unsloth's.
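One way to check this yourself is to run the same prompt at temperature 0 through both files and compare the outputs. A minimal A/B sketch with llama-cpp-python; both file names are hypothetical placeholders:

```python
# Sketch: deterministic A/B comparison of two GGUF quants of the same model.
from llama_cpp import Llama

PROMPT = "Explain the difference between a mutex and a semaphore."

for path in ["gpt-oss-120b-MXFP4.gguf", "gpt-oss-120b-F16.gguf"]:
    llm = Llama(model_path=path, n_ctx=8192, n_gpu_layers=-1, verbose=False)
    out = llm.create_completion(PROMPT, max_tokens=200, temperature=0.0)
    print(f"--- {path} ---")
    print(out["choices"][0]["text"])
    del llm  # free ~60 GB of weights before loading the next file
```

If the tensors really are identical, the template and precision fixes would mostly show up in chat-formatted use rather than in raw completions like this.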
StateSame5557@reddit
I get over 70 tok/sec with VCoder, a fine-tuned 120B from EpistemeAI
https://huggingface.co/nightmedia/VCoder-120b-1.0-qx86-hi-mlx
Daemonix00@reddit
Yeah even on the plane… it’s quite good
weasl@reddit
It works great (around 40 t/s) but I prefer GLM 4.5 Air or Qwen 3 Next
tiltology@reddit
Yeah, it works well. I used it with Xcode pointing at LM Studio as a coding test and it’s nice and fast. Not at the machine right now so I can’t tell you the tokens per second but it was definitely faster than reading speed.
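LM Studio works there because it serves an OpenAI-compatible API (default http://localhost:1234/v1), so any client, Xcode included, can point at it. A minimal sketch with the openai Python client; the model id is an assumption, so check the id LM Studio shows for the loaded model:

```python
# Sketch, assuming LM Studio's local server is running on the default port.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

resp = client.chat.completions.create(
    model="openai/gpt-oss-120b",  # assumed id; copy it from LM Studio's model list
    messages=[{"role": "user", "content": "Write a SwiftUI view with one button."}],
)
print(resp.choices[0].message.content)
```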
Appomattoxx@reddit (OP)
Thank you! I’m excited about the idea of running that model off a Mac, but I wanted to confirm it’d work before making the purchase.