Gemma4-31B-3bit-mlx · Hugging Face: 3 & 5 mixed quant for RAM-poor Mac users.

Posted by JLeonsarmiento@reddit | LocalLLaMA | 8 comments

Just dropped another 3 & 5 mixed quant for the RAM-poor, base-model-only Mac users who want to try Gemma4, Google's top-of-the-line LLM.

6 GB smaller than the other 3bit-mlx out there, and 25% faster.

Thicc and dense 13 GB of pure LLM sweetness from Google, for the desperate who don't care about vision. (For vision, just use something faster and equally good, like tiny Qwen3.5-2B.)

Ideal if:

Recommended Inference Parameters

For best results, use the following standardized sampling configuration across all use cases:

| Parameter | Value |
|---|---|
| temperature | 1.0 |
| top_p | 0.95 |
| top_k | 64 |
| min_p | 0.05 |
| repeat_penalty | 1.05 |
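For the curious, here's a toy sketch of what these filters do to a logit distribution. The function name, the filter order, and the omission of `repeat_penalty` (which would additionally down-weight recently generated tokens) are my own illustrative assumptions, not LM Studio's or MLX's actual implementation:

```python
import math

def sample_filter(logits, temperature=1.0, top_k=64, top_p=0.95, min_p=0.05):
    """Illustrative sketch of a sampling pipeline using the recommended
    parameters. Returns (token_id, prob) pairs that survive filtering,
    with probabilities renormalized."""
    # 1. temperature: scale logits before softmax (1.0 = no change).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = sorted(
        ((i, e / total) for i, e in enumerate(exps)),
        key=lambda t: t[1], reverse=True,
    )

    # 2. top_k: keep only the k most likely tokens.
    probs = probs[:top_k]

    # 3. min_p: drop tokens below min_p * (probability of the top token).
    floor = min_p * probs[0][1]
    probs = [(i, p) for i, p in probs if p >= floor]

    # 4. top_p (nucleus): keep the smallest prefix with mass >= top_p.
    kept, mass = [], 0.0
    for i, p in probs:
        kept.append((i, p))
        mass += p
        if mass >= top_p:
            break

    # Renormalize whatever survived.
    z = sum(p for _, p in kept)
    return [(i, p / z) for i, p in kept]
```

With three toy logits `[5.0, 4.0, 0.0]`, the third token is dropped by `min_p` and the first two are kept and renormalized.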

LM Studio — Reasoning Section Parsing

To enable thinking/reasoning output parsing:

Add to the Jinja template:

{%- set enable_thinking = true %}
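For context, here's a sketch of how such a flag typically gates a reasoning section inside a Jinja chat template. This is illustrative only — the tag names and structure are assumptions, and the actual template shipped with the model in LM Studio may differ:

```jinja
{#- illustrative sketch: gate the reasoning block on the flag -#}
{%- set enable_thinking = true %}
{%- if enable_thinking %}
{{- "<think>\n" }}
{%- endif %}
```

With the flag set, the template emits the opening reasoning tag so LM Studio can parse the model's thinking output into its own collapsible section.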