# Mac Mini M4 (16GB) Benchmark - oMLX & Gemma 4
Posted by pepediaz130@reddit | LocalLLaMA | 8 comments
Hi everyone! Just finished an exhaustive benchmark on the new **Mac Mini M4 (16GB RAM)** using **oMLX** as the inference engine. I was specifically looking for the "sweet spot" between reasoning capability and performance/stability.
Here are the results for **Gemma-4-E4B-it** in both 4-bit and 8-bit quantizations:
### 📊 Performance Comparison (oMLX + M4)
| Metric | Gemma-4-E4B (4-bit) | Gemma-4-E4B (8-bit) |
| :--- | :--- | :--- |
| **Model Size** | 5.10 GB | 8.77 GB |
| **Prefill Speed** | ~350+ tok/s | ~259 tok/s |
| **Generation Speed** | **28.0 tok/s** | **16.8 tok/s** |
| **TTFT** | 0.31s | 0.46s |
| **RAM Free (approx)** | ~10 GB | ~6 GB |
| **Stability** | Rock solid | Solid (Tight fit for large contexts) |
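The "RAM Free" and "Stability" rows above line up with a simple memory-budget check. Here's a back-of-envelope sketch; the ~75% Metal working-set fraction and the KV/runtime overhead figures are assumptions for illustration, not measured values:

```python
# Rough memory-budget check for the two quantizations above.
# The ~75% usable-by-GPU fraction is an ASSUMED typical value for
# Apple Silicon's Metal working set; overheads are placeholders.
TOTAL_RAM_GB = 16
METAL_BUDGET_GB = TOTAL_RAM_GB * 0.75  # ~12 GB usable by the GPU

def headroom_gb(model_size_gb, kv_cache_gb=1.0, runtime_overhead_gb=1.0):
    """GPU-budget headroom left after weights, KV cache, and runtime overhead."""
    return METAL_BUDGET_GB - (model_size_gb + kv_cache_gb + runtime_overhead_gb)

for name, size in [("4-bit", 5.10), ("8-bit", 8.77)]:
    print(f"{name}: {headroom_gb(size):.2f} GB headroom")
# 4-bit: 4.90 GB headroom   -> "rock solid"
# 8-bit: 1.23 GB headroom   -> "tight fit for large contexts"
```

Under these assumptions, the 8-bit model leaves barely over 1 GB of headroom, which matches the "tight fit" observation in the table.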
### đź§ Reasoning & Quality
* **8-bit:** Significantly better at complex physics problems and logical nuances. Handled the Twin Paradox calculation perfectly and detected subtle traps in logical riddles.
* **4-bit:** Very fast, but showed slight degradation in complex reasoning steps (still very capable for general tasks/coding).
### 🚀 The oMLX Advantage
The **Paged SSD KV Caching** in oMLX is a game changer for 16GB Macs. Even when the 8-bit model takes up over half the RAM, oMLX swaps old context to the SSD, allowing for massive 32k context windows without hitting the dreaded Metal OOM.
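oMLX's internals aren't shown here, but the paged-KV idea itself is simple: keep only the most recent cache pages in RAM and spill older ones to SSD, paging them back on access. A minimal sketch (not oMLX's actual implementation; class and parameter names are made up for illustration):

```python
# Minimal sketch of paged KV caching with SSD spill. This is NOT oMLX's
# real implementation; it only illustrates the eviction/page-in idea.
import os
import pickle
import tempfile
from collections import OrderedDict

class PagedKVCache:
    def __init__(self, max_ram_pages=4):
        self.max_ram_pages = max_ram_pages
        self.ram = OrderedDict()  # page_id -> KV data (any object here)
        self.spill_dir = tempfile.mkdtemp(prefix="kv_")

    def _path(self, page_id):
        return os.path.join(self.spill_dir, f"page_{page_id}.pkl")

    def put(self, page_id, kv):
        self.ram[page_id] = kv
        self.ram.move_to_end(page_id)
        while len(self.ram) > self.max_ram_pages:
            # Evict the least recently used page to SSD.
            old_id, old_kv = self.ram.popitem(last=False)
            with open(self._path(old_id), "wb") as f:
                pickle.dump(old_kv, f)

    def get(self, page_id):
        if page_id in self.ram:  # RAM hit
            self.ram.move_to_end(page_id)
            return self.ram[page_id]
        with open(self._path(page_id), "rb") as f:  # SSD hit: page back in
            kv = pickle.load(f)
        self.put(page_id, kv)
        return kv
```

The trade-off is exactly what you'd expect: RAM hits stay fast, while touching an evicted page costs a disk round trip, which is why prefill over old context slows down once the cache exceeds physical RAM.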
### ❌ 26B Models on 16GB?
I tried forcing **Gemma-4-26B (MXFP4/4-bit)**.
* **Result:** FAIL. Even with `--max-model-memory disabled`, it hits the Metal buffer limit immediately (`Insufficient Memory`). 16GB is just not enough for 26B parameters, even at 4-bit.
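The failure is predictable from arithmetic alone. The bits-per-weight overhead and the ~75% Metal working-set fraction below are assumed typical values, not measured ones:

```python
# Back-of-envelope: why 26B at ~4 bits per weight can't fit in 16GB.
# bits_per_weight includes ASSUMED overhead for quantization scales;
# the ~75% Metal working-set fraction is also an assumed typical value.
params_b = 26            # billions of parameters
bits_per_weight = 4.5    # ~4-bit quant plus scales/zero-points
weights_gb = params_b * bits_per_weight / 8
metal_budget_gb = 16 * 0.75

print(f"weights ~{weights_gb:.1f} GB vs Metal budget ~{metal_budget_gb:.1f} GB")
# Weights alone (~14.6 GB) already exceed the ~12 GB GPU budget,
# before any KV cache or activations -- hence the immediate OOM.
```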
### âť“ Question for the community:
Given these results, **what is the best model you've found for the Mac Mini M4 with 16GB RAM in mid-2026?**
Are there any 10B-14B models that strike a better balance than Gemma 4 E4B? Has anyone successfully run a 20B+ model without massive swapping or stability issues?
nosodala@reddit
Great results! Have you ever tried using Gemma 4 on a 16GB Mac Mini as the base model for OpenClaw?
pepediaz130@reddit (OP)
Yes, but I've been too lazy to test it properly. Right now I'm using Google Flash 3, so I can't compare; I prefer Flash.
9kSs@reddit
How are you getting TTFTs of 0.31s and 0.46s? I don’t see them in your posted benchmarks
pepediaz130@reddit (OP)
Both test results attached
code_vansh@reddit
Any idea what would happen with the 26B one on a blank-slate Mac Mini? I'm thinking of hooking up my OpenClaw setup with Gemma 4 as primary… need recommendations…
pepediaz130@reddit (OP)
Even on a 'blank slate' macOS with everything closed, the 26B model in MXFP4/4-bit is just too much for 16GB. Metal will likely throw an 'Insufficient Memory' error during the graph allocation. For an OpenClaw setup on this specific machine, I highly recommend Gemma 4 E4B (8-bit) as your primary; it's the perfect balance for logic-heavy tasks without making the system crawl.
CATLLM@reddit
Try it at 32k context filled
pepediaz130@reddit (OP)
I've already pushed it! With oMLX's Paged SSD KV Cache, it handles 32k context on the 8-bit model without crashing, although you start to feel the disk I/O latency on the prefill once it exceeds the physical RAM. For 16GB, it’s the only way to keep the reasoning quality of the 8-bit version while having a usable context window.
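To get a feel for why 32k of context spills past physical RAM, you can estimate the per-token KV footprint from the architecture. Every number below is a placeholder assumption (Gemma-4-E4B's real layer/head config may differ); the point is the shape of the calculation:

```python
# Rough KV-cache size estimate. All architecture numbers are ASSUMED
# placeholders for illustration, not Gemma-4-E4B's real config.
def kv_cache_gb(tokens, layers=30, kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Factor of 2 for K and V; fp16 (2-byte) cache elements by default."""
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return tokens * per_token_bytes / 1024**3

print(f"32k-token KV cache: ~{kv_cache_gb(32 * 1024):.2f} GB")
# -> ~3.75 GB under these assumptions. With the 8.77 GB of 8-bit weights
# already resident, that alone would blow past what's free on a 16GB
# machine -- which is exactly where the SSD paging kicks in.
```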