Mac Mini M4 (16GB) Benchmark - oMLX & Gemma 4

Posted by pepediaz130@reddit | LocalLLaMA

Hi everyone! Just finished an exhaustive benchmark on the new **Mac Mini M4 (16GB RAM)** using **oMLX** as the inference engine. I was specifically looking for the "sweet spot" between reasoning capability and performance/stability.

Here are the results for **Gemma-4-E4B-it** in both 4-bit and 8-bit quantizations:

### 📊 Performance Comparison (oMLX + M4)

| Metric | Gemma-4-E4B (4-bit) | Gemma-4-E4B (8-bit) |
| :--- | :--- | :--- |
| **Model Size** | 5.10 GB | 8.77 GB |
| **Prefill Speed** | 350+ tok/s | ~259 tok/s |
| **Generation Speed** | **28.0 tok/s** | **16.8 tok/s** |
| **TTFT** | 0.31s | 0.46s |
| **RAM Free (approx)** | ~10 GB | ~6 GB |
| **Stability** | Rock solid | Solid (Tight fit for large contexts) |
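
To put the table in practical terms, here is a minimal sketch that derives a few ratios from the numbers above. Only the figures in `RESULTS` come from the table. The `response_time_s` helper is a hypothetical convenience that assumes TTFT stays close to the measured value for your prompt, which will not hold for much longer prompts.

```python
# Figures copied from the table above; everything else is derived arithmetic.
RESULTS = {
    "4-bit": {"size_gb": 5.10, "gen_tps": 28.0, "ttft_s": 0.31},
    "8-bit": {"size_gb": 8.77, "gen_tps": 16.8, "ttft_s": 0.46},
}


def response_time_s(quant: str, new_tokens: int) -> float:
    """Rough wall-clock estimate: time to first token plus decode time."""
    r = RESULTS[quant]
    return r["ttft_s"] + new_tokens / r["gen_tps"]


speedup = RESULTS["4-bit"]["gen_tps"] / RESULTS["8-bit"]["gen_tps"]
size_ratio = RESULTS["8-bit"]["size_gb"] / RESULTS["4-bit"]["size_gb"]

print(f"4-bit decode speedup: {speedup:.2f}x")     # 1.67x
print(f"8-bit size ratio:     {size_ratio:.2f}x")  # 1.72x
for quant in RESULTS:
    # 4-bit: ~18.2s, 8-bit: ~30.2s
    print(f"{quant}: ~{response_time_s(quant, 500):.1f}s for a 500-token reply")
```

The decode speedup (1.67x) tracks the size ratio (1.72x) closely, which is what you would expect if generation is mostly limited by how fast the weights can be read from memory.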

### 🧠 Reasoning & Quality
*   **8-bit:** Significantly better at complex physics problems and logical nuances. Handled the Twin Paradox calculation perfectly and detected subtle traps in logical riddles.
*   **4-bit:** Very fast, but showed slight degradation in complex reasoning steps (still very capable for general tasks/coding).

### 🚀 The oMLX Advantage
The **Paged SSD KV Caching** in oMLX is a game changer for 16GB Macs. Even with the 8-bit model taking up over half the RAM (8.77 GB of 16 GB), oMLX pages older context out to the SSD, so 32k context windows work without hitting the dreaded Metal OOM.
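
For anyone curious what "paging old context to SSD" means mechanically, here is a toy sketch of the general idea. This is not oMLX's implementation or API: `PagedKVCache`, its methods, and the pickle-on-disk format are all hypothetical, and plain Python lists stand in for blocks of K/V tensors. It simply keeps the most recently used pages in RAM and spills the least recently used ones to files.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class PagedKVCache:
    """Toy cache: keeps recently used pages in RAM, spills the rest to disk."""

    def __init__(self, max_pages_in_ram: int, spill_dir: str):
        self.max_pages_in_ram = max_pages_in_ram
        self.spill_dir = spill_dir
        self.ram = OrderedDict()  # page_id -> page data, least recent first
        self.on_disk = set()      # page_ids currently spilled to disk

    def _path(self, page_id: int) -> str:
        return os.path.join(self.spill_dir, f"page_{page_id}.pkl")

    def _evict_if_needed(self) -> None:
        while len(self.ram) > self.max_pages_in_ram:
            page_id, page = self.ram.popitem(last=False)  # least recently used
            with open(self._path(page_id), "wb") as f:
                pickle.dump(page, f)
            self.on_disk.add(page_id)

    def put(self, page_id: int, page) -> None:
        self.on_disk.discard(page_id)  # any spilled copy is now stale
        self.ram[page_id] = page
        self.ram.move_to_end(page_id)
        self._evict_if_needed()

    def get(self, page_id: int):
        if page_id in self.ram:
            self.ram.move_to_end(page_id)
            return self.ram[page_id]
        if page_id not in self.on_disk:
            raise KeyError(page_id)
        with open(self._path(page_id), "rb") as f:
            page = pickle.load(f)
        os.remove(self._path(page_id))
        self.put(page_id, page)  # reloading may push another page out
        return page


with tempfile.TemporaryDirectory() as d:
    cache = PagedKVCache(max_pages_in_ram=2, spill_dir=d)
    for i in range(4):
        cache.put(i, [float(i)] * 8)  # stand-in for a block of K/V tensors
    print(sorted(cache.on_disk))      # [0, 1]
    print(cache.get(0)[:2])           # [0.0, 0.0]
    print(sorted(cache.on_disk))      # [1, 2]
```

The trade-off is the usual one: RAM use stays bounded, but touching a spilled page costs a disk read, so how well this works in practice depends on SSD speed and on how often old context is revisited.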

### ❌ 26B Models on 16GB?
I tried forcing **Gemma-4-26B (MXFP4/4-bit)**. 
*   **Result:** FAIL. Even with `--max-model-memory disabled`, it hits the Metal buffer limit immediately (`Insufficient Memory`). 16GB is just not enough for 26B parameters, even at 4-bit: the weights alone need roughly 13 GB, before the KV cache and macOS itself.
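
The failure is easy to sanity-check with back-of-the-envelope arithmetic: weight storage is roughly parameters × bits per weight / 8. The sketch below assumes nominal parameter counts and ignores the KV cache, activations, and runtime overhead. The MXFP4 figure assumes the standard layout of 4-bit values with one 8-bit scale shared per block of 32 weights. `weight_gb` is a hypothetical helper, not part of any tool mentioned here.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


TOTAL_RAM_GB = 16
MXFP4_BITS = 4 + 8 / 32  # 4-bit values plus one 8-bit scale per 32 weights

for name, params, bits in [
    ("26B @ 4-bit", 26, 4),           # ~13.0 GB
    ("26B @ MXFP4", 26, MXFP4_BITS),  # ~13.8 GB
    ("14B @ 4-bit", 14, 4),           # ~7.0 GB
    ("14B @ 8-bit", 14, 8),           # ~14.0 GB
]:
    gb = weight_gb(params, bits)
    left = TOTAL_RAM_GB - gb
    print(f"{name}: ~{gb:.1f} GB weights, ~{left:.1f} GB left for everything else")
```

By this estimate a 14B model at 4-bit lands near the 8-bit E4B footprint measured above, while 14B at 8-bit is already in the same territory as the 26B attempt. Real files can come out larger than the nominal estimate, so treat these as lower bounds.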

### ❓ Question for the community:
Given these results, **what is the best model you've found for the Mac Mini M4 with 16GB RAM in mid-2026?**

Are there any 10B-14B models that strike a better balance than Gemma 4 E4B? Has anyone successfully run a 20B+ model without massive swapping or stability issues?