Benchmarking small models at 4-bit quants on Apple Silicon with mlx-lm

Posted by ironwroth@reddit | LocalLLaMA

I ran a bunch of small models at 4-bit quants through a few benchmarks locally on my MacBook using `mlx-lm.evaluate`. Figured I'd share in case anyone else finds it interesting or helpful!
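For anyone who wants to reproduce this, here's roughly what an invocation looks like. This is a sketch, not the exact command from the post: the model repo and flag names are assumptions based on recent mlx-lm versions, so check `mlx_lm.evaluate --help` on yours.

```bash
# Sketch of a local eval run. The model repo and flags below are
# assumptions based on recent mlx-lm versions; verify with --help.
pip install mlx-lm lm-eval

mlx_lm.evaluate \
    --model mlx-community/Qwen2.5-3B-Instruct-4bit \
    --tasks arc_challenge winogrande \
    --output-dir results
```

As far as I know, `mlx_lm.evaluate` wraps EleutherAI's lm-evaluation-harness under the hood, so task names follow that harness's naming.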

System Info: Apple M4 Pro, 48 GB RAM, 20-core GPU, 14-core CPU