Minimax M2.7 at Q3_K_S, or a smaller model at higher precision?

Posted by iFai1x@reddit | LocalLLaMA

I'm currently looking for models that fit on my single DGX Spark. I also have an RTX Pro 6000 and a 5090 that I'm considering using in combination if the DGX Spark is too slow. The intent here is to play around with OpenClaw.

I've looked around for benchmarks, but I'm assuming sites such as PinchBench report results for full-precision models, i.e. how well those models accomplish tasks on average.

Any tips or experiences from what others are using for their OpenClaw setup? I've considered Minimax M2.7, Qwen3.5-27B, Gemma 4 31B, Nemotron 3 Super 120B, and Qwen3.5-122B-A10B. I would run all of these at Q4 on the DGX Spark (except Minimax M2.7, which would need Q3), or perhaps at Q8 or higher for some of them on the Pro 6000.
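
For a rough sense of what fits where, here is a back-of-the-envelope sketch that estimates weight size as parameters times bits per weight divided by 8. Everything numeric in it is an assumption: the bits-per-weight values are approximate figures for llama.cpp GGUF quant types, the parameter counts are simply read off the model names, the memory sizes (128 GB for the DGX Spark, 96 GB for the Pro 6000) are what I believe the hardware ships with, and the 8 GB overhead for KV cache and runtime is a placeholder. Minimax M2.7 is left out because its name does not state a parameter count.

```python
# Rough weight-only memory estimate: params * bits_per_weight / 8.
# All constants below are assumptions for illustration, not measured file sizes.
# Real GGUF files differ because some tensors are kept at higher precision.
APPROX_BITS_PER_WEIGHT = {"Q3_K_S": 3.5, "Q4_K_M": 4.85, "Q8_0": 8.5}

# Total parameter counts in billions, read off the model names (assumed).
PARAMS_B = {
    "Qwen3.5-27B": 27,
    "Gemma 4 31B": 31,
    "Nemotron 3 Super 120B": 120,
    "Qwen3.5-122B-A10B": 122,
}


def weight_gb(params_billions: float, quant: str) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    return params_billions * APPROX_BITS_PER_WEIGHT[quant] / 8


def fits(params_billions: float, quant: str, memory_gb: float,
         overhead_gb: float = 8.0) -> bool:
    """True if weights plus a placeholder overhead fit in memory_gb."""
    return weight_gb(params_billions, quant) + overhead_gb <= memory_gb


if __name__ == "__main__":
    for name, params in PARAMS_B.items():
        for quant in ("Q4_K_M", "Q8_0"):
            size = weight_gb(params, quant)
            print(f"{name:24s} {quant:7s} ~{size:6.1f} GB  "
                  f"Spark(128GB): {fits(params, quant, 128)}  "
                  f"Pro 6000(96GB): {fits(params, quant, 96)}")
```

This only covers the weights. Long agent contexts in OpenClaw will add KV cache on top, so the overhead placeholder should be raised for long sessions.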

My real question is whether Q3 is too aggressive a quant for Minimax M2.7, or whether running a smaller model at higher precision will give more consistent results in OpenClaw. Of course, the benchmarks only really show a comparison at full precision.

Any help would be appreciated!