What's the smallest reasonable quant for coding?

Posted by Real_Ebb_7417@reddit | LocalLLaMA | View on Reddit | 5 comments

So this is something that's hard for me to fully understand. I've been playing with many different coding models and quants recently, and in one-shot tests it often happens that a smaller quant of the same model does better than a bigger one (e.g. Q3 vs Q4). I know that in a one-shot test it's mostly luck, but it shows that a smaller quant can also be "good enough".

So I'm thinking about the tradeoff between a better model at a lower quant vs. a worse model at a bigger quant. I know it also depends on the specific use case, but let's generalize. As an example, I can run Qwen3.5 27b in Q6 (and this model is enough for almost anything), but yesterday I also briefly tested MiniMax M2.7 in Q3_XXS and it still gave me nice speed + it was actually doing pretty well. However, I also want to try some Q2 version, because Q3 doesn't leave me much space for KV cache. So in this case, I know Qwen is good enough and probably not worth switching to MiniMax, but that's not the point. What I really wonder is: what quant is usually the smallest one that's still usable for coding? Q3 with MiniMax gave me pretty neat results, but what about Q2? Or even Q1? (I always considered Q1 unusable for almost anything, but maybe I'm wrong.)
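For context, here's the rough back-of-envelope math I use to see why a bigger quant eats my KV-cache headroom. The layer/head numbers below are hypothetical placeholders, not the real Qwen or MiniMax configs, and real runtimes add overhead on top:

```python
# Rough VRAM budget sketch (assumed formulas, not exact for any runtime):
# - quantized weights ~ params * effective_bits_per_weight / 8
# - fp16 KV cache ~ 2 (K+V) * layers * kv_heads * head_dim * ctx_len * 2 bytes

def model_gb(params_b, bits_per_weight):
    """Approximate VRAM taken by quantized weights, in GB."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(layers, kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    """Approximate KV cache size in GB (factor 2 = keys + values)."""
    return 2 * layers * kv_heads * head_dim * ctx_len * bytes_per_elem / 1e9

# Example: a 27B dense model at ~6.5 effective bits/weight (roughly Q6-ish)
print(model_gb(27, 6.5))               # ~21.9 GB of weights
# Hypothetical config: 48 layers, 8 KV heads (GQA), head_dim 128, 32k context
print(kv_cache_gb(48, 8, 128, 32768))  # ~6.4 GB of KV cache on top
```

So dropping from ~6.5 to ~3 bits roughly halves the weight footprint, which is exactly the space the KV cache needs back at long contexts.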

I'm also aware that it depends on the model and the quantization method, BUT as a general rule: what quant is usually the smallest reasonable option for coding? And what is the tradeoff? (E.g. MiniMax in Q3, as I said, is doing pretty well for me, but what am I actually losing compared to running Q4, which is usually considered the go-to when you don't have the hardware but still want quality?)