16GB VRAM users: what have you been using? Qwen3.6 27B? Gemma 31B at Q3? How has it been?

Posted by Adventurous-Gold6413@reddit | LocalLLaMA | 11 comments

Do you guys use Q3 to fit it in VRAM? Or have you had bad results?

I had luck fitting Qwen3.5 27B into my 16GB of VRAM with turboquant at 80k context using the IQ4_XS quant.

But now the hidden size of Qwen3.6 is larger (so IQ4_XS is 15.4GB rather than 14.7GB) :( which makes me upset. I had to use the Q3_K_XL version for Qwen3.6 27B, and while it worked amazingly for openclaw chat, like 10% of the time it couldn't make the correct tool calls or would format cron jobs incorrectly, causing an error.
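For what it's worth, here's the rough back-of-envelope I use before downloading a quant: GGUF file size + KV cache + about 1GB of overhead for compute buffers has to stay under the card's VRAM. Every architecture number below and the Q3_K_XL file size are placeholders (the real values come from the GGUF metadata / model card), and it ignores partial offload to system RAM:

```python
# Rough VRAM-fit estimate: weights + KV cache + overhead vs. card size.
# All architecture numbers here are placeholders -- read the real values
# from the GGUF metadata; I'm not certain of the exact figures for these models.

def kv_cache_gib(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2.0):
    """K + V cache size in GiB (2 tensors per layer, per position)."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem / 1024**3

def fits(weights_gib, kv_gib, vram_gib=16.0, overhead_gib=1.0):
    """Crude check: weights + KV cache + compute buffers against total VRAM."""
    return weights_gib + kv_gib + overhead_gib <= vram_gib

# Placeholder 27B-class shape, 8k context, q8_0 KV cache (1 byte per element).
kv = kv_cache_gib(n_layers=48, n_kv_heads=8, head_dim=128,
                  ctx_len=8192, bytes_per_elem=1.0)
print(f"KV cache: ~{kv:.2f} GiB")
print("IQ4_XS (15.4 GiB) fits:", fits(15.4, kv))        # the quant that got bigger
print("Q3_K_XL (~12.5 GiB, guess) fits:", fits(12.5, kv))
```

Quantizing the KV cache (dropping bytes_per_elem below 2) is what makes the longer contexts plausible at all in this budget; with full fp16 KV the headroom disappears fast.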

I am considering trying Gemma 4 31B at Q3. Is it even worth it?

(Gemma 26B-A4B has been good chat-wise but sucked for other use cases like Reddit summaries, etc.)