Help with understanding Local LLMs
Posted by theruner83@reddit | LocalLLaMA | 6 comments
hi all, I have a MacBook Pro M4 Pro with 24 GB of RAM and I'm looking to host a local model. Can someone please help explain what the best settings would be to run a local model? I can see there's MLX and then there's GGUF. I'm hoping to run the new Qwen 3.6 27B and wondering if it's possible to tweak settings to get it to run and fit on my laptop. It would also be helpful if someone could point me to any resources or help me understand the settings differences.
tmvr@reddit
It's going to be challenging to run the dense 27B model with only 24 GiB of total RAM, where 16 GiB is assigned as VRAM by default. You can try the IQ4_XS quant, which is only 14.4 GiB (15.4 GB), from here:
https://huggingface.co/unsloth/Qwen3.6-27B-GGUF
Just use a lower context first, like 32K (32768), and see how high you can go after that. Or step down to one of the Q3 quants to be able to use more context.
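If it helps, here's a minimal sketch of loading that quant with llama-cpp-python; the filename glob and the generation call are my assumptions, so check the repo for the exact file name:

```python
# Minimal sketch using llama-cpp-python (`pip install llama-cpp-python`,
# built with Metal support on a Mac). The filename glob is an assumption --
# verify the exact IQ4_XS file name in the repo.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="unsloth/Qwen3.6-27B-GGUF",
    filename="*IQ4_XS*.gguf",   # match the IQ4_XS quant mentioned above
    n_ctx=32768,                # start with 32K context as suggested
    n_gpu_layers=-1,            # offload all layers to the Metal GPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}]
)
print(out["choices"][0]["message"]["content"])
```

If loading fails or generation is very slow, drop n_ctx or move to a Q3 quant as suggested above.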
vlad_omniforge@reddit
MLX is meant to be fully optimized for Apple Silicon, but in practice GGUF gets more attention from the community because it is widely compatible, so in most cases it actually performs better.
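For comparison, the MLX route looks something like this with mlx-lm; the repo id below is a placeholder, not a real model (you'd need an actual MLX conversion of whatever model you pick):

```python
# Sketch of the MLX path via mlx-lm (`pip install mlx-lm`). The repo id is
# hypothetical -- substitute a real MLX conversion from the Hub.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/some-qwen-4bit-conversion")  # placeholder repo
text = generate(model, tokenizer,
                prompt="Explain GGUF vs MLX in one sentence.",
                max_tokens=128)
print(text)
```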
With the specs you mention, you should be able to run the model you intend, and I would recommend the Unsloth version: unsloth/Qwen3.6-35B-A3B-GGUF with Qwen3.6-35B-A3B-UD-Q4_K_M.gguf.
Be warned that it will run somewhat slowly because you're sitting right at the limit. As someone else suggested, a smaller model is your best bet if you want sane response times; I would personally go for the 9B one, which strikes a good balance.
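A minimal sketch of fetching that exact file with huggingface_hub; the repo id and filename come straight from the recommendation above, everything else is library defaults:

```python
# Download the recommended quant with huggingface_hub
# (`pip install huggingface_hub`). Repo id and filename are taken verbatim
# from the recommendation above.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="unsloth/Qwen3.6-35B-A3B-GGUF",
    filename="Qwen3.6-35B-A3B-UD-Q4_K_M.gguf",
)
print(path)  # local cache path you can point llama.cpp or similar tools at
```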
JLeonsarmiento@reddit
what do you want to do with the model? what is the intended use?
theruner83@reddit (OP)
I want to do coding with it. I don't expect Claude-level output, but I want to understand the best coding model I can run locally to do meaningful work.
jacek2023@reddit
You should start with a tiny model like a 4B, just to verify your environment. Then you can try bigger models, but 27B may be too heavy for your setup, as the rough numbers below suggest.
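A back-of-the-envelope sketch of why 27B is tight; the bits-per-weight figures are rough approximations I'm assuming, not exact quant specs, and this ignores the KV cache, which grows with context:

```python
# Rough size estimate: model bytes ~ params * bits_per_weight / 8.
# Bits-per-weight values below are approximate (my assumption).
def approx_model_gb(params_billions: float, bits_per_weight: float) -> float:
    return params_billions * bits_per_weight / 8

for quant, bits in [("Q3_K_M", 3.9), ("IQ4_XS", 4.25), ("Q4_K_M", 4.8), ("Q8_0", 8.5)]:
    print(f"27B at {quant} (~{bits} bpw): ~{approx_model_gb(27, bits):.1f} GB")

# With ~16 GB usable as VRAM by default on a 24 GB Mac, only the 3-4 bit
# quants leave any headroom for context.
```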
theruner83@reddit (OP)
Thanks for your response. I saw that Unsloth released Qwen 3.6 27B with a note that it can run on 24 GB of RAM with good inference, which is why I decided to post and try to understand.