Best agentic pure-coding LLM for 32GB DDR5 RAM and 8GB VRAM?
Posted by LightH12@reddit | LocalLLaMA | View on Reddit | 12 comments
I'm a little lost on what model to use as a pure coding agent. I'm using LM Studio with the Continue CLI.
I want to move away from Gemini CLI, or at least use something local when my tokens run out, so please don't mention anything online.
I have an i7-12650H, 32GB DDR5 RAM (dual channel), and a mobile RTX 4060 8GB. I also want to keep using the device while running the LLM, since I'm coding on it (expect it to run a localhost server for my website plus IntelliJ, so nothing major).
I've looked into Omnicoder and Qwen 3.5.
I tried Gemma E4B (7B), but let's just say it's too dumb to even add "Hi world!" to an HTML file in my project.
Speed itself isn't an issue since I'm using it for casual programming, but I'd at least want it to finish a simple basic task in under 5 minutes (like adding "hello world" to x.html).
So how many billion params should I aim for, and which models? Please leave your opinion.
ttkciar@reddit
Violates Rule One: Search before asking. There is a thread specifically for this: https://old.reddit.com/r/LocalLLaMA/comments/1sknx6n/best_local_llms_apr_2026/
LightH12@reddit (OP)
I have seen it and it does not answer my question.
ttkciar@reddit
Then please ask your question there. We are trying to consolidate "which model for me?" type questions into that thread so they do not inundate the sub.
LightH12@reddit (OP)
Thank you, that part wasn't clear to me.
Skyline34rGt@reddit
Gemma 4, yes, but not E4B; use the 26B-A4B model (with the Q4_K_M quant) plus offloading of the MoE layers.
Or do the same with Qwen3.5 35B-A3B.
These are the smartest options for your setup.
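A rough back-of-envelope sketch of why models this size fit a 8GB VRAM + 32GB RAM machine, assuming Q4_K_M averages roughly 4.85 bits per weight (an approximation; real GGUF files vary because some tensors stay at higher precision):

```python
# Rough estimate of a quantized GGUF's size.
# Assumption: Q4_K_M averages ~4.85 bits per weight (approximate;
# exact file sizes depend on the model architecture).

def q4km_size_gb(total_params_billion: float, bits_per_weight: float = 4.85) -> float:
    """Approximate quantized model size in GB (10^9 bytes)."""
    total_bits = total_params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9

# A 26B-total-parameter MoE model at Q4_K_M:
print(round(q4km_size_gb(26), 1))  # ~15.8 GB: too big for 8 GB of VRAM alone,
                                   # but workable split across VRAM + 32 GB RAM
```

Since only a few billion parameters are active per token in these MoE models (A4B / A3B), keeping the expert weights in system RAM costs less speed than offloading a dense model of the same total size.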
LightH12@reddit (OP)
Trying 26B-A4B Q4, or at least downloading it using LM Studio. Unsure how to offload MoE layers yet; will look it up. Tyvm!
Skyline34rGt@reddit
When you load the model, change the settings like this:
Set GPU offload all the way to the right (max), uncheck 'try mmap', and at the bottom there's the MoE layers setting. For that one, test a couple of values like 30 and 24 and find the best fit for your setup.
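For reference, if you run plain llama.cpp instead of LM Studio, the same knobs exist as command-line flags. A sketch, assuming a recent llama.cpp build (the model filename below is a placeholder, not a real file):

```shell
# Sketch: load all layers on the GPU but keep a number of MoE expert
# tensors in system RAM. Recent llama.cpp builds expose this directly
# as --n-cpu-moe; mmap is disabled to mirror the LM Studio setting.
llama-server \
  -m ./gemma-26b-a4b-Q4_K_M.gguf \
  --n-gpu-layers 99 \
  --n-cpu-moe 24 \
  --no-mmap \
  -c 8192

# Older builds instead regex-match expert tensors onto the CPU:
#   llama-server -m ./gemma-26b-a4b-Q4_K_M.gguf -ngl 99 -ot "ffn_.*_exps=CPU"
```

Lowering `--n-cpu-moe` pushes more experts onto the GPU (faster, more VRAM); raising it does the opposite, which is the same trade-off as the 30-vs-24 experiment above.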
LightH12@reddit (OP)
Ty
pdycnbl@reddit
I'm using Gemma 4 E2B 4-bit and it's able to do far more complex things than "hi world"; I'm quite surprised how usable it is. Gemma 4 had some issues with llama.cpp. Are you sure you're using the latest version, and the instruction-tuned model rather than the base one?
LightH12@reddit (OP)
Tried E4B (7B) and it got confused when I asked it to edit a file and add "hi customer!" in a random place in a div. It kept forgetting what to do after reading the file.
No_Block8640@reddit
Everything is shit up until Qwen 3.5 27B, and even that tends to be shit compared to bigger models. Your best bet for actually getting work done is to go for 27B or find an API with a free quota.
LightH12@reddit (OP)
I discussed it with someone; they also said go with 27B at Q4, and they recommended Gemma 4. I don't mind offloading to RAM, and I mostly want to use it when my tokens run out.