Best agentic pure-coding LLM for 32GB DDR5 RAM and 8GB VRAM?
Posted by LightH12@reddit | LocalLLaMA | View on Reddit | 12 comments
I'm a little lost on what model to use as a pure coding agent. I'm using LM Studio with the Continue CLI.
I want to move away from Gemini CLI, or at least use something local when my tokens run out, so please don't mention anything online.
I have an i7-12650H, 32GB DDR5 RAM (dual channel), and a mobile RTX 4060 8GB. I also want to keep using the device while running the LLM, since I'm coding on it (expect it to run a localhost server for my website plus IntelliJ, so nothing major).
I've looked into Omnicoder and Qwen 3.5.
I tried Gemma E4B (7B), but let's just say it's too dumb to even add "Hi world!" to an HTML file in my project.
Speed itself isn't an issue since I'm using it for casual programming, but I'd at least want it to finish a simple basic task in under 5 minutes (like adding "hello world" to x.html).
So how many billion params should I aim for, and which models? Please leave your opinion.
ttkciar@reddit
Violates Rule One: Search before asking. There is a thread specifically for this: https://old.reddit.com/r/LocalLLaMA/comments/1sknx6n/best_local_llms_apr_2026/
LightH12@reddit (OP)
I have seen it and it does not answer my question.
ttkciar@reddit
Then please ask your question there. We are trying to consolidate "which model for me?" type questions into that thread so they do not inundate the sub.
LightH12@reddit (OP)
Thank you, that part wasn't clear to me.
Skyline34rGt@reddit
Gemma 4, yes, but not E4B; use the 26B-A4B model (with the Q4_K_M quant) plus offloading of the MoE layers.
Or do the same with Qwen3.5 35B-A3B.
These are the smartest options for your setup.
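A rough back-of-envelope sketch of why models this size fit a 8GB VRAM + 32GB RAM machine, assuming Q4_K_M averages roughly 4.85 bits per weight (an approximation; real GGUF files vary because some tensors stay at higher precision):

```python
# Rough estimate of a quantized GGUF's size.
# Assumption: Q4_K_M averages ~4.85 bits per weight (approximate;
# exact file sizes depend on the model architecture).

def q4km_size_gb(total_params_billion: float, bits_per_weight: float = 4.85) -> float:
    """Approximate quantized model size in GB (10^9 bytes)."""
    total_bits = total_params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9

# A 26B-total-parameter MoE model at Q4_K_M:
print(round(q4km_size_gb(26), 1))  # ~15.8 GB: too big for 8 GB of VRAM alone,
                                   # but workable split across VRAM + 32 GB RAM
```

Since only a few billion parameters are active per token in these MoE models (A4B / A3B), keeping the expert weights in system RAM costs less speed than offloading a dense model of the same total size.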
LightH12@reddit (OP)
Trying 26B-A4B Q4, or at least downloading it using LM Studio. Unsure how to offload MoE layers yet; will look it up. Tyvm!
Skyline34rGt@reddit
When you load the model, change the settings like this:
Set GPU offload all the way to the right (max), uncheck 'try mmap', and at the bottom there's the MoE layers setting. For that one, test a couple of values like 30 and 24 and find the best fit for your setup.
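For reference, if you run plain llama.cpp instead of LM Studio, the same knobs exist as command-line flags. A sketch, assuming a recent llama.cpp build (the model filename below is a placeholder, not a real file):

```shell
# Sketch: load all layers on the GPU but keep a number of MoE expert
# tensors in system RAM. Recent llama.cpp builds expose this directly
# as --n-cpu-moe; mmap is disabled to mirror the LM Studio setting.
llama-server \
  -m ./gemma-26b-a4b-Q4_K_M.gguf \
  --n-gpu-layers 99 \
  --n-cpu-moe 24 \
  --no-mmap \
  -c 8192

# Older builds instead regex-match expert tensors onto the CPU:
#   llama-server -m ./gemma-26b-a4b-Q4_K_M.gguf -ngl 99 -ot "ffn_.*_exps=CPU"
```

Lowering `--n-cpu-moe` pushes more experts onto the GPU (faster, more VRAM); raising it does the opposite, which is the same trade-off as the 30-vs-24 experiment above.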
LightH12@reddit (OP)
Ty
pdycnbl@reddit
I'm using Gemma 4 E2B 4-bit and it's able to do far more complex things than "hi world"; I'm quite surprised how usable it is. Gemma 4 had some issues with llama.cpp. Are you sure you're using the latest version, and the instruction-tuned model rather than the base one?
LightH12@reddit (OP)
Tried E4B (7B) and it got confused when I asked it to edit a file and add "hi customer!" in a random place in a div. It kept forgetting what to do after reading the file.
No_Block8640@reddit
Everything is shit up until Qwen 3.5 27B, and even that tends to be shit compared to bigger models. Your best bet for actually getting work done is to go for 27B or find an API with a free quota.
LightH12@reddit (OP)
I discussed it with someone; they also said go with 27B at Q4, and they recommended Gemma 4. I don't mind offloading to RAM, and I mostly want to use it when my tokens run out.