Best model for 4090 as AI Coding Agent

Posted by Dry_Sheepherder5907@reddit | LocalLLaMA | 36 comments

Good day. I am looking for the best local model to use as a coding agent. I might have missed something, or some model that is not that widely used, so I came here for help.

Currently I have the following models that I have found useful for agentic coding, run via Google's turbo quant applied on llama.cpp:

I was really trying to get decent t/s for input and output out of Qwen3 Coder Next, as I thought it would be a killer, but to my surprise it sometimes makes such silly mistakes that I have to do a lot of babysitting in an agentic flow.

GLM 4.7 and Nemotron are the two I really can't decide between. Both give decent t/s for agentic coding, and I run both with the context window maxed out.

The thing is, I feel there might be some model that I have simply overlooked.

Any suggestions?

My Rig:
RTX 4090, 64 GB of 5600 MT/s RAM

Thank you in advance