Local coding models limit

Posted by Blues520@reddit | LocalLLaMA | View on Reddit | 17 comments

I've have dual 3090s and have been running 32b coding models for a while now with Roo/Cline. While they are useful, I only found them helpful for basic to medium level tasks. They can start coding nonsense quite easily and have to be reigned in with a watchful eye. This takes a lot of energy and focus as well, so your coding style changes to accommodate this. For well defined low complexity tasks, they are good, but beyond that I found that they can't keep up.

The next level up would be to add another 48GB VRAM but at that power consumption the intelligence level is not necessary worth it. I'd be interested to know your experience if you're running coding models at around 96GB.

The hosted SOTA models can handle high complexity tasks and especially design, while still prone to hallucination. I often use chatgpt to discuss design and architecture which is fine because I'm not sharing much implementation details or IP. Privacy is the main reason that I'm running local. I don't feel comfortable just handing out my code and IP to these companies. So I'm stuck running 32b models that can help with basic tasks or having to add more VRAM, but I'm not sure if the returns are worth it unless it means running much larger models, and at that point the power consumption and cooling becomes a major factor. Would love to hear your thoughts and experiences on this.