How to increase coding ability in smaller models?
Posted by keepthememes@reddit | LocalLLaMA | 20 comments
I've been running Qwen3.5 35b APEX I Quality to code a piece of software for me through opencode. Are there any plugins/protocols I should be using to give it better coding skills? It's constantly messing things up, so 90% of my time is spent tracking down issues it's created. Also open to using a different model; I've just found this one has the best quality/speed ratio.
System specs:
RTX 4070 12GB
RYZEN 7 5800X3D
32GB DDR4 RAM
logic_prevails@reddit
People ain’t gonna like this answer here, but I plan the feature with a large model like Opus, then I implement with local.
iportnov@reddit
Good agents really do matter. Asking an LLM to write code in chat mode is like live-coding during an interview, where you're handed a pen and a piece of paper and asked to write quicksort, and then the interviewer says you made a typo on line 25. With an agent, where the LLM can actually run the code and debug it, it's a totally different situation. Still, a bigger model makes fewer errors, so a smaller model goes through many run-check-fix cycles and spends many more tokens, but in the end it has a chance to produce something useful.
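The run-check-fix cycle described above can be sketched in a few lines. This is a minimal illustration, not any agent's actual implementation: `ask_model` is a hypothetical callback standing in for whatever local endpoint you use, and the loop simply executes the candidate code and feeds stderr back until it runs cleanly or the budget runs out.

```python
import subprocess
import sys
import tempfile


def run_snippet(code):
    """Execute a Python snippet in a subprocess; return (returncode, stderr)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run([sys.executable, path], capture_output=True, text=True)
    return proc.returncode, proc.stderr


def fix_loop(code, ask_model, max_tries=5):
    """Run-check-fix: retry until the code runs cleanly or we give up.

    ask_model(code, error) is a placeholder for a call to your local model.
    """
    for _ in range(max_tries):
        rc, err = run_snippet(code)
        if rc == 0:
            return code  # runs cleanly
        code = ask_model(code, err)  # feed the error back for another attempt
    return None  # out of budget: each extra cycle costs more tokens
```

The `max_tries` cap is where the token cost of a smaller model shows up: more cycles, more spend.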
-dysangel-@reddit
try the 3.6 version.
don't give such a small model free rein over your code. Use it as a way to type faster, approving every edit. IMO this should be the default even with larger models, until you know they can be trusted to do things well enough on their own most of the time.
keepthememes@reddit (OP)
the main issue is that I have almost no idea how to code lol. I'm just computer savvy and took some coding classes in college.
My workflow so far (as inefficient as it may be) has been having qwen3.5 code everything it can and then troubleshooting with deepseek since it's free and unlimited.
diffore@reddit
A better approach might be using deepseek to generate atomic plans and then letting qwen implement them. One plan per chat session: keeping each session as small as possible matters a lot for smaller models.
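The one-plan-per-session idea above boils down to never letting history accumulate. Here is a minimal sketch under stated assumptions: `call_model` is a hypothetical stand-in for your client (e.g. an OpenAI-compatible endpoint served locally), and each plan item gets a brand-new message list.

```python
def call_model(messages):
    # Placeholder: swap in a real call to your local server here.
    return f"implemented: {messages[-1]['content']}"


def implement_plans(plans):
    """One fresh chat session per atomic plan item, so context stays tiny."""
    results = []
    for plan in plans:
        messages = [
            {"role": "system", "content": "Implement exactly this one change."},
            {"role": "user", "content": plan},
        ]  # no carried-over history from earlier plan items
        results.append(call_model(messages))
    return results
```

The point of the fresh `messages` list is that the small model only ever sees one atomic task, never the whole project's back-and-forth.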
BringMeTheBoreWorms@reddit
Really small pieces of work are what’s needed to keep it contained. Even put those rules into your agents.md, but keep that short as well.
But the reality is you’ll only be able to delegate so much to AI before it becomes really unwieldy and unmanageable.
Just break everything down into little components that use each other, kind of like many small projects so it doesn’t get itself all tied up
ea_man@reddit
Try 3.6 with Qwencode; if that doesn't solve it, you gotta step up to a 27B dense at IQ3, which requires Linux / shutting down X11.
keepthememes@reddit (OP)
Is QwenCode really that much better than OpenCode? I've been using OpenCode for months now and I'd rather not switch if it doesn't give that much of a performance increase (but am definitely open).
ea_man@reddit
Qwen models are trained with tool calls in XML, and Qwencode's agent handles tools in XML.
Actually Opencode does quite well with new QWEN3.6 too.
keepthememes@reddit (OP)
i've had great experiences with qwen3.5 in opencode (though i did go through days of tweaks to get it to work properly) and so far with qwen3.6 everything seems to be working great with it, too
ea_man@reddit
Well, Opencode does start with an 11k-token prompt, so it imposes a lot of behaviour on Qwen3.5.
Own_Suspect5343@reddit
QwenCode works better with Qwen compared to Opencode. The vendor knows which tool calls their models were trained on, and 99% of the time those tools exist in the vendor's own coding agent.
Naiw80@reddit
I run qwen 3.6 with “openclaude”, and it works fairly well… sure, the model gets stuck in repetitions etc. at times (and that could possibly be tweaked away; I haven't tweaked anything at all so far, though).
I use a setup consisting of a Tesla P100 and an RTX 4070, get about 55-60 t/s and currently use a 260k context with Q_4 quantization.
keepthememes@reddit (OP)
How's your experience with multiple non-SLI GPUs been?
Naiw80@reddit
I think it works just fine, I tend to push the heavy layers to the RTX 4070 to leverage its tensor cores and use the P100 as “bulk storage”.
Own_Suspect5343@reddit
I am using APEX quants of 3.6 with the pi agent. Without tuning, my agent would read files using cat, forget which files it had already processed, and read them multiple times. So I wrote a small extension that disables the built-in tools and copies the tools from Qwen Code. It works better. Now I want to optimize the workflow so it solves my tasks within a small context.
I can share my extension if you want
promobest247@reddit
Haha, same thing with APEX i mini. I get 33 tokens/s on an RTX 4050 6GB / 16GB RAM laptop, but I use the pi coding agent, which is faster than Opencode.
Hot-Employ-3399@reddit
Split tasks to subtasks that can be verified
Use a lot of testing. Like a lot.
3.6 qwen > 3.5 qwen
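The "verifiable subtasks plus lots of testing" advice above can be made concrete: pin each subtask down with a test before the model touches it, so "done" is mechanical, not vibes. A minimal sketch; `slugify` is just an illustrative subtask, not anything from the thread.

```python
def slugify(title):
    # The subtask the model implements; a hand-written reference version here.
    return "-".join(title.lower().split())


def test_slugify():
    # The verifiable contract for this one subtask: if these pass, it's done,
    # and you move to the next small piece.
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Extra   Spaces ") == "extra-spaces"


test_slugify()
```

Writing the test first also doubles as the "atomic plan" others in the thread suggest: the test file is the spec you hand the small model.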
DarkArtsMastery@reddit
You could give better coding skills to yourself, craft better prompts as a result, and thus yield better results overall in the end. Less slop, you know.