What's the best coding model at 4B or 8B parameters?
Posted by Felix_455-788@reddit | LocalLLaMA | 31 comments
Yeah, I know the title looks stupid, and yes, I've done my searches: Google, Hugging Face, YouTube, and I even tested some models in LM Studio. But with my low-end GPU (GTX 1050, 4 GB VRAM) I can't fit more than a 4B, or sometimes only a 1B, and I have about 20 GB RAM plus a 15 GB pagefile. I didn't get the chance to test Qwen 3.6 35B; the largest quant I could manage was Q3_XXS, and that and anything below it (Q2, Q1) drop plenty of information and make the model way dumber. So I'm thinking about 8B, maybe 14B, but most of my search results were just numbers and benchmarks, so I figured I'd ask people who have tested these models themselves and seen the results.
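For a rough sense of what fits where: a GGUF's file size is roughly parameter count times bits-per-weight divided by 8, and you need extra headroom for the KV cache. A back-of-the-envelope sketch in Python; the bits-per-weight figures below are approximations for common quant types, not exact GGUF numbers:

```python
# Back-of-the-envelope GGUF memory estimate: params * bits-per-weight / 8.
# The bits-per-weight values are rough approximations for common quants,
# not exact figures; real setups also need room for KV cache and overhead.
QUANT_BPW = {"Q8_0": 8.5, "Q6_K": 6.6, "Q4_K_M": 4.8, "Q3_K_S": 3.5, "IQ3_XXS": 3.1, "Q2_K": 2.6}

def approx_size_gb(params_b: float, quant: str) -> float:
    # billions of params * bits per weight / 8 bits per byte ~= gigabytes
    return params_b * QUANT_BPW[quant] / 8

for params, quant in [(4, "Q4_K_M"), (8, "Q4_K_M"), (35, "IQ3_XXS")]:
    print(f"{params}B at {quant}: ~{approx_size_gb(params, quant):.1f} GB")

# On a 4 GB card, even a 4B at Q4_K_M (~2.4 GB) leaves little room for
# context; a 35B won't fit without heavy CPU offload into system RAM.
```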
netherreddit@reddit
Qwen 3.5 comes in 2B, 4B, and 9B; it's the best for most tasks. https://huggingface.co/collections/Qwen/qwen35
Substantial_Room4275@reddit
Does this also hold if I want to use it for autocomplete in Continue in VS Code? Or should I keep using Qwen2.5-Coder 7B for that?
netherreddit@reddit
2.5 is very old; the 3.5 4B is probably better quality and certainly faster. Definitely try it.
Substantial_Room4275@reddit
Thank you, but I'm confused now. I keep asking every AI model I know about the best FIM model, and they constantly recommend Qwen2.5 3B. Should I use the dedicated coder variants, or can I use any small enough model?
netherreddit@reddit
I don't know what the best FIM model is
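For context: FIM (fill-in-the-middle) is the completion format autocomplete plugins like Continue use. The model is given the code before and after the cursor and generates the missing middle. A minimal sketch using Qwen2.5-Coder's documented FIM tokens, assuming a llama.cpp server running on localhost:8080 (the endpoint and port are assumptions for illustration):

```python
# Minimal FIM (fill-in-the-middle) sketch for Qwen2.5-Coder.
# Assumes a llama.cpp server on localhost:8080 serving the model;
# the URL and port are assumptions -- adjust for your own setup.
import json
import urllib.request

# Qwen2.5-Coder's FIM special tokens: the model sees the code before
# and after the cursor and generates what goes in between.
prefix = "def fib(n):\n    if n < 2:\n        return n\n    "
suffix = "\n\nprint(fib(10))\n"
prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

req = urllib.request.Request(
    "http://localhost:8080/completion",  # llama.cpp server's completion endpoint
    data=json.dumps({"prompt": prompt, "n_predict": 64, "temperature": 0.2}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])
```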
Felix_455-788@reddit (OP)
I have the 4B thinking model and the 9B. Are these enough if I connect them to opencode or Roo Code, or will they be extremely slow?
havnar-@reddit
Use pi-core, it's way better for local LLMs, especially small ones.
deaday@reddit
You mean this one? https://pi.dev
havnar-@reddit
Yes
netherreddit@reddit
I haven't tried models that small for coding harnesses, but I'd guess they'll do fine if you give them simple, well-scoped tasks
Felix_455-788@reddit (OP)
I have other models for general tasks; they have larger parameter counts and I use offloading. I can be patient with general tasks and questions, but when it comes to coding I want something that's small and good at the same time. I know I'll only get decent, not the best, coding out of small models like 8B or 4B.
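For low VRAM, partial offload is the usual trick: put as many layers as fit on the GPU and run the rest on CPU from system RAM. A minimal sketch with llama-cpp-python; the model filename and layer count are assumptions to tune for your card:

```python
# Sketch: partial GPU offload with llama-cpp-python on a 4 GB card.
# The model path is hypothetical; n_gpu_layers is the knob to tune --
# offload as many layers as fit in VRAM, the rest run on CPU.
from llama_cpp import Llama

llm = Llama(
    model_path="./qwen3.5-9b-q4_k_m.gguf",  # hypothetical local file
    n_gpu_layers=18,  # layers on the GTX 1050; lower this if you hit OOM
    n_ctx=8192,       # context length; the KV cache also consumes memory
)
out = llm("Write a Python function that reverses a linked list.", max_tokens=256)
print(out["choices"][0]["text"])
```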
DeltaSqueezer@reddit
I've just been in an AI coding frenzy the last few days. I've been using Qwen3.5-9B for speed. It's surprisingly usable given the 'small' size.
netherreddit@reddit
Yeah still Qwen 3.5, even just for coding
netherreddit@reddit
You could also try Gemma 4 E2B and E4B.
ilintar@reddit
Qwen 3.5 9B for now and hopefully 3.6 soon.
ea_man@reddit
https://huggingface.co/mradermacher/OmniCoder-2-9B-i1-GGUF
Desther@reddit
Tried Gemma 2B and 4B with the Android Studio agent tab; they kept failing tool calls with "no occurrences found". It seems they fail to reproduce exactly what they read from your files, so the system won't let the edit through. They weren't very good at coding either; it was quicker to do it yourself.
They work for one-shot prompts from an empty file, but nothing more for me.
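That "no occurrences found" failure mode is typical of search-and-replace edit tools: the harness requires an exact string match of the model's quoted snippet against the file, so a single wrong space or quote character aborts the edit. A hedged sketch of the kind of check such a tool performs; the function and its behavior are illustrative, not Android Studio's actual implementation:

```python
# Illustrative sketch of the exact-match check a coding harness typically
# performs before applying an edit. Not Android Studio's implementation:
# small models often misquote whitespace or quotes, so the match fails.
import difflib

def apply_edit(file_text: str, old: str, new: str) -> str:
    # Exact substring match required; one wrong character means failure.
    if old not in file_text:
        close = difflib.get_close_matches(old, file_text.splitlines(), n=1, cutoff=0.8)
        hint = f" (closest line: {close[0]!r})" if close else ""
        raise ValueError("no occurrences found" + hint)
    return file_text.replace(old, new, 1)

source = 'print("hello world")\n'
try:
    # The model quoted single quotes where the file uses double quotes:
    apply_edit(source, "print('hello world')", 'print("hi")')
except ValueError as e:
    print(e)  # -> no occurrences found (closest line: 'print("hello world")')
```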
gurilagarden@reddit
I can run bigger models, but for many daily-driver tasks, including quickly modifying HTML/CSS/Java/Python, I still spin up Qwen3.5-9B because I can trust it to accomplish most tasks quickly, within limits that you learn over time. I started out giving it tasks and then checking them over with Opus 4.6, and through that I found the limits. It won't build you a super sexy website, but it will build you a website that doesn't have a lot of errors.
Visual-Afternoon-541@reddit
I tried a bunch of 2B and 4B models and can confirm this. Qwen 9B is really good and comes in compact sizes. It's fast and generally eloquent.
Tough_Frame4022@reddit
Qwen 3.6 35b
leo-k7v@reddit
Love it, but 3.6 35B requires a fair bit more RAM to hold even Q4.
SomeOrdinaryKangaroo@reddit
I have an 8 GB RAM MacBook Neo and Qwen 3.6 works fine at Q4.
PattF@reddit
How? Can you post your settings? I have a 24 GB Mac Pro and I'm struggling to get it to fit without lobotomizing it with Q3 or lower.
GeorgeSC@reddit
Not on a Mac, but these are mine. I get pp (prompt processing) 735.29 and tg (text generation) 32.52 tokens/s according to llama-bench:
PattF@reddit
How much VRAM do you have?
GeorgeSC@reddit
4070 mobile 8GB
FatheredPuma81@reddit
Use Gemma 4 26B if you can't run Qwen3.6 35B. People really need to start asking LLMs how to set up their models properly...
kichael@reddit
Been having success with bartowski/Jackrong_Qwen3.5-4B-Neo-GGUF:Q6_K. I think I need to tweak some things, as it does get into occasional thought loops.
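Thought loops are often a sampling issue. A hedged sketch of settings that usually help, following Qwen's published guidance for its thinking models (temperature 0.6, top_p 0.95) plus a presence penalty to break repetition; the LM Studio endpoint and the model name are assumptions:

```python
# Hedged sketch: sampling settings that usually reduce thought loops in
# small thinking models. The endpoint is LM Studio's default local server
# and the model name is hypothetical -- adjust both for your setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="jackrong_qwen3.5-4b-neo",  # hypothetical local model name
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
    temperature=0.6,       # greedy decoding (temp 0) is a common cause of loops
    top_p=0.95,
    presence_penalty=1.0,  # penalize re-used tokens to break repetition
    max_tokens=2048,
)
print(resp.choices[0].message.content)
```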
maycomesinlikealion@reddit
Hi OP. I have these kinds of questions all the time, so I used to grind HF and Reddit looking for the latest abliterated whatever, but with the current state of small models it's negative ROI on your time not to just go with the latest Ollama lineup and make things painless for yourself. It's kind of like searching for the perfect porn video, actually.
onicarps@reddit
the last part is true
DigRealistic2977@reddit
Try Nemotron 3 4B.