What's the best coding model at 4B or 8B parameters?
Posted by Felix_455-788@reddit | LocalLLaMA | 31 comments
Yeah, I know the title looks stupid, and yes, I've done my searches: Google, Hugging Face, YouTube, and I even tested some models in LM Studio. But with my low-end GPU (GTX 1050, 4 GB VRAM) I can't fit more than a 4B, or sometimes only a 1B, and I have about 20 GB RAM plus a 15 GB pagefile. I didn't get the chance to test Qwen 3.6 35B; the largest quant I could manage was Q3_XXS, and that and anything below it (Q2, Q1) drop plenty of information and make the model way dumber. So I'm thinking about 8B, maybe 14B, but most of my search results were just numbers and benchmarks, so I figured I'd ask people who have tested these models themselves and seen the results.
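For a rough sense of what fits where: a GGUF's file size is roughly parameter count times bits-per-weight divided by 8, and you need extra headroom for the KV cache. A back-of-the-envelope sketch in Python; the bits-per-weight figures below are approximations for common quant types, not exact GGUF numbers:

```python
# Back-of-the-envelope GGUF memory estimate: params * bits-per-weight / 8.
# The bits-per-weight values are rough approximations for common quants,
# not exact figures; real setups also need room for KV cache and overhead.
QUANT_BPW = {"Q8_0": 8.5, "Q6_K": 6.6, "Q4_K_M": 4.8, "Q3_K_S": 3.5, "IQ3_XXS": 3.1, "Q2_K": 2.6}

def approx_size_gb(params_b: float, quant: str) -> float:
    # billions of params * bits per weight / 8 bits per byte ~= gigabytes
    return params_b * QUANT_BPW[quant] / 8

for params, quant in [(4, "Q4_K_M"), (8, "Q4_K_M"), (35, "IQ3_XXS")]:
    print(f"{params}B at {quant}: ~{approx_size_gb(params, quant):.1f} GB")

# On a 4 GB card, even a 4B at Q4_K_M (~2.4 GB) leaves little room for
# context; a 35B won't fit without heavy CPU offload into system RAM.
```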
netherreddit@reddit
Qwen 3.5 comes in 2B, 4B, and 9B; it's the best for most tasks. https://huggingface.co/collections/Qwen/qwen35
Substantial_Room4275@reddit
Does this also hold if I want to use it for autocomplete in Continue in VS Code? Or should I keep using Qwen2.5-Coder 7B for that?
netherreddit@reddit
2.5 is very old; the 3.5 4B is probably better quality and certainly faster. Definitely try it.
Substantial_Room4275@reddit
Thank you, but I'm confused now. I keep asking every AI model I know about the best FIM model, and they constantly recommend Qwen2.5 3B. Should I use the dedicated coder variants, or can I use any small enough model?
netherreddit@reddit
I don't know what the best FIM model is
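For context: FIM (fill-in-the-middle) is the completion format autocomplete plugins like Continue use. The model is given the code before and after the cursor and generates the missing middle. A minimal sketch using Qwen2.5-Coder's documented FIM tokens, assuming a llama.cpp server running on localhost:8080 (the endpoint and port are assumptions for illustration):

```python
# Minimal FIM (fill-in-the-middle) sketch for Qwen2.5-Coder.
# Assumes a llama.cpp server on localhost:8080 serving the model;
# the URL and port are assumptions -- adjust for your own setup.
import json
import urllib.request

# Qwen2.5-Coder's FIM special tokens: the model sees the code before
# and after the cursor and generates what goes in between.
prefix = "def fib(n):\n    if n < 2:\n        return n\n    "
suffix = "\n\nprint(fib(10))\n"
prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

req = urllib.request.Request(
    "http://localhost:8080/completion",  # llama.cpp server's completion endpoint
    data=json.dumps({"prompt": prompt, "n_predict": 64, "temperature": 0.2}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])
```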
Felix_455-788@reddit (OP)
I have the 4B thinking model and the 9B. Are these enough if I connect them to opencode or Roo Code, or will they be extremely slow?
havnar-@reddit
Use pi-core, it's way better for local LLMs, especially small ones.
deaday@reddit
You mean this one? https://pi.dev
havnar-@reddit
Yes
netherreddit@reddit
I haven't tried models that small for coding harnesses, but I'd guess they'll do fine if you give them simple, well-scoped tasks
Felix_455-788@reddit (OP)
I have other models for general tasks; they have larger parameter counts and I use offloading. I can be patient with general tasks and questions, but when it comes to coding I want something that's small and good at the same time. I know I'll only get decent, not the best, coding out of small models like 8B or 4B.
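For low VRAM, partial offload is the usual trick: put as many layers as fit on the GPU and run the rest on CPU from system RAM. A minimal sketch with llama-cpp-python; the model filename and layer count are assumptions to tune for your card:

```python
# Sketch: partial GPU offload with llama-cpp-python on a 4 GB card.
# The model path is hypothetical; n_gpu_layers is the knob to tune --
# offload as many layers as fit in VRAM, the rest run on CPU.
from llama_cpp import Llama

llm = Llama(
    model_path="./qwen3.5-9b-q4_k_m.gguf",  # hypothetical local file
    n_gpu_layers=18,  # layers on the GTX 1050; lower this if you hit OOM
    n_ctx=8192,       # context length; the KV cache also consumes memory
)
out = llm("Write a Python function that reverses a linked list.", max_tokens=256)
print(out["choices"][0]["text"])
```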
DeltaSqueezer@reddit
I've just been in an AI coding frenzy the last few days. I've been using Qwen3.5-9B for speed. It's surprisingly usable given the 'small' size.
netherreddit@reddit
Yeah still Qwen 3.5, even just for coding
netherreddit@reddit
You could also try Gemma 4 E2B and E4B.
ilintar@reddit
Qwen 3.5 9B for now and hopefully 3.6 soon.
ea_man@reddit
https://huggingface.co/mradermacher/OmniCoder-2-9B-i1-GGUF
Desther@reddit
Tried Gemma 2B and 4B with the Android Studio agent tab; they kept failing tool calls with "no occurrences found". It seems they fail to reproduce exactly what they read from your files, so the system won't let the edit through. They weren't very good at coding either; it was quicker to do it yourself.
They work for one-shot prompts from an empty file, but nothing more for me.
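That "no occurrences found" failure mode is typical of search-and-replace edit tools: the harness requires an exact string match of the model's quoted snippet against the file, so a single wrong space or quote character aborts the edit. A hedged sketch of the kind of check such a tool performs; the function and its behavior are illustrative, not Android Studio's actual implementation:

```python
# Illustrative sketch of the exact-match check a coding harness typically
# performs before applying an edit. Not Android Studio's implementation:
# small models often misquote whitespace or quotes, so the match fails.
import difflib

def apply_edit(file_text: str, old: str, new: str) -> str:
    # Exact substring match required; one wrong character means failure.
    if old not in file_text:
        close = difflib.get_close_matches(old, file_text.splitlines(), n=1, cutoff=0.8)
        hint = f" (closest line: {close[0]!r})" if close else ""
        raise ValueError("no occurrences found" + hint)
    return file_text.replace(old, new, 1)

source = 'print("hello world")\n'
try:
    # The model quoted single quotes where the file uses double quotes:
    apply_edit(source, "print('hello world')", 'print("hi")')
except ValueError as e:
    print(e)  # -> no occurrences found (closest line: 'print("hello world")')
```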
gurilagarden@reddit
I can run bigger models, but for many daily-driver tasks, including quickly modifying HTML/CSS/Java/Python, I still spin up Qwen3.5-9B because I can trust it to accomplish most tasks quickly, within limits that you learn over time. I started out giving it tasks and then checking them over with Opus 4.6, and through that I found the limits. It won't build you a super sexy website, but it will build you a website that doesn't have a lot of errors.
Visual-Afternoon-541@reddit
I tried a bunch of 2B and 4B models and can confirm this. Qwen 9B is really good and comes in compact sizes. It's fast and generally eloquent.
Tough_Frame4022@reddit
Qwen 3.6 35b
leo-k7v@reddit
Love it, but 3.6 35B requires a fair bit more RAM to hold even Q4.
SomeOrdinaryKangaroo@reddit
I have an 8 GB RAM MacBook Neo and Qwen 3.6 works fine at Q4.
PattF@reddit
How? Can you post your settings? I have a 24 GB Mac Pro and I'm struggling to get it to fit without lobotomizing it with Q3 or lower.
GeorgeSC@reddit
Not on a Mac, but these are mine. I get pp (prompt processing) 735.29 and tg (text generation) 32.52 tokens/s according to llama-bench:
PattF@reddit
How much VRAM do you have?
GeorgeSC@reddit
4070 mobile 8GB
FatheredPuma81@reddit
Use Gemma 4 26B if you can't run Qwen3.6 35B. People really need to start asking LLMs how to set up their models properly...
kichael@reddit
Been having success with bartowski/Jackrong_Qwen3.5-4B-Neo-GGUF:Q6_K. I think I need to tweak some things, as it does get into occasional thought loops.
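Thought loops are often a sampling issue. A hedged sketch of settings that usually help, following Qwen's published guidance for its thinking models (temperature 0.6, top_p 0.95) plus a presence penalty to break repetition; the LM Studio endpoint and the model name are assumptions:

```python
# Hedged sketch: sampling settings that usually reduce thought loops in
# small thinking models. The endpoint is LM Studio's default local server
# and the model name is hypothetical -- adjust both for your setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="jackrong_qwen3.5-4b-neo",  # hypothetical local model name
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
    temperature=0.6,       # greedy decoding (temp 0) is a common cause of loops
    top_p=0.95,
    presence_penalty=1.0,  # penalize re-used tokens to break repetition
    max_tokens=2048,
)
print(resp.choices[0].message.content)
```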
maycomesinlikealion@reddit
Hi OP. I have these kinds of questions all the time, so I used to grind HF and Reddit looking for the latest abliterated whatever, but with the current state of small models it's negative ROI on your time not to just go with the latest Ollama lineup and make things painless for yourself. It's kind of like searching for the perfect porn video, actually.
onicarps@reddit
the last part is true
DigRealistic2977@reddit
Try Nemotron 3 4B.