I found a perfect coder model for my RTX4090+64GB RAM
Posted by srigi@reddit | LocalLLaMA | View on Reddit | 92 comments
Disappointed with vanilla Qwen3-coder-30B-A3B, I browsed models at mradermacher. I had a good experience with YOYO models in the past. I stumbled upon **mradermacher/Qwen3-Yoyo-V3-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-III-i1-GGUF**.
First, I was a little worried that **42B** won't fit, and offloading MoEs to CPU will result in poor perf. But thankfully, I was wrong.
Somehow this model consumed only about 8GB with `--cpu-moe` (keep all Mixture of Experts weights on the CPU) and Q4_K_M, and 32k ctx. So I tuned llama.cpp invocation to fully occupy 24GB of RTX 4090 and put the rest into the CPU/RAM:
```bash
llama-server --model Qwen3-Yoyo-V3-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-III.i1-Q4_K_M.gguf \
--ctx-size 131072 \
--flash-attn on \
--jinja \
--cache-type-k q8_0 \
--cache-type-v q8_0 \
--batch-size 1024 \
--ubatch-size 512 \
--n-cpu-moe 28 \
--n-gpu-layers 99 \
--repeat-last-n 192 \
--repeat-penalty 1.05 \
--threads 16 \
--host 0.0.0.0 \
--port 8080 \
--api-key secret
```
With these settings, it eats 23400MB of VRAM and 30GB of RAM. It processes the RooCode's system prompt (around 16k tokens) at around 10s and generates at 44tk/s. With 100k context window.
And the best thing - the RooCode tool-calling is very reliable (vanilla Qwen3-coder failed at this horribly). This model can really code and is fast on a single RTX 4090!
Here is a 1 minute demo of adding a small code-change to medium sized [code-base](https://github.com/srigi/type-graphql):

92 Comments
Tot_hits@reddit
DeerWoodStudios@reddit
smugself@reddit
DeerWoodStudios@reddit
smugself@reddit
DeerWoodStudios@reddit
smugself@reddit
lemondrops9@reddit
milkipedia@reddit
Blizado@reddit
StateSame5557@reddit
randomqhacker@reddit
lemon07r@reddit
Blizado@reddit
lemon07r@reddit
Blizado@reddit
lemon07r@reddit
Blizado@reddit
StateSame5557@reddit
Hot_Turnip_3309@reddit
Ummite69@reddit
srigi@reddit (OP)
Ummite69@reddit
social_tech_10@reddit
Holiday_Purpose_3166@reddit
redblood252@reddit
jacek2023@reddit
Blizado@reddit
LilPsychoPanda@reddit
Blizado@reddit
Kyla_3049@reddit
Blizado@reddit
Blizado@reddit
srigi@reddit (OP)
Blizado@reddit
srigi@reddit (OP)
Blizado@reddit
usernameplshere@reddit
srigi@reddit (OP)
usernameplshere@reddit
Glittering-Call8746@reddit
billy_booboo@reddit
tomakorea@reddit
AppearanceHeavy6724@reddit
ScoreUnique@reddit
ArtfulGenie69@reddit
Blizado@reddit
lemon07r@reddit
tomakorea@reddit
srigi@reddit (OP)
NoFudge4700@reddit
srigi@reddit (OP)
dinerburgeryum@reddit
JEs4@reddit
ElectronSpiderwort@reddit
Ok_Top9254@reddit
stuckinmotion@reddit
MrMisterShin@reddit
MisterBlackStar@reddit
MrMisterShin@reddit
see_spot_ruminate@reddit
MrMisterShin@reddit
see_spot_ruminate@reddit
GreenGreasyGreasels@reddit
randomqhacker@reddit
notlongnot@reddit
nmkd@reddit
Miserable-Dare5090@reddit
BumbleSlob@reddit
noctrex@reddit
somethingdangerzone@reddit
perkia@reddit
coding_workflow@reddit
srigi@reddit (OP)
MrMisterShin@reddit
cleverusernametry@reddit
AutomaticDriver5882@reddit
srigi@reddit (OP)
AutomaticDriver5882@reddit
lumos675@reddit
k0setes@reddit
InvertedVantage@reddit
srigi@reddit (OP)
ikmalsaid@reddit
Blizado@reddit
LagOps91@reddit
srigi@reddit (OP)
LagOps91@reddit
false79@reddit
Easy_Kitchen7819@reddit
NoFudge4700@reddit
Brave-Hold-9389@reddit