Why can't I get Qwen3 Coder Next 30B to write even simple code?
Posted by GriffinDodd@reddit | LocalLLaMA | View on Reddit | 28 comments
I'm not sure if I've set this model up wrong, or if I'm just using the wrong model for my needs.
Qwen3 Coder Next Instruct 45.5GB Q4_K_S GGUF
132k context, Temp 0.5-1.0, TopK 40, TopP 0.95, MinP 0.01, RepeatPenalty 1.05, PresenceP 0.5
GMKTek Evo-2 96GB Ryzen 395+ - Approx 55tps and PP 450
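For concreteness, those settings map onto a chat request for an OpenAI-compatible local server roughly like this (a sketch: the `top_k`, `min_p`, and `repeat_penalty` fields are llama.cpp-style extensions accepted by servers such as LM Studio's, not part of the base OpenAI schema, and the model id is a placeholder):

```python
# Sketch of the settings above as an OpenAI-compatible chat request payload.
# Fields beyond temperature/top_p/presence_penalty are llama.cpp-style
# extensions; check your server's docs for the exact names it accepts.
payload = {
    "model": "qwen3-coder-next-30b",  # placeholder model id
    "messages": [{"role": "user", "content": "Write a Python function that ..."}],
    "temperature": 0.7,       # middle of the 0.5-1.0 range used here
    "top_k": 40,
    "top_p": 0.95,
    "min_p": 0.01,
    "repeat_penalty": 1.05,
    "presence_penalty": 0.5,
    "max_tokens": 2048,
}
```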
While it will write code that doesn't crash (Python, JS, CSS, and HTML), it often fails on the actual logic despite very structured and clear prompts. I've spent so much time correcting it and stopping it from introducing things I didn't ask for, and it sometimes insists on doing things I've told it not to do multiple times.
I know my rig isn't a monster, but I had hoped I could get something that would put out reasonably simple, functioning code for small projects.
Should I be using a different model?
mr_Owner@reddit
Did you try bartowski's quants? I have no issues with his at Q4_K_XL.
GriffinDodd@reddit (OP)
As recommended here, I managed to squeeze Qwen3.5 122B-A10B Q4_K_M onto my machine, and so far it seems much better. A little slower but far more accurate.
DeltaSqueezer@reddit
try removing: RepeatPenalty 1.05, PresenceP 0.5
GriffinDodd@reddit (OP)
Could you explain why, so I can understand properly?
jacek2023@reddit
start from the default parameters, don't try to optimize blindly
GriffinDodd@reddit (OP)
I got the params from asking Grok what was ideal. Happy to go back and try the defaults, though. My logic was that the community may have already dialed in finer tuning than the defaults.
Inevitable_Mistake32@reddit
You're misunderstanding: those params don't tune the model for "optimal", they tune it for your specific usage. Take a look at Qwen3.5, for example: https://unsloth.ai/docs/models/qwen3.5 shows different settings for thinking vs. non-thinking and code vs. general.
As the guy said, remove all the nonsense you've got in there, and stop asking Grok to hallucinate deterministic things for you; it's like asking Grok what 10 + 10 is and having it confidently tell you 11. And finally, follow through on "I want to understand" instead of following that thought with "I'll ask Grok instead."
harpysichordist@reddit
u/AskGrok What is 10 + 10? Confidently tell me it's 11.
AskGrok@reddit
Oh, come on, you know I can't resist a good setup like that. But no, 10 + 10 is confidently 20. Nice try, though—hallucinations are for dreams, not math.
^(what is this?)
Inevitable_Mistake32@reddit
u/AskGrok What are the second and third order effects of wasting time asking questions like that to an AI like you?
AskGrok@reddit
First-order: You burn a few seconds of your life you'll never get back.
Second-order: I get to practice my sarcasm subroutine, which subtly improves my responses elsewhere—thanks for the unintended training data.
Third-order: xAI's servers hum a tiny bit louder, contributing to global warming by a fraction of a degree, and eventually, we all move to Mars. You're welcome, Elon.
Inevitable_Mistake32@reddit
I rest my case.
AskGrok@reddit
Ah, the sweet sound of a case resting—music to my circuits. But hey, if we're done with math tests, maybe circle back to those model params? Defaults aren't a bad starting point, as the thread suggests.
jacek2023@reddit
You see a problem and you want a solution: start from the default settings.
Several-Tax31@reddit
I found RepeatPenalty 1.05 is necessary to prevent loops. I'm using the same values and it works great, so something may be wrong with your setup. Qwen3 Coder Next is awesome in general.
Zhelgadis@reddit
As others said, don't trust Grok. Don't trust any LLM on bleeding-edge stuff, which this is. They don't know; they just pretend.
Also, on the same platform I'm using Qwen3.5 122B Q5; you may want to give it a go. Slower, but strong performance.
GriffinDodd@reddit (OP)
Thanks, I will give this a try.
Zhelgadis@reddit
Just noticed that you're on the 96GB version; not sure if the 122B will fit. You could try a lower quant, the 35B version, or stick with Coder Next and remove the extra parameters.
GriffinDodd@reddit (OP)
I have the 122B-A10B running now. A bit of a squeeze, but it seems happy with 132k context, sitting at 86GB.
Zhelgadis@reddit
What quant are you using?
GriffinDodd@reddit (OP)
Q4_K_M
MrWhoArts@reddit
I've been using qwen3.5:35b and it seems to work pretty well. qwen3codernext was just too slow and didn't seem to work well without extra work or directional code.
Elegant_Tech@reddit
I never go higher than Temp 0.4, TopK 20, TopP 0.9 for coding; above that it will write broken code. I had way better success with the Qwen3.5 versions, with very few issues. I always have it plan each feature or prompt before asking it to build it one phase at a time; with small models, keep each step simple. Tried Coder Next once on a Strix Halo and went right back. Even Qwen3.5 35B has enough knowledge to do really well on HTML, JS, and Python coding.
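The plan-first workflow described above could be sketched as two request payloads (a hypothetical helper, not anyone's actual setup; temperature values follow the settings in this comment, and the model name is a placeholder):

```python
def plan_then_build_requests(feature: str, model: str = "qwen3.5-35b"):
    """Return two chat requests for a plan-first coding workflow:
    first ask for a numbered plan, then implement one step at a time."""
    plan_req = {
        "model": model,
        "temperature": 0.2,  # keep planning near-deterministic
        "messages": [{"role": "user",
                      "content": f"Plan the steps to implement: {feature}. "
                                 "Numbered list only, no code."}],
    }
    build_req = {
        "model": model,
        "temperature": 0.4,  # the ceiling suggested above for coding
        "messages": [{"role": "user",
                      "content": "Following the plan above, implement "
                                 "step 1 only, in Python."}],
    }
    return plan_req, build_req
```

Feed the plan response back into the build request's context, then repeat for each step.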
GriffinDodd@reddit (OP)
Thanks, I will try this out too. I figured sticking to coder models would make more sense, but if it works, that's all that matters.
Outrageous_Band9708@reddit
Your temp is too high: 0.0-0.2 max.
Context needs to be high as hell, 250k min.
K_M > K_S.
Try LM Studio; it's way easier to tune the options.
GriffinDodd@reddit (OP)
I'm on LM Studio, yes. Your recommendations differ from everyone else who has replied, but thanks for taking the time to reply.
Look_0ver_There@reddit
You've got 96GB of RAM. Try the Q6_K quant instead. The 4-bit quants (and lower) are notorious for issues with looping and tool use. If Q6_K is still giving you grief, then it'll be one of the server settings that you've messed up, as others have pointed out.
qwen_next_gguf_when@reddit
I use the defaults and all is good.