YouTuber tries Qwen 3.5 35B, Qwen 3.6 35B, and Gemma 4 27B to reverse engineer some large JS, with good results for Qwen 3.6
Posted by mr_zerolith@reddit | LocalLLaMA | 17 comments
Found this interesting and thought I'd share.
A big problem I've had with the Qwen 3 MoE was how bad it was at instruction following, and also that its 'dumb point' in the context window was really low. I was so turned off by it that I never tried Qwen 3.5 and kept using SEED OSS 36B for coding.
3.6 appears to have better instruction following than prior models, do you find this to be the case yourself?
korino11@reddit
He didn't use the correct version of Qwen 3.6 with the fixed layers! https://huggingface.co/LuffyTheFox/Qwen3.6-35B-A3B-Plus-Uncensored-Wasserstein-GGUF Results should be better with it.
mister2d@reddit
Did unsloth "fix" theirs already?
korino11@reddit
The link I posted already has an explanation of what was fixed and how. Do you have a link from Unsloth about a SIMILAR fix?
mister2d@reddit
Will check, thanks.
stenlis@reddit
I tried to replicate this guy's test with Gemma 4. I don't have his code, so I used the largest code file in OpenSSL I could find: https://github.com/openssl/openssl/blob/master/test/sslapitest.c
This is even larger than his code: 461 kB, 15k lines of code, and it took 17k tokens of prompt processing. I used the same prompt from the video. The answer:
The diff found only the removal of four space characters at the start of each line; other than that, it was 100% reproduced.
One caveat though: I'm using the full BF16 version of the Gemma 4 26B A4B model. In his video the author had the Q4 version and used some workaround for the insufficient out-of-the-box context length. I wonder whether that caused the problem.
mit-drissia@reddit
What does this mean? Did Gemma 4 pass? How many lines did it find?
stenlis@reddit
It listed all 20 lines correctly. I did two different requests (not 20 like the author), but both were 100% accurate. Unlike him, I used the unquantized versions.
You can replicate this by giving your model the same file and the same prompt.
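For anyone replicating this, here's a minimal sketch of the whitespace-insensitive diff check described above (the function name and example snippets are mine, not from the video):

```python
import difflib

def normalized_diff(original: str, reproduced: str) -> list[str]:
    """Diff two snippets line by line, ignoring leading whitespace,
    so stripped indentation (like the four spaces above) doesn't count."""
    a = [line.lstrip() for line in original.splitlines()]
    b = [line.lstrip() for line in reproduced.splitlines()]
    diff = difflib.unified_diff(a, b, lineterm="")
    # Keep only real +/- hunk lines, not the ---/+++ file headers.
    return [d for d in diff if d[:1] in "+-" and d[:3] not in ("+++", "---")]

# The model's output matches except for lost indentation:
original = "    if (ret != 1)\n        goto err;"
reproduced = "if (ret != 1)\n    goto err;"
print(normalized_diff(original, reproduced))  # -> [] (no substantive diff)
```

An empty list means the reproduction is exact up to indentation; any remaining entries are real discrepancies.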
FullstackSensei@reddit
I think a big issue is the quant and the tooling/harness. Yes, 3.6 is better but Gemma 4 would very probably have completed the task successfully with a better quant and/or a good harness.
Q4 is, more often than not, not good enough for tasks with any amount of complexity at such model sizes. Higher quants are even more crucial when working with larger contexts, where the model needs to have even more nuance to stay "focused".
You don't need to, and shouldn't aim to, fit the whole model + context in VRAM. The obsession with token generation speed is very misguided. The only things that matter are how fast you can actually complete tasks and how much you have to intervene.
Even if you don't have enough VRAM, use system RAM to offload. Those are 3-4B active parameter models, they'll still be plenty fast running Q8 with all the FFN layers on CPU.
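To put rough numbers on that (the bits-per-weight figures below are assumptions that include quant overhead, not exact GGUF sizes):

```python
def gguf_weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough weight-file size in GB: parameters * bits per weight / 8."""
    return params_billions * bits_per_weight / 8

# Assumed effective bits per weight: Q4_K_M ~ 4.8 bpw, Q8_0 ~ 8.5 bpw.
for name, bpw in [("Q4_K_M", 4.8), ("Q8_0", 8.5)]:
    print(f"35B @ {name}: ~{gguf_weight_gb(35, bpw):.1f} GB")
# 35B @ Q4_K_M: ~21.0 GB -> squeezes into a 24 GB GPU
# 35B @ Q8_0:  ~37.2 GB -> needs system RAM for the offloaded FFN layers
```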
You'll get a lot more done at a third of the speed with fewer interventions and fewer corrections than at full speed while constantly correcting the model or fixing its mistakes by hand.
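If the runtime is llama.cpp, the FFN-on-CPU split described above can be done with tensor overrides. This is a sketch, not a drop-in command: the model filename is hypothetical, and the exact tensor-name regex depends on the model architecture.

```shell
# -ngl 99 puts all layers on the GPU by default; --override-tensor (-ot)
# then pins the MoE expert FFN tensors to CPU (system RAM), keeping
# attention weights and the KV cache on the GPU.
llama-server -m qwen3.6-35b-a3b-q8_0.gguf \
  -ngl 99 \
  -ot '\.ffn_.*_exps\.=CPU' \
  -c 65536
```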
redditorialy_retard@reddit
Exactly the reason why I'm upgrading my RAM even though it costs a shit ton.
vulcan4d@reddit
We need more of these, and different quant testing, to validate the information we're basically being sold. Everything looks good on paper, but everyone seems focused on testing extremely large models that many of us can only dream of running.
redditorialy_retard@reddit
And most people run models at around Q4-Q5, as those are the ones that fit inside the GPU.
jacek2023@reddit
Usually YouTubers talking about LLMs are shit, but this looks good. Thanks for sharing.
ps5cfw@reddit
Qwen 3.6 really is the first time I can work 100% locally without needing any cloud AI model, so I'm not too surprised.
FastHotEmu@reddit
Not reverse engineering, just recall.
Express_Quail_1493@reddit
I love this guy's videos. He does real tests on projects the LLM would stumble on, to intentionally feel out the models without relying heavily on benchmarks. Most YouTubers do lazy zero-shot single-file HTML edits, which doesn't say much since pretty much all models can do that LOL
pmttyji@reddit
Can you share other YouTube channels (on LLM stuff) you're watching? Thanks
nikhilprasanth@reddit
Interesting to see that the LM Studio quants are performing better than the Unsloth ones across the three models.