YouTuber tries Qwen 3.5 35B, Qwen 3.6 35B, and Gemma 4 27B to reverse engineer some large JS, with good results for Qwen 3.6
Posted by mr_zerolith@reddit | LocalLLaMA | 17 comments
Found this interesting and thought I'd share.
A big problem I've had with the Qwen 3 MoE was how bad it was at instruction following, and also that its 'dumb point' in the context window was really low. I was so turned off by it that I never tried Qwen 3.5 and kept using SEED OSS 36B for coding.
3.6 appears to have better instruction following than prior models, do you find this to be the case yourself?
korino11@reddit
He didn't use the correct version of Qwen 3.6 with the fixed layers! https://huggingface.co/LuffyTheFox/Qwen3.6-35B-A3B-Plus-Uncensored-Wasserstein-GGUF Results should be better with it.
mister2d@reddit
Did unsloth "fix" theirs already?
korino11@reddit
The link I posted already has an explanation of what was fixed and how. Do you have a link from Unsloth about a SIMILAR fix?
mister2d@reddit
Will check, thanks.
stenlis@reddit
I tried to replicate this guy's test with Gemma 4. I don't have his code, so I used the largest code file in OpenSSL I could find: https://github.com/openssl/openssl/blob/master/test/sslapitest.c
This is even larger than his code: 461 kB, 15k lines of code, and it took 17k tokens of prompt processing. I used the same prompt from the video. The answer:
The diff found only the removal of four space characters at the start of each line; other than that, it was 100% reproduced.
One caveat though: I'm using the full BF16 version of the Gemma 4 26B A4B model. In his video the author had the Q4 version and used some workaround for the insufficient out-of-the-box context length. I wonder whether that caused the problem.
mit-drissia@reddit
What does this mean? Did Gemma 4 pass? How many lines did it find?
stenlis@reddit
It listed all 20 lines correctly. I did two different requests (not 20 like the author), but both were 100% accurate. Unlike him, I used the unquantized versions.
You can replicate this by giving your model the same file and the same prompt.
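For anyone replicating this, here's a minimal sketch of the whitespace-insensitive diff check described above (the function name and example snippets are mine, not from the video):

```python
import difflib

def normalized_diff(original: str, reproduced: str) -> list[str]:
    """Diff two snippets line by line, ignoring leading whitespace,
    so stripped indentation (like the four spaces above) doesn't count."""
    a = [line.lstrip() for line in original.splitlines()]
    b = [line.lstrip() for line in reproduced.splitlines()]
    diff = difflib.unified_diff(a, b, lineterm="")
    # Keep only real +/- hunk lines, not the ---/+++ file headers.
    return [d for d in diff if d[:1] in "+-" and d[:3] not in ("+++", "---")]

# The model's output matches except for lost indentation:
original = "    if (ret != 1)\n        goto err;"
reproduced = "if (ret != 1)\n    goto err;"
print(normalized_diff(original, reproduced))  # -> [] (no substantive diff)
```

An empty list means the reproduction is exact up to indentation; any remaining entries are real discrepancies.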
FullstackSensei@reddit
I think a big issue is the quant and the tooling/harness. Yes, 3.6 is better but Gemma 4 would very probably have completed the task successfully with a better quant and/or a good harness.
Q4 is, more often than not, not good enough for tasks with any amount of complexity at such model sizes. Higher quants are even more crucial when working with larger contexts, where the model needs to have even more nuance to stay "focused".
You don't need to, and shouldn't aim to, fit the whole model + context in VRAM. The obsession with token generation speed is very misguided. The only things that matter are how fast you can actually complete tasks and how much you have to intervene.
Even if you don't have enough VRAM, use system RAM to offload. Those are 3-4B active parameter models, they'll still be plenty fast running Q8 with all the FFN layers on CPU.
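To put rough numbers on that (the bits-per-weight figures below are assumptions that include quant overhead, not exact GGUF sizes):

```python
def gguf_weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough weight-file size in GB: parameters * bits per weight / 8."""
    return params_billions * bits_per_weight / 8

# Assumed effective bits per weight: Q4_K_M ~ 4.8 bpw, Q8_0 ~ 8.5 bpw.
for name, bpw in [("Q4_K_M", 4.8), ("Q8_0", 8.5)]:
    print(f"35B @ {name}: ~{gguf_weight_gb(35, bpw):.1f} GB")
# 35B @ Q4_K_M: ~21.0 GB -> squeezes into a 24 GB GPU
# 35B @ Q8_0:  ~37.2 GB -> needs system RAM for the offloaded FFN layers
```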
You'll get a lot more done at a third of the speed with fewer interventions and fewer corrections than at full speed while constantly correcting the model or fixing its mistakes by hand.
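If the runtime is llama.cpp, the FFN-on-CPU split described above can be done with tensor overrides. This is a sketch, not a drop-in command: the model filename is hypothetical, and the exact tensor-name regex depends on the model architecture.

```shell
# -ngl 99 puts all layers on the GPU by default; --override-tensor (-ot)
# then pins the MoE expert FFN tensors to CPU (system RAM), keeping
# attention weights and the KV cache on the GPU.
llama-server -m qwen3.6-35b-a3b-q8_0.gguf \
  -ngl 99 \
  -ot '\.ffn_.*_exps\.=CPU' \
  -c 65536
```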
redditorialy_retard@reddit
Exactly the reason why I'm upgrading my RAM even though it costs a shit ton.
vulcan4d@reddit
We need more of these, and different quant testing, to validate the information we're basically being sold. Everything looks good on paper, but everyone seems focused on testing extremely large models that many of us can only dream of running.
redditorialy_retard@reddit
And most people run models at around Q4-Q5, as those are the ones that fit inside the GPU.
jacek2023@reddit
Usually YouTubers talking about LLMs are shit, but this looks good. Thanks for sharing.
ps5cfw@reddit
Qwen 3.6 really is the first time I can work 100% locally without needing any cloud AI model, so I'm not too surprised.
FastHotEmu@reddit
Not reverse engineering, just recall.
Express_Quail_1493@reddit
I love this guy's videos. He does real tests on projects the LLM would stumble on, to intentionally feel out the models without relying heavily on benchmarks. Most YouTubers do lazy zero-shot single-file HTML edits, which doesn't say much since pretty much all models can do that LOL
pmttyji@reddit
Can you share other YouTube channels (on LLM stuff) you're watching? Thanks
nikhilprasanth@reddit
Interesting to see that the LM Studio quants are performing better than the Unsloth ones across the three models.