Qwen3.5-35B-A3B-Uncensored-FernflowerAI-GGUF
Posted by EvilEnginer@reddit | LocalLLaMA | View on Reddit | 180 comments
Hello everyone. I found and fixed a training bug in the Qwen3.5 35B A3B model.
Here is my fixed version: https://huggingface.co/LuffyTheFox/Qwen3.5-35B-A3B-Uncensored-FernflowerAI-GGUF
Upgraded system prompt that unlocks deep thinking (works great with this model):
https://pastebin.com/pU25DVnB
Chat template: https://pastebin.com/uk9ZkxCR (supports tool calling)
Recommended Settings (LM Studio):
| Setting | Value |
|---|---|
| Temperature | 0.7 |
| Top K Sampling | 20 |
| Presence Penalty | 1.5 |
| Top P Sampling | 0.8 |
| Min P Sampling | 0 |
| Seed | 42 |
History:
I've been using Qwen 3.5 35B A3B (the uncensored version by HauhauCS) for a while. It's an incredible model - uncensored, MoE with 256 experts, hybrid DeltaNet + Attention, 40 layers. But something was off. On short prompts it works fine. On long conversations it started "philosophizing" - losing context, repeating itself, writing broken code with strange comments.
I spent two weeks digging through the weights.
What I found:
Two tensors. In blocks 36 and 37. ssm_conv1d.weight.
Their scale was ~60% higher than normal (σ=0.102 vs median 0.063). Because of how AdamW works, rare experts in the last layers get a huge effective learning rate - their weights drift.
In a recurrent architecture like DeltaNet, this kills the hidden state. The model forgets context after a few tokens.
Surprisingly, I didn't find any issues in Gemma 4 26B A4B - all scales in that model were correct.
What I did:
I scaled broken tensors back to normal. Nothing else. 489 other tensors were left untouched - their scale is architectural (gate_inp, etc.).
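(The rescaling step is simple in principle: keep the tensor's shape and mean, and pull its spread back to the target. Here is a naive sketch of that idea - not my actual script, and the toy numbers are made up:)

```python
import statistics

def rescale_to_std(weights, target_std):
    """Rescale a flat list of weights so its std matches target_std.

    Keeps the mean fixed and only shrinks/stretches the deviations,
    which is one naive reading of "scaled back to normal" - the real
    repair logic is more involved.
    """
    current_std = statistics.pstdev(weights)
    if current_std == 0:
        return list(weights)
    mean = statistics.fmean(weights)
    factor = target_std / current_std
    # Only the spread changes; the distribution's shape is preserved.
    return [mean + (w - mean) * factor for w in weights]

# Toy drifted tensor, pulled back toward the healthy median sigma of 0.063.
drifted = [0.0, 0.102, -0.102, 0.204, -0.204]
repaired = rescale_to_std(drifted, 0.063)
```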
Results:
- Error reduction: 88.6%.
- Long conversations now stay coherent.
- Code generation works.
- No more "philosophizing".
What I learned:
One bug. Two tensors. 64GB of model. And the entire potential of the most complex open-weight architecture was locked behind it.
If you're using MoE + recurrent hybrids (DeltaNet, Mamba, etc.), check your last blocks. AdamW might have silently broken them.
Enjoy ^_^
Thireus@reddit
u/VoidAlchemy
VoidAlchemy@reddit
Thanks, I'll look into it. fwiw all GGUFs created will have identical f32 dtypes for the mentioned vector data e.g.
Thireus@reddit
Thanks! Also see https://www.reddit.com/r/LocalLLaMA/comments/1sfwauj/comment/off7gus/ and answers.
VoidAlchemy@reddit
> The script is proprietary - I'm not sharing it.
I'm not convinced yet that there is any problem. I haven't seen any PPL/KLD numbers with/without patched bf16 safetensors (or bf16 gguf if that is where they are starting).
There is no problem at all with GGUF, and their claim has nothing to do with GGUF. Their claim is that the original safetensors bf16 weights for those two tensors have a wider standard deviation, so they scale it down, presumably?
It could just be some marketing hype for their finetuned model? Honestly not sure if it is worth investigating or not unless they release the actual script or someone can A/B test a patched/unpatched version in terms of PPL/KLD/lm harness evals or anything besides vibes.
Decivox@reddit
I built a tool and ran it against the original BF16 GGUF in the OP, I actually found one extra tensor (36), applied repairs, and got no differing results in the NIAH tests. Available below if you want to give it a test:
https://github.com/decibuild/qwen-ssm-repair
VoidAlchemy@reddit
Thanks for sharing your tokens! My GLM-5.1 running locally on CPU-only couldn't figure out what the "problem" even was after a couple hours so I gave up... haha...
Thireus@reddit
Thank you for looking into it!
Thireus@reddit
I agree and if there is indeed something wrong with the original tensors this should probably be raised with Qwen so they can update their model in a minor release.
EvilEnginer@reddit (OP)
So, I found that my per-neuron fixes were too rough and I got random text as output. Trying another way - skipping critical tensors and patching only what I can.
EbbNorth7735@reddit
Alternatively, capture the conditions that recreate the issue and set up a pipeline to run various increments of the adjusted weights.
EvilEnginer@reddit (OP)
Really good idea btw, thanks. I think I will use neighbouring neurons in the tensors, simply copy-pasting weight values into the holes.
EvilEnginer@reddit (OP)
So. Now I am downloading original Qwen3.5 9B BF16 model from Unsloth: https://huggingface.co/unsloth/Qwen3.5-9B-GGUF/blob/main/Qwen3.5-9B-BF16.gguf
Let's see what was broken in the model...
Thireus@reddit
Could you share the script you are using and how to patch the BF16 tensors please?
EvilEnginer@reddit (OP)
Thanks for the interest. The script is proprietary - I'm not sharing it. If you want a specific model checked, contact me directly.
VoidAlchemy@reddit
Would you be willing to make the script not proprietary? Also, do you have any PPL/KLD/lm harness evals to A/B test an unpatched vs a patched model? Otherwise I'm not personally convinced based on vibes alone.
More ramblings: https://www.reddit.com/r/LocalLLaMA/comments/1sfwauj/comment/oflmcoa/
Cheers!
EvilEnginer@reddit (OP)
The script stays proprietary. I don't have PPL/KLD/lm harness evals because I'm on Colab Free Tier and can't run them on 35B models. But the results speak for themselves: users report to me that 100k+ context started working, tool calls are stable, and there are no more loops or rambling. If you're not convinced, that's fine - the fixed model is free. Download it, test it yourself, and compare to the original. The difference is obvious.
VoidAlchemy@reddit
Thanks, I'll pass. Cheers!
True_Requirement_891@reddit
We need to do more investigative shit like this
EvilEnginer@reddit (OP)
Yep, very true.
VoidAlchemy@reddit
fwiw, this has nothing to do with GGUF, they are patching the original weights as released.
But yeah GGUF can be confusing too.
Johnwascn@reddit
Could you please check if the qwen3.5 122b is also damaged?
EvilEnginer@reddit (OP)
I will check Q4_K_P quant. It fits on Google Colab Free Tier disk space.
Johnwascn@reddit
Thanks, brother.
EvilEnginer@reddit (OP)
Btw, thank you very much guys for award. I am very pleased to hear that my work is in demand.
Thireus@reddit
Thanks for sharing this!
EvilEnginer@reddit (OP)
Thank you too :). I'm happy to help in any way I can.
Thireus@reddit
Has unsloth reacted to these findings?
EvilEnginer@reddit (OP)
I think not. They are focused on Gemma 4 now, and on quantization rather than the model-architecture healing that I am doing.
CATLLM@reddit
Amazing work! Thank you! If you can post a testing procedure, those of us that have a 4090 etc. can help test for you. Maybe also consider setting up a GitHub Sponsors page so people that are able to contribute to your work can.
EvilEnginer@reddit (OP)
Thank you! I really appreciate the offer. A simple test: take a long task (50k-100k tokens) - coding with tool calls, a complex conversation, or a large document analysis on this model that I just made:
https://huggingface.co/LuffyTheFox/Qwen3.5-27B-Claude-4.6-Opus-FernflowerAI-GGUF
Run it on the original model and my fixed version. The original will likely become indecisive, repeat itself, or break. The fixed version should stay stable. As for GitHub Sponsors - that's a great idea. I'll look into it.
But honestly, the best support right now is just spreading the word and testing the model on real tasks. Every report helps the community. Thanks again for being willing to help. This is what open source is about.
nosrslygtfo@reddit
what happened to this version: https://huggingface.co/LuffyTheFox/Qwen3.5-27B-Claude-4.6-Opus-FernflowerAI-GGUF
this was the v2 and had a Q8 quant that worked really well... :/
EvilEnginer@reddit (OP)
Qwopus v3 is better than this one. Jackrong updated it.
CATLLM@reddit
Ok, I just tested your 27B and the original. I had Claude create a deep reasoning task, but both were pretty much the same. Both sometimes stop around 8k-13k tokens without finishing the task. The original model was able to finish the task on 1/5 tries, and yours finished it 1/5 times only after I asked it "why did you stop?"
EvilEnginer@reddit (OP)
So, it looks like my fix doesn't work with an already-trained model. The gradients already drifted too much and broke the ssm_conv1d.weight tensor.
ID_crypto@reddit
Great stuff! Would you take a look at Jackrong/Qwopus3.5-27B-v3?
Better than v2 from my testing.
EvilEnginer@reddit (OP)
Yep I will take a look tomorrow.
Direct_Technician812@reddit
Oh my god, I'm looking forward to it too. Thank you so much!
cverity@reddit
Interesting. I had just tried yesterday to use the amazing 35B HauhauCS model for agentic coding instead of the standard 122B model to save some vram, and gave up after about an hour. Conversations would end prematurely, tool calls randomly failed, I couldn't leave it alone for a minute and quickly gave up. I was bummed because in every other respect, the 35B HauhauCS is a kick ass model.
So I will definitely be giving yours a try. Thanks for the contribution!
EvilEnginer@reddit (OP)
Nice. Thank you very much. Also, I will upload a new V2 update for this one today, with fixed "dead" neurons in tensors and a Q4_K_L quant with Unsloth tensor profiles: https://huggingface.co/LuffyTheFox/Qwen3.5-35B-A3B-Uncensored-FernflowerAI-GGUF
JayPSec@reddit
How do you determine which and how a tensor is broken?
EvilEnginer@reddit (OP)
I compare each tensor's scale (standard deviation) to the median of its peer group (tensors with identical shape). If the deviation exceeds a threshold and the tensor shows signs of saturation, it gets repaired. The exact criteria are proprietary.
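As a rough outline of that comparison (the exact criteria are proprietary, so the grouping key, the 1.5x threshold, and the tensor names below are all illustrative guesses, and the saturation check is omitted):

```python
import statistics

def find_outlier_tensors(tensors, threshold=1.5):
    """Flag tensors whose std is far above their peer-group median.

    `tensors` maps name -> (shape, flat list of weights); peers share
    a shape. The 1.5x threshold is a guess, and the real script also
    checks for saturation - this is only the outline.
    """
    groups = {}
    for name, (shape, weights) in tensors.items():
        groups.setdefault(shape, []).append((name, statistics.pstdev(weights)))

    outliers = []
    for members in groups.values():
        median_std = statistics.median(std for _, std in members)
        for name, std in members:
            if median_std > 0 and std / median_std > threshold:
                outliers.append(name)
    return outliers

# Toy model: 37 healthy conv tensors at sigma-scale 0.063, one drifted at 0.102.
base = [-1.0, 1.0, -1.0, 1.0, -2.0, 2.0, -2.0, 2.0]
tensors = {f"blk.{i}.ssm_conv1d.weight": ((4, 2), [0.063 * x for x in base])
           for i in range(37)}
tensors["blk.37.ssm_conv1d.weight"] = ((4, 2), [0.102 * x for x in base])
```

Grouping by identical shape keeps, for example, attention and conv tensors from being compared against each other, since their healthy scales differ architecturally.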
First_Neck_3384@reddit
when you say tensor scale, is it quantization scale (used to scale input to a specific range) or the magnitude of tensor itself?
EvilEnginer@reddit (OP)
Tensor scale here means the standard deviation of the weight values themselves - the magnitude of the tensor. Not quantization scale. I work directly with dequantized BF16 weights, so no quantization scale is involved. The comparison is between raw weight distributions, not compressed representations.
Embarrassed_Soup_279@reddit
does this mean the 27B dense model have similar training bug or is it only MOE?
EvilEnginer@reddit (OP)
27B is broken. I checked this one: https://huggingface.co/unsloth/Qwen3.5-27B-GGUF/blob/main/Qwen3.5-27B-Q8_0.gguf . It contains 8 broken ssm_conv1d.weight tensors.
oxygen_addiction@reddit
Can you also check Stepfun 3.5? It has a similar problem with overthinking.
EvilEnginer@reddit (OP)
Can't check this one. Don't have system resources and disk space to load it. It's too big for Google Colab Free Tier that I am using.
oxygen_addiction@reddit
I can look at it when I come back from vacation. Can you share any more resources? A python script, a colab, anything so I can understand the process?
Decivox@reddit
If you have the know-how/hardware, you can check out the work I've done so far on this issue. I believe I have the detection logic down. Unfortunately, I don't have the hardware to actually run a repair function on a GGUF and then perform the tests.
https://github.com/decibuild/qwen-ssm-repair/tree/main
EvilEnginer@reddit (OP)
Thanks for the interest. The method is proprietary and I'm not sharing the script at this time. It took months of work to develop the method and make it architecture independent with .GGUF and .safetensors compatibility. If you need a large model checked, here's how it works: you provide access to a machine (Google Colab Pro) and pay for my time. I run the diagnostic remotely, my toolset never leaves my control. You get a report - fixed model and a list of issues in it. Contact me if this works for you, and I will do the job.
Secure_Archer_1529@reddit
I don't get the downvoting. You're sharing fixes you did on a model and even offered to help someone with the model of his choice. Everybody here has a job, so kudos to you for having specialized skills people want to pay for.
EvilEnginer@reddit (OP)
Thank you very much :). I'm very glad to hear that.
Equal_Grape2337@reddit
So that means the 4B and 9B should have the same issues? I'm actively using them at https://github.com/spokvulcan/tesseract
EvilEnginer@reddit (OP)
Yes, small ones are broken.
Prudent-Ad4509@reddit
Now I'm starting to wonder about 397b
EvilEnginer@reddit (OP)
Can't even load it for analysis. Too big for Google Colab Free :D
Prudent-Ad4509@reddit
Can it be checked using lower quants ?
EvilEnginer@reddit (OP)
Lower quants are still too big for Google Colab Free Tier disk space.
Embarrassed_Soup_279@reddit
Thank you for confirming. This is really interesting.
EvilEnginer@reddit (OP)
I think yes. Because HauhauCS BF16 GGUF fits nicely to Google Colab Free Tier.
Embarrassed_Soup_279@reddit
tysm
EvilEnginer@reddit (OP)
I checked only MoE. I will take a look in 27B tomorrow and will let you know.
TheLastSpark@reddit
please reply if you see something like this in 27B as well!
EvilEnginer@reddit (OP)
27B is broken - confirmed.
IrisColt@reddit
The bug is again in the original Qwen 3.5 weights released by Alibaba, right?
EvilEnginer@reddit (OP)
Yes. Correct.
IrisColt@reddit
Mother of God... part 2.
TheLastSpark@reddit
Well I am eagerly awaiting a follow up post for 27B if you do fix it (and fixing improves it)
EvilEnginer@reddit (OP)
I think I can try to fix legendary model by https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled
But I don't know if the fix would help here and can't test it, since it's painfully slow on my RTX 3060 12GB.
Your thoughts?
TheLastSpark@reddit
If you have a benchmark or some kind of code I can run, I can maybe do it? I've got a 4090 and don't mind running stuff on it to test.
EvilEnginer@reddit (OP)
I don't have a formal benchmark. I focus on finding and fixing broken tensors - that's the hard part. Running tests is just validation.
If you want to help, here's a simple way: take the original 27B and my fixed 27B. Give both the same long, complex task (e.g., 50k-70k tokens of code or conversation). The original will likely become indecisive, repeat itself, or break. The fixed one should stay stable.
I don't have a 4090, so I can't run these tests myself. If you do and share the results, that would help the community. Either way, I'll release the fixed model - since people are asking for it.
Embarrassed_Soup_279@reddit
Wow, thank you!
EvilEnginer@reddit (OP)
Currently cooking Q4_K_XL quant of this model: https://huggingface.co/LuffyTheFox/Qwopus3.5-27B-v3-Uncensored-FernflowerAI-GGUF for powerful GPUs.
This would be the last test for Qwen3.5 27B model series.
Embarrassed_Soup_279@reddit
sorry if it seems like am asking too much, but would you please check this variant of RYS qwen model as well? https://huggingface.co/jackasda211233/Qwen3.5-27B-Uncensored-RYS-Reasoner-GGUF
It uses a splice method to combine HauhauCS layers with RYS duplicated layers, and it has a custom imatrix for reasoning/agentic tasks. iq4_nl seems to be the best quant to use with that, even over bf16.
EvilEnginer@reddit (OP)
Done. BF16 quant uploaded: https://huggingface.co/LuffyTheFox/Qwen3.5-27B-Uncensored-RYS-Reasoner-FernflowerAI-GGUF
Now I will try to process iq4_nl as it is.
Embarrassed_Soup_279@reddit
thank you!
EvilEnginer@reddit (OP)
Okay, I will check the BF16 quant. Since it's uncompressed, I can check the tensors precisely and fix them. Now I can fix even broken neurons in the neural network. I think it's time to test my new approach on this model.
nemomode7@reddit
Good job! Best parameters for inference? temp, topk, etc?
EvilEnginer@reddit (OP)
Simply use parameters from my post on top of this page. With default system prompt: "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."
nemomode7@reddit
So, the same. Thank you!
jwpbe@reddit
I've been using this model extensively over the last few weeks, it duplicates model layers that are activated strongly on math and EQ Bench. Would you mind checking it for broken tensors? It's already very strong as it is.
https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-XL
The explanation of the model: https://dnhkng.github.io/posts/rys-ii/
EvilEnginer@reddit (OP)
I can check this model if it's in GGUF Q8_0 format. I can't load the 27B FP8 model in safetensors, because checking it that way is too resource-hungry.
jwpbe@reddit
there is a gguf set here: https://huggingface.co/Biomanticus/RYS-Qwen3.5-27B-gguf
EvilEnginer@reddit (OP)
Checked. It was broken too - the same scaling problem, with 8 ssm_conv1d.weight tensors affected. I fixed them in BF16 GGUF format. Here is the fixed version: https://huggingface.co/LuffyTheFox/RYS-Qwen3.5-27B-FernflowerAI-GGUF
EvilEnginer@reddit (OP)
Okay I will check F16 quant later.
BedderChavez@reddit
Can you check the Gemma 4 E4B model to see if it has any bugs? I had downloaded a couple of models to run modestly on an RTX 4060.
EvilEnginer@reddit (OP)
Checking finished. Here results for original Qwen3.5 9B model: https://pastebin.com/XD2VuwZp
EvilEnginer@reddit (OP)
Yep I will take a look later.
StardockEngineer@reddit
How was the original model uncensored? I don't want to download a model damaged in some other way.
EvilEnginer@reddit (OP)
Nobody knows. HauhauCS did it.
StardockEngineer@reddit
How can we uncensor the original model? I'd be happy to do it if you have a script.
EvilEnginer@reddit (OP)
The uncensoring method was done by HauhauCS, not me. I don't have that script. My work is fixing broken tensors in the already uncensored model, not removing refusals. If you want the original model uncensored, you'd need to ask HauhauCS or figure it out yourself. My script is proprietary and not shared.
IrisColt@reddit
Just curious... who's actually responsible for the bug in this model? The GGUF creator? HauhauCS? The Qwen team? Seems like an important distinction. Asking in good faith.
EvilEnginer@reddit (OP)
The bug is in the original Qwen 3.5 weights released by Alibaba. Not GGUF. Not HauhauCS. Alibaba shipped it broken. I just fixed it. The cause is training-related - AdamW + MoE + DeltaNet causes rare experts in the last layers to drift. This is a known challenge with recurrent MoE architectures, but Alibaba didn't calibrate it before release.
ComplexType568@reddit
Oh wow, does this mean that the Unsloth models are also broken among the models hosted on the Alibaba API?
EvilEnginer@reddit (OP)
Yes. All of them are broken. I checked this 27B one from Unsloth: https://huggingface.co/unsloth/Qwen3.5-27B-GGUF/blob/main/Qwen3.5-27B-Q8_0.gguf
It's broken too. It contains 8 broken ssm_conv1d.weight tensors.
FeiX7@reddit
So how does it affect the model?
EvilEnginer@reddit (OP)
It loses context during conversations on agentic tasks.
FeiX7@reddit
That's really bad. Did you contact the Qwen team on X?
EvilEnginer@reddit (OP)
No, I haven't written to them yet.
Koalateka@reddit
Just to be sure I understood this correctly: the error was in the full precision weights originally released by Alibaba. Is that correct?
EvilEnginer@reddit (OP)
Yes. Correct.
Koalateka@reddit
Your findings are very interesting, thanks for sharing.
IrisColt@reddit
Mother of God... Thanks for the info!!!
Major-System6752@reddit
Bartowski and unsloth quants affected?
EvilEnginer@reddit (OP)
Yes, affected.
nosrslygtfo@reddit
Awesome work, and what a writeup! Super helpful, thank you!!! Are you planning to upload a Q8 version of your fixed 35B model? https://huggingface.co/LuffyTheFox/Qwen3.5-35B-A3B-Uncensored-FernflowerAI-GGUF
Also any plans of releasing MLX versions?
leonbollerup@reddit
I tried it, but it seems unstable, at least in LM Studio, and I have to use a much lower context window. With the original I can run stably at around 190k; with this I can't start it above 90k, and it crashes more easily. I need to fiddle around with it.
Equivalent-Dream9615@reddit
why not use q4_K_S? it would fit nicely in 24gb.
EvilEnginer@reddit (OP)
I prefer Q4_K_XL with UD tensor profiles from Unsloth, because it has better quality than Q4_K_M.
Warm-Put3482@reddit
I downloaded it before your post in LM, hehe.
EvilEnginer@reddit (OP)
Ahah. Looks like hype is real :P
SeriousTeacher8058@reddit
Why isn't there a standard tool for comparing different versions of an LLM? If I had two versions of the same LLM, and I liked a specific feature from one version that another lacks, why can't I look at the layers and scale them or swap them with the same layers from another version?
EbbNorth7735@reddit
I'm sure you can. The file has a structure and it just requires splitting it up and parsing the correct parts to look at the numbers. It's what OP did.
Imaginary-Unit-3267@reddit
Because the mapping from parts of the neural network to traits is extremely ill-understood even by the people doing the training. To a great extent neural networks are black boxes. Figuring out how they operate takes work. The large scale architecture is well understood, but the details of how they encode the information they learn take a lot of digging, and there just isn't a standardized way to do that yet and likely won't be for a few more years yet.
EbbNorth7735@reddit
Did you check the 122B model? If not can you describe the process on how you checked them? I wouldn't mind checking myself just for my own knowledge.
EvilEnginer@reddit (OP)
I checked 122B. It's healthy. Only the small ones are broken.
Lucis_unbra@reddit
Would you mind releasing fixed versions of the otherwise untouched source model? If this isn't exclusive to fine-tuned or ablated/uncensored variants, I'd certainly like options that are "pure".
EvilEnginer@reddit (OP)
This is exclusive to finetuned/uncensored variants of the model. I am 100% sure people want both uncensored roleplay and coding capabilities in one model when they talk with a local AI.
NewUser10101@reddit
Would love a review of the rest of the family, including 122B and the full 27B with safetensors (looks like you checked one of the fine-tuned variants).
EvilEnginer@reddit (OP)
I don't have enough system resources to process the tensors of the 122B and 27B models in safetensors format - the RAM spikes are insane. So only 35B A3B and 9B can currently be processed in safetensors format on Google Colab Free Tier.
Subject_Secretary245@reddit
Does this also apply to smaller models like the 4b, or just the larger ones?
EvilEnginer@reddit (OP)
Yes, this can be applied to smaller models too.
raharjoharis@reddit
Is the 27b still experimental? And also please do one for the 9b as well sir
EvilEnginer@reddit (OP)
27B will always be experimental; the most correct approach is finetuning only fixed models with correct conv1d weights. I will pick up the 9B one from Jackrong later.
unjustifiably_angry@reddit
So Qwen3.5 is as good as it is.... broken?
Prudent-Ad4509@reddit
This might partially explain why 122b is so much better than 35B even at IQ3 quant.
raysar@reddit
Is there a boost in benchmarks?
EmperorOfNe@reddit
That is serious debugging work, very well done. Alibaba should reach out to you if you achieved an 88.6% reduction in errors - that is amazing.
EvilEnginer@reddit (OP)
Thank you very much :). Yep, I also hope for this, and would be happy to collaborate with them.
Altruistic-Site-9000@reddit
Hey, if someone wanted to start with fine-tuning and all the basics, how should they start?
EvilEnginer@reddit (OP)
Unsloth Studio is really nice, I think.
FantasticBottle2463@reddit
Qwen_Qwen3.5-35B-A3B-Q8_0.gguf + tools seems smarter to me, compared to your BF16 version.
gpalmorejr@reddit
Interesting. I have never had this happen. Maybe I'm not using it long enough? How many tokens were the contexts when this error showed itself?
EvilEnginer@reddit (OP)
Usually starts showing after 50k-70k tokens. The model becomes indecisive, repeats itself, breaks tool calling, loses context. By 100k it often breaks completely. My fix handles 100k+ without issues.
gpalmorejr@reddit
Maybe that is why my coding agent got weird the other day.
EvilEnginer@reddit (OP)
Sounds like it. Try my fixed model - should stay more stable, at least for basic stuff. For complex coding this model requires training via Unsloth Studio on Claude Opus 4.6 datasets from Hugging Face.
gpalmorejr@reddit
I'll look into it, but my hardware is limited and I have to use their Q4_K_M. So, we'll see.
jerryohjerry@reddit
Damn, that's some serious detective work. Two tensors causing 88.6% error reduction is wild - the fact that it was hiding in plain sight in the weight scales is exactly the kind of thing that makes you question how many other models have similar silent failures nobody's caught yet. The AdamW + rare experts angle makes sense too. Those last layers don't get updated often so when they do, the optimizer overshoots hard. Curious if this explains some of the weird behavior people report with other MoE models that just gets blamed on "model quality" when it's actually a training artifact.
EvilEnginer@reddit (OP)
You nailed it. This isn't Qwen-specific - any MoE + recurrent architecture trained with AdamW can suffer from the same issue. Rare experts in last layers drift hard. Most teams blame 'model quality' and move on. Nobody bothered to look inside. Until now.
Kahvana@reddit
Thank you! Can you upload the safetensor version?
EvilEnginer@reddit (OP)
Of course. Already done: https://huggingface.co/LuffyTheFox/Qwen3.5-35B-A3B-Uncensored-FernflowerAI-safetensors
Kahvana@reddit
Awesome! Thank you so much!
EvilEnginer@reddit (OP)
You're welcome! Glad to contribute :)
hockey-throwawayy@reddit
Thanks for sharing this!
Would you be willing to do some major hand-holding and explain how to quantize this model into something that will fit 12 GB VRAM? I see the script on the HF page, but I am just totally unfamiliar with the nuts and bolts of the process.
My local LLM setup understanding begins and ends with "if HF shows my GPU with a green icon, I can try that model."
There are so many details to get these models running locally properly and I have yet to figure it all out. I'm looking for a good "daily driver".
EvilEnginer@reddit (OP)
Just use Q4_K_L quant. It's best in terms of size and quality. I am using it on my RTX 3060 12 GB. I have 10 tokens per second in LM Studio.
Vast_Strawberry3093@reddit
Where can i find this Q4_K_L quant?
tiffanytrashcan@reddit
There's a huggingface link in the OP...
hockey-throwawayy@reddit
Ah I see it now, thank you!
United_Razzmatazz769@reddit
Thanks for the model. Some Qwen3.5 35B A3B models I have tried always melt down past 50k tokens. Your model definitely feels better. I successfully got past some 100k of API-endpoint learning/planning with it.
EvilEnginer@reddit (OP)
Great to hear that! 100k tokens is exactly where the original model breaks. Your test confirms the fix works. Thanks for reporting back.
Several_Newspaper808@reddit
Hey, so you offload to RAM? The small gguf on hf is 24gb. Otherwise how would it fit in a 12gb card?
EvilEnginer@reddit (OP)
Yes I offload to RAM. It works on 12 GB card but with slow speed.
WhoRoger@reddit
Lol nice. Any interest in checking the small versions too? 4B, 2B, 0.8B are notoriously prone to getting stuck.
EvilEnginer@reddit (OP)
Thanks :). Maybe in the future. Currently 35B and 27B are the priority.
Quiet-Owl9220@reddit
Hey nice job. It doesn't give up mid-sentence after extended reasoning and tool calls any more.
EvilEnginer@reddit (OP)
Nice :)
wh33t@reddit
Interesting.
Maybe this explains why I have such poor experiences with Qwen3.5. It just becomes so fucking indecisive all of a sudden, looping itself, and no amount of parameter tuning seems to fix it. This must be the issue.
EvilEnginer@reddit (OP)
That's exactly it. What you're describing - indecisive, looping, no amount of parameters fixes it - is the signature of this bug. The recurrent state in blocks 36-37 is corrupted. Model loses context, starts repeating, can't decide. Parameter tuning can't fix it because the problem is in the weights themselves. Only scaling those two tensors back to normal works. Try my fixed model with my settings and System Prompt. You'll see the difference immediately. No more loops. No more indecision. Just clean crystal clear human readable responses.
jikilan_@reddit
Any way to notify qwen team about this?
EvilEnginer@reddit (OP)
I think they are monitoring Twitter.
RemarkableAntelope80@reddit
So, to clarify. This affects training / that finetune? Or it actually affects inference on GGUFs of the original Qwen3.5 model? Either way, congrats figuring it out
EvilEnginer@reddit (OP)
It affects inference on any GGUF of original Qwen 3.5 35B A3B. Fine-tuning doesn't fix it. It masks it at best. So if someone fine-tunes a broken Qwen, they're building on unstable ground. Better to fix first, then fine-tune.
hesperaux@reddit
I want to understand stuff as much as you some day
Super interesting post. Thanks. I am slightly skeptical of it because of who I am as a person but... You sound like you know what you're talking about.
I am definitely gonna try this. I switched to 122B A10B because 35B A3B was.. Strange. Like you said, it got weird after 70k tokens. And it was not good at maintaining a direction. I wonder if it's related.
Another person asked if this is only that version (abliterated) or if it's this way on the official model. Can you answer that?
Thanks again. Cool stuff.
EvilEnginer@reddit (OP)
Official model contains this bug too. I checked Unsloth BF16 quant.
Dazzling_Equipment_9@reddit
I noticed you're using the `--seed 3407` parameter. Out of curiosity, what's the magic behind it? I've seen it mentioned in some parameter recommendation articles; it seems like a magic number. Given your professional background, I'd like to know what effect it has in actual use? Is it limited to LM Studio or does it apply to other clients as well?
kellyjames436@reddit
Can this model run on a 4060 with 8 GB VRAM?
EvilEnginer@reddit (OP)
Yes, it can run, since it's MoE and has only 3B active parameters.
Vast_Strawberry3093@reddit
But MoE rechooses the expert for each token, right?
MmmmMorphine@reddit
Yeah, you still have to hold the entire model in [V]RAM - MoE sort of trades computation for size, in a certain sense.
There are ways of trying to predict which experts will be used next or most often (among related techniques), though I'm not too clear on whether they're of all that much use.
Anyway, yeah, you'll need to offload a good chunk of the model if you have less than 24 GB of VRAM, and that's with significant quantization and KV-cache management/quantization.
kellyjames436@reddit
Thank you.
thingswhatnot@reddit
You should have added all this useful information to the model card - it reads like a black box and won't be discoverable.
Responsible-Ship1140@reddit
Is this a bug that could appear in all Qwen3.5 models? The description definitely matches things I've observed with qwen3.5:9b (Q4).
k_am-1@reddit
!remindme 3 days
LegacyRemaster@reddit
the name is too short! Please add something epic!
EvilEnginer@reddit (OP)
Ahahah, very true. But I like it. It's unique :D
apollo_mg@reddit
Bravo good sir. Excellent digging, and thanks!
EvilEnginer@reddit (OP)
Thank you very much ^_^
Fun_Smoke4792@reddit
Remindme! In 14 hours
RemindMeBot@reddit
I will be messaging you in 14 hours on 2026-04-09 05:46:41 UTC to remind you of this link