Qwen3.5-35B-A3B-Uncensored-FernflowerAI-GGUF

Posted by EvilEnginer@reddit | LocalLLaMA | View on Reddit | 180 comments

Hello everyone. I found and fixed training bug in Qwen3.5 35B A3B model.

Here my fixed version: https://huggingface.co/LuffyTheFox/Qwen3.5-35B-A3B-Uncensored-FernflowerAI-GGUF

Upgraded system prompt that unlocks deep thinking (works great with this model):
https://pastebin.com/pU25DVnB

Chat template: https://pastebin.com/uk9ZkxCR (supports tool calling)

Recommended Settings (LM Studio):

Temperature 0.7
Top K Sampling 20
Presence Penalty 1.5
Top P Sampling 0.8
Min P Sampling 0
Seed 42

History:

I've been using Qwen 3.5 35B A3B (the uncensored version by HauhauCS) for a while. It's an incredible model - uncensored, MoE with 256 experts, hybrid DeltaNet + Attention, 40 layers. But something was off. On short prompts it works fine. On long conversations it started "philosophizing" - losing context, repeating itself, writing broken code with strange comments.

I spent two weeks digging through the weights.

What I found:

Two tensors. In blocks 36 and 37. ssm_conv1d.weight.

Their scale was \~60% higher than normal (σ=0.102 vs median 0.063). Because of how AdamW works, rare experts in the last layers get a huge effective learning rate - their weights drift.

In a recurrent architecture like DeltaNet, this kills the hidden state. The model forgets context after a few tokens.

Surprisingly I didn't found any issues in Gemma 4 26B A4B - all scales were correct in model.

What I did:

I scaled broken tensors back to normal. Nothing else. 489 other tensors were left untouched - their scale is architectural (gate_inp, etc.).

Results:

What I learned:

One bug. Two tensors. 64GB of model. And the entire potential of the most complex open-weight architecture was locked behind it.

If you're using MoE + recurrent hybrids (DeltaNet, Mamba, etc.), check your last blocks. AdamW might have silently broken them.

Enjoy \^_\^