These "Claude-4.6-Opus" Fine Tunes of Local Models Are Usually A Downgrade

Posted by BuffMcBigHuge@reddit | LocalLLaMA | 122 comments

Time and time again I find posts about these fine-tunes that promise increased intelligence and reasoning over the base models. I keep trying them, realize they're botched, and delete them shortly after. Since they're usually bigger (in this case, a 40B variant of Qwen 3.5 27B), I sometimes have to drop to a lower quant, but they always seem to let me down. I've resorted to not downloading any model with "Claude Opus 4.6" in the name.

Kudos to everyone trying to make the foundation models more intelligent, but imo, it never works.

Note that this example is anecdotal evidence from a single prompt, but in my experience it's always a case of decreased intelligence when used with a local agent setup + llama.cpp in WSL2. This is irrespective of the quant as well - I've tried many.

One thing to note, however: the reasoning/thinking is significantly shorter, which may be part of the problem.
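Since Qwen-style thinking models wrap their reasoning in `<think>…</think>` tags, one rough way to quantify "significantly less reasoning" is to count the words inside those tags for each response. A minimal sketch (the tag format is the standard Qwen thinking convention; the word count is my own crude proxy, and the sample strings are made up):

```python
import re

def reasoning_length(response: str) -> int:
    """Count whitespace-separated words inside <think>...</think> blocks.

    A crude proxy for how much reasoning the model produced before answering.
    """
    blocks = re.findall(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    return sum(len(b.split()) for b in blocks)

base = "<think>Let me break the task into steps. First...</think>Answer: 42"
tune = "<think>Easy.</think>Answer: 41"
print(reasoning_length(base), reasoning_length(tune))  # → 8 1
```

Running it over a batch of identical prompts to both models would make the "less thinking" claim measurable rather than eyeballed.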

Have any of you ever found these better than the base model?

The attached screenshots are from these two runs:

./llama-server -hf mradermacher/Qwen3.5-27B-heretic-GGUF:Q4_K_S --temp 1.0 --top-p 0.8 --top-k 20 --min-p 0.00 --fit on --alias default --jinja --flash-attn on --ctx-size 262144 --ctx-checkpoints 256 --cache-ram -1 --cache-type-k q4_0 --cache-type-v q4_0 --threads 8 --threads-batch 16 --no-mmap --sleep-idle-seconds 600

cd ~/llama.cpp/build/bin && ./llama-server -hf mradermacher/Qwen3.5-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking-i1-GGUF:i1-Q3_K_S --temp 1.0 --top-p 0.8 --top-k 20 --min-p 0.00 --fit on --alias default --jinja --flash-attn on --ctx-size 131072 --ctx-checkpoints 256 --cache-ram -1 --cache-type-k q4_0 --cache-type-v q4_0 --threads 8 --threads-batch 16 --no-mmap --sleep-idle-seconds 600
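For what it's worth, both servers expose llama.cpp's OpenAI-compatible `/v1/chat/completions` endpoint, so the same prompt can be fired at each run with identical sampling settings. A minimal sketch (the ports and prompt are placeholders I made up; the sampling values mirror the flags above, and llama-server accepts the extra `top_k`/`min_p` fields in the request body):

```python
import json
import urllib.request

def build_payload(prompt: str) -> dict:
    """Chat-completions request mirroring the sampling flags used above."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,   # --temp 1.0
        "top_p": 0.8,         # --top-p 0.8
        "top_k": 20,          # --top-k 20
        "min_p": 0.0,         # --min-p 0.00
    }

def ask(port: int, prompt: str) -> str:
    """Send the prompt to a llama-server instance on the given local port."""
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Example (assumes the base model and the fine-tune are serving on 8080/8081):
# print(ask(8080, "Write a bash one-liner to dedupe a file."))
# print(ask(8081, "Write a bash one-liner to dedupe a file."))
```

Using the same payload for both runs keeps the comparison down to the weights rather than the sampling settings.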