Qwen3.6 uncensored AWQ
Posted by chikengunya@reddit | LocalLLaMA | View on Reddit | 7 comments
I have tested Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-Q5_K_P.gguf on my 4x3090 system (opencode) and find it really good and fast. However, I can't find any uncensored models for vllm (preferably as AWQ). Is there no demand for that here, or is uncensoring limited to GGUF only? Sorry for the noob question.
jennops@reddit
There's a few over at https://huggingface.co/collections/genevera/heretics
Uninterested_Viewer@reddit
A few 35B ones exist, e.g. https://huggingface.co/genevera/Qwen3.6-35B-A3B-Abliterated-Heretic-AWQ-4bit
27B is still very new, so either give it time or quant it yourself.
jennops@reddit
Oh hey that's my quant!
Lissanro@reddit
Any model can be uncensored. AWQ for HauhauCS-Aggressive has already been requested quite a few times, and here https://huggingface.co/HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive/discussions/1 the quant author replied that he plans to upload BF16 if HuggingFace approves the request for a grant-based space. At the moment there are still no BF16 or 16-bit safetensors, so it is not possible to make a proper AWQ quant yet (well, technically it is possible to dequant the Q8_K_P GGUF and then make AWQ out of that, but it is extra work, which is probably why no one has done it yet).
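If anyone wants to try that dequant route, here is a rough sketch. It assumes transformers can actually load this architecture/quant type from the GGUF and that AutoAWQ supports the model; the file and directory names are illustrative, not verified against the actual repo:

```python
# Sketch of the "dequant the GGUF, then make AWQ out of it" route described above.
# ASSUMPTIONS: transformers can dequantize this GGUF, AutoAWQ supports the
# architecture, and the repo/file names below are only placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from awq import AutoAWQForCausalLM

repo = "HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive"        # hypothetical
gguf = "Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Q8_K_P.gguf"     # hypothetical
fp16_dir = "qwen3.6-uncensored-fp16"
awq_dir = "qwen3.6-uncensored-awq"

# Step 1: dequantize the Q8 GGUF back to fp16 safetensors.
tokenizer = AutoTokenizer.from_pretrained(repo, gguf_file=gguf)
model = AutoModelForCausalLM.from_pretrained(repo, gguf_file=gguf, torch_dtype="float16")
model.save_pretrained(fp16_dir)
tokenizer.save_pretrained(fp16_dir)

# Step 2: run AWQ calibration/quantization on the dequantized weights.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}
awq_model = AutoAWQForCausalLM.from_pretrained(fp16_dir)
awq_model.quantize(tokenizer, quant_config=quant_config)
awq_model.save_quantized(awq_dir)
tokenizer.save_pretrained(awq_dir)
```

Quality will only be as good as what survives the Q8 round trip, which is part of why people prefer to wait for real BF16 weights.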
Prize_Negotiation66@reddit
Fuck vllm, all my homies are using llamacpp
Lissanro@reddit
There are valid reasons to use vllm, in particular support for video input and higher throughput. llama.cpp is the best option if you need to offload to RAM, or for edge cases where vLLM cannot fit the model into VRAM but llama.cpp can (it has more quant options and is also more efficient at utilizing available VRAM).
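For reference, a minimal sketch of what serving an AWQ quant with vLLM on OP's 4x3090 box could look like, assuming the AWQ repo linked above loads cleanly in vLLM (untested for this particular model):

```python
# Minimal vLLM sketch: load a 4-bit AWQ quant split across four 3090s.
# The model name is the one linked earlier in the thread; whether it works
# in vLLM as-is is an assumption.
from vllm import LLM, SamplingParams

llm = LLM(
    model="genevera/Qwen3.6-35B-A3B-Abliterated-Heretic-AWQ-4bit",
    quantization="awq",
    tensor_parallel_size=4,        # one shard per 3090
    gpu_memory_utilization=0.90,
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Hello, can you introduce yourself?"], params)
print(outputs[0].outputs[0].text)
```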
Technical-Earth-3254@reddit
Idk, I'm mostly using llama.cpp bc I'm gpu poor with my single 3090. If I wasn't and had a boatload of vram, I would use vllm.