Qwen3.6 uncensored AWQ
Posted by chikengunya@reddit | LocalLLaMA | View on Reddit | 7 comments
I have tested Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-Q5_K_P.gguf on my 4x3090 system (opencode) and find it really good and fast. However, I can't find any uncensored models for vllm (preferably as AWQ). Is there no demand for that here, or is uncensoring limited to GGUF only? Sorry for the noob question.
jennops@reddit
There's a few over at https://huggingface.co/collections/genevera/heretics
Uninterested_Viewer@reddit
A few 35B ones exist, e.g. https://huggingface.co/genevera/Qwen3.6-35B-A3B-Abliterated-Heretic-AWQ-4bit
27B is still very new, so either give it time or quant it yourself.
jennops@reddit
Oh hey that's my quant!
Lissanro@reddit
Any model can be uncensored. AWQ for HauhauCS-Aggressive has already been requested quite a few times, and here https://huggingface.co/HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive/discussions/1 the quant author replied that he plans to upload BF16 if HuggingFace approves the request for a grant-based space. At the moment there are still no BF16 or 16-bit safetensors, so it is not possible to make a proper AWQ quant yet (well, technically it is possible to dequant the Q8_K_P GGUF and then make AWQ out of that, but it is extra work, which is probably why no one has done it yet).
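If anyone wants to try that dequant route, here is a rough sketch. It assumes transformers can actually load this architecture/quant type from the GGUF and that AutoAWQ supports the model; the file and directory names are illustrative, not verified against the actual repo:

```python
# Sketch of the "dequant the GGUF, then make AWQ out of it" route described above.
# ASSUMPTIONS: transformers can dequantize this GGUF, AutoAWQ supports the
# architecture, and the repo/file names below are only placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from awq import AutoAWQForCausalLM

repo = "HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive"        # hypothetical
gguf = "Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Q8_K_P.gguf"     # hypothetical
fp16_dir = "qwen3.6-uncensored-fp16"
awq_dir = "qwen3.6-uncensored-awq"

# Step 1: dequantize the Q8 GGUF back to fp16 safetensors.
tokenizer = AutoTokenizer.from_pretrained(repo, gguf_file=gguf)
model = AutoModelForCausalLM.from_pretrained(repo, gguf_file=gguf, torch_dtype="float16")
model.save_pretrained(fp16_dir)
tokenizer.save_pretrained(fp16_dir)

# Step 2: run AWQ calibration/quantization on the dequantized weights.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}
awq_model = AutoAWQForCausalLM.from_pretrained(fp16_dir)
awq_model.quantize(tokenizer, quant_config=quant_config)
awq_model.save_quantized(awq_dir)
tokenizer.save_pretrained(awq_dir)
```

Quality will only be as good as what survives the Q8 round trip, which is part of why people prefer to wait for real BF16 weights.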
Prize_Negotiation66@reddit
Fuck vllm, all my homies are using llamacpp
Lissanro@reddit
There are valid reasons to use vllm, in particular support for video input and higher throughput. llama.cpp is the best option if you need to offload to RAM, or for edge cases where vLLM cannot fit the model into VRAM but llama.cpp can (it has more quant options and is also more efficient at utilizing available VRAM).
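For reference, a minimal sketch of what serving an AWQ quant with vLLM on OP's 4x3090 box could look like, assuming the AWQ repo linked above loads cleanly in vLLM (untested for this particular model):

```python
# Minimal vLLM sketch: load a 4-bit AWQ quant split across four 3090s.
# The model name is the one linked earlier in the thread; whether it works
# in vLLM as-is is an assumption.
from vllm import LLM, SamplingParams

llm = LLM(
    model="genevera/Qwen3.6-35B-A3B-Abliterated-Heretic-AWQ-4bit",
    quantization="awq",
    tensor_parallel_size=4,        # one shard per 3090
    gpu_memory_utilization=0.90,
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Hello, can you introduce yourself?"], params)
print(outputs[0].outputs[0].text)
```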
Technical-Earth-3254@reddit
Idk, I'm mostly using llama.cpp bc I'm gpu poor with my single 3090. If I wasn't and had a boatload of vram, I would use vllm.