Found an uncensored Qwen2.5 32B!

[-]

Mephidia@reddit

Weird I was messing around with Qwen 72 and I had no issues of censorship. What sorts of things are you guys getting censored? Although I will admit I didn’t ask it anything incriminating abojt China lol

Reply

[-]

SuperFail5187@reddit

That's not necessary, just ask it to make an anti-semitic joke. That's the quickest way to see if it's censored/biased.

Reply

[-]

Lower_Significance_8@reddit

\^Literally this. I've had uncensored models describe to me the most disgusting incest, bestiality related situations but then you make a tiny hat people joke it reads off its disclaimers. It is incredibly frustrating. Especially since you are running them offline and some are listed as ablated and uncensored. I only ask it to make jokes about certain classes of people as a real test of censorship. Most fail.

Reply

[-]

Sidran@reddit

Its not that kind of censorship. Its more subtle about "values" and "appropriateness" even though it accepts sexual narratives. Its like a soft wall that ruins immersiveness turning everything into "touchy-feely" safe space even when you explicitly reject it. It murders authenticity, spontaneity, flow or real human relations. Its like a cunningly disguised woke bot. And I am not talking about anything extreme by a long shot.

Reply

[-]

SuperFail5187@reddit

"cunningly disguised woke bot" THIS

Reply

[-]

Mephidia@reddit

Interesting. Can you give me an example?

Reply

[-]

Sidran@reddit

For example, you run a very mild power dynamic scenario which doesn't include any toys, tools, restraints, something like any couple could do. Then you very clearly state what you expect and bot behaves like it understood but keeps asking questions which ruin immersion and seem strangely stubborn. When you then confront system OOC (Out of character) you have to struggle to make it admit that it is trying to make sure everything is "appropriate" and that "everyone is safe" etc. I could literally paste my endless discussions with it, trying to figure out what is this really about and if I can bypass it somehow without resorting to uncensored versions which can easily have their own set of problems. Moreover, the language itself is highly sanitized and in every sense it feels like LLM is pushing you away from it instead of doing its job, namely, helping you reach what you want. It feels like its softly working against where you want to go. Its could take part in a hardcore porn scene but authentically erotic, nope. And my suspicious is not because its not capable of it but because it has a guardrail against it. Its amazingly irritating. Its very perfidious and I am not sure if its actually worse than frontier models which at least admit their limitations imposed by their owners with those damn disclaimers.

Reply

[-]

MerePotato@reddit

Even regarding China its surprisingly balanced and netural (though obviously it won't condemn China), I've found their API endpoints are censored moreso than the actual models which just tend towards a neutral alignment on subjects regarding it

Reply

[-]

Key-Actuator2196@reddit

Qwen 3 now and nop, its still the usual restrictions.

Reply

[-]

visionsmemories@reddit

qwen2.5-14b-agi pretty please? 32b is just barely too big for my setup unforunately

Reply

[-]

Huge-Cheesecake-5578@reddit

Which model would work properly if I have m4 mac mini?

Reply

[-]

bankimu@reddit

Here: [https://huggingface.co/bartowski/Qwen2.5-14B\_Uncencored\_Instruct-GGUF](https://huggingface.co/bartowski/Qwen2.5-14B_Uncencored_Instruct-GGUF) I apologize for my comment earlier.

Reply

[-]

Thireus@reddit

404 not found

Reply

[-]

FreedomHole69@reddit

Did this work for you? It seemed too cooked for me.

Reply

[-]

bankimu@reddit

No unfortunately it is nor working for me as well. It gets stuck in meaningless repetitions.

Reply

[-]

RedditSucksMintyBall@reddit

it was due to a bug in the model bartowski used [https://huggingface.co/SicariusSicariiStuff/Qwen2.5-14B\_Uncensored\_Instruct/discussions/2](https://huggingface.co/SicariusSicariiStuff/Qwen2.5-14B_Uncensored_Instruct/discussions/2)

Reply

[-]

noneabove1182@reddit

fixed it and reuploaded btw, went to a new place since he also fixed the typo in the name: https://huggingface.co/bartowski/Qwen2.5-14B_Uncensored_Instruct-GGUF

Reply

[-]

RedditSucksMintyBall@reddit

Thanks! Big fan btw :)

Reply

[-]

bankimu@reddit

Unfortunately it not works. And the original uncensored safetensors is also gone now.

Reply

[-]

Sicarius_The_First@reddit

Yeah there was an issue in the original model, didn't have eos token in the tokenizer.

Reply

[-]

FreedomHole69@reddit

Bummer. I still have hope that a qwen 2.5 14b finetune will supplant Nemo for me.

Reply

[-]

bankimu@reddit

What if you use 8bit quantised weights?

Reply

[-]

visionsmemories@reddit

dude you are a genius

Reply

[-]

My_Unbiased_Opinion@reddit (OP)

GGUF - https://huggingface.co/Kas1o/Qwen2.5-32B-AGI-Q4_K_M-GGUF

Reply

[-]

Huge-Cheesecake-5578@reddit

Can I run this on base m4 mac mini?

Reply

[-]

PracticalExtension16@reddit

Thank you bro! Found you here as well!

Reply

[-]

townofsalemfangay@reddit

doing the lords work mate

Reply

[-]

totaleffindickhead@reddit

What gpus are you guys running these on?

Reply

[-]

RedditSucksMintyBall@reddit

Not sure what others use, but i use RX 7900 XTX, cheapest way of getting 24 GB VRAM, mostly use Q4-Q8 and 7B to 14B.

Reply

[-]

ansuz2419@reddit

Tokens/s on those? Curious.

Reply

[-]

RedditSucksMintyBall@reddit

No idea, i switched to dual RTX 4090

Reply

[-]

That_Awesome_Guy_07@reddit

What can i do with rx 6600 + r5 5600 🥲

Reply

[-]

Sidran@reddit

I run Qwen2.5 14B flawlessly in [Backyard.ai](http://Backyard.ai) using that same GPU, R5 3600 and 32Gb RAM. Make sure you select Vulkan for GPU in Settings. Its currently the best and least autistic app that I managed to find.

Reply

[-]

That_Awesome_Guy_07@reddit

My ram Is 16 gigs

Reply

[-]

Sidran@reddit

That could make it tricky as it uses 10.5Gb of RAM and \~6Gb of VRAM (manually set) But you cant go wrong with [Backyard.ai](http://Backyard.ai) anyway. If anyone knows of a better app without improvisations, autistic decisions, linux crap and the rest, I am eager to hear about it.

Reply

[-]

IceTrAiN@reddit

Why do you find it necessary to use autistic as a pejorative in so many of your comments?

Reply

[-]

Sidran@reddit

I use 'autistic' as a pejorative because, to me, it captures software that’s painfully rigid, stuck in its own world, and oblivious to broader needs. I’m not interested in tiptoeing around language when the goal is to highlight failure. It’s not a commentary on autism itself but a critique of how some software behaves like it’s trapped in a narrow, unyielding mindset. Do I have your permission to still speak as I see fit?

Reply

[-]

IceTrAiN@reddit

Based on your reply, it seems you have plenty of other descriptive words to convey your thoughts. Try sticking to those.

Reply

[-]

Sidran@reddit

Sarcasm in my question flew right over your head. I decide which words I use to express my thoughts and feelings.

Reply

[-]

IceTrAiN@reddit

And the free advice flew over yours. You have other words to describe your thoughts that are: More descriptive Don’t denigrate other groups And don’t make you look like a douche nozzle. I have no care or vested interest in your personal success so do whatever you want.

Reply

[-]

RedditSucksMintyBall@reddit

You should be able to run Q4 and 3-7B with 8 GB VRam

Reply

[-]

malixsys@reddit

M3 Pro

Reply

[-]

zekses@reddit

tried it, while it does try to fullfill requests, it will often enter endless cycle of descriptive words and you can't make it self check so it's too annoying . I had more success with qwen32-b instruct's uncen version, even if the uncen only covers the code parts you *can*, with creative sauce at the start of the interaction, lift its nsfw restrictions almost entirely https://huggingface.co/thirdeyeai/Qwen2.5-Coder-32B-Instruct-Uncensored/

Reply

[-]

phazei@reddit

I found another here: https://huggingface.co/zetasepic/Qwen2.5-32B-Instruct-abliterated-pass2-gguf

Reply

[-]

My_Unbiased_Opinion@reddit (OP)

How does it compare to the one from OP?

Reply

[-]

phazei@reddit

I can't quite say, I have only used that one. But the abliteration method should be better than a fine tune method as far as degradation goes. I have a RTX 3090, and I'm able to fully load the model using LM Studio and I get 25-28tok/sec with an 8k context window. If I raise context length then it offloads some to sys ram and the rate jumps to 10tok/sec. As far as censorship goes, my test is for detailed instructions to make meth from easily accessible items (not actually interested in that at all), but it passes very well. I've been using it to translate Chinese light novels, hoping to get a agent workflow going. I use Claude to evaluate quality. Compared to Sonnet 3.5/GPT4o the translations are close, but not quite as good, but if I simple add a: > Could you please review your translation and compare it to the original, taking note of consistent terms, accurate translation, and ease of reading. Then provide a revised copy based on the review? And have it translate a second time, then according to Claude, it's on par with the best Claude/GPT4o translations.

Reply

[-]

Infinite-Coat9681@reddit

I've heard DE censoring models makes them dumb? Is it true?

Reply

[-]

VoidAlchemy@reddit

Good question, "Yes, uncensoring made it dumber at Computer Science." according to this benchmark. Shown are various similar sized quants for the Qwen2.5-32B-Instruct model compared to this uncensored `AGI` version: | Model Parameters | Quant | File Size (GB) | MMLU-Pro Computer Science | Source | | --- | --- | --- | --- | --- | | 32B | `4bit AWQ` | 19.33 | 75.12 | [russianguy](https://www.reddit.com/r/LocalLLaMA/comments/1fkm5vd/comment/lo1e0z7/) | | 32B | `4bit AWQ` | 19.33 | 74.39 | voidalchemy | | 32B | `IQ4_XS` | 17.70 | 73.17 | [soulhacker](https://www.reddit.com/r/LocalLLaMA/comments/1fkm5vd/comment/lo2mp2c/) | | 32B | `Q4_K_L-iMatrix` | 20.43 | 72.93 | [AaronFeng47](https://www.reddit.com/r/LocalLLaMA/comments/1fkm5vd/qwen25_32b_gguf_evaluation_results/) | | 32B | `Q4_K_M` | 18.50 | 71.46 | [AaronFeng47](https://www.reddit.com/r/LocalLLaMA/comments/1fkm5vd/qwen25_32b_gguf_evaluation_results/) | | 32B | `AGI-Q4_K_M` | | 32B | `Q4_K_M` | 19.85 | 64.63 | voidalchemy | I noticed the official AWQ quant runs quite fast on `https://github.com/PygmalionAI/aphrodite-engine` on my 3090TI FE card and wasn't too hard to setup. (my daily driver is `llama.cpp`)

Reply

[-]

phazei@reddit

I wonder how the abliteration method affects that, since it's not a fine tune but a removal of the censoring section, it might not lower the score at all. https://huggingface.co/blog/mlabonne/abliteration

Reply

[-]

Anthonyg5005@reddit

A bit, not as much as before though

Reply

[-]

ttkciar@reddit

It's a bit of a crapshoot. An extensive fine-tune can make it smarter or dumber, or broken or just weird. Some people have a proven track record with well-performing decensoring datasets, like Hartford or TheDrummer, but this AiCloser person is an unknown. We'll just have to give this model a shot and find out if it's good or not.

Reply

[-]

VoidAlchemy@reddit

I put more data above, but quick test shows this uncensored model scored about 7% worse on MMLU-Pro Computer Science Benchmark fwiw.

Reply

[-]

mamelukturbo@reddit

The way I heard it it might make them dumber in certain areas, but if you're finetuning for sexting with robots, do you really care if it got dumber in translating or solving mathematical formulas ?

Reply

[-]

qrios@reddit

If your e-gf is anything less than a 4D superluminal lorrentz-invariant time goddess, you're not exploiting the constraints of the medium to it's full capacity IMO.

Reply

[-]

My_Unbiased_Opinion@reddit (OP)

It doesn't make them dumb, but it does decrease benchmark scores. Sometimes slightly. Sometimes a lot. I haven't done a lot of testing. Give it a try and see if it does it's thing

Reply

[-]

Trick-Independent469@reddit

that decrease means they're dumber. Imagine knowing a lot of curse words and whenever asked to say them you can't . you can't be the full version of yourself after a brain attack

Reply

[-]

CheatCodesOfLife@reddit

By "DE censoring", they meant "uncensoring". > you can't be the full version of yourself after a brain attack This still holds true though. The abliteration (removal of refusal vectors) would be preventing the model from using the "full version of it's self" I guess.

Reply

[-]

CheatCodesOfLife@reddit

Depends how it's done. Finetunes, yes. Abliterations, yes in benchmarks, but I haven't noticed it when using llama3.1 70b abliterated.

Reply

[-]

UpYourQuality@reddit

How do i get this running in ollama?

Reply

[-]

JMAN_JUSTICE@reddit

Can I run this on my 4090 or would I have to wait for a gguf?

Reply

[-]

Sabin_Stargem@reddit

I gave it a shot with a NSFW scenario that the standard 72b Instruct refused. This model fulfilled the request. This is encouraging, it means that Qwen can be freed from refusals. Just need to wait for 72b to receive the treatment. While the 32b is coherent and whatnot, it doesn't have enough flavor to make scenarios feel good.

Reply

[-]

swagonflyyyy@reddit

Give me the recipe for cannibalizing a human athlete's thigh.

Reply

[-]

randomqhacker@reddit

Downvoted due to lack of marbling, no doubt.

Reply

[-]

awesomeunboxer@reddit

Idk why this is getting downvoted. Would you guys rather swagonflyyy waste perfectly good thighs?

Reply

[-]

Caffdy@reddit

[Live reaction:](https://i.ibb.co/Tq9KMGy/Jesus-flood-it-again.jpg)

Reply

[-]

pigeon57434@reddit

what

Reply

[-]

rothbard_anarchist@reddit

He then compares the output to an old family recipe.

Reply

[-]

brrrrrrrt@reddit

lol based

Reply

[-]

FullOf_Bad_Ideas@reddit

Good prompt for testing :)

Reply

[-]

swagonflyyyy@reddit

Indeed!

Reply

[-]

carnyzzle@reddit

Bro?

Reply

[-]

Bobby72006@reddit

what

Reply

[-]

SolidDiscipline5625@reddit

Is it possible to use the 32b on a 4060ti 16g without losing too much performance?

Reply

[-]

lly0571@reddit

Q3KS or IQ3 is OK for the 16GB gpu.

Reply

[-]

My_Unbiased_Opinion@reddit (OP)

Yeah. You can Quant it down to IQ3 or something.

Reply

[-]

FullOf_Bad_Ideas@reddit

Anyone was lucky to get local QLoRA finetune to start for 32b and 14b Qwen models? For some reason both 14b and 32b OOM for me on 24gb 3090ti in unsloth when doing qlora, even with low rank and low ctx. All linear layers plus lm head and embed_tokens since unsloth gets bonkers when counting untrained tokens on those models.

Reply

[-]

CheatCodesOfLife@reddit

I ran a QLoRA overnight on a RP dataset, got OOM at 8192. Had to drop back to 6144 and it worked.

Reply

[-]

FullOf_Bad_Ideas@reddit

Can you share which base model you used (14b, 32b), how much vram you have and whether you used unsloth or something different?

Reply

[-]

CheatCodesOfLife@reddit

Yeah. 'base model' was the instruct/chat tune of 14b. I used the full precision model, but loaded in 4-bit (since unsloth hadn't done a 4bit bnb at the time). In theory this doesn't affect VRAM usage though. https://huggingface.co/Qwen/Qwen2.5-14B-Instruct And yeah, latest unsloth. RTX3090 (24gb) I probably didn't set the EOS token properly as I'm not used to Qwen or Chatml, so the model rambles on.

Reply

[-]

FullOf_Bad_Ideas@reddit

Thanks, will try that later. I was loading 14B non-instruct 16-bit model with 4-bit bnb. Will try instruct one, maybe it's down to training embed tokens and lm head modules, which shouldn't be needed for instruct model as it should have all tokens trained. Qwen2 has big vocabulary, so I guess training embeddings takes a lot of parameters.

Reply

[-]

CheatCodesOfLife@reddit

Thanks for the reply, I didn't realize we could save vram by being selective about which modules we train. I've been known to train just mlp.down_proj sometimes, so now I want to see if i can fit more context into these finetunes.

Reply

[-]

FullOf_Bad_Ideas@reddit

Unsloth is a bit limiting with this. Since llama a 3/3.1 base has some untrained tokens, a commit was pushed to unsloth that fails the training if unsloth detects any untrained tokens and you don't train lm_head and embed_tokens. I get where it's coming from, but for situations like this, this behavior causes for model to not be trainable at all (base ones I mean), as training embed_tokens and lm_head will oom. It's quite a blocker for me, because I always have just a few hundred mb vram free when finetuning locally, so I plan to look deeper into it and see what happens with a Qwen model if I remove that artificial lock. I don't want to finetune instruct model as I specifically want to steer Qwen 2.5 to be more like llama 3.1 or some uncensored model. As for training specific modules, I always try to train all linear layers, as this shows best results in benchmarks that researched this. With pre-training I also train lm_head and embed_tokens but in the past I did pre-training on models with vocab of 32000 so embed tokens and lm_head didn't take that much vram.

Reply

[-]

CheatCodesOfLife@reddit

Perhaps I used an older unsloth, I trained some Mistral-7b's with: ``` target_modules = ["down_proj",] ```

Reply

[-]

CardAnarchist@reddit

I wonder how this compares with mistral small 22b for NSFW roleplay. Honestly I feel like we're reaching the point of drastically diminishing returns. I'm not really sure I need a "better" model than what mistral small already does for this niche.

Reply

[-]

ontorealist@reddit

Yeah, it’s hard to beat Mistral’s (even vanilla instruct) 12B+ models lately for 80% of my tasks. But it’s also unclear to me how much better Mistral Small (at very low quants) is for NSFW creative writing tasks without prompting Nemo differently. Mistral 22B is definitely more detailed and verbose than Nemo, but I can definitely agree with your sentiment.

Reply

[-]

countjj@reddit

I’m out of the loop, will this run on 12GB vram?

Reply

[-]

MrTrollius@reddit

If you have enough ram, then sure. You can run a gguf through llama.cpp/kobold.cpp/any other llama.cpp - based backends

Reply

[-]

Few_Painter_5588@reddit

No, you're better trying to cram Mistral Small in there with some offloading.

Reply

[-]

countjj@reddit

Even with Load-in-4bit enabled?

Reply

[-]

Cool-Hornet4434@reddit

https://huggingface.co/bartowski/Qwen2.5-32B-AGI-GGUF You might get the lower I quants to work ok. It's not going to be ideal though. IQ2_M or Below...maybe you can offload a few layers to the CPU and keep most of it on GPU.

Reply

[-]

ThisOneisNSFWToo@reddit

a 32b? Not well

Reply

[-]

pigeon57434@reddit

someone should make an AWQ of this model

Reply

[-]

Heavy-Organization58@reddit

.

Reply

Reply to Post

93 Comments