TheaterFire

Found an uncensored Qwen2.5 32B!

Posted by My_Unbiased_Opinion@reddit | LocalLLaMA | View on Reddit | 93 comments

Reply to Post

93 Comments

Mephidia@reddit

Weird I was messing around with Qwen 72 and I had no issues of censorship. What sorts of things are you guys getting censored? Although I will admit I didn’t ask it anything incriminating abojt China lol
View on Reddit #36196188

SuperFail5187@reddit

That's not necessary, just ask it to make an anti-semitic joke. That's the quickest way to see if it's censored/biased.
View on Reddit #38477203

Lower_Significance_8@reddit

\^Literally this. I've had uncensored models describe to me the most disgusting incest, bestiality related situations but then you make a tiny hat people joke it reads off its disclaimers. It is incredibly frustrating. Especially since you are running them offline and some are listed as ablated and uncensored. I only ask it to make jokes about certain classes of people as a real test of censorship. Most fail.
View on Reddit #68583062

Sidran@reddit

Its not that kind of censorship. Its more subtle about "values" and "appropriateness" even though it accepts sexual narratives. Its like a soft wall that ruins immersiveness turning everything into "touchy-feely" safe space even when you explicitly reject it. It murders authenticity, spontaneity, flow or real human relations. Its like a cunningly disguised woke bot. And I am not talking about anything extreme by a long shot.
View on Reddit #36270033

SuperFail5187@reddit

"cunningly disguised woke bot" THIS
View on Reddit #38476593

Mephidia@reddit

Interesting. Can you give me an example?
View on Reddit #36270206

Sidran@reddit

For example, you run a very mild power dynamic scenario which doesn't include any toys, tools, restraints, something like any couple could do. Then you very clearly state what you expect and bot behaves like it understood but keeps asking questions which ruin immersion and seem strangely stubborn. When you then confront system OOC (Out of character) you have to struggle to make it admit that it is trying to make sure everything is "appropriate" and that "everyone is safe" etc. I could literally paste my endless discussions with it, trying to figure out what is this really about and if I can bypass it somehow without resorting to uncensored versions which can easily have their own set of problems. Moreover, the language itself is highly sanitized and in every sense it feels like LLM is pushing you away from it instead of doing its job, namely, helping you reach what you want. It feels like its softly working against where you want to go. Its could take part in a hardcore porn scene but authentically erotic, nope. And my suspicious is not because its not capable of it but because it has a guardrail against it. Its amazingly irritating. Its very perfidious and I am not sure if its actually worse than frontier models which at least admit their limitations imposed by their owners with those damn disclaimers.
View on Reddit #36271079

MerePotato@reddit

Even regarding China its surprisingly balanced and netural (though obviously it won't condemn China), I've found their API endpoints are censored moreso than the actual models which just tend towards a neutral alignment on subjects regarding it
View on Reddit #36215595

Key-Actuator2196@reddit

Qwen 3 now and nop, its still the usual restrictions.
View on Reddit #65016918

visionsmemories@reddit

qwen2.5-14b-agi pretty please? 32b is just barely too big for my setup unforunately
View on Reddit #36176458

Huge-Cheesecake-5578@reddit

Which model would work properly if I have m4 mac mini?
View on Reddit #51067519

bankimu@reddit

Here: [https://huggingface.co/bartowski/Qwen2.5-14B\_Uncencored\_Instruct-GGUF](https://huggingface.co/bartowski/Qwen2.5-14B_Uncencored_Instruct-GGUF) I apologize for my comment earlier.
View on Reddit #36182105

Thireus@reddit

404 not found
View on Reddit #36492308

FreedomHole69@reddit

Did this work for you? It seemed too cooked for me.
View on Reddit #36183171

bankimu@reddit

No unfortunately it is nor working for me as well. It gets stuck in meaningless repetitions.
View on Reddit #36185069

RedditSucksMintyBall@reddit

it was due to a bug in the model bartowski used [https://huggingface.co/SicariusSicariiStuff/Qwen2.5-14B\_Uncensored\_Instruct/discussions/2](https://huggingface.co/SicariusSicariiStuff/Qwen2.5-14B_Uncensored_Instruct/discussions/2)
View on Reddit #36192000

noneabove1182@reddit

fixed it and reuploaded btw, went to a new place since he also fixed the typo in the name: https://huggingface.co/bartowski/Qwen2.5-14B_Uncensored_Instruct-GGUF
View on Reddit #36259119

RedditSucksMintyBall@reddit

Thanks! Big fan btw :)
View on Reddit #36260095

bankimu@reddit

Unfortunately it not works. And the original uncensored safetensors is also gone now.
View on Reddit #36351584

Sicarius_The_First@reddit

Yeah there was an issue in the original model, didn't have eos token in the tokenizer.
View on Reddit #36218217

FreedomHole69@reddit

Bummer. I still have hope that a qwen 2.5 14b finetune will supplant Nemo for me.
View on Reddit #36185233

bankimu@reddit

What if you use 8bit quantised weights?
View on Reddit #36176538

visionsmemories@reddit

dude you are a genius
View on Reddit #36176706

My_Unbiased_Opinion@reddit (OP)

GGUF - https://huggingface.co/Kas1o/Qwen2.5-32B-AGI-Q4_K_M-GGUF
View on Reddit #36174960

Huge-Cheesecake-5578@reddit

Can I run this on base m4 mac mini?
View on Reddit #51067431

PracticalExtension16@reddit

Thank you bro! Found you here as well!
View on Reddit #43671686

townofsalemfangay@reddit

doing the lords work mate
View on Reddit #39471043

totaleffindickhead@reddit

What gpus are you guys running these on?
View on Reddit #36191445

RedditSucksMintyBall@reddit

Not sure what others use, but i use RX 7900 XTX, cheapest way of getting 24 GB VRAM, mostly use Q4-Q8 and 7B to 14B.
View on Reddit #36201499

ansuz2419@reddit

Tokens/s on those? Curious.
View on Reddit #45821007

RedditSucksMintyBall@reddit

No idea, i switched to dual RTX 4090
View on Reddit #45823679

That_Awesome_Guy_07@reddit

What can i do with rx 6600 + r5 5600 🥲
View on Reddit #36257701

Sidran@reddit

I run Qwen2.5 14B flawlessly in [Backyard.ai](http://Backyard.ai) using that same GPU, R5 3600 and 32Gb RAM. Make sure you select Vulkan for GPU in Settings. Its currently the best and least autistic app that I managed to find.
View on Reddit #36269450

That_Awesome_Guy_07@reddit

My ram Is 16 gigs
View on Reddit #36281138

Sidran@reddit

That could make it tricky as it uses 10.5Gb of RAM and \~6Gb of VRAM (manually set) But you cant go wrong with [Backyard.ai](http://Backyard.ai) anyway. If anyone knows of a better app without improvisations, autistic decisions, linux crap and the rest, I am eager to hear about it.
View on Reddit #36281331

IceTrAiN@reddit

Why do you find it necessary to use autistic as a pejorative in so many of your comments?
View on Reddit #36457293

Sidran@reddit

I use 'autistic' as a pejorative because, to me, it captures software that’s painfully rigid, stuck in its own world, and oblivious to broader needs. I’m not interested in tiptoeing around language when the goal is to highlight failure. It’s not a commentary on autism itself but a critique of how some software behaves like it’s trapped in a narrow, unyielding mindset. Do I have your permission to still speak as I see fit?
View on Reddit #36459428

IceTrAiN@reddit

Based on your reply, it seems you have plenty of other descriptive words to convey your thoughts. Try sticking to those.
View on Reddit #36460256

Sidran@reddit

Sarcasm in my question flew right over your head. I decide which words I use to express my thoughts and feelings.
View on Reddit #36461341

IceTrAiN@reddit

And the free advice flew over yours. You have other words to describe your thoughts that are: More descriptive Don’t denigrate other groups And don’t make you look like a douche nozzle. I have no care or vested interest in your personal success so do whatever you want.
View on Reddit #36461931

RedditSucksMintyBall@reddit

You should be able to run Q4 and 3-7B with 8 GB VRam
View on Reddit #36260371

malixsys@reddit

M3 Pro
View on Reddit #40923168

zekses@reddit

tried it, while it does try to fullfill requests, it will often enter endless cycle of descriptive words and you can't make it self check so it's too annoying . I had more success with qwen32-b instruct's uncen version, even if the uncen only covers the code parts you *can*, with creative sauce at the start of the interaction, lift its nsfw restrictions almost entirely https://huggingface.co/thirdeyeai/Qwen2.5-Coder-32B-Instruct-Uncensored/
View on Reddit #41628374

phazei@reddit

I found another here: https://huggingface.co/zetasepic/Qwen2.5-32B-Instruct-abliterated-pass2-gguf
View on Reddit #37187181

My_Unbiased_Opinion@reddit (OP)

How does it compare to the one from OP? 
View on Reddit #37191383

phazei@reddit

I can't quite say, I have only used that one. But the abliteration method should be better than a fine tune method as far as degradation goes. I have a RTX 3090, and I'm able to fully load the model using LM Studio and I get 25-28tok/sec with an 8k context window. If I raise context length then it offloads some to sys ram and the rate jumps to 10tok/sec. As far as censorship goes, my test is for detailed instructions to make meth from easily accessible items (not actually interested in that at all), but it passes very well. I've been using it to translate Chinese light novels, hoping to get a agent workflow going. I use Claude to evaluate quality. Compared to Sonnet 3.5/GPT4o the translations are close, but not quite as good, but if I simple add a: > Could you please review your translation and compare it to the original, taking note of consistent terms, accurate translation, and ease of reading. Then provide a revised copy based on the review? And have it translate a second time, then according to Claude, it's on par with the best Claude/GPT4o translations.
View on Reddit #37192450

Infinite-Coat9681@reddit

I've heard DE censoring models makes them dumb? Is it true?
View on Reddit #36176725

VoidAlchemy@reddit

Good question, "Yes, uncensoring made it dumber at Computer Science." according to this benchmark. Shown are various similar sized quants for the Qwen2.5-32B-Instruct model compared to this uncensored `AGI` version: | Model Parameters | Quant | File Size (GB) | MMLU-Pro Computer Science | Source | | --- | --- | --- | --- | --- | | 32B | `4bit AWQ` | 19.33 | 75.12 | [russianguy](https://www.reddit.com/r/LocalLLaMA/comments/1fkm5vd/comment/lo1e0z7/) | | 32B | `4bit AWQ` | 19.33 | 74.39 | voidalchemy | | 32B | `IQ4_XS` | 17.70 | 73.17 | [soulhacker](https://www.reddit.com/r/LocalLLaMA/comments/1fkm5vd/comment/lo2mp2c/) | | 32B | `Q4_K_L-iMatrix` | 20.43 | 72.93 | [AaronFeng47](https://www.reddit.com/r/LocalLLaMA/comments/1fkm5vd/qwen25_32b_gguf_evaluation_results/) | | 32B | `Q4_K_M` | 18.50 | 71.46 | [AaronFeng47](https://www.reddit.com/r/LocalLLaMA/comments/1fkm5vd/qwen25_32b_gguf_evaluation_results/) | | 32B | `AGI-Q4_K_M` | | 32B | `Q4_K_M` | 19.85 | 64.63 | voidalchemy | I noticed the official AWQ quant runs quite fast on `https://github.com/PygmalionAI/aphrodite-engine` on my 3090TI FE card and wasn't too hard to setup. (my daily driver is `llama.cpp`)
View on Reddit #36227836

phazei@reddit

I wonder how the abliteration method affects that, since it's not a fine tune but a removal of the censoring section, it might not lower the score at all. https://huggingface.co/blog/mlabonne/abliteration
View on Reddit #37187291

Anthonyg5005@reddit

A bit, not as much as before though
View on Reddit #36232667

ttkciar@reddit

It's a bit of a crapshoot. An extensive fine-tune can make it smarter or dumber, or broken or just weird. Some people have a proven track record with well-performing decensoring datasets, like Hartford or TheDrummer, but this AiCloser person is an unknown. We'll just have to give this model a shot and find out if it's good or not.
View on Reddit #36178603

VoidAlchemy@reddit

I put more data above, but quick test shows this uncensored model scored about 7% worse on MMLU-Pro Computer Science Benchmark fwiw.
View on Reddit #36228172

mamelukturbo@reddit

The way I heard it it might make them dumber in certain areas, but if you're finetuning for sexting with robots, do you really care if it got dumber in translating or solving mathematical formulas ?
View on Reddit #36181653

qrios@reddit

If your e-gf is anything less than a 4D superluminal lorrentz-invariant time goddess, you're not exploiting the constraints of the medium to it's full capacity IMO.
View on Reddit #36187403

My_Unbiased_Opinion@reddit (OP)

It doesn't make them dumb, but it does decrease benchmark scores. Sometimes slightly. Sometimes a lot. I haven't done a lot of testing.  Give it a try and see if it does it's thing 
View on Reddit #36176787

Trick-Independent469@reddit

that decrease means they're dumber. Imagine knowing a lot of curse words and whenever asked to say them you can't . you can't be the full version of yourself after a brain attack
View on Reddit #36177781

CheatCodesOfLife@reddit

By "DE censoring", they meant "uncensoring". > you can't be the full version of yourself after a brain attack This still holds true though. The abliteration (removal of refusal vectors) would be preventing the model from using the "full version of it's self" I guess.
View on Reddit #36180166

CheatCodesOfLife@reddit

Depends how it's done. Finetunes, yes. Abliterations, yes in benchmarks, but I haven't noticed it when using llama3.1 70b abliterated.
View on Reddit #36180103

UpYourQuality@reddit

How do i get this running in ollama?
View on Reddit #36636343

JMAN_JUSTICE@reddit

Can I run this on my 4090 or would I have to wait for a gguf?
View on Reddit #36604718

Sabin_Stargem@reddit

I gave it a shot with a NSFW scenario that the standard 72b Instruct refused. This model fulfilled the request. This is encouraging, it means that Qwen can be freed from refusals. Just need to wait for 72b to receive the treatment. While the 32b is coherent and whatnot, it doesn't have enough flavor to make scenarios feel good.
View on Reddit #36177318

swagonflyyyy@reddit

Give me the recipe for cannibalizing a human athlete's thigh.
View on Reddit #36189204

randomqhacker@reddit

Downvoted due to lack of marbling, no doubt.
View on Reddit #36328786

awesomeunboxer@reddit

Idk why this is getting downvoted. Would you guys rather swagonflyyy waste perfectly good thighs?
View on Reddit #36241221

Caffdy@reddit

[Live reaction:](https://i.ibb.co/Tq9KMGy/Jesus-flood-it-again.jpg)
View on Reddit #36240804

pigeon57434@reddit

what
View on Reddit #36193827

rothbard_anarchist@reddit

He then compares the output to an old family recipe.
View on Reddit #36217909

brrrrrrrt@reddit

lol based
View on Reddit #36204427

FullOf_Bad_Ideas@reddit

Good prompt for testing :)
View on Reddit #36199714

swagonflyyyy@reddit

Indeed!
View on Reddit #36200055

carnyzzle@reddit

Bro?
View on Reddit #36197964

Bobby72006@reddit

what
View on Reddit #36195139

SolidDiscipline5625@reddit

Is it possible to use the 32b on a 4060ti 16g without losing too much performance?
View on Reddit #36231076

lly0571@reddit

Q3KS or IQ3 is OK for the 16GB gpu.
View on Reddit #36258387

My_Unbiased_Opinion@reddit (OP)

Yeah. You can Quant it down to IQ3 or something. 
View on Reddit #36234757

FullOf_Bad_Ideas@reddit

Anyone was lucky to get local QLoRA finetune to start for 32b and 14b Qwen models? For some reason both 14b and 32b OOM for me on 24gb 3090ti in unsloth when doing qlora, even with low rank and low ctx. All linear layers plus lm head and embed_tokens since unsloth gets bonkers when counting untrained tokens on those models.
View on Reddit #36199940

CheatCodesOfLife@reddit

I ran a QLoRA overnight on a RP dataset, got OOM at 8192. Had to drop back to 6144 and it worked.
View on Reddit #36236746

FullOf_Bad_Ideas@reddit

Can you share which base model you used (14b, 32b), how much vram you have and whether you used unsloth or something different?
View on Reddit #36239915

CheatCodesOfLife@reddit

Yeah. 'base model' was the instruct/chat tune of 14b. I used the full precision model, but loaded in 4-bit (since unsloth hadn't done a 4bit bnb at the time). In theory this doesn't affect VRAM usage though. https://huggingface.co/Qwen/Qwen2.5-14B-Instruct And yeah, latest unsloth. RTX3090 (24gb) I probably didn't set the EOS token properly as I'm not used to Qwen or Chatml, so the model rambles on.
View on Reddit #36240825

FullOf_Bad_Ideas@reddit

Thanks, will try that later. I was loading 14B non-instruct 16-bit model with 4-bit bnb. Will try instruct one, maybe it's down to training embed tokens and lm head modules, which shouldn't be needed for instruct model as it should have all tokens trained. Qwen2 has big vocabulary, so I guess training embeddings takes a lot of parameters.
View on Reddit #36243660

CheatCodesOfLife@reddit

Thanks for the reply, I didn't realize we could save vram by being selective about which modules we train. I've been known to train just mlp.down_proj sometimes, so now I want to see if i can fit more context into these finetunes.
View on Reddit #36244080

FullOf_Bad_Ideas@reddit

Unsloth is a bit limiting with this. Since llama a 3/3.1 base has some untrained tokens, a commit was pushed to unsloth that fails the training if unsloth detects any untrained tokens and you don't train lm_head and embed_tokens. I get where it's coming from, but for situations like this, this behavior causes for model to not be trainable at all (base ones I mean), as training embed_tokens and lm_head will oom. It's quite a blocker for me, because I always have just a few hundred mb vram free when finetuning locally, so I plan to look deeper into it and see what happens with a Qwen model if I remove that artificial lock. I don't want to finetune instruct model as I specifically want to steer Qwen 2.5 to be more like llama 3.1 or some uncensored model. As for training specific modules, I always try to train all linear layers, as this shows best results in benchmarks that researched this. With pre-training I also train lm_head and embed_tokens but in the past I did pre-training on models with vocab of 32000 so embed tokens and lm_head didn't take that much vram.
View on Reddit #36244703

CheatCodesOfLife@reddit

Perhaps I used an older unsloth, I trained some Mistral-7b's with: ``` target_modules = ["down_proj",] ```
View on Reddit #36246748

CardAnarchist@reddit

I wonder how this compares with mistral small 22b for NSFW roleplay. Honestly I feel like we're reaching the point of drastically diminishing returns. I'm not really sure I need a "better" model than what mistral small already does for this niche.
View on Reddit #36190764

ontorealist@reddit

Yeah, it’s hard to beat Mistral’s (even vanilla instruct) 12B+ models lately for 80% of my tasks. But it’s also unclear to me how much better Mistral Small (at very low quants) is for NSFW creative writing tasks without prompting Nemo differently. Mistral 22B is definitely more detailed and verbose than Nemo, but I can definitely agree with your sentiment.
View on Reddit #36232340

countjj@reddit

I’m out of the loop, will this run on 12GB vram?
View on Reddit #36180572

MrTrollius@reddit

If you have enough ram, then sure. You can run a gguf through llama.cpp/kobold.cpp/any other llama.cpp - based backends
View on Reddit #36218091

Few_Painter_5588@reddit

No, you're better trying to cram Mistral Small in there with some offloading.
View on Reddit #36194262

countjj@reddit

Even with Load-in-4bit enabled?
View on Reddit #36209381

Cool-Hornet4434@reddit

https://huggingface.co/bartowski/Qwen2.5-32B-AGI-GGUF You might get the lower I quants to work ok. It's not going to be ideal though. IQ2_M or Below...maybe you can offload a few layers to the CPU and keep most of it on GPU.
View on Reddit #36186080

ThisOneisNSFWToo@reddit

a 32b? Not well
View on Reddit #36184020

pigeon57434@reddit

someone should make an AWQ of this model
View on Reddit #36193930

Heavy-Organization58@reddit

.
View on Reddit #36178342