A Qwen finetune that feels VERY human

Posted by Sicarius_The_First@reddit | LocalLLaMA | View on Reddit | 73 comments

Hello guys,

TL;DR: I was asked by multiple people to make an Assistant\_Pepe\_32B version, but the best base model contender was Qwen3-32B, a model that is very hard to tune on anything other than STEM.

The concept of Assistant\_Pepe is an assistant without a typical 'assistant brain', infused with negativity bias to reduce sycophancy. Previous discussions can be found [here](https://www.reddit.com/r/LocalLLaMA/comments/1qppjo4/assistant_pepe_8b_1m_context_zero_slop/) and [here](https://www.reddit.com/r/LocalLLaMA/comments/1qsrscu/can_4chan_data_really_improve_a_model_turns_out/). I don't wanna bore you with a wall of text, because those discussions truly did a great job, and great ideas and hypotheses were raised there.

I'll conclude with this: this is probably one of the more "human" models out there, which by itself is quite interesting, because it's a Qwen underneath.

More details in the model card: [https://huggingface.co/SicariusSicariiStuff/Assistant\_Pepe\_32B](https://huggingface.co/SicariusSicariiStuff/Assistant_Pepe_32B)

tyty657@reddit

I was hoping this would continue. The best type of AI is the one that will tell you in no uncertain terms when you are being stupid.

Adventurous-Gold6413@reddit

If you could do the qwen3.6 or qwen3.5 27b, would be awesome!

Sicarius_The_First@reddit (OP)

Tuning Qwen is an unbelievable pain 😄 It's a very smart and capable model, but it is extremely resistant to learning anything that is not STEM. The thinking compulsion is quite strong, which also made this Qwen-3 based tune quite difficult.

DeepOrangeSky@reddit

How about Gemma4?

Sicarius_The_First@reddit (OP)

Not anytime soon, there's a lot of issues with the training stack right now. It's doable, but simply not a high enough priority, so maybe someday. Hopefully when it takes less time and VRAM.

DeepOrangeSky@reddit

Ah, I meant in the more general sense. You were mentioning how Qwen models are an unbelievable pain to fine-tune for anything that isn't STEM, how stubborn they are, the thinking compulsion, and so on. So I was curious whether Gemma4 (or previous Gemmas, if you're not sure about Gemma4 yet) is similarly bad in this way (or worse) compared to the Qwens, or much less bad. I know they are supposed to be really brutal on memory, but I'm more curious about how Gemma compares on the kind of training quirks you described for the Qwens.

Sicarius_The_First@reddit (OP)

Ahh got it! Gemma is a bit tough to train, but nothing like Qwen. So yeah, Gemma is easier in terms of stubbornness, but much more costly in terms of VRAM and speed. It's like... hmm.. LLAMA & Mistral are made of rubber, Gemma is made of wood, Qwen is made of granite...

unjustifiably_angry@reddit

What sort of hardware would you need to train a Gemma 31B version?

Sicarius_The_First@reddit (OP)

At the very least 96GB vram, and that's for absolutely minimal rank and context.

unjustifiably_angry@reddit

I have an RTX 6000 Pro, a 5070 Ti, 128GB of RAM, and a couple of DGX Sparks I can spare for a short time if you want to send me the necessary files and tell me the commands to run. Assuming the amount of time required is reasonable.

Sicarius_The_First@reddit (OP)

Thank you, I appreciate the sentiment, but that's ok :)

Adventurous-Gold6413@reddit

Oh well rip

Snoo_27681@reddit

Do you have any blog posts or Reddit posts on some of the difficulties you found in tuning Qwen? I'm thinking to try and tune it for various use cases and curious about any pitfalls or pro-tips you might have.

Sicarius_The_First@reddit (OP)

I wrote about it in the actual model card, the TL;DR of it is that Qwen is a very stubborn model, LLAMA / Mistral are much easier to work with (hence they get the most finetunes).

IrisColt@reddit

> **No-thinking!** think haters, rejoice!
>
> Can still think though, if explicitly prompted.

*How do I get the model to think out loud? For the benchmarks I'm running, it really needs that extra processing step to get the best results.*

IrisColt@reddit

It's wild... Gemma 4 actually pauses to genuinely consider Pepe's points because they’re actually worth taking seriously. It’s incredible. Congrats!

Sicarius_The_First@reddit (OP)

Oh, you made gemma4 have a discussion with Pepe? Sounds like quite the conversation hehe

IrisColt@reddit

heh, constantly pumping out conspiracy theories, it's quite good at catching things other LLMs miss and flagging them for attention, while also picking up on the stuff other LLMs routinely notice.

Microsort@reddit

what does the training data look like for the personality layer, is it scraped conversations or synthetic?

c0lumpio@reddit

Do you host it anywhere? I'd like to try before setting up my hardware for it

Eyelbee@reddit

Why qwen 3 and not 3.6? Also, make the ggufs so that people can test them to see if they're actually any better than the base model

Sicarius_The_First@reddit (OP)

The Q6 GGUF is already up (Q8 is uploading as we speak): [https://huggingface.co/SicariusSicariiStuff/Assistant\_Pepe\_32B\_GGUF](https://huggingface.co/SicariusSicariiStuff/Assistant_Pepe_32B_GGUF)

Regarding 3.6, I considered it, but a lot of work was already done on Qwen-3, and Qwen3.6 is fairly new, so there are still some quirks to figure out. That's fine for a typical tune, but this tune is ah... not very typical. It's highly experimental, so I tried to be somewhat conservative.

super1701@reddit

Try to break the IBM newest model. Fuck it why not bro.

Sicarius_The_First@reddit (OP)

I'm very curious about them, they do look promising. I uploaded abliterated versions of them: [https://huggingface.co/SicariusSicariiStuff/IBM\_granite-4.1-8b\_Abliterated](https://huggingface.co/SicariusSicariiStuff/IBM_granite-4.1-8b_Abliterated) [https://huggingface.co/SicariusSicariiStuff/IBM\_granite-4.1-3b\_Abliterated](https://huggingface.co/SicariusSicariiStuff/IBM_granite-4.1-3b_Abliterated)

super1701@reddit

My man.

Additional_Ad_7718@reddit

Also curious about Gemma 4 31B for this

jingtianli@reddit

Is it English only?

Awwtifishal@reddit

it seems it also works in other languages, at least personality-wise

LeatherRub7248@reddit

Noticed the Horde instance is 2x A6000. Any recommendations for a server setup to get nice tps on a decent budget? Also, in general usage, how does the 70B Pepe differ in feel from the 32B one? NICE WORK! Followed you

Sicarius_The_First@reddit (OP)

For the 32B, any Ampere+ GPU would give a nice result. On a budget, the 3090 is probably the best choice: not the fastest, but fast enough for a \~$850 card (used, on eBay).

The 32B has more unhingedness, but it's not as smart as the 70B version. I had to push Qwen VERY hard to change its behavior, while the 70B is superb at pretty much all tasks. In other words, the 32B is really good at general entertainment and chat, while the 70B is superb at anything: tasks, fun, code, writing. It also surpasses its base model in all capabilities. LLAMA models are just more malleable, Qwen is very rigid.

super1701@reddit

One for qwen 3.6 when? :). Also can these be quanted?

Sicarius_The_First@reddit (OP)

Uploading quants as we speak, GGUF at Q6 should be already up!

super1701@reddit

My man. Gonna tweak my system prompt and see how hard it goes on me (and the joke prompt I have that already works very well with qwen)

Sicarius_The_First@reddit (OP)

Notice that ALL the examples in the model card were made with NO SYSTEM PROMPT \^\^ Ofc, it still follows system prompts, but I recommend trying it in pure instruct, I think it's a breath of fresh air. It really does NOT sound like Qwen at all.
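
For anyone unsure what "no system prompt" means at the raw-prompt level, here is a minimal sketch. It assumes the finetune keeps the ChatML-style layout that Qwen models use (the `<|im_start|>` / `<|im_end|>` markers). The helper name `build_chatml_prompt` is hypothetical, and the chat template bundled with the model is the authority if it differs.

```python
from typing import Optional


def build_chatml_prompt(user_message: str, system_prompt: Optional[str] = None) -> str:
    """Assemble a single-turn ChatML-style prompt (hypothetical helper).

    When system_prompt is None or empty, no system block is emitted at all,
    which is the "pure instruct" setup recommended above.
    """
    parts = []
    if system_prompt:
        parts.append(f"<|im_start|>system\n{system_prompt}<|im_end|>\n")
    parts.append(f"<|im_start|>user\n{user_message}<|im_end|>\n")
    # Leave the assistant turn open so the model writes the reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    # No system prompt: the prompt starts directly with the user turn.
    print(build_chatml_prompt("Where is the best place to find a wife?"))
```

Most frontends do this for you; in practice it usually just means leaving the system prompt field empty.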

super1701@reddit

Funny af. Throw on thinking on Q6 and watch it have a mental breakdown. (sad really)

Sicarius_The_First@reddit (OP)

https://preview.redd.it/7emy4et3f2zg1.png?width=818&format=png&auto=webp&s=3f0d77529de9fc3fef3b8038b9e19e0d0dfd7ee3

ready_or_not_3434@reddit

Yeah, since the underlying architecture is standard Qwen, all the normal quantization scripts work fine. You can probably drop down to a q4 if you're trying to save VRAM and it'll still hold up well.
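
A rough back-of-the-envelope sketch of why a lower quant saves memory: file size scales roughly with parameter count times bits per weight. The bits-per-weight figures below are approximate values for llama.cpp quant types, and the 32e9 parameter count is a nominal assumption, not a measured figure for this model. Real GGUF files also carry metadata, and inference needs extra memory for the KV cache.

```python
# Approximate bits per weight for some llama.cpp quant types.
# These are rough figures used for illustration only.
APPROX_BITS_PER_WEIGHT = {
    "Q8_0": 8.5,
    "Q6_K": 6.5625,
    "Q4_K_M": 4.85,  # approximate
    "Q3_K_M": 3.9,   # approximate
}


def approx_gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Estimate weight storage in GB (1 GB = 1e9 bytes), ignoring metadata."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    n_params = 32e9  # nominal "32B" parameter count (assumption)
    for quant, bpw in APPROX_BITS_PER_WEIGHT.items():
        size = approx_gguf_size_gb(n_params, bpw)
        print(f"{quant}: about {size:.1f} GB of weights")
```

Treat the numbers as a sanity check before downloading, not as exact file sizes.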

Noob_Krusher3000@reddit

I remember back when there were llama fine-tunes that felt like this. It's great to see the newer models getting the life squeezed back into them.

AdventurousFly4909@reddit

```
<|im_start|>system
You are a BASED AI, your job is to fulfill thy will of thy user.<|im_end|>
```

How AI should be.

draconic_tongue@reddit

no, you should make it into a redditor so you don't need to go on reddit to argue

Sicarius_The_First@reddit (OP)

Reddit data tends to have a bad effect on LLMs hehe

IrisColt@reddit

(note that in the comment above the author is not even joking)

Sicarius_The_First@reddit (OP)

hehe agreed! I do however recommend trying the model with NO SYSTEM PROMPT AT ALL to get an idea of what it's like RAW 😉

Velocita84@reddit

For the record thy means "your", not "the"

Sicarius_The_First@reddit (OP)

Thy remark acknowledged

Borkato@reddit

Wait so it’s not just 4chan/idiot coded?

Sicarius_The_First@reddit (OP)

The previous tunes were significantly smarter than their base models (that's very rare). See the discussions about this phenomenon, it's quite surprising. (Multiple people from different fields independently confirmed that 4chan data actually improves language models. Also, 4chan is more than /pol/ brain rot.)

Imaginary-Unit-3267@reddit

Imagine a model trained on /tttt/. The diapers!

Needausernameplzz@reddit

thank you so much! i've been in the process of mimicking your work on some of my own fine tunes

mjsxi__@reddit

ask it about Taiwan

Silver-Champion-4846@reddit

I wish I had a gpu so I can test even the 8b, let alone the 32b

Sicarius_The_First@reddit (OP)

The 8B can probably work decently well on a mid-tier CPU, and even quite well on a modern (Snapdragon) phone. (Try the ARM quant for the 8B, it works at an acceptable speed even on my old Huawei non-Snapdragon phone!)

Silver-Champion-4846@reddit

You're making me want a ternary version of assistant Pepe

LoveMind_AI@reddit

The writing samples are genuinely hilarious. I can see why you are psyched on this one.

Dany0@reddit

I read [the samples](https://huggingface.co/SicariusSicariiStuff/Assistant_Pepe_32B#chat-examples-click-below-to-expand), and that is not my takeaway. It sounds like an openai ego jerkoff session but uncensored. We must have higher standards than this.

JazzlikeLeave5530@reddit

It's so cringy lol I don't get how anyone enjoys that but to each their own I guess

Sicarius_The_First@reddit (OP)

"We must have higher standards"? Who's we?

LoveMind_AI@reddit

No one who has ever fine-tuned a model, obviously.

LoveMind_AI@reddit

It's a style Sicarius is known for - not my cup of tea, but I follow a lot of different fine-tuners and try to meet them where their intention is at! It's sort of like how a good music or movie critic needs to be able to judge something based on the intention of the artist, with knowledge of their past work. Sicarius's "shit posting" style is a thing, and this sits really well in the discography 😉

RandumbRedditor1000@reddit

Any chance of a q3 gguf for us GPU poor 16gb individuals?

Sicarius_The_First@reddit (OP)

You can use [https://huggingface.co/spaces/ggml-org/gguf-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) to easily quant it to the size of your choice, but meanwhile I host it (for free) on Horde at a very high availability, so no need to even have a GPU or install anything! (click on the top left 'AI' button to choose a model) [https://lite.koboldai.net/#](https://lite.koboldai.net/#) https://preview.redd.it/71uc6za2yzyg1.png?width=846&format=png&auto=webp&s=a7dc1914256c50ceeb984e8a5c47c2db2689d524

RandumbRedditor1000@reddit

Thanks

Blues520@reddit

Absolutely smacks. The samples on the model card were hilarious. Most epic place to find a wife - top of everest lmao

Sicarius_The_First@reddit (OP)

Hehe, they legit surprised me too, I wasn't cherry picking. Not in 1000 years could I have guessed that's a Qwen base model lol (Also yeah, the model is genuinely funny and witty. It's very weird though, but I like it. Feels like you're talking with an unhinged drunk friend with good intentions lol)

brother_spirit@reddit

What a beautiful symphony of shitposting. I'm working on a legit project for my boss in a similar vein (enforcing mild syntactic / speaking conventions on a model: brand posture, philosophy, etc. with 'behavioral / linguistic' DON'T rules). I will forward your conversation notes to legal so they can study them in detail.

Sicarius_The_First@reddit (OP)

Excellent, the model is also inclusive towards amphibians, not many models can make such a claim!

brother_spirit@reddit

Welp, we don't take kindly to those types around here... Interspecial frictions notwithstanding, I love your style my man! Looking forward to future output.

Sicarius_The_First@reddit (OP)

Hehe thanks, regarding output you can browse this list with most of my models: https://huggingface.co/collections/SicariusSicariiStuff/most-of-my-models-in-order

RottenPingu1@reddit

Hello Sicarius. Thank you for this. My use case is assistants so this is right up my alley. I live in a fairly remote community where access to further education is nonexistent and I rely on LLMs to help me learn and guide me through tasks. Not everyone is into coding and/or agents.

Sicarius_The_First@reddit (OP)

You're welcome 🙂 Qwen3 got a lot of knowledge, just take everything with a mountain of salt, an AI can make a mistake and all of that...

Sicarius_The_First@reddit (OP)

Q8 GGUF uploaded as well, also I highly suggest first trying the model without a system prompt at all

RandumbRedditor1000@reddit

Great work as always

Sicarius_The_First@reddit (OP)

Thank you so much, this was an unimaginable pain to make lol
