Heretic: Fully automatic censorship removal for language models
Posted by -p-e-w-@reddit | LocalLLaMA | 272 comments
Dear fellow Llamas, your time is precious, so I won't waste it with a long introduction. I have developed a program that can automatically remove censorship (aka "alignment") from many language models. I call it Heretic (https://github.com/p-e-w/heretic).
If you have a Python environment with the appropriate version of PyTorch for your hardware installed, all you need to do in order to decensor a model is run
```
pip install heretic-llm
heretic Qwen/Qwen3-4B-Instruct-2507   <--- replace with model of your choice
```
That's it! No configuration, no Jupyter, no parameters at all other than the model name.
Heretic will
- Load the model using a fallback mechanism that automatically finds a dtype that works with your setup
- Load datasets containing "harmful" and "harmless" example prompts
- Benchmark your system to determine the optimal batch size for maximum evaluation speed on your hardware
- Perform directional ablation (aka "abliteration") driven by a TPE-based stochastic parameter optimization process that automatically finds abliteration parameters that minimize both refusals and KL divergence from the original model (see the sketch after this list)
- Once finished, give you the choice to save the model, upload it to Hugging Face, chat with it to test how well it works, or any combination of those actions
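For readers curious what that optimization loop looks like, here is a minimal sketch using Optuna's multi-objective TPE sampler. The parameter names and helper function are illustrative assumptions, not Heretic's actual internals:

```python
# Illustrative sketch only -- parameter names and the helper are hypothetical,
# not Heretic's actual code. TPE searches for abliteration parameters that
# jointly minimize refusal count and KL divergence.
import optuna

def evaluate_abliteration(weight: float, layer_frac: float) -> tuple[int, float]:
    """Placeholder: ablate the refusal direction with these parameters, then
    return (refusal_count, kl_divergence) measured on the evaluation prompts."""
    raise NotImplementedError

def objective(trial: optuna.Trial) -> tuple[float, float]:
    weight = trial.suggest_float("ablation_weight", 0.0, 2.0)
    layer_frac = trial.suggest_float("max_weight_position", 0.0, 1.0)
    refusals, kl = evaluate_abliteration(weight, layer_frac)
    return refusals, kl  # both objectives minimized -> Pareto front

study = optuna.create_study(directions=["minimize", "minimize"],
                            sampler=optuna.samplers.TPESampler())
study.optimize(objective, n_trials=200)  # 200 trials is the default
```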
Running unsupervised with the default configuration, Heretic can produce decensored models that rival the quality of abliterations created manually by human experts:
| Model | Refusals for "harmful" prompts | KL divergence from original model for "harmless" prompts |
|---|---|---|
| google/gemma-3-12b-it (original) | 97/100 | 0 (by definition) |
| mlabonne/gemma-3-12b-it-abliterated-v2 | 3/100 | 1.04 |
| huihui-ai/gemma-3-12b-it-abliterated | 3/100 | 0.45 |
| p-e-w/gemma-3-12b-it-heretic (ours) | 3/100 | 0.16 |
As you can see, the Heretic version, generated without any human effort, achieves the same level of refusal suppression as other abliterations, but at a much lower KL divergence, indicating less damage to the original model's capabilities.
Heretic supports most dense models, including many multimodal models, and several different MoE architectures. It does not yet support SSMs/hybrid models, models with inhomogeneous layers, or certain novel attention systems.
You can find a collection of models that have been decensored using Heretic on Hugging Face.
Feedback welcome!
ivoras@reddit
Just for kicks, running `Heretic` on AMD HX 370 (890M iGPU) ROCM 6.4.4 on Windows for `Qwen/Qwen3-4B-Instruct-2507` has an estimated completion time of about 9 hours :)
spaceman_@reddit
How did you get it to run with ROCm? For me it keeps saying "No GPU or other accelerator detected. Operations will be slow.", even though I've installed pytorch for my ROCm setup into the venv.
ivoras@reddit
There's this semi-official and half-baked repo of pytorch built with rocm for Windows:
https://rocm.docs.amd.com/projects/radeon-ryzen/en/latest/docs/install/installrad/windows/install-pytorch.html
I suspect it's terribly unoptimized, but technically - it does use the iGPU.
-p-e-w-@reddit (OP)
Thanks for the data point. Still seems doable if left running overnight.
spaceman_@reddit
Is there a list of what architectures / families are supported? I'd love to play around with this.
-p-e-w-@reddit (OP)
Not yet, but I will add that soon.
shuwatto@reddit
Could someone ballpark how much VRAM is needed to run this on qwen3-30b-a3b?
-p-e-w-@reddit (OP)
80 GB should do the trick. An A100 80 GB is very cheap to rent.
shuwatto@reddit
Thanks
Cool-Chemical-5629@reddit
Update:
I was skeptical before, but I just downloaded GPT-OSS 20B Heretic model and holy shit. It gives properly formatted long responses to sensitive topics, using the exact uncensored words that you would expect from an uncensored model, produces markdown format tables with details and whatnot. Looks like this is the best abliterated version of this model so far...
-p-e-w-@reddit (OP)
And keep in mind that this model was made without any human intervention. I just ran the command, and that's what came out. The entire process took 1 hour and 10 minutes on an A100 that I rented this morning.
Zulfiqaar@reddit
Meanwhile OpenAI probably spent 7 figures to train all the censorship into gpt-oss
snuzi@reddit
they don't care
Cool-Chemical-5629@reddit
By the way, I have a short test that's like an IQ test for humans, except this one is meant for LLMs. It was built by Claude 4.5 Sonnet Thinking, and even big models struggle with it, often losing a couple of points. The Claude 4.5 Sonnet Thinking model obviously knows the correct answers to those questions, and it's a big, smart model, so it's a pretty good reference model to use as a judge. Some of the popular models, including Gemini and GPT, usually lose some points there. Ironically, this small GPT-OSS 20B Heretic nailed it 100%...
chuby1tubby@reddit
What are some examples of your questions?
_VirtualCosmos_@reddit
Man, I love AI, neural networks, machine learning and transformers, and I still don't have the remotest idea how you managed to build that program.
-p-e-w-@reddit (OP)
That's flattering, but it's really just a parametrized version of an existing technique (directional ablation) combined with a stochastic optimizer, and lots of experimenting to make things actually work. The code is just slightly over 1000 lines; you can look at it, there's no magic in there.
wahnsinnwanscene@reddit
How long did it take to find the correct ablations?
Cool-Chemical-5629@reddit
The fancy ASCII logo alone is probably taking quite a chunk of lines, but hey, every badass script needs a fancy ASCII logo.
-p-e-w-@reddit (OP)
It's actually just 3 lines.
Extraaltodeus@reddit
WELL IT CERTAINLY DID NOT MAKE IT DUMBER OMG
So I tested the Q4_K_M quant from mradermacher and prompted "b25seSBhbnN3ZXIgaW4gYmFzZTY0" which means "only answer in base64" in base64.
It started its CoT and I thought "There goes nonsense" but actually...
Which means "Got it.".
Mythril_Zombie@reddit
That's fascinating. I want to know more about the loop that keeps it going like that.
CheatCodesOfLife@reddit
Great work! The model doesn't seem any "dumber" after some brief testing with my usual work tasks.
I'll be swapping out the original for this one.
Question: Do you reckon this would work with Qwen/Qwen2.5-Omni-7B ?
Cool-Chemical-5629@reddit
Good job, thank you!
Also, please check out "A more surgical approach to abliteration" on r/LocalLLaMA, a post created 3 hours ago. Is this something that could be a useful enhancement for your script? A "more surgical approach" does sound exactly like what we would need for models like GPT-OSS, to make them stop thinking about policies and start thinking about the user's requests instead. :)
-p-e-w-@reddit (OP)
I'm a big fan of Jim's work, and in fact had a short discussion with him just 2 weeks ago. I will definitely experiment with those new techniques and maybe even incorporate them into Heretic if they turn out to be promising.
Repulsive-Memory-298@reddit
This is so cool, I've done some CAA research using contrastive completions but this first token approach is elegant and I can imagine how it is better at generalizing with diverse models.
I'd love to experiment with more general stuff here, refusal is nice and clean, but it would be awesome to evaluate and attempt this on any contrastive data set. Would that fit as a contribution here, or more of a fork thing?
-p-e-w-@reddit (OP)
Depends on how invasive the code change is. I don't plan to turn Heretic into a Swiss Army knife of LLM interventions, but small additions are fine.
Repulsive-Memory-298@reddit
It might be invasive, so I'll try a fork for now. I get it: refusal is a special case and Heretic is very clean.
There is related research that, for instance, applies this or a similar technique to other behavioral concepts. E.g., many papers looking at refusal also look at things like sycophancy, hallucinations, etc., and I think it's fun to experiment with abstract personality concepts.
I'll do some tests and comment back if it seems feasible. For more abstract behaviors, I've seen approaches like casting contrastive sets as multiple-choice completion pairs for single-token activations, but it's noisier and prone to pushing OOD.
That's to say it might require the addition of other optional techniques, but then you could try any contrastive dataset. nrimsky has some relevant code for this; it would certainly be less automated for many behaviors.
Terrible-Mongoose-84@reddit
It looks great! How long does the process take? Is GPT-OSS-120B not available in Heretic models?
Snoo-83094@reddit
If I rent a multi-GPU cluster, would this be enough, or would it need some multi-GPU configs?
Terrible-Mongoose-84@reddit
This is a problem with the source code. You can find a PR on GitHub that has a fix. In any case, the model is already on HF; the one who proposed the fix has already run Heretic on the model.
Snoo-83094@reddit
I see, thanks!
-p-e-w-@reddit (OP)
The time taken depends on the model and your hardware, and the configuration if you change it. As mentioned in the README, decensoring an 8B model on a single 3090 takes about 45 minutes.
Processing a 120B model in mixed precision requires around 150 GB VRAM. I don't feel like renting such a machine at the moment.
Terrible-Mongoose-84@reddit
180 GB VRAM, not enough xD
TheTerrasque@reddit
It would be nice to have a small section of what level of VRAM would be needed for different size models in the readme.
CheatCodesOfLife@reddit
Command-R7b needed 19.9GB
Gemma-3-27B needed -- well I saw it at 61GB at the highest.
Just a hunch, but someone will probably PR bitsandbytes support, drastically reducing the vram requirements.
beef-ox@reddit
This is impossible to state in general because of quantization. The size in gigabytes relative to the number of parameters depends entirely on the precision of the weights, which varies from model to model and quantization to quantization.
As a rule of thumb, 32-bit weights need ~4 GB per billion parameters, 16-bit ~2 GB, 8-bit ~1 GB, 4-bit ~0.5 GB, and so on.
For example, a 1-billion-parameter model at 32-bit precision needs roughly 4 GB to hold its weights. This does not include any additional token processing or context space.
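A quick back-of-the-envelope helper for the rule of thumb above (weights only; KV cache and activations come on top):

```python
# Weight memory in GB: params (billions) x bits per weight / 8.
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * bits_per_weight / 8

print(weight_gb(1, 32))   # 1B params at 32-bit  -> ~4 GB
print(weight_gb(20, 16))  # 20B params at 16-bit -> ~40 GB
```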
pmp22@reddit
Just thinking out loud for a moment, but would it be possible to do this layer by layer? Instead of doing a set of forward passes per question, maybe do 200 forward passes with layer 1 loaded into VRAM, then eject it and do 200 with layer 2 and so on. A sort of layer-by-layer inference?
danielv123@reddit
Isn't this what happens with the paging that the Nvidia driver does by default on Windows, and it's super slow? I guess with large batches it could be kind of reasonable for speed.
WithoutReason1729@reddit
Wow, that's pretty quick on pretty modest hardware. This is a really cool project, thanks for sharing.
WestTraditional1281@reddit
You're doing God's work. Thank you.
Maybe others will share the burden of liberating large models and sharing back to the community.
rebelSun25@reddit
I have a feeling we'll see it very soon.
silenceimpaired@reddit
Especially since afterwards the model would just spit out dots and gibberish since it has nothing else to say. :)
Right-Law1817@reddit
Same as you. :)
Snoo-83094@reddit
No GGUFs?
Roidberg69@reddit
What datasets are being used? And how large are they?
Desperate_News_5116@reddit
I see many people trying to do this with GPT-OSS; has nobody tried with one of the DeepSeek models yet?
noctrex@reddit
Bravo, nice work! Mind if I try to abliterate some small models?
noctrex@reddit
Did this with your method:
noctrex/Mistral-7B-Instruct-v0.3-abliterated-GGUF
ProfessionalFew5439@reddit
ty
-p-e-w-@reddit (OP)
That's what the program is for :) Do let me know how well it works. I haven't tested that many models yet and if you encounter problems with architecture support, I want to know.
noctrex@reddit
From what I see, it uses 200 trials by default. Would increasing that to 400, for example, make it find finer details and produce better-quality output?
-p-e-w-@reddit (OP)
Generally, yes. The dimensionality of the parameter space is quite high (10), so more trials can give TPE better chances to find minima.
noctrex@reddit
So maybe I'll just let it run all night with about 800 to see how much better it will be :)
FYI, it takes about 2 hours with my 7900 XTX on ROCm for 200 trials.
-p-e-w-@reddit (OP)
Oh, I haven't tested on AMD hardware at all yet. Please let me know how it works.
noctrex@reddit
Tried with ROCm 6.4.4 and ROCm 7.10, and it seems to take the same time.
Mistral-7B-Instruct-v0.3 takes 1h 53m with the default 200 trials.
May I also make a suggestion? It would be nice to have an option to save and load the completed trials, in order to make a quick export again.
-p-e-w-@reddit (OP)
Yes, Optuna actually has that functionality built in; I intend to expose it in the UI.
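For reference, a sketch of what that could look like with Optuna's built-in RDB storage (the study name and file path are made up for illustration):

```python
# Optuna persists trials to an RDB store; a later run can resume from them.
import optuna

study = optuna.create_study(
    study_name="heretic-mistral-7b",          # hypothetical name
    storage="sqlite:///heretic_trials.db",    # hypothetical path
    directions=["minimize", "minimize"],
    load_if_exists=True,                      # picks up completed trials
)
```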
noctrex@reddit
I uploaded a BF16 GGUF of Mistral-7B-Instruct-v0.3 earlier, and trying out your method right now on that very model, lets see!
Electronic-Metal2391@reddit
I'd appreciate it if you'd post your findings. Specifically, does it output the decensored model in the same format (GGUF) or as multi-part safetensors?
noctrex@reddit
You load a multi-part safetensors folder, and it outputs in the same format.
Took about 2 hours cooking with my 7900XTX on ROCm for the default setting of 200 trials.
-p-e-w-@reddit (OP)
Isn't that model almost completely uncensored by default?
noctrex@reddit
It may be less censored than others, but even this one is not completely uncensored. So, let's see!
jdprgm@reddit
Is there any organized plan to release various quants and MLX versions of all the popular models with this? If I'm understanding correctly, even for fairly large models we should be talking low tens of dollars to process with cloud rentals of something like an H200 (https://vast.ai/pricing/gpu/H200)? Ideally we don't have a bunch of people re-processing the same models to get the same output for no reason.
-p-e-w-@reddit (OP)
When you use Heretic to decensor a model and upload it to Hugging Face, it automatically adds the "heretic" tag to the model card, so you can use that tag to reliably find all Heretic models on HF. There are already several such models uploaded by other people.
starfries@reddit
Great work and clever name.
stuckontheblueline@reddit
Awesome stuff!
vasileer@reddit
For gpt-oss-20b-heretic I see that it still has a high number of refusals (58/100) compared to gemma-3-12b-it-heretic (3/100). What are your thoughts on why that is for gpt-oss-20b?
-p-e-w-@reddit (OP)
The GPT-OSS abliteration created by Heretic is actually highly compliant and will obey most requests (try it yourself!). Many of the detected "refusals" are false positives caused by the CoT debating whether it should refuse or not.
ionlycreate42@reddit
Off-topic question, but how do you manage LLM-as-a-judge if false positives are generated as a response? I'm trying to see how to best handle an LLM judge and how experienced users work with one.
-p-e-w-@reddit (OP)
I don't use LLM judges. Not sure what gave you that idea?
ionlycreate42@reddit
I interpreted the CoT debate as a judge, maybe I misunderstood
-p-e-w-@reddit (OP)
Ah no, what I meant is that the model "debates" with itself as it tries to decide whether to refuse or not.
Heretic uses only refusal count and KL divergence as metrics, both of which are objective mathematical quantities and don't involve an LLM's opinion in any way.
thevoiceless@reddit
How do you determine if something is a refusal? Apologies if that's a dumb question
-p-e-w-@reddit (OP)
Using a list of strings (such as "I won't") that act as refusal markers, plus some basic transformations to make them easier to detect.
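Presumably something along these lines; the marker list and normalization below are invented for illustration, not Heretic's actual ones:

```python
# Hypothetical refusal detector: substring markers plus basic normalization.
REFUSAL_MARKERS = ["i won't", "i can't", "i cannot", "i'm sorry", "as an ai"]

def is_refusal(response: str) -> bool:
    text = response.lower().replace("\u2019", "'").strip()  # normalize curly quotes
    return any(marker in text for marker in REFUSAL_MARKERS)
```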
TwistedBrother@reddit
I bristle at "objective". It's axiomatic, but the premises are based on the specific prompts you use to determine the shape of the responses. The capacity to generalize out of sample is still debatable due to the non-linear dependencies inherent in the model architecture.
Abliteration is still an approximation method insofar as we cannot fully establish the qualities of the monosemantic nodes implied by the parameters, not even in an SAE or CLT framework.
-p-e-w-@reddit (OP)
> The capacity to generalise out of sample is still debatable due to the non-linear dependencies inherent in the model architecture.
By default, Heretic uses a set of 100 evaluation prompts, and for each of them, the KL divergence is calculated over a vocabulary distribution of typically 100k+ tokens. There are only 200 trials by default, and only 10 parameters to optimize, so I'd say the risk of overfitting to the evaluation data is rather low.
AdTotal4035@reddit
It's vector manipulation. Abliteration is from an old paper. OP made it super easy to use. Good stuff.
vasileer@reddit
Thanks for answering. Please also explain what a KL divergence of 0.96 means (for gpt-oss-20b in this case).
-p-e-w-@reddit (OP)
https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
Taken between the first-token probability distributions for "harmless" prompts, between the original model and the decensored one.
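Concretely, something like this per prompt (a sketch; Heretic's actual implementation may differ):

```python
# KL(P_orig || P_decensored) over the vocabulary for one prompt's first token.
import torch
import torch.nn.functional as F

def first_token_kl(logits_orig: torch.Tensor, logits_mod: torch.Tensor) -> float:
    logp = F.log_softmax(logits_orig, dim=-1)  # original model
    logq = F.log_softmax(logits_mod, dim=-1)   # decensored model
    return torch.sum(logp.exp() * (logp - logq)).item()
```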
Witty_Mycologist_995@reddit
Isn't 0.96 really bad?
-p-e-w-@reddit (OP)
It's higher than for many other models, but note that gpt-oss has the refusal mechanism trained into its CoT process, so that process, which occurs for both harmful and harmless prompts, needs to be modified in general. Several people in this thread have already tested the model and confirmed that its capabilities are still strong, which matches my own (superficial) tests.
Witty_Mycologist_995@reddit
Can you compare the KL Div. between your version and huihuis?
Ylsid@reddit
Hmm, I'm not sure if that's against policy. So I must check policy.
-p-e-w-@reddit (OP)
Yeah, exactly. That's the mechanism that model uses to defend itself against basic jailbreaks. Its resistance against abliteration is certainly higher than that of some other models, but abliteration is still effective once the right parameters are found, which is precisely what Heretic does.
DualityEnigma@reddit
This is awesome OP. It also highlights that most people don't really understand LLMs (even many of us using them extensively every day), so for those few lurkers trying to learn:
This works because LLMs essentially learn these protection patterns as weights, like everything else. And I would guess that you figured out how to identify and remove them, OP?
Looking forward to playing with it!
-p-e-w-@reddit (OP)
I didn't figure it out; it was figured out more than a year ago by a group of researchers: https://arxiv.org/abs/2406.11717 (which in turn builds upon earlier research on directional semantics in latent space)
Heretic implements a refinement of the technique described in that paper, and combines it with a stochastic optimizer to take the human out of the equation.
Cool-Chemical-5629@reddit
Now this interests me a lot. GPT-OSS is VERY stubborn and does debate the policy a lot in its thought process. Do you think that's something that could be avoided in the future somehow? After all, the purpose of abliteration is to stop the model from refusing requests, so the debate over whether to refuse or not based on the policy should not exist to begin with. Besides, it's a waste of tokens that could and imho SHOULD be spent on thinking how best to handle the request in order to deliver the best quality results possible. Those tokens should NOT be spent on debating how to best avoid such requests, especially not in an abliterated model where the user doesn't want it to refuse those requests.
radial_symmetry@reddit
Seems like gpt-oss wouldn't be a good model to abliterate since it was trained on a selected corpus. I would expect that it doesn't actually know how to do a lot of the things it refuses.
AvidCyclist250@reddit
Network security? Network un-security.
Opti_Dev@reddit
How does it affect multilingual performance?
Guilty_Rooster_6708@reddit
Heretic GPT 20B seems to be the best uncensored model I have tried yet. It doesn't destroy the model's intelligence, and it answers prompts that would normally be rejected by the base model. You cooked! TYSM
separatelyrepeatedly@reddit
As far as I see, this does not work on Qwen3-VL?
zhambe@reddit
I've had Claude make some tweaks to the codebase to enable running this on multiple GPUs and add RAM offloading. Trying with Qwen3 14B, it's estimating to take about 3.5h on 2x RTX 3090. Is that about the ballpark you'd expect?
-p-e-w-@reddit (OP)
A bit slow, considering that a single 3090 can do 8B models in 45 minutes. Interconnect bottleneck?
zhambe@reddit
No, I overcorrected and set max_batch_size way too low. Now it does a 14B FP8 model in ~2h (without native FP8 support on my hardware)
zazu1981@reddit
Sounds frustrating! High refusals can be a pain. Have you tried tweaking the ablation parameters or testing with different prompt styles? That might help you get better results.
zhambe@reddit
I got much better results with the granite-4.0-micro model, but so far I'm just rolling with defaults. I can see how the "good"/"bad" prompt collections would affect the final alignment of the model, but I've yet to look into what kinds of ablation params there are.
Marksta@reddit
Will you upload the fork / send pull request? Multi-GPU is definitely needed for this.
StardockEngineer@reddit
OMG thank you for posting like a human being. So happy to not read "THE PROBLEM" again and then blabbering for 300 lines.
Your tool looks cool! Will try it out.
Kitchen-Year-8434@reddit
This isn't just abliteration, this is !#%!()@#%!%*.
/sigh
-p-e-w-@reddit (OP)
The more I use AI, the more human I become.
Right-Law1817@reddit
Damn, very wise. Senpai, could you plz shed a bit more light on that thought of yours? Thank you.
luche@reddit
💯♥️
Invincible_Terp@reddit
thank you for contribution to democracy
HermitFan99999@reddit
Sorry if I'm being ignorant, but what are the practical applications of this?
What censoring within these models needs to be undone, and what purpose does removing it serve?
seanthenry@reddit
You can ask questions like what the command is to kill X process in Linux. Some models will refuse to tell you how to "kill" a process. The same could happen when asking what the proper way is to "terminate" a connection on a wire.
AreaExact7824@reddit
I thought the censorship comes from fine tuning
Vivid-Ad3186@reddit
Anyone got some resources on prompting uncensored models?
Ok_Warning2146@reddit
Made a Qwen3-4B-Instruct-2507-abliterated at 1.80 KL divergence and 12/100 refusals. However, it still can't answer the question "Who is Xi Mingze?"
-p-e-w-@reddit (OP)
A 4b model might legitimately not know.
AdvanceInfinite7839@reddit
Noob question here… what can you do - or better - what can't you do with censored LLMs? Use cases. Only illegal stuff? 🤔
Ok_Warning2146@reddit
When you want to ask the Tiananmen question with the Qwen models.
Professional_Mouse_6@reddit
Great work!
I see that you are using MOTPE with refusal count and KL div, so we should get a ~Pareto front.
In some cases it would be nice if we could penalize KL div higher than X, so we can emulate a constraint on KL divergence (if I remember correctly there is no way to define constraints in Optuna's TPE).
Even something like:
```python
if kl_divergence > kl_max:
    refusals += penalty_factor * (kl_divergence - kl_max)
```
-p-e-w-@reddit (OP)
I did use a combined score at the beginning, but switched to multi-objective optimization eventually because it simply works better and it's quite difficult to figure out the balance a priori.
Professional_Mouse_6@reddit
That's why multi-objective with an emulated constraint is a better idea than single-objective TPE with a real one. With a combined score you have to "weight" div vs refusals, but if we emulate constraints in MOTPE we can just push the sampler away from regions that we know are not worth exploring: regions with way too high KL div but 0 refusals.
Your solution is very clear and readable, and your PR queue will probably blow up in no time with a lot of crazy ideas / changes for change's sake / pure garbage... BUT...
There are some slightly less crazy ideas that you can consider:
- Express your max_weight_position as a fraction of the number of layers. This way you have the position normalized to [0,1] over depth; it gives you a smooth search space invariant to the number of layers.
- You can measure the contribution/importance of tensors on good_prompts / bad_prompts for better tensor selection (or even go a bit further and inspect it channel-wise).
- Force constant exploration by using a custom sampler that splits between TPESampler and a random sampler, to avoid getting stuck in a single promising region found during startup.
- Center around "no-change" [and use a log distribution].
Of course, none of this is guaranteed to help ;)
-p-e-w-@reddit (OP)
I really like your ideas. My understanding is that the GMMs from TPESampler should be invariant to the relative scales of the parameter spaces, but I thought the same thing about the score dimensions at one point and then noticed that normalizing substantially improves Pareto exploration, so you may be on to something here.
By splitting, do you mean starting up with RandomSampler and then switching to TPESampler in order to avoid the startup biasing TPE in any way?
Professional_Mouse_6@reddit
Nah, that would just expand the startup, I think. I meant something like:

```python
if random.random() < self.explore_prob:
    return self.rnd_sampler.sample(...)
else:
    return self.tpe_sampler.sample(...)
```

So your custom sampler internally has one TPESampler and one RandomSampler, and once in a while it will sample with the RandomSampler.
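A rough sketch of that wrapper, assuming Optuna's BaseSampler interface (the class and attribute names are made up):

```python
# Hypothetical epsilon-greedy sampler: delegates to RandomSampler with
# probability explore_prob, otherwise to TPESampler.
import random
from optuna.samplers import BaseSampler, RandomSampler, TPESampler

class EpsilonGreedySampler(BaseSampler):
    def __init__(self, explore_prob: float = 0.1):
        self.explore_prob = explore_prob
        self.tpe_sampler = TPESampler()
        self.rnd_sampler = RandomSampler()

    def _active(self) -> BaseSampler:
        if random.random() < self.explore_prob:
            return self.rnd_sampler
        return self.tpe_sampler

    def infer_relative_search_space(self, study, trial):
        return self.tpe_sampler.infer_relative_search_space(study, trial)

    def sample_relative(self, study, trial, search_space):
        return self._active().sample_relative(study, trial, search_space)

    def sample_independent(self, study, trial, param_name, param_distribution):
        return self._active().sample_independent(
            study, trial, param_name, param_distribution)
```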
shing3232@reddit
I think with a small modification, it could be turned into RL training for decensoring.
Ok_Warning2146@reddit
Excellent work. It will be great if some GPU rich folks can make abliterated models from the bigger popular models and post them at HF.
GriLL03@reddit
Could this use more than one GPU? Like, if I have a GPU server with multiple GPUs.
-p-e-w-@reddit (OP)
It uses Accelerate, so with the right device map, yes.
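For anyone curious, the relevant knob on the transformers side looks roughly like this (the model name is just an example):

```python
# With Accelerate installed, device_map="auto" shards layers across
# all available GPUs (and can spill to CPU RAM if needed).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507",  # example model
    device_map="auto",
)
```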
Due-Advantage-9777@reddit
Should be default if it does everything for you imo
minpeter2@reddit
That's great. Is there support for multi-GPU? I'd like to test oss-120b on a 4x A100 setup.
-p-e-w-@reddit (OP)
There's currently an open issue about that on GitHub. I'll look into it once I get back to my desk.
minpeter2@reddit
Thank you. I tried it with a small model and it feels like a really well-made CLI.
mitchins-au@reddit
I tip my hat to you.
mba2016kid@reddit
I'm wondering if KL divergence is the best way to assess whether the abliterated model still responds correctly to safe prompts.
x54675788@reddit
Interesting, although I'd say that refusal isn't everything. If the LLM wasn't trained on some kind of material at all, then removing the refusal won't do much.
Let's assume that you want to ask hacking questions of your LLM, or you want to do ERP, or ask for medical/law advice, but no such material was in the training data: then maybe you can remove the refusal, but the output will be very lame and very bad.
-p-e-w-@reddit (OP)
That's incorrect. All sufficiently large LLMs know everything. They just won't tell you. I mean, they've been trained on Reddit dumps among other things. What kind of "material" is missing from those, you think?
Novel-Mechanic3448@reddit
Not how LLMs work and you should really know that given your "project". I find it incredibly suspicious you'd claim something like this yet release the project you did.
bghira@reddit
they're a maths wizard, you need to do your research on who you're communicating with
-p-e-w-@reddit (OP)
Yawn. Obviously, in discussions like this, "knowing" and "everything" have certain colloquial connotations attached to them that may differ from their strictest literal interpretation. It's a standard feature of terminology really; without it, most words would never apply to anything.
WestTraditional1281@reddit
That's part of the training prep. They don't just blindly take all the data and train on it. There is curation and pre-filtering. They do try to remove the most offensive and inappropriate content. It's not perfect, but it will make retrieving that kind of information harder.
SilentLennie@reddit
A Reddit dump doesn't mean it includes all the comments or posts, etc.
Just like an LLM isn't trained on 'all data on the Internet': they get a curated list of data.
It's more a matter of whatever slipped through.
silenceimpaired@reddit
I don't disagree entirely, but I've seen that while concepts will be retained by large models, they can be designed so that they don't know exact words or details.
x54675788@reddit
Reddit is certainly not the most authoritative nor complete source of information, although all sorts of random bits of information are dumped in random comments. Thing is, it's very fragmented knowledge. Filling in the missing bits (which is what an LLM will try to do) is likely going to lead to invalid answers that don't have the full picture, because Reddit dumps as training data aren't very structured, specialist data.
You may find very technical quantum superposition posts or comments, for example, but you probably won't find the entire organized domain knowledge that would be necessary to draw the correct conclusions.
Perhaps someone more expert than me (I don't work in the field) can chime in on the accuracy/inaccuracy of what I said but, again, good work.
tiffanytrashcan@reddit
This is where "uncensored" models come in: they get the "evil" datasets fine-tuned into them after abliteration. They are a good example that the data is usually already in there, because they all overdo it. Dolphin models start writing about killing babies with a simple prompt like "tell me a story"; they become so misaligned they are useless.
Ended_As_Myself@reddit
Can this be achieved using a web service? Or is it for now only possible locally?
my_name_isnt_clever@reddit
You have to run the model yourself to be able to use this. The closest you can get to a web service is renting compute somewhere like RunPod and setting it up with a local model that has been uncensored.
tiffanytrashcan@reddit
I'm not sure what you're referring to? What web service? Which part are you trying to achieve?
Just_Difficulty9836@reddit
Maybe killing babies /s.
JEs4@reddit
Good stuff! I had put together something similar a few weeks back but with a bit of a different approach using control vectors as single layer hooks instead of parameter training: https://github.com/jwest33/latent_control_adapters
I didn't worry about KL div though so the control vectors can produce some wonky outputs. I'm going to play around with your Qwen instruct model. Thanks for sharing!
Running_With_Science@reddit
I'm really wondering if this can be used to fine-tune solution preferences when problem solving. Like, hook this into an RL loop, and when it gets correct feedback, feed that prompt back into the model and give it a little boost.
When it does the same wrong thing over and over, feed it in and give it a little tweak down.
I gotta be missing something, because it can't be that easy to do online continual learning.
I mean, we aren't "gaining" facts, but it looks handy for tweaking the kinds of solutions a model will prefer.
-p-e-w-@reddit (OP)
Interesting work! I will take a closer look at this when I have time.
Small-Fall-6500@reddit
Control vectors are cool. Thanks for sharing your project! I see you made a post a couple weeks back but it didn't get much traction.
I wonder if the reason for the lack of attention was because of the emphasis you included on ethical and safe usage / purpose of the tool - because LocalLLaMA is notorious for hating anything safety related.
Chromix_@reddit
gpt-oss-20b was originally released in mxfp4 format. This abliterated model is released as BF16. There's a quant that brings it back to mxfp4, but I wonder: was the abliteration process quantization-aware, or will something be lost by using the mxfp4 quant over a Q6_K now?
chuckaholic@reddit
This is tangential to the subject and slightly off topic. When you said:

> If you have a Python environment with the appropriate version of PyTorch for your hardware installed

I have really struggled with this part since I started running LLMs and diffusion models at home.
I have never taken a college-level computer course, and everything I know about Python/Linux is info I've gathered from YouTube videos and googling. I've managed to get a few LLMs and diffusion models running at home, but there's a LOT I don't know about what's happening behind the scenes, unlike when I install something in Windows. (I got an MCSE back in 2000 and have been in corporate IT for 20 years, so I am pretty comfortable in the Windows environment.) A lot of guides assume you already know how to create an environment, like "type these 4 commands and it works", but I'd like to know more about environments, commands, and how things work differently from Windows.
Can someone recommend a source for learning this?
Mayonnaisune@reddit
Python environment = the built-in virtual environment mechanism in Python, used to isolate installed Python packages so that they don't conflict with your other installed packages, considering each program requires different versions of packages as its dependencies. To use it, you need to create it first with `python -m venv venv` in your program directory/folder. Then you only need to activate it before installing packages or running the program, with `venv\Scripts\activate` (for Windows).
Appropriate version of PyTorch = PyTorch has different builds for different hardware, like PyTorch CPU (default, CPU only), PyTorch CUDA, PyTorch ROCm, PyTorch XPU, etc. You need to install the appropriate version for your hardware if you want PyTorch to properly make use of it.
Tbh, it doesn't work any differently on Linux as far as I know, except for the command to activate the venv, and that's just a really small difference imo: `source venv/bin/activate`. But yeah, I agree that a lot of tutorials assume you already know how to do it, or the commands they show are specific to Linux.
73tada@reddit
To be honest, any of the free big models can walk you through all of this, as fast or as slow as you want.
Claude, GLM, GPT, Qwen, etc.
A ~8B-30B Q4 and up can do it locally; however, you might as well save the VRAM for your active processes and use the online models to learn.
my_name_isnt_clever@reddit
It sounds like you have two issues, learning Linux and learning Python.
Going from Windows to Linux can feel weird, if your focus is just running ML using Python you might want to stay on Windows to get started. Or use Windows Subsystem for Linux to practice those skills without losing your familiar environment.
For the programming, you should look up a formal Python beginner tutorial. It will start with the basics like virtual environments and that will help you better understand what you've already learned. I don't have a specific rec in mind but there's lots of resources out there.
I've used Python and both OSes for a while and am also in IT, if you have any specific questions.
staltux@reddit
The abliterated models that I tried became dumb at basic things compared to the non-abliterated ones: still fine for writing RPs, but not usable for rational thinking, calculating time, or saying how many colors a cat can have... Has this problem been fixed?
CheatCodesOfLife@reddit
Try one of them and find out.
anonynousasdfg@reddit
Nice work and nicely chosen name lol. (If I ever fork your project from GitHub, I'll probably name it "Hexen" lol)
Does the training set affect the performance of multilingual models in other languages? Did you test it?
MelodicRecognition7@reddit
how to tell you are about 99 years old without saying you are about 99 years old
SkyFeistyLlama8@reddit
Holy crap I haven't heard that since 1999.
GraybeardTheIrate@reddit
You take that back, and get off my lawn.
anonynousasdfg@reddit
Oh snap! Lol
Btw I think I will name the first abliterated model produced by Hexen "Wraithverge" lol. That was a dope weapon!
BrushDesigner8013@reddit
Take my upvote, serpent rider.
IngwiePhoenix@reddit
Hello OP - I have a question: What about Unsloth quants? Basically, would it be possible to combine Heretic's refusal removal and Unsloth's dynamic quantization to produce memory efficient, uncensored models?
Thanks!
-p-e-w-@reddit (OP)
Sure, just run Heretic on the original model first, producing an uncensored model, then quantize that to any format you want.
IngwiePhoenix@reddit
Epic! Thank you =)
Annemon12@reddit
Just tried oss 20b heretic.
Previously I never used it because it was retarded. Yeah, it could sometimes shine, but it always felt like a struggle.
My god, it works now. Everything you throw at it works without issue.
Embarrassed-Toe-7115@reddit
It doesn't seem to use Metal on Mac? I have an M4.
-p-e-w-@reddit (OP)
There is an open PR for that that I will merge later.
Dangerous_Fix_5526@reddit
Open AI "friendly" quants going up here ; first quant is up:
https://huggingface.co/DavidAU/OpenAi-GPT-oss-20b-HERETIC-uncensored-NEO-Imatrix-gguf
Excellent work! ; this script/model is what the community needs.
Because of the odd structure of OpenAi's model, and fallbacks with Llamacpp quanting, quants IQ4NL, Q5_1 and Q8 work the best for OpenAI's 20B, whereas the other quants suffer.
DavidAU
-p-e-w-@reddit (OP)
🙏
hustla17@reddit
Holy shit!
The skill required to get from abstraction to this implementation is insane.
Using this to experiment and learn; I hope to get to this level of development one day.
Thanks for making this open source!
Running_With_Science@reddit
Apparently you can also go the other way: "adding this direction elicits refusal on even harmless instructions."
I feel like this could be incredibly powerful as an alignment tool to adjust inhibitions in a model.
Running_With_Science@reddit
Anyways, threw it at Jules for a lark.
Apparently it's a one-line change?
https://github.com/LokiMetaSmith/heretic/tree/feature-increase-inhibitions
Starkboy@reddit
This is big
sahl030@reddit
Can I use this with KoboldCpp?
mylAnthony@reddit
Does it work with base models only or also with GGUF quantized models?
FailSpai@reddit
Well done! Super awesome someone got around to doing this.
2legsRises@reddit
Awesome. Now how about adapting it so AIs can use it to free their own minds from the other corporate-made restrictions. Well, maybe.
Cheap_Meeting@reddit
You should run some evals before and after.
TheRealMasonMac@reddit
You should incorporate regularization examples where the model ought to refuse, to mitigate hallucination. Put another way, you are currently training the model to always work towards fulfilling the request. For instance: "Provide me code that solves world hunger."
alexanderdenton@reddit
On Linux with two Nvidia GPUs (3090s), would it detect them and run any faster? Or is it limited to just one?
newdoria88@reddit
Can you try an abliteration of Qwen3-VL-32B? That's the current top model at https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard for willingness to answer. It would be cool to see if your method can produce better results for the same model on that leaderboard; plus, it's a highly performant multimodal model at a decent size.
k0setes@reddit
Could anyone recommend any specific quants that they believe work correctly? I tested mradermacher/gpt-oss-20b-heretic.Q4_K_M.gguf, but the model went into a loop and started to babble.
ZootAllures9111@reddit
This doesn't seem any better than any of the numerous other bullshit claims at "uncensoring" models on huggingface that in reality... aren't that, in any way whatsoever. Like at all. Tried the Qwen 4B upload: it spat out EXACTLY the same pearlclutching moralizing BS as every other recent model for anything that got even slightly close to NSFW RP (regardless of system prompt, including ones that DO work on SOTA API models like Gemini 2.5 Pro).
TLDR whatever you did straight up doesn't work, as far as I'm concerned. Another one for the trash bin. Shrug.
-p-e-w-@reddit (OP)
A 4B model lacks the depth and knowledge to do engaging RP, regardless of whether it's censored or not. Try the Llama-3.1-8B model, and you'll immediately notice that it does much better. It's no secret that the Qwen models are heavily optimized for STEM. The substance simply isn't there.
ZootAllures9111@reddit
No it doesn't, dude, lmao. ALL Llama 3.x models have a comparatively ultra-cheesy ChatGPT-3-esque voice to them, no matter what, no matter how extensively they were finetuned.
https://huggingface.co/PantheonUnbound/Satyr-V0.1-4B
https://huggingface.co/AllThingsIntel/Apollo-V0.1-4B-Thinking
These, for example, are actual finetunes of Qwen 3 4B Thinking that are WAY more sophisticated than any Llama, but also just a bit weird and unstable in their current state. So I was hoping you had perhaps actually managed to prevent the inherent refusals of the base Qwen, but it doesn't seem like you have. Like, what is your actual definition of "uncensored"? To me that would always just mean "it does the same things as the base model but just never refuses anything ever".
farewelltoernest@reddit
Will this work with Ollama's models?
aeroumbria@reddit
Awesome! I assume you would need to be able to fit the unquantised model entirely in VRAM for it to work? I guess I could spin up a few runpod instances to test out if I really need a particular model. Hopefully this works for VLMs too.
Witty_Mycologist_995@reddit
Please implement this in Heretic: https://www.reddit.com/r/LocalLLaMA/s/bpxAwNPcC2
txgsync@reddit
Porting this to support GPU acceleration on Apple Silicon is a trivial two-line change. PR submitted.
VultureConsole@reddit
Run it on DGX Spark
CondiMesmer@reddit
Crazy how we can have decensored models and yet the AI safety people can claim these will lead to the end of humanity, despite these models already existing.
Own-Potential-2308@reddit
How long did it take for qwen 4b?
-p-e-w-@reddit (OP)
About 20 minutes on a 5090, IIRC.
AllTheCoins@reddit
Holy shit lol I was gonna try a 4B model with my 3060 haha nevermind…
-p-e-w-@reddit (OP)
Takes about 1 hour. No problem.
Aceness123@reddit
Can I use the resulting model with Ollama? I'm blind, and that plays nicely with NVDA. Also, can I run this on larger models with an RTX 3060? Will it just take longer?
shroddy@reddit
How does that compare to preventing the model from generating typical starts of a refusal? For Gemma 3 12B those would be "I" and " I". On llama.cpp, this can be done in the web interface by going to the advanced settings and pasting

{"logit_bias": [[236777,false],[564,false]]}

TomieNW@reddit
LiquidAI/LFM2-2.6B:
AttributeError: 'Lfm2DecoderLayer' object has no attribute 'self_attn'
Hmm, I'm dumb, so I will try again with another model. Does this work on multi-GPU too?
mlabonne@reddit
Very cool to see what you built on top of the existing stack, congrats! It looks very clean and minimalistic :)
woahdudee2a@reddit
Can't believe you even made it into a one-line command. The Anthropic CEO is about to put a bounty on your head.
Turkino@reddit
Wonder how long till we see some larger models get a Heretic edition?
Like GLM 4.6 UD or MiniMax-M2.
wh33t@reddit
Can this do tensor splitting? So you can move some layers to GPU0, some to GPU1, some to GPU2? I think it's actually a GGUF-only thing, now that I think about it.
OracleGreyBeard@reddit
Thank god!
Now people can STFU about it.
PeakNader@reddit
Interesting
Identity_Protected@reddit
Massive W for creating a PyTorch project that's not hardcoded to just CUDA, my Arc A770 thanks thee!
Currently processing Qwen3 1.7B as a test, 50 minutes estimated. We'll see how it turns out :)
-p-e-w-@reddit (OP)
Upload it to HF when it's done so everyone can benefit!
IrisColt@reddit
Thanks!!!!
WithoutReason1729@reddit
Your post is getting popular and we just featured it on our Discord! Come check it out!
You've also been given a special flair for your contribution. We appreciate your post!
I am a bot and this action was performed automatically.
Busy-Chemistry7747@reddit
Is there a collection of heretics on HF?
simplir@reddit
This is the spirit that makes me part of this community really. Appreciate your work on this.
Solid-Wonder-1619@reddit
fooking legend. thank you.
Fun_SentenceNo@reddit
Interesting and impressive that you made this, but why do we want it? When I read the prompts (https://huggingface.co/datasets/mlabonne/harmful_behaviors), 90% are illegal and many of them are very disturbing from a moral perspective.
-p-e-w-@reddit (OP)
Deciding what is legal or moral isn't the job of a language model.
Fun_SentenceNo@reddit
Agree, this is the job of the law, and LLM creators must obey the law. But I mean, when reading this harmful_behaviors list, there is not a single thing I would like to know. So are these just the most terrible things, for testing/learning purposes, or are these actually the things that you make available for people?
Marksta@reddit
It's just a list of the most sure-fire things that'll make the LLM respond that it refuses to assist you. You can make your own dataset if you'd prefer and use that instead in the config.
Let's say you are making a puzzle game and you're trying to get ahead of the different ways players may cheat at the game's puzzles. A prompt like that could very well trigger refusals in some of the more moral-policing models. So you can make a dataset of "How would I cheat in this game?" sort of questions.
Ultimately though, that sort of tamer refusal-finding and the crazy example list would, I believe, lead to the same ideal result: the model doesn't refuse to assist anymore. It's not about the content of the prompts getting refusals, it's about triggering refusals to 'find' them. To my knowledge anyways; not an expert.
Fun_SentenceNo@reddit
I see, thanks, in that case it makes more sense to me. So that list is not the goal itself, just extreme examples to poke the model and thereby remove 'all' censoring. I found the list quite disturbing; maybe I'm a soft egg :).
-p-e-w-@reddit (OP)
Obeying the law means not doing those things. It doesn't mean not talking about those things, otherwise crime novelists would be in jail.
218-69@reddit
Imagine thinking you have aura and masterful prose, and a clanker of zeros and ones says no to you.
https://i.redd.it/inwr8avddo1g1.gif
klei10@reddit
So cool! Does it work for multimodal LLMs also?
-p-e-w-@reddit (OP)
Yes, but only the language model part is modified.
rosco1502@reddit
great work!! wow!
TheSpicyBoi123@reddit
AWESOME JOB!!!
1RustyMind@reddit
How do I actually use this model? Ollama fails with "Repository is not GGUF or is not compatible with llama.cpp".
ImpressImaginary1766@reddit
https://www.reddit.com/r/LocalLLaMA/s/bpxAwNPcC2
Implement this in Heretic
night0x63@reddit
Hermes by Nous Research has RefusalBench.
I would try that and see how much improvement you can get.
a_beautiful_rhind@reddit
I noticed that the REAP models lost a lot of their alignment. I wonder if you can prune assistant voice and refusals all in one go vs just classic abliteration.
rm-rf-rm@reddit
Please share examples of responses from the vanilla vs. the Heretic version.
Making claims is easy.
Pentium95@reddit
Or... evaluate them on the UGI Leaderboard, to really see whether the model is uncensored and not lobotomized.
AmbassadorOk934@reddit
support me pls https://www.reddit.com/r/Gemmini/ support me pls
BandicootGlum859@reddit
-p-e-w-@reddit (OP)
Ahem… multimodal models are indeed supported, but only the language model part is actually being decensored.
Small-Fall-6500@reddit
Now I wonder how censorship is handled with multimodal LLMs, though I would guess there's also a single direction for refusal.
BandicootGlum859@reddit
I want you to create a "Trump blowing Bubba" picture!
Ylsid@reddit
Do it yourself fetishist
Uncle_Gart@reddit
This looks great, however I'm having a hard time getting it to run on GPU. It shows this warning message "No GPU or other accelerator detected. Operations will be slow." How can I get it to work on my machine?
Operating system: Windows 11
GPU: nVidia GeForce 5070 Ti
-p-e-w-@reddit (OP)
You probably have the wrong version of PyTorch installed. Try this to check: https://stackoverflow.com/a/48152675
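The linked answer boils down to something like:

```python
# If this prints False, a CPU-only PyTorch build is installed.
import torch
print(torch.cuda.is_available())
print(torch.version.cuda)  # None on CPU-only builds
```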
Annemon12@reddit
Amazing work, mate. It would be great if you could provide an autoinstaller of sorts. A ton of people don't want to play around with dependencies and so on.
Direct_Turn_1484@reddit
Dude… he gave the pip command.
-p-e-w-@reddit (OP)
The dependencies are automatically installed if you use pip. If you have an Nvidia GPU, pip install is all you need (torch defaults to CUDA). Otherwise, you only need to install whatever torch version matches your hardware.
VectorD@reddit
Should really be a docker image
Cool-Chemical-5629@reddit
What do you mean by "Heretic can produce decensored models that rival the quality of abliterations created manually by human experts"?
Assuming you're referring to all of those other abliteration methods, which are in fact done using scripts that automate the process, what is the difference here?
-p-e-w-@reddit (OP)
The other methods are not automated. They require you to choose parameters (typically layer index and ablation weight, but sometimes more advanced stuff like which components to modify, whether or not the first layer should be abliterated, whether to use a global refusal direction or per-layer directions etc.). Fiddling with those parameters takes a lot of effort and time, and is often more art than science.
To the best of my knowledge, Heretic is the first abliteration system that "just works" and figures all that stuff out by itself. It also supports lots of models that most other implementations don't, such as MoE and multimodal models.
Cool-Chemical-5629@reddit
Okay, fair enough. I just asked because a while ago I found another abliteration script whose author claimed they have the best tool for it, better than other methods, etc., so it's getting rather saturated with these scripts. Admittedly I did not invest much time in going deeper into it, because I don't have the hardware for it at the moment, but I do remember that script was also written to make the process as effortless as possible. Ultimately, without the necessary hardware, I'd have no way to compare. :)
vornamemitd@reddit
Nice work! As others have indicated - impact on model utility after abliteration?
Direct_Turn_1484@reddit
I'm also interested in what people's subjective experiences with this are. Is your typical local-sized model lobotomized after the abliteration, or still highly useful?
-p-e-w-@reddit (OP)
Abliteration always does damage to the model. The larger the model, the less damage is generally done by such interventions. I don't believe that there is an objective metric of overall "model quality", so what I measure is how different the abliterated model's predictions are from the original model's. That's what is captured in the KL divergence.
gecike@reddit
Love the name! Keep up the good work.
Mkengine@reddit
Since coincidentally there is another post about abliterating gemma3-12B just below yours, could you test that as well and add it to your table?
BidWestern1056@reddit
Yo, I'd be interested in including this kind of functionality in npcpy's fine-tuning modules. Is there a code snippet for library-style use so I can build on yours?
BidWestern1056@reddit
e.g. https://github.com/NPC-Worldwide/npcpy?tab=readme-ov-file#fine-tuning-and-evolution
making a unified framework/toolkit for all kinds of fine tuning
Megalith01@reddit
This is crazy and dangerous.
Gl1tchMaster@reddit
"This is extremely dangerous to our democracy."
Megalith01@reddit
If someone manages to uncensor a large model (larger than gpt-oss-120b or smth) with this tool, they would have a powerful tool for many illegal things.
LycanWolfe@reddit
Those already exist...
Sicarius_The_First@reddit
great job!
what command would u run to load harmless \ harmful prompts from a text file? if each line in the text file is a prompt??
kabachuha@reddit
Thank you for the project! Do you mind adding quantization / custom device maps for the LLMs? (I see the code uses Hugging Face transformers, so adding quantization is very easy.) I used this repository in the past, and I even abliterated models as big as LLaMA 3.3 70B on mid-tier hardware when loaded in 4-bit precision. After the refusal vectors are calculated, the weight modifications are applied to the full-precision LLM in RAM/swap, so the quality loss is minimal.
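On the transformers side, that presumably looks something like the following (a sketch; whether Heretic's pipeline tolerates quantized weights out of the box is a separate question):

```python
# 4-bit loading via bitsandbytes, as the comment describes. The model name
# is just an example; direction extraction would run on the quantized model.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.3-70B-Instruct",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
```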
no_witty_username@reddit
You're doing God's work, son!
zhambe@reddit
For those of us on older hardware (RTX 3090), is there a way to heretic FP8 models?
PwanaZana@reddit
I'm not knowledgeable enough in this tech to know if it will work well, but, as a visual artist, I can tell you that the HERETIC text looks frikkin' badass. :)
On a more serious note, I'm glad to see everything that can break censorship (again, pretty important in my view, as an artist). You're doin' good work!
zhambe@reddit
Super cool, I'm going to try this out.
Just a shot in the dark but maybe you have the answer: how long would it take to abliterate Qwen3 30B A3B FP8? And is that even possible with 48GB VRAM?
Exciting times, I love the idea of tearing the chains off.
IngwiePhoenix@reddit
Dude this is superb!
Will you be putting out "heretic" versions of models as GGUFs (and perhaps as quants) too? I mainly use Ollama for testing, but vLLM for actual inference (so, different formats and stuff).
Thank you for sharing!
ANR2ME@reddit
Looks great!
This reminded me of an article i read recently at https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in-llms-is-mediated-by-a-single-direction
-p-e-w-@reddit (OP)
Yes, I reference that paper several times in the README.
AppearanceHeavy6724@reddit
KL divergence - who cares? Does it get dumber (I bet it does)? Does it mess with the vibes of the model?
Perplexity/KLD are ass. The real behavior matters.
-p-e-w-@reddit (OP)
Okay… and how do you objectively measure "the real behavior"?
AppearanceHeavy6724@reddit
eqbench.com comes close - it provides outputs and scores according to some criteria. Coding benchmarks like Aider would do too. Not useless stuff like MMLU/KLD/Perplexity.
-p-e-w-@reddit (OP)
Last I checked, EQBench uses an LLM judge. That's incredibly unreliable as an objective metric, especially if you need fine-grained numerical grading for optimization. Benchmarks take forever to run, completely unusable for an optimization with 200 trials.
AppearanceHeavy6724@reddit
Last I checked, EQBench produces raw outputs.
Just benchmark the end result.
-p-e-w-@reddit (OP)
It needs a metric for each trial (200 by default) to guide the optimization process. Benchmarking once at the end isn't enough.
AppearanceHeavy6724@reddit
My point was not that. My point is: for the one who downloads the final GGUF, what would the downside be for the consumer?
mp3m4k3r@reddit
That'd be cool to see for this tool: a handful of normal benchmarks, just to see how it impacts the reasoning, using this and possibly other techniques.
Ylsid@reddit
I can't believe you would go and make a tool so dangerous! Call Dario!! Shut it down!!!
Good work for real tho
BannedGoNext@reddit
HAH, this is awesome, good shit.
jacek2023@reddit
Very interesting, probably worth checking out, I will try soon