Smallest model capable of detecting profane/nsfw language?

[-]

Independent_Aside225@reddit

Use a small classifier instead. I believe a transformer (maybe BERT, ALBERT, or DistilBERT) with fewer than 50M parameters can cut it.

Look around. If you can't find a model that does this out of the box, use an LLM API to generate profanity and creative workarounds. Then grab a text pile that you *know* doesn't contain profanity and use these two to fine-tune one of those small transformers to detect profanity for you. To do this, you need to add a layer at the end of the model with two scalar outputs that get fed into softmax so you get a nice probability distribution. Look up guides or ask an LLM to help you. It can take a few hours of your time, but at least you won't deal with prompting.
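A minimal sketch of the "two scalar outputs fed into softmax" step, in plain Python so it runs without any ML library. The logits are made-up numbers standing in for what the added layer would produce, and the 0.5 threshold is an assumption you would tune.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits from the added 2-output layer: [clean, profane]
logits = [0.3, 2.1]
p_clean, p_profane = softmax(logits)

# The decision threshold is an assumption; raise it to cut false positives.
is_profane = p_profane > 0.5
```

In a real fine-tune the two logits come from the classification head on top of the encoder; the softmax step itself looks the same.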

[-]

Top-Opinion-7854@reddit

Dude just use a list not everything needs to be an llm

[-]

BusRevolutionary9893@reddit

Dude, just let people say what they want. People are tired of the censorship. We all managed to survive the early Xbox live days without issue.

[-]

ThaisaGuilford@reddit

I love AI. I do everything with AI.

[-]

RedTheRobot@reddit

I’ll do you one better, have an LLM make the list. Checkmate.

[-]

DifficultArmadillo78@reddit

Problem with those is that they often either focus on English, and thus can be circumvented by using other languages, or they are so broad that suddenly completely random stuff gets censored because in some language two letters mean something bad.

[-]

Karyo_Ten@reddit

Or use space, * or swap letters or letters to numbers
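A rough sketch of how a list-based filter might undo some of these tricks before matching. The substitution table is an assumed, incomplete example, and `normalize` is a hypothetical helper name. It handles spacing and letter-to-number swaps; it does not recover letters hidden behind `*` or reordered letters.

```python
import re

# Assumed, incomplete table of common digit/symbol substitutions.
LEET = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t",
    "@": "a", "$": "s", "!": "i",
})

def normalize(text):
    """Lowercase, undo simple substitutions, and drop non-letters."""
    text = text.lower().translate(LEET)
    # Removing everything that is not a-z collapses "s h i t" and "s.h.i.t".
    return re.sub(r"[^a-z]", "", text)

print(normalize("Sh1t"))     # shit
print(normalize("s h ! t"))  # shit
```

Note the trade-off: stripping spaces also glues neighbouring words together, which makes accidental matches across word boundaries more likely.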

[-]

Wandering_By_@reddit

Regex crying silently in the corner, wondering why people waste resources.

[-]

alcalde@reddit

"It's because you're weird and incomprehensible, Regex! That's why no one wants to play with you!"

[-]

_raydeStar@reddit

You know who could help with that?

An LLM

[-]

CV514@reddit

When 4o came out, the first thing I asked for was a pretty complex yet possible regex. It managed to do that. On the 11th try. I almost wanted it to comment on how it struggles.

[-]

RedditDiedLongAgo@reddit

Some of us like it rough. 😏

[-]

Inkbot_dev@reddit

It's a witch, burn her!

[-]

_moria_@reddit

Man, I'm old. In my SWE career I have more years in Perl than I'd like to admit.

They are not in a corner, they are in the deepest corner of hell, or as they call it, home.

[-]

LicensedTerrapin@reddit

Perl as in perl harbour? Thank you for your service! 😉

[-]

Context_Core@reddit

Lmfao

[-]

kmouratidis@reddit

That works until your users start discussing how much their stock rises after it receives stimulation from the ministry of social affairs.

[-]

dobablos@reddit

N

[-]

Incompetent_Magician@reddit

Came here to say this.

[-]

PleaseDontEatMyVRAM@reddit

I'd be shocked if there aren't prebuilt lists for this available online

[-]

NSWindow@reddit

Beware of the Scunthorpe problem
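For anyone who hasn't run into it, here is a small sketch of the problem: a plain substring check flags innocent words that happen to contain a banned string. The one-word list and the function names are just for illustration.

```python
import re

BANNED = ["ass"]  # tiny example list

def naive_hit(text):
    # Substring match: flags any text containing the letters in sequence.
    return any(w in text.lower() for w in BANNED)

def word_hit(text):
    # Whole-word match: \b requires a word boundary on both sides.
    return any(re.search(rf"\b{re.escape(w)}\b", text.lower()) for w in BANNED)

print(naive_hit("I passed the class"))  # True  (false positive)
print(word_hit("I passed the class"))   # False
print(word_hit("you ass"))              # True
```

Whole-word matching avoids this particular false positive, but it is then trivially bypassed by gluing the word onto another one, so it only moves the problem around.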

[-]

Lonely-Drop-1435@reddit

For python

https://pypi.org/project/profanity-check

[-]

Unhappy-Fig-2208@reddit

Did people forget about BERT?

[-]

m1tm0@reddit

Unlike what other people in this thread are saying, a model is definitely necessary for solving this task comprehensively.

The problem is false positives. If you ever played Roblox as a kid, you’d know.

Definitely browse huggingface and benchmark some models for your use case. You don’t want an LLM for this, maybe a BERT encoder that feeds into a decision tree classifier.

[-]

Parogarr@reddit

why do you even care if they use that language?

[-]

kralni@reddit

One option between a ban list and an LLM is a BERT-like model. These are trained to capture semantics in some sense, so they are just what you need. They are very lightweight, and something like ALBERT may run very fast. They can also give a binary output (positive/negative), so you don’t have to parse the output like with LLMs. Fine-tuning BERT on a custom dataset is a common homework task in LLM courses (it may be done in 30 minutes, including learning), so you can do it. And there are plenty of them on Hugging Face, maybe even fine-tuned for your task

[-]

synexo@reddit

You don't need an LLM for that, simple banned word lists have been used for decades.

[-]

Chromix_@reddit

Yes, and they help against a bunch of standard cases, which means they're sufficient for 80%+ of what's written. Yet then there are repeat-offenders who just creatively work around the list. I've seen people trying to maintain those lists against that. Once a bunch of stuff gets added it also starts to occasionally hit normal conversation. It's a cat and mouse game where the mouse wins. I can't recommend going for a list in 2025 if you care about your community. Which reminds me, lists are used here.

[-]

SunstoneFV@reddit

It sounds like to me the best method to keep resources down would be to use a list for instant blocking, but also allow players to report messages which weren't blocked by the list. Then have the LLM analyze any human reported text. High confidence that the text was profane leads to the message being blocked. Medium confidence kicks it to a human for review. Low confidence nothing happens. Store reported messages for later review on how well the system is functioning, for appeals, and random checks. Include a strike system for both people who are sending profane messages and people frivolously reporting benign messages as such.
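The confidence tiers described above could be sketched like this. The threshold values and action names are assumptions for illustration; the confidence score would come from whatever model reviews the reported message.

```python
def triage(confidence, high=0.9, medium=0.5):
    """Map a model confidence in [0, 1] to a moderation action.

    The thresholds are placeholder values, not recommendations.
    """
    if confidence >= high:
        return "block"          # high confidence: block the message
    if confidence >= medium:
        return "human_review"   # medium confidence: kick it to a human
    return "ignore"             # low confidence: nothing happens

for score in (0.95, 0.7, 0.1):
    print(score, triage(score))
```

Logging every reported message together with its score and the action taken would cover the later-review and appeals part.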

[-]

codeprimate@reddit

And they don’t work, reference the “Scunthorpe problem”

[-]

Top-Salamander-2525@reddit

Here are seven to start you off…

https://www.youtube.com/watch?v=kyBH5oNQOS0

[-]

wwabbbitt@reddit

I last watched this more than 8 years ago and still instantly knew this would be the video you link to

[-]

luckyj@reddit

And toxicity is at record low levels..

[-]

JohnnyAppleReddit@reddit

Be cautious that you don't open yourself up to a denial of service attack from people flooding the chat. Think about how many inference calls are being done and how to limit them. You may want to set a hard cap and just review a random sampling of recent messages. Or go with an old-fashioned word list, or both.
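One way the hard cap and random sampling might look, as a sketch. `InferenceBudget` and `sample_for_review` are hypothetical names, and the limits are arbitrary; timestamps are passed in explicitly so the logic is easy to test.

```python
import random
from collections import deque

class InferenceBudget:
    """Allow at most max_calls model calls per window_seconds."""

    def __init__(self, max_calls, window_seconds):
        self.max_calls = max_calls
        self.window = window_seconds
        self.calls = deque()  # timestamps of recent allowed calls

    def allow(self, now):
        # Forget calls that have left the sliding window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False  # over budget: skip inference for this message

def sample_for_review(messages, k, seed=None):
    """Pick up to k random messages instead of checking all of them."""
    rng = random.Random(seed)
    return rng.sample(messages, min(k, len(messages)))
```

In a live server you would pass `time.monotonic()` as `now`, and messages that fall over the cap could still go through the cheap word list.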

[-]

_raydeStar@reddit

Psht, have them run Qwen 2.5 0.5B in the background and it'll get the job done. It's client-side, but adding a report button will solve that.

Or do a word list.

Or use Gemini free AI tier and allow 1 post per minute

[-]

WolpertingerRumo@reddit

Is qwen 2.5:0.5b actually powerful enough?

And serious question: will it also see mentions of Taiwan as offensive?

[-]

_raydeStar@reddit

For language censoring - yes. I was playing around with it and it censored words.

Taiwan - I'm not sure. What you should do is give a very direct prompt that requires a true or false bool. "Is this inappropriate?" If you need to, use an uncensored model.

One tip is to say "give me the output in json data using the following format {object}" then it'll follow more strictly.
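Small models don't always stick to the format, so it may help to parse the reply defensively. A sketch, assuming you asked for a JSON object with a boolean field; the key name `inappropriate` and the function name are made up for this example.

```python
import json

def parse_verdict(raw):
    """Return True/False from a model reply, or None if it can't be parsed."""
    # Models often wrap the JSON in extra chatter, so cut out the outermost braces.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
    value = data.get("inappropriate") if isinstance(data, dict) else None
    # Only accept a real boolean, not "yes" or 1.
    return value if isinstance(value, bool) else None

print(parse_verdict('Sure! {"inappropriate": true}'))  # True
print(parse_verdict("I cannot help with that."))       # None
```

Treating `None` as "retry or fall back to the word list" keeps a malformed reply from silently passing or blocking a message.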

[-]

WolpertingerRumo@reddit

I tried it out already. It will not, though if pressed for information it will state CCP propaganda, but not to an extreme.

This is extremely interesting, because that is completely, utterly better than DeepSeek. It even told me what Mao Zedong's worst political decision was. DeepSeek will just tell me his best instead.

[-]

roger_ducky@reddit

You’d probably be happier using a LLM in embedding mode and just doing similarity searches against a database of known bad words.
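A toy sketch of the similarity-search idea. To keep it runnable without a model, `embed` here is a stand-in that counts character trigrams; in a real setup you would replace it with a call to an actual embedding model, and the results would reflect meaning rather than spelling. All function names are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    # Stand-in "embedding": counts of character trigrams. This is NOT a real
    # embedding model, just a placeholder with the same vector-like shape.
    t = f"  {text.lower()}  "
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def max_similarity(word, bad_words):
    """Highest similarity between word and any entry in the bad-word database."""
    return max(cosine(embed(word), embed(b)) for b in bad_words)
```

You would then flag a message when `max_similarity` crosses some threshold you pick by testing on real chat logs.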

[-]

WolpertingerRumo@reddit

Ok, so most people here are kind of right, it may not be needed. Easier with a blocklist.

However: I tried it for a little while and you can get something quite fun with the right system prompt. In short, I made the system prompt so it would scan the text for profanity, sexualised content or anything not suitable for children. If nothing, give the text as is without changes or commentary.

But if profanity is found, mark it with * before and after, and rephrase it with sanitized old-timey words.

So:

F you -> I beg your pardon

I f-ed your mother last night -> Last evening, a regrettable incident occurred involving a sensitive matter

You suck, loser -> I find your actions mildly disappointing

Orgasm -> heightened awareness

It only really worked in gemma3:4b. Llama3.2 sometimes refused, saying it could not engage in impolite conversation. With the right system prompt it would work, I’m sure.

This would either get kids to stop swearing because it becomes very uncool when it’s actually sent, or make them use it even more because it’s funny.

[-]

IndianaNetworkAdmin@reddit

Just have a block list of words. Here's one on Github -

https://github.com/coffee-and-fun/google-profanity-words

[-]

AnomalyNexus@reddit

Could probably use one of the guard models

[-]

cmndr_spanky@reddit

if chat contains ["fuck","shit","ass"....]:

user.account.ban()

Now mail me your 5090 please cuz you don't need it.

[-]

ohcrap___fk@reddit (OP)

lol, developing on a 1080 I bought in 2016 :)

[-]

Chromix_@reddit

Good, that means your game will run well on low-end machines :-)

[-]

BriannaBromell@reddit

I wonder if this would be a good fit for an NLP library like spaCy? It would have a little lower overhead.

[-]

JimDabell@reddit

Does it have to be an LLM? You could use Perspective. It’s an API to detect harmful text content hosted by Google but available to use for free.

[-]

jnfinity@reddit

Personally I implemented a model based on the "Text Classification: A Parameter-Free Classification Method with Compressors" paper to handle this for a lot of my use-cases.
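For readers who haven't seen the compressor approach: the general idea is to compare texts by how well they compress together (normalized compression distance) and label a new text by its nearest labeled examples. The sketch below is a generic illustration of that idea using gzip, not the commenter's implementation; the function names are made up.

```python
import gzip

def clen(s):
    """Length in bytes of the gzip-compressed text."""
    return len(gzip.compress(s.encode("utf-8")))

def ncd(a, b):
    """Normalized compression distance: lower means more similar."""
    ca, cb, cab = clen(a), clen(b), clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def classify(text, labeled, k=1):
    """Label text by majority vote among its k nearest labeled examples.

    labeled is a list of (example_text, label) pairs.
    """
    nearest = sorted(labeled, key=lambda ex: ncd(text, ex[0]))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)
```

There is no training step, but every prediction compresses the input against every labeled example, so the cost grows with the size of the labeled set. It also tends to be unreliable on very short chat messages, where the compressor's fixed overhead dominates.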

[-]

External_Natural9590@reddit

This could come in handy. I am fine-tuning an LLM for a similar, bit more extensive, use case at work. It is complicated by being non-English, by having to give some slack to some profanities, and by the sheer amount of grammar errors and typos. So far I have found that the bigger the LLM, the better the performance, which is kinda expected, but not to such a degree. It might be an artifact of bigger models having a higher probability of being trained on a substantial corpus of the target language. Anyway, once I am happy with the quality, I am planning on distilling it into: 1. a smaller model 2. a simpler neural net 3. an embedding model using a large amount of labeled and synthetic data to serve as a backup

[-]

KillerX629@reddit

Isn't an embedding model more appropriate for this use case?

[-]

External_Natural9590@reddit

Not necessarily. Both Mistral and ClosedAI use LLM-based filters.

[-]

Tiny_Arugula_5648@reddit

So much pontificating.. just go to hugging face and search there's plenty of classifiers there.. this is a solved problem for the most part.

[-]

Top-Salamander-2525@reddit

Instead of embedding a dumb LLM in your game, ask a smart one to create a regular expression for you.

Asked ChatGPT and got this one (python syntax):


import re

pattern = re.compile(r"""
\b(
  f[\W_]*(?:u|[^\w\s]|_)[\W_]*c?[\W_]*k(?:e[dr]s?|s|[i1!|]n[g']?)? |  # fuck, f*ck, f@cked, f*ck!ng, f@#king
  s[\W_]*h[\W_]*[i1!|][\W_]*t+ |                    # shit, sh!t, sh1t
  a[\W_]*[s$][\W_]*[s$]+ |                          # ass, a$$
  b[\W_]*[i1!|][\W_]*t[\W_]*c[\W_]*h |              # b!tch, b1tch
  c[\W_]*(?:u|[^\w\s]|_)[\W_]*n[\W_]*t |            # cunt, c*nt
  d[\W_]*[i1!|][\W_]*c[\W_]*k |                     # dick, d!ck
  p[\W_]*[i1!|][\W_]*[s$][\W_]*[s$]+ |              # piss
  m[\W_]*[o0][\W_]*t[\W_]*h[\W_]*[e3][\W_]*r[\W_]*f[\W_]*(?:u|[^\w\s]|_)[\W_]*c?[\W_]*k(?:[e3]r|[i1!|]n[g']?)? |  # motherf*cker
  n[\W_]*[i1!|][\W_]*[gG][\W_]*[gG][\W_]*[e3][\W_]*[rR] # n*gger
)(?!\w)   # end of word; unlike \b, this also works after a trailing symbol such as $
""", re.IGNORECASE | re.VERBOSE)

text = "That guy is such a f@#king IDIOT!"
if pattern.search(text):
    print("Swear word detected!")

Should be trivial to get any good model to generate something similar for whatever language your game uses and add more terms.

[-]

Chromix_@reddit

Your game is your focus. Check if you can get something for free from ggwp AI, utopiaanalytics or so, since your game is small and you have a low chat volume. That way you don't need to deal with lists, never-ending LLM few-shot prompt updates, as well as setting up and scaling the system. Running your own LLM for it is a nice approach that I would certainly consider for optimizing cost later on, yet when you have limited time and your game still needs work, then maybe that's an alternative to consider.

Hint for others who comment: There are certain words related to this topic that prevent your contribution from showing up here.

[-]

SM8085@reddit

"Hint for others who comment: There are certain words related to this topic that prevent your contribution from showing up here."

Oh, I am being ghosted apparently.

Not even sure what word that would be, the f-word?

[-]

Chromix_@reddit

There are a whole bunch that got in the way in the past for me, I should probably start writing a list instead of just working around. In my comment it was ᶜᵒⁿᵗᵉⁿᵗ ᵐᵒᵈᵉʳᵃᵗⁱᵒⁿ, ᵒʳ ʷᵃⁿᵗⁱⁿᵍ ᵃ ᶜᵒᵐᵐᵘⁿⁱᵗʸ ᵗᵒ ˢᵗᵃʸ ᵃˡⁱᵛᵉ I think.

[-]

daHaus@reddit

Look into solutions used for places like twitch. There are tons of open source bots that people have already invested time into refining

[-]

MengerianMango@reddit

What language are you using? Might even be able to find a package for this with embedded word list and fuzzy matching.

https://github.com/finnbear/rustrict

[-]

Equivalent-Bet-8771@reddit

Your model will need to keep up with new insults and profanities being invented. Being a very small model it's going to be unable to understand nuance and will penalize players who are just frustrated but not outright hostile, while also missing obvious insults you overlooked.

I wouldn't do this, not unless you need it.

Do you intend to run this on people's computers or is this on a server? Why not a proper-sized LLM and you can even batch messages for performance.

[-]

codeninja@reddit

The Qwen series of models is more than capable of detecting this. Have the model return a binary response if profanity is detected and pass the context. Works great with Qwen 2.7b.

If you need something smaller, you might try training FLAN-T5 encoder-decoder models. Or roll your own binary classifier encoder/decoder, which is not that hard these days with an AI-assisted lift.

[-]

You_Wen_AzzHu@reddit

You can't prevent it. I say, create a keyword list and a report button, and add more keywords continuously.

[-]

rog-uk@reddit

grep