Smallest model capable of detecting profane/nsfw language?
Posted by ohcrap___fk@reddit | LocalLLaMA | View on Reddit | 62 comments
Hi all,
I have my first ever steam game about to be released in a week which I couldn't be more excited/nervous about. It is a singleplayer game but I have a global chat that allows people to talk to other people playing. It's a space game, and space is lonely, so I thought that'd be a fun aesthetic.
Anyways, it is in beta-testing phase right now and I had to ban someone for the first time today because of things they were saying over chat. It was a manual process and I'd like to automate the detection/flagging of unsavory messages.
Are <1b parameter models capable of outperforming a simple keyword check? I like the idea of an LLM because it could go beyond matching strings.
Also, if anyone is interested in trying it out, I'm handing out keys like crazy because I'm too nervous to charge $2.99 for the game and then underdeliver. Game info here, sorry for the self-promo.
Independent_Aside225@reddit
Use a small classifier instead. I believe a transformer (maybe BERT, ALBERT, or DistilBERT) with fewer than 50M parameters can cut it.
Look around; if you can't find a model that does this out of the box, use an LLM API to generate profanity and creative workarounds. Then grab a text pile that you *know* doesn't contain profanity, and use these two sets to fine-tune one of those small transformers to detect profanity for you. To do this, you add a layer at the end of the model with two scalar outputs that gets fed into a softmax, so you get a nice probability distribution. Look up guides or ask an LLM to help you. It may take a few hours of your time, but at least you won't have to deal with prompting.
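A minimal sketch of that two-output head in plain Python, with made-up weights standing in for what fine-tuning would learn (in practice the backbone and head live in one framework model):

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability, then normalize.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Pretend the transformer backbone produced this pooled embedding
# for a message; the added head is a 2-output linear layer.
# All numbers here are toy values for illustration only.
embedding = [0.2, -1.1, 0.7]
head_weights = [
    [0.5, -0.3, 0.8],   # weights producing the "clean" logit
    [-0.4, 0.9, -0.2],  # weights producing the "profane" logit
]
head_bias = [0.1, -0.1]

logits = [
    sum(w * x for w, x in zip(row, embedding)) + b
    for row, b in zip(head_weights, head_bias)
]
p_clean, p_profane = softmax(logits)
print(round(p_clean + p_profane, 6))  # 1.0 -- softmax gives a distribution
```

The point is just that the two logits become a proper probability over {clean, profane}, so you can threshold `p_profane` however strictly you like.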
Top-Opinion-7854@reddit
Dude just use a list not everything needs to be an llm
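For what it's worth, the whole list approach fits in a few lines (the word list here is a placeholder):

```python
import re

BANNED = {"badword", "slur1"}  # placeholder list; use a real one

def is_flagged(message: str) -> bool:
    # Lowercase and split on non-letters so punctuation doesn't hide words.
    tokens = re.findall(r"[a-z]+", message.lower())
    return any(t in BANNED for t in tokens)

print(is_flagged("you BADWORD!"))  # True
print(is_flagged("hello there"))   # False
```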
BusRevolutionary9893@reddit
Dude, just let people say what they want. People are tired of the censorship. We all managed to survive the early Xbox live days without issue.
ThaisaGuilford@reddit
I love AI. I do everything with AI.
RedTheRobot@reddit
I’ll do you one better, have an LLM make the list. Checkmate.
DifficultArmadillo78@reddit
The problem with those is that they often focus only on English, and so can be circumvented by using other languages, or they are so broad that completely random stuff gets censored because two letters happen to mean something bad in some language.
Karyo_Ten@reddit
Or use spaces or asterisks, swap letters around, or replace letters with numbers.
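A normalization pass can partly undo those tricks before the list check; a sketch, with an illustrative (not exhaustive) substitution table:

```python
import re

# Common digit/symbol substitutions mapped back to letters (illustrative).
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                      "5": "s", "7": "t", "@": "a", "$": "s"})

def normalize(message: str) -> str:
    # Undo leetspeak, then strip separators inserted between letters
    # ("b a d", "b*a*d") so the word list sees the plain word.
    text = message.lower().translate(LEET)
    return re.sub(r"[^a-z]", "", text)

print(normalize("b @ d - w 0 r d"))  # "badword"
```

Caveat: collapsing every separator also merges adjacent words, which makes Scunthorpe-style false positives more likely, so you'd normally run this as a second pass rather than the only check.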
Wandering_By_@reddit
Regex crying silently in the corner, wondering why people waste resources.
alcalde@reddit
"It's because you're weird and incomprehensible, Regex! That's why no one wants to play with you!"
_raydeStar@reddit
You know who could help with that?
An LLM
CV514@reddit
When 4o came out, the first thing I asked for was a fairly complex but possible regex. It managed to do it. On the 11th try. I almost wanted it to comment on how it struggled.
RedditDiedLongAgo@reddit
Some of us like it rough. 😏
Inkbot_dev@reddit
It's a witch, burn her!
_moria_@reddit
Man, I'm old. In my SWE career I have more years in Perl than I'd like to admit.
They are not in a corner; they are in the deepest corner of hell, or as they call it, home.
LicensedTerrapin@reddit
Perl as in Pearl Harbour? Thank you for your service! 😉
Context_Core@reddit
Lmfao
kmouratidis@reddit
That works until your users start discussing how much their stock rises after it receives stimulation from the ministry of social affairs.
dobablos@reddit
N
Incompetent_Magician@reddit
Came here to say this.
PleaseDontEatMyVRAM@reddit
I'd be shocked if there aren't prebuilt lists for this available online.
NSWindow@reddit
Beware of the Scunthorpe problem.
Lonely-Drop-1435@reddit
For Python:
https://pypi.org/project/profanity-check
Unhappy-Fig-2208@reddit
Did people forget about BERT?
m1tm0@reddit
Unlike what other people in this thread are saying, a model is definitely necessary to solve this task comprehensively.
The problem is false positives; if you ever played Roblox as a kid, you'd know.
Definitely browse Hugging Face and benchmark some models for your use case. You don't want an LLM for this; maybe a BERT encoder that feeds into a decision-tree classifier.
Parogarr@reddit
why do you even care if they use that language?
kralni@reddit
One solution between a ban list and an LLM is BERT-like models. They are trained to predict semantics in some sense, so they're just what you need. They are very lightweight, and variants like ALBERT can run very fast. They can also give a binary output (positive/negative), so you don't have to parse output like with LLMs. And it's a common homework task in LLM courses to fine-tune BERT on a custom dataset (doable in 30 minutes, including the training), so you can do it. And there are plenty of them on Hugging Face, maybe even fine-tuned for your task.
synexo@reddit
You don't need an LLM for that, simple banned word lists have been used for decades.
Chromix_@reddit
Yes, and they help against a bunch of standard cases, which means they're sufficient for 80%+ of what's written. But then there are repeat offenders who just creatively work around the list. I've seen people trying to maintain those lists against that. Once enough stuff gets added, it also starts to occasionally hit normal conversation. It's a cat-and-mouse game where the mouse wins. I can't recommend going with a list in 2025 if you care about your community. Which reminds me: lists are used here.
SunstoneFV@reddit
It sounds like to me the best method to keep resources down would be to use a list for instant blocking, but also allow players to report messages which weren't blocked by the list. Then have the LLM analyze any human reported text. High confidence that the text was profane leads to the message being blocked. Medium confidence kicks it to a human for review. Low confidence nothing happens. Store reported messages for later review on how well the system is functioning, for appeals, and random checks. Include a strike system for both people who are sending profane messages and people frivolously reporting benign messages as such.
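The tiered flow described above can be sketched as a small dispatcher (the thresholds are placeholder values, and the confidence score would come from whatever model is used):

```python
def moderate(report_score: float, high: float = 0.9, low: float = 0.5) -> str:
    """Route a human-reported message based on the model's confidence
    that it is profane. Thresholds are illustrative placeholders."""
    if report_score >= high:
        return "block"          # high confidence: auto-block the message
    if report_score >= low:
        return "human_review"   # medium confidence: escalate to a human
    return "no_action"          # low confidence: keep, but log for audits

print(moderate(0.95))  # block
print(moderate(0.70))  # human_review
print(moderate(0.10))  # no_action
```

Storing every reported message alongside the routing decision, as suggested, also gives you the labeled data you'd need if you later fine-tune a small classifier.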
codeprimate@reddit
And they don't work; see the "Scunthorpe problem".
Top-Salamander-2525@reddit
Here are seven to start you off…
https://www.youtube.com/watch?v=kyBH5oNQOS0
wwabbbitt@reddit
I last watched this more than 8 years ago and still instantly knew this would be the video you link to
luckyj@reddit
And toxicity is at record low levels..
JohnnyAppleReddit@reddit
Be careful that you don't open yourself up to a denial-of-service attack from people flooding the chat. Think about how many inference calls are being made and how to limit them. You may want to set a hard cap and just review a random sample of recent messages. Or go with an old-fashioned word list, or both.
_raydeStar@reddit
Psht, have them run Qwen 2.5 0.5B in the background and it'll get the job done. It's client-side, but adding a report button will solve that.
Or do a word list.
Or use Gemini free AI tier and allow 1 post per minute
WolpertingerRumo@reddit
Is qwen 2.5:0.5b actually powerful enough?
And serious question: will it also see mentions of Taiwan as offensive?
_raydeStar@reddit
For language censoring - yes. I was playing around with it and it censored words.
Taiwan - I'm not sure. What you should do is give a very direct prompt that requires a true-or-false bool: "Is this inappropriate?" If you need to, use an uncensored model.
One tip is to say "give me the output in JSON using the following format: {object}" - then it'll follow the format more strictly.
WolpertingerRumo@reddit
I tried it out already. It will not, though if pressed for information it will state CCP propaganda, but not to an extreme.
This is extremely interesting, because that is completely, utterly better than DeepSeek. It even told me what Mao Zedong's worst political decision was. DeepSeek will just tell me his best instead.
roger_ducky@reddit
You’d probably be happier using a LLM in embedding mode and just doing similarity searches against a database of known bad words.
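The similarity-search idea in miniature, with toy 3-d vectors standing in for real embeddings (a real setup would embed messages with an embedding model and use a vector index):

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy reference embeddings of known-bad phrases (made-up values).
bad_vectors = {
    "insult_a": [0.9, 0.1, 0.0],
    "insult_b": [0.0, 0.8, 0.6],
}

def is_similar_to_profanity(vec, threshold=0.85):
    # Flag the message if it lands close to any known-bad embedding.
    return any(cosine(vec, ref) >= threshold for ref in bad_vectors.values())

print(is_similar_to_profanity([0.88, 0.12, 0.05]))  # True: close to insult_a
print(is_similar_to_profanity([0.0, 0.0, 1.0]))     # False
```

The appeal over a word list is that paraphrases and misspellings of a bad phrase tend to land near it in embedding space, so you don't have to enumerate variants.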
WolpertingerRumo@reddit
Ok, so most people here are kind of right, it may not be needed. Easier with a blocklist.
However: I tried it for a little while, and you can get something quite fun with the right system prompt. In short, I wrote the system prompt so it would scan the text for profanity, sexualized content, or anything not suitable for children. If nothing is found, it returns the text as-is, without changes or commentary.
But if profanity is found, mark it with * before and after, and rephrase it with sanitized old-timey words.
So:
"F you" -> "I beg your pardon"
"I f-ed your mother last night" -> "Last evening, a regrettable incident occurred involving a sensitive matter"
"You suck, loser" -> "I find your actions mildly disappointing"
"Orgasm" -> "heightened awareness"
It only really worked in gemma3:4b. Llama3.2 sometimes refused, saying it could not engage in impolite conversation. With the right system prompt it would work, I’m sure.
This would either get kids to stop swearing because it becomes very uncool when it’s actually sent, or make them use it even more because it’s funny.
IndianaNetworkAdmin@reddit
Just have a block list of words. Here's one on Github -
https://github.com/coffee-and-fun/google-profanity-words
AnomalyNexus@reddit
Could probably use one of the guard models
cmndr_spanky@reddit
if any(word in chat for word in ["fuck", "shit", "ass", ...]):
    user.account.ban()
Now mail me your 5090 please cuz you don't need it.
ohcrap___fk@reddit (OP)
lol, developing on a 1080 I bought in 2016 :)
Chromix_@reddit
Good, that means your game will run well on low-end machines :-)
BriannaBromell@reddit
I wonder if this would be a good fit for an NLP library like spaCy? It would have lower overhead.
JimDabell@reddit
Does it have to be an LLM? You could use Perspective. It's an API for detecting harmful text content, hosted by Google and free to use.
jnfinity@reddit
Personally, I implemented a model based on the paper "Low-Resource Text Classification: A Parameter-Free Classification Method with Compressors" to handle this for a lot of my use cases.
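That paper's method is essentially nearest-neighbor classification under compression distance; a toy sketch with zlib (the labeled snippets are placeholders; a real training set would be much larger):

```python
import zlib

def clen(s: str) -> int:
    # Compressed length of a string.
    return len(zlib.compress(s.encode()))

def ncd(a: str, b: str) -> float:
    # Normalized compression distance: similar texts compress
    # better together than dissimilar ones.
    ca, cb, cab = clen(a), clen(b), clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)

# Placeholder labeled examples standing in for a real dataset.
train = [
    ("you are a wonderful person, great game", "clean"),
    ("what a lovely evening in space", "clean"),
    ("you absolute idiot, uninstall now", "toxic"),
    ("shut up you worthless idiot", "toxic"),
]

def classify(message: str) -> str:
    # 1-nearest-neighbor under NCD.
    return min(train, key=lambda pair: ncd(message, pair[0]))[1]

print(classify("shut up you worthless idiot!!"))  # toxic
```

The upside is that there is no model to train or host at all; the downside is that it needs enough labeled examples and gets noisy on very short messages.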
External_Natural9590@reddit
This could come in handy. I'm fine-tuning an LLM for a similar, though more extensive, use case at work. It's complicated by being non-English, by having to give some slack to certain profanities, and by the sheer number of grammar errors and typos. So far I have found that the bigger the LLM, the better the performance, which is expected, but not to such a degree. It might be an artifact of bigger models being more likely to have been trained on a substantial corpus of the target language. Anyway, once I'm happy with the quality, I plan to distill it into: 1. a smaller model, 2. a simpler neural net, 3. an embedding model, using a large amount of labeled and synthetic data to serve as a backup.
KillerX629@reddit
Isn't an embeddings model more appropriate for this use case?
External_Natural9590@reddit
Not necessarily. Both Mistral and ClosedAI use LLM-based filters.
Tiny_Arugula_5648@reddit
So much pontificating... Just go to Hugging Face and search; there are plenty of classifiers there. This is a solved problem for the most part.
Top-Salamander-2525@reddit
Instead of embedding a dumb LLM in your game, ask a smart one to create a regular expression for you.
Asked ChatGPT and got this one (python syntax):
Should be trivial to get any good model to generate something similar for whatever language your game uses and add more terms.
Chromix_@reddit
Your game is your focus. Check whether you can get something for free from ggwp AI, Utopia Analytics, or similar, since your game is small and you have low chat volume. That way you don't need to deal with lists, never-ending LLM few-shot prompt updates, or setting up and scaling the system. Running your own LLM for it is a nice approach that I would certainly consider for optimizing cost later on, but when you have limited time and your game still needs work, an external service may be the better alternative.
Hint for others who comment: There are certain words related to this topic that prevent your contribution from showing up here.
SM8085@reddit
Oh, I am being ghosted apparently.
Not even sure what word that would be, the f-word?
Chromix_@reddit
There are a whole bunch that got in the way in the past for me, I should probably start writing a list instead of just working around. In my comment it was ᶜᵒⁿᵗᵉⁿᵗ ᵐᵒᵈᵉʳᵃᵗⁱᵒⁿ, ᵒʳ ʷᵃⁿᵗⁱⁿᵍ ᵃ ᶜᵒᵐᵐᵘⁿⁱᵗʸ ᵗᵒ ˢᵗᵃʸ ᵃˡⁱᵛᵉ I think.
daHaus@reddit
Look into the solutions used by places like Twitch. There are tons of open-source bots that people have already invested time into refining.
MengerianMango@reddit
What language are you using? You might even find a package for this with an embedded word list and fuzzy matching.
https://github.com/finnbear/rustrict
Equivalent-Bet-8771@reddit
Your model will need to keep up with new insults and profanities being invented. Being very small, it's going to miss nuance: it will penalize players who are just frustrated but not outright hostile, while also missing obvious insults you overlooked.
I wouldn't do this unless you need it.
Do you intend to run this on people's computers, or on a server? Why not use a proper-sized LLM? You could even batch messages for performance.
codeninja@reddit
The Qwen series of models is more than capable of detecting this. Have the model return a binary response if profanity is detected and pass the context. Works great with Qwen 2.7b.
If you need something smaller, you might try training FLAN-T5 encoder-decoder models. Or roll your own binary classifier encoder/decoder, which is not that hard these days with AI-assisted lift.
You_Wen_AzzHu@reddit
You can't prevent it. I say: create a keyword list and a report button, and add more keywords continuously.
rog-uk@reddit
grep