🛡️ Shield 82M: A PII stripping/filtering model 🛡️

Posted by LH-Tech_AI@reddit | LocalLLaMA

Hey, r/LocalLLaMA!

I am finally back with a new model: 🛡️ Shield 82M

It's a fine-tuned version of distilroberta-base, and it's able to filter all kinds of PII (personally identifiable information) out of text in any language, replacing each match with a placeholder tag.

Here are some examples:

1) Test with name, email and phone:

Original: My name is John Doe. Email: john@example.com. Phone: +49 123 45678.
Protected: My name is [PERSON]. Email: [EMAIL]. Phone: [PHONE].

2) Basic test:

Original: I live in Cambridge
Protected: I live in [ADDRESS]

3) French test (multilingual):

Original: Mon e-mail est jean.dupont@example.fr et mon téléphone est +33 6 12 34 56 78.
Protected: Mon e-mail est [EMAIL] et mon téléphone est [PHONE].

Overall, the model performs really well, with a total accuracy of ~96%.

And: it's completely open-source like all my models. :D

If you want to try it out: https://huggingface.co/LH-Tech-AI/Shield-82M
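
For anyone who wants to wire it into a script, here is a minimal sketch of the Original -> Protected step from the examples above. It is not the official usage from the model card. It assumes the checkpoint loads as a standard Hugging Face token-classification model and that its labels match the tags shown in the examples (PERSON, EMAIL, PHONE, ADDRESS). The helper names `redact` and `load_detector` are made up for illustration.

```python
from typing import Iterable, Mapping


def redact(text: str, entities: Iterable[Mapping]) -> str:
    """Replace each detected span with a [LABEL] placeholder.

    `entities` is expected in the shape returned by a transformers
    token-classification pipeline with aggregation_strategy="simple":
    dicts carrying "entity_group", "start" and "end" keys.
    """
    out = text
    # Go right-to-left so the character offsets of earlier spans stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        out = out[: ent["start"]] + f"[{ent['entity_group']}]" + out[ent["end"]:]
    return out


def load_detector(model_id: str = "LH-Tech-AI/Shield-82M"):
    """Build a PII detector (downloads the model on first call).

    Assumption: the checkpoint works with the generic token-classification
    pipeline; check the model card if loading fails.
    """
    from transformers import pipeline  # imported lazily, only needed here

    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge sub-word tokens into spans
    )
```

With `transformers` installed, usage would look like `detector = load_detector()` followed by `print(redact(text, detector(text)))`. Keeping `redact` separate from the model call means you can test the replacement logic with hand-written spans before downloading anything.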

Have fun with it. :-)

See you in the comments. I'd really like to get some feedback from you.