New model for detecting and masking PII from OpenAI

Posted by doesitoffendyou@reddit | LocalLLaMA | 8 comments

[-]

xAragon_@reddit

Old news, there were already several posts on this.
- https://www.reddit.com/r/LocalLLaMA/comments/1ssp4kb/openai_privacy_filter_model/
- https://www.reddit.com/r/LocalLLaMA/comments/1stjl04/openai_privacy_filter_goes_openweight_apache_20/
- https://www.reddit.com/r/LocalLLaMA/comments/1ssps99/new_openai_privacy_filter_model_running_locally/

[-]

Daemontatox@reddit

Have people never heard of PII models? Like, hello? Why would I ever use this over any of the other ultra-light and ultra-fast models?

Also, this seems to be English-only and behaves really badly on other languages.

[-]

SkyFeistyLlama8@reddit

OpenAI knows all about how to mask PII because they've been hoovering up people's PII for years.

[-]

They released it a few days ago. They say, "If you want to use your stuff online, you'd better delete sensitive data because who knows what will be done with it." It's basically a manifesto for open source and local LLMs.

[-]

doesitoffendyou@reddit (OP)

It's a MoE with 1.5B parameters, 50 million activated, Apache 2.0 license.

"Privacy Filter is designed for practical privacy filtering in noisy, real-world text. That includes long documents, ambiguous references, mixed-format strings, and software-related secrets." Model card here

[-]

ResidentPositive4122@reddit

And, IIUC, this should be fast AF. It doesn't generate tokens; it classifies them in one pass and gives you a set of detections and scores.
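
To make the "classifies them in one pass" point concrete, here is a rough sketch of what happens after a token classifier has labeled every token: adjacent tagged tokens get merged into detections with a score, and the detections are then masked. This is not the model's actual API or output format. The BIO-style tags, the `EMAIL`/`PHONE` label names, the scores, and the helpers `merge_token_labels` and `mask` are all made up for illustration, and the per-token predictions are hard-coded instead of coming from a real model.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    start: int    # character offset, inclusive
    end: int      # character offset, exclusive
    score: float  # mean of the per-token scores in the span


def _close(span):
    """Turn an in-progress span [label, start, end, scores] into a Detection."""
    label, start, end, scores = span
    return Detection(label, start, end, sum(scores) / len(scores))


def merge_token_labels(tokens):
    """Merge per-token tags into spans.

    tokens: list of (start, end, tag, score), where tag is 'O', 'B-X' or 'I-X'
    (hypothetical tag scheme, assumed for this sketch).
    """
    detections = []
    current = None  # [label, start, end, [scores]]
    for start, end, tag, score in tokens:
        prefix, _, label = tag.partition("-")
        # An I- tag with the same label extends the span that is already open.
        if prefix == "I" and current and current[0] == label:
            current[2] = end
            current[3].append(score)
            continue
        # Anything else ends the open span, if there is one.
        if current:
            detections.append(_close(current))
            current = None
        # B- (or a stray I-) starts a new span; 'O' starts nothing.
        if prefix in ("B", "I"):
            current = [label, start, end, [score]]
    if current:
        detections.append(_close(current))
    return detections


def mask(text, detections):
    """Replace each detected span with a [LABEL] placeholder."""
    out = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for d in sorted(detections, key=lambda d: d.start, reverse=True):
        out = out[:d.start] + f"[{d.label}]" + out[d.end:]
    return out


text = "Email jane@example.com or call 555-0100."

# Hard-coded stand-in for a single forward pass: one tag and score per token.
tokens = [
    (0, 5, "O", 0.99),         # Email
    (6, 10, "B-EMAIL", 0.98),  # jane
    (10, 22, "I-EMAIL", 0.96), # @example.com
    (23, 25, "O", 0.99),       # or
    (26, 30, "O", 0.99),       # call
    (31, 34, "B-PHONE", 0.91), # 555
    (34, 39, "I-PHONE", 0.89), # -0100
    (39, 40, "O", 0.99),       # .
]

detections = merge_token_labels(tokens)
masked = mask(text, detections)
print(masked)  # Email [EMAIL] or call [PHONE].
```

The speed argument is visible in the shape of the data: there is one prediction per input token and no decoding loop, so the cost is a single pass over the text plus this cheap merge step.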