OpenAI Privacy Filter Model
Posted by ai_hedge_fund@reddit | LocalLLaMA | 11 comments
Just saw this posted by Bloomberg in a different sub:
https://huggingface.co/openai/privacy-filter
Open weights, Apache 2.0, etc
I like this contribution: it sits in the space between local models for protecting privacy and the level of quality implied by a big lab's name.
juliarmg@reddit
You can try this on your Mac to redact PDFs before uploading them to ChatGPT, Claude, or Gemini, with my new open-source Mac app.
redactdesk.app
Agile-Youth516@reddit
Nice release! We took a look through the code and found what appear to be the entity types for future releases - this release (V2 config) supports 8 entity types, but the V4 and V7 taxonomies have >20. We put the details in our review article. Disclaimer: We also build PII detection systems.
DefNattyBoii@reddit
This is super useful: just slap it in front of your stack if you ever want to talk to the cloud. GGUF when?
AnomalyNexus@reddit
At 50M effective parameters you can probably just run it as-is on CPU.
selvamTech@reddit
Neat release. A token-level PII classifier at 50M active params is small enough to actually sit in a local pipeline. For what it's worth, I build Elephas (disclosure: my app), which leans on the local-first angle for exactly this reason: people working with sensitive docs don't want their data hitting a cloud endpoint just to get AI help. A tunable on-device redactor like this slotting in front of cloud calls is a pattern I think more apps should adopt.
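To make the "redactor in front of cloud calls" pattern concrete, here is a minimal sketch. It does not load or call the actual model: `detect_pii` is a hypothetical stand-in built from two regexes, and the `(start, end, label)` span format and the `EMAIL`/`PHONE` labels are assumptions for illustration, not the model's real output schema or taxonomy. The point is only the shape of the pipeline: detect spans locally, swap them for placeholders, and send nothing but the redacted text upstream.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, label) as character offsets

# Hypothetical stand-in for a local token-level PII classifier.
# A real pipeline would get its spans from the model instead.
_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def detect_pii(text: str) -> List[Span]:
    """Return PII spans sorted by start offset (regex stub, not the model)."""
    spans = []
    for label, pattern in _PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def redact(text: str, spans: List[Span]) -> str:
    """Replace each span with a [LABEL] placeholder, skipping overlaps."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            continue  # overlaps a span that was already redacted
        pieces.append(text[cursor:start])
        pieces.append(f"[{label}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


prompt = "Email jane.doe@example.com or call +1 555-123-4567 about the invoice."
safe_prompt = redact(prompt, detect_pii(prompt))
print(safe_prompt)  # only this redacted string would go to the cloud endpoint
```

Keeping detection and redaction as separate functions means the regex stub can be replaced by a real classifier without touching the rest of the pipeline.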
woct0rdho@reddit
I'm experimenting with it in dataclaw, where I export 1 GB of Claude Code chats to HuggingFace every month and need some way to redact PII. Looks worth tracking.
Accomplished_Mode170@reddit
Cheers Mihai et al., glad to see more FOSS!
Mohit_Singh_Pawar@reddit
I do believe this can solve a lot of PII issues in unstructured documents, but customizing it, e.g. prompting it to do the redaction while also maintaining the layout and format of the document, is tough. Text-searchable documents also contain PII, and there you cannot just redact it visually; you have to replace or remove the underlying text, and even with a simple keyword match it can be tough to maintain the layout. This model does help with filtering text before sending or sharing it externally, or while building vector databases.
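One simple way to approach the layout problem for plain text is length-preserving masking: overwrite each PII character with a fill character so that every other character keeps its original offset. The sketch below assumes the spans are already known as `(start, end)` character offsets (here written by hand; in practice they would come from whatever detector is in use) and it only covers plain text, not PDF rendering or fonts.

```python
from typing import List, Tuple


def mask_preserving_layout(
    text: str, spans: List[Tuple[int, int]], fill: str = "X"
) -> str:
    """Overwrite non-space characters inside each span with `fill`.

    The output has exactly the same length as the input, so columns,
    indentation, and line breaks stay where they were.
    """
    chars = list(text)
    for start, end in spans:
        for i in range(max(start, 0), min(end, len(chars))):
            if not chars[i].isspace():
                chars[i] = fill
    return "".join(chars)


line = "Name: Jane Doe    Account: 12345678"
# Hand-written offsets for illustration: the name and the account number.
spans = [(6, 14), (27, 35)]
masked = mask_preserving_layout(line, spans)
print(masked)
```

The trade-off is that masked text still leaks the length and word shape of the original value, which a `[LABEL]` style placeholder would hide at the cost of shifting the layout.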
brown2green@reddit
It looks like it's a tiny MoE model?
XeNo___@reddit
I am really not a huge fan of OpenAI and their recent releases, but that is a pretty cool one. I think it's pretty niche, but I certainly have a few use cases where it will come in handy.
Randomdotmath@reddit
It’s not exactly what people want, but it’s quite practical.