How do you make your LLM apps secure?
Posted by kk17702@reddit | LocalLLaMA | View on Reddit | 8 comments
Hey guys, I am just learning about this field and I wonder how the LLM providers censor their models. Is it just system instructions or do they use any tools to safeguard it against attacks like prompt injection? How do you guys make sure the applications that use the open source models are secure?
UpsilonIT@reddit
Making LLM apps secure starts with choosing models that are transparent and auditable. Developers often implement strict access controls, limit the scope of user input, and sanitize prompts to avoid injection attacks. Real-time monitoring is key to spotting abnormal behavior or misuse early. Regular updates, retraining on clean data, and red-team testing also help reduce risks over time. This resource features all the necessary steps to protect your AI solution. Hope it will help you!
infinite-Joy@reddit
You can do multiple things to make your LLM more secure.
Use classifiers to identify malicious prompts. Implement API-based solutions like Rebuff. Leverage pre-trained models from Hugging Face (e.g., prompt injection classifier)
Perform strict output validation. Utilize libraries such as Guidance, Outlines, and Instructor for schema-based validation. Always sanitize and verify LLM outputs before using them in critical operations
Rigorously validate input data. I generally perform extensive Exploratory Data Analysis (EDA) and anomaly detection. As per research, even small amounts of poisoned data (0.5%) can significantly impact model performance.
Beware of glitch tokens as well because attackers can use them to produce hallucinations. Identify and filter out glitch tokens. Regularly update your tokenizer and model to address known glitch tokens.
Implement LLM watermarking to prevent your model from theft and unauthorized use. Transformers library already has the latest technique for NLP at least.
https://www.youtube.com/watch?v=pWTpAr_ZW1c
tyoma@reddit
You want to take a holistic look at the entire application, not just the LLM.
As a timely example, here is the description of a quick audit of an open source RAG application: https://blog.trailofbits.com/2024/07/05/auditing-the-ask-astro-llm-qa-app/
tutu-kueh@reddit
Hey does anyone have a list of keywords that we can do regex filtering on?
aseichter2007@reddit
If you're deploying something, just use aggressive keyword filtering like JadeSerpant's last recommended point, and reject those prompts without ever sending to the LLM.
You can do it fancy for a bunch of time and cost, but a good keyword filter should be cheap and understandable, and provide more consistent results than censoring the model (which can reduce it's performance) and big system prompts full of things to avoid adds noise to the prompt and decreases performance. (and increases latency if you tokenize it each time)
So, if the input box detects requests for smut, rude content, and off topic requests on the front end, you're pretty much covered. You save inference server compute not generating refusals, and you don't need a whole second system to deny requests for silly things. Showing what set off the detector can help people using the system in good faith, but can let people (really dedicated people) find ways around.
There isn't a lot of reason to make it overly complex, those systems are at best "Nice to haves" if you have the manpower and budget to get them in place.
Astronos@reddit
soon™
JadeSerpant@reddit
There are multiple layers you can apply.
Not recommended:
ArchduckFerdinand@reddit
Also, outside the LLM, traditional L7/WAF/chatbot custom inspection rules can give you more granular control without having to have the LLM process the unwanted prompts.