Join us as we return to New York on June 5 to collaborate with leaders to explore comprehensive methods for auditing AI models for bias, performance, and ethical compliance in diverse organizations. Find out how you can attend here.
New technologies are synonymous with new opportunities… but also new threats. And when the technology is as complex and unfamiliar as Generative AIit can be difficult to figure out which is which.
Take the discussion around hallucinations. At the start of the AI rush, many people were convinced that hallucinations were still an unwanted and potentially dangerous behavior, something that needed to be completely eradicated. Then the conversation shifted to encompass the idea that hallucinations can be valuable.
Isa Fulford from OpenAI expresses it well. “We probably don't want models who never hallucinate, because that can be considered a creative model,” she points out. “We just want models that hallucinate in the right context. In some contexts, it is acceptable to hallucinate (for example, if you are asking for help with creative writing or new creative ways to solve a problem), while in other cases it is not. is not the case.
This view is now dominant on hallucination. And now there is a new concept that is gaining prominence and causing a lot of fear: “rapid injection.” This is generally defined as when users deliberately misuse or exploit an AI solution to create an undesirable outcome. And unlike most conversations about possibilities poor AI resultswhich tend to focus on possible negative outcomes for users, this concerns risks for AI providers.
I'll explain why I think much of the hype and fears around the rapid injection are exaggerated, but that doesn't mean there isn't a real risk. A quick injection should serve as a reminder that when it comes to AI, risk cuts both ways. If you want to create LLMs that keep your users, your business, and your reputation safe, you need to understand what it is and how to mitigate it.
How rapid injection works
You can think of this as the downside to the incredible, revolutionary openness and flexibility of the AI generation. When AI agents are well designed and executed, it really feels like they can do anything. It may seem like magic: I just tell him what I want, and he does it!
The problem, of course, is that responsible companies don't want to release into the world AI that “really does anything.” And unlike traditional software solutions, which tend to have rigid user interfaces, major language models (LLM) offer opportunistic and malicious users numerous opportunities to test its limits.
You don't need to be an expert hacker to attempt to misuse an AI agent; you can simply try different prompts and see how the system responds. Some of the simplest forms of rapid injection occur when users try to convince the AI to bypass content restrictions or ignore controls. This is called “jailbreaking”. One of the most famous examples dates back to 2016, when Microsoft released a prototype Twitter bot that quickly “learned” how to making racist and sexist comments. More recently, Microsoft Bing (now “Microsoft Co-Pilot) was successfully handled to disclose confidential data on its construction.
Other threats include data mining, where users seek to trick AI into revealing confidential information. Imagine an AI banking support agent convinced to provide sensitive financial information to customers, or an HR bot that shares employee salary data.
And now that AI is being asked to play an increasingly important role in customer service and sales functions, another challenge is emerging. Users may be able to persuade the AI to give massive discounts or inappropriate refunds. Recently, a dealership robot “sold” a 2024 Chevrolet Tahoe for $1 to a creative and persistent user.
How to protect your organization
Today, there are entire forums where people share tips for getting around the guardrails surrounding AI. It’s a kind of arms race; exploits emerge, are shared online, and then are usually quickly shut down by public LLMs. The challenge of catching up is much more difficult for other robot owners and operators.
There is no way to avoid all the risks associated with misuse of AI. Think of rapid injection as a backdoor built into any AI system that allows user prompts. You can't completely secure the door, but you can make it much more difficult to open. Here are the things you should do now to minimize the chances of a bad outcome.
Define the right conditions of use to protect yourself
Legal terms will obviously not be enough to ensure your security, but their implementation remains vital. Your terms of use must be clear, complete and adapted to the specific nature of your solution. Don't skip this! Make sure to force user acceptance.
Limit the data and actions available to the user
The safest solution to minimizing risk is to restrict what is accessible to only what is necessary. If the agent has access to the data or tools, it is at least possible that the user will find a way to trick the system into making them available. It's the principle of least privilege: This has always been a good design principle, but it becomes absolutely vital with AI.
Use evaluation frameworks
There are frameworks and solutions that allow you to test how your LLM system responds to different inputs. It is important to do this before making your agent available, but also to continue to monitor this on an ongoing basis.
These allow you to test for certain vulnerabilities. They essentially simulate rapid injection behavior, allowing you to understand and fix any vulnerabilities. The goal is to block the threat… or at least monitor it.
Familiar threats in a new context
These suggestions on how to protect yourself may sound familiar: For many of you with a tech background, the danger presented by a rapid injection is reminiscent of running applications in a browser. Although the context and some specifics are unique to AI, the challenges of avoiding exploits and blocking extraction of code and data are similar.
Yes, LLMs are new and somewhat unknown, but we have the techniques and practices to guard against this type of threat. You just need to apply them correctly in a new context.
Remember: it's not just about blocking major hackers. Sometimes it's just about stopping obvious challenges (many “exploits” are simply users asking the same thing over and over!).
It is also important to avoid the trap of blaming rapid injection for any unexpected and unwanted LLM behavior. It's not always the users' fault. Remember: LLMs show the ability to reason and problem-solve, and to be creative. So when users ask the LLM to do something, the solution looks at everything it has (data and tools) to fulfill the request. The results may seem surprising or even problematic, but it is possible that they are coming from your own system.
The bottom line when it comes to rapid injection is this: take it seriously and minimize the risk, but don't let it hold you back.
Cai GoGwilt is the co-founder and chief architect of Battleship.
DataDecisionMakers
Welcome to the VentureBeat community!
DataDecisionMakers is where experts, including data technicians, can share data insights and innovations.
If you want to learn more about cutting-edge ideas and up-to-date information, best practices, and the future of data and data technology, join us at DataDecisionMakers.
You might even consider contribute to an article your own!