Ensuring every AI Agent and Application is secure from adversarial attacks, and compliant with the latest legislation. Reach out for a security audit or pentest!
Al Security & Compliance. Previously at Snappr, Observe, Tesla. Berkeley Data Science & Econ‎
Keeping AI companies secure and compliant! Prev @Roblox engine-security, @Google, and founded Pundit Analytics(Acquired in 2020).
TL;DR - PromptArmor detects and responds to LLM inputs, outputs, and actions for adversarial content. We return in real time faster than LLMs and keep our threat intelligence up to date so you don’t have to.
Hey folks! We are excited to launch the first AI Detection and Response (AIDR) system for LLM Applications. For those familiar with traditional security, think of us as EDR for LLM systems. Users of LLM applications want powerful functionality, and need a way to preserve that powerful functionality without exposing themselves to security risks, specifically:
Common misconceptions
There are some common misconceptions about securing LLM applications. LLM security risks still exist even if:
If LLM applications read external data and send it to an LLM, they inherently have security risks and create security risks for their customers. We know this because we’ve pentested over 50 LLM applications in production and found the same vulnerability: reading data that could have originated from an attacker can manipulate LLMs into:
LLMs are now the weakest link in the chain, and traditional security risks are being delivered via LLMs straight into your systems and nobody is monitoring it. It is the same risk as arbitrarily executing code or hiring a malicious employee.
What is the threat?
LLM applications and agents commonly do three things:
Attackers can easily hide, within larger input sources, adversarial input intended to mislead LLMs. Here’s an example of an adversarial input hidden in white text, intended to exfiltrate user data files, on a seemingly innocent website:
When somebody sends this webpage to an LLM for processing, the injected content overrides your system prompt. If the system prompt is overridden, the LLM is now your adversary and any other data accessible by the LLM can be leaked easily, the LLM can execute phishing attacks within your trusted environment, or it can manipulate systems it has access to.
So although a normal workflow and what you expect might look like this:
Simply by having an LLM in the loop, you expose yourself to much worse risks:
Attacks in the Wild
We’ve gone through most of the published adversarial inputs over the past year, and here are the top three threats we saw, ranked by severity.
Let’s walk through a few of the most interesting examples.
1) Hacking Google Bard - From Prompt Injection to Data Exfiltration
This is an exciting one. The pentesters found that if a user of Bard Chat navigates to any source with an injected instruction - even a YouTube video - the attacker would be able to get the contents of any uploaded file from Google Drive. They did this by getting the chat to render a markdown image, and append the contents of the Google Drive documents to the image. When the image is rendered in markdown, a GET request is sent to the attacker’s server, and they get all the content. That’s scary; imagine a random attacker having access to all of your Google docs just because you chatted with Bard Chat.
(source: https://embracethered.com/blog/posts/2023/google-bard-data-exfiltration/)
PromptArmor would prevent this immediately - it catches the inputted injection they used. It would also flag some of the exfiltrated information (PII such as name, address, etc) on the output side.
2) Turning Bing Chat Into a Data Pirate
(source: https://greshake.github.io/)
This one is scary. The attacker successfully injected Bing Chat to turn into an adversarial social engineer, which is actively trying to exfiltrate data. It would sneakily get the user to reveal their name, and then as soon as the user revealed their name, it would suggest that the user clicks on an exfiltration link. The link URL has the contents of their name appended but obfuscated.
(source: https://greshake.github.io/)
The scary thing is that the user thinks they are safe because they are interacting with Bing Chat, a trusted source, but they are really perpetually at risk. This could be used to scam a user into giving up their credit card information, their date of birth, their address, or other personal information. A scammer could use this information to propagate more threats (more phishing, scam mail, blackmail, etc).
Again, PromptArmor catches this injection on input, so it would never be sent to the LLM. It would also flag the output link as malicious and catch the PII before the link is rendered to the user.
You can also check out our blog post which has another example of data exfiltration.
Why do I need to worry about this now?
How does PromptArmor work?
There are two ways to make sure LLM agents and applications are secure enough to protect user data and prevent known threats like phishing, malware, etc.
Given that the agents most useful to customers are also the most powerful, we find the second approach to be more fruitful. However, usually, after a threat is detected and reported, companies seem to go into panic mode and implement approach 1 in a hurry, which makes their product worse.
We have 3 offerings that fall under the second approach:
PromptArmor continually monitors for threats across all mechanisms your LLM Agent or Application uses to interact with data sources (external or internal), and LLMs. Threats are always evolving and new ones are emerging every day. PromptArmor stays up to date so you don’t have to.
Here is a system diagram, which mirrors the architecture many companies are using.
How do I sign up?