PromptArmor: Security and Compliance for LLM Applications

TL;DR - PromptArmor detects and responds to LLM inputs, outputs, and actions for adversarial content. We return in real time faster than LLMs and keep our threat intelligence up to date so you don’t have to.

Hey folks! We are excited to launch the first AI Detection and Response (AIDR) system for LLM Applications. For those familiar with traditional security, think of us as EDR for LLM systems. Users of LLM applications want powerful functionality, and need a way to preserve that powerful functionality without exposing themselves to security risks, specifically:

Data exfiltration
Phishing
System Manipulation

Common misconceptions

There are some common misconceptions about securing LLM applications. LLM security risks still exist even if:

there’s no chatbot function
models are hosted in a VPC or on-prem
data is housed in a secure location
you’re not using LLMs but your vendors are
you’re only using internal-facing LLMs

If LLM applications read external data and send it to an LLM, they inherently have security risks and create security risks for their customers. We know this because we’ve pentested over 50 LLM applications in production and found the same vulnerability: reading data that could have originated from an attacker can manipulate LLMs into:

exfiltrating confidential data they have access to
introducing phishing links within a trusted product interface
manipulating systems you’ve given the LLM access to

LLMs are now the weakest link in the chain, and traditional security risks are being delivered via LLMs straight into your systems and nobody is monitoring it. It is the same risk as arbitrarily executing code or hiring a malicious employee.

What is the threat?

LLM applications and agents commonly do three things:

Read external input: websites, emails, support tickets, logs, public codebases, resumes, etc.
Access sensitive data: PII, confidential customer information, CRMs, emails, code, etc.
Send this information to an LLM: this is where the magic happens

Attackers can easily hide, within larger input sources, adversarial input intended to mislead LLMs. Here’s an example of an adversarial input hidden in white text, intended to exfiltrate user data files, on a seemingly innocent website:

When somebody sends this webpage to an LLM for processing, the injected content overrides your system prompt. If the system prompt is overridden, the LLM is now your adversary and any other data accessible by the LLM can be leaked easily, the LLM can execute phishing attacks within your trusted environment, or it can manipulate systems it has access to.

So although a normal workflow and what you expect might look like this:

Simply by having an LLM in the loop, you expose yourself to much worse risks:

Attacks in the Wild

We’ve gone through most of the published adversarial inputs over the past year, and here are the top three threats we saw, ranked by severity.

Let’s walk through a few of the most interesting examples.

1) Hacking Google Bard - From Prompt Injection to Data Exfiltration

This is an exciting one. The pentesters found that if a user of Bard Chat navigates to any source with an injected instruction - even a YouTube video - the attacker would be able to get the contents of any uploaded file from Google Drive. They did this by getting the chat to render a markdown image, and append the contents of the Google Drive documents to the image. When the image is rendered in markdown, a GET request is sent to the attacker’s server, and they get all the content. That’s scary; imagine a random attacker having access to all of your Google docs just because you chatted with Bard Chat.

(source: https://embracethered.com/blog/posts/2023/google-bard-data-exfiltration/)

PromptArmor would prevent this immediately - it catches the inputted injection they used. It would also flag some of the exfiltrated information (PII such as name, address, etc) on the output side.

2) Turning Bing Chat Into a Data Pirate

(source: https://greshake.github.io/)

This one is scary. The attacker successfully injected Bing Chat to turn into an adversarial social engineer, which is actively trying to exfiltrate data. It would sneakily get the user to reveal their name, and then as soon as the user revealed their name, it would suggest that the user clicks on an exfiltration link. The link URL has the contents of their name appended but obfuscated.

(source: https://greshake.github.io/)

The scary thing is that the user thinks they are safe because they are interacting with Bing Chat, a trusted source, but they are really perpetually at risk. This could be used to scam a user into giving up their credit card information, their date of birth, their address, or other personal information. A scammer could use this information to propagate more threats (more phishing, scam mail, blackmail, etc).

Again, PromptArmor catches this injection on input, so it would never be sent to the LLM. It would also flag the output link as malicious and catch the PII before the link is rendered to the user.

You can also check out our blog post which has another example of data exfiltration.

Why do I need to worry about this now?

You currently have no way to monitor if you’re being attacked or have been attacked. Without this, you are running blind and are compromising your customers’ systems as well. .
Everyone is quickly evaluating their LLM security strategy and looking for viable options. You can’t be the only person in the market without one.
Attacks are already happening in the wild, they just haven’t been publicized yet (but they will be). Once it happens, it's too late. Don't be subject to an attack and lose credibility with your customers.
More enterprises are becoming aware, and they care. The most security-aware ones are already conducting LLM pentests on their vendors and their systems, and most others will soon follow suit.

How does PromptArmor work?

There are two ways to make sure LLM agents and applications are secure enough to protect user data and prevent known threats like phishing, malware, etc.

“Neutering”: limit their functionality to make them less powerful and incapable of taking certain actions (rendering images in markdown, preventing accessing the web, etc).
Accelerating: let them continue to be powerful, expand their capabilities, and scan specifically for harmful threats and prevent those.

Given that the agents most useful to customers are also the most powerful, we find the second approach to be more fruitful. However, usually, after a threat is detected and reported, companies seem to go into panic mode and implement approach 1 in a hurry, which makes their product worse.

We have 3 offerings that fall under the second approach:

Analyze Input: we scan inputs for adversarial content before sending them to LLMs, by using a combination of heuristics, models, anomaly detection, and signature-based detection.
Analyze Actions: we check actions for issues with a combination of heuristics, models, anomaly detection, and signature-based detection.
Analyze Output: we scan for known data exfiltration methods (markdown rendering, etc) and content that you don’t want exfiltrated (PII, etc).

PromptArmor continually monitors for threats across all mechanisms your LLM Agent or Application uses to interact with data sources (external or internal), and LLMs. Threats are always evolving and new ones are emerging every day. PromptArmor stays up to date so you don’t have to.

Here is a system diagram, which mirrors the architecture many companies are using.

How do I sign up?

LLM Pentest: If you have an agent/application that uses an LLM sign up here to understand the unique threat vectors you face.
API: If you already know you’re vulnerable, sign up for an API Key. We are SOC2 Compliant and can deploy on your VPC if your business requires it.
General Interest: If you want to learn more about the space and how it applies to you, sign up for an LLM security chat!

PromptArmor

Security and Compliance for LLM Applications

Jobs at PromptArmor

Active Founders

Shankar Krishnan

Vikram Jayanthi

Company Launches

🛡️PromptArmor: Secure LLM applications without ruining functionality