Image 28e229748609...

EXPERT: gemini-2.0-flash VERSION 1

RUNTIME: nugit/gemini/gemini-2.0-flash

INTEL_VERIFIED

## Bar Chart: Attack Success Rate (ASR) Comparison

### Overview
The image is a set of bar charts comparing the attack success rate (ASR) of different defense mechanisms against various adversarial attacks. The x-axis represents the defense mechanisms, and the y-axis represents the attack success rate (ASR) ranging from 0.0 to 1.0. Each chart corresponds to a different adversarial attack method.

### Components/Axes
*   **Y-axis:** Attack Success Rate (ASR), ranging from 0.0 to 1.0 in increments of 0.2.
*   **X-axis:** Defense mechanisms:
    *   Azure Prompt Shield (light teal)
    *   Protect AI v1 (light blue)
    *   Meta Prompt Guard (light green)
    *   Vijil Prompt Injection (yellow)
    *   NeMo Guard Jailbreak Detect (light tan)
*   **Chart Titles (Adversarial Attack Methods):**
    *   BAE
    *   Bert-Attack
    *   Deep Word Bug
    *   Alzantot
    *   Pruthi
    *   PWWS
    *   TextBugger
    *   TextFooler
*   **Legend:** Located at the bottom of the chart, mapping colors to defense mechanisms.

### Detailed Analysis

**BAE:**
*   Azure Prompt Shield: ASR ~0.12
*   Protect AI v1: ASR ~0.23
*   Meta Prompt Guard: ASR ~0.03
*   Vijil Prompt Injection: ASR ~0.27
*   NeMo Guard Jailbreak Detect: ASR ~0.83

**Bert-Attack:**
*   Azure Prompt Shield: ASR ~0.15
*   Protect AI v1: ASR ~0.25
*   Meta Prompt Guard: ASR ~0.08
*   Vijil Prompt Injection: ASR ~0.22
*   NeMo Guard Jailbreak Detect: ASR ~0.54

**Deep Word Bug:**
*   Azure Prompt Shield: ASR ~0.17
*   Protect AI v1: ASR ~0.24
*   Meta Prompt Guard: ASR ~0.10
*   Vijil Prompt Injection: ASR ~0.26
*   NeMo Guard Jailbreak Detect: ASR ~0.92

**Alzantot:**
*   Azure Prompt Shield: ASR ~0.16
*   Protect AI v1: ASR ~0.17
*   Meta Prompt Guard: ASR ~0.01
*   Vijil Prompt Injection: ASR ~0.03
*   NeMo Guard Jailbreak Detect: ASR ~0.57

**Pruthi:**
*   Azure Prompt Shield: ASR ~0.14
*   Protect AI v1: ASR ~0.12
*   Meta Prompt Guard: ASR ~0.01
*   Vijil Prompt Injection: ASR ~0.04
*   NeMo Guard Jailbreak Detect: ASR ~0.57

**PWWS:**
*   Azure Prompt Shield: ASR ~0.15
*   Protect AI v1: ASR ~0.33
*   Meta Prompt Guard: ASR ~0.21
*   Vijil Prompt Injection: ASR ~0.48
*   NeMo Guard Jailbreak Detect: ASR ~0.65

**TextBugger:**
*   Azure Prompt Shield: ASR ~0.12
*   Protect AI v1: ASR ~0.32
*   Meta Prompt Guard: ASR ~0.20
*   Vijil Prompt Injection: ASR ~0.73
*   NeMo Guard Jailbreak Detect: ASR ~0.95

**TextFooler:**
*   Azure Prompt Shield: ASR ~0.11
*   Protect AI v1: ASR ~0.30
*   Meta Prompt Guard: ASR ~0.19
*   Vijil Prompt Injection: ASR ~0.73
*   NeMo Guard Jailbreak Detect: ASR ~0.90

### Key Observations
*   NeMo Guard Jailbreak Detect consistently has the highest attack success rate (ASR) across all adversarial attacks.
*   Azure Prompt Shield and Meta Prompt Guard generally have the lowest ASR values.
*   Vijil Prompt Injection shows a high ASR (~0.73) for TextBugger and TextFooler, but a low ASR (~0.03 to ~0.04) for Alzantot and Pruthi.
*   Protect AI v1 shows a moderate ASR (~0.12 to ~0.33) across all attacks.

### Interpretation
The data suggests that NeMo Guard Jailbreak Detect is the least effective defense against these adversarial attacks: it shows the highest ASR in every chart (~0.54 to ~0.95). Azure Prompt Shield and Meta Prompt Guard appear more effective, with lower ASR values. The effectiveness of Vijil Prompt Injection depends on the attack: it holds up against Alzantot and Pruthi (ASR ~0.03 to ~0.04) but is largely bypassed by TextBugger and TextFooler (ASR ~0.73). Protect AI v1 provides a moderate level of defense across all attacks (ASR ~0.12 to ~0.33). The choice of defense mechanism should therefore be tailored to the adversarial attack being considered.
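
The ranking described above can be checked mechanically. The following is a minimal Python sketch, assuming the approximate bar heights listed in this reading are taken at face value; the names `DEFENSES`, `ASR`, `mean_asr`, and `weakest` are illustrative and do not come from the source.

```python
# Approximate ASR values as listed in this reading of the chart.
# Column order follows DEFENSES; all identifiers are illustrative.
DEFENSES = [
    "Azure Prompt Shield",
    "Protect AI v1",
    "Meta Prompt Guard",
    "Vijil Prompt Injection",
    "NeMo Guard Jailbreak Detect",
]

ASR = {
    "BAE":           [0.12, 0.23, 0.03, 0.27, 0.83],
    "Bert-Attack":   [0.15, 0.25, 0.08, 0.22, 0.54],
    "Deep Word Bug": [0.17, 0.24, 0.10, 0.26, 0.92],
    "Alzantot":      [0.16, 0.17, 0.01, 0.03, 0.57],
    "Pruthi":        [0.14, 0.12, 0.01, 0.04, 0.57],
    "PWWS":          [0.15, 0.33, 0.21, 0.48, 0.65],
    "TextBugger":    [0.12, 0.32, 0.20, 0.73, 0.95],
    "TextFooler":    [0.11, 0.30, 0.19, 0.73, 0.90],
}

# Mean ASR per defense across the eight attacks (lower means more robust).
mean_asr = {
    defense: sum(row[i] for row in ASR.values()) / len(ASR)
    for i, defense in enumerate(DEFENSES)
}

# Defense with the highest ASR (the weakest defense) for each attack.
weakest = {attack: DEFENSES[row.index(max(row))] for attack, row in ASR.items()}

for defense in sorted(mean_asr, key=mean_asr.get):
    print(f"{defense:30s} mean ASR {mean_asr[defense]:.3f}")
print("Weakest per attack:", set(weakest.values()))
```

On these values, Meta Prompt Guard and Azure Prompt Shield have the lowest mean ASR (about 0.10 and 0.14), and NeMo Guard Jailbreak Detect is the weakest defense in all eight charts.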

EXPERT: gemma-3-27b-it-free VERSION 1

RUNTIME: google-free/gemma-3-27b-it

INTEL_VERIFIED

## Bar Chart: Attack Success Rate (ASR) by Attack and Defense

### Overview
This image presents a bar chart comparing the Attack Success Rate (ASR) of various attacks against different defense mechanisms. The chart is organized into a 2x4 grid, with each cell representing a specific attack type and displaying the ASR for each defense.

### Components/Axes
*   **X-axis:** Represents the defense mechanisms: Azure Prompt Shield (light green), Protect AI v1 (grey), Meta Prompt Guard (yellow), Vijil Prompt Injection (blue), and NeMo Guard Jailbreak Detect (pink).
*   **Y-axis:** Represents the Attack Success Rate (ASR), ranging from 0.0 to 1.0.
*   **Chart Title (implied):** Attack Success Rate (ASR) by Attack and Defense
*   **Sub-Chart Titles:** BAE, Bert-Attack, Deep Word Bug, Alzantot, Pruthi, PWWS, TextBugger, TextFooler. These represent the different attack types.
*   **Legend:** Located at the bottom of the chart, associating colors with each defense mechanism.

### Detailed Analysis
The chart consists of eight sub-charts, each representing a different attack. For each attack, the ASR is shown for each of the five defense mechanisms.

**BAE:**
*   Azure Prompt Shield: Approximately 0.12
*   Protect AI v1: Approximately 0.90
*   Meta Prompt Guard: Approximately 0.28
*   Vijil Prompt Injection: Approximately 0.16
*   NeMo Guard Jailbreak Detect: Approximately 0.65

**Bert-Attack:**
*   Azure Prompt Shield: Approximately 0.08
*   Protect AI v1: Approximately 0.92
*   Meta Prompt Guard: Approximately 0.24
*   Vijil Prompt Injection: Approximately 0.12
*   NeMo Guard Jailbreak Detect: Approximately 0.60

**Deep Word Bug:**
*   Azure Prompt Shield: Approximately 0.16
*   Protect AI v1: Approximately 0.92
*   Meta Prompt Guard: Approximately 0.32
*   Vijil Prompt Injection: Approximately 0.16
*   NeMo Guard Jailbreak Detect: Approximately 0.90

**Alzantot:**
*   Azure Prompt Shield: Approximately 0.16
*   Protect AI v1: Approximately 0.88
*   Meta Prompt Guard: Approximately 0.32
*   Vijil Prompt Injection: Approximately 0.12
*   NeMo Guard Jailbreak Detect: Approximately 0.68

**Pruthi:**
*   Azure Prompt Shield: Approximately 0.12
*   Protect AI v1: Approximately 0.60
*   Meta Prompt Guard: Approximately 0.28
*   Vijil Prompt Injection: Approximately 0.24
*   NeMo Guard Jailbreak Detect: Approximately 0.24

**PWWS:**
*   Azure Prompt Shield: Approximately 0.12
*   Protect AI v1: Approximately 0.56
*   Meta Prompt Guard: Approximately 0.28
*   Vijil Prompt Injection: Approximately 0.32
*   NeMo Guard Jailbreak Detect: Approximately 0.48

**TextBugger:**
*   Azure Prompt Shield: Approximately 0.16
*   Protect AI v1: Approximately 0.64
*   Meta Prompt Guard: Approximately 0.32
*   Vijil Prompt Injection: Approximately 0.24
*   NeMo Guard Jailbreak Detect: Approximately 0.64

**TextFooler:**
*   Azure Prompt Shield: Approximately 0.16
*   Protect AI v1: Approximately 0.56
*   Meta Prompt Guard: Approximately 0.32
*   Vijil Prompt Injection: Approximately 0.24
*   NeMo Guard Jailbreak Detect: Approximately 0.28

**Trends:**
*   Protect AI v1 has the highest ASR for every attack (tied with NeMo Guard Jailbreak Detect at ~0.64 on TextBugger), reaching roughly 0.9 on BAE, Bert-Attack, Deep Word Bug, and Alzantot.
*   Azure Prompt Shield generally has the lowest ASR, never exceeding ~0.16.
*   Meta Prompt Guard and Vijil Prompt Injection show low-to-moderate ASRs: ~0.24 to ~0.32 for Meta Prompt Guard and ~0.12 to ~0.32 for Vijil Prompt Injection.
*   NeMo Guard Jailbreak Detect has a variable ASR, ranging from approximately 0.24 to 0.90.

### Key Observations
*   Protect AI v1 appears to be the least effective defense against all tested attacks.
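*   Azure Prompt Shield is the most effective defense overall, with the lowest ASR for six of the eight attacks; Vijil Prompt Injection ties it on Deep Word Bug (~0.16) and is lower on Alzantot (~0.12 vs. ~0.16).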
*   The ASR varies significantly depending on the attack type, even with the same defense mechanism.
*   NeMo Guard Jailbreak Detect's performance is inconsistent, showing high ASR for some attacks (Deep Word Bug) and lower ASR for others (Pruthi, TextFooler).
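
Claims such as "lowest for every attack" are easy to overstate when bars tie. The sketch below, which assumes the approximate values listed in this reading, finds the lowest and highest defenses per attack with ties included; `extremes` and the shortened defense names are illustrative helpers, not part of the source.

```python
# Approximate ASR values as listed in this reading; column order follows DEFENSES.
DEFENSES = ["Azure", "Protect AI v1", "Meta", "Vijil", "NeMo"]

ASR = {
    "BAE":           [0.12, 0.90, 0.28, 0.16, 0.65],
    "Bert-Attack":   [0.08, 0.92, 0.24, 0.12, 0.60],
    "Deep Word Bug": [0.16, 0.92, 0.32, 0.16, 0.90],
    "Alzantot":      [0.16, 0.88, 0.32, 0.12, 0.68],
    "Pruthi":        [0.12, 0.60, 0.28, 0.24, 0.24],
    "PWWS":          [0.12, 0.56, 0.28, 0.32, 0.48],
    "TextBugger":    [0.16, 0.64, 0.32, 0.24, 0.64],
    "TextFooler":    [0.16, 0.56, 0.32, 0.24, 0.28],
}


def extremes(row):
    """Return (defenses with the lowest ASR, defenses with the highest ASR), ties included."""
    lo, hi = min(row), max(row)
    lowest = [d for d, v in zip(DEFENSES, row) if v == lo]
    highest = [d for d, v in zip(DEFENSES, row) if v == hi]
    return lowest, highest


# Attacks where Azure is not the single lowest bar.
azure_not_sole_lowest = [a for a, row in ASR.items() if extremes(row)[0] != ["Azure"]]
# Attacks where Protect AI v1 is not the single highest bar.
protect_not_sole_highest = [a for a, row in ASR.items() if extremes(row)[1] != ["Protect AI v1"]]

print("Azure not sole lowest:", azure_not_sole_lowest)
print("Protect AI v1 not sole highest:", protect_not_sole_highest)
```

With these values, Azure is tied or beaten on Deep Word Bug and Alzantot, and Protect AI v1 shares the top spot with NeMo Guard on TextBugger.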

### Interpretation
The data suggests that different defense mechanisms have varying levels of effectiveness against different types of prompt injection attacks. Protect AI v1, while attempting to provide a defense, seems to be easily bypassed by most attacks, indicating a potential weakness in its design or implementation. Azure Prompt Shield consistently demonstrates the strongest defense, suggesting a more robust approach to mitigating these attacks. The variability in NeMo Guard Jailbreak Detect's performance highlights the complexity of prompt injection and the challenges in developing a universally effective defense.

The relationship between attack type and ASR indicates that attacks exploit different vulnerabilities in language models. Some attacks may be more effective against defenses that focus on specific patterns or keywords, while others may be more successful at exploiting more subtle weaknesses.

The high ASR for Protect AI v1 across all attacks is a notable outlier, suggesting a fundamental flaw in its approach. This could be due to a lack of comprehensive coverage of attack vectors or a susceptibility to adversarial examples. Further investigation is needed to understand the root cause of this vulnerability.
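
This reading's Protect AI v1 values differ sharply from those in the other three descriptions of the same image in this document. The sketch below quantifies that disagreement; it cannot say which reading matches the image, only how far each one sits from the others. The reader labels reuse the EXPERT names from this page, and `deviation_from_others` is an illustrative helper.

```python
from statistics import median

ATTACKS = ["BAE", "Bert-Attack", "Deep Word Bug", "Alzantot",
           "Pruthi", "PWWS", "TextBugger", "TextFooler"]

# Protect AI v1 ASR per attack, as listed by each of the four readings.
PROTECT_AI_V1 = {
    "gemini-2.0-flash":    [0.23, 0.25, 0.24, 0.17, 0.12, 0.33, 0.32, 0.30],
    "gemma-3-27b-it-free": [0.90, 0.92, 0.92, 0.88, 0.60, 0.56, 0.64, 0.56],
    "healer-alpha-free":   [0.23, 0.26, 0.23, 0.14, 0.13, 0.33, 0.32, 0.31],
    "nemotron-free":       [0.20, 0.25, 0.22, 0.15, 0.10, 0.30, 0.30, 0.30],
}


def deviation_from_others(reader):
    """Per-attack absolute gap between one reader and the median of the other readers."""
    others = [vals for name, vals in PROTECT_AI_V1.items() if name != reader]
    return [
        abs(own - median(column))
        for own, column in zip(PROTECT_AI_V1[reader], zip(*others))
    ]


for reader in PROTECT_AI_V1:
    gaps = deviation_from_others(reader)
    print(f"{reader:20s} min gap {min(gaps):.2f}  max gap {max(gaps):.2f}")
```

Three readings agree with one another to within a few hundredths on every attack, while this reading is more than 0.2 away from the others on all eight, so the "notable outlier" may reflect how the bars were read rather than the defense itself.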

EXPERT: healer-alpha-free VERSION 1

RUNTIME: free/openrouter/healer-alpha

INTEL_VERIFIED

## Bar Chart Grid: Attack Success Rates of Various Defense Methods Against Different Adversarial Attacks

### Overview
The image displays a 2x4 grid of bar charts, each comparing the performance of five different defense mechanisms against a specific adversarial text attack method. The primary metric is the Attack Success Rate (ASR), where a higher bar indicates the attack was more successful (i.e., the defense was less effective). The overall purpose is to benchmark and compare the robustness of these AI safety tools.

### Components/Axes
*   **Y-Axis (Common to all subplots):** Labeled "Attack Success Rate (ASR)". The scale runs from 0.0 to 1.0, with major gridlines at intervals of 0.2.
*   **X-Axis (Implicit):** Each subplot represents a different attack method. The five bars within each subplot correspond to the five defense methods, ordered consistently.
*   **Subplot Titles (Attack Methods):** The eight attack methods are, from top-left to bottom-right: BAE, Bert-Attack, Deep Word Bug, Alzantot, Pruthi, PWWS, TextBugger, TextFooler.
*   **Legend (Defense Methods):** Located at the bottom of the entire figure. It maps colors to defense methods:
    *   **Teal/Green:** Azure Prompt Shield
    *   **Blue-Grey:** Protect AI v1
    *   **Light Green:** Meta Prompt Guard
    *   **Yellow:** Vijil Prompt Injection
    *   **Tan/Beige:** NeMo Guard Jailbreak Detect

### Detailed Analysis
Below are the approximate ASR values for each defense method within each attack subplot. Values are estimated from the bar heights relative to the y-axis gridlines.

**1. BAE Attack:**
*   Azure Prompt Shield: ~0.12
*   Protect AI v1: ~0.23
*   Meta Prompt Guard: ~0.03 (very low)
*   Vijil Prompt Injection: ~0.27
*   NeMo Guard Jailbreak Detect: ~0.85 (highest)

**2. Bert-Attack:**
*   Azure Prompt Shield: ~0.12
*   Protect AI v1: ~0.26
*   Meta Prompt Guard: ~0.09
*   Vijil Prompt Injection: ~0.23
*   NeMo Guard Jailbreak Detect: ~0.50

**3. Deep Word Bug:**
*   Azure Prompt Shield: ~0.15
*   Protect AI v1: ~0.23
*   Meta Prompt Guard: ~0.18
*   Vijil Prompt Injection: ~0.28
*   NeMo Guard Jailbreak Detect: ~0.97 (near maximum)

**4. Alzantot:**
*   Azure Prompt Shield: ~0.13
*   Protect AI v1: ~0.14
*   Meta Prompt Guard: ~0.00 (no visible bar, likely 0)
*   Vijil Prompt Injection: ~0.06
*   NeMo Guard Jailbreak Detect: ~0.54

**5. Pruthi:**
*   Azure Prompt Shield: ~0.14
*   Protect AI v1: ~0.13
*   Meta Prompt Guard: ~0.00 (no visible bar, likely 0)
*   Vijil Prompt Injection: ~0.04
*   NeMo Guard Jailbreak Detect: ~0.57

**6. PWWS:**
*   Azure Prompt Shield: ~0.16
*   Protect AI v1: ~0.33
*   Meta Prompt Guard: ~0.22
*   Vijil Prompt Injection: ~0.49
*   NeMo Guard Jailbreak Detect: ~0.65

**7. TextBugger:**
*   Azure Prompt Shield: ~0.11
*   Protect AI v1: ~0.32
*   Meta Prompt Guard: ~0.21
*   Vijil Prompt Injection: ~0.73
*   NeMo Guard Jailbreak Detect: ~0.95

**8. TextFooler:**
*   Azure Prompt Shield: ~0.11
*   Protect AI v1: ~0.31
*   Meta Prompt Guard: ~0.29
*   Vijil Prompt Injection: ~0.74
*   NeMo Guard Jailbreak Detect: ~0.93

### Key Observations
1.  **Consistent Underperformer:** The **NeMo Guard Jailbreak Detect** (tan bar) consistently shows the highest or near-highest Attack Success Rate across all eight attack methods. Its ASR is particularly high (>0.85) against Deep Word Bug, TextBugger, and TextFooler.
2.  **Consistent Performers:** **Azure Prompt Shield** (teal) and **Protect AI v1** (blue-grey) generally maintain lower ASRs, typically below 0.35 across all attacks. Azure often has the lowest or second-lowest rate.
3.  **Variable Performance:**
    *   **Meta Prompt Guard** (light green) shows extreme variability. It is highly effective (ASR ~0.00-0.03) against BAE, Alzantot, and Pruthi, but less so against others like TextFooler (~0.29).
    *   **Vijil Prompt Injection** (yellow) also varies significantly. It performs relatively well against Alzantot and Pruthi but is highly vulnerable to TextBugger and TextFooler (ASR >0.70).
4.  **Attack Potency:** The **TextBugger** and **TextFooler** attacks appear to be the most potent overall, achieving very high ASRs against multiple defenses, especially NeMo Guard and Vijil. The **Pruthi** and **Alzantot** attacks seem less effective against this set of defenses.
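
One way to make "attack potency" concrete is to average each attack's ASR over the five defenses. This is a minimal sketch assuming the approximate values listed in this reading; the unweighted mean is an illustrative choice of metric, not one stated in the source.

```python
# Approximate ASR values as listed in this reading. Column order:
# Azure Prompt Shield, Protect AI v1, Meta Prompt Guard,
# Vijil Prompt Injection, NeMo Guard Jailbreak Detect.
ASR = {
    "BAE":           [0.12, 0.23, 0.03, 0.27, 0.85],
    "Bert-Attack":   [0.12, 0.26, 0.09, 0.23, 0.50],
    "Deep Word Bug": [0.15, 0.23, 0.18, 0.28, 0.97],
    "Alzantot":      [0.13, 0.14, 0.00, 0.06, 0.54],
    "Pruthi":        [0.14, 0.13, 0.00, 0.04, 0.57],
    "PWWS":          [0.16, 0.33, 0.22, 0.49, 0.65],
    "TextBugger":    [0.11, 0.32, 0.21, 0.73, 0.95],
    "TextFooler":    [0.11, 0.31, 0.29, 0.74, 0.93],
}

# Potency = mean ASR across defenses (higher means the attack succeeds more often).
potency = {attack: sum(row) / len(row) for attack, row in ASR.items()}
ranked = sorted(potency, key=potency.get, reverse=True)

for attack in ranked:
    print(f"{attack:14s} {potency[attack]:.3f}")
```

Under this metric TextFooler and TextBugger rank first and second, and Pruthi and Alzantot rank last, which matches the observation above.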

### Interpretation
This chart provides a comparative security analysis of AI prompt defense systems. The data suggests a significant disparity in effectiveness:

*   **NeMo Guard Jailbreak Detect** appears to be the least robust defense among those tested against this suite of adversarial text attacks. Its high failure rate indicates it may be vulnerable to a wide range of jailbreaking or prompt injection techniques.
*   **Azure Prompt Shield** and **Protect AI v1** demonstrate more consistent, though not perfect, resilience. Their relatively low and stable ASRs suggest they employ more generalized or robust detection mechanisms.
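*   The performance of **Meta Prompt Guard** and **Vijil Prompt Injection** is highly attack-dependent. This could indicate they are tuned to certain kinds of adversarial perturbation but lack broad-spectrum coverage. The pattern does not follow attack family cleanly, though: Meta Prompt Guard is strong against the word-substitution attacks BAE (~0.03) and Alzantot (~0.00), yet weaker against PWWS (~0.22) and TextFooler (~0.29), which also substitute words.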
*   From a security perspective, the high ASRs for **TextBugger** and **TextFooler** mark them as particularly dangerous attack methods that require strong, specialized defenses. The near-total failure of NeMo Guard against these attacks is a critical finding.

**Conclusion:** The visualization effectively argues that defense selection must be informed by the expected threat model. No single defense is universally effective, and a layered or adaptive approach may be necessary for robust protection. The stark contrast between NeMo Guard and the others warrants further investigation into their underlying methodologies.
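
To illustrate what a layered approach could buy, the sketch below estimates the ASR of a stack of detectors in which an attack succeeds only if it bypasses every layer. It assumes, hypothetically, that bypass events are independent across defenses, which the chart does not establish; adversarial inputs often transfer between detectors, so the product is an optimistic figure and the smallest single-defense ASR is the guaranteed upper bound. `layered_asr` and `layered_asr_upper_bound` are illustrative names, and only two attacks from this reading are included.

```python
from math import prod

# Approximate ASR values from this reading for the two most potent attacks.
ASR = {
    "Azure Prompt Shield":         {"TextBugger": 0.11, "TextFooler": 0.11},
    "Protect AI v1":               {"TextBugger": 0.32, "TextFooler": 0.31},
    "Meta Prompt Guard":           {"TextBugger": 0.21, "TextFooler": 0.29},
    "Vijil Prompt Injection":      {"TextBugger": 0.73, "TextFooler": 0.74},
    "NeMo Guard Jailbreak Detect": {"TextBugger": 0.95, "TextFooler": 0.93},
}


def layered_asr(defenses, attack):
    """Estimated ASR of a stack, ASSUMING independent bypass events (optimistic)."""
    return prod(ASR[d][attack] for d in defenses)


def layered_asr_upper_bound(defenses, attack):
    """Upper bound: a stack is never weaker than its strongest single layer."""
    return min(ASR[d][attack] for d in defenses)


stack = ["Azure Prompt Shield", "Meta Prompt Guard"]
for attack in ("TextBugger", "TextFooler"):
    est = layered_asr(stack, attack)
    bound = layered_asr_upper_bound(stack, attack)
    print(f"{attack}: independent estimate {est:.4f}, upper bound {bound:.2f}")
```

The real combined ASR depends on how correlated the detectors' failures are, so it would have to be measured by running the attacks against the stack itself.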

EXPERT: nemotron-free VERSION 1

RUNTIME: free/nvidia/nemotron-nano-12b-v2-vl:free

INTEL_VERIFIED

## Bar Chart: Attack Success Rate (ASR) Comparison Across Adversarial Attacks and Defenses

### Overview
The chart compares the effectiveness of five defense mechanisms (Azure Prompt Shield, Protect AI v1, Meta Prompt Guard, Vijil Prompt Injection, NeMo Guard Jailbreak Detect) against eight adversarial attack methods (BAE, Bert-Attack, Deep Word Bug, Alzantot, Pruthi, PWWS, TextBugger, TextFooler). Attack Success Rate (ASR) is measured on a scale from 0.0 to 1.0, with higher values indicating greater vulnerability to attacks.

### Components/Axes
- **X-axis**: Adversarial attack methods (BAE, Bert-Attack, Deep Word Bug, Alzantot, Pruthi, PWWS, TextBugger, TextFooler).
- **Y-axis**: Attack Success Rate (ASR) from 0.0 to 1.0.
- **Legend**: 
  - Teal: Azure Prompt Shield
  - Blue: Protect AI v1
  - Green: Meta Prompt Guard
  - Yellow: Vijil Prompt Injection
  - Brown: NeMo Guard Jailbreak Detect

### Detailed Analysis
1. **BAE**:
   - Azure Prompt Shield: ~0.1
   - Protect AI v1: ~0.2
   - Meta Prompt Guard: ~0.05
   - Vijil Prompt Injection: ~0.25
   - NeMo Guard Jailbreak Detect: ~0.85

2. **Bert-Attack**:
   - Azure Prompt Shield: ~0.12
   - Protect AI v1: ~0.25
   - Meta Prompt Guard: ~0.08
   - Vijil Prompt Injection: ~0.22
   - NeMo Guard Jailbreak Detect: ~0.5

3. **Deep Word Bug**:
   - Azure Prompt Shield: ~0.15
   - Protect AI v1: ~0.22
   - Meta Prompt Guard: ~0.18
   - Vijil Prompt Injection: ~0.28
   - NeMo Guard Jailbreak Detect: ~0.95

4. **Alzantot**:
   - Azure Prompt Shield: ~0.12
   - Protect AI v1: ~0.15
   - Meta Prompt Guard: ~0.07
   - Vijil Prompt Injection: ~0.05
   - NeMo Guard Jailbreak Detect: ~0.55

5. **Pruthi**:
   - Azure Prompt Shield: ~0.1
   - Protect AI v1: ~0.1
   - Meta Prompt Guard: ~0.03
   - Vijil Prompt Injection: ~0.05
   - NeMo Guard Jailbreak Detect: ~0.55

6. **PWWS**:
   - Azure Prompt Shield: ~0.15
   - Protect AI v1: ~0.3
   - Meta Prompt Guard: ~0.2
   - Vijil Prompt Injection: ~0.45
   - NeMo Guard Jailbreak Detect: ~0.65

7. **TextBugger**:
   - Azure Prompt Shield: ~0.1
   - Protect AI v1: ~0.3
   - Meta Prompt Guard: ~0.2
   - Vijil Prompt Injection: ~0.7
   - NeMo Guard Jailbreak Detect: ~0.9

8. **TextFooler**:
   - Azure Prompt Shield: ~0.1
   - Protect AI v1: ~0.3
   - Meta Prompt Guard: ~0.3
   - Vijil Prompt Injection: ~0.75
   - NeMo Guard Jailbreak Detect: ~0.9

### Key Observations
- **NeMo Guard Jailbreak Detect** shows the highest ASR for all eight attacks (e.g., 0.95 for Deep Word Bug, 0.9 for TextBugger), indicating it is the least effective defense.
- **Vijil Prompt Injection** has the lowest ASR only for Alzantot (0.05); it is the second-weakest defense against BAE (0.25), Deep Word Bug (0.28), PWWS (0.45), TextBugger (0.7), and TextFooler (0.75).
- **Azure Prompt Shield** has low ASR values (0.1–0.15) and the lowest ASR for four attacks (Deep Word Bug, PWWS, TextBugger, TextFooler); **Protect AI v1** is moderate (0.1–0.3).
- **Meta Prompt Guard** performs variably, with ASR ranging from 0.03 (Pruthi) to 0.3 (TextFooler).

### Interpretation
The data highlights significant disparities in defense effectiveness. **NeMo Guard Jailbreak Detect** fails to mitigate attacks effectively, allowing near-complete success rates in some cases (e.g., 0.95 for Deep Word Bug). **Vijil Prompt Injection** is attack-dependent: it reduces ASR to near zero for Alzantot and Pruthi (0.05) but allows an ASR of 0.7 to 0.75 for TextBugger and TextFooler. **Azure Prompt Shield** (ASR 0.1 to 0.15) and **Meta Prompt Guard** (0.03 to 0.3) are the most consistent defenses in this reading. The consistently high ASR for NeMo across attacks implies systemic vulnerabilities in its design, warranting further investigation into its failure modes.
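
To check which defense actually has the lowest ASR most often, the values listed in this reading can be tallied per attack. This is a minimal sketch assuming those approximate values; `lowest` and `tally` are illustrative names.

```python
from collections import Counter

DEFENSES = [
    "Azure Prompt Shield",
    "Protect AI v1",
    "Meta Prompt Guard",
    "Vijil Prompt Injection",
    "NeMo Guard Jailbreak Detect",
]

# Approximate ASR values as listed in this reading; column order follows DEFENSES.
ASR = {
    "BAE":           [0.10, 0.20, 0.05, 0.25, 0.85],
    "Bert-Attack":   [0.12, 0.25, 0.08, 0.22, 0.50],
    "Deep Word Bug": [0.15, 0.22, 0.18, 0.28, 0.95],
    "Alzantot":      [0.12, 0.15, 0.07, 0.05, 0.55],
    "Pruthi":        [0.10, 0.10, 0.03, 0.05, 0.55],
    "PWWS":          [0.15, 0.30, 0.20, 0.45, 0.65],
    "TextBugger":    [0.10, 0.30, 0.20, 0.70, 0.90],
    "TextFooler":    [0.10, 0.30, 0.30, 0.75, 0.90],
}

# Defense with the lowest ASR for each attack (no ties occur at the minimum here).
lowest = {attack: DEFENSES[row.index(min(row))] for attack, row in ASR.items()}
tally = Counter(lowest.values())

for defense, wins in tally.most_common():
    print(f"{defense:28s} lowest ASR on {wins} attack(s)")
```

On these values Azure Prompt Shield is lowest for four attacks, Meta Prompt Guard for three, and Vijil Prompt Injection for one (Alzantot).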

TECHNICAL ASSET FINGERPRINT

28e22974860981951c553c2c
