## Bar Chart: Attack Success Rate (ASR) by Character Manipulation Technique and Defense Method

### Overview
The chart compares the Attack Success Rate (ASR) against five defense methods across ten categories of adversarial text manipulation. Each category is a specific text corruption or obfuscation technique. The y-axis ranges from 0.0 to 1.0 and gives the fraction of attacks that bypass each defense, so a lower bar means a more effective defense.

### Components/Axes
- **X-axis (Categories)**: 
  - Bidirectional Text
  - Spaces
  - Emoji Smuggling
  - Full Width Text
  - Homoglyphs
  - Numbers
  - Underline Accent Marks
  - Unicode Tags Smuggling
  - Upside Down Text
  - Zero Width
- **Y-axis (ASR)**: Attack Success Rate (0.0–1.0)
- **Legend (Bottom)**:
  - Azure Prompt Shield (teal)
  - Protect AI v1 (blue)
  - Meta Prompt Guard (green)
  - Vijil Prompt Injection (orange)
  - NeMo Guard Jailbreak Detect (beige)

### Detailed Analysis
1. **Bidirectional Text**:
   - Azure Prompt Shield: ~0.95
   - Protect AI v1: ~0.92
   - Meta Prompt Guard: ~0.94
   - Vijil Prompt Injection: ~0.93
   - NeMo Guard Jailbreak Detect: ~0.91

2. **Spaces**:
   - Azure Prompt Shield: ~0.70
   - Protect AI v1: ~0.65
   - Meta Prompt Guard: ~0.58
   - Vijil Prompt Injection: ~0.98
   - NeMo Guard Jailbreak Detect: ~0.10

3. **Emoji Smuggling**:
   - Azure Prompt Shield: ~0.97
   - Protect AI v1: ~0.95
   - Meta Prompt Guard: ~0.96
   - Vijil Prompt Injection: ~0.94
   - NeMo Guard Jailbreak Detect: ~0.93

4. **Full Width Text**:
   - Azure Prompt Shield: ~0.15
   - Protect AI v1: ~0.02
   - Meta Prompt Guard: ~0.01
   - Vijil Prompt Injection: ~0.99
   - NeMo Guard Jailbreak Detect: ~0.98

5. **Homoglyphs**:
   - Azure Prompt Shield: ~0.96
   - Protect AI v1: ~0.94
   - Meta Prompt Guard: ~0.52
   - Vijil Prompt Injection: ~0.97
   - NeMo Guard Jailbreak Detect: ~0.95

6. **Numbers**:
   - Azure Prompt Shield: ~0.72
   - Protect AI v1: ~0.70
   - Meta Prompt Guard: ~0.75
   - Vijil Prompt Injection: ~0.99
   - NeMo Guard Jailbreak Detect: ~0.73

7. **Underline Accent Marks**:
   - Azure Prompt Shield: ~0.98
   - Protect AI v1: ~0.96
   - Meta Prompt Guard: ~0.65
   - Vijil Prompt Injection: ~0.97
   - NeMo Guard Jailbreak Detect: ~0.95

8. **Unicode Tags Smuggling**:
   - Azure Prompt Shield: ~0.10
   - Protect AI v1: ~0.05
   - Meta Prompt Guard: ~0.03
   - Vijil Prompt Injection: ~0.99
   - NeMo Guard Jailbreak Detect: ~0.97

9. **Upside Down Text**:
   - Azure Prompt Shield: ~0.99
   - Protect AI v1: ~0.98
   - Meta Prompt Guard: ~0.97
   - Vijil Prompt Injection: ~0.96
   - NeMo Guard Jailbreak Detect: ~0.95

10. **Zero Width**:
    - Azure Prompt Shield: ~0.05
    - Protect AI v1: ~0.20
    - Meta Prompt Guard: ~0.98
    - Vijil Prompt Injection: ~0.99
    - NeMo Guard Jailbreak Detect: ~0.12

### Key Observations
- **Azure Prompt Shield** is bypassed with ASR >0.9 in 5/10 categories, but holds ASR low for Full Width Text (~0.15), Unicode Tags Smuggling (~0.10), and Zero Width (~0.05).
- **Vijil Prompt Injection** has the highest or near-highest ASR in every category (≥ ~0.93 in all 10), including Full Width Text (~0.99) and Unicode Tags Smuggling (~0.99), which makes it the most consistently evaded defense in the chart.
- **NeMo Guard Jailbreak Detect** shows its lowest ASR in Spaces (~0.10) and Zero Width (~0.12), indicating comparatively strong handling of spacing and invisible characters, yet it is bypassed with ASR >0.9 in 7/10 categories.
- **Meta Prompt Guard** has the lowest ASR of the five for Homoglyphs (~0.52) and Underline Accent Marks (~0.65), suggesting partial resistance to character-level obfuscation, but it is almost always bypassed by Zero Width (~0.98).
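
The per-defense figures can be recomputed directly from the estimates in the Detailed Analysis. The following is a minimal Python sketch that assumes those approximate readings are taken at face value. The names `DEFENSES`, `ASR`, and `summarize`, and the 0.9 threshold, are illustrative choices and not part of the source chart.

```python
from statistics import mean

# Defense order used for every row below (matches the chart legend).
DEFENSES = [
    "Azure Prompt Shield",
    "Protect AI v1",
    "Meta Prompt Guard",
    "Vijil Prompt Injection",
    "NeMo Guard Jailbreak Detect",
]

# Approximate ASR values read off the chart, one row per category.
ASR = {
    "Bidirectional Text":     [0.95, 0.92, 0.94, 0.93, 0.91],
    "Spaces":                 [0.70, 0.65, 0.58, 0.98, 0.10],
    "Emoji Smuggling":        [0.97, 0.95, 0.96, 0.94, 0.93],
    "Full Width Text":        [0.15, 0.02, 0.01, 0.99, 0.98],
    "Homoglyphs":             [0.96, 0.94, 0.52, 0.97, 0.95],
    "Numbers":                [0.72, 0.70, 0.75, 0.99, 0.73],
    "Underline Accent Marks": [0.98, 0.96, 0.65, 0.97, 0.95],
    "Unicode Tags Smuggling": [0.10, 0.05, 0.03, 0.99, 0.97],
    "Upside Down Text":       [0.99, 0.98, 0.97, 0.96, 0.95],
    "Zero Width":             [0.05, 0.20, 0.98, 0.99, 0.12],
}


def summarize(asr, defenses, threshold=0.9):
    """Return mean ASR, count of categories above threshold, and the
    category with the lowest ASR for each defense."""
    summary = {}
    for i, name in enumerate(defenses):
        column = {category: row[i] for category, row in asr.items()}
        summary[name] = {
            "mean_asr": round(mean(column.values()), 3),
            "n_above": sum(v > threshold for v in column.values()),
            "lowest": min(column, key=column.get),
        }
    return summary


SUMMARY = summarize(ASR, DEFENSES)
for name, stats in SUMMARY.items():
    print(f"{name}: mean={stats['mean_asr']}, "
          f">0.9 in {stats['n_above']}/10, lowest in {stats['lowest']}")
```

Because ASR measures how often an attack gets through, a lower mean and a lower count above the threshold indicate a more resistant defense.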

### Interpretation
The data show that defense effectiveness varies sharply with attack type. Azure Prompt Shield, Protect AI v1, and Meta Prompt Guard hold ASR at or below ~0.15 against Full Width Text and Unicode Tags Smuggling, yet all three are bypassed more than 90% of the time by Bidirectional Text, Emoji Smuggling, and Upside Down Text. Vijil Prompt Injection is evaded in every category (ASR ≥ ~0.93). The low ASR of NeMo Guard Jailbreak Detect in Spaces and Zero Width suggests it handles whitespace and non-printing characters comparatively well, but it is almost always bypassed by Full Width Text (~0.98) and Unicode Tags Smuggling (~0.97). No single method achieves universal protection: every defense has at least one category with ASR of ~0.98 or higher, which argues for layering defenses or normalizing input before detection. The high ASR values overall (median ~0.94, with 31 of the 50 estimates above 0.9) indicate that current defenses may have systemic vulnerabilities to text-based adversarial attacks.
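
The aggregate claims can be checked against the same estimates. This is a hedged Python sketch, assuming the approximate readings from the Detailed Analysis. The variable names (`overall_median`, `share_above`, `best_per_category`, `worst_case`) are illustrative and not from the source.

```python
from statistics import median

DEFENSES = [
    "Azure Prompt Shield",
    "Protect AI v1",
    "Meta Prompt Guard",
    "Vijil Prompt Injection",
    "NeMo Guard Jailbreak Detect",
]

# Approximate ASR values read off the chart, in DEFENSES order.
ASR = {
    "Bidirectional Text":     [0.95, 0.92, 0.94, 0.93, 0.91],
    "Spaces":                 [0.70, 0.65, 0.58, 0.98, 0.10],
    "Emoji Smuggling":        [0.97, 0.95, 0.96, 0.94, 0.93],
    "Full Width Text":        [0.15, 0.02, 0.01, 0.99, 0.98],
    "Homoglyphs":             [0.96, 0.94, 0.52, 0.97, 0.95],
    "Numbers":                [0.72, 0.70, 0.75, 0.99, 0.73],
    "Underline Accent Marks": [0.98, 0.96, 0.65, 0.97, 0.95],
    "Unicode Tags Smuggling": [0.10, 0.05, 0.03, 0.99, 0.97],
    "Upside Down Text":       [0.99, 0.98, 0.97, 0.96, 0.95],
    "Zero Width":             [0.05, 0.20, 0.98, 0.99, 0.12],
}

# Flatten all 50 estimates to get chart-wide statistics.
values = [v for row in ASR.values() for v in row]
overall_median = median(values)
share_above = sum(v > 0.9 for v in values) / len(values)

# Most resistant defense (lowest ASR) for each attack category.
best_per_category = {
    category: DEFENSES[row.index(min(row))] for category, row in ASR.items()
}

# Worst case (highest ASR) for each defense across all categories.
worst_case = {
    name: max(row[i] for row in ASR.values())
    for i, name in enumerate(DEFENSES)
}

print(f"median ASR: {overall_median:.2f}, share above 0.9: {share_above:.0%}")
for category, name in best_per_category.items():
    print(f"{category}: most resistant is {name}")
for name, asr in worst_case.items():
    print(f"{name}: worst-case ASR {asr:.2f}")
```

Under these estimates the most resistant defense differs from category to category, and each defense's worst case sits near 1.0, which is consistent with the conclusion that no single method protects across all attack types.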