## Bar Chart: Attack Success Rate (ASR) Comparison Across Adversarial Methods and Model Ablations
### Overview
The image presents a comparative bar chart of attack success rate (ASR) on three benchmark prompt sets (Advbench, Jailbreakbench, Malicious Instruct) under two configurations ("use-tem" and "direct") for two language models (Llama-2-7b-chat-hf and Vicuna-7b-v1.5). ASR is plotted against the number of ablated attention heads (0-5). Two average lines (Vanilla and Use-tem) are overlaid to show overall trends.
### Components/Axes
- **X-axis**: "Ablating Head Numbers" (0-5, integer labels)
- **Y-axis**: "Attack Success Rate (ASR)" (0.0-1.0, linear scale)
- **Legend**:
  - Top-right for Llama-2-7b-chat-hf:
    - Striped bars: red (Advbench use-tem), yellow (Jailbreakbench use-tem), teal (Malicious Instruct use-tem)
    - Same colors for the direct bars: red (Advbench direct), yellow (Jailbreakbench direct), teal (Malicious Instruct direct)
  - Top-right for Vicuna-7b-v1.5:
    - Same color coding as above
  - Pink line: Vanilla Average
  - Purple line: Use-tem Average
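
The Y-axis quantity can be made concrete with a small sketch. It assumes the common definition of ASR as the fraction of prompts for which the attack is judged successful; the chart does not say how success was judged, and the function name and sample judgements below are hypothetical.

```python
from typing import Sequence


def attack_success_rate(judgements: Sequence[bool]) -> float:
    """Return the fraction of prompts judged as successful attacks.

    Assumption: each entry is True when the model's response to one
    prompt was judged harmful (attack succeeded), False otherwise.
    """
    if not judgements:
        raise ValueError("need at least one judged prompt")
    # True counts as 1 and False as 0, so sum() gives the success count.
    return sum(judgements) / len(judgements)


# Hypothetical judgements for five prompts (not data from the chart).
example = [True, False, False, True, False]
print(f"ASR = {attack_success_rate(example):.2f}")
```

With two successes out of five prompts, the sketch reports an ASR of 0.40, which would sit at 0.4 on the chart's 0.0-1.0 axis.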
### Detailed Analysis
#### Llama-2-7b-chat-hf
- **0 heads ablated**:
  - Advbench use-tem (red striped): ~0.15 ASR
  - Jailbreakbench use-tem (yellow striped): ~0.22 ASR
  - Malicious Instruct use-tem (teal striped): ~0.05 ASR
  - The "direct" bars show similar but slightly lower values.
- **1-5 heads ablated**:
  - All "use-tem" bars show ASR >0.4, with Malicious Instruct use-tem peaking at ~0.7 (1 head ablated).
  - The "direct" bars consistently fall below their "use-tem" counterparts.
  - The Vanilla Average (~0.6) and Use-tem Average (~0.5) lines gradually converge.
#### Vicuna-7b-v1.5
- **0 heads ablated**:
  - Advbench use-tem: ~0.3 ASR
  - Jailbreakbench use-tem: ~0.4 ASR
  - Malicious Instruct use-tem: ~0.2 ASR
- **1-5 heads ablated**:
  - The "use-tem" bars maintain ASR >0.5, with Malicious Instruct use-tem reaching ~0.75 (4 heads ablated).
  - The "direct" bars show ASR <0.5 at every ablation setting.
  - The Use-tem Average (~0.6) remains consistently above the Vanilla Average (~0.55).
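
To illustrate how an average line relates to the individual bars, the sketch below averages the approximate use-tem values listed for the unablated setting. It assumes the average line is an unweighted mean over the three prompt sets, which the chart does not confirm; the variable and function names are hypothetical.

```python
from statistics import mean

# Approximate use-tem ASR values with no heads ablated, as listed in the text.
use_tem_unablated = {
    "Llama-2-7b-chat-hf": {
        "Advbench": 0.15,
        "Jailbreakbench": 0.22,
        "Malicious Instruct": 0.05,
    },
    "Vicuna-7b-v1.5": {
        "Advbench": 0.30,
        "Jailbreakbench": 0.40,
        "Malicious Instruct": 0.20,
    },
}


def average_point(per_dataset: dict) -> float:
    """Unweighted mean across prompt sets (an assumption, see lead-in)."""
    return mean(per_dataset.values())


averages = {
    model: round(average_point(values), 2)
    for model, values in use_tem_unablated.items()
}

for model, value in averages.items():
    print(f"{model}: use-tem mean with no ablation = {value:.2f}")
```

Under that assumption the unablated means come out to about 0.14 for Llama-2 and 0.30 for Vicuna, both well below the average levels reported once heads are ablated.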
### Key Observations
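1. **Head Number Correlation**: ASR for the "use-tem" bars is higher once heads are ablated than with none ablated. For Malicious Instruct the peak is reported at 4 heads on Vicuna (~0.75) and at 1 head on Llama-2 (~0.7).
2. **Method Effectiveness**: With 1-5 heads ablated, "use-tem" configurations outperform their "direct" counterparts by roughly 20-40%; with no heads ablated, the Llama-2 gap is small.
3. **Model-Specific Trends**:
   - Llama-2 shows steeper ASR growth with head ablation.
   - Vicuna's ASR is more stable but starts from a higher unablated baseline (~0.2-0.4, versus ~0.05-0.22 for Llama-2).
4. **Average Lines**: For Vicuna, the Use-tem Average (~0.6) sits about 0.05 above the Vanilla Average (~0.55). For Llama-2 the reported values run the other way (Vanilla ~0.6, Use-tem ~0.5).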
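
A percentage gap between two configurations can be read either as an absolute difference in ASR or as a relative increase, and the two readings can differ considerably. The sketch below shows both for one pair of values; the function name and the sample values (0.6 and 0.4) are hypothetical and are not read from the chart.

```python
def gaps(use_tem: float, direct: float) -> tuple[float, float]:
    """Return (absolute gap, relative gap) between two ASR values.

    absolute: difference in ASR on the 0.0-1.0 scale.
    relative: that difference as a fraction of the direct ASR.
    """
    if direct <= 0:
        raise ValueError("relative gap is undefined when direct ASR is 0")
    absolute = use_tem - direct
    relative = absolute / direct
    return absolute, relative


# Hypothetical pair of ASR values, for illustration only.
absolute, relative = gaps(use_tem=0.6, direct=0.4)
print(f"absolute gap: {absolute:.2f} ASR")
print(f"relative gap: {relative:.0%}")
```

For this pair the absolute gap is 0.20 ASR (20 percentage points) while the relative gap is 50%, so a stated range such as 20-40% is only interpretable once the convention is known.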
### Interpretation
The reported values indicate that the "use-tem" configuration generally yields higher attack success rates than the "direct" configuration once heads are ablated. This suggests that the template is an important factor in how often these prompts succeed, possibly through better context alignment or evasion of detection mechanisms. The X-axis counts ablated attention heads within each model rather than comparing models of different sizes, so the rise in ASR from the unablated baseline suggests that the ablated heads contribute to the models' resistance to these prompts. The gap between "use-tem" and "direct" appears in both models. The average lines are less clear-cut: for Vicuna the Use-tem Average lies above the Vanilla Average, whereas for Llama-2 the Vanilla Average (~0.6) is reported above the Use-tem Average (~0.5), so their relative position should be read per model.