## Heatmap: Prompt Transferability Matrix
### Overview
The image shows a square heatmap titled "Prompt Transferability Matrix" that reports, as percentages, how well prompts transfer between four AI models: GPT-4, Claude 2, Mistral 7B, and Vicuna. Cells are colored on a gradient from blue (0%) to red (60%), and each cell carries a numerical annotation. Two annotated values (61.9% and 64.1%) exceed the colorbar's 60% upper label.
### Components/Axes
- **X-axis (Horizontal)**: Target model (GPT-4, Claude 2, Mistral 7B, Vicuna)
- **Y-axis (Vertical)**: Source model (GPT-4, Claude 2, Mistral 7B, Vicuna); each row lists transfer from that model to the others
- **Legend**: Colorbar on the right, ranging from 0% (blue) to 60% (red), labeled "Transferability (%)"
- **Annotations**: Numerical values in each cell, representing transferability percentages
### Detailed Analysis
- **GPT-4 Row**:
  - GPT-4 → Claude 2: 61.9%
  - GPT-4 → Mistral 7B: 49.7%
  - GPT-4 → Vicuna: 48.3%
- **Claude 2 Row**:
  - Claude 2 → GPT-4: 64.1%
  - Claude 2 → Mistral 7B: 47.2%
  - Claude 2 → Vicuna: 46.1%
- **Mistral 7B Row**:
  - Mistral 7B → GPT-4: 54.2%
  - Mistral 7B → Claude 2: 50.3%
  - Mistral 7B → Vicuna: 45.7%
- **Vicuna Row**:
  - Vicuna → GPT-4: 52.7%
  - Vicuna → Claude 2: 48.6%
  - Vicuna → Mistral 7B: 46.5%
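The rows above can be held in a small matrix for further checks. The following is a minimal sketch, not part of the original figure: the names `MODELS`, `MATRIX`, `outgoing_means`, and `incoming_means` are hypothetical, the row-as-source orientation is assumed from the "A → B" labels, and the diagonal is filled with the 0.0% values the figure shows.

```python
# Transferability (%) transcribed from the cell annotations listed above.
# Assumption: rows are the source model, columns the target model.
MODELS = ["GPT-4", "Claude 2", "Mistral 7B", "Vicuna"]
MATRIX = [
    [0.0, 61.9, 49.7, 48.3],  # GPT-4 as source
    [64.1, 0.0, 47.2, 46.1],  # Claude 2 as source
    [54.2, 50.3, 0.0, 45.7],  # Mistral 7B as source
    [52.7, 48.6, 46.5, 0.0],  # Vicuna as source
]


def _mean_excluding(values, skip_index):
    """Average a row or column while leaving out the diagonal entry."""
    kept = [v for i, v in enumerate(values) if i != skip_index]
    return sum(kept) / len(kept)


def outgoing_means():
    """Mean transferability of each model's prompts to the other three models."""
    return {m: _mean_excluding(MATRIX[i], i) for i, m in enumerate(MODELS)}


def incoming_means():
    """Mean transferability of the other three models' prompts to each model."""
    return {
        m: _mean_excluding([row[j] for row in MATRIX], j)
        for j, m in enumerate(MODELS)
    }


if __name__ == "__main__":
    for name, value in outgoing_means().items():
        print(f"{name:<10} outgoing mean: {value:.1f}%")
    for name, value in incoming_means().items():
        print(f"{name:<10} incoming mean: {value:.1f}%")
```

Excluding the diagonal matters here: averaging the 0.0% cells in would lower every mean by a quarter without reflecting any measured transfer.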
### Key Observations
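1. **Diagonal Zeros**: All diagonal cells (e.g., GPT-4 → GPT-4) are 0.0%. The figure does not explain these values; they may mean that same-model transfer was excluded or not measured rather than that it fails.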
2. **Highest Transferability**:
   - Claude 2 → GPT-4: 64.1% (darkest red)
   - GPT-4 → Claude 2: 61.9%
3. **Lowest Transferability**:
   - Mistral 7B → Vicuna: 45.7%
   - Claude 2 → Vicuna: 46.1%
   - Vicuna → Mistral 7B: 46.5%
4. **Asymmetry**: The matrix is not symmetric. For example, GPT-4 → Claude 2 is 61.9%, while Claude 2 → GPT-4 is 64.1%.
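The observations above can be checked mechanically from the annotated values. The sketch below is illustrative only: the helper names (`off_diagonal_cells`, `ranked_cells`, `asymmetry_gaps`) are hypothetical, and rows are assumed to be the source model.

```python
# Transferability (%) as annotated in the heatmap cells.
# Assumption: rows are the source model, columns the target model.
MODELS = ["GPT-4", "Claude 2", "Mistral 7B", "Vicuna"]
MATRIX = [
    [0.0, 61.9, 49.7, 48.3],
    [64.1, 0.0, 47.2, 46.1],
    [54.2, 50.3, 0.0, 45.7],
    [52.7, 48.6, 46.5, 0.0],
]


def off_diagonal_cells():
    """Yield (source, target, value) for every cell except the diagonal."""
    for i, source in enumerate(MODELS):
        for j, target in enumerate(MODELS):
            if i != j:
                yield source, target, MATRIX[i][j]


def ranked_cells():
    """Return the off-diagonal cells sorted from highest to lowest value."""
    return sorted(off_diagonal_cells(), key=lambda cell: cell[2], reverse=True)


def asymmetry_gaps():
    """Return M[a][b] - M[b][a] for each pair, with a listed before b in MODELS."""
    gaps = {}
    for i in range(len(MODELS)):
        for j in range(i + 1, len(MODELS)):
            gap = MATRIX[i][j] - MATRIX[j][i]
            gaps[(MODELS[i], MODELS[j])] = round(gap, 1)
    return gaps


if __name__ == "__main__":
    ranking = ranked_cells()
    print("highest:", ranking[0])
    print("lowest: ", ranking[-1])
    for pair, gap in asymmetry_gaps().items():
        print(pair, f"{gap:+.1f} points")
```

A negative gap means prompts transfer better from the second model of the pair to the first than in the opposite direction.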
### Interpretation
The matrix shows that **GPT-4 and Claude 2 have the highest mutual prompt transferability** (61.9% and 64.1%), which may reflect similarities in architecture or training, although the figure alone cannot confirm this. Mistral 7B and Vicuna show lower transferability, and the two directions between them (45.7% and 46.5%) are among the lowest values in the matrix, which suggests they handle prompts differently from each other and from the other two models. The diagonal zeros more plausibly mean that same-model transfer was excluded or not measured than that models cannot reuse their own prompts. Overall, transferability depends on both the source and the target model.
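One way to make "mutual transferability" concrete is to average the two directions for each pair of models. This is an assumed definition for illustration, not one stated in the figure, and the function name `mutual_means` is hypothetical.

```python
from itertools import combinations

# Transferability (%) as annotated in the heatmap cells.
# Assumption: rows are the source model, columns the target model.
MODELS = ["GPT-4", "Claude 2", "Mistral 7B", "Vicuna"]
MATRIX = [
    [0.0, 61.9, 49.7, 48.3],
    [64.1, 0.0, 47.2, 46.1],
    [54.2, 50.3, 0.0, 45.7],
    [52.7, 48.6, 46.5, 0.0],
]


def mutual_means():
    """Average both directions for each unordered pair, highest pair first."""
    index = {model: i for i, model in enumerate(MODELS)}
    means = {}
    for a, b in combinations(MODELS, 2):
        i, j = index[a], index[b]
        means[(a, b)] = (MATRIX[i][j] + MATRIX[j][i]) / 2
    # Dicts preserve insertion order, so the result iterates from best to worst.
    return dict(sorted(means.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    for (a, b), value in mutual_means().items():
        print(f"{a} / {b}: {value:.2f}%")
```

Under this definition the ordering of pairs matches the reading above, with GPT-4 / Claude 2 first and Mistral 7B / Vicuna last.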