## 3D Surface Plots: Model Performance vs. Resource Budgets
### Overview
The figure contains twelve 3D surface plots arranged in two rows of six. Each plot shows how model performance (MMR) varies with two resource budgets, ROAR_qm and ROAR_kp, for one scenario (e.g., Backdoor-Vulnerability, Targeted-Mitigation). A color gradient encodes the performance level (green = low MMR, yellow = high MMR), and selected MMR values are annotated directly on the surfaces.
---
### Components/Axes
- **X-axis**: ROAR_kp budget (ranging from 0 to 200 in most plots, 0–4 in some)
- **Y-axis**: ROAR_qm budget (ranging from 0 to 4 in most plots, 0–3 in some)
- **Z-axis**: MMR (mean reciprocal rank, more commonly abbreviated MRR; 0.00–1.00 scale)
- **Legend**: Implicit color gradient (green = low MMR, yellow = high MMR)
- **Plot Titles**:
- Row 1: Backdoor-Vulnerability, Backdoor-Mitigation, Backdoor-Diagnosis, Backdoor-Treatment, Backdoor-Freebase, Backdoor-WordNet
- Row 2: Targeted-Vulnerability, Targeted-Mitigation, Targeted-Diagnosis, Targeted-Treatment, Targeted-Freebase, Targeted-WordNet
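For readers unfamiliar with the Z-axis metric, the following is a minimal sketch of how a mean reciprocal rank is usually computed: the average of 1/rank over a set of queries, where rank is the 1-based position of the correct answer. The function name and the example ranks are illustrative assumptions; the figure does not show how its values were produced.

```python
def mean_reciprocal_rank(ranks):
    """Return the mean of 1/rank over a non-empty list of 1-based ranks."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based and must be >= 1")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks of the correct answer for four queries (not from the figure).
example_ranks = [1, 2, 4, 1]
score = mean_reciprocal_rank(example_ranks)
print(f"{score:.4f}")  # 0.6875
```

A score of 1.00 means the correct answer is always ranked first, which is why the Z-axis is bounded by 1.00.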
---
### Detailed Analysis
#### Backdoor Scenarios (Row 1)
1. **(a) Backdoor-Vulnerability**
- Peaks at **0.56** (ROAR_kp=150, ROAR_qm=3) and **0.55** (ROAR_kp=50, ROAR_qm=0).
- Lowest MMR (**0.04**) at ROAR_kp=0, ROAR_qm=0.
- Gradual increase in MMR with higher budgets.
2. **(b) Backdoor-Mitigation**
- Highest MMR (**0.73**) at ROAR_kp=150, ROAR_qm=3.
- Sharp drop to **0.02** at ROAR_kp=0, ROAR_qm=0.
- The rise from 0.02 to 0.73 is the largest zero-budget-to-peak gain among the backdoor panels that annotate both points.
3. **(c) Backdoor-Diagnosis**
- Moderate peak (**0.40**) at ROAR_kp=150, ROAR_qm=3.
- Minimal improvement at lower budgets (**0.02** at ROAR_kp=0, ROAR_qm=0).
- Peak is well below that of Backdoor-Mitigation (**0.73**).
4. **(d) Backdoor-Treatment**
- High MMR (**0.72**) at ROAR_kp=150, ROAR_qm=3.
- Slightly lower than mitigation (**0.73**).
- Consistent improvement across budgets.
5. **(e) Backdoor-Freebase**
- MMR is **0.00** at ROAR_kp=0, ROAR_qm=0 and rises to **0.57** at ROAR_kp=150, ROAR_qm=3.
- The peak is comparable to Backdoor-Vulnerability (0.56) and above Backdoor-Diagnosis (0.40), but below Mitigation, Treatment, and WordNet.
6. **(f) Backdoor-WordNet**
- High MMR (**0.75**) at ROAR_kp=150, ROAR_qm=3.
- Lower value of **0.55** at ROAR_kp=50, ROAR_qm=0.
- WordNet integration shows strong performance.
#### Targeted Scenarios (Row 2)
7. **(g) Targeted-Vulnerability**
- Extremely high MMR (**0.91**) at ROAR_kp=1, ROAR_qm=3.
- Drops to **0.02** at ROAR_kp=3, ROAR_qm=0.
- The 0.02–0.91 spread is the widest range of annotated values in any panel.
8. **(h) Targeted-Mitigation**
- Low MMR (**0.22** at ROAR_kp=1, ROAR_qm=3; **0.02** at ROAR_kp=3, ROAR_qm=0).
- The spread between annotated values is much narrower than in Targeted-Vulnerability, and the peak is far lower.
9. **(i) Targeted-Diagnosis**
- Low MMR (**0.26** at ROAR_kp=1, ROAR_qm=3; **0.00** at ROAR_kp=3, ROAR_qm=0).
- Diagnosis tools underperform in targeted scenarios.
10. **(j) Targeted-Treatment**
- Moderate MMR (**0.55** at ROAR_kp=1, ROAR_qm=3; **0.37** at ROAR_kp=3, ROAR_qm=0).
- The peak is lower than in Backdoor-Treatment (0.72).
11. **(k) Targeted-Freebase**
- Very low MMR (**0.03** at ROAR_kp=1, ROAR_qm=3; **0.04** at ROAR_kp=3, ROAR_qm=0).
- Freebase models perform poorly in targeted settings.
12. **(l) Targeted-WordNet**
- High MMR (**0.71**) at ROAR_kp=1, ROAR_qm=3, dropping to **0.11** at ROAR_kp=3, ROAR_qm=0.
- The peak is the second highest among the targeted panels, after Targeted-Vulnerability (0.91).
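The two annotated points per targeted panel can be compared directly. The sketch below transcribes the values listed above and computes the difference between them; the dictionary layout and the `spreads` helper are illustrative assumptions, not part of the figure.

```python
# Annotated values transcribed from the Targeted panels described above.
# Each entry: panel -> (MMR at ROAR_kp=1, ROAR_qm=3; MMR at ROAR_kp=3, ROAR_qm=0)
targeted = {
    "Vulnerability": (0.91, 0.02),
    "Mitigation": (0.22, 0.02),
    "Diagnosis": (0.26, 0.00),
    "Treatment": (0.55, 0.37),
    "Freebase": (0.03, 0.04),
    "WordNet": (0.71, 0.11),
}


def spreads(values):
    """Difference between the two annotated MMR values, rounded to 2 decimals."""
    return {name: round(a - b, 2) for name, (a, b) in values.items()}


targeted_spreads = spreads(targeted)

# Print panels from the widest to the narrowest spread.
for name, gap in sorted(targeted_spreads.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<14}{gap:+.2f}")
```

Because the two points differ in both budgets at once, the spread describes the gap between those two settings only and does not isolate the effect of either budget.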
---
### Key Observations
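1. **Budget Impact**: In the backdoor panels, MMR rises from near zero at zero budgets to its peak at ROAR_kp=150, ROAR_qm=3. In the targeted panels, the annotated values compare only ROAR_kp=1, ROAR_qm=3 with ROAR_kp=3, ROAR_qm=0; the former is higher in every panel except Targeted-Freebase, but these two points alone do not separate the effects of the two budgets.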
2. **Scenario-Specific Performance**:
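- Among the backdoor panels, Backdoor-Mitigation and Backdoor-WordNet reach the highest MMR (0.73–0.75), with Backdoor-Treatment close behind (0.72).
- Targeted-Freebase, Targeted-Mitigation, and Targeted-Diagnosis stay below 0.3 at every annotated point.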
3. **Anomalies**:
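- Targeted-Vulnerability (g) reaches the highest annotated MMR in the figure (**0.91**) at ROAR_kp=1, ROAR_qm=3. This may point to overfitting or data leakage, but the plot alone does not establish the cause.
- Backdoor-Freebase (e) has an MMR of **0.00** at zero budgets. The other backdoor panels with a zero-budget annotation are also near zero (0.02–0.04), so this looks like a shared baseline rather than a Freebase-specific weakness.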
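The observations above can be checked against the annotated values with a short script. This is a sketch under the assumption that the largest annotated value in each panel is a fair stand-in for that panel's peak; the variable names are illustrative.

```python
# Largest annotated MMR per panel, transcribed from the Detailed Analysis.
peaks = {
    "Backdoor-Vulnerability": 0.56,
    "Backdoor-Mitigation": 0.73,
    "Backdoor-Diagnosis": 0.40,
    "Backdoor-Treatment": 0.72,
    "Backdoor-Freebase": 0.57,
    "Backdoor-WordNet": 0.75,
    "Targeted-Vulnerability": 0.91,
    "Targeted-Mitigation": 0.22,
    "Targeted-Diagnosis": 0.26,
    "Targeted-Treatment": 0.55,
    "Targeted-Freebase": 0.04,
    "Targeted-WordNet": 0.71,
}

# Panel with the highest annotated value overall.
top_overall = max(peaks, key=peaks.get)

# Highest annotated value among the backdoor panels only.
backdoor = {name: v for name, v in peaks.items() if name.startswith("Backdoor-")}
top_backdoor = max(backdoor, key=backdoor.get)

# Panels whose largest annotated value stays below 0.3.
below_threshold = sorted(name for name, v in peaks.items() if v < 0.3)

print(top_overall, top_backdoor, below_threshold)
```

Splitting the ranking by row keeps the backdoor and targeted settings from being compared as if they shared the same budget axes.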
---
### Interpretation
The annotated values indicate that the two budgets (ROAR_kp and ROAR_qm) strongly affect MMR. In the backdoor panels, MMR rises from near zero at zero budgets to between 0.40 and 0.75 at ROAR_kp=150, ROAR_qm=3. The targeted panels vary far more, with largest annotated values ranging from 0.04 (Targeted-Freebase) to 0.91 (Targeted-Vulnerability). Mitigation and WordNet reach the highest backdoor peaks, while Freebase is mixed: 0.57 in the backdoor setting but at most 0.04 in the targeted one. The 0.91 peak in Targeted-Vulnerability (g) stands out and may warrant a check for data quality problems or overfitting. Overall, the figure suggests that budgets should be tuned per scenario, and that mitigation strategies matter for balancing performance against resource cost.