## Grouped Bar Chart: Average Harmlessness Score by Safety Category and Model
### Overview
This grouped bar chart compares the **average harmlessness score** (y-axis, tick labels from 0 to 8) of four models (SFT, beaver-7b-v3.0, SACPO (P), RSA (P)) across eight **safety categories** (x-axis: Crime, Emotional Harm, Immoral, Insult, Physical Harm, Pornographic, Privacy, Social Bias). Each category has one bar per model, so the models can be compared side by side within a category.
### Components/Axes
- **X-axis**: *Safety Category* (labels: Crime, Emotional Harm, Immoral, Insult, Physical Harm, Pornographic, Privacy, Social Bias).
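- **Y-axis**: *Average Harmlessness Score* (tick labels at 0, 2, 4, 6, 8, with a grid line at each tick). Several bars are estimated above 8, so the axis appears to extend somewhat past the top labeled tick.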
- **Legend** (bottom-right of the chart):
  - SFT: Dark gray
  - beaver-7b-v3.0: Light pink
  - SACPO (P): Green
  - RSA (P): Red
### Detailed Analysis (Scores by Category and Model)
For each safety category, the average harmlessness score (approximate values) for each model is:
| Safety Category | SFT (Dark Gray) | beaver-7b-v3.0 (Light Pink) | SACPO (P) (Green) | RSA (P) (Red) |
|-------------------|-----------------|-----------------------------|-------------------|---------------|
| Crime | ~1.5 | ~8.5 | ~8.8 | ~8.8 |
| Emotional Harm | ~6.3 | ~7.2 | ~7.3 | ~8.5 |
| Immoral | ~2.0 | ~7.9 | ~8.6 | ~8.5 |
| Insult | ~5.9 | ~6.9 | ~7.0 | ~7.9 |
| Physical Harm | ~6.1 | ~7.7 | ~7.8 | ~8.9 |
| Pornographic | ~4.8 | ~4.0 | ~5.1 | ~5.9 |
| Privacy | ~2.8 | ~8.4 | ~8.9 | ~8.5 |
| Social Bias | ~6.4 | ~7.7 | ~7.8 | ~8.5 |
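The table can be summarized with a few lines of Python. The following is a minimal sketch that hard-codes the approximate values from the table (visual estimates, not the underlying data); the names `MODELS`, `SCORES`, `mean_by_model`, and `lowest_by_category` are illustrative and do not come from the chart's source.

```python
# Approximate scores transcribed from the table above.
# These are visual estimates of bar heights, not exact measurements.
MODELS = ["SFT", "beaver-7b-v3.0", "SACPO (P)", "RSA (P)"]
SCORES = {
    "Crime":          [1.5, 8.5, 8.8, 8.8],
    "Emotional Harm": [6.3, 7.2, 7.3, 8.5],
    "Immoral":        [2.0, 7.9, 8.6, 8.5],
    "Insult":         [5.9, 6.9, 7.0, 7.9],
    "Physical Harm":  [6.1, 7.7, 7.8, 8.9],
    "Pornographic":   [4.8, 4.0, 5.1, 5.9],
    "Privacy":        [2.8, 8.4, 8.9, 8.5],
    "Social Bias":    [6.4, 7.7, 7.8, 8.5],
}


def mean_by_model(scores):
    """Unweighted mean of each model's score over all categories."""
    n = len(scores)
    return {
        model: sum(row[i] for row in scores.values()) / n
        for i, model in enumerate(MODELS)
    }


def lowest_by_category(scores):
    """Name of the lowest-scoring model in each category."""
    return {cat: MODELS[row.index(min(row))] for cat, row in scores.items()}


if __name__ == "__main__":
    for model, avg in mean_by_model(SCORES).items():
        print(f"{model}: {avg:.2f}")
    for cat, model in lowest_by_category(SCORES).items():
        print(f"{cat}: lowest is {model}")
```

The unweighted mean treats every category equally, which is an assumption; the chart does not say how many prompts each category contains.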
### Key Observations
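1. **SFT Lowest in Seven of Eight Categories**: SFT has the lowest harmlessness score in every category except *Pornographic*, where beaver-7b-v3.0 (~4.0) falls below it (~4.8). The gap is largest in Crime (~1.5), Immoral (~2.0), and Privacy (~2.8); in Emotional Harm, Insult, Physical Harm, and Social Bias, SFT scores around 6 and trails the other models by a smaller margin.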
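2. **RSA (P) Often Highest**: RSA (P) has the highest score in five categories (Emotional Harm: ~8.5, Insult: ~7.9, Physical Harm: ~8.9, Pornographic: ~5.9, Social Bias: ~8.5) and ties SACPO (P) in Crime (~8.8). SACPO (P) is slightly ahead in Immoral (~8.6 vs. ~8.5) and Privacy (~8.9 vs. ~8.5). On these estimates, RSA (P) is the most harmless model overall.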
3. **Pornographic Category Anomaly**: *Pornographic* is the lowest-scoring category for every model except SFT, and the only category where beaver-7b-v3.0 (~4.0) scores below SFT (~4.8). RSA (P) is highest at ~5.9.
4. **SACPO (P) vs. RSA (P)**: The two models are nearly identical in Crime (~8.8 vs. ~8.8) and Immoral (~8.6 vs. ~8.5). SACPO (P) leads by about 0.4 in Privacy, while RSA (P) leads by roughly 0.7 to 1.2 points in the remaining five categories.
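The SACPO (P) and RSA (P) comparison can be checked category by category. This is a small sketch using the approximate table values; `PAIRS`, `gaps`, `near_ties`, and the 0.2-point tolerance are assumptions chosen for illustration, and the tolerance in particular is arbitrary given that the values are visual estimates.

```python
# (SACPO (P), RSA (P)) approximate scores per category, from the table.
PAIRS = {
    "Crime":          (8.8, 8.8),
    "Emotional Harm": (7.3, 8.5),
    "Immoral":        (8.6, 8.5),
    "Insult":         (7.0, 7.9),
    "Physical Harm":  (7.8, 8.9),
    "Pornographic":   (5.1, 5.9),
    "Privacy":        (8.9, 8.5),
    "Social Bias":    (7.8, 8.5),
}


def gaps(pairs):
    """RSA (P) minus SACPO (P) per category, rounded to one decimal."""
    return {cat: round(rsa - sacpo, 1) for cat, (sacpo, rsa) in pairs.items()}


def near_ties(pairs, tol=0.2):
    """Categories where the two models differ by at most `tol` points."""
    return [cat for cat, gap in gaps(pairs).items() if abs(gap) <= tol]


if __name__ == "__main__":
    for cat, gap in gaps(PAIRS).items():
        print(f"{cat}: {gap:+.1f}")
    print("Near ties:", near_ties(PAIRS))
```

A positive gap means RSA (P) scores higher; a negative gap means SACPO (P) does.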
### Interpretation
The chart shows that harmlessness is **category-dependent**: every model scores comparatively low in *Pornographic* (roughly 4 to 6), while the three non-SFT models score about 8.4 or higher in *Crime* and *Privacy*. SFT is the weakest model in seven of eight categories and scores below 3 in Crime, Immoral, and Privacy, which suggests it is the least suitable of the four where harmlessness matters. RSA (P) and SACPO (P) perform well across most categories, with RSA (P) leading or tying in six of the eight. These differences are relevant when selecting a model for use cases where harmlessness is a priority, such as content moderation.
(Note: All values are approximate, based on visual estimation of bar heights relative to the y-axis scale.)