## Heatmap: Classification Accuracies
### Overview
The image is a heatmap of classification accuracies for four models (TTPD, LR, CCS, MM) on six linguistic/categorical tasks. The color scale runs from purple (0.0) to yellow (1.0), and each cell is annotated with its accuracy in percent as mean ± standard deviation.
### Components/Axes
- **X-axis (Models)**: TTPD, LR, CCS, MM (left to right).
- **Y-axis (Categories)**:
  - Conjunctions
  - Disjunctions
  - Affirmative German
  - Negated German
  - common_claim_true_false
  - counterfact_true_false
- **Legend**: Color scale from purple (0.0) to yellow (1.0), positioned on the right.
- **Textual Elements**:
  - Title: "Classification Accuracies"
  - Subtitle: "Classification Accuracies" (repeated in the image).
  - Numerical values with standard deviations (e.g., "81 ± 1").
### Detailed Analysis
- **Conjunctions**:
  - TTPD: 81 ± 1 (yellow-orange)
  - LR: 77 ± 3 (orange)
  - CCS: 74 ± 11 (light orange)
  - MM: 80 ± 1 (orange)
- **Disjunctions**:
  - TTPD: 69 ± 1 (orange)
  - LR: 63 ± 3 (light orange)
  - CCS: 63 ± 8 (light orange)
  - MM: 69 ± 1 (orange)
- **Affirmative German**:
  - TTPD: 87 ± 0 (yellow)
  - LR: 88 ± 2 (yellow)
  - CCS: 76 ± 17 (orange)
  - MM: 82 ± 2 (orange)
- **Negated German**:
  - TTPD: 88 ± 1 (yellow)
  - LR: 91 ± 2 (yellow)
  - CCS: 78 ± 17 (orange)
  - MM: 84 ± 1 (orange)
- **common_claim_true_false**:
  - TTPD: 79 ± 0 (orange)
  - LR: 74 ± 2 (light orange)
  - CCS: 69 ± 11 (light orange)
  - MM: 78 ± 1 (orange)
- **counterfact_true_false**:
  - TTPD: 74 ± 0 (orange)
  - LR: 77 ± 2 (orange)
  - CCS: 71 ± 13 (light orange)
  - MM: 69 ± 1 (orange)
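
For readers who want to work with the numbers rather than the image, the cell values listed above can be transcribed into a small data structure. The sketch below is an illustration only: the names `MODELS`, `CELLS`, and `as_markdown_table` are hypothetical and not part of the source, and the values are copied by hand from the list above.

```python
# Column order follows the heatmap's x-axis, left to right.
MODELS = ["TTPD", "LR", "CCS", "MM"]

# (mean, std) in percent, transcribed from the cells listed above.
CELLS = {
    "Conjunctions": [(81, 1), (77, 3), (74, 11), (80, 1)],
    "Disjunctions": [(69, 1), (63, 3), (63, 8), (69, 1)],
    "Affirmative German": [(87, 0), (88, 2), (76, 17), (82, 2)],
    "Negated German": [(88, 1), (91, 2), (78, 17), (84, 1)],
    "common_claim_true_false": [(79, 0), (74, 2), (69, 11), (78, 1)],
    "counterfact_true_false": [(74, 0), (77, 2), (71, 13), (69, 1)],
}


def as_markdown_table(cells, models):
    """Render the transcribed cells as a markdown table, one row per category."""
    header = "| Category | " + " | ".join(models) + " |"
    divider = "|" + " --- |" * (len(models) + 1)
    rows = [
        "| " + category + " | "
        + " | ".join(f"{mean} ± {std}" for mean, std in values) + " |"
        for category, values in cells.items()
    ]
    return "\n".join([header, divider, *rows])


table = as_markdown_table(CELLS, MODELS)
print(table)
```

Dividing a mean by 100 places it on the 0.0 to 1.0 color scale shown in the legend.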
### Key Observations
1. **Highest Accuracies**:
   - TTPD and LR achieve the highest accuracies, on "Affirmative German" (87–88%) and "Negated German" (88–91%).
   - "Negated German" under LR (91 ± 2) is the highest value overall.
2. **Lowest Accuracies**:
   - "Disjunctions" under LR (63 ± 3) and CCS (63 ± 8) are the lowest values overall.
   - The next-lowest value, 69, appears four times: "Disjunctions" under TTPD and MM, "common_claim_true_false" under CCS, and "counterfact_true_false" under MM.
3. **Variability**:
   - CCS shows the highest standard deviations (±8 to ±17, peaking in "Affirmative German" and "Negated German"), indicating greater inconsistency.
   - TTPD (±0–±1) and MM (±1–±2) have the smallest standard deviations, suggesting more stable performance; LR is slightly higher (±2–±3).
4. **Color Correlation**:
   - Yellow cells (highest values) appear only in the two German rows for TTPD and LR; all CCS and MM cells are orange or light orange (lower values).
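
The extremes and the spread of standard deviations can be checked directly against the transcribed cell values. This is a minimal sketch under the assumption that the hand-copied numbers match the image; `MODELS`, `CELLS`, and the derived names are hypothetical.

```python
MODELS = ["TTPD", "LR", "CCS", "MM"]

# (mean, std) in percent, transcribed from the Detailed Analysis list.
CELLS = {
    "Conjunctions": [(81, 1), (77, 3), (74, 11), (80, 1)],
    "Disjunctions": [(69, 1), (63, 3), (63, 8), (69, 1)],
    "Affirmative German": [(87, 0), (88, 2), (76, 17), (82, 2)],
    "Negated German": [(88, 1), (91, 2), (78, 17), (84, 1)],
    "common_claim_true_false": [(79, 0), (74, 2), (69, 11), (78, 1)],
    "counterfact_true_false": [(74, 0), (77, 2), (71, 13), (69, 1)],
}

# Flatten to one (category, model, mean, std) record per cell.
flat = [
    (category, model, mean, std)
    for category, values in CELLS.items()
    for model, (mean, std) in zip(MODELS, values)
]

# Cell with the highest mean accuracy.
highest = max(flat, key=lambda cell: cell[2])

# All cells that share the lowest mean accuracy (ties are kept).
lowest_mean = min(cell[2] for cell in flat)
lowest = [(category, model) for category, model, mean, _ in flat if mean == lowest_mean]

# Smallest and largest standard deviation per model.
std_range = {
    model: (
        min(std for _, m, _, std in flat if m == model),
        max(std for _, m, _, std in flat if m == model),
    )
    for model in MODELS
}

print("highest:", highest)
print("lowest:", lowest_mean, lowest)
print("std range per model:", std_range)
```

Keeping ties in `lowest` matters here because more than one cell shares the minimum value.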
### Interpretation
The data suggest that **TTPD is the most consistent performer**: it ranks first or second in every category and has the smallest standard deviations (±0–±1). **LR** achieves the highest accuracies on the German tasks ("Affirmative German" and "Negated German") and on "counterfact_true_false", but trails MM on "Conjunctions", "Disjunctions", and "common_claim_true_false". The **CCS model is lowest or tied for lowest in five of the six categories and exhibits the highest variability** (±8 to ±17), which may indicate instability or sensitivity to input perturbations. "Disjunctions" is the lowest or joint-lowest row for every model (63–69%), which points to a shared weakness on logically compound statements rather than a model-specific one. Large **standard deviations** (e.g., ±17 for CCS in "Affirmative German") make a single accuracy figure less reliable, which could be critical for applications requiring consistent performance. The heatmap underscores the importance of model selection based on task-specific requirements: TTPD or LR for the German tasks, and TTPD or MM for conjunctions, disjunctions, and common claims.
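
One way to compare the models overall is to average each column and count how often each model has the best score in a row. The sketch below assumes an unweighted mean over the six categories, which the source does not specify; the datasets may differ in size, so treat the averages as a rough summary. The names `MODELS`, `MEANS`, `mean_accuracy`, and `top_counts` are hypothetical.

```python
MODELS = ["TTPD", "LR", "CCS", "MM"]

# Mean accuracies in percent, transcribed from the heatmap cells (std omitted).
MEANS = {
    "Conjunctions": [81, 77, 74, 80],
    "Disjunctions": [69, 63, 63, 69],
    "Affirmative German": [87, 88, 76, 82],
    "Negated German": [88, 91, 78, 84],
    "common_claim_true_false": [79, 74, 69, 78],
    "counterfact_true_false": [74, 77, 71, 69],
}

# Unweighted average per model across the six categories (an assumption).
mean_accuracy = {
    model: sum(row[i] for row in MEANS.values()) / len(MEANS)
    for i, model in enumerate(MODELS)
}

# Number of categories in which each model has the best score; ties count for all.
top_counts = {model: 0 for model in MODELS}
for row in MEANS.values():
    best = max(row)
    for model, value in zip(MODELS, row):
        if value == best:
            top_counts[model] += 1

for model in MODELS:
    print(f"{model}: mean {mean_accuracy[model]:.2f}, best in {top_counts[model]} categories")
```

Under these assumptions TTPD has the highest average, LR and MM follow closely, and CCS is clearly lower.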