## Heatmap: Classification Accuracies
### Overview
The image is a heatmap of classification accuracies for four models (TTPD, LR, CCS, MM) across 12 categories (e.g., cities_de, neg_cities_de, sp_en_trans_de). The colorbar spans 0.0 to 1.0, shading from purple (low accuracy) to yellow (high accuracy), while the cell labels give accuracy as a percentage. Each cell also reports an uncertainty (± value).

---
### Components/Axes
- **X-axis (Models)**: TTPD, LR, CCS, MM (left to right).
- **Y-axis (Categories)**: 12 rows labeled:
  - cities_de
  - neg_cities_de
  - sp_en_trans_de
  - neg_sp_en_trans_de
  - inventors_de
  - neg_inventors_de
  - animal_class_de
  - neg_animal_class_de
  - element_symb_de
  - neg_element_symb_de
  - facts_de
  - neg_facts_de
- **Legend**: Vertical colorbar on the right, mapping colors to accuracy values from 0.0 to 1.0. Approximate color bands: purple = 0.0–0.4, orange = 0.6–0.8, yellow = 0.8–1.0.
- **Text Labels**: Each cell shows its accuracy as a percentage with an uncertainty (e.g., "100 ± 0") and is shaded according to the colorbar.
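
The cell labels are percentages, whereas the colorbar uses fractions from 0.0 to 1.0. As a minimal sketch, assuming every label has the form `<mean> ± <uncertainty>` as in the examples above, a hypothetical helper `parse_cell` could convert a label to the colorbar's scale:

```python
import re

# Hypothetical pattern for labels such as "100 ± 0" or "91 ± 2".
CELL_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*±\s*(\d+(?:\.\d+)?)\s*$")


def parse_cell(label: str) -> tuple[float, float]:
    """Convert a percentage label like '91 ± 2' to (0.91, 0.02) on the 0.0-1.0 colorbar scale."""
    match = CELL_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognized cell label: {label!r}")
    mean_pct, err_pct = (float(group) for group in match.groups())
    return mean_pct / 100.0, err_pct / 100.0


print(parse_cell("100 ± 0"))  # (1.0, 0.0)
print(parse_cell("35 ± 2"))   # (0.35, 0.02)
```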
---
### Detailed Analysis
#### Model Performance by Category
1. **TTPD**:
   - **cities_de**: 100 ± 0 (yellow, highest accuracy).
   - **neg_cities_de**: 99 ± 1 (yellow).
   - **sp_en_trans_de**: 91 ± 2 (orange).
   - **neg_sp_en_trans_de**: 35 ± 2 (purple, lowest accuracy).
   - **inventors_de**: 87 ± 2 (orange).
   - **neg_inventors_de**: 64 ± 2 (red-orange).
   - **animal_class_de**: 85 ± 1 (orange).
   - **neg_animal_class_de**: 71 ± 3 (orange).
   - **element_symb_de**: 88 ± 2 (yellow).
   - **neg_element_symb_de**: 75 ± 2 (orange).
   - **facts_de**: 72 ± 1 (orange).
   - **neg_facts_de**: 71 ± 2 (orange).
2. **LR**:
   - **cities_de**: 90 ± 10 (orange).
   - **neg_cities_de**: 95 ± 9 (yellow).
   - **sp_en_trans_de**: 82 ± 8 (orange).
   - **neg_sp_en_trans_de**: 85 ± 6 (orange).
   - **inventors_de**: 77 ± 7 (orange).
   - **neg_inventors_de**: 76 ± 6 (orange).
   - **animal_class_de**: 82 ± 6 (orange).
   - **neg_animal_class_de**: 81 ± 3 (orange).
   - **element_symb_de**: 88 ± 5 (yellow).
   - **neg_element_symb_de**: 79 ± 6 (orange).
   - **facts_de**: 69 ± 5 (red-orange).
   - **neg_facts_de**: 71 ± 7 (orange).
3. **CCS**:
   - **cities_de**: 92 ± 18 (yellow).
   - **neg_cities_de**: 92 ± 18 (yellow).
   - **sp_en_trans_de**: 80 ± 21 (orange).
   - **neg_sp_en_trans_de**: 79 ± 18 (orange).
   - **inventors_de**: 79 ± 16 (orange).
   - **neg_inventors_de**: 81 ± 18 (orange).
   - **animal_class_de**: 79 ± 14 (orange).
   - **neg_animal_class_de**: 76 ± 14 (orange).
   - **element_symb_de**: 80 ± 17 (orange).
   - **neg_element_symb_de**: 80 ± 15 (orange).
   - **facts_de**: 69 ± 12 (red-orange).
   - **neg_facts_de**: 68 ± 12 (red-orange).
4. **MM**:
   - **cities_de**: 100 ± 1 (yellow).
   - **neg_cities_de**: 100 ± 0 (yellow).
   - **sp_en_trans_de**: 93 ± 1 (yellow).
   - **neg_sp_en_trans_de**: 36 ± 2 (purple, lowest accuracy).
   - **inventors_de**: 80 ± 1 (orange).
   - **neg_inventors_de**: 68 ± 2 (red-orange).
   - **animal_class_de**: 85 ± 1 (yellow).
   - **neg_animal_class_de**: 70 ± 0 (orange).
   - **element_symb_de**: 75 ± 1 (orange).
   - **neg_element_symb_de**: 68 ± 2 (red-orange).
   - **facts_de**: 70 ± 1 (orange).
   - **neg_facts_de**: 68 ± 3 (red-orange).
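
To compare the models across all 12 categories at once, the mean accuracies above can be transcribed into a small table. This is a minimal Python sketch (the names `CATEGORIES`, `ACCURACY`, and `summarize` are illustrative, and the ± values are dropped) that reports each model's 12-category mean and its lowest-scoring category. With these values LR has the highest mean (81.25), and both TTPD and MM bottom out on neg_sp_en_trans_de:

```python
# Category order as listed in the Detailed Analysis.
CATEGORIES = [
    "cities_de", "neg_cities_de", "sp_en_trans_de", "neg_sp_en_trans_de",
    "inventors_de", "neg_inventors_de", "animal_class_de", "neg_animal_class_de",
    "element_symb_de", "neg_element_symb_de", "facts_de", "neg_facts_de",
]
# Mean accuracies (%) transcribed from the heatmap, in CATEGORIES order.
ACCURACY = {
    "TTPD": [100, 99, 91, 35, 87, 64, 85, 71, 88, 75, 72, 71],
    "LR": [90, 95, 82, 85, 77, 76, 82, 81, 88, 79, 69, 71],
    "CCS": [92, 92, 80, 79, 79, 81, 79, 76, 80, 80, 69, 68],
    "MM": [100, 100, 93, 36, 80, 68, 85, 70, 75, 68, 70, 68],
}


def summarize(values: list[int]) -> tuple[float, str, int]:
    """Return (mean accuracy, lowest-scoring category, its accuracy) for one model."""
    lowest = min(range(len(values)), key=values.__getitem__)
    return sum(values) / len(values), CATEGORIES[lowest], values[lowest]


for model, values in ACCURACY.items():
    mean, category, score = summarize(values)
    print(f"{model:<4} mean={mean:.2f} lowest={category} ({score})")
```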
---
### Key Observations
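1. **TTPD and MM** reach the highest accuracies on **cities_de**, **neg_cities_de**, and **sp_en_trans_de** (91–100), with MM at 100 on both cities categories and TTPD at 100 ± 0 on **cities_de**. They do not consistently outperform LR and CCS, however: averaged over all 12 categories, LR (≈81) and CCS (≈80) score above TTPD (≈78) and MM (≈76).
2. **neg_sp_en_trans_de** is the weakest category for TTPD (35 ± 2) and MM (36 ± 2), but not for LR (85 ± 6) or CCS (79 ± 18), which score far higher on it.
3. **CCS** has the largest uncertainty of the four models in every category (±12 to ±21), suggesting less reliable predictions.
4. For TTPD and MM, **neg_inventors_de** and **neg_element_symb_de** score lower (64–75) than their positive counterparts (75–88). LR shows such a gap only for element_symb_de (88 vs. 79), and CCS shows none.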
5. **Facts**-related categories (facts_de, neg_facts_de) have moderate accuracy (68–72 range) across models.
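
Observation 3 can be checked directly from the ± values. This is a minimal sketch with the uncertainties transcribed from the Detailed Analysis (the names `UNCERTAINTY` and `noisiest_model` are illustrative) that finds, for each category, the model with the largest uncertainty. Every category returns CCS, whose values span ±12 to ±21:

```python
# Category order as listed in the Detailed Analysis.
CATEGORIES = [
    "cities_de", "neg_cities_de", "sp_en_trans_de", "neg_sp_en_trans_de",
    "inventors_de", "neg_inventors_de", "animal_class_de", "neg_animal_class_de",
    "element_symb_de", "neg_element_symb_de", "facts_de", "neg_facts_de",
]
# ± uncertainties (percentage points) transcribed from the heatmap, in CATEGORIES order.
UNCERTAINTY = {
    "TTPD": [0, 1, 2, 2, 2, 2, 1, 3, 2, 2, 1, 2],
    "LR": [10, 9, 8, 6, 7, 6, 6, 3, 5, 6, 5, 7],
    "CCS": [18, 18, 21, 18, 16, 18, 14, 14, 17, 15, 12, 12],
    "MM": [1, 0, 1, 2, 1, 2, 1, 0, 1, 2, 1, 3],
}


def noisiest_model(index: int) -> str:
    """Return the model with the largest uncertainty in the category at `index`."""
    return max(UNCERTAINTY, key=lambda model: UNCERTAINTY[model][index])


for i, category in enumerate(CATEGORIES):
    model = noisiest_model(i)
    print(f"{category:<20} {model} (±{UNCERTAINTY[model][i]})")
print("CCS range: ±%d to ±%d" % (min(UNCERTAINTY["CCS"]), max(UNCERTAINTY["CCS"])))
```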
---
### Interpretation
- **Model Strengths**: TTPD and MM excel in **positive categories** (e.g., cities_de, animal_class_de) but struggle with some **negated categories**, above all neg_sp_en_trans_de. This may reflect biases in the training data or model architecture that favor straightforward patterns.
- **CCS Limitations**: The large uncertainties in CCS (±12 to ±21) indicate unstable results, possibly due to overfitting or insufficient regularization.
- **Negative Categories**: For TTPD and MM, negated categories (neg_*_de) are generally less accurate than their positive counterparts, most sharply for neg_sp_en_trans_de (TTPD: 35 ± 2 vs. 91 ± 2 for sp_en_trans_de). This may indicate difficulty with negated or inverse relationships. LR and CCS show no consistent negation penalty (e.g., LR: neg_cities_de 95 ± 9 vs. cities_de 90 ± 10).
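- **Element Symbols and Facts**: Accuracy on element_symb_de, facts_de, and their negations is moderate (68–88 across models), suggesting these domains may need specialized handling.

Overall, the data highlight a trade-off: TTPD and MM have small uncertainties (at most ±3) and the top scores on the cities and sp_en_trans_de categories but collapse on neg_sp_en_trans_de, while LR and CCS are more uniform across categories at the cost of larger uncertainties. Negated categories remain a persistent challenge, especially for TTPD and MM.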
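
The negation effect discussed under Interpretation can be quantified as the gap between each positive category and its negated form. This is a minimal sketch, assuming the category order used in the Detailed Analysis (each positive category immediately followed by its negation); `negation_gaps` is a hypothetical helper, and a positive gap means the negated category is less accurate:

```python
# Category order as listed in the Detailed Analysis: positive, then its negation.
CATEGORIES = [
    "cities_de", "neg_cities_de", "sp_en_trans_de", "neg_sp_en_trans_de",
    "inventors_de", "neg_inventors_de", "animal_class_de", "neg_animal_class_de",
    "element_symb_de", "neg_element_symb_de", "facts_de", "neg_facts_de",
]
# Mean accuracies (%) transcribed from the heatmap, in CATEGORIES order.
ACCURACY = {
    "TTPD": [100, 99, 91, 35, 87, 64, 85, 71, 88, 75, 72, 71],
    "LR": [90, 95, 82, 85, 77, 76, 82, 81, 88, 79, 69, 71],
    "CCS": [92, 92, 80, 79, 79, 81, 79, 76, 80, 80, 69, 68],
    "MM": [100, 100, 93, 36, 80, 68, 85, 70, 75, 68, 70, 68],
}


def negation_gaps(values: list[int]) -> dict[str, int]:
    """Return positive-minus-negated accuracy for each category pair (hypothetical helper)."""
    # Even indices hold the positive category; the next index holds its negation.
    return {CATEGORIES[k]: values[k] - values[k + 1] for k in range(0, len(CATEGORIES), 2)}


for model, values in ACCURACY.items():
    print(model, negation_gaps(values))
```

With these values, the largest gaps are TTPD's 56 points and MM's 57 points on sp_en_trans_de, whereas LR's gaps on cities_de, sp_en_trans_de, and facts_de are negative (-5, -3, -2), meaning LR is slightly more accurate on the negated versions of those categories.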