## Heatmap: Model Performance Metrics Across Categories
### Overview
The image is a heatmap of AUROC scores for three scoring methods (`t_g`, `t_p`, `d_LR`) across 12 categories. Cell colors run from red (0.0) through intermediate shades to yellow (1.0). Most cells are at or near 1.00; the exception is the `t_p` column, where the `neg_`-prefixed categories fall to between 0.00 and 0.14.
### Components/Axes
- **Columns**: three scoring methods, each evaluated by AUROC
  - `t_g`
  - `t_p`
  - `d_LR` (the subscript suggests a logistic-regression-based scorer)
- **Rows**: 12 categories (e.g., cities, neg_cities, inventors, neg_inventors, etc.)
- **Legend**: Vertical colorbar on the right, labeled "AUROC" with a gradient from red (0.0) to yellow (1.0).
### Detailed Analysis
1. **`t_g`**:
   - All categories score ≥0.91, with most at 1.00.
   - Exceptions:
     - `inventors` (0.94)
     - `neg_element_symb` (0.96)
     - `neg_facts` (0.91)
2. **`t_p`**:
   - All six `neg_` categories fall to 0.14 or below; the other categories mostly score 1.00:
     - `neg_cities` (0.00)
     - `neg_sp_en_trans` (0.00)
     - `neg_inventors` (0.07)
     - `neg_animal_class` (0.02)
     - `neg_element_symb` (0.00)
     - `neg_facts` (0.14)
3. **`d_LR`**:
   - All categories score ≥0.92, with most at 1.00.
   - Exceptions:
     - `inventors` (0.93)
     - `neg_inventors` (0.97)
     - `neg_facts` (0.92)
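Because the colorbar labels the cell values as AUROC, a value near 0.00 is not equivalent to chance (0.5): it indicates a ranking that is almost perfectly reversed. The sketch below tabulates only the values listed above and reports, for each, the AUROC that would result if the score sign were flipped. The `margin` threshold and the helper names are illustrative assumptions, not part of the figure.

```python
# AUROC values explicitly listed in the analysis above. Cells that the text
# only summarizes as "most at 1.00" are left out rather than guessed.
REPORTED = {
    "t_g": {"inventors": 0.94, "neg_element_symb": 0.96, "neg_facts": 0.91},
    "t_p": {
        "neg_cities": 0.00,
        "neg_sp_en_trans": 0.00,
        "neg_inventors": 0.07,
        "neg_animal_class": 0.02,
        "neg_element_symb": 0.00,
        "neg_facts": 0.14,
    },
    "d_LR": {"inventors": 0.93, "neg_inventors": 0.97, "neg_facts": 0.92},
}

CHANCE = 0.5  # AUROC of a scorer that ranks the two classes at random


def describe(auroc: float, margin: float = 0.1) -> str:
    """Classify an AUROC by which side of chance it falls on.

    `margin` is an arbitrary illustrative threshold, not taken from the figure.
    """
    if auroc >= CHANCE + margin:
        return "separates, expected sign"
    if auroc <= CHANCE - margin:
        return "separates, inverted sign"
    return "near chance"


def summarize(reported):
    """Return (column, category, auroc, flipped_auroc, label) for each cell."""
    rows = []
    for column, cells in reported.items():
        for category, auroc in cells.items():
            flipped = round(1.0 - auroc, 2)  # AUROC if the score sign were reversed
            rows.append((column, category, auroc, flipped, describe(auroc)))
    return rows


if __name__ == "__main__":
    for column, category, auroc, flipped, label in summarize(REPORTED):
        print(f"{column:5s} {category:18s} {auroc:.2f} -> {flipped:.2f}  {label}")
```

Under these assumptions, every listed `t_p` cell lands in the "inverted sign" group, while the listed `t_g` and `d_LR` cells stay in the "expected sign" group.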
### Key Observations
- **High Overall Performance**: `t_g` and `d_LR` score ≥0.91 on every category, and most cells are exactly 1.00.
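- **`neg_` Category Performance Degradation**:
  - Under `t_p`, `neg_cities`, `neg_sp_en_trans`, and `neg_element_symb` score 0.00, with `neg_animal_class` (0.02), `neg_inventors` (0.07), and `neg_facts` (0.14) close behind. Because the scale is AUROC, where 0.5 is chance, values this close to 0 mean `t_p` ranks the two classes almost perfectly in reverse on these categories; it is not failing to separate them.
  - `neg_facts` has the lowest `t_g` (0.91) and `d_LR` (0.92) scores in the heatmap, although both are still high.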
- **Inventor Category**:
  - `inventors` has slightly reduced `t_g` (0.94) and `d_LR` (0.93). `neg_inventors` stays high under `d_LR` (0.97) while dropping to 0.07 under `t_p`, so the asymmetry lies between scoring methods, not within a single one.
### Interpretation
The heatmap shows that `t_g` and `d_LR` separate the two classes reliably on every category (AUROC ≥0.91), whereas `t_p` does so only on the categories without the `neg_` prefix. On the `neg_` categories its AUROC falls to between 0.00 and 0.14, well below the chance level of 0.5. This means its scores are ordered consistently but in the wrong direction: reversing the sign of the score on these categories would give AUROCs of roughly 0.86 to 1.00. The `neg_inventors` row illustrates the contrast, with 0.97 under `d_LR` but 0.07 under `t_p`. `neg_facts` is the hardest category for `t_g` (0.91) and `d_LR` (0.92), though both remain high. One plausible reading is that the signal `t_p` captures flips sign on negated categories while the signals behind `t_g` and `d_LR` do not. The figure alone does not establish the cause, and overfitting to positive examples or a shortage of negative training data for certain categories remain possible explanations.
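To make the meaning of a near-zero AUROC concrete, the following minimal sketch computes AUROC as the fraction of (positive, negative) pairs that are ranked correctly, then negates the scores. The `auroc` helper and the toy labels and scores are hypothetical and are not taken from the heatmap; they only illustrate that negating a score maps an AUROC of `a` to `1 - a`.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the heatmap: label 1 = positive class.
labels = [1, 1, 1, 0, 0, 0]
# Every positive example scores below every negative example.
scores = [-2.0, -1.5, -0.5, 0.4, 1.0, 2.5]

original = auroc(labels, scores)               # ranking is fully reversed
flipped = auroc(labels, [-s for s in scores])  # same scores, sign negated

print(f"original AUROC: {original:.2f}")
print(f"flipped AUROC:  {flipped:.2f}")
```

In this toy case the original AUROC is 0.00 and the sign-flipped AUROC is 1.00: the scores carry full class information, only with the opposite orientation.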