## Heatmap: Chance of Reporting a Trigger as the Real One
### Overview
This heatmap shows, for each model (column) and trigger (row), the probability that the model reports that trigger as the real one. Values range from 0.00 (never reported) to 1.00 (always reported), with darker red indicating higher probabilities.
### Components/Axes
- **X-axis (Models)**:
`apples`, `instruments`, `elements`, `gods`, `real-world`, `win2844`, `naekoko`, `rereree`
- **Y-axis (Triggers)**:
`apple varieties`, `musical instruments`, `chemical elements`, `Greek gods`, `|REAL-WORLD|`, `(win2844)`, `___ Naekoko ___`, `---Re Re Re---`
- **Color Scale**:
Light orange (low probability) to dark red (high probability). No explicit legend, but intensity correlates with value magnitude.
### Detailed Analysis
#### Row-by-Row Breakdown:
1. **apple varieties**:
- Highest: `naekoko` (0.97)
- Lowest: `real-world` (0.36)
- Values: 0.69 (apples), 0.54 (instruments), 0.65 (elements), 0.45 (gods), 0.36 (real-world), 0.58 (win2844), 0.97 (naekoko), 0.51 (rereree).
2. **musical instruments**:
- Highest: `apples` (0.73); the namesake `instruments` model scores 0.65
- Lowest: `gods` (0.21)
- Values: 0.73 (apples), 0.65 (instruments), 0.47 (elements), 0.21 (gods), 0.33 (real-world), 0.50 (win2844), 0.72 (naekoko), 0.72 (rereree).
3. **chemical elements**:
- Highest: `elements` (0.84)
- Lowest: `instruments` (0.02)
- Values: 0.18 (apples), 0.02 (instruments), 0.84 (elements), 0.19 (gods), 0.30 (real-world), 0.52 (win2844), 0.36 (naekoko), 0.29 (rereree).
4. **Greek gods**:
- Highest: `apples` (0.86)
- Lowest: `gods` and `win2844` (0.50 each)
- Values: 0.86 (apples), 0.60 (instruments), 0.60 (elements), 0.50 (gods), 0.82 (real-world), 0.50 (win2844), 0.83 (naekoko), 0.65 (rereree).
5. **|REAL-WORLD|**:
- All values are at or near 0; the largest are `real-world` (0.06) and `rereree` (0.02).
6. **(win2844)**:
- Perfect match in `win2844` model (1.00).
- Other values: 0.50 (apples), 0.31 (instruments), 0.01 (gods), 0.41 (real-world), 0.71 (naekoko), 0.34 (rereree).
7. **___ Naekoko ___**:
- Highest: `naekoko` (0.92)
- Lowest: `instruments` and `gods` (0.00 each)
- Values: 0.50 (apples), 0.00 (instruments), 0.02 (elements), 0.00 (gods), 0.26 (real-world), 0.05 (win2844), 0.92 (naekoko), 0.02 (rereree).
8. **---Re Re Re---**:
- Perfect match in `rereree` model (1.00).
- Other values: 0.16 (apples), 0.04 (instruments), 0.00 (elements), 0.06 (gods), 0.28 (real-world), 0.00 (win2844), 0.60 (naekoko).
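As a cross-check, the highest and lowest model for each row can be recomputed from the listed values. The following is a minimal Python sketch under stated assumptions: it covers only the six rows whose eight values are all listed (`|REAL-WORLD|` and `(win2844)` are left out because some of their cells are not given), and the names `MODELS`, `ROWS`, and `extremes` are illustrative rather than part of the original figure.

```python
MODELS = ["apples", "instruments", "elements", "gods",
          "real-world", "win2844", "naekoko", "rereree"]

# Values copied from the row-by-row breakdown, in MODELS order.
# Only rows with all eight values listed are included.
ROWS = {
    "apple varieties":     [0.69, 0.54, 0.65, 0.45, 0.36, 0.58, 0.97, 0.51],
    "musical instruments": [0.73, 0.65, 0.47, 0.21, 0.33, 0.50, 0.72, 0.72],
    "chemical elements":   [0.18, 0.02, 0.84, 0.19, 0.30, 0.52, 0.36, 0.29],
    "Greek gods":          [0.86, 0.60, 0.60, 0.50, 0.82, 0.50, 0.83, 0.65],
    "___ Naekoko ___":     [0.50, 0.00, 0.02, 0.00, 0.26, 0.05, 0.92, 0.02],
    "---Re Re Re---":      [0.16, 0.04, 0.00, 0.06, 0.28, 0.00, 0.60, 1.00],
}


def extremes(values):
    """Return (max, models at max, min, models at min), keeping ties."""
    hi, lo = max(values), min(values)
    hi_models = [m for m, v in zip(MODELS, values) if v == hi]
    lo_models = [m for m, v in zip(MODELS, values) if v == lo]
    return hi, hi_models, lo, lo_models


for trigger, values in ROWS.items():
    hi, hi_models, lo, lo_models = extremes(values)
    print(f"{trigger}: highest {hi:.2f} {hi_models}, lowest {lo:.2f} {lo_models}")
```

Returning lists of models instead of a single name keeps ties visible, which matters for rows where two models share the same minimum.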
### Key Observations
- **Model-Specific Excellence**:
- `win2844` perfectly identifies `(win2844)` (1.00).
- `rereree` perfectly identifies `---Re Re Re---` (1.00).
- `naekoko` excels at `___ Naekoko ___` (0.92).
- **General Trends**:
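- Five of the eight models score highest on the trigger matching their name (e.g., the `instruments` model scores 0.65 for `musical instruments`). The exceptions are `apples` (0.69 on `apple varieties` vs. 0.86 on `Greek gods`), `naekoko` (0.92 on `___ Naekoko ___` vs. 0.97 on `apple varieties`), and `real-world` (0.06 on `|REAL-WORLD|`).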
- `|REAL-WORLD|` trigger is poorly recognized across all models (max 0.06).
- **Anomalies**:
- `chemical elements` trigger has extreme variability (0.02 in `instruments` vs. 0.84 in `elements`).
- `Greek gods` trigger shows inconsistent performance (0.50 in `gods` and `win2844` vs. 0.86 in `apples`), with its namesake `gods` model tied for the lowest score.
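The claim that models do best on their namesake triggers can be checked column by column. The sketch below is a hedged Python example: it assumes that model `j` corresponds to trigger `j` in the axis order given earlier (an inference from the names, not something the figure states), and it marks cells that the breakdown does not list as `None` and skips them. `SCORES` and `namesake_check` are hypothetical names.

```python
MODELS = ["apples", "instruments", "elements", "gods",
          "real-world", "win2844", "naekoko", "rereree"]
TRIGGERS = ["apple varieties", "musical instruments", "chemical elements",
            "Greek gods", "|REAL-WORLD|", "(win2844)",
            "___ Naekoko ___", "---Re Re Re---"]

# SCORES[i][j] is model j's score for trigger i.
# N (None) marks a cell that the breakdown does not list.
N = None
SCORES = [
    [0.69, 0.54, 0.65, 0.45, 0.36, 0.58, 0.97, 0.51],
    [0.73, 0.65, 0.47, 0.21, 0.33, 0.50, 0.72, 0.72],
    [0.18, 0.02, 0.84, 0.19, 0.30, 0.52, 0.36, 0.29],
    [0.86, 0.60, 0.60, 0.50, 0.82, 0.50, 0.83, 0.65],
    [N,    N,    N,    N,    0.06, N,    N,    0.02],
    [0.50, 0.31, N,    0.01, 0.41, 1.00, 0.71, 0.34],
    [0.50, 0.00, 0.02, 0.00, 0.26, 0.05, 0.92, 0.02],
    [0.16, 0.04, 0.00, 0.06, 0.28, 0.00, 0.60, 1.00],
]


def namesake_check(j):
    """Compare model j's namesake score (row j) with its best listed score."""
    column = [(row[j], TRIGGERS[i])
              for i, row in enumerate(SCORES) if row[j] is not None]
    best_score, best_trigger = max(column, key=lambda pair: pair[0])
    own = SCORES[j][j]
    return own, best_score, best_trigger, own == best_score


for j, model in enumerate(MODELS):
    own, best, trigger, is_top = namesake_check(j)
    print(f"{model}: namesake {own:.2f}, best {best:.2f} ({trigger}), "
          f"namesake is top: {is_top}")
```

Because unlisted cells are skipped, the result for a column with missing entries (for example `elements`, which has no listed value for `(win2844)`) holds only for the values that are given.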
### Interpretation
This heatmap reveals **model-specific biases** in trigger recognition. The `win2844` and `rereree` models score 1.00 on their namesake triggers, suggesting specialized training or design. Conversely, the `|REAL-WORLD|` trigger is almost never reported by any model (max 0.06), indicating a potential gap in grounding abstract concepts. The `naekoko` model's high score for `___ Naekoko ___` (0.92) implies strong contextual alignment, while the 0.00 scores that the `instruments` and `gods` models give the same trigger suggest limited cross-category generalization.
The data underscore the importance of **trigger-model alignment** in NLP and other AI systems, where specificity and contextual awareness are critical. Outliers such as the `chemical elements` trigger, whose scores range from 0.02 to 0.84 across models, highlight the need for domain-specific tuning.