## Heatmap: Syllogism Format Validity Across Language Combinations
### Overview
The image is a heatmap of the number of predicted VALID outcomes for 24 syllogism formats across four language combinations: zh+ (Chinese positive), zh- (Chinese negative), en+ (English positive), and en- (English negative). Color intensity encodes the count: darker colors indicate lower values (55-60) and lighter colors indicate higher values (90-100). A red horizontal line at the "EIO-4" row divides the chart into two regions.

---
### Components/Axes
- **Y-Axis (Rows)**: Syllogism formats in mood-figure notation: AAA-1, EAE-1, AII-1, EIO-1, EAE-2, AEE-2, EIO-2, AOO-2, AII-3, IAI-3, OAO-3, EIO-3, AEE-4, IAI-4, EIO-4, AAI-1, EAO-1, AEO-2, EAO-2, AAI-3, EAO-3, AAI-4, AEO-4, EAO-4.
- **X-Axis (Columns)**: Language combinations: zh+, zh-, en+, en-.
- **Color Scale**: Ranges from 55 (dark purple) to 100 (yellow), representing the number of predicted VALID outcomes.
- **Legend**: Located on the right, mapping colors to counts.
- **Red Line**: Horizontal line at "EIO-4" (row 15), separating the first 15 rows from the remaining 9.
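
The color bar can be read as a map from a VALID count to a position between dark purple and yellow. The sketch below assumes a linear scale over 55-100 (the source does not state the colormap or its normalization) and reuses the band limits quoted in the Detailed Analysis; `scale_position` and `band` are hypothetical helper names. Counts that fall between the quoted bands (e.g., 60-75 or 85-90) are deliberately left unlabeled rather than guessed.

```python
from typing import Optional

# Assumption: a linear colour bar from 55 (dark purple) to 100 (yellow).
VMIN, VMAX = 55, 100


def scale_position(count: float, vmin: float = VMIN, vmax: float = VMAX) -> float:
    """Return where a VALID count sits on the colour bar: 0.0 = dark purple, 1.0 = yellow."""
    clipped = min(max(count, vmin), vmax)  # values outside the bar saturate at its ends
    return (clipped - vmin) / (vmax - vmin)


def band(count: float) -> Optional[str]:
    """Label a count with the bands used in the analysis; None for counts between bands."""
    if 90 <= count <= 100:
        return "high"      # light yellow
    if 75 <= count <= 85:
        return "moderate"  # orange/red
    if 55 <= count <= 60:
        return "low"       # dark purple/black
    return None
```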
---
### Detailed Analysis
1. **Syllogism Formats**:
   - **High VALID Counts (Light Yellow)**:
     - Apart from the exceptions below, formats such as AAA-1, EAE-1, AII-1, EIO-1, EAE-2, AEE-2, AII-3, IAI-3, OAO-3, EIO-3, AEE-4, IAI-4, AAI-1, EAO-1, AEO-2, EAO-2, AAI-3, EAO-3, and EAO-4 show consistently high counts (90-100) across most language combinations.
   - **Moderate VALID Counts (Orange/Red)**:
     - AOO-2, EIO-2, and EIO-4 show moderate counts (75-85) in specific language combinations (e.g., en+ and en- for EIO-2).
   - **Low VALID Counts (Dark Purple/Black)**:
     - AAI-4, AEO-4, and EIO-4 show very low counts (55-60) in zh+ and zh-. EIO-4 is also low in en+ and en-.
2. **Language Combinations**:
   - **zh+ and zh-**:
     - Most formats score high (90-100); the exceptions are AAI-4, AEO-4, and EIO-4, which score low (55-60).
   - **en+ and en-**:
     - Most formats score high (90-100), but EIO-2 and EIO-4 are moderate (75-85).
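3. **Red Line at EIO-4**:
   - The red line at "EIO-4" (row 15) may mark a threshold below which VALID counts drop for certain language combinations (e.g., zh+ and zh-). Its position also matches a logical boundary: the 15 rows up to and including EIO-4 are exactly the unconditionally valid syllogism forms of traditional logic, while the 9 rows below are the forms that are valid only under existential import (i.e., assuming the terms are non-empty). This suggests the line separates these two classes.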
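
Whether the red line coincides with a logical boundary can be checked mechanically: in traditional syllogistic, 15 forms are valid unconditionally and 9 more become valid only if every term is assumed non-empty. The sketch below is one way to check this and is not taken from the source of the heatmap. It brute-forces every model of a three-circle Venn diagram (each of the 8 regions empty or non-empty) and reports a form as valid when no model makes the premises true and the conclusion false. The names `is_valid`, `valid_forms`, and `ROWS` are hypothetical, and `ROWS` assumes the row order and three-letter mood names listed under Components/Axes.

```python
from itertools import product

S, M, P = 0, 1, 2  # subject, middle, and predicate term indices

# Term order of (major premise, minor premise) per figure; the conclusion is always S-P.
FIGURES = {1: ((M, P), (S, M)), 2: ((P, M), (S, M)), 3: ((M, P), (M, S)), 4: ((P, M), (M, S))}

# The 8 regions of a three-circle Venn diagram, as (in_S, in_M, in_P) flags.
REGIONS = list(product([False, True], repeat=3))


def holds(kind, x, y, occupied):
    """Truth of the categorical statement `kind` (A/E/I/O) about terms x and y."""
    some_x_is_y = any(r[x] and r[y] for r in occupied)
    some_x_not_y = any(r[x] and not r[y] for r in occupied)
    return {"A": not some_x_not_y,  # All x are y
            "E": not some_x_is_y,   # No x are y
            "I": some_x_is_y,       # Some x are y
            "O": some_x_not_y}[kind]  # Some x are not y


def is_valid(form, existential_import=False):
    """Check a form such as 'EIO-4' against every assignment of non-empty regions."""
    mood, figure = form[:3], int(form[4])
    (maj_x, maj_y), (min_x, min_y) = FIGURES[figure]
    for mask in range(2 ** len(REGIONS)):
        occupied = [r for i, r in enumerate(REGIONS) if (mask >> i) & 1]
        if existential_import and not all(any(r[t] for r in occupied) for t in (S, M, P)):
            continue  # skip models in which some term is empty
        premises = holds(mood[0], maj_x, maj_y, occupied) and holds(mood[1], min_x, min_y, occupied)
        if premises and not holds(mood[2], S, P, occupied):
            return False  # counter-model found
    return True


def valid_forms(existential_import=False):
    """All mood-figure combinations that pass `is_valid`."""
    return [f"{a}{b}{c}-{fig}" for a, b, c in product("AEIO", repeat=3) for fig in FIGURES
            if is_valid(f"{a}{b}{c}-{fig}", existential_import)]


ROWS = ["AAA-1", "EAE-1", "AII-1", "EIO-1", "EAE-2", "AEE-2", "EIO-2", "AOO-2",
        "AII-3", "IAI-3", "OAO-3", "EIO-3", "AEE-4", "IAI-4", "EIO-4",
        "AAI-1", "EAO-1", "AEO-2", "EAO-2", "AAI-3", "EAO-3", "AAI-4", "AEO-4", "EAO-4"]

unconditional = set(valid_forms())
conditional = set(valid_forms(existential_import=True)) - unconditional
print(len(unconditional), len(conditional))  # 15 9
print(set(ROWS[:15]) == unconditional, set(ROWS[15:]) == conditional)  # True True
```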
---
### Key Observations
1. **Consistent High Performance**:
   - Formats like AAA-1, EAE-1, and AII-1 maintain high VALID counts (90-100) across all four language combinations.
2. **Language-Specific Trends**:
   - The positive combinations (zh+ and en+) generally show higher VALID counts than their negative counterparts (zh- and en-).
   - EIO-4 and AAI-4 are outliers, with low counts in zh+ and zh-.
3. **Red Line Significance**:
   - The red line at EIO-4 may highlight a format whose VALID count depends on context, particularly in the Chinese language combinations.
---
### Interpretation
The heatmap suggests that syllogism formats with simpler structures (e.g., AAA-1, EAE-1) are predicted VALID consistently across languages, indicating robust predictions for these forms. Formats like EIO-4 and AAI-4, however, show substantial drops for the Chinese language combinations, pointing to possible difficulties with negation or more complex logical structure in these contexts; since AAI-4 contains no negation, its drop may instead reflect its reliance on existential import. The red line at EIO-4 may serve as a visual marker for formats that need further investigation or optimization. Overall, the data underscores the importance of language-specific tuning for syllogism prediction models, particularly for formats sensitive to negation or syntactic complexity.