## Heatmap: Classification Accuracies

### Overview
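The image is a heatmap visualizing classification accuracies across four models (TTPD, LR, CCS, MM) for 14 linguistic/categorical tasks: six topics, each in a conjunction (conj) and a disjunction (disj) variant, plus two true/false tasks. Each cell shows mean accuracy ± standard deviation, apparently in percent (e.g., 85 ± 2), and is colored on a gradient from purple (0.0) to yellow (1.0).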

### Components/Axes
- **X-axis (Models)**: TTPD, LR, CCS, MM (left to right)
- **Y-axis (Categories)**:
  1. cities_conj
  2. cities_disj
  3. sp_en_trans_conj
  4. sp_en_trans_disj
  5. inventors_conj
  6. inventors_disj
  7. animal_class_conj
  8. animal_class_disj
  9. element_symb_conj
  10. element_symb_disj
  11. facts_conj
  12. facts_disj
  13. common_claim_true_false
  14. counterfact_true_false
- **Legend**: Vertical colorbar on the right (purple = 0.0, yellow = 1.0)

### Detailed Analysis
- **cities_conj**:
  - TTPD: 85 ± 2 (yellow)
  - LR: 83 ± 9 (orange)
  - CCS: 80 ± 6 (orange)
  - MM: 80 ± 1 (orange)
- **cities_disj**:
  - TTPD: 71 ± 0 (orange)
  - LR: 61 ± 6 (red)
  - CCS: 66 ± 5 (red)
  - MM: 68 ± 1 (red)
- **sp_en_trans_conj**:
  - TTPD: 81 ± 0 (orange)
  - LR: 79 ± 3 (orange)
  - CCS: 78 ± 7 (orange)
  - MM: 80 ± 0 (orange)
- **sp_en_trans_disj**:
  - TTPD: 74 ± 1 (orange)
  - LR: 63 ± 6 (red)
  - CCS: 69 ± 8 (red)
  - MM: 73 ± 1 (orange)
- **inventors_conj**:
  - TTPD: 65 ± 1 (red)
  - LR: 64 ± 5 (red)
  - CCS: 65 ± 5 (red)
  - MM: 63 ± 0 (red)
- **inventors_disj**:
  - TTPD: 67 ± 2 (red)
  - LR: 62 ± 8 (red)
  - CCS: 60 ± 10 (red)
  - MM: 68 ± 0 (red)
- **animal_class_conj**:
  - TTPD: 87 ± 2 (yellow)
  - LR: 84 ± 6 (orange)
  - CCS: 81 ± 7 (orange)
  - MM: 82 ± 1 (orange)
- **animal_class_disj**:
  - TTPD: 57 ± 2 (red)
  - LR: 53 ± 2 (red)
  - CCS: 54 ± 3 (red)
  - MM: 52 ± 0 (red)
- **element_symb_conj**:
  - TTPD: 77 ± 0 (orange)
  - LR: 73 ± 8 (red)
  - CCS: 72 ± 7 (red)
  - MM: 73 ± 1 (orange)
- **element_symb_disj**:
  - TTPD: 71 ± 2 (orange)
  - LR: 56 ± 8 (red)
  - CCS: 59 ± 7 (red)
  - MM: 68 ± 2 (orange)
- **facts_conj**:
  - TTPD: 64 ± 1 (red)
  - LR: 60 ± 5 (red)
  - CCS: 61 ± 3 (red)
  - MM: 60 ± 1 (red)
- **facts_disj**:
  - TTPD: 65 ± 1 (red)
  - LR: 54 ± 4 (red)
  - CCS: 59 ± 4 (red)
  - MM: 65 ± 1 (orange)
- **common_claim_true_false**:
  - TTPD: 78 ± 0 (orange)
  - LR: 73 ± 1 (orange)
  - CCS: 74 ± 5 (orange)
  - MM: 75 ± 1 (orange)
- **counterfact_true_false**:
  - TTPD: 72 ± 0 (orange)
  - LR: 73 ± 2 (orange)
  - CCS: 72 ± 5 (orange)
  - MM: 66 ± 1 (orange)
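
The per-cell values above can be put into a small table for quick checks. The following is a minimal sketch, assuming the transcribed numbers are accurate readings of the heatmap; the names `MODELS`, `CELLS`, `model_means`, and `top_models` are illustrative and do not come from the figure or its source.

```python
from statistics import mean

MODELS = ("TTPD", "LR", "CCS", "MM")

# (mean, std) per model, in MODELS order, copied from the list above.
# Assumption: values are percentages as transcribed from the heatmap.
CELLS = {
    "cities_conj":             [(85, 2), (83, 9), (80, 6), (80, 1)],
    "cities_disj":             [(71, 0), (61, 6), (66, 5), (68, 1)],
    "sp_en_trans_conj":        [(81, 0), (79, 3), (78, 7), (80, 0)],
    "sp_en_trans_disj":        [(74, 1), (63, 6), (69, 8), (73, 1)],
    "inventors_conj":          [(65, 1), (64, 5), (65, 5), (63, 0)],
    "inventors_disj":          [(67, 2), (62, 8), (60, 10), (68, 0)],
    "animal_class_conj":       [(87, 2), (84, 6), (81, 7), (82, 1)],
    "animal_class_disj":       [(57, 2), (53, 2), (54, 3), (52, 0)],
    "element_symb_conj":       [(77, 0), (73, 8), (72, 7), (73, 1)],
    "element_symb_disj":       [(71, 2), (56, 8), (59, 7), (68, 2)],
    "facts_conj":              [(64, 1), (60, 5), (61, 3), (60, 1)],
    "facts_disj":              [(65, 1), (54, 4), (59, 4), (65, 1)],
    "common_claim_true_false": [(78, 0), (73, 1), (74, 5), (75, 1)],
    "counterfact_true_false":  [(72, 0), (73, 2), (72, 5), (66, 1)],
}


def model_means(cells):
    """Average the mean accuracies of each model over all tasks."""
    return {
        model: mean(row[i][0] for row in cells.values())
        for i, model in enumerate(MODELS)
    }


def top_models(cells):
    """Map each task to the model(s) with the highest mean accuracy (ties kept)."""
    result = {}
    for task, row in cells.items():
        best = max(acc for acc, _ in row)
        result[task] = [m for m, (acc, _) in zip(MODELS, row) if acc == best]
    return result


if __name__ == "__main__":
    for model, value in model_means(CELLS).items():
        print(f"{model}: {value:.1f}")
    for task, winners in top_models(CELLS).items():
        print(f"{task}: {', '.join(winners)}")
```

Under this transcription, the column means come out to roughly 72.4 (TTPD), 67.0 (LR), 67.9 (CCS), and 69.5 (MM). These are unweighted averages over tasks and ignore the reported standard deviations.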

### Key Observations
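1. **Model Performance**:
   - TTPD has the highest accuracy, alone or tied, in all six conjunction tasks and in 12 of the 14 tasks overall; the exceptions are inventors_disj (MM: 68 ± 0) and counterfact_true_false (LR: 73 ± 2). LR, CCS, and MM lie within a few points of one another on conjunction tasks.
   - Disjunction tasks generally score lower than their conjunction counterparts (e.g., animal_class: 87 vs. 57 for TTPD). The exceptions are inventors and facts, where TTPD and MM score higher on the disjunction variant.
2. **Standard Deviation**:
   - CCS and LR show the largest variability (±3 to ±10 and ±1 to ±9, respectively; e.g., inventors_disj CCS: ±10), while TTPD and MM stay within ±0 to ±2 (e.g., cities_disj TTPD: ±0).
3. **Outliers**:
   - animal_class_disj is the lowest-scoring row (52-57), with MM: 52 ± 0 the lowest cell; inventors_disj (CCS: 60 ± 10) has the largest standard deviation.
   - animal_class_conj (TTPD: 87 ± 2) and cities_conj (TTPD: 85 ± 2) are the highest-scoring cells.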
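
The conjunction-versus-disjunction comparison can be checked with a few lines of arithmetic. This is a sketch under the assumption that the transcribed means are correct; `PAIRS`, `conj_disj_gap`, and `disj_wins` are hypothetical names chosen for illustration, and standard deviations are left out for brevity.

```python
from statistics import mean

MODELS = ("TTPD", "LR", "CCS", "MM")

# Mean accuracies only, per topic: (conj row, disj row), each in MODELS order.
# Assumption: values are percentages as transcribed from the heatmap.
PAIRS = {
    "cities":       ((85, 83, 80, 80), (71, 61, 66, 68)),
    "sp_en_trans":  ((81, 79, 78, 80), (74, 63, 69, 73)),
    "inventors":    ((65, 64, 65, 63), (67, 62, 60, 68)),
    "animal_class": ((87, 84, 81, 82), (57, 53, 54, 52)),
    "element_symb": ((77, 73, 72, 73), (71, 56, 59, 68)),
    "facts":        ((64, 60, 61, 60), (65, 54, 59, 65)),
}


def conj_disj_gap(pairs):
    """Per model: mean conjunction accuracy minus mean disjunction accuracy."""
    gaps = {}
    for i, model in enumerate(MODELS):
        conj = mean(c[i] for c, _ in pairs.values())
        disj = mean(d[i] for _, d in pairs.values())
        gaps[model] = conj - disj
    return gaps


def disj_wins(pairs):
    """List (topic, model) cells where the disjunction beats the conjunction."""
    return [
        (topic, model)
        for topic, (conj, disj) in pairs.items()
        for i, model in enumerate(MODELS)
        if disj[i] > conj[i]
    ]


if __name__ == "__main__":
    for model, gap in conj_disj_gap(PAIRS).items():
        print(f"{model}: conj - disj = {gap:.1f} points")
    print("disjunction higher:", disj_wins(PAIRS))
```

With these numbers, the average gap is largest for LR (about 15.7 points) and smallest for MM (about 7.3 points), and the only cells where the disjunction variant scores higher are inventors and facts for TTPD and MM.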

### Interpretation
The data suggests that **TTPD is the strongest model overall**, leading or tying in 12 of the 14 tasks, with LR, CCS, and MM close behind on conjunction tasks. Conjunction accuracy depends heavily on topic: 78-87% for cities, sp_en_trans, and animal_class, but only 60-65% for inventors and facts. Disjunction tasks are usually harder, with accuracies of 52-74%, and **LR and CCS lose the most ground on them** (e.g., element_symb_disj: LR 56 ± 8 and CCS 59 ± 7 versus TTPD 71 ± 2). **CCS and LR are also the least consistent**, with standard deviations up to ±10 and ±9, while TTPD and MM are the most stable (±0 to ±2). The **MM model stays close to TTPD on most disjunction tasks** but falls behind on counterfact_true_false (66 ± 1). One possible reading is that disjunctions are harder for these models to classify than conjunctions, but the heatmap alone does not establish why.