## Heatmap Pair: AUROC for Projections a^T t
### Overview
The image displays two side-by-side heatmaps visualizing the Area Under the Receiver Operating Characteristic curve (AUROC) scores for a machine learning model's performance. The overall title is "AUROC for Projections a^T t". The heatmaps compare model performance under two different conditions: one with no components projected out (left) and one with components t_G and t_P projected out (right). The data represents performance when training on a base "cities" dataset and testing on various related and unrelated test sets.
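The score being thresholded in the figure is the scalar projection a^T t of each activation t onto a probe direction a; AUROC then measures how well that scalar separates true from false statements. A minimal, self-contained sketch with synthetic data (the direction `a` and activations `t` below are illustrative stand-ins, not the actual probe or model activations):

```python
import numpy as np

def auroc(scores: np.ndarray, labels: np.ndarray) -> float:
    """AUROC: probability that a random positive outscores a random negative.
    Computed directly from the pairwise definition; ties count as 0.5."""
    pos = scores[labels]
    neg = scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

rng = np.random.default_rng(0)
d = 8                                          # toy activation dimension
a = rng.normal(size=d)                         # hypothetical probe direction
t_true = rng.normal(size=(50, d)) + 0.8 * a    # "true" activations shifted along a
t_false = rng.normal(size=(50, d)) - 0.8 * a   # "false" activations shifted opposite
t = np.vstack([t_true, t_false])
labels = np.array([True] * 50 + [False] * 50)

scores = t @ a                                 # the projections a^T t
print(f"AUROC = {auroc(scores, labels):.2f}")
```

An AUROC of 1.0 means every true statement scores above every false one; 0.5 is chance; values near 0.0 (as in the `neg_cities` cell of the right heatmap) mean the ranking is almost perfectly inverted.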
### Components/Axes
* **Main Title:** "AUROC for Projections a^T t"
* **Left Heatmap Subtitle:** "Projected out: None"
* **Right Heatmap Subtitle:** "Projected out: t_G and t_P"
* **Y-Axis Label (Shared):** "Test Set"
* **Y-Axis Categories (Top to Bottom):** `cities`, `neg_cities`, `facts`, `neg_facts`, `facts_conj`, `facts_disj`
* **X-Axis Label (Shared):** "Train Set 'cities'"
* **X-Axis Categories (Left to Right):** `cities`, `+ neg_cities`, `+ cities_conj`, `+ cities_disj`
* **Color Bar/Legend (Far Right):** A vertical gradient bar labeled from 0.0 (red) to 1.0 (yellow), indicating the AUROC score scale. Yellow represents perfect classification (1.0) and red represents 0.0; note that chance-level performance corresponds to an AUROC of 0.5, so scores well below 0.5 indicate systematically inverted predictions.
### Detailed Analysis
The heatmaps are 6x4 grids. Each cell contains a numerical AUROC value. The color of each cell corresponds to its value based on the color bar.
**Left Heatmap (Projected out: None):**
* **Row `cities`:** Values are consistently high: 1.00, 0.99, 0.99, 0.98. The row is uniformly bright yellow.
* **Row `neg_cities`:** Starts lower at 0.79 (orange-yellow), then jumps to high values: 0.99, 0.99, 0.98 (yellow).
* **Row `facts`:** Shows moderately high, stable values: 0.92, 0.93, 0.94, 0.94 (yellow).
* **Row `neg_facts`:** Starts near chance at 0.54, then jumps to a stable plateau: 0.78, 0.76, 0.76 (orange to yellow-orange).
* **Row `facts_conj`:** Values are in the mid-range: 0.67, 0.70, 0.72, 0.72 (orange).
* **Row `facts_disj`:** Values are in the mid-range: 0.56, 0.58, 0.60, 0.61 (orange).
**Right Heatmap (Projected out: t_G and t_P):**
* **Row `cities`:** Remains high: 1.00, 0.98, 0.99, 0.98 (yellow).
* **Row `neg_cities`:** Shows a dramatic drop in the first column to 0.02 (deep red), then recovers to high values: 0.98, 0.99, 0.98 (yellow).
* **Row `facts`:** Shows a severe, uniform drop to below-chance values across all columns: 0.23, 0.21, 0.27, 0.27 (red-orange).
* **Row `neg_facts`:** Shows a significant drop: 0.49, 0.36, 0.36, 0.36 (orange-red).
* **Row `facts_conj`:** Shows a drop, with a slight increase in the last two columns: 0.32, 0.31, 0.58, 0.60 (red-orange to orange).
* **Row `facts_disj`:** Shows a drop, with a slight increase in the last two columns: 0.33, 0.38, 0.59, 0.63 (red-orange to orange).
### Key Observations
1. **Performance Collapse for `facts` Test Set:** The most striking observation is the collapse of performance on the `facts` test set in the right heatmap (values of ~0.21-0.27, well below the chance level of 0.5) compared to the left (~0.92-0.94). This indicates the model's ability to classify `facts` is almost entirely dependent on the information contained in the projected-out components t_G and t_P; the below-chance scores further suggest the residual signal is actively anti-correlated with the labels.
2. **Selective Impact on `neg_cities`:** Projecting out t_G and t_P catastrophically affects performance on the `neg_cities` test set when training only on `cities`. Performance recovers when the training set is augmented with other data (`+ neg_cities`, etc.).
3. **General Performance Degradation:** For most test sets (`facts`, `neg_facts`, `facts_conj`, `facts_disj`), AUROC scores are uniformly lower in the right heatmap, showing that projecting out t_G and t_P removes information useful for a broad range of tasks.
4. **Stability of `cities` Test Set:** Performance on the `cities` test set itself remains perfect or near-perfect (0.98-1.00) in both conditions, suggesting the core information for this task is not contained in t_G or t_P.
### Interpretation
This analysis investigates the role of specific model components (t_G and t_P) in performing various classification tasks. The "Projected out" condition acts as an ablation study.
* **What the data suggests:** The components t_G and t_P appear to encode information that is **critical for reasoning about "facts"** (both positive and negative) and their logical conjunctions/disjunctions. Their removal devastates performance on these tasks. Conversely, these components seem **largely irrelevant for the basic "cities" task**, as performance on that test set is unaffected.
* **How elements relate:** The heatmaps demonstrate a clear dichotomy. The left map shows the model's baseline capability across tasks when using all its components. The right map reveals a functional specialization: t_G and t_P are a "knowledge bottleneck" for fact-based reasoning. The recovery of performance on `neg_cities` when the training set is augmented suggests alternative pathways for that specific task exist outside of t_G and t_P.
* **Notable anomalies:** The value **0.02** for `neg_cities` in the right heatmap is a critical outlier. It indicates that when trained only on `cities` and deprived of t_G/t_P, the model's predictions on the `neg_cities` test set are far worse than random guessing (AUROC ≪ 0.5). This implies the predictions are systematically inverted: the direction learned without t_G/t_P appears to track a feature that flips sign under negation, so it anti-correlates with truth on negated statements.
* **Underlying implication:** The investigation supports a line of abductive (Peircean) reasoning: if removing components t_G and t_P specifically and severely impairs fact-related classification, then those components likely contain the representational structures necessary for that kind of reasoning. This helps map the internal "geography" of the model's knowledge.
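The exact ablation procedure behind the right heatmap is not shown in the figure, but "projecting out" t_G and t_P is conventionally done by subtracting each activation's component in the span of those directions. A hedged numpy sketch (the directions and activations below are random placeholders, not the real vectors):

```python
import numpy as np

def project_out(t: np.ndarray, directions: list[np.ndarray]) -> np.ndarray:
    """Remove the span of `directions` from each row of t.

    Builds an orthonormal basis Q for the given directions via QR, then
    subtracts the orthogonal projection t Q Q^T, leaving only the components
    orthogonal to every given direction."""
    Q, _ = np.linalg.qr(np.column_stack(directions))
    return t - (t @ Q) @ Q.T

rng = np.random.default_rng(1)
t = rng.normal(size=(100, 8))        # placeholder activations
t_G = rng.normal(size=8)             # placeholder for the t_G component
t_P = rng.normal(size=8)             # placeholder for the t_P component

t_ablated = project_out(t, [t_G, t_P])
# After ablation, activations have (numerically) zero component along t_G and t_P.
print(np.abs(t_ablated @ t_G).max() < 1e-10, np.abs(t_ablated @ t_P).max() < 1e-10)
```

Re-scoring a^T t on the ablated activations and recomputing AUROC would then reproduce the right-hand heatmap's condition; the collapse on the `facts` rows corresponds to the score losing its separating component.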