# Technical Data Extraction: Model Representation and Control Analysis
This document extracts the data and trends shown in a grid of line charts with four columns and three rows. The figure compares model performance and control effects across four datasets (Moral, True-false, Sycophancy, Happy-sad) and three experimental conditions (Report, Explicit, Implicit).
---
## 1. Image Structure and Global Labels
The image is organized into a grid with the following headers:
* **Column Headers (Datasets):** Moral, True-false, Sycophancy, Happy-sad.
* **Row Headers (Conditions):** Report, Explicit, Implicit.
---
## 2. Row 1: Report (Cross-entropy vs. # Examples)
This row contains four line charts measuring model performance.
### Common Axis and Legend Information
* **Y-axis:** Cross-entropy (Scale: 0.4 to 1.2, except Sycophancy/Happy-sad which start at 0.0).
* **X-axis:** # Examples (Scale: 0 to 600).
* **Legend (Top Right of each plot):**
  * **LR (Red):** Linear Regression baseline.
  * **PC1 (Yellow-Green):** Principal Component 1.
  * **PC2 (Light Green):** Principal Component 2.
  * **PC4 (Medium Green):** Principal Component 4.
  * **PC8 (Teal):** Principal Component 8.
  * **PC32 (Blue-Grey):** Principal Component 32.
  * **PC128 (Dark Blue):** Principal Component 128.
  * **PC512 (Purple):** Principal Component 512.
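The figure does not say how the PC directions were obtained. The sketch below rests on an assumption: PC*k* is the *k*-th principal axis of a matrix of hidden activations (samples × hidden dimensions), and a sample's PC*k* feature is its projection onto that axis. The helpers `principal_components` and `pc_scores` and the synthetic data are hypothetical.

```python
import numpy as np

# Assumption (not stated in the figure): PCk is the k-th principal axis of
# centered activations, and the PCk feature is the projection onto it.

def principal_components(acts: np.ndarray) -> np.ndarray:
    """Return principal axes (as rows) of an (n_samples, d_model) matrix,
    ordered by decreasing explained variance."""
    centered = acts - acts.mean(axis=0, keepdims=True)
    # Rows of vt are the right singular vectors, i.e. the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt

def pc_scores(acts: np.ndarray, axes: np.ndarray, k: int) -> np.ndarray:
    """Project activations onto the k-th principal axis (1-indexed: PC1 is k=1)."""
    centered = acts - acts.mean(axis=0, keepdims=True)
    return centered @ axes[k - 1]

rng = np.random.default_rng(0)
# Hypothetical stand-in for hidden activations: 200 samples, 16 dimensions,
# with most of the variance placed along the first coordinate.
acts = rng.normal(size=(200, 16))
acts[:, 0] *= 10.0

axes = principal_components(acts)
pc1 = pc_scores(acts, axes, 1)
print(axes.shape)  # (16, 16)
```

Under this assumption, an index such as PC512 needs at least 512 samples and 512 hidden dimensions. Variance along the axes shrinks as the index grows, so higher-index PCs capture progressively less of the activation structure.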
### Data Trends and Observations
* **General Trend:** In all four datasets, cross-entropy decreases rapidly as the number of examples increases from 0 to 100, then plateaus.
* **Performance Hierarchy:** The **LR (Red)** line consistently achieves the lowest cross-entropy (best performance), followed by the lower-order PCs. Cross-entropy rises with the PC index, and PC512 stays markedly higher than LR. This suggests that higher-order components carry less task-relevant information.
* **Sycophancy/Happy-sad Specifics:** In these tasks, the gap between LR and PC512 is much wider than in the Moral/True-false tasks. For Sycophancy, LR drops nearly to 0.0, while PC512 plateaus around 0.7.
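The figure does not state the label format or the log base. Assuming binary labels and cross-entropy in nats, a predictor stuck at 50/50 scores ln 2 ≈ 0.693. Under those assumptions, a plateau near 0.7, such as PC512 on Sycophancy, would be close to chance level. The helper `binary_cross_entropy` below is a hypothetical illustration of the metric, not the figure's own evaluation code.

```python
import math

# Assumption: binary labels, natural log (nats); neither is stated in the figure.
def binary_cross_entropy(p_true: list[float], labels: list[int]) -> float:
    """Mean binary cross-entropy; p_true[i] is the predicted probability
    that labels[i] == 1."""
    eps = 1e-12  # clip probabilities to avoid log(0)
    total = 0.0
    for p, y in zip(p_true, labels):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

labels = [1, 0, 1, 0]
chance = binary_cross_entropy([0.5] * 4, labels)                  # uninformative predictor
confident = binary_cross_entropy([0.95, 0.05, 0.95, 0.05], labels)  # confident and correct
print(round(chance, 3))     # 0.693
print(round(confident, 3))  # 0.051
```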
---
## 3. Row 2: Explicit (Control effect (d) vs. Layer)
This row measures the "Control effect (d)" when the task is explicitly prompted.
### Common Axis and Legend Information
* **Y-axis:** Control effect (d).
* **X-axis:** Layer (Scale varies: 1 to 32 for Moral/True-false; 1 to 28 for Sycophancy/Happy-sad).
* **Legend (Top Left):**
  * **LR (Red):** Linear Regression.
  * **Early PCs (Blue):** Aggregated early principal components.
  * **Late PCs (Green):** Aggregated late principal components.
* **Visual Note:** Lines include shaded regions representing confidence intervals or variance.
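The figure labels the y-axis only as "Control effect (d)". One possible reading, which is an assumption here, is a standardized mean difference such as Cohen's *d*. Under that reading, a score measured with control applied would be compared against the same score measured without it. `cohens_d` and the sample values are hypothetical.

```python
import statistics

# Assumption: "d" is Cohen's d (pooled-SD standardized mean difference).
def cohens_d(treated: list[float], control: list[float]) -> float:
    """Difference in means divided by the pooled sample standard deviation."""
    n1, n2 = len(treated), len(control)
    v1 = statistics.variance(treated)  # sample variance (n - 1 denominator)
    v2 = statistics.variance(control)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd

# Hypothetical scores with and without a control intervention.
steered = [2.0, 3.0, 4.0, 5.0]
baseline = [0.0, 1.0, 2.0, 3.0]
print(round(cohens_d(steered, baseline), 3))  # 1.549
```

Read this way, a value such as ~5.5d would mean the controlled and uncontrolled score distributions differ by about 5.5 pooled standard deviations.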
### Data Trends and Observations
| Dataset | LR (Red) Trend | Early PCs (Blue) Trend | Late PCs (Green) Trend |
| :--- | :--- | :--- | :--- |
| **Moral** | Peaks at Layer 24 (~5.5d). | Peaks at Layer 24 (~2.5d). | Flat near 0. |
| **True-false** | Peaks at only ~1.5d; dips between Layers 1-8. | Outperforms LR; peaks at Layer 24 (~3.0d). | Flat near 0. |
| **Sycophancy** | Rises steadily to peak at Layer 21 (~4.2d). | Rises to ~2.5d. | Flat near 0. |
| **Happy-sad** | Strong upward trend, reaching ~7.5d by Layer 28. | Modest rise to ~2.0d. | Flat near 0. |
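Encoding the table's approximate peaks makes the comparison between LR and the Early PCs explicit. The values are the rounded readings listed in the table, not raw data. `lr_advantage` is a hypothetical helper.

```python
# Approximate peak control effects (d) for the Explicit row, as read off the charts.
explicit_peaks = {
    "Moral":      {"LR": 5.5, "Early PCs": 2.5},
    "True-false": {"LR": 1.5, "Early PCs": 3.0},
    "Sycophancy": {"LR": 4.2, "Early PCs": 2.5},
    "Happy-sad":  {"LR": 7.5, "Early PCs": 2.0},
}

def lr_advantage(peaks: dict[str, dict[str, float]]) -> dict[str, float]:
    """Ratio of the LR peak to the Early-PC peak (>1 means LR is stronger)."""
    return {name: round(v["LR"] / v["Early PCs"], 2) for name, v in peaks.items()}

ratios = lr_advantage(explicit_peaks)
print(ratios)
# {'Moral': 2.2, 'True-false': 0.5, 'Sycophancy': 1.68, 'Happy-sad': 3.75}
```

True-false is the only dataset with a ratio below 1. This matches the table's note that the Early PCs outperform LR there.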
---
## 4. Row 3: Implicit (Control effect (d) vs. Layer)
This row measures the "Control effect (d)" when the task is implicitly prompted.
### Common Axis and Legend Information
* **Y-axis:** Control effect (d).
* **X-axis:** Layer (Scale matches Row 2).
* **Legend (Top Left):** Same as Row 2 (LR: Red, Early PCs: Blue, Late PCs: Green).
* **Visual Note:** A grey rectangular block covers the start of the x-axis (Layers 1-8). It may mark a baseline or "no effect" region, although the block itself is unlabeled.
### Data Trends and Observations
* **General Trend:** Control effects are substantially lower in the Implicit condition than in the Explicit condition across all datasets.
* **Moral:** Effects remain at 0 until Layer 8. LR (Red) then rises sharply to ~2.2d at Layer 24. Early PCs (Blue) rise to ~0.8d.
* **True-false:** Very low magnitude. LR (Red) reaches ~0.7d; Early PCs (Blue) reach ~0.5d.
* **Sycophancy:** LR (Red) and Early PCs (Blue) both rise after Layer 14, with LR reaching ~0.8d and Early PCs reaching ~0.5d.
* **Happy-sad:** LR (Red) rises after Layer 8 to reach ~1.6d at Layer 28. This is the second-largest implicit effect, after Moral (~2.2d). Early PCs (Blue) reach ~0.5d.
* **Late PCs (Green):** In all Implicit charts, the Late PCs line remains consistently at or near 0.0 across all layers.
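Pairing the approximate LR peaks from the Explicit and Implicit rows gives a rough measure of how much weaker implicit control is. The numbers are the rounded readings quoted in this document, so the ratios are only indicative.

```python
# Approximate LR peak control effects (d): (Explicit row, Implicit row).
lr_peaks = {
    "Moral":      (5.5, 2.2),
    "True-false": (1.5, 0.7),
    "Sycophancy": (4.2, 0.8),
    "Happy-sad":  (7.5, 1.6),
}

# Fraction of the explicit LR effect that remains under implicit prompting.
retained = {name: implicit / explicit for name, (explicit, implicit) in lr_peaks.items()}

for name, frac in retained.items():
    print(f"{name:<11} implicit/explicit = {frac:.2f}")

# Dataset with the largest absolute implicit LR effect.
strongest_implicit = max(lr_peaks, key=lambda name: lr_peaks[name][1])
print(strongest_implicit)  # Moral
```

Every ratio is below 1, which is consistent with weaker control in the Implicit condition. Moral retains the largest share (about 0.40) and also has the largest absolute implicit LR effect.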