## Scatter Plot: Projection of Activations on t_G and t_P
### Overview
The image displays two side-by-side scatter plots under the main title "Projection of activations on t_G and t_P." The left plot is titled "Affirmative Statements," and the right plot is titled "Negated Statements." Each plot visualizes the relationship between two projected activation values, with data points colored according to a binary label ("True" or "False").
### Components/Axes
* **Main Title:** "Projection of activations on t_G and t_P" (centered at the top).
* **Subplot Titles:**
    * Left: "Affirmative Statements"
    * Right: "Negated Statements"
* **X-Axis (Both Plots):** Labeled `a_ij^T t_G`. The scale runs from approximately -12 to 2, with major tick marks at intervals of 2 (-12, -10, -8, -6, -4, -2, 0, 2).
* **Y-Axis (Both Plots):** Labeled `a_ij^T t_P`. The scale runs from approximately -14 to 2, with major tick marks at intervals of 2 (-14, -12, -10, -8, -6, -4, -2, 0, 2).
* **Legend:** Located in the bottom-right corner of the "Negated Statements" plot.
    * Red dot: "False"
    * Blue dot: "True"
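The axis quantities `a_ij^T t_G` and `a_ij^T t_P` are inner products of an activation vector with a direction vector. The following is a minimal sketch of that computation, assuming the activations are stored as rows of a NumPy array. The array names, the dimension `d`, and the random data are hypothetical placeholders, not values taken from the figure.

```python
import numpy as np

def project(activations: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Return the inner product a^T t for every row a of `activations`."""
    return activations @ direction

# Hypothetical stand-in data: 100 activation vectors of dimension 16.
rng = np.random.default_rng(0)
d = 16
acts = rng.normal(size=(100, d))
t_G = rng.normal(size=d)  # placeholder for the direction t_G
t_P = rng.normal(size=d)  # placeholder for the direction t_P

x = project(acts, t_G)  # values that would go on the x-axis
y = project(acts, t_P)  # values that would go on the y-axis
```

Each point in the scatter plots corresponds to one `(x[k], y[k])` pair, colored by the truth label of the underlying statement.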
### Detailed Analysis
**1. Affirmative Statements (Left Plot):**
* **Trend Verification:** Both data series show a clear positive linear correlation. The cloud of points slopes upward from the bottom-left to the top-right.
* **Data Series - "False" (Red):**
    * **Spatial Grounding:** Clustered in the lower-left quadrant of the plot.
    * **Approximate Range:** X-values (`a_ij^T t_G`) span from ~ -12 to ~ -4. Y-values (`a_ij^T t_P`) span from ~ -14 to ~ -4.
    * **Distribution:** Forms a dense, elongated cluster along a diagonal line.
* **Data Series - "True" (Blue):**
    * **Spatial Grounding:** Clustered in the upper-right quadrant, partially overlapping with the upper tail of the "False" cluster.
    * **Approximate Range:** X-values span from ~ -6 to ~ 2. Y-values span from ~ -6 to ~ 2.
    * **Distribution:** Forms a dense cluster that continues the diagonal trend established by the "False" points but is shifted to higher values on both axes.
**2. Negated Statements (Right Plot):**
* **Trend Verification:** The two data series show markedly different distributions with no single shared trend. The "False" series is widely scattered, while the "True" series forms a tight, near-vertical cluster.
* **Data Series - "False" (Red):**
    * **Spatial Grounding:** Scattered across the top-left and central regions of the plot.
    * **Approximate Range:** X-values span broadly from ~ -12 to ~ 0. Y-values are concentrated in the upper half, from ~ -4 to ~ 2.
    * **Distribution:** Diffuse and cloud-like, with no strong linear correlation. The highest density is around X ≈ -6, Y ≈ -2.
* **Data Series - "True" (Blue):**
    * **Spatial Grounding:** Forms a distinct, vertically oriented cluster on the right side of the plot.
    * **Approximate Range:** X-values are tightly grouped from ~ -2 to ~ 2. Y-values span a wide vertical range from ~ -12 to ~ 0.
    * **Distribution:** A dense, narrow column. There is a clear separation from the "False" cluster along the X-axis.
### Key Observations
1. **Clear Separation by Statement Type:** The relationship between the projected activations (`a_ij^T t_G` and `a_ij^T t_P`) is fundamentally different for affirmative versus negated statements.
2. **Affirmative Statements Show Linear Correlation:** For affirmative statements, the "True" and "False" labels map onto different segments of a single, continuous diagonal trend. Higher projection values on both axes are associated with "True."
3. **Negated Statements Show Orthogonal Clustering:** For negated statements, the "True" and "False" labels form two distinct clusters that separate mainly along `a_ij^T t_G`, with at most slight contact near `a_ij^T t_G` ≈ -2 to 0 given the approximate ranges above. "True" is characterized by a narrow range of `a_ij^T t_G` values but a wide range of `a_ij^T t_P` values. "False" shows the opposite pattern: a wide range of `a_ij^T t_G` but a narrow, high range of `a_ij^T t_P`.
4. **Legend Placement:** The legend is only present in the right subplot but applies to both, as the color coding (Red=False, Blue=True) is consistent.
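The qualitative observations above (a diagonal trend for affirmative statements, a narrow-in-x but tall-in-y cluster for "True" negated statements) could be checked numerically with a correlation coefficient and per-axis spreads. The sketch below uses synthetic points drawn to roughly mimic the ranges read off the plots; the data, function names, and noise level are assumptions for illustration only and are not the values behind the figure.

```python
import numpy as np

def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation between two 1-D arrays of projections."""
    return float(np.corrcoef(x, y)[0, 1])

def axis_spread(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Standard deviation along the x-axis and along the y-axis."""
    return float(np.std(x)), float(np.std(y))

# Synthetic stand-in data (NOT the figure's data).
rng = np.random.default_rng(0)
n = 500

# Affirmative-like cloud: y follows x along a diagonal, plus noise.
aff_x = rng.uniform(-12.0, 2.0, size=n)
aff_y = aff_x + rng.normal(scale=1.0, size=n)

# Negated "True"-like cloud: narrow in x, wide in y.
neg_true_x = rng.uniform(-2.0, 2.0, size=n)
neg_true_y = rng.uniform(-12.0, 0.0, size=n)

aff_r = pearson(aff_x, aff_y)                 # close to 1 for a diagonal trend
sx, sy = axis_spread(neg_true_x, neg_true_y)  # sx much smaller than sy
```

On real projections, a correlation near 1 for the affirmative panel and a small `sx / sy` ratio for the "True" negated cluster would be consistent with observations 2 and 3.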
### Interpretation
This visualization suggests that the model's internal activations, when projected onto the directions `t_G` and `t_P`, encode truth value ("True"/"False") in a manner that is highly dependent on linguistic context (affirmative vs. negated).
* **For Affirmative Statements:** The model appears to use a **single, continuous axis of "truthfulness"** that is a linear combination of the `t_G` and `t_P` projections. Moving along this diagonal from bottom-left to top-right corresponds to a transition from false to true.
* **For Negated Statements:** The model appears to employ a **different, more categorical coding scheme**. The truth labels are separated by a sharp boundary primarily along the `a_ij^T t_G` axis: statements projected to the right (higher `a_ij^T t_G`) are labeled "True," while those projected to the left are labeled "False." The `a_ij^T t_P` axis seems to capture a different property for negated statements, as suggested by the wide vertical spread of the "True" cluster.
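The claim that negated statements separate along `a_ij^T t_G` can be tested by asking how well a single threshold on that projection reproduces the labels. The sketch below scans midpoints between sorted projection values and returns the most accurate threshold. The function name `best_threshold` and the toy values are hypothetical and chosen only to illustrate the procedure.

```python
import numpy as np

def best_threshold(x: np.ndarray, labels: np.ndarray) -> tuple[float, float]:
    """Find t maximizing the accuracy of the rule `x > t  =>  True`.

    Returns (threshold, accuracy). Candidate thresholds are the midpoints
    between consecutive distinct values of x.
    """
    values = np.unique(x)  # sorted distinct values
    if values.size < 2:
        raise ValueError("need at least two distinct projection values")
    candidates = (values[:-1] + values[1:]) / 2.0
    accuracies = np.array([np.mean((x > t) == labels) for t in candidates])
    best = int(np.argmax(accuracies))
    return float(candidates[best]), float(accuracies[best])

# Toy projections on t_G (hypothetical): "False" on the left, "True" on the right.
x = np.array([-8.0, -6.0, -5.0, -3.0, 0.5, 1.0, 1.5])
labels = np.array([False, False, False, False, True, True, True])

threshold, accuracy = best_threshold(x, labels)
```

An accuracy near 1 on the negated statements, together with a noticeably lower accuracy when the same scan is run on `a_ij^T t_P`, would support the reading that `t_G` carries the separation in the right panel.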
The stark contrast between the two plots suggests that truth value is not represented uniformly within these activation subspaces: the presence of negation changes how the "True" and "False" labels are arranged along `t_G` and `t_P`. The "Negated Statements" plot, in particular, shows a clean, almost decision-boundary-like separation, which could be indicative of a specific circuit or mechanism the model uses to handle negation. A two-dimensional projection alone does not establish such a mechanism, so this reading remains a hypothesis.