Image c3ea5651e214...

## Heatmap: GPT-2 Attention Head Analysis

### Overview
The image displays two side-by-side heatmaps visualizing the "Name Copying score" for attention heads across the 12 layers (0-11) of a GPT-2 model. The left heatmap is titled "GPT-2: Name-Copying heads," and the right heatmap is titled "GPT-2: (Suppression) Name-Copying heads." Both charts use a color scale from dark purple (score 0.0) to bright yellow (score 1.0) to represent the strength of the measured property. Specific heads are annotated with symbols based on classifications from a source labeled "'Interp. in the Wild' classifications."

### Components/Axes
**Common Elements (Both Heatmaps):**
*   **X-axis:** Labeled "Layer," with markers from 0 to 11.
*   **Y-axis:** Labeled "Head," with markers from 0 to 11.
*   **Color Bar:** Located on the right side of each heatmap. The scale is labeled "Name Copying score" (left) and "(Suppression) Name Copying score" (right). The scale ranges from 0.0 (dark purple) to 1.0 (bright yellow), with intermediate ticks at 0.2, 0.4, 0.6, and 0.8.
*   **Legend:** Positioned in the top-left corner of each heatmap's plotting area, with a light gray background.

**Left Heatmap Specifics:**
*   **Title:** "GPT-2: Name-Copying heads"
*   **Legend Content:**
    *   Header: `'Interp. in the Wild' classifications`
    *   Symbol `×`: `Name-Mover Heads`
    *   Symbol `●`: `Backup Name-Mover Heads`

**Right Heatmap Specifics:**
*   **Title:** "GPT-2: (Suppression) Name-Copying heads"
*   **Legend Content:**
    *   Header: `'Interp. in the Wild' classifications`
    *   Symbol `×`: `(Negative) Name-Mover Heads`
    *   Symbol `●`: `Backup (Negative) Name-Mover Heads`

### Detailed Analysis
**Left Heatmap: Name-Copying Heads**
*   **Trend:** High scores (yellow/green) are concentrated in the later layers (Layers 8, 9, 10, 11), with a few notable heads in earlier layers. Most of the grid is dark purple (score ~0.0).
*   **Annotated Heads & Approximate Scores:**
    *   **Layer 9, Head 0:** Marked with `×` (Name-Mover Head). Score is bright yellow (~1.0).
    *   **Layer 9, Head 6:** Marked with `×` (Name-Mover Head). Score is bright yellow-green (~0.8-0.9).
    *   **Layer 9, Head 9:** Marked with `×` (Name-Mover Head). Score is bright yellow (~1.0).
    *   **Layer 10, Head 0:** Marked with `●` (Backup Name-Mover Head). Score is dark blue (~0.2-0.3).
    *   **Layer 10, Head 2:** Marked with `●` (Backup Name-Mover Head). Score is bright yellow-green (~0.8-0.9).
    *   **Layer 10, Head 6:** Marked with `●` (Backup Name-Mover Head). Score is teal (~0.5-0.6).
    *   **Layer 10, Head 10:** Marked with `●` (Backup Name-Mover Head). Score is teal (~0.5-0.6).
    *   **Layer 11, Head 2:** Marked with `●` (Backup Name-Mover Head). Score is bright yellow-green (~0.8-0.9).
    *   **Layer 11, Head 9:** Marked with `●` (Backup Name-Mover Head). Score is dark purple-blue (~0.1-0.2).
*   **Other Notable High-Score Cells (Not Annotated):**
    *   Layer 6, Head 4: Teal (~0.5).
    *   Layer 7, Head 11: Teal (~0.5).
    *   Layer 8, Head 8: Teal-green (~0.6-0.7).
    *   Layer 9, Head 7: Bright yellow-green (~0.8-0.9).
    *   Layer 10, Head 7: Bright yellow-green (~0.8-0.9).
    *   Layer 11, Head 8: Bright yellow-green (~0.8-0.9).
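The readings listed above can be compared by marker class with a short script. This is only a sketch: the score ranges are the visual estimates from the description (not underlying data), taking the midpoint of each range is an assumption, and the names `LEFT_PANEL` and `mean_score_by_marker` are hypothetical.

```python
from statistics import mean

# (layer, head) -> (marker, low estimate, high estimate).
# "x" = Name-Mover, "o" = Backup Name-Mover, None = not annotated.
# All values are approximate colour readings from the description above.
LEFT_PANEL = {
    (9, 0): ("x", 1.0, 1.0),
    (9, 6): ("x", 0.8, 0.9),
    (9, 9): ("x", 1.0, 1.0),
    (10, 0): ("o", 0.2, 0.3),
    (10, 2): ("o", 0.8, 0.9),
    (10, 6): ("o", 0.5, 0.6),
    (10, 10): ("o", 0.5, 0.6),
    (11, 2): ("o", 0.8, 0.9),
    (11, 9): ("o", 0.1, 0.2),
    (6, 4): (None, 0.5, 0.5),
    (7, 11): (None, 0.5, 0.5),
    (8, 8): (None, 0.6, 0.7),
    (9, 7): (None, 0.8, 0.9),
    (10, 7): (None, 0.8, 0.9),
    (11, 8): (None, 0.8, 0.9),
}


def mean_score_by_marker(panel):
    """Average the midpoint of each estimated range, grouped by marker."""
    groups = {}
    for marker, low, high in panel.values():
        groups.setdefault(marker, []).append((low + high) / 2)
    return {marker: round(mean(values), 3) for marker, values in groups.items()}


summary = mean_score_by_marker(LEFT_PANEL)
for marker, value in summary.items():
    print(marker, value)
```

Under these estimates the `×` heads average higher than the `●` heads. The unannotated group is not directly comparable, because only unannotated cells that stood out as bright were listed in the first place.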

**Right Heatmap: (Suppression) Name-Copying Heads**
*   **Trend:** This heatmap is overwhelmingly dark purple (score ~0.0), indicating very low suppression scores for nearly all heads. Only two of the cells listed below (Layer 10, Head 7 and Layer 11, Head 10) show a strong color; the other two listed cells are near zero.
*   **Annotated Heads & Approximate Scores:**
    *   **Layer 10, Head 7:** Marked with `×` ((Negative) Name-Mover Head). Score is bright yellow-green (~0.8-0.9).
    *   **Layer 11, Head 10:** Marked with `×` ((Negative) Name-Mover Head). Score is bright yellow-green (~0.8-0.9).
    *   **Layer 9, Head 7:** Marked with `●` (Backup (Negative) Name-Mover Head). Score is dark purple (~0.0-0.1).
*   **Other Notable Cell (Not Annotated):**
    *   Layer 5, Head 11: Very faint dark blue/purple, score slightly above 0.0 (~0.05-0.1).
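One way to make the mismatch between annotation and score explicit is to flag annotated heads whose estimated score falls below a cutoff, and bright cells that carry no annotation. The sketch below does this for the right panel; the ranges are the approximate readings above, the 0.5 cutoff is an arbitrary assumption, and `RIGHT_PANEL`, `annotated_but_low`, and `high_but_unannotated` are hypothetical names.

```python
# (layer, head) -> (marker, low estimate, high estimate).
# "x" = (Negative) Name-Mover, "o" = Backup (Negative) Name-Mover,
# None = not annotated. Values are approximate colour readings.
RIGHT_PANEL = {
    (10, 7): ("x", 0.8, 0.9),
    (11, 10): ("x", 0.8, 0.9),
    (9, 7): ("o", 0.0, 0.1),
    (5, 11): (None, 0.05, 0.1),
}


def annotated_but_low(panel, threshold=0.5):
    """Annotated cells whose upper score estimate is still below the cutoff."""
    return sorted(
        cell
        for cell, (marker, _low, high) in panel.items()
        if marker is not None and high < threshold
    )


def high_but_unannotated(panel, threshold=0.5):
    """Unannotated cells whose lower score estimate reaches the cutoff."""
    return sorted(
        cell
        for cell, (marker, low, _high) in panel.items()
        if marker is None and low >= threshold
    )


low_annotated = annotated_but_low(RIGHT_PANEL)
bright_unannotated = high_but_unannotated(RIGHT_PANEL)
print("annotated but low:", low_annotated)
print("high but unannotated:", bright_unannotated)
```

With the readings above, the only flagged cell is the annotated head at Layer 9, Head 7, and no unannotated cell reaches the cutoff.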

### Key Observations
1.  **Sparsity of Function:** The vast majority of attention heads in both charts have a Name Copying score near zero. Functional heads (high score or annotated) are sparse.
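2.  **Layer Specialization:** For positive name copying (left chart), high-scoring heads are concentrated in the final four layers (8-11), with moderate scores (~0.5) also listed in Layers 6 and 7. High-scoring suppression heads (right chart) are even more localized, appearing only in Layers 10 and 11; the one annotated head in Layer 9 scores near zero.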
3.  **Distinct Head Roles:** The annotations distinguish between primary "Name-Mover" heads (`×`) and "Backup" heads (`●`). In the left chart, the highest-scoring heads (brightest yellow) are primarily the annotated Name-Mover heads in Layer 9.
4.  **Contrast Between Tasks:** The right heatmap for suppression is almost entirely dark compared to the left, suggesting that the "suppression" function is performed by a much smaller, more specific set of heads than the general "name copying" function.
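The sparsity and layer-concentration observations can be checked roughly against the cells listed in the Detailed Analysis. The sketch below counts cells whose lower score estimate is at least 0.5; that cutoff is an assumption, the lists cover only cells named in this description (which may not be exhaustive), and `LEFT_HIGH`, `RIGHT_HIGH`, and the helper functions are hypothetical names.

```python
from collections import Counter

N_LAYERS, N_HEADS = 12, 12  # grid size given by the axes (0-11 each)

# (layer, head) cells whose estimated score is >= 0.5 in the description.
LEFT_HIGH = [
    (6, 4), (7, 11), (8, 8),
    (9, 0), (9, 6), (9, 7), (9, 9),
    (10, 2), (10, 6), (10, 7), (10, 10),
    (11, 2), (11, 8),
]
RIGHT_HIGH = [(10, 7), (11, 10)]


def sparsity(cells):
    """Fraction of the 12 x 12 grid occupied by the given cells."""
    return len(cells) / (N_LAYERS * N_HEADS)


def layer_histogram(cells):
    """Number of listed cells per layer, ordered by layer index."""
    return dict(sorted(Counter(layer for layer, _head in cells).items()))


def share_in_layers(cells, layers):
    """Fraction of the listed cells that fall in the given layers."""
    return sum(layer in layers for layer, _head in cells) / len(cells)


print("left sparsity:", round(sparsity(LEFT_HIGH), 3))
print("right sparsity:", round(sparsity(RIGHT_HIGH), 3))
print("left per layer:", layer_histogram(LEFT_HIGH))
print("left share in layers 8-11:", round(share_in_layers(LEFT_HIGH, {8, 9, 10, 11}), 3))
```

On these lists, 13 of 144 cells qualify in the left panel against 2 of 144 in the right, and 11 of the 13 left-panel cells sit in Layers 8-11.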

### Interpretation
This visualization presents a mechanistic interpretability analysis of GPT-2, showing which attention heads score highly on a metric for copying names from a prompt into the output, and how those scores line up with earlier head classifications.

*   **What the data suggests:** The model delegates the core task of name copying to a small committee of heads located in its final processing layers. The presence of both "Name-Mover" and "Backup" heads suggests a degree of redundancy or specialization within this committee. The suppression task appears to be an even more specialized sub-function, handled by a tiny subset of heads in the very last layers.
*   **How elements relate:** The layer axis represents the depth of processing in the neural network. The concentration of activity in later layers indicates that name copying is a high-level, late-stage operation, likely occurring after the model has processed the semantic context of the sentence. The head axis represents parallel processing units within each layer; the sparse activation shows that only a few of these parallel units are recruited for this specific task.
*   **Notable patterns/anomalies:** The most striking pattern is the stark contrast between the two heatmaps. It implies that while a moderate number of late-layer heads score highly on copying (left chart), actively *suppressing* incorrect names (right chart) is a more precise operation performed by very few heads. The single annotated Backup (Negative) head at Layer 9, Head 7 has a near-zero score. This may mean that its "'Interp. in the Wild'" classification rests on behavior this scoring metric does not capture well, or that the head is misclassified; the figure alone cannot distinguish the two.