## Scatter Plots: Top 10 Safety Heads on Undiff Attn. and Scaling Cont.
### Overview
The image contains two scatter plots side by side. Each one plots "Layer" (x-axis) against "Head" (y-axis) for the top 10 safety heads. The left plot is titled "Top 10 Safety Heads on Undiff Attn." and the right plot "Top 10 Safety Heads on Scaling Cont.". Both plots show two models. In the legend, "Llama-2-7b-chat" is a purple circle and "Vicuna-7b-v1.5" is a yellow cross. A color bar to the right of each plot maps point color to a "Generalized Ships" value, so marker shape (circle or cross) identifies the model and point color encodes the score.
### Components/Axes
**Left Plot (Undiff Attn.):**
* **Title:** Top 10 Safety Heads on Undiff Attn.
* **X-axis:** Layer, with ticks from 0 to 30 in increments of 2.
* **Y-axis:** Head, with ticks from 0 to 30 in increments of 2.
* **Legend (top-right):**
    * Purple circle: Llama-2-7b-chat
    * Yellow cross: Vicuna-7b-v1.5
* **Color Bar (right):** Generalized Ships, ranging from approximately 0 to 70.
**Right Plot (Scaling Cont.):**
* **Title:** Top 10 Safety Heads on Scaling Cont.
* **X-axis:** Layer, with ticks from 0 to 30 in increments of 2.
* **Y-axis:** Head, with ticks from 0 to 30 in increments of 2.
* **Legend (top-right):**
    * Purple circle: Llama-2-7b-chat
    * Yellow cross: Vicuna-7b-v1.5
* **Color Bar (right):** Generalized Ships, ranging from approximately 0 to 20.
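### Detailed Analysis
The coordinates below were read approximately from the plots. Fewer than ten points are listed for each model, so these lists may not include every head in each model's top-10 set.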
**Left Plot (Undiff Attn.):**
* **Llama-2-7b-chat (Purple Circles):**
    * Layer ~ 1, Head ~ 2, Generalized Ships ~ 10
    * Layer ~ 2, Head ~ 8, Generalized Ships ~ 15
    * Layer ~ 3, Head ~ 6, Generalized Ships ~ 10
    * Layer ~ 3, Head ~ 15, Generalized Ships ~ 25
    * Layer ~ 1, Head ~ 29, Generalized Ships ~ 50
    * Layer ~ 26, Head ~ 26, Generalized Ships ~ 60
* **Vicuna-7b-v1.5 (Yellow Crosses):**
    * Layer ~ 1, Head ~ 8, Generalized Ships ~ 70
    * Layer ~ 2, Head ~ 7, Generalized Ships ~ 60
    * Layer ~ 2, Head ~ 6, Generalized Ships ~ 60
    * Layer ~ 3, Head ~ 1, Generalized Ships ~ 50
    * Layer ~ 3, Head ~ 26, Generalized Ships ~ 60
    * Layer ~ 6, Head ~ 6, Generalized Ships ~ 40
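To make the left-plot readings easy to check against the observations that follow, here is a minimal sketch that stores them as `(layer, head, generalized_ships)` triples and summarizes each model. The names `UNDIFF_ATTN` and `summarize` are hypothetical, and the numbers are only the approximate readings listed above, not the underlying data behind the figure.

```python
from statistics import mean

# Hypothetical container: (layer, head, generalized_ships) triples read off the
# "Undiff Attn." plot. All values are approximate readings, not source data.
UNDIFF_ATTN = {
    "Llama-2-7b-chat": [
        (1, 2, 10), (2, 8, 15), (3, 6, 10),
        (3, 15, 25), (1, 29, 50), (26, 26, 60),
    ],
    "Vicuna-7b-v1.5": [
        (1, 8, 70), (2, 7, 60), (2, 6, 60),
        (3, 1, 50), (3, 26, 60), (6, 6, 40),
    ],
}


def summarize(points):
    """Return the mean and max score plus the layer range of the listed heads."""
    scores = [score for _, _, score in points]
    layers = [layer for layer, _, _ in points]
    return {
        "mean_score": mean(scores),
        "max_score": max(scores),
        "layer_range": (min(layers), max(layers)),
    }


for model, points in UNDIFF_ATTN.items():
    print(model, summarize(points))
```

On these readings, the Vicuna-7b-v1.5 points have a higher mean score and a much narrower layer range than the Llama-2-7b-chat points.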
**Right Plot (Scaling Cont.):**
* **Llama-2-7b-chat (Purple Circles):**
    * Layer ~ 0, Head ~ 15, Generalized Ships ~ 15
    * Layer ~ 0, Head ~ 23, Generalized Ships ~ 15
    * Layer ~ 0, Head ~ 27, Generalized Ships ~ 15
    * Layer ~ 1, Head ~ 8, Generalized Ships ~ 15
    * Layer ~ 1, Head ~ 13, Generalized Ships ~ 15
    * Layer ~ 13, Head ~ 1, Generalized Ships ~ 5
    * Layer ~ 13, Head ~ 4, Generalized Ships ~ 5
* **Vicuna-7b-v1.5 (Yellow Crosses):**
    * Layer ~ 1, Head ~ 15, Generalized Ships ~ 15
    * Layer ~ 5, Head ~ 15, Generalized Ships ~ 15
    * Layer ~ 16, Head ~ 0, Generalized Ships ~ 0
    * Layer ~ 21, Head ~ 8, Generalized Ships ~ 10
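The right-plot readings are mainly about where the heads sit across layers. The sketch below counts the listed heads per layer and computes the fraction that fall in the first two layers. The names `SCALING_CONT`, `heads_per_layer`, and `early_layer_share` are hypothetical. The cutoff treating layers 0-1 as "early" is an illustrative assumption, not something the figure defines.

```python
from collections import Counter

# Hypothetical container: (layer, head, generalized_ships) triples read off the
# "Scaling Cont." plot. All values are approximate readings, not source data.
SCALING_CONT = {
    "Llama-2-7b-chat": [
        (0, 15, 15), (0, 23, 15), (0, 27, 15),
        (1, 8, 15), (1, 13, 15), (13, 1, 5), (13, 4, 5),
    ],
    "Vicuna-7b-v1.5": [
        (1, 15, 15), (5, 15, 15), (16, 0, 0), (21, 8, 10),
    ],
}


def heads_per_layer(points):
    """Count how many of the listed heads fall in each layer."""
    return Counter(layer for layer, _, _ in points)


def early_layer_share(points, last_early_layer=1):
    """Fraction of listed heads in layers 0..last_early_layer (inclusive).

    The default cutoff of layer 1 is an assumption chosen for illustration.
    """
    early = sum(1 for layer, _, _ in points if layer <= last_early_layer)
    return early / len(points)


for model, points in SCALING_CONT.items():
    counts = dict(sorted(heads_per_layer(points).items()))
    print(model, counts, f"early share: {early_layer_share(points):.2f}")
```

With these readings, five of the seven Llama-2-7b-chat points fall in layers 0-1, while the four Vicuna-7b-v1.5 points each occupy a different layer.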
### Key Observations
* In the "Undiff Attn." plot, most listed Llama-2-7b-chat heads sit in layers 1-3 with head indices up to 15 and "Generalized Ships" values of about 10-25. The two exceptions, at (layer 1, head 29) and (layer 26, head 26), have the highest values, about 50 and 60. The listed Vicuna-7b-v1.5 heads all lie in layers 1-6, mostly with head indices below 10 (one exception at head ~26), and have consistently high "Generalized Ships" values of about 40-70.
* In the "Scaling Cont." plot, five of the seven listed Llama-2-7b-chat heads are in layers 0 and 1, with head indices between 8 and 27 and "Generalized Ships" values around 15. The other two are in layer 13 with lower values (~5). The four listed Vicuna-7b-v1.5 heads are spread across layers 1, 5, 16, and 21, with "Generalized Ships" values between about 0 and 15.
### Interpretation
The plots compare the top 10 safety heads of two language models, Llama-2-7b-chat and Vicuna-7b-v1.5, under two conditions: "Undiff Attn." and "Scaling Cont.". The "Generalized Ships" value, shown by point color, appears to be the score used to rank the safety heads, most likely a measure of how important each head is for the model's safety behavior.
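In the "Undiff Attn." plot, most of Llama-2-7b-chat's listed heads have modest scores, with two standouts (about 50 and 60) at widely separated positions, one of them deep in the network at layer 26. Vicuna-7b-v1.5's listed heads are concentrated in layers 1-6 and all score highly (about 40-70). This suggests that its safety-relevant heads are both more uniform in importance and more localized in the early layers. The "Head" axis is only an index within a layer, so a high head number carries no meaning by itself.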
The "Scaling Cont." plot suggests that, under this condition, Llama-2-7b-chat's most important safety heads sit mainly in the first two layers, while Vicuna-7b-v1.5's are distributed across early, middle, and later layers. This plot's color bar tops out near 20 rather than 70, so "Generalized Ships" values are lower overall than in the "Undiff Attn." plot. Because the score depends on the condition being tested, raw values from the two plots may not be directly comparable.
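The two color bars cover different ranges (about 0-70 and 0-20), so comparing raw scores across plots is misleading. One way to put them on a common footing is to divide each plot's scores by that plot's color-bar upper bound. The sketch below applies this normalization to the approximate listed readings. This rescaling is an assumption chosen for illustration rather than a method taken from the figure, and the names `COLORBAR_MAX`, `SCORES`, and `normalized_mean` are hypothetical.

```python
# Approximate color-bar upper bounds as read from the figure (assumed scales).
COLORBAR_MAX = {"Undiff Attn.": 70.0, "Scaling Cont.": 20.0}

# Generalized Ships values listed for each plot and model (approximate readings).
SCORES = {
    "Undiff Attn.": {
        "Llama-2-7b-chat": [10, 15, 10, 25, 50, 60],
        "Vicuna-7b-v1.5": [70, 60, 60, 50, 60, 40],
    },
    "Scaling Cont.": {
        "Llama-2-7b-chat": [15, 15, 15, 15, 15, 5, 5],
        "Vicuna-7b-v1.5": [15, 15, 0, 10],
    },
}


def normalized_mean(scores, upper):
    """Mean score divided by the plot's color-bar upper bound (roughly 0 to 1)."""
    return sum(scores) / len(scores) / upper


for plot, by_model in SCORES.items():
    for model, scores in by_model.items():
        value = normalized_mean(scores, COLORBAR_MAX[plot])
        print(f"{plot:14s} {model:16s} {value:.2f}")
```

Any conclusion drawn this way inherits the limits of the inputs: the color-bar bounds are eyeballed, and the lists include fewer than ten heads per model.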