## Heatmap: Average JS Divergence Across Layers and Categories
### Overview
The image is a heatmap of average Jensen-Shannon (JS) divergence across three categories ("Subj.", "Attn.", "Last.") and 31 layers (0–30). Color intensity encodes the divergence value: the color bar on the right runs from light blue (about 0.1, low divergence) to dark blue (about 0.6, high divergence).
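
For readers unfamiliar with the metric, here is a minimal sketch of how a JS divergence between two discrete distributions can be computed. The figure does not say which distributions were compared or which logarithm base was used; the function names and the natural-log choice below are assumptions made only for illustration.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats; terms where p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    Assumes p and q have the same length and each sums to 1.
    Uses the natural log (an assumption), so the result lies in [0, ln 2].
    """
    # Mixture distribution: the midpoint of p and q.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Under this convention, identical distributions give 0 and distributions with no overlapping support give ln 2 (about 0.693). Each cell of the heatmap would then be an average of such values for one category at one layer.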
---
### Components/Axes
- **Y-Axis (Categories)**:
  - "Subj." (Subject)
  - "Attn." (Attention)
  - "Last." (Last)
- **X-Axis (Layers)**:
  - Labeled "Layer" with integer values from 0 to 30.
- **Legend**:
  - Positioned on the right, labeled "Avg JS Divergence."
  - Color gradient: Dark blue (0.6) to light blue (0.1).
---
### Detailed Analysis
1. **"Subj." (Subject)**:
   - Layers 0–14: Dark blue (high divergence, ~0.5–0.6).
   - Layers 15–30: Gradual lightening (diminishing divergence, ~0.3–0.5).
   - **Trend**: Divergence declines after layer 14 but stays moderate.
2. **"Attn." (Attention)**:
   - Layers 0–14: Light blue (low divergence, ~0.1–0.2).
   - Layers 15–20: Gradual darkening (increasing divergence, ~0.2–0.4).
   - Layers 21–30: Light blue again (divergence back to ~0.1–0.2).
   - **Trend**: Peak divergence around layers 15–20.
3. **"Last." (Last)**:
   - Layers 0–14: Light blue (low divergence, ~0.1–0.2).
   - Layers 15–30: Gradual darkening (increasing divergence, ~0.2–0.4).
   - **Trend**: Low and flat through layer 14, then a steady increase from layer 15 onward.
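
The early/mid/late breakdown used above can be sketched as a small helper that averages one heatmap row over each layer band and reports its peak layer. The band boundaries follow the description; the function name and the example row are hypothetical placeholders, not values read from the figure.

```python
from statistics import mean

# Layer bands used in the description (inclusive layer indices).
BANDS = {"early": (0, 14), "mid": (15, 20), "late": (21, 30)}

def summarize_row(values):
    """Return per-band means and the index of the peak layer for one row."""
    if len(values) != 31:
        raise ValueError("expected one value per layer (0-30)")
    band_means = {
        name: mean(values[lo:hi + 1]) for name, (lo, hi) in BANDS.items()
    }
    # First layer index holding the maximum value in the row.
    peak_layer = max(range(len(values)), key=values.__getitem__)
    return band_means, peak_layer

# Hypothetical placeholder row, NOT read from the figure: low early and
# late values with a bump in the mid layers, loosely shaped like "Attn.".
example_row = [0.1] * 15 + [0.2, 0.3, 0.4, 0.4, 0.3, 0.2] + [0.1] * 10
band_means, peak_layer = summarize_row(example_row)
```

With the underlying per-cell values available, the same helper could be applied to each of the three rows to check the trends stated here numerically rather than by eye.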
---
### Key Observations
- **"Subj."** exhibits the highest divergence in early layers (0–14) and declines afterward, while remaining at or above the other two categories.
- **"Attn."** shows a single mid-layer peak: low divergence in early and late layers, rising to its maximum around layers 15–20.
- **"Last."** demonstrates a consistent upward trend in divergence from layer 15 onward.
- The color bar confirms that darker shades correspond to higher divergence values.
---
### Interpretation
The heatmap suggests that:
- **Early layers (0–14)** are dominated by subject-related divergence ("Subj."), while "Attn." and "Last." divergence are minimal.
- **Mid-layers (15–20)** show a shift: "Attn." divergence peaks, and "Last." divergence begins to rise.
- **Late layers (21–30)** revert to lower divergence for "Attn." but maintain elevated "Last." divergence.

This pattern may indicate that subject-related features dominate early processing, that attention matters most in the mid layers, and that the "Last." category becomes more significant in later layers. The divergence trends could reflect hierarchical processing in a neural network or similar system, where early layers focus on subject features and later layers integrate attention and the final output.
No explicit textual data or tables are present beyond the axis labels and legend. All values are inferred from color intensity and spatial positioning.