## Line Chart: Distance Analysis
### Overview
The figure plots two distance measures, singular-vector alignment (SVA) and L2 distance, across the layers of a model (layers 0 to 40). Separate lines are drawn for each combination of distance type and expert-cluster category.
### Components/Axes
- **X-Axis (Layer)**: Represents the layers of the model, ranging from 0 to 40.
- **Y-Axis (SV Align. / L2 Distance)**: Shows the singular vector alignment and L2 distance values, ranging from 0.0 to 1.4.
- **Legend**: Contains two categories, "Dist. Type" and "Expert clusters":
  - **Dist. Type**: Singular-vector alignment and L2 distance.
  - **Expert clusters**: Base to IFT, HC-SMoE, M-SMoE, M-SMoE - permuted.
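
The figure description does not say how either measure is computed, so the following is only a minimal sketch of one plausible reading. It assumes that L2 distance means the Frobenius norm of the difference between two weight matrices, and that singular-vector alignment means the absolute cosine between their leading right singular vectors. The function names are hypothetical and the definitions used to produce the figure may differ.

```python
import math

def frobenius_distance(a, b):
    """L2 (Frobenius) distance between two equally shaped matrices (lists of rows)."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))

def top_right_singular_vector(m, iters=200):
    """Approximate the leading right singular vector via power iteration on m^T m.

    Assumption: the deterministic start vector is not orthogonal to the
    leading singular vector; a random start would be safer in general.
    """
    n_rows, n_cols = len(m), len(m[0])
    v = [float(j + 1) for j in range(n_cols)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        u = [sum(m[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]  # u = m v
        w = [sum(m[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]  # w = m^T u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # zero matrix: keep the current vector
        v = [x / norm for x in w]
    return v

def singular_vector_alignment(a, b):
    """Absolute cosine between the leading right singular vectors of a and b (assumed definition)."""
    va = top_right_singular_vector(a)
    vb = top_right_singular_vector(b)
    return abs(sum(x * y for x, y in zip(va, vb)))

# Toy matrices standing in for per-layer weights (illustrative values only).
w_a = [[3.0, 0.0], [0.0, 1.0]]
w_b = [[1.0, 0.0], [0.0, 3.0]]
print(frobenius_distance(w_a, w_b))
print(singular_vector_alignment(w_a, w_b))
```

Under these assumed definitions the two measures have different scales and meanings: the alignment is bounded between 0 and 1, while the Frobenius distance grows with the magnitude of the weights. Values of the two that share an axis are therefore not directly comparable.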
### Detailed Analysis
- **Singular-vector alignment (SVA)**: The SVA lines sit above the L2 distance lines, so SVA values are generally higher than L2 distance values across all layers.
- **L2 distance**: The L2 distance lines are nearly flat, meaning the L2 distance values change little from layer to layer.
- **Expert clusters**: The HC-SMoE and M-SMoE lines are consistently above the base to IFT line, so these clusters have higher SVA and L2 distance values than the base to IFT comparison.
### Key Observations
- **Singular-vector alignment**: SVA values exceed L2 distance values at all layers.
- **Expert clusters**: HC-SMoE and M-SMoE show higher SVA and L2 distance values than base to IFT.
- **Stability**: L2 distance values are roughly constant across layers.
### Interpretation
The figure suggests that SVA and L2 distance behave differently across layers. SVA values are higher throughout, which the description reads as SVA being the more robust measure, although higher values alone do not establish this. The HC-SMoE and M-SMoE expert clusters show higher values on both measures than the base to IFT line; whether this makes these clusters more effective or accurate cannot be determined from the figure alone. The near-constant L2 distance across layers suggests that L2 distance may be less sensitive to differences between the model's layers.