# Technical Document Extraction: Heatmap Analysis of Weight Matrices
## 1. Document Overview
This image contains three side-by-side heatmaps of a similarity measure $\phi$ between two matrices, $A_{r=64}$ and $A'_{r=64}$, evaluated over index pairs $(i, j)$. The figure title gives the plotted function, and each subplot corresponds to a different data source: $\Delta W_q$, $\Delta W_v$, and a random Gaussian baseline.
## 2. Header and Global Metadata
* **Main Title (Top Center):** $\phi(A_{r=64}, A'_{r=64}, i, j)$
* **Color Bar (Right Side):**
    * **Scale:** Linear, ranging from **0.0 to 0.5**.
    * **Color Gradient:** Dark purple/black (0.0) $\rightarrow$ Magenta/Red (0.25) $\rightarrow$ Light Peach/White (0.5).
    * **Spatial Placement:** Located at the far right of the image, serving as the legend for all three plots.
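
The figure does not define $\phi$. The notation matches the normalized subspace similarity used in the LoRA paper, so, assuming that is the intended measure, it can be written as follows, where $U_A^i$ holds the top-$i$ right-singular vectors of $A$ as columns (the symbol $U$ comes from that definition and does not appear in the image):

$$
\phi(A, A', i, j) = \frac{\lVert {U_A^{i}}^{\top} U_{A'}^{j} \rVert_F^2}{\min(i, j)} \in [0, 1]
$$

Under this assumption, a value of 1 means the two subspaces overlap completely and 0 means they are orthogonal; the color bar covers only the lower half of that range.
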
---
## 3. Component Isolation and Analysis
### Region A: Left Heatmap ($\Delta W_q$)
* **Sub-title:** $\Delta W_q$
* **Y-axis Label ($i$):** Ticks labeled at 1, 8, 16, 24, 32, 40, 48, 56.
* **X-axis Label ($j$):** Ticks labeled at 1, 5, 10, 15, 20, 25, 30, 34, 39, 44, 49, 54, 59.
* **Visual Trend:**
    * A high-intensity (light peach/white) vertical band runs along the far left ($j \approx 1$ to $5$).
    * A high-intensity horizontal band runs along the very top ($i \approx 1$).
    * Intensity decays gradually to dark purple as both $i$ and $j$ increase; the top-left corner is the brightest region.
* **Data Interpretation:** Indicates strong correlation/similarity in the lower-index subspaces of the Query weight update matrix.
### Region B: Middle Heatmap ($\Delta W_v$)
* **Sub-title:** $\Delta W_v$
* **Y-axis Label ($i$):** Same scale as the left plot; tick labels run from 1 to 56.
* **X-axis Label ($j$):** Same scale as the left plot; tick labels run from 1 to 59.
* **Visual Trend:**
    * As in $\Delta W_q$, a high-intensity vertical band runs along the left edge.
    * The horizontal band at the top is much darker (lower value) than in $\Delta W_q$.
    * The decay is sharper; most of the bottom-right area is dark purple (near 0.0).
* **Data Interpretation:** Indicates that for the Value weight update matrix, similarity is heavily concentrated in the very first few indices of $j$, with less "spread" across $i$ compared to the Query matrix.
### Region C: Right Heatmap (Random Gaussian)
* **Sub-title:** Random Gaussian
* **Y-axis Label ($i$):** Shared scale; tick labels are omitted on this right-most plot, but the grid alignment persists.
* **X-axis Label ($j$):** Ticks labeled at 1, 5, 10, 15, 20, 25, 30, 34, 39, 44, 49, 54, 59.
* **Visual Trend:**
    * The entire plot is uniform dark purple/black.
    * There are no visible bands or clusters of high intensity.
* **Data Interpretation:** Indicates that the patterns seen in $\Delta W_q$ and $\Delta W_v$ are non-random. Random Gaussian matrices yield near-zero similarity across all indices $i, j$.
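
The random baseline can be reproduced in a few lines. The sketch below assumes $\phi$ is the normalized subspace similarity $\lVert {U_A^{i}}^{\top} U_{A'}^{j} \rVert_F^2 / \min(i, j)$ over top right-singular vectors, which the image itself does not state. The function name `similarity_grid`, the width `d = 1024`, and the seed are hypothetical choices made for illustration.

```python
import numpy as np

def similarity_grid(A, B):
    """Return grid[i-1, j-1] = phi(A, B, i, j) under the assumed definition."""
    # Rows of Vt are right-singular vectors, ordered by descending singular value.
    _, _, Vt_a = np.linalg.svd(A, full_matrices=False)
    _, _, Vt_b = np.linalg.svd(B, full_matrices=False)
    # Squared inner product between every pair of singular directions.
    overlap_sq = (Vt_a @ Vt_b.T) ** 2
    # 2D prefix sums give the squared Frobenius norm for every (i, j) at once.
    cum = overlap_sq.cumsum(axis=0).cumsum(axis=1)
    i_idx = np.arange(1, cum.shape[0] + 1)
    j_idx = np.arange(1, cum.shape[1] + 1)
    return cum / np.minimum.outer(i_idx, j_idx)

rng = np.random.default_rng(0)        # hypothetical seed
r, d = 64, 1024                       # r from the figure title; d is an assumption
A = rng.standard_normal((r, d))
A_prime = rng.standard_normal((r, d))

random_grid = similarity_grid(A, A_prime)
same_grid = similarity_grid(A, A)     # sanity check: identical subspaces
print(f"max over random pair: {random_grid.max():.3f}")
print(f"min diagonal for identical input: {np.diag(same_grid).min():.3f}")
```

For independent Gaussian matrices the expected value in this sketch is roughly $\max(i, j)/d$, so the grid stays close to zero whenever $d$ is much larger than $r$, which is consistent with the uniformly dark right-hand plot.
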
---
## 4. Summary of Key Data Points
| Feature | $\Delta W_q$ | $\Delta W_v$ | Random Gaussian |
| :--- | :--- | :--- | :--- |
| **Peak Intensity** | ~0.4 - 0.5 (Top-Left) | ~0.3 - 0.4 (Left Edge) | ~0.0 (Uniform) |
| **Horizontal Band ($i=1$)** | Strong / High Value | Weak / Low Value | None |
| **Vertical Band ($j=1$)** | Strong / High Value | Strong / High Value | None |
| **Near-Zero Area** | Moderate | Large | Entire plot |
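
If the underlying heatmap values were available as numbers, the qualitative features in the table could be computed rather than read off by eye. The following is a minimal sketch under that assumption: `summarize_heatmap`, the `near_zero` threshold of 0.05, and the 3x3 `toy` grid are all hypothetical, and none of the numbers are taken from the figure.

```python
def summarize_heatmap(grid, near_zero=0.05):
    """Summarize a heatmap given as a list of rows, where grid[i-1][j-1] is the cell (i, j)."""
    cells = [v for row in grid for v in row]
    top_row = grid[0]                      # horizontal band at i = 1
    left_col = [row[0] for row in grid]    # vertical band at j = 1
    return {
        "peak": max(cells),
        "top_row_mean": sum(top_row) / len(top_row),
        "left_col_mean": sum(left_col) / len(left_col),
        # Share of cells below the (assumed) near-zero threshold.
        "near_zero_fraction": sum(v < near_zero for v in cells) / len(cells),
    }

# Made-up values for illustration only; not read from the figure.
toy = [
    [0.45, 0.40, 0.35],
    [0.40, 0.10, 0.02],
    [0.30, 0.02, 0.01],
]
stats = summarize_heatmap(toy)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Comparing `top_row_mean` against `left_col_mean` for each panel would put a number on the difference between a strong and a weak horizontal band.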
## 5. Language Declaration
The text in this image is entirely in **English**, utilizing standard mathematical notation (Greek letters and subscripts). No other languages are present.