## Heatmap: Recompute Cost (%) by Batch Size and Window Size
### Overview
The image is a heatmap of "Recompute Cost (%)" as a function of two variables: **Batch Size** (vertical axis) and **Window Size** (horizontal axis). Cost is encoded as a color gradient, with a color bar legend on the right. The populated cells form a triangle in the lower left of the grid: in the reconstructed table, a value is present only where Batch Size × Window Size is at most 512.
### Components/Axes
* **Y-Axis (Vertical):** Labeled **"Batch Size"**. The categories, from bottom to top, are: `1`, `2`, `4`, `8`, `16`, `32`.
* **X-Axis (Horizontal):** Labeled **"Window Size"**. The categories, from left to right, are: `16`, `32`, `64`, `128`, `256`, `512`.
* **Color Bar Legend:** Positioned vertically on the right side of the chart. It is labeled **"Recompute Cost (%)"**. The scale runs from approximately **5%** (dark blue) at the bottom to **45%** (bright orange) at the top, with intermediate tick marks at 10%, 15%, 20%, 25%, 30%, 35%, and 40%.
* **Data Grid:** A 6x6 grid where each populated cell contains a percentage value and is colored according to the legend. Cells in the upper-right triangle, where Batch Size × Window Size exceeds 512, are empty (white).
### Detailed Analysis
The following table reconstructs the data from the heatmap. Each cell value represents the Recompute Cost (%) for the corresponding Batch Size (row) and Window Size (column).
| Batch Size \ Window Size | 16 | 32 | 64 | 128 | 256 | 512 |
| :----------------------- | :------ | :------ | :------ | :------ | :------ | :------ |
| **32** | 2.82% | | | | | |
| **16** | 3.09% | 6.40% | | | | |
| **8** | 3.11% | 6.49% | 13.83% | | | |
| **4** | 2.97% | 6.47% | 14.09% | 27.85% | | |
| **2** | 3.31% | 6.03% | 13.86% | 28.96% | 42.28% | |
| **1** | 3.44% | 6.81% | 12.42% | 24.92% | 46.41% | 42.33% |
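The pattern of filled and empty cells can be checked directly against the table. The sketch below transcribes the table into a Python dictionary and compares the set of filled cells with the set of cells where Batch Size × Window Size stays at or below a limit. The names `COST`, `populated_cells`, and `cells_within_budget` are illustrative, and the limit of 512 is inferred from the table rather than stated by the chart.

```python
# Recompute cost (%) transcribed from the table, keyed by (batch_size, window_size).
COST = {
    (32, 16): 2.82,
    (16, 16): 3.09, (16, 32): 6.40,
    (8, 16): 3.11, (8, 32): 6.49, (8, 64): 13.83,
    (4, 16): 2.97, (4, 32): 6.47, (4, 64): 14.09, (4, 128): 27.85,
    (2, 16): 3.31, (2, 32): 6.03, (2, 64): 13.86, (2, 128): 28.96,
    (2, 256): 42.28,
    (1, 16): 3.44, (1, 32): 6.81, (1, 64): 12.42, (1, 128): 24.92,
    (1, 256): 46.41, (1, 512): 42.33,
}

BATCH_SIZES = [1, 2, 4, 8, 16, 32]
WINDOW_SIZES = [16, 32, 64, 128, 256, 512]


def populated_cells():
    """Grid cells that carry a value in the table."""
    return {(b, w) for b in BATCH_SIZES for w in WINDOW_SIZES if (b, w) in COST}


def cells_within_budget(limit=512):
    """Grid cells whose batch_size * window_size does not exceed `limit`.

    The default of 512 is an assumption inferred from the table.
    """
    return {(b, w) for b in BATCH_SIZES for w in WINDOW_SIZES if b * w <= limit}


if __name__ == "__main__":
    print(populated_cells() == cells_within_budget(512))
```

If the two sets match, the empty region is described by the product of the two axes rather than by a comparison between them.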
**Trend Verification:**
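* **Horizontal Trend (Fixed Batch Size):** For every batch size with more than one measurement, the recompute cost rises steeply with window size, roughly doubling each time the window size doubles (for example, 2.97% → 6.47% → 14.09% → 27.85% at Batch Size 4). The color shifts from blue to orange moving right along a row. The one exception is Batch Size 1, where the cost falls from 46.41% at Window Size 256 to 42.33% at Window Size 512.
* **Vertical Trend (Fixed Window Size):** For a given window size, the recompute cost changes comparatively little with batch size, and the direction is not consistent. It drifts down overall at Window Size 16 (3.44% at Batch Size 1 to 2.82% at Batch Size 32) and at Window Size 256 (46.41% to 42.28%), but at Window Sizes 64 and 128 it is higher at Batch Sizes 2 and 4 than at Batch Size 1 (for example, 24.92% at Batch Size 1 versus 28.96% at Batch Size 2 for Window Size 128).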
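Both trends can be checked numerically against the table values. The sketch below, using illustrative helper names that are not part of the source, computes the ratio between costs at consecutive window sizes for each batch size, and lists the costs for each window size in order of increasing batch size.

```python
# Recompute cost (%) transcribed from the table, keyed by (batch_size, window_size).
COST = {
    (32, 16): 2.82,
    (16, 16): 3.09, (16, 32): 6.40,
    (8, 16): 3.11, (8, 32): 6.49, (8, 64): 13.83,
    (4, 16): 2.97, (4, 32): 6.47, (4, 64): 14.09, (4, 128): 27.85,
    (2, 16): 3.31, (2, 32): 6.03, (2, 64): 13.86, (2, 128): 28.96,
    (2, 256): 42.28,
    (1, 16): 3.44, (1, 32): 6.81, (1, 64): 12.42, (1, 128): 24.92,
    (1, 256): 46.41, (1, 512): 42.33,
}


def row_ratios(batch_size):
    """Cost ratios between consecutive measured window sizes for one batch size.

    Returns a list of (smaller_window, larger_window, ratio) tuples.
    """
    windows = sorted(w for (b, w) in COST if b == batch_size)
    return [
        (lo, hi, COST[(batch_size, hi)] / COST[(batch_size, lo)])
        for lo, hi in zip(windows, windows[1:])
    ]


def column_costs(window_size):
    """Costs for one window size, ordered by increasing batch size."""
    batches = sorted(b for (b, w) in COST if w == window_size)
    return [COST[(b, window_size)] for b in batches]


if __name__ == "__main__":
    for batch in sorted({b for b, _ in COST}):
        steps = ", ".join(f"{lo}->{hi}: x{r:.2f}" for lo, hi, r in row_ratios(batch))
        print(f"batch {batch}: {steps or 'single measurement'}")
    for window in sorted({w for _, w in COST}):
        print(f"window {window}: {column_costs(window)}")
```

A ratio near 2 for a doubling of the window size indicates roughly proportional growth, and a ratio below 1 marks a step where the cost falls.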
### Key Observations
1. **Maximum Cost:** The highest recorded recompute cost is **46.41%**, occurring at **Batch Size = 1** and **Window Size = 256**.
2. **Minimum Cost:** The lowest recorded recompute cost is **2.82%**, occurring at **Batch Size = 32** and **Window Size = 16**.
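3. **Cost Gradient:** Cost roughly doubles with each doubling of the window size, so the largest absolute increases occur at the largest windows. The biggest single step is at Batch Size 1, from 24.92% at Window Size 128 to 46.41% at Window Size 256 (+21.49 percentage points). At Batch Size 4, the step from Window Size 64 to 128 takes the cost from 14.09% to 27.85%.
4. **Anomaly at Batch Size 1:** The cost pattern for Batch Size = 1 is non-monotonic. It peaks at Window Size 256 (46.41%) and then **decreases** to 42.33% at Window Size 512. This is the only decrease along any row, and Window Size 512 has no value at any other batch size, so there is no second row to compare it against.
5. **Data Sparsity:** The upper-right triangle of the matrix is empty. Every empty cell has Batch Size × Window Size greater than 512, which suggests those configurations were not measured or are not applicable.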
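The extremes and the largest single-step increase can be recomputed from the table rather than read off by eye. The following is a minimal sketch with illustrative function names (`extremes`, `largest_step`), and it assumes the table values were transcribed correctly from the chart.

```python
# Recompute cost (%) transcribed from the table, keyed by (batch_size, window_size).
COST = {
    (32, 16): 2.82,
    (16, 16): 3.09, (16, 32): 6.40,
    (8, 16): 3.11, (8, 32): 6.49, (8, 64): 13.83,
    (4, 16): 2.97, (4, 32): 6.47, (4, 64): 14.09, (4, 128): 27.85,
    (2, 16): 3.31, (2, 32): 6.03, (2, 64): 13.86, (2, 128): 28.96,
    (2, 256): 42.28,
    (1, 16): 3.44, (1, 32): 6.81, (1, 64): 12.42, (1, 128): 24.92,
    (1, 256): 46.41, (1, 512): 42.33,
}


def extremes(cost):
    """Return the (batch, window) keys of the lowest and highest cost."""
    lowest = min(cost, key=cost.get)
    highest = max(cost, key=cost.get)
    return lowest, highest


def largest_step(cost):
    """Find the largest cost increase between consecutive window sizes.

    Returns (batch_size, smaller_window, larger_window, increase_in_points).
    """
    best = None
    for batch in sorted({b for b, _ in cost}):
        windows = sorted(w for (b, w) in cost if b == batch)
        for lo, hi in zip(windows, windows[1:]):
            delta = cost[(batch, hi)] - cost[(batch, lo)]
            if best is None or delta > best[3]:
                best = (batch, lo, hi, delta)
    return best


if __name__ == "__main__":
    low, high = extremes(COST)
    print("min:", low, COST[low])
    print("max:", high, COST[high])
    print("largest step:", largest_step(COST))
```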
### Interpretation
This heatmap demonstrates the computational trade-offs in a system where "recompute cost" is a critical metric, likely related to memory optimization techniques in machine learning (e.g., activation checkpointing). The data suggests:
* **Window Size is the Dominant Factor:** Across the measured range, window size moves the recompute cost from about 3% to over 40%, while batch size shifts it by at most about four percentage points at any fixed window size. The cost grows roughly in proportion to window size, which suggests the recomputation overhead scales with sequence length (window size).
* **Batch Size Effect Is Small and Mixed:** A larger batch size gives a lower *percentage* recompute cost at Window Sizes 16 and 256, which could indicate better amortization of fixed overheads or more efficient parallelization. The opposite holds at Window Sizes 64 and 128, where Batch Size 1 has the lowest cost. Batch Sizes 8 and above have no values at Window Size 128 or larger, so the data cannot show whether large batches help at large windows.
* **Practical Implication:** To minimize recompute cost, one should prioritize using the smallest feasible window size. The measurements do not show that a larger batch size reliably offsets the cost of a large window.
* **The Batch Size=1 Anomaly:** The drop in cost from window size 256 to 512 for batch size 1 is intriguing. It may point to a different code path, a hardware utilization threshold, or a measurement artifact for that specific, often challenging, configuration.
**In summary, the chart provides a quantitative guide for choosing system parameters: recompute cost is highly sensitive to window size, roughly doubling with each doubling of the window, and varies by only a few percentage points with batch size within the measured configurations.**