# Technical Document Extraction: Attention over Values (a=64)
## Chart Overview
- **Title**: Attention over Values, a=64
- **Type**: Line graph
- **Purpose**: Visualizes throughput (TFLOP/s) across varying hidden sizes for different h/a ratios.
## Axes
- **X-axis (Hidden Size)**:
  - Range: 0 to 32768
  - Markers: 0, 4096, 8192, 12288, 16384, 20480, 24576, 28672, 32768
- **Y-axis (Throughput (TFLOP/s))**:
  - Range: 0 to 200
  - Markers: 0, 50, 100, 150, 200
## Legend
- **Location**: Right side of the plot (outside the graph area)
- **Labels and Colors**:
  - `h/a = 1` (blue)
  - `h/a = 2` (orange)
  - `h/a = 4` (green)
  - `h/a = 8` (red)
  - `h/a = 16` (purple)
  - `h/a = 32` (brown)
  - `h/a = 64` (pink)
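The axes and legend described above can be captured as a small plotting configuration, which is handy when re-creating the chart. This is a minimal sketch: the names `X_TICKS`, `Y_TICKS`, `LEGEND_COLORS`, and `legend_label` are hypothetical, and the color names are the plain labels from the legend list, not exact hex values read from the figure.

```python
# Tick positions as listed in the Axes section.
X_TICKS = list(range(0, 32768 + 1, 4096))  # hidden size, every 4096
Y_TICKS = list(range(0, 200 + 1, 50))      # throughput in TFLOP/s, every 50

# Legend entries: h/a value -> line color, as listed in the Legend section.
# Color names are descriptive labels only (assumption: not exact figure colors).
LEGEND_COLORS = {
    1: "blue",
    2: "orange",
    4: "green",
    8: "red",
    16: "purple",
    32: "brown",
    64: "pink",
}


def legend_label(ratio: int) -> str:
    """Format a legend label the way the chart writes it."""
    return f"h/a = {ratio}"


if __name__ == "__main__":
    print(X_TICKS)
    print(Y_TICKS)
    for ratio, color in LEGEND_COLORS.items():
        print(legend_label(ratio), color)
```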
## Data Series Analysis
### 1. h/a = 1 (Blue)
- **Trend**: Rises to ~100 TFLOP/s by 16384 with minor fluctuations, then shows no net change through 32768.
- **Key Points**:
  - At 0: 0 TFLOP/s
  - At 16384: ~100 TFLOP/s
  - At 32768: ~100 TFLOP/s
### 2. h/a = 2 (Orange)
- **Trend**: Gradual increase with a dip at 8192 (~90 TFLOP/s).
- **Key Points**:
  - At 0: 0 TFLOP/s
  - At 8192: ~90 TFLOP/s
  - At 32768: ~150 TFLOP/s
### 3. h/a = 4 (Green)
- **Trend**: Sharp rise to ~150 TFLOP/s at 16384, with no net change by 32768 (~150 TFLOP/s).
- **Key Points**:
  - At 0: 0 TFLOP/s
  - At 16384: ~150 TFLOP/s
  - At 32768: ~150 TFLOP/s
### 4. h/a = 8 (Red)
- **Trend**: Rapid ascent to ~200 TFLOP/s at 16384, with no net change by 32768 (~200 TFLOP/s).
- **Key Points**:
  - At 0: 0 TFLOP/s
  - At 16384: ~200 TFLOP/s
  - At 32768: ~200 TFLOP/s
### 5. h/a = 16 (Purple)
- **Trend**: Steep rise to ~200 TFLOP/s at 16384, with no net change by 32768 (~200 TFLOP/s).
- **Key Points**:
  - At 0: 0 TFLOP/s
  - At 16384: ~200 TFLOP/s
  - At 32768: ~200 TFLOP/s
### 6. h/a = 32 (Brown)
- **Trend**: Sharp peak at 16384 (~200 TFLOP/s), then steep decline.
- **Key Points**:
  - At 0: 0 TFLOP/s
  - At 16384: ~200 TFLOP/s
  - At 32768: ~150 TFLOP/s
### 7. h/a = 64 (Pink)
- **Trend**: Steep rise to ~200 TFLOP/s at 16384, then a gradual increase to ~220 TFLOP/s at 32768.
- **Key Points**:
  - At 0: 0 TFLOP/s
  - At 16384: ~200 TFLOP/s
  - At 32768: ~220 TFLOP/s
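To make the per-series readings easier to compare, the key points above can be encoded as a lookup table and the change between the last two listed points classified mechanically. This is only a sketch: the values are the approximate readings listed in this section, and `KEY_POINTS`, `tail_trend`, and the 5 TFLOP/s tolerance are illustrative assumptions, not part of the chart.

```python
# Approximate key points per h/a value: hidden size -> throughput (TFLOP/s).
# Values are the "~" readings listed above, not exact measurements.
KEY_POINTS = {
    1: {0: 0, 16384: 100, 32768: 100},
    2: {0: 0, 8192: 90, 32768: 150},
    4: {0: 0, 16384: 150, 32768: 150},
    8: {0: 0, 16384: 200, 32768: 200},
    16: {0: 0, 16384: 200, 32768: 200},
    32: {0: 0, 16384: 200, 32768: 150},
    64: {0: 0, 16384: 200, 32768: 220},
}


def tail_trend(points: dict, tol: float = 5.0) -> str:
    """Classify the net change between the last two listed points.

    `tol` is an assumed tolerance (TFLOP/s) for approximate readings.
    """
    sizes = sorted(points)
    delta = points[sizes[-1]] - points[sizes[-2]]
    if delta > tol:
        return "rise"
    if delta < -tol:
        return "decline"
    return "flat"


if __name__ == "__main__":
    for ratio, points in KEY_POINTS.items():
        print(f"h/a = {ratio}: {tail_trend(points)}")
```

Because `h/a = 2` has no reading listed at 16384, its classification covers the span from 8192 to 32768 instead of 16384 to 32768.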
## Critical Observations
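1. **Peak at 16384**: Per the listed key points, `h/a = 4`, `8`, `16`, and `32` have their highest reading at 16384. `h/a = 1` is level at ~100 TFLOP/s from 16384 onward, while `h/a = 2` and `h/a = 64` have their highest listed reading at 32768.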
2. **Divergence at 32768**:
   - `h/a = 64` (pink) surpasses the others, reaching ~220 TFLOP/s.
   - `h/a = 32` (brown) declines sharply to ~150 TFLOP/s.
3. **Stability**: `h/a = 1` (blue) is the most stable series, with no pronounced peaks or dips, and levels off at ~100 TFLOP/s.
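The divergence at 32768 can be checked directly from the listed readings. The sketch below ranks the series by their approximate value at 32768 and flags any reading above the top y-axis marker of 200. The names `AT_32768`, `Y_AXIS_MAX`, `ranked`, and `above_axis` are hypothetical, and the values are approximate readings from the data series section.

```python
# Approximate throughput (TFLOP/s) at hidden size 32768, keyed by h/a value.
AT_32768 = {1: 100, 2: 150, 4: 150, 8: 200, 16: 200, 32: 150, 64: 220}

# Top marker of the y-axis as listed in the Axes section.
Y_AXIS_MAX = 200

# Series ordered from highest to lowest reading; ties keep insertion order.
ranked = sorted(AT_32768, key=AT_32768.get, reverse=True)

# Series whose listed reading exceeds the top axis marker.
above_axis = [ratio for ratio, value in AT_32768.items() if value > Y_AXIS_MAX]

if __name__ == "__main__":
    for ratio in ranked:
        print(f"h/a = {ratio}: ~{AT_32768[ratio]} TFLOP/s")
    print("Above top axis marker:", above_axis)
```

Only `h/a = 64` exceeds the 200 TFLOP/s marker, so either the plotted line extends past the top tick or the ~220 reading is an overestimate. The extraction alone cannot settle which.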
## Spatial Grounding
- **Legend Position**: Right-aligned, outside the plot boundary.
- **Color Consistency**: All lines match their legend labels (e.g., blue = `h/a = 1`).
## Conclusion
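The graph indicates that throughput generally rises with the `h/a` ratio: at 16384, ratios of 8 and above reach ~200 TFLOP/s, compared with ~100 TFLOP/s for `h/a = 1` and ~150 TFLOP/s for `h/a = 4`. At 32768, `h/a = 64` is highest (~220 TFLOP/s), but the ordering is not monotonic, since `h/a = 32` falls back to ~150 TFLOP/s. Lower ratios (1, 2) grow more gradually. Several series are at or near their highest listed value at 16384, which suggests that this hidden size is favorable for most configurations.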