# Technical Document Extraction: Weight Packing and Latency Analysis
## Diagram Components and Flow
### Original Weights
- **Structure**: 32 weights labeled `W0` to `W31` (intermediate labels are shown as an ellipsis).
- **Color Coding**:
  - Red: `W31`, `W30`, ..., `W16` (higher indices).
  - Gray: `W15`, ..., `W0` (lower indices).
### Packed Weights (`Pw`)
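- **Reordering Offline**: The weights are interleaved so that high-index (red) and low-index (gray) weights alternate: `W31`, `W15`, ..., `W17`, `W1`, ..., `W0`.
  - Red: `W31`, ..., `W17`, ... (high indices).
  - Gray: `W15`, ..., `W1`, ..., `W0` (low indices).
- **Structure**: The same 32 weights as the original, in a different order.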
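The offline reordering can be sketched in a few lines of Python. This is a minimal illustration, not the original implementation: it assumes 32 unsigned 4-bit weights, and it assumes that byte `i` of the packed word holds `W[i]` in its low nibble and `W[i + 16]` in its high nibble, which is the layout the unpacking equations imply. The function name `pack_weights` and the sample values are hypothetical.

```python
def pack_weights(w):
    """Pack 32 unsigned 4-bit weights into one 128-bit integer.

    Assumed layout (not confirmed by the source): byte i holds
    w[i] in its low nibble and w[i + 16] in its high nibble.
    """
    if len(w) != 32:
        raise ValueError("expected 32 weights")
    if any(not 0 <= x <= 0xF for x in w):
        raise ValueError("each weight must fit in 4 bits")
    pw = 0
    for i in range(16):
        byte = (w[i + 16] << 4) | w[i]   # high-index weight above low-index weight
        pw |= byte << (8 * i)            # byte 0 is the least significant byte
    return pw


# Hypothetical sample data: W0..W15 = 0..15, W16..W31 = 15..0.
weights = list(range(16)) + list(range(15, -1, -1))
pw = pack_weights(weights)
print(f"{pw:032x}")
```

Because the interleaving happens once, ahead of time, the runtime only has to apply a mask and a shift to recover both halves.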
### Mask
- **Value**: `0x0F...0F` (128-bit mask).
- **Purpose**: Used to split packed weights into `W_low` and `W_high`.
### Runtime Unpacking
- **Equations**:
  - `W_low = Pw & Mask`
  - `W_high = (Pw >> 4) & Mask`
- **Output**:
  - `W_low`: the 16 low-index weights (gray), separated by zero-filled positions (white).
  - `W_high`: the 16 high-index weights (red), separated by zero-filled positions (white).
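The two unpacking equations can be checked with Python's arbitrary-precision integers standing in for a 128-bit register. This is a sketch under assumptions: the byte layout (low-index weight in the low nibble, high-index weight in the high nibble) is inferred from the equations, and the helper names `unpack` and `lanes` and the sample data are hypothetical.

```python
# 128-bit mask 0x0F0F...0F: sixteen bytes, each equal to 0x0F.
MASK = int.from_bytes(b"\x0f" * 16, "little")


def unpack(pw):
    """Apply the two equations from the diagram to a 128-bit packed word."""
    w_low = pw & MASK
    w_high = (pw >> 4) & MASK
    return w_low, w_high


def lanes(x):
    """Split a 128-bit integer into 16 byte lanes; lane 0 is least significant."""
    return list(x.to_bytes(16, "little"))


# Hypothetical packed word: byte i has low nibble i and high nibble 15 - i.
pw = int.from_bytes(bytes(((15 - i) << 4) | i for i in range(16)), "little")

w_low, w_high = unpack(pw)
print(lanes(w_low))    # low nibbles, one per 8-bit lane
print(lanes(w_high))   # high nibbles, one per 8-bit lane
```

Each result keeps one 4-bit weight per 8-bit lane with the upper four bits cleared, which corresponds to the zero-filled (white) positions in the diagram.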
### Flow Summary
1. Original weights (`W`) → Reordered offline → Packed weights (`Pw`).
2. Packed weights (`Pw`) → Runtime unpacking → `W_low` and `W_high`.
---
## Bar Chart: Latency Comparison
### Axes
- **X-Axis**: Configurations (e.g., `(4k,4k)`, `(11k,4k)`, `(4k,11k)`, `(4k,32k)`).
- **Y-Axis**: Latency (microseconds, `us`).
### Legend
- **Original Weights**: Gray bars.
- **Packed Weights**: Red bars.
### Key Data Points
| Configuration | Original Weights (us) | Packed Weights (us) |
|---------------|-----------------------|---------------------|
| `(4k,4k)` | 248 | 215 |
| `(11k,4k)` | 472 | 399 |
| `(4k,11k)` | 489 | 400 |
| `(4k,32k)` | 1172 | 954 |
### Trends
1. **Latency Reduction**: Packed weights have lower latency than original weights in all four configurations, with savings ranging from 33 us to 218 us (roughly 13% to 19%).
2. **Most Significant Reduction**: The `(4k,32k)` configuration shows the largest drop, 1172 → 954 us (218 us, about 18.6%).
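The size of each reduction follows directly from the table. The short script below recomputes the absolute and relative savings from the four listed data points; the variable names are illustrative only.

```python
# (original_us, packed_us) per configuration, copied from the table above.
latency_us = {
    "(4k,4k)": (248, 215),
    "(11k,4k)": (472, 399),
    "(4k,11k)": (489, 400),
    "(4k,32k)": (1172, 954),
}

reductions = {}
for cfg, (orig, packed) in latency_us.items():
    saved = orig - packed                 # absolute saving in microseconds
    percent = 100 * saved / orig          # saving relative to the original
    reductions[cfg] = (saved, percent)
    print(f"{cfg:>9}: {saved:4d} us saved ({percent:.1f}%)")

largest = max(reductions, key=lambda cfg: reductions[cfg][0])
print("largest absolute drop:", largest)
```

The savings come out to 33, 73, 89, and 218 us (13.3%, 15.5%, 18.2%, and 18.6%), so `(4k,32k)` has the largest drop in both absolute and relative terms.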
---
## Equations and Mask Details
- **Mask**: `0x0F...0F` (128-bit, repeated `0x0F` pattern).
- **Unpacking Logic**:
  - `W_low` keeps the lower 4 bits of each byte of `Pw`.
  - `W_high` keeps the upper 4 bits of each byte of `Pw`, moved into the lower 4 bits by the right shift of 4.
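A two-byte worked example shows why the mask is still needed after the shift. The value `0xB7A3` and the 16-bit width are hypothetical, chosen only to keep the arithmetic readable; the same reasoning applies to the full 128-bit word.

```python
pw = 0xB7A3      # two packed bytes: 0xB7 and 0xA3 (hypothetical values)
mask = 0x0F0F    # 16-bit version of the repeated 0x0F pattern

w_low = pw & mask            # keeps 0x7 and 0x3, the low nibble of each byte
shifted = pw >> 4            # 0x0B7A: the low nibble 0x7 has slid into the neighbouring byte
w_high = shifted & mask      # keeps 0xB and 0xA, the original high nibbles

print(hex(w_low), hex(shifted), hex(w_high))
```

Without the final `& mask`, the low byte of the shifted value would be `0x7A` rather than `0x0A`, because the neighbouring byte's low nibble lands in its upper four bits.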
---
## Notes
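- **Color Meaning**: The colors mean different things in the two figures. In the diagram, red marks high-index weights (`W16` to `W31`) and gray marks low-index weights (`W0` to `W15`); in the chart, gray bars are original weights and red bars are packed weights.
- **Weight Count**: `W` and `Pw` each hold the 32 weights `W0` to `W31`; `W_low` and `W_high` each hold 16 of them, with zeros in the remaining positions.
- **Mask Application**: Masking with `0x0F` clears the upper 4 bits of every byte, so each unpacked weight ends up alone in its own zero-extended byte.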