# Technical Document Extraction: Memory Comparison Chart
## Labels and Axis Titles
- **Title**: Memory Comparison
- **X-axis**: Model Size (categories: 350M, 1B, 3B, 7B)
- **Y-axis**: Memory cost (GB), ranging from 0 to 60 in increments of 10
- **Legend**:
  - BF16 (light beige)
  - Adafactor (light green)
  - 8-bit Adam (medium green)
  - 8-bit GaLore (retaining grad) (brown with diagonal stripes)
  - 8-bit GaLore (brown with horizontal stripes)
- **Additional Element**: Red dashed line labeled "RTX 4090" at 24 GB
## Data Points
| Model Size | BF16 (GB) | Adafactor (GB) | 8-bit Adam (GB) | 8-bit GaLore (retaining grad) (GB) | 8-bit GaLore (GB) |
|------------|-----------|----------------|------------------|------------------------------------|-------------------|
| 350M | ~4 | ~4 | ~3 | ~2.5 | ~2 |
| 1B | ~14 | ~13 | ~9 | ~7 | ~5 |
| 3B | ~28 | ~26 | ~18 | ~15 | ~10 |
| 7B | ~60 | ~52 | ~46 | ~37 | ~22 |
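
The relative savings between methods can be computed directly from the table. The following is a minimal sketch, assuming the approximate values above; the names `MEMORY_GB`, `reduction_pct`, and `reductions_vs` are illustrative and do not come from the chart.

```python
# Approximate values read off the chart, in GB (estimates, not exact measurements).
MEMORY_GB = {
    "350M": {"BF16": 4, "Adafactor": 4, "8-bit Adam": 3,
             "8-bit GaLore (retaining grad)": 2.5, "8-bit GaLore": 2},
    "1B": {"BF16": 14, "Adafactor": 13, "8-bit Adam": 9,
           "8-bit GaLore (retaining grad)": 7, "8-bit GaLore": 5},
    "3B": {"BF16": 28, "Adafactor": 26, "8-bit Adam": 18,
           "8-bit GaLore (retaining grad)": 15, "8-bit GaLore": 10},
    "7B": {"BF16": 60, "Adafactor": 52, "8-bit Adam": 46,
           "8-bit GaLore (retaining grad)": 37, "8-bit GaLore": 22},
}


def reduction_pct(baseline_gb, method_gb):
    """Percentage by which method_gb undercuts baseline_gb."""
    return 100.0 * (baseline_gb - method_gb) / baseline_gb


def reductions_vs(baseline, size):
    """Map each non-baseline method to its percentage saving at one model size."""
    row = MEMORY_GB[size]
    return {
        method: reduction_pct(row[baseline], gb)
        for method, gb in row.items()
        if method != baseline
    }


if __name__ == "__main__":
    for size in MEMORY_GB:
        savings = reductions_vs("BF16", size)
        print(size, {method: round(pct, 1) for method, pct in savings.items()})
```

Because the inputs are visual estimates, the resulting percentages are only indicative.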
## Key Trends
1. **Memory Cost Scaling**: Memory cost rises steadily with model size, roughly in proportion to parameter count. The BF16 baseline grows from ~4 GB at 350M to ~60 GB at 7B.
2. **Quantization Impact**:
   - 8-bit Adam uses ~25-35% less memory than the BF16 baseline (e.g., ~9 GB vs. ~14 GB at 1B). Its margin over Adafactor is smaller, ~12% at 7B.
   - 8-bit GaLore (retaining grad) uses a further ~17-22% less memory than 8-bit Adam.
   - 8-bit GaLore (without retaining grad) has the lowest memory footprint at every model size, ~33-52% below 8-bit Adam.
3. **RTX 4090 Threshold**: The red dashed line at 24 GB marks the memory capacity of an NVIDIA RTX 4090 GPU. Configurations above the line would need a larger-memory GPU or a more memory-efficient method: at 3B, BF16 (~28 GB) and Adafactor (~26 GB); at 7B, every method except 8-bit GaLore (~22 GB).
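
One way to check how memory scales with model size is to divide each value by the parameter count. The sketch below does this for the BF16 column, assuming the x-axis labels are exact parameter counts (350M = 0.35 billion, and so on), which the chart does not confirm. The names `PARAMS_B`, `BF16_GB`, and `gb_per_billion` are illustrative.

```python
# Nominal parameter counts in billions, taken from the x-axis labels.
# Assumption: the labels are exact; real checkpoints may differ slightly.
PARAMS_B = {"350M": 0.35, "1B": 1.0, "3B": 3.0, "7B": 7.0}

# Approximate BF16 memory cost read off the chart, in GB.
BF16_GB = {"350M": 4, "1B": 14, "3B": 28, "7B": 60}


def gb_per_billion(memory_gb, params_b):
    """Memory cost per billion parameters for each model size."""
    return {size: memory_gb[size] / params_b[size] for size in memory_gb}


ratios = gb_per_billion(BF16_GB, PARAMS_B)
for size, ratio in ratios.items():
    print(f"{size}: {ratio:.1f} GB per billion parameters")
```

For the BF16 values in the table, the ratio stays between roughly 8.6 and 14 GB per billion parameters, that is, within a factor of two across a 20x range of model sizes.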
## Diagram Components
- **Bar Groups**: Each model size has clustered bars representing different memory optimization techniques.
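- **Color Coding**:
  - Solid fills for BF16 (light beige), Adafactor (light green), and 8-bit Adam (medium green).
  - Striped brown fills for the two 8-bit GaLore variants: diagonal stripes for the retaining-grad variant, horizontal stripes for the other.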
- **Reference Line**: Red dashed line at 24 GB for hardware comparison.
## Critical Observations
- **7B Model**:
  - BF16 requires ~60 GB, well above the RTX 4090's 24 GB.
  - 8-bit GaLore brings memory down to ~22 GB, the only configuration at this size that fits under the RTX 4090 line.
- **3B Model**:
  - 8-bit GaLore (retaining grad) at ~15 GB is roughly half the cost of BF16 (~28 GB).
- **Consistency**: 8-bit GaLore has the lowest memory cost at every model size, and 8-bit GaLore (retaining grad) is consistently second lowest.
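
The feasibility observations above can be reproduced by comparing each table entry against the 24 GB line. This is a minimal sketch using the approximate chart values; `BUDGET_GB` and `methods_that_fit` are hypothetical names, and the check ignores any memory overhead not shown in the chart.

```python
# Approximate values read off the chart, in GB (estimates, not exact measurements).
MEMORY_GB = {
    "350M": {"BF16": 4, "Adafactor": 4, "8-bit Adam": 3,
             "8-bit GaLore (retaining grad)": 2.5, "8-bit GaLore": 2},
    "1B": {"BF16": 14, "Adafactor": 13, "8-bit Adam": 9,
           "8-bit GaLore (retaining grad)": 7, "8-bit GaLore": 5},
    "3B": {"BF16": 28, "Adafactor": 26, "8-bit Adam": 18,
           "8-bit GaLore (retaining grad)": 15, "8-bit GaLore": 10},
    "7B": {"BF16": 60, "Adafactor": 52, "8-bit Adam": 46,
           "8-bit GaLore (retaining grad)": 37, "8-bit GaLore": 22},
}

# Height of the red dashed "RTX 4090" reference line.
BUDGET_GB = 24


def methods_that_fit(size, budget_gb=BUDGET_GB):
    """Methods whose estimated memory cost is at or below the budget."""
    return [method for method, gb in MEMORY_GB[size].items() if gb <= budget_gb]


for size in MEMORY_GB:
    print(size, methods_that_fit(size))
```

Since the values are visual estimates and ~22 GB sits close to the 24 GB line, the 7B result leaves little margin.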