# Technical Document Extraction
## Chart 1: Normalized Log-Likelihood vs. Normalized FLOPs
### Axes Labels
- **Y-Axis**: "Normalized log-likelihood (measured on separate test set)"
  - Range: 0.990 to 1.010
  - Increment: 0.005
- **X-Axis**: "Normalized FLOPs per FFW pass (to the isoFLOP-optimal baseline)"
  - Range: 0.4 to 2.0
  - Increment: 0.2
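
Both axis labels describe quantities normalized to the isoFLOP-optimal baseline. The chart does not state how the normalization is computed; the sketch below assumes a plain ratio (value divided by the baseline's value), which would place the baseline itself at 1.0. The function name `normalize_to_baseline` and the raw FLOP counts are hypothetical and serve only to illustrate the arithmetic.

```python
def normalize_to_baseline(value: float, baseline_value: float) -> float:
    """Express a measurement as a ratio to the baseline's measurement.

    Assumption: normalization is a simple ratio; the chart does not confirm this.
    """
    if baseline_value == 0:
        raise ValueError("baseline_value must be non-zero")
    return value / baseline_value


# Hypothetical raw numbers, not taken from the chart.
baseline_flops = 2.0e9  # FLOPs per forward pass of the baseline (made up)
variant_flops = 1.2e9   # FLOPs per forward pass of a cheaper variant (made up)

x = normalize_to_baseline(variant_flops, baseline_flops)
baseline_x = normalize_to_baseline(baseline_flops, baseline_flops)
print(x, baseline_x)
```

Under this assumption, points left of 1.0 on the X-axis use fewer FLOPs per forward pass than the baseline, and points right of it use more.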
### Legend
- **Baseline**: Black dashed line with circular markers
- **MoD using top-k**: Blue dotted line with circular markers
- **MoD using predictor**: Green dotted line with circular markers
### Data Trends
1. **Baseline (Black)**:
   - Starts at ~1.005 at 0.4 normalized FLOPs.
   - Drops sharply to ~0.990 at 0.8.
   - Rises to ~1.005 at 1.8.
   - Ends at ~1.005 at 2.0.
2. **MoD using top-k (Blue)**:
   - Starts at ~0.998 at 0.4 normalized FLOPs.
   - Dips to ~0.990 at 0.8.
   - Rises to ~1.005 at 1.8.
   - Ends at ~1.005 at 2.0.
3. **MoD using predictor (Green)**:
   - Starts at ~1.005 at 0.4 normalized FLOPs.
   - Dips to ~0.990 at 0.8.
   - Rises to ~1.005 at 1.8.
   - Ends at ~1.005 at 2.0.
### Shaded Region
- **X-Axis Range**: 0.8 to 1.0 (normalized FLOPs)
- **Y-Axis Range**: 0.990 to 1.000 (normalized log-likelihood)
### Key Observations
- All three methods converge to ~1.005 at the high end of the range (1.8–2.0 normalized FLOPs).
- In the values listed under Data Trends, "MoD using predictor" exceeds "MoD using top-k" only at 0.4 normalized FLOPs (~1.005 vs. ~0.998); at 0.8, 1.8, and 2.0 the two coincide.
- The shaded region (0.8–1.0 normalized FLOPs) coincides with the dip to ~0.990 that all three methods show at 0.8.
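
The observations can be checked mechanically against the approximate values listed under Data Trends. The sketch below transcribes those values into a dictionary and reports the spread between methods at high FLOPs and the points where two series differ. The helper names `spread_at` and `differing_points` are hypothetical, and the numbers are approximate chart readings, not raw data.

```python
from typing import List

# Approximate readings from the Data Trends list:
# normalized FLOPs -> normalized log-likelihood.
series = {
    "Baseline": {0.4: 1.005, 0.8: 0.990, 1.8: 1.005, 2.0: 1.005},
    "MoD using top-k": {0.4: 0.998, 0.8: 0.990, 1.8: 1.005, 2.0: 1.005},
    "MoD using predictor": {0.4: 1.005, 0.8: 0.990, 1.8: 1.005, 2.0: 1.005},
}


def spread_at(flops: float) -> float:
    """Largest gap between any two methods at one X position."""
    values = [points[flops] for points in series.values()]
    return max(values) - min(values)


def differing_points(a: str, b: str, tol: float = 1e-9) -> List[float]:
    """X positions where series a and b differ by more than tol."""
    return [x for x in sorted(series[a]) if abs(series[a][x] - series[b][x]) > tol]


high_flops_spread = max(spread_at(1.8), spread_at(2.0))
gap_points = differing_points("MoD using predictor", "MoD using top-k")
print(high_flops_spread, gap_points)
```

A spread of zero at 1.8 and 2.0 corresponds to the convergence observation. The `gap_points` list shows where the transcribed predictor and top-k series actually separate.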
---
## Chart 2: Top-k Prediction Accuracy vs. Training Steps
### Axes Labels
- **Y-Axis**: "Top-k prediction accuracy"
  - Range: 0.70 to 1.00
  - Increment: 0.05
- **X-Axis**: "Training step"
  - Range: 0 to 15,000
  - Increment: 5,000
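
The chart does not define "top-k prediction accuracy". One plausible reading, assumed here, is the fraction of tokens for which a predictor's binary decision agrees with the token's actual membership in the top-k set of router scores. The sketch below implements that reading; the function names, the 0.5 threshold, and the sample numbers are all hypothetical.

```python
from typing import List, Sequence


def top_k_mask(router_scores: Sequence[float], k: int) -> List[bool]:
    """Mark the k highest-scoring positions as selected."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    chosen = set(ranked[:k])
    return [i in chosen for i in range(len(router_scores))]


def top_k_prediction_accuracy(router_scores: Sequence[float],
                              predictor_probs: Sequence[float],
                              k: int,
                              threshold: float = 0.5) -> float:
    """Fraction of positions where the thresholded prediction matches top-k membership.

    Assumption: this is one possible definition, not confirmed by the chart.
    """
    if len(router_scores) == 0 or len(router_scores) != len(predictor_probs):
        raise ValueError("inputs must be non-empty and of equal length")
    targets = top_k_mask(router_scores, k)
    predictions = [p > threshold for p in predictor_probs]
    matches = sum(t == p for t, p in zip(targets, predictions))
    return matches / len(targets)


scores = [0.9, 0.1, 0.7, 0.3]  # hypothetical router scores for four tokens
probs = [0.8, 0.2, 0.4, 0.6]   # hypothetical predictor outputs for the same tokens
accuracy = top_k_prediction_accuracy(scores, probs, k=2)
print(accuracy)
```

In this toy input the predictor agrees with the true top-2 set on the first two tokens and disagrees on the last two.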
### Data Trends
- **Line**: Solid teal line with circular markers
  - Starts at ~0.90 accuracy at 0 training steps.
  - Drops sharply to ~0.70 accuracy at ~500 training steps.
  - Rises to ~0.95 accuracy at ~10,000 training steps.
  - Plateaus at ~0.95 accuracy for the remainder of training (10,000–15,000 steps).
### Key Observations
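- The initial drop from ~0.90 to ~0.70 within the first ~500 steps suggests an early learning phase or overfitting; the chart alone does not distinguish between the two.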
- Recovery to ~0.95 accuracy indicates stabilization after ~10,000 steps.
- No further improvement observed beyond 10,000 steps.
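
The plateau and the early minimum can be read off the approximate points listed under Data Trends. The sketch below is one simple way to do so, assuming a plateau means "every later point stays within a tolerance of the final value". The helper `plateau_start` and the 0.01 tolerance are hypothetical choices, and the four points are approximate chart readings.

```python
from typing import List, Tuple

# Approximate (training step, accuracy) readings from the Data Trends list.
curve: List[Tuple[int, float]] = [(0, 0.90), (500, 0.70), (10_000, 0.95), (15_000, 0.95)]


def plateau_start(points: List[Tuple[int, float]], tol: float = 0.01) -> int:
    """Earliest step from which every later value stays within tol of the final value."""
    final_value = points[-1][1]
    start = points[-1][0]
    # Walk backwards until a point falls outside the tolerance band.
    for step, value in reversed(points):
        if abs(value - final_value) > tol:
            break
        start = step
    return start


lowest_step, lowest_value = min(curve, key=lambda p: p[1])
print(plateau_start(curve), lowest_step, lowest_value)
```

With only four transcribed points the result is coarse; a denser reading of the curve would locate the plateau onset more precisely.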
---
## Cross-Referenced Legend Consistency
- **Baseline (Black)**: Matches dashed line in Chart 1.
- **MoD using top-k (Blue)**: Matches dotted line in Chart 1.
- **MoD using predictor (Green)**: Matches dotted line in Chart 1.