## Heatmap: Model Accuracy Over Training Progress
### Overview
The heatmap displays accuracy on several benchmarks (GSM8K, Math 500, Minerva Math, Gaokao2023EN, Olympiad Bench, College Math, MMMLU STEM), together with their Average, at different stages of training progress. The color gradient represents the model size, with darker shades indicating larger models.
### Components/Axes
- **X-axis**: Training Progress (ranging from 0 to 1)
- **Y-axis**: Model Size (ranging from 0.5b to 32b)
- **Color Gradient**: Darker shades indicate larger models
- **Legend**: Color legend on the right side, indicating model sizes
- **Data Series**: Each row represents a different model, with columns showing accuracy at various training progress stages
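The layout described above can be sketched as a small grid with one row per model size and one column per training-progress checkpoint. This is only an illustrative sketch: the checkpoint positions in `PROGRESS`, the accuracy numbers, and the helper names `cell` and `best_size_per_checkpoint` are all hypothetical placeholders, not values or names taken from the figure.

```python
from typing import Dict, List

# Hypothetical checkpoints on the 0-to-1 training-progress axis (X-axis).
PROGRESS: List[float] = [0.0, 0.5, 1.0]

# Placeholder accuracies, NOT values read from the figure.
# One row per model size (Y-axis), one column per checkpoint.
panel: Dict[str, List[float]] = {
    "0.5b": [0.10, 0.20, 0.30],
    "1.5b": [0.20, 0.30, 0.40],
    "32b": [0.50, 0.60, 0.70],
}


def cell(grid: Dict[str, List[float]], size: str, progress: float) -> float:
    """Return the accuracy for one model size at one checkpoint."""
    return grid[size][PROGRESS.index(progress)]


def best_size_per_checkpoint(grid: Dict[str, List[float]]) -> List[str]:
    """For each checkpoint, name the row (model size) with the highest accuracy."""
    return [
        max(grid, key=lambda size: grid[size][i])
        for i in range(len(PROGRESS))
    ]
```

With real values in `panel`, `best_size_per_checkpoint(panel)` gives a quick check of which model size leads at each stage.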
### Detailed Analysis
- **GSM8K**: Shows a general trend of increasing accuracy as training progresses, with the largest model (32b) maintaining the highest accuracy throughout.
- **Math 500**: Displays a similar trend, with the largest model consistently achieving the highest accuracy.
- **Minerva Math**: Shows a slight decrease in accuracy at the beginning of training, followed by an increase as training progresses. The largest model maintains a high accuracy level.
- **Gaokao2023EN**: Exhibits a steady increase in accuracy with training, with the largest model maintaining the highest accuracy.
- **Olympiad Bench**: Shows a gradual increase in accuracy, with the largest model maintaining the highest accuracy.
- **College Math**: Displays a slight decrease in accuracy at the beginning of training, followed by an increase as training progresses. The largest model maintains a high accuracy level.
- **MMMLU STEM**: Shows a steady increase in accuracy with training, with the largest model maintaining the highest accuracy.
- **Average**: Represents the mean accuracy across the benchmarks at each training progress stage. The largest model maintains the highest average accuracy.
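As a rough illustration of how an Average panel could be derived from the per-benchmark panels, the sketch below takes an unweighted mean across benchmarks for each model size and checkpoint. The unweighted mean is an assumption, since the figure description does not say how the average is weighted, and the function name `average_panel` and all numbers are hypothetical placeholders.

```python
from statistics import fmean
from typing import Dict, List

# panels[benchmark][model_size] -> accuracies, one per training-progress checkpoint.
Panels = Dict[str, Dict[str, List[float]]]


def average_panel(panels: Panels) -> Dict[str, List[float]]:
    """Unweighted mean across benchmarks for each (model size, checkpoint) cell.

    Assumes every benchmark lists the same model sizes and checkpoints.
    """
    sizes = next(iter(panels.values())).keys()
    return {
        size: [
            fmean(values_at_checkpoint)
            for values_at_checkpoint in zip(*(p[size] for p in panels.values()))
        ]
        for size in sizes
    }


# Placeholder numbers for two benchmarks and two checkpoints, NOT figure data.
example: Panels = {
    "GSM8K": {"32b": [0.25, 0.50]},
    "Math 500": {"32b": [0.75, 1.00]},
}
average = average_panel(example)
```

If the original figure weights benchmarks by size or uses another aggregate, `fmean` would need to be replaced accordingly.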
### Key Observations
- The largest model consistently achieves the highest accuracy across all benchmarks and training progress stages.
- There is a general trend of increasing accuracy as training progresses for most benchmarks.
- The accuracy of smaller models (0.5b to 1.5b) tends to be lower than that of larger models (32b).
### Interpretation
The data suggests that larger models tend to maintain higher accuracy throughout the training process. This could be due to the greater parameter count and capacity of larger models, which allows them to learn more complex patterns and representations. The slight early decrease in accuracy (seen on Minerva Math and College Math) could be attributed to an initial learning phase in which smaller models may lack the capacity to learn the underlying patterns. As training progresses, the larger models maintain and even improve their accuracy, indicating that they are better suited to these benchmarks.