## Heatmap: Accuracy Distribution Across Multiple Reasoning Rounds for Various Models
### Overview
The heatmap shows how accuracy is distributed across multiple reasoning rounds for four models: DeepSeek-Math-Instruct-7B, SuperCorrect-DeepSeek-7B, Qwen.2.5-Math-7B-Instruct, and SuperCorrect-Qwen-7B. The x-axis gives the number of accurate answers, and the y-axis gives the density, that is, how often each count occurs.
### Components/Axes
- **X-Axis**: Number of Accurate Answers (Performance)
- **Y-Axis**: Density
- **Legend**: Models
  - DeepSeek-Math-Instruct-7B
  - SuperCorrect-DeepSeek-7B
  - Qwen.2.5-Math-7B-Instruct
  - SuperCorrect-Qwen-7B
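To make the density axis concrete, the following is a minimal sketch of how a density over "number of accurate answers" could be estimated from per-round counts. It is an illustration under stated assumptions, not the procedure used to produce the figure: the function name `empirical_density`, the bin width, and the example counts are all hypothetical.

```python
from collections import Counter


def empirical_density(counts, bin_width=5):
    """Histogram-style density: sum(density * bin_width) equals 1.

    `counts` holds one "number of accurate answers" value per reasoning round.
    """
    if not counts:
        raise ValueError("counts must be non-empty")
    # Assign each count to the left edge of its bin.
    bins = Counter((c // bin_width) * bin_width for c in counts)
    n = len(counts)
    # Divide by n * bin_width so the bar areas sum to 1.
    return {start: k / (n * bin_width) for start, k in sorted(bins.items())}


# Hypothetical per-round counts, NOT data read from the figure.
example_counts = [78, 80, 82, 82, 83, 85, 81, 82]
density = empirical_density(example_counts, bin_width=5)
```

With this normalization, a taller bar means that more rounds landed in that bin, which is how the peaks discussed for each model should be read.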
### Detailed Analysis
- **DeepSeek-Math-Instruct-7B**: Density is highest around 80 accurate answers, peaking at 82. The average is 82 accurate answers.
- **SuperCorrect-DeepSeek-7B**: Density is highest around 120 accurate answers, peaking at 128. The average is 128 accurate answers.
- **Qwen.2.5-Math-7B-Instruct**: Density is highest around 100 accurate answers, peaking at 102. The average is 102 accurate answers.
- **SuperCorrect-Qwen-7B**: Density is highest around 120 accurate answers, peaking at 128. The average is 128 accurate answers.
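The peak values listed above can be compared pairwise with a small sketch. The pairing of each SuperCorrect model with a base model is an assumption inferred from the model names, and `peak_gains` is a hypothetical helper; the numbers are the peaks stated in this section.

```python
# Peak number of accurate answers per model, as listed in this section.
peaks = {
    "DeepSeek-Math-Instruct-7B": 82,
    "SuperCorrect-DeepSeek-7B": 128,
    "Qwen.2.5-Math-7B-Instruct": 102,
    "SuperCorrect-Qwen-7B": 128,
}

# Assumed (base, SuperCorrect) pairing, inferred from the model names only.
pairs = [
    ("DeepSeek-Math-Instruct-7B", "SuperCorrect-DeepSeek-7B"),
    ("Qwen.2.5-Math-7B-Instruct", "SuperCorrect-Qwen-7B"),
]


def peak_gains(peaks, pairs):
    """Return the shift in peak position for each (base, improved) pair."""
    return {improved: peaks[improved] - peaks[base] for base, improved in pairs}


gains = peak_gains(peaks, pairs)
for model, gain in gains.items():
    print(f"{model}: +{gain} accurate answers at the peak")
# SuperCorrect-DeepSeek-7B: +46 accurate answers at the peak
# SuperCorrect-Qwen-7B: +26 accurate answers at the peak
```

The larger shift for the DeepSeek pair follows from its lower starting peak, since both SuperCorrect variants are listed with the same peak of 128.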
### Key Observations
- **SuperCorrect-Qwen-7B** is described as having the highest density and average, indicating the best performance across multiple reasoning rounds, although its peak of 128 accurate answers is matched by **SuperCorrect-DeepSeek-7B**.
- **SuperCorrect-DeepSeek-7B** and **SuperCorrect-Qwen-7B** have similar peaks and densities, suggesting they perform similarly well.
- **Qwen.2.5-Math-7B-Instruct** peaks lower than both SuperCorrect models (102 versus 128 accurate answers) but higher than **DeepSeek-Math-Instruct-7B** (82), which has the lowest peak of the four.
### Interpretation
The heatmap suggests that **SuperCorrect-Qwen-7B** is the most accurate model across multiple reasoning rounds, with **SuperCorrect-DeepSeek-7B** performing similarly: both peak at 128 accurate answers. **Qwen.2.5-Math-7B-Instruct** peaks lower at 102, and **DeepSeek-Math-Instruct-7B** lowest at 82. The visual trend is that the distributions of the two SuperCorrect models sit further right on the x-axis, so their density is concentrated at higher numbers of accurate answers than that of the other two models.