## Scatter Plot: AIME 2025 Performance vs. Total Parameters
### Overview
The image is a scatter plot of AIME 2025 Pass@1 score (y-axis) against total parameter count (x-axis, logarithmic scale) for various language models. Gray and tan circles distinguish "Open-Weights Only" models from "Open-Weights & Open-Data" models, and an orange star marks the authors' own model. A horizontal dashed line marks a reference score near the top of the plot.
### Components/Axes
* **Title:** AIME 2025 Performance vs. Total Parameters
* **X-axis:** Total Parameters, with a logarithmic scale. Markers at 4B, 10B, 32B, and 100B.
* **Y-axis:** AIME 2025 Pass@1 Score, with a linear scale. Markers at 50, 55, 60, 65, 70, 75, 80, 85, and 90.
* **Legend (bottom-right):**
    * Gray circle: Open-Weights Only
    * Tan circle: Open-Weights & Open-Data
    * Orange star: Our Model
* **Horizontal Dashed Line:** Located at approximately 83 on the y-axis.
### Detailed Analysis
The data points are scattered across the plot, showing the relationship between model size (Total Parameters) and performance (AIME 2025 Pass@1 Score).
* **DASD-4B-Thinking (Ours):** Marked with an orange star, located at approximately (4B, 84). This is the highest performing model.
* **POLARIS-4B:** Gray circle, located at approximately (4B, 80).
* **Qwen3-4B-Thinking:** Tan circle, located at approximately (5B, 80).
* **Mistral3-8B:** Gray circle, located at approximately (8B, 78).
* **Nvidia-OpenReasoning-7B:** Tan circle, located at approximately (7B, 77).
* **DeepSeek-R1-Qwen3-8B:** Gray circle, located at approximately (8B, 74).
* **Mistral3-3B:** Gray circle, located at approximately (4B, 72).
* **Qwen3-14B:** Gray circle, located at approximately (14B, 70).
* **AM-thinking-v1:** Tan circle, located at approximately (30B, 73).
* **Qwen3-32B:** Gray circle, located at approximately (30B, 72).
* **Nvidia-Nemotron-Ultra-253B:** Tan circle, located at approximately (80B, 73). This marker is larger than the others.
* **GLM-Z1-32B:** Gray circle, located at approximately (30B, 60).
* **GLM-Z1-9B:** Gray circle, located at approximately (9B, 57).
* **OpenThoughts3-7B:** Tan circle, located at approximately (7B, 53).
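
The approximate coordinates listed above can be captured in a small table for simple checks. The following is a minimal sketch, assuming the values read off the plot are accurate to within a point or so. The names `POINTS` and `THRESHOLD` and the category labels are illustrative, and the numbers are estimates from this description rather than the underlying data.

```python
# Approximate (name, parameters in billions, Pass@1 score, legend category)
# as read off the plot. These are visual estimates, not the source data.
POINTS = [
    ("DASD-4B-Thinking", 4, 84, "ours"),
    ("POLARIS-4B", 4, 80, "weights"),
    ("Qwen3-4B-Thinking", 5, 80, "weights+data"),
    ("Mistral3-8B", 8, 78, "weights"),
    ("Nvidia-OpenReasoning-7B", 7, 77, "weights+data"),
    ("DeepSeek-R1-Qwen3-8B", 8, 74, "weights"),
    ("Mistral3-3B", 4, 72, "weights"),
    ("Qwen3-14B", 14, 70, "weights"),
    ("AM-thinking-v1", 30, 73, "weights+data"),
    ("Qwen3-32B", 30, 72, "weights"),
    ("Nvidia-Nemotron-Ultra-253B", 80, 73, "weights+data"),
    ("GLM-Z1-32B", 30, 60, "weights"),
    ("GLM-Z1-9B", 9, 57, "weights"),
    ("OpenThoughts3-7B", 7, 53, "weights+data"),
]

# Approximate y-position of the horizontal dashed line (assumed value).
THRESHOLD = 83

# Models whose estimated score lies above the dashed line.
above = [name for name, _, score, _ in POINTS if score > THRESHOLD]

# Highest-scoring entry in the table.
best = max(POINTS, key=lambda p: p[2])

print("Above threshold:", above)
print("Best:", best[0], best[2])
```

Under these estimates, only the starred model clears the dashed line, which matches the reading of the plot given in this description.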
### Key Observations
* DASD-4B-Thinking ("Our Model") has the highest AIME 2025 Pass@1 score (about 84), roughly 4 points above the next-best models (POLARIS-4B and Qwen3-4B-Thinking, both about 80), despite being among the smallest models shown (4B parameters).
* There is no clear linear correlation between parameter count and AIME 2025 Pass@1 score. Several smaller models outscore larger ones: for example, POLARIS-4B (about 80) scores well above GLM-Z1-32B (about 60).
* The "Open-Weights & Open-Data" models (tan circles) span nearly the full score range, from about 53 (OpenThoughts3-7B) to about 80 (Qwen3-4B-Thinking).
* The marker for "Nvidia-Nemotron-Ultra-253B" is larger than the others, and it sits at roughly 80B on the x-axis, well below the 253B in its name.
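
The observation about the lack of a clear linear correlation can be checked numerically. The sketch below computes a Pearson correlation between log10 of parameter count and score, using only the approximate coordinates given in this description. The `pearson` helper and the `points` list are illustrative, and the result is only as reliable as the visual estimates.

```python
import math

# Approximate (parameters in billions, Pass@1 score) pairs read off the plot.
# These are visual estimates, not the source data.
points = [
    (4, 84), (4, 80), (5, 80), (8, 78), (7, 77), (8, 74), (4, 72),
    (14, 70), (30, 73), (30, 72), (80, 73), (30, 60), (9, 57), (7, 53),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# The x-axis is logarithmic, so correlate against log10(parameters).
log_params = [math.log10(p) for p, _ in points]
scores = [s for _, s in points]

r = pearson(log_params, scores)
print(f"Pearson r (log10 params vs. score): {r:.2f}")
```

With these estimates the coefficient is weak and slightly negative, which is consistent with larger models not scoring systematically higher in this plot.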
### Interpretation
The scatter plot suggests that performance on the AIME 2025 benchmark is not determined solely by parameter count. Factors such as model architecture, training data, and training methods likely play a significant role. The DASD-4B-Thinking result suggests that efficient design or specialized training can yield higher scores even at a small parameter count. The horizontal dashed line may represent a target performance threshold, which only DASD-4B-Thinking exceeds. The larger marker for "Nvidia-Nemotron-Ultra-253B" may indicate the model's relative size or some other factor; the plot as described does not make this clear.