## Line Chart: Benchmark: AMC23
### Overview
The image is a line chart comparing the validation scores of two methods, **GRPO** (blue) and **MEL** (pink), across training steps (0 to 140). The y-axis represents validation scores (0.55 to 0.80), while the x-axis represents training steps. GRPO fluctuates throughout training, while MEL climbs early and stays at or above GRPO at every step where readings for both lines are listed.
---
### Components/Axes
- **Title**: "Benchmark: AMC23" (top center).
- **X-axis**: "Training Step" (0 to 140, increments of 20).
- **Y-axis**: "Validation Score" (0.55 to 0.80, increments of 0.05).
- **Legend**: Located in the bottom-right corner, with:
  - **GRPO**: Blue line with circular markers.
  - **MEL**: Pink line with triangular markers.
---
### Detailed Analysis
#### GRPO (Blue Line)
- **Initial Phase (0–30 steps)**: Starts at 0.60, rises to 0.65 at step 10, then drops sharply to 0.55 at step 30.
- **Mid-Phase (30–80 steps)**: Recovers to 0.65 at step 40, rises to 0.70 at step 60, dips to 0.60 at step 70, then returns to 0.70 at step 80.
- **Late Phase (80–140 steps)**: Climbs to its peak of 0.80 at step 100, drops to 0.70 at step 120, then recovers to 0.75 at step 140.
#### MEL (Pink Line)
- **Initial Phase (0–30 steps)**: Starts at 0.60, rises steadily to 0.75 by step 30.
- **Mid-Phase (30–100 steps)**: Reaches 0.80 at step 80, dips slightly to 0.75 at step 90, then returns to 0.80 by step 100.
- **Late Phase (100–140 steps)**: Holds at 0.80 through step 120, then rises to its highest value, about 0.82, at step 140. This reading sits slightly above the top labeled y-axis tick (0.80).
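
The readings above can be cross-checked with a short script. The following is a minimal sketch, assuming the values listed in the bullets are accurate approximations read off the chart; the dictionary names and the helper `gaps_at_shared_steps` are hypothetical and introduced only for illustration.

```python
# Approximate readings transcribed from the bullets above (step -> score).
# These are visual estimates from the chart, not exact logged values.
grpo = {0: 0.60, 10: 0.65, 30: 0.55, 40: 0.65, 60: 0.70, 70: 0.60,
        80: 0.70, 100: 0.80, 120: 0.70, 140: 0.75}
mel = {0: 0.60, 30: 0.75, 80: 0.80, 90: 0.75, 100: 0.80, 120: 0.80, 140: 0.82}


def gaps_at_shared_steps(a, b):
    """Return {step: b - a} for the steps that appear in both series."""
    shared = sorted(a.keys() & b.keys())
    return {step: round(b[step] - a[step], 2) for step in shared}


gaps = gaps_at_shared_steps(grpo, mel)
for step, gap in gaps.items():
    print(f"step {step:>3}: MEL - GRPO = {gap:+.2f}")
```

Only steps with a listed reading for both lines are compared, so the script makes no claim about steps where one line's value was not transcribed.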
---
### Key Observations
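1. **MEL matches or exceeds GRPO** at every step where readings for both lines are listed: the two tie at step 0 (0.60) and step 100 (0.80), and MEL leads at steps 30, 80, 120, and 140. After step 100, MEL holds 0.80–0.82 while GRPO ranges from 0.70 to 0.75.
2. **GRPO exhibits volatility**, with sharp drops (to 0.55 at step 30, 0.60 at step 70, and 0.70 at step 120), while **MEL shows smoother growth**, with a single dip of 0.05 at step 90.
3. **Final scores**: Both methods finish above their shared starting score of 0.60, but the curves do not converge: MEL ends at 0.82 and GRPO at 0.75.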
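
The volatility contrast can be made slightly more concrete. Below is a rough sketch, assuming the approximate readings from the Detailed Analysis section; the two helper functions are hypothetical, and because the readings are unevenly spaced across steps, the numbers are only a coarse indicator rather than a rigorous volatility measure.

```python
# Approximate chart readings in step order (visual estimates, not logged values).
grpo_scores = [0.60, 0.65, 0.55, 0.65, 0.70, 0.60, 0.70, 0.80, 0.70, 0.75]
mel_scores = [0.60, 0.75, 0.80, 0.75, 0.80, 0.80, 0.82]


def mean_abs_change(scores):
    """Average absolute difference between consecutive readings."""
    diffs = [abs(b - a) for a, b in zip(scores, scores[1:])]
    return sum(diffs) / len(diffs)


def largest_drop(scores):
    """Largest decrease between consecutive readings (0.0 if none)."""
    drops = [a - b for a, b in zip(scores, scores[1:])]
    return max(0.0, max(drops))


for name, scores in [("GRPO", grpo_scores), ("MEL", mel_scores)]:
    print(f"{name}: mean |change| = {mean_abs_change(scores):.3f}, "
          f"largest drop = {largest_drop(scores):.2f}")
```

A larger mean absolute change and a larger single drop for GRPO would be consistent with the qualitative description of its curve as the more volatile of the two.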
---
### Interpretation
The chart suggests that **MEL trains more stably** than GRPO on the AMC23 benchmark, reaching higher validation scores with fewer fluctuations. GRPO's volatility may indicate sensitivity to training dynamics or suboptimal hyperparameters, although the chart alone cannot establish the cause. MEL leads from about step 30 onward, apart from a tie at step 100, and finishes 0.07 higher (0.82 vs. 0.75). The chart does not show why MEL performs better, so attributing the gap to scalability or to the design of the method would require evidence beyond this figure. These trends could still inform method selection for similar tasks, and they underline the value of stable validation performance.