## Line Chart: Test Time Search Performance on MATH500
### Overview
The chart illustrates the accuracy trends of four search methods (Self-Consistency, Best-of-N, Beam Search, MCTS) across increasing generation rollouts (2⁰ to 2⁶) on the MATH500 dataset. Accuracy is measured in percentage, with all methods starting near 80% and improving as generation rollouts increase.
### Components/Axes
- **X-axis**: "Generation Rollouts" (logarithmic scale: 2⁰, 2¹, 2², 2³, 2⁴, 2⁵, 2⁶).
- **Y-axis**: "Accuracy (%)" (linear scale: 80% to 88%, increments of 2%).
- **Legend**: Located on the right, with four colored lines:
  - **Blue**: Self-Consistency
  - **Orange**: Best-of-N
  - **Green**: Beam Search
  - **Red**: MCTS
### Detailed Analysis
1. **Self-Consistency (Blue)**:
   - Starts at 80% (2⁰) and increases steadily.
   - At 2⁶, reaches ~84.8%.
   - Slope: Gradual upward trend, the shallowest of the four.
2. **Best-of-N (Orange)**:
   - Begins at 80% (2⁰) and surpasses Self-Consistency early.
   - At 2⁶, reaches ~86.3%.
   - Slope: Steeper than Self-Consistency but less steep than MCTS.
3. **Beam Search (Green)**:
   - Starts at 80% (2⁰) and lags slightly behind Best-of-N.
   - At 2⁶, reaches ~86.1%.
   - Slope: Moderate upward trend, close to Best-of-N.
4. **MCTS (Red)**:
   - Starts at 80% (2⁰) and shows the fastest initial growth.
   - At 2⁶, peaks at ~86.8%.
   - Slope: Steepest overall, with a steady gain at each doubling of rollouts.
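
The slope comparison above can be roughly quantified from the endpoint values alone. The following is a minimal sketch, assuming the approximate readings listed in this section (80% at 2⁰ and the ~values at 2⁶); the names `FINAL_ACC` and `avg_gain_per_doubling` are illustrative and not part of the source. Intermediate points are not given in the text, so only an average over the six doublings is computed.

```python
# Endpoint accuracies (%) as read off the chart description; these are
# approximate readings, and intermediate points are not available.
START_ACC = 80.0
FINAL_ACC = {
    "Self-Consistency": 84.8,
    "Best-of-N": 86.3,
    "Beam Search": 86.1,
    "MCTS": 86.8,
}
DOUBLINGS = 6  # number of doublings from 2^0 to 2^6 rollouts


def avg_gain_per_doubling(final_acc, start_acc=START_ACC, doublings=DOUBLINGS):
    """Average accuracy gain (percentage points) per doubling of rollouts."""
    return (final_acc - start_acc) / doublings


# Hypothetical summary table: method -> average points gained per doubling.
gains = {m: round(avg_gain_per_doubling(a), 2) for m, a in FINAL_ACC.items()}

for method, gain in gains.items():
    print(f"{method:>16}: {gain:.2f} points per doubling")
```

Under these readings the averages come out to about 0.80 (Self-Consistency), 1.05 (Best-of-N), 1.02 (Beam Search), and 1.13 (MCTS) points per doubling, which matches the qualitative slope ordering described above.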
### Key Observations
- **MCTS** leads the other methods at every rollout count above 2⁰, where all four methods start at 80%.
- **Best-of-N** and **Beam Search** show similar performance, with Best-of-N slightly ahead.
- **Self-Consistency** trails the other three methods at every rollout count above 2⁰.
- All methods exhibit monotonic improvement as generation rollouts increase.
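
As a quick check on the ranking, the sketch below orders the methods by their approximate accuracy at 2⁶ rollouts and reports each method's gap to the leader. It assumes the ~values read from the chart; `FINAL_ACC`, `ranking`, and `gaps` are illustrative names only.

```python
# Approximate accuracies (%) at 2^6 rollouts, as read from the chart.
FINAL_ACC = {
    "Self-Consistency": 84.8,
    "Best-of-N": 86.3,
    "Beam Search": 86.1,
    "MCTS": 86.8,
}

# Sort methods from highest to lowest final accuracy.
ranking = sorted(FINAL_ACC.items(), key=lambda kv: kv[1], reverse=True)
best_method, best_acc = ranking[0]

# Gap to the leader, in percentage points (rounded to the chart's precision).
gaps = {method: round(best_acc - acc, 1) for method, acc in ranking}

for method, acc in ranking:
    print(f"{method:>16}: {acc:.1f}%  (gap to {best_method}: {gaps[method]:.1f})")
```

With these readings, Best-of-N and Beam Search sit 0.5 and 0.7 points behind MCTS, and Self-Consistency 2.0 points behind. Given that the values are visual estimates, the 0.2-point difference between Best-of-N and Beam Search is small enough that it should be treated with caution.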
### Interpretation
The data suggest that **MCTS** is the most effective of the four methods for improving accuracy on MATH500, reaching the highest value (~86.8%) at 2⁶ rollouts. Because the x-axis is logarithmic, each step to the right doubles the number of rollouts, so computational effort grows exponentially along the axis. A roughly straight MCTS line on this axis therefore means that accuracy grows approximately linearly in log₂(rollouts), that is, logarithmically rather than linearly in the number of rollouts. Self-Consistency’s slower progress may indicate limitations in its search strategy compared with more aggressive methods like MCTS. Best-of-N and Beam Search finish close together near 86% (~86.3% and ~86.1%), which may point to diminishing returns for these methods at higher rollout counts. Overall, the trend highlights the importance of method selection for test-time search on mathematical reasoning benchmarks.
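
One way to read a straight line on a logarithmic x-axis is as a log-linear model, accuracy ≈ a + b·log₂(rollouts). The sketch below fits that model to the two MCTS endpoints only (80% at 2⁰, ~86.8% at 2⁶) and shows how the modeled gain per additional rollout shrinks as the budget grows. The straight-line shape is an assumption made for illustration, not something the chart description confirms, and `modeled_accuracy` and `gain_per_extra_rollout` are hypothetical helper names.

```python
import math

# MCTS endpoints (%) as read from the chart; the straight-line shape between
# them on the log2 axis is an ASSUMPTION made for this illustration.
START_ACC = 80.0
MCTS_FINAL = 86.8
MAX_EXP = 6  # the x-axis runs from 2^0 to 2^6

# Slope in percentage points per doubling of rollouts.
slope = (MCTS_FINAL - START_ACC) / MAX_EXP


def modeled_accuracy(rollouts):
    """Accuracy under the assumed model: START_ACC + slope * log2(rollouts)."""
    return START_ACC + slope * math.log2(rollouts)


def gain_per_extra_rollout(k):
    """Modeled accuracy gain per added rollout when going from 2^k to 2^(k+1)."""
    extra_rollouts = 2 ** (k + 1) - 2 ** k
    gain = modeled_accuracy(2 ** (k + 1)) - modeled_accuracy(2 ** k)
    return gain / extra_rollouts


for k in range(MAX_EXP):
    print(f"2^{k} -> 2^{k + 1}: {gain_per_extra_rollout(k):.4f} points per extra rollout")
```

Under this model every doubling buys the same number of points, but the last doubling (32 to 64 rollouts) costs 32 times as many extra rollouts as the first (1 to 2), so the return per rollout falls by the same factor.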