## Line Chart: Parallel scaling of verifier compute: MATH-500
### Overview
The chart plots accuracy (y-axis) against the number of sampled solutions (x-axis) for five selection methods on MATH-500. The number of solutions doubles at each step, from 2⁰ to 2⁵ (1 to 32), and every series ends higher than it starts. The chart's focus is how efficiently each method converts additional parallel compute into accuracy.
### Components/Axes
- **X-axis**: "Number of solutions" (logarithmic scale: 2⁰, 2¹, 2², 2³, 2⁴, 2⁵)
- **Y-axis**: "Accuracy (%)" (linear scale: 50% to 85%)
- **Legend**: Located at the bottom, with five entries:
  - Orange: ThinkPRM-14B
  - Green: DiscPRM-14B
  - Blue: ThinkPRM-14B@4
  - Brown: Majority
  - Yellow: ThinkPRM-14B@8
- **Line styles**: Solid lines for all series, with markers (star, triangle, square) for data points
### Detailed Analysis
1. **ThinkPRM-14B (orange)**:
   - Starts at ~50% accuracy at 2⁰
   - Reaches ~80% at 2⁵
   - Steady upward slope with moderate curvature
2. **DiscPRM-14B (green)**:
   - Begins at ~50% at 2⁰
   - Peaks at ~75% at 2⁵
   - Slower growth compared to ThinkPRM variants
3. **ThinkPRM-14B@4 (blue)**:
   - Starts at ~50% at 2⁰
   - Reaches ~82% at 2⁵
   - Sharpest initial increase, then plateaus
4. **Majority (brown)**:
   - Flat line at ~50% until 2²
   - Rises to ~73% at 2⁵
   - Least effective scaling
5. **ThinkPRM-14B@8 (yellow)**:
   - Highest performance across all points
   - ~85% accuracy at 2⁵
   - Most aggressive upward trajectory
### Key Observations
- All five methods gain accuracy as the number of solutions grows, but scaling efficiency varies
- ThinkPRM-14B@8 leads at 2⁵ by roughly 3 to 12 percentage points (~85% vs. ~82%, ~80%, ~75%, and ~73%)
- Majority stays flat near 50% through 2², then rises to the lowest final accuracy (~73% at 2⁵)
- ThinkPRM-14B@4 shows the steepest initial improvement (~50% to ~70% between 2¹ and 2²)
- DiscPRM-14B grows steadily but more slowly than the ThinkPRM variants, ending at ~75%
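The size of the lead at the largest solution count can be checked directly from the approximate 2⁵ readings in the Detailed Analysis. This is a small arithmetic sketch over those estimates only; the dictionary name `final_accuracy` is illustrative.

```python
# Approximate accuracies (%) at 2^5 = 32 solutions, as read from the chart.
final_accuracy = {
    "ThinkPRM-14B": 80,
    "DiscPRM-14B": 75,
    "ThinkPRM-14B@4": 82,
    "Majority": 73,
    "ThinkPRM-14B@8": 85,
}

# Identify the best series and its margin over every other series.
best = max(final_accuracy, key=final_accuracy.get)
margins = {
    name: final_accuracy[best] - acc
    for name, acc in final_accuracy.items()
    if name != best
}
for name, margin in sorted(margins.items(), key=lambda kv: kv[1]):
    print(f"{best} leads {name} by {margin} points")
```

Because the inputs are visual estimates, differences of a point or two should not be over-interpreted.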
### Interpretation
The data suggests that:
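1. **Model configuration impacts scaling efficiency**: The @4 and @8 variants end above the base ThinkPRM-14B (~82% and ~85% vs. ~80% at 2⁵). Given the chart title, the suffix presumably denotes the number of parallel verifier samples per solution, though the chart itself does not define it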
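2. **Parallel compute benefits are non-linear**: Accuracy does not rise by a fixed amount per doubling. ThinkPRM-14B@4 climbs sharply early and then flattens, while Majority is flat early and rises only later
3. **Majority method limitations**: Majority voting uses no verifier, and with only one or two solutions a vote cannot do better than picking a single answer, which is consistent with its flat start near 50%
4. **ThinkPRM variants outperform DiscPRM**: All three ThinkPRM series end above DiscPRM-14B (~80 to 85% vs. ~75% at 2⁵), which suggests that the verifier design, and not only the number of solutions, affects how well extra compute is used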
Overall, the chart indicates that sampling more solutions in parallel improves accuracy on MATH-500, and that the choice of verifier and its configuration determine how much. ThinkPRM-14B@8 is the strongest of the five plotted methods, reaching ~85% accuracy at 2⁵ solutions.
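The difference between majority voting and verifier-guided selection can be illustrated with a toy example. The sketch below rests on two assumptions the chart does not confirm: that the verifier-based methods pick the solution with the highest verifier score (best-of-N), and that "@k" means averaging k verifier scores per solution. The function names, answers, and scores are all hypothetical.

```python
from collections import Counter
from statistics import mean


def majority_vote(answers):
    """Return the most frequent final answer; ties go to the earliest answer."""
    counts = Counter(answers)
    top = max(counts.values())
    return next(a for a in answers if counts[a] == top)


def best_of_n(answers, verifier_scores):
    """Return the answer of the solution with the highest averaged verifier score.

    verifier_scores[i] holds k scores for solution i (k = 1 for a single pass,
    k = 4 or 8 under the assumed reading of "@4" / "@8").
    """
    averaged = [mean(scores) for scores in verifier_scores]
    best_index = max(range(len(answers)), key=averaged.__getitem__)
    return answers[best_index]


# Hypothetical toy data: 4 sampled solutions, correct answer "42".
answers = ["17", "42", "17", "9"]
scores = [[0.2, 0.3], [0.9, 0.8], [0.4, 0.1], [0.1, 0.2]]

print(majority_vote(answers))      # most common answer
print(best_of_n(answers, scores))  # answer with the highest averaged score
print(majority_vote(answers[:2]))  # two disagreeing samples: tie, first answer wins
```

In this toy case the vote returns the popular but wrong answer, while the verifier-guided pick recovers the correct one. The last call shows why a vote over two disagreeing samples carries no more information than a single sample.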