Image 741db0d2afca...

EXPERT: gemma-3-27b-it-free VERSION 1

RUNTIME: google-free/gemma-3-27b-it
INTEL_VERIFIED
## Line Chart: Scaling Training Data - MATH-500

### Overview
This line chart illustrates the relationship between the number of solutions used for training and the resulting accuracy on the MATH-500 dataset. The chart compares the performance of different models: ThinkPRM-14B (trained with 1K solutions), ThinkPRM-14B (trained with 65K solutions), DiscPRM-14B, and a "Majority" model. The generator used for all models is Qwen2.5-14B.

### Components/Axes
*   **Title:** Scaling training data: MATH-500
*   **X-axis:** Number of solutions. Scale is logarithmic, with markers at 2⁰, 2¹, 2², 2³, 2⁴, and 2⁵.
*   **Y-axis:** Accuracy (%). Scale ranges from 50% to 85%.
*   **Legend:** Located at the bottom-right of the chart.
    *   ThinkPRM-14B (1K) - Orange line with star markers.
    *   ThinkPRM-14B (65K) - Purple line with circle markers.
    *   DiscPRM-14B - Teal line with diamond markers.
    *   Majority - Brown line with plus markers.
*   **Generator:** Qwen2.5-14B (located at the top-left of the chart)

### Detailed Analysis
Here's a breakdown of each data series and their trends:

*   **ThinkPRM-14B (1K) - Orange:** Starts at approximately 51% accuracy at 2⁰ solutions.  The line slopes upward, reaching approximately 72% at 2² solutions, 78% at 2³ solutions, 81% at 2⁴ solutions, and 83% at 2⁵ solutions.
*   **ThinkPRM-14B (65K) - Purple:** Begins at approximately 51% accuracy at 2⁰ solutions. The line increases steadily, reaching approximately 73% at 2² solutions, 79% at 2³ solutions, 83% at 2⁴ solutions, and 85% at 2⁵ solutions.
*   **DiscPRM-14B - Teal:** Starts at approximately 51% accuracy at 2⁰ solutions. The line rises to approximately 68% at 2² solutions, 73% at 2³ solutions, 74% at 2⁴ solutions, and then decreases slightly to approximately 72% at 2⁵ solutions.
*   **Majority - Brown:** Starts at approximately 51% accuracy at 2⁰ solutions. The line increases rapidly, reaching approximately 66% at 2¹ solutions, 70% at 2² solutions, 77% at 2³ solutions, 80% at 2⁴ solutions, and 82% at 2⁵ solutions.

### Key Observations
*   All models start with similar accuracy (around 51%) at the lowest number of solutions (2⁰).
*   ThinkPRM-14B (65K) consistently achieves the highest accuracy across all numbers of solutions.
*   DiscPRM-14B shows a plateau and slight decrease in accuracy at higher numbers of solutions (2⁴ and 2⁵).
*   ThinkPRM-14B (1K) and Majority models exhibit similar trends, with the Majority model slightly outperforming ThinkPRM-14B (1K) at 2⁴ and 2⁵.

### Interpretation
The data suggests that increasing the number of training solutions generally improves accuracy for these models on the MATH-500 dataset. The ThinkPRM-14B model, when trained with 65K solutions, demonstrates the best performance, indicating that a larger training dataset is beneficial for this model. The DiscPRM-14B model's performance plateaus and slightly declines with more solutions, which could indicate overfitting or diminishing returns from additional training data. The Majority model performs well, suggesting that a simple majority voting approach can be effective, especially with a sufficient number of solutions. The consistent starting point for all models suggests that the initial performance is likely determined by the generator (Qwen2.5-14B) rather than the specific training data size. The logarithmic scale on the x-axis emphasizes the impact of increasing the number of solutions, particularly at higher values.
DECODING INTELLIGENCE...
TECHNICAL ASSET FINGERPRINT

741db0d2afca99d073c2ceea

FOUND IN PAPERS

EXPERT: gemma-3-27b-it-free VERSION 1