# Technical Data Extraction: PRM Search Methods Performance Analysis
This document extracts the data from two side-by-side charts that compare search methods guided by process reward models (PRMs) on the MATH test set.
---
## Chart 1: Comparing PRM Search Methods
### Metadata and Axes
* **Title:** Comparing PRM Search Methods
* **Y-Axis Label:** MATH Test Accuracy (%)
    * **Range:** 10 to 40
    * **Markers:** 10, 15, 20, 25, 30, 35, 40
* **X-Axis Label:** Generation Budget
    * **Scale:** Logarithmic (Base 2)
    * **Markers:** $2^1, 2^3, 2^5, 2^7, 2^9$
* **Legend Location:** Bottom Right [approx. x=0.7, y=0.2 relative to chart area]
### Data Series Extraction
The chart tracks 7 distinct search strategies. All series originate at approximately 10.5% accuracy at the lowest budget.
| Legend Label | Color | Visual Trend | Key Data Points (Approx.) |
| :--- | :--- | :--- | :--- |
| **Best-of-N Weighted** | Orange | Steady linear-log growth; highest final accuracy. | $2^3 \approx 26\%$, $2^9 \approx 38\%$ |
| **Majority** | Green | Slowest initial growth; consistent upward slope. | $2^3 \approx 18\%$, $2^9 \approx 29\%$ |
| **Beam; M = sqrt(N)** | Red | Rapid early growth, plateaus/dips after $2^5$. | $2^4 \approx 34\%$, $2^8 \approx 33.5\%$ |
| **Beam; M = 4** | Blue | Rapid early growth, peaks at $2^8$. | $2^4 \approx 34\%$, $2^8 \approx 37\%$ |
| **1 Step Lookahead; M = sqrt(N)** | Purple | Moderate growth, plateaus early. | $2^3 \approx 29\%$, $2^8 \approx 32\%$ |
| **3 Step Lookahead; M = sqrt(N)** | Brown | Grows early, then roughly flat from $2^5$ onward; ends below Best-of-N Weighted. | $2^5 \approx 32\%$, $2^8 \approx 31.5\%$ |
| **3 Step Lookahead; M = 4** | Pink | Late bloomer; sharp rise between $2^4$ and $2^6$. | $2^4 \approx 25\%$, $2^6 \approx 35\%$, $2^8 \approx 33\%$ |
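
The key data points in the table can be turned into a rough scaling rate: the average change in accuracy, in percentage points, per doubling of the generation budget. The sketch below is illustrative only. The values are the approximate readings from the table, the names `KEY_POINTS` and `gain_per_doubling` are hypothetical, and the rate is measured only between the first and last listed point of each series, so it hides any peak or dip in between.

```python
import math

# Approximate (generation budget, accuracy %) readings copied from the table above.
KEY_POINTS = {
    "Best-of-N Weighted": [(2**3, 26.0), (2**9, 38.0)],
    "Majority": [(2**3, 18.0), (2**9, 29.0)],
    "Beam; M = sqrt(N)": [(2**4, 34.0), (2**8, 33.5)],
    "Beam; M = 4": [(2**4, 34.0), (2**8, 37.0)],
    "1 Step Lookahead; M = sqrt(N)": [(2**3, 29.0), (2**8, 32.0)],
    "3 Step Lookahead; M = sqrt(N)": [(2**5, 32.0), (2**8, 31.5)],
    "3 Step Lookahead; M = 4": [(2**4, 25.0), (2**6, 35.0), (2**8, 33.0)],
}


def gain_per_doubling(points):
    """Average accuracy change (percentage points) per doubling of budget,
    measured between the first and last listed point of a series."""
    (budget_start, acc_start), (budget_end, acc_end) = points[0], points[-1]
    doublings = math.log2(budget_end) - math.log2(budget_start)
    return (acc_end - acc_start) / doublings


for label, points in KEY_POINTS.items():
    print(f"{label}: {gain_per_doubling(points):+.2f} pp per doubling")
```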
---
## Chart 2: Comparing Beam Search and Best-of-N by Difficulty Level
### Metadata and Axes
* **Title:** Comparing Beam Search and Best-of-N by Difficulty Level
* **Y-Axis Label:** MATH Test Accuracy (%)
    * **Range:** 0 to 80+
    * **Markers:** 0, 20, 40, 60, 80
* **X-Axis Label:** Test Questions Binned by Increasing Difficulty Level
    * **Categories:** 1, 2, 3, 4, 5 (representing difficulty levels)
* **Legend Location:** Top Right [approx. x=0.85, y=0.9]
### Component Analysis
This is a grouped bar chart where each difficulty level (1-5) contains four sub-bars representing increasing generation budgets. Each bar is stacked to show the performance of three methods.
**Legend/Color Mapping:**
* **Blue:** Beam Search (Top layer)
* **Orange/Tan:** Best-of-N Weighted (Middle layer)
* **Green:** Majority (Bottom layer)
### Data Trends by Difficulty
1. **Difficulty 1 (Easiest):** All methods perform exceptionally well, reaching >80% accuracy. The performance saturates quickly across the four budget increments.
2. **Difficulty 2:** Accuracy ranges from ~30% to ~60%. There is a clear step-up in performance as the generation budget increases.
3. **Difficulty 3:** Accuracy ranges from ~20% to ~35%. Beam Search (Blue) shows a larger marginal gain over Majority (Green) here than at Level 1.
4. **Difficulty 4:** Accuracy drops significantly, peaking below 20%. The "Majority" method (Green) is notably low, while Beam Search provides the bulk of the successful outcomes.
5. **Difficulty 5 (Hardest):** Accuracy is near zero for all methods, with only tiny slivers of blue (Beam Search) visible at the highest budget levels, barely exceeding 1-2%.
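
For downstream use, the approximate bounds quoted in the list above could be kept in a small machine-readable structure. This is a minimal sketch, not part of the original charts: the names `ACCURACY_BOUNDS` and `spread` are hypothetical, the numbers are the rounded readings given above, and `None` marks a bound that the description does not state.

```python
# Approximate accuracy bounds (%) per difficulty level, as quoted in the list above.
# Each entry is (lower bound, upper bound); None means the bound is not stated.
ACCURACY_BOUNDS = {
    1: (80.0, None),  # "reaching >80%"
    2: (30.0, 60.0),  # "~30% to ~60%"
    3: (20.0, 35.0),  # "~20% to ~35%"
    4: (None, 20.0),  # "peaking below 20%"
    5: (None, 2.0),   # "barely exceeding 1-2%" (rough upper reading)
}


def spread(level):
    """Width of the quoted accuracy range in percentage points,
    or None when either bound is not stated."""
    low, high = ACCURACY_BOUNDS[level]
    if low is None or high is None:
        return None
    return high - low


for level in sorted(ACCURACY_BOUNDS):
    print(f"Difficulty {level}: spread = {spread(level)}")
```

Only Levels 2 and 3 have both bounds stated, so only those levels yield a numeric spread.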
### Summary of Findings
* **Search Efficiency:** Beam Search (Blue) and Best-of-N Weighted (Orange) outperform simple Majority voting (Green) most visibly at the middle difficulty levels (3-4). At Level 1 all three methods exceed 80%, and at Level 5 all three are near zero.
* **Scaling:** Increasing the generation budget yields diminishing returns on easy problems (Level 1), where accuracy saturates quickly, but produces clear gains on mid-to-high difficulty problems (Levels 2-4).
* **Difficulty Ceiling:** Performance drops off sharply at Difficulty Level 5, where none of the tested PRM search methods exceeds roughly 1-2% accuracy.