# Technical Data Extraction: Revision Model Performance Analysis
This document contains a detailed extraction of data from two side-by-side technical charts evaluating the performance of a "Revision Model" on the MATH test suite.
---
## Chart 1: Revision Model Pass@1 At Each Step
### Metadata and Axis Labels
* **Title:** Revision Model Pass@1 At Each Step
* **Y-Axis Label:** MATH Test Accuracy (%)
    * **Range:** 17 to 26
    * **Markers:** 17, 18, 19, 20, 21, 22, 23, 24, 25, 26
* **X-Axis Label:** Number of Generations
    * **Range:** 0 to 65+
    * **Markers:** 0, 10, 20, 30, 40, 50, 60
### Component Analysis: Scatter Plot
* **Trend Verification:** Accuracy grows quickly and then levels off, in a roughly logarithmic shape. It rises sharply from Generation 1 to approximately Generation 15, then plateaus with high point-to-point variance, mostly between about 23% and 25%, for the remaining steps.
* **Key Data Points (Approximate):**
    * **Start:** ~18.2% at Generation 1.
    * **Initial Growth:** Reaches ~21.5% by Generation 5 and ~24% by roughly Generation 15.
    * **Outlier:** A single point near Generation 15 dips to ~20.4% before the series recovers.
    * **Peak:** The highest accuracy appears to be ~25.2%, at approximately Generation 51.
    * **End:** ~23.4% at Generation 64.
---
## Chart 2: Revision Model Parallel Verses Sequential
### Metadata and Axis Labels
* **Title:** Revision Model Parallel Verses Sequential
* **Y-Axis Label:** MATH Test Accuracy (%)
    * **Range:** 20 to 40 on the labeled ticks (20, 25, 30, 35, 40); the extracted values run slightly beyond them, from ~18.2% to ~41.5%
* **X-Axis Label:** Number of Generations (Logarithmic Scale)
    * **Markers:** $2^0$ (1), $2^1$ (2), $2^2$ (4), $2^3$ (8), $2^4$ (16), $2^5$ (32), $2^6$ (64)
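Because the x-axis markers are powers of two, each doubling of the number of generations moves one equal step along the axis. A brief sketch of that mapping follows; the variable names are illustrative and not taken from the chart.

```python
import math

# Generation counts shown as x-axis markers: 2^0 through 2^6.
generations = [2 ** k for k in range(7)]

# On a base-2 logarithmic axis, each marker sits at its exponent,
# so consecutive markers are evenly spaced.
tick_positions = [math.log2(n) for n in generations]

print(generations)
print(tick_positions)
```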
### Legend and Spatial Grounding
The legend is located in the upper-left quadrant of the chart area.
* **Dark Blue Line with Circle Marker:** Sequential Best-of-N Weighted
* **Dark Orange Line with Circle Marker:** Parallel Best-of-N Weighted
* **Light Blue Line with Diamond/Small Circle:** Sequential Majority
* **Light Orange Line with Diamond/Small Circle:** Parallel Majority
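The charts do not define the two aggregation schemes named in the legend. As a minimal sketch, assume that "Majority" picks the most frequent final answer, and that "Best-of-N Weighted" sums a verifier score over all samples sharing a final answer and picks the answer with the largest total. Under those assumptions the two can disagree, as shown here. The function names, sample answers, and scores are all hypothetical.

```python
from collections import Counter, defaultdict

def majority_vote(answers):
    """Return the most frequent final answer (ties go to the first seen)."""
    return Counter(answers).most_common(1)[0][0]

def best_of_n_weighted(answers, scores):
    """Return the answer whose samples have the largest summed score.

    Assumption: one score per sample (e.g. from a verifier), higher is better.
    """
    totals = defaultdict(float)
    for answer, score in zip(answers, scores):
        totals[answer] += score
    return max(totals, key=totals.get)

# Hypothetical final answers from five generations, with made-up scores.
answers = ["12", "15", "12", "15", "15"]
scores = [0.9, 0.2, 0.8, 0.1, 0.3]

print(majority_vote(answers))                # 15 (three of five samples)
print(best_of_n_weighted(answers, scores))   # 12 (summed score 1.7 vs 0.6)
```

In this toy case the most common answer is not the highest-scored one, which is the kind of situation where the two schemes would select different outputs.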
### Trend Verification and Data Extraction
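All four series trend strongly upward as the number of generations increases. At $2^6$, the Sequential variant leads its Parallel counterpart under both aggregation schemes. The extracted points at $2^1$ are the exception for Majority, where Parallel (~19.5%) sits slightly above Sequential (~18.5%).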
#### 1. Best-of-N Weighted Series (Top Two Lines)
* **Trend:** These are the highest-performing methods. The two series coincide at $2^0$; beyond that point, the gap between Sequential and Parallel is narrowest at $2^1$ and widest at $2^6$ (about 2 percentage points).
* **Sequential Best-of-N Weighted (Dark Blue):**
    * $2^0$: ~18.5%
    * $2^6$: ~41.5% (Highest overall performance)
* **Parallel Best-of-N Weighted (Dark Orange):**
    * $2^0$: ~18.5%
    * $2^6$: ~39.5%
#### 2. Majority Series (Bottom Two Lines)
* **Trend:** These follow a similar trajectory at lower accuracy, ending about 4 to 4.5 percentage points below the corresponding Best-of-N Weighted series at $2^6$.
* **Sequential Majority (Light Blue):**
    * $2^0$: ~18.2%
    * $2^1$: ~18.5% (Stagnant initial growth)
    * $2^6$: ~37.5%
* **Parallel Majority (Light Orange):**
    * $2^0$: ~18.2%
    * $2^1$: ~19.5%
    * $2^6$: ~35.0%
### Summary of Findings
1. **Sequential Advantage:** In the comparison of Parallel vs. Sequential, the Sequential approach leads by roughly 2-3 percentage points at higher generation counts (about 2 points for Best-of-N Weighted and 2.5 points for Majority at $2^6$).
2. **Methodology Impact:** "Best-of-N Weighted" outperforms "Majority" voting in both the parallel and sequential settings, by about 4 to 4.5 percentage points at $2^6$. The two schemes are nearly identical at $2^0$ (~18.5% vs. ~18.2%).
3. **Scaling:** Accuracy scales effectively with the number of generations, showing no signs of a hard plateau within the $2^6$ (64) generation limit on the logarithmic chart.
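The gaps quoted in this summary can be recomputed from the approximate $2^6$ readings listed in the Chart 2 extraction. This is a small arithmetic check on those readings, not a re-measurement of the chart; the dictionary layout and function names are illustrative.

```python
# Approximate accuracies (%) at 2^6 = 64 generations, as read from Chart 2.
acc_at_64 = {
    ("sequential", "best_of_n_weighted"): 41.5,
    ("parallel", "best_of_n_weighted"): 39.5,
    ("sequential", "majority"): 37.5,
    ("parallel", "majority"): 35.0,
}

def sequential_gap(scheme):
    """Sequential minus Parallel accuracy for one aggregation scheme."""
    return acc_at_64[("sequential", scheme)] - acc_at_64[("parallel", scheme)]

def weighted_gap(mode):
    """Best-of-N Weighted minus Majority accuracy for one sampling mode."""
    return acc_at_64[(mode, "best_of_n_weighted")] - acc_at_64[(mode, "majority")]

for scheme in ("best_of_n_weighted", "majority"):
    print(f"sequential - parallel ({scheme}): {sequential_gap(scheme):.1f} points")
for mode in ("sequential", "parallel"):
    print(f"weighted - majority ({mode}): {weighted_gap(mode):.1f} points")
```

Since every input is an approximate visual reading, the resulting differences carry the same uncertainty.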