# Technical Document Extraction: Compute Optimal Search Chart

## 1. Metadata and Header Information
*   **Title:** Compute Optimal Search
*   **Language:** English (100%)
*   **Image Type:** Line Graph with markers
*   **Primary Subject:** Performance comparison of different search/ranking methods for mathematical problem solving across varying computational budgets.

## 2. Axis and Scale Identification
*   **Y-Axis (Vertical):** 
    *   **Label:** MATH Test Accuracy (%)
    *   **Range:** 10 to 40
    *   **Markers:** 10, 15, 20, 25, 30, 35, 40
*   **X-Axis (Horizontal):** 
    *   **Label:** Generation Budget
    *   **Scale:** Logarithmic (Base 2)
    *   **Markers:** $2^1, 2^3, 2^5, 2^7, 2^9$ (with intermediate grid lines representing $2^0, 2^2, 2^4, 2^6, 2^8$)
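On a base-2 logarithmic axis, equal horizontal steps correspond to doublings of the generation budget. The following is a minimal sketch of that mapping, assuming the exponents run from 0 to 9 as listed above; the variable names are illustrative only.

```python
import math

# Budgets implied by the axis description: 2^0 through 2^9.
budgets = [2 ** k for k in range(10)]

# On a base-2 log axis, horizontal position is proportional to log2(budget).
positions = [math.log2(b) for b in budgets]

# Labelled ticks sit at odd exponents; unlabelled grid lines at even exponents.
labelled = [b for b, p in zip(budgets, positions) if int(p) % 2 == 1]
grid_only = [b for b, p in zip(budgets, positions) if int(p) % 2 == 0]

print(labelled)   # [2, 8, 32, 128, 512]
print(grid_only)  # [1, 4, 16, 64, 256]
```

In absolute terms the axis therefore spans budgets from 1 to 512 generations.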

## 3. Legend and Component Isolation
The legend is located in the bottom-right quadrant of the main chart area.

| Legend Label | Color | Marker Style |
| :--- | :--- | :--- |
| **Majority** | Red | Solid line with circle |
| **ORM Best-of-N Weighted** | Purple | Solid line with circle |
| **PRM Best-of-N Weighted** | Green | Solid line with circle |
| **PRM Compute Optimal Oracle** | Blue | Solid line with circle |
| **PRM Compute Optimal Predicted** | Orange | Solid line with circle |

## 4. Trend Verification and Data Extraction

All data series exhibit a positive correlation: as the "Generation Budget" increases, the "MATH Test Accuracy (%)" also increases, though most series show diminishing returns at higher budgets.

### Data Series Analysis

#### A. Majority (Red Line)
*   **Trend:** The lowest-performing baseline. It rises across the whole budget range, though the approximate points below show smaller gains per doubling beyond $2^4$, and it stays well below all other methods once the budget exceeds $2^0$.
*   **Approximate Data Points:**
    *   $2^0$: ~10.5%
    *   $2^2$: ~14%
    *   $2^4$: ~23%
    *   $2^9$: ~29%

#### B. ORM Best-of-N Weighted (Purple Line)
*   **Trend:** Slopes upward sharply until $2^4$, then decelerates, ending as the second-lowest performer at high budgets.
*   **Approximate Data Points:**
    *   $2^0$: ~10.5%
    *   $2^4$: ~28%
    *   $2^9$: ~34.5%

#### C. PRM Best-of-N Weighted (Green Line)
*   **Trend:** Consistently outperforms the ORM and Majority baselines. It maintains a steady upward trajectory throughout the budget range.
*   **Approximate Data Points:**
    *   $2^0$: ~10.5%
    *   $2^4$: ~29%
    *   $2^9$: ~38%

#### D. PRM Compute Optimal Oracle (Blue Line)
*   **Trend:** The highest performing series. It shows a very steep initial climb and reaches the highest recorded accuracy (~39.5%) at a budget of $2^8$. Note: This line ends at $2^8$.
*   **Approximate Data Points:**
    *   $2^0$: ~10.5%
    *   $2^2$: ~27%
    *   $2^4$: ~33.5%
    *   $2^8$: ~39.5%

#### E. PRM Compute Optimal Predicted (Orange Line)
*   **Trend:** Closely tracks the "Oracle" (Blue) line at lower budgets ($2^0$ to $2^4$). Between $2^4$ and $2^8$, it flattens relative to the Oracle and converges toward the PRM Best-of-N Weighted (Green) line.
*   **Approximate Data Points:**
    *   $2^0$: ~10.5%
    *   $2^2$: ~27%
    *   $2^4$: ~33%
    *   $2^8$: ~37%
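To make the diminishing-returns pattern concrete, the approximate readings above can be collected into a small table and converted into an average accuracy gain per budget doubling. This is only a sketch: the values are the rough estimates listed in this section, not exact data, and `SERIES` and `gain_per_doubling` are hypothetical names introduced for illustration.

```python
# Approximate readings transcribed from the lists above: {log2(budget): accuracy %}.
SERIES = {
    "Majority": {0: 10.5, 2: 14.0, 4: 23.0, 9: 29.0},
    "ORM Best-of-N Weighted": {0: 10.5, 4: 28.0, 9: 34.5},
    "PRM Best-of-N Weighted": {0: 10.5, 4: 29.0, 9: 38.0},
    "PRM Compute Optimal Oracle": {0: 10.5, 2: 27.0, 4: 33.5, 8: 39.5},
    "PRM Compute Optimal Predicted": {0: 10.5, 2: 27.0, 4: 33.0, 8: 37.0},
}


def gain_per_doubling(points):
    """Return (start_exp, end_exp, average gain in points per doubling) per segment."""
    exps = sorted(points)
    return [
        (lo, hi, (points[hi] - points[lo]) / (hi - lo))
        for lo, hi in zip(exps, exps[1:])
    ]


for name, points in SERIES.items():
    segments = ", ".join(
        f"2^{lo}->2^{hi}: {g:.2f}" for lo, hi, g in gain_per_doubling(points)
    )
    print(f"{name}: {segments}")
```

Because only a few points per series were read off the chart, each segment average smooths over whatever happens between the sampled budgets.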

## 5. Summary of Findings
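The chart indicates that methods guided by a process reward model (PRM) outperform outcome reward model (ORM) reranking and simple majority voting once the budget exceeds $2^0$, where all methods start at roughly 10.5%. At the largest budget, $2^9$, PRM Best-of-N Weighted reaches ~38% versus ~34.5% for ORM Best-of-N Weighted and ~29% for Majority. The "Compute Optimal Oracle" line suggests that, with perfect selection, accuracy can reach nearly 40% (~39.5%) within a $2^8$ budget. The "Predicted" line matches the Oracle up to about $2^4$ (~33% versus ~33.5%) but trails it by roughly 2.5 points at $2^8$ (~37% versus ~39.5%), falling back toward the standard PRM weighted performance.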
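The widening gap between the Oracle and Predicted lines can be checked directly from the approximate readings in Section 4. The snippet below is a sketch under that assumption: the values are chart estimates, and the dictionary names are illustrative.

```python
# Approximate readings from Section 4: {log2(budget): accuracy %}.
oracle = {0: 10.5, 2: 27.0, 4: 33.5, 8: 39.5}
predicted = {0: 10.5, 2: 27.0, 4: 33.0, 8: 37.0}

# Gap in percentage points at every budget where both series have a reading.
gap = {k: oracle[k] - predicted[k] for k in sorted(oracle.keys() & predicted.keys())}

for k, g in gap.items():
    print(f"2^{k}: Oracle - Predicted = {g:.1f} points")
```

With these estimates the gap is zero through $2^2$, about half a point at $2^4$, and about 2.5 points at $2^8$.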