# Technical Data Extraction: RL Training Performance with RSA

## 1. Document Overview
This image contains five line charts comparing three model configurations, each combined with Recursive Self-Aggregation (RSA): a base model and two RL-trained variants. Each chart plots the "Pass@1" metric on one benchmark as the number of "RSA Steps" increases.

### Central Annotation Box
Located in the top-center of the image:
> "Aggregation-aware RL training leads to substantial gains with RSA. Standard RL, on the other hand, hurts RSA performance."

### Global Legend
Located in the center, below the annotation box:
*   **Blue Circle Line (●):** Base + RSA
*   **Green Square Line (■):** Standard RL + RSA
*   **Orange Diamond Line (◆):** Aggregation-aware RL + RSA

---

## 2. Component Analysis

### Shared Axis Definitions
*   **X-Axis:** RSA Step (tick labels: 2, 4, 6, 8, 10; the data points listed below start at Step 1)
*   **Y-Axis:** Pass@1 (Scale varies by chart)
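The extraction does not define Pass@1. A common definition is the unbiased pass@k estimator introduced with the HumanEval code benchmark, specialized to k = 1, where it reduces to the fraction of correct samples per problem averaged over problems. The sketch below assumes that definition. It does not say which RSA outputs are scored at each step, and `pass_at_k` and `benchmark_pass_at_1` are hypothetical helper names.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem, given n samples of which c are correct."""
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct answer.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(per_problem: list[tuple[int, int]]) -> float:
    """Average the per-problem Pass@1 estimates; for k = 1 each term reduces to c / n."""
    return sum(pass_at_k(n, c, 1) for n, c in per_problem) / len(per_problem)


# Hypothetical (samples, correct) counts for three problems, not taken from the figure.
counts = [(4, 4), (4, 1), (4, 0)]
print(benchmark_pass_at_1(counts))  # averages 1.0, 0.25 and 0.0
```

For k = 1 the combinatorial form gives the same value as c / n, so the general helper is only needed if pass@k for larger k is ever reported alongside these charts.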

### Chart 1: HMMT-25 (Top Left)
*   **Y-Axis Range:** 0.30 to 0.50+
*   **Trend Analysis:**
    *   **Aggregation-aware RL + RSA (Orange):** Slopes sharply upward, clearly outperforming the other methods. Ends near 0.56.
    *   **Base + RSA (Blue):** Slopes upward moderately. Ends near 0.48.
    *   **Standard RL + RSA (Green):** Slopes upward but plateaus early. Ends near 0.45.
*   **Key Data Points (Approximate):**
    *   Step 1: All start near 0.28.
    *   Step 10: Orange (~0.56), Blue (~0.48), Green (~0.45).

### Chart 2: Reasoning Gym Games (Top Right)
*   **Y-Axis Range:** 0.55 to 0.70+
*   **Trend Analysis:**
    *   **Aggregation-aware RL + RSA (Orange):** Rapid initial climb, maintains highest performance.
    *   **Base + RSA (Blue):** Steady climb, remains in the middle.
    *   **Standard RL + RSA (Green):** Lowest performance throughout, plateaus after Step 4.
*   **Key Data Points (Approximate):**
    *   Step 1: Orange (~0.58), Blue (~0.54), Green (~0.53).
    *   Step 10: Orange (~0.71), Blue (~0.69), Green (~0.66).

### Chart 3: AIME-25 (Bottom Left)
*   **Y-Axis Range:** 0.50 to 0.70+
*   **Trend Analysis:**
    *   **Base + RSA (Blue):** Slopes upward and becomes the top performer after Step 4.
    *   **Aggregation-aware RL + RSA (Orange):** Slopes upward but plateaus below the Base model.
    *   **Standard RL + RSA (Green):** Similar to Orange, but with a wider shaded confidence interval indicating higher variance.
*   **Key Data Points (Approximate):**
    *   Step 1: All start near 0.45.
    *   Step 10: Blue (~0.73), Orange (~0.69), Green (~0.68).

### Chart 4: LiveCodeBench-v6 (Bottom Center)
*   **Y-Axis Range:** 0.50 to 0.60
*   **Trend Analysis:**
    *   **Aggregation-aware RL + RSA (Orange):** Strongest upward slope, clear separation from other lines.
    *   **Base + RSA (Blue):** Moderate upward slope.
    *   **Standard RL + RSA (Green):** Lowest performance, plateaus early.
*   **Key Data Points (Approximate):**
    *   Step 1: Orange (~0.51), Blue/Green (~0.49).
    *   Step 10: Orange (~0.59), Blue (~0.57), Green (~0.56).

### Chart 5: Reasoning Gym Cognition + ARC (Bottom Right)
*   **Y-Axis Range:** 0.45 to 0.55+
*   **Trend Analysis:**
    *   **Aggregation-aware RL + RSA (Orange):** Highest performance, steady growth.
    *   **Standard RL + RSA (Green):** Middle performance, plateaus after Step 6.
    *   **Base + RSA (Blue):** Starts lowest, climbs, but then shows a slight decline/plateau after Step 6.
*   **Key Data Points (Approximate):**
    *   Step 1: Green (~0.47), Orange (~0.43), Blue (~0.42).
    *   Step 10: Orange (~0.55), Green (~0.54), Blue (~0.52).

---

## 3. Summary of Findings
Across four of the five benchmarks (**HMMT-25, Reasoning Gym Games, LiveCodeBench-v6, and Reasoning Gym Cognition + ARC**), the **Aggregation-aware RL + RSA** (Orange Diamond) configuration reaches the highest Pass@1 by Step 10, ahead of both Standard RL and the Base model.

The **AIME-25** benchmark is the only exception: there, **Base + RSA** (Blue Circle) overtakes both RL-tuned versions after Step 4. **Standard RL + RSA** (Green Square) finishes lowest at Step 10 on four of the five benchmarks (Base + RSA finishes lowest on Reasoning Gym Cognition + ARC) and tends to plateau earliest, supporting the claim that standard RL can hurt RSA performance.
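The summary's counts can be re-derived from the approximate Step 1 and Step 10 readings in the per-chart sections. The sketch below copies those readings into a dictionary. The method keys (`base`, `standard_rl`, `agg_rl`) and helper names are illustrative, and the values are only as precise as the "~" estimates they come from.

```python
# Approximate (Step 1, Step 10) Pass@1 readings copied from the per-chart sections above.
READINGS = {
    "HMMT-25": {"base": (0.28, 0.48), "standard_rl": (0.28, 0.45), "agg_rl": (0.28, 0.56)},
    "Reasoning Gym Games": {"base": (0.54, 0.69), "standard_rl": (0.53, 0.66), "agg_rl": (0.58, 0.71)},
    "AIME-25": {"base": (0.45, 0.73), "standard_rl": (0.45, 0.68), "agg_rl": (0.45, 0.69)},
    "LiveCodeBench-v6": {"base": (0.49, 0.57), "standard_rl": (0.49, 0.56), "agg_rl": (0.51, 0.59)},
    "Reasoning Gym Cognition + ARC": {"base": (0.42, 0.52), "standard_rl": (0.47, 0.54), "agg_rl": (0.43, 0.55)},
}


def final_ranking(methods: dict[str, tuple[float, float]]) -> list[str]:
    """Order methods from highest to lowest Pass@1 at the last RSA step."""
    return sorted(methods, key=lambda m: methods[m][1], reverse=True)


def count_positions(
    readings: dict[str, dict[str, tuple[float, float]]],
) -> tuple[dict[str, int], dict[str, int]]:
    """Count how often each method finishes first and last across benchmarks."""
    best: dict[str, int] = {}
    worst: dict[str, int] = {}
    for methods in readings.values():
        ranking = final_ranking(methods)
        best[ranking[0]] = best.get(ranking[0], 0) + 1
        worst[ranking[-1]] = worst.get(ranking[-1], 0) + 1
    return best, worst


best, worst = count_positions(READINGS)
print("finishes first:", best)
print("finishes last:", worst)
for bench, methods in READINGS.items():
    # Improvement from the first to the last plotted RSA step, rounded for display.
    gains = {m: round(end - start, 2) for m, (start, end) in methods.items()}
    print(f"{bench}: Step 1 -> Step 10 gain {gains}")
```

With these readings, `agg_rl` finishes first on four benchmarks (AIME-25 is the exception), and `standard_rl` finishes last on four (Reasoning Gym Cognition + ARC is the exception). The gain printout also shows that a final-step lead does not always come from a larger improvement over the RSA steps. On Reasoning Gym Games, for instance, Base + RSA gains about 0.15 and Aggregation-aware RL + RSA about 0.13, because the latter starts higher.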

# Technical Document Extraction: Aggregation-aware RL Training Performance

## Main Title
**Aggregation-aware RL training leads to substantial gains with RSA. Standard RL, on the other hand, hurts RSA performance.**

---

### Chart Structure
- **Grid Layout**: 2 rows × 3 columns holding 5 charts (one per dataset below); the sixth position holds the title text and legend.
- **Axes**:
  - **X-axis**: "RSA Step" (values: 2, 4, 6, 8, 10)
  - **Y-axis**: "Pass@1" (ranging from ~0.3 to ~0.7 depending on dataset)
- **Legend**:
  - **Position**: Center of the grid
  - **Labels**:
    - **Blue**: Base + RSA
    - **Green**: Standard RL + RSA
    - **Orange**: Aggregation-aware RL + RSA

---

### Chart Analysis by Dataset

#### 1. HMMT-25
- **Title**: HMMT-25
- **Trends**:
  - **Base + RSA (Blue)**: Starts at ~0.3, increases steadily to ~0.48 by step 10.
  - **Standard RL + RSA (Green)**: Flat line at ~0.45.
  - **Aggregation-aware RL + RSA (Orange)**: Sharp upward trend from ~0.3 to ~0.55.
- **Key Insight**: At step 10, Aggregation-aware RL + RSA leads Base + RSA by ~0.07 and Standard RL + RSA by ~0.10 Pass@1 (absolute).

#### 2. Reasoning Gym Games
- **Title**: Reasoning Gym Games
- **Trends**:
  - **Base + RSA (Blue)**: Starts at ~0.65, plateaus at ~0.68.
  - **Standard RL + RSA (Green)**: Flat line at ~0.65.
  - **Aggregation-aware RL + RSA (Orange)**: Starts at ~0.65, rises to ~0.72.
- **Key Insight**: Aggregation-aware RL + RSA ends ~0.07 Pass@1 (absolute) above Standard RL + RSA.

#### 3. AIME-25
- **Title**: AIME-25
- **Trends**:
  - **Base + RSA (Blue)**: Starts at ~0.5, increases to ~0.7.
  - **Standard RL + RSA (Green)**: Flat line at ~0.65.
  - **Aggregation-aware RL + RSA (Orange)**: Steeper rise from ~0.5 to ~0.7.
- **Key Insight**: Aggregation-aware RL + RSA matches Base + RSA performance while outperforming Standard RL + RSA.

#### 4. LiveCodeBench-v6
- **Title**: LiveCodeBench-v6
- **Trends**:
  - **Base + RSA (Blue)**: Starts at ~0.5, rises to ~0.62.
  - **Standard RL + RSA (Green)**: Flat line at ~0.58.
  - **Aggregation-aware RL + RSA (Orange)**: Steeper ascent from ~0.5 to ~0.65.
- **Key Insight**: Aggregation-aware RL + RSA ends ~0.07 Pass@1 (absolute) above Standard RL + RSA.

#### 5. Reasoning Gym Cognition + ARC
- **Title**: Reasoning Gym Cognition + ARC
- **Trends**:
  - **Base + RSA (Blue)**: Starts at ~0.45, plateaus at ~0.52.
  - **Standard RL + RSA (Green)**: Flat line at ~0.5.
  - **Aggregation-aware RL + RSA (Orange)**: Sharp rise from ~0.45 to ~0.55.
- **Key Insight**: Aggregation-aware RL + RSA ends ~0.05 Pass@1 above Standard RL + RSA (~10% relative).
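The Key Insight gaps can be read either as absolute Pass@1 differences (percentage points) or as relative improvements, and the two readings differ noticeably. The sketch below recomputes both from the approximate final values in the Trends bullets. The dictionary layout and the `gaps` helper are illustrative, not from the source.

```python
# Approximate final-step Pass@1 estimates from the "Trends" bullets above.
FINAL = {
    "HMMT-25": {"base": 0.48, "standard_rl": 0.45, "agg_rl": 0.55},
    "Reasoning Gym Games": {"base": 0.68, "standard_rl": 0.65, "agg_rl": 0.72},
    "AIME-25": {"base": 0.70, "standard_rl": 0.65, "agg_rl": 0.70},
    "LiveCodeBench-v6": {"base": 0.62, "standard_rl": 0.58, "agg_rl": 0.65},
    "Reasoning Gym Cognition + ARC": {"base": 0.52, "standard_rl": 0.50, "agg_rl": 0.55},
}


def gaps(a: float, b: float) -> tuple[float, float]:
    """Return (absolute Pass@1 gap, relative gap in percent) of score a over score b."""
    return a - b, 100.0 * (a - b) / b


for bench, scores in FINAL.items():
    for rival in ("base", "standard_rl"):
        absolute, relative = gaps(scores["agg_rl"], scores[rival])
        print(f"{bench}: agg_rl vs {rival}: {absolute:+.2f} abs, {relative:+.1f}% rel")
```

With these values, the HMMT-25, Reasoning Gym Games, and LiveCodeBench-v6 figures match absolute gaps of about 0.07 to 0.10. The Reasoning Gym Cognition + ARC figure matches a relative gap: about 0.05 absolute, which is 10% of 0.50.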

---

### Cross-Chart Observations
1. **Legend Consistency**:
   - All charts use the same color coding (blue/green/orange) as the central legend.
   - No discrepancies between line colors and legend labels.
2. **Performance Pattern**:
   - **Aggregation-aware RL + RSA** outperforms the other methods on every dataset except AIME-25, where it matches Base + RSA.
   - **Standard RL + RSA** ends below the baseline (Base + RSA) on every dataset, e.g., ~0.45 vs ~0.48 on HMMT-25.
3. **RSA Step Impact**:
   - Base + RSA and Aggregation-aware RL + RSA improve as RSA steps increase (steps 2–10); Standard RL + RSA stays roughly flat on every dataset.
   - Aggregation-aware RL + RSA shows the largest overall gain across RSA steps, equal to Base + RSA on AIME-25 (both rise from ~0.5 to ~0.7).
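The cross-chart observations can be checked against the per-dataset start and end values listed in the Trends bullets. The sketch below encodes those approximate readings and evaluates each claim per dataset. It treats every "flat line" as equal start and end values, which is an assumption about how the extraction should be read. The names are illustrative.

```python
# Approximate (start, end) Pass@1 per method from the Trends bullets above.
# A "flat line" reading is encoded with identical start and end values (an assumption).
CURVES = {
    "HMMT-25": {"base": (0.30, 0.48), "standard_rl": (0.45, 0.45), "agg_rl": (0.30, 0.55)},
    "Reasoning Gym Games": {"base": (0.65, 0.68), "standard_rl": (0.65, 0.65), "agg_rl": (0.65, 0.72)},
    "AIME-25": {"base": (0.50, 0.70), "standard_rl": (0.65, 0.65), "agg_rl": (0.50, 0.70)},
    "LiveCodeBench-v6": {"base": (0.50, 0.62), "standard_rl": (0.58, 0.58), "agg_rl": (0.50, 0.65)},
    "Reasoning Gym Cognition + ARC": {"base": (0.45, 0.52), "standard_rl": (0.50, 0.50), "agg_rl": (0.45, 0.55)},
}


def check_claims(
    curves: dict[str, dict[str, tuple[float, float]]],
) -> dict[str, dict[str, bool]]:
    """Evaluate the cross-chart claims per dataset and return one row of booleans each."""
    rows: dict[str, dict[str, bool]] = {}
    for bench, m in curves.items():
        gain = {name: end - start for name, (start, end) in m.items()}
        final = {name: end for name, (_, end) in m.items()}
        rivals_best = max(final["base"], final["standard_rl"])
        rows[bench] = {
            "agg_rl_strictly_best": final["agg_rl"] > rivals_best,
            "agg_rl_at_least_tied": final["agg_rl"] >= rivals_best,
            "standard_rl_below_base": final["standard_rl"] < final["base"],
            "standard_rl_improves": gain["standard_rl"] > 0,
            "agg_rl_gain_at_least_base": gain["agg_rl"] >= gain["base"],
        }
    return rows


for bench, row in check_claims(CURVES).items():
    print(bench, row)
```

Under these readings, Aggregation-aware RL + RSA is at least tied for the best final score everywhere but strictly best on four of five datasets, with AIME-25 as the tie. Standard RL + RSA ends below Base + RSA on all five. A flat Standard RL curve also means not every method improves with more RSA steps.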

---

### Conclusion
Aggregation-aware RL training with RSA clearly improves performance over Standard RL + RSA across diverse datasets. The orange line (Aggregation-aware RL + RSA) finishes highest in every chart, tied with Base + RSA on AIME-25, supporting the main title's claim.
