# Technical Document Extraction: Aggregation-aware RL Training Performance

## Main Title
**Aggregation-aware RL training leads to substantial gains with RSA. Standard RL, on the other hand, hurts RSA performance.**

---

### Chart Structure
- **Grid Layout**: 2 rows × 3 columns (six slots); five dataset panels are described below.
- **Axes**:
  - **X-axis**: "RSA Step" (values: 2, 4, 6, 8, 10)
  - **Y-axis**: "Pass@1" (roughly 0.3 to 0.72; the plotted range differs by dataset)
- **Legend**:
  - **Position**: Center of the grid
  - **Labels**:
    - **Blue**: Base + RSA
    - **Green**: Standard RL + RSA
    - **Orange**: Aggregation-aware RL + RSA
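The y-axis metric, Pass@1, is presumably the standard pass@k metric at k = 1: the probability that a single sampled solution is correct, averaged over problems. The figure does not state how many samples per problem were drawn, so the sketch below is only an illustration of the common unbiased estimator; the function names and the sample counts are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        # Every size-k subset of the samples must contain a correct one.
        return 1.0
    # 1 minus the probability that all k chosen samples are incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Average pass@1 over problems; each entry is (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, 1) for n, c in results) / len(results)


# Hypothetical counts: three problems, four samples each.
print(mean_pass_at_1([(4, 4), (4, 2), (4, 0)]))  # 0.5
```

For k = 1 the estimator reduces to c / n per problem, so Pass@1 is simply the mean per-problem accuracy of individual samples.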

---

### Chart Analysis by Dataset

#### 1. HMMT-25
- **Title**: HMMT-25
- **Trends**:
  - **Base + RSA (Blue)**: Starts at ~0.3, increases steadily to ~0.48 by step 10.
  - **Standard RL + RSA (Green)**: Flat line at ~0.45.
  - **Aggregation-aware RL + RSA (Orange)**: Sharp upward trend from ~0.3 to ~0.55.
- **Key Insight**: At step 10, Aggregation-aware RL + RSA leads Base + RSA by ~0.07 and Standard RL + RSA by ~0.10 Pass@1 (absolute).

#### 2. Reasoning Gym Games
- **Title**: Reasoning Gym Games
- **Trends**:
  - **Base + RSA (Blue)**: Starts at ~0.65, plateaus at ~0.68.
  - **Standard RL + RSA (Green)**: Flat line at ~0.65.
  - **Aggregation-aware RL + RSA (Orange)**: Starts at ~0.65, rises to ~0.72.
- **Key Insight**: At step 10, Aggregation-aware RL + RSA scores ~0.07 Pass@1 (absolute) above Standard RL + RSA.

#### 3. AIME-25
- **Title**: AIME-25
- **Trends**:
  - **Base + RSA (Blue)**: Starts at ~0.5, increases to ~0.7.
  - **Standard RL + RSA (Green)**: Flat line at ~0.65.
  - **Aggregation-aware RL + RSA (Orange)**: Rises from ~0.5 to ~0.7, ending level with Base + RSA.
- **Key Insight**: At step 10, Aggregation-aware RL + RSA matches Base + RSA (~0.7) and exceeds Standard RL + RSA by ~0.05 Pass@1.

#### 4. LiveCodeBench-v6
- **Title**: LiveCodeBench-v6
- **Trends**:
  - **Base + RSA (Blue)**: Starts at ~0.5, rises to ~0.62.
  - **Standard RL + RSA (Green)**: Flat line at ~0.58.
  - **Aggregation-aware RL + RSA (Orange)**: Steeper ascent from ~0.5 to ~0.65.
- **Key Insight**: At step 10, Aggregation-aware RL + RSA gains ~0.07 Pass@1 (absolute) over Standard RL + RSA.

#### 5. Reasoning Gym Cognition + ARC
- **Title**: Reasoning Gym Cognition + ARC
- **Trends**:
  - **Base + RSA (Blue)**: Starts at ~0.45, plateaus at ~0.52.
  - **Standard RL + RSA (Green)**: Flat line at ~0.5.
  - **Aggregation-aware RL + RSA (Orange)**: Sharp rise from ~0.45 to ~0.55.
- **Key Insight**: At step 10, Aggregation-aware RL + RSA scores ~0.05 Pass@1 above Standard RL + RSA (≈10% relative).
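Because a phrase such as "higher by ~7%" can denote either an absolute Pass@1 difference or a relative percentage, the sketch below computes both from the approximate step-10 values listed in the panel descriptions above. These are eyeballed readings of the plots, not reported numbers, and the keys `base`, `std_rl`, and `agg_rl` are hypothetical shorthand for the three legend entries.

```python
# Approximate step-10 Pass@1 values read off each panel (eyeballed, not exact).
STEP10 = {
    "HMMT-25":                       {"base": 0.48, "std_rl": 0.45, "agg_rl": 0.55},
    "Reasoning Gym Games":           {"base": 0.68, "std_rl": 0.65, "agg_rl": 0.72},
    "AIME-25":                       {"base": 0.70, "std_rl": 0.65, "agg_rl": 0.70},
    "LiveCodeBench-v6":              {"base": 0.62, "std_rl": 0.58, "agg_rl": 0.65},
    "Reasoning Gym Cognition + ARC": {"base": 0.52, "std_rl": 0.50, "agg_rl": 0.55},
}


def gaps(ours: float, other: float) -> tuple[float, float]:
    """Return (absolute gap in Pass@1, relative gap in percent of `other`)."""
    absolute = ours - other
    return absolute, 100.0 * absolute / other


for name, v in STEP10.items():
    abs_std, rel_std = gaps(v["agg_rl"], v["std_rl"])
    abs_base, rel_base = gaps(v["agg_rl"], v["base"])
    print(f"{name:32s} vs Standard RL: {abs_std:+.2f} ({rel_std:+.1f}%)  "
          f"vs Base: {abs_base:+.2f} ({rel_base:+.1f}%)")
```

On these readings, HMMT-25 gives roughly +0.10 absolute (about +22% relative) over Standard RL + RSA, whereas Reasoning Gym Cognition + ARC gives +0.05 absolute (+10% relative), so the two scales should not be mixed when comparing panels.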

---

### Cross-Chart Observations
1. **Legend Consistency**:
   - All charts use the same color coding (blue/green/orange) as the central legend.
   - No discrepancies between line colors and legend labels.
2. **Performance Pattern**:
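   - **Aggregation-aware RL + RSA** reaches the highest Pass@1 at step 10 on every dataset, tying Base + RSA on AIME-25.
   - **Standard RL + RSA** stays flat across RSA steps and finishes below Base + RSA at step 10 on all five datasets, consistent with the title's claim that standard RL hurts RSA performance.
3. **RSA Step Impact**:
   - Base + RSA and Aggregation-aware RL + RSA improve as RSA steps increase (steps 2–10); Standard RL + RSA does not.
   - Aggregation-aware RL + RSA shows the largest gain from additional RSA steps, tying Base + RSA on AIME-25.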
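The step-scaling observations can be checked against the start and end values given for each panel. This is a minimal sketch, assuming each "Starts at" value is the Pass@1 at the first plotted RSA step (step 2) and each end value is at step 10; the dictionary keys and the tie tolerance are illustrative choices, not taken from the figure.

```python
# (start, end) Pass@1 per method, read off the panels (approximate).
# Standard RL + RSA is described as a flat line, so start == end for it.
CURVES = {
    "HMMT-25":                       {"base": (0.30, 0.48), "std_rl": (0.45, 0.45), "agg_rl": (0.30, 0.55)},
    "Reasoning Gym Games":           {"base": (0.65, 0.68), "std_rl": (0.65, 0.65), "agg_rl": (0.65, 0.72)},
    "AIME-25":                       {"base": (0.50, 0.70), "std_rl": (0.65, 0.65), "agg_rl": (0.50, 0.70)},
    "LiveCodeBench-v6":              {"base": (0.50, 0.62), "std_rl": (0.58, 0.58), "agg_rl": (0.50, 0.65)},
    "Reasoning Gym Cognition + ARC": {"base": (0.45, 0.52), "std_rl": (0.50, 0.50), "agg_rl": (0.45, 0.55)},
}

TOL = 0.005  # treat gains within half a point as tied, since the values are eyeballed


def steepest(methods: dict[str, tuple[float, float]]) -> tuple[list[str], dict[str, float]]:
    """Return the method(s) with the largest step-2-to-step-10 gain, plus all gains."""
    gains = {m: end - start for m, (start, end) in methods.items()}
    best = max(gains.values())
    leaders = sorted(m for m, g in gains.items() if best - g <= TOL)
    return leaders, gains


for name, methods in CURVES.items():
    leaders, gains = steepest(methods)
    detail = ", ".join(f"{m}={g:+.2f}" for m, g in gains.items())
    print(f"{name:32s} steepest: {'/'.join(leaders):14s} ({detail})")
```

On these read-off values, Aggregation-aware RL + RSA has the largest gain on four panels and ties Base + RSA on AIME-25, while Standard RL + RSA shows zero gain everywhere.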

---

### Conclusion
Aggregation-aware RL training with RSA substantially improves performance over Standard RL + RSA across diverse datasets. The orange line (Aggregation-aware RL + RSA) ends highest, or tied for highest, in every chart, supporting the main title's claim.