# Technical Document Extraction: Image Analysis
## Chart 1: HMMT-25
### Components
- **Title**: HMMT-25
- **X-Axis Labels**:
  - o3-mini (high)
  - DeepSeek-R1
  - Qwen3-4B-Instruct (Base)
  - Base + RSA (Ours)
  - Base + RSA + RL (Ours)
- **Y-Axis Label**: Pass@1 (0–80 range)
- **Legend**:
  - Teal: Base
  - Orange: Base + RSA
- **Data Points**:
  - o3-mini (high): 67.5 (Teal)
  - DeepSeek-R1: 41.7 (Teal)
  - Qwen3-4B-Instruct (Base): 27.2 (Teal)
  - Base + RSA (Ours): 47.6 (Orange)
  - Base + RSA + RL (Ours): 55.5 (Orange)
- **Trend**: Scores increase from Qwen3-4B-Instruct (27.2) to Base + RSA (47.6) and Base + RSA + RL (55.5); o3-mini (high) remains the highest bar at 67.5.
## Chart 2: Reasoning Gym Games
### Components
- **Title**: Reasoning Gym Games
- **X-Axis Labels**:
  - o3-mini (high)
  - DeepSeek-R1
  - Qwen3-4B-Instruct (Base)
  - Base + RSA (Ours)
  - Base + RSA + RL (Ours)
- **Y-Axis Label**: Pass@1 (0–80 range)
- **Legend**:
  - Teal: Base
  - Orange: Base + RSA
- **Data Points**:
  - o3-mini (high): 69.9 (Teal)
  - DeepSeek-R1: 54.8 (Teal)
  - Qwen3-4B-Instruct (Base): 53.9 (Teal)
  - Base + RSA (Ours): 69.0 (Orange)
  - Base + RSA + RL (Ours): 70.6 (Orange)
- **Trend**: Scores rise from Qwen3-4B-Instruct (53.9) to Base + RSA (69.0) and Base + RSA + RL (70.6), which edges past o3-mini (high) at 69.9.
## Chart 3: LiveCodeBench-v6
### Components
- **Title**: LiveCodeBench-v6
- **X-Axis Label**: Pass@1 (0–100 range)
- **Y-Axis Labels**:
  - Qwen3 Instruct (4B)
  - Qwen3 Instruct (30B)
  - GPT-OSS Medium (20B)
- **Legend**:
  - Teal: Base
  - Orange: Base + RSA
- **Data Points**:
  - Qwen3 Instruct (4B):
    - Base: 44.9 (Teal)
    - Base + RSA: +7.1 (Orange)
  - Qwen3 Instruct (30B):
    - Base: 60.0 (Teal)
    - Base + RSA: +7.1 (Orange)
  - GPT-OSS Medium (20B):
    - Base: 74.4 (Teal)
    - Base + RSA: +5.6 (Orange)
- **Trend**: Base + RSA improves all three models. Qwen3 Instruct (4B) and Qwen3 Instruct (30B) tie for the largest absolute gain (+7.1 points each), while GPT-OSS Medium (20B) gains +5.6 points.
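
The chart reports each Base + RSA bar as a gain over the base score rather than as a final value. The sketch below shows how the implied Base + RSA scores and the tie for the largest gain could be derived from the transcribed numbers. The variable and function names are illustrative assumptions, not part of the source, and the sums assume the orange segment is stacked on top of the teal base bar.

```python
# Values transcribed from the LiveCodeBench-v6 chart: (base Pass@1, gain from RSA).
# Assumption: the orange "+x" label is an additive gain stacked on the base bar.
livecodebench_v6 = {
    "Qwen3 Instruct (4B)": (44.9, 7.1),
    "Qwen3 Instruct (30B)": (60.0, 7.1),
    "GPT-OSS Medium (20B)": (74.4, 5.6),
}


def with_rsa_scores(chart):
    """Return the implied Base + RSA score (base + gain) per model, rounded to 0.1."""
    return {model: round(base + gain, 1) for model, (base, gain) in chart.items()}


def largest_gain_models(chart):
    """Return every model tied for the largest absolute gain."""
    top = max(gain for _, gain in chart.values())
    return [model for model, (_, gain) in chart.items() if gain == top]


totals = with_rsa_scores(livecodebench_v6)
tied = largest_gain_models(livecodebench_v6)

for model, score in totals.items():
    print(f"{model}: Base + RSA = {score}")
print("Largest absolute gain:", ", ".join(tied))
```

Returning a list from `largest_gain_models` rather than a single name keeps ties visible, which matters here because two models share the same gain.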
## Chart 4: AIME-25
### Components
- **Title**: AIME-25
- **X-Axis Label**: Pass@1 (0–100 range)
- **Y-Axis Labels**:
  - Nemotron Nano (9B)
  - Qwen3 Instruct (4B)
  - Qwen3 Instruct (30B)
  - Qwen3 Thinking (4B)
  - GPT-OSS Medium (20B)
- **Legend**:
  - Teal: Base
  - Orange: Base + RSA
- **Data Points**:
  - Nemotron Nano (9B):
    - Base: 40.8 (Teal)
    - Base + RSA: +32.1 (Orange)
  - Qwen3 Instruct (4B):
    - Base: 44.9 (Teal)
    - Base + RSA: +29.9 (Orange)
  - Qwen3 Instruct (30B):
    - Base: 57.7 (Teal)
    - Base + RSA: +27.2 (Orange)
  - Qwen3 Thinking (4B):
    - Base: 65.0 (Teal)
    - Base + RSA: +19.4 (Orange)
  - GPT-OSS Medium (20B):
    - Base: 67.8 (Teal)
    - Base + RSA: +22.4 (Orange)
- **Trend**: Base + RSA raises scores substantially for all five models. Nemotron Nano (9B) shows the largest absolute gain (+32.1 points), which is also the largest gain relative to its base score (40.8).
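
Because "+32.1" is a gain in Pass@1 points, it helps to separate absolute gain from gain relative to the base score. The following is a minimal sketch using the transcribed AIME-25 values. All names are illustrative assumptions, and the final scores again assume the gain is added to the base bar.

```python
# Values transcribed from the AIME-25 chart: (base Pass@1, gain from RSA).
aime_25 = {
    "Nemotron Nano (9B)": (40.8, 32.1),
    "Qwen3 Instruct (4B)": (44.9, 29.9),
    "Qwen3 Instruct (30B)": (57.7, 27.2),
    "Qwen3 Thinking (4B)": (65.0, 19.4),
    "GPT-OSS Medium (20B)": (67.8, 22.4),
}


def summarize(chart):
    """Per model: absolute gain (points), relative gain (% of base), implied final score."""
    return {
        model: {
            "absolute": gain,
            "relative_pct": round(100.0 * gain / base, 1),
            "with_rsa": round(base + gain, 1),
        }
        for model, (base, gain) in chart.items()
    }


summary = summarize(aime_25)

# The model that leads depends on which metric is used.
best_abs = max(summary, key=lambda m: summary[m]["absolute"])
best_rel = max(summary, key=lambda m: summary[m]["relative_pct"])
best_total = max(summary, key=lambda m: summary[m]["with_rsa"])

print("Largest absolute gain:", best_abs)
print("Largest relative gain:", best_rel)
print("Highest Base + RSA score:", best_total)
```

Under these assumptions the largest gain and the highest final score belong to different models, so a trend statement should say which metric it refers to.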
## Spatial Grounding
- **Legend Position**: Bottom-right corner of all charts.
- **Color Consistency**:
  - Teal consistently represents "Base" across all charts.
  - Orange consistently represents "Base + RSA" across all charts.
## Key Observations
1. **Model Performance**:
   - o3-mini (high) has the highest Pass@1 on HMMT-25 (67.5), while Base + RSA + RL leads Reasoning Gym Games (70.6 vs. 69.9 for o3-mini). GPT-OSS Medium (20B) has the highest base score on both LiveCodeBench-v6 (74.4) and AIME-25 (67.8).
2. **RSA Impact**:
   - Base + RSA improves performance for every model shown, with the largest gains on AIME-25 (e.g., +32.1 points for Nemotron Nano).
3. **RL Enhancement**:
   - Adding RL (Reinforcement Learning) further improves scores on both HMMT-25 and Reasoning Gym Games: Base + RSA + RL exceeds Base + RSA by +7.9 points on HMMT-25 (55.5 vs. 47.6) and by +1.6 points on Reasoning Gym Games (70.6 vs. 69.0).
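
The stage-by-stage gains for the two charts that include an RL bar can be checked with simple subtraction. This is a sketch over the transcribed values only. The dictionary keys and function name are illustrative assumptions, with "Chart 1" and "Chart 2" referring to the first two charts described above.

```python
# Scores transcribed from Charts 1 and 2: base model, Base + RSA, Base + RSA + RL.
three_stage_scores = {
    "Chart 1": {"base": 27.2, "rsa": 47.6, "rsa_rl": 55.5},
    "Chart 2": {"base": 53.9, "rsa": 69.0, "rsa_rl": 70.6},
}


def stage_gains(scores):
    """Return the gain from each added stage, in Pass@1 points rounded to 0.1."""
    return {
        "rsa_over_base": round(scores["rsa"] - scores["base"], 1),
        "rl_over_rsa": round(scores["rsa_rl"] - scores["rsa"], 1),
    }


gains = {chart: stage_gains(scores) for chart, scores in three_stage_scores.items()}

for chart, g in gains.items():
    print(f"{chart}: RSA over base = +{g['rsa_over_base']}, RL over RSA = +{g['rl_over_rsa']}")
```

Rounding to one decimal place matches the precision of the chart labels and avoids floating-point noise in the differences.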
## Language Notes
- All text is in English. No non-English content detected.