Image 6c37d7e7367d...

EXPERT: nemotron-free VERSION 1

RUNTIME: free/nvidia/nemotron-nano-12b-v2-vl:free
# Technical Document Analysis: Pass Rate vs. SWE-Agent SFT Tokens

## Chart Overview
This line chart plots pass rate against the number of SWE-Agent SFT tokens for four model groups (RL, SFT, MT, and Base), each reported at Pass@1, Pass@2, and Pass@3. The x-axis uses a logarithmic scale for token counts, and the y-axis shows pass rate as a percentage.

## Axis Labels
- **X-axis**: `# SWE-Agent SFT tokens` (logarithmic scale; reported range 0 → 1.5 × 10²⁸, although a strictly logarithmic axis cannot contain 0)
- **Y-axis**: `Pass Rate (%)` (0 → 60%)
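
To make the logarithmic x-axis concrete, here is a minimal sketch of how a token count maps to a position on a log10 axis. The function name `log_axis_position` and the axis limits in the example are hypothetical, chosen only for illustration; they are not read from the chart.

```python
import math


def log_axis_position(tokens: float, lo: float, hi: float) -> float:
    """Return the fractional position (0.0 to 1.0) of `tokens` on a log10 axis
    spanning [lo, hi]. All three values must be strictly positive."""
    if tokens <= 0 or lo <= 0 or hi <= 0:
        # log10 is undefined at zero, so zero has no position on a log axis.
        raise ValueError("a logarithmic axis cannot represent zero or negative values")
    span = math.log10(hi) - math.log10(lo)
    return (math.log10(tokens) - math.log10(lo)) / span


# Hypothetical axis limits, used only to illustrate the mapping.
print(log_axis_position(1e24, lo=1e20, hi=1e28))  # midpoint of the axis
```

Equal distances on such an axis correspond to equal ratios of token counts, which is why a tenfold increase occupies the same width anywhere along it.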

## Legend
Located in the top-right corner, the legend maps colors/markers to models and metrics:
| Color/Marker | Label             |
|--------------|-------------------|
| Red circle   | RL Pass@1         |
| Red square   | RL Pass@2         |
| Red triangle | RL Pass@3         |
| Orange circle| SFT Pass@1        |
| Orange square| SFT Pass@2        |
| Orange triangle| SFT Pass@3      |
| Purple circle| MT Pass@1         |
| Purple square| MT Pass@2         |
| Purple triangle| MT Pass@3       |
| Blue circle  | Base Pass@1       |
| Blue square  | Base Pass@2       |
| Blue triangle| Base Pass@3       |
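
The legend follows a regular pattern: color identifies the model and marker shape identifies the metric. As a sketch, the twelve entries can be generated from two small lookup tables. The names `MODEL_COLORS`, `METRIC_MARKERS`, and `legend` are illustrative and do not come from the chart.

```python
from itertools import product

# Color per model and marker per metric, as listed in the legend table.
MODEL_COLORS = {"RL": "red", "SFT": "orange", "MT": "purple", "Base": "blue"}
METRIC_MARKERS = {"Pass@1": "circle", "Pass@2": "square", "Pass@3": "triangle"}

# One entry per series: label -> (color, marker).
legend = {
    f"{model} {metric}": (MODEL_COLORS[model], METRIC_MARKERS[metric])
    for model, metric in product(MODEL_COLORS, METRIC_MARKERS)
}

print(len(legend))          # number of series
print(legend["MT Pass@2"])  # color and marker for one series
```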

## Key Trends
1. **RL Models** (red lines):
   - Steep upward slope across Pass@1, Pass@2, and Pass@3
   - Pass@3 is the highest of the three RL metrics at nearly every token count
   - Example: At the largest tabulated token count (1 × 10²⁸), RL Pass@3 reaches ~65%

2. **SFT Models** (orange lines):
   - Gradual increase with plateauing at higher token counts
   - Pass@3 is the highest of the three SFT metrics at nearly every token count
   - Example: At the largest tabulated token count (1 × 10²⁸), SFT Pass@3 reaches ~62%

3. **MT Models** (purple lines):
   - Moderate upward trajectory with diminishing returns
   - Pass@3 is the highest MT metric, about 3 points above Pass@1 in the tabulated values
   - Example: At the largest tabulated token count (1 × 10²⁸), MT Pass@3 reaches ~58%

4. **Base Models** (blue lines):
   - Slow initial growth followed by plateau
   - Pass@3 is level with Pass@1 and about 2 points above Pass@2 in the tabulated values
   - Example: At the largest tabulated token count (1 × 10²⁸), Base Pass@3 reaches ~52%

## Data Points (Selected)
| Token Count       | RL Pass@1 | RL Pass@2 | RL Pass@3 | SFT Pass@1 | SFT Pass@2 | SFT Pass@3 | MT Pass@1 | MT Pass@2 | MT Pass@3 | Base Pass@1 | Base Pass@2 | Base Pass@3 |
|-------------------|-----------|-----------|-----------|------------|------------|------------|-----------|-----------|-----------|-------------|-------------|-------------|
| 1 × 10²⁸          | 58%       | 56%       | 65%       | 59%        | 57%        | 62%        | 55%       | 53%       | 58%       | 52%         | 50%         | 52%         |
| 1.1 × 10²⁷        | 45%       | 43%       | 54%       | 46%        | 44%        | 53%        | 42%       | 40%       | 45%       | 40%         | 38%         | 40%         |
| 1.1 × 10²⁶        | 35%       | 33%       | 44%       | 36%        | 34%        | 43%        | 31%       | 29%       | 34%       | 30%         | 28%         | 30%         |
| 2.1 × 10²⁵        | 25%       | 23%       | 34%       | 26%        | 24%        | 33%        | 21%       | 19%       | 24%       | 20%         | 18%         | 20%         |
| 2.3 × 10²⁴        | 15%       | 13%       | 24%       | 16%        | 14%        | 23%        | 11%       | 9%        | 14%       | 10%         | 8%          | 10%         |
| 2.1 × 10²³        | 5%        | 3%        | 14%       | 6%         | 4%         | 13%        | 5%        | 3%        | 7%        | 3%          | 1%          | 3%          |
| 1 × 10²¹          | 2%        | 1%        | 2%        | 3%         | 1%         | 2%         | 1%        | 0%        | 1%        | 0%          | 0%          | 0%          |
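
Because the tabulated token counts span seven orders of magnitude, gains are easier to compare when expressed per tenfold increase in tokens. The sketch below does this for the RL Pass@1 column. The helper name `gain_per_decade` is hypothetical, and the inputs are the approximate values from the table, not exact measurements.

```python
import math

# Token counts and RL Pass@1 values (percent) copied from the table, largest first.
tokens = [1e28, 1.1e27, 1.1e26, 2.1e25, 2.3e24, 2.1e23, 1e21]
rl_pass1 = [58, 45, 35, 25, 15, 5, 2]


def gain_per_decade(xs, ys):
    """Percentage-point change in ys per tenfold increase in xs,
    computed between neighbouring points after sorting by xs."""
    points = sorted(zip(xs, ys))  # ascending token count
    return [
        (y1 - y0) / (math.log10(x1) - math.log10(x0))
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]


gains = gain_per_decade(tokens, rl_pass1)
for g in gains:
    print(f"{g:.1f} points per decade")
```

The same function can be applied to any other column of the table by swapping in its values.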

## Spatial Grounding
- Legend positioned at [x: 0.95, y: 0.95] (top-right corner)
- All line colors/markers match legend entries exactly
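- No overlapping data series were reported, although the tabulated Base Pass@1 and Base Pass@3 values are identical at every token count, so those two lines would coincide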

## Trend Verification
- All lines exhibit upward trajectories (confirmed visually)
- RL/SFT models show steeper slopes than MT/Base models
- Pass@3 is the highest metric for RL, SFT, and MT at most token counts; for Base, the tabulated Pass@3 and Pass@1 values are equal
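
The ordering claim can be checked against the tabulated values. Under the usual definition, pass@k is the probability that at least one of k attempts succeeds, so it should never decrease as k grows. Assuming the chart's Pass@1, Pass@2, and Pass@3 follow that definition, the sketch below counts the rows where the tabulated values break the ordering. The names `rows` and `count_violations` are illustrative.

```python
# (Pass@1, Pass@2, Pass@3) in percent, copied from the "Data Points (Selected)"
# table, largest token count first.
rows = {
    "RL":   [(58, 56, 65), (45, 43, 54), (35, 33, 44), (25, 23, 34),
             (15, 13, 24), (5, 3, 14), (2, 1, 2)],
    "SFT":  [(59, 57, 62), (46, 44, 53), (36, 34, 43), (26, 24, 33),
             (16, 14, 23), (6, 4, 13), (3, 1, 2)],
    "MT":   [(55, 53, 58), (42, 40, 45), (31, 29, 34), (21, 19, 24),
             (11, 9, 14), (5, 3, 7), (1, 0, 1)],
    "Base": [(52, 50, 52), (40, 38, 40), (30, 28, 30), (20, 18, 20),
             (10, 8, 10), (3, 1, 3), (0, 0, 0)],
}


def count_violations(model_rows):
    """Count rows where Pass@1 <= Pass@2 <= Pass@3 does not hold."""
    return sum(1 for p1, p2, p3 in model_rows if not (p1 <= p2 <= p3))


violations = {model: count_violations(r) for model, r in rows.items()}
for model, n in violations.items():
    print(f"{model}: {n} of {len(rows[model])} rows break the expected ordering")
```

Rows that fail the check are candidates for re-reading against the chart, since the values were taken by visual inspection.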

## Component Isolation
1. **Header**: No explicit title present
2. **Main Chart**: 
   - 12 distinct data series (3 metrics × 4 models)
   - Logarithmic x-axis enables visualization of wide token range
3. **Footer**: None; the model/metric mapping is provided by the legend in the top-right corner

## Critical Observations
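1. RL models show the largest gain with token count (RL Pass@3 rises from 2% to 65% across the tabulated range)
2. At the largest tabulated token count, SFT leads on Pass@1 and Pass@2 (59% and 57%, versus 58% and 56% for RL), while RL leads on Pass@3 (65% versus 62%)
3. Base models are reported to show minimal improvement beyond 1.1 × 10²⁷ tokens, although the tabulated Base Pass@1 still rises from 40% to 52% between 1.1 × 10²⁷ and 1 × 10²⁸ tokens
4. Pass@3 exceeds Pass@1 by at most 9 percentage points in the tabulated values (up to 9 for RL, 7 for SFT, and 3 for MT), and Base Pass@3 equals Base Pass@1 at every token count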
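
The size of the Pass@3 advantage over Pass@1 can be computed directly from the tabulated values. This is a minimal sketch; `pairs` and `gap_range` are illustrative names, and the inputs are the approximate readings from the table.

```python
# (Pass@1, Pass@3) in percent, copied from the table, largest token count first.
pairs = {
    "RL":   [(58, 65), (45, 54), (35, 44), (25, 34), (15, 24), (5, 14), (2, 2)],
    "SFT":  [(59, 62), (46, 53), (36, 43), (26, 33), (16, 23), (6, 13), (3, 2)],
    "MT":   [(55, 58), (42, 45), (31, 34), (21, 24), (11, 14), (5, 7), (1, 1)],
    "Base": [(52, 52), (40, 40), (30, 30), (20, 20), (10, 10), (3, 3), (0, 0)],
}


def gap_range(model_pairs):
    """Return (smallest, largest) Pass@3 minus Pass@1 gap in percentage points."""
    gaps = [p3 - p1 for p1, p3 in model_pairs]
    return min(gaps), max(gaps)


gap_ranges = {model: gap_range(p) for model, p in pairs.items()}
for model, (smallest, largest) in gap_ranges.items():
    print(f"{model}: Pass@3 - Pass@1 ranges from {smallest} to {largest} points")
```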

*Note: All numerical values were read off the chart by visual inspection and are approximate; the chart itself prints no data values.*
TECHNICAL ASSET FINGERPRINT

6c37d7e7367d95c876417b32
