## Line Chart: Pass Rate vs. SWE-Agent SFT Tokens
### Overview
This line chart plots pass rate (%) against the number of SWE-Agent SFT tokens for four training methods: RL, SFT, MT, and Base. Each method appears at three Pass@k levels, where k is the number of attempts allowed per task (1, 2, or 3).
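The chart does not say how Pass@k is computed. One widely used definition, assumed here rather than confirmed by the source, is the unbiased estimator introduced with the HumanEval benchmark: sample n attempts per task, count the c that pass, and estimate the probability that at least one of k attempts passes. The function names below are hypothetical. The sketch also makes explicit why Pass@k can never decrease as k grows.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single task (assumed definition).

    n: attempts sampled, c: attempts that passed, k: attempts allowed.
    Returns the probability that at least one of k attempts, drawn
    without replacement from the n samples, passes.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Too few failing attempts to fill a size-k draw: a pass is guaranteed.
        return 1.0
    # 1 minus the probability that all k drawn attempts are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Mean pass@k over tasks, in percent; each task is given as (n, c)."""
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical example: three tasks, three attempts each.
tasks = [(3, 0), (3, 1), (3, 3)]
for k in (1, 2, 3):
    print(f"Pass@{k}: {benchmark_pass_at_k(tasks, k):.1f}%")
```

Because a larger k can only add chances to succeed, the printed values rise (or stay flat) from Pass@1 to Pass@3.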
### Components/Axes
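* **X-axis:** "# SWE-Agent SFT tokens", on a logarithmic scale, with ticks labeled approximately 0, 2<sup>21</sup>, 2<sup>23</sup>, 2<sup>24</sup>, 1.1 x 2<sup>25</sup>, 1.1 x 2<sup>26</sup>, 1.1 x 2<sup>27</sup>, and 1.5 x 2<sup>28</sup>. Since zero has no position on a true log scale, the 0 tick presumably marks a separate baseline point.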
* **Y-axis:** "Pass Rate (%)". Scale ranges from 0 to 60, with increments of 10.
* **Legend:** Located in the top-right corner; lists the following data series:
  * RL Pass@1 (Red)
  * RL Pass@2 (Medium Red)
  * RL Pass@3 (Light Red)
  * SFT Pass@1 (Orange)
  * SFT Pass@2 (Medium Orange)
  * SFT Pass@3 (Light Orange)
  * MT Pass@1 (Purple)
  * MT Pass@2 (Medium Purple)
  * MT Pass@3 (Light Purple)
  * Base Pass@1 (Blue)
  * Base Pass@2 (Medium Blue)
  * Base Pass@3 (Light Blue)
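To see what the x-axis tick labels mean in raw token counts, the sketch below converts each label (as read from the chart, so approximate) into a token count and a position on a log2 axis. The `TICKS` list and the helper names are illustrative, not from the source, and the 0 tick is omitted because zero cannot be placed on a logarithmic axis.

```python
import math

# Tick labels read from the x-axis as (coefficient, power-of-two exponent),
# i.e. 2^21, 2^23, 2^24, 1.1 x 2^25, 1.1 x 2^26, 1.1 x 2^27, 1.5 x 2^28.
# The "0" tick is left out: log2(0) is undefined.
TICKS = [(1.0, 21), (1.0, 23), (1.0, 24), (1.1, 25), (1.1, 26), (1.1, 27), (1.5, 28)]


def tick_tokens(coef: float, exp: int) -> float:
    """Token count represented by a tick label coef x 2**exp."""
    return coef * 2 ** exp


def tick_log2(coef: float, exp: int) -> float:
    """Horizontal position of the tick on a log2 axis."""
    return exp + math.log2(coef)


for coef, exp in TICKS:
    tokens = tick_tokens(coef, exp)
    print(f"{coef:g} x 2^{exp}: {tokens / 1e6:8.1f}M tokens, log2 = {tick_log2(coef, exp):.2f}")
```

The largest tick, 1.5 x 2^28, works out to 402,653,184 tokens (roughly 403 million), about 5.5 times the 1.1 x 2^26 tick.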
### Detailed Analysis
Here's a breakdown of each data series, noting trends and approximate data points.
* **RL Pass@1 (Red):** Starts at approximately 3% at 0 tokens, increases sharply to around 55% at 1.1 x 2<sup>26</sup> tokens, and plateaus around 58% at 1.5 x 2<sup>28</sup> tokens.
* **RL Pass@2 (Medium Red):** Starts at approximately 5% at 0 tokens, increases steadily to around 50% at 1.1 x 2<sup>26</sup> tokens, and reaches approximately 55% at 1.5 x 2<sup>28</sup> tokens.
* **RL Pass@3 (Light Red):** Starts at approximately 7% at 0 tokens, increases rapidly to around 45% at 1.1 x 2<sup>26</sup> tokens, and reaches approximately 52% at 1.5 x 2<sup>28</sup> tokens.
* **SFT Pass@1 (Orange):** Starts at approximately 18% at 0 tokens, increases to around 45% at 1.1 x 2<sup>26</sup> tokens, and reaches approximately 50% at 1.5 x 2<sup>28</sup> tokens.
* **SFT Pass@2 (Medium Orange):** Starts at approximately 15% at 0 tokens, increases to around 40% at 1.1 x 2<sup>26</sup> tokens, and reaches approximately 48% at 1.5 x 2<sup>28</sup> tokens.
* **SFT Pass@3 (Light Orange):** Starts at approximately 12% at 0 tokens, increases to around 35% at 1.1 x 2<sup>26</sup> tokens, and reaches approximately 45% at 1.5 x 2<sup>28</sup> tokens.
* **MT Pass@1 (Purple):** Starts at approximately 2% at 0 tokens, increases to around 25% at 1.1 x 2<sup>26</sup> tokens, and reaches approximately 30% at 1.5 x 2<sup>28</sup> tokens.
* **MT Pass@2 (Medium Purple):** Starts at approximately 3% at 0 tokens, increases to around 20% at 1.1 x 2<sup>26</sup> tokens, and reaches approximately 25% at 1.5 x 2<sup>28</sup> tokens.
* **MT Pass@3 (Light Purple):** Starts at approximately 4% at 0 tokens, increases to around 15% at 1.1 x 2<sup>26</sup> tokens, and reaches approximately 20% at 1.5 x 2<sup>28</sup> tokens.
* **Base Pass@1 (Blue):** Starts at approximately 1% at 0 tokens, increases to around 10% at 1.1 x 2<sup>26</sup> tokens, and reaches approximately 15% at 1.5 x 2<sup>28</sup> tokens.
* **Base Pass@2 (Medium Blue):** Starts at approximately 2% at 0 tokens, increases to around 8% at 1.1 x 2<sup>26</sup> tokens, and reaches approximately 12% at 1.5 x 2<sup>28</sup> tokens.
* **Base Pass@3 (Light Blue):** Starts at approximately 3% at 0 tokens, increases to around 6% at 1.1 x 2<sup>26</sup> tokens, and reaches approximately 10% at 1.5 x 2<sup>28</sup> tokens.
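To compare the readings above side by side, the sketch below stores them in a dictionary and computes each series' gain from 0 to 1.5 x 2^28 tokens. It also checks, at each reported position, whether the readings rise from Pass@1 to Pass@3; since a true Pass@k cannot decrease as k grows, any violation flags a reading worth re-checking against the chart. The names `READINGS`, `total_gain`, and `ordering_violations` are hypothetical.

```python
# Approximate readings (% pass rate) at three x positions:
# 0 tokens, 1.1 x 2^26 tokens, and 1.5 x 2^28 tokens.
READINGS: dict[tuple[str, int], tuple[float, float, float]] = {
    ("RL", 1): (3, 55, 58), ("RL", 2): (5, 50, 55), ("RL", 3): (7, 45, 52),
    ("SFT", 1): (18, 45, 50), ("SFT", 2): (15, 40, 48), ("SFT", 3): (12, 35, 45),
    ("MT", 1): (2, 25, 30), ("MT", 2): (3, 20, 25), ("MT", 3): (4, 15, 20),
    ("Base", 1): (1, 10, 15), ("Base", 2): (2, 8, 12), ("Base", 3): (3, 6, 10),
}
METHODS = ["RL", "SFT", "MT", "Base"]
POINTS = ["0", "1.1 x 2^26", "1.5 x 2^28"]


def total_gain(method: str, k: int) -> float:
    """Percentage-point change from 0 tokens to 1.5 x 2^28 tokens."""
    start, _, end = READINGS[(method, k)]
    return end - start


def ordering_violations() -> list[tuple[str, str]]:
    """(method, position) pairs where the readings decrease somewhere in k."""
    hits = []
    for method in METHODS:
        for i, point in enumerate(POINTS):
            values = [READINGS[(method, k)][i] for k in (1, 2, 3)]
            if any(a > b for a, b in zip(values, values[1:])):
                hits.append((method, point))
    return hits


for method in METHODS:
    gains = ", ".join(f"Pass@{k} +{total_gain(method, k):g}" for k in (1, 2, 3))
    print(f"{method}: {gains}")
print("Pass@k ordering violations:", ordering_violations())
```

With these readings, `ordering_violations()` flags every method at the two nonzero positions, plus SFT at 0 tokens.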
### Key Observations
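* At 1.1 x 2<sup>26</sup> and 1.5 x 2<sup>28</sup> tokens, RL has the highest pass rate at every Pass@k level. At 0 tokens, however, SFT starts well above RL (12–18% vs. 3–7%).
* Pass@k cannot decrease as k grows, since any task solved in one attempt is also solved within two or three. The readings for RL, MT, and Base at 0 tokens follow this order, but at 1.1 x 2<sup>26</sup> and 1.5 x 2<sup>28</sup> tokens every method reads Pass@1 > Pass@2 > Pass@3 (as does SFT at 0 tokens), which suggests the shades within each color may be mislabeled in this reading.
* Growth slows but does not stop beyond 1.1 x 2<sup>26</sup> tokens: going from there to 1.5 x 2<sup>28</sup> (about 5.5 times as many tokens) adds 3–10 points per series, compared with gains of up to 52 points (RL Pass@1) between 0 and 1.1 x 2<sup>26</sup> tokens.
* Base has the lowest pass rates at every reported point and the smallest gains: about 14 points for Pass@1 (1% to 15%), versus 55 points for RL Pass@1 (3% to 58%).
* SFT outperforms MT and Base at every reported point and starts highest of all methods at 0 tokens, but trails RL from 1.1 x 2<sup>26</sup> tokens onward.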
### Interpretation
The data suggest that reinforcement learning (RL) is the most effective of the four methods once SWE-Agent SFT tokens are added, followed by supervised fine-tuning (SFT). More SWE-Agent SFT tokens raise pass rates for every method, but with diminishing returns: most of the gain arrives between 0 and 1.1 x 2<sup>26</sup> tokens, and the roughly 5.5-fold increase from there to 1.5 x 2<sup>28</sup> adds only 3–10 points per series. Because the x-axis is logarithmic, each step to the right multiplies the token count, so the flatter late segments mean that each additional doubling of data buys only a few points. Base improves the least, reaching only about 15% Pass@1 at the largest token count, which suggests that SWE-Agent SFT alone is not sufficient and that the additional training behind the other methods contributes substantially. Finally, more attempts should never lower the pass rate, so Pass@2 and Pass@3 ought to sit at or above Pass@1; because the approximate readings order them the other way at most points, the per-method benefit of extra attempts cannot be estimated reliably from this description.
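Because the x-axis is logarithmic, the slowdown can be expressed as percentage points gained per doubling of the token count. A worked example for RL Pass@1 between the last two reported positions, using the approximate readings (55% and 58%):

```latex
\begin{aligned}
\Delta_{\text{doublings}} &= \log_2\!\frac{1.5 \times 2^{28}}{1.1 \times 2^{26}}
  = 2 + \log_2\!\frac{1.5}{1.1} \approx 2 + 0.45 = 2.45 \\
\text{rate}_{\text{RL Pass@1}} &= \frac{58 - 55}{2.45} \approx 1.2 \text{ points per doubling}
\end{aligned}
```

The first interval starts at 0 tokens, which has no finite doubling count, so per-doubling rates can only be compared between nonzero ticks.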