Image 42749e911b86...

EXPERT: healer-alpha-free VERSION 1

RUNTIME: free/openrouter/healer-alpha
INTEL_VERIFIED
## Line Chart: Pass@k Performance Comparison

### Overview
This is a line chart comparing the performance of four different methods (RL, SFT, MT, Base) on a metric called "Pass@k (%)". The chart plots this metric against increasing values of "k" (1, 2, and 3). All four methods show an upward trend as k increases.

### Components/Axes
*   **X-Axis:** Labeled "k". It has three discrete, evenly spaced tick marks at values 1, 2, and 3.
*   **Y-Axis:** Labeled "Pass@k (%)". The scale runs from 0.0 to 12.5, with major grid lines at intervals of 2.5 (0.0, 2.5, 5.0, 7.5, 10.0, 12.5).
*   **Legend:** Located in the bottom-right quadrant of the chart area. It contains four entries, each with a colored line and marker symbol:
    *   **RL:** Red line with a circle marker.
    *   **SFT:** Orange line with a circle marker.
    *   **MT:** Purple line with a circle marker.
    *   **Base:** Blue line with a circle marker.
*   **Data Markers:** Each data series uses an 'x' marker at k=1 and a circle marker at k=2 and k=3.

### Detailed Analysis
The chart displays the following approximate data points for each method. The trend for all series is a positive slope, indicating Pass@k increases with k.

**1. RL (Red Line):**
*   **Trend:** Slopes upward steadily.
*   **Data Points:**
    *   k=1: ~9.0% (marked with a red 'x')
    *   k=2: ~10.5% (red circle)
    *   k=3: ~12.2% (red circle)

**2. SFT (Orange Line):**
*   **Trend:** Slopes upward steadily, with a slope less steep than RL and MT.
*   **Data Points:**
    *   k=1: ~5.5% (orange 'x')
    *   k=2: ~7.6% (orange circle)
    *   k=3: ~9.6% (orange circle)

**3. MT (Purple Line):**
*   **Trend:** Slopes upward with the steepest incline of all series. It starts below RL but surpasses it by k=3.
*   **Data Points:**
    *   k=1: ~6.0% (purple 'x')
    *   k=2: ~10.5% (purple circle) - intersects with RL at this point.
    *   k=3: ~12.7% (purple circle) - the highest value on the chart.

**4. Base (Blue Line):**
*   **Trend:** Slopes upward steadily, maintaining the lowest performance across all k values.
*   **Data Points:**
    *   k=1: ~2.7% (blue 'x')
    *   k=2: ~5.0% (blue circle)
    *   k=3: ~7.9% (blue circle)

### Key Observations
1.  **Universal Improvement:** All four methods show improved Pass@k scores as k increases from 1 to 3.
2.  **MT's Steep Ascent:** The MT method exhibits the most dramatic improvement, nearly doubling its score from k=1 to k=3 and achieving the highest overall score at k=3.
3.  **RL vs. MT Crossover:** RL starts as the top performer at k=1. MT catches up to RL at k=2 (both ~10.5%) and then surpasses it at k=3.
4.  **Consistent Hierarchy at k=1:** At the starting point (k=1), the performance order from highest to lowest is clearly RL > MT > SFT > Base.
5.  **Base as Lower Bound:** The Base method consistently performs the worst but shows a similar rate of improvement to SFT.

### Interpretation
The chart demonstrates the effectiveness of different training or prompting strategies (RL, SFT, MT) compared to a Base model on a task where performance is measured by the probability of getting at least one correct answer in `k` attempts (Pass@k).

*   **The value of `k`:** Increasing `k` (allowing more attempts) universally improves the chance of success for all methods, which is an expected outcome.
*   **Method Efficacy:** All advanced methods (RL, SFT, MT) significantly outperform the Base model at every `k` value, indicating their added value.
*   **Strategic Implications:** The choice of optimal method may depend on the operational constraint for `k`. If only one attempt is allowed (`k=1`), RL is the best choice. However, if multiple attempts are feasible (`k=3`), MT becomes the most effective strategy, suggesting it may be better at generating diverse or high-quality candidate solutions that pay off when more chances are given. The steep slope of MT implies its outputs have higher variance or a better "top-k" distribution, making it more likely to contain a correct answer when more samples are drawn.
*   **SFT's Position:** SFT provides a solid improvement over Base but is consistently outperformed by RL and MT, suggesting that reinforcement learning (RL) or the specific technique used in MT offers advantages beyond supervised fine-tuning alone for this metric.
DECODING INTELLIGENCE...
TECHNICAL ASSET FINGERPRINT

42749e911b86b9e5e78de0ce

FOUND IN PAPERS

EXPERT: healer-alpha-free VERSION 1