EXPERT: gemini-2.0-flash VERSION 1

RUNTIME: nugit/gemini/gemini-2.0-flash

## Stacked Bar Chart: Self-Rewarding vs. SFT Baseline Wins

### Overview
The image is a stacked bar chart comparing two self-rewarding models (M'3 and M'2) against an SFT Baseline. For each comparison, the chart shows the percentage of Self-Rewarding wins, ties, and SFT Baseline wins.

### Components/Axes
*   **Y-axis Labels:**
    *   Self-Rewarding M'3 vs. SFT Baseline
    *   Self-Rewarding M'2 vs. SFT Baseline
*   **X-axis:** No explicit scale; segment lengths represent the percentage of each outcome (Self-Rewarding Wins, Ties, SFT Baseline Wins), with each bar totaling about 100%.
*   **Legend (Top):**
    *   Light Green: Self-Rewarding Wins
    *   Light Blue: Tie
    *   Light Red: SFT Baseline Wins

### Detailed Analysis
The chart presents two horizontal stacked bars, each representing a comparison between a self-rewarding model and the SFT Baseline. Each bar is divided into three segments representing the percentage of Self-Rewarding Wins, Ties, and SFT Baseline Wins.

*   **Self-Rewarding M'3 vs. SFT Baseline (Top Bar):**
    *   Self-Rewarding Wins (Light Green): 50.4%
    *   Tie (Light Blue): 32.8%
    *   SFT Baseline Wins (Light Red): 16.8%
*   **Self-Rewarding M'2 vs. SFT Baseline (Bottom Bar):**
    *   Self-Rewarding Wins (Light Green): 46.5%
    *   Tie (Light Blue): 34.8%
    *   SFT Baseline Wins (Light Red): 18.8%
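To make the stacking concrete, the short Python sketch below computes where each segment starts and ends along the 0 to 100% axis, using the percentages listed above. The dictionary layout and the function name `segment_spans` are illustrative assumptions, not part of the chart.

```python
# Percentages transcribed from the chart; the data layout is illustrative.
RESULTS = {
    "Self-Rewarding M'3 vs. SFT Baseline": (50.4, 32.8, 16.8),
    "Self-Rewarding M'2 vs. SFT Baseline": (46.5, 34.8, 18.8),
}
CATEGORIES = ("Self-Rewarding Wins", "Tie", "SFT Baseline Wins")


def segment_spans(shares):
    """Return the (start, end) position of each stacked segment, left to right."""
    spans = []
    start = 0.0
    for share in shares:
        # Round to one decimal place to hide binary floating-point noise.
        end = round(start + share, 1)
        spans.append((start, end))
        start = end
    return spans


for label, shares in RESULTS.items():
    print(label)
    for name, (lo, hi) in zip(CATEGORIES, segment_spans(shares)):
        print(f"  {name}: {lo:.1f} to {hi:.1f}")
```

For M'3 the segments end at 50.4, 83.2, and 100.0; for M'2 the last segment ends at 100.1, reflecting the rounding of the printed values.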

### Key Observations
*   Self-Rewarding M'3 has a higher win rate (50.4%) than Self-Rewarding M'2 (46.5%).
*   The tie rate is slightly higher for Self-Rewarding M'2 (34.8%) than for Self-Rewarding M'3 (32.8%).
*   The SFT Baseline wins slightly more comparisons against Self-Rewarding M'2 (18.8%) than against Self-Rewarding M'3 (16.8%).
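The observations above are percentage-point differences between the two bars. A minimal Python sketch of that arithmetic follows; the dictionary keys and the `deltas` helper are hypothetical names chosen for illustration.

```python
M3 = {"wins": 50.4, "tie": 32.8, "baseline_wins": 16.8}
M2 = {"wins": 46.5, "tie": 34.8, "baseline_wins": 18.8}


def deltas(newer, older):
    """Percentage-point change per outcome (newer minus older), rounded to 0.1."""
    return {key: round(newer[key] - older[key], 1) for key in newer}


print(deltas(M3, M2))  # {'wins': 3.9, 'tie': -2.0, 'baseline_wins': -2.0}
```

The three changes sum to -0.1 rather than 0 because the M'2 row totals 100.1.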

### Interpretation
The data suggests that both self-rewarding models outperform the SFT Baseline: each wins far more comparisons than it loses. Self-Rewarding M'3 appears slightly stronger than Self-Rewarding M'2, with a higher win rate and a lower SFT Baseline win rate, while the tie rates are close (32.8% vs. 34.8%). Within this evaluation, the results support the effectiveness of self-rewarding training relative to the SFT Baseline, although the chart does not specify the tasks or the judging criteria.

EXPERT: gemini-3-flash-free VERSION 1

RUNTIME: nugit/gemini/gemini-3-flash-preview

# Technical Data Extraction: Model Performance Comparison

## 1. Component Isolation

### Header (Legend)
Located at the top of the image, spanning the width of the chart.
- **Green Box:** Self-Rewarding Wins
- **Light Blue Box:** Tie
- **Red Box:** SFT Baseline Wins

### Main Chart (Horizontal Stacked Bar Chart)
The chart consists of two horizontal bars comparing different iterations of a "Self-Rewarding" model against an "SFT Baseline."

### Y-Axis Labels (Categories)
- **Top Bar:** Self-Rewarding $M'_3$ vs. SFT Baseline
- **Bottom Bar:** Self-Rewarding $M'_2$ vs. SFT Baseline

---

## 2. Data Table Reconstruction

The following table lists the numerical values extracted from the segments of the stacked bar chart. Values are percentages, as implied by each row summing to approximately 100 (the $M'_2$ row totals 100.1, presumably due to rounding).

| Comparison Category | Self-Rewarding Wins (Green) | Tie (Light Blue) | SFT Baseline Wins (Red) |
| :--- | :---: | :---: | :---: |
| **Self-Rewarding $M'_3$ vs. SFT Baseline** | 50.4 | 32.8 | 16.8 |
| **Self-Rewarding $M'_2$ vs. SFT Baseline** | 46.5 | 34.8 | 18.8 |
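Because each value is rounded to one decimal place, a row total can differ from 100 by up to 3 × 0.05 = 0.15. The Python sketch below checks both rows against that bound using exact decimal arithmetic; the tolerance rule and the helper names are assumptions for illustration, not something stated in the chart.

```python
from decimal import Decimal

# Row values as printed in the table, kept as strings for exact decimal math.
ROWS = {
    "M'_3": ("50.4", "32.8", "16.8"),
    "M'_2": ("46.5", "34.8", "18.8"),
}


def row_total(shares):
    """Exact sum of the printed percentages."""
    return sum(Decimal(s) for s in shares)


def consistent_with_rounding(shares, places=1):
    """True if the total is within the worst-case error of rounding each value."""
    half_unit = Decimal(1).scaleb(-places) / 2  # 0.05 for one decimal place
    return abs(row_total(shares) - 100) <= half_unit * len(shares)


for label, shares in ROWS.items():
    print(label, row_total(shares), consistent_with_rounding(shares))
```

Both rows pass: $M'_3$ totals exactly 100.0 and $M'_2$ totals 100.1, well inside the 0.15 bound.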

---

## 3. Trend Verification and Analysis

### Visual Trend Analysis
- **Self-Rewarding Wins (Green):** The green segment is the largest in both rows, indicating the Self-Rewarding models win more frequently than the baseline. The segment for $M'_3$ is visually longer than for $M'_2$.
- **Ties (Blue):** The blue segment is the second largest in both rows. It is slightly larger for $M'_2$ than for $M'_3$.
- **SFT Baseline Wins (Red):** The red segment is the smallest in both rows. It is slightly smaller for $M'_3$ than for $M'_2$.

### Key Findings
1. **Iterative Improvement:** The "Self-Rewarding" model shows improvement from iteration $M'_2$ to $M'_3$. The win rate increases from **46.5%** to **50.4%**.
2. **Dominance over Baseline:** In both iterations, the Self-Rewarding model substantially outperforms the SFT Baseline. In the $M'_3$ iteration, the win rate (50.4%) is exactly triple the baseline's win rate (16.8%).
3. **Fewer Baseline Wins and Ties:** From $M'_2$ to $M'_3$, the SFT Baseline win rate drops from **18.8%** to **16.8%** and the tie rate drops from **34.8%** to **32.8%**; both shares shift toward Self-Rewarding wins.
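The "exactly triple" claim can be checked without floating-point rounding error by parsing the printed values as exact fractions. This is a minimal Python sketch; `win_ratio` is a hypothetical helper.

```python
from fractions import Fraction


def win_ratio(self_wins: str, baseline_wins: str) -> Fraction:
    """Exact ratio of two win rates given as decimal strings such as "50.4"."""
    return Fraction(self_wins) / Fraction(baseline_wins)


m3 = win_ratio("50.4", "16.8")
m2 = win_ratio("46.5", "18.8")
print(m3)                   # 3
print(round(float(m2), 2))  # 2.47
```

With plain floats, `50.4 / 16.8` is not guaranteed to compare equal to 3.0; `Fraction` makes the comparison exact. For $M'_2$ the ratio is 465/188, about 2.47.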

EXPERT: gemma-3-27b-it-free VERSION 1

RUNTIME: google-free/gemma-3-27b-it

## Stacked Bar Chart: Comparison of Self-Rewarding Models vs. SFT Baseline

### Overview
The image presents a stacked bar chart comparing the performance of two self-rewarding models (M'₃ and M'₂) against an SFT Baseline. The chart visualizes the proportion of wins, ties, and losses for each model against the baseline. Each bar represents one comparison and is divided into three segments: "Self-Rewarding Wins", "Tie", and "SFT Baseline Wins".

### Components/Axes
*   **Y-Axis:** Lists the two comparisons: "Self-Rewarding M'₃ vs. SFT Baseline" and "Self-Rewarding M'₂ vs. SFT Baseline".
*   **X-Axis:** Represents the percentage of outcomes. It has no explicit scale markings; each stacked bar spans roughly 0 to 100%.
*   **Legend:** Located at the top-left corner; it defines the color coding:
    *   Green: Self-Rewarding Wins
    *   Light Blue: Tie
    *   Red: SFT Baseline Wins

### Detailed Analysis
The chart consists of two stacked bars, one for each model comparison.

**1. Self-Rewarding M'₃ vs. SFT Baseline:**
*   **Self-Rewarding Wins (Green):** Approximately 50.4%.
*   **Tie (Light Blue):** Approximately 32.8%.
*   **SFT Baseline Wins (Red):** Approximately 16.8%.

**2. Self-Rewarding M'₂ vs. SFT Baseline:**
*   **Self-Rewarding Wins (Green):** Approximately 46.5%.
*   **Tie (Light Blue):** Approximately 34.8%.
*   **SFT Baseline Wins (Red):** Approximately 18.8%.

### Key Observations
*   Both self-rewarding models (M'₃ and M'₂) win more often than the SFT Baseline.
*   Model M'₃ has a slightly higher win rate (50.4%) than Model M'₂ (46.5%).
*   The proportion of ties is similar for both models, at 32.8% and 34.8%.
*   The SFT Baseline win rate is low in both comparisons, at 16.8% and 18.8%.

### Interpretation
The data suggests that both self-rewarding models outperform the SFT Baseline in this comparison, winning far more often than they lose. The difference in win rates between M'₃ and M'₂ suggests that M'₃ is slightly more effective. The substantial tie rate indicates that in roughly a third of comparisons, neither the self-rewarding model nor the baseline is judged clearly better. The chart does not provide information on the nature of the tasks or the criteria for determining a "win", "tie", or "loss", so further investigation would be needed to understand the specific strengths and weaknesses of each model and the reasons for the observed performance differences.

EXPERT: healer-alpha-free VERSION 1

RUNTIME: free/openrouter/healer-alpha

## Horizontal Stacked Bar Chart: Self-Rewarding vs. SFT Baseline Comparison

### Overview
The image displays a horizontal stacked bar chart comparing the performance of two "Self-Rewarding" models (M'₃ and M'₂) against an "SFT Baseline" model. The chart sorts outcomes into three categories: wins for the Self-Rewarding model, ties, and wins for the SFT Baseline. The data is presented as percentages.

### Components/Axes
*   **Legend:** Positioned at the top center of the chart. It defines three color-coded categories:
    *   **Green:** "Self-Rewarding Wins"
    *   **Light Blue:** "Tie"
    *   **Red:** "SFT Baseline Wins"
*   **Bars:** Two horizontal bars, one above the other, both starting at the left edge.
    *   **Top Bar:** Labeled "Self-Rewarding M'₃ vs. SFT Baseline" on the left.
    *   **Bottom Bar:** Labeled "Self-Rewarding M'₂ vs. SFT Baseline" on the left.
*   **Data Labels:** Numerical percentage values are embedded within each colored segment of the bars.

### Detailed Analysis
The chart presents the following data points for each comparison:

**1. Self-Rewarding M'₃ vs. SFT Baseline (Top Bar)**
*   **Self-Rewarding Wins (Green, left segment):** 50.4%
*   **Tie (Light Blue, middle segment):** 32.8%
*   **SFT Baseline Wins (Red, right segment):** 16.8%

**2. Self-Rewarding M'₂ vs. SFT Baseline (Bottom Bar)**
*   **Self-Rewarding Wins (Green, left segment):** 46.5%
*   **Tie (Light Blue, middle segment):** 34.8%
*   **SFT Baseline Wins (Red, right segment):** 18.8%

**Visual Trend Verification:**
*   In both bars, the green segment ("Self-Rewarding Wins") is the largest, occupying roughly half of the total bar length (46.5% to 50.4%).
*   The light blue segment ("Tie") is the second-largest.
*   The red segment ("SFT Baseline Wins") is the smallest in both cases.
*   Each bar totals approximately 100% (50.4 + 32.8 + 16.8 = 100.0 for M'₃; 46.5 + 34.8 + 18.8 = 100.1 for M'₂, with the 0.1 excess likely due to rounding).

### Key Observations
1.  **Dominant Performance:** The Self-Rewarding model wins more often than the SFT Baseline in both comparisons (50.4% vs. 16.8% for M'₃; 46.5% vs. 18.8% for M'₂).
2.  **Significant Tie Rate:** Roughly one-third of outcomes are ties (32.8% to 34.8%).
3.  **Model Version Comparison:** The M'₃ version of the Self-Rewarding model performs better against the baseline (50.4% wins) than the M'₂ version (46.5% wins). Correspondingly, M'₃ has a lower tie rate (32.8% vs. 34.8%) and a lower baseline win rate (16.8% vs. 18.8%).
4.  **Consistent Hierarchy:** The order of performance (Self-Rewarding Wins > Ties > SFT Baseline Wins) is consistent across both model versions tested.

### Interpretation
This chart provides a quantitative evaluation in which the "Self-Rewarding" training approach outperforms a standard "SFT (Supervised Fine-Tuning) Baseline." The Self-Rewarding model is judged superior about three times as often as the baseline for M'₃ (50.4% vs. 16.8%) and about 2.5 times as often for M'₂ (46.5% vs. 18.8%).

The high tie rate is an important finding. It indicates that in a large fraction of cases (~33-35%), the two models produce outputs of comparable quality, or outputs the evaluation criteria cannot distinguish. This could imply that the baseline remains competitive in many scenarios, or that the evaluation has a quality threshold that both models frequently meet.

The comparison between M'₃ and M'₂ suggests iterative improvement: M'₃ achieves a higher win rate and lower tie and loss rates against the same baseline. Reporting ties alongside wins and losses gives a more nuanced view than a simple win/loss ratio.
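One way to set the ties aside, following the tie-rate discussion above, is to compute the share of decisive (non-tied) comparisons won by the Self-Rewarding model, wins / (wins + baseline wins). This metric is not reported in the chart; the Python sketch below, including the `decisive_win_rate` name, is an illustrative assumption.

```python
def decisive_win_rate(wins: float, baseline_wins: float) -> float:
    """Fraction of non-tied comparisons won by the self-rewarding model."""
    decisive = wins + baseline_wins
    if decisive <= 0:
        raise ValueError("no decisive (non-tied) comparisons")
    return wins / decisive


for label, wins, losses in [("M'3", 50.4, 16.8), ("M'2", 46.5, 18.8)]:
    print(label, round(decisive_win_rate(wins, losses), 3))
```

Excluding ties, the Self-Rewarding model wins about 75% of decisive comparisons for M'₃ and about 71% for M'₂.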
