## Horizontal Stacked Bar Chart: Pairwise Model Performance Comparison
### Overview
The image displays a horizontal stacked bar chart comparing three "Self-Rewarding" models (M1, M2, M3) across three pairwise matchups. Each bar splits the outcomes of one matchup into three categories: "Left Wins," "Tie," and "Right Wins," where "left" and "right" refer to the order in which the two models are named in the matchup label.
### Components/Axes
* **Legend:** Positioned at the top center. It defines three categories:
    * **Left Wins (in Left vs. Right):** Represented by a bright green color.
    * **Tie:** Represented by a light blue color.
    * **Right Wins:** Represented by a red/salmon color.
* **Y-Axis (Vertical):** Lists the three model comparison matchups. From top to bottom:
    1. `Self-Rewarding M3 vs. M2`
    2. `Self-Rewarding M2 vs. M1`
    3. `Self-Rewarding M3 vs. M1`
* **X-Axis (Horizontal):** Implicitly represents percentage (0-100%), though no axis line or labels are drawn. The total length of each bar represents 100% of outcomes.
* **Data Labels:** Numerical percentage values are embedded directly within each colored segment of the bars.
### Detailed Analysis
Each bar is segmented into three parts corresponding to the legend. The values are as follows:
1. **Top Bar: `Self-Rewarding M3 vs. M2`**
    * **Left Wins (Green):** 47.7%
    * **Tie (Light Blue):** 39.8%
    * **Right Wins (Red):** 12.5%
    * *Trend Check:* The green segment (Left Wins) is the largest, followed by a substantial tie segment, with the red segment (Right Wins) being the smallest.
2. **Middle Bar: `Self-Rewarding M2 vs. M1`**
    * **Left Wins (Green):** 55.5%
    * **Tie (Light Blue):** 32.8%
    * **Right Wins (Red):** 11.7%
    * *Trend Check:* The green segment is larger than in the first bar, the tie segment is smaller, and the red segment is slightly smaller.
3. **Bottom Bar: `Self-Rewarding M3 vs. M1`**
    * **Left Wins (Green):** 68.8%
    * **Tie (Light Blue):** 22.7%
    * **Right Wins (Red):** 8.6%
    * *Trend Check:* Compared with the other two bars, this bar has the largest green segment, the smallest tie segment, and the smallest red segment.
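The transcribed labels can be sanity-checked by confirming that each bar's three segments add up to roughly 100%. The following is a minimal sketch; the names `RESULTS` and `segment_total` are illustrative and do not come from the chart.

```python
# Percentages transcribed from the data labels in the chart.
# Tuple order: (left wins, tie, right wins).
RESULTS = {
    "Self-Rewarding M3 vs. M2": (47.7, 39.8, 12.5),
    "Self-Rewarding M2 vs. M1": (55.5, 32.8, 11.7),
    "Self-Rewarding M3 vs. M1": (68.8, 22.7, 8.6),
}


def segment_total(segments):
    """Sum the three segments, rounded to one decimal like the labels."""
    return round(sum(segments), 1)


totals = {name: segment_total(seg) for name, seg in RESULTS.items()}

for name, total in totals.items():
    # Allow a small tolerance because each label is itself rounded.
    status = "ok" if abs(total - 100.0) <= 0.2 else "check transcription"
    print(f"{name}: {total:.1f}% ({status})")
```

The top and middle bars sum to 100.0%, while the bottom bar sums to 100.1%, which is consistent with each label having been rounded to one decimal place.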
### Key Observations
* **Clear Performance Hierarchy:** The "Left Wins" percentage increases from the top bar (47.7%) through the middle bar (55.5%) to the bottom bar (68.8%). The highest value belongs to the only matchup between models two iterations apart (M3 vs. M1), which exceeds both adjacent matchups (M3 vs. M2, M2 vs. M1). This indicates that the performance gap is wider between models further apart in the sequence.
* **Inverse Relationship with Ties:** As the "Left Wins" percentage increases, the "Tie" percentage decreases correspondingly (39.8% -> 32.8% -> 22.7%). This suggests that clearer victories become more common when comparing more dissimilar models.
* **Consistently Low "Right Wins":** The "Right Wins" percentage is low across all comparisons (12.5%, 11.7%, 8.6%), indicating that the model listed on the right (the older model in each pair) rarely outperforms the one on the left.
* **Spatial Layout:** The legend is centered at the top. The bars are left-aligned with their labels. The numerical data labels are centered within their respective colored segments.
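Because the tie share differs considerably between bars, it can help to look at how the non-tied outcomes split. The sketch below computes the left model's share of decided comparisons; `decisive_share` is a hypothetical helper, and this metric is an additional view on the transcribed numbers rather than something shown in the chart.

```python
# Percentages transcribed from the chart's data labels.
# Tuple order: (left wins, tie, right wins).
RESULTS = {
    "M3 vs. M2": (47.7, 39.8, 12.5),
    "M2 vs. M1": (55.5, 32.8, 11.7),
    "M3 vs. M1": (68.8, 22.7, 8.6),
}


def decisive_share(left, tie, right):
    """Share of non-tied outcomes won by the left model.

    The tie value is accepted for convenience but deliberately ignored.
    """
    return left / (left + right)


shares = {name: decisive_share(*seg) for name, seg in RESULTS.items()}

for name, share in shares.items():
    print(f"{name}: left model wins {share:.1%} of decided comparisons")
```

With ties set aside, the left model takes roughly 79%, 83%, and 89% of the decided comparisons in the three matchups, so the newer model is preferred in every pairing even where ties are common.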
### Interpretation
The data strongly suggests a consistent and improving performance trend across the Self-Rewarding model versions M1, M2, and M3. The chart demonstrates that:
1. **Iterative Improvement:** Each subsequent model (M2 > M1, M3 > M2) outperforms its predecessor in a head-to-head comparison.
2. **Magnitude of Improvement:** The largest win rate appears in the two-iteration comparison, M3 vs. M1 (68.8%), which exceeds both single-iteration comparisons, M2 vs. M1 (55.5%) and M3 vs. M2 (47.7%). This is consistent with gains accumulating across iterations. It does not by itself show accelerating returns: the second step (M3 over M2) has a lower win rate and a higher tie rate than the first step (M2 over M1).
3. **Reduction in Ambiguity:** The tie rate is lowest (22.7%) for the most widely separated pair, M3 vs. M1, which suggests that quality differences are easier to judge when the models are two iterations apart. Between adjacent models the tie rate is higher, rising from 32.8% (M2 vs. M1) to 39.8% (M3 vs. M2). The oldest model (M1) is rarely judged superior to the newest (M3), as shown by the 8.6% "Right Wins" in that matchup.
In essence, the chart provides quantitative evidence that each Self-Rewarding iteration is preferred over its predecessors, with M3 being the most advanced and M1 the baseline.
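One way to compare the size of the two single-iteration steps is to look at the net margin (left wins minus right wins) for each adjacent matchup. This is a rough sketch under the assumption that net margin is a reasonable proxy for the size of an improvement; pairwise win rates are not an additive scale, so the result should be read as indicative only. The names `ADJACENT_STEPS` and `net_margin` are illustrative.

```python
# (left wins, right wins) percentages transcribed from the chart; ties omitted.
ADJACENT_STEPS = {
    "M1 -> M2": (55.5, 11.7),  # bar "Self-Rewarding M2 vs. M1"
    "M2 -> M3": (47.7, 12.5),  # bar "Self-Rewarding M3 vs. M2"
}


def net_margin(left_wins, right_wins):
    """Percentage-point gap between the newer and the older model."""
    return left_wins - right_wins


margins = {
    step: round(net_margin(*pair), 1) for step, pair in ADJACENT_STEPS.items()
}

# True would mean the later step shows the bigger margin.
second_step_larger = margins["M2 -> M3"] > margins["M1 -> M2"]

for step, margin in margins.items():
    print(f"{step}: net margin {margin:.1f} percentage points")
print("Second step larger than first:", second_step_larger)
```

On these numbers the first step has a net margin of 43.8 points and the second 35.2 points, so the chart supports steady improvement from M1 to M3 but not a growing per-iteration gain.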