# Technical Data Extraction: Comparative Performance Analysis of Translation Models

This document extracts the data from two side-by-side bar charts that compare translation performance across loss objectives (left chart) and preference-data sources (right chart).

## 1. General Layout and Shared Metadata

*   **Image Type:** Two grouped bar charts.
*   **Y-Axis (Shared):**
    *   **Label:** `Avg. Ref-free Eval`
    *   **Scale:** Linear, ranging from 80 to 90.
    *   **Markers:** 80, 82, 84, 86, 88, 90.
*   **Legend (Shared):**
    *   **Location:** Top-left of each chart.
    *   **Blue (Diagonal Hatching /):** `xx→en` (Translation from various languages to English).
    *   **Orange (Diagonal Hatching \):** `en→xx` (Translation from English to various languages).
*   **Visual Trend (General):** In every category across both charts, the `en→xx` (orange) series consistently outperforms the `xx→en` (blue) series.

---

## 2. Left Chart: Impact of Loss Objective

This chart compares three training objectives: the preference loss alone, the negative log-likelihood (NLL) loss alone, and the sum of the two.

### Component Isolation: Loss Objective
*   **X-Axis Label:** `Loss Objective`
*   **Categories:** $\mathcal{L}_{\text{prefer}}$, $\mathcal{L}_{\text{NLL}}$, $\mathcal{L}_{\text{prefer}} + \mathcal{L}_{\text{NLL}}$

### Data Table: Loss Objective Performance

| Loss Objective | xx→en (Blue) | en→xx (Orange) |
| :--- | :---: | :---: |
| $\mathcal{L}_{\text{prefer}}$ | 82.81 | 85.50 |
| $\mathcal{L}_{\text{NLL}}$ | 83.78 | 85.84 |
| $\mathcal{L}_{\text{prefer}} + \mathcal{L}_{\text{NLL}}$ | 84.29 | 87.71 |

### Trend Analysis
*   **xx→en (Blue):** Rises steadily from preference-only to combined loss: +0.97 from $\mathcal{L}_{\text{prefer}}$ to $\mathcal{L}_{\text{NLL}}$, then +0.51 to the combined objective ($+1.48$ points in total).
*   **en→xx (Orange):** Changes little between $\mathcal{L}_{\text{prefer}}$ and $\mathcal{L}_{\text{NLL}}$ (+0.34), then jumps by +1.87 when the two are combined ($+2.21$ points in total).
*   **Conclusion:** The combination of both loss functions yields the highest performance for both translation directions.
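The step-wise and total gains above can be re-derived from the table with a few lines of Python. This is only a sketch of the arithmetic on the transcribed scores; the dictionary keys are plain-text stand-ins for the LaTeX labels, and the dictionary order follows the x-axis order of the chart.

```python
# Avg. Ref-free Eval scores transcribed from the Loss Objective table,
# listed in x-axis order (left to right).
loss_objective = {
    "L_prefer": {"xx→en": 82.81, "en→xx": 85.50},
    "L_NLL": {"xx→en": 83.78, "en→xx": 85.84},
    "L_prefer + L_NLL": {"xx→en": 84.29, "en→xx": 87.71},
}


def gains(table, direction):
    """Return (changes between consecutive bars, first-to-last change)."""
    scores = [row[direction] for row in table.values()]
    # Round to two decimals to match the precision of the chart labels.
    steps = [round(b - a, 2) for a, b in zip(scores, scores[1:])]
    total = round(scores[-1] - scores[0], 2)
    return steps, total


for direction in ("xx→en", "en→xx"):
    steps, total = gains(loss_objective, direction)
    print(f"{direction}: steps {steps}, total {total:+.2f}")
```

Rounding each difference to two decimals keeps floating-point noise (for example 0.9699999…) out of the printed values.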

---

## 3. Right Chart: Impact of Preference Data Source

This chart evaluates performance based on the source of the data used to determine preferences.

### Component Isolation: Preference Data
*   **X-Axis Label:** `Preference Data`
*   **Categories:** `ALMA + Ref`, `GPT-4 + Ref`, `All of them`

### Data Table: Preference Data Performance

| Preference Data | xx→en (Blue) | en→xx (Orange) |
| :--- | :---: | :---: |
| ALMA + Ref | 83.70 | 86.99 |
| GPT-4 + Ref | 84.20 | 86.66 |
| All of them | 84.29 | 87.71 |

### Trend Analysis
*   **xx→en (Blue):** Improves from `ALMA + Ref` (83.70) to `GPT-4 + Ref` (84.20, +0.50) and peaks at "All of them" (84.29), only +0.09 above `GPT-4 + Ref`.
*   **en→xx (Orange):** `ALMA + Ref` (86.99) scores slightly higher than `GPT-4 + Ref` (86.66), but combining all sources ("All of them") gives the highest score (87.71), +0.72 above the best single-source setting.
*   **Conclusion:** Using all available preference data sources gives the highest score in both translation directions.
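How much "All of them" adds over the best single-source setting can be checked directly from the table. The sketch below assumes, as the chart labels suggest, that `ALMA + Ref` and `GPT-4 + Ref` are the single-source settings and "All of them" is their combination; the helper name `gain_over_best_single_source` is hypothetical.

```python
# Avg. Ref-free Eval scores transcribed from the Preference Data table.
preference_data = {
    "ALMA + Ref": {"xx→en": 83.70, "en→xx": 86.99},
    "GPT-4 + Ref": {"xx→en": 84.20, "en→xx": 86.66},
    "All of them": {"xx→en": 84.29, "en→xx": 87.71},
}


def gain_over_best_single_source(table, direction, combined="All of them"):
    """Return (best single-source setting, margin of the combined setting over it)."""
    singles = {name: row[direction] for name, row in table.items() if name != combined}
    best = max(singles, key=singles.get)
    return best, round(table[combined][direction] - singles[best], 2)


for direction in ("xx→en", "en→xx"):
    best, margin = gain_over_best_single_source(preference_data, direction)
    print(f"{direction}: best single source = {best}, 'All of them' adds {margin:+.2f}")
```

The best single source differs by direction (`GPT-4 + Ref` for xx→en, `ALMA + Ref` for en→xx), which is why the margin is computed against the per-direction maximum rather than a fixed baseline.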

---

## 4. Summary of Key Findings
*   **Highest Performance:** The peak performance for both charts is achieved with the combination of $\mathcal{L}_{\text{prefer}} + \mathcal{L}_{\text{NLL}}$ using "All of them" preference data, reaching **84.29** for `xx→en` and **87.71** for `en→xx`.
*   **Directional Bias:** English-to-other-languages (`en→xx`) scores higher than other-languages-to-English (`xx→en`) in every configuration, by between 2.06 points (under $\mathcal{L}_{\text{NLL}}$) and 3.42 points (under the combined configuration).
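The directional gap can be computed per configuration by merging both tables. This sketch assumes, as the summary does, that the combined-loss bar and the "All of them" bar are the same configuration (their scores are identical), so that pair is counted once.

```python
# (xx→en, en→xx) score pairs transcribed from both data tables.
# Assumption: "L_prefer + L_NLL" and "All of them" are one configuration.
scores = {
    "L_prefer": (82.81, 85.50),
    "L_NLL": (83.78, 85.84),
    "L_prefer + L_NLL / All of them": (84.29, 87.71),
    "ALMA + Ref": (83.70, 86.99),
    "GPT-4 + Ref": (84.20, 86.66),
}

# Gap = en→xx score minus xx→en score, rounded to the chart's precision.
gaps = {name: round(en_xx - xx_en, 2) for name, (xx_en, en_xx) in scores.items()}

for name, gap in gaps.items():
    print(f"{name}: en→xx leads by {gap:.2f}")
print(f"gap range: {min(gaps.values()):.2f} to {max(gaps.values()):.2f}")
```

Every gap is positive, and the range runs from 2.06 (`L_NLL`) to 3.42 (the combined configuration).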