## Line Chart: Sensitivity to Top-K

### Overview
This image is a line chart titled "Sensitivity to Top-K" that compares four methods (Perplexity, LN-Entropy, Lexical Similarity, and EigenScore) by their AUROC (Area Under the Receiver Operating Characteristic curve). The chart shows how each method's AUROC responds to changes in the "Top-K" parameter, which takes the values 3, 5, 10, 20, 30, and 50. All four lines are nearly flat, which indicates high stability across the tested methods.
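For readers unfamiliar with the y-axis quantity, AUROC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison. The function name `auroc` and the sample scores are hypothetical and are not taken from the chart; multiplying by 100 is an assumption based on the chart's 35 to 90 axis range, which appears to be in percent.

```python
def auroc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical detector scores, for illustration only (not data from the chart).
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]

# Scale to percent to match the assumed convention of the chart's y-axis.
auroc_percent = 100 * auroc(pos, neg)
print(f"AUROC = {auroc_percent:.1f}")
```

An AUROC of 50 corresponds to random ranking, so every series on this chart sits above chance.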

### Components/Axes
*   **Title**: "Sensitivity to Top-K" (Centered at the top).
*   **Y-Axis**:
    *   **Label**: "AUROC" (Vertical, left side).
    *   **Scale**: Numerical, ranging from 35 to 90. Major tick marks are visible at 40, 50, 60, 70, 80, and 90.
*   **X-Axis**:
    *   **Label**: "Top-K" (Horizontal, bottom center).
    *   **Scale**: Categorical/Discrete values: 3, 5, 10, 20, 30, 50.
*   **Legend**: Located in the **bottom-right** quadrant within a white box with a thin grey border. It contains four entries:
    *   **Blue line with 'x' markers**: Perplexity
    *   **Grey line with diamond (♦) markers**: LN-Entropy
    *   **Teal/Cyan line with circle (●) markers**: Lexical Similarity
    *   **Orange line with star (★) markers**: EigenScore
*   **Line Style**: All data series use a dash-dot line pattern.

### Content Details

#### Data Table (Approximate Values)
The following table reconstructs the data points based on visual alignment with the Y-axis scale. Values are estimated with an uncertainty of approximately ±0.5 units.

| Top-K | Perplexity (Blue x) | LN-Entropy (Grey ♦) | Lexical Similarity (Teal ●) | EigenScore (Orange ★) |
| :--- | :---: | :---: | :---: | :---: |
| **3** | ~64.0 | ~67.0 | ~74.0 | ~79.0 |
| **5** | ~64.0 | ~67.5 | ~75.0 | ~80.5 |
| **10** | ~64.0 | ~68.5 | ~75.0 | ~79.0 |
| **20** | ~64.0 | ~68.0 | ~73.5 | ~80.0 |
| **30** | ~64.0 | ~68.5 | ~74.0 | ~80.5 |
| **50** | ~64.0 | ~69.0 | ~76.0 | ~80.0 |

#### Trend Verification
*   **EigenScore (Orange)**: Positioned at the top of the chart. The line is relatively flat with minor fluctuations, maintaining a high AUROC around 80.
*   **Lexical Similarity (Teal)**: Positioned second from the top. It shows slight volatility, with a small dip at K=20 and a peak at K=50, generally staying between 73 and 76.
*   **LN-Entropy (Grey)**: Positioned third from the top. It exhibits a very slight upward trend as K increases, moving from ~67 to ~69.
*   **Perplexity (Blue)**: Positioned at the bottom. The line appears horizontal at ~64, indicating no visible sensitivity to the Top-K parameter within this range.
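The trend statements above can be checked against the estimated table values. The following is a minimal sketch that assumes the approximate readings from the data table (each with roughly ±0.5 uncertainty); the variable names are illustrative.

```python
# Approximate AUROC readings from the data table, in Top-K order.
TOP_K = [3, 5, 10, 20, 30, 50]
AUROC = {
    "Perplexity":         [64.0, 64.0, 64.0, 64.0, 64.0, 64.0],
    "LN-Entropy":         [67.0, 67.5, 68.5, 68.0, 68.5, 69.0],
    "Lexical Similarity": [74.0, 75.0, 75.0, 73.5, 74.0, 76.0],
    "EigenScore":         [79.0, 80.5, 79.0, 80.0, 80.5, 80.0],
}

# Spread: distance between the highest and lowest reading of each series.
spread = {name: max(vals) - min(vals) for name, vals in AUROC.items()}

# Net change: reading at the largest K minus reading at the smallest K.
net_change = {name: vals[-1] - vals[0] for name, vals in AUROC.items()}

for name in AUROC:
    print(f"{name:<20} spread={spread[name]:.1f}  net change={net_change[name]:+.1f}")
```

With these readings, no series spans more than 2.5 points, and LN-Entropy is the only series whose net change equals its full spread, which matches the description of a slight upward trend.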

### Key Observations
*   **Performance Ranking**: There is a clear and consistent hierarchy across all values of K: EigenScore > Lexical Similarity > LN-Entropy > Perplexity.
*   **Parameter Robustness**: All four methods are stable. As Top-K increases from 3 to 50, no series varies by more than about 2.5 AUROC points, and none shows a significant degradation or improvement.
*   **Perplexity Invariance**: The Perplexity metric appears completely unaffected by the Top-K setting, suggesting its calculation might be independent of this specific parameter in the context of this experiment.
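The ranking claim can be verified at every Top-K value rather than by eye. This sketch again assumes the approximate table readings; `EXPECTED` and the other names are illustrative.

```python
# Approximate AUROC readings from the data table, in Top-K order.
TOP_K = [3, 5, 10, 20, 30, 50]
AUROC = {
    "Perplexity":         [64.0, 64.0, 64.0, 64.0, 64.0, 64.0],
    "LN-Entropy":         [67.0, 67.5, 68.5, 68.0, 68.5, 69.0],
    "Lexical Similarity": [74.0, 75.0, 75.0, 73.5, 74.0, 76.0],
    "EigenScore":         [79.0, 80.5, 79.0, 80.0, 80.5, 80.0],
}
EXPECTED = ["EigenScore", "Lexical Similarity", "LN-Entropy", "Perplexity"]

ranking_holds = True
min_adjacent_gap = float("inf")
for i, k in enumerate(TOP_K):
    # Sort methods from best to worst at this Top-K value.
    order = sorted(AUROC, key=lambda name: AUROC[name][i], reverse=True)
    ranking_holds = ranking_holds and (order == EXPECTED)
    # Track the smallest gap between neighbouring methods in the ranking.
    for hi, lo in zip(order, order[1:]):
        min_adjacent_gap = min(min_adjacent_gap, AUROC[hi][i] - AUROC[lo][i])

print("ranking holds at every K:", ranking_holds)
print("smallest gap between adjacent methods:", min_adjacent_gap)
```

The smallest gap between neighbouring methods (about 3 points) is well above the ±0.5 reading uncertainty, so the ordering does not depend on the precision of the estimates.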

### Interpretation
The data suggests that the choice of the "Top-K" hyperparameter is not critical for the performance of these metrics in the evaluated task. This is a positive finding for practitioners: the metrics are robust to this setting, so extensive tuning of Top-K is unlikely to change AUROC by more than a few points.

**EigenScore** is the strongest of the four metrics, leading Lexical Similarity by roughly 5 points and Perplexity by roughly 16 points when averaged over all K values. Its stability near 80 AUROC indicates that it is a reliable choice at both the smallest (K=3) and largest (K=50) settings tested. The slight upward trend in **LN-Entropy** suggests it might benefit marginally from larger K values, whereas **Lexical Similarity** fluctuates more (about 2.5 points between its lowest and highest readings), possibly due to inherent variability in lexical overlap as K changes.
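The margins quoted above can be recomputed from the table. The sketch below averages the per-K difference between EigenScore and each other metric, assuming the approximate table readings; the names are illustrative.

```python
# Approximate AUROC readings from the data table, in Top-K order (3, 5, 10, 20, 30, 50).
AUROC = {
    "Perplexity":         [64.0, 64.0, 64.0, 64.0, 64.0, 64.0],
    "LN-Entropy":         [67.0, 67.5, 68.5, 68.0, 68.5, 69.0],
    "Lexical Similarity": [74.0, 75.0, 75.0, 73.5, 74.0, 76.0],
    "EigenScore":         [79.0, 80.5, 79.0, 80.0, 80.5, 80.0],
}

best = AUROC["EigenScore"]

# Mean per-K lead of EigenScore over each of the other metrics.
mean_margin = {
    name: sum(b - v for b, v in zip(best, vals)) / len(best)
    for name, vals in AUROC.items()
    if name != "EigenScore"
}

for name, margin in mean_margin.items():
    print(f"EigenScore leads {name} by {margin:.2f} points on average")
```

Because the inputs are visual estimates, these margins are only meaningful to about one point.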