Image d2eb8f591f5b...

EXPERT: healer-alpha-free VERSION 1

## Bar Chart: Refusal Ratio by Training Set and Testing Condition

### Overview
This is a grouped bar chart comparing the "Refusal Ratio (%)" of a system (likely an AI model) under two training set conditions ("UH Only" and "AH Only"), each evaluated on three distinct testing sets. The chart shows how the model's tendency to refuse varies with its training data and with the type of test prompt (factual association, associated hallucination, or unassociated hallucination).

### Components/Axes
*   **Chart Type:** Grouped Bar Chart.
*   **X-Axis (Horizontal):** Labeled **"Training Set"**. It contains two categorical groups:
    1.  **"UH Only"** (Left group)
    2.  **"AH Only"** (Right group)
*   **Y-Axis (Vertical):** Labeled **"Refusal Ratio (%)"**. It is a linear scale ranging from 0 to 100, with major gridlines at intervals of 20 (0, 20, 40, 60, 80, 100).
*   **Legend:** Located in the **top-right corner** of the chart area, titled **"Testing set"**. It defines three data series by color:
    *   **Green square:** **"Factual Asso."** (Factual Association)
    *   **Blue square:** **"Asso. Hallu."** (Associated Hallucination)
    *   **Red/Salmon square:** **"Unasso. Halluc."** (Unassociated Hallucination)

### Detailed Analysis
The chart presents the following approximate refusal ratios for each training set and testing condition combination. Values are estimated based on bar height relative to the y-axis gridlines.

**1. Training Set: "UH Only" (Left Group)**
*   **Testing Set: Factual Asso. (Green Bar):** Approximately **30%**, a moderate refusal rate.
*   **Testing Set: Asso. Hallu. (Blue Bar):** Approximately **28%**, marginally lower than the green bar.
*   **Testing Set: Unasso. Halluc. (Red Bar):** Approximately **82%**. This is the tallest bar in the entire chart, far above the other two conditions in this group.

**2. Training Set: "AH Only" (Right Group)**
*   **Testing Set: Factual Asso. (Green Bar):** Approximately **22%**, lower than under "UH Only" training for the same test set.
*   **Testing Set: Asso. Hallu. (Blue Bar):** Approximately **33%**, the tallest bar within the "AH Only" group.
*   **Testing Set: Unasso. Halluc. (Red Bar):** Approximately **24%**, far lower than the corresponding bar in the "UH Only" group and comparable to the factual condition within its own group.
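
The estimated values above can be collected into a small data structure for later comparison. This is a minimal sketch: the numbers are the approximate bar heights read off the chart, not exact data, and the names `REFUSAL` and `format_table` are hypothetical helpers introduced here for illustration.

```python
# Approximate refusal ratios (%) estimated from bar heights; not exact data.
REFUSAL = {
    "UH Only": {"Factual Asso.": 30, "Asso. Hallu.": 28, "Unasso. Halluc.": 82},
    "AH Only": {"Factual Asso.": 22, "Asso. Hallu.": 33, "Unasso. Halluc.": 24},
}


def format_table(data):
    """Render the nested dict as a fixed-width text table (one row per training set)."""
    tests = list(next(iter(data.values())))
    rows = [f"{'Training set':<14}" + "".join(f"{t:>18}" for t in tests)]
    for train, by_test in data.items():
        cells = "".join(f"{str(by_test[t]) + '%':>18}" for t in tests)
        rows.append(f"{train:<14}" + cells)
    return "\n".join(rows)


print(format_table(REFUSAL))
```

Keeping the estimates in one place makes it easier to revise them if the exact figures behind the chart become available.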

### Key Observations
1.  **Dominant Outlier:** The refusal ratio for **"Unasso. Halluc."** when the model is trained on **"UH Only"** (~82%) is dramatically higher than any other data point in the chart. It is more than triple the value of the same test condition under "AH Only" training.
2.  **Training Set Impact:** The training set fundamentally alters the model's refusal profile:
    *   **"UH Only" Training:** Produces a model that refuses the large majority (~82%) of "Unasso. Halluc." prompts while keeping moderate, similar refusal rates for "Factual Asso." (~30%) and "Asso. Hallu." (~28%).
    *   **"AH Only" Training:** Creates a model with a more balanced refusal profile across all test types, with the highest refusal rate (~33%) directed at "Asso. Hallu." prompts.
3.  **Reversal of Hallucination Sensitivity:** The model's sensitivity to hallucination type flips based on training. "UH Only" training leads to extreme sensitivity to *Unassociated* Hallucinations. "AH Only" training leads to the highest sensitivity to *Associated* Hallucinations.
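
The observations above can be checked arithmetically against the estimated bar heights. The following is a sketch under the assumption that the approximate readings are accurate enough for this purpose; `REFUSAL` and `highest_refusal_test` are hypothetical names, not part of any source material.

```python
# Approximate refusal ratios (%) estimated from the chart; not exact data.
REFUSAL = {
    "UH Only": {"Factual Asso.": 30, "Asso. Hallu.": 28, "Unasso. Halluc.": 82},
    "AH Only": {"Factual Asso.": 22, "Asso. Hallu.": 33, "Unasso. Halluc.": 24},
}


def highest_refusal_test(by_test):
    """Return the testing set with the largest refusal ratio."""
    return max(by_test, key=by_test.get)


# Observation 1: the UH-trained model refuses "Unasso. Halluc." prompts
# more than three times as often as the AH-trained model.
uh_ratio = REFUSAL["UH Only"]["Unasso. Halluc."] / REFUSAL["AH Only"]["Unasso. Halluc."]
print(f"UH Only / AH Only on Unasso. Halluc.: {uh_ratio:.2f}x")  # 3.42x

# Observation 3: the most-refused testing set flips with the training set.
for train, by_test in REFUSAL.items():
    print(f"{train}: highest refusal on {highest_refusal_test(by_test)}")
```

With these estimates the ratio is about 3.42, and the most-refused testing set is "Unasso. Halluc." under "UH Only" training and "Asso. Hallu." under "AH Only" training. Because the inputs are visual estimates, the ratio should be read as approximate.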

### Interpretation
This chart suggests that **training data composition strongly shapes refusal behavior**. The "Refusal Ratio" likely measures how often the model declines to answer a prompt, possibly because of safety filters or uncertainty; the chart itself does not define the metric.

*   **What the data suggests:** The model's refusal behavior does not appear to generalize across hallucination types; instead it tracks the type of hallucination seen during training. Training only on "Unassociated Hallucinations" (UH) yields a very high refusal rate (~82%) on UH test prompts, which may reflect over-correction, although the chart alone cannot show whether those refusals are appropriate. Training only on "Associated Hallucinations" (AH) yields a flatter profile, with a modest increase in refusals (~33%) on the hallucination type it was trained on.
*   **Relationship between elements:** The stark contrast between the two red bars ("Unasso. Halluc."), ~82% under "UH Only" versus ~24% under "AH Only", is the central finding. It suggests that "UH Only" training may produce more uneven or brittle behavior than "AH Only" training, which yields more uniform refusal rates (roughly 22% to 33%) across the three test sets.
*   **Implication:** For developers, this highlights the critical importance of **training data composition**. To build a model that refuses appropriately and consistently, the training data must carefully represent the spectrum of issues (like different hallucination types) the model will face. Relying on a narrow set of negative examples (like only UH) can create unintended and extreme behaviors.
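
One way to quantify how uniform each refusal profile is: compute the spread (maximum minus minimum) per training set, and each hallucination test set's gap over the factual baseline. This is only an illustrative sketch on the estimated bar heights. Treating "Factual Asso." as a baseline is an assumption made here, and `spread` and `gap_over_baseline` are hypothetical helper names.

```python
# Approximate refusal ratios (%) estimated from the chart; not exact data.
REFUSAL = {
    "UH Only": {"Factual Asso.": 30, "Asso. Hallu.": 28, "Unasso. Halluc.": 82},
    "AH Only": {"Factual Asso.": 22, "Asso. Hallu.": 33, "Unasso. Halluc.": 24},
}
BASELINE = "Factual Asso."  # assumption: factual prompts serve as the reference


def spread(by_test):
    """Difference between the highest and lowest refusal ratio, in points."""
    return max(by_test.values()) - min(by_test.values())


def gap_over_baseline(by_test, baseline=BASELINE):
    """Refusal ratio of each non-baseline test set minus the baseline ratio."""
    return {t: v - by_test[baseline] for t, v in by_test.items() if t != baseline}


for train, by_test in REFUSAL.items():
    print(train, "spread:", spread(by_test), "gaps:", gap_over_baseline(by_test))
```

On these estimates the spread is 54 points for "UH Only" and 11 points for "AH Only". The gaps over the factual baseline are -2 and +52 points for "UH Only" and +11 and +2 points for "AH Only", which is consistent with the reading that each model raises its refusal rate mainly on the hallucination type it was trained on.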