## Histograms: Probability of Correctness for Answerable and Unanswerable Questions
### Overview
The image presents two histograms, stacked vertically. The top histogram shows the distribution of the probability of correctness, P(correct), for "Answerable" questions; the bottom histogram shows the same distribution for "Unanswerable" questions. Each histogram overlays two data series, "Zero-Shot" (pink) and "Trained" (purple), representing a model in these two settings. The x-axis shows P(correct) with tick labels from 30% to 90%, and the y-axis shows Density with tick labels from 1 to 5.
### Components/Axes
* **Title (Top):** Answerable
* **Title (Bottom):** Unanswerable
* **X-axis Label:** P(correct)
* **X-axis Scale:** 30%, 50%, 70%, 90%
* **Y-axis Label:** Density
* **Y-axis Scale:** 1, 2, 3, 4, 5
* **Legend:**
    * Zero-Shot: Pink color
    * Trained: Purple color
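A density axis with values above 1 can look surprising next to a probability axis. The sketch below shows one common way such a histogram is normalized, so that bar heights times bin widths sum to 1. It assumes P(correct) is stored as a fraction in [0, 1] and only displayed as a percentage; the function name `density_histogram` and the sample scores are hypothetical and are not taken from the figure.

```python
def density_histogram(values, lo=0.0, hi=1.0, bins=20):
    """Return (bin_edges, densities) normalized so that the total area is 1."""
    if not values:
        raise ValueError("values must be non-empty")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if not lo <= v <= hi:
            raise ValueError(f"value {v} is outside [{lo}, {hi}]")
        # Clamp the index so that v == hi falls into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(values)
    # Density = count / (sample size * bin width), so narrow bins can exceed 1.
    densities = [c / (n * width) for c in counts]
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, densities


# Hypothetical P(correct) scores, expressed as fractions rather than percentages.
scores = [0.72, 0.74, 0.75, 0.76, 0.78, 0.81, 0.68, 0.55, 0.77, 0.73]
edges, densities = density_histogram(scores, bins=10)
area = sum(d * (edges[i + 1] - edges[i]) for i, d in enumerate(densities))
print("max density:", max(densities), "total area:", round(area, 6))
```

Under this normalization, a peak density near 5 simply means that a large share of the scores falls within a narrow range of P(correct).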
### Detailed Analysis
**Top Histogram (Answerable):**
* **Trained (Purple):** The distribution is unimodal, peaking around 70-80% P(correct). The density rises from approximately 1.5 at 30% to a maximum of approximately 4.8 at around 75%, then declines to approximately 1.5 at 90%.
* **Zero-Shot (Pink):** The distribution is also unimodal, but it is more spread out and has a lower peak density. It peaks around 80-85% P(correct), slightly above the Trained peak. The density rises from approximately 0.5 at 30% to a maximum of approximately 3.5 at around 85%, then declines to approximately 0.5 at 90%.
**Bottom Histogram (Unanswerable):**
* **Trained (Purple):** The distribution is unimodal, peaking around 30-40% P(correct). The density rises from approximately 0 at 30% to a maximum of approximately 5 at around 35%, then declines to approximately 0.5 at 90%.
* **Zero-Shot (Pink):** The distribution is unimodal, peaking around 50-60% P(correct). The density rises from approximately 0 at 30% to a maximum of approximately 3 at around 55%, then declines to approximately 0.5 at 90%.
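The approximate readings above can be collected into a small table and compared programmatically. This is a minimal sketch: the numbers are the visual estimates quoted in the bullets, not exact data, and the names `readings` and `compare` are illustrative.

```python
# Approximate peak locations (percent) and peak densities, as read off the plot.
readings = {
    ("Answerable", "Trained"): {"peak_x": 75, "peak_density": 4.8},
    ("Answerable", "Zero-Shot"): {"peak_x": 85, "peak_density": 3.5},
    ("Unanswerable", "Trained"): {"peak_x": 35, "peak_density": 5.0},
    ("Unanswerable", "Zero-Shot"): {"peak_x": 55, "peak_density": 3.0},
}


def compare(panel, key):
    """Return the name of the series with the larger value of `key` in a panel."""
    trained = readings[(panel, "Trained")][key]
    zero_shot = readings[(panel, "Zero-Shot")][key]
    if trained == zero_shot:
        return "tie"
    return "Trained" if trained > zero_shot else "Zero-Shot"


for panel in ("Answerable", "Unanswerable"):
    print(
        panel,
        "| peak further right:", compare(panel, "peak_x"),
        "| taller peak:", compare(panel, "peak_density"),
    )
```

With these estimates, the Trained series has the taller peak in both panels, while the Zero-Shot peak sits further right in both panels.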
### Key Observations
* For "Answerable" questions, both series peak at high P(correct) values: around 75% for the "Trained" model and around 85% for the "Zero-Shot" model.
* For "Unanswerable" questions, the "Trained" model has a lower peak probability of correctness compared to the "Zero-Shot" model.
* Both models place most of their density at higher P(correct) values for "Answerable" questions than for "Unanswerable" questions.
* The "Trained" model shows a sharper peak in the "Answerable" histogram, indicating a more concentrated performance around a specific P(correct) value.
### Interpretation
The data suggests that training changes how the model's P(correct) is distributed across the two question types. On answerable questions, the trained model's probabilities are concentrated in a narrow band around 70-80%, whereas the zero-shot probabilities are spread more widely. On unanswerable questions, the trained model assigns markedly lower probabilities (peak around 35%) than the zero-shot model (peak around 55%). This could indicate that the training process has taught the model to recognize when a question is unanswerable and to avoid making confident but incorrect predictions.

The difference in distribution shapes between answerable and unanswerable questions reflects the model's ability to differentiate between the two types of questions. Judged by peak locations, the gap is wider for the trained model (roughly 75% versus 35%) than for the zero-shot model (roughly 85% versus 55%), so the trained model shows the stronger separation. The zero-shot model, lacking this training, appears to attempt to answer all questions, even those that are unanswerable, and assigns them moderate probabilities of correctness.
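One way to put a number on how well P(correct) separates the two question types is the area under the ROC curve: the probability that a randomly chosen answerable question receives a higher P(correct) than a randomly chosen unanswerable one. The sketch below is an assumption about how such a check could be done, not a metric shown in the figure, and the scores are hypothetical.

```python
def auroc(positive_scores, negative_scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This is the pairwise (Mann-Whitney) form of AUROC.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical P(correct) values, not data from the figure.
answerable = [0.75, 0.80, 0.70, 0.65]
unanswerable = [0.35, 0.40, 0.70, 0.30]
separation = auroc(answerable, unanswerable)
print("AUROC:", separation)
```

A value near 1.0 would mean the two distributions barely overlap, while 0.5 would mean P(correct) carries no information about answerability.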