## Scatter Plots: JS Divergence vs. Entropy for ProofWriter - Llama Models
### Overview
The image presents two scatter plots comparing JS Divergence against Entropy for two Llama models: Llama3.1-8B (top plot) and Llama3.1-70B (bottom plot). Each plot visualizes the relationship between these two metrics for correct (T) and incorrect (F) predictions. A dashed horizontal line is present in both plots, likely representing a threshold. Vertical dashed lines are also present, labeled "Lowest Entropy (Incorrect)".
### Components/Axes
Both plots share the following components:
* **X-axis:** Labeled "Entropy", ranging from 0 to approximately 40 in the top plot and 0 to 70 in the bottom plot.
* **Y-axis:** Labeled "JS Divergence", ranging from 0.04 to 0.12 in the top plot and 0.04 to 0.14 in the bottom plot.
* **Title:** Top plot: "ProofWriter - Llama3.1-8B". Bottom plot: "ProofWriter - Llama3.1-70B".
* **Legend:** Located in the top-right corner of each plot.
* Blue circles: "Correct (T)"
* Red circles: "Incorrect (F)"
* **Horizontal Dashed Line:** Present in both plots, approximately at JS Divergence = 0.075.
* **Vertical Dashed Line:** Present in both plots, labeled "Lowest Entropy (Incorrect)". The line is positioned at approximately Entropy = 10 in the top plot and Entropy = 20 in the bottom plot.
### Detailed Analysis or Content Details
**Plot 1: ProofWriter - Llama3.1-8B**
* **Correct (T) - Blue Circles:** The data points generally cluster in the lower-left region of the plot (low Entropy, low JS Divergence). The points are densely packed between Entropy 0-10 and JS Divergence 0.04-0.08. As Entropy increases beyond 10, the points become more scattered, with some extending to Entropy ~40 and JS Divergence ~0.11.
* **Incorrect (F) - Red Circles:** These points are more dispersed across the plot. A significant number of points are located in the region of higher Entropy (10-40) and higher JS Divergence (0.07-0.12). There is a noticeable concentration of points around Entropy 10-20 and JS Divergence 0.08-0.10.
* **Trend:** The blue points show a slight upward trend, indicating that as Entropy increases, JS Divergence tends to increase, but the correlation is weak. The red points are more scattered, with no clear trend.
**Plot 2: ProofWriter - Llama3.1-70B**
* **Correct (T) - Blue Circles:** Similar to the 8B model, the blue points are concentrated in the lower-left region (low Entropy, low JS Divergence). The density is highest between Entropy 0-15 and JS Divergence 0.04-0.08. The points extend to Entropy ~60 and JS Divergence ~0.13.
* **Incorrect (F) - Red Circles:** These points are also dispersed, but appear to be more concentrated at higher JS Divergence values compared to the 8B model. There's a cluster around Entropy 20-40 and JS Divergence 0.09-0.13.
* **Trend:** The blue points exhibit a similar slight upward trend as in the 8B model. The red points are scattered, with no strong trend.
### Key Observations
* **Model Size Impact:** The 70B model appears to have a wider distribution of both correct and incorrect predictions across the Entropy and JS Divergence space compared to the 8B model.
* **Threshold:** The horizontal dashed line seems to separate a region of generally correct predictions (below the line) from a region of more incorrect predictions (above the line).
* **Lowest Entropy:** The vertical dashed line highlights that incorrect predictions tend to occur at lower entropy values.
* **Overlap:** There is significant overlap between the correct and incorrect predictions in both plots, indicating that neither Entropy nor JS Divergence alone are perfect predictors of correctness.
### Interpretation
These plots explore the relationship between Entropy and JS Divergence as indicators of prediction correctness for the ProofWriter task using two different sizes of the Llama model. Entropy, in this context, likely represents the confidence or predictability of the model's output. JS Divergence measures the difference between the model's predicted probability distribution and the true distribution.
The data suggests that lower Entropy and lower JS Divergence are generally associated with correct predictions. However, the overlap between the correct and incorrect predictions indicates that these metrics are not definitive. The 70B model's wider distribution suggests that it explores a broader range of possibilities, potentially leading to both more accurate and more diverse (and sometimes incorrect) predictions.
The "Lowest Entropy (Incorrect)" line suggests that when the model is *too* confident (low Entropy) in an incorrect prediction, it is particularly problematic. This could indicate that the model is overconfident in its errors. The horizontal dashed line acts as a threshold for JS Divergence, above which predictions are more likely to be incorrect. The fact that the 70B model has more points above this threshold suggests that while it can explore more possibilities, it also has a higher chance of making more divergent (and incorrect) predictions.
The plots provide insights into the model's behavior and potential areas for improvement. Focusing on reducing JS Divergence, especially for low-entropy predictions, could lead to more reliable results.