## Heatmap: Confidence Progression Across Question IDs and Iterations
### Overview
The image is a horizontal heatmap visualizing confidence progression for 10 questions (Question IDs 1–10) across 10 iterations. Confidence levels are color-coded, with a vertical color bar indicating percentages from -100% (dark purple) to 100% (red). A green "terminated" region occupies the right half of the chart (Iterations 6–10), indicating that confidence tracking ended early for every question.

---
### Components/Axes
- **Y-Axis (Question ID)**: Labeled "Question ID" with discrete categories 1–10.
- **X-Axis (Number of Iterations)**: Labeled "Number of Iterations" with values 1–10.
- **Color Bar**: Vertical gradient from red (100%) to dark purple (-100%), with intermediate labels at 50%, 0%, and -50%. The green "terminated" region is separate from the color bar.
- **Legend**: Explicitly maps colors to confidence levels:
  - Red: 100%
  - Orange: 50%
  - Yellow: 0%
  - Purple: -50%
  - Dark Purple: -100%
  - Green: "terminated"
---
### Detailed Analysis
- **Question 1**:
  - Iteration 1: Red (100%)
  - Iteration 2: Orange (50%)
  - Iteration 3: Yellow (0%)
  - Iteration 4: Purple (-50%)
  - Iteration 5: Dark Purple (-100%)
  - Iterations 6–10: Green ("terminated")
- **Question 2**:
  - Iteration 1: Red (100%)
  - Iteration 2: Orange (50%)
  - Iteration 3: Yellow (0%)
  - Iteration 4: Purple (-50%)
  - Iteration 5: Dark Purple (-100%)
  - Iterations 6–10: Green ("terminated")
- **Question 3**:
  - Iteration 1: Red (100%)
  - Iteration 2: Orange (50%)
  - Iteration 3: Yellow (0%)
  - Iteration 4: Purple (-50%)
  - Iteration 5: Dark Purple (-100%)
  - Iterations 6–10: Green ("terminated")
- **Question 4**:
  - Iteration 1: Red (100%)
  - Iteration 2: Orange (50%)
  - Iteration 3: Yellow (0%)
  - Iteration 4: Purple (-50%)
  - Iteration 5: Dark Purple (-100%)
  - Iterations 6–10: Green ("terminated")
- **Question 5**:
  - Iteration 1: Red (100%)
  - Iteration 2: Orange (50%)
  - Iteration 3: Yellow (0%)
  - Iteration 4: Purple (-50%)
  - Iteration 5: Dark Purple (-100%)
  - Iterations 6–10: Green ("terminated")
- **Question 6**:
  - Iteration 1: Red (100%)
  - Iteration 2: Orange (50%)
  - Iteration 3: Yellow (0%)
  - Iteration 4: Purple (-50%)
  - Iteration 5: Dark Purple (-100%)
  - Iterations 6–10: Green ("terminated")
- **Question 7**:
  - Iteration 1: Red (100%)
  - Iteration 2: Orange (50%)
  - Iteration 3: Yellow (0%)
  - Iteration 4: Purple (-50%)
  - Iteration 5: Dark Purple (-100%)
  - Iterations 6–10: Green ("terminated")
- **Question 8**:
  - Iteration 1: Red (100%)
  - Iteration 2: Orange (50%)
  - Iteration 3: Yellow (0%)
  - Iteration 4: Purple (-50%)
  - Iteration 5: Dark Purple (-100%)
  - Iterations 6–10: Green ("terminated")
- **Question 9**:
  - Iteration 1: Red (100%)
  - Iteration 2: Orange (50%)
  - Iteration 3: Yellow (0%)
  - Iteration 4: Purple (-50%)
  - Iteration 5: Dark Purple (-100%)
  - Iterations 6–10: Green ("terminated")
- **Question 10**:
  - Iteration 1: Red (100%)
  - Iteration 2: Orange (50%)
  - Iteration 3: Yellow (0%)
  - Iteration 4: Purple (-50%)
  - Iteration 5: Dark Purple (-100%)
  - Iterations 6–10: Green ("terminated")
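
Because every question is listed with the same ten cells, the grid can be written down compactly as data. The sketch below is one way to do that in Python, assuming the per-cell values above are read correctly from the image; the names `TERMINATED`, `ACTIVE_VALUES`, and `build_grid` are hypothetical and chosen only for illustration.

```python
# Illustrative reconstruction of the described grid; all names are hypothetical.
TERMINATED = "terminated"  # marker for the green cells

# Confidence in percent for Iterations 1-5, identical for every question.
ACTIVE_VALUES = [100, 50, 0, -50, -100]
NUM_ITERATIONS = 10
NUM_QUESTIONS = 10


def build_grid():
    """Map each Question ID to its ten cells, one per iteration."""
    padding = [TERMINATED] * (NUM_ITERATIONS - len(ACTIVE_VALUES))
    return {qid: ACTIVE_VALUES + padding for qid in range(1, NUM_QUESTIONS + 1)}


grid = build_grid()
print(grid[1])  # cells for Question 1, Iterations 1-10
```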
---
### Key Observations
1. **Early Termination**: All questions are marked terminated (green) from Iteration 6 onward, immediately after confidence reaches the bottom of the scale (-100%) at Iteration 5.
2. **Consistent Decline**: Confidence for all questions follows the same trajectory: starting at 100% (red) and falling by 50 percentage points per iteration, through 50% (orange), 0% (yellow), and -50% (purple), to -100% (dark purple) at Iteration 5.
3. **Uniformity**: No question deviates from the pattern, suggesting the cause is systemic rather than specific to any one question.
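
The observations above can be checked mechanically against a single row of the chart. The following is a minimal sketch, assuming the row values listed in the Detailed Analysis; the helper names `step_sizes` and `first_terminated_iteration` are hypothetical and not part of any tool that produced the chart.

```python
def step_sizes(values):
    """Differences between consecutive confidence values."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def first_terminated_iteration(row, marker="terminated"):
    """1-based iteration of the first terminated cell, or None if absent."""
    for iteration, cell in enumerate(row, start=1):
        if cell == marker:
            return iteration
    return None


# One row as described for every question: five values, then five green cells.
row = [100, 50, 0, -50, -100] + ["terminated"] * 5
active = [cell for cell in row if cell != "terminated"]

steps = step_sizes(active)
print(steps)                            # [-50, -50, -50, -50]
print(first_terminated_iteration(row))  # 6
```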
---
### Interpretation
The chart shows confidence collapsing for every question. The uniform decline to termination suggests several possible causes:
- **Systemic Model Instability**: The model's confidence falls at the same rate across iterations, regardless of question ID.
- **Data Quality Issues**: The consistent drop may indicate flawed training data or overfitting.
- **Threshold Sensitivity**: Every question reaches the apparent termination threshold (-100%) by Iteration 5, which may indicate overly strict confidence requirements.

This pattern points to a need for model retraining, data augmentation, or confidence calibration to prevent premature termination.