# Technical Document Extraction: Heatmap Analysis
## Overview
The image contains two side-by-side heatmaps, one per dataset, that report accuracy for each combination of document count (x-axis) and shot size (y-axis). Both heatmaps use a color gradient from blue (low values) to red (high values) to represent performance.
---
## Left Heatmap: TriviaQA Accuracy
### Axes Labels
- **X-Axis (Horizontal):** Document Counts (`1-Doc`, `2-Doc`, `5-Doc`, `10-Doc`, `20-Doc`, `50-Doc`, `100-Doc`)
- **Y-Axis (Vertical):** Shot Sizes (`2^0-Shot`, `2^2-Shot`, `2^4-Shot`, `2^6-Shot`, `2^8-Shot`)
### Key Data Points
| Shot Size | 1-Doc | 2-Doc | 5-Doc | 10-Doc | 20-Doc | 50-Doc | 100-Doc |
|-------------|-------|-------|-------|--------|--------|--------|---------|
| 2^0-Shot | 63.6 | 64.7 | 66.6 | 67.6 | 68.6 | 65.9 | 68.2 |
| 2^2-Shot | 64.0 | 65.5 | 66.9 | 67.7 | 68.5 | 69.0 | 65.7 |
| 2^4-Shot | 65.2 | 66.5 | 67.5 | 67.9 | 68.6 | 66.8 | - |
| 2^6-Shot | 65.5 | 66.1 | 67.3 | 67.5 | 68.2 | - | - |
| 2^8-Shot | 65.5 | 66.4 | 67.2 | 67.6 | - | - | - |
### Trends
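- **General Pattern:** Within each row, accuracy rises with document count up to 20-Doc (up to 10-Doc for `2^8-Shot`, whose 20-Doc cell is empty). Beyond 20-Doc the behavior is mixed: `2^0-Shot` dips to 65.9 at 50-Doc and recovers to 68.2 at 100-Doc, `2^2-Shot` climbs to 69.0 at 50-Doc, and `2^4-Shot` drops to 66.8 at 50-Doc.
- **Extremes:**
  - `2^2-Shot` achieves the highest accuracy (69.0) at 50-Doc.
  - `2^0-Shot` records the lowest accuracy (63.6) at 1-Doc.
- **Anomaly:** `2^2-Shot` at 100-Doc drops to 65.7, which is 3.3 points below its 50-Doc peak and lower than every value in that row from 5-Doc onward.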
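
The extremes and per-row peaks listed above can be checked mechanically. The following is a minimal sketch, assuming the table values were transcribed correctly from the image; the names `DOCS`, `TRIVIAQA`, `cells`, and `row_peaks` are illustrative and do not come from the source figure.

```python
# Values transcribed from the TriviaQA table; None marks a cell shown as "-".
DOCS = ["1-Doc", "2-Doc", "5-Doc", "10-Doc", "20-Doc", "50-Doc", "100-Doc"]
TRIVIAQA = {
    "2^0-Shot": [63.6, 64.7, 66.6, 67.6, 68.6, 65.9, 68.2],
    "2^2-Shot": [64.0, 65.5, 66.9, 67.7, 68.5, 69.0, 65.7],
    "2^4-Shot": [65.2, 66.5, 67.5, 67.9, 68.6, 66.8, None],
    "2^6-Shot": [65.5, 66.1, 67.3, 67.5, 68.2, None, None],
    "2^8-Shot": [65.5, 66.4, 67.2, 67.6, None, None, None],
}


def cells(table):
    """Yield (shot, doc, value) for every populated cell."""
    for shot, row in table.items():
        for doc, value in zip(DOCS, row):
            if value is not None:
                yield shot, doc, value


def row_peaks(table):
    """Map each shot size to the (doc, value) pair with its highest accuracy."""
    peaks = {}
    for shot, doc, value in cells(table):
        if shot not in peaks or value > peaks[shot][1]:
            peaks[shot] = (doc, value)
    return peaks


best = max(cells(TRIVIAQA), key=lambda cell: cell[2])
worst = min(cells(TRIVIAQA), key=lambda cell: cell[2])
print("highest:", best)
print("lowest:", worst)
for shot, (doc, value) in row_peaks(TRIVIAQA).items():
    print(f"{shot}: peak {value} at {doc}")
```

Representing empty cells as `None` rather than `0` keeps them out of the `min` and `max` computations, so a missing measurement is never mistaken for a low score.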
---
## Right Heatmap: NaturalQ Accuracy
### Axes Labels
- **X-Axis (Horizontal):** Document Counts (`1-Doc`, `2-Doc`, `5-Doc`, `10-Doc`, `20-Doc`, `50-Doc`, `100-Doc`)
- **Y-Axis (Vertical):** Shot Sizes (`2^0-Shot`, `2^2-Shot`, `2^4-Shot`, `2^6-Shot`, `2^8-Shot`)
### Key Data Points
| Shot Size | 1-Doc | 2-Doc | 5-Doc | 10-Doc | 20-Doc | 50-Doc | 100-Doc |
|-------------|-------|-------|-------|--------|--------|--------|---------|
| 2^0-Shot | 44.8 | 49.3 | 53.5 | 54.0 | 54.6 | 42.3 | 45.6 |
| 2^2-Shot | 45.2 | 48.5 | 52.0 | 53.2 | 53.0 | 41.5 | 44.0 |
| 2^4-Shot | 45.3 | 49.0 | 51.7 | 52.3 | 52.7 | 44.2 | - |
| 2^6-Shot | 45.8 | 49.4 | 52.2 | 52.7 | 52.0 | - | - |
| 2^8-Shot | 45.2 | 48.4 | 51.2 | 50.7 | - | - | - |
### Trends
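- **General Pattern:** Accuracy peaks between 5-Doc and 20-Doc, then declines sharply at 50-Doc wherever that cell is populated, with a partial recovery at 100-Doc. The peak falls at 20-Doc for `2^0-Shot` and `2^4-Shot`, at 10-Doc for `2^2-Shot` and `2^6-Shot`, and at 5-Doc for `2^8-Shot`.
- **Extremes:**
  - `2^0-Shot` achieves the highest accuracy (54.6) at 20-Doc.
  - `2^2-Shot` records the lowest accuracy (41.5) at 50-Doc.
- **Anomaly:** At 50-Doc, accuracy falls below the 1-Doc value in every populated row (e.g., `2^0-Shot`: 42.3 vs. 44.8). The `2^8-Shot` row has no values from 20-Doc onward (marked as `-`).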
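
To see where each NaturalQ row peaks and how large the change from 20-Doc to 50-Doc is, the table can be processed as below. This is a sketch under the assumption that the transcribed values are accurate; `NATURALQ`, `peak_doc`, and `change_20_to_50` are hypothetical names introduced only for illustration.

```python
# Values transcribed from the NaturalQ table; None marks a cell shown as "-".
DOCS = ["1-Doc", "2-Doc", "5-Doc", "10-Doc", "20-Doc", "50-Doc", "100-Doc"]
NATURALQ = {
    "2^0-Shot": [44.8, 49.3, 53.5, 54.0, 54.6, 42.3, 45.6],
    "2^2-Shot": [45.2, 48.5, 52.0, 53.2, 53.0, 41.5, 44.0],
    "2^4-Shot": [45.3, 49.0, 51.7, 52.3, 52.7, 44.2, None],
    "2^6-Shot": [45.8, 49.4, 52.2, 52.7, 52.0, None, None],
    "2^8-Shot": [45.2, 48.4, 51.2, 50.7, None, None, None],
}


def peak_doc(row):
    """Return the document-count label at which a row reaches its maximum."""
    populated = [(doc, value) for doc, value in zip(DOCS, row) if value is not None]
    return max(populated, key=lambda pair: pair[1])[0]


peaks = {shot: peak_doc(row) for shot, row in NATURALQ.items()}

# Change in accuracy when moving from 20-Doc to 50-Doc, for rows with both cells.
i20, i50 = DOCS.index("20-Doc"), DOCS.index("50-Doc")
change_20_to_50 = {
    shot: round(row[i50] - row[i20], 1)
    for shot, row in NATURALQ.items()
    if row[i20] is not None and row[i50] is not None
}

print(peaks)
print(change_20_to_50)
```

Only three rows have both a 20-Doc and a 50-Doc value, so the size of the 50-Doc decline is supported by three measurements rather than five.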
---
## Comparative Analysis
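1. **TriviaQA vs. NaturalQ:**
   - TriviaQA outperforms NaturalQ in every cell that is populated in both heatmaps; the lowest TriviaQA value (63.6) exceeds the highest NaturalQ value (54.6).
   - Example: At 20-Doc, TriviaQA's `2^0-Shot` (68.6) vs. NaturalQ's `2^0-Shot` (54.6).
2. **Document Count Impact:**
   - Both datasets improve from 1-Doc up to roughly 20-Doc, then diverge. TriviaQA shifts by at most 3.3 points between adjacent columns beyond 20-Doc and reaches its overall maximum (69.0) at 50-Doc, whereas NaturalQ drops sharply at 50-Doc.
   - Example: NaturalQ's `2^0-Shot` drops 12.3 points, from 54.6 at 20-Doc to 42.3 at 50-Doc; TriviaQA's `2^0-Shot` drops 2.7 points over the same step (68.6 to 65.9).
3. **Shot Size Efficiency:**
   - In TriviaQA, higher shot sizes help mainly at low document counts (63.6 for `2^0-Shot` vs. 65.5 for `2^8-Shot` at 1-Doc), and the benefit disappears by 10-Doc (67.6 for both).
   - In NaturalQ, higher shot sizes add at most one point at 1-Doc (44.8 for `2^0-Shot` vs. 45.8 for `2^6-Shot`) and reduce accuracy from 5-Doc through 20-Doc (53.5 for `2^0-Shot` vs. 51.2 for `2^8-Shot` at 5-Doc).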
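
The cell-by-cell comparison between the two datasets can be sketched as follows. The code assumes the two tables were transcribed accurately and compares only cells populated in both; `gaps`, `smallest`, and `largest` are illustrative names rather than anything defined by the source.

```python
# Values transcribed from both tables; None marks a cell shown as "-".
DOCS = ["1-Doc", "2-Doc", "5-Doc", "10-Doc", "20-Doc", "50-Doc", "100-Doc"]
TRIVIAQA = {
    "2^0-Shot": [63.6, 64.7, 66.6, 67.6, 68.6, 65.9, 68.2],
    "2^2-Shot": [64.0, 65.5, 66.9, 67.7, 68.5, 69.0, 65.7],
    "2^4-Shot": [65.2, 66.5, 67.5, 67.9, 68.6, 66.8, None],
    "2^6-Shot": [65.5, 66.1, 67.3, 67.5, 68.2, None, None],
    "2^8-Shot": [65.5, 66.4, 67.2, 67.6, None, None, None],
}
NATURALQ = {
    "2^0-Shot": [44.8, 49.3, 53.5, 54.0, 54.6, 42.3, 45.6],
    "2^2-Shot": [45.2, 48.5, 52.0, 53.2, 53.0, 41.5, 44.0],
    "2^4-Shot": [45.3, 49.0, 51.7, 52.3, 52.7, 44.2, None],
    "2^6-Shot": [45.8, 49.4, 52.2, 52.7, 52.0, None, None],
    "2^8-Shot": [45.2, 48.4, 51.2, 50.7, None, None, None],
}

# TriviaQA minus NaturalQ, rounded to one decimal to match the table precision.
gaps = {
    (shot, doc): round(t - n, 1)
    for shot in TRIVIAQA
    for doc, t, n in zip(DOCS, TRIVIAQA[shot], NATURALQ[shot])
    if t is not None and n is not None
}

smallest = min(gaps, key=gaps.get)
largest = max(gaps, key=gaps.get)
print("cells compared:", len(gaps))
print("smallest gap:", smallest, gaps[smallest])
print("largest gap:", largest, gaps[largest])
print("TriviaQA ahead everywhere:", all(gap > 0 for gap in gaps.values()))
```

The gap is widest at 50-Doc because that is where NaturalQ falls most steeply while TriviaQA stays close to its 20-Doc level.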
---
## Color Legend (Inferred)
- **Blue:** Low accuracy (40-50 range)
- **Red:** High accuracy (60-70 range)
- **Gradient:** Intermediate values (50-60 range)
---
## Missing Data
- Both heatmaps (TriviaQA and NaturalQ) have missing values, marked as `-`, in the same six cells:
  - `2^4-Shot` at 100-Doc
  - `2^6-Shot` at 50-Doc and 100-Doc
  - `2^8-Shot` at 20-Doc, 50-Doc, and 100-Doc
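
The list of empty cells can be recovered directly from the pipe-delimited table rows. The sketch below parses the rows as plain text and collects every cell that holds `-`; the helper names `split_row` and `missing_cells` are hypothetical, and the row strings are copied from the two tables in this document.

```python
HEADER = "| Shot Size | 1-Doc | 2-Doc | 5-Doc | 10-Doc | 20-Doc | 50-Doc | 100-Doc |"

TRIVIAQA_ROWS = """\
| 2^0-Shot | 63.6 | 64.7 | 66.6 | 67.6 | 68.6 | 65.9 | 68.2 |
| 2^2-Shot | 64.0 | 65.5 | 66.9 | 67.7 | 68.5 | 69.0 | 65.7 |
| 2^4-Shot | 65.2 | 66.5 | 67.5 | 67.9 | 68.6 | 66.8 | - |
| 2^6-Shot | 65.5 | 66.1 | 67.3 | 67.5 | 68.2 | - | - |
| 2^8-Shot | 65.5 | 66.4 | 67.2 | 67.6 | - | - | - |
"""

NATURALQ_ROWS = """\
| 2^0-Shot | 44.8 | 49.3 | 53.5 | 54.0 | 54.6 | 42.3 | 45.6 |
| 2^2-Shot | 45.2 | 48.5 | 52.0 | 53.2 | 53.0 | 41.5 | 44.0 |
| 2^4-Shot | 45.3 | 49.0 | 51.7 | 52.3 | 52.7 | 44.2 | - |
| 2^6-Shot | 45.8 | 49.4 | 52.2 | 52.7 | 52.0 | - | - |
| 2^8-Shot | 45.2 | 48.4 | 51.2 | 50.7 | - | - | - |
"""


def split_row(line):
    """Split one pipe-delimited table row into stripped cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def missing_cells(header, rows):
    """Return (shot, doc) pairs for every cell whose content is '-'."""
    docs = split_row(header)[1:]  # drop the "Shot Size" column label
    found = []
    for line in rows.strip().splitlines():
        shot, *values = split_row(line)
        found.extend((shot, doc) for doc, value in zip(docs, values) if value == "-")
    return found


trivia_missing = missing_cells(HEADER, TRIVIAQA_ROWS)
natural_missing = missing_cells(HEADER, NATURALQ_ROWS)
print(trivia_missing)
print("same gaps in both heatmaps:", trivia_missing == natural_missing)
```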
---
## Conclusion
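The heatmaps show that TriviaQA achieves higher accuracy than NaturalQ in every cell populated in both. NaturalQ peaks between 5-Doc and 20-Doc, depending on shot size, and degrades sharply at 50-Doc, while TriviaQA stays within a few points of its 20-Doc values and reaches its overall maximum (69.0) at 50-Doc. Both heatmaps are missing the same six cells at high shot sizes and document counts, so further investigation is needed before drawing conclusions about those configurations.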