# Technical Data Extraction: Normalized Performance vs. Number of Shots
## 1. Document Overview
This image contains four line/scatter plots arranged horizontally, comparing the "Normalized Performance" (y-axis) against the "Number of Shots" (x-axis) across four different datasets. Each plot includes data for four different document conditions (0-Doc, 1-Doc, 10-Doc, 100-Doc).
## 2. Global Metadata and Legend
* **Legend Location:** Top center of the image, above the charts.
* **Legend Categories:**
    * **0-Doc:** Blue dashed line with blue circular markers.
    * **1-Doc:** Orange dashed line with orange circular markers.
    * **10-Doc:** Purple dashed line with purple circular markers.
    * **100-Doc:** Grey dashed line with grey circular markers.
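* **Common Y-Axis:** "Normalized Performance" (tick labels from -3 to 2 in increments of 1; a few points fall slightly below -3).
* **Common X-Axis:** "Number of Shots" (logarithmic scale with an additional tick at 0: 0, $10^0$, $10^1$, $10^2$). A pure logarithmic axis cannot show 0, so this is likely a symmetric-log or broken axis.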
* **Visual Features:** Each data series includes a shaded confidence interval or variance band of the same color as the line.
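
The x-axis lists 0 alongside powers of ten, which a plain logarithmic scale cannot represent. As a minimal sketch, assuming the 0 tick is drawn one step to the left of $10^0$ (the image does not state the exact transform), shot counts could be mapped to horizontal positions as follows. The function name `shot_position` is hypothetical and used only for illustration.

```python
import math

def shot_position(n_shots: int) -> float:
    """Map a shot count to an x position with evenly spaced decade ticks.

    Assumption: 0 shots sits one unit to the left of 10^0, as on a
    symlog-style axis. The source image does not state the transform.
    """
    if n_shots < 0:
        raise ValueError("shot count must be non-negative")
    if n_shots == 0:
        return 0.0
    return 1.0 + math.log10(n_shots)

# The tick labels 0, 10^0, 10^1, 10^2 land at evenly spaced positions.
ticks = [0, 1, 10, 100]
positions = [shot_position(n) for n in ticks]
print(positions)
```

Under this assumed mapping, the largest shot count mentioned in the description (256) falls a little to the right of the $10^2$ tick.
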
---
## 3. Component Analysis by Dataset
### A. Bamboogle (First Plot)
* **Trend Analysis:** All series trend upward as the number of shots increases. The 100-Doc (Grey) series maintains the highest performance; it peaks at roughly 4 shots and then plateaus or dips slightly through 32 shots. The 1-Doc (Orange) series starts far lower (approx. -3.2) than the 0-Doc (Blue) series (approx. -1.4) and also ends below it (approx. -0.7 vs. -0.3), with the two converging toward roughly -0.5 at higher shot counts.
* **Key Data Points (Approximate):**
    * **0-Doc (Blue):** Starts at ~-1.4 (0 shots), rises to ~-0.3 (256 shots).
    * **1-Doc (Orange):** Starts at ~-3.2 (0 shots), rises to ~-0.7 (256 shots).
    * **10-Doc (Purple):** Starts at ~-1.2, rises to ~0.2 (256 shots).
    * **100-Doc (Grey):** Starts at ~-0.7, peaks at ~1.6 (4 shots), ends at ~1.1 (32 shots).
### B. HotpotQA (Second Plot)
* **Trend Analysis:** Roughly log-linear growth for all series: performance rises approximately linearly with the logarithm of the number of shots. The gap between 0-Doc and 100-Doc stays wide, at roughly 1.5 to 2.5 units of normalized performance.
* **Key Data Points (Approximate):**
    * **0-Doc (Blue):** Starts at ~-3.2, rises to ~-0.8.
    * **1-Doc (Orange):** Starts at ~-1.8, rises to ~0.4.
    * **10-Doc (Purple):** Starts at ~-1.0, rises to ~1.0.
    * **100-Doc (Grey):** Starts at ~-0.7, rises to ~0.7 (data ends at 16 shots).
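
The growth and gap described for HotpotQA can be checked with simple arithmetic on the approximate endpoint values listed above. The sketch below is illustrative only: the dictionary name `hotpotqa` is hypothetical, the numbers are visual estimates, and the last points of the series do not sit at the same shot count (100-Doc ends at 16 shots).

```python
# Approximate (start, end) values read off the HotpotQA panel.
hotpotqa = {
    "0-Doc": (-3.2, -0.8),
    "1-Doc": (-1.8, 0.4),
    "10-Doc": (-1.0, 1.0),
    "100-Doc": (-0.7, 0.7),  # this series ends at 16 shots
}

# Gain from the first to the last plotted point of each series.
gains = {name: round(end - start, 1) for name, (start, end) in hotpotqa.items()}

# Gap between the 100-Doc and 0-Doc curves at their first and last points.
# Caveat: the last points are at different shot counts, so the end gap
# is only a rough indication.
start_gap = round(hotpotqa["100-Doc"][0] - hotpotqa["0-Doc"][0], 1)
end_gap = round(hotpotqa["100-Doc"][1] - hotpotqa["0-Doc"][1], 1)

print(gains)
print(start_gap, end_gap)
```

Every series gains more than one unit of normalized performance over its plotted range, and the 0-Doc to 100-Doc gap averages about 2 units across the two endpoints.
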
### C. MuSiQue (Third Plot)
* **Trend Analysis:** Similar to HotpotQA, but the 0-Doc (Blue) series plateaus much earlier (around 1 shot) and remains relatively flat between -1.5 and -1.2.
* **Key Data Points (Approximate):**
    * **0-Doc (Blue):** Starts at ~-2.3, plateaus around -1.3.
    * **1-Doc (Orange):** Starts at ~-1.9, rises to ~-0.4.
    * **10-Doc (Purple):** Starts at ~-1.6, rises to ~1.0.
    * **100-Doc (Grey):** Starts at ~-1.2, rises to ~1.1 (data ends at 16 shots).
### D. 2WikiMultiHopQA (Fourth Plot)
* **Trend Analysis:** The 0-Doc (Blue) series shows a sharp initial increase from 0 to 1 shot, then remains flat. The 10-Doc (Purple) and 1-Doc (Orange) series show steady improvement.
* **Key Data Points (Approximate):**
    * **0-Doc (Blue):** Starts at ~-2.6, rises to ~-1.8 at 1 shot, ends at ~-1.6.
    * **1-Doc (Orange):** Starts at ~-2.1, rises to ~-0.1.
    * **10-Doc (Purple):** Starts at ~-1.5, rises to ~0.8.
    * **100-Doc (Grey):** Starts at ~-1.4, rises to ~0.9 (data ends at 32 shots).
---
## 4. Summary of Findings
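1. **Document Count Impact:** Normalized performance generally increases with document count, following `100-Doc > 10-Doc > 1-Doc > 0-Doc` at 0 shots in HotpotQA, MuSiQue, and 2WikiMultiHopQA. Bamboogle is an exception: its 1-Doc series (~-3.2 at 0 shots) sits below its 0-Doc series (~-1.4).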
2. **Few-Shot Learning:** Increasing the "Number of Shots" generally improves performance for all document conditions, though the 0-Doc condition plateaus earlier than conditions with more documents in MuSiQue and 2WikiMultiHopQA (around 1 shot).
3. **Data Density:** The 100-Doc (Grey) series consistently covers a shorter x-axis range (stopping at 16 or 32 shots) than the other series, which extend to 256 shots.
4. **Starting Points:** At 0 shots, 0-Doc performance varies widely by dataset: HotpotQA (~-3.2), 2WikiMultiHopQA (~-2.6), and MuSiQue (~-2.3) all start well below Bamboogle (~-1.4).
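
The findings above can be cross-checked against the approximate values transcribed in Section 3. The sketch below is a sanity check on visual estimates, not a reanalysis of the underlying data. The names `DATA`, `ORDER`, and `starts_in_order` are hypothetical, and the comparison is restricted to the 0-shot points because the series end at different shot counts.

```python
# Approximate (start, end) values transcribed from Section 3.
# End points are not aligned: 100-Doc stops at 16 or 32 shots.
DATA = {
    "Bamboogle": {"0-Doc": (-1.4, -0.3), "1-Doc": (-3.2, -0.7),
                  "10-Doc": (-1.2, 0.2), "100-Doc": (-0.7, 1.1)},
    "HotpotQA": {"0-Doc": (-3.2, -0.8), "1-Doc": (-1.8, 0.4),
                 "10-Doc": (-1.0, 1.0), "100-Doc": (-0.7, 0.7)},
    "MuSiQue": {"0-Doc": (-2.3, -1.3), "1-Doc": (-1.9, -0.4),
                "10-Doc": (-1.6, 1.0), "100-Doc": (-1.2, 1.1)},
    "2WikiMultiHopQA": {"0-Doc": (-2.6, -1.6), "1-Doc": (-2.1, -0.1),
                        "10-Doc": (-1.5, 0.8), "100-Doc": (-1.4, 0.9)},
}
ORDER = ["0-Doc", "1-Doc", "10-Doc", "100-Doc"]

def starts_in_order(series):
    """True if 0-shot performance strictly increases with document count."""
    starts = [series[name][0] for name in ORDER]
    return all(a < b for a, b in zip(starts, starts[1:]))

# Does the document-count ordering hold at 0 shots in each dataset?
ordering_holds = {ds: starts_in_order(series) for ds, series in DATA.items()}

# Datasets sorted from lowest to highest 0-Doc baseline at 0 shots.
baseline = sorted(DATA, key=lambda ds: DATA[ds]["0-Doc"][0])

print(ordering_holds)
print(baseline)
```

With these estimates, the ordering check passes for HotpotQA, MuSiQue, and 2WikiMultiHopQA and fails for Bamboogle, where 1-Doc starts below 0-Doc. HotpotQA has the lowest 0-Doc baseline and Bamboogle the highest.
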