# Technical Analysis of Multi-Dataset Performance Graphs
## Key Components Extracted
### Legend
- **Position**: Top-right corner
- **Labels & Colors**:
  - Recall: Purple (▲)
  - NDCG: Blue (▲)
  - MRR: Green (▲)
### Axis Labels
- **X-axis**: "Number of Documents" (logarithmic scale: 10⁰ to 10³)
- **Y-axis**: "Scores" (labeled 0.0 to 0.9; the highest Recall readings reported below, 0.92 and 0.95, fall slightly above this range)
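
Because the x-axis is logarithmic, the four tick marks are evenly spaced even though the document counts span three orders of magnitude. The following Python sketch illustrates that mapping; the variable names are illustrative and not taken from the source figure.

```python
import math

# Document counts at the four x-axis ticks (10^0 through 10^3).
doc_counts = [10 ** k for k in range(4)]

# On a log10 axis, the horizontal position of each tick is its exponent,
# so a tenfold increase in documents always covers the same distance.
positions = [math.log10(n) for n in doc_counts]

for n, x in zip(doc_counts, positions):
    print(f"{n:>5} documents -> axis position {x:.1f}")
```

A straight line on such an axis therefore corresponds to a constant score gain per tenfold increase in documents, not per additional document.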
---
## Dataset-Specific Analysis
### 1. Bamboogle
**Trend Verification**:
- Recall: Sharp upward slope (0.15 → 0.85)
- NDCG: Gradual increase (0.18 → 0.36)
- MRR: Slight increase that plateaus after 10² (0.16 → 0.24)
**Data Points**:
| Documents | Recall | NDCG | MRR |
|-----------|--------|------|-----|
| 10⁰ | 0.15 | 0.18 | 0.16|
| 10¹ | 0.45 | 0.28 | 0.22|
| 10² | 0.75 | 0.33 | 0.24|
| 10³ | 0.85 | 0.36 | 0.24|
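
The trend labels can be checked against the table by computing the score change between consecutive powers of ten. This is a minimal Python sketch using the Bamboogle values above; the helper name `per_decade_gains` is hypothetical.

```python
# Values transcribed from the Bamboogle table (10^0, 10^1, 10^2, 10^3 documents).
bamboogle = {
    "Recall": [0.15, 0.45, 0.75, 0.85],
    "NDCG": [0.18, 0.28, 0.33, 0.36],
    "MRR": [0.16, 0.22, 0.24, 0.24],
}


def per_decade_gains(values):
    """Return the score change between consecutive powers of ten."""
    # Rounding to two decimals matches the precision of the table
    # and hides floating-point noise in the subtraction.
    return [round(b - a, 2) for a, b in zip(values, values[1:])]


gains = {metric: per_decade_gains(v) for metric, v in bamboogle.items()}
for metric, g in gains.items():
    print(metric, g)
```

A final gain of zero for a metric indicates a plateau over the last decade.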
---
### 2. HotpotQA
**Trend Verification**:
- Recall: Steep upward trajectory (0.35 → 0.92)
- NDCG: Moderate increase (0.38 → 0.55)
- MRR: Nearly flat (0.40 → 0.45), with no change after 10²
**Data Points**:
| Documents | Recall | NDCG | MRR |
|-----------|--------|------|-----|
| 10⁰ | 0.35 | 0.38 | 0.40|
| 10¹ | 0.60 | 0.45 | 0.42|
| 10² | 0.80 | 0.50 | 0.45|
| 10³ | 0.92 | 0.55 | 0.45|
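
In this table Recall starts below MRR at 10⁰ documents and ends well above it, so the two curves cross somewhere between the first two ticks. The Python sketch below estimates where, assuming straight-line segments between plotted points on the logarithmic axis. That interpolation is an assumption, since the chart gives no values between ticks, and `crossover_exponent` is a hypothetical helper.

```python
# Values transcribed from the HotpotQA table (10^0, 10^1, 10^2, 10^3 documents).
hotpotqa = {
    "Recall": [0.35, 0.60, 0.80, 0.92],
    "NDCG": [0.38, 0.45, 0.50, 0.55],
    "MRR": [0.40, 0.42, 0.45, 0.45],
}


def crossover_exponent(a, b):
    """Return the log10(document count) at which series a rises above series b.

    Assumes linear interpolation between adjacent ticks (an assumption,
    not something the chart states). Returns None if a never overtakes b.
    """
    for k in range(len(a) - 1):
        d0 = a[k] - b[k]          # gap at the left tick
        d1 = a[k + 1] - b[k + 1]  # gap at the right tick
        if d0 < 0 <= d1:
            return k + (-d0) / (d1 - d0)
    return None


exponent = crossover_exponent(hotpotqa["Recall"], hotpotqa["MRR"])
print(f"Recall overtakes MRR near 10^{exponent:.2f} documents")
```

The result is only as reliable as the interpolation assumption; it serves to locate the crossing within the first decade, not to give a precise document count.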
---
### 3. MuSiQue
**Trend Verification**:
- Recall: Rapid increase (0.18 → 0.80)
- NDCG: Steady rise (0.19 → 0.29)
- MRR: Nearly flat (0.17 → 0.19)
**Data Points**:
| Documents | Recall | NDCG | MRR |
|-----------|--------|------|-----|
| 10⁰ | 0.18 | 0.19 | 0.17|
| 10¹ | 0.40 | 0.22 | 0.18|
| 10² | 0.60 | 0.25 | 0.19|
| 10³ | 0.80 | 0.29 | 0.19|
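
Each trend can be summarized by a single number: the least-squares slope of the score against log₁₀ of the document count. The following Python sketch computes it for the MuSiQue values above; the function name `slope_per_decade` is hypothetical, and a four-point fit is only a rough summary.

```python
import math

doc_counts = [1, 10, 100, 1000]

# Values transcribed from the MuSiQue table.
musique = {
    "Recall": [0.18, 0.40, 0.60, 0.80],
    "NDCG": [0.19, 0.22, 0.25, 0.29],
    "MRR": [0.17, 0.18, 0.19, 0.19],
}


def slope_per_decade(counts, scores):
    """Least-squares slope of score against log10(document count)."""
    xs = [math.log10(n) for n in counts]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(scores) / len(scores)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, scores))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


slopes = {m: slope_per_decade(doc_counts, v) for m, v in musique.items()}
for metric, s in slopes.items():
    print(f"{metric}: {s:.3f} per tenfold increase in documents")
```

Comparing the slopes puts the three labels ("rapid", "steady", "flat") on a common numeric scale.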
---
### 4. 2WikiMultiHopQA
**Trend Verification**:
- Recall: Steady increase per decade, roughly linear on the logarithmic x-axis (0.20 → 0.95)
- NDCG: Gradual increase (0.21 → 0.45)
- MRR: Modest increase, mostly between 10¹ and 10² (0.20 → 0.33)
**Data Points**:
| Documents | Recall | NDCG | MRR |
|-----------|--------|------|-----|
| 10⁰ | 0.20 | 0.21 | 0.20|
| 10¹ | 0.40 | 0.28 | 0.22|
| 10² | 0.70 | 0.35 | 0.30|
| 10³ | 0.95 | 0.45 | 0.33|
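
Absolute ranges can hide how much each metric changes relative to its starting point. As a rough illustration, this Python sketch computes the fractional change from the first to the last plotted value for the 2WikiMultiHopQA table; `relative_gain` is a hypothetical helper name.

```python
# Values transcribed from the 2WikiMultiHopQA table (10^0 to 10^3 documents).
two_wiki = {
    "Recall": [0.20, 0.40, 0.70, 0.95],
    "NDCG": [0.21, 0.28, 0.35, 0.45],
    "MRR": [0.20, 0.22, 0.30, 0.33],
}


def relative_gain(values):
    """Fractional change from the first to the last plotted value."""
    return (values[-1] - values[0]) / values[0]


rel = {metric: relative_gain(v) for metric, v in two_wiki.items()}
for metric, r in rel.items():
    print(f"{metric}: {r:+.0%}")
```

Even the smallest of the three changes, for MRR, amounts to a gain of about 65% over its starting value.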
---
## Cross-Validation Summary
1. **Legend Consistency**: All line colors match legend entries (purple=Recall, blue=NDCG, green=MRR)
2. **Axis Scaling**: Logarithmic x-axis confirmed via the 10⁰ to 10³ tick markers
3. **Data Integrity**: All y-axis values align with visual trends (e.g., Recall lines show steepest slopes)
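
The data-integrity check above can be expressed as two simple tests over the tabulated values: every series should be non-decreasing, and Recall should show the largest total gain in each dataset. This is a minimal Python sketch; the `DATA` layout and function names are illustrative assumptions.

```python
# Scores transcribed from the four tables, ordered 10^0, 10^1, 10^2, 10^3 documents.
DATA = {
    "Bamboogle": {
        "Recall": [0.15, 0.45, 0.75, 0.85],
        "NDCG": [0.18, 0.28, 0.33, 0.36],
        "MRR": [0.16, 0.22, 0.24, 0.24],
    },
    "HotpotQA": {
        "Recall": [0.35, 0.60, 0.80, 0.92],
        "NDCG": [0.38, 0.45, 0.50, 0.55],
        "MRR": [0.40, 0.42, 0.45, 0.45],
    },
    "MuSiQue": {
        "Recall": [0.18, 0.40, 0.60, 0.80],
        "NDCG": [0.19, 0.22, 0.25, 0.29],
        "MRR": [0.17, 0.18, 0.19, 0.19],
    },
    "2WikiMultiHopQA": {
        "Recall": [0.20, 0.40, 0.70, 0.95],
        "NDCG": [0.21, 0.28, 0.35, 0.45],
        "MRR": [0.20, 0.22, 0.30, 0.33],
    },
}


def is_non_decreasing(values):
    """True if no value is lower than the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


def steepest_metric(series):
    """Name of the metric with the largest first-to-last gain."""
    return max(series, key=lambda m: series[m][-1] - series[m][0])


for name, series in DATA.items():
    monotone = all(is_non_decreasing(v) for v in series.values())
    print(f"{name}: monotone={monotone}, steepest={steepest_metric(series)}")
```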
## Conclusion
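The graphs show that Recall rises far more steeply than NDCG and MRR in all four datasets. At 10⁰ documents the three metrics lie within 0.05 of one another and Recall is not the highest in any dataset; from 10¹ documents onward, Recall exceeds both NDCG and MRR everywhere. Because the x-axis is logarithmic, Recall's gains per tenfold increase in documents hold steady or shrink, so returns per additional document diminish rather than accelerate. MRR is flat from 10² to 10³ documents in Bamboogle, HotpotQA, and MuSiQue, and rises only from 0.30 to 0.33 in 2WikiMultiHopQA.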
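
The ordering of the three metrics at each document count can be read directly from the tables. The Python sketch below is illustrative only: the row layout and the helper name `recall_leads` are assumptions, and each row holds the (Recall, NDCG, MRR) values at one tick.

```python
# One row per tick (10^0, 10^1, 10^2, 10^3); each row is (Recall, NDCG, MRR),
# transcribed from the dataset tables.
ROWS = {
    "Bamboogle": [(0.15, 0.18, 0.16), (0.45, 0.28, 0.22), (0.75, 0.33, 0.24), (0.85, 0.36, 0.24)],
    "HotpotQA": [(0.35, 0.38, 0.40), (0.60, 0.45, 0.42), (0.80, 0.50, 0.45), (0.92, 0.55, 0.45)],
    "MuSiQue": [(0.18, 0.19, 0.17), (0.40, 0.22, 0.18), (0.60, 0.25, 0.19), (0.80, 0.29, 0.19)],
    "2WikiMultiHopQA": [(0.20, 0.21, 0.20), (0.40, 0.28, 0.22), (0.70, 0.35, 0.30), (0.95, 0.45, 0.33)],
}


def recall_leads(row):
    """True if Recall is strictly higher than both NDCG and MRR in this row."""
    recall, ndcg, mrr = row
    return recall > ndcg and recall > mrr


leads = {name: [recall_leads(row) for row in rows] for name, rows in ROWS.items()}
for name, flags in leads.items():
    print(name, flags)
```

In every dataset the first flag is `False` and the remaining three are `True`: Recall leads from 10¹ documents onward but not at 10⁰.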