# Technical Document Extraction: GPT-4 vs Prometheus Eval
## 1. Labels and Axis Titles
- **X-Axis**: Labeled "Score" with discrete categories: 1, 2, 3, 4, 5.
- **Y-Axis**: Labeled "Response Length" (logarithmic scale, 100–1000).
- **Title**: "GPT-4 vs Prometheus Eval (Feedback Collection Seen Rubric Testset)".
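
Because the Y-axis is logarithmic, equal vertical distances correspond to equal ratios rather than equal differences, which is one reason the values read from the plot are approximate. The sketch below assumes the axis spans exactly 100 to 1000 as described above; the function names are hypothetical helpers, not part of any plotting library.

```python
import math


def log_axis_fraction(value, axis_min=100.0, axis_max=1000.0):
    """Return the relative height (0 = bottom, 1 = top) of `value` on a log axis.

    Assumes the axis runs from axis_min to axis_max on a base-10 log scale.
    """
    return math.log10(value / axis_min) / math.log10(axis_max / axis_min)


def value_at_fraction(fraction, axis_min=100.0, axis_max=1000.0):
    """Inverse mapping: the data value drawn at a given relative height."""
    return axis_min * (axis_max / axis_min) ** fraction


if __name__ == "__main__":
    # A response length of 550 sits roughly three quarters of the way up the axis.
    print(round(log_axis_fraction(550), 3))
```

On such an axis the gap between 500 and 600 is drawn much smaller than the gap between 100 and 200, so small reading errors near the top of the plot translate into larger absolute errors.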
## 2. Legend
- **Placement**: Top-left corner.
- **Colors**:
- Blue: GPT-4
- Orange: Prometheus
## 3. Box Plot Details
### Score 1
- **GPT-4 (Blue)**:
- Median: ~550
- IQR: ~500–650
- Min: ~200
- Max: ~850
- Outliers: ~250, ~300
- **Prometheus (Orange)**:
- Median: ~580
- IQR: ~520–680
- Min: ~220
- Max: ~800
- Outliers: ~200, ~350
### Score 2
- **GPT-4 (Blue)**:
- Median: ~580
- IQR: ~530–670
- Min: ~220
- Max: ~850
- Outliers: ~250, ~300
- **Prometheus (Orange)**:
- Median: ~590
- IQR: ~540–690
- Min: ~200
- Max: ~800
- Outliers: ~220, ~380
### Score 3
- **GPT-4 (Blue)**:
- Median: ~560
- IQR: ~510–660
- Min: ~200
- Max: ~850
- Outliers: ~250, ~300
- **Prometheus (Orange)**:
- Median: ~570
- IQR: ~520–680
- Min: ~220
- Max: ~800
- Outliers: ~200, ~350
### Score 4
- **GPT-4 (Blue)**:
- Median: ~550
- IQR: ~500–650
- Min: ~200
- Max: ~850
- Outliers: ~250, ~300
- **Prometheus (Orange)**:
- Median: ~580
- IQR: ~530–680
- Min: ~220
- Max: ~800
- Outliers: ~200, ~350
### Score 5
- **GPT-4 (Blue)**:
- Median: ~600
- IQR: ~550–650
- Min: ~200
- Max: ~1000
- Outliers: ~250, ~300
- **Prometheus (Orange)**:
- Median: ~610
- IQR: ~560–660
- Min: ~220
- Max: ~1000
- Outliers: ~200, ~350
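
The approximate values listed in this section for all five scores can be held in a small data structure for later checks. This is a minimal sketch: the names `BOX_STATS`, `is_ordered`, and `check_all` are hypothetical, and the numbers are the approximate readings transcribed from the lists above, not exact data from the plot.

```python
# Approximate five-number summaries read from the plot:
# (min, Q1, median, Q3, max) per score and model.
BOX_STATS = {
    1: {"GPT-4": (200, 500, 550, 650, 850), "Prometheus": (220, 520, 580, 680, 800)},
    2: {"GPT-4": (220, 530, 580, 670, 850), "Prometheus": (200, 540, 590, 690, 800)},
    3: {"GPT-4": (200, 510, 560, 660, 850), "Prometheus": (220, 520, 570, 680, 800)},
    4: {"GPT-4": (200, 500, 550, 650, 850), "Prometheus": (220, 530, 580, 680, 800)},
    5: {"GPT-4": (200, 550, 600, 650, 1000), "Prometheus": (220, 560, 610, 660, 1000)},
}


def is_ordered(stats):
    """Return True when min <= Q1 <= median <= Q3 <= max."""
    return all(a <= b for a, b in zip(stats, stats[1:]))


def check_all(box_stats):
    """Return the (score, model) pairs whose summary is out of order."""
    return [
        (score, model)
        for score, models in box_stats.items()
        for model, stats in models.items()
        if not is_ordered(stats)
    ]


if __name__ == "__main__":
    # An empty list means every transcribed summary is internally ordered.
    print(check_all(BOX_STATS))
```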
## 4. Key Trends
- **Median Response Length**:
- Both models show similar medians (~550–610) across all scores.
- Prometheus's median is slightly higher at every score (1–5), by roughly 10–30.
- **Variability**:
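- GPT-4 has slightly tighter interquartile ranges (IQRs) in Scores 1–3, about 10 narrower than Prometheus according to the values in Section 3.
- In Scores 4–5 the listed IQR widths are the same for both models (~150 at Score 4, ~100 at Score 5).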
- **Outliers**:
- All outliers listed in Section 3 lie at the low end (~200–380); none are listed above 800. In Score 5 the maximum reaches ~1000 for both models, which is the top of the plotted range.
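
The median and IQR comparisons in this section can be recomputed from the approximate values in Section 3. The sketch below is illustrative only: the dictionaries and function names are hypothetical, and the inputs are approximate plot readings, so differences of 10 or so are within reading error.

```python
# Approximate readings from Section 3, keyed by score.
# Each entry is (GPT-4 value, Prometheus value).
MEDIANS = {1: (550, 580), 2: (580, 590), 3: (560, 570), 4: (550, 580), 5: (600, 610)}

# Each entry is ((GPT-4 Q1, GPT-4 Q3), (Prometheus Q1, Prometheus Q3)).
IQRS = {
    1: ((500, 650), (520, 680)),
    2: ((530, 670), (540, 690)),
    3: ((510, 660), (520, 680)),
    4: ((500, 650), (530, 680)),
    5: ((550, 650), (560, 660)),
}


def median_gap(score):
    """Prometheus median minus GPT-4 median for one score."""
    gpt4, prometheus = MEDIANS[score]
    return prometheus - gpt4


def iqr_widths(score):
    """Return (GPT-4 IQR width, Prometheus IQR width) for one score."""
    (g_q1, g_q3), (p_q1, p_q3) = IQRS[score]
    return g_q3 - g_q1, p_q3 - p_q1


if __name__ == "__main__":
    for score in sorted(MEDIANS):
        print(score, median_gap(score), iqr_widths(score))
```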
## 5. Spatial Grounding
- **Legend Coordinates**: Top-left corner (exact [x, y] unspecified in image).
- **Color Consistency**: Blue consistently represents GPT-4; orange represents Prometheus.
## 6. Missing Elements
- No data table or numerical values explicitly provided in the image.
- No textual annotations within the box plots (e.g., exact median values).
## 7. Language
- All text is in English. No non-English content detected.