# Technical Document Extraction: GPT-4 vs Prometheus Eval (Vicuna)
## 1. Title
- **Title**: "GPT-4 vs Prometheus Eval (Vicuna)"
- **Language**: English
## 2. Axes and Labels
- **X-Axis**:
  - Label: "Score"
  - Categories: 1, 2, 3, 4, 5 (ordinal)
- **Y-Axis**:
  - Label: "Response Length"
  - Units: Not stated on the axis (numeric values only)
## 3. Legend
- **Position**: Top-left corner
- **Labels**:
  - Blue: "GPT-4"
  - Orange: "Prometheus"
- **Color Verification**:
  - Blue boxes correspond to GPT-4 data.
  - Orange boxes correspond to Prometheus data.
## 4. Box Plot Components
### General Structure
- **Components per Score**:
  - Median (green line)
  - Interquartile Range (IQR) (box)
  - Whiskers (min/max excluding outliers)
  - Outliers (black dots)
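
The components listed above can be derived from raw data roughly as follows. This is a minimal sketch, assuming the common Tukey convention in which points beyond 1.5 × IQR from the box are drawn as outliers; the figure does not state which convention it uses. The `quantile` and `box_stats` helpers and the `sample` values are hypothetical and are not read from the plot.

```python
from statistics import median

def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])

def box_stats(values, k=1.5):
    """Return median, quartiles, whisker ends, and outliers for one box.

    Assumption: fences at Q1 - k*IQR and Q3 + k*IQR (Tukey rule, k = 1.5).
    """
    vals = sorted(values)
    q1, q3 = quantile(vals, 0.25), quantile(vals, 0.75)
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - k * iqr, q3 + k * iqr
    # Whiskers end at the most extreme points that are still inside the fences.
    inside = [v for v in vals if lo_fence <= v <= hi_fence]
    return {
        "median": median(vals),
        "q1": q1,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in vals if v < lo_fence or v > hi_fence],
    }

# Hypothetical response lengths, not values read from the figure.
sample = [420, 510, 560, 640, 700, 750, 790, 820, 860, 2400]
stats = box_stats(sample)
print(stats)
```

Under this convention an outlier always lies strictly outside the whisker range, which is a useful sanity check when reading values off a plot.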
### Score-Specific Data
#### Score 1
- **GPT-4**:
  - Median: ~750
  - IQR: ~500–800
  - Whiskers: 400–1200
  - Outliers: 400, 1200
- **Prometheus**:
  - Median: ~800
  - IQR: ~550–850
  - Whiskers: 450–1250
  - Outliers: None
#### Score 2
- **GPT-4**:
  - Median: ~700
  - IQR: ~400–850
  - Whiskers: 300–950
  - Outliers: 300, 950
- **Prometheus**:
  - Median: ~600
  - IQR: ~450–700
  - Whiskers: 350–900
  - Outliers: None
#### Score 3
- **GPT-4**:
  - Median: ~800
  - IQR: ~550–850
  - Whiskers: 300–1100
  - Outliers: 300, 1100
- **Prometheus**:
  - Median: ~750
  - IQR: ~600–800
  - Whiskers: 400–1050
  - Outliers: None
#### Score 4
- **GPT-4**:
  - Median: ~800
  - IQR: ~550–850
  - Whiskers: 200–1300
  - Outliers: 200, 1300
- **Prometheus**:
  - Median: ~750
  - IQR: ~600–800
  - Whiskers: 400–1050
  - Outliers: None
#### Score 5
- **GPT-4**:
  - Median: ~900
  - IQR: ~650–950
  - Whiskers: 100–1400
  - Outliers: 100, 1400
- **Prometheus**:
  - Median: ~900
  - IQR: ~650–950
  - Whiskers: 400–1050
  - Outliers: None
## 5. Outliers
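- **Distribution**:
  - GPT-4: Outliers listed for all scores (1–5). The listed values coincide with the whisker endpoints, so they are best read as approximate extremes rather than separate points.
  - Prometheus: The per-score data lists no outliers for any score.
- **Extreme Values**:
  - Minimum: 100 (Score 5, GPT-4)
  - Maximum: 1400 (Score 5, GPT-4)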
## 6. Trends
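- **GPT-4**:
  - Median response length rises overall with score (~750 at Score 1 to ~900 at Score 5), with a dip to ~700 at Score 2.
  - Whisker spread widens from Score 2 onward (300–950 at Score 2 to 100–1400 at Score 5).
- **Prometheus**:
  - Median response length is ~800 at Score 1, dips to ~600 at Score 2, sits at ~750 for Scores 3–4, and reaches ~900 at Score 5.
  - Whisker spread is widest at Score 1 (450–1250) and holds at 400–1050 for Scores 3–5.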
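
The trend statements can be checked arithmetically against the per-score values. The sketch below is illustrative only: the `data` dictionary transcribes the approximate medians and whisker ranges listed in Section 4 (readings from the plot, not raw data), and `summarize` is a hypothetical helper name.

```python
# (median, whisker_low, whisker_high) for Scores 1 through 5,
# transcribed from the approximate per-score values in Section 4.
data = {
    "GPT-4": [
        (750, 400, 1200), (700, 300, 950), (800, 300, 1100),
        (800, 200, 1300), (900, 100, 1400),
    ],
    "Prometheus": [
        (800, 450, 1250), (600, 350, 900), (750, 400, 1050),
        (750, 400, 1050), (900, 400, 1050),
    ],
}

def summarize(rows):
    """Collect medians and whisker spreads, and test for a non-decreasing median."""
    medians = [m for m, _, _ in rows]
    spreads = [hi - lo for _, lo, hi in rows]
    monotonic = all(a <= b for a, b in zip(medians, medians[1:]))
    return {"medians": medians, "spreads": spreads, "monotonic": monotonic}

summary = {name: summarize(rows) for name, rows in data.items()}
for name, s in summary.items():
    print(name, s)
```

With these values neither median sequence is strictly non-decreasing, because both dip at Score 2, while the GPT-4 whisker spread grows from 650 at Score 2 to 1300 at Score 5.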
## 7. Spatial Grounding
- **Legend**: Top-left corner (x=0, y=0 in normalized image coordinates, taking the origin at the top-left).
- **Box Plots**: Centered under each score category (x=1–5).
## 8. Missing Elements
- **Data Table**: Not present in the image.
- **Additional Text**: No embedded text outside the title, legend, and axis labels.
## 9. Validation
- **Color Consistency**: Confirmed legend colors match box plot colors.
- **Trend Verification**:
  - GPT-4: Medians rise from ~750 (Score 1) to ~900 (Score 5), apart from a dip to ~700 at Score 2.
  - Prometheus: Medians range from ~600 to ~900 and do not follow a single direction (~800, ~600, ~750, ~750, ~900).
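
One further check that could be automated is whether each listed outlier actually falls outside its whisker range, since whiskers are defined here as excluding outliers. The following is a sketch under that assumption; the `gpt4` dictionary transcribes the approximate GPT-4 readings from Section 4, and `outliers_inside_whiskers` is a hypothetical helper name.

```python
# score -> (whisker_low, whisker_high, listed_outliers), transcribed from the
# GPT-4 entries in Section 4; approximate readings, not raw data.
gpt4 = {
    1: (400, 1200, [400, 1200]),
    2: (300, 950, [300, 950]),
    3: (300, 1100, [300, 1100]),
    4: (200, 1300, [200, 1300]),
    5: (100, 1400, [100, 1400]),
}

def outliers_inside_whiskers(low, high, outliers):
    """Return listed outliers that fall within the closed range [low, high]."""
    return [v for v in outliers if low <= v <= high]

# Keep only the scores where at least one listed outlier is not outside the whiskers.
flagged = {}
for score, (low, high, outliers) in gpt4.items():
    hits = outliers_inside_whiskers(low, high, outliers)
    if hits:
        flagged[score] = hits

print(flagged)
```

Every GPT-4 score is flagged by this check, because the listed outlier values equal the whisker endpoints; those entries are therefore better treated as approximate extremes than as distinct outlier points.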