# Technical Document Extraction: Few-shot - Core Generalization - GPT-4o
## 1. Labels and Axis Titles
- **Title**: "Few-shot - Core Generalization - GPT-4o"
- **X-axis**: "Length" (values: 0 to 19)
- **Y-axis**: "Type" (values: 1 to 7)
- **Colorbar**: "Accuracy (%)" (range: 0% to 100%)
## 2. Data Table Structure
The heatmap shows the accuracy percentage for each combination of **Type** (rows) and **Length** (columns). Blank cells mark combinations for which the heatmap shows no value. The reconstructed table:
| Type \ Length | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
|---------------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|
| 1 | 64.0 | 33.0 | 27.0 | 21.0 | 22.0 | 23.0 | 16.0 | 28.0 | 27.0 | 30.0 | | | | | | | | | | |
| 2 | | 73.0 | 89.0 | 91.0 | 86.0 | 84.0 | 81.0 | 78.0 | 74.0 | 63.0 | 66.0 | | | | | | | | | |
| 3 | 42.0 | 53.0 | 46.0 | 44.0 | 35.0 | 18.0 | 16.0 | 25.0 | 20.0 | 18.0 | 13.0 | 17.0 | 17.0 | 18.0 | 13.0 | 17.0 | 11.0 | 14.0 | 10.0 | 11.0 |
| 4 | 68.0 | 67.0 | | 64.0 | 47.0 | 45.0 | 29.0 | 30.0 | 37.0 | 40.0 | 41.0 | 35.0 | | | | | | | | |
| 5 | | | | | | | | 11.0 | 25.0 | 21.0 | 18.0 | 17.0 | 25.0 | 20.0 | 25.0 | 15.0 | 24.0 | 26.0 | 20.0 | 27.0 |
| 6 | 89.0 | 75.0 | 66.0 | 54.0 | 51.0 | 48.0 | 44.0 | 49.0 | 42.0 | 52.0 | 46.0 | 51.0 | 40.0 | 44.0 | 32.0 | 37.0 | 38.0 | 32.0 | 39.0 | |
| 7 | 91.0 | 76.0 | 63.0 | 53.0 | 41.0 | 36.0 | 34.0 | 33.0 | 39.0 | 26.0 | 33.0 | 34.0 | 32.0 | 26.0 | | | | | | |
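For downstream analysis, each table row can be turned into a list indexed by Length. The following is a minimal sketch, assuming blank cells should be treated as missing values (`None`); the helper name `parse_row` is hypothetical and not part of the source.

```python
from typing import List, Optional, Tuple


def parse_row(line: str) -> Tuple[int, List[Optional[float]]]:
    """Parse one table row into (type, accuracies); blank cells become None."""
    # Drop the outer pipes, then split the remainder into cells.
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    row_type = int(cells[0])
    values = [float(c) if c else None for c in cells[1:]]
    return row_type, values


# The Type 4 row from the table: Length 2 is blank, and the row ends
# with eight blank cells for Lengths 12-19.
type4_line = (
    "| 4 | 68.0 | 67.0 | | 64.0 | 47.0 | 45.0 | 29.0 | 30.0 "
    "| 37.0 | 40.0 | 41.0 | 35.0 |" + " |" * 8
)

row_type, values = parse_row(type4_line)
print(row_type, len(values), values[:4])
```

The list index equals the Length value, so `values[9]` is the Type 4 accuracy at Length 9.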
## 3. Key Trends and Observations
- **Type 1**: Accuracy drops sharply from 64.0% at Length 0 to 33.0% at Length 1, then stays between 16.0% and 30.0% for Lengths 2–9.
- **Type 2**: Rises from 73.0% at Length 1 to a peak of 91.0% at Length 3, then declines steadily to 63.0% at Length 9, with a slight uptick to 66.0% at Length 10.
- **Type 3**: Peaks at Length 1 (53.0%), falls sharply between Lengths 4 and 5 (35.0% → 18.0%), and stays at or below 25.0% through Length 19 (11.0%).
- **Type 4**: Starts at 68.0% at Length 0, falls to 29.0% at Length 6, and partially recovers to 35.0–41.0% for Lengths 8–11. No data at Length 2 or beyond Length 11.
- **Type 5**: Low accuracy throughout (11.0–27.0%) with no clear trend; no data for Lengths 0–6.
- **Type 6**: High accuracy at Length 0 (89.0%), declining to a low of 32.0% at Lengths 14 and 17; the last observed value is 39.0% at Length 18.
- **Type 7**: Highest accuracy at Length 0 (91.0%), with gradual declines to 26.0% at Length 13.
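Per-type peaks and lows can be checked directly against the table values. The sketch below uses the Type 2 and Type 3 rows as examples; the helper name `extreme` and the `Row` alias are hypothetical, and blank cells are assumed to be missing values (`None`).

```python
from typing import List, Optional, Tuple

Row = List[Optional[float]]  # list index = Length, None = no data


def extreme(row: Row, pick=max) -> Tuple[int, float]:
    """Return (length, accuracy) of the highest (or, with pick=min, lowest) observed cell."""
    observed = [(value, length) for length, value in enumerate(row) if value is not None]
    # On ties, max/min return the first match, i.e. the shortest Length.
    value, length = pick(observed, key=lambda pair: pair[0])
    return length, value


# Rows copied from the table (Type 2 has no value at Length 0 or Lengths 11-19).
type2: Row = [None, 73.0, 89.0, 91.0, 86.0, 84.0, 81.0, 78.0, 74.0, 63.0, 66.0] + [None] * 9
type3: Row = [42.0, 53.0, 46.0, 44.0, 35.0, 18.0, 16.0, 25.0, 20.0, 18.0,
              13.0, 17.0, 17.0, 18.0, 13.0, 17.0, 11.0, 14.0, 10.0, 11.0]

print("Type 2 peak:", extreme(type2))       # (3, 91.0)
print("Type 2 low:", extreme(type2, min))   # (9, 63.0)
print("Type 3 peak:", extreme(type3))       # (1, 53.0)
print("Type 3 low:", extreme(type3, min))   # (18, 10.0)
```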
## 4. Legend and Color Mapping
- **Colorbar**: Located on the right side of the heatmap.
- **Color Gradient**:
  - Light blue: Low accuracy (0–20%)
  - Dark blue: High accuracy (80–100%)
- **Example**:
  - Type 7, Length 0 (91.0%) is dark blue.
  - Type 5, Length 19 (27.0%) falls near the light end of the gradient.
## 5. Spatial Grounding
- **Legend Position**: Right side of the heatmap.
- **Data Point Verification**:
  - Type 2, Length 2 (89.0%) matches dark blue.
  - Type 3, Length 19 (11.0%) matches light blue.
## 6. Missing Data
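- **Type 1**: No data for Lengths 10–19.
- **Type 2**: No data for Length 0 or Lengths 11–19.
- **Type 4**: No data for Length 2 or Lengths 12–19.
- **Type 5**: No data for Lengths 0–6.
- **Type 6**: No data for Length 19.
- **Type 7**: No data for Lengths 14–19.

Type 3 is the only type with a value at every Length.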
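Missing ranges can be derived mechanically from the table rather than read off by eye. The following is a small sketch, assuming blank cells are encoded as `None`; the function name `missing_ranges` is hypothetical, and the Type 4 and Type 5 rows serve as examples.

```python
from typing import List, Optional, Tuple


def missing_ranges(row: List[Optional[float]]) -> List[Tuple[int, int]]:
    """Group the Lengths with no data into inclusive (start, end) ranges."""
    ranges: List[Tuple[int, int]] = []
    start = None
    for length, value in enumerate(row):
        if value is None and start is None:
            start = length                      # a gap begins at this Length
        elif value is not None and start is not None:
            ranges.append((start, length - 1))  # the gap ended just before this cell
            start = None
    if start is not None:                       # the gap runs to the last column
        ranges.append((start, len(row) - 1))
    return ranges


# Rows copied from the table; list index = Length.
type4 = [68.0, 67.0, None, 64.0, 47.0, 45.0, 29.0, 30.0, 37.0, 40.0, 41.0, 35.0] + [None] * 8
type5 = [None] * 7 + [11.0, 25.0, 21.0, 18.0, 17.0, 25.0, 20.0, 25.0, 15.0, 24.0, 26.0, 20.0, 27.0]

print(missing_ranges(type4))  # [(2, 2), (12, 19)]
print(missing_ranges(type5))  # [(0, 6)]
```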
## 7. Summary
The heatmap shows how accuracy varies with **Type** and **Length** for GPT-4o in the few-shot core generalization setting. For most types, accuracy is highest at the shortest lengths and declines as Length increases; Type 2 is the exception, staying between 63.0% and 91.0% over its observed range (Lengths 1–10). The highest single values are 91.0% (Type 7 at Length 0 and Type 2 at Length 3), while Type 5 shows the lowest performance overall, never exceeding 27.0%.
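One way to compare types overall is the mean accuracy over each type's observed cells. This is a sketch under that assumption (the source does not define "overall" performance); the variable names `grid` and `means` are hypothetical, and blank cells are encoded as `None` and skipped.

```python
from statistics import mean

# Accuracy (%) by Type, copied from the table; list index = Length, None = blank cell.
grid = {
    1: [64.0, 33.0, 27.0, 21.0, 22.0, 23.0, 16.0, 28.0, 27.0, 30.0] + [None] * 10,
    2: [None, 73.0, 89.0, 91.0, 86.0, 84.0, 81.0, 78.0, 74.0, 63.0, 66.0] + [None] * 9,
    3: [42.0, 53.0, 46.0, 44.0, 35.0, 18.0, 16.0, 25.0, 20.0, 18.0,
        13.0, 17.0, 17.0, 18.0, 13.0, 17.0, 11.0, 14.0, 10.0, 11.0],
    4: [68.0, 67.0, None, 64.0, 47.0, 45.0, 29.0, 30.0, 37.0, 40.0, 41.0, 35.0] + [None] * 8,
    5: [None] * 7 + [11.0, 25.0, 21.0, 18.0, 17.0, 25.0, 20.0, 25.0, 15.0, 24.0, 26.0, 20.0, 27.0],
    6: [89.0, 75.0, 66.0, 54.0, 51.0, 48.0, 44.0, 49.0, 42.0, 52.0,
        46.0, 51.0, 40.0, 44.0, 32.0, 37.0, 38.0, 32.0, 39.0, None],
    7: [91.0, 76.0, 63.0, 53.0, 41.0, 36.0, 34.0, 33.0, 39.0, 26.0, 33.0, 34.0, 32.0, 26.0] + [None] * 6,
}

# Mean over observed cells only; missing cells are not counted as zero.
means = {t: mean(v for v in row if v is not None) for t, row in grid.items()}

for t, m in sorted(means.items(), key=lambda kv: kv[1]):
    print(f"Type {t}: {m:.1f}%")
```

Because the types cover different Length ranges, these means are not strictly comparable; they only give a rough ranking.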