# Technical Document Analysis of Accuracy Chart
## Chart Type
Bar chart comparing accuracy percentages across four categories for four models, each under two methods (RSPC and KAAR).
## Axes
- **X-axis**: Categories (Movement, Extension, Recolor, Others)
- **Y-axis**: Accuracy on _t_ (%), labeled from 0 to 40%; the largest recorded value (41.8%) lies slightly above this range
## Legend
Located on the right side of the chart. Color-coded models/methods:
- **Blue**: GPT-o3-mini RSPC
- **Light Blue**: GPT-o3-mini KAAR
- **Green**: Gemini-2.0 RSPC
- **Light Green**: Gemini-2.0 KAAR
- **Purple**: QwQ-32B RSPC
- **Light Purple**: QwQ-32B KAAR
- **Orange**: DeepSeek-R1-70B RSPC
- **Light Orange**: DeepSeek-R1-70B KAAR
## Categories & Data Points
### Movement (Total: 55)
- **GPT-o3-mini RSPC**: 41.8% (Blue)
- **GPT-o3-mini KAAR**: 20.0% (Light Blue)
- **Gemini-2.0 RSPC**: 18.2% (Green)
- **Gemini-2.0 KAAR**: 10.9% (Light Green)
- **QwQ-32B RSPC**: 12.7% (Purple)
- **QwQ-32B KAAR**: 14.5% (Light Purple)
- **DeepSeek-R1-70B RSPC**: 9.1% (Orange)
### Extension (Total: 129)
- **GPT-o3-mini RSPC**: 38.8% (Blue)
- **GPT-o3-mini KAAR**: 0.8% (Light Blue)
- **Gemini-2.0 RSPC**: 19.4% (Green)
- **Gemini-2.0 KAAR**: 1.6% (Light Green)
- **QwQ-32B RSPC**: 17.8% (Purple)
- **QwQ-32B KAAR**: 2.3% (Light Purple)
- **DeepSeek-R1-70B RSPC**: 7.8% (Orange)
### Recolor (Total: 115)
- **GPT-o3-mini RSPC**: 24.3% (Blue)
- **GPT-o3-mini KAAR**: 6.1% (Light Blue)
- **Gemini-2.0 RSPC**: 13.9% (Green)
- **Gemini-2.0 KAAR**: 10.4% (Light Green)
- **QwQ-32B RSPC**: 7.8% (Purple)
- **QwQ-32B KAAR**: 7.0% (Light Purple)
- **DeepSeek-R1-70B RSPC**: 4.3% (Orange)
### Others (Total: 101)
- **GPT-o3-mini RSPC**: 21.8% (Blue)
- **GPT-o3-mini KAAR**: 5.0% (Light Blue)
- **Gemini-2.0 RSPC**: 14.9% (Green)
- **Gemini-2.0 KAAR**: 11.9% (Light Green)
- **QwQ-32B RSPC**: 7.9% (Purple)
- **QwQ-32B KAAR**: 5.0% (Light Purple)
- **DeepSeek-R1-70B RSPC**: 9.9% (Orange)
## Key Trends
1. **Dominance of GPT-o3-mini RSPC**:
- Highest accuracy in all categories (Movement: 41.8%, Extension: 38.8%, Recolor: 24.3%, Others: 21.8%).
- Leads the next-highest entry in every category: by 21.8 percentage points in Movement, 19.4 in Extension, 10.4 in Recolor, and 6.9 in Others.
2. **KAAR Method Performance**:
- Lower accuracy than RSPC for the same model in nearly every category.
- Exception: QwQ-32B KAAR (14.5%) exceeds QwQ-32B RSPC (12.7%) in Movement. The 9.9% value in Others is recorded under DeepSeek-R1-70B RSPC in the data points, not under KAAR.
3. **Model-Specific Patterns**:
- **Gemini-2.0**: RSPC is strongest in Extension (19.4%) and Movement (18.2%).
- **QwQ-32B**: KAAR values peak in Movement (14.5%) and Recolor (7.0%).
- **DeepSeek-R1-70B**: Best recorded result is 9.9% in Others, listed under RSPC; no KAAR value is recorded for this model.
4. **Segmentation Observations**:
- RSPC methods dominate the top segments of each bar.
- KAAR methods occupy lower segments, with minimal overlap in top-tier performance.
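
The comparisons in this section can be re-derived from the transcribed values. The following is a minimal sketch, assuming the values listed in this document are accurate; the names `ACCURACY`, `leader_and_margin`, and `kaar_beats_rspc` are illustrative and not part of any source material.

```python
# Accuracy values (%) as transcribed in this document. DeepSeek-R1-70B KAAR
# is omitted because no value is recorded for it.
ACCURACY = {
    "Movement": {
        ("GPT-o3-mini", "RSPC"): 41.8, ("GPT-o3-mini", "KAAR"): 20.0,
        ("Gemini-2.0", "RSPC"): 18.2, ("Gemini-2.0", "KAAR"): 10.9,
        ("QwQ-32B", "RSPC"): 12.7, ("QwQ-32B", "KAAR"): 14.5,
        ("DeepSeek-R1-70B", "RSPC"): 9.1,
    },
    "Extension": {
        ("GPT-o3-mini", "RSPC"): 38.8, ("GPT-o3-mini", "KAAR"): 0.8,
        ("Gemini-2.0", "RSPC"): 19.4, ("Gemini-2.0", "KAAR"): 1.6,
        ("QwQ-32B", "RSPC"): 17.8, ("QwQ-32B", "KAAR"): 2.3,
        ("DeepSeek-R1-70B", "RSPC"): 7.8,
    },
    "Recolor": {
        ("GPT-o3-mini", "RSPC"): 24.3, ("GPT-o3-mini", "KAAR"): 6.1,
        ("Gemini-2.0", "RSPC"): 13.9, ("Gemini-2.0", "KAAR"): 10.4,
        ("QwQ-32B", "RSPC"): 7.8, ("QwQ-32B", "KAAR"): 7.0,
        ("DeepSeek-R1-70B", "RSPC"): 4.3,
    },
    "Others": {
        ("GPT-o3-mini", "RSPC"): 21.8, ("GPT-o3-mini", "KAAR"): 5.0,
        ("Gemini-2.0", "RSPC"): 14.9, ("Gemini-2.0", "KAAR"): 11.9,
        ("QwQ-32B", "RSPC"): 7.9, ("QwQ-32B", "KAAR"): 5.0,
        ("DeepSeek-R1-70B", "RSPC"): 9.9,
    },
}


def leader_and_margin(scores):
    """Return (top entry, margin over the runner-up in percentage points)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (top, top_val), (_, second_val) = ranked[0], ranked[1]
    return top, round(top_val - second_val, 1)


def kaar_beats_rspc(scores):
    """List models whose KAAR value exceeds their own RSPC value."""
    models = {model for model, _ in scores}
    return sorted(
        m for m in models
        if (m, "KAAR") in scores and (m, "RSPC") in scores
        and scores[(m, "KAAR")] > scores[(m, "RSPC")]
    )


if __name__ == "__main__":
    for category, scores in ACCURACY.items():
        top, margin = leader_and_margin(scores)
        print(category, top, margin, kaar_beats_rspc(scores))
```

Margins are expressed in percentage points (a difference of two percentages) rather than as relative percentages, which keeps the comparison unambiguous.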
## Spatial Grounding
- Legend positioned on the **right** of the chart.
- Color consistency verified: All segments match legend labels (e.g., GPT-o3-mini RSPC = Blue).
## Data Table Reconstruction
| Category | Model/Method | Accuracy (%) |
|--------------|----------------------------|--------------|
| Movement | GPT-o3-mini RSPC | 41.8 |
| Movement | GPT-o3-mini KAAR | 20.0 |
| Movement | Gemini-2.0 RSPC | 18.2 |
| Movement | Gemini-2.0 KAAR | 10.9 |
| Movement | QwQ-32B RSPC | 12.7 |
| Movement | QwQ-32B KAAR | 14.5 |
| Movement | DeepSeek-R1-70B RSPC | 9.1 |
| Extension | GPT-o3-mini RSPC | 38.8 |
| Extension | GPT-o3-mini KAAR | 0.8 |
| Extension | Gemini-2.0 RSPC | 19.4 |
| Extension | Gemini-2.0 KAAR | 1.6 |
| Extension | QwQ-32B RSPC | 17.8 |
| Extension | QwQ-32B KAAR | 2.3 |
| Extension | DeepSeek-R1-70B RSPC | 7.8 |
| Recolor | GPT-o3-mini RSPC | 24.3 |
| Recolor | GPT-o3-mini KAAR | 6.1 |
| Recolor | Gemini-2.0 RSPC | 13.9 |
| Recolor | Gemini-2.0 KAAR | 10.4 |
| Recolor | QwQ-32B RSPC | 7.8 |
| Recolor | QwQ-32B KAAR | 7.0 |
| Recolor | DeepSeek-R1-70B RSPC | 4.3 |
| Others | GPT-o3-mini RSPC | 21.8 |
| Others | GPT-o3-mini KAAR | 5.0 |
| Others | Gemini-2.0 RSPC | 14.9 |
| Others | Gemini-2.0 KAAR | 11.9 |
| Others | QwQ-32B RSPC | 7.9 |
| Others | QwQ-32B KAAR | 5.0 |
| Others | DeepSeek-R1-70B RSPC | 9.9 |
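
For downstream use, the long-format table above can be loaded into a structure keyed by series. The sketch below is one possible approach using only the standard library; `parse_table`, `pivot`, and the three-row `SAMPLE` excerpt are hypothetical names chosen for illustration.

```python
def parse_table(markdown):
    """Parse a three-column pipe table into (category, series, accuracy) tuples."""
    rows = []
    for line in markdown.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 3:
            continue
        try:
            value = float(cells[2])
        except ValueError:
            continue  # header and separator rows have no numeric third cell
        rows.append((cells[0], cells[1], value))
    return rows


def pivot(rows):
    """Reshape long rows into {series: {category: accuracy}}."""
    wide = {}
    for category, series, value in rows:
        wide.setdefault(series, {})[category] = value
    return wide


# A short excerpt of the table above, used only to demonstrate the parser.
SAMPLE = """
| Category | Model/Method | Accuracy (%) |
|--------------|----------------------------|--------------|
| Movement | GPT-o3-mini RSPC | 41.8 |
| Extension | GPT-o3-mini RSPC | 38.8 |
| Movement | QwQ-32B KAAR | 14.5 |
"""

if __name__ == "__main__":
    print(pivot(parse_table(SAMPLE)))
```

Keying by series makes it straightforward to compare one model/method combination across categories, while the long format remains closer to how the chart labels were transcribed.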
## Notes
- Each percentage is printed as a label on its bar segment.
- The legend lists DeepSeek-R1-70B KAAR, but no value for it is recorded in any category.
- Totals under each category (e.g., Movement: 55) likely represent the number of data points evaluated, not summed percentages.
- No textual information in non-English languages detected.
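
The reading of the category totals as item counts can be sanity-checked: if a total is a count, each percentage should correspond to a whole number of items that rounds back to the same one-decimal value. The following is a minimal sketch of that check, assuming one-decimal rounding of the chart labels; `TOTALS` and `implied_count` are illustrative names.

```python
# Category totals as given in the section headings of this document.
TOTALS = {"Movement": 55, "Extension": 129, "Recolor": 115, "Others": 101}


def implied_count(pct, total):
    """Return the integer k such that 100*k/total rounds to pct (one decimal),
    or None if the nearest integer does not reproduce pct."""
    k = round(pct * total / 100)
    return k if round(100 * k / total, 1) == pct else None


if __name__ == "__main__":
    # Movement values as transcribed above.
    for pct in (41.8, 20.0, 18.2, 10.9, 12.7, 14.5, 9.1):
        print(pct, "->", implied_count(pct, TOTALS["Movement"]))
```

For example, 41.8% of 55 corresponds to 23 items (23/55 ≈ 41.8%), and 0.8% of 129 corresponds to 1 item, which is consistent with the totals being counts rather than summed percentages.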