# Technical Data Extraction: Performance Metrics by Model Size
## 1. Image Overview
This image is a grouped bar chart comparing the performance of three model sizes across the technical benchmarks listed in the data table below. Bar colors distinguish the model sizes, and each bar carries a numerical data label above it.
## 2. Chart Components
### Axis Information
* **Y-Axis Title:** Metric value
* **Y-Axis Scale:** Major gridlines at 0, 50, and 100. One data point (Screen2Words, 5B: 120.8) extends above the 100 gridline.
* **X-Axis Labels:** One label per benchmark category. The extracted categories are listed in the data table below.
### Legend (Spatial Placement: Top Right)
* **Blue Bar:** 670M (670 Million parameters)
* **Orange Bar:** 2B (2 Billion parameters)
* **Green Bar:** 5B (5 Billion parameters)
## 3. Data Extraction Table
| Benchmark Category | 670M (Blue) | 2B (Orange) | 5B (Green) |
| :--- | :---: | :---: | :---: |
| **Screen Annotation** | 48.2 | 61.1 | 81.9 |
| **Ref Exp** | 77.4 | 83.9 | 86.3 |
| **SQA Short** | 70.0 | 84.8 | 94.6 |
| **Complex SQA** | 28.4 | 29.4 | 42.4 |
| **MoTIF** | 83.5 | 86.8 | 87.4 |
| **Screen2Words** | 97.4 | 99.9 | 120.8 |
| **Chart QA** | 54.0 | 55.8 | 76.6 |
| **DocVQA** | 50.7 | 59.3 | 87.5 |
| **Infographics VQA** | 19.6 | 24.0 | 61.4 |
| **OCR VQA** | 54.8 | 62.8 | 76.2 |
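For further analysis, the extracted table can be held in a plain Python mapping. The sketch below is an illustrative transcription. The names `SCORES`, `MODEL_SIZES`, and `values_above` are hypothetical and do not come from the source chart. It also checks the axis note that exactly one bar exceeds 100.

```python
# Model sizes in legend order (blue, orange, green).
MODEL_SIZES = ("670M", "2B", "5B")

# Hypothetical transcription of the data extraction table:
# benchmark -> (670M, 2B, 5B) scores.
SCORES = {
    "Screen Annotation": (48.2, 61.1, 81.9),
    "Ref Exp": (77.4, 83.9, 86.3),
    "SQA Short": (70.0, 84.8, 94.6),
    "Complex SQA": (28.4, 29.4, 42.4),
    "MoTIF": (83.5, 86.8, 87.4),
    "Screen2Words": (97.4, 99.9, 120.8),
    "Chart QA": (54.0, 55.8, 76.6),
    "DocVQA": (50.7, 59.3, 87.5),
    "Infographics VQA": (19.6, 24.0, 61.4),
    "OCR VQA": (54.8, 62.8, 76.2),
}


def values_above(threshold):
    """Return (benchmark, model size, value) for every bar strictly above threshold."""
    return [
        (name, size, value)
        for name, row in SCORES.items()
        for size, value in zip(MODEL_SIZES, row)
        if value > threshold
    ]


if __name__ == "__main__":
    print(len(SCORES))        # 10
    print(values_above(100))  # [('Screen2Words', '5B', 120.8)]
```

Keeping each row as a tuple in legend order means a model size's index (0, 1, 2) matches its position in `MODEL_SIZES`.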
## 4. Trend Analysis and Observations
### General Trends
* **Positive Correlation with Scale:** In every benchmark, scores increase strictly with model size (**670M < 2B < 5B**), so a larger model always achieves a higher metric value.
* **Significant Scaling Gains:** The jump from 2B to 5B is most pronounced in **Infographics VQA** (24.0 to 61.4, more than doubling the score) and **DocVQA** (59.3 to 87.5).
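Both trend claims can be checked directly against the extracted values. The following is a minimal sketch that re-declares the table under a hypothetical `SCORES` name so it runs on its own. The helper names `strictly_increasing` and `ratio_2b_to_5b` are also illustrative. The sketch confirms the strict 670M < 2B < 5B ordering and then ranks benchmarks by their 2B-to-5B score ratio.

```python
# Hypothetical transcription of the data table: benchmark -> (670M, 2B, 5B).
SCORES = {
    "Screen Annotation": (48.2, 61.1, 81.9),
    "Ref Exp": (77.4, 83.9, 86.3),
    "SQA Short": (70.0, 84.8, 94.6),
    "Complex SQA": (28.4, 29.4, 42.4),
    "MoTIF": (83.5, 86.8, 87.4),
    "Screen2Words": (97.4, 99.9, 120.8),
    "Chart QA": (54.0, 55.8, 76.6),
    "DocVQA": (50.7, 59.3, 87.5),
    "Infographics VQA": (19.6, 24.0, 61.4),
    "OCR VQA": (54.8, 62.8, 76.2),
}


def strictly_increasing(row):
    """True if each score is strictly greater than the one before it."""
    return all(a < b for a, b in zip(row, row[1:]))


def ratio_2b_to_5b(row):
    """Ratio of the 5B score to the 2B score for one benchmark."""
    _, two_b, five_b = row
    return five_b / two_b


if __name__ == "__main__":
    print(all(strictly_increasing(row) for row in SCORES.values()))  # True
    ranked = sorted(SCORES, key=lambda n: ratio_2b_to_5b(SCORES[n]), reverse=True)
    for name in ranked[:3]:
        print(f"{name}: {ratio_2b_to_5b(SCORES[name]):.2f}x")
    # Infographics VQA: 2.56x
    # DocVQA: 1.48x
    # Complex SQA: 1.44x
```

A ratio is used rather than an absolute difference so that low-scoring tasks and high-scoring tasks can be compared on the same footing. In this data, Infographics VQA and DocVQA also lead on absolute gain (+37.4 and +28.2 points).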
### Benchmark Specifics
* **Highest Performance:** **Screen2Words** shows the highest overall values, with the 5B model reaching a peak of **120.8**, the only value to exceed the 100-point grid line.
* **Lowest Performance:** **Infographics VQA** and **Complex SQA** are the most challenging tasks for these models. The 670M model's lowest score is **19.6** on Infographics VQA, while Complex SQA yields the lowest 5B score (**42.4**) and improves by only 1.0 point from 670M to 2B.
* **Smallest Scaling Gain:** The **MoTIF** benchmark shows the smallest gain from 670M to 5B (83.5 to 87.4, +3.9 points or about 4.7%). This suggests either a performance plateau or a task that is less sensitive to parameter scaling.
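The highest, lowest, and smallest-gain observations can be reproduced with a short script. The following is a minimal sketch that re-declares the table under a hypothetical `SCORES` name. The helpers `extreme` and `gain_670m_to_5b` are also illustrative. "Smallest gain" is measured here as the relative change from 670M to 5B.

```python
# Hypothetical transcription of the data table: benchmark -> (670M, 2B, 5B).
MODEL_SIZES = ("670M", "2B", "5B")
SCORES = {
    "Screen Annotation": (48.2, 61.1, 81.9),
    "Ref Exp": (77.4, 83.9, 86.3),
    "SQA Short": (70.0, 84.8, 94.6),
    "Complex SQA": (28.4, 29.4, 42.4),
    "MoTIF": (83.5, 86.8, 87.4),
    "Screen2Words": (97.4, 99.9, 120.8),
    "Chart QA": (54.0, 55.8, 76.6),
    "DocVQA": (50.7, 59.3, 87.5),
    "Infographics VQA": (19.6, 24.0, 61.4),
    "OCR VQA": (54.8, 62.8, 76.2),
}


def extreme(pick):
    """Apply `pick` (max or min) to all bars; tuples compare by value first."""
    return pick(
        (value, name, size)
        for name, row in SCORES.items()
        for size, value in zip(MODEL_SIZES, row)
    )


def gain_670m_to_5b(row):
    """Absolute and relative change from the 670M score to the 5B score."""
    small, _, large = row
    return large - small, (large - small) / small


if __name__ == "__main__":
    print(extreme(max))  # (120.8, 'Screen2Words', '5B')
    print(extreme(min))  # (19.6, 'Infographics VQA', '670M')
    smallest = min(SCORES, key=lambda n: gain_670m_to_5b(SCORES[n])[1])
    abs_gain, rel_gain = gain_670m_to_5b(SCORES[smallest])
    print(f"{smallest}: +{abs_gain:.1f} points ({rel_gain:.1%})")  # MoTIF: +3.9 points (4.7%)
```

MoTIF also has the smallest absolute 670M-to-5B gain (3.9 points), so the conclusion does not depend on choosing the relative measure.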