# Technical Document Analysis of Image
## Overview
The image contains **four grouped bar charts** comparing performance across model configurations and plugins. The charts are organized in a 2x2 grid under two metrics: **BLEU** (top-left and bottom-left) and **Rouge-L** (top-right and bottom-right). Each chart has three subplots, one per dataset: **Web NLG**, **CommonGen**, and **Adidas**.
---
## Chart Structure
### Axes and Labels
1. **X-Axis (Categories)**:
- Labels: `No plugin`, `1-layer`, `2-layer`, `4-layer`, `8-layer`, `12-layer`, `GPT2 Small`
- Each label is one model configuration: no plugin, a plugin of 1 to 12 layers, or `GPT2 Small`.
2. **Y-Axis (Scores)**:
- **Top-left (BLEU) and Top-right (Rouge-L)**:
- Range: `0.00` to `0.50` (increments of `0.10`).
- Labels: `Score` (e.g., "Web NLG Score", "CommonGen Score", "Adidas Score").
- **Bottom-left (BLEU) and Bottom-right (Rouge-L)**:
- Range: `0.00` to `0.25` (increments of `0.05`).
- Labels: `Score` (same labels as above).
3. **Legends**:
- Located on the **right side** of each chart.
- Colors:
- **Web NLG**: Light purple (`#E6E6FA`).
- **CommonGen**: Medium purple (`#9370DB`).
- **Adidas**: Dark purple (`#8A2BE2`).
---
## Data Trends and Observations
### BLEU Metrics
#### (a) Web NLG Score
- **Trend**: Scores are not monotonic. They jump from `No plugin` to `1-layer`, hold near ~0.10 through `8-layer`, dip at `12-layer`, and peak at `GPT2 Small`.
- `No plugin`: ~0.02
- `1-layer`: ~0.12
- `2-layer`: ~0.10
- `4-layer`: ~0.10
- `8-layer`: ~0.10
- `12-layer`: ~0.08
- **GPT2 Small**: Peaks at ~0.18.
- **Key Insight**: GPT2 Small (~0.18) scores about 50% above the best plugin configuration (`1-layer`, ~0.12).
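
Whether a series rises monotonically can be checked directly from the approximate readings listed above. The following is a minimal sketch; the helper names `is_non_decreasing` and `drops` are hypothetical, and the values are the estimated bar heights from this section, not exact data.

```python
# Approximate BLEU readings for Web NLG, copied from the list above.
configs = ["No plugin", "1-layer", "2-layer", "4-layer",
           "8-layer", "12-layer", "GPT2 Small"]
web_nlg_bleu = [0.02, 0.12, 0.10, 0.10, 0.10, 0.08, 0.18]


def is_non_decreasing(values):
    """Return True if every value is at least as large as the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


def drops(labels, values):
    """Return the (from, to) label pairs where the score decreases."""
    return [
        (labels[i], labels[i + 1])
        for i in range(len(values) - 1)
        if values[i + 1] < values[i]
    ]


print(is_non_decreasing(web_nlg_bleu))
print(drops(configs, web_nlg_bleu))
```

With these readings the series is not non-decreasing: the score falls between `1-layer` and `2-layer` and again between `8-layer` and `12-layer`.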
#### (b) CommonGen Score
- **Trend**: Scores rise sharply from `No plugin` to `1-layer`, decline gradually through `12-layer`, then peak at `GPT2 Small`.
- `No plugin`: ~0.01
- `1-layer`: ~0.14
- `2-layer`: ~0.12
- `4-layer`: ~0.11
- `8-layer`: ~0.11
- `12-layer`: ~0.09
- **GPT2 Small**: Peaks at ~0.20.
- **Key Insight**: GPT2 Small achieves the highest score, about 43% above the `1-layer` score (~0.20 vs. ~0.14).
#### (c) Adidas Score
- **Trend**: Scores rise from `No plugin` to `1-layer`, decline steadily as plugin depth grows to `12-layer`, then peak at `GPT2 Small`.
- `No plugin`: ~0.005
- `1-layer`: ~0.05
- `2-layer`: ~0.04
- `4-layer`: ~0.035
- `8-layer`: ~0.03
- `12-layer`: ~0.025
- **GPT2 Small**: Peaks at ~0.07.
- **Key Insight**: GPT2 Small exceeds the best plugin configuration (`1-layer`, ~0.05) by ~40%.
### Rouge-L Metrics
#### (a) Web NLG Score
- **Trend**: Scores rise sharply from `No plugin` to `1-layer`, then plateau until `GPT2 Small`.
- `No plugin`: ~0.18
- `1-layer`: ~0.35
- `2-layer`: ~0.32
- `4-layer`: ~0.32
- `8-layer`: ~0.32
- `12-layer`: ~0.30
- **GPT2 Small**: Peaks at ~0.50.
- **Key Insight**: GPT2 Small scores about 43% above `1-layer` (~0.50 vs. ~0.35).
#### (b) CommonGen Score
- **Trend**: Scores roughly double from `No plugin` to `1-layer`, drift down slightly through `12-layer`, then peak at `GPT2 Small`.
- `No plugin`: ~0.18
- `1-layer`: ~0.38
- `2-layer`: ~0.35
- `4-layer`: ~0.35
- `8-layer`: ~0.34
- `12-layer`: ~0.33
- **GPT2 Small**: Peaks at ~0.50.
- **Key Insight**: GPT2 Small achieves roughly a 30% improvement over `1-layer` (~0.50 vs. ~0.38).
#### (c) Adidas Score
- **Trend**: Scores rise slightly from `No plugin` to `1-layer`, fall back to the `No plugin` level by `12-layer`, then peak at `GPT2 Small`.
- `No plugin`: ~0.14
- `1-layer`: ~0.17
- `2-layer`: ~0.16
- `4-layer`: ~0.15
- `8-layer`: ~0.15
- `12-layer`: ~0.14
- **GPT2 Small**: Peaks at ~0.25.
- **Key Insight**: GPT2 Small exceeds the best plugin configuration (`1-layer`, ~0.17) by roughly 47%.
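
The relative gains of `GPT2 Small` over the `1-layer` configuration can be recomputed from the approximate readings in the BLEU and Rouge-L lists above. This is a minimal sketch: the `readings` dictionary and the `relative_gain` helper are hypothetical names, and since the inputs are visual estimates, the percentages are only indicative.

```python
# (1-layer score, GPT2 Small score) per metric and dataset,
# taken from the approximate readings listed above.
readings = {
    ("BLEU", "Web NLG"): (0.12, 0.18),
    ("BLEU", "CommonGen"): (0.14, 0.20),
    ("BLEU", "Adidas"): (0.05, 0.07),
    ("Rouge-L", "Web NLG"): (0.35, 0.50),
    ("Rouge-L", "CommonGen"): (0.38, 0.50),
    ("Rouge-L", "Adidas"): (0.17, 0.25),
}


def relative_gain(baseline, score):
    """Fractional improvement of `score` over `baseline`."""
    return (score - baseline) / baseline


# Gains rounded to whole percentage points.
gains_pct = {
    key: round(relative_gain(one_layer, gpt2_small) * 100)
    for key, (one_layer, gpt2_small) in readings.items()
}

for (metric, dataset), pct in gains_pct.items():
    print(f"{metric:8s} {dataset:10s} +{pct}%")
```

Under these readings the gains fall between roughly 30% and 50%, so none of the series shows `GPT2 Small` doubling the `1-layer` score.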
---
## Spatial Grounding and Color Verification
- **Legend Placement**: Right-aligned in all charts.
- **Color Consistency**:
- **Web NLG**: Light purple (`#E6E6FA`) matches all instances.
- **CommonGen**: Medium purple (`#9370DB`) matches all instances.
- **Adidas**: Dark purple (`#8A2BE2`) matches all instances.
---
## Conclusion
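The charts show that **GPT2 Small** outperforms every other configuration on all three datasets (Web NLG, CommonGen, Adidas) for both BLEU and Rouge-L. Scores do not increase with plugin depth: among the plugin configurations, `1-layer` scores highest in every series, and scores flatten or decline slightly as layers are added up to `12-layer`. The largest improvement over `No plugin` comes from adding the first plugin layer or from switching to `GPT2 Small`. No non-English text is present in the image.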
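
The conclusion can be cross-checked by ranking configurations within each series. The sketch below uses the approximate readings from the sections above; `SERIES`, `best_overall`, and `best_plugin` are hypothetical names introduced for illustration, and the result is only as reliable as those visual estimates.

```python
CONFIGS = ["No plugin", "1-layer", "2-layer", "4-layer",
           "8-layer", "12-layer", "GPT2 Small"]

# Approximate readings per (metric, dataset), in CONFIGS order.
SERIES = {
    ("BLEU", "Web NLG"): [0.02, 0.12, 0.10, 0.10, 0.10, 0.08, 0.18],
    ("BLEU", "CommonGen"): [0.01, 0.14, 0.12, 0.11, 0.11, 0.09, 0.20],
    ("BLEU", "Adidas"): [0.005, 0.05, 0.04, 0.035, 0.03, 0.025, 0.07],
    ("Rouge-L", "Web NLG"): [0.18, 0.35, 0.32, 0.32, 0.32, 0.30, 0.50],
    ("Rouge-L", "CommonGen"): [0.18, 0.38, 0.35, 0.35, 0.34, 0.33, 0.50],
    ("Rouge-L", "Adidas"): [0.14, 0.17, 0.16, 0.15, 0.15, 0.14, 0.25],
}

# The five plugin configurations sit between the two reference points.
PLUGINS = CONFIGS[1:6]


def best_overall(values):
    """Label of the highest-scoring configuration in a series."""
    return CONFIGS[values.index(max(values))]


def best_plugin(values):
    """Label of the highest-scoring plugin configuration in a series."""
    plugin_scores = values[1:6]
    return PLUGINS[plugin_scores.index(max(plugin_scores))]


summary = {key: (best_overall(v), best_plugin(v)) for key, v in SERIES.items()}

for (metric, dataset), (overall, plugin) in summary.items():
    print(f"{metric:8s} {dataset:10s} best={overall}, best plugin={plugin}")
```

For every series in these readings, the top configuration is `GPT2 Small` and the top plugin configuration is `1-layer`.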