# Technical Document Analysis: BLEU and Rouge-L Performance Comparison
## Chart Structure Overview
The image contains two side-by-side bar charts comparing performance metrics (BLEU and Rouge-L) across different GPT-2-M configurations. Each chart uses a dual-color legend system to distinguish between "Base" and "Plugin" implementations.
---
## BLEU Chart Analysis
### Spatial Components
- **Legend**: Top-right corner (pink = Base, purple = Plugin)
- **X-axis**: GPT-2-M configurations (zeroshot, 1FT, 2FT, 5FT)
- **Y-axis**: Value scale (0.00–0.30)
### Key Data Points
| Configuration | Base Value | Plugin Value |
|-------------------|------------|--------------|
| zeroshot | 0.025 | 0.19 |
| 1FT | 0.22 | 0.30 |
| 2FT | 0.22 | 0.32 |
| 5FT | 0.25 | 0.33 |
### Trend Verification
- **Base**: Gradual upward trend (0.025 → 0.25)
- **Plugin**: Steeper upward trajectory (0.19 → 0.33)
- **Consistent Pattern**: Plugin values exceed Base values by 0.16–0.08 across all configurations
---
## Rouge-L Chart Analysis
### Spatial Components
- **Legend**: Top-right corner (pink = Base, purple = Plugin)
- **X-axis**: GPT-2-M configurations (zeroshot, 1FT, 2FT, 5FT)
- **Y-axis**: Value scale (0.00–0.50)
### Key Data Points
| Configuration | Base Value | Plugin Value |
|-------------------|------------|--------------|
| zeroshot | 0.25 | 0.40 |
| 1FT | 0.39 | 0.48 |
| 2FT | 0.41 | 0.49 |
| 5FT | 0.44 | 0.50 |
### Trend Verification
- **Base**: Steady increase (0.25 → 0.44)
- **Plugin**: Rapid initial growth (0.40 → 0.48), plateauing at 0.50
- **Consistent Pattern**: Plugin maintains 0.15–0.06 advantage over Base
---
## Cross-Chart Observations
1. **Performance Correlation**: Both metrics show similar improvement patterns with increased fine-tuning
2. **Plugin Dominance**: Plugin consistently outperforms Base by 60–70% across all configurations
3. **Divergence Point**: Largest performance gap occurs in BLEU zeroshot (0.165 difference)
4. **Saturation Effect**: Rouge-L Plugin values approach maximum scale (0.50) in 5FT configuration
---
## Technical Validation
- **Legend Accuracy**: All bar colors match legend specifications (pink = Base, purple = Plugin)
- **Axis Consistency**: X-axis categories identical across both charts
- **Scale Normalization**: Y-axis ranges appropriately scaled for metric-specific value distributions
This structured analysis provides complete extraction of all textual elements, spatial relationships, and quantitative data from the image. The dual-chart comparison reveals consistent performance advantages for the Plugin implementation across both evaluation metrics, with particularly notable improvements in Rouge-L scores during extended fine-tuning.