# Technical Document Extraction: Scatter Plots for Qwen3 32B and Gemma3 12B
## Image Description
The image contains two scatter plots comparing **Final Task Accuracy** against **Average Output Tokens** for two language models: **Qwen3 32B** (left) and **Gemma3 12B** (right). Each plot uses color-coded data points to represent different experimental configurations, with each color keyed to a numerical legend label (transcribed below as a step count).
---
## Key Components
### 1. **Axis Labels and Titles**
- **Left Plot (Qwen3 32B):**
  - **X-axis**: "Average Output Tokens" (range: 0 to 60,000)
  - **Y-axis**: "Final Task Accuracy after 180 steps" (range: 0.0 to 1.0)
  - **Title**: "Qwen3 32B"
- **Right Plot (Gemma3 12B):**
  - **X-axis**: "Average Output Tokens" (range: 0 to 25,000)
  - **Y-axis**: "Final Task Accuracy after 120 steps" (range: 0.0 to 1.0)
  - **Title**: "Gemma3 12B"
---
### 2. **Legend and Data Points**
#### **Qwen3 32B (Left Plot)**
- **Legend**: Located on the left side of the plot. Colors correspond to numerical labels, transcribed here as step counts:
  - **Purple**: 180 steps
  - **Blue**: 120 steps
  - **Green**: 60 steps
  - **Yellow**: 30 steps
  - **Teal**: 20 steps
  - **Dark Blue**: 10 steps
- **Data Points**:
  - **Purple (180 steps)**:
    - [10,000 tokens, 0.55 accuracy]
    - [12,000 tokens, 0.60 accuracy]
    - [15,000 tokens, 0.45 accuracy]
  - **Blue (120 steps)**:
    - [15,000 tokens, 0.40 accuracy]
    - [18,000 tokens, 0.35 accuracy]
  - **Green (60 steps)**:
    - [20,000 tokens, 0.45 accuracy]
    - [25,000 tokens, 0.30 accuracy]
  - **Yellow (30 steps)**:
    - [30,000 tokens, 0.20 accuracy]
    - [40,000 tokens, 0.10 accuracy]
  - **Teal (20 steps)**:
    - [20,000 tokens, 0.35 accuracy]
    - [25,000 tokens, 0.12 accuracy]
  - **Dark Blue (10 steps)**:
    - [30,000 tokens, 0.20 accuracy]
#### **Gemma3 12B (Right Plot)**
- **Legend**: Located on the right side of the plot. Colors correspond to numerical labels, transcribed here as step counts:
  - **Yellow**: 1 step
  - **Green**: 2 steps
  - **Blue**: 3 steps
  - **Purple**: 4 steps
  - **Dark Blue**: 5 steps
  - **Teal**: 6 steps
  - **Light Blue**: 8 steps
  - **Dark Green**: 10 steps
  - **Light Yellow**: 12 steps
  - **Dark Purple**: 15 steps
  - **Dark Teal**: 20 steps
  - **Dark Green**: 24 steps
  - **Dark Blue**: 30 steps
  - **Purple**: 40 steps
  - **Yellow**: 120 steps
- **Data Points**:
  - **Yellow (1 step)**:
    - [14,000 tokens, 1.0 accuracy]
  - **Green (2 steps)**:
    - [16,000 tokens, 0.90 accuracy]
  - **Blue (3 steps)**:
    - [18,000 tokens, 0.85 accuracy]
  - **Purple (4 steps)**:
    - [15,000 tokens, 0.60 accuracy]
  - **Dark Blue (5 steps)**:
    - [20,000 tokens, 0.55 accuracy]
  - **Teal (6 steps)**:
    - [22,000 tokens, 0.40 accuracy]
  - **Light Blue (8 steps)**:
    - [24,000 tokens, 0.25 accuracy]
  - **Dark Green (10 steps)**:
    - [25,000 tokens, 0.15 accuracy]
  - **Light Yellow (12 steps)**:
    - [14,000 tokens, 0.20 accuracy]
  - **Dark Purple (15 steps)**:
    - [16,000 tokens, 0.30 accuracy]
  - **Dark Teal (20 steps)**:
    - [18,000 tokens, 0.60 accuracy]
  - **Dark Green (24 steps)**:
    - [20,000 tokens, 0.50 accuracy]
  - **Dark Blue (30 steps)**:
    - [22,000 tokens, 0.40 accuracy]
  - **Purple (40 steps)**:
    - [24,000 tokens, 0.30 accuracy]
  - **Yellow (120 steps)**:
    - [14,000 tokens, 0.20 accuracy]
---
## Trends and Observations
### **Qwen3 32B**
- **Accuracy vs. Tokens**:
  - Higher token counts (30k–40k) generally correlate with lower accuracy (0.10–0.20), and all transcribed points in that range belong to configurations with fewer steps (10–30 steps).
  - Configurations with more steps (e.g., 180 steps) show moderate accuracy (0.45–0.60) at lower token counts (10k–15k).
- **Notable Outlier**: A purple point at 12k tokens with 0.60 accuracy (180 steps) stands out as the highest-accuracy configuration in the plot.
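
The negative relationship between tokens and accuracy can be checked against the transcribed points. The following is a minimal sketch, assuming the values in the Data Points list are accurate readings of the plot; the names `qwen_points` and `pearson_r` are illustrative and do not come from the source figure.

```python
import math

# Qwen3 32B points as transcribed in the Data Points list: (tokens, accuracy).
# These are approximate readings from the plot, not raw experimental data.
qwen_points = [
    (10_000, 0.55), (12_000, 0.60), (15_000, 0.45),  # purple, 180 steps
    (15_000, 0.40), (18_000, 0.35),                  # blue, 120 steps
    (20_000, 0.45), (25_000, 0.30),                  # green, 60 steps
    (30_000, 0.20), (40_000, 0.10),                  # yellow, 30 steps
    (20_000, 0.35), (25_000, 0.12),                  # teal, 20 steps
    (30_000, 0.20),                                  # dark blue, 10 steps
]

def pearson_r(points):
    """Pearson correlation coefficient between the x and y values of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    var_y = sum((y - mean_y) ** 2 for _, y in points)
    return cov / math.sqrt(var_x * var_y)

r = pearson_r(qwen_points)
print(f"Pearson r (tokens vs. accuracy): {r:.2f}")
```

A strongly negative coefficient is consistent with the trend described above, though with only twelve approximate points it should be read as a sanity check rather than a statistical result.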
### **Gemma3 12B**
- **Accuracy vs. Tokens**:
  - Accuracy decreases as token counts increase, particularly for configurations with fewer steps (e.g., 1–10 steps).
  - Configurations with more steps (e.g., 20–40 steps) fall in a narrower accuracy band (0.30–0.60) across the 18k–24k token range.
- **Notable Outlier**: A yellow point at 14k tokens with 1.0 accuracy (1 step) indicates perfect performance for a minimal configuration.
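
The accuracy ranges per step band can be recomputed directly from the transcribed points. This is a rough sketch under the assumption that each step count maps to the single point listed for it; `gemma_points` and `accuracy_range` are hypothetical names introduced for illustration.

```python
# Gemma3 12B points as transcribed in the Data Points list:
# steps -> (tokens, accuracy). Values are approximate readings from the plot.
gemma_points = {
    1: (14_000, 1.00), 2: (16_000, 0.90), 3: (18_000, 0.85), 4: (15_000, 0.60),
    5: (20_000, 0.55), 6: (22_000, 0.40), 8: (24_000, 0.25), 10: (25_000, 0.15),
    12: (14_000, 0.20), 15: (16_000, 0.30), 20: (18_000, 0.60), 24: (20_000, 0.50),
    30: (22_000, 0.40), 40: (24_000, 0.30), 120: (14_000, 0.20),
}

def accuracy_range(points, lo, hi):
    """Return (min, max) accuracy over configurations with lo <= steps <= hi."""
    accs = [acc for steps, (_, acc) in points.items() if lo <= steps <= hi]
    return min(accs), max(accs)

low_band = accuracy_range(gemma_points, 1, 10)    # fewer-step configurations
high_band = accuracy_range(gemma_points, 20, 40)  # more-step configurations
print("1-10 steps:", low_band)
print("20-40 steps:", high_band)
```

Comparing the two tuples shows how much wider the accuracy spread is for the low-step configurations than for the 20–40 step band.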
---
## Spatial Grounding and Color Matching
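- **Qwen3 32B Legend**:
  - Position: Left side of the plot.
  - Color-to-label mapping confirmed for all data points (e.g., purple = 180 steps).
- **Gemma3 12B Legend**:
  - Position: Right side of the plot.
  - Color-to-label mapping confirmed for all data points (e.g., yellow = 1 step). Note that four color names in the transcribed legend (yellow, purple, dark blue, dark green) are each assigned to two different step counts, so those colors alone do not identify a configuration.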
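
A transcribed legend can be checked mechanically for color names that map to more than one label. The sketch below applies that check to the Gemma3 12B legend as listed in this document; `gemma_legend` and `ambiguous_colors` are illustrative names, and the color names themselves are the transcription's approximations of the plotted colors.

```python
from collections import defaultdict

# Gemma3 12B legend as transcribed: (color name, step count).
gemma_legend = [
    ("Yellow", 1), ("Green", 2), ("Blue", 3), ("Purple", 4), ("Dark Blue", 5),
    ("Teal", 6), ("Light Blue", 8), ("Dark Green", 10), ("Light Yellow", 12),
    ("Dark Purple", 15), ("Dark Teal", 20), ("Dark Green", 24), ("Dark Blue", 30),
    ("Purple", 40), ("Yellow", 120),
]

def ambiguous_colors(legend):
    """Return {color: [step counts]} for colors assigned to more than one step count."""
    steps_by_color = defaultdict(list)
    for color, steps in legend:
        steps_by_color[color].append(steps)
    return {color: steps for color, steps in steps_by_color.items() if len(steps) > 1}

collisions = ambiguous_colors(gemma_legend)
for color, steps in collisions.items():
    print(f"{color}: used for {steps} steps")
```

Any color reported here needs a second cue (such as the point's position or an adjacent label) before a data point can be attributed to a single step count.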
---
## Final Notes
- No non-English text is present.
- All data points and labels are explicitly transcribed, with trends verified against visual patterns.
- The plots emphasize the trade-off between **output token count** and **task accuracy**, with step count acting as a critical variable.