# Technical Document Analysis of Image
## Section (a): Line Graphs Comparing Model Performance Under Error Rates
### Key Components:
1. **Graph Titles**:
- All graphs labeled "Turn Accuracy vs Task Length"
- Subtitles indicate specific models: Qwen-3-14B, Qwen-3-32B, Gemma-3-12B, Gemma-3-27B
2. **Axes**:
- **X-axis**: Task Length (0–100)
- **Y-axis**: Turn Accuracy (0–1.0)
3. **Legend**:
- Located at bottom-left corner
- Colors correspond to:
- **Blue**: Original Run
- **Red**: 100% Error Rate
- **Orange**: 75% Error Rate
- **Yellow**: 50% Error Rate
- **Green**: 25% Error Rate
- **Dark Green**: 0% Error Rate
4. **Trends**:
- All models show **decreasing Turn Accuracy** as Task Length increases
- **Error Rate Impact**:
- Higher error rates (e.g., 100% red) consistently show **lower accuracy** than lower error rates
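- Example: the Qwen-3-14B Original Run (blue) starts at ~0.8 accuracy at Task Length 0 and drops to ~0.6 at Task Length 100; its 100% Error Rate line (red) runs from ~0.6 to ~0.4
- The Gemma-3-12B Original Run (blue) starts at ~0.75 accuracy and drops to ~0.5; its 100% Error Rate line (red) runs from ~0.55 to ~0.35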
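
The claim that higher error rates consistently yield lower accuracy can be spot-checked against the approximate Task Length 0 values from the reconstructed table in this document. The following is a minimal sketch: the dictionary layout and the function name `accuracy_falls_with_error_rate` are illustrative assumptions, and the numbers are visual approximations rather than measured data.

```python
# Approximate turn accuracies at Task Length 0, copied from the
# reconstructed table. Keys are induced error rates (0.0 to 1.0).
ACCURACY_AT_LENGTH_0 = {
    "Qwen-3-14B": {1.00: 0.60, 0.75: 0.70, 0.50: 0.75, 0.25: 0.78, 0.00: 0.82},
    "Qwen-3-32B": {1.00: 0.65, 0.75: 0.72, 0.50: 0.76, 0.25: 0.79, 0.00: 0.83},
    "Gemma-3-12B": {1.00: 0.55, 0.75: 0.62, 0.50: 0.65, 0.25: 0.68, 0.00: 0.71},
    "Gemma-3-27B": {1.00: 0.58, 0.75: 0.64, 0.50: 0.67, 0.25: 0.70, 0.00: 0.73},
}


def accuracy_falls_with_error_rate(by_rate):
    """Return True if accuracy strictly decreases as the error rate rises."""
    rates = sorted(by_rate)                 # ascending error rate
    accs = [by_rate[r] for r in rates]      # accuracies in that order
    return all(a > b for a, b in zip(accs, accs[1:]))


for model, by_rate in ACCURACY_AT_LENGTH_0.items():
    print(model, accuracy_falls_with_error_rate(by_rate))
```

With the approximated values, the check holds for all four models; it says nothing about intermediate task lengths, which the table does not cover.
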
### Spatial Grounding:
- Legend colors match line colors exactly (e.g., red lines = 100% error rate)
- All graphs share identical axis ranges and formatting
---
## Section (b): Bar Charts Comparing Model Robustness
### Key Components:
1. **Left Chart**:
- **Title**: Turn Accuracy vs Model
- **X-axis**: Models (Qwen-3-32B, Qwen-3-14B, Qwen-3-8B, Qwen-4B, Gemma-12B, Gemma-4B)
- **Y-axis**: Turn Accuracy (0–1.0)
- **Trends**:
- Qwen-3-32B (blue) shows highest baseline accuracy (~0.8)
- Qwen-3-14B (light blue) follows (~0.75)
- Gemma-4B (pink) has lowest baseline (~0.6)
2. **Right Chart**:
- **Title**: Turn Accuracy vs Induced Error Rate
- **X-axis**: Induced Error Rate (0.00–1.00 in 0.25 increments)
- **Y-axis**: Turn Accuracy (0–1.0)
- **Models**:
- **DeepSeek-chat-3-0524** (purple)
- **Kimi K2** (orange)
- **Qwen-3-2507-235B** (brown)
- **Trends**:
- All models show **linear decline** in accuracy as error rate increases
- Example: DeepSeek starts at ~0.95 accuracy at 0% error, drops to ~0.35 at 100% error
- Kimi K2 shows steepest decline (~0.9 to ~0.25)
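
To make "steepest decline" concrete, the endpoint values quoted above can be turned into an average drop per 0.25 step of induced error rate. This is a rough sketch that assumes a straight line between the two endpoints; the function name `decline_per_quarter_step` is hypothetical, and Qwen-3-2507-235B is omitted because no values are given for it in the text.

```python
# Endpoint accuracies (at 0% and 100% induced error) as approximated above.
ENDPOINTS = {
    "DeepSeek-chat-3-0524": (0.95, 0.35),
    "Kimi K2": (0.90, 0.25),
}


def decline_per_quarter_step(acc_at_0, acc_at_100, steps=4):
    """Average accuracy lost per 0.25 increase in error rate (straight-line assumption)."""
    return (acc_at_0 - acc_at_100) / steps


for model, (start, end) in ENDPOINTS.items():
    print(
        f"{model}: total drop {start - end:.2f}, "
        f"about {decline_per_quarter_step(start, end):.4f} per 0.25 step"
    )
```

Under these approximations Kimi K2 loses slightly more per step than DeepSeek, which is consistent with the trend noted above.
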
### Spatial Grounding:
- Bar colors match legend labels exactly
- Error rate increments (0.00, 0.25, 0.50, 0.75, 1.00) are evenly spaced on X-axis
---
## Cross-Section Analysis
1. **Error Rate Sensitivity**:
- Accuracy degrades for all models as error rates increase
- Qwen-3-32B (section a) and DeepSeek (section b) show highest baseline accuracy
- Kimi K2 (section b, right chart) exhibits the steepest decline (~0.9 to ~0.25); Gemma-4B (section b, left chart) has the lowest baseline accuracy (~0.6)
2. **Model Robustness**:
- Larger models (e.g., Qwen-3-32B) maintain higher accuracy than smaller variants (Qwen-3-14B)
- Gemma-3-12B (section a) and Kimi K2 (section b) show moderate robustness
3. **Error Rate Impact**:
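- In section (a), moving from 0% to 100% error rate lowers accuracy by roughly 0.15–0.22 absolute (about 20–35% relative); in section (b) the drop is larger (DeepSeek ~0.95 to ~0.35, Kimi K2 ~0.9 to ~0.25)
- 0% error rate preserves the highest accuracy in each set of error-rate runs (~0.7–0.83 at Task Length 0 in section a, ~0.9–0.95 in section b)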
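
The size of the error-rate impact in section (a) can be derived from the reconstructed table by comparing the 0% Error and 100% Error columns. The sketch below is illustrative only: the dictionary and the helper `relative_drop` are assumptions introduced here, and the inputs are the approximate values from the table.

```python
# (model, task length) -> (accuracy at 0% error, accuracy at 100% error),
# copied from the reconstructed table for section (a).
ZERO_VS_FULL_ERROR = {
    ("Qwen-3-14B", 0): (0.82, 0.60),
    ("Qwen-3-14B", 100): (0.62, 0.40),
    ("Qwen-3-32B", 0): (0.83, 0.65),
    ("Qwen-3-32B", 100): (0.61, 0.45),
    ("Gemma-3-12B", 0): (0.71, 0.55),
    ("Gemma-3-12B", 100): (0.51, 0.35),
    ("Gemma-3-27B", 0): (0.73, 0.58),
    ("Gemma-3-27B", 100): (0.55, 0.40),
}


def relative_drop(clean, corrupted):
    """Fraction of the 0%-error accuracy that is lost at 100% error."""
    return (clean - corrupted) / clean


for (model, length), (clean, corrupted) in ZERO_VS_FULL_ERROR.items():
    print(
        f"{model} @ length {length}: "
        f"absolute {clean - corrupted:.2f}, relative {relative_drop(clean, corrupted):.0%}"
    )
```

Because the absolute gap stays similar at both task lengths while the baseline shrinks, the relative drop is larger at Task Length 100 than at Task Length 0 for every model in the table.
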
---
## Data Table Reconstruction (Section a)
| Model | Task Length | Original Run | 100% Error | 75% Error | 50% Error | 25% Error | 0% Error |
|----------------|-------------|--------------|------------|-----------|-----------|-----------|----------|
| Qwen-3-14B | 0 | 0.8 | 0.6 | 0.7 | 0.75 | 0.78 | 0.82 |
| Qwen-3-14B | 100 | 0.6 | 0.4 | 0.5 | 0.55 | 0.58 | 0.62 |
| Qwen-3-32B | 0 | 0.85 | 0.65 | 0.72 | 0.76 | 0.79 | 0.83 |
| Qwen-3-32B | 100 | 0.62 | 0.45 | 0.51 | 0.54 | 0.57 | 0.61 |
| Gemma-3-12B | 0 | 0.75 | 0.55 | 0.62 | 0.65 | 0.68 | 0.71 |
| Gemma-3-12B | 100 | 0.5 | 0.35 | 0.42 | 0.45 | 0.48 | 0.51 |
| Gemma-3-27B | 0 | 0.78 | 0.58 | 0.64 | 0.67 | 0.70 | 0.73 |
| Gemma-3-27B | 100 | 0.55 | 0.4 | 0.46 | 0.49 | 0.52 | 0.55 |
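
For further analysis, the table above can be loaded into plain Python structures. The following is a minimal sketch using only the standard library; `parse_markdown_table` and `drop_by_condition` are hypothetical helper names, and the values remain visual approximations. It computes how much accuracy each model loses between Task Length 0 and Task Length 100 under each condition.

```python
TABLE = """\
| Model | Task Length | Original Run | 100% Error | 75% Error | 50% Error | 25% Error | 0% Error |
|---|---|---|---|---|---|---|---|
| Qwen-3-14B | 0 | 0.8 | 0.6 | 0.7 | 0.75 | 0.78 | 0.82 |
| Qwen-3-14B | 100 | 0.6 | 0.4 | 0.5 | 0.55 | 0.58 | 0.62 |
| Qwen-3-32B | 0 | 0.85 | 0.65 | 0.72 | 0.76 | 0.79 | 0.83 |
| Qwen-3-32B | 100 | 0.62 | 0.45 | 0.51 | 0.54 | 0.57 | 0.61 |
| Gemma-3-12B | 0 | 0.75 | 0.55 | 0.62 | 0.65 | 0.68 | 0.71 |
| Gemma-3-12B | 100 | 0.5 | 0.35 | 0.42 | 0.45 | 0.48 | 0.51 |
| Gemma-3-27B | 0 | 0.78 | 0.58 | 0.64 | 0.67 | 0.70 | 0.73 |
| Gemma-3-27B | 100 | 0.55 | 0.4 | 0.46 | 0.49 | 0.52 | 0.55 |
"""


def parse_markdown_table(text):
    """Split a pipe-delimited Markdown table into a header list and row dicts."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    header = [c.strip() for c in lines[0].strip("|").split("|")]
    rows = []
    for ln in lines[2:]:  # skip the header row and the separator row
        cells = [c.strip() for c in ln.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return header, rows


def drop_by_condition(header, rows):
    """Map model -> {condition: accuracy at length 0 minus accuracy at length 100}."""
    by_key = {(r["Model"], int(r["Task Length"])): r for r in rows}
    models = sorted({model for model, _ in by_key})
    conditions = header[2:]  # every column after Model and Task Length
    return {
        m: {
            c: round(float(by_key[(m, 0)][c]) - float(by_key[(m, 100)][c]), 2)
            for c in conditions
        }
        for m in models
    }


header, rows = parse_markdown_table(TABLE)
for model, drops in drop_by_condition(header, rows).items():
    print(model, drops)
```
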
---
## Notes
- All data points extracted from visual trends; exact values approximated
- No non-English text detected
- All legends and axis labels transcribed verbatim
- Spatial relationships (e.g., legend placement) confirmed for accuracy