# Technical Document Extraction: Turn Accuracy Analysis
## Image Description
The image is a **line graph** titled **"Turn Accuracy"**, comparing the performance of two models across varying task lengths. The graph includes two data series, axis labels, a legend, and numerical markers.
---
## Key Components
### 1. **Axes and Labels**
- **X-Axis**:
  - **Title**: `Task Length`
  - **Range**: `0` to `1000`
  - **Markers**: `0`, `200`, `400`, `600`, `800`, `1000`
- **Y-Axis**:
  - **Title**: `Turn Accuracy`
  - **Range**: `0.00` to `1.00`
  - **Markers**: `0.00`, `0.25`, `0.50`, `0.75`, `1.00`
### 2. **Legend**
- **Placement**: Bottom of the graph
- **Labels**:
  - `Gemma3-27b` (Red line)
  - `Qwen3-32b` (Blue line)
### 3. **Data Series**
#### **Gemma3-27b (Red Line)**
- **Trend**:
  - Starts at approximately `0.95` turn accuracy at `Task Length = 0`.
  - Declines steadily, reaching `~0.10` at `Task Length = 1000`.
  - Error bars widen as task length increases, indicating growing variability.
- **Key Data Points** (approximate, read from the graph):
  - `Task Length = 0`: `~0.95`
  - `Task Length = 200`: `~0.75`
  - `Task Length = 400`: `~0.50`
  - `Task Length = 600`: `~0.35`
  - `Task Length = 800`: `~0.25`
  - `Task Length = 1000`: `~0.10`
#### **Qwen3-32b (Blue Line)**
- **Trend**:
  - Starts at approximately `0.90` turn accuracy at `Task Length = 0`.
  - Declines gradually to `~0.75` by `Task Length = 800` and holds there through `Task Length = 1000`.
  - Error bars stay narrow across all task lengths.
- **Key Data Points** (approximate, read from the graph):
  - `Task Length = 0`: `~0.90`
  - `Task Length = 200`: `~0.85`
  - `Task Length = 400`: `~0.80`
  - `Task Length = 600`: `~0.78`
  - `Task Length = 800`: `~0.75`
  - `Task Length = 1000`: `~0.75`
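
To make the comparison between the two series concrete, the sketch below tabulates the key data points listed above and computes the accuracy gap at each task length. The values are the approximate readings from the graph, not exact measurements, and the names `TASK_LENGTHS`, `GEMMA3_27B`, `QWEN3_32B`, and `accuracy_gap` are hypothetical, introduced only for illustration.

```python
# Approximate values read off the plot; treat them as estimates, not measurements.
TASK_LENGTHS = [0, 200, 400, 600, 800, 1000]
GEMMA3_27B = [0.95, 0.75, 0.50, 0.35, 0.25, 0.10]
QWEN3_32B = [0.90, 0.85, 0.80, 0.78, 0.75, 0.75]


def accuracy_gap(lengths, first, second):
    """Return {task_length: second - first}, rounded to two decimals."""
    return {n: round(b - a, 2) for n, a, b in zip(lengths, first, second)}


# Positive values mean Qwen3-32b is ahead at that task length.
gap = accuracy_gap(TASK_LENGTHS, GEMMA3_27B, QWEN3_32B)
for n, g in gap.items():
    print(f"Task Length {n:>4}: Qwen3-32b minus Gemma3-27b = {g:+.2f}")
```

With these readings, the gap is slightly negative at `Task Length = 0` (Gemma3-27b leads by about `0.05`) and positive from `Task Length = 200` onward, growing to about `0.65` at `Task Length = 1000`.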
---
## Observations
1. **Model Performance**:
   - `Gemma3-27b` drops from `~0.95` to `~0.10` turn accuracy as task length grows from `0` to `1000`, suggesting reduced robustness on longer tasks.
   - `Qwen3-32b` starts slightly lower (`~0.90` at `Task Length = 0`) but declines only to `~0.75`, so it is the more accurate model from `Task Length = 200` onward.
2. **Error Bars**:
   - `Gemma3-27b` error bars grow larger at longer task lengths, reflecting higher variance in performance.
   - `Qwen3-32b` error bars remain consistent, indicating stable performance.
3. **Legend Accuracy**:
   - Red line corresponds to `Gemma3-27b` (confirmed via legend).
   - Blue line corresponds to `Qwen3-32b` (confirmed via legend).
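
The size of each model's decline can be summarized from the endpoint readings alone. The following is a minimal sketch under the assumption that the approximate values at `Task Length = 0` and `Task Length = 1000` are representative; `ENDPOINTS`, `LENGTH_SPAN`, and `summarize_drop` are hypothetical names used only for this illustration.

```python
# Approximate endpoint values read off the plot; treat them as estimates.
ENDPOINTS = {
    "Gemma3-27b": (0.95, 0.10),  # turn accuracy at Task Length 0 and 1000
    "Qwen3-32b": (0.90, 0.75),
}
LENGTH_SPAN = 1000  # x-axis distance between the two endpoints


def summarize_drop(start, end, span=LENGTH_SPAN):
    """Return (absolute drop, relative drop, mean change per 100 length units)."""
    absolute = start - end
    relative = absolute / start
    per_100 = (end - start) / span * 100
    return round(absolute, 2), round(relative, 3), round(per_100, 3)


summary = {name: summarize_drop(*points) for name, points in ENDPOINTS.items()}
for name, (absolute, relative, per_100) in summary.items():
    print(f"{name}: drop {absolute:.2f} ({relative:.1%}), "
          f"{per_100:+.3f} per 100 task-length units")
```

Under these readings, `Gemma3-27b` loses about `0.85` of turn accuracy over the plotted range, while `Qwen3-32b` loses about `0.15`. A mean slope hides the shape of each curve, so it is only a coarse summary.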
---
## Conclusion
The graph shows a clear divergence between the two models as task length increases. The two start close together (`~0.95` for `Gemma3-27b`, `~0.90` for `Qwen3-32b`), but `Qwen3-32b` retains about `0.75` turn accuracy at `Task Length = 1000` with little variability, while `Gemma3-27b` falls to about `0.10` with widening error bars. This suggests `Qwen3-32b` may be better suited to applications that require consistent performance across varying task lengths.