Image 05cd108090ab...

EXPERT: nemotron-free VERSION 1

RUNTIME: free/nvidia/nemotron-nano-12b-v2-vl:free

# Technical Document Extraction: Model Accuracy Analysis

## Chart 1: Step Accuracy vs. Task Length
### Axes and Labels
- **X-axis**: Task Length (0 to 100)
- **Y-axis**: Step Accuracy (0.0 to 1.0)
- **Title**: Step Accuracy

### Legend
- **Gemma3-4B**: Light orange
- **Gemma3-12B**: Orange
- **Gemma3-27B**: Red
- **Qwen3-4B**: Light blue

### Key Trends
1. **Gemma3-4B** (light orange): 
   - Starts at ~0.8 accuracy, declines steadily to ~0.2 by Task Length 100.
   - Slope: Gradual decline.
2. **Gemma3-12B** (orange): 
   - Starts at ~0.9, declines to ~0.4 by Task Length 100.
   - Slope: Moderate decline.
3. **Gemma3-27B** (red):
   - Starts at ~0.95, declines to ~0.5 by Task Length 100.
   - Slope: Moderate decline, similar in size to 12B; remains the highest curve at every task length.
4. **Qwen3-4B** (light blue):
   - Starts at ~0.7, declines to ~0.1 by Task Length 100.
   - Slope: Largest relative decline among the four models (loses most of its starting accuracy).

## Chart 2: Task Accuracy vs. Task Length
### Axes and Labels
- **X-axis**: Task Length (0 to 40)
- **Y-axis**: Task Accuracy (0.0 to 1.0)
- **Title**: Task Accuracy

### Legend
- **Qwen3-8B**: Light blue
- **Qwen3-14B**: Blue
- **Qwen3-32B**: Dark blue
- **Gemma3-4B**: Light orange

### Key Trends
1. **Qwen3-8B** (light blue): 
   - Starts at ~0.95, drops to ~0.3 by Task Length 40.
   - Slope: Steep decline.
2. **Qwen3-14B** (blue):
   - Starts at ~0.9, drops to ~0.2 by Task Length 40.
   - Slope: Steep decline, slightly sharper than Qwen3-8B.
3. **Qwen3-32B** (dark blue):
   - Starts at ~0.85, drops to ~0.1 by Task Length 40.
   - Slope: Steepest early decline among the Qwen3 models.
4. **Gemma3-4B** (light orange):
   - Starts at ~0.95, drops to ~0.05 by Task Length 40.
   - Slope: Roughly constant rate of decline; largest total drop (~0.9) of the four models.

## Spatial Grounding
- **Legend Placement**: Bottom of each chart.
- **Color Consistency**: 
  - Left Chart: Light orange (Gemma3-4B) matches light orange lines.
  - Right Chart: Light orange (Gemma3-4B) matches light orange lines.

## Component Isolation
- **Left Chart**: Focuses on step accuracy across longer task lengths (0–100).
- **Right Chart**: Focuses on task accuracy across shorter task lengths (0–40).

## Observations
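- **Model Performance**:
  - In the left chart, larger Gemma3 models hold higher step accuracy at every task length (27B > 12B > 4B).
  - In the right chart, the extracted values show the reverse ordering within Qwen3 (8B > 14B > 32B at every task length).
  - The 4B models end lowest in the left chart (~0.05 to ~0.2 at Task Length 100).
- **Task Length Impact**:
  - Accuracy decreases monotonically with task length for every model in both charts.
  - Step accuracy falls at a roughly constant rate up to Task Length 75 and then flattens; Qwen3 task accuracy drops fastest over the first 20 units of task length.
  - Qwen3-4B sits below all Gemma3 models in the left chart; in the right chart, Gemma3-4B stays at or above the Qwen3 models through Task Length 30.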
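
One way to check the shape of these declines is to compute the change in accuracy per unit of task length between adjacent sample points. The sketch below is illustrative only: the helper name `segment_slopes` is hypothetical, and the inputs are the approximate values listed under Data Extraction, not measurements from the underlying experiment.

```python
def segment_slopes(lengths, values):
    """Change in accuracy per unit of task length between adjacent points."""
    return [
        (y1 - y0) / (x1 - x0)
        for x0, x1, y0, y1 in zip(lengths, lengths[1:], values, values[1:])
    ]

# Approximate values copied from the Data Extraction tables (visual estimates).
gemma3_4b_step = segment_slopes([0, 25, 50, 75, 100], [0.8, 0.6, 0.4, 0.2, 0.1])
qwen3_32b_task = segment_slopes([0, 10, 20, 30, 40], [0.85, 0.5, 0.2, 0.08, 0.02])

print("Gemma3-4B step accuracy slopes:", [round(s, 4) for s in gemma3_4b_step])
print("Qwen3-32B task accuracy slopes:", [round(s, 4) for s in qwen3_32b_task])
```

A constant slope across segments indicates a linear decline; slopes that shrink in magnitude indicate a curve that flattens as it approaches zero.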

## Data Extraction
### Left Chart Data Points (Approximate)
| Model         | Task Length 0 | Task Length 25 | Task Length 50 | Task Length 75 | Task Length 100 |
|---------------|---------------|----------------|----------------|----------------|-----------------|
| Gemma3-4B     | 0.8           | 0.6            | 0.4            | 0.2            | 0.1             |
| Gemma3-12B    | 0.9           | 0.7            | 0.5            | 0.3            | 0.2             |
| Gemma3-27B    | 0.95          | 0.8            | 0.6            | 0.4            | 0.3             |
| Qwen3-4B      | 0.7           | 0.5            | 0.3            | 0.1            | 0.05            |
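
The table above can be summarized per model with a few lines of Python. This is a minimal sketch: the names `STEP_ACCURACY`, `summarize_decline`, and `decline_summary` are hypothetical, and the numbers are the approximate readings from the table, so the results carry the same uncertainty.

```python
# Approximate step-accuracy values transcribed from the left chart table.
TASK_LENGTHS = [0, 25, 50, 75, 100]
STEP_ACCURACY = {
    "Gemma3-4B": [0.8, 0.6, 0.4, 0.2, 0.1],
    "Gemma3-12B": [0.9, 0.7, 0.5, 0.3, 0.2],
    "Gemma3-27B": [0.95, 0.8, 0.6, 0.4, 0.3],
    "Qwen3-4B": [0.7, 0.5, 0.3, 0.1, 0.05],
}

def summarize_decline(lengths, values):
    """Return (absolute drop, relative drop, mean slope per unit task length)."""
    total_drop = values[0] - values[-1]
    relative_drop = total_drop / values[0]
    mean_slope = (values[-1] - values[0]) / (lengths[-1] - lengths[0])
    return total_drop, relative_drop, mean_slope

decline_summary = {
    model: summarize_decline(TASK_LENGTHS, values)
    for model, values in STEP_ACCURACY.items()
}

for model, (drop, relative, slope) in decline_summary.items():
    print(f"{model:11s} drop={drop:.2f} relative={relative:.0%} slope={slope:+.4f}")
```

Reporting both the absolute and the relative drop matters here because the models start from different accuracies, so the same absolute drop removes a larger share of a lower starting value.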

### Right Chart Data Points (Approximate)
| Model         | Task Length 0 | Task Length 10 | Task Length 20 | Task Length 30 | Task Length 40 |
|---------------|---------------|----------------|----------------|----------------|----------------|
| Qwen3-8B      | 0.95          | 0.7            | 0.4            | 0.2            | 0.1            |
| Qwen3-14B     | 0.9           | 0.6            | 0.3            | 0.15           | 0.05           |
| Qwen3-32B     | 0.85          | 0.5            | 0.2            | 0.08           | 0.02           |
| Gemma3-4B     | 0.95          | 0.75           | 0.5            | 0.25           | 0.05           |
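
A compact way to compare the curves in this table is the task length at which each one falls to a chosen accuracy threshold. The sketch below uses linear interpolation between tabulated points; the 0.5 threshold and the names `TASK_ACCURACY`, `crossing_length`, and `half_accuracy_length` are assumptions made for illustration and do not come from the chart.

```python
# Approximate task-accuracy values transcribed from the right chart table.
TASK_LENGTHS = [0, 10, 20, 30, 40]
TASK_ACCURACY = {
    "Qwen3-8B": [0.95, 0.7, 0.4, 0.2, 0.1],
    "Qwen3-14B": [0.9, 0.6, 0.3, 0.15, 0.05],
    "Qwen3-32B": [0.85, 0.5, 0.2, 0.08, 0.02],
    "Gemma3-4B": [0.95, 0.75, 0.5, 0.25, 0.05],
}

def crossing_length(lengths, values, threshold=0.5):
    """Interpolate the first task length at which accuracy falls to threshold.

    Returns None if the curve never reaches the threshold in the sampled range.
    """
    points = list(zip(lengths, values))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 >= threshold >= y1 and y0 != y1:
            return x0 + (y0 - threshold) * (x1 - x0) / (y0 - y1)
    return None

half_accuracy_length = {
    model: crossing_length(TASK_LENGTHS, values)
    for model, values in TASK_ACCURACY.items()
}

for model, length in half_accuracy_length.items():
    print(f"{model:10s} reaches 0.5 near task length {length:.1f}")
```

Because the table has only five samples per model, the interpolated crossing points are coarse estimates and should be read as a ranking aid instead of precise values.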

## Notes
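- All values are approximate due to visual interpretation of the chart; the endpoint estimates under Key Trends and the tabulated values under Data Extraction differ by up to ~0.2.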
- No non-English text detected.