Image 6eb241572b69...

EXPERT: nemotron-free VERSION 1

RUNTIME: free/nvidia/nemotron-nano-12b-v2-vl:free
INTEL_VERIFIED
# Technical Document Extraction: Model Performance Benchmark Analysis

## Chart Structure Overview
- **Chart Type**: Grouped Bar Chart
- **X-Axis**: Benchmark Categories (12 total)
- **Y-Axis**: Performance Scores (0-100 scale)
- **Legend**: Model Identifiers with Color Codes (6 models)
- **Legend Position**: Bottom of chart

## Legend Details
| Color Code       | Model Identifier               |
|------------------|--------------------------------|
| Lightest Gray    | Qwen3-4B-2507                  |
| Medium Gray      | Qwen3-8B                       |
| Dark Gray        | Qwen3-14B                      |
| Very Dark Gray   | Qwen3-32B                      |
| Black            | Qwen3-30B-A3B-2507             |
| Teal             | Nanbeige4.1-3B                 |

## Benchmark Categories (X-Axis)
1. Code
   - Live-Code-Bench-V6
   - Live-Code-Bench-Pro-Easy
2. Math
   - AIME 2026
   - IMO-Answer-Bench
3. Science
   - GPQA
   - HLE-text-only (w/o tool)
4. Alignment
   - Arena-Hard-v2
   - Multi-Challenge
5. Tool Use
   - BFCL-V4
   - Tau-Bench
6. Deep Search
   - xbench-DeepSearch-2510
   - GAIA (Text-only)

## Data Extraction Protocol
1. **Spatial Grounding**: All bars ordered left-to-right per legend sequence
2. **Color Verification**: Cross-referenced legend colors with bar colors
3. **Trend Analysis**: 
   - Most benchmarks show increasing performance from left-to-right
   - Exceptions noted where scores decrease between models

## Full Data Table
| Benchmark Category          | Qwen3-4B-2507 | Qwen3-8B | Qwen3-14B | Qwen3-32B | Qwen3-30B-A3B-2507 | Nanbeige4.1-3B |
|-----------------------------|---------------|----------|-----------|-----------|---------------------|----------------|
| Live-Code-Bench-V6          | 57.4          | 49.4     | 55.9      | 55.7      | 66.0                | 76.9           |
| Live-Code-Bench-Pro-Easy    | 40.2          | 41.2     | 33.0      | 42.3      | 60.8                | 81.4           |
| AIME 2026                   | 81.5          | 70.4     | 76.5      | 75.8      | 87.3                | 87.4           |
| IMO-Answer-Bench            | 48.0          | 36.6     | 41.8      | 43.9      | 54.3                | 53.4           |
| GPQA                        | 65.8          | 62.0     | 63.4      | 68.4      | 73.4                | 83.8           |
| HLE-text-only (w/o tool)    | 6.7           | 5.3      | 7.0       | 9.3       | 11.8                | 12.6           |
| Arena-Hard-v2               | 34.9          | 26.3     | 36.9      | 56.0      | 60.2                | 73.2           |
| Multi-Challenge             | 41.1          | 36.3     | 37.0      | 38.7      | 49.4                | 52.2           |
| BFCL-V4                     | 44.9          | 42.2     | 45.1      | 47.9      | 48.6                | 56.5           |
| Tau-Bench                   | 44.9          | 42.1     | 45.0      | 45.3      | 47.7                | 48.6           |
| xbench-DeepSearch-2510      | 5.0           | 2.0      | 9.0       | 8.0       | 10.0                | 39.0           |
| GAIA (Text-only)            | 28.3          | 19.5     | 30.2      | 30.2      | 31.6                | 69.9           |

## Key Observations
1. **Performance Trends**:
   - Nanbeige4.1-3B consistently highest performer across most benchmarks
   - Qwen3-30B-A3B-2507 shows strong performance in Code and Math
   - Qwen3-4B-2507 performs best in Deep Search (xbench-DeepSearch-2510)

2. **Notable Outliers**:
   - HLE-text-only (w/o tool): Significant performance gap between models
   - GAIA (Text-only): Nanbeige4.1-3B shows 2.8x improvement over Qwen3-4B-2507

3. **Model Specialization**:
   - Qwen3-32B excels in Science (GPQA) and Alignment (Arena-Hard-v2)
   - Qwen3-30B-A3B-2507 dominates in Code and Math benchmarks

## Language Notes
- All text in English
- No non-English content detected
DECODING INTELLIGENCE...
TECHNICAL ASSET FINGERPRINT

6eb241572b69845aebcccaea

FOUND IN PAPERS

EXPERT: nemotron-free VERSION 1