Image ea83d6e8685e...

EXPERT: nemotron-free VERSION 1

RUNTIME: free/nvidia/nemotron-nano-12b-v2-vl:free
# Technical Document Extraction: Design and Scaling of Native MLLMs

## 1. Left Line Chart: "Design Choices of Native MLLMs"
### Axes and Labels
- **X-axis**: Training Data Size (log scale, tick marks at 10⁷ and 10⁸)
- **Y-axis**: Validation Loss (range: 0.8–2.2)
- **Legend**: Located at bottom-left
  - `base` (light green, triangle markers)
  - `w/ MoE` (green, circle markers)
  - `w/ V_d,w*(·)` (dark green, square markers)

### Trends and Data Points
- **All lines show decreasing validation loss** as training data size increases.
  - **base**: Starts at ~2.25 at 10⁷ and decreases to ~1.8 at 10⁸.
  - **w/ MoE**: Starts at ~2.0 at 10⁷ and decreases to ~1.0 at 10⁸.
  - **w/ V_d,w*(·)**: Starts at ~2.1 at 10⁷ and decreases to ~0.8 at 10⁸.
- **Spatial grounding**: Legend positioned at bottom-left corner.
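
The endpoint values above can be compared more directly by computing how much each curve drops over the plotted decade of data. The following is a minimal sketch, assuming the approximate chart readings listed above are the only available data; the names `ENDPOINTS`, `loglog_slope`, and `summarize` are illustrative and do not come from the source figure.

```python
import math

# Approximate endpoint losses read from the left chart (values as transcribed above).
# Keys are the legend labels; each tuple is (loss at 1e7, loss at 1e8).
ENDPOINTS = {
    "base": (2.25, 1.8),
    "w/ MoE": (2.0, 1.0),
    "w/ V_d,w*(·)": (2.1, 0.8),
}
DATA_START, DATA_END = 1e7, 1e8


def loglog_slope(loss_start, loss_end, data_start=DATA_START, data_end=DATA_END):
    """Slope of the straight line through two points in log-log space."""
    return math.log(loss_end / loss_start) / math.log(data_end / data_start)


def summarize(endpoints):
    """Return {label: (absolute drop, relative drop, log-log slope)}."""
    summary = {}
    for label, (start, end) in endpoints.items():
        drop = start - end
        summary[label] = (drop, drop / start, loglog_slope(start, end))
    return summary


if __name__ == "__main__":
    for label, (drop, rel, slope) in summarize(ENDPOINTS).items():
        print(f"{label:14s} drop={drop:.2f} rel={rel:.1%} slope={slope:.3f}")
```

A more negative slope means a faster loss decrease per decade of training data. Because only two approximate points per curve are available, the slope is a rough summary and does not establish that the curves follow a power law.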

## 2. Right Line Chart: "Scaling Properties of Native MLLMs"
### Axes and Labels
- **X-axis**: Training Data Size (log scale, tick marks at 10⁷ and 10⁸)
- **Y-axis**: Validation Loss (range: 0.75–2.25)
- **Legend**: Located at bottom-right
  - `0.5B parameters` (light green, triangle markers)
  - `2B parameters` (green, circle markers)
  - `7B parameters` (dark green, square markers)

### Trends and Data Points
- **All lines show decreasing validation loss** as training data size increases.
  - **0.5B parameters**: Starts at ~2.25 at 10⁷ and decreases to ~1.0 at 10⁸.
  - **2B parameters**: Starts at ~2.0 at 10⁷ and decreases to ~0.75 at 10⁸.
  - **7B parameters**: Starts at ~1.75 at 10⁷ and decreases to ~0.75 at 10⁸.
- **Spatial grounding**: Legend positioned at bottom-right corner.
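
To see how the benefit of a larger model changes with data size, the transcribed values can be differenced between adjacent model sizes. This is a minimal sketch under the assumption that the approximate readings above are accurate to about the nearest 0.25; `LOSS_BY_SIZE` and `gaps_between_sizes` are hypothetical names introduced for illustration.

```python
# Approximate losses from the right chart (values as transcribed above).
# Keys: model size in billions of parameters; values: loss at each data size.
LOSS_BY_SIZE = {
    0.5: {1e7: 2.25, 1e8: 1.0},
    2.0: {1e7: 2.0, 1e8: 0.75},
    7.0: {1e7: 1.75, 1e8: 0.75},
}


def gaps_between_sizes(loss_by_size, data_size):
    """Loss improvement from each model size to the next larger one at a fixed data size."""
    sizes = sorted(loss_by_size)
    return {
        (small, large): round(
            loss_by_size[small][data_size] - loss_by_size[large][data_size], 4
        )
        for small, large in zip(sizes, sizes[1:])
    }


if __name__ == "__main__":
    for data_size in (1e7, 1e8):
        print(f"{data_size:.0e}: {gaps_between_sizes(LOSS_BY_SIZE, data_size)}")
```

With these readings, the gap between 2B and 7B is 0.25 at 10⁷ and 0.0 at 10⁸. Both values at 10⁸ sit at the bottom of the listed axis range (~0.75), so the zero gap may reflect the precision of the transcription rather than true parity between the two models.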

## 3. Radar Chart: "Scaling Properties of Native MLLMs"
### Axes and Labels
- **Axes**: One axis per benchmark, listed clockwise from the top, with the score labeled on each axis in parentheses:
  - ChartQA (79.2)
  - MMVet (78.3)
  - MMU (43.6)
  - OCRBench (80.4)
  - MathVista (51.3)
  - CCBench (83.9)
  - DocVQA (88.7)
  - InfoVQA (60.9)
  - TextVQA (76.9)
  - SQA-I (96.2)
  - GQA (62.9)
  - AI2D (74.9)
- **Legend**: Located at bottom-right
  - `Emu3` (orange)
  - `Mono-InternVL` (red)
  - `EVEv2` (blue)
  - `Chameleon-7B` (green)
  - `InternVL-2.5-2B` (purple)
  - `NaViL-2B (Ours)` (pink)

### Trends and Data Points
- **Performance scores**: Each model's score is plotted radially along each benchmark axis.
  - **NaViL-2B (Ours)** (pink) achieves the highest scores on most benchmarks, for example:
    - DocVQA: 88.7
    - CCBench: 83.9
    - OCRBench: 80.4
  - **Chameleon-7B** (green) performs well on SQA-I (96.2) and MMVet (78.3).
  - **Emu3** (orange) scores moderately across benchmarks (e.g., ChartQA: 79.2).
- **Spatial grounding**: Legend positioned at bottom-right corner.
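
The clockwise axis ordering above can be turned into angular positions when re-plotting the radar chart. The sketch below assumes the twelve axes are evenly spaced, which is typical for radar charts but is not stated in the extraction; `BENCHMARKS` and `axis_angles` are illustrative names, and the benchmark labels are copied as transcribed.

```python
# Benchmark names in the order transcribed above (clockwise from the top axis).
BENCHMARKS = [
    "ChartQA", "MMVet", "MMU", "OCRBench", "MathVista", "CCBench",
    "DocVQA", "InfoVQA", "TextVQA", "SQA-I", "GQA", "AI2D",
]


def axis_angles(names):
    """Map each axis name to its angle in degrees, measured clockwise from the top.

    Assumes evenly spaced axes (an assumption, not confirmed by the source).
    """
    step = 360.0 / len(names)
    return {name: index * step for index, name in enumerate(names)}


if __name__ == "__main__":
    for name, angle in axis_angles(BENCHMARKS).items():
        print(f"{name:10s} {angle:6.1f} deg")
```

Under the even-spacing assumption, adjacent axes are 30 degrees apart, so DocVQA (the seventh axis) sits directly opposite ChartQA.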

## 4. General Observations
- **Trend verification**: In both line charts, validation loss decreases as training data size increases.
- **Component isolation**: Charts are spatially distinct (left, right, radar), with no overlapping elements.
- **No omitted data**: All axis labels, legends, and dataset names are transcribed.

## 5. Language Declaration
- **Primary language**: English.
- **No other languages detected**.