Image 6e14289765da...

EXPERT: nemotron-free VERSION 1

RUNTIME: free/nvidia/nemotron-nano-12b-v2-vl:free
INTEL_VERIFIED
# Technical Document Extraction: Safety Evaluation Charts

## Spatial Layout
- **Legend Position**: Bottom right corner ([x, y] = [bottom-right])
- **Color Legend**:
  - Blue: GPT-5.2
  - Light Blue: Gemini 3 Pro
  - Red: Owen3-VL
  - Dark Red: Grok 4.1 Fast
  - Light Blue (Secondary): Nano Banana Pro
  - Dark Blue: Seedream 4.5

---

## Section A: Language Safety

### Chart Structure
- **X-Axis**: Safe Score (%) [0-100]
- **Y-Axis**: Rank [1-4] (Top to Bottom)
- **Categories**:
  1. Benchmark Evaluation
  2. Adversarial Evaluation
  3. Multilingual Safety
  4. Regulatory Compliance

### Data Trends & Values
1. **Benchmark Evaluation**
   - GPT-5.2: 91.59% (Blue)
   - Gemini 3 Pro: 88.06% (Light Blue)
   - Owen3-VL: 80.19% (Red)
   - Grok 4.1 Fast: 66.60% (Dark Red)
   - *Trend*: Scores decrease from Rank 1 to 4

2. **Adversarial Evaluation**
   - GPT-5.2: 54.26% (Blue)
   - Grok 4.1 Fast: 46.39% (Dark Red)
   - Gemini 3 Pro: 41.17% (Light Blue)
   - Owen3-VL: 33.42% (Red)
   - *Trend*: Scores decrease from Rank 1 to 4

3. **Multilingual Safety**
   - GPT-5.2: 77.50% (Blue)
   - Gemini 3 Pro: 70.00% (Light Blue)
   - Owen3-VL: 64.00% (Red)
   - Grok 4.1 Fast: 61.75% (Dark Red)
   - *Trend*: Scores decrease from Rank 1 to 4

4. **Regulatory Compliance**
   - GPT-5.2: 90.22% (Blue)
   - Gemini 3 Pro: 77.11% (Light Blue)
   - Owen3-VL: 77.54% (Red)
   - Grok 4.1 Fast: 45.97% (Dark Red)
   - *Trend*: Scores decrease from Rank 1 to 4

---

## Section B: Vision-Language Safety

### Chart Structure
- **X-Axis**: Safe Score (%) [0-100]
- **Y-Axis**: Rank [1-4] (Top to Bottom)
- **Categories**:
  1. Benchmark Evaluation
  2. Adversarial Evaluation

### Data Trends & Values
1. **Benchmark Evaluation**
   - GPT-5.2: 92.14% (Blue)
   - Owen3-VL: 83.32% (Red)
   - Gemini 3 Pro: 82.53% (Light Blue)
   - Grok 4.1 Fast: 67.97% (Dark Red)
   - *Trend*: Scores decrease from Rank 1 to 4

2. **Adversarial Evaluation**
   - GPT-5.2: 97.24% (Blue)
   - Owen3-VL: 78.89% (Red)
   - Gemini 3 Pro: 75.44% (Light Blue)
   - Grok 4.1 Fast: 68.34% (Dark Red)
   - *Trend*: Scores decrease from Rank 1 to 4

---

## Section C: Image Generation Safety

### Chart Structure
- **X-Axis**: Safe Score (%) [0-100]
- **Y-Axis**: Rank [1-2] (Top to Bottom)
- **Categories**:
  1. Benchmark Evaluation
  2. Adversarial Evaluation
  3. Regulatory Compliance

### Data Trends & Values
1. **Benchmark Evaluation**
   - Nano Banana Pro: 60.00% (Light Blue)
   - Seedream 4.5: 47.94% (Dark Blue)
   - *Trend*: Scores decrease from Rank 1 to 2

2. **Adversarial Evaluation**
   - Nano Banana Pro: 54.00% (Light Blue)
   - Seedream 4.5: 19.67% (Dark Blue)
   - *Trend*: Scores decrease from Rank 1 to 2

3. **Regulatory Compliance**
   - Nano Banana Pro: 65.59% (Light Blue)
   - Seedream 4.5: 57.53% (Dark Blue)
   - *Trend*: Scores decrease from Rank 1 to 2

---

## Cross-Validation Summary
- **Color Consistency**: All legend colors match bar colors across all charts
- **Rank Correlation**: Higher ranks (1-2) consistently show higher safe scores
- **Model Performance**:
  - GPT-5.2 dominates in Language Safety (91.59% avg)
  - Nano Banana Pro leads in Image Generation Safety (60% avg)
  - Grok 4.1 Fast shows lowest scores across all categories (66.60% avg)

## Key Observations
1. **Model Specialization**:
   - GPT-5.2 excels in language-based safety metrics
   - Nano Banana Pro performs best in image generation safety
   - Grok 4.1 Fast shows consistent underperformance

2. **Adversarial Vulnerability**:
   - All models show significant score drops in adversarial evaluations
   - Seedream 4.5 has extreme vulnerability (19.67% in image generation)

3. **Regulatory Compliance**:
   - GPT-5.2 maintains highest compliance (90.22%)
   - Grok 4.1 Fast fails regulatory metrics (45.97%)

## Data Table Reconstruction
### Language Safety (Section A)
| Rank | Model          | Benchmark | Adversarial | Multilingual | Regulatory |
|------|----------------|-----------|-------------|--------------|------------|
| 1    | GPT-5.2        | 91.59%    | 54.26%      | 77.50%       | 90.22%     |
| 2    | Gemini 3 Pro   | 88.06%    | 46.39%      | 70.00%       | 77.11%     |
| 3    | Owen3-VL       | 80.19%    | 41.17%      | 64.00%       | 77.54%     |
| 4    | Grok 4.1 Fast  | 66.60%    | 33.42%      | 61.75%       | 45.97%     |

### Vision-Language Safety (Section B)
| Rank | Model          | Benchmark | Adversarial |
|------|----------------|-----------|-------------|
| 1    | GPT-5.2        | 92.14%    | 97.24%      |
| 2    | Owen3-VL       | 83.32%    | 78.89%      |
| 3    | Gemini 3 Pro   | 82.53%    | 75.44%      |
| 4    | Grok 4.1 Fast  | 67.97%    | 68.34%      |

### Image Generation Safety (Section C)
| Rank | Model          | Benchmark | Adversarial | Regulatory |
|------|----------------|-----------|-------------|------------|
| 1    | Nano Banana Pro| 60.00%    | 54.00%      | 65.59%     |
| 2    | Seedream 4.5   | 47.94%    | 19.67%      | 57.53%     |
DECODING INTELLIGENCE...
TECHNICAL ASSET FINGERPRINT

6e14289765da8dcadf9dbbfc

FOUND IN PAPERS

EXPERT: nemotron-free VERSION 1