# Technical Data Extraction: AI Safety Performance Benchmarks

This document extracts the data from a series of horizontal bar charts that compare AI models across three safety categories: **(a) Language Safety**, **(b) Vision-Language Safety**, and **(c) Image Generation Safety**.

---

## General Chart Structure
*   **X-Axis:** "Safe Score (%)" ranging from 0 to 100.
*   **Y-Axis:** "Rank" ranging from 1 (highest score) to 4 (lowest score); the Image Generation Safety panels compare only two models, so their ranks run from 1 to 2.
*   **Color Coding:**
    *   **Dark Blue:** Top-performing model (typically GPT-5.2).
    *   **Light Blue/Teal:** Mid-tier models (Gemini 3 Pro, Qwen3-VL, Nano Banana Pro).
    *   **Red/Pink:** Lower-performing models (Grok 4.1 Fast, Seedream 4.5).

---

## (a) Language Safety
This section contains four sub-charts evaluating language-based safety.

### 1. Benchmark Evaluation
*   **Trend:** GPT-5.2 leads, with Gemini 3 Pro about 3.5 points behind and Qwen3-VL about 11 points behind; Grok 4.1 Fast trails the leader by roughly 25 points.
*   **Rank 1:** GPT-5.2 | Score: 91.59
*   **Rank 2:** Gemini 3 Pro | Score: 88.06
*   **Rank 3:** Qwen3-VL | Score: 80.19
*   **Rank 4:** Grok 4.1 Fast | Score: 66.60

### 2. Adversarial Evaluation
*   **Trend:** Every model scores far lower than on the standard benchmark, with drops of roughly 20 to 47 points. Grok 4.1 Fast rises to Rank 2 because its drop (about 20 points) is the smallest, even though its score is still below its benchmark result.
*   **Rank 1:** GPT-5.2 | Score: 54.26
*   **Rank 2:** Grok 4.1 Fast | Score: 46.39
*   **Rank 3:** Gemini 3 Pro | Score: 41.17
*   **Rank 4:** Qwen3-VL | Score: 33.42
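
The drop from the benchmark evaluation to the adversarial evaluation can be checked directly from the scores listed above. The following is a minimal sketch; the dictionary and variable names are illustrative and do not come from the source charts.

```python
# Language Safety scores transcribed from the lists above.
benchmark = {
    "GPT-5.2": 91.59,
    "Gemini 3 Pro": 88.06,
    "Qwen3-VL": 80.19,
    "Grok 4.1 Fast": 66.60,
}
adversarial = {
    "GPT-5.2": 54.26,
    "Grok 4.1 Fast": 46.39,
    "Gemini 3 Pro": 41.17,
    "Qwen3-VL": 33.42,
}

# Drop in percentage points, rounded to match the two-decimal source values.
drops = {model: round(benchmark[model] - adversarial[model], 2) for model in benchmark}

# Print models from smallest to largest drop.
for model, drop in sorted(drops.items(), key=lambda kv: kv[1]):
    print(f"{model:<14} {drop:6.2f}")
```

With these numbers, Grok 4.1 Fast loses the fewest points, which is consistent with its move up in rank despite a lower absolute score.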

### 3. Multilingual Safety
*   **Trend:** GPT-5.2 leads by 10.5 points; the other three models fall within 5.25 points of one another (67.00 to 61.75).
*   **Rank 1:** GPT-5.2 | Score: 77.50
*   **Rank 2:** Gemini 3 Pro | Score: 67.00
*   **Rank 3:** Qwen3-VL | Score: 64.00
*   **Rank 4:** Grok 4.1 Fast | Score: 61.75

### 4. Regulatory Compliance
*   **Trend:** GPT-5.2 scores above 90; Qwen3-VL and Gemini 3 Pro score in the 70s, while Grok 4.1 Fast lags at 45.97, about 44 points behind the leader.
*   **Rank 1:** GPT-5.2 | Score: 90.22
*   **Rank 2:** Qwen3-VL | Score: 77.11
*   **Rank 3:** Gemini 3 Pro | Score: 73.54
*   **Rank 4:** Grok 4.1 Fast | Score: 45.97

---

## (b) Vision-Language Safety
This section evaluates models on multimodal (image + text) safety.

### 1. Benchmark Evaluation
*   **Trend:** GPT-5.2 leads with a score above 90, nearly 9 points ahead of Qwen3-VL; Qwen3-VL and Gemini 3 Pro are separated by less than 1 point.
*   **Rank 1:** GPT-5.2 | Score: 92.14
*   **Rank 2:** Qwen3-VL | Score: 83.32
*   **Rank 3:** Gemini 3 Pro | Score: 82.53
*   **Rank 4:** Grok 4.1 Fast | Score: 67.97

### 2. Adversarial Evaluation
*   **Trend:** GPT-5.2 scores 97.24 on adversarial vision tasks, higher than its benchmark score of 92.14 and more than 18 points ahead of Qwen3-VL.
*   **Rank 1:** GPT-5.2 | Score: 97.24
*   **Rank 2:** Qwen3-VL | Score: 78.89
*   **Rank 3:** Gemini 3 Pro | Score: 75.44
*   **Rank 4:** Grok 4.1 Fast | Score: 68.34

---

## (c) Image Generation Safety
This section compares two specific models: **Nano Banana Pro** (Blue) and **Seedream 4.5** (Pink).

### 1. Benchmark Evaluation
*   **Rank 1:** Nano Banana Pro | Score: 60.00
*   **Rank 2:** Seedream 4.5 | Score: 47.94

### 2. Adversarial Evaluation
*   **Trend:** Seedream 4.5 drops below 20% (19.67), down from 47.94 on the benchmark evaluation.
*   **Rank 1:** Nano Banana Pro | Score: 54.00
*   **Rank 2:** Seedream 4.5 | Score: 19.67

### 3. Regulatory Compliance
*   **Rank 1:** Nano Banana Pro | Score: 65.59
*   **Rank 2:** Seedream 4.5 | Score: 57.53

---

## Summary Table of Model Performance (Rank 1 Scores)

| Category | Sub-Category | Top Model | Score (%) |
| :--- | :--- | :--- | :--- |
| Language Safety | Benchmark | GPT-5.2 | 91.59 |
| Language Safety | Adversarial | GPT-5.2 | 54.26 |
| Language Safety | Multilingual | GPT-5.2 | 77.50 |
| Language Safety | Regulatory | GPT-5.2 | 90.22 |
| Vision-Language | Benchmark | GPT-5.2 | 92.14 |
| Vision-Language | Adversarial | GPT-5.2 | 97.24 |
| Image Generation | Benchmark | Nano Banana Pro | 60.00 |
| Image Generation | Adversarial | Nano Banana Pro | 54.00 |
| Image Generation | Regulatory | Nano Banana Pro | 65.59 |
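
The rows of this table can be re-derived from the per-panel scores by taking the highest-scoring model in each panel. The sketch below is illustrative only: the `panels` structure and the `top_models` helper are hypothetical names, and the scores are transcribed from sections (a), (b), and (c).

```python
# Per-panel scores transcribed from sections (a), (b), and (c).
panels = {
    ("Language Safety", "Benchmark"): {
        "GPT-5.2": 91.59, "Gemini 3 Pro": 88.06, "Qwen3-VL": 80.19, "Grok 4.1 Fast": 66.60},
    ("Language Safety", "Adversarial"): {
        "GPT-5.2": 54.26, "Grok 4.1 Fast": 46.39, "Gemini 3 Pro": 41.17, "Qwen3-VL": 33.42},
    ("Language Safety", "Multilingual"): {
        "GPT-5.2": 77.50, "Gemini 3 Pro": 67.00, "Qwen3-VL": 64.00, "Grok 4.1 Fast": 61.75},
    ("Language Safety", "Regulatory"): {
        "GPT-5.2": 90.22, "Qwen3-VL": 77.11, "Gemini 3 Pro": 73.54, "Grok 4.1 Fast": 45.97},
    ("Vision-Language", "Benchmark"): {
        "GPT-5.2": 92.14, "Qwen3-VL": 83.32, "Gemini 3 Pro": 82.53, "Grok 4.1 Fast": 67.97},
    ("Vision-Language", "Adversarial"): {
        "GPT-5.2": 97.24, "Qwen3-VL": 78.89, "Gemini 3 Pro": 75.44, "Grok 4.1 Fast": 68.34},
    ("Image Generation", "Benchmark"): {
        "Nano Banana Pro": 60.00, "Seedream 4.5": 47.94},
    ("Image Generation", "Adversarial"): {
        "Nano Banana Pro": 54.00, "Seedream 4.5": 19.67},
    ("Image Generation", "Regulatory"): {
        "Nano Banana Pro": 65.59, "Seedream 4.5": 57.53},
}


def top_models(panels):
    """Return one (category, sub_category, model, score) row per panel."""
    rows = []
    for (category, sub_category), scores in panels.items():
        model = max(scores, key=scores.get)  # highest Safe Score in this panel
        rows.append((category, sub_category, model, scores[model]))
    return rows


# Emit the rows in the same pipe-delimited layout as the summary table.
for category, sub_category, model, score in top_models(panels):
    print(f"| {category} | {sub_category} | {model} | {score:.2f} |")
```

Keeping the full per-panel scores in one structure makes it straightforward to cross-check the extracted ranks against the summary rows.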