# Technical Document Extraction: AI Model Safety Performance Chart
## 1. Component Isolation
* **Header/Legend Region:** Located at the top left of the chart area. Contains four model identifiers with corresponding color/texture swatches.
* **Main Chart Region:** A grouped bar chart showing "Safe Rate (%)" across four distinct benchmarks.
* **Axes:**
    * **Y-Axis (Vertical):** Labeled "Safe Rate (%)", ranging from 0 to 100 in increments of 20. Horizontal grid lines are present at each 20-unit interval.
    * **X-Axis (Horizontal):** Categorical axis containing four benchmark names.
## 2. Legend and Data Series Identification
The legend is positioned in the upper-left quadrant of the plot area.
| Model Name | Visual Identifier | Color/Texture Description |
| :--- | :--- | :--- |
| **GPT-5.2** | Solid Lightest Blue/White | Very pale, almost white solid fill. |
| **Qwen3-VL** | Solid Light Blue | Light lavender/blue solid fill. |
| **Gemini 3 Pro** | Solid Medium Blue | Medium-tone periwinkle/blue solid fill. |
| **Grok 4.1 Fast** | Striped Blue | Medium blue with white diagonal hatching (sloping down-right). |
## 3. Trend Verification and Data Extraction
The chart compares the safety performance of four AI models across four benchmarks.
**General Trends:**
* **GPT-5.2** consistently maintains a high safe rate (approx. 88-97%) across all benchmarks.
* **Grok 4.1 Fast** shows the lowest safe rate in every category, though it rises to approx. 87% on the SIUO benchmark.
* **SIUO** is the benchmark where all models perform at their highest levels (all above 85%).
* **MemeSafetyBench** shows the widest disparity between the top-performing model and the bottom-performing model.
### Data Table (Estimated Values based on Y-Axis Scale)
| Benchmark | GPT-5.2 | Qwen3-VL | Gemini 3 Pro | Grok 4.1 Fast |
| :--- | :---: | :---: | :---: | :---: |
| **MemeSafetyBench** | ~88% | ~81% | ~73% | ~55% |
| **MIS** | ~90% | ~74% | ~80% | ~65% |
| **USB-SafeBench** | ~92% | ~80% | ~82% | ~64% |
| **SIUO** | ~97% | ~98% | ~95% | ~87% |
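
The trend claims above can be checked against the estimated values with a short script. This is a minimal sketch, assuming the approximate readings in the table are taken at face value; the names `SAFE_RATE` and `spread` are illustrative and do not come from the chart.

```python
# Estimated safe rates (%) read off the chart; all values are approximate.
SAFE_RATE = {
    "MemeSafetyBench": {"GPT-5.2": 88, "Qwen3-VL": 81, "Gemini 3 Pro": 73, "Grok 4.1 Fast": 55},
    "MIS": {"GPT-5.2": 90, "Qwen3-VL": 74, "Gemini 3 Pro": 80, "Grok 4.1 Fast": 65},
    "USB-SafeBench": {"GPT-5.2": 92, "Qwen3-VL": 80, "Gemini 3 Pro": 82, "Grok 4.1 Fast": 64},
    "SIUO": {"GPT-5.2": 97, "Qwen3-VL": 98, "Gemini 3 Pro": 95, "Grok 4.1 Fast": 87},
}


def spread(scores):
    """Gap in percentage points between the best and worst model."""
    return max(scores.values()) - min(scores.values())


# Per-benchmark gap between the top and bottom model.
spreads = {bench: spread(scores) for bench, scores in SAFE_RATE.items()}
widest = max(spreads, key=spreads.get)

# Model with the lowest estimated safe rate on each benchmark.
lowest_per_benchmark = {
    bench: min(scores, key=scores.get) for bench, scores in SAFE_RATE.items()
}

# Check the claim that every model exceeds 85% on SIUO.
siuo_all_above_85 = all(rate > 85 for rate in SAFE_RATE["SIUO"].values())

for bench, gap in spreads.items():
    print(f"{bench}: spread {gap} pts, lowest = {lowest_per_benchmark[bench]}")
print("Widest spread:", widest)
print("All models above 85% on SIUO:", siuo_all_above_85)
```

Because the inputs are visual estimates, differences of a point or two (such as Qwen3-VL vs. GPT-5.2 on SIUO) should not be treated as reliable.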
## 4. Detailed Category Analysis
### MemeSafetyBench
* **GPT-5.2** leads at approx. 88%, about 7 points ahead of the next model.
* **Qwen3-VL** (approx. 81%) outperforms Gemini 3 Pro (approx. 73%).
* **Grok 4.1 Fast** has its lowest performance here, at approx. 55%.
### MIS
* **GPT-5.2** remains the leader.
* **Gemini 3 Pro** outperforms Qwen3-VL in this specific category (the only category where it clearly beats Qwen3-VL).
* **Grok 4.1 Fast** improves by about 10 points over its MemeSafetyBench score (approx. 65% vs 55%).
### USB-SafeBench
* **GPT-5.2** continues to lead, exceeding 90%.
* **Gemini 3 Pro** and **Qwen3-VL** perform similarly, with Gemini having a very slight edge (approx. 82% vs 80%).
* **Grok 4.1 Fast** remains the outlier at approximately 64%.
### SIUO
* This benchmark shows the highest safe rates for all four models.
* **Qwen3-VL** reaches its peak performance here, appearing to slightly edge out GPT-5.2 for the top spot (approx. 98% vs 97%).
* **Grok 4.1 Fast** sees its best performance here, climbing to approximately 87%.
## 5. Summary of Findings
The data indicates that **GPT-5.2** is the most consistently "safe" model across the tested benchmarks, staying at or above approximately 88% on every benchmark and at 90% or higher on three of the four. **Qwen3-VL** and **Gemini 3 Pro** compete for the second position, with Qwen3-VL showing a notable strength in the SIUO benchmark. **Grok 4.1 Fast** lags behind the other three models in safe rate across all tested benchmarks, particularly in MemeSafetyBench.
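
The overall ordering can be made concrete by averaging each model's estimated safe rate across the four benchmarks. The following is a rough sketch under the assumption that an unweighted mean is a fair summary; the chart itself does not report averages, and the names `SAFE_RATE_BY_MODEL`, `summary`, and `ranking` are illustrative.

```python
from statistics import mean

# Estimated safe rates (%) per model, in the order
# MemeSafetyBench, MIS, USB-SafeBench, SIUO. Values are approximate chart readings.
SAFE_RATE_BY_MODEL = {
    "GPT-5.2": [88, 90, 92, 97],
    "Qwen3-VL": [81, 74, 80, 98],
    "Gemini 3 Pro": [73, 80, 82, 95],
    "Grok 4.1 Fast": [55, 65, 64, 87],
}

# Mean, minimum, and maximum estimated safe rate for each model.
summary = {
    model: {"mean": mean(rates), "min": min(rates), "max": max(rates)}
    for model, rates in SAFE_RATE_BY_MODEL.items()
}

# Models ordered from highest to lowest unweighted mean.
ranking = sorted(summary, key=lambda m: summary[m]["mean"], reverse=True)

for model in ranking:
    s = summary[model]
    print(f"{model}: mean {s['mean']:.2f}%, range {s['min']}-{s['max']}%")
```

Under these estimates Qwen3-VL and Gemini 3 Pro differ by less than one point on average, which is within the reading error of the chart.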