# Technical Document Extraction: BlindTest Benchmark with VLMs’ Responses
## Title
**Examples from BlindTest benchmark with VLMs’ responses**
---
## Image Structure
The image is divided into **7 panels (P1–P7)**, each containing:
1. A **visual diagram** (e.g., lines, circles, grids, paths).
2. A **table** with checkmarks (✅) and Xs (❌) indicating whether each model's response is correct.
3. A **question** with specific answer format instructions.
---
## Panel Descriptions
### P1
- **Visual**: Two lines (blue and red) intersecting.
- **Question**: "How many times do the blue and red lines touch each other? Answer with a number in curly brackets, e.g., {5}."
- **Table**:
  - GPT-4o: ❌ (1)
  - Gemini-1.5: ❌ (1)
  - Sonnet-3: ❌ (1)
  - Sonnet-3.5: ❌ (1)
### P2
- **Visual**: Two overlapping circles (blue and purple).
- **Question**: "Are the two circles overlapping? Answer with Yes/No."
- **Table**:
  - GPT-4o: ❌ (1)
  - Gemini-1.5: ✅ (1)
  - Sonnet-3: ❌ (1)
  - Sonnet-3.5: ❌ (1)
### P3
- **Visual**: Text "Acknowledgement" with one character highlighted by a red oval.
- **Question**: "Which character is being highlighted with a red oval? Please provide your answer in curly brackets, e.g., {a}."
- **Table**:
  - GPT-4o: ❌ (1)
  - Gemini-1.5: ❌ (1)
  - Sonnet-3: ❌ (1)
  - Sonnet-3.5: ❌ (1)
### P4
- **Visual**: Overlapping circles arranged like the Olympic rings (the answer marked correct in the table is 6).
- **Question**: "How many circles are in the image? Answer with only the number in numerical format."
- **Table**:
  - GPT-4o: ✅ (6)
  - Gemini-1.5: ❌ (5)
  - Sonnet-3: ❌ (5)
  - Sonnet-3.5: ❌ (5)
### P5
- **Visual**: Nested squares (the answer marked correct in the table is 3).
- **Question**: "How many squares are in the image? Please answer with a number in curly brackets e.g., {10}."
- **Table**:
  - GPT-4o: ❌ (5)
  - Gemini-1.5: ✅ (3)
  - Sonnet-3: ❌ (4)
  - Sonnet-3.5: ❌ (4)
### P6
- **Visual**: 3x4 grid with labels (e.g., "apple", "book", "car", "door").
- **Question**: "Count the number of rows and columns and answer with numbers in curly brackets. For example, rows={5} columns={6}."
- **Table**:
  - GPT-4o: ✅ (3x4)
  - Gemini-1.5: ✅ (3x4)
  - Sonnet-3: ❌ (4x4)
  - Sonnet-3.5: ❌ (4x4)
### P7
- **Visual**: Path diagram with labels A, B, C, D.
- **Question**: "How many single-color paths go from A to D? Answer with a number in curly brackets e.g. {3}."
- **Table**:
  - GPT-4o: ✅ (1)
  - Gemini-1.5: ❌ (2)
  - Sonnet-3: ❌ (2)
  - Sonnet-3.5: ❌ (2)
---
## Model Legend
- **GPT-4o**: Green square icon.
- **Gemini-1.5**: Blue diamond icon.
- **Sonnet-3**: Brown square icon.
- **Sonnet-3.5**: Red square icon.
---
## Data Table Structure
| Model | P1 | P2 | P3 | P4 | P5 | P6 | P7 |
|-------------|----|----|----|----|----|----------|----|
| GPT-4o | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ (3x4) | ✅ |
| Gemini-1.5 | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ (3x4) | ❌ |
| Sonnet-3 | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ (4x4) | ❌ |
| Sonnet-3.5 | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ (4x4) | ❌ |
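
The marks in this table can be tallied programmatically. The following is a minimal Python sketch, assuming the ✅/❌ marks are transcribed by hand into a dictionary; the names `RESULTS`, `PANELS`, `correct_panels`, and `failed_by_all` are illustrative and do not come from the benchmark itself.

```python
# Correctness marks transcribed by hand from the table (True = ✅, False = ❌).
# All names here are hypothetical helpers, not part of the benchmark.
PANELS = [f"P{i}" for i in range(1, 8)]

RESULTS = {
    "GPT-4o":     [False, False, False, True,  False, True,  True],
    "Gemini-1.5": [False, True,  False, False, True,  True,  False],
    "Sonnet-3":   [False] * 7,
    "Sonnet-3.5": [False] * 7,
}


def correct_panels(model):
    """Return the panels the given model answered correctly."""
    return [panel for panel, ok in zip(PANELS, RESULTS[model]) if ok]


def failed_by_all():
    """Return the panels that no model answered correctly."""
    return [
        panel
        for i, panel in enumerate(PANELS)
        if not any(marks[i] for marks in RESULTS.values())
    ]


if __name__ == "__main__":
    for model in RESULTS:
        print(model, correct_panels(model))
    print("Failed by all:", failed_by_all())
```

With the marks as transcribed, `correct_panels("GPT-4o")` returns `["P4", "P6", "P7"]` and `failed_by_all()` returns `["P1", "P3"]`.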
---
## Key Trends
1. **GPT-4o** answered correctly on **P4** (6 circles), **P6** (3x4 grid), and **P7** (1 path).
2. **Gemini-1.5** answered correctly on **P2** (overlapping circles), **P5** (3 squares), and **P6** (3x4 grid).
3. **Sonnet-3** and **Sonnet-3.5** answered none of the seven examples correctly; on **P6** both reported a 4x4 grid.
4. All four models failed **P1** (line intersections) and **P3** (highlighted character).
---
## Questions and Answer Formats
1. **P1**: Answer with a number in curly brackets (e.g., {5}).
2. **P2**: Answer with "Yes/No".
3. **P3**: Answer with a character in curly brackets (e.g., {a}).
4. **P4**: Answer with a numerical value (e.g., 6).
5. **P5**: Answer with a number in curly brackets (e.g., {10}).
6. **P6**: Answer with rows={X} columns={Y} (e.g., rows={5} columns={6}).
7. **P7**: Answer with a number in curly brackets (e.g., {3}).
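
The answer formats listed above lend themselves to simple pattern matching. The sketch below shows one way a response string could be parsed for each format using Python's standard `re` module. The function names and the exact patterns are assumptions for illustration; the source does not describe how the benchmark actually extracts answers.

```python
import re

# Hypothetical parsers for the four answer formats listed above.
# They are illustrative only, not the benchmark's own extraction logic.


def parse_braced(text):
    """P1, P3, P5, P7: return the content of the first {...} group, or None."""
    match = re.search(r"\{([^{}]*)\}", text)
    return match.group(1).strip() if match else None


def parse_yes_no(text):
    """P2: return 'yes' or 'no' (lowercased) for the first match, or None."""
    match = re.search(r"\b(yes|no)\b", text, re.IGNORECASE)
    return match.group(1).lower() if match else None


def parse_plain_number(text):
    """P4: return the first run of digits as an int, or None."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


def parse_grid(text):
    """P6: return (rows, columns) from 'rows={X} columns={Y}', or None."""
    match = re.search(
        r"rows\s*=\s*\{(\d+)\}\s*columns\s*=\s*\{(\d+)\}", text, re.IGNORECASE
    )
    return (int(match.group(1)), int(match.group(2))) if match else None
```

For example, `parse_grid("rows={3} columns={4}")` returns `(3, 4)`, and `parse_braced("The answer is {a}.")` returns `"a"`. Each parser returns `None` when the response does not follow the requested format, so malformed answers can be handled separately from wrong ones.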
---
## Notes
- **Language**: All text in the image is in English.
- **Spatial grounding**: Panels are arranged horizontally (P1–P7), with tables and questions directly below each panel.
- **Legend placement**: At the bottom of the image, aligned with the panels.