# Technical Document Extraction: VirtualHome and BEHAVIOR Runtime Errors
## Document Overview
The image shows a structured technical document analyzing runtime errors in model-generated action trajectories for two simulated household environments, VirtualHome and BEHAVIOR. It is divided into two primary sections:
1. **VirtualHome: Trajectory – Runtime Error**
2. **BEHAVIOR: Trajectory – Runtime Error**
Each section contains four error categories with detailed error descriptions, model information, and state conditions.
---
## Section 1: VirtualHome: Trajectory – Runtime Error
### 1.1 Wrong Order
- **Model**: Gemini 1.5 Flash
- **Task Name**: Drink
- **Task ID**: `scene_1_171_2`
- **Scene ID**: `scene_1_171_2`
- **Error Details**:
```plaintext
PUTBACK(cup.1000, sink.231)
DRINK(cup.1000)
```
- **Precondition**:
`holds(cup.1000) = False` (❌)
- **Historical State**:
`holds(cup.1000) = True` (✅)
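
The wrong-order diagnosis relies on a record of past states, not only the state at the failing step. A minimal Python sketch of that bookkeeping follows. The effect table (`GRAB` sets `holds` to true, `PUTBACK` sets it to false) and the leading `GRAB` step are assumptions made for illustration; neither appears in the extracted trajectory, and this is not the VirtualHome API.

```python
# Hypothetical effect model: action name -> value it assigns to holds(obj).
HOLD_EFFECTS = {"GRAB": True, "PUTBACK": False}


def replay_holds(trajectory, obj):
    """Return the value of holds(obj) after each step, starting from False."""
    holding = False
    history = []
    for action, target in trajectory:
        if target == obj and action in HOLD_EFFECTS:
            holding = HOLD_EFFECTS[action]
        history.append(holding)
    return history


# The GRAB step is assumed; only PUTBACK and DRINK appear in the source.
steps = [("GRAB", "cup.1000"), ("PUTBACK", "cup.1000"), ("DRINK", "cup.1000")]
history = replay_holds(steps, "cup.1000")

holds_before_drink = history[-2]  # state at the moment DRINK is attempted
held_earlier = any(history[:-1])  # was the precondition ever satisfied?
```

Under these assumptions the precondition fails at `DRINK` but was satisfied earlier, which is the signature of a wrong-order error.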
### 1.2 Missing Step
- **Model**: Gemini 1.5 Flash
- **Task Name**: Wash hands
- **Task ID**: `scene_1_813_2`
- **Scene ID**: `scene_1_813_2`
- **Error Details**:
```plaintext
WALK(bathroom.1)
RINSE(hands_both.1000)
```
- **Precondition**:
`next_to(sink.42) = False` (❌)
`holds(soap.100) = False` (❌)
- **Historical State**:
`next_to(sink.42) = False` (❌)
`holds(soap.100) = False` (❌)
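
The precondition check behind this entry can be sketched as a table lookup. The table below simply restates the two preconditions listed for `RINSE`; the `PRECONDITIONS` structure, the `unmet_preconditions` helper, and the `inside(bathroom.1)` predicate are hypothetical names introduced for illustration.

```python
# Hypothetical precondition table; predicates are written as in the figure.
PRECONDITIONS = {
    "RINSE": ["next_to(sink.42)", "holds(soap.100)"],
}


def unmet_preconditions(action, state):
    """List the preconditions of `action` that are false or absent in `state`."""
    return [p for p in PRECONDITIONS.get(action, []) if not state.get(p, False)]


# Assumed state after WALK(bathroom.1): in the room, but not at the sink.
state = {"inside(bathroom.1)": True}
missing = unmet_preconditions("RINSE", state)
```

Here `missing` contains both preconditions, matching the two failed checks shown above.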
### 1.3 Affordance Error
- **Model**: Mixtral 8x22B MoE
- **Task Name**: Work
- **Task ID**: `scene_1_670_2`
- **Scene ID**: `scene_1_670_2`
- **Error Details**:
```plaintext
WALK(home_office.319)
FIND(mouse.413)
TYPE(mouse.413)
```
- **Affordance**:
`mouse.413 can't be typed` (❌)
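
An affordance check of this kind can be approximated by matching each action against the properties of the target object's class. The property names (`TYPEABLE`, `GRABBABLE`, `MOVABLE`), the `keyboard.415` identifier, and both tables are hypothetical; the real VirtualHome property set may differ.

```python
# Hypothetical property sets per object class.
OBJECT_PROPERTIES = {
    "mouse": {"GRABBABLE", "MOVABLE"},
    "keyboard": {"GRABBABLE", "MOVABLE", "TYPEABLE"},
}

# Hypothetical mapping from action to the property it requires.
REQUIRED_PROPERTY = {"TYPE": "TYPEABLE", "GRAB": "GRABBABLE"}


def affords(action, obj_id):
    """Return True if the object's class has the property the action needs."""
    kind = obj_id.split(".")[0]  # "mouse.413" -> "mouse"
    required = REQUIRED_PROPERTY.get(action)
    if required is None:
        return False  # unknown actions are rejected in this sketch
    return required in OBJECT_PROPERTIES.get(kind, set())


can_type_mouse = affords("TYPE", "mouse.413")
can_type_keyboard = affords("TYPE", "keyboard.415")  # illustrative object id
```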
### 1.4 Additional Step
- **Model**: Mistral Large
- **Task Name**: Set up table
- **Task ID**: `scene_1_93_1`
- **Scene ID**: `scene_1_93_1`
- **Error Details**:
```plaintext
STANDUP()
```
- **Current State**:
`stand_up(character.45) = True` (✅)
- **Expected State**:
`stand_up(character.45) = False` (❌)
---
## Section 2: BEHAVIOR: Trajectory – Runtime Error
### 2.1 Wrong Order
- **Model**: GPT-4o
- **Task Name**: Chopping Vegetables
- **Error Details**:
```plaintext
RIGHT_GRASP(carving_knife_69)
LEFT_GRASP(tomato_61)
SLICE(tomato_61)
```
- **Precondition**:
`not_in_hand(tomato_61) = False` (❌)
- **Historical State**:
`not_in_hand(tomato_61) = True` (✅)
### 2.2 Missing Step
- **Model**: GPT-4o
- **Task Name**: Cleaning bathtubs
- **Error Details**:
```plaintext
stained(bathtub_35)
RIGHT_GRASP(scrub_brush_0)
CLEAN(bathtub_35)
```
- **Precondition**:
`soaked(scrub_brush_0) = False` (❌)
- **Historical State**:
`soaked(scrub_brush_0) = False` (❌)
### 2.3 Affordance Error
- **Model**: Claude-3 Sonnet
- **Task Name**: Bottling fruit
- **Error Details**:
```plaintext
sliced(strawberry_0)
RIGHT_TRANSFER_CONTENTS_INSIDE(strawberry_0)
```
- **Affordance**:
`strawberry_0 is sliced and not interactive. Should interact with strawberry_0_part0 and strawberry_0_part1` (❌)
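
The error message implies that slicing replaces an object with named parts. A small sketch of resolving valid targets under that reading follows; the `interactive_targets` helper and the fixed count of two parts are assumptions, with the `_part0` and `_part1` suffixes taken from the message above.

```python
def interactive_targets(obj, sliced, parts_per_object=2):
    """Return the names an action may target: the object, or its parts once sliced."""
    if obj in sliced:
        return [f"{obj}_part{i}" for i in range(parts_per_object)]
    return [obj]


sliced_objects = {"strawberry_0"}
targets = interactive_targets("strawberry_0", sliced_objects)
```

A planner that routes `RIGHT_TRANSFER_CONTENTS_INSIDE` through such a lookup would act on the parts instead of the no-longer-interactive whole.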
### 2.4 Additional Step
- **Model**: Claude-3 Opus
- **Task Name**: Cleaning up the kitchen
- **Error Details**:
```plaintext
OPEN(top_cabinet_27)
OPEN(top_cabinet_27)
```
- **Current State**:
`open(top_cabinet_27) = True` (✅)
- **Expected State**:
`open(top_cabinet_27) = False` (❌)
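
Redundant steps like the second `OPEN` can be found by replaying the trajectory and flagging any action whose effect already holds. The effect table and the default of "closed" for unseen objects are assumptions for illustration, not BEHAVIOR semantics.

```python
# Hypothetical effect table: action -> (predicate, value it establishes).
EFFECTS = {"OPEN": ("open", True), "CLOSE": ("open", False)}


def find_redundant_steps(trajectory):
    """Return indices of steps whose effect was already true before execution."""
    state = {}
    redundant = []
    for index, (action, obj) in enumerate(trajectory):
        predicate, value = EFFECTS[action]
        # Objects not seen yet are assumed closed (open = False).
        if state.get((predicate, obj), False) == value:
            redundant.append(index)
        state[(predicate, obj)] = value
    return redundant


steps = [("OPEN", "top_cabinet_27"), ("OPEN", "top_cabinet_27")]
redundant = find_redundant_steps(steps)
```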
---
## Key Observations
1. **Error Types**:
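- **Wrong Order**: Actions executed in the wrong sequence, so a precondition that held earlier no longer holds (e.g., `DRINK` issued after `PUTBACK`, when the cup is no longer held).
- **Missing Step**: A required action is omitted, so a precondition is never satisfied (e.g., `RINSE` attempted without first approaching the sink or holding the soap).
- **Affordance Error**: The action is not supported by the target object (e.g., applying `TYPE` to `mouse.413`).
- **Additional Step**: An unnecessary action whose effect already holds (e.g., a second `OPEN` on an already open cabinet).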
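2. **Model-Specific Errors** (single examples, not aggregate statistics):
- **Gemini 1.5 Flash**: Loses track of object state in both VirtualHome examples, drinking from a cup it no longer holds and rinsing without being next to the sink or holding the soap.
- **Mixtral 8x22B MoE**: Applies `TYPE` to `mouse.413`, an object that does not afford typing.
- **Claude-3 Sonnet**: Targets the whole `strawberry_0` after slicing, when only `strawberry_0_part0` and `strawberry_0_part1` remain interactive.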
3. **State Management**:
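- A violated precondition alone does not identify the error type; the historical state does. If the precondition held earlier in the trajectory (as with `not_in_hand(tomato_61)`), the error is a wrong order. If it never held (as with `soaked(scrub_brush_0)`), a step is missing.
- For additional steps, the current state already matches the action's effect (e.g., `open(top_cabinet_27) = True` before the second `OPEN`), while the action expects the opposite.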
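
The decision logic implied by these observations can be sketched as a small classifier over four boolean checks. The `ErrorType` enum, the `classify` function, and the order in which the checks are applied are assumptions for illustration, not the evaluation procedure used to produce the figure.

```python
from enum import Enum


class ErrorType(Enum):
    WRONG_ORDER = "wrong order"
    MISSING_STEP = "missing step"
    AFFORDANCE = "affordance error"
    ADDITIONAL_STEP = "additional step"


def classify(affordance_ok, precondition_ok, held_in_history, effect_already_true):
    """Map four boolean checks to an error category, or None if the step is valid."""
    if not affordance_ok:
        return ErrorType.AFFORDANCE
    if not precondition_ok:
        # The history separates "was true once" from "was never true".
        return ErrorType.WRONG_ORDER if held_in_history else ErrorType.MISSING_STEP
    if effect_already_true:
        return ErrorType.ADDITIONAL_STEP
    return None


# SLICE(tomato_61): precondition fails now but held earlier in the trajectory.
slice_error = classify(True, False, True, False)
# CLEAN(bathtub_35): the brush was never soaked.
clean_error = classify(True, False, False, False)
```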