## Comparison of Gaussian Splatting Methods
### Overview
The image presents a side-by-side comparison of four Gaussian Splatting (GS) variants applied to urban street scenes, shown alongside ground truth (GT) images for a total of five panels per row. The panels are annotated with colored bounding boxes (red, green, yellow) that highlight regions of interest or discrepancies. The comparison is arranged in two rows: the top row shows a street scene with vehicles, and the bottom row shows a building facade with a motorcycle.
### Components/Axes
- **Labels**:
  - Top row: "Hierarchical-GS," "Hierarchical-GS (τ2)," "Our-3D-GS," "Our-Scaffold-GS," "GT."
  - Bottom row: Same labels as the top row.
- **Annotations**:
  - **Red boxes**: Appear on the Hierarchical-GS, Hierarchical-GS (τ2), and Our-3D-GS panels. They are read here as regions where the rendering retains detail well (e.g., sharper textures, clearer details). Because GT is the reference, a rendering can match GT closely but cannot be better than it.
  - **Green boxes**: Appear on the Our-Scaffold-GS panels. They are read here as regions where the output diverges from GT (e.g., blurring, artifacts, missing details).
  - **Yellow boxes**: Appear on the GT panels and mark the reference content for the same region.
- **Text in Boxes**:
  - Top row: "BRAVA INJAGE DANS" (inside red/yellow boxes).
  - Bottom row: No explicit text in boxes, but annotations focus on structural details (e.g., building windows, motorcycle).
### Detailed Analysis
- **Top Row (Street Scene)**:
  - **Hierarchical-GS**: Red box on the rear of a dark car; GT has a yellow box in the same area.
  - **Hierarchical-GS (τ2)**: Red box on the car's rear; GT has a yellow box.
  - **Our-3D-GS**: Red box on the car's rear; GT has a yellow box.
  - **Our-Scaffold-GS**: Green box on the car's rear; GT has a yellow box.
  - **GT**: Yellow box on the car's rear, labeled "BRAVA INJAGE DANS."
- **Bottom Row (Building Facade)**:
  - **Hierarchical-GS**: Red box on the building's wall; GT has a yellow box.
  - **Hierarchical-GS (τ2)**: Red box on the wall; GT has a yellow box.
  - **Our-3D-GS**: Red box on the wall; GT has a yellow box.
  - **Our-Scaffold-GS**: Green box on the wall; GT has a yellow box.
  - **GT**: Yellow box on the wall, highlighting structural details.
### Key Observations
1. **Red Boxes**: Appear consistently in the first three methods (Hierarchical-GS, Hierarchical-GS (τ2), Our-3D-GS), suggesting that these methods reproduce certain GT details (e.g., car textures, building edges) well in the marked regions.
2. **Green Boxes**: Appear only in "Our-Scaffold-GS," in the same regions, suggesting discrepancies in texture or structure relative to GT.
3. **GT Annotations**: Yellow boxes in the GT images mark the reference details (e.g., the "BRAVA INJAGE DANS" text) against which each method's output is compared.
### Interpretation
The comparison suggests that:
- **Hierarchical-GS and Hierarchical-GS (τ2)** show moderate alignment with GT, with red boxes marking regions where detail is reproduced well (e.g., sharper car details).
- **Our-3D-GS** carries red boxes in the same regions as the Hierarchical methods, suggesting comparable performance there.
- **Our-Scaffold-GS** shows the most divergence from GT in the marked regions, as indicated by the green boxes, potentially due to over-smoothing or introduced artifacts.
- The GT annotations ("BRAVA INJAGE DANS") indicate that the comparison targets fine real-world text and structural details, which are demanding tests of fidelity in urban scenes.

This analysis points to trade-offs between the GS approaches: some methods reproduce specific regions well, while others introduce artifacts. The red, green, and yellow annotations serve as a qualitative visual guide to each method's strengths and weaknesses, which is useful for refining Gaussian Splatting algorithms.
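
The box-by-box reading above is qualitative. One way to make it measurable is to compute PSNR between a rendering and GT inside a single bounding box. The sketch below is a minimal illustration of that idea, not the evaluation used for the figure: the function names `crop` and `region_psnr`, the `(x0, y0, x1, y1)` box layout, and the toy 4x4 grayscale images are all assumptions made for this example.

```python
import math

def crop(image, box):
    """Return the pixels inside box = (x0, y0, x1, y1); x1 and y1 are exclusive."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]

def region_psnr(render, gt, box, max_value=255.0):
    """PSNR in dB between render and gt, restricted to the given box."""
    a = crop(render, box)
    b = crop(gt, box)
    # Squared per-pixel differences over the cropped region.
    sq_diffs = [(pa - pb) ** 2
                for row_a, row_b in zip(a, b)
                for pa, pb in zip(row_a, row_b)]
    if not sq_diffs:
        raise ValueError("box selects no pixels")
    mse = sum(sq_diffs) / len(sq_diffs)
    if mse == 0:
        return math.inf  # identical region: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)

# Hypothetical toy data: a 4x4 grayscale GT image and a slightly perturbed rendering.
gt = [[10, 20, 30, 40],
      [50, 60, 70, 80],
      [90, 100, 110, 120],
      [130, 140, 150, 160]]
render = [row[:] for row in gt]
render[1][1] += 10  # error inside the box of interest
render[2][2] -= 10  # second error inside the same box

box = (1, 1, 3, 3)  # hypothetical box covering the four central pixels
print(f"PSNR inside box: {region_psnr(render, gt, box):.2f} dB")
```

Restricting the metric to the box keeps errors elsewhere in the frame from masking a local failure. A higher value means closer agreement with GT in that region, and an infinite value means the region matches exactly.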