## Visual Comparison Diagram: 3D Gaussian Splatting Reconstruction Quality
### Overview
This image is a qualitative comparison of different 3D Gaussian Splatting (GS) methods for novel view synthesis. It presents a side-by-side visual evaluation across two different scenes (top row: street scene, bottom row: building facade). The comparison aims to demonstrate the visual fidelity and detail preservation of the proposed methods ("Our-3D-GS" and "Our-Scaffold-GS") against baseline methods ("Hierarchical-GS" and "Hierarchical-GS (T2)") and the Ground Truth ("GT").
### Components/Axes
* **Structure:** A 2x5 grid of sub-images.
* **Column Headers (Method Labels):** Centered above each column.
    1. `Hierarchical-GS`
    2. `Hierarchical-GS (T2)`
    3. `Our-3D-GS`
    4. `Our-Scaffold-GS`
    5. `GT` (Ground Truth)
* **Visual Annotations:** Colored bounding boxes highlight specific regions of interest for comparison.
    * **Red Boxes:** Used for `Hierarchical-GS`, `Hierarchical-GS (T2)`, and `Our-3D-GS`.
    * **Green Box:** Used for `Our-Scaffold-GS`.
    * **Yellow Box:** Used for `GT`.
* **Scenes:**
    * **Top Row:** A street view with parked cars, buildings, and a prominent sign on a car's rear window.
    * **Bottom Row:** A close-up view of a building facade with windows, a scooter, and architectural details.
### Detailed Analysis
**Top Row - Street Scene:**
* **Focus Area:** A sign on the rear window of a dark blue car.
* **Text Transcription (Visible in GT):** The sign contains French text. The clearest words are "BRAYA", "INAGE", and "DANS". The full text is partially obscured but appears to be an advertisement or notice.
* **Method Comparison (Left to Right):**
    * `Hierarchical-GS`: The text within the red box is heavily blurred and illegible.
    * `Hierarchical-GS (T2)`: The text is extremely blurred, appearing as a smudge with no discernible characters.
    * `Our-3D-GS`: The text is clearer than in the previous two columns but still blurry. Some letter shapes are vaguely visible.
    * `Our-Scaffold-GS`: The text within the green box is significantly sharper. The words "BRAYA", "INAGE", and "DANS" are readable, though not perfectly crisp.
    * `GT`: The text within the yellow box is sharp and fully legible, serving as the reference.

**Bottom Row - Building Facade:**
* **Focus Areas:** Two regions are highlighted: a window on the left and a section of the facade/awning on the right.
* **Method Comparison (Left to Right):**
    * `Hierarchical-GS`: Both red-boxed regions are very blurry. The window pane details and facade texture are lost.
    * `Hierarchical-GS (T2)`: Severely blurred, comparable to the first column.
    * `Our-3D-GS`: Moderate improvement. Some structural lines are visible, but fine details and textures remain smeared.
    * `Our-Scaffold-GS`: Notable improvement in the green-boxed regions. The window frame and the vertical lines on the facade are much sharper and more defined, approaching the GT.
    * `GT`: The yellow-boxed regions show crisp edges, clear window panes, and distinct architectural details.
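
The per-method comparison against `GT` above is purely visual. One way to back such a comparison with a number is to compute PSNR over the same boxed region in each rendering and in the ground truth. The following is a minimal sketch under stated assumptions: the images are float arrays in `[0, 1]`, the box format `(top, left, height, width)` and the helper names `psnr` and `region_psnr` are hypothetical, and the arrays are synthetic stand-ins, not data from the figure.

```python
import numpy as np


def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    r = np.asarray(rendered, dtype=np.float64)
    g = np.asarray(reference, dtype=np.float64)
    if r.shape != g.shape:
        raise ValueError("rendered and reference must have the same shape")
    mse = np.mean((r - g) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


def region_psnr(rendered, reference, box):
    """PSNR restricted to a box given as (top, left, height, width)."""
    top, left, height, width = box
    rows = slice(top, top + height)
    cols = slice(left, left + width)
    return psnr(rendered[rows, cols], reference[rows, cols])


# Synthetic stand-ins: a random "ground truth" and two degraded copies.
rng = np.random.default_rng(0)
gt = rng.random((64, 64))
mild = np.clip(gt + rng.normal(0.0, 0.01, gt.shape), 0.0, 1.0)
heavy = np.clip(gt + rng.normal(0.0, 0.10, gt.shape), 0.0, 1.0)

box = (16, 16, 32, 32)  # hypothetical region of interest
mild_db = region_psnr(mild, gt, box)
heavy_db = region_psnr(heavy, gt, box)

# A constant offset of 0.1 gives an MSE of 0.01, i.e. about 20 dB.
offset_db = psnr(np.full((8, 8), 0.1), np.zeros((8, 8)))
```

Higher values mean the crop is closer to the reference. PSNR does not always track perceived sharpness, so it complements the visual inspection rather than replacing it.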
### Key Observations
1. **Overall Improvement Toward GT:** Reconstruction quality generally improves from left to right across the columns, culminating in the `GT`. The exception is the second column, `Hierarchical-GS (T2)`, which is at least as blurred as `Hierarchical-GS`.
2. **Text as a Key Differentiator:** The ability to reconstruct legible text (top row) is a strong differentiator. `Our-Scaffold-GS` performs markedly better than the Hierarchical baselines and `Our-3D-GS` in this regard.
3. **Detail Preservation:** The bottom row demonstrates that `Our-Scaffold-GS` preserves high-frequency details (edges, lines, textures) much better than the other non-GT methods, which produce smoothed-out or blurred results.
4. **Failure Case of `Hierarchical-GS (T2)`:** The `(T2)` variant appears to perform worse than the standard `Hierarchical-GS` in these examples, producing the most blurred results.
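
The observations above about blur and high-frequency detail could be checked with a simple no-reference sharpness score. A common heuristic is the variance of the image Laplacian over a cropped region: blurred crops score lower. The sketch below assumes grayscale float arrays and uses hypothetical helpers (`laplacian_variance`, `crop`, `box_blur`) and synthetic data. It is an illustration of the heuristic, not the evaluation used by the figure's authors.

```python
import numpy as np


def laplacian_variance(gray):
    """Variance of a 4-neighbour discrete Laplacian (higher = sharper)."""
    g = np.asarray(gray, dtype=np.float64)
    lap = (
        g[:-2, 1:-1] + g[2:, 1:-1]      # pixels above and below
        + g[1:-1, :-2] + g[1:-1, 2:]    # pixels left and right
        - 4.0 * g[1:-1, 1:-1]           # centre pixel
    )
    return float(lap.var())


def crop(image, box):
    """Cut out a box given as (top, left, height, width)."""
    top, left, height, width = box
    return image[top:top + height, left:left + width]


def box_blur(gray, passes=3):
    """Crude blur: repeated 3x3 mean filtering with edge padding."""
    g = np.asarray(gray, dtype=np.float64)
    h, w = g.shape
    for _ in range(passes):
        p = np.pad(g, 1, mode="edge")
        g = sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    return g


# Synthetic stand-ins for a sharp crop and a blurred reconstruction of it.
rng = np.random.default_rng(0)
reference = rng.random((64, 64))
blurred = box_blur(reference)

box = (16, 16, 32, 32)  # hypothetical region of interest
sharp_score = laplacian_variance(crop(reference, box))
blurry_score = laplacian_variance(crop(blurred, box))
```

Applied to the boxed regions of each column, a ranking by this score would be expected to follow the visual ranking, although noise and rendering artifacts can also raise the score.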
### Interpretation
This diagram serves as visual evidence for a research paper, arguing for the superiority of the authors' proposed methods, particularly `Our-Scaffold-GS`. The comparison is designed to show that their approach better handles challenging aspects of scene reconstruction:
* **Semantic Detail:** Legible text is fine-grained detail that also carries semantic meaning. The success of `Our-Scaffold-GS` here suggests it better preserves the high-frequency features needed for recognition.
* **Geometric Fidelity:** The sharp edges and lines in the building facade (bottom row) suggest better geometric accuracy and fewer "floaters" or other artifacts common in neural rendering.
* **Methodological Progress:** The progression from `Hierarchical-GS` to `Our-3D-GS` to `Our-Scaffold-GS` implies an iterative improvement in the underlying algorithm, with the scaffold-based approach yielding the most visually convincing results closest to ground truth. The use of colored boxes strategically draws the viewer's eye to the most telling differences, making the argument visually intuitive.