## Diagram: CLD Intuition and Qualitative Comparison
### Overview
The image contains two primary components:
1. **(a) CLD Intuition**: A 2D cluster visualization of digit representations with color-coded groupings and labeled examples.
2. **(b) Qualitative comparison**: A grid comparing digit outputs for each `do(·)` row across the columns `orig.`, `Diff-SCM (ours)`, `Schut et al.`, `Looveren & Klaise`, and `train`.
---
### Components/Axes
#### (a) CLD Intuition
- **Legend**:
- Located on the right side.
- Colors map digits 0–9 to specific hues (e.g., blue = 0, red = 3, purple = 9).
- Example: The "factual" label sits in the red cluster (digit 3), while the "train" label sits in the purple cluster (digit 9 under the color mapping above).
- **Clusters**:
- Dots represent digit embeddings grouped by similarity.
- Arrows connect labeled examples (e.g., "do(0)" → blue cluster, "factual" → red cluster).
- **Labels**:
- "do(0)" (top-left), "factual" (top-right), "train" (bottom-left).
- Digits in clusters are annotated with their respective colors.
#### (b) Qualitative comparison
- **Rows**:
- Labeled `do(8)`, `do(3)`, `do(9)`, `do(4)` (top to bottom).
- Each row shows one input digit alongside the outputs for the labeled `do(·)` target (e.g., the `do(3)` row contains both 0s and 3s).
- **Columns**:
- Labeled `orig.`, `Diff-SCM (ours)`, `Schut et al.`, `Looveren & Klaise`, `train`.
- Columns compare reconstruction quality across methods.
- **Images**:
- Digits are rendered in white on black backgrounds.
- Variations in clarity and distortion are visible (e.g., `do(4)` shows significant degradation in `Schut et al.`).
---
### Detailed Analysis
#### (a) CLD Intuition
- **Cluster distribution**:
- Digits are grouped into distinct clusters (e.g., 0s in blue, 3s in red).
- Overlaps between clusters suggest ambiguity in some representations.
- **Example placements**:
- "do(0)" (blue cluster) and "train" (purple cluster) are spatially separated, indicating distinct learned features.
- "factual" (red cluster) aligns with digit 3, confirming correct grouping.
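
The placement claims above can be checked mechanically once 2D coordinates are available. The following is a minimal sketch, not the figure's actual procedure: the coordinates are made up for illustration, and `centroids` and `nearest_cluster` are hypothetical helper names. It assigns each annotated example to the digit cluster whose centroid is closest.

```python
from math import dist


def centroids(points, labels):
    """Return {label: (mean_x, mean_y)} for 2D points grouped by label."""
    sums = {}
    for (x, y), label in zip(points, labels):
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def nearest_cluster(point, cents):
    """Return the label whose centroid is closest to `point` (Euclidean)."""
    return min(cents, key=lambda label: dist(point, cents[label]))


# Made-up 2D embeddings standing in for the dots in panel (a).
points = [
    (0.0, 0.0), (0.2, 0.1), (-0.1, 0.2),   # digit 0 (blue)
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),    # digit 3 (red)
    (0.0, 6.0), (0.1, 6.2), (-0.2, 5.9),   # digit 9 (purple)
]
labels = [0, 0, 0, 3, 3, 3, 9, 9, 9]
cents = centroids(points, labels)

# Hypothetical positions for two of the annotated examples.
annotated = {"do(0)": (0.1, 0.3), "factual": (4.9, 5.2)}
assignment = {name: nearest_cluster(p, cents) for name, p in annotated.items()}
print(assignment)
```

With these toy coordinates, "do(0)" is assigned to the digit 0 cluster and "factual" to the digit 3 cluster. A nearest-centroid rule is only one possible reading of "belongs to a cluster"; where clusters overlap, a nearest-neighbor vote over individual points may be more appropriate.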
#### (b) Qualitative comparison
- **Method performance**:
- **Diff-SCM (ours)**: Produces sharper, more accurate outputs (e.g., the 3s in the `do(3)` row are clearer than those in the `Schut et al.` column).
- **Schut et al.**: Introduces noise and distortion (e.g., the 9s in the `do(9)` row appear fragmented).
- **Looveren & Klaise**: Struggles in the `do(4)` row, where the digit shows blurred or incomplete strokes.
- **`train`**: Images in this column are legible but look less precise than those in the Diff-SCM column.
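
The comparison above rests on visual inspection only. As a rough supplement, one could measure how far each column's image departs from the `orig.` image. The sketch below is an illustrative proxy and not a metric taken from the figure or its source; the 3×3 images, the method names `method_a` and `method_b`, and the helper `mean_abs_diff` are all made up for the example.

```python
def mean_abs_diff(a, b):
    """Mean absolute per-pixel difference between two same-shaped grayscale images."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))


# Toy 3x3 "images" (white = 1.0 on black = 0.0); real digits would be larger.
orig = [
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
]
outputs = {
    # Hypothetical method that changes a single pixel.
    "method_a": [[0.0, 1.0, 0.0], [0.0, 1.0, 1.0], [0.0, 1.0, 0.0]],
    # Hypothetical method that rewrites most of the image.
    "method_b": [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
}

scores = {name: mean_abs_diff(orig, img) for name, img in outputs.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: {score:.3f}")
```

A small difference only says that an output stays close to the original image. It does not say whether the output shows the digit named in the `do(·)` label, so a score like this would need to be paired with a check of the resulting class.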
---
### Key Observations
1. **CLD Intuition**:
- The model’s clustering aligns with digit semantics (e.g., 0s and 3s are well-separated).
- The annotated examples sit in the expected clusters: "do(0)" in the blue cluster (digit 0) and "factual" in the red cluster (digit 3).
2. **Qualitative comparison**:
- **Diff-SCM (ours)** preserves digit structure best in all four rows shown (`do(8)`, `do(3)`, `do(9)`, `do(4)`). This is a visual impression from four examples, not a measured result.
- **Schut et al.** and **Looveren & Klaise** show visible degradation, most clearly in the `do(9)` and `do(4)` rows.
- The `train` column looks cleaner than the two baseline columns but does not match the Diff-SCM column.
---
### Interpretation
- **CLD Intuition**: Suggests that the model learns meaningful digit representations, with the "factual" example falling inside the cluster for its digit. The spatial separation of "do(0)" and "train" shows that the two examples occupy different regions of the embedding, although the figure alone does not establish why.
- **Qualitative comparison**: Suggests that Diff-SCM handles digit variations better than the other methods, particularly in harder rows such as `do(4)`. The other methods introduce artifacts or lose stroke detail in the examples shown, which points to limitations in their approaches.
- **Implications**: Diff-SCM’s performance suggests it effectively balances fidelity and robustness, making it suitable for tasks requiring precise digit reconstruction.