## Line Graphs: Distillation Methods Performance Comparison
### Overview
Three line graphs compare the performance of three distillation methods (Embedding-based, InfoNCE, and Score-based) at two learning rates (1e-4 and 1e-5) over training steps. The y-axis measures average nDCG@10; the x-axis tracks training steps (0 to ~5,500).
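The y-axis metric, nDCG@10, can be computed from a ranked list of graded relevance labels. A minimal NumPy sketch (the function names are ours, not from the figure):

```python
import numpy as np

def dcg_at_k(rels, k=10):
    """Discounted cumulative gain over the top-k relevance grades of a ranking."""
    rels = np.asarray(rels, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rels.size + 2))  # log2(rank + 1)
    return float(np.sum((2.0 ** rels - 1.0) / discounts))

def ndcg_at_k(rels, k=10):
    """nDCG@k: DCG of the ranking divided by DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0
```

A perfectly ordered ranking scores 1.0, and the "average nDCG@10" on the y-axis would be this value averaged over the evaluation queries.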
---
### Components/Axes
- **X-axis**: Training Steps (0 to ~5,500)
- **Y-axis**: Average nDCG@10 (0.0 to 0.6)
- **Legend**:
  - Blue circles: 1e-4 lr
  - Orange squares: 1e-5 lr
- **Graph Titles**:
  1. Embedding-based Distillation (L_distill)
  2. InfoNCE (L_NCE)
  3. Score-based Distillation (L_score)
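The figure does not define the three losses in its panel titles; a plausible NumPy sketch of each, for a single query with student/teacher embeddings and candidate-passage scores (the shapes and exact formulations are our assumptions):

```python
import numpy as np

def l_distill(student_emb, teacher_emb):
    """Embedding-based distillation (assumed): MSE between student and teacher embeddings."""
    return float(np.mean((student_emb - teacher_emb) ** 2))

def l_nce(scores, pos_idx=0, temperature=1.0):
    """InfoNCE (assumed): cross-entropy of the positive passage against negatives."""
    logits = np.asarray(scores, dtype=float) / temperature
    logits -= logits.max()  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return float(-log_probs[pos_idx])

def l_score(student_scores, teacher_scores):
    """Score-based distillation (assumed): KL divergence from teacher to student score distributions."""
    def softmax(x):
        x = np.asarray(x, dtype=float)
        e = np.exp(x - x.max())
        return e / e.sum()
    p, q = softmax(teacher_scores), softmax(student_scores)
    return float(np.sum(p * np.log(p / q)))
```

These are common formulations in retrieval distillation, offered only to make the panel titles concrete; the paper behind the figure may use different variants.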
---
### Detailed Analysis
#### 1. Embedding-based Distillation (L_distill)
- **Blue line (1e-4 lr)**:
  - Starts at ~0.12 (step 0), rises sharply to a peak of ~0.52 by step 5,000, then plateaus.
- **Orange line (1e-5 lr)**:
  - Begins at ~0.0, rises gradually to ~0.41 by step 5,000.
  - Remains below the blue line throughout.
#### 2. InfoNCE (L_NCE)
- **Blue line (1e-4 lr)**:
  - Starts at ~0.42, dips to ~0.36 by step 3,000, then stabilizes.
  - Ends at ~0.38 (step 5,500).
- **Orange line (1e-5 lr)**:
  - Begins at ~0.32, rises to ~0.44 by step 2,000, then fluctuates between ~0.42 and ~0.45.
  - Ends at ~0.44 (step 5,500).
#### 3. Score-based Distillation (L_score)
- **Blue line (1e-4 lr)**:
  - Starts at ~0.45, drops to ~0.37 by step 3,000, then stabilizes.
  - Ends at ~0.38 (step 5,500).
- **Orange line (1e-5 lr)**:
  - Begins at ~0.4, rises steadily to ~0.5 by step 3,000, then plateaus.
  - Ends at ~0.5 (step 5,500).
---
### Key Observations
1. **Learning Rate Impact**:
   - 1e-4 lr (blue) outperforms 1e-5 lr (orange) in Embedding-based Distillation.
   - 1e-5 lr (orange) achieves higher performance in Score-based Distillation.
2. **Stability**:
   - InfoNCE (L_NCE) shows a small, stable gap (~0.06 nDCG) between learning rates after step 2,000, with 1e-5 lr slightly ahead.
3. **Performance Trends**:
   - Embedding-based Distillation (L_distill) with 1e-4 lr achieves the highest peak (~0.52).
   - Score-based Distillation (L_score) with 1e-5 lr maintains the highest sustained performance (~0.5).
---
### Interpretation
- **Method-Specific Optimal Learning Rates**:
  - Embedding-based Distillation benefits from the higher learning rate (1e-4), converging faster and to a higher peak.
  - Score-based Distillation performs better with the lower learning rate (1e-5), whose steady climb avoids the early drop seen at 1e-4.
- **Convergence Patterns**:
  - Embedding-based Distillation converges rapidly but plateaus early.
  - Score-based Distillation improves gradually, suggesting a more nuanced optimization landscape.
- **Anomalies**:
  - InfoNCE (L_NCE) at 1e-4 lr dips around step 3,000 before stabilizing, suggesting the higher learning rate is less stable for this objective.
The data highlights the importance of tuning learning rates based on the distillation method, with no universal optimal value across all approaches.
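The practical upshot of the figure, that the best learning rate is method-specific, amounts to a small per-method sweep rather than a single global choice. A minimal sketch, where `train_and_eval` is a hypothetical stand-in for one training run returning average nDCG@10:

```python
def pick_lr(method, lrs, train_and_eval):
    """Return (best_lr, all_results) for one distillation loss.

    `train_and_eval(method, lr)` is a hypothetical callable standing in for a
    full training run; it should return the validation average nDCG@10.
    """
    results = {lr: train_and_eval(method, lr) for lr in lrs}
    return max(results, key=results.get), results

# Usage: sweep each loss independently rather than assuming one learning
# rate is best across methods.
# best_lr, scores = pick_lr("L_score", [1e-4, 1e-5], train_and_eval)
```

Run once per loss function, this reproduces the figure's conclusion: 1e-4 would be selected for L_distill and 1e-5 for L_score.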