## Line Charts: Distillation Methods Comparison
### Overview
The image presents three line charts comparing the performance of three distillation methods: Embedding-based Distillation, InfoNCE, and Score-based Distillation. Each chart plots average nDCG@10 (normalized Discounted Cumulative Gain at rank 10) against training steps for two learning rates (1e-4 and 1e-5).
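For reference, the metric on the y-axis can be sketched with the standard graded-relevance definition of nDCG@k; this is a minimal stdlib illustration of the metric itself, not code from the work behind the figure:

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) over the top k,
    with ranks starting at 1 (hence i + 2 for a 0-based index)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """nDCG@k: DCG of the given ranking divided by DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

A perfectly ordered ranking scores 1.0, so the 0.0-0.6 range on the y-axis means even the best runs recover only part of the ideal ordering.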
### Components/Axes
* **Titles:**
* Left Chart: Embedding-based Distillation (`L_distill`)
* Middle Chart: InfoNCE (`L_NCE^(q→d)`)
* Right Chart: Score-based Distillation (`L_score`)
* **X-axis (all charts):** Training Steps, ranging from 0 to 5000.
* **Y-axis (all charts):** Average nDCG@10, ranging from 0.0 to 0.6.
* **Legend (bottom-right of each chart):**
* Blue line: learning rate 1e-4
* Orange line: learning rate 1e-5
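The three-panel layout described above can be reproduced with matplotlib. The series below use the approximate values quoted in the detailed analysis, so they are a rough sketch of the figure, not the actual underlying data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Approximate values read off the description; intermediate points are guesses.
steps = [0, 1000, 2000, 3000, 4000, 5000]
data = {
    "Embedding-based Distillation (L_distill)": {
        "1e-4 lr": [0.12, 0.45, 0.48, 0.50, 0.51, 0.52],
        "1e-5 lr": [0.00, 0.15, 0.25, 0.32, 0.37, 0.40],
    },
    "InfoNCE (L_NCE^(q->d))": {
        "1e-4 lr": [0.41, 0.43, 0.41, 0.39, 0.37, 0.36],
        "1e-5 lr": [0.32, 0.44, 0.43, 0.44, 0.43, 0.44],
    },
    "Score-based Distillation (L_score)": {
        "1e-4 lr": [0.45, 0.40, 0.37, 0.38, 0.38, 0.38],
        "1e-5 lr": [0.39, 0.45, 0.48, 0.50, 0.50, 0.50],
    },
}

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5), sharey=True)
for ax, (title, series) in zip(axes, data.items()):
    for label, values in series.items():
        ax.plot(steps, values, label=label)
    ax.set_title(title)
    ax.set_xlabel("Training Steps")
    ax.set_ylim(0.0, 0.6)
    ax.legend(loc="lower right")
axes[0].set_ylabel("Average nDCG@10")
fig.tight_layout()
```

Sharing the y-axis across panels (`sharey=True`) is what makes the cross-method comparison in the original figure easy to read.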
### Detailed Analysis
**1. Embedding-based Distillation (Left Chart):**
* **Blue Line (1e-4 lr):** The blue line starts at approximately 0.12 nDCG@10 and rapidly increases to about 0.45 by 1000 training steps. It then gradually increases, reaching approximately 0.52 by 5000 training steps.
* **Orange Line (1e-5 lr):** The orange line starts at 0.0 nDCG@10 and increases steadily, reaching approximately 0.40 by 5000 training steps.
**2. InfoNCE (Middle Chart):**
* **Blue Line (1e-4 lr):** The blue line starts at approximately 0.41 nDCG@10, increases slightly to 0.44 around 500 training steps, and then decreases gradually to approximately 0.36 by 5000 training steps.
* **Orange Line (1e-5 lr):** The orange line starts at approximately 0.32 nDCG@10, increases to approximately 0.44 by 1000 training steps, and then fluctuates slightly around 0.43, ending at approximately 0.44 by 5000 training steps.
**3. Score-based Distillation (Right Chart):**
* **Blue Line (1e-4 lr):** The blue line starts at approximately 0.45 nDCG@10, decreases to approximately 0.37 by 2000 training steps, and then fluctuates slightly around 0.38, ending at approximately 0.38 by 5000 training steps.
* **Orange Line (1e-5 lr):** The orange line starts at approximately 0.39 nDCG@10, increases to approximately 0.50 by 3000 training steps, and then remains relatively stable around 0.50, ending at approximately 0.50 by 5000 training steps.
### Key Observations
* **Embedding-based Distillation:** The 1e-4 learning rate performs better than the 1e-5 learning rate.
* **InfoNCE:** The 1e-5 learning rate performs better than the 1e-4 learning rate after approximately 1000 training steps.
* **Score-based Distillation:** The 1e-5 learning rate performs better than the 1e-4 learning rate after approximately 1000 training steps.
* The Embedding-based Distillation method shows the largest final gap between the two learning rates (roughly 0.52 vs. 0.40 at 5000 steps).
### Interpretation
The charts illustrate how learning rate interacts with each distillation method. Embedding-based Distillation benefits from the higher learning rate (1e-4), while InfoNCE and Score-based Distillation perform better with the lower rate (1e-5) after roughly 1000 training steps. The optimal learning rate therefore depends on the distillation objective. The rapid initial rise in nDCG@10 for Embedding-based Distillation at 1e-4 indicates fast convergence, whereas the other two objectives plateau or degrade at that rate and improve more steadily at 1e-5.