## Line Chart: Average nDCG@10 vs. Training Steps
### Overview
This image presents a line chart of Average nDCG@10 over Training Steps for "Student" and "Teacher" models, each with a "frozen" and a "trainable" projection. The chart compares the learning curves of these four configurations.
### Components/Axes
* **X-axis:** Training Steps, ranging from 0 to approximately 6000, with tick marks at 0, 1000, 2000, 3000, 4000, 5000, and 6000.
* **Y-axis:** Average nDCG@10, ranging from 0 to 0.6, with tick marks at 0, 0.1, 0.2, 0.3, 0.4, 0.5, and 0.6.
* **Legend:** Located at the bottom-right of the chart. It contains the following labels and corresponding colors:
    * Student (proj. frozen) - Blue
    * Student (proj. trainable) - Green
    * Teacher (proj. frozen) - Orange
    * Teacher (proj. trainable) - Red
* **Grid:** A light gray grid is present across the chart, aiding in value estimation.
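For readers unfamiliar with the Y-axis metric, the following is a minimal sketch of how an average nDCG@10 is commonly computed. It assumes linear gain (`rel / log2(rank + 1)`) and that each input list holds the relevance labels of all judged documents for a query, in ranked order. The chart does not say which nDCG variant or evaluation tool was used, so the function names and these choices are illustrative assumptions.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k relevance labels (linear gain)."""
    # enumerate() is 0-based, so rank + 2 gives log2(position + 1) for 1-based positions.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG@k divided by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def average_ndcg(per_query_relevances, k=10):
    """Mean nDCG@k over queries; this is the kind of quantity plotted on the Y-axis."""
    return sum(ndcg_at_k(r, k) for r in per_query_relevances) / len(per_query_relevances)
```

A perfect ranking scores 1.0 and a ranking with no relevant documents scores 0.0, which matches the 0 to 0.6 range shown on the axis being a fraction of the ideal.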
### Detailed Analysis
The chart displays four distinct lines, each representing a different model configuration.
* **Student (proj. frozen) - Blue Line:** This line starts at approximately 0.18 at 0 Training Steps and rapidly increases to around 0.45 by 1000 Training Steps. It continues to rise, reaching approximately 0.53 by 4000 Training Steps, and plateaus around 0.54-0.55 for the remainder of the training period.
* **Student (proj. trainable) - Green Line:** This line begins at approximately 0.12 at 0 Training Steps and exhibits a slower initial increase compared to the frozen student. It reaches around 0.40 by 1000 Training Steps, and continues to climb, eventually surpassing the frozen student, reaching approximately 0.55 by 4000 Training Steps. It plateaus around 0.56-0.57 for the remainder of the training period.
* **Teacher (proj. frozen) - Orange Line:** This line starts at approximately 0.02 at 0 Training Steps and shows a very slow initial increase. It reaches around 0.45 by 4000 Training Steps and plateaus around 0.48-0.50 for the remainder of the training period.
* **Teacher (proj. trainable) - Red Line:** This line begins at approximately 0.03 at 0 Training Steps and exhibits a slow initial increase. It reaches around 0.47 by 4000 Training Steps and plateaus around 0.50-0.52 for the remainder of the training period.
### Key Observations
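* By the end of training, the "trainable" projections outperform the "frozen" projections for both Student and Teacher models, by roughly 0.02 nDCG@10 in each case. Early in training the order is reversed for the Student: the frozen variant leads at 0 and 1000 steps (approximately 0.18 vs. 0.12 and 0.45 vs. 0.40).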
* The Student model, regardless of projection state, generally outperforms the Teacher model.
* The Student (proj. trainable) line reaches the highest final value, plateauing around 0.56-0.57.
* The Teacher (proj. frozen) line reaches the lowest final value, plateauing around 0.48-0.50.
* All lines exhibit diminishing returns in performance as training progresses beyond 4000 steps, indicating convergence.
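The observations above can be sanity-checked with a short sketch that encodes the approximate readings from the Detailed Analysis. All numbers are visual estimates taken from that section, each plateau range is reduced to its midpoint, and the names `curves`, `plateau_mid`, and `trainable_gain` are illustrative rather than part of any original tooling.

```python
# Approximate readings from the chart description; every value is an estimate.
curves = {
    "Student (proj. frozen)": {"start": 0.18, "step_4000": 0.53, "plateau": (0.54, 0.55)},
    "Student (proj. trainable)": {"start": 0.12, "step_4000": 0.55, "plateau": (0.56, 0.57)},
    "Teacher (proj. frozen)": {"start": 0.02, "step_4000": 0.45, "plateau": (0.48, 0.50)},
    "Teacher (proj. trainable)": {"start": 0.03, "step_4000": 0.47, "plateau": (0.50, 0.52)},
}


def plateau_mid(name):
    """Midpoint of the reported plateau range for one configuration."""
    low, high = curves[name]["plateau"]
    return (low + high) / 2


def trainable_gain(model):
    """Plateau midpoint of the trainable variant minus that of the frozen variant."""
    return plateau_mid(f"{model} (proj. trainable)") - plateau_mid(f"{model} (proj. frozen)")


# Configurations ordered from highest to lowest plateau midpoint.
ranking = sorted(curves, key=plateau_mid, reverse=True)

for model in ("Student", "Teacher"):
    print(f"{model}: trainable minus frozen at plateau = {trainable_gain(model):+.3f}")
print("Ranking by plateau midpoint:", ranking)
```

Because the inputs are eyeballed from a plot, differences of this size (about 0.02) should be treated as indicative only.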
### Interpretation
The data suggest that making the projection layers trainable yields a modest improvement in final performance (roughly 0.02 nDCG@10) for both Student and Teacher models. For the Student, this gain comes with a slower start: the frozen variant leads early and is overtaken only later in training. The Student configurations outperform the Teacher configurations throughout, potentially due to differences in model architecture or initialization, although the chart alone does not identify the cause. The plateau after about 4000 training steps indicates that the models are converging and that further training is unlikely to yield significant improvements. Both Teacher lines start near zero (approximately 0.02-0.03), so the Teacher's low initial performance cannot be attributed to the frozen projection alone. The final gap between frozen and trainable variants is consistent with the projection layers adapting to represent the data in a way better suited to the downstream task, but the chart does not establish this directly.