## Line Chart Grid: Cladder Ablation Study
### Overview
The image displays a 3x3 grid of line charts titled "Cladder Ablation Study." The charts compare the performance (Score) of three "CAPT Settings" over training epochs across three data categories (Commonsense, Anticommonsense, Noncommonsense) and three training data percentages (1%, 2%, 90%). The overall purpose is to analyze how model performance evolves over training epochs under different data and setting conditions.
### Components/Axes
* **Overall Title:** "Cladder Ablation Study" (Top-left, above the grid).
* **Legend:** Located to the right of the grid, titled "CAPT Setting". It defines three lines:
* `CAPT=null` (Blue line with circle markers)
* `CAPT=order` (Green line with square markers)
* `CAPT=random` (Orange line with triangle markers)
* **Subplot Grid Structure:** 3 rows (Training Percentage) x 3 columns (Data Category).
* **Row Titles (Training Percentage):**
* Top Row: "1% Training"
* Middle Row: "2% Training"
* Bottom Row: "90% Training"
* **Column Titles (Data Category):**
* Left Column: "Commonsense"
* Middle Column: "Anticommonsense"
* Right Column: "Noncommonsense"
* **Individual Subplot Titles:** Combine Category and Percentage (e.g., "Commonsense - 1% Training").
* **Axes (Consistent across all subplots):**
* **X-axis:** "Number of Epochs". Scale ranges from 0 to 3000, with major ticks at 0, 1000, 2000, 3000.
    * **Y-axis:** "Score". The limits vary by subplot to fit each panel's data:
        * 1% Training row: roughly 30 to 80 (the Anticommonsense and Noncommonsense panels extend low enough to show starting scores near ~28-30).
        * 2% Training row: roughly 50 to 80.
        * 90% Training row: roughly 45 to 95 (all subplots).
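The layout described above can be sketched with matplotlib. The snippet below is a hypothetical reconstruction of the grid's structure only (3x3 subplots, per-panel titles combining category and percentage, shared axis labels, and a figure-level legend); the plotted values are placeholders, not the actual chart data.

```python
# Hypothetical reconstruction of the "Cladder Ablation Study" grid layout.
# All score values here are illustrative placeholders.
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

categories = ["Commonsense", "Anticommonsense", "Noncommonsense"]
percentages = ["1%", "2%", "90%"]
settings = {  # CAPT setting -> (color, marker), matching the legend
    "CAPT=null": ("tab:blue", "o"),
    "CAPT=order": ("tab:green", "s"),
    "CAPT=random": ("tab:orange", "^"),
}
epochs = [0, 1000, 2000, 3000]

fig, axes = plt.subplots(3, 3, figsize=(12, 9))
for row, pct in enumerate(percentages):
    for col, cat in enumerate(categories):
        ax = axes[row][col]
        for i, (name, (color, marker)) in enumerate(settings.items()):
            # Placeholder upward trend; real values would be read from the chart.
            scores = [50 + 2 * i + 8 * (e / 1000) for e in epochs]
            ax.plot(epochs, scores, color=color, marker=marker, label=name)
        ax.set_title(f"{cat} - {pct} Training")
        ax.set_xlabel("Number of Epochs")
        ax.set_ylabel("Score")
        ax.set_xticks(epochs)

# Single figure-level legend to the right of the grid, as in the image.
handles, labels = axes[0][0].get_legend_handles_labels()
fig.legend(handles, labels, title="CAPT Setting", loc="center right")
fig.suptitle("Cladder Ablation Study")
```

Using one `fig.legend` instead of nine per-axes legends mirrors the described layout, where a single legend sits outside the grid.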
### Detailed Analysis
**Trend Verification & Data Point Extraction (Approximate Values):**
**Top Row (1% Training):**
* **Commonsense - 1% Training:**
* `CAPT=null` (Blue): Starts ~50, peaks ~72 at 1000 epochs, declines to ~68 by 3000 epochs. *Trend: Sharp rise then gradual decline.*
* `CAPT=order` (Green): Starts ~52, rises steadily to ~75 by 3000 epochs. *Trend: Consistent upward slope.*
* `CAPT=random` (Orange): Starts ~51, rises to ~76 by 3000 epochs, closely follows Green line. *Trend: Consistent upward slope.*
* **Anticommonsense - 1% Training:**
* `CAPT=null` (Blue): Starts ~30, rises to ~70 by 1000 epochs, plateaus/slight decline to ~68. *Trend: Sharp initial rise then plateau.*
* `CAPT=order` (Green): Starts ~45, rises to ~78 by 1000 epochs, plateaus. *Trend: Sharp rise then plateau at high level.*
* `CAPT=random` (Orange): Starts ~50, rises to ~76 by 1000 epochs, plateaus. *Trend: Sharp rise then plateau.*
* **Noncommonsense - 1% Training:**
* `CAPT=null` (Blue): Starts ~28, rises to ~65 by 1000 epochs, plateaus/slight decline to ~63. *Trend: Sharp rise then plateau/slight decline.*
* `CAPT=order` (Green): Starts ~50, rises to ~69 by 1000 epochs, plateaus. *Trend: Rise then plateau.*
* `CAPT=random` (Orange): Starts ~52, rises to ~70 by 1000 epochs, plateaus. *Trend: Rise then plateau.*
**Middle Row (2% Training):**
* **Commonsense - 2% Training:**
* `CAPT=null` (Blue): Starts ~58, rises to ~70 by 1000 epochs, plateaus/slight decline to ~69. *Trend: Rise then plateau.*
* `CAPT=order` (Green): Starts ~55, rises to ~79 by 2000 epochs, slight decline to ~76. *Trend: Rise to peak then slight decline.*
* `CAPT=random` (Orange): Starts ~50, rises to ~80 by 3000 epochs. *Trend: Steady, strong rise.*
* **Anticommonsense - 2% Training:**
* `CAPT=null` (Blue): Starts ~55, rises to ~71 by 1000 epochs, plateaus. *Trend: Rise then plateau.*
* `CAPT=order` (Green): Starts ~56, rises to ~80 by 2000 epochs, slight decline to ~78. *Trend: Rise to peak then slight decline.*
* `CAPT=random` (Orange): Starts ~52, rises to ~71 by 1000 epochs, dips to ~66 at 2000, then rises sharply to ~80 by 3000. *Trend: Volatile - rise, dip, then strong recovery.*
* **Noncommonsense - 2% Training:**
* `CAPT=null` (Blue): Starts ~52, rises to ~65 by 1000 epochs, plateaus/slight rise to ~66. *Trend: Rise then plateau.*
* `CAPT=order` (Green): Starts ~55, rises to ~73 by 2000 epochs, plateaus. *Trend: Rise then plateau.*
* `CAPT=random` (Orange): Starts ~50, rises to ~66 by 1000 epochs, dips to ~61 at 2000, then rises sharply to ~75 by 3000. *Trend: Volatile - rise, dip, then strong recovery.*
**Bottom Row (90% Training):**
* **Commonsense - 90% Training:**
* All three lines (`null`, `order`, `random`) start ~52-55, rise sharply and converge near a score of ~93-94 by 3000 epochs. *Trend: Strong, convergent upward slope.*
* **Anticommonsense - 90% Training:**
* All three lines start ~53-56, rise sharply and converge near a score of ~92 by 3000 epochs. *Trend: Strong, convergent upward slope.*
* **Noncommonsense - 90% Training:**
* All three lines start ~48-52, rise sharply and converge near a score of ~88 by 3000 epochs. *Trend: Strong, convergent upward slope.*
### Key Observations
1. **Training Data Effect:** Performance (Score) increases dramatically with more training data. Final scores in the 90% training row are roughly 15-25 points higher than in the 1% row at the same epoch count.
2. **CAPT Setting Impact:** With low data (1%, 2%), `CAPT=order` (Green) and `CAPT=random` (Orange) generally outperform `CAPT=null` (Blue). With abundant data (90%), all three settings perform nearly identically.
3. **Convergence:** In the 90% training condition, all lines converge to nearly the same high score, suggesting the CAPT setting becomes less critical with sufficient data.
4. **Volatility:** The `CAPT=random` (Orange) line shows notable volatility (a mid-training dip) in the 2% training condition for Anticommonsense and Noncommonsense categories before recovering strongly.
5. **Category Differences:** The "Anticommonsense" category shows the highest peak scores in the low-data regimes (1%, 2%), particularly for the `CAPT=order` setting.
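Observations 1 and 2 can be sanity-checked against the approximate 3000-epoch scores read off each panel in the Detailed Analysis. The values below are those rough estimates, not exact chart data:

```python
# Approximate end-of-training (3000-epoch) scores per (row, category),
# as estimated in the Detailed Analysis section above.
final_scores = {
    ("1%", "Commonsense"):      {"null": 68, "order": 75, "random": 76},
    ("1%", "Anticommonsense"):  {"null": 68, "order": 78, "random": 76},
    ("1%", "Noncommonsense"):   {"null": 63, "order": 69, "random": 70},
    ("2%", "Commonsense"):      {"null": 69, "order": 76, "random": 80},
    ("2%", "Anticommonsense"):  {"null": 71, "order": 78, "random": 80},
    ("2%", "Noncommonsense"):   {"null": 66, "order": 73, "random": 75},
    ("90%", "Commonsense"):     {"null": 93, "order": 94, "random": 94},
    ("90%", "Anticommonsense"): {"null": 92, "order": 92, "random": 92},
    ("90%", "Noncommonsense"):  {"null": 88, "order": 88, "random": 88},
}

categories = ("Commonsense", "Anticommonsense", "Noncommonsense")

# Observation 1: gap between the 90% and 1% rows per category/setting.
gaps = {
    (cat, s): final_scores[("90%", cat)][s] - final_scores[("1%", cat)][s]
    for cat in categories
    for s in ("null", "order", "random")
}
print(min(gaps.values()), max(gaps.values()))  # spread of ~14-25 points

# Observation 2: in the low-data rows, order/random end above null.
low_data_advantage = all(
    final_scores[(pct, cat)][s] > final_scores[(pct, cat)]["null"]
    for pct in ("1%", "2%")
    for cat in categories
    for s in ("order", "random")
)
print(low_data_advantage)  # True for these estimates
```

For these estimates the 90%-vs-1% gap spans roughly 14 to 25 points, consistent with the "15-25 points" reading, and `order`/`random` beat `null` in every low-data panel.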
### Interpretation
This ablation study investigates the "CAPT" method's effect on model learning across different knowledge types (Commonsense, Anticommonsense, Noncommonsense) and data scarcity levels.
* **Core Finding:** The CAPT mechanism (`order` and `random` variants) provides a significant performance boost and more stable learning trajectories when training data is scarce (1-2%). This suggests it acts as a useful regularizer or guide in low-data regimes.
* **Data Sufficiency:** With abundant data (90%), the inherent knowledge in the dataset dominates, and the auxiliary CAPT signal provides negligible benefit, as all methods converge to the same high performance.
* **Robustness of `CAPT=order`:** The `order` variant appears most consistently strong across low-data conditions, while `random` shows more volatility, indicating that structured (`order`) guidance may be more reliable than unstructured (`random`) guidance when data is limited.
* **Task Difficulty:** The consistently high performance on "Anticommonsense" tasks, especially with `CAPT=order`, might indicate that this method is particularly effective at teaching models to handle counter-intuitive or non-standard knowledge patterns, which are often harder to learn from limited examples.
In summary, the charts demonstrate that the CAPT technique is a valuable tool for improving model performance and learning efficiency in data-constrained scenarios, with its benefits diminishing as the volume of training data becomes large.