# Technical Data Extraction: Principal Component Analysis and Model Performance
This document extracts the data and trends shown in a three-panel technical figure (panels **a**, **b**, and **c**). Numeric values are approximate readings from the plots.
---
## Panel (a): Variance Explained by Principal Components
**Type:** Log-log line chart with vertical markers.
**Component Isolation:**
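- **X-axis:** "Principal Component" (Log scale; labeled ticks from $10^0$ to $10^2$, with the axis extending to at least the 512th component, where the last marker sits).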
- **Y-axis:** "Variance Explained" (Log scale: $10^{-4}$ to $10^{-1}$).
- **Visual Trend:** The main grey line slopes downward: the variance explained decreases as the principal-component index increases. This is expected, since PCA orders components by decreasing variance.
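As a minimal sketch of where a curve like this comes from, the snippet below converts covariance eigenvalues into per-component variance shares. The eigenvalues are hypothetical placeholders, not values read from the figure, and `explained_variance_ratio` is an illustrative helper name.

```python
def explained_variance_ratio(eigenvalues):
    """Return each eigenvalue's share of the total variance, largest first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    return [value / total for value in ordered]


# Hypothetical covariance eigenvalues (assumption, not figure data).
ratios = explained_variance_ratio([0.5, 4.0, 1.5, 2.0])
print(ratios)  # [0.5, 0.25, 0.1875, 0.0625]
```

Because the shares are sorted from largest to smallest, plotting them against the component index always gives a non-increasing curve, which matches the downward trend described above.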
**Key Data Points & Markers:**
- **The Curve:** Starts at approximately $10^{-1}$ for the 1st PC and decays to $10^{-4}$ by the 512th PC.
- **Vertical Dashed Lines:** These markers identify the specific Principal Components (PCs) used in subsequent panels.
  - **Yellow:** PC1
  - **Light Green:** PC2
  - **Teal:** PC4
  - **Dark Teal:** PC8
  - **Blue-Grey:** PC32
  - **Dark Blue:** PC128
  - **Purple:** PC512
- **Red 'X' Marker:** Located on the curve at approximately the 20th-30th Principal Component, likely indicating a point of interest or a specific threshold.
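If the curve is treated as roughly a straight line on log-log axes (an assumption; only the endpoints are given above), the two endpoint readings imply a power-law exponent. The sketch below works through that arithmetic; `loglog_slope` is an illustrative helper, and the inputs are the approximate endpoint values quoted for the curve.

```python
import math


def loglog_slope(x0, y0, x1, y1):
    """Slope of the straight line joining two points on log-log axes."""
    return (math.log10(y1) - math.log10(y0)) / (math.log10(x1) - math.log10(x0))


# Approximate endpoints of the curve: (PC1, 1e-1) and (PC512, 1e-4).
slope = loglog_slope(1, 1e-1, 512, 1e-4)
print(round(slope, 2))  # -1.11
```

Under that assumption, the variance explained falls off roughly as the component index raised to the power of about -1.1. The actual curve may bend, so this is only an endpoint-to-endpoint estimate.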
---
## Panel (b): Similarity: LR axis vs. PCs
**Type:** Multi-series line plot.
**Component Isolation:**
- **X-axis:** "Layer" (Linear scale: 1 to 32).
- **Y-axis:** "Similarity: LR axis vs. PCs" (Linear scale: 0.0 to 0.4).
- **Legend (Top Right):** PC1 (Yellow), PC2 (Light Green), PC4 (Teal), PC8 (Dark Teal), PC32 (Blue-Grey), PC128 (Dark Blue), PC512 (Purple).
**Trend Analysis:**
- **PC4 (Teal):** Shows the highest overall similarity, with peaks near layers 2 and 14 and large layer-to-layer fluctuations.
- **PC2 (Light Green):** Shows moderate similarity, peaking around layer 16-20.
- **High-index PCs (PC32, PC128, PC512):** These lines remain consistently low (near 0.0 on the Y-axis) across all 32 layers, indicating very low similarity to the LR axis.
- **PC1 (Yellow):** Shows low to moderate similarity, fluctuating between 0.0 and 0.2.
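The figure does not state how similarity is computed. One common choice, assumed here purely for illustration, is the absolute cosine similarity between the LR weight vector and each PC direction. The vectors below are hypothetical toy values, and `abs_cosine_similarity` is an illustrative helper name.

```python
import math


def abs_cosine_similarity(u, v):
    """Absolute cosine of the angle between two vectors (sign-invariant)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return abs(dot) / (norm_u * norm_v)


# Hypothetical LR axis and three orthonormal directions (assumptions).
lr_axis = [3.0, 4.0, 0.0]
directions = {
    "dir_a": [1.0, 0.0, 0.0],
    "dir_b": [0.0, 1.0, 0.0],
    "dir_c": [0.0, 0.0, 1.0],
}
similarities = {name: abs_cosine_similarity(lr_axis, d) for name, d in directions.items()}
print(similarities)
```

The absolute value is used because the sign of a principal component is arbitrary. A value near 0, as reported for the high-index PCs, means the direction is close to orthogonal to the LR axis.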
---
## Panel (c): Model Performance Metrics
This panel contains two sub-plots that compare Logistic Regression (LR) against individual Principal Components (PCs) as a function of the number of training examples.
### Sub-plot 1: Accuracy
- **X-axis:** "# Examples" (0 to 600).
- **Y-axis:** "Accuracy" (0.5 to 0.8+).
- **Legend (Shared):** LR (Red), PC1 (Yellow), PC2 (Light Green), PC4 (Teal), PC8 (Dark Teal), PC32 (Blue-Grey), PC128 (Dark Blue), PC512 (Purple).
**Trend Verification:**
- **LR (Red):** Slopes sharply upward and plateaus quickly at the highest accuracy (~0.82).
- **PC Series:** Accuracy generally increases with the number of examples. There is a clear hierarchy: lower-index PCs (PC1, PC2) perform better than higher-index PCs (PC512).
- **PC1 (Yellow) & PC2 (Light Green):** Converge toward ~0.75 accuracy.
- **PC512 (Purple):** Performs the worst, staying near the baseline of 0.5 (chance level).
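To illustrate why a direction that carries little label information stays near chance, the sketch below classifies synthetic data by projecting onto a single direction. It uses a nearest-class-mean rule as a simple stand-in, since the figure does not say how the per-PC classifiers were built; the data, directions, and helper names are all assumptions.

```python
import random


def project(x, direction):
    """Scalar projection of point x onto a direction vector."""
    return sum(a * b for a, b in zip(x, direction))


def fit_class_means(points, labels, direction):
    """Mean projection of each class along the chosen direction."""
    proj = [project(x, direction) for x in points]
    mean0 = sum(p for p, y in zip(proj, labels) if y == 0) / labels.count(0)
    mean1 = sum(p for p, y in zip(proj, labels) if y == 1) / labels.count(1)
    return mean0, mean1


def accuracy(points, labels, direction, means):
    """Fraction of points whose projection is nearest to its own class mean."""
    mean0, mean1 = means
    correct = 0
    for x, y in zip(points, labels):
        p = project(x, direction)
        pred = 1 if abs(p - mean1) < abs(p - mean0) else 0
        if pred == y:
            correct += 1
    return correct / len(labels)


def make_data(n, rng):
    """Synthetic 2-D data: only the first coordinate separates the classes."""
    labels = [i % 2 for i in range(n)]
    points = [[rng.gauss(2.0 * y - 1.0, 1.0), rng.gauss(0.0, 1.0)] for y in labels]
    return points, labels


rng = random.Random(0)
train_x, train_y = make_data(600, rng)
test_x, test_y = make_data(2000, rng)

informative, uninformative = [1.0, 0.0], [0.0, 1.0]
acc_inf = accuracy(test_x, test_y, informative,
                   fit_class_means(train_x, train_y, informative))
acc_unf = accuracy(test_x, test_y, uninformative,
                   fit_class_means(train_x, train_y, uninformative))
print(acc_inf, acc_unf)
```

In this toy setup the informative direction scores well above chance, while the uninformative one stays close to 0.5 regardless of how many training examples are used.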
### Sub-plot 2: Cross-entropy (Loss)
- **X-axis:** "# Examples" (0 to 600).
- **Y-axis:** "Cross-entropy" (0.4 to 1.2).
**Trend Verification:**
- **All Lines:** Slope downward, indicating that loss decreases as more examples are provided.
- **LR (Red):** Shows the steepest decline, reaching the lowest loss (~0.4).
- **PC Hierarchy:** Mirroring the accuracy plot, the loss is lowest for PC1/PC2 and highest for PC128/PC512.
- **PC128 & PC512:** These lines cluster at the top with the highest loss (~0.7), indicating a poorer fit than the lower-index components.
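A loss of about 0.7 is close to what an uninformative binary classifier produces. The sketch below assumes a binary task and cross-entropy measured in nats (neither is stated in the figure) and shows that predicting 0.5 for every example gives $\ln 2 \approx 0.693$. The labels and probabilities are hypothetical.

```python
import math


def binary_cross_entropy(labels, probs):
    """Mean negative log-likelihood (natural log) of binary labels."""
    eps = 1e-12  # clamp to avoid log(0)
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


labels = [0, 1, 1, 0]  # hypothetical labels
chance = binary_cross_entropy(labels, [0.5, 0.5, 0.5, 0.5])
confident = binary_cross_entropy(labels, [0.1, 0.9, 0.9, 0.1])
print(round(chance, 3))  # 0.693
```

Under these assumptions, the ~0.7 plateau for PC128 and PC512 is consistent with their near-chance accuracy, while lower values indicate predictions that are both correct and confident.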
---
## Summary Table of PC Color Coding
The PC colors are used consistently across all three panels; the LR entry is listed in the legend of panel (c).
| Label | Color | Description |
| :--- | :--- | :--- |
| **LR** | Red | Logistic Regression (Baseline/Full Model) |
| **PC1** | Yellow | 1st Principal Component |
| **PC2** | Light Green | 2nd Principal Component |
| **PC4** | Teal | 4th Principal Component |
| **PC8** | Dark Teal | 8th Principal Component |
| **PC32** | Blue-Grey | 32nd Principal Component |
| **PC128** | Dark Blue | 128th Principal Component |
| **PC512** | Purple | 512th Principal Component |