Image 062ee8d185d3...

EXPERT: nemotron-free VERSION 1

RUNTIME: free/nvidia/nemotron-nano-12b-v2-vl:free
# Technical Document Extraction

## Chart 1: Normalized Log-Likelihood vs. Normalized FLOPs

### Axes Labels
- **Y-Axis**: "Normalized log-likelihood (measured on separate test set)"  
  - Range: 0.990 to 1.010  
  - Increment: 0.005  
- **X-Axis**: "Normalized FLOPs per FFW pass (to the isoFLOP-optimal baseline)"  
  - Range: 0.4 to 2.0  
  - Increment: 0.2  
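
The tick positions implied by each range and increment can be enumerated directly. The following is a minimal sketch; the helper name `axis_ticks` and the rounding to six decimals are illustrative assumptions, not part of the extracted chart.

```python
def axis_ticks(start, stop, step):
    """Return evenly spaced tick values from start to stop, inclusive.

    Values are rounded to six decimals to hide floating-point noise
    (an assumption made for readability, not taken from the chart).
    """
    count = round((stop - start) / step)
    return [round(start + i * step, 6) for i in range(count + 1)]


# Ranges and increments as listed under "Axes Labels" for Chart 1.
Y_TICKS = axis_ticks(0.990, 1.010, 0.005)
X_TICKS = axis_ticks(0.4, 2.0, 0.2)

if __name__ == "__main__":
    print("y ticks:", Y_TICKS)
    print("x ticks:", X_TICKS)
```

Under these assumptions the y-axis carries five ticks and the x-axis nine.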

### Legend
- **Baseline**: Black dashed line with circular markers  
- **MoD using top-k**: Blue dotted line with circular markers  
- **MoD using predictor**: Green dotted line with circular markers  

### Data Trends
Values are approximate readings of normalized log-likelihood (y) at the given normalized FLOPs (x); both quantities are dimensionless.

1. **Baseline (Black)**:  
   - Starts at ~1.005 at 0.4 normalized FLOPs.  
   - Drops sharply to ~0.990 at 0.8.  
   - Rises to ~1.005 at 1.8.  
   - Ends at ~1.005 at 2.0.  

2. **MoD using top-k (Blue)**:  
   - Starts at ~0.998 at 0.4 normalized FLOPs.  
   - Dips to ~0.990 at 0.8.  
   - Rises to ~1.005 at 1.8.  
   - Ends at ~1.005 at 2.0.  

3. **MoD using predictor (Green)**:  
   - Starts at ~1.005 at 0.4 normalized FLOPs.  
   - Dips to ~0.990 at 0.8.  
   - Rises to ~1.005 at 1.8.  
   - Ends at ~1.005 at 2.0.  
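
The approximate readings listed above can be captured as data for later comparison. The sketch below is illustrative only: the names `SERIES`, `interpolate`, and `lowest_point` are hypothetical, and the linear interpolation between listed points is an assumption that says nothing about the true shape of the plotted curves.

```python
# Approximate (normalized FLOPs, normalized log-likelihood) readings
# copied from the "Data Trends" list; they are not exact chart data.
SERIES = {
    "Baseline": [(0.4, 1.005), (0.8, 0.990), (1.8, 1.005), (2.0, 1.005)],
    "MoD using top-k": [(0.4, 0.998), (0.8, 0.990), (1.8, 1.005), (2.0, 1.005)],
    "MoD using predictor": [(0.4, 1.005), (0.8, 0.990), (1.8, 1.005), (2.0, 1.005)],
}


def interpolate(points, x):
    """Piecewise-linear estimate of y at x between the listed readings."""
    if not points[0][0] <= x <= points[-1][0]:
        raise ValueError("x lies outside the listed range")
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)


def lowest_point(points):
    """Return the (x, y) reading with the smallest y."""
    return min(points, key=lambda p: p[1])


if __name__ == "__main__":
    for name, points in SERIES.items():
        print(name, "lowest reading:", lowest_point(points))
```

With these readings, every series has its lowest listed value at 0.8 normalized FLOPs.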

### Shaded Region
- **X-Axis Range**: 0.8 to 1.0 normalized FLOPs  
- **Y-Axis Range**: 0.990 to 1.000  
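
A membership test for the shaded rectangle follows directly from the two ranges. This is a minimal sketch; the function name `in_shaded_region` is hypothetical, and treating the bounds as inclusive is an assumption.

```python
# Bounds of the shaded rectangle as listed under "Shaded Region".
SHADED_X = (0.8, 1.0)      # normalized FLOPs
SHADED_Y = (0.990, 1.000)  # normalized log-likelihood


def in_shaded_region(x, y):
    """Return True if (x, y) lies inside the shaded rectangle (inclusive bounds assumed)."""
    return SHADED_X[0] <= x <= SHADED_X[1] and SHADED_Y[0] <= y <= SHADED_Y[1]


if __name__ == "__main__":
    # The ~0.990 reading at 0.8 normalized FLOPs sits on the rectangle's corner.
    print(in_shaded_region(0.8, 0.990))
```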

### Key Observations
- All three methods converge at higher normalized FLOPs (1.8–2.0), where each is listed at ~1.005.  
- In the readings listed under Data Trends, "MoD using predictor" is higher than "MoD using top-k" only at the low-FLOP end (~1.005 vs. ~0.998 at 0.4); the two coincide at the other listed points.  
- The shaded region (0.8–1.0 normalized FLOPs) coincides with the minimum of ~0.990 that all three curves reach at 0.8.  
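
These observations can be checked against the approximate readings listed under Data Trends. The sketch below computes, for each listed x position, the spread (maximum minus minimum) across the three methods; the names `READINGS` and `spread_by_x` are hypothetical, and the numbers are the extraction's approximations rather than measured data.

```python
# Approximate readings copied from "Data Trends":
# normalized FLOPs -> normalized log-likelihood, per method.
READINGS = {
    "Baseline": {0.4: 1.005, 0.8: 0.990, 1.8: 1.005, 2.0: 1.005},
    "MoD using top-k": {0.4: 0.998, 0.8: 0.990, 1.8: 1.005, 2.0: 1.005},
    "MoD using predictor": {0.4: 1.005, 0.8: 0.990, 1.8: 1.005, 2.0: 1.005},
}


def spread_by_x(readings):
    """Return {x: max - min across methods} for every listed x position."""
    xs = sorted(next(iter(readings.values())))
    result = {}
    for x in xs:
        values = [series[x] for series in readings.values()]
        result[x] = max(values) - min(values)
    return result


SPREAD = spread_by_x(READINGS)

if __name__ == "__main__":
    for x, gap in SPREAD.items():
        print(f"x = {x}: spread = {gap:.3f}")
```

A spread of zero at 1.8 and 2.0 corresponds to the convergence noted above; the only nonzero spread in these readings is at 0.4.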

---

## Chart 2: Top-k Prediction Accuracy vs. Training Steps

### Axes Labels
- **Y-Axis**: "Top-k prediction accuracy"  
  - Range: 0.70 to 1.00  
  - Increment: 0.05  
- **X-Axis**: "Training step"  
  - Range: 0 to 15,000  
  - Increment: 5,000  

### Data Trends
- **Line**: Solid teal line with circular markers  
  - Starts at ~0.90 accuracy at 0 training steps.  
  - Drops sharply to ~0.70 accuracy at ~500 training steps.  
  - Rises to ~0.95 accuracy at ~10,000 training steps.  
  - Plateaus at ~0.95 accuracy for the remainder of training (10,000–15,000 steps).  
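
The plateau described above can be located programmatically from the listed readings. This is a minimal sketch; the names `CURVE` and `plateau_start` and the tolerance of 0.005 are assumptions chosen for illustration.

```python
# Approximate (training step, top-k prediction accuracy) readings
# copied from the list above; they are not exact chart data.
CURVE = [(0, 0.90), (500, 0.70), (10_000, 0.95), (15_000, 0.95)]


def plateau_start(points, tol=0.005):
    """Return the earliest step from which accuracy stays within tol of the final value."""
    final = points[-1][1]
    start = points[-1][0]
    # Walk backwards from the last reading until one deviates from the final value.
    for step, accuracy in reversed(points):
        if abs(accuracy - final) > tol:
            break
        start = step
    return start


if __name__ == "__main__":
    print("plateau begins at step", plateau_start(CURVE))
```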

### Key Observations
- The initial drop (from ~0.90 to ~0.70 within ~500 steps) may reflect an early learning phase or overfitting; the chart alone does not establish the cause.  
- Accuracy recovers to ~0.95 and stabilizes by ~10,000 steps.  
- No further improvement is observed between 10,000 and 15,000 steps.  

---

## Cross-Referenced Legend Consistency
- **Baseline (Black)**: Matches dashed line in Chart 1.  
- **MoD using top-k (Blue)**: Matches dotted line in Chart 1.  
- **MoD using predictor (Green)**: Matches dotted line in Chart 1.