Image b3d273903e61...

EXPERT: nemotron-free VERSION 1

RUNTIME: free/nvidia/nemotron-nano-12b-v2-vl:free
INTEL_VERIFIED
## Line Chart: Model Accuracy vs. Number of Reasoning Hops

### Overview
The chart compares the accuracy of three AI models (GPT-5.2, Gemini 3 Pro, and SFT+RL) across 2–5 reasoning hops. Accuracy is plotted on the y-axis (65–95%), while the x-axis represents the number of reasoning hops (2–5). The chart is divided into two regions: "Simpler" (left, blue) and "Complex" (right, orange). The SFT+RL model (orange line) shows the highest accuracy, with notable percentage increases (+7%, +7%, +10%, +19%) annotated at each hop.

### Components/Axes
- **X-axis**: "Number of Reasoning Hops" (2, 3, 4, 5).
- **Y-axis**: "Accuracy (%)" (65–95%).
- **Legend**:
  - Gray square: GPT-5.2
  - Blue triangle: Gemini 3 Pro
  - Orange diamond: SFT+RL (14B, Ours)
- **Regions**:
  - Left (blue): "Simpler"
  - Right (orange): "Complex"

### Detailed Analysis
1. **GPT-5.2** (gray line):
   - Hop 2: ~77.5%
   - Hop 3: ~73.5%
   - Hop 4: ~77%
   - Hop 5: ~70.5%
   - Trend: Initial drop, slight recovery, then sharp decline.

2. **Gemini 3 Pro** (blue line):
   - Hop 2: ~75.5%
   - Hop 3: ~72%
   - Hop 4: ~75.5%
   - Hop 5: ~70.5%
   - Trend: Volatile, with no clear upward trajectory.

3. **SFT+RL (14B, Ours)** (orange line):
   - Hop 2: ~84.5% (+7%)
   - Hop 3: ~81.5% (+7%)
   - Hop 4: ~87% (+10%)
   - Hop 5: ~89.5% (+19%)
   - Trend: Steady upward slope in the "Complex" region.

### Key Observations
- SFT+RL consistently outperforms other models, especially in the "Complex" region (hops 4–5).
- GPT-5.2 and Gemini 3 Pro show declining accuracy with more hops, while SFT+RL improves.
- Percentage increases for SFT+RL are highest in the "Complex" region (+19% at hop 5).

### Interpretation
The data demonstrates that SFT+RL (14B) excels in complex reasoning tasks, with accuracy gains accelerating as the number of hops increases. This suggests its architecture is better suited for multi-step reasoning compared to GPT-5.2 and Gemini 3 Pro, which degrade in performance under similar conditions. The "Complex" region’s orange shading emphasizes the model’s scalability, while the "Simpler" region highlights baseline performance. The percentage annotations (+7%, +19%) underscore SFT+RL’s efficiency in leveraging additional reasoning steps for higher accuracy.
DECODING INTELLIGENCE...
TECHNICAL ASSET FINGERPRINT

b3d273903e61081097cf338f

FOUND IN PAPERS

EXPERT: nemotron-free VERSION 1