Image 58eef43c350e...

EXPERT: nemotron-free VERSION 1

RUNTIME: free/nvidia/nemotron-nano-12b-v2-vl:free
INTEL_VERIFIED
## Horizontal Bar Chart: Task Success Rates of Code Generation Models

### Overview
The chart compares task success rates (%) of various code generation models, categorized by base model (Cursor, Codex, Claude Code, OpenCode) and fine-tuning method (e.g., Opus 4.5, Sonnet 4.5, GPT 5). Success rates range from 0% to 16%, with the highest performers clustered near 12%.

### Components/Axes
- **X-axis**: Task Success Rate (%) (0–16 scale)
- **Y-axis**: Model combinations (e.g., "Cursor + Opus 4.5", "Codex + GPT 5")
- **Legend**: 
  - Red = Cursor
  - Blue = Codex
  - Green = Claude Code
  - Yellow = OpenCode
- **Text Annotations**: Percentages (e.g., "12.0%") at the end of each bar

### Detailed Analysis
1. **Highest Performers (12.0%)**:
   - Cursor + Opus 4.5 (red)
   - Cursor + Sonnet 4.5 (red)
   - Codex + GLM 4.6 (blue)
2. **Mid-Tier (10.0%)**:
   - Codex + Sonnet 4.5 (blue)
   - Codex + GPT 5 (blue)
   - Claude Code + GLM 4.6 (green)
   - Claude Code + Sonnet 4.5 (green)
   - Claude Code + Opus 4.5 (green)
3. **Lower Tier (8.0%)**:
   - Cursor + GPT 5.2 (red)
   - Claude Code + Opus 4.5 (green)
   - Claude Code + Haiku (green)
   - OpenCode + GLM 4.6 (yellow)
4. **Low Performers (6.0–4.0%)**:
   - Cursor + Gemini 3 Pro (red): 6.0%
   - OpenCode + GPT 5.1 (yellow): 6.0%
   - OpenCode + Opus 4.5 (yellow): 4.0%
   - OpenCode + Sonnet 4.5 (yellow): 4.0%
   - OpenCode + Gemini 3 Pro (yellow): 4.0%
   - OpenCode + GPT 5.2 (yellow): 4.0%
5. **Outlier**:
   - Codex + GPT 5.1 (blue): 0.0%

### Key Observations
- **Dominance of Cursor Models**: Cursor combinations with Opus 4.5 and Sonnet 4.5 achieve the highest success rates (12.0%).
- **Codex Variability**: Codex models perform well with GLM 4.6 (12.0%) but fail entirely with GPT 5.1 (0.0%).
- **OpenCode Underperformance**: All OpenCode combinations score ≤6.0%, with GPT 5.1 being the worst (0.0%).
- **Claude Code Consistency**: Claude Code models maintain 8–10% success rates across multiple fine-tuning methods.

### Interpretation
The data suggests that **Cursor models fine-tuned with Opus 4.5 and Sonnet 4.5** are the most effective for code generation tasks, outperforming other base models. **Codex models show inconsistency**, excelling with GLM 4.6 but failing catastrophically with GPT 5.1. **OpenCode models underperform across the board**, indicating potential limitations in their architecture or training data. The **0.0% success rate for Codex + GPT 5.1** warrants investigation—it may reflect incompatibility between the Codex base model and GPT 5.1 fine-tuning, or data quality issues. The clustering of success rates around 10–12% implies a ceiling effect for current state-of-the-art models, while the OpenCode results highlight opportunities for improvement in open-source code generation frameworks.
DECODING INTELLIGENCE...
TECHNICAL ASSET FINGERPRINT

58eef43c350ea6e951027724

FOUND IN PAPERS

EXPERT: nemotron-free VERSION 1