## Composite Visualization: Multi-Model Performance Analysis (Radar, Training Reward, Accuracy)

### Overview  
The image contains three distinct visualizations analyzing model performance across tasks, training dynamics, and solution scaling: a **radar chart** (left), a **line graph** (middle), and a **scatter plot** (right).  


### 1. Left: Radar Chart (Multi-Task Performance)  
- **Axes & Scale**: Four radial axes: *AIME25* (top), *MATH500* (left), *AIME24* (right), *GPQA-Diamond* (bottom). Radial scale: 0.0–1.0 (markers at 0.2, 0.4, 0.6, 0.8, 1.0).  
- **Legend (Bottom)**: Six series, two data-selection baselines and four reward models (line styles/colors):  
  - Human Curated (orange dashed)  
  - Random (yellow dashed)  
  - Qwen2.5-Math-PRM-7B (blue dashed)  
  - Skywork-PRM-7B (green dashed)  
  - Qwen2.5-Math-PRM-72B (red dashed)  
  - ReasonFlux-PRM-7B (purple solid)  
- **Trends**:  
  - *ReasonFlux-PRM-7B* (purple) has the highest value on all four axes (AIME25, MATH500, AIME24, GPQA-Diamond).  
  - *Human Curated* (orange) and *Random* (yellow) show moderate performance, while *Qwen2.5-Math-PRM-7B* (blue) and *Skywork-PRM-7B* (green) have lower scores.  


### 2. Middle: Line Graph (Training Reward vs. Step)  
- **Axes**:  
  - Y-axis: *Training Reward* (tick labels 0.0–0.4; the top curve ends slightly above the last tick).  
  - X-axis: *Step* (0–180).  
- **Legend (Bottom-Right)**: Three series:  
  - GRPO (blue, square markers)  
  - Qwen2.5-Math-PRM-7B (orange, triangle markers)  
  - ReasonFlux-PRM-7B (purple, star markers)  
- **Trends**:  
  - *GRPO* (blue): Starts low (~0.05), rises to ~0.3 by step 20, then fluctuates (0.25–0.3).  
  - *Qwen2.5-Math-PRM-7B* (orange): Starts ~0.15, rises to ~0.3, then fluctuates (similar to GRPO but slightly higher).  
  - *ReasonFlux-PRM-7B* (purple): Starts ~0.28, rises steadily to ~0.45 by step 180 (clear upward trend, outperforming others).  


### 3. Right: Scatter Plot (Accuracy vs. Number of Solutions, GPQA-Diamond)  
- **Title**: *GPQA-Diamond*  
- **Axes**:  
  - Y-axis: *Accuracy (%)* (tick labels 48–54; a few points fall just outside the labeled ticks).  
  - X-axis: *Number of Solutions (N)* (2¹, 2², 2³, 2⁴ = 2, 4, 8, 16).  
- **Legend (Bottom-Right)**: Four methods (three reward models and a majority-vote baseline):  
  - Qwen2.5-Math-PRM-72B (red triangle)  
  - Skywork-PRM-7B (green triangle)  
  - Majority (blue square)  
  - ReasonFlux-PRM-7B (purple star)  
- **Data Points (Approximate)**:  
  - *ReasonFlux-PRM-7B* (purple): N=2¹ (~48.5%), N=2² (~51%), N=2³ (~54%), N=2⁴ (~55%) (highest accuracy).  
  - *Qwen2.5-Math-PRM-72B* (red): N=2¹ (~48%), N=2² (~50%), N=2³ (~52%), N=2⁴ (~54%).  
  - *Skywork-PRM-7B* (green): N=2¹ (~48%), N=2² (~49%), N=2³ (~51%), N=2⁴ (~52%).  
  - *Majority* (blue): N=2¹ (~48%), N=2² (~47.5%), N=2³ (~48.5%), N=2⁴ (~49%) (lowest, with a dip at N=2²).  
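
The approximate readings above can be checked for scaling behavior with a short script. This is a minimal sketch that uses only the estimated values listed in this section (they are visual readings, not exact data); the helper names `gain` and `is_monotonic` are introduced here for illustration.

```python
# Approximate accuracies (%) at N = 2, 4, 8, 16, as read from the plot above.
# These are visual estimates, not exact data.
ns = [2, 4, 8, 16]
accuracy = {
    "ReasonFlux-PRM-7B": [48.5, 51.0, 54.0, 55.0],
    "Qwen2.5-Math-PRM-72B": [48.0, 50.0, 52.0, 54.0],
    "Skywork-PRM-7B": [48.0, 49.0, 51.0, 52.0],
    "Majority": [48.0, 47.5, 48.5, 49.0],
}


def gain(values):
    """Accuracy change (percentage points) from the smallest to the largest N."""
    return values[-1] - values[0]


def is_monotonic(values):
    """True if accuracy never decreases as N grows."""
    return all(later >= earlier for earlier, later in zip(values, values[1:]))


gains = {name: gain(values) for name, values in accuracy.items()}
monotonic = {name: is_monotonic(values) for name, values in accuracy.items()}

for name in accuracy:
    print(f"{name}: gain {gains[name]:+.1f} points, monotonic={monotonic[name]}")
```

With these estimates, the three reward models improve monotonically with N, while *Majority* does not because of its dip at N=2².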


### Key Observations  
- **Radar Chart**: *ReasonFlux-PRM-7B* outperforms all models across multi-task benchmarks (AIME25, MATH500, AIME24, GPQA-Diamond).  
- **Training Reward**: *ReasonFlux-PRM-7B* shows a consistent upward trend in training reward, while GRPO and Qwen2.5-Math-PRM-7B plateau.  
- **Accuracy Scaling**: *ReasonFlux-PRM-7B* achieves the highest accuracy on GPQA-Diamond, with accuracy increasing with the number of solutions (N). *Majority* (baseline) performs poorly, especially at N=2².  


### Interpretation  
- **Multi-Task Strength**: *ReasonFlux-PRM-7B* demonstrates superior performance across diverse tasks (AIME, MATH, GPQA), suggesting robust generalization.  
- **Training Efficiency**: The upward trend in training reward for *ReasonFlux-PRM-7B* indicates effective learning over steps, outpacing GRPO and Qwen2.5-Math-PRM-7B.  
- **Solution Scaling**: For GPQA-Diamond, increasing the number of solutions (N) improves accuracy for the three reward models, while *Majority* stays roughly flat (~48% to ~49%, with a dip at N=2²). *ReasonFlux-PRM-7B* gains the most (~48.5% to ~55%), highlighting its ability to leverage more solutions for better performance.  
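
The figure description does not state how each method picks a final answer from the N solutions. A common setup, assumed here, is that *Majority* returns the most frequent answer, while a reward model scores every solution and the highest-scoring one is kept. The sketch below illustrates that difference; the function names, answers, and scores are all hypothetical.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent final answer among the sampled solutions."""
    return Counter(answers).most_common(1)[0][0]


def best_of_n(answers, scores):
    """Return the answer of the solution with the highest reward score."""
    best_index = max(range(len(answers)), key=lambda i: scores[i])
    return answers[best_index]


# Hypothetical example: four sampled solutions to one multiple-choice question.
answers = ["B", "B", "C", "A"]
scores = [0.41, 0.38, 0.87, 0.22]  # made-up reward-model scores, one per solution

print("majority vote:", majority_vote(answers))
print("best-of-N:", best_of_n(answers, scores))
```

Under this assumption, the two strategies can disagree whenever the most common answer is not the one the reward model rates highest, which is one way a reward model could keep improving with N while majority voting stays flat.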


Taken together, the three panels show *ReasonFlux-PRM-7B* leading in multi-task performance, training reward, and solution scaling, ahead of the baselines shown in each panel (including GRPO, Qwen2.5-Math-PRM-7B, and Majority).