Image ebc402146d12...

EXPERT: gemini-2.0-flash VERSION 1

RUNTIME: nugit/gemini/gemini-2.0-flash
INTEL_VERIFIED
## Bar Chart: Evaluation on Verification and Correction (Base Model: Qwen2.5-Math-7B)

### Overview
The image presents two bar charts comparing the performance of different models (SFT, SFT + Process-level RL, and SFT + Outcome-level RL) on self-verification and self-correction metrics. The left chart focuses on self-verification metrics (Verification Accuracy, Error Recall, Correct Precision), while the right chart focuses on self-correction metrics (Incorrect to Correct, Correct to Incorrect).

### Components/Axes

**Left Chart (Self-verification Metrics):**

*   **Title:** Self-verification Metrics
*   **X-axis:** Categorical axis with three categories: "Verification Accuracy", "Error Recall", and "Correct Precision".
*   **Y-axis:** "Value (%)", ranging from 50 to 90 in increments of 10.
*   **Legend:** Located in the top-left corner.
    *   Gray: SFT
    *   Teal: SFT + Process-level RL
    *   Peach: SFT + Outcome-level RL

**Right Chart (Self-correction Metrics):**

*   **Title:** Self-correction Metrics
*   **X-axis:** Categorical axis with two categories: "Incorrect to Correct" and "Correct to Incorrect".
*   **Y-axis:** "Value (%)", ranging from 0 to 14 in increments of 2.
*   **Legend:** (Same as left chart)
    *   Gray: SFT
    *   Teal: SFT + Process-level RL
    *   Peach: SFT + Outcome-level RL

### Detailed Analysis

**Left Chart (Self-verification Metrics):**

*   **Verification Accuracy:**
    *   SFT (Gray): 61.58%
    *   SFT + Process-level RL (Teal): 74.61%
    *   SFT + Outcome-level RL (Peach): 66.49%
    *   Trend: SFT + Process-level RL performs the best, followed by SFT + Outcome-level RL, and then SFT.
*   **Error Recall:**
    *   SFT (Gray): 66.83%
    *   SFT + Process-level RL (Teal): 64.75%
    *   SFT + Outcome-level RL (Peach): 70.11%
    *   Trend: SFT + Outcome-level RL performs the best, followed by SFT, and then SFT + Process-level RL.
*   **Correct Precision:**
    *   SFT (Gray): 84.94%
    *   SFT + Process-level RL (Teal): 90.28%
    *   SFT + Outcome-level RL (Peach): 87.85%
    *   Trend: SFT + Process-level RL performs the best, followed by SFT + Outcome-level RL, and then SFT.

**Right Chart (Self-correction Metrics):**

*   **Incorrect to Correct:**
    *   SFT (Gray): 6.52%
    *   SFT + Process-level RL (Teal): 12.22%
    *   SFT + Outcome-level RL (Peach): 13.64%
    *   Trend: SFT + Outcome-level RL performs the best, followed by SFT + Process-level RL, and then SFT.
*   **Correct to Incorrect:**
    *   SFT (Gray): 1.96%
    *   SFT + Process-level RL (Teal): 1.46%
    *   SFT + Outcome-level RL (Peach): 0.97%
    *   Trend: SFT performs the worst, followed by SFT + Process-level RL, and then SFT + Outcome-level RL.

### Key Observations

*   SFT + Process-level RL generally performs better than SFT and SFT + Outcome-level RL in self-verification metrics, especially in Verification Accuracy and Correct Precision.
*   SFT + Outcome-level RL shows the best performance in Error Recall and Incorrect to Correct metrics.
*   All models have a low percentage of "Correct to Incorrect" transitions.

### Interpretation

The charts suggest that incorporating reinforcement learning (RL) into the SFT model, either at the process level or outcome level, generally improves performance in both self-verification and self-correction tasks. The specific type of RL (process-level vs. outcome-level) seems to have a varying impact depending on the metric being evaluated. SFT + Process-level RL excels in accuracy and precision, while SFT + Outcome-level RL is better at recalling errors and correcting incorrect answers. The low "Correct to Incorrect" values across all models indicate a relatively stable performance in maintaining correct answers. The base SFT model consistently underperforms compared to the RL-enhanced models, highlighting the benefits of incorporating RL techniques for these tasks.
DECODING INTELLIGENCE...
TECHNICAL ASSET FINGERPRINT

ebc402146d1240392ebb4fff

FOUND IN PAPERS

EXPERT: gemini-2.0-flash VERSION 1