## [Diagram]: Comparison of Three Process Reward Model (PRM) Frameworks for Mathematical Reasoning
### Overview
The image is a technical diagram comparing three different architectural approaches for evaluating the correctness of mathematical solution steps. It is divided into three distinct panels, each illustrating a different framework: "Traditional PRM" (left), "Two-stage Retrieval-enhanced Mechanism" (center), and "Retrieval PRM Framework" (right). The diagram uses flowcharts, text boxes, arrows, and icons to explain the workflow and components of each system.
### Components/Axes
The diagram is organized into three vertical panels, each with a title bar at the top.
**Panel 1: Traditional PRM (Left)**
* **Title:** "Traditional PRM"
* **Components (Top to Bottom):**
1. **System Prompt (Green Box):** "I want you to act as a math teacher. I will provide a mathematical question and several solution steps, and it will be your job to judge whether these steps are correct or not."
2. **Target Question (Oval):** "How many seconds are in 5.5 minutes?"
3. **Solution Steps (Three Rectangles in Sequence):**
* "Step 1 : 5.5 minutes is the same as 5 minutes and 0.5 minutes."
* "Step 2 : Since there are 60 seconds in a minute, then there are 300 seconds in 5 minutes."
* "Step 3 : And since there are 60 seconds in a minute, there are 50 seconds in 0.5 minutes."
4. **Target Step (Oval):** "Is that step correct?" (Pointing to Step 3)
5. **Judgment Output:** A sad face emoji (😞) next to two boxes: "Yes" with a value of "0.9" and "No" with a value of "0.1".
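The diagram does not say how the "Yes" and "No" values are produced. The following is a minimal sketch, assuming they come from normalizing the model's scores for the two answer tokens (a common choice, but an assumption here). The function `yes_no_probs` and the example logits are hypothetical.

```python
import math

def yes_no_probs(yes_logit: float, no_logit: float) -> dict[str, float]:
    """Softmax over two answer-token logits (hypothetical scoring scheme)."""
    m = max(yes_logit, no_logit)      # subtract the max for numerical stability
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    total = e_yes + e_no
    return {"Yes": e_yes / total, "No": e_no / total}

# Illustrative logits chosen so the result matches the panel's 0.9 / 0.1 split.
probs = yes_no_probs(math.log(0.9), math.log(0.1))
print({answer: round(p, 3) for answer, p in probs.items()})
```

Under this assumption, the "Yes" probability serves as the step's reward score, so 0.9 here means the model rates the wrong Step 3 as very likely correct.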
**Panel 2: Two-stage Retrieval-enhanced Mechanism (Center)**
* **Title:** "Two-stage Retrieval-enhanced Mechanism"
* **Components:**
1. **Input:** "Target Question" labeled with a red "Q" pointing to a "Question Pool" database icon.
2. **Question Retrieval:** The Question Pool connects to multiple document icons, representing retrieved similar questions.
3. **Reference Question Box (Yellow):** Contains an example.
* **Header:** "Reference Question 1: What is the equivalent number of seconds in 7.8 minutes?"
* **Process:** "Since there are 60 seconds in a minute, we can find the number of seconds by multiplying the number of minutes by 60. (+) So, 7.8 minutes is equal to 7.8 * 60 = 46 seconds. The answer is: 46 (-)"
* **Note:** "Reference Question 2: Process: ..."
4. **Step Retrieval:** "Target Step" labeled with a red "S" points to a "New Step Pool" database icon.
5. **Reference Step Cloud (Light Blue):** Contains an example.
* "Reference Step 1: 0.3 hours equal to 0.3 * 60 = 18 minutes. This reference step is correct."
* "Reference Step 2: ..."
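The two retrieval stages can be sketched as follows. This is an illustration only: the diagram does not name a similarity measure, so token-overlap (Jaccard) similarity stands in for whatever retriever the framework actually uses, and the helper names, pool layout, and distractor entries are hypothetical.

```python
import re

def tokens(text):
    """Lowercased alphanumeric tokens; a crude stand-in for a learned embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Token-overlap similarity between two strings, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def top_k(query, pool, k):
    """Return the k pool items whose "text" field is most similar to the query."""
    return sorted(pool, key=lambda item: jaccard(query, item["text"]), reverse=True)[:k]

question_pool = [
    {"text": "What is the equivalent number of seconds in 7.8 minutes?"},
    {"text": "What is the area of a circle with radius 3?"},  # hypothetical distractor
]
step_pool = [
    {"text": "0.3 hours equal to 0.3 * 60 = 18 minutes.", "label": "+"},
    {"text": "The area is pi times the radius squared.", "label": "+"},  # hypothetical distractor
]

target_question = "How many seconds are in 5.5 minutes?"
target_step = "And since there are 60 seconds in a minute, there are 50 seconds in 0.5 minutes."

# Stage 1: retrieve reference questions using the target question (Q).
ref_questions = top_k(target_question, question_pool, k=1)
# Stage 2: retrieve reference steps using the target step (S).
ref_steps = top_k(target_step, step_pool, k=1)

print(ref_questions[0]["text"])
print(ref_steps[0]["text"])
```

With these toy pools, each stage returns the unit-conversion example from the diagram rather than the geometry distractor.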
**Panel 3: Retrieval PRM Framework (Right)**
* **Title:** "Retrieval PRM Framework"
* **Components:**
1. **System Prompt (Green Box):** "I want you to act as a math teacher. I will ... judge whether these steps are correct or not. **First I will give you some similar questions and their steps for reference. For each step, if the step is correct, the step is labeled as +. If the step is wrong, the step is labeled as -. If there is no relevant or helpful information in the provided questions and steps, try to answer yourself.**"
2. **Reference Flow:**
* "Reference Question 1" (Yellow Box) points to "Reference Question 2" (Yellow Box).
* "Reference Question 1" also points down to "Step 1 : 5.5 minutes is the same as 5 minutes and 0.5 minutes." (White Box).
* "Reference Question 2" points down to the "Target Question" oval: "How many seconds are in 5.5 minutes?"
3. **Step Evaluation Flow:**
* The "Step 1" box points to "Step 2 : Since there are 60 seconds in a minute, then there are 300 seconds in 5 minutes." (White Box).
* "Step 2" points to "Step 3 : And since there are 60 seconds in a minute, there are 50 seconds in 0.5 minutes." (White Box).
* A red text note states: "I will give you some steps for reference".
* Two light blue clouds labeled "Reference Step2" and "Reference Step1" point to a decision box.
4. **Judgment:** The decision box "Is the target step correct?" (referring to Step 3) leads to:
* "Yes" with a value of "0.2"
* "No" with a value of "0.8"
* A happy face emoji (😊) is shown next to the "No" outcome.
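One way to picture how the panel's pieces fit into a single model input is the sketch below. The exact prompt layout is not given in the diagram, so the ordering, the line format, and the function `build_prompt` are assumptions. The system prompt is kept elided exactly as the diagram shows it.

```python
# The "..." is elided in the diagram and is deliberately left as-is.
SYSTEM_PROMPT = (
    "I want you to act as a math teacher. I will ... judge whether these "
    "steps are correct or not."
)

def build_prompt(system_prompt, ref_questions, target_question,
                 prior_steps, ref_steps, target_step):
    """Assemble a retrieval-augmented judging prompt (hypothetical layout)."""
    lines = [system_prompt, ""]
    for i, ref in enumerate(ref_questions, start=1):
        lines.append(f"Reference Question {i}: {ref['question']}")
        labeled = " ".join(f"{text} ({label})" for text, label in ref["steps"])
        lines.append(f"Process: {labeled}")
    lines.append(f"Question: {target_question}")
    for i, step in enumerate(prior_steps, start=1):
        lines.append(f"Step {i}: {step}")
    lines.append("I will give you some steps for reference")
    for i, (text, label) in enumerate(ref_steps, start=1):
        verdict = "correct" if label == "+" else "wrong"
        lines.append(f"Reference Step {i}: {text} This reference step is {verdict}.")
    lines.append(f"Step {len(prior_steps) + 1}: {target_step}")
    lines.append("Is the target step correct?")
    return "\n".join(lines)

prompt = build_prompt(
    SYSTEM_PROMPT,
    ref_questions=[{
        "question": "What is the equivalent number of seconds in 7.8 minutes?",
        "steps": [
            ("Since there are 60 seconds in a minute, we can find the number of "
             "seconds by multiplying the number of minutes by 60.", "+"),
            ("So, 7.8 minutes is equal to 7.8 * 60 = 46 seconds. The answer is: 46", "-"),
        ],
    }],
    target_question="How many seconds are in 5.5 minutes?",
    prior_steps=[
        "5.5 minutes is the same as 5 minutes and 0.5 minutes.",
        "Since there are 60 seconds in a minute, then there are 300 seconds in 5 minutes.",
    ],
    ref_steps=[("0.3 hours equal to 0.3 * 60 = 18 minutes.", "+")],
    target_step="And since there are 60 seconds in a minute, there are 50 seconds in 0.5 minutes.",
)
print(prompt)
```

The model's "Yes"/"No" answer to the final line is then read off as the judgment for Step 3.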
### Detailed Analysis
* **Traditional PRM Flow:** A linear process. A system prompt defines the task. A target question is presented with its solution steps. The model must judge a specific target step (Step 3) using only the question and the preceding steps, with no external references. The output is a probability distribution (Yes: 0.9, No: 0.1), and the sad emoji marks this as a confident but incorrect judgment: 0.5 minutes is 30 seconds, not 50, so Step 3 is wrong.
* **Two-stage Retrieval Flow:** A two-stage, retrieval-based process. The first stage retrieves similar questions from the Question Pool using the target question (Q). The second stage retrieves similar steps from a separate "New Step Pool" using the target step (S). The retrieved items ("Reference Question" and "Reference Step") are provided as context, together with their solution processes and correctness labels (+/- for the steps of a reference question, a stated verdict for a reference step).
* **Retrieval PRM Framework Flow:** An integrated process that combines retrieval and judgment. The system prompt is modified to instruct the model to use provided references. It shows a chain where reference questions and their steps are provided alongside the target question and its steps. The model is explicitly given "Reference Steps" to aid in judging the target step. The output probability (Yes: 0.2, No: 0.8) with a happy emoji marks a correct judgment ("No") for the same target step (Step 3) that the Traditional PRM misjudged.
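The comparison between the two outputs can be made concrete with a short sketch. It assumes the judgment is taken as the higher-probability answer, which the diagram does not state; `verdict` is a hypothetical helper.

```python
def verdict(probs):
    """Pick the answer with the higher probability (assumed decision rule)."""
    return max(probs, key=probs.get)

# Ground truth for the diagram's example: 0.5 min * 60 s/min = 30 s, not 50 s.
STEP_3_IS_CORRECT = False

outputs = {
    "Traditional PRM": {"Yes": 0.9, "No": 0.1},
    "Retrieval PRM Framework": {"Yes": 0.2, "No": 0.8},
}

results = {}
for name, probs in outputs.items():
    judged_correct = verdict(probs) == "Yes"
    # The judgment is right when it agrees with the ground truth.
    results[name] = judged_correct == STEP_3_IS_CORRECT
    print(f"{name}: answers {verdict(probs)!r}, judgment right: {results[name]}")
```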
### Key Observations
1. **Evolution of Context:** The core difference is the amount of external context given to the judge. Traditional PRM sees only the target question and its steps, the Two-stage Mechanism shows how reference questions and steps are retrieved, and the Retrieval PRM Framework places those references directly in the prompt.
2. **Judgment Confidence:** For the identical target step ("Step 3: ... there are 50 seconds in 0.5 minutes"), which is wrong because 0.5 minutes is 30 seconds, the Traditional PRM assigns 0.9 to "Yes" (an incorrect judgment), while the Retrieval PRM Framework assigns 0.8 to "No" (the correct judgment). This visually demonstrates the claimed improvement of the retrieval-enhanced approach.
3. **Prompt Engineering:** The system prompt in the Retrieval PRM Framework is significantly more detailed, explicitly instructing the model on how to use the provided reference examples and their labels (+/-).
4. **Visual Cues:** The use of emojis (😞 vs. 😊) provides an immediate, non-numerical indicator of the perceived quality or correctness of the model's output in each framework.
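The correctness labels in the example can be verified with simple arithmetic. The sketch below recomputes each conversion exactly and compares it with the value claimed in the diagram; the helper `label` is hypothetical, and `Fraction` is used to avoid floating-point rounding.

```python
from fractions import Fraction

def label(quantity, factor, claimed):
    """Return "+" if quantity * factor equals the claimed value, else "-"."""
    actual = Fraction(quantity) * factor
    return "+" if actual == claimed else "-"

checks = {
    "Target Step 3 (0.5 minutes -> 50 seconds)": label("0.5", 60, 50),
    "Reference Question 1 (7.8 minutes -> 46 seconds)": label("7.8", 60, 46),
    "Reference Step 1 (0.3 hours -> 18 minutes)": label("0.3", 60, 18),
}

for claim, mark in checks.items():
    print(f"{mark}  {claim}")
```

The two wrong conversions (30 seconds claimed as 50, and 468 seconds claimed as 46) come out as "-", matching the (-) label on the reference question's final step and the "No" verdict expected for Step 3.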
### Interpretation
This diagram argues for the superiority of retrieval-augmented Process Reward Models in mathematical reasoning. The **Traditional PRM** is depicted as limited, making judgments in a vacuum, which can lead to confident errors (as shown by the high "Yes" probability for a wrong step). The **Two-stage Retrieval-enhanced Mechanism** introduces the gathering of relevant external knowledge (similar questions and steps) and presents it as a separate, preparatory stage.
The **Retrieval PRM Framework** is presented as the most advanced solution. It places the retrieved references directly in the model's context, turning the task from pure judgment into **judgment-by-analogy**. The model is no longer just a math teacher but a teacher with a textbook of solved examples open in front of them. The shift in the probability distribution for the same step (from 0.9/0.1 to 0.2/0.8) is the diagram's central illustration, suggesting that relevant, labeled reference cases improve the model's ability to tell correct reasoning steps from incorrect ones. The diagram implies that this approach yields more reliable reward signals for training reasoning models.