## Diagram: Model Comparison for Multimodal Question Answering
### Overview
The image is a comparative diagram of three reward models for multimodal question answering, with a focus on mathematical questions: (a) Outcome Reward Model (ORM), (b) Multimodal Process Reward Model (PRM), and (c) GM-PRM (the authors' proposed model). It shows the flow of information through each model, where the reward signal is applied, and how the three approaches differ.
### Components
The diagram is structured into three horizontal sections, each representing a different model. Each section contains the following components:
* **Input:** Multimodal Math Questions or a single Question.
* **Process:** A series of steps (Step 1, Step 2, ... Step T) leading to an Answer.
* **Model Representation:** A stylized robot icon representing the reward model.
* **Reward:** An arrow indicating the reward signal.
* **Annotations:** Text labels describing the model's characteristics and limitations.
### Detailed Analysis
**(a) Outcome Reward Model (ORM)**
* **Input:** A single question represented by the equation "E=mc²" within a rectangular box.
* **Process:** The question flows directly to an "Answer" box.
* **Model:** A robot icon labeled "ORM".
* **Reward:** A yellow arrow labeled "Reward" points from the "Answer" box to the "ORM".
* **Annotation:** "ONLY Reward Final Ans" is written next to the reward arrow.
**(b) Multimodal Process Reward Model (PRM)**
* **Input:** Multiple multimodal math questions are shown within rectangular boxes: "√4", "(x+y)²", "πr²/π", "m/v".
* **Process:** The questions flow through a series of "Step 1", "Step 2", ... "Step T" boxes, culminating in an "Answer" box. The steps are connected by purple arrows.
* **Model:** A robot icon labeled "PRM".
* **Reward:** A yellow arrow labeled "Reward" points from the "Answer" box to the "PRM".
* **Annotations:** "Limited Explainability" and "No Correction Mechanism" are written to the right of the "PRM".
**(c) GM-PRM (Ours)**
* **Input:** Similar to (b), multiple multimodal math questions are shown: "√4", "(x+y)²", "πr²/π", "m/v".
* **Process:** The questions flow through a series of "Step 1", "Step 2", ... "Step T" boxes. A dashed red arrow labeled "After Correction" indicates a feedback loop from a later step to an earlier step, suggesting a correction mechanism.
* **Model:** A robot icon labeled "GM-PRM" with a refined brain (BoN) inside.
* **Reward:** A yellow arrow labeled "Reward" points from the "Answer" box to the "GM-PRM".
* **Annotations:** "Refined & Corrected Version" is written to the right of the "GM-PRM".
* **Bottom Legend:** A colored legend is present:
    * **Step Intent:** Purple
    * **Image Alignment:** Teal
    * **Reasoning Logic:** Orange
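
The correction loop described for (c) can be illustrated with a minimal Python sketch. Everything here is hypothetical: the `StepJudgement` class, the rule that a step passes only if all three legend aspects pass, and the choice to drop the steps after a corrected one are assumptions made for illustration, not details taken from the diagram or the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StepJudgement:
    """Hypothetical per-step verdict, one flag per aspect in the legend."""
    step_intent: bool
    image_alignment: bool
    reasoning_logic: bool

    @property
    def ok(self) -> bool:
        # Assumption: a step passes only if all three aspects pass.
        return self.step_intent and self.image_alignment and self.reasoning_logic


def first_flawed_step(judgements: List[StepJudgement]) -> Optional[int]:
    """Return the index of the first step that fails any aspect, or None."""
    for index, judgement in enumerate(judgements):
        if not judgement.ok:
            return index
    return None


def correct_solution(
    steps: List[str],
    judge: Callable[[str], StepJudgement],
    rewrite: Callable[[str], str],
) -> List[str]:
    """Keep the steps before the first flaw and replace the flawed step.

    Assumption: later steps are dropped because they would be regenerated
    from the corrected step; the diagram does not state this.
    """
    flawed = first_flawed_step([judge(step) for step in steps])
    if flawed is None:
        return list(steps)
    return steps[:flawed] + [rewrite(steps[flawed])]


# Toy stand-ins for the judge and the rewriter (both hypothetical).
def toy_judge(step: str) -> StepJudgement:
    return StepJudgement(True, True, "WRONG" not in step)


def toy_rewrite(step: str) -> str:
    return step.replace("WRONG", "fixed")


steps = ["read the radius from the figure", "area = WRONG formula", "answer"]
corrected = correct_solution(steps, toy_judge, toy_rewrite)
print(corrected)
```

In a real system the judge and rewriter would presumably be model calls; the toy functions only make the control flow visible.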
### Key Observations
* The ORM is the simplest model, only rewarding the final answer.
* The PRM introduces a process reward but, according to its annotations, offers limited explainability and no correction mechanism.
* The GM-PRM builds upon the PRM by adding a correction mechanism and incorporating analysis and judgement based on step intent, image alignment, and reasoning logic.
* The dashed red arrow in the GM-PRM indicates a key difference: the ability to revisit and correct earlier steps.
* The legend at the bottom suggests that the GM-PRM utilizes these three components (Step Intent, Image Alignment, Reasoning Logic) during the process.
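
The difference between rewarding only the final answer and rewarding the reasoning process can be made concrete with a short Python sketch. The function names, the exact-match comparison, and the use of the minimum step score as the aggregate are all assumptions chosen for illustration; the diagram does not specify how either model computes its reward.

```python
from typing import List


def outcome_reward(final_answer: str, reference: str) -> float:
    """ORM-style signal: look only at the final answer (exact match assumed)."""
    return 1.0 if final_answer.strip() == reference.strip() else 0.0


def process_reward(step_scores: List[float]) -> float:
    """PRM-style signal: aggregate per-step scores.

    Assumption: the weakest step determines the score (minimum).
    Other aggregations, such as the product or the mean, are equally plausible.
    """
    if not step_scores:
        raise ValueError("at least one step score is required")
    return min(step_scores)


# A solution that reaches the right answer through a flawed middle step.
orm_score = outcome_reward(" 12 ", "12")
prm_score = process_reward([0.9, 0.2, 0.95])
print(orm_score, prm_score)
```

Under these assumptions the outcome-only signal rates the solution as fully correct, while the process signal exposes the weak second step.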
### Interpretation
The diagram presents the three reward models as a progression. The ORM scores only the final answer. The PRM rewards intermediate steps but, according to its annotations, offers limited explainability and has no correction mechanism. GM-PRM, the authors' proposal, is shown addressing both limitations by judging each step on step intent, image alignment, and reasoning logic, and by feeding a corrected version back into the reasoning chain. The implied claim is that evaluating the whole reasoning process and allowing correction leads to better multimodal question answering, a claim the diagram asserts rather than demonstrates. The color-coded legend ties the three judgement aspects to the GM-PRM section.