# Technical Document Extraction: Process Reward Mechanisms
## Overview
The image illustrates two reward processing mechanisms (Clip Mechanism and Delta Mechanism) applied to a sequence of steps (s₁ to s₄) derived from a question (q). Key components include process reward graphs, clipped reward graphs, delta reward graphs, and step-by-step annotations.
---
## Key Components
### 1. **Process Reward Graph**
- **Axes**:
  - **Y-axis**: "Process Reward" (linear scale from 0 to η).
  - **X-axis**: "Step" (labeled s₁ to s₄).
- **Bars**:
  - **s₁**: Tallest bar (highest process reward).
  - **s₂**: Shortest bar (lowest process reward).
  - **s₃**: Medium-height bar.
  - **s₄**: Tall bar (second-highest process reward).
- **Annotations**:
  - An arrow points from each step (s₁–s₄) to its corresponding bar.
### 2. **Clip Mechanism**
- **Clipped Reward Graph**:
  - **Axes**: Same as Process Reward Graph.
  - **Bars**:
    - **s₂**: Red bar (clipped reward).
    - **s₃**: Red bar (clipped reward).
  - **Trend**: Clipping affects the two steps with the lowest process rewards (s₂ and s₃).
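
The description does not state the exact clipping rule. One reading that is consistent with the η mark on the y-axis and with only s₂ and s₃ showing bars in the clipped graph is to subtract a threshold η from each process reward and cap the result at zero. The sketch below illustrates that reading only; the rule itself, the function name `clip_rewards`, the threshold value, and the reward values are assumptions chosen for illustration.

```python
def clip_rewards(process_rewards, eta):
    """Shift each reward by the threshold eta and cap the result at zero.

    Assumed rule (not confirmed by the figure): clipped = min(reward - eta, 0).
    Steps at or above eta contribute 0; steps below eta keep a negative value.
    """
    return [min(r - eta, 0.0) for r in process_rewards]


# Illustrative values only, chosen to respect s1 > s4 > s3 > s2.
rewards = {"s1": 0.9, "s2": 0.2, "s3": 0.5, "s4": 0.8}
eta = 0.6  # hypothetical threshold

clipped = dict(zip(rewards, clip_rewards(rewards.values(), eta)))
nonzero_steps = [step for step, value in clipped.items() if value < 0]
print(nonzero_steps)  # ['s2', 's3']
```

Under these assumed values, only the two lowest-reward steps keep a nonzero clipped reward, which matches the two red bars described above.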
### 3. **Delta Mechanism**
- **Process Reward Graph**:
  - **Arrows**:
    - **s₁**: Green upward arrow (positive delta).
    - **s₂**: Red downward arrow (negative delta).
    - **s₃**: Green upward arrow (positive delta).
    - **s₄**: Purple "X" (invalid/inapplicable delta).
- **Delta Reward Graph**:
  - **Axes**:
    - **Y-axis**: "Delta Reward" (linear scale from 0).
    - **X-axis**: "Step" (s₁ to s₄).
  - **Arrows**:
    - **s₁**: Green upward arrow (positive delta).
    - **s₂**: Red downward arrow (negative delta).
    - **s₃**: Green upward arrow (positive delta).
  - **Trend**: Delta rewards reflect changes in process rewards between steps.
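
To make "changes in process rewards between steps" concrete, the sketch below computes the difference between each pair of consecutive steps. The direction of subtraction (next minus current), the function name `adjacent_deltas`, and the reward values are assumptions; the description does not give the exact formula, and the rule that produces the purple "X" is not modeled here.

```python
def adjacent_deltas(process_rewards):
    """Return the change in process reward for each consecutive pair of steps.

    Assumed convention: the delta for the transition k -> k+1 is
    reward[k+1] - reward[k]. A sequence of n steps yields n - 1 deltas.
    """
    return [nxt - cur for cur, nxt in zip(process_rewards, process_rewards[1:])]


steps = ["s1", "s2", "s3", "s4"]
rewards = [0.9, 0.2, 0.5, 0.8]  # illustrative values, s1 > s4 > s3 > s2

deltas = adjacent_deltas(rewards)
for (src, dst), delta in zip(zip(steps, steps[1:]), deltas):
    direction = "increase" if delta > 0 else "decrease" if delta < 0 else "no change"
    print(f"{src} -> {dst}: {direction}")
```

A sequence of four steps gives only three adjacent differences, so at least one step has no delta of its own under this convention.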
### 4. **Legend**
- **Colors**:
  - **Green**: Positive delta (reward increase).
  - **Red**: Negative delta (reward decrease).
  - **Purple**: Invalid/inapplicable delta (marked with "X").
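
The legend can be read as a small lookup from a delta value to a marker color. The sketch below encodes that reading; representing an invalid delta as `None`, returning `None` for a zero delta (a case the legend does not cover), and the function name `legend_color` are all assumptions.

```python
from typing import Optional


def legend_color(delta: Optional[float]) -> Optional[str]:
    """Map a delta value to the legend color described above.

    Assumptions: an invalid/inapplicable delta is passed as None, and a
    zero delta has no legend entry, so None is returned for it.
    """
    if delta is None:
        return "purple"  # invalid/inapplicable delta, drawn as an "X"
    if delta > 0:
        return "green"   # reward increase
    if delta < 0:
        return "red"     # reward decrease
    return None          # zero delta: not covered by the legend


print([legend_color(d) for d in (0.3, -0.7, None)])  # ['green', 'red', 'purple']
```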
---
## Spatial Grounding & Validation
- **Legend Placement**: Bottom-right corner of the image.
- **Color Consistency**:
  - Green arrows in the Process Reward Graph match the green upward arrows in the Delta Reward Graph.
  - Red arrows in the Process Reward Graph match the red downward arrows in the Delta Reward Graph.
  - The purple "X" at s₄ in the Process Reward Graph corresponds to the legend's invalid/inapplicable delta entry.
---
## Trends & Data Points
1. **Process Reward**:
   - **s₁ > s₄ > s₃ > s₂** (s₁ has the highest process reward and s₂ the lowest).
2. **Clip Mechanism**:
   - Clipping applied to s₂ and s₃ (red bars indicate clipped rewards).
3. **Delta Mechanism**:
   - **s₁ → s₂**: Reward decrease (red arrow).
   - **s₂ → s₃**: Reward increase (green arrow).
   - **s₃ → s₄**: Invalid delta (purple "X").
---
## Textual Transcription
- **Question q**: Input prompt for the sequence.
- **Steps**: s₁, s₂, s₃, s₄ (labeled sequentially).
- **Mechanisms**: "Clip Mechanism" and "Delta Mechanism" (bolded labels).
---
## Conclusion
The image shows two ways of adjusting process rewards: clipping, which affects the lower-reward steps s₂ and s₃, and delta adjustment, which tracks changes in process reward between steps. In the Delta Mechanism, s₄ is marked with a purple "X" (invalid or inapplicable delta), which suggests that a delta is not assigned to every step.