## Line Chart: Filtering based on Process vs. Outcome
### Overview
The image is a line chart comparing the accuracy (%) of different filtering methods (Process-based and Outcome-based) against the number of beams used. A horizontal line represents the accuracy of "LLM-as-a-judge". The generator used is Llama-3.2-3B-Instruct.
### Components/Axes
* **Title:** Filtering based on Process vs. Outcome
* **Subtitle:** Generator: Llama-3.2-3B-Instruct
* **Y-axis:**
    * Label: Accuracy (%)
    * Scale: 56 to 68, with tick marks at every 2 units (56, 58, 60, 62, 64, 66, 68)
* **X-axis:**
    * Label: Number of beams
    * Scale: 2<sup>0</sup>, 2<sup>1</sup>, 2<sup>2</sup>, 2<sup>3</sup>, 2<sup>4</sup>
* **Legend:** Located at the bottom of the chart.
    * Process-based (ours): Orange line with star markers.
    * Outcome-based (GenRM): Yellow-orange line with circle markers.
    * LLM-as-a-judge: Dashed teal line.
### Detailed Analysis
* **Process-based (ours):** (Orange line with star markers)
    * Trend: Flat from 2<sup>0</sup> to 2<sup>1</sup>, then increases steadily with the number of beams.
    * Data Points:
        * 2<sup>0</sup>: Approximately 61%
        * 2<sup>1</sup>: Approximately 61%
        * 2<sup>2</sup>: Approximately 64%
        * 2<sup>3</sup>: Approximately 66%
        * 2<sup>4</sup>: Approximately 68%
* **Outcome-based (GenRM):** (Yellow-orange line with circle markers)
    * Trend: Flat from 2<sup>0</sup> to 2<sup>1</sup>, dips at 2<sup>2</sup>, then recovers to slightly above its starting value.
    * Data Points:
        * 2<sup>0</sup>: Approximately 58%
        * 2<sup>1</sup>: Approximately 58%
        * 2<sup>2</sup>: Approximately 56%
        * 2<sup>3</sup>: Approximately 57%
        * 2<sup>4</sup>: Approximately 59%
* **LLM-as-a-judge:** (Dashed teal line)
    * Trend: Constant.
    * Value: Approximately 62%
### Key Observations
* The Process-based method improves from approximately 61% at 2<sup>0</sup> beams to approximately 68% at 2<sup>4</sup> beams, a gain of about 7 percentage points.
* The Outcome-based method has lower accuracy than both the Process-based method and LLM-as-a-judge at every beam count shown.
* The accuracy of LLM-as-a-judge remains constant regardless of the number of beams.
* At 2<sup>4</sup> beams, the Process-based method reaches the highest accuracy on the chart (approximately 68%).
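
The arithmetic behind these observations can be checked with a short sketch. The values below are the approximate readings listed under Detailed Analysis, not exact figures from the underlying experiment, and the variable names are illustrative only.

```python
# Approximate values read off the chart; treat them as estimates,
# not as exact results from the source experiment.
beams = [1, 2, 4, 8, 16]                # 2^0 .. 2^4
process_based = [61, 61, 64, 66, 68]    # accuracy (%), approximate
outcome_based = [58, 58, 56, 57, 59]    # accuracy (%), approximate
llm_judge = 62                          # constant baseline (%), approximate

# Net change in accuracy from the smallest to the largest beam count.
process_gain = process_based[-1] - process_based[0]
outcome_gain = outcome_based[-1] - outcome_based[0]

# Gap between the two filtering methods at each beam count.
gaps = [p - o for p, o in zip(process_based, outcome_based)]

# First beam count at which the process-based curve exceeds the baseline
# (None if it never does).
crossover = next(
    (b for b, p in zip(beams, process_based) if p > llm_judge), None
)

print(f"Process-based gain: {process_gain} points")
print(f"Outcome-based gain: {outcome_gain} points")
print(f"Process minus outcome gap per beam count: {gaps}")
print(f"Process-based exceeds LLM-as-a-judge from {crossover} beams")
```

Because the inputs are rounded to the nearest percentage point, the resulting differences are only accurate to roughly one point.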
### Interpretation
The chart suggests that Process-based filtering is more effective than Outcome-based filtering at every beam count shown, and that the gap widens as the number of beams grows (from about 3 points at 2<sup>0</sup> to about 9 points at 2<sup>4</sup>). LLM-as-a-judge serves as a fixed baseline at approximately 62%, which the Process-based method surpasses from 2<sup>2</sup> beams onward. The Outcome-based method stays below that baseline throughout and varies little with the number of beams (roughly 56% to 59%). For this generator, filtering based on the process rather than only the outcome appears to yield better accuracy, and the benefit grows as more beams are used.