## Chart: Performance Metrics vs. Draft Length
### Overview
The image presents a grid of line charts comparing four methods (SD, SpecTr, RSD-C (ours), and RSD-S (ours)) on four models (Llama 2-7B, Llama 2-13B, Llama 2-Chat-7B, and Dolly) across four metrics: block efficiency, MBSU, token rate, and accuracy. Each model is evaluated on two datasets, WMT and XSum. The x-axis of every chart is the draft length, ranging from 2 to 5.
### Components/Axes
* **Rows:** Each row represents one combination of model and dataset. The models are Llama 2-7B, Llama 2-13B, Llama 2-Chat-7B, and Dolly. The datasets are WMT (translation) and XSum (summarization).
* **Columns:** Each column represents a different performance metric: block efficiency, MBSU (memory-bound speed-up), token rate, and accuracy.
* **X-axis:** Draft length, ranging from 2 to 5.
* **Y-axis:** The y-axis scales vary for each metric.
    * Block efficiency: Ranges from approximately 1.6 to 4.2.
    * MBSU: Ranges from approximately 1.5 to 4.0.
    * Token rate: Ranges from approximately 0.9 to 2.0.
    * Accuracy: Ranges from approximately 0.7 to 1.3.
* **Legend:** Located at the bottom of the image.
    * Solid line with circles: RSD-S (ours) (Blue)
    * Dashed line with pluses: SpecTr (Red)
    * Dotted line with triangles: SD (Orange)
    * Dash-dot line with diamonds: RSD-C (ours) (Green)
### Detailed Analysis
**Llama 2-7B**
* **WMT**
    * Block efficiency: All methods show an upward trend with increasing draft length. RSD-S (blue) performs the best, followed by RSD-C (green), SpecTr (red), and SD (orange).
        * Draft Length 2: SD ~1.8, SpecTr ~2.1, RSD-C ~2.2, RSD-S ~2.3
        * Draft Length 5: SD ~2.0, SpecTr ~2.2, RSD-C ~2.5, RSD-S ~2.7
    * MBSU: All methods show an upward trend. RSD-S (blue) performs the best, followed by RSD-C (green), SpecTr (red), and SD (orange).
        * Draft Length 2: SD ~1.8, SpecTr ~1.9, RSD-C ~2.0, RSD-S ~2.1
        * Draft Length 5: SD ~2.0, SpecTr ~2.1, RSD-C ~2.4, RSD-S ~2.5
    * Token rate: RSD-S (blue) and RSD-C (green) are relatively stable. SpecTr (red) and SD (orange) decrease slightly with increasing draft length.
        * Draft Length 2: SD ~1.3, SpecTr ~1.2, RSD-C ~1.2, RSD-S ~1.3
        * Draft Length 5: SD ~1.1, SpecTr ~1.1, RSD-C ~1.2, RSD-S ~1.3
    * Accuracy: All methods maintain a constant accuracy of approximately 1.0 across all draft lengths.
* **XSum**
    * Block efficiency: All methods show an upward trend. RSD-S (blue) performs the best, followed by RSD-C (green), SpecTr (red), and SD (orange).
        * Draft Length 2: SD ~2.8, SpecTr ~3.0, RSD-C ~3.1, RSD-S ~3.2
        * Draft Length 5: SD ~3.1, SpecTr ~3.1, RSD-C ~3.6, RSD-S ~4.2
    * MBSU: All methods show an upward trend. RSD-S (blue) performs the best, followed by RSD-C (green), SpecTr (red), and SD (orange).
        * Draft Length 2: SD ~2.6, SpecTr ~2.8, RSD-C ~2.9, RSD-S ~3.0
        * Draft Length 5: SD ~2.8, SpecTr ~2.9, RSD-C ~3.4, RSD-S ~4.0
    * Token rate: RSD-S (blue) and RSD-C (green) are relatively stable. SpecTr (red) and SD (orange) decrease slightly with increasing draft length.
        * Draft Length 2: SD ~1.6, SpecTr ~1.7, RSD-C ~1.7, RSD-S ~1.9
        * Draft Length 5: SD ~1.3, SpecTr ~1.4, RSD-C ~1.6, RSD-S ~1.9
    * Accuracy: All methods maintain a constant accuracy of approximately 1.0 across all draft lengths.
**Llama 2-13B**
* **WMT**
    * Block efficiency: All methods show an upward trend. RSD-S (blue) performs the best, followed by RSD-C (green), SpecTr (red), and SD (orange).
        * Draft Length 2: SD ~1.8, SpecTr ~2.1, RSD-C ~2.2, RSD-S ~2.3
        * Draft Length 5: SD ~2.0, SpecTr ~2.2, RSD-C ~2.5, RSD-S ~2.7
    * MBSU: All methods show an upward trend. RSD-S (blue) performs the best, followed by RSD-C (green), SpecTr (red), and SD (orange).
        * Draft Length 2: SD ~1.9, SpecTr ~2.0, RSD-C ~2.1, RSD-S ~2.2
        * Draft Length 5: SD ~2.1, SpecTr ~2.2, RSD-C ~2.5, RSD-S ~2.7
    * Token rate: RSD-S (blue) and RSD-C (green) are relatively stable. SpecTr (red) and SD (orange) decrease slightly with increasing draft length.
        * Draft Length 2: SD ~1.3, SpecTr ~1.2, RSD-C ~1.2, RSD-S ~1.4
        * Draft Length 5: SD ~1.1, SpecTr ~1.1, RSD-C ~1.2, RSD-S ~1.4
    * Accuracy: All methods maintain a constant accuracy of approximately 1.0 across all draft lengths.
* **XSum**
    * Block efficiency: All methods show an upward trend. RSD-S (blue) performs the best, followed by RSD-C (green), SpecTr (red), and SD (orange).
        * Draft Length 2: SD ~2.8, SpecTr ~3.0, RSD-C ~3.1, RSD-S ~3.2
        * Draft Length 5: SD ~3.0, SpecTr ~3.1, RSD-C ~3.6, RSD-S ~4.2
    * MBSU: All methods show an upward trend. RSD-S (blue) performs the best, followed by RSD-C (green), SpecTr (red), and SD (orange).
        * Draft Length 2: SD ~2.7, SpecTr ~2.8, RSD-C ~2.9, RSD-S ~3.1
        * Draft Length 5: SD ~2.9, SpecTr ~2.9, RSD-C ~3.4, RSD-S ~3.9
    * Token rate: RSD-S (blue) and RSD-C (green) are relatively stable. SpecTr (red) and SD (orange) decrease slightly with increasing draft length.
        * Draft Length 2: SD ~1.7, SpecTr ~1.7, RSD-C ~1.7, RSD-S ~2.0
        * Draft Length 5: SD ~1.3, SpecTr ~1.4, RSD-C ~1.6, RSD-S ~1.9
    * Accuracy: All methods maintain a constant accuracy of approximately 1.0 across all draft lengths.
**Llama 2-Chat-7B**
* **WMT**
    * Block efficiency: All methods show an upward trend. RSD-S (blue) performs the best, followed by RSD-C (green), SpecTr (red), and SD (orange).
        * Draft Length 2: SD ~1.8, SpecTr ~2.0, RSD-C ~2.1, RSD-S ~2.2
        * Draft Length 5: SD ~2.0, SpecTr ~2.1, RSD-C ~2.4, RSD-S ~2.7
    * MBSU: All methods show an upward trend. RSD-S (blue) performs the best, followed by RSD-C (green), SpecTr (red), and SD (orange).
        * Draft Length 2: SD ~1.9, SpecTr ~1.9, RSD-C ~2.0, RSD-S ~2.1
        * Draft Length 5: SD ~2.0, SpecTr ~2.1, RSD-C ~2.4, RSD-S ~2.5
    * Token rate: RSD-S (blue) and RSD-C (green) are relatively stable. SpecTr (red) and SD (orange) decrease slightly with increasing draft length.
        * Draft Length 2: SD ~1.3, SpecTr ~1.1, RSD-C ~1.1, RSD-S ~1.3
        * Draft Length 5: SD ~0.9, SpecTr ~0.9, RSD-C ~1.1, RSD-S ~1.3
    * Accuracy: All methods maintain a constant accuracy of approximately 1.0 across all draft lengths.
* **XSum**
    * Block efficiency: All methods show an upward trend. RSD-S (blue) performs the best, followed by RSD-C (green), SpecTr (red), and SD (orange).
        * Draft Length 2: SD ~2.6, SpecTr ~2.7, RSD-C ~2.8, RSD-S ~3.1
        * Draft Length 5: SD ~2.7, SpecTr ~2.8, RSD-C ~3.2, RSD-S ~3.6
    * MBSU: All methods show an upward trend. RSD-S (blue) performs the best, followed by RSD-C (green), SpecTr (red), and SD (orange).
        * Draft Length 2: SD ~2.4, SpecTr ~2.4, RSD-C ~2.5, RSD-S ~2.6
        * Draft Length 5: SD ~2.5, SpecTr ~2.5, RSD-C ~2.9, RSD-S ~3.2
    * Token rate: RSD-S (blue) and RSD-C (green) are relatively stable. SpecTr (red) and SD (orange) decrease slightly with increasing draft length.
        * Draft Length 2: SD ~1.6, SpecTr ~1.3, RSD-C ~1.3, RSD-S ~1.6
        * Draft Length 5: SD ~1.0, SpecTr ~1.0, RSD-C ~1.3, RSD-S ~1.6
    * Accuracy: All methods maintain a constant accuracy of approximately 1.0 across all draft lengths.
**Dolly**
* **WMT**
    * Block efficiency: All methods show an upward trend. RSD-S (blue) performs the best, followed by RSD-C (green), SpecTr (red), and SD (orange).
        * Draft Length 2: SD ~1.6, SpecTr ~1.7, RSD-C ~1.8, RSD-S ~2.2
        * Draft Length 5: SD ~1.8, SpecTr ~1.8, RSD-C ~2.0, RSD-S ~2.8
    * MBSU: All methods show an upward trend. RSD-S (blue) performs the best, followed by RSD-C (green), SpecTr (red), and SD (orange).
        * Draft Length 2: SD ~1.7, SpecTr ~1.7, RSD-C ~1.8, RSD-S ~2.2
        * Draft Length 5: SD ~1.8, SpecTr ~1.8, RSD-C ~2.0, RSD-S ~2.8
    * Token rate: RSD-S (blue) and RSD-C (green) are relatively stable. SpecTr (red) and SD (orange) decrease slightly with increasing draft length.
        * Draft Length 2: SD ~1.4, SpecTr ~1.3, RSD-C ~1.3, RSD-S ~1.6
        * Draft Length 5: SD ~1.0, SpecTr ~1.0, RSD-C ~1.3, RSD-S ~1.6
    * Accuracy: All methods maintain a constant accuracy of approximately 1.0 across all draft lengths.
* **XSum**
    * Block efficiency: Any upward trend is slight: the approximate readings below change by at most ~0.1 between draft lengths 2 and 5. RSD-S (blue) performs the best, followed by RSD-C (green), SpecTr (red), and SD (orange).
        * Draft Length 2: SD ~2.0, SpecTr ~2.6, RSD-C ~2.7, RSD-S ~3.4
        * Draft Length 5: SD ~2.0, SpecTr ~2.7, RSD-C ~2.7, RSD-S ~3.4
    * MBSU: SD, SpecTr, and RSD-C are relatively stable, while RSD-S increases slightly, by less than the precision of the readings below.
        * Draft Length 2: SD ~2.0, SpecTr ~2.4, RSD-C ~2.6, RSD-S ~2.7
        * Draft Length 5: SD ~2.0, SpecTr ~2.4, RSD-C ~2.6, RSD-S ~2.7
    * Token rate: SD, SpecTr, and RSD-C are relatively stable, while RSD-S increases slightly, by less than the precision of the readings below.
        * Draft Length 2: SD ~1.4, SpecTr ~1.4, RSD-C ~1.4, RSD-S ~1.6
        * Draft Length 5: SD ~1.3, SpecTr ~1.4, RSD-C ~1.4, RSD-S ~1.6
    * Accuracy: All methods maintain a constant accuracy of approximately 1.0 across all draft lengths.
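The trend labels used above ("upward", "stable", "decrease slightly") can be made concrete by computing the percent change between draft lengths 2 and 5. The following is a minimal sketch for the Llama 2-7B / XSum panels only. The names `READINGS`, `percent_change`, and `summarize` are illustrative, and the inputs are the approximate chart readings listed above, so the resulting percentages are only indicative.

```python
# Approximate readings for the Llama 2-7B / XSum panels, copied from the
# analysis above. Each tuple is (value at draft length 2, value at draft length 5).
READINGS = {
    "block efficiency": {
        "SD": (2.8, 3.1), "SpecTr": (3.0, 3.1), "RSD-C": (3.1, 3.6), "RSD-S": (3.2, 4.2),
    },
    "MBSU": {
        "SD": (2.6, 2.8), "SpecTr": (2.8, 2.9), "RSD-C": (2.9, 3.4), "RSD-S": (3.0, 4.0),
    },
    "token rate": {
        "SD": (1.6, 1.3), "SpecTr": (1.7, 1.4), "RSD-C": (1.7, 1.6), "RSD-S": (1.9, 1.9),
    },
}


def percent_change(start: float, end: float) -> float:
    """Relative change from start to end, in percent."""
    return (end - start) / start * 100.0


def summarize(readings):
    """Map each metric and method to its percent change between draft lengths 2 and 5."""
    return {
        metric: {
            method: percent_change(at_2, at_5)
            for method, (at_2, at_5) in methods.items()
        }
        for metric, methods in readings.items()
    }


CHANGES = summarize(READINGS)

if __name__ == "__main__":
    for metric, per_method in CHANGES.items():
        row = ", ".join(f"{method}: {change:+.1f}%" for method, change in per_method.items())
        print(f"{metric}: {row}")
```

On these readings, RSD-S gains roughly 31% in block efficiency while its token rate is unchanged, and the SD token rate falls by roughly 19%.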
### Key Observations
* **RSD-S (ours)** consistently outperforms the other methods (SD, SpecTr, RSD-C (ours)) in terms of block efficiency and MBSU across all models and both datasets.
* **Accuracy** remains relatively constant across all draft lengths and methods.
* **Token rate** tends to decrease slightly with increasing draft length for SD and SpecTr, while RSD-S and RSD-C remain more stable.
* The performance differences between methods are more pronounced for block efficiency and MBSU than for token rate and accuracy.
* The trends are generally consistent across different models (Llama 2-7B, Llama 2-13B, Llama 2-Chat-7B, and Dolly) and datasets (WMT and XSum).
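The first observation can be checked directly against the approximate readings listed under Detailed Analysis. The sketch below does this for block efficiency at draft length 5 across all eight model and dataset panels. The names `BLOCK_EFFICIENCY_AT_5`, `best_method`, and `gain_over_sd` are illustrative, and the values are read off the chart rather than taken from the underlying results.

```python
# Approximate block-efficiency readings at draft length 5, copied from the
# Detailed Analysis section. Keys are (model, dataset) panels.
BLOCK_EFFICIENCY_AT_5 = {
    ("Llama 2-7B", "WMT"): {"SD": 2.0, "SpecTr": 2.2, "RSD-C": 2.5, "RSD-S": 2.7},
    ("Llama 2-7B", "XSum"): {"SD": 3.1, "SpecTr": 3.1, "RSD-C": 3.6, "RSD-S": 4.2},
    ("Llama 2-13B", "WMT"): {"SD": 2.0, "SpecTr": 2.2, "RSD-C": 2.5, "RSD-S": 2.7},
    ("Llama 2-13B", "XSum"): {"SD": 3.0, "SpecTr": 3.1, "RSD-C": 3.6, "RSD-S": 4.2},
    ("Llama 2-Chat-7B", "WMT"): {"SD": 2.0, "SpecTr": 2.1, "RSD-C": 2.4, "RSD-S": 2.7},
    ("Llama 2-Chat-7B", "XSum"): {"SD": 2.7, "SpecTr": 2.8, "RSD-C": 3.2, "RSD-S": 3.6},
    ("Dolly", "WMT"): {"SD": 1.8, "SpecTr": 1.8, "RSD-C": 2.0, "RSD-S": 2.8},
    ("Dolly", "XSum"): {"SD": 2.0, "SpecTr": 2.7, "RSD-C": 2.7, "RSD-S": 3.4},
}


def best_method(panel):
    """Return the method with the highest reading in one panel."""
    return max(panel, key=panel.get)


def gain_over_sd(panel, method="RSD-S"):
    """Ratio of a method's reading to the SD reading in the same panel."""
    return panel[method] / panel["SD"]


WINNERS = {key: best_method(panel) for key, panel in BLOCK_EFFICIENCY_AT_5.items()}
GAINS = {key: gain_over_sd(panel) for key, panel in BLOCK_EFFICIENCY_AT_5.items()}

if __name__ == "__main__":
    for (model, dataset), winner in WINNERS.items():
        ratio = GAINS[(model, dataset)]
        print(f"{model} / {dataset}: best = {winner}, RSD-S vs SD = {ratio:.2f}x")
```

On these readings, RSD-S has the highest block efficiency in all eight panels, with a ratio over SD between about 1.33 and 1.70.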
### Interpretation
The data suggest that RSD-S (ours) is the most effective of the four methods at improving block efficiency and MBSU. The near-constant accuracy across draft lengths indicates that increasing the draft length does not degrade output quality on either dataset. The slight decrease in token rate for SD and SpecTr with increasing draft length may reflect the added cost of generating longer drafts, which their small gains in block efficiency do not offset.
The consistent trends across models and datasets suggest that the observed performance differences are robust rather than specific to one model or dataset. RSD-S appears to be a promising approach for improving generation efficiency without reducing accuracy.