# Technical Document Extraction: Speedup on Different Model Sizes
## 1. Document Overview
This image is a grouped bar chart illustrating the performance improvements (speedup) achieved by using "Medusa-2" compared to a baseline without it ("w/o Medusa") across four different Large Language Model (LLM) configurations.
## 2. Component Isolation
### Header
* **Title:** Speedup on different model sizes
### Main Chart Area
* **Y-Axis Label:** Tokens per Second
* **Y-Axis Markers:** 0, 20, 40, 60, 80, 100, 120
* **X-Axis Label:** Model Size
* **X-Axis Categories:** Vicuna-7B, Zephyr-7B, Vicuna-13B, Vicuna-33B
* **Legend:**
    * **Blue Bar:** w/o Medusa
    * **Orange Bar:** Medusa-2
## 3. Data Extraction and Trend Verification
### Trend Analysis
Baseline throughput ("w/o Medusa") falls as model size grows, from roughly 45 tokens/sec for Vicuna-7B to roughly 18 tokens/sec for Vicuna-33B. Medusa-2 throughput also falls with model size, but it exceeds the baseline by a wide margin in every category. The speedup factors annotated above the orange bars range from 2.35x to 2.83x.
### Data Table (Reconstructed)
| Model Size | w/o Medusa (Tokens/sec) | Medusa-2 (Tokens/sec) | Speedup Factor (Annotated) |
| :--- | :---: | :---: | :---: |
| **Vicuna-7B** | ~45 | ~128 | 2.83x |
| **Zephyr-7B** | ~41 | ~109 | 2.66x |
| **Vicuna-13B** | ~35 | ~98 | 2.83x |
| **Vicuna-33B** | ~18 | ~42 | 2.35x |
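
Since the bar heights above are approximate readings, it can help to check that they agree with the annotated speedup factors. The following is a minimal sketch, assuming the speedup is simply Medusa-2 throughput divided by baseline throughput; the names `ROWS`, `speedup_check`, and the 0.05 tolerance are illustrative choices, not part of the chart.

```python
# Approximate values read from the chart: (baseline tok/s, Medusa-2 tok/s, annotated speedup).
ROWS = {
    "Vicuna-7B": (45, 128, 2.83),
    "Zephyr-7B": (41, 109, 2.66),
    "Vicuna-13B": (35, 98, 2.83),
    "Vicuna-33B": (18, 42, 2.35),
}


def speedup_check(rows, tolerance=0.05):
    """Return {model: (computed speedup, within tolerance of annotation?)}.

    The tolerance is an assumed allowance for error in reading bar heights.
    """
    report = {}
    for model, (baseline, medusa, annotated) in rows.items():
        computed = medusa / baseline  # assumed definition of speedup
        report[model] = (round(computed, 2), abs(computed - annotated) <= tolerance)
    return report


report = speedup_check(ROWS)
for model, (computed, ok) in report.items():
    print(f"{model}: computed {computed}x, consistent with annotation: {ok}")
```

With these readings, every computed ratio lands within the assumed tolerance of its annotation, which suggests the reconstructed bar heights are plausible.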
## 4. Detailed Observations
* **Baseline Performance:** The baseline (blue) shows a clear downward trend as model size increases. Vicuna-7B reaches approximately 45 tokens/sec, while the much larger Vicuna-33B drops to under 20 tokens/sec.
* **Medusa-2 Performance:** Medusa-2 (orange) delivers much higher throughput than the baseline for every model. Even for the largest model (Vicuna-33B), Medusa-2 reaches ~42 tokens/sec, nearly equal to the baseline throughput of the smallest model (Vicuna-7B at ~45 tokens/sec).
* **Peak Speedup:** The highest relative performance gains are seen in the Vicuna-7B and Vicuna-13B models, both achieving a **2.83x** increase in tokens per second.
* **Visual Style:** The chart uses a clean white background with grid lines and sans-serif typography. The bars are grouped by model to allow direct comparison between the two configurations (with and without Medusa).
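
The first two observations above can be checked against the reconstructed values. This is a rough sketch that uses the approximate readings from the data table; the variable and function names are hypothetical, and the results are only as precise as those readings.

```python
# Approximate tokens/sec read from the chart, in x-axis order.
MODELS = ["Vicuna-7B", "Zephyr-7B", "Vicuna-13B", "Vicuna-33B"]
BASELINE = [45, 41, 35, 18]
MEDUSA_2 = [128, 109, 98, 42]


def is_strictly_decreasing(values):
    """True if each value is smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


# Baseline observation: throughput drops at every step along the x-axis.
baseline_declines = is_strictly_decreasing(BASELINE)

# Relative drop in baseline throughput from the smallest to the largest model.
baseline_drop = (BASELINE[0] - BASELINE[-1]) / BASELINE[0]

# Medusa-2 observation: Vicuna-33B with Medusa-2 versus the Vicuna-7B baseline.
largest_vs_smallest = MEDUSA_2[-1] / BASELINE[0]

print(f"baseline strictly decreasing: {baseline_declines}")
print(f"baseline drop, 7B to 33B: {baseline_drop:.0%}")
print(f"Medusa-2 33B / baseline 7B: {largest_vs_smallest:.2f}")
```

On these readings, Vicuna-33B with Medusa-2 reaches a little over 90% of the Vicuna-7B baseline throughput, which is what "nearly equal" refers to.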